ORIGINAL RESEARCH

Genetic determinants of the response to coronavirus infection COVID-19

Center for Strategic Planning and Management of Medical and Biological Health Risks, Moscow, Russia

Received: 2020-07-01 Accepted: 2020-07-17 Published online: 2020-07-27

The novel infection caused by the SARS-CoV-2 coronavirus has become one of the global challenges to mankind in the 21st century. From mid-February 2020 the infection spread quickly across the globe. In March the WHO declared the COVID-19 outbreak a pandemic. By June (at the time of writing), over 13 million confirmed cases had been recorded and more than half a million infected people had died.

The causative agent for COVID-19 is the novel SARS-CoV-2 virus belonging to the betacoronaviruses group, also comprising SARS-CoV. In 2002, SARS-CoV caused an outbreak of coronavirus infection resulting in atypical pneumonia and severe acute respiratory syndrome (SARS).

A high transmission rate and tissue tropism make SARS-CoV-2 a dangerous pathogen. Furthermore, the virus is capable of damaging other tissues and organs (blood vessels, kidney, central nervous system, intestines). One feature of the infection is its markedly varied clinical course: from asymptomatic to extremely severe, associated with multiple organ failure. Presumably, the clinical course of COVID-19 in each individual patient results from the genetic characteristics of both the patient and the virus, which determine the nature of their interaction. Identifying these characteristics will make it possible to develop a risk stratification model for COVID-19 complications and will provide a basis for tailored prevention and treatment of the infection caused by SARS-CoV-2.

1. SARS-CoV-2 genomic structure

The SARS-CoV-2 coronavirus genome is ~30,000 nucleotides long. Unlike the other highly pathogenic coronaviruses causing severe disease in humans (SARS-CoV and MERS-CoV), this virus has a higher rate of transmission. The origin of the virus remains unclear. Comparative analysis has shown that SARS-CoV-2 could have emerged through recombination between the pangolin coronavirus Pangolin-CoV and the bat coronavirus RaTG13 [1]. The receptor binding domain of the Pangolin-CoV spike (S) protein has high sequence similarity with that of SARS-CoV-2: the six sites responsible for binding the major cell receptor have essentially identical sequences. The primary amino acid sequence of the novel coronavirus receptor binding domain differs from that of SARS-CoV; as a result, the receptor binding affinity of the SARS-CoV-2 S protein is 10 times higher than that of the SARS-CoV S protein [2].

2. SARS-CoV-2 life cycle

SARS-CoV-2 employs a number of receptors for cellular entry. The angiotensin-converting enzyme 2 (ACE2) is the major receptor for SARS-CoV-2 [3]. In addition, the virus may employ other cell proteins, such as CD147 and GRP78, as additional receptors. For efficient host cell penetration, the SARS-CoV-2 spike (S) protein must undergo proteolytic activation by two cell proteases: furin and the cellular serine protease TMPRSS2 [4].

Apart from that, SARS-CoV-2 may use the endosomal route of cell entry, which involves the cathepsin L protease [5]. Once the virus gets inside the cell, it reprograms the host cell’s biosynthetic pathways for its own use, exploiting various cell proteins and forming an interactome (the whole set of molecular interactions in a particular cell) of viral proteins and RNA with host factors [6].

3. Polymorphisms in SARS-CoV-2 genome and their impact on biological properties of the virus

Since the first SARS-CoV-2 genome was sequenced, tens of thousands of SARS-CoV-2 full-genome sequences have been obtained from different regions of the world. The data on newly sequenced SARS-CoV-2 isolates are deposited in the GISAID database (https://www.gisaid.org/), which contained over 63,000 SARS-CoV-2 full-genome sequences as of mid-July.

Two major clades were identified based on the differences between the nucleotide sequences of the viruses which circulated in late 2019–early 2020. Clade I included subclades characterized by amino acid substitutions in the proteins ORF3a: p.251G>V or S: p.614D>G. Clade II was distinguished from clade I by the following mutations: substitution in protein ORF8: p.84L>S (28144T>C) and protein ORF1ab: p.2839S (8782C>T) [7].
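
As a rough illustration of how such marker-based grouping works, the following Python sketch assigns a genome to a clade using the substitutions named above. The function name, the input format, the label strings and the choice to treat an unlisted position as unmutated are assumptions made for illustration; they are not taken from the cited study [7].

```python
# Clade II markers as given in the text: 8782C>T and 28144T>C (genome coordinates).
CLADE_II_MARKERS = {8782: "T", 28144: "C"}

# Clade I subclade markers from the text (p.251G>V in ORF3a, p.614D>G in S),
# written here in a hypothetical "protein:RefPosAlt" shorthand.
CLADE_I_SUBCLADE_MARKERS = ("ORF3a:G251V", "S:D614G")


def classify(nucleotides, aa_substitutions=frozenset()):
    """Return a clade label for one genome.

    nucleotides: dict mapping 1-based genome position -> observed base.
    aa_substitutions: set of amino acid substitutions in the shorthand above.
    Assumption: a position missing from `nucleotides` counts as unmutated.
    """
    hits = [nucleotides.get(pos) == alt for pos, alt in CLADE_II_MARKERS.items()]
    if all(hits):
        return "II"
    if any(hits):
        # Only one of the two clade II markers is present.
        return "unresolved"
    for marker in CLADE_I_SUBCLADE_MARKERS:
        if marker in aa_substitutions:
            return f"I/{marker}"
    return "I"


print(classify({8782: "T", 28144: "C"}))
print(classify({8782: "C", 28144: "T"}, {"S:D614G"}))
```

Genomes carrying only one of the two clade II markers are reported as unresolved rather than forced into either clade, which is a design choice of this sketch.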

The S protein mutation characterized by an aspartic acid to glycine substitution at amino acid position 614 (G614), which belongs to clade I, attracted increasing attention as more data accumulated. An explosive outbreak of this variant of the virus was observed in Italy in late February. Currently, viruses carrying the G614 mutation are widespread all over the world. The proportion of viral genomes carrying the substitution was 26% in March, 65% in April, and 70% in May. The G614 genotype may be associated with higher viral load in infected patients, resulting in higher transmissibility of the virus. The role of the G614 variant and its biological properties (including transmissibility) are being actively studied [8].
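
The monthly figures above are shares of sequenced genomes carrying G614. A minimal sketch of that calculation is given below; the function name, the record format and the toy records are hypothetical and serve only to show the arithmetic, not to reproduce the published data.

```python
from collections import Counter


def g614_share_by_month(records):
    """Share of genomes with glycine at S protein position 614, per month.

    records: iterable of (month, residue) pairs, where residue is the amino
    acid observed at position 614 ("D" or "G"). Hypothetical input format.
    """
    totals = Counter()
    g_counts = Counter()
    for month, residue in records:
        totals[month] += 1
        if residue == "G":
            g_counts[month] += 1
    return {month: g_counts[month] / totals[month] for month in totals}


# Toy records, invented for illustration only.
toy_records = [
    ("2020-03", "D"), ("2020-03", "D"), ("2020-03", "D"), ("2020-03", "G"),
    ("2020-04", "G"), ("2020-04", "D"),
]
print(g614_share_by_month(toy_records))
```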

During the pandemic, various polymorphisms in both structural and non-structural proteins that may affect the biological properties of the virus were reported. For example, the nucleocapsid (N) protein polymorphism at positions 203 and 204 (R203K/G204R) can reduce the binding of an antigenic peptide to HLA-C*07, which is prevalent in the European population [9]. It has been reported that the N501T mutation in the SARS-CoV-2 S protein can significantly enhance binding to ACE2 [10]. Based on the 10 most common mutations (frequency over 5%), the SARS-CoV-2 genomes can be divided into several major groups:

  • Group 1 carries both a missense mutation (ORF8:c.251tTa>tCa) and a synonymous mutation (ORF1ab:c.8517agC>agT).
  • Group 2 carries three mutations, including the missense variant S:c.1841gAt>gGt, the ORF1ab upstream gene variant and the synonymous variant ORF1ab:c.2772ttC>ttT.
  • Group 3 carries a nucleotide substitution (ORF1ab:c.10818ttG>ttT).
  • Group 4 carries a new missense mutation (ORF3a:c.752gGt>gTt) first detected in China.

Isolates from France and other countries carry the mutation ORF3a:c.752gGt>gTt, often together with the mutation S:c.1099Gtc>Ttc [11]. To date, hundreds of SARS-CoV-2 gene polymorphisms have been reported, and new polymorphisms are still being identified. This may indicate a continuous process of evolution and adaptation of the virus to a new host species.
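
The codon-style notation used above gives the position within the coding sequence followed by the reference and altered codons, with the changed base in upper case. The Python sketch below shows how such a string can be turned into an amino acid change using the standard genetic code. The function name and the returned format are assumptions made for illustration, and the parser handles only the simple single-codon pattern seen in this section.

```python
import re

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PATTERN = re.compile(r"c\.(\d+)([acgtACGT]{3})>([acgtACGT]{3})")


def describe_codon_change(notation):
    """Translate e.g. 'c.1841gAt>gGt' into ('D614G', 'missense')."""
    match = PATTERN.fullmatch(notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    cds_position = int(match.group(1))
    ref_aa = CODON_TABLE[match.group(2).upper()]
    alt_aa = CODON_TABLE[match.group(3).upper()]
    # Codon number containing the 1-based coding-sequence position.
    codon_number = (cds_position + 2) // 3
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return f"{ref_aa}{codon_number}{alt_aa}", kind


for label in ("c.251tTa>tCa", "c.8517agC>agT", "c.1841gAt>gGt",
              "c.752gGt>gTt", "c.1099Gtc>Ttc"):
    print(label, describe_codon_change(label))
```

Applied to the S variant c.1841gAt>gGt, the sketch returns D614G, and for ORF8 c.251tTa>tCa it returns L84S, which agrees with the amino acid changes used to define the clades in this section.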

4. Pleiotropic spectrum of COVID-19 manifestations is associated with distinct human genome features

One of the major features of COVID-19 is its markedly varied clinical course: from asymptomatic patients to patients with acute respiratory distress syndrome (ARDS) and multiple organ failure. Such diversity of clinical manifestations can hardly be explained by the variability of the virus, given its low genetic variability compared to other RNA viruses. Many studies have searched for host factors essential for the life cycle of the virus, especially for host cell entry. For example, the expression patterns of ACE2 and TMPRSS2 in various tissues and organs were studied at single-cell resolution, which demonstrated that these proteins are expressed not only in the cells of the respiratory epithelium and lungs, but also in intestinal epithelial cells, cardiomyocytes, hepatocytes and neurons [12]. Specific expression patterns of these proteins may therefore determine the course of COVID-19.

The course of COVID-19 may be also defined by many other factors, such as comorbidities and environmental factors. It is hypothesised that genetic determinants (sets of gene variants responsible for body’s response to SARS-CoV-2 infection) play a vital part in susceptibility to the virus.

5. Genetic determinants of susceptibility to COVID-19

Expression of the receptors and host factors required for the basic stages of the viral life cycle is a major factor in the body’s susceptibility to coronavirus. The presence of the receptor together with co-receptors is important for effective viral penetration into the target cell. Thus, co-expression of ACE2 and TMPRSS2 may define the target cells for coronavirus. A number of studies using single-cell RNA sequencing (scRNA-seq) have demonstrated that different ACE2 and TMPRSS2 co-expression patterns are observed in various human cells, tissues and organs. This can explain the heterogeneity of the clinical manifestations of COVID-19, where the pathogenesis involves not only the lungs, but also other organs: liver, kidney, intestines, blood vessels, myocardium and brain [13, 14].
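
To make the notion of co-expression concrete, the sketch below computes the fraction of cells in which both genes are detected. It assumes per-cell transcript counts in a simple dictionary format; the function name, the detection threshold and the toy counts are hypothetical and are not taken from the cited studies.

```python
def coexpressing_fraction(cells, genes=("ACE2", "TMPRSS2"), min_count=1):
    """Fraction of cells in which every listed gene reaches `min_count`.

    cells: list of dicts mapping gene symbol -> transcript count for one cell.
    A gene missing from a cell's dict is treated as having a count of 0.
    """
    if not cells:
        return 0.0
    positive = sum(
        1 for cell in cells
        if all(cell.get(gene, 0) >= min_count for gene in genes)
    )
    return positive / len(cells)


# Toy counts for four cells, invented for illustration only.
toy_cells = [
    {"ACE2": 2, "TMPRSS2": 1},
    {"ACE2": 0, "TMPRSS2": 5},
    {"TMPRSS2": 3},
    {"ACE2": 1, "TMPRSS2": 4},
]
print(coexpressing_fraction(toy_cells))
```

Comparing this fraction across cell types or tissues is one simple way to express the co-expression patterns described above; real analyses would also need to account for the sparsity of single-cell data.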

ACE2 and TMPRSS2 expression in normal cells is low. However, pulmonary disorders and exposure to pollutants and toxic chemicals are usually associated with increased expression of these proteins. Such cells are more susceptible to viral infection than normal cells. This can explain the higher infection rate and more severe course of the disease in people with comorbidities.

One of the main properties of the virus is its ability to infect cells of the immune system and cause immunodeficiency disorders [15]. Allelic variants that determine the structure of the proteins encoded by genes involved in the immune response, as well as variants in regulatory non-coding regions that affect the expression of these genes, contribute to the body’s antiviral response and determine the severity of the disease.

The following international consortia have been created to support research into the genetic basis of the response to SARS-CoV-2 and to identify such factors:
- COVID-19 Host Genetics Initiative [16];
- COVID-19 Genomics UK (COG-UK) Consortium [17], etc.

Genes and allelic variants most probably associated with COVID-19 susceptibility and severity are listed in the table.

ACE2 gene variants and susceptibility to SARS-CoV-2

Carriers of different allelic variants in the ACE2 gene coding region demonstrate different binding affinity for the viral spike (S) protein. For example, alleles rs73635825 (S19P) and rs143936283 (E329G) differ significantly in SARS-CoV-2 S protein binding affinity [17]. Both “dangerous” ACE2 alleles that increase the binding affinity of ACE2 to the S protein (S19P, I21V, E23K, K26R, T27A, N64K, T92I, Q102P and H378R) and “protective” ACE2 variants that decrease the efficiency of receptor binding to the SARS-CoV-2 S protein (K31R, N33I, H34R, E35K, E37K, D38V, Y50F, N51S, M62V, K68E, F72V, Y83H, G326E, G352V, D355N, Q388L and D509Y) have been reported [18]. A study of the Italian population revealed some rare “protective” missense variants of the ACE2 gene: p.Asn720Asp, p.Lys26Arg, p.Gly211Arg (MAF 0.002 to 0.015). These variants were able to interfere with binding to the viral S protein. However, it should be remembered that a recent analysis of 1000 genomes from the UK Biobank revealed no relationship between COVID-19 severity and variants in the ACE2 and TMPRSS2 genes [19].

TMPRSS2 gene variants and susceptibility to SARS-CoV-2

The TMPRSS2 cellular serine protease is essential for the proteolytic activation of the SARS-CoV-2 S protein that is needed for host cell entry. Differential expression of TMPRSS2 may determine the tissue-specific virus–host interaction, which plays a vital part in susceptibility to viral infection.

Thus, lung-specific expression quantitative trait loci (eQTL) associated with TMPRSS2 expression may be responsible for differences in susceptibility and response to SARS-CoV-2 infection. It has been shown that the eQTL variant rs35074065 is associated with higher expression of TMPRSS2 and low expression of the interferon-induced MX1 gene [20]. A number of alleles associated with increased expression of TMPRSS2 (for example, rs2070788, rs9974589, rs7364083) are common in the European population [21]. The eQTL most common in Europeans (rs8134378), located near the androgen-dependent enhancer of TMPRSS2, may be associated with increased TMPRSS2 expression in men [22]. Despite the predicted associations between ACE2 and TMPRSS2 allelic variants and the disease, a recently published report found no association between the described variants and COVID-19 susceptibility [23].

HLA and immune response features, immunodepletion in patients with coronavirus infection

It has been reported that the ability to bind and present antigens is key to mounting an effective immune response. Various MHC molecules (HLA molecules) bind in different ways to the antigenic peptides of viral proteins that emerge during the disease. This may explain the differences between individuals in the ability to develop an immune response.

The HLA-B*46:01 variant had few predicted binding peptides for SARS-CoV-2, suggesting that individuals with that allele might be particularly vulnerable to COVID-19. For example, it was shown that in people with this genotype the course of SARS-CoV infection was more severe [24]. Among all HLA class I alleles, HLA-A*02:02, HLA-B*15:03 and HLA-C*12:03 bind the widest range of conserved SARS-CoV-2 antigenic peptides. In contrast, the HLA-A, -B and -C alleles A*25:01, B*46:01 and C*01:02 bind the narrowest range of antigenic peptides.

It should be noted that, among the presented antigenic peptides of 8-13 amino acids in length, 44 peptides are highly conserved and are found in all coronaviruses, including the common human coronaviruses (OC43, HKU1, NL63 and 229E) [24].

6. Genetic determinants of COVID-19 severity and comorbidities

As mentioned above, the heterogeneity of the clinical manifestations of COVID-19 may be associated with differences between allelic variants of genes that are not required for the viral life cycle. A recent genome-wide association study (GWAS) searching for relationships between genes and COVID-19 severity in patients from Italy and Spain revealed loci and genomic variants that discriminate patients by disease severity. One of these is locus 3p21.31, which comprises the genes SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6 and XCR1. Genes of the locus 3p21.31 are associated with chemokines and the movement of immune cells toward sites of inflammation. It should be noted that the SLC6A20 gene located within this locus encodes a protein that physically and functionally interacts with ACE2 and is able to modulate the properties of the receptor. The other detected locus (9q34) is associated with inheritance of the ABO blood type antigens [26]. A relationship between COVID-19 severity and the rs8176747, rs41302905 and rs8176719 alleles, which define the blood type, has previously been reported in the Chinese population. It has been shown that patients with blood type O have a decreased risk of severe infection, and patients with blood type A are vulnerable to severe COVID-19 [27].

Genetic variants involved in inflammatory response

In patients with COVID-19, a combination of factors contributes to the excessive inflammatory response and the cytokine storm syndrome. This has been confirmed by a recent histological analysis of samples acquired from the post-mortem examination of patients with fatal COVID-19. The study demonstrated that excessive inflammatory response is the most common cause of death in COVID-19 patients [34].

One of the major clinical manifestations of pneumonia is a high level of proinflammatory cytokines (IL-6, TNF-α and IL-1β) and acute phase proteins (APPs). Allelic variants in genes responsible for the inflammatory response may affect the disease severity. For IL-6, a correlation with the IL-6 -174C allele, which is associated with high IL-6 levels and a severe course of pneumonia, has been revealed in patients with severe COVID-19 [28]. Polymorphism of the C3 gene encoding complement component 3 (C3), together with an ACE1 allelic variant, may also contribute to COVID-19 severity [29].

Single nucleotide polymorphism (SNP) allele frequencies may vary among people of different ethnic groups. Such differences may also be associated with differences in COVID-19 susceptibility and severity. For example, the CCR5 Δ32 variant is associated with severe COVID-19 in patients of European origin [30]. A study of gene expression profiles in infected lung cells has revealed a number of genes related to colony-stimulating factor 2 (CSF2), pro-inflammatory cytokine cascades and the calcium-binding proteins S100A8 and S100A9 [31].

Genetic variants involved in coagulation pathway

The prevalence of coagulation disorders in patients with COVID-19 may be associated with variants of genes involved in the blood coagulation cascade. Elevated levels of plasmin and plasminogen, which may promote proteolytic activation of the coronavirus S protein, are also associated with increased susceptibility to COVID-19 [32].

Genetic variants involved in antiviral response

The virus–host cell interaction induces a specific antiviral response through activation of the viral RNA sensors, which leads to activation of the interferon synthesis pathway. The secreted interferons interact with their receptors and induce the activation of interferon-stimulated genes (ISG). This confers resistance to viral infection. The described response involves more than a hundred factors, including both sensors (RIG-I, MDA5, MAVS, STING, cGAS, TLR3, TLR9, TRIM25, RNF166) and effectors (IFNa, IFNb, IFN-λ, OAS1, MX1 and IFITM3). The role of allelic variants of these genes remains largely unknown. However, it has been shown that the variant rs12252 of the IFITM3 gene may be associated with excessive inflammatory response and severe COVID-19 [33].

Conclusion

The extensive genome-wide association studies launched by the international consortia are important steps in investigating the pathogenesis of the novel coronavirus infection. Increasing the sample size and analyzing various ethnic groups will make it possible to identify unique rare allelic variants responsible for susceptibility to COVID-19. Reconstructing the cumulative contribution of allelic variants to the complex gene networks regulating the antiviral response might shed some light on COVID-19 pathogenesis and help to create a genetic risk prediction model for estimating the probability of severe COVID-19.
