METHOD
Development of PCR test for detection of the SARS-CoV-2 genetic variants alpha, beta, gamma, delta
Centre for Strategic Planning and Management of Biomedical Health Risks of Federal Medical Biological Agency, Moscow, Russia
Correspondence should be addressed: Anna K. Shuryaeva
Pogodinskaya, 10, str. 1, 119121, Moscow; ur.zmpsc@aveayruhsa
Author contribution: Shipulin GA, Savochkina YuA, Davydova EE, Yudin SM — planning the experiment; Shipulin GA, Savochkina YuA, Yudin SM — literature analysis; Shuryaeva AK, Shivlyagina EE, Nosova AO, Luparev AR, Malova TV — experimental procedure, data interpretation; Savochkina YuA, Shivlyagina EE, Nosova AO — reagent kit development; Luparev AR — statistical analysis; Shipulin GA, Shuryaeva AK — manuscript writing and editing; Davydova EE, Yudin SM — manuscript editing.
Compliance with ethical standards: the study was performed in accordance with the requirements of the Declaration of Helsinki and GOST R ISO 14155-2014.
A novel coronavirus causing COVID-19, a dangerous respiratory disease in humans, was found in China at the end of 2019. After whole genome sequencing, the virus was classified as a betacoronavirus and named SARS-CoV-2 [1–3]. The high mutation capacity of coronaviruses is well known [4]. Most emerging mutations do not affect the properties of the virus; however, some can result in functional alterations, including faster viral spread and/or more severe illness.
Since the start of the SARS-CoV-2 pandemic, a large number of genome sequences have been published in the GISAID database [5], which has made it possible to trace both the viral spread and the emergence of mutations in the viral genome.
In December 2020, British authorities reported a rapid increase in the number of COVID-19 cases. The sharp spike in morbidity was caused by a novel SARS-CoV-2 genetic variant that differed from the reference genome and carried mutations in the gene encoding the S protein [6, 7]. Based on phylogenetic analysis, the new British lineage of SARS-CoV-2 was named B.1.1.7 and classified as a variant of concern (VOC). The B.1.1.7 variant has a number of typical mutations in the spike protein, such as N501Y, A570D, D614G, and P681H, as well as the amino acid deletions 69-70 and 144Y [8, 9]. At the same time, another epidemiologically significant lineage, B.1.351, was detected; it caused an outbreak in several provinces of the Republic of South Africa. The South African variant contains nine spike mutations in addition to D614G, including a mutation cluster (for example, 242–244del and R246I) in the spike protein N-terminal domain (NTD), three mutations (K417N, E484K and N501Y) in the receptor binding domain (RBD), and one mutation (A701V) located close to the furin cleavage site [10, 11]. In January 2021, experts of the National Institute of Infectious Diseases, Japan, discovered a novel SARS-CoV-2 variant, the P.1 lineage, in isolates from tourists arriving from Brazil. Some mutations present in the Brazilian variant had previously been found in the British and South African variants. The P.1 lineage contains 12 mutations in the spike protein, including N501Y and E484K [12]. The lineages discussed above were later named with letters of the Greek alphabet in accordance with the nomenclature introduced by the WHO: Alpha (B.1.1.7 and subvariants Q.1–Q.8), Beta (B.1.351), and Gamma (P.1). All three genetic variants are considered VOC by the WHO and ECDC and circulate all over the world [6].
In the spring of 2021, a new SARS-CoV-2 lineage, B.1.617.2, was discovered in India. It spread rapidly and gave rise to a new wave of the pandemic, with a sharp rise in the incidence rate and the number of deaths in India and then in other countries around the world. Along with the closely related, more recently identified genetic lineages AY.1 and AY.2-124, the B.1.617.2 lineage carries the following typical mutations in the S protein: T19R, E156G, del157/158, L452R, T478K, D614G, P681R, and D950N, including mutations in the RBD [13]. Lineages B.1.617.2 and AY.1-124 are classified as variants of concern and are referred to as Delta in the WHO classification.
According to a number of reports, the N501Y mutation in the S gene, found in the genomes of three of the four VOC discussed above, affects the S protein-ACE2 binding affinity, which may result in increased transmissibility [14–16]. The E484K mutation may allow the virus to escape from monoclonal antibodies produced in response to a previous infection with SARS-CoV-2 lacking this mutation, or in response to vaccination [17–19].
For individuals infected with the Alpha lineage, the mean duration of acute infection was shown to be longer than for strains of other lineages (13.3 days vs. 8.2 days, respectively) [20]. To date, there is no evidence that the SARS-CoV-2 Alpha genetic variant can escape from monoclonal antibodies produced in response to vaccination [8, 21–23]. However, according to the literature, the South African and Brazilian variants could reduce the efficacy of some vaccines against COVID-19 [7, 24]. Structural analysis of the RBD mutation L452R and of the P681R mutation, located in the furin cleavage site, has shown that these mutations may increase the efficiency of SARS-CoV-2 binding to the ACE2 receptor and the S1-S2 cleavage rate, thus contributing to increased transmissibility. The presence of the L452R mutation in the RBD is associated with reduced binding of selected monoclonal antibodies and may affect their neutralization potency [13, 25].
Given the above, detecting coronavirus variants and tracing the spread of the SARS-CoV-2 variants of greatest epidemiological significance is an important task.
Currently, viral genome sequencing is the most common technique for discovering rapidly emerging SARS-CoV-2 variants. However, studies based on genome sequencing are expensive and time-consuming. Moreover, sequencing places more stringent requirements on the viral load in the sample than PCR does. Reverse transcription polymerase chain reaction (RT-PCR) makes it possible to screen samples containing SARS-CoV-2 RNA for already known, functionally significant mutations in the S gene, the identification of which allows one to define and differentiate the VOC of greatest epidemiological significance. Oligonucleotide probes with modified LNA nucleotides can discriminate even single nucleotide substitutions in the assayed gene sequence [26]. The WHO and ECDC guidelines on the detection and identification of SARS-CoV-2 variants recommend RT-PCR as a way to increase the throughput and performance of screening for the top-priority genetic variants [27]. Foreign-made reagent kits are available for detecting important mutations in the gene encoding the S protein by RT-PCR. For example, the reagent kit manufactured by the Korean company PowerChek identifies mutations typical for the Delta, Alpha, Beta, and Gamma lineages, and the kit produced by the French company ID detects mutations L452R, E484K, and E484Q. In Russia, only reagent kits for detecting mutations typical for the Alpha lineage have been registered (manufactured by the Central Research Institute of Epidemiology and DNA-Technology LLC); the other lineages are not covered.
Thus, the development of a domestic reagent kit for timely monitoring of the spread of the top-priority SARS-CoV-2 variants is highly relevant. For this purpose we have developed an RT-PCR-based method and a reagent kit for detecting a range of mutations in the S gene typical for four genetic variants of greatest epidemiological significance, and for detecting RNA of the SARS-CoV-2 variants Delta, Alpha, and Beta/Gamma (without differentiation between the latter two) by identifying the relevant combination of key mutations. The study aimed to develop a reagent kit that identifies: mutations N501Y and P681H and the deletion 69-70del, the combination of which is typical for the SARS-CoV-2 Alpha genetic lineage; mutation E484K, which in combination with N501Y is typical for the Beta and Gamma lineages; and mutations P681R and L452R, the combination of which is typical for the Delta lineage. The kit also detects the presence of each of these mutations individually. It may be used as a monitoring tool that provides prompt data on changes in the spread of the top-priority SARS-CoV-2 strains (or genetic variants) carrying functionally significant mutations.
METHODS
Design of oligonucleotides for amplification of the S gene target regions
The whole genome alignment of SARS-CoV-2 sequences deposited in the GISAID database (accessed 05.10.2021) was used to select diagnostic primers and probes for detecting mutations typical for the most epidemiologically significant SARS-CoV-2 genetic lineages. Based on the alignment data, regions carrying the relevant mutations were defined in the viral S gene, and primers and LNA-modified probes were selected to detect the following mutations in the SARS-CoV-2 genome: N501Y, the deletion 69-70del, P681H, P681R, L452R, E484K, and E484Q. Oligonucleotides were selected in accordance with the standard requirements for primers and TaqMan probes [28, 29] using the Oligo Calc [30] and OligoAnalyzer Tool [31] online resources. Thermodynamic properties and secondary structure of the fluorescent probes were assessed using the Mfold Web Server [32]. Oligonucleotides were synthesized by Genterra (Russia).
Clinical samples
Nasopharyngeal and oropharyngeal swabs (n = 10,297) were obtained at the Head Center for Hygiene and Epidemiology of the Federal Medical Biological Agency from patients with symptoms of acute respiratory viral infection (ARVI) from December 2020 to September 2021. Swab samples were collected in test tubes containing transport medium (Central Research Institute of Epidemiology; Russia) and stored at –70 °C before testing.
RNA extraction
RNA was extracted using the AmpliTest RIBO-prep kit (FSBI SPC FMBA; Russia) in accordance with the manufacturer's guidelines. RNA was eluted in 50 µl of RNA elution buffer. RNA samples were stored at a temperature not exceeding –70 °C until use.
RT-PCR
Multiplex RT-PCR was carried out in accordance with the instructions. In addition to the RNA samples, each run included the positive control samples (PCS) A, B, and C, PCS W for the wild type, and a negative control.
Each sample was tested for mutations typical for the VOC Alpha, Delta, and Beta/Gamma in three test tubes containing mixtures 2-A, 2-B, and 2-C. Fluorescence signal accumulation curves in three different channels were analyzed (table). The volume of RNA solution tested was 10 µl.
The amplification program included the following thermal cycling steps: 50 °C for 15 min and 95 °C for 15 min, followed by 45 cycles of 95 °C for 15 s, 60 °C for 30 s, and 72 °C for 15 s.
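As a quick check of how long one run occupies the instrument, the program above can be written down as data and the programmed hold times summed. This is only a sketch: the step labels are our own (the text gives temperatures and times only), and ramping between temperatures, which depends on the cycler, is ignored.

```python
# Thermal cycling program as listed in the text.
# Step labels are illustrative assumptions; the source gives only
# temperatures and durations.
HOLDS = [
    ("reverse transcription", 50, 15 * 60),  # 50 °C, 15 min
    ("initial denaturation", 95, 15 * 60),   # 95 °C, 15 min
]
CYCLE_STEPS = [
    ("denaturation", 95, 15),  # 95 °C, 15 s
    ("annealing", 60, 30),     # 60 °C, 30 s
    ("extension", 72, 15),     # 72 °C, 15 s
]
N_CYCLES = 45


def programmed_hold_seconds(holds, cycle_steps, n_cycles):
    """Sum of programmed hold times; ramp times are not included."""
    hold_total = sum(seconds for _, _, seconds in holds)
    cycle_total = n_cycles * sum(seconds for _, _, seconds in cycle_steps)
    return hold_total + cycle_total


total_s = programmed_hold_seconds(HOLDS, CYCLE_STEPS, N_CYCLES)
print(f"Programmed hold time: {total_s / 60:.0f} min")  # 75 min
```

The actual run time is longer than this figure because of temperature ramping and signal acquisition.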
The AmpliTest SARS-CoV-2 VOC v.2 reagent kit was optimized using the Rotor-Gene Q (Qiagen; Germany), CFX96 (Bio-Rad Laboratories; USA), and DTprime (DNA-Technology; Russia) systems.
Positive control samples (PCS)
A mixture of recombinant plasmids containing the S-gene fragments targeted by the selected primers and probes was used as a PCS.
The PCR products were purified using the MinElute Gel Extraction Kit (Qiagen; Germany), ligated into the pGEM-T plasmid vector (Promega; USA), and transformed into Escherichia coli. Recombinant plasmids of individual clones were purified using the Plasmid Miniprep Kit (Axygen; USA). The PCS nucleotide sequences were confirmed by Sanger sequencing (cycle sequencing) using the ABI PRISM Big Dye v.3.1 kit (Thermo Fisher Scientific; USA) in accordance with the manufacturer's guidelines, and the Applied Biosystems Sanger Sequencing 3500 Series Genetic Analyzer (Thermo Fisher Scientific; USA). The forward and reverse PCR primers flanking the fragment to be amplified were used for sequencing.
The sequencing reaction was carried out in 5 µL of reaction mixture in 0.2 mL thin-wall microtubes. The reaction mixture contained 0.8 µL of specific primer (forward or reverse) at a concentration of 1 µM and 1 µL of the Ready Reaction BigDye Terminator v3.1 mixture (Thermo Fisher Scientific; USA). The components were mixed carefully, and thermal cycling was performed in the SimpliAmp VeriFlex 96 thermal cycler (Thermo Fisher Scientific; USA) with the following program: initial denaturation for 2 min at 95 °C, then 45 cycles (denaturation for 20 s at 95 °C, primer annealing for 30 s at 60 °C, elongation for 1 min at 68 °C). After the samples were purified from excess dideoxynucleotides and salts, capillary electrophoresis was performed in the Applied Biosystems Sanger Sequencing 3500 Genetic Analyzer (Thermo Fisher Scientific; USA).
Four PCSs were developed for the reagent kit: PCS A, containing the fragment of the SARS-CoV-2 S-gene codon 501 carrying the mutation N501Y, and the S-gene fragment carrying the deletion 69-70del; PCS B, containing a mixture of recombinant plasmids with the fragments of the SARS-CoV-2 S-gene codon 484 carrying mutations E484K and E484Q; PCS C, containing plasmids carrying mutations P681H, L452R and P681R in the SARS-CoV-2 S gene; PCS W, containing plasmids with the fragments of the SARS-CoV-2 S-gene codons 484 and 501 with no mutations N501Y and E484K.
The concentration of each PCS was measured by Droplet Digital PCR (ddPCR) using the QX200 Droplet Digital PCR System (Bio-Rad Laboratories; USA). PCSs were added at the RT-PCR stage as separate samples.
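For readers unfamiliar with ddPCR quantification, the general principle is a Poisson correction of the fraction of positive droplets. The sketch below illustrates that principle only; it is not the instrument software's procedure, the function name is ours, and the droplet volume of 0.85 nL is an assumed default that should be replaced with the value specified for the system in use.

```python
import math


def ddpcr_copies_per_ul(positive_droplets, total_droplets, droplet_volume_nl=0.85):
    """Estimate target copies per microlitre of ddPCR reaction.

    Uses the Poisson relation: mean copies per droplet = -ln(1 - p),
    where p is the fraction of positive droplets.
    droplet_volume_nl is an ASSUMED value, not taken from the source.
    """
    if total_droplets <= 0 or not 0 <= positive_droplets < total_droplets:
        raise ValueError("need 0 <= positive_droplets < total_droplets")
    fraction_positive = positive_droplets / total_droplets
    copies_per_droplet = -math.log(1.0 - fraction_positive)
    droplet_volume_ul = droplet_volume_nl / 1000.0  # nL -> µL
    return copies_per_droplet / droplet_volume_ul


# Hypothetical droplet counts, for illustration only.
print(f"{ddpcr_copies_per_ul(1500, 15000):.1f} copies/µL of reaction")
```

The result refers to the reaction mixture; converting it to copies per mL of the original sample requires the dilution and elution volumes, which are not given here.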
Analytical sensitivity and specificity
To assess the analytical sensitivity, we used SARS-CoV-2 RNA samples extracted from biomaterial (nasopharyngeal and oropharyngeal swabs). Dilutions of these samples were tested with the diagnostic system developed in order to identify the lowest concentration (highest dilution) at which the samples were detected as positive. The concentration of each sample had previously been measured by ddPCR using the QX200 Droplet Digital PCR System (Bio-Rad Laboratories; USA). The sensitivity threshold was defined as the lowest concentration detected in all three replicates.
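The threshold rule described above can be sketched as follows. The function name and the dilution data are hypothetical; the one added assumption is that the series is walked from the highest concentration downwards and stops at the first level where any replicate fails, so that a chance positive at a lower level is not counted.

```python
def detection_threshold(results, n_replicates=3):
    """Return the lowest concentration detected in all replicates.

    results maps concentration (copies/mL) to a list of booleans,
    one per replicate (True = detected). Returns None if even the
    highest concentration was not detected in every replicate.
    """
    threshold = None
    for concentration in sorted(results, reverse=True):
        replicates = results[concentration]
        if len(replicates) < n_replicates or not all(replicates):
            break  # first failing level ends the search
        threshold = concentration
    return threshold


# Hypothetical dilution series, for illustration only.
example = {
    1e5: [True, True, True],
    1e4: [True, True, True],
    1e3: [True, True, True],
    1e2: [True, False, False],
}
print(detection_threshold(example))  # 1000.0
```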
To assess the analytical specificity of the reagent kit, we used RNA of the SARS-CoV-2 strain No. GK2020/1 from the collection of the N. F. Gamaleya National Research Center, as well as RNA of the coronavirus strains HCoV 229E, HCoV OC43, HCoV NL63, SARS-CoV HKU39849, and MERS-CoV (European Virus Archive Global 011N-03868, Coronavirus RNA specificity panel); strains of influenza A virus (H1N1) (ATCC VR-1469), influenza A virus (H3N2) (ATCC VR-776), and influenza B virus (Victoria lineage) (ATCC VR-1930) from the American Type Culture Collection (ATCC; USA); strains of Streptococcus pneumoniae (No. 131116), Streptococcus pyogenes (No. 130001), Haemophilus influenzae (No. 151221), Staphylococcus aureus (No. 201108), and Klebsiella pneumoniae from the collection of pathogenic microorganisms of the Scientific Centre for Expert Evaluation of Medicinal Products of the Ministry of Health of the Russian Federation; and strains of human parainfluenza viruses types 1, 2, and 3 and human rhinoviruses types 13, 17, and 26 from the State Collection of the Smorodintsev Research Institute of Influenza, at a concentration of not less than 1 × 10⁶ GE/mL.
Whole genome sequencing (WGS)
Whole genome sequencing (WGS) was used as a reference method for identification of mutations in the RNA samples.
Samples with a cycle threshold (Ct) in RT-PCR not exceeding 25 were used for sequencing. Reverse transcription of the RNA samples was performed using the Reverta-L reagent kit (Centre for Strategic Planning of FMBA; Russia). The resulting cDNA was amplified using the AmpliSeq for Illumina SARS-CoV-2 Research Panel (Illumina; USA), containing 247 amplicons in two pools, which covered the entire SARS-CoV-2 genome. Libraries for high-throughput sequencing were prepared using the AmpliSeq Library PLUS kit (Illumina; USA). The quality of the libraries was evaluated by capillary electrophoresis with the Agilent Bioanalyzer 2100 (Agilent; USA). The library concentrations were measured with the Qubit 4 Fluorometer (Thermo Fisher Scientific; USA) using the HS Qubit dsDNA reagent kit (Thermo Fisher Scientific; USA). Sequencing was carried out on the Illumina NextSeq 550 system (Illumina; USA) with the NextSeq 500/550 v2.5 kit (300 cycles) (Illumina; USA). All procedures were carried out in accordance with the manufacturers' guidelines.
Statistical analysis
The diagnostic sensitivity (percentage of positive test results in the group of positive samples) was calculated as A/(A+B)×100%, where A was the number of samples with confirmed target mutations that tested positive using the kit, and B was the number of such samples that tested negative using the kit. The diagnostic specificity (percentage of negative test results in the group of negative samples) was calculated as C/(C+D)×100%, where C was the number of samples without the mutations under analysis that tested negative using the kit, and D was the number of such samples that tested positive using the kit. The 95% confidence intervals for the diagnostic characteristics were calculated by the Clopper–Pearson method.
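The calculation can be illustrated with a short standard-library sketch. It is not the authors' script: the function names are ours, the sensitivity and specificity helpers use the standard true/false positive and negative counts, and the exact interval is found by bisection on the binomial distribution rather than with a statistics package. The two-sided form is shown as the default, which is an assumption, since the text does not say whether a one- or two-sided interval was used.

```python
from math import comb


def sensitivity_percent(true_pos, false_neg):
    """Share of reference-positive samples that the kit calls positive."""
    return 100.0 * true_pos / (true_pos + false_neg)


def specificity_percent(true_neg, false_pos):
    """Share of reference-negative samples that the kit calls negative."""
    return 100.0 * true_neg / (true_neg + false_pos)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(func, iterations=100):
    """Bisection on [0, 1] for a function that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(successes, n, confidence=0.95):
    """Two-sided exact (Clopper-Pearson) interval for a proportion."""
    tail = (1.0 - confidence) / 2.0
    if successes == 0:
        lower = 0.0
    else:
        # Lower limit: the p at which P(X >= successes) equals the tail area.
        lower = _root_of_increasing(
            lambda p: (1.0 - binom_cdf(successes - 1, n, p)) - tail
        )
    if successes == n:
        upper = 1.0
    else:
        # Upper limit: the p at which P(X <= successes) equals the tail area.
        upper = _root_of_increasing(lambda p: tail - binom_cdf(successes, n, p))
    return lower, upper


low, high = clopper_pearson(50, 50)
print(f"{sensitivity_percent(50, 0):.0f}% ({100 * low:.1f}-{100 * high:.0f}%)")
```

Bisection keeps the example free of external dependencies; a statistics library that provides exact binomial intervals would give the same limits.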
RESULTS
When developing the reagent kit, we selected as targets the SARS-CoV-2 S-gene regions containing mutations typical for the Alpha, Beta/Gamma, and Delta genetic lineages. We selected primers and probes for identification of mutations N501Y and P681H and the deletion 69–70del, the combination of which is typical for the SARS-CoV-2 Alpha genetic lineage (B.1.1.7); mutations P681R and L452R, the combination of which is typical for the Delta genetic lineage (B.1.617.2 and all the AY variants); and mutation E484K, which in combination with N501Y is typical for the Beta and Gamma genetic lineages (B.1.351/P.1; these variants are not differentiated).
The reagent kit uses three RT-PCR mixtures per analysis. Each mixture detects three markers, registered in three different fluorescence channels. In addition to the key mutations in the S gene, the genetic markers identified by the kit include two codons, E484 and N501, that match the reference SARS-CoV-2 RNA sequence. These serve as endogenous controls for excluding false positive results.
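The mutation combinations listed above can be expressed as a simple lookup from detected markers to a lineage call. This is a minimal sketch of that logic, not the kit's actual interpretation algorithm: the marker strings, the function name, and in particular the order in which the rules are checked are our assumptions.

```python
# Marker combinations as described in the text.
DELTA = frozenset({"P681R", "L452R"})
ALPHA = frozenset({"N501Y", "P681H", "69-70del"})
BETA_GAMMA = frozenset({"E484K", "N501Y"})


def assign_variant(detected_mutations):
    """Map a set of detected S-gene mutations to a lineage label.

    The checking order (Delta, Alpha, Beta/Gamma) is an assumption
    made for this sketch; a profile matching none of the full
    combinations is reported as "other".
    """
    detected = set(detected_mutations)
    if DELTA <= detected:
        return "Delta"
    if ALPHA <= detected:
        return "Alpha"
    if BETA_GAMMA <= detected:
        return "Beta/Gamma"
    return "other"


print(assign_variant({"N501Y", "P681H", "69-70del"}))  # Alpha
print(assign_variant({"E484K", "N501Y"}))              # Beta/Gamma
print(assign_variant({"L452R", "P681R"}))              # Delta
print(assign_variant({"N501Y"}))                       # other
```

A sample carrying a single marker, such as N501Y alone, is still informative as an individual mutation even though it does not resolve to one of the three calls.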
The samples of all the viruses and bacteria listed above were tested to determine the analytical specificity; no false positive results were obtained. When RNA of the SARS-CoV-2 strain No. GK2020/1 from the collection of the N. F. Gamaleya National Research Center was assayed using the kit, no mutations were found in codons 501, 484, 681, and 69–70 of the SARS-CoV-2 S gene; the results corresponded to the whole genome nucleotide sequence of this isolate deposited in the GISAID database (EPI_ISL_421275). For all samples containing no SARS-CoV-2 RNA, the result was that the sample had no SARS-CoV-2 RNA for analysis.
To assess the analytical sensitivity, we used biomaterial samples with a known concentration of SARS-CoV-2 viral RNA, determined by ddPCR. The genetic variants were reproducibly detected down to a SARS-CoV-2 RNA concentration of 1 × 10³ copies/mL, which defined the analytical sensitivity of the reagent kit.
The diagnostic characteristics of the reagent kit were assessed in a clinical trial on SARS-CoV-2 RNA samples with genotypes determined by WGS or Sanger sequencing of the S gene. A total of 192 nasopharyngeal and oropharyngeal swab samples were analyzed. The diagnostic sensitivity was assessed on 110 of them: 32 samples containing SARS-CoV-2 RNA with a combination of mutations in the S gene characteristic of the Alpha genetic lineage (B.1.1.7), 28 samples with a combination of mutations characteristic of the Beta/Gamma genetic lineages (B.1.351/P.1), and 50 samples containing SARS-CoV-2 RNA of the Delta genetic lineage (B.1.617.2 and all the AY variants). The diagnostic specificity was assessed on the remaining 82 RNA samples, which contained SARS-CoV-2 RNA with none of the mutations under analysis. All the target mutations and their combinations corresponding to the Delta, Alpha or Beta/Gamma genetic lineages were identified in all the corresponding samples. No discordant (nonspecific) results were registered. Thus, the diagnostic sensitivity (DS) was 100% (DS with the 95% level of confidence was 94.2–100% for the Delta lineage, 91.1–100% for the Alpha lineage, and 89.9–100% for the Beta/Gamma lineages). The diagnostic specificity was 100% (96.4–100% with the 95% level of confidence).
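When every sample in a group gives the expected result, the lower Clopper–Pearson limit has a closed form: it is the proportion p that solves p^n = α, where α is the tail area. The sketch below applies this to the group sizes above. The function name is ours; the observation that the reported limits coincide with the one-sided 95% form (α = 0.05) rather than the two-sided form (α = 0.025) follows from the arithmetic and is not stated in the text.

```python
def lower_limit_all_concordant(n, tail_alpha):
    """Exact lower limit when all n results are concordant: p**n = tail_alpha."""
    return tail_alpha ** (1.0 / n)


# Group sizes taken from the validation described above.
groups = {"Delta": 50, "Alpha": 32, "Beta/Gamma": 28, "specificity": 82}

for name, n in groups.items():
    one_sided = 100.0 * lower_limit_all_concordant(n, 0.05)
    two_sided = 100.0 * lower_limit_all_concordant(n, 0.025)
    print(f"{name}: n={n}, one-sided {one_sided:.1f}%, two-sided {two_sided:.1f}%")
```

With α = 0.05 this gives 94.2%, 91.1%, 89.9%, and 96.4%, matching the reported lower limits.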
Further practical use of the reagent kit showed that its identification of VOC and their key mutations fully agreed with the results of WGS performed on 1500 samples of viral RNA of the genetic lineages Alpha, Beta, Gamma, and others (collected from March to May 2021), as well as Delta (collected from June to August 2021). The following SARS-CoV-2 genetic lineages were defined based on the combinations of mutations in the S gene: Delta (n = 750), Alpha (n = 302), Beta/Gamma (n = 32), and others (n = 416). The RT-PCR results completely matched the WGS data, demonstrating the high diagnostic performance of the reagent kit.
From December 1, 2020 to September 30, 2021, more than 10,000 clinical samples of SARS-CoV-2 RNA, collected in Moscow and Moscow Region, were tested as part of monitoring the spread of the SARS-CoV-2 VOC. The findings were grouped by month and presented as the percentage of each variant identified (figure). During the studied period (December 2020 to September 2021) we observed the emergence of VOC or single functionally significant mutations and their subsequent replacement by novel SARS-CoV-2 variants. In the samples obtained from December 2020 to January 2021, no isolates carrying mutations typical for VOC were found. Subsequently, a gradual increase in the proportion of distinct lineages was observed. In five months the proportion of samples belonging to the SARS-CoV-2 Beta/Gamma lineages changed from 0 to 1.6%, and the proportion of samples belonging to the Alpha lineage increased from 0 to 19.2%. The proportion of the Delta variant rose rapidly to 25% by the end of May, reached 87% by mid-June 2021, exceeded 95% in the first week of July, and remained above 99% until the end of the studied period.
DISCUSSION
The emergence of novel SARS-CoV-2 variants remains a grave worldwide concern, since new variants can have the potential for higher transmissibility, affect the disease duration and severity, reduce vaccine efficacy, and increase the mortality rate [6, 7, 14–20, 24].
Currently, WGS is the most widely used versatile method for identification of the new SARS-CoV-2 variants. Unfortunately, the method is very time-consuming and expensive. RT-PCR is an efficient alternative method, suitable for identification of the previously defined mutations, the markers of the key lineages, the detection of which is important for monitoring the spread of the genetic variants, and understanding the epidemiological situation.
The outcome of the study is the real time RT-PCR diagnostic test system for identification of the functionally significant mutations allowing one to define the SARS-CoV-2 VOC. The analysis includes the following steps: RNA extraction, RNA reverse transcription, cDNA amplification with the real time fluorescence hybridization detection, and data interpretation.
The kit provided allows one to distinguish RNA of the SARS-CoV-2 Alpha, Beta/Gamma and Delta variants from the other genetic lineages based on the combinations of typical mutations. According to a number of studies [15–23], the presence of distinct mutations (E484K, N501Y, P681H, P681R), associated with different genetic variants of coronavirus, results in the functionally significant alterations in the S-protein structure, including those contributing to the increased transmissibility, reduced neutralization efficiency of antibodies, produced in response to infection with the earlier circulating variants of the virus or in response to vaccination. That is why the detection of individual mutations, without taking into account the combinations of mutations, is also of some epidemiological significance.
The kit shows high analytical sensitivity (1 × 10³ copies/mL of SARS-CoV-2 RNA) for each of the genetic variants to be detected, and 100% analytical specificity in the tested panel of microorganisms. The diagnostic sensitivity with the 95% level of confidence is 94.2–100% for the Delta lineage, 91.1–100% for the Alpha lineage, and 89.9–100% for the Beta/Gamma lineages, and the diagnostic specificity is 96.4–100% with the 95% level of confidence.
The use of the reagent kit in epidemiological monitoring made it possible to detect the emergence of the SARS-CoV-2 Delta variant in Moscow in April 2021 in a timely manner, and to report promptly the dramatic increase in the proportion of Delta strains among the SARS-CoV-2 variants detected in the surveyed patients in Moscow Region. It is important to note that the emergence and further spread of the Delta variant in Russia have resulted in the almost complete replacement of the other coronavirus lineages.
CONCLUSIONS
The highly sensitive, specific, and easy to use AmpliTest SARS-CoV-2 VOC v.2 reagent kit for RT-PCR identification of mutations in the S gene of coronavirus typical for the Alpha, Beta/Gamma and Delta genetic lineages has been developed in the Centre for Strategic Planning of the Federal Medical Biological Agency. The kit was validated using samples containing coronavirus RNA with genotypes determined by SARS-CoV-2 genome sequencing. The PCR results fully matched the sequencing data. The PCR reagent kit enables fast and efficient assessment of the spread of the coronavirus genetic variants, so that appropriate anti-epidemic measures can be taken promptly based on the data obtained.