REVIEW
Omics technologies in the diagnostics of Mycobacterium tuberculosis
1 Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia
2 Mendeleev University of Chemical Technology of Russia, Moscow, Russia
Correspondence should be addressed: Julia A. Bespyatykh
Malaya Pirogovskaya, 1a, Moscow, 119435, Russia; JuliaBes@rcpcm.org
Funding: the study was supported by RSF grant № 20-75-10144.
Author contribution: Bespyatykh JA — study concept, manuscript writing and editing; Basmanov DV — analysis of raw data on biosensors and microarrays, manuscript writing.
Mycobacterium tuberculosis is the causative agent of tuberculosis (TB), one of the leading causes of death from infectious disease. The major challenges in TB management are the growing number of infections with multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains and co-infection with human immunodeficiency virus (HIV) [1]. Current TB therapy relies on first-line drugs taken for 6–9 months and has severe side effects [2]. MDR/XDR TB requires extended treatment with second-line anti-TB drugs in addition to the first-line drugs pyrazinamide and high-dose isoniazid [3]. Despite the current stage-by-stage approach to treatment regimens, the rise in MDR and XDR TB cases creates new obstacles for the available drug therapy [4]. The need for better drugs and infection control methods, and for new vaccines, is therefore apparent.
M. tuberculosis is transmitted mainly by airborne droplets, through inhalation of an aerosol containing cells of the pathogen [5]. After entering the lungs, the bacteria infect alveolar macrophages and evade the host immune response [6]. To spread further through the body, mycobacteria suppress the protective mechanisms of macrophages, including autophagy, phagosome oxidation, and the release of reactive oxygen and nitrogen species [7, 8]. Furthermore, infected macrophages produce chemokines that attract inflammatory cells such as neutrophils and natural killer cells, promoting further inflammation and the formation of granulomas, which contain multinucleated giant cells [9]. Granulomas thus provide the niche for the bacteria and serve as a reservoir of infection.
Proteins secreted by M. tuberculosis (the secretome) play a key role in abnormal immune responses and in the progression of intracellular growth [10, 11]. The early secreted antigenic target protein 6 (ESAT-6), a major virulence factor of M. tuberculosis, is known to regulate host immune responses by inhibiting pro-inflammatory responses such as interferon (IFN) γ [12] and interleukin-12 (IL12) production [13]. Moreover, ESAT-6 stimulates IL6 production by macrophages [14] and plays an important role in inducing macrophage polarization and the transition of macrophages to epithelioid macrophages, the main component of the tuberculous granuloma [15, 16]. Rv1988, a secreted effector protein, has been shown to methylate host cell histones, thereby epigenetically modulating the anti-mycobacterial function of macrophages [17]. All of this confirms the important role of M. tuberculosis proteins in virulence [18]. In addition, high-throughput proteome-wide screening of potential M. tuberculosis antigens can be used to develop new vaccines [19].
Although the M. tuberculosis genome was extensively studied in the early 2000s, analysis of its proteome lagged behind because of the complexity of protein isolation protocols and the need for complex and expensive equipment [20]. To date, more than 30% of the proteome remains uncharacterized and is represented by hypothetical proteins [21]. Deciphering the functions of these proteins would contribute to a better understanding of M. tuberculosis physiology and virulence. This is possible only with the use of other OMICs technologies, such as transcriptomics and proteogenomics. While it is important to identify and characterize all genes, special attention should be paid to the gene products responsible for virulence and pathogenesis. Transcripts, protein products, and metabolites can be identified and quantified, which makes it possible to reveal differences in pathogenicity and drug resistance between the lineages and strains of M. tuberculosis. In general, the combination of such approaches could help to find new anti-TB drug targets and contribute to the implementation of the WHO End TB Strategy.
Genomics
Knowledge of the genetic potential of the organism under study determines the strategy of any subsequent research. The complete genome sequence of the M. tuberculosis H37Rv strain was first published in 1998 [22], and with the development of sequencing technologies more than 13,000 M. tuberculosis genomes are now available in the NCBI database. However, the majority of these are not assembled into closed circular chromosomes. The annotation of the H37Rv genome (version 27 in the TubercuList database (http://tuberculist.epfl.ch/)), which contains 4018 protein-encoding genes, 26% of which encode proteins with hypothetical functions, has long been considered the reference and the most complete sequence. In 2019, the complete sequence of the RUS_B0 strain, which belongs to the Beijing family, was published [23].
Mycobacteria have all the genes necessary for synthesis of essential amino acids, vitamins, enzymes, and cofactors.
These bacteria have a high proportion of genes encoding enzymes involved in lipogenesis and lipolysis. Furthermore, M. tuberculosis has the genes for glycolytic enzymes, for enzymes of the anabolic pentose phosphate pathway, which generates NADPH and pentoses, and for enzymes of the tricarboxylic acid cycle and of the glyoxylate cycle, which allows carbohydrates to be synthesized from fats. The tubercle bacillus also has enzymes for aerobic, microaerophilic, and anoxic electron transfer. Accordingly, mycobacteria are able to survive in various environments, including oxygen-rich lungs, macrophages, and the center of a caseous granuloma [24].
The mycobacterial genome is rich in fatty acid metabolism-related genes, including those involved in mycolic acid metabolism, and in genes encoding acidic asparagine- and glycine-rich polypeptides. A substantial part of the genome consists of genes encoding the PE (proline-glutamate, n = 99) and PPE (proline-proline-glutamate, n = 68) protein families, whose variability underlies antigenic differences and the ability to inhibit the immune response [25].
A large number of repeated DNA sequences is one of the features of the M. tuberculosis genome: for example, insertion sequences (IS), which contribute to mycobacterial DNA polymorphism, and variable number tandem repeats (VNTR). Along with the IS, there are direct repeat (DR) regions separated by variable sequences (spacers), major polymorphic tandem repeats (MPTR), and polymorphic GC-rich repetitive sequences (PGRS). These features of the M. tuberculosis genome provide the basis for pathogen diagnosis and typing techniques: IS6110-based restriction fragment length polymorphism (RFLP) typing [26], spoligotyping [27], and VNTR typing [28]. In addition, the prophage-like sequences phiRv1 and phiRv2 have been found in the genome of H37Rv. They are considered to be associated with pathogenicity factors, since no prophages have been found in the genomes of the avirulent H37Ra and M. bovis BCG strains.
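Spoligotyping reads out the presence or absence of spacers in the DR region, and the result is usually reported as a compact code. The sketch below illustrates one widely used convention, which is an assumption here because the text does not describe it: a 43-spacer binary pattern is split into 14 triplets, each written as one octal digit, followed by the last spacer as a single 0 or 1. The function name and the example pattern (spacers 1 to 34 absent, 35 to 43 present, of the kind commonly reported for Beijing family strains) are illustrative only.

```python
def spoligotype_to_octal(pattern: str) -> str:
    """Convert a 43-character spacer pattern to a 15-digit octal code.

    pattern: '1' = spacer present, '0' = spacer absent (spacers 1..43).
    Assumed convention: 14 triplets -> 14 octal digits, plus the 43rd
    spacer appended as a single 0/1 digit.
    """
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected exactly 43 characters, each '0' or '1'")
    # Spacers 1..42 form 14 groups of three; each group is one octal digit.
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    # The last spacer is reported as it is.
    digits.append(pattern[42])
    return "".join(digits)


# Hypothetical example pattern: spacers 1-34 absent, spacers 35-43 present.
beijing_like = "0" * 34 + "1" * 9
print(spoligotype_to_octal(beijing_like))
```

Two isolates with the same code share the same spacer pattern, which is why such codes are convenient for comparing strains between laboratories.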
The development of more specific typing methods, for example ones that account for drug resistance and virulence in particular lineages of M. tuberculosis, requires confirming the functions of specific genes, their contribution to cell metabolism, and especially their role in the unique features of the pathogen. All of this can be established only with additional methods, such as transcriptome and proteome analysis.
Proteomics
Proteomics is an important tool for identifying both known and new protein targets that are part of the virulence system and cell protective mechanisms and are key elements of the host–pathogen interaction. Increased or decreased synthesis of proteins related to host immunity and pathogen virulence indicates the role of these proteins in protective mechanisms or pathogenesis. Such upregulation or downregulation helps to identify proteins that could serve as important drug targets or as the basis for diagnostic tools reflecting the stage of pathogenesis and the level of infection. To find out whether an identified protein is a drug target or a diagnostic marker, and to monitor protein kinetics in various organs in response to infection, it is necessary to follow a sequence of steps: identification of new targets, in vitro verification of their role, comparative analysis of target uniqueness and specificity, and verification of the effects of the target in in vivo models.
The methods of proteome analysis, together with the key results of proteome analysis of the TB causative agent, have been comprehensively described earlier [20]. They have since been supplemented by data on the integration of proteomic and transcriptomic data. For example, a system OMICs analysis of the Beijing B0/W148 cluster [23] revealed additional unique features of the cluster members.
The relevance of the system approach also stems from the fact that the presence of a transcript, even an abundant one, does not always result in protein synthesis. Without proteome analysis, transcriptomic data are therefore not fully informative. Proteome analysis, in turn, registers the end product, and combining these data with transcriptomic data provides additional insights into cell physiology. Inconsistencies between the acquired data sets currently remain a major shortcoming. When developing diagnostic panels, each finding should be interpreted in the context of other findings, and the system should be considered as a whole.
Transcriptomics
Bacteria must adapt extremely quickly to changing environmental conditions, which is why alterations in gene expression in response to host protective mechanisms or the effects of medications are essential for the survival and functioning of the pathogen. Studies of the transcriptome (the full set of transcripts produced by a bacterial cell) performed using various approaches primarily complement genome sequencing data. The methods used for transcriptome analysis of mycobacteria have been discussed in detail earlier [29]. The main discoveries of transcriptomics are related to the study of resistance in the causative agent of TB. Advances in technology have led to the emergence of DNA microarrays, which provide a powerful tool for exploring differential gene expression, including under in vivo conditions [30]. However, this method has not yet been extended to diagnosis. The use of transcripts as diagnostic targets has a number of critical disadvantages, foremost among which is their short lifetime. Nevertheless, transcriptomics is extremely important for understanding cell metabolism and for using other OMICs data in diagnosis: it was system analysis that revealed the complex changes in the abundance of proteins responsible for fatty acid biosynthesis in virulent M. tuberculosis strains (at the transcriptome and proteome levels) [23, 31, 32]. These findings, in turn, showed the importance of understanding how various molecular elements of the genome (genes and transcripts) are integrated into networks representing metabolism, regulation, signaling, and protein-protein interactions.
Metabolomics
Metabolic pathways provide the basis for cell functioning. Studying metabolic pathways and metabolic reconstruction is an important step in modelling of cellular activities and, most important, in understanding the underlying mechanisms at the system level.
Mycobacteria owe many of their unique properties to mycolic acids, which are components of the bacterial cell wall. Several studies have demonstrated the importance of mycolic acids for bacterial growth, survival, and pathogenesis [33]. The biosynthesis of mycolic acids has therefore become the subject of numerous biochemical and genetic studies [34]. A detailed model of mycolic acid synthesis in M. tuberculosis has been constructed that includes 197 metabolites involved in 219 reactions catalyzed by 28 proteins. Comparative analysis of the metabolic pathways of the M. tuberculosis H37Rv strain and of humans showed that AccD3, Fas, FabH, Pks13, DesA1/2, and DesA3 are potential targets for the development of anti-TB drugs [35].
The continuous accumulation of protein data and the characterization of enzymatic properties have made it possible to model a number of mycobacterial metabolic networks. For example, the genome-scale metabolic network (GSMN) includes 849 unique chemical reactions involving 739 metabolites and 726 genes [36]. It should be noted that considerable uncertainty remains because some proteins, and the biochemical reactions they are involved in, are incompletely characterized. Nevertheless, the available metabolic networks have made it possible to define 318 proteins essential for mycobacterial growth [35]. It can therefore be assumed that these 318 proteins play an important role in maintaining the metabolism of mycobacteria.
Protein-protein interactions provide the basis for intracellular signaling pathways, as well as for various transcriptional regulatory networks. To date, an extended version of the protein interaction network of the M. tuberculosis H37Rv strain is available in the STRING database [37]. STRING contains literature data that describe both experimentally observed interactions and interactions inferred from genome analysis with bioinformatic algorithms. The network thus covers various types of direct and indirect interactions and linkages, such as: a) physical complex formation between two proteins, essential for the formation of a functional unit; b) co-regulation of genes belonging to one operon or of adjacent genes; c) interaction of proteins involved in the same metabolic pathway and therefore affecting each other; d) associations between proteins based on co-occurrence, co-expression, or domain fusion. The network represents the first integrated view of the linkages between various proteins, much like a road map of a city. The database is currently being expanded through the integration of system analyses, with the main emphasis on experimental verification of predicted relationships. Based on this information, a functional distance matrix and, subsequently, a protein proximity index were obtained, which help to understand how the influence of a particular protein can propagate through the metabolic network as a whole. This index was used to predict a strategy for maximum metabolic disruption by inhibiting the smallest number of proteins. The study found that simultaneous inhibition of a combination of four proteins could affect a total of 471 proteins involved in 33 pathways, resulting in 75% metabolic disruption [38].
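The idea of disrupting as much of a network as possible by inhibiting as few proteins as possible can be illustrated on a toy example. The sketch below is not the proximity-index method of [38]; it swaps in a simple greedy heuristic, assuming that a protein "influences" every protein within a fixed number of interaction steps. The network, the protein names P1 to P8, and both function names are hypothetical.

```python
from collections import deque


def reachable(network, start, max_dist):
    """Return the proteins within max_dist interaction steps of start (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the distance limit
        for neighbour in network.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return set(dist)


def greedy_targets(network, k, max_dist=1):
    """Greedily pick up to k proteins whose combined influence is largest."""
    influence = {p: reachable(network, p, max_dist) for p in network}
    chosen, covered = [], set()
    for _ in range(k):
        candidates = [p for p in sorted(influence) if p not in chosen]
        if not candidates:
            break
        # Pick the protein that adds the most not-yet-affected proteins.
        best = max(candidates, key=lambda p: len(influence[p] - covered))
        if not influence[best] - covered:
            break  # nothing new can be affected
        chosen.append(best)
        covered |= influence[best]
    return chosen, covered


# Hypothetical undirected toy network (adjacency lists), not real STRING data.
toy_network = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1"],
    "P3": ["P1"],
    "P4": ["P1", "P5"],
    "P5": ["P4", "P6"],
    "P6": ["P5", "P7", "P8"],
    "P7": ["P6"],
    "P8": ["P6"],
}

targets, affected = greedy_targets(toy_network, k=2)
print(targets, f"{len(affected) / len(toy_network):.0%} of proteins affected")
```

A greedy choice is not guaranteed to be optimal, but it shows why a few well-connected proteins can account for a large share of the network.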
OMICs in diagnostics
The search for TB biomarkers is ongoing. Many promising candidates for assessing the risk of infection, the risk of disease, the probability of cure, and protection against infection have been found [39]. The majority of such biomarkers are associated with host immunity and include proteins, metabolites, cell markers, and transcripts [40]. Despite numerous reports of correlations with various TB stages, especially in children [41], there are currently no commercially available predictive biomarkers. This indicates that the clinical significance of the described markers is insufficient and that further search is needed.
Molecular testing based on genomic data has been the most successful, both for detecting M. tuberculosis and for determining drug resistance. For example, Xpert XDR (Cepheid; CA, USA) detects the genetic material of M. tuberculosis together with mutations associated with resistance to rifampicin, isoniazid, injectable drugs, and fluoroquinolones. Its Russian counterpart, the hydrogel microarrays developed by the Engelhardt Institute of Molecular Biology of the Russian Academy of Sciences [42], particularly TB-TEST, make it possible to perform typing and simultaneously determine resistance based on a total of 114 genetic determinants: 28 mutations in rpoB associated with rifampicin resistance; 11 mutations in katG, five in inhA, and five in ahpC associated with isoniazid resistance; 18 mutations in embB associated with ethambutol resistance; 15 mutations in gyrA and 23 mutations in gyrB associated with resistance to fluoroquinolones; and 4 mutations in rrs and 5 mutations in eis associated with resistance to aminoglycosides and capreomycin [43].
Whole-genome sequencing is becoming an increasingly attractive option for identifying drug resistance in M. tuberculosis; the method can also be used to improve understanding of TB transmission [44]. The technique is based on detecting mutations associated with drug resistance in the M. tuberculosis genome, and the data show a correlation between these mutations and the results of culture-based drug susceptibility tests, at least for four first-line drugs (isoniazid, rifampicin, ethambutol, and pyrazinamide) [45, 46]. At the same time, some differences between phenotypic and genetic drug resistance profiles suggest that genomic data do not always make it possible to determine the drug susceptibility of the bacterium. Given that the structure of the M. tuberculosis population is heterogeneous and has its own characteristics, it is reasonable to consider developing local diagnostic test systems. Beijing family strains have been reported to prevail in Russia (50–80% of all cases) [47]. A strong association with drug resistance and higher virulence compared to other genotypes have been demonstrated for this genotype family [48]. The higher virulence has been shown both in in vivo models [49] and in epidemiological research. The increased abundance of virulence factors in Beijing family strains has been shown at the molecular level, including through system OMICs analysis [23, 32, 50].
Taking all of the above into account, early diagnosis of Beijing family strains appears to be the most promising direction. To date, the use of a label-free microfluidic biosensor device based on the photonic crystal surface mode (PC SM) is the most common approach. The PC SM biosensor makes it possible to assess a broad range of interactions, from the formation of various protein–protein complexes to the interaction between oligonucleotides with different sequences. The main advantage of the technique is that the chemical reactions occur in an isolated zone of minimal volume, which precludes contamination, reduces analysis time, and makes the procedure more user-friendly. The interactions are registered in real time and require no pre-labeling of the target biomolecules [51], which simplifies and speeds up the analysis.
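The composition of the TB-TEST panel can be checked with simple arithmetic, and the same table shows how a genotypic result maps to a drug. The sketch below is a bookkeeping illustration only, not the actual interpretation software of the microarray: the per-gene counts are taken from the text, the grouping of gyrA with gyrB and of rrs with eis follows our reading of the sentence, and the helper `predicted_resistance` is hypothetical.

```python
from collections import defaultdict

# Gene -> (number of mutations on the panel, associated drug or drug class).
# Counts come from the text; the gene-to-drug grouping is our reading of it.
tb_test_panel = {
    "rpoB": (28, "rifampicin"),
    "katG": (11, "isoniazid"),
    "inhA": (5, "isoniazid"),
    "ahpC": (5, "isoniazid"),
    "embB": (18, "ethambutol"),
    "gyrA": (15, "fluoroquinolones"),
    "gyrB": (23, "fluoroquinolones"),
    "rrs": (4, "aminoglycosides and capreomycin"),
    "eis": (5, "aminoglycosides and capreomycin"),
}

# Total number of genetic determinants on the panel.
total = sum(count for count, _ in tb_test_panel.values())

# Number of determinants per drug or drug class.
per_drug = defaultdict(int)
for count, drug in tb_test_panel.values():
    per_drug[drug] += count


def predicted_resistance(genes_with_mutations):
    """Hypothetical helper: drugs flagged when a panel mutation is detected."""
    return sorted({tb_test_panel[g][1] for g in genes_with_mutations
                   if g in tb_test_panel})


print(total)                                   # 114, as stated in the text
print(dict(per_drug))
print(predicted_resistance(["rpoB", "katG"]))  # an MDR-like genotype
```

Such a lookup only reports the drugs associated with detected mutations; it does not replace phenotypic drug susceptibility testing.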
Taking into account the potential of the PC SM biosensor with a two-dimensional spatial resolution [52], we have proposed a fundamentally new method for typing of the TB causative agent [53]: the photonic crystal surface was modified with dextran, and the oligonucleotide system was used to detect the M. tuberculosis single-stranded DNA. It is worth mentioning that the surface modification method was optimized for detection of the M. tuberculosis DNA [54]. In addition, a simplified variant has been proposed for differential detection of the Beijing and LAM families, which are most common in our country. Such an approach would make it possible not only to simplify the diagnosis, but also to reduce the costs of the diagnostic test system development and production. Furthermore, the platform itself could be used for the M. tuberculosis proteome typing.
Two main immunological approaches to the diagnosis of latent TB infection are currently in use and are included in the WHO guidelines [55]: the tuberculin skin test (TST) and the interferon-γ release assay (IGRA). Although IGRA is more specific than TST, neither test can distinguish between latent TB infection and active TB. Both tests have reduced sensitivity in various immunocompromised subpopulations. Cohort studies have shown that both TST and IGRA have low predictive value for the progression of latent TB infection to active TB [56], which is why it is important to test only people at high risk of disease progression and to use all the available clinical data to complement the test results. There are a number of convenient estimation tools, such as the Online TST/IGRA Interpreter. The C-Tb skin test (Statens Serum Institut; Copenhagen, Denmark), based on the specific TB antigens ESAT-6 and CFP10, showed a safety profile and accuracy similar to those of IGRA in a phase 3 clinical trial [57, 58].
Several research groups have reported the possibility of using the HspX protein as a marker of the disease, including its latent form [59, 60]. However, subsequent system analysis of various M. tuberculosis strains showed that this approach is not viable [32]. Thus, the search for biomarkers of the infectious process remains a pressing task for researchers all over the world.
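The recommendation to test only people at high risk follows from how predictive value depends on the pre-test probability. The sketch below applies the standard Bayes formula for positive predictive value; the sensitivity, specificity, and risk figures are hypothetical and are not taken from [56] or from any TST or IGRA study.

```python
def positive_predictive_value(sensitivity, specificity, prior):
    """PPV = P(outcome | positive test), from Bayes' theorem.

    prior is the pre-test probability of the outcome, here the assumed
    probability of progression to active TB in the tested group.
    """
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.80
SPECIFICITY = 0.60

# Same test, two hypothetical groups with different progression risk.
ppv_low_risk = positive_predictive_value(SENSITIVITY, SPECIFICITY, prior=0.05)
ppv_high_risk = positive_predictive_value(SENSITIVITY, SPECIFICITY, prior=0.30)

print(f"low-risk group:  PPV = {ppv_low_risk:.2f}")
print(f"high-risk group: PPV = {ppv_high_risk:.2f}")
```

With these assumed numbers, a positive result in the low-risk group corresponds to progression in roughly one case out of ten, whereas in the high-risk group the same result is several times more informative.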
CONCLUSION
Tuberculosis remains one of the main health problems in our country. Despite the existence of various test systems, identification and particularly typing of the pathogen remain an urgent task both in Russia and abroad. Available technologies are based mostly on well-studied genomic data. In recent decades, many other OMICs technologies have been introduced worldwide, such as metagenomics, transcriptomics, proteomics, metabolomics, and culturomics, which play a key role in understanding the main mechanisms of bacterial virulence, resistance, and pathogenicity. We have reviewed various OMICs technologies and the possibility of their use as diagnostic tools. According to state-of-the-art knowledge, OMICs technologies should be used in concert rather than separately in order to obtain meaningful data on the pathogenesis of Mycobacterium tuberculosis. Furthermore, a nuanced and holistic understanding requires additional technologies, such as bioinformatics, nanotechnology, and single-cell genomics, together with new gene expression technologies, such as NanoString, and imaging tools. Existing and new OMICs data sets considered together should create an integrated picture of M. tuberculosis gene regulation and promote the development of both diagnostic panels and new efficient treatment methods.