The Genomic and Structural Organization of SARS-CoV-2: A Mutational Perspective

The ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by the novel SARS-CoV-2, has exerted a great toll on human health. SARS-CoV-2, classified within the genus Betacoronavirus, is the third highly pathogenic human CoV after SARS-CoV-1 and MERS-CoV. Though its actual origin and route of transmission to humans are still unclear, genetic analysis has shown its very close similarity (~96%) with bat SARS-like CoV. SARS-CoV-2 is a spherically-icosahedral virus with a plus-sense single-strand RNA (~30 kb) genome organized into thirteen open reading frames, which encode 2 non-structural polyproteins, 4 structural proteins and 6 accessory proteins. Of its structural proteins, the ‘S1’ subunit of spike (S) contains the cellular ACE-2 receptor binding domain (RBD) whereas the ‘S2’ subunit is required for cell membrane fusion. The membrane (M) protein participates in cell-fusion whereas envelope (E) is necessary for virion assembly and morphogenesis. The non-structural polyproteins (pp1a and pp1b) undergo proteolytic processing to produce a total of 16 small proteins, which are involved in mRNA synthesis and replication. Of the accessory proteins (3a, 6, 7a, 7b, 8 and 9b), a few are known to modulate host-innate immunity. Interestingly, ‘3b’ is absent in SARS-CoV-2, which significantly differentiates it from other human CoV. Detection of several novel mutations in ‘3a’, ‘3b’ and ‘ORF8’ proteins, and notably in the ‘S’ RBD, strongly suggests enhanced cell attachment and facilitated entry of SARS-CoV-2, consistent with its high infectivity and disease severity in humans. The recent emergence of highly contagious SARS-CoV-2 RBD variants in the United Kingdom (B.1.1.7 strain), South Africa (B.1.351 strain) and Brazil (P.1 strain), and their subsequent spread to other countries, have raised serious concerns.


Introduction
In recent decades, devastating viral epidemics have marked the Asia-Pacific region as the global hot-spot for emerging pathogenic viruses [1]. The pandemic of COVID-19, the disease caused by the novel severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and first reported in China in Dec. 2019, has affected nearly 114 million people, with over 2.5 million deaths worldwide [2][3][4]. SARS-CoV-2 is the seventh human-infecting and third highly pathogenic human CoV after SARS-CoV-1 and Middle-East respiratory syndrome CoV (MERS-CoV) [4,5]. Direct human-to-human transmission of SARS-CoV-2 through multiple modes, such as nasal droplets, oral mucus and aerosols, as well as fomites, has been confirmed [6,7]. SARS-CoV-2 has an incubation period of 2-14 days with symptoms of cough, fever, headache and shortness of breath, which may progress to mild-to-severe pneumonia and death [4,8]. COVID-19 patients, mostly those of older age or with comorbidities such as pulmonary, cardiac, renal or hepatic disorders, have shown a higher mortality rate than those with pneumonia only. Among COVID-19 positive cases, about 80% of individuals remain asymptomatic, 15% require hospitalization and 5% develop acute respiratory distress syndrome (ARDS) or multi-organ failure [9]. In addition, a good proportion of COVID-19 patients have also shown evidence of digestive and hepato-biliary symptoms, and high amounts of SARS-CoV-2 genomic RNA have been detected in gastrointestinal, rectal and stool specimens [10][11][12]. Further, detection of SARS-CoV-2 RNA in municipal or waste water samples has suggested its plausible fecal-oral transmission through contaminated water [13].
In general, human CoV do not cause life-threatening disease. However, because SARS-CoV-2, like SARS-CoV-1, is of zoonotic origin, humans lack pre-existing natural immunity, i.e. 'herd-immunity', against it [14]. In a naïve population, therefore, exposure to SARS-CoV-2 leads to a much delayed development of adaptive-immune responses. Also, unlike SARS-CoV-1, most of the spread of SARS-CoV-2 occurs through asymptomatic infection [15][16][17], which is an obstacle to its quick containment. Moreover, unlike for SARS-CoV-1 and MERS-CoV, the precise mechanism of modulation of host-innate immune responses and severe pathogenesis by SARS-CoV-2 still remains elusive [18]. Nonetheless, within a year of the COVID-19 health crisis, we have a remarkable understanding of its epidemiology, clinical presentations, immune-pathobiology, treatments and preventive strategies. Currently, plasma or antibody based rapid-test kits and reverse-transcription polymerase chain reaction (RT-PCR) are the routine diagnostic tools to identify COVID-19 positive cases. In the absence of specific therapeutics, several repurposed drugs are currently under clinical trials or approved for emergency use [19,20]. Fortunately, of the leading vaccine candidates in the final stages of trials, at least seven have now been granted approval in some countries.

Structural Proteins
Of the four encoded structural proteins, the club or pear-shaped spike (S) proteins bind to the cell receptor angiotensin-converting enzyme 2 (ACE2) whereas envelope (E) and membrane (M) proteins form the viral shell or envelope (Figure 1B). The nucleocapsid (N) proteins form the viral capsid in which the genomic RNA is packaged. Notably, the characteristic structural projections known as hemagglutinin-esterase (HE) proteins located beneath the spikes in some betacoronaviruses are absent in SARS-CoV-2 [21]. The 'S' protein is a large type I trans-membrane glycoprotein that is present on the outer surface of the virion and gives it the crown-like appearance that has earned it the name 'coronavirus' [23]. The SARS-CoV-2 'S' protein shares ~76% sequence identity with that of SARS-CoV-1 and ~80% identity with bat-SL-CoV [24,25]. The 'S' protein has two structural subunits (S1 and S2), where the 'S1' subunit contains the ACE2 receptor-binding domain (RBD) and the 'S2' subunit contains structural elements viz., a membrane fusion peptide (FP), an internal fusion peptide (IFP), two heptad repeats (HR) and a trans-membrane domain (TM) required for cell membrane fusion [26]. Notably, the SARS-CoV-2 'S1' is highly variable with nearly 70% identity to those of bat-SL-CoV and SARS-CoV-1. In contrast, 'S2' is highly conserved and shares up to 99% identity with that of bat-SL-CoV and SARS-CoV-1 [27]. Interestingly, the SARS-CoV-2 S1-S2 junction sequences contain a 'furin-like' cleavage site, as a result of a 12-nucleotide insertion, not found in bat-SL-CoV and SARS-CoV-1 [27,28]. In addition, of the six residues located in RBD, five differ between SARS-CoV-2 and SARS-CoV-1, suggesting that the novel SARS-CoV-2 spike protein binds to ACE2 more strongly than that of SARS-CoV-1, hence making the virus more infectious to humans [27][28][29][30].
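The identity figures quoted above are pairwise percentages computed over aligned sequences. As a minimal sketch, assuming two sequences that have already been aligned to equal length with '-' marking gaps, such a percentage could be computed as follows. The function name `percent_identity` and the toy fragments are hypothetical illustrations, not data from the cited studies, and published figures may use a different denominator convention (for example, counting gapped columns).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy fragments, for illustration only.
toy_a = "NGVEGFNCYF"
toy_b = "NGVKG-NCYW"
print(round(percent_identity(toy_a, toy_b), 1))  # 7 of 9 ungapped columns match: 77.8
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the final counting step.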
The 'E' protein is a small glycoprotein that plays a critical role in the assembly and morphogenesis of virions within the infected cell [31]. The 'M' protein is a trans-membrane glycoprotein and is involved in virion morphogenesis and maturation through interaction with other structural proteins [32]. The 'N' protein is a phosphoprotein that is highly conserved among SARS-CoV-1 and bat-SL-CoV, and plays a crucial role in RNA encapsidation, capsid structure stabilization, replication and transcription. Recently, a high-resolution crystal structure of the SARS-CoV-2 nucleocapsid (N2b domain) has revealed its compact intertwined architecture and self-assembly properties, very similar to those of SARS-CoV-1 and MERS-CoV [33]. Though this is not yet established in COVID-19 patients, the 'N' protein has been reported to be highly antigenic in about 90% of SARS-CoV-1-infected individuals [34].

Non-structural Proteins
The non-structural replicase proteins are translated as two large polyproteins (pp1a and pp1b) that undergo proteolytic cleavage to produce 16 individually active small proteins [35]. The 'pp1a' is processed to generate 11 proteins (nsp1-nsp11) whereas 'pp1b' produces 5 proteins (nsp12-nsp16). Non-structural proteins of SARS-CoV-2 are involved in viral RNA transcription and replication, as well as in modulation of host innate immunity [33]. However, the precise functions of nsp2 and nsp11 still remain elusive [36]. Because the pp1a/b of SARS-CoV-2 differs only insignificantly from those of other betacoronaviruses, it has not been thoroughly analyzed. Nonetheless, a 42 amino acid insertion in the SARS-CoV-2 'pp1a/b' papain-like protease region has been reported recently [37].

Accessory Proteins
In addition to the structural and non-structural proteins, seven accessory proteins (3a, 3b, 6, 7a, 7b, 8 and 9b) are also synthesized by all betacoronaviruses, and they are mainly involved in countering the host innate immune system. However, in a recent sequence analysis of different betacoronavirus genomic RNAs, SARS-CoV-2 has shown insertion of six stop codons leading to aborted translation of '3b' [28]. This observation suggests that the number of accessory proteins is six instead of seven, bringing the total number of synthesized proteins to twenty-six. Also, SARS-CoV-2 varies significantly in the conserved aggregation motif (VLVVL) and shows deletion of the diacidic motif (XDE) within '3a' as compared to SARS-CoV-1, bat-SL-CoV and civet-SL-CoV [28,37].
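The protein total stated above follows from simple addition over the three protein classes. The short sketch below uses only the counts given in this review (16 non-structural proteins, 4 structural proteins, and the accessory proteins with and without '3b'). The variable and function names are illustrative.

```python
# Protein names and counts as stated in this review; identifiers are illustrative.
NON_STRUCTURAL = [f"nsp{i}" for i in range(1, 17)]            # nsp1-nsp16
STRUCTURAL = ["S", "E", "M", "N"]
ACCESSORY_BETACOV = ["3a", "3b", "6", "7a", "7b", "8", "9b"]  # seven accessory proteins


def total_proteins(accessory):
    """Total number of proteins for a given accessory-protein set."""
    return len(NON_STRUCTURAL) + len(STRUCTURAL) + len(accessory)


# SARS-CoV-2: translation of '3b' is reported as aborted by inserted stop codons.
accessory_sars_cov_2 = [p for p in ACCESSORY_BETACOV if p != "3b"]

print(total_proteins(ACCESSORY_BETACOV))     # 16 + 4 + 7 = 27
print(total_proteins(accessory_sars_cov_2))  # 16 + 4 + 6 = 26
```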

Significant Mutations in SARS-CoV-2 Genes and Proteins
Approximately 80% of human-infecting viruses are zoonotic, and initially adapt and replicate poorly when they cross the species barrier into a new host [1]. RNA viruses, due to the high replication-error rate (~10⁻⁴ errors/site/cycle) of their RNA polymerase, are more genetically diversified than DNA viruses [1]. Because SARS-CoV-2 is newly introduced to humans, we have limited knowledge of the mechanism(s) underlying its high infectivity and pathogenesis in humans. Nonetheless, ample recent comparative RNA and protein sequence analyses have reported several mutations in different SARS-CoV-2 isolates, suggesting their crucial roles in high infection rates and disease severity [28][39][40][41][42][43][44]. Most importantly, substitution mutations in SARS-CoV-2 'S' RBD residues, together with other unique mutations in the receptor-binding motif (RBM) and the introduction of furin-like sequences, have been suggested to account for its strong binding with ACE-2 and high infectivity [27,28,44,45]. Notably, proteolytic cleavage of the 'S' protein at the furin site leads to its open conformation, which gives the advantage of enhanced binding to host ACE-2 [27]. This has been supported by neuropilin-1 mediated cleavage of SARS-CoV-2 furin, leading to a significant improvement in ACE-2 binding [46]. Very recently, comparative sequence analysis of the four structural and 'ORF8' proteins of 100 SARS-CoV-2 isolates from different countries has revealed thirteen substitutions and/or deletions in 'S', three substitutions in 'N', and one substitution in the 'M' protein [47]. Another such study has reported frequent mutations in 'S', 'N', 'ORF1ab', and 'ORF8' in twenty SARS-CoV-2 isolates, and evaluated their significance for protein stability and functionality in relation to virus transmission [40]. Further sequence analysis of SARS-CoV-2 isolates from Russia and other countries has revealed a set of seven common mutations in 'S' and 'N' proteins, suggesting their role in varying patterns of spread [42].
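To give a sense of scale for the polymerase error rate quoted above, the expected number of misincorporations per genome copy is the per-site rate multiplied by the genome length. The sketch below takes the ~10⁻⁴ errors/site/cycle figure and the ~30 kb genome length given in this review. It assumes a uniform rate and independent sites, which is a simplification, and the function names are illustrative.

```python
ERROR_RATE_PER_SITE = 1e-4  # errors/site/cycle, as quoted in the text
GENOME_LENGTH_NT = 30_000   # ~30 kb genome, as quoted in the text


def expected_errors(rate: float, length: int) -> float:
    """Expected number of errors introduced in one copy of the genome."""
    return rate * length


def prob_error_free_copy(rate: float, length: int) -> float:
    """Probability that a single copy carries no error, assuming independent sites."""
    return (1.0 - rate) ** length


# Roughly 3 errors per genome copy under these assumptions.
print(expected_errors(ERROR_RATE_PER_SITE, GENOME_LENGTH_NT))
# Roughly 5% of copies would be error-free under these assumptions.
print(prob_error_free_copy(ERROR_RATE_PER_SITE, GENOME_LENGTH_NT))
```

Under these assumptions most genome copies carry at least one change, which is consistent with the genetic diversity of RNA viruses described above.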
Analysis of SARS-CoV-2 isolates within the United States has identified 921 mutations that included 487 missense, 348 synonymous, 66 intergenic, 4 in-frame deletions, and 5 nonsense insertions/deletions in at least three samples [48].
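The missense, synonymous and nonsense categories above are defined by how a nucleotide change alters the encoded amino acid. The following is a minimal sketch of that classification for a single-codon substitution under the standard genetic code, written with DNA letters (T in place of U). The function name `classify_substitution` and the example codons are hypothetical illustrations, not mutations taken from the cited study.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, missense, nonsense or stop-lost."""
    ref_aa = CODON_TABLE[ref_codon.upper().replace("U", "T")]
    alt_aa = CODON_TABLE[alt_codon.upper().replace("U", "T")]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid (or stop) is encoded
    if alt_aa == "*":
        return "nonsense"    # a premature stop codon is created
    if ref_aa == "*":
        return "stop-lost"   # an existing stop codon is removed
    return "missense"        # a different amino acid is encoded


print(classify_substitution("AAT", "TAT"))  # Asn -> Tyr: missense
print(classify_substitution("CAA", "TAA"))  # Gln -> stop: nonsense
print(classify_substitution("CTT", "CTC"))  # Leu -> Leu: synonymous
```

Intergenic changes and in-frame deletions fall outside this codon-level scheme and would need the genome annotation to classify.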
Recently, a few genetic variants of SARS-CoV-2 with substitution mutations in the 'RBD' have emerged in different geographical regions and have subsequently spread across borders. The UK or B.1.1.7 (501Tyr mono-substitution) and the South African or B.1.351 (20His/501Tyr dual-substitutions) strains have been suggested to transmit much faster and to carry an increased risk of death. Most recently, the P.1 variant, with three substitutions in the 'RBD' and 17 other unique mutations, has emerged in Brazil [49].

Conclusion
Humans lack natural immunity against zoonotic SL-CoV, notably the novel SARS-CoV-2, which has faster 'human-to-human' transmission rates and higher pathogenicity than SARS-CoV-1 and MERS-CoV. Nonetheless, previous experiences with SARS-CoV-1 and MERS-CoV outbreaks have helped in understanding SARS-CoV-2 pathobiology to some extent. Over the course of cross-species infection and adaptation, some acquired mutations may lead to the evolution of even more aggressive viral strains. Although the clinical manifestations of COVID-19 are now well understood, the mechanism(s) underlying its high infection rate and pathogenicity are not yet clearly established. In view of this, comparative genome and protein sequence analysis of SARS-CoV-2 has revealed several novel mutations, notably in the spike RBD, including generation of a furin-like site. This strongly suggests enhanced cell attachment and facilitated entry of SARS-CoV-2, consistent with its high infectivity and severe pathogenesis in humans. The observed mutations within the non-structural proteins may be crucial for the enhanced replication of the viral genome. In addition, mutations within the accessory proteins could have significant roles in evading or modulating the host innate immune system and sustaining virus replication. Therefore, novel mutations acquired by SARS-CoV-2 during 'human-adaptation' and 'human-to-human' spread provide insights into its transmission dynamics, which together with clinical and epidemiological data can predict disease prognosis. Moreover, the consequences of these mutations for virus infectivity and tissue-tropism remain to be studied in animal models. Since SARS-CoV-2 is transmitted even during the asymptomatic phase of COVID-19, it would be interesting to study its replication in the early phase, and how the innate and adaptive immune systems respond to its life cycle.
Nonetheless, further rigorous genetic and molecular studies would enhance our knowledge on the subject towards updating treatment and preventive strategies, especially for the recently emerged variants.

Author Contributions
Conceptualization, writing-original draft preparation, review and editing, M.K.P.; writing-original draft preparation, S.N. All authors have read and agreed to the published version of the manuscript.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Ethical Approval
Not applicable.

Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.