The Possible Molecular Mechanism of SARS-CoV-2 Main Protease: New Structural Insights from Computational Methods

Main protease (Mpro) is one of the key enzymes in the life cycle of SARS-CoV-2 that plays a pivotal role in mediating viral replication, transcription, and makes it an attractive drug target for this virus. The catalytic site of this enzyme comprises of a dyad His41 and Cys145 and lacks the third catalytic residue, which is replaced by a stable water molecule (W). The computational structural analysis on crystal data for Mpro protein suggests that W1, W2, His163, and Tyr161 may also play a vital role in the activity of this enzyme and they may act as catalytic partners along with Cys(145)His(41) catalytic dyad. The thiolate–imidazolium ion-pair between Cys145 (-SH---NE2-) His41 and Cys145 (-SH---NE2) His163 have been stabilized by W1 (with W2) and Tyr161, respectively. Therefore, unique interactions of W2---W1--ND1-His41-NE2---SH-Cys145 or Cys145-SH---NE2-His163-ND1---OH-Tyr161 in Mpro serve as an excellent drug target for this enzyme and suggest a rethink of the conventional definition of chemical geometry of inhibitor binding site, its shape, and complementarities. Our computational hypothesis suggests two essential clues that may be implemented to design a new inhibitor for Mpro protein. The strategies are: (i) ligand should be occupied either W1 or W2 or both of these position to displace these water molecules from the catalytic region, and (ii) ligand should be made H-bonds with Cys145 (-SH), His41 (NE2/ND1) and His163(NE2) to inhibit Mpro. The results from this computational study could be of interest to the experimental community and also provide a testable hypothesis for experimental validation.


Introduction
In December 2019, a new infectious respiratory disease severe acute respiratory syndrome caused by novel coronavirus 2 (SARS-CoV-2) emerged in Wuhan City, Hubei Province, China [1-3] which spread all around the world, infecting millions of people and causing thousands of death [case fatality rate (CFR): 8.8%, John Hopkins Coronavirus Resource Center, accessed 3 January 2021] [4]. On March 11, 2020, the World health organization (WHO) officially announced coronavirus disease 2019 (COVID-19) as a global pandemic. Previously, SARS was also appeared by infection of coronavirus (CoV) in 2002 in Guangdong, China, and this SARS-CoV was globally spread with associated with 8,096 cases and 774 deaths [5]. The genomic contents and structural integrity of SARS-CoV-2 are closely related to SARS-CoV. Till now, no specific approved drugs or vaccines are available to combat CoV or SARS-CoV-2. It is unexplainable that the sequence similarities between CoV-2 and CoV translate into similar biological properties, including pandemic potential [6]. However, clinical management studies suggest preventing infection and conventional control measures, notably, travel restrictions, world-wide lockdown, patient isolation and supportive care, and more. Currently, few common drugs that have been prescribed as therapeutic interventions for COVID-19 include a combination of lopinavir and ritonavir, ribavirin, chloroquine phosphate, hydroxychloroquine, azithromycin, arbidol, remdesivir, favipiravir, and dexamethasone [7][8][9]. However, most but not all have been ineffective in completely neutralizing the virus. At the same time, extensive computational studies had been conducted using computational simulation methods to explore the molecular mechanism of proteolysis of SASR-CoV-2 Mpro [10,11].
Generally, coronavirus is a large, single-stranded positive-sense RNA virus and possesses large RNA genomes (27 to 32 kb) that is subcategorized into alpha (α-), beta (β-), gamma (γ-), and delta (Δ-) genera. Recent studies highlight that COVID-19 belongs to β type genera, shares ancestry with bat coronavirus HKU9-1, and strongly interacts with human ACE2 (angiotensin-converting enzyme 2) [12]. The envelope of SARS-CoV-2 contains the membrane/M (222 residues length; ref. seq: YP_009724393.1), envelope/E (75 residues long; GenBank: QIS30667.1) and spike protein/S (1273 residues; ref.seq: YP_009724390.1) that mediates virus entry into host cells [13]. Main protease (Mpro) or 3chymotrypsin-like protease (3CL pro ) is one of another key enzyme in the life cycle of COVID-19, plays a pivotal role in mediating viral replication and transcription that makes it an attractive drug target for this virus. The Mpro resembles the structure of cysteine proteases and contains two antiparallel β-barrels domains and one helical domain, which are required for enzymatic activity [14]. The active site of this enzyme comprises a catalytic dyad, His41, and Cys145, and lacks the third catalytic residue, which is replaced by a stable water molecule [15] (Figure 1). This Cys-His catalytic dyad is located in a cleft between domains I and II, and the N-terminal residues 1 to 7 (or N finger) of Mpro are considered to play an important role in the proteolytic activity. The C-terminal domain III is reported to be required for dimerization [16]. , and lacks the third residue, which is replaced by a stable water molecule W1 (W639) and W2 (W864) in the 6YB7 crystal structure. The thiolate-imidazolium ion-pair may be formed between SH of Cys145 and NE2 of His41, whereas W1 makes H-bond with H atom of ND2 of His41.
The Main protease enzyme indicates the possibility of an alternative catalytic mechanism for the coronavirus. Again, considering the tautomeric forms of catalytic histidine in serine protease [17], the ND1 nitrogen atom of the imidazole ring was initially protonated and then the NE2 atom of catalytic His41 and ND1 of His163 was energy minimized accordingly. This study may shed some light on conserved water-mediated H-bonding interactions with His41 and the stabilization of active residue Cys145 as well as their manifestation in the catalytic mechanism of the main protease. Therefore, conserved water molecules may play an important role in the catalytic activity of Mpro and also shown a vital role in protein structure, enzyme catalysis, protein architecture, conformational stability, protein plasticity, stabilization of salt bridges, ligand binding, and selectivity for specific interactions [18][19][20]. The results obtained so far for all of the apo structures of main protease highlight the roles played by some water molecules in the coupling of His41 residues. The interaction of His41 with W1 and W2 and the interaction of His163 and Tyr161 are conserved in all of the X-ray structures. This study mainly tries to elucidate the plausible role of water molecules, the participation of His163 and Tyr161 in the subsite specificity of ligand binding region to this enzyme. In this aspect, our computational study provides a testable hypothesis for the catalytic mechanism (W2---W1---His41---Cys145 or Cys145---His163---Tyr161) of Mpro protein that may be effective near future to solve the crystal structure of Mpro protein with potential inhibitor for COVID-19.
The different research groups across the world are working to explain the catalytic mechanism of SARS-CoV-2 that causes COVID-19 disease. In case, the structural information on SARS-CoV-2 protein is unavailable, the structurebased drug design is highly challenging. The crystal structures of Mpro provide insufficient insight into the structural and functional importance of water molecules in the vicinity of the ligand-binding pocket. Furthermore, highresolution structures cannot yield detailed information regarding the binding process of ligands, whereas present computer techniques can provide some valuable information that guides the experimental method. Zhang et al. determined the crystal structure of ligand-free conformation of Mpro protein to develop the lead compound into a potent inhibitor of the SARS-CoV-2 Mpro [21]. In particular, the present computational study appears as a promising approach for the study of protein-water-ligand interactions, water displacement strategy by proper ligand, and a detailed description of the structural properties of protein surface-bound individual water molecules.

Structures Collection
The 22 crystal structures with ligand-bound conformations (PDB ID: 5R7Y, 5REB, 5REM, 5REN, 5REU, 5REW, 5REX, 5RF2, 5RF6, 5RFG, 5RFH, 5RFI, 5RFJ, 5RFK, 5RFM, 5RFN, 5RFO, 5RFR, 5RFT, 5RFV, 5RFY, and 5RGL) and two ligand-free conformations (PDB ID: 6Y84 and 6YB7) were obtained from the RCSB-Protein data bank [22]. The crystallographic and structural information of those proteins has been included in supplementary information in Table S1. The X-ray structure of 6YB7 (from Mpro) was taken as a template for further computational study because its catalytic region (Cys145 and His41) contains ligand-free open conformation, whereas the catalytic regions of remaining crystal structures contain ligand-bound closed conformation which creates a barricade against ligand to enter at this site. The catalytic residue's SG atom of Cys145 is facing towards NE2 of His163 within the distance of 3.80 Å in 5RGL and 5REX crystal structures. So, the ligands (PDB name U0Y and T4V) were removed from the respective crystal structure because they covalently attached with a thiol group (-SG) of Cys145.

Analysis of H-binding Interactions
The H-bonding distances between residues of each catalytic region were measured using the Swiss PDB Viewer program [23]. The residue-residue, residue-water, ligand -water, and water-water interactions of the above-mentioned 22 crystal structures were identified accordingly.

Energy Minimization
To investigate the short contacts between Cys145 of SG to NE2 of His41 and Cys145 of SG to NE2 of His163, the ligand-bound conformation of 5RGL and 5REX were energy minimized using CHARMM-36 all-atom force field (with map correction) [24] of the Visual Molecular Dynamics (VMD v. 1.9.3) program [25]. During minimization, His41 was protonated as HSD and HSE accordingly, whereas His163 was converted only as HSD form. Therefore, the catalytic region of those structures was prepared as HSD41---Cys145---HSD163 and HSE41---Cys145---HSD163 form to visualize the movement of the thiol group (-SH) of Cys145. The minimization was adopted to stabilize the system using steepest-descent [26] followed by 1000 steps.

Identification of Conserved Water Molecules at Catalytic Region
Two conserved water molecules (W1 and W2) were identified in the catalytic region of 6YB7 and 6Y84 (ligandfree) using the 3dSS program [27]. In addition, the conserved positions of these water molecules were verified by the Swiss PDB Viewer program. Two water molecules (oxygen atoms) or two atoms (oxygen atom of water and proper atom of ligand) whose center-to-center distance was within 1.80 Å [28] in between reference and movable structures were assigned as conserved [29][30][31]. The W1 is H-bonded with ND1 of His41, whereas W2 strongly interacts with W1; hence, they were assigned as conserved water positions at the catalytic region of unliganded Mpro. The crystal structure of 6YB7 from Mpro was taken as a reference with the remaining 23 X-ray structures assigned as movable, and conserved water sites were determined between two respective crystal structures. Interestingly, respective atoms of each ligand from corresponding crystal structures (ligand-bound forms) are observed to occupy either the positions of W1 or W2 or both at the catalytic region of Mpro. The summary of the computational methodology is provided in Figure 2 given below.

Comparative Analysis of Ligand Binding Sites in Crystal Structures of Mpro Protein
The crystal structures of Mpro enzyme are available in the protein data bank either as ligand-bound or as ligandfree form. Hence, our comparative analysis on 22 crystal structures of ligand-bound forms and 2 ligand-free forms revealed structural transitions of the catalytic region and active participation of His163, Try161, and conserved water molecules. Interestingly, the ligand JFM (PDB ID: 5R7Y), T0Y (PDB ID: 5REB), and HVB (PDB ID: 5RF2) are not observed to interact with any the side chain of catalytic residues. The NTG ligand (PDB ID: 5RF6) is H-bonded with OG1 of Thr25 and NE2 of His41, whereas remaining 16 ligands are observed to interact with catalytic -SG group of Cys145 (Table S1).

Possible Catalytic Mechanism of Mpro Enzyme
The detailed structural analysis of 24 X-ray structures and previous experimental evidence suggests that the activity of the Mpro enzyme is mainly controlled by two residues, Cys145 and His41 (catalytic dyad), whereas W1 water molecule (W639 in 6YB7 and W615 in 6Y84) acts as a catalytic partner. The His41, located at the β-sheet of the right domain, can polarize and interact with the -SH group of Cys145 at α-helix of the left domain, forming a thiolateimidazolium ion-pair [32] which influences the ligand-binding propensity of the protein. Possibly W1 stabilizes the thiolate-imidazolium ion-pair (Cys145-SH---NE-His41) through the formation of an H-bond with ND1 of His41 ( Figure 3). Since this bond is observed to be collinear to the covalent Cβ-Cγ bond of His41, the imidazole ring may rotate around this bond without disrupting the hydrogen bond. Thus, the imidazole ring stays in an optimum position during the catalytic process. 0In addition, the H-bonding interactions have been also observed between the -SH group of Cys145 and NE2 of His163 in energy minimized structures of 5RGL and 5REX. In detail, the H atom of the thiol group of Cys145 moves towards NE2 of His163 when NE2 of His41 is protonated as HSE, but the concern H atom is reversely moved towards NE2 of His41 when ND1of His41 is protonated as HSD. The superimposed energy minimized structures between HSD41 and HSE41 have indicated the change of orientation of the H atom of Cys145 from His163 to His41 ( Figure  4). Hence, His163 may polarize and interact with the -SH group of Cys145 (in HSE41 state) and form a thiolateimidazolium ion-pair, which may influence the inhibitor-binding propensity of this region. Possibly -OH of Tyr161 is found to stabilize the thiolate-imidazolium ion-pair through the formation of an H-bond with ND1 of His163 (energy minimized and crystal structure of 5RGL and 5REX). So, the recognition of W2---W1---ND1-His41-NE2---SH-(Cys145) or Cys145 (-SH)---NE2-His163-ND2---OH-Tyr161 seems very interesting of this enzyme. These possible enzymatic mechanisms suggest the main protease has an interesting catalytic system that contains His41, Cys145, and His163 by which it may cleave the residues. The crystallographic evidence (PDB ID: 5REX and 5RGL) are also highlighting the distances from Cys145 (SH) to (NE) His163 that are within 3.89 Å, whereas the distances from His41 (ND) to Cys145 (SH) in remaining crystal structures are within 3.90 Å (except 5RFM) (Table 1). These interaction patterns have also been confirmed the functional role of His163 along with Cys145 of Mpro enzyme. Based on the above-mentioned mechanism, Hilgenfeld et al. proposed αketoamide act as a lead compound into a potent inhibitor of the SARS-CoV-2 Mpro, and it strongly H-bonds with His163, Cys145, and His41 (PDB ID: 6Y2F and PDB: 6Y2G) [21]. Hence, crystallographic and other evidences have confirmed the participation of His163 and Tyr161 in the catalytic mechanism ( Figure 5).

Discussion
The present computational study focuses on the following basic information (i) catalytic region of Mpro enzyme and its conformational changes upon binding different ligands, (ii) participation of His163 and Tyr161, and (iii) role of conserved water molecules for possible drug discovery. However, the remote residues from the catalytic region of Mpro were perhaps thought to be non-essential for functional activity, but our computational hypothesis suggests that (i) His163, Tyr161, W1, and W2 water molecules may play an important role in tuning the changes of conformation though they are away from active site region, and (ii) they may act as catalytic partners with Cys145-His41 pair. The coupling of a catalytic dyad with non-catalytic partners (His163 and Tyr161) may provide some pointers for future biochemical or experimental assays to better understand the molecular basis of this enzyme operation. The computational investigations are consistent with the existence of two conserved water molecules (W1 and W2) that maintain the architecture of catalytic dyad (Cys145-His41) in the ligand-free conformations of Mpro.
In particular, W1, W2 (at His41), and Tyr161 (at His163) are involved in the stabilization of thiolate-imidazolium ion-pair (Cys145-SH---NE2-His41 and Cys145-SH---NE2-His163 dyad). Again, we also observed well-organized networks are formed from His41 to His163 in the Mpro crystal structure. Interestingly, crystallographic evidence provides two immense suggestions: (i) ligands in few crystal structures have occupied either at W1 or W2 or both places to displace these water molecules; (ii) most of these ligands can either bind near Cys145 or His41 or His163 to make H-bonds with these residues but they are not observed to interact with those three residues conjugally (except PDB ID: 6Y2G). In this regard, our computational structural analyses highlight the presence of geometrical and electronic clouds of two conserved water molecules near His41 (in the ligand-free form) and also the presence of ligand either at the Cys145-His41 site or at Cys145-His163 region (in the ligand-bound state). The more distant residues in the ligand-binding region were perhaps thought to be non-essential for the catalytic activity of Mpro, but computational results suggest that His163, Tyr61, W1 and W2 may play a significant role. These vital clues obviously may guide medicinal chemist to rethink the structure based drug design protocol of Mpro protein and also to assist the structural biologists to solve and refine the crystal structure of the Mpro protein without any scientific ambiguity. Therefore, our computational investigations on interactions of W2---W1---ND1-His41-NE2---SH-Cys145 or Cys145-SH)---NE2-His163-ND1---OH-Tyr161 of Mpro might serve as an excellent drug target for coronavirus.

Conclusion
The present computational study focuses on the ligand binding pocket of the Mpro protein to characterize its stabilization, the role of conserved water molecules, and participation of non-catalytic residues during the enzymatic mechanism. This computational evidence encourages us to identify structural and functional discriminative features of the catalytic region of Mpro for comparative analysis between its ligand-free and ligandbound form. The structural analysis on crystal data for Mpro protein suggests that W1, W2, His163, and Tyr161 may play a decisive role in the activity of this enzyme though His163 and Try161 are occupied remotely from the catalytic region, and they may act as catalytic partners with Cys145-His41 catalytic pair. Possibly, thiolate-imidazolium ionpair between Cys145 (-SH---NE2-) His41 and Cys145 (-SH---NE2-) His163 have been stabilized by W1 (with W2) and Tyr161 respectively. Therefore, unique interactions of W2---W1---ND1-His41-NE2---SH-Cys145 or Cys145-SH---NE2-His163-ND1---OH-Tyr161 in Mpro serve as an excellent drug target for this enzyme and suggest a rethink of the conventional definition of chemical geometry of inhibitor binding site, its shape, and complementarities. The crystallographic information suggests two essential clues that may be implemented for designing new inhibitors or drugs for SARS-Cov-2-Mpro. Our computational strategies act as an important approach so that the synthesis of new compounds or repositioning of exit drugs can be evaluated. The strategies are: (i) ligand should be occupied either W1 or W2 or both these position to displace water molecules from the catalytic region; and (ii) ligand should be made Hbonds with Cys145 (-SH), His41 (NE2/ND1) and His163 (NE2) to inhibit this enzyme The results from this computational study could be of interest to the experimental community and also provide a testable hypothesis for experimental validation.

Data Availability
All data included in this study are available upon request by contact with the corresponding author.

Authors' Contributions
Alvea Tasneem, Gyan Prakash Rai and Saima Reyaz contributed equally to this manuscript and shared first authorship. Hridoy R. Bairagya conceived of the scientific idea and wrote the main manuscript. Alvea Tasneem conducted experimental studies. Hridoy R. Bairagya, Alvea Tasneem, Gyan Prakash Rai and Saima Reyaz conducted processing and analysis of computational data. Alvea Tasneem, Gyan Prakash Rai and Saima Reyaz contributed to the interpretation of the results and Alvea Tasneem prepared the figures and tables. All authors reviewed and edited the manuscript accordingly. All authors listed in the manuscript have concurred with the content of the manuscript and submission.

Acknowledgments
HRB is thankful to C.S.I.R. (Council of Scientific and Industrial Research, Govt. of India) for providing Senior Research Associateship (Pool Scientist scheme). HRB gratefully acknowledges the Department of Biophysics, AIIMS, New Delhi. AT, GPR and SR gratefully acknowledge the Department of Computer Science, Jamia Millia Islamia, a Central University-New Delhi, for providing computational resources. We express our sincere thanks to Prof. Bishnu Prasad Mukhopadhyay from NIT, Durgapur (West Bengal, India) for fruitful scientific discussion on this research work.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical Approval
The manuscript does not contain experiments on animals and humans; hence ethical permission not required.  Tables   Table S1. List of X-ray crystal structures of Main Protease (Mpro) enzyme and RBD of spike protein of SARS-CoV-2 in the study. Main protease with unliganded active site [56] 121