Steviol glycosides are the major secondary metabolites synthesized through steviol glycoside biosynthesis pathway operating in the leaves of Stevia rebaudiana. Present article documents the structural analysis of enzymes specific to steviol glycoside biosynthesis pathway, kaurenoic acid-13 hydoxylase (KAH) and three UDP-glycosyltransferases (UGT85C2, UGT74G1 and UGT76G1). The in silico protein structure prediction server SWISS MODEL was used to predict and evaluate the models. The secondary structure data of predicted model for KAH was in accordance to that of cytochrome P450s suggesting its nativeness to the respective superfamily. Similarly, the secondary structure data of target UGTs also showed conservation with the structural information of glycosyltransferases superfamily. PROCHECK and QMEAN Z-score evaluations suggested that the models predicted for the 4 query enzymatic proteins were of good quality. In addition, Ligand binding site analysis and molecular docking analysis was carried out for the predicted models. The following data suggested a possibility of the presence of an alternate pathway for the synthesis of steviol glycosides.
PDF Abstract XML References Citation
How to cite this article
Steviol glycosides are the diterpene secondary metabolites from S. rebaudiana. These are the glycosylated products of the precursor steviol (Richman et al., 2005). Steviol glycosides are used as dietary supplements as natural sweetner in various nations. It has been known that these glycosides are anti-diabetic, non-cariogenic and non-mutagenic (Yadav and Guleria, 2011). Steviol glycosides are synthesized in the leaves of Stevia via steviol glycoside biosynthesis pathway (Fig. 1). Steviol glycoside biosynthesis pathway comprises of 16 steps catalyzed by several enzymes. Among these, the last 5 steps are specific to steviol glycoside biosynthesis pathway (Yadav and Guleria, 2011; Guleria et al., 2011; Brandle and Telmer, 2007). These steps are catalyzed by enzymes Kaurenoic Acid-13 Hydroxylase (KAH) and four UDP-glycosyltransferases (UGTs) identified as UGT85C2, UGT74G1 and UGT76G1. One UGT is still to identify (Yadav and Guleria, 2011; Brandle and Telmer, 2007). Despite of the huge and wide prevailing importance, various aspects of this pathway are still hidden.
Formulating three dimensional structure of a protein is of great help in understanding its biochemical functions and molecular interaction properties (Bordoli et al., 2009). Protein structures are more conserved than protein or DNA sequences.
|Fig. 1:||Schematic representation of steviol glycoside biosynthesis pathway, Enzymes (bolded) are specific to steviol glycoside biosynthesis pathway|
So, in silico approaches are being used to predict the structure of an unknown protein taking known three dimensional structures of related protein family as a template reference (Chithia and Lesk, 1986; Mugilan et al., 2010; Smith and Plazas, 2011; Joseph and Nair, 2012). Comparative homology modeling has been utilized to predict three dimensional structures for various plant proteins such as cytosolic glutamine synthetase from Camellia sinensis (Yadav, 2009), late embryogenesis abundant protein from Arabidopsis thaliana (Boobalan and Bharathi, 2010) lectins from mushroom (Khan and Khan, 2011) and thioredoxin from Triticum aestivum (Prabhavathi et al., 2011). Higher the sequence similarity, more significant is the structural identity and consequently, an increased reliability of the predicted model (Marti-Renom et al., 2000; Huq, 2008a, b).
In this study, protein structures of steviol glycoside biosynthesis pathway enzymes; Kaurenoic Acid-13 Hydoxylase (KAH) and three UDP-glycosyltransferases (UGT85C2, UGT74G1 and UGT76G1) were analyzed using in silico approaches. This is the first report documenting the computational elucidation of three dimensional models of these enzymatic proteins. These structures were further analyzed for the prediction of ligand binding sites and ligand interaction affinities.
MATERIALS AND METHODS
Retrieval of target sequences: The amino acid sequences of the enzymatic proteins KAH, UGT85C2, UGT74G1 and UGT76G1 were obtained from NCBI (http://www.ncbi.nlm.nih.gov) Protein Database in FASTA format. It was made ascertain that three dimensional structures of these proteins were not available in Protein Data Bank (PDB). Accession number, protein lengths have been tabulated (supplementary Table 1 in appendix).
Structure prediction and evaluation: The three dimensional structures were predicted using SWISS-MODEL, an automated protein homology-modeling server (Schwede et al., 2003). The amino acid sequences of respective proteins were submitted to SWISS-MODEL server one-by-one and following steps were performed: domain annotation, template identification, automated modeling and structure assessment. Domain annotation determined the superfamily to which respective protein belonged as well as secondary structure elements of the target protein. Template identification predicted the possible templates for target sequence on the basis of target-template sequence similarity. Three dimensional structures were determined using automated modeling mode and the predicted models were evaluated using Structure Assessment tool of SWISS-MODEL server (Bordoli et al., 2009).
Ligand binding site prediction: Ligand binding sites of the evaluated models were predicted by submitting models to Q-SiteFinder server. This server depicts the energetically favorable ligand binding sites by using methyl probes at a grid resolution of 0.9 Å on a three dimensional grid encompassing the whole protein molecule (Laurie and Jackson, 2005).
Molecular docking analysis: Molecular docking between the predicted models corresponding to KAH, UGT85C2, UGT74G1, UGT76G1 and the pathway substrate molecules was performed using Molecular Docking server (http://www.dockingserver.com). It involves three steps: uploading of desired ligand molecule from NCBI PubChem, uploading of the query protein molecules through PDB and finally followed by docking. Each docking run was repeated twice to get best results (Bikadi and Hazai, 2009). Selected ligands were ent-kaurenoic acid, steviol, steviolmonoside, steviolbioside, stevioside and rebaudioside A. The uploaded proteins were three dimensional models predicted for enzymatic proteins KAH, UGT85C2, UGT74G1 and UGT76G1. Each protein was docked with every uploaded ligands.
RESULTS AND DISCUSSION
Homologue identification and secondary structure analysis: Kaurenoic Acid-13 Hydroxylase (KAH) and three UDP glycosyltransferases (UGT85C2, UGT74G1 and UGT76G1) are the key enzymes of steviol glycoside biosynthesis pathway (Fig. 1). In this paper, SWISS-MODEL was used to predict structures of these enzymatic proteins. Domain annotation data indicated that KAH belongs to Cytochrome P450 superfamily. No plant enzymatic proteins belonging to P450 superfamily have been analyzed for their structures (Rupasinghe and Schuler, 2006). While domain annotation of UGT85C2, UGT74G1 and UGT76G1 documented that these enzymatic proteins belong to glycosyltransferases (GTs) superfamily and/or UDP-glucuronosyl/UDP-glucosyltransferase family. The structural analysis of GT proteins have been carried out for Medicago truncatula (Shao et al., 2005) and Vitis vinifera (Offen et al., 2006). Secondary structure predicts the α-helices and β-sheets present in the query protein. The analysis data of predicted secondary structures for all the four proteins are shown in Fig. 2 and their evaluations are presented in Supplementary Table 1 (Appendix). KAH possessed a higher number of α-helices (264) than β-sheets (37) or random coils (174) (Fig. 2a). Presence of increased number of helices than β-sheets is the conserved structural feature of Cytochrome P450s (Graham and Petersen, 1999; Stout, 2004). This suggested the nativeness of KAH to Cytochrome P450s and its helical nature. UGT85C2 possessed 220 random coils, 196 α-helices and 64 β-sheets (Fig. 2b). An increased number of random coils (213) than α-helices (189) or β-sheets (57) were also observed for UGT74G1 (Fig. 2c). Similarly, UGT76G1 showed 207 random coils, 189 α-helices and 61 β-sheets (Fig. 2d). The data documented the dominating character of random coils in predicted secondary structures of UGTs. This suggests that these three UGTs belong to the same family and possess coiled geometry.
Model prediction and evaluation: Template identification searched templates for the query sequences on the basis of significant sequence similarity. Retinoic acid bound cyanobacterial CYP120A1 protein (PDB ID 2ve3 chain A), a Cytochrome P450 was identified as template for KAH showing 33.9% of sequence similarity. 2vg8 chain A, N and O glucosyltransferase involved in xenobiotic metabolism of plants was identified as template for UGT74G1 with a highest sequence similarity of 26.6%.
|Fig. 2(a-d):||Secondary structure for enzymes (a) KAH, (b) UGT85C2, (c) UGT74G1 and (d) UGT76G1. KAH possesses helical structure and UGTs possess coiled structure|
|Fig. 3(a-d):||Three dimensional model of (a) KAH, (b) UGT85C2, (c) UGT74G1 and (d) UGT76G1|
While 2pq6 chain A, crystal structure of Medicago truncatula UGT85H2, was identified as template for both UGT85C2 and UGT76G1 with a sequence similarity of 43.3 and 28%, respectively. For all the query proteins template-target sequence identity was more than 25%, hence suitable to conduct automated modeling (Schwede et al., 2003).
The three dimensional structure of enzymatic protein KAH was determined at 2.1 Å resolution by Automated modeling method of SWISS-MODEL server. The employed template was 2ve3 chain A, a cyanobacterial cytochrome P450. Out of total 476 residues, 447 residues were included by the software to constitute the modeled structure. The structure consisted of a single chain comprising of 17 α-helices and 12 β-strands arranged in four β-pleated sheets (Fig. 3a). This kind of arrangement is also a characteristic feature of Cytochrome P450 folds (Rupasinghe and Schuler, 2006; Kuhnel et al., 2008). Conserved secondary structure suggests that KAH belongs to cytochrome P450 superfamily. The enzymatic protein UGT85C2 was modeled at a resolution of 2.1Å by using template 2pq6 chain A, UGT85H2 from M. truncatula. A total of 468 residues out of 483 residues were included to constitute the three dimensional model. The predicted model possessed single chain consisting of 20 α-helices and 13 β-strands. The β-strands were arranged in two β-pleated sheets, one possessing seven stranded parallel β-strands and the other with six stranded parallel β-strands (Fig. 3b). This feature is common for the UGTs modeled till date using experimental approaches (Shao et al., 2005; Li et al., 2007). Results suggested the nativeness of computationally modeled UGT85C2 with the experimentally solved UGTs.
Similarly, automated modeling was carried out for the protein UGT74G1. This enzymatic protein was modeled on the basis of template 2vg8 chain A at 1.75 Å resolution. The predicted model consisted of 449 residues out of the total 460 residues. The geometry of model comprised of one chain, 16 α-helices, 13 β-strands arranged in 7 and 6 stranded parallel two β-pleated sheets (Fig. 3c). The geometry and topology of modeled UGT74G1 was similar to UGT85C2. Hence, similar to UGT85C2, the UGT74G1 was conserved for structural features with the experimentally solved UGTs (Shao et al., 2005; Brazier-Hicks et al., 2007). The other UDP glycosyltransferase protein, UGT76G1 was modeled by automated modeling on the basis of template utilized for UGT85C2, 2pq6 chain A. The predicted model was constituted of 445 residues out of 458 residues. The model possessed single chain with 19 α-helices and 13 β-strands (Fig. 3d). Like UGT85C2 and UGT74G1, the β-strands of UGT76G1 were also arranged in the form of seven stranded and six stranded parallel two β-pleated sheets (Shao et al., 2005; Li et al., 2007). It was observed that the entire query UGTs possessed similar geometry of β-pleated sheets. Thus, it suggests that the studied UGTs; UGT85C2, UGT74G1 and UGT76G1 belong to same protein family.
The predicted models were assessed by evaluation through PROCHECK analysis (Laskowski et al., 1993). The number of residues in the most favored regions of Ramachandran plot deciphers the quality of predicted model. Another evaluating factor is the overall average value of G-factors. G-factors include the dihedral angles involved in phi-psi distributions, chi1-chi2 distribution, chi1, chi3-chi4, omega and main chain covalent forces constituting main chain bond angles and bond lengths. G-factors determine the degree of unusualness in the predicted model. The overall average values of G-factors below -0.5 and -1.0 corresponds to unusual and highly unusual properties of the model, respectively. Comparison of following parameters of the predicted models with respect to the template could help to assess the quality of the model.
The model for enzymatic protein KAH possessed 83.6% of the residues in the most favored regions of the Ramachandran plot (Fig. 4a), comparable to 89.6% for its template 2ve3A. The average value of G-factors was observed to be 0.03 (more than -0.5) for KAH and was 0.12 for its template 2ve3A. The data thus suggests that the model predicted for KAH is usual and worth use for representing the protein KAH. Ramachandran plot for UGT85C2 predicted model showed 87.2% of the residues in most favored regions (Fig. 4b), comparable to 89.6% for its template protein 2pq6A. The G-factor average value was 0.02 (more than -0.5) for UGT85C2 and was 0.35 for its template. Both the features of target protein were in close proximity to the template and within the required limits. Thus the model predicted for UGT85C2 was normal not unusual and can be used to represent the target protein. The model predicted for UGT74G1 showed 85.4% of residues in the most favored regions of Ramachandran plot (Fig. 4c), whereas 92.3% for its template 2vg8A. The average value of G-factor was -0.09 (more than -0.5) for UGT74G1. The G-factor was comparatively higher for this target protein to that of 0.11 for its template protein. In this case, both the features in target protein showed a higher variability from template protein. However, the G-factor was within limits which suggest that the predicted model is usual and can be used as a representative of the protein UGT74G1. The predicted three dimensional model for UGT76G1 possessed 87.0% of the residues in the most favored regions of Ramachandran plot (Fig. 4d), comparable to 89.6% for its template protein 2pq6A. The G-factor average was 0.02 (more than -0.5) for the modeled target protein and 0.35 for the template. Like UGT85C2, UGT76G1 was observed to be in close proximity of the template protein and the respective values were within limits. Hence, the predicted model was enough usual to represent structure of the target protein UGT76G1.
Recently, a new measure QMEAN Z-score has been introduced to determine the closeness of the computationally predicted models with the experimentally validated structures (Benkert et al., 2011). QMEAN score is a linear combination of six structural features: two distance dependent interaction potentials of mean force based on C-β atoms and on all atom types, torsion angle potential evaluating the local backbone geometry of the structure, solvation potential describing the burial status of residues and solvent accessibility in the form of SSE and ACC agreements. In order to calculate the QMEAN Z-score of a predicted model, the normalized raw scores of the model are compared to the scores obtained for a representative set of high resolution X-ray structures of similar size (number of residues of query protein ±10%).
|Fig. 4(a-b):||Ramachandran plots for the predicted three dimensional models of (a) KAH, (b) UGT85C2, (c) UGT74G1 and (d) UGT76G1|
The output is obtained in the form of a model quality plot in which the query model is marked on normalized QMEAN score data obtained from high resolution structures of similar size. The predicted model of KAH has normalized QMEAN score less than 1 that lies within the prescribed limits (Fig. 5a). Thus the QMEAN score suggests that model is of good quality.
|Fig. 5(a-d):||Normalized QMEAN Z-scores for the proposed three dimensional models of (a) KAH, (b) UGT85C2, (c) UGT74G1 and (d) UGT76G1|
The QMEAN score for UGT85C2 was predicted to be greater than 1 but less than 2 which was within the defined limits (Fig. 5b). Hence, the model was of good quality. The QMEAN score was very similar for both UGT74G1 (Fig. 5c) and UGT76G1 (Fig. 5d) to that of UGT85C2. Hence, models for both the enzymatic proteins UGT74G1 and UGT76G1 were also of good quality. Various quality assessment features suggested that the predicted models were of good quality and these models can be used to represent these enzymatic proteins KAH, UGT85C2, UGT74G1 and UGT76G1.
Ligand binding site prediction: In order to predict Ligand binding sites for the obtained evaluated models, the models were submitted to Q-SiteFinder server. Ten Ligand binding sites were predicted for each query model which were later arranged on the basis of total interaction energies. Out of predicted sites, the binding site with most favorable interaction energy, area and volume was identified as first predicted binding site. Ligand binding sites for the four query proteins are shown in Fig. 6. The red marked site was the first predicted site. The volume for predicted sites were 1221 cubic Angstroms for KAH (Fig. 6a), 300 cubic Angstroms for UGT85C2 (Fig. 6b), 1536 cubic Angstroms for UGT74G1 (Fig. 6c) and 513 cubic Angstroms for UGT76G1 (Fig. 6d). Various residues present in putative ligand binding sites of proteins KAH, UGT85C, UGT74G1 and UGT76G1 are shown in Supplementary Table 2 (Appendix).
Molecular docking analysis: Molecular docking predicts the stable protein-ligand interactions on the basis of protein-ligand complex geometries and binding energies (Shakyawar et al., 2011). The assessment is based on binding affinity between the protein and ligand molecules. It allows the accurate prediction of binding geometry and binding energies. It has been known that good binding geometry prediction depends upon the accurate prediction of binding energy. Least is the binding energy, higher is the binding affinity.
|Fig. 6(a-d):||Ligand binding sites predicted for three dimensional models of (a) KAH, (b) UGT85C2, (c) UGT74G1 and (d) UGT76G1. Ten sites were predicted for each protein. The site highlighted with red color is the first predicted site and a close view of the same has been shown as inset|
It has been found that the computationally predicted and experimentally solved binding energies are correlated, that further supports the computationally predicted data (Bikadi and Hazai, 2009; Lakshmi et al., 2011). Molecular docking was carried out to analyze the affinity of the studied enzymatic proteins with the substrates of the steviol glycoside biosynthesis pathway (Supplementary Table 1).
First committed step of steviol glycoside biosynthesis pathway involves conversion of ent-kaurenoic acid to steviol by the activity of enzyme KAH. The interaction affinity of KAH, UGT85C2, UGT74G1 and UGT76G1 was evaluated for ent-kaurenoic acid and steviol. The model predicted for KAH showed highest affinity for the ligand steviol (-9.23 kcal mol-1), followed by steviolmonoside (-9.07 kcal mol-1) and ent-kaurenoic acid (-8.07 kcal mol-1). UGT85C2 was observed to possess highest affinity for ent-kaurenoic acid (-7.30 kcal mol-1), followed by steviolmonoside (-6.91 kcal mol-1) and steviol (-6.32 kcal mol-1). Similarly, UGT74G1 model showed highest affinity for ent-kaurenoic acid (-7.97 kcal mol-1) followed by steviol (-7.19 kcal mol-1) and steviolmonoside (-2.41 kcal mol-1). The docking results for three dimensional model of UGT76G1 suggested its highest binding affinity for ent-kaurenoic acid (-7.11 kcal mol-1), followed by steviol (-6.91 kcal mol-1) and steviolmonoside (-6.48 kcal mol-1). Results demonstrate that these enzymes possibly have the ability to interact with more than one ligands of steviol glycoside biosynthesis pathway. This is in well support by the presence of number of active binding sites on the protein surfaces. Data documents the probability of existence of alternative pathways or branching for the synthesis of various metabolites of steviol glycoside biosynthesis pathway.
This is the first report on prediction of three dimensional models for the enzymatic proteins KAH, UGT85C2, UGT74G1 and UGT76G1 from Stevia rebaudiana. From the evaluation data of PROCHECK and QMEAN Z-score, it was found that the predicted models were enough usual and good to represent the query proteins. Further, prediction of ligand binding sites for the predicted models of enzymatic proteins has opened way to carry out various manipulative studies that could facilitate metabolic engineering. Understanding three dimensional structures and ligand binding sites provide the possibility of manipulating enzyme sequences for interaction with more ligands in addition to the already known ones. This study can help find way to increase the turn over production of steviol glycosides. Models for UGT85C2 and UGT76G1 were predicted on the basis of a common template, suggesting the possibility of single UGT that might be catalyzing two or more reactions by interacting with more ligands. Earlier study has also documented the multi-substrate activity of UGTs (Wang and Hou, 2009). This work suggested that the pathway may be modulated to specifically enhance the production of desired steviol glycosides to obtain final yield sweeter and less bitter.
We thank the Director, CSIR-IHBT, Palampur for his continuous support and encouragement. We would like to acknowledge the financial support from Council of Scientific and Industrial Research (CSIR), Govt of India. PG is also thankful to CSIR for proving fellowship as JRF.
|Supplementary Table 1:|| |
Detailed description of models and their evaluation data for proteins, KAH, UGT85C2, UGT74G1 and UGT76G1
|Supplementary Table 2:|| |
Residues present in putative ligand binding sites of proteins KAH, UGT85C, UGT74G1, UGT76G1
- Bikadi, Z. and E. Hazai, 2009. Application of the PM6 semi-empirical method to modeling proteins enhances docking accuracy of AutoDock. J. Cheminf., 1: 1-15.