Abstract: Background: Recent findings indicate that variation in the transcription factor 7-like 2 (TCF7L2) gene is linked to the pathogenesis of type 2 diabetes. In the present study, a structural model of the TCF7L2 protein was generated to help understand its structure, function and mechanism of action. The study was designed to describe some of the physicochemical and functional properties of the TCF7L2 protein and to provide information about its three-dimensional structure. Materials and Methods: The PDB file of TCF7L2 [CAG38811] was generated with the Phyre2 server. Model construction and regularization (including geometry optimization) were performed with the optimization protocol in YASARA, and the energy of the model was minimized using standard protocols combining simulated annealing, conjugate gradient and steepest descent. The UCLA-DOE server was used for visual analysis of the quality of the putative structure, and the model was validated with PROCHECK. The model was further analyzed with WHATIF, QMEAN and ProSA. Results: The model showed good stereochemical properties, with an overall G-factor of -0.64, indicating that its geometry corresponds to a probable conformation, and 67.9% of residues in the core region of the Ramachandran plot, indicating high accuracy of the model prediction. The ProSA Z-score of -6.07 indicates good model quality; this Z-score measures the deviation of the total energy of the structure from an energy distribution derived from random conformations. The scores indicate a highly reliable structure and are well within the range typically found for proteins of similar size. The energy plot shows local model quality by plotting knowledge-based energies as a function of amino acid sequence position.
Conclusions: The generated model could help in understanding the functional characteristics of transcription factor 7-like 2 (TCF7L2). Variants in TCF7L2 are associated with the risk of type 2 diabetes.
INTRODUCTION
Impaired insulin secretion is concomitant with type 2 diabetes, as discussed by Anna et al.1. Transcription factor 7-like 2 (TCF7L2) is a high mobility group (HMG) box-containing transcription factor whose gene product is involved in blood glucose homeostasis, as discussed by Yi et al.2. The TCF7L2 gene, located on chromosome 10q25, regulates cell differentiation and proliferation and has been linked to type 2 diabetes, as discussed elsewhere3,4. TCF7L2 has recently been implicated in the pathogenesis of type 2 diabetes (T2D) through regulation of pancreatic β-cell insulin secretion, as discussed elsewhere5,6. Variants of TCF7L2 increase the risk of type 2 diabetes and likely affect both insulin sensitivity and insulin secretion, as discussed by Shu et al.7. Bioinformatics helps in the management of complex biological data, sequence analysis and algorithm design, as discussed elsewhere8,9, and in silico analysis can be used to characterize protein sequences10,11. Therefore, the present study describes some of the physicochemical and functional properties of the TCF7L2 protein and provides information about its three-dimensional structure.
MATERIALS AND METHODS
Hardware and operating system: The present study was conducted on an HP ProBook with an Intel(R) Core(TM) i3-370M CPU @ 2.40 GHz running a 32-bit operating system.
Sequence retrieval, alignment and homology modeling: The FASTA sequence of the transcription factor 7-like 2 protein (TCF7L2 [CAG38811]) was retrieved from NCBI and submitted to the Phyre2 server, which was used to generate the PDB file of the model. To build a model of the protein domain, a multiple sequence alignment was performed between the full-length TCF7L2 protein sequence and other protein sequences in the database. To build a model of the TCF7L2 protein with higher homology, the TCF7L2 structure model from the 3D-JIGSAW server was selected as the template. Model construction and regularization (including geometry optimization) were performed with the optimization protocol in YASARA. The energy of the model was minimized using standard protocols combining simulated annealing, conjugate gradient and steepest descent.
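The steepest-descent part of the minimization can be illustrated with a toy loop that repeatedly moves the coordinates against the energy gradient. This is a minimal sketch only: the two-variable quadratic `energy` function is a hypothetical stand-in for a molecular force field, the step size and tolerance are arbitrary assumptions, and it does not reproduce YASARA's protocol, which works on full atomic coordinates and also uses simulated annealing and conjugate gradient.

```python
from typing import Tuple

Vec = Tuple[float, float]


def energy(p: Vec) -> float:
    # Hypothetical anisotropic energy well with its minimum at (1.0, -2.0).
    x, y = p
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2


def gradient(p: Vec) -> Vec:
    # Analytical gradient of energy().
    x, y = p
    return (2.0 * (x - 1.0), 20.0 * (y + 2.0))


def steepest_descent(start: Vec, step: float = 0.04, tol: float = 1e-8,
                     max_iter: int = 10_000) -> Tuple[Vec, int]:
    """Step against the gradient until its norm drops below tol."""
    p = start
    for i in range(max_iter):
        gx, gy = gradient(p)
        if (gx * gx + gy * gy) ** 0.5 < tol:
            return p, i
        p = (p[0] - step * gx, p[1] - step * gy)
    return p, max_iter


minimum, steps = steepest_descent((5.0, 5.0))
print(minimum, steps)
```

With a fixed step, progress along the shallow x direction is slow compared with the steep y direction; this is the usual reason steepest descent is followed by conjugate gradient in practical minimization schemes.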
Model evaluation: The UCLA-DOE server provides a visual analysis of the quality of a putative crystal structure of a protein. Its Verify3D program expects the structure to be submitted in PDB format, as discussed by Luthy et al.12. Structure models were validated with PROCHECK, as discussed elsewhere13,14, which gave satisfactory results indicating the reliability of the model, as discussed by Sehgal et al.15. The model was selected on the basis of factors such as the overall G-factor and the numbers of residues falling in the core, generously allowed and disallowed regions of the Ramachandran plot. The model was further analyzed with WHATIF, as discussed elsewhere16,17, QMEAN18,19 and ProSA, as discussed by Wiederstein and Sippl20. ProSA was used to display the Z-score and energy plots.
RESULTS AND DISCUSSION
Building of protein model: The basic principle of homology modeling is the selection of a template and the sequence alignment between the target and the template, as discussed by Chhabra and Dixit21. Sequence alignment of the TCF7L2 protein using the Phyre2 server revealed sequence homology (99% identity) with the catenin-binding domain, which was selected as the template for building the TCF7L2 model. A total of 41 residues (7% of the query sequence) were modelled with 99% confidence by the single highest-scoring template, as discussed by Pitchai et al.22. To build the model, PSI-BLAST was run with a maximum allowed template E-value of 0.005. Using the catenin-binding domain as the template, the TCF7L2 protein domains were modelled with YASARA (Fig. 1).
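The template-selection rule described above (keep only hits whose E-value does not exceed 0.005, then take the highest-confidence hit) and the coverage percentage can be sketched as follows. The `TemplateHit` record, the hit names and all numbers in the usage example are hypothetical; this is not the actual interface of Phyre2 or PSI-BLAST.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateHit:
    """Hypothetical record for one candidate template from a PSI-BLAST-style search."""
    name: str
    e_value: float          # expected number of chance hits; lower is better
    confidence: float       # percent confidence that the hit is homologous (0-100)
    aligned_residues: int   # query residues covered by this template


def select_template(hits: List[TemplateHit],
                    max_e_value: float = 0.005) -> Optional[TemplateHit]:
    """Return the highest-confidence hit whose E-value is at or below the cutoff."""
    eligible = [h for h in hits if h.e_value <= max_e_value]
    if not eligible:
        return None
    return max(eligible, key=lambda h: h.confidence)


def coverage_percent(aligned_residues: int, query_length: int) -> float:
    """Percentage of the query sequence covered by the template alignment."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * aligned_residues / query_length


# Hypothetical hits for illustration only.
hits = [
    TemplateHit("template_A", e_value=1e-20, confidence=99.0, aligned_residues=41),
    TemplateHit("template_B", e_value=0.01, confidence=99.5, aligned_residues=120),
    TemplateHit("template_C", e_value=0.003, confidence=85.0, aligned_residues=60),
]
best = select_template(hits)
print(best.name)
```

In this example `template_B` is rejected despite its higher confidence because its E-value exceeds the cutoff, which is the point of applying the E-value filter before ranking.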
Model evaluation: The model showed good stereochemical properties, with an overall G-factor of -0.64 indicating that its geometry corresponds to a probable conformation, and 67.9% of residues in the core region of the Ramachandran plot, indicating high accuracy of the predicted model, as discussed elsewhere23. The proportions of residues in the additional allowed and generously allowed regions were 20.5% and 11.5%, respectively, and no residues were present in the disallowed region of the plot (Fig. 2). These results indicate that the protein model is reliable, as discussed by Sahu and Shukla24.
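The Ramachandran percentages follow from simple per-region counts divided by the total number of residues. In the sketch below, the counts (106, 32, 18 and 0 of 156 residues) are back-calculated from the reported percentages rather than taken from the PROCHECK output, so treat them as an assumption; the reported values sum to 99.9% only because of rounding.

```python
from typing import Dict


def region_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert Ramachandran region counts into percentages rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to classify")
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Counts back-calculated (assumption) from the reported percentages and 156 residues.
counts = {
    "most favored": 106,
    "additional allowed": 32,
    "generously allowed": 18,
    "disallowed": 0,
}
percentages = region_percentages(counts)
print(percentages)
# {'most favored': 67.9, 'additional allowed': 20.5, 'generously allowed': 11.5, 'disallowed': 0.0}
```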
The Verify3D graph illustrates the compatibility of the atomic model (3D) with its amino acid sequence (1D), as discussed by Biswas25, and the score profile assesses the quality of the model, as discussed by Sahu and Shukla24. The high score of 0.28 indicates that the environment profile of the model is good (Fig. 3).
Fig. 1: | TCF7L2 protein ribbon model generated using YASARA |
Fig. 2: | Ramachandran plot analysis of TCF7L2 protein. The total number of residues was 156, with 67.9% in most favored regions [A, B, L], 20.5% in additional allowed regions [a, b, l, p], 11.5% in generously allowed regions and 0% in disallowed regions |
A profile score above zero in the Verify3D graph corresponds to an acceptable environment of the model, as discussed elsewhere12,26. In the Verify3D plot, 17.50% of the residues had an averaged 3D-1D score ≥ 0.2.
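The percentage reported for the Verify3D plot is the share of residues whose window-averaged 3D-1D score reaches 0.2. A minimal sketch of that calculation follows; the default 21-residue window, the truncated windows at the chain termini and the toy per-residue scores are assumptions made for illustration, not Verify3D's exact implementation.

```python
from typing import List, Sequence


def window_average(scores: Sequence[float], window: int = 21) -> List[float]:
    """Average each residue's score over a centred sliding window.

    Assumption: windows are truncated (not padded) at the chain ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n = len(scores)
    averaged = []
    for i in range(n):
        segment = scores[max(0, i - half):min(n, i + half + 1)]
        averaged.append(sum(segment) / len(segment))
    return averaged


def percent_at_or_above(values: Sequence[float], threshold: float = 0.2) -> float:
    """Percentage of residues whose averaged score is >= threshold."""
    if not values:
        raise ValueError("no scores given")
    passing = sum(1 for v in values if v >= threshold)
    return 100.0 * passing / len(values)


# Toy per-residue 3D-1D scores (hypothetical), averaged over a 3-residue window.
toy_scores = [0.5, 0.1, -0.2, 0.4, 0.6, 0.0]
averaged = window_average(toy_scores, window=3)
print(round(percent_at_or_above(averaged), 1))  # prints 66.7
```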
Model validation: ProSA was used to check the three-dimensional model of the TCF7L2 protein for potential errors. The ProSA Z-score of -6.07 indicates overall model quality and measures the deviation of the total energy of the TCF7L2 structure from an energy distribution derived from random conformations (Fig. 4). This Z-score lies in the range characteristic of native proteins, indicating that the structure contains few errors, as discussed by Mustufa et al.27.
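The Z-score idea can be illustrated generically: the model's energy E is compared with the mean μ and standard deviation σ of a reference energy distribution, z = (E - μ) / σ. The sketch below is not ProSA's implementation; the reference energies are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics
from typing import Sequence


def z_score(model_energy: float, reference_energies: Sequence[float]) -> float:
    """Standardized deviation of model_energy from a reference energy distribution."""
    mean = statistics.mean(reference_energies)
    sd = statistics.pstdev(reference_energies)
    if sd == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / sd


# Hypothetical reference energies (arbitrary units) and model energy.
reference = [-2.0, 2.0, -2.0, 2.0]
print(z_score(-12.0, reference))  # prints -6.0
```

A more negative Z-score means the model's energy lies further below the reference mean, measured in standard deviations.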
The quality of the model was estimated with the QMEAN scoring function, whose terms are normalized with respect to the number of interactions, as discussed by Benkert et al.28.
Fig. 3: | Verify3D graph of TCF7L2 protein [CAG38811] |
Fig. 4(a-b): | ProSA web service analysis of TCF7L2 protein model |
Fig. 5(a-b): | (a) Density plot for QMEAN showing the value of Z-score and QMEAN score and (b) Plot showing the QMEAN value as well as Z-score |
The QMEAN score of the model was 0.189 and its Z-score was -4.16 (Fig. 5a). The estimated reliability of a model is expected to lie between 0 and 1, as can be inferred from the density plot of QMEAN scores for the reference set, and these values were interpreted as indicating acceptable model quality, as discussed elsewhere29,30. Comparing the normalized QMEAN score (0.40) with protein size for a non-redundant set of PDB structures gave separate Z-scores for the individual terms: C-beta interactions (-1.22), interactions between all atoms (-1.46), solvation (-0.39), torsion (-3.83), secondary structure element (SSE) agreement (-0.53) and solvent accessibility (ACC) agreement (-2.88) (Fig. 5b). The Z-score measures the deviation of the total energy of the TCF7L2 protein structure with respect to an energy distribution derived from random conformations, as discussed by Rekik et al.31.
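To see which QMEAN terms deviate most from the reference set, the component Z-scores reported above can be ranked by their distance from zero. The values are taken from the text; the ranking helper is a hypothetical convenience and not part of the QMEAN server.

```python
from typing import Dict, List, Tuple

# Component Z-scores reported for the TCF7L2 model (Fig. 5b).
QMEAN_COMPONENT_Z: Dict[str, float] = {
    "C-beta interactions": -1.22,
    "all-atom interactions": -1.46,
    "solvation": -0.39,
    "torsion": -3.83,
    "SSE agreement": -0.53,
    "ACC agreement": -2.88,
}


def rank_by_deviation(z_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort terms from the largest to the smallest absolute Z-score."""
    return sorted(z_scores.items(), key=lambda item: abs(item[1]), reverse=True)


for term, z in rank_by_deviation(QMEAN_COMPONENT_Z):
    print(f"{term:<22} {z:+.2f}")
```

On these values, torsion (-3.83) and ACC agreement (-2.88) are furthest from zero, while solvation (-0.39) is closest.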
CONCLUSION
The generated model could help in understanding the functional characteristics of transcription factor 7-like 2 (TCF7L2). Variants in TCF7L2 are associated with the risk of type 2 diabetes. In silico molecular modeling and validation studies are helpful for understanding the structure, function and mechanism of action of proteins. Validation of the generated model with WHATIF, PROCHECK, ProSA and QMEAN confirmed its reliability.
The model showed good stereochemical properties, with an overall G-factor of -0.64 indicating that its geometry corresponds to a probable conformation, and 67.9% of residues in the core region of the Ramachandran plot, indicating high accuracy of the model prediction. The ProSA Z-score of -6.07, which measures the deviation of the total energy of the structure from an energy distribution derived from random conformations, also indicates good model quality. These scores indicate a highly reliable structure and are well within the range typically found for proteins of similar size. The ProSA energy plot shows local model quality by plotting knowledge-based energies as a function of amino acid sequence position.
ACKNOWLEDGMENTS
The authors are thankful to Dr. M.R. Jape (Principal, Rama Medical College, Ghaziabad, Uttar Pradesh, India) for valuable support and approval of the present study. Thanks are also due to the members of the Bioinformatics and Biochemistry Laboratory research group for technical support.
SIGNIFICANCE STATEMENT
• | Transcription factor 7-like 2 (TCF7L2) has recently been implicated in the pathogenesis of type 2 diabetes (T2D). The present study was designed to describe some of the physicochemical and functional properties of the TCF7L2 protein and to provide information about its 3D structure |
• | The generated model could help in understanding the characteristics of TCF7L2. In silico molecular modeling and validation studies of TCF7L2 are helpful for understanding the structure, function and mechanism of action of the protein |
• | The predicted model of TCF7L2 is useful for finding interactions with other proteins involved in type 2 diabetes (T2D). These findings can support future investigation of the molecular basis of T2D and the design of new drugs for its treatment |