Representing Protein Sequence with Low Number of Dimensions
Abstract:
This study introduces a simple method that represents each protein sequence as a fixed-dimension vector of length three. We present a hidden Markov model method that combines scores: three scoring algorithms are combined to represent a protein's amino acid sequence for better remote homology detection. We tested the method on the SCOP version 1.37 dataset. The results show that, even with such a simple representation, the method outperforms previously presented protein homology detection methods while being more computationally efficient.
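The idea of mapping a sequence to a three-dimensional vector, one value per scoring algorithm, can be illustrated with a minimal sketch. The abstract does not name the three scoring algorithms, so the scorers below (`length_score`, `hydrophobic_fraction`, `charged_fraction`) and the helper `represent` are hypothetical stand-ins chosen only to make the example runnable; they are not the hidden Markov model scores used in the study.

```python
from typing import Callable, List, Sequence

# A scorer maps an amino acid sequence to a single number.
Scorer = Callable[[str], float]


def represent(sequence: str, scorers: Sequence[Scorer]) -> List[float]:
    """Return one score per scorer, giving a fixed-length vector."""
    return [scorer(sequence) for scorer in scorers]


# Hypothetical stand-in scorers (assumptions, not the paper's HMM scores).
def length_score(seq: str) -> float:
    return float(len(seq))


def hydrophobic_fraction(seq: str) -> float:
    # Fraction of residues in an illustrative hydrophobic set.
    return sum(seq.count(a) for a in "AILMFVW") / len(seq) if seq else 0.0


def charged_fraction(seq: str) -> float:
    # Fraction of residues in an illustrative charged set.
    return sum(seq.count(a) for a in "DEKR") / len(seq) if seq else 0.0


SCORERS = [length_score, hydrophobic_fraction, charged_fraction]

# Every sequence, whatever its length, becomes a vector of length three.
vector = represent("ACDE", SCORERS)
print(vector)
```

Because the output dimension depends only on the number of scorers, sequences of any length map to vectors of the same size, which is what allows a downstream classifier to work on such a compact representation.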
How to cite this article
Nazar Zaki and Safaai Deris, 2005. Representing Protein Sequence with Low Number of Dimensions. Journal of Biological Sciences, 5: 795-800.