
Journal of Biological Sciences

Year: 2005 | Volume: 5 | Issue: 6 | Page No.: 795-800
DOI: 10.3923/jbs.2005.795.800
Representing Protein Sequence with Low Number of Dimensions
Nazar Zaki and Safaai Deris

Abstract: This study introduces a simple method that represents each protein sequence as a fixed-length vector of three dimensions. We present a method that combines hidden Markov model scores: three scoring algorithms are combined to represent a protein's amino acid sequence for better remote homology detection. We tested the method on the SCOP version 1.37 dataset. The results show that, with such a simple representation, the method outperforms previously presented protein homology detection methods while being more computationally efficient.
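The idea of mapping a variable-length sequence to a fixed three-dimensional vector can be illustrated with a minimal sketch. The abstract does not name the three scoring algorithms, so the three functions below (a total score, a best-segment score and a length-normalized score) are hypothetical stand-ins, and the per-residue log-odds table is a toy with invented values rather than a trained hidden Markov model. None of this is the authors' implementation.

```python
from typing import Dict, List, Tuple

# Toy per-residue log-odds values: hypothetical, not taken from the paper
# and not derived from a trained hidden Markov model.
TOY_LOG_ODDS: Dict[str, float] = {"A": 0.5, "L": 0.8, "K": -0.4, "G": -0.2}
DEFAULT_LOG_ODDS = -1.0  # assumed penalty for residues missing from the toy table


def residue_scores(sequence: str) -> List[float]:
    """Look up one toy log-odds value per residue."""
    return [TOY_LOG_ODDS.get(residue, DEFAULT_LOG_ODDS) for residue in sequence]


def total_score(sequence: str) -> float:
    """Stand-in scorer 1: sum of log-odds over the whole sequence."""
    return sum(residue_scores(sequence))


def best_segment_score(sequence: str) -> float:
    """Stand-in scorer 2: best-scoring contiguous segment (never below zero)."""
    best = current = 0.0
    for score in residue_scores(sequence):
        current = max(0.0, current + score)
        best = max(best, current)
    return best


def mean_score(sequence: str) -> float:
    """Stand-in scorer 3: length-normalized log-odds (0.0 for an empty sequence)."""
    scores = residue_scores(sequence)
    return sum(scores) / len(scores) if scores else 0.0


def represent(sequence: str) -> Tuple[float, float, float]:
    """Map a sequence of any length to a fixed three-dimensional vector."""
    return (total_score(sequence), best_segment_score(sequence), mean_score(sequence))


if __name__ == "__main__":
    for seq in ("ALKG", "ALALALALKGKG"):
        print(seq, represent(seq))
```

Whatever the sequence length, `represent` returns exactly three numbers, so a downstream classifier (the keywords mention a support vector machine) would see inputs of constant size. In a real setting each stand-in would be replaced by a score computed against a trained model.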


How to cite this article
Nazar Zaki and Safaai Deris, 2005. Representing Protein Sequence with Low Number of Dimensions. Journal of Biological Sciences, 5: 795-800.

Keywords: Support vector machine, hidden Markov model and protein homology detection
