Research Article
 

Representing Protein Sequence with Low Number of Dimensions



Nazar Zaki and Safaai Deris
 
ABSTRACT

This study introduces a simple method that represents a protein sequence as a fixed-dimension vector of length three. We present a method that combines hidden Markov model (HMM) scores: three scoring algorithms are combined to represent a protein's amino acid sequence for better remote homology detection. We tested the method on the SCOP version 1.37 dataset. The results show that, with such a simple representation, the method outperforms previously presented protein homology detection methods while being more computationally efficient.
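The idea of turning one sequence into a three-number vector can be illustrated with a minimal sketch. The abstract does not name the three scoring algorithms, so everything below is an assumption made for illustration: the toy gap-free profile, the uniform background, and the three scorers (`best_window_score`, `summed_window_score`, `normalized_best_score`) are hypothetical stand-ins, not the authors' method.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy model: one emission distribution per match position.
# A real HMM would also have insert/delete states and transitions.
Profile = List[Dict[str, float]]

BACKGROUND = 1.0 / 20.0  # assumed uniform background over 20 amino acids
FLOOR = 1e-3             # assumed probability for residues a position does not list


def window_scores(seq: str, profile: Profile) -> List[float]:
    """Log-odds score of every profile-length window of seq."""
    if len(seq) < len(profile):
        raise ValueError("sequence is shorter than the profile")
    scores = []
    for start in range(len(seq) - len(profile) + 1):
        score = sum(
            math.log(column.get(seq[start + i], FLOOR) / BACKGROUND)
            for i, column in enumerate(profile)
        )
        scores.append(score)
    return scores


def best_window_score(seq: str, profile: Profile) -> float:
    """Stand-in scorer 1: score of the single best-matching window."""
    return max(window_scores(seq, profile))


def summed_window_score(seq: str, profile: Profile) -> float:
    """Stand-in scorer 2: log-sum-exp over all windows."""
    scores = window_scores(seq, profile)
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


def normalized_best_score(seq: str, profile: Profile) -> float:
    """Stand-in scorer 3: best window score per profile position."""
    return best_window_score(seq, profile) / len(profile)


def represent(seq: str, profile: Profile) -> Tuple[float, float, float]:
    """Map a sequence of any length to a fixed vector of three scores."""
    return (
        best_window_score(seq, profile),
        summed_window_score(seq, profile),
        normalized_best_score(seq, profile),
    )


# Usage: a two-position toy profile that prefers "A" then "C".
toy_profile: Profile = [{"A": 0.9}, {"C": 0.9}]
vec = represent("GACG", toy_profile)
print(vec)
```

Whatever the sequence length, the output has three components, so it can be passed directly to any classifier that expects fixed-length input.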


 
  How to cite this article:

Nazar Zaki and Safaai Deris, 2005. Representing Protein Sequence with Low Number of Dimensions. Journal of Biological Sciences, 5: 795-800.

DOI: 10.3923/jbs.2005.795.800

URL: https://scialert.net/abstract/?doi=jbs.2005.795.800

