
Journal of Applied Sciences

Year: 2013 | Volume: 13 | Issue: 22 | Page No.: 5230-5234
DOI: 10.3923/jas.2013.5230.5234
Improved of Phrase Extraction Algorithm in Tibetan and Chinese Statistical Machine Translation
Cao Hui and Dong Xiaofang

Abstract: Bilingual phrase extraction is one of the key steps in the phrase-based translation model of statistical machine translation, and this study focuses on extracting bilingual phrases both accurately and with sufficient coverage. The final phrase translation probability table is obtained with an improved phrase extraction algorithm. In the word alignment matrix, a single Tibetan word is often aligned to several Chinese words. Phrase pairs are first extracted with the Och algorithm; when a candidate pair does not meet Och's condition, information from Tibetan dictionaries is added. The two methods were compared on Tibetan-Chinese parallel corpora of the same size, using different linguistic corpora and different sentence pairs, and the improved method performed better in the experiments.
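
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The consistency test is the standard Och-style condition: a phrase pair is kept when at least one alignment link lies inside it and no link connects a word inside the pair to a word outside it. The dictionary fallback, the function names, the `max_len` limit and the placeholder tokens `t1`, `c1` and so on are assumptions made for illustration only; the abstract does not say how the dictionary information is applied.

```python
from typing import Dict, List, Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (Tibetan word index, Chinese word index)


def is_consistent(align: Alignment, s1: int, s2: int, t1: int, t2: int) -> bool:
    """Och-style check: some link inside the box, no link crossing its border."""
    has_link_inside = False
    for s, t in align:
        in_src = s1 <= s <= s2
        in_tgt = t1 <= t <= t2
        if in_src != in_tgt:  # the link leaves the box on one side only
            return False
        if in_src and in_tgt:
            has_link_inside = True
    return has_link_inside


def extract_phrases(
    src: List[str],
    tgt: List[str],
    align: Alignment,
    dictionary: Optional[Dict[str, Set[str]]] = None,
    max_len: int = 3,
) -> List[Tuple[str, str, str]]:
    """Return (source phrase, target phrase, origin) triples.

    origin is "aligned" when the pair passes the consistency check and
    "dictionary" when it fails the check but the (hypothetical) bilingual
    dictionary lists the target phrase as a translation of the source phrase.
    """
    dictionary = dictionary or {}
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            src_phrase = " ".join(src[s1:s2 + 1])
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(t1 + max_len, len(tgt))):
                    tgt_phrase = " ".join(tgt[t1:t2 + 1])
                    if is_consistent(align, s1, s2, t1, t2):
                        pairs.append((src_phrase, tgt_phrase, "aligned"))
                    elif tgt_phrase in dictionary.get(src_phrase, set()):
                        # Assumed fallback: trust the dictionary entry.
                        pairs.append((src_phrase, tgt_phrase, "dictionary"))
    return pairs


# Placeholder tokens: Tibetan word t1 is aligned to two Chinese words, c1 and c2.
tibetan = ["t1", "t2"]
chinese = ["c1", "c2", "c3"]
links = {(0, 0), (0, 1), (1, 2)}
for pair in extract_phrases(tibetan, chinese, links, {"t1": {"c1"}}):
    print(pair)
```

In this toy case the consistency check alone yields only the pairs (t1, c1 c2), (t2, c3) and (t1 t2, c1 c2 c3), because t1 is linked to both c1 and c2. The assumed dictionary entry additionally admits (t1, c1), which is the kind of pair a one-to-many alignment would otherwise block.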


How to cite this article
Cao Hui and Dong Xiaofang, 2013. Improved of Phrase Extraction Algorithm in Tibetan and Chinese Statistical Machine Translation. Journal of Applied Sciences, 13: 5230-5234.

Keywords: Statistical machine translation, phrase extraction, translation model and Tibetan-Chinese bilingual phrase pairs

