Information Technology Journal
Year: 2013  |  Volume: 12  |  Issue: 20  |  Page No.: 5955 - 5961

Simple Semi-supervised Learning for Chinese Word Segmentation and POS Tagging

Xinxin Li, Xuan Wang and Muhammad Waqas Anwar    

Abstract: Strategies for selecting unlabeled data are important for semi-supervised learning in natural language processing tasks. To increase the accuracy and diversity of newly labeled data, many methods have been proposed, such as ensemble-based self-training, co-training and tri-training. In this paper, we propose a simple and effective semi-supervised algorithm for Chinese word segmentation and part-of-speech tagging that selects newly labeled data on which two different approaches, character-based and word-based models, agree. Theoretical and experimental analyses verify that sentences given the same annotation by both models are more accurate than those generated by a single model and are suitable as additional data for semi-supervised learning. Experimental results on Chinese Treebank 5.0 demonstrate that our semi-supervised approach is comparable to the best reported semi-supervised approach, which employs complex feature engineering.
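
The agreement-based selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `select_agreed`, the two toy models, and the example sentences and tags are all hypothetical stand-ins, assuming each model maps a raw sentence to a list of (word, POS tag) pairs. Only the selection step is shown; retraining on the selected sentences is left out.

```python
from typing import Callable, Iterable, List, Tuple

# A tagged sentence is a list of (word, POS tag) pairs.
Tagged = List[Tuple[str, str]]
Model = Callable[[str], Tagged]


def select_agreed(sentences: Iterable[str],
                  char_model: Model,
                  word_model: Model) -> List[Tagged]:
    """Keep only sentences whose segmentation and tags are identical
    under both models (hypothetical helper, not from the paper)."""
    selected = []
    for sent in sentences:
        char_out = char_model(sent)
        word_out = word_model(sent)
        if char_out == word_out:
            selected.append(char_out)
    return selected


# Toy stand-ins for trained character-based and word-based models.
# Real models would be learned from labeled data; these are lookup tables.
_CHAR_OUT = {
    "我爱北京": [("我", "PN"), ("爱", "VV"), ("北京", "NR")],
    "他喜欢花生": [("他", "PN"), ("喜欢", "VV"), ("花", "NN"), ("生", "VV")],
}
_WORD_OUT = {
    "我爱北京": [("我", "PN"), ("爱", "VV"), ("北京", "NR")],
    "他喜欢花生": [("他", "PN"), ("喜欢", "VV"), ("花生", "NN")],
}


def toy_char_model(sent: str) -> Tagged:
    return _CHAR_OUT[sent]


def toy_word_model(sent: str) -> Tagged:
    return _WORD_OUT[sent]


unlabeled = ["我爱北京", "他喜欢花生"]
agreed = select_agreed(unlabeled, toy_char_model, toy_word_model)
```

In this toy run only the first sentence is kept, because the two models disagree on how to segment the second. In a semi-supervised setting, the kept sentences would then be added to the training data as additional labeled examples.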
