
Information Technology Journal

Year: 2013 | Volume: 12 | Issue: 23 | Page No.: 7673-7676
DOI: 10.3923/itj.2013.7673.7676
Progressive Similarity Transductive Support Vector Machine Algorithm for Small Sample Text Classification
Jianbin Ma and Ying Li

Abstract: The Support Vector Machine (SVM) algorithm is widely applied to text classification. However, SVM has a limitation: it struggles to label samples correctly when only a small number of training samples is available. TSVM (Transductive Support Vector Machine) was therefore introduced to minimize the misclassification of test samples by training on both labeled and unlabeled samples. However, in the training process of TSVM, the parameter N (the number of positive samples) must be specified manually, and N is difficult to estimate. In this study, PSTSVM (Progressive Similarity Transductive Support Vector Machine) was introduced; it labels the most likely unlabeled samples pairwise by computing similarity and then retrains to readjust the hyperplane. Experimental results on the Reuters dataset showed that the PSTSVM algorithm was effective on a mixed training set of labeled and unlabeled samples.
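The abstract describes PSTSVM only at a high level: choose the most likely unlabeled samples in pairs by a similarity computation, add them to the training set, and retrain so the hyperplane readjusts, without supplying the positive-sample count N up front. The Python sketch below illustrates that loop under its own assumptions and is not the authors' implementation. Similarity is assumed to be cosine similarity to the current positive and negative class centroids, each pair is assumed to take one sample from each side of the current hyperplane, and the linear SVM is a simple full-batch subgradient solver in pure Python. All function names and parameter values (`lam`, `lr`, `epochs`) are hypothetical.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na, nb = sqrt(dot(a, a)), sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Approximately minimise lam/2*||w||^2 + mean hinge loss by full-batch
    subgradient descent. A constant 1.0 is appended to every sample, so the
    last weight plays the role of a (lightly regularised) bias."""
    Xa = [list(x) + [1.0] for x in X]
    n = len(Xa)
    w = [0.0] * len(Xa[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(Xa, y):
            if yi * dot(w, xi) < 1:  # margin violated: hinge term is active
                for j, xij in enumerate(xi):
                    grad[j] -= yi * xij / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def decision(w, x):
    """Signed score of x relative to the hyperplane (bias is the last weight)."""
    return dot(w, list(x) + [1.0])


def progressive_pairwise_labeling(X_lab, y_lab, X_unl):
    """Hypothetical PSTSVM-style loop: each round labels one +1 and one -1
    unlabeled sample chosen by centroid similarity, then retrains the SVM.
    Returns (final weights, labels assigned to X_unl)."""
    X_lab = [list(x) for x in X_lab]
    y_lab = list(y_lab)
    labels = [0] * len(X_unl)
    remaining = list(range(len(X_unl)))
    w = train_linear_svm(X_lab, y_lab)
    while remaining:
        c_pos = centroid([x for x, t in zip(X_lab, y_lab) if t == 1])
        c_neg = centroid([x for x, t in zip(X_lab, y_lab) if t == -1])

        def score(i):
            # Higher means "more like the positive class than the negative one".
            return cosine(X_unl[i], c_pos) - cosine(X_unl[i], c_neg)

        pos_side = [i for i in remaining if decision(w, X_unl[i]) >= 0]
        neg_side = [i for i in remaining if decision(w, X_unl[i]) < 0]
        if not pos_side or not neg_side:
            # No complete pair left: label the rest by the current hyperplane.
            for i in remaining:
                labels[i] = 1 if decision(w, X_unl[i]) >= 0 else -1
            break
        p = max(pos_side, key=score)  # most positive-like sample
        q = min(neg_side, key=score)  # most negative-like sample
        for i, t in ((p, 1), (q, -1)):
            labels[i] = t
            X_lab.append(list(X_unl[i]))
            y_lab.append(t)
            remaining.remove(i)
        w = train_linear_svm(X_lab, y_lab)  # readjust the hyperplane
    return w, labels


if __name__ == "__main__":
    X_lab = [[3, 0, 1], [0, 3, 1]]  # toy term-frequency vectors
    y_lab = [1, -1]
    X_unl = [[4, 1, 0], [2, 0, 1], [1, 4, 0], [0, 2, 1], [3, 1, 1], [1, 3, 1]]
    w, labels = progressive_pairwise_labeling(X_lab, y_lab, X_unl)
    print(labels)  # [1, 1, -1, -1, 1, -1]
```

Because each round adds one positive and one negative sample, this loop never asks for N; once one side of the hyperplane runs out of candidates, the remaining samples are labeled by the current hyperplane. A practical version would use sparse TF-IDF document vectors and an established SVM solver rather than this toy trainer.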


How to cite this article
Jianbin Ma and Ying Li, 2013. Progressive Similarity Transductive Support Vector Machine Algorithm for Small Sample Text Classification. Information Technology Journal, 12: 7673-7676.

Keywords: PSTSVM, small sample, text classification and support vector machine

REFERENCES

  • Chen, Y.S., G.P. Wang and S.H. Dong, 2003. A progressive transductive inference algorithm based on support vector machine. J. Software, 14: 451-460.


  • Chen, Y.S., G.P. Wang and S. Dong, 2003. Learning with progressive transductive support vector machine. Pattern Recogn. Lett., 24: 1845-1855.


  • Drucker, H., D. Wu and V.N. Vapnik, 1999. Support vector machines for spam categorization. IEEE Trans. Neural Networks, 10: 1048-1054.


  • Joachims, T., 1998. Text categorization with support vector machines: Learning with many relevant features. Proceedings of the 10th European Conference on Machine Learning, Chemnitz, Germany, April 21-23, 1998, Springer, Berlin, Heidelberg, pp: 137-142.


  • Joachims, T., 1999. Transductive inference for text classification using support vector machines. Proceedings of the 16th International Conference on Machine Learning, June 27-30, 1999, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp: 200-209.


  • Joachims, T., 2001. A statistical learning model of text classification for support vector machines. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, September 9-13, 2001, New Orleans, USA, pp: 128-136.


  • Ma, J.B., G.F. Teng, Y.X. Zhang, Y.L. Li and Y. Li, 2009. A cybercrime forensic method for Chinese web information authorship analysis. Proceedings of 2009 Pacific Asia Workshop on Intelligence and Security Informatics, April 27, 2009, Bangkok, Thailand, pp: 14-24.


  • Ma, J.B., Y. Li, G.F. Teng and Y.X. Zhang, 2013. An authorship attribution forensic method for web information. ICIC Exp. Lett., 7: 2609-2613.


  • Ren, G.B., J. Zhang, Y. Ma and P.J. Song, 2010. An unlabeled samples labeling method of TSVM for remote sensing image. Proceedings of the 3rd IEEE International Conference on Computer Science and Information Technology, July 9-11, 2010, Chengdu, China, pp: 286-290.


  • Vapnik, V.N., 1998. Statistical Learning Theory. Wiley, New York, USA.


  • Wang, Y. and Z. Gong, 2008. Hierarchical classification of web pages using support vector machine. Proceedings of the 11th International Conference on Asian Digital Libraries, December 2-5, 2008, Bali, Indonesia, pp: 12-21.

© Science Alert. All Rights Reserved.