Abstract: The Support Vector Machine (SVM) algorithm is widely applied to text classification. However, a limitation of SVM is that it is difficult to label samples correctly when only a small number of training samples is available. The Transductive Support Vector Machine (TSVM) was therefore introduced to minimize the misclassification of test samples by training on both labeled and unlabeled samples. In the training process of TSVM, however, the parameter N (the number of positive samples) must be specified manually, and this parameter is difficult to estimate. In this study, the Progressive Similarity Transductive Support Vector Machine (PSTSVM) is introduced. It labels the most likely unlabeled samples pairwise by similarity computation and then retrains to readjust the hyperplane. Experimental results on the Reuters dataset show that the PSTSVM algorithm is effective on a mixed training set of labeled and unlabeled samples.
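The progressive pairwise labeling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the selection rule, the similarity measure, or the stopping criterion, so everything below is an assumption made for illustration and not the authors' implementation: cosine similarity to class centroids is used to pick one positive and one negative candidate per round, a small hinge-loss linear SVM trained by subgradient descent stands in for the SVM solver, and the names `pstsvm_sketch`, `train_linear_svm`, `lam`, `eta`, `epochs`, and `max_rounds` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.
    lam, eta and epochs are illustrative values, not taken from the paper."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wi - eta * g for wi, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def predict(w, b, x):
    """Sign of the decision function, with ties resolved as +1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def pstsvm_sketch(X_lab, y_lab, X_unl, max_rounds=10):
    """Assumed loop: label one (+1, -1) pair per round, then retrain.
    Requires at least one labeled sample of each class (+1 and -1)."""
    X, y = list(X_lab), list(y_lab)
    pool = list(range(len(X_unl)))   # indices of still-unlabeled samples
    assigned = {}                    # index in X_unl -> assigned label
    w, b = train_linear_svm(X, y)
    for _ in range(max_rounds):
        if len(pool) < 2:
            break
        pos_c = centroid([x for x, t in zip(X, y) if t == 1])
        neg_c = centroid([x for x, t in zip(X, y) if t == -1])
        # Positive score: closer to the positive centroid than the negative.
        score = {i: cosine(X_unl[i], pos_c) - cosine(X_unl[i], neg_c)
                 for i in pool}
        i_pos = max(pool, key=score.get)
        i_neg = min(pool, key=score.get)
        if score[i_pos] <= 0 or score[i_neg] >= 0:
            break                    # no clear candidate for one of the classes
        for i, label in ((i_pos, 1), (i_neg, -1)):
            X.append(X_unl[i])
            y.append(label)
            assigned[i] = label
            pool.remove(i)
        w, b = train_linear_svm(X, y)  # retrain to readjust the hyperplane
    return w, b, assigned

# Toy term-count vectors, invented purely for illustration.
X_lab = [[3, 0], [2, 1], [0, 3], [1, 2]]
y_lab = [1, 1, -1, -1]
X_unl = [[4, 1], [3, 1], [1, 4], [1, 3]]
w, b, assigned = pstsvm_sketch(X_lab, y_lab, X_unl)
print(assigned, predict(w, b, [5, 1]))
```

Because one positive and one negative sample are labeled in each round, this sketch never needs the number of positive samples N as an input, which is the motivation stated in the abstract. Samples left in the pool when the loop stops would simply be classified by the final hyperplane via `predict`.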