Research Article
Blocking Distribution Based Hierarchical Reconstruction for Text Categorization

Wen Li, Weili Wang and Ling Chai

ABSTRACT
As one of the important techniques for organizing large-scale data,
text categorization has been widely investigated. However, existing hierarchical
classification methods often suffer from inter-level error propagation, known as
blocking. This paper proposes a blocking distribution based topology reconstruction
method for the hierarchical text categorization problem. First, a blocking
distribution recognition technique is put forward to identify the classes that
suffer serious high-level misclassification. Subsequently, the original hierarchical
structure is reconstructed using the blocking direction information obtained in the
first step, which adds paths by which blocked instances can reach the correct
subclass. Experimental studies on the Chinese text classification benchmark TanCorp
demonstrate that the proposed algorithm performs better than traditional hierarchical
and state-of-the-art flat classification strategies.
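
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`blocking_counts`, `reconstruct`), the two-level hierarchy, the toy data and the count `threshold` are all assumptions made for illustration. The sketch counts how often instances of each leaf class are routed to the wrong top-level class, then links a leaf under the wrong top-level class when that count reaches the threshold, so that such instances still have a path to their correct subclass.

```python
from collections import Counter


def blocking_counts(parent_of, true_leaves, predicted_tops):
    """Count blocked instances per (predicted top-level class, true leaf class).

    An instance is treated as blocked when the top-level classifier sends it
    to a class other than the parent of its true leaf class.
    """
    counts = Counter()
    for leaf, predicted in zip(true_leaves, predicted_tops):
        if predicted != parent_of[leaf]:
            counts[(predicted, leaf)] += 1
    return counts


def reconstruct(children, counts, threshold):
    """Return a copy of the hierarchy with extra top-level -> leaf links.

    A leaf is additionally linked under a wrong top-level class when at least
    `threshold` of its instances were blocked there (hypothetical criterion).
    """
    new_children = {top: list(leaves) for top, leaves in children.items()}
    for (wrong_top, leaf), n in counts.items():
        if n >= threshold and leaf not in new_children[wrong_top]:
            new_children[wrong_top].append(leaf)
    return new_children


# Hypothetical two-level hierarchy and top-level predictions on held-out data.
children = {"sports": ["football", "tennis"], "finance": ["stocks", "banking"]}
parent_of = {leaf: top for top, leaves in children.items() for leaf in leaves}

true_leaves = ["football", "stocks", "stocks", "stocks", "tennis", "banking"]
predicted_tops = ["sports", "sports", "sports", "finance", "finance", "finance"]

counts = blocking_counts(parent_of, true_leaves, predicted_tops)
new_children = reconstruct(children, counts, threshold=2)
print(new_children)
```

In this toy data, two "stocks" documents are blocked under "sports", so "stocks" becomes reachable from both top-level classes, while the single blocked "tennis" document falls below the threshold and leaves the hierarchy unchanged there.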