Extracting Medical Records with Hierarchical Information Extraction Method

Zhu, Wenhao; Ju, Chaoyou; Xu, Wei; Xia, Jiaoxiong; Fu, Li

Research Article

Wenhao Zhu
School of Computer Engineering and Science, Shanghai University, Shanghai, China

Chaoyou Ju
School of Computer Engineering and Science, Shanghai University, Shanghai, China

Wei Xu
School of Computer Engineering and Science, Shanghai University, Shanghai, China

Jiaoxiong Xia
Information Centre, Shanghai Municipal Education Commission, Shanghai, China

Li Fu
School of Computer Engineering and Science, Shanghai University, Shanghai, China

ABSTRACT

Traditional Chinese Medicine (TCM) has a very long history in China. As part of China's cultural heritage, clinical TCM records have been preserved in TCM books. With the rapid progress of digitization, many of these books are being digitized, and it would be very useful if the medical records could be extracted as structured information. However, TCM records are written in old Chinese and vary widely in writing style because they were produced by different authors. Hence, it is difficult to extract these records with a general one-step approach. In this paper, we present a hierarchical information extraction method that extracts medical records level by level. A dedicated algorithm is designed for each information level, so that the extraction takes into account not only detailed textual features, such as writing style and printing format, but also the relations between pieces of information. We verify our approach on TCM books that are written in old Chinese and are hard to process with standard natural language processing techniques. The experiments show that our approach performs well on most of the test books and can be applied to other similar tasks.
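The abstract does not give the algorithms used at each level, so the following is only a minimal sketch of the general multi-level idea, not the authors' method. It assumes three hypothetical levels (book to chapters, chapter to records, record to fields), and the heading marker, field cues, function names and English sample text are all made up for illustration.

```python
import re

# Hypothetical markers; a real TCM book would need patterns tuned to its
# own printing format and writing style.
CHAPTER_RE = re.compile(r"^# (.+)$", re.MULTILINE)
FIELD_CUES = {"symptoms": "Symptoms:", "prescription": "Prescription:"}


def split_chapters(book):
    """Level 1: split the book into (title, body) pairs at chapter headings."""
    matches = list(CHAPTER_RE.finditer(book))
    chapters = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(book)
        chapters.append((m.group(1).strip(), book[m.end():end].strip()))
    return chapters


def split_records(chapter_body):
    """Level 2: treat blank-line-separated blocks as individual records."""
    blocks = re.split(r"\n\s*\n", chapter_body)
    return [block.strip() for block in blocks if block.strip()]


def extract_fields(record):
    """Level 3: pull labelled fields out of one record, line by line."""
    fields = {}
    for line in record.splitlines():
        for name, cue in FIELD_CUES.items():
            if line.startswith(cue):
                fields[name] = line[len(cue):].strip()
    return fields


def extract_book(book):
    """Run the three levels in order, keeping the chapter as context."""
    results = []
    for title, body in split_chapters(book):
        for record in split_records(body):
            fields = extract_fields(record)
            if fields:
                fields["chapter"] = title
                results.append(fields)
    return results


# Invented sample text, used only to exercise the sketch.
SAMPLE = """# Cold disorders
Symptoms: chills, headache
Prescription: cinnamon twig decoction

Symptoms: fever without sweating
Prescription: ephedra decoction

# Digestive disorders
Symptoms: abdominal fullness
Prescription: magnolia bark decoction
"""

records = extract_book(SAMPLE)
for rec in records:
    print(rec)
```

Each level only sees the output of the level above it, so a rule can stay simple and be swapped per book, and the chapter title attached to each record is one way to carry the relation between levels into the final structured output.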

How to cite this article

Wenhao Zhu, Chaoyou Ju, Wei Xu, Jiaoxiong Xia and Li Fu, 2013. Extracting Medical Records with Hierarchical Information Extraction Method. Information Technology Journal, 12: 4441-4446.

DOI: 10.3923/itj.2013.4441.4446

URL: https://scialert.net/abstract/?doi=itj.2013.4441.4446
