Wenhao Zhu
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Chaoyou Ju
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Wei Xu
School of Computer Engineering and Science, Shanghai University, Shanghai, China
Jiaoxiong Xia
Information Centre, Shanghai Municipal Education Commission, Shanghai, China
Li Fu
School of Computer Engineering and Science, Shanghai University, Shanghai, China
ABSTRACT
Traditional Chinese Medicine (TCM) has a very long history in China. As part of China's cultural heritage, clinical TCM records have been preserved in TCM books. With the rapid development of digitization, many of these books are being digitized, and it would be very useful if the medical records could be extracted as structured information. However, TCM records are written in old Chinese and vary in writing style because they were written by different authors. Hence, it is difficult to extract these records with a general one-step approach. In this paper, we present a hierarchical information extraction method that extracts medical records in a multi-level way. Algorithms are designed for each information level, so that both detailed textual features, such as writing style and printing format, and the relations between pieces of information are taken into account during extraction. We verify our approach on TCM books that are written in old Chinese and are hard to process with standard natural language processing techniques. The experiments show that our approach performs well on most of the test books and can be applied to other similar tasks.
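The multi-level idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a two-level hierarchy (record blocks, then fields inside each block), and the heading pattern `RECORD_HEADING`, the cue strings in `FIELD_CUES`, the function names, and the English toy text are all hypothetical stand-ins for cues that would have to be tuned to each book's writing style and printing format.

```python
import re

# Hypothetical layout cues (assumptions for illustration only).
RECORD_HEADING = re.compile(r"^Case\s+\d+", re.MULTILINE)
FIELD_CUES = {"symptoms": "Symptoms:", "prescription": "Prescription:"}


def split_records(book_text):
    """Level 1: cut the book text into record blocks at each heading line."""
    starts = [m.start() for m in RECORD_HEADING.finditer(book_text)]
    ends = starts[1:] + [len(book_text)]
    return [book_text[s:e].strip() for s, e in zip(starts, ends)]


def extract_fields(record_text):
    """Level 2: pull cue-labelled fields out of a single record block."""
    fields = {}
    for line in record_text.splitlines():
        for name, cue in FIELD_CUES.items():
            if line.startswith(cue):
                fields[name] = line[len(cue):].strip()
    return fields


def extract_book(book_text):
    """Run level 1, then level 2 on every block it produced."""
    return [extract_fields(block) for block in split_records(book_text)]


# Toy input; text before the first heading is not treated as a record.
SAMPLE = """Preface text that is not a record.
Case 1
Symptoms: cough, fever
Prescription: herb A, herb B
Case 2
Symptoms: headache
Prescription: herb C
"""

records = extract_book(SAMPLE)
```

Separating the levels means the block-splitting cues and the field cues can be adjusted independently for each book, which is the property the abstract attributes to a hierarchical approach over a one-step one.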
How to cite this article
Wenhao Zhu, Chaoyou Ju, Wei Xu, Jiaoxiong Xia and Li Fu, 2013. Extracting Medical Records with Hierarchical Information Extraction Method. Information Technology Journal, 12: 4441-4446.
DOI: 10.3923/itj.2013.4441.4446
URL: https://scialert.net/abstract/?doi=itj.2013.4441.4446