The main goal of this study is to analyse the morphological structure of Turkish documents. The proposed method losslessly compresses monograms, digrams, trigrams, root-stems and suffixes individually, using a statistical approach. In this work, monogram (1-gram), digram (2-gram), trigram (3-gram), root-stem and suffix frequencies have been computed for the Turkish language, and a tuned template has been prepared for each group. Turkish documents have been compressed using static Huffman coding, and Word Based Dynamic Huffman coding has also been measured.
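The statistical approach described above can be illustrated with a minimal sketch, which is not the author's implementation. It counts character n-grams and builds a static Huffman code table from the counts. The function names (`ngram_frequencies`, `huffman_code`, `encode`, `decode`) are hypothetical, and treating a "tuned template" as a fixed code table per group is an assumption. Root-stem and suffix segmentation would need a morphological analyser and is not shown.

```python
import heapq
from collections import Counter


def ngram_frequencies(text, n):
    """Count overlapping character n-grams (n=1 monograms, 2 digrams, 3 trigrams)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def huffman_code(freqs):
    """Build a static Huffman code table {symbol: bitstring} from frequencies."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs one bit to be representable.
        return {next(iter(freqs)): "0"}
    # Heap entries are (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, table):
    """Concatenate the code word of every symbol."""
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    """Invert encode(); works because Huffman codes are prefix-free."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


if __name__ == "__main__":
    sample = "kitaplar kitaplarda"  # illustrative text, not data from the study
    table = huffman_code(ngram_frequencies(sample, 1))
    bits = encode(sample, table)
    print(len(bits), "bits versus", 8 * len(sample), "bits uncompressed")
```

Because the table is static, it would be built once from corpus statistics and shared by the compressor and decompressor rather than transmitted with each document. The same `huffman_code` function applies unchanged to digram, trigram, root-stem or suffix frequency tables.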
How to cite this article
Banu Diri, 2001. Content Based Compression of Turkish Documents. Journal of Applied Sciences, 1: 446-451.