Content Based Compression of Turkish Documents

Abstract: The main goal of this study is to analyse the morphological structure of Turkish documents. The proposed method losslessly compresses monograms, digrams, trigrams, root-stems and suffixes individually using a statistical approach. In this work, 1-gram, 2-gram, 3-gram, root-stem and suffix frequencies have been computed for the Turkish language, and a tuned template has been prepared for each group. Turkish documents have been compressed using static Huffman coding, and the performance has been measured against Word Based Dynamic Huffman coding.
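The statistical step summarised in the abstract can be illustrated with a minimal sketch: count n-gram frequencies, then derive a static Huffman code table from those counts. This is not the author's implementation. It assumes plain character n-grams and leaves out the root-stem and suffix groups, which would need a Turkish morphological analyser. The function names (`ngram_frequencies`, `huffman_code`, `encode`) and the sample string are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def ngram_frequencies(text, n):
    """Count overlapping character n-grams (n=1 monograms, 2 digrams, 3 trigrams)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def huffman_code(freqs):
    """Build a static Huffman code table {symbol: bitstring} from a frequency map."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs one bit so the output stays decodable.
        return {next(iter(freqs)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 / 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(symbols, table):
    """Concatenate the code words of a symbol sequence into one bit string."""
    return "".join(table[s] for s in symbols)


if __name__ == "__main__":
    sample = "evlerimizden evlerde evler"  # hypothetical sample text
    table = huffman_code(ngram_frequencies(sample, 1))
    bits = encode(sample, table)
    print(len(bits), "bits versus", 8 * len(sample), "bits at one byte per character")
```

In a static scheme such as this one, the code table is built once from corpus statistics and shared by the encoder and decoder, so it does not need to be adapted while a document is being compressed.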

How to cite this article

Banu Diri, 2001. Content Based Compression of Turkish Documents. Journal of Applied Sciences, 1: 446-451.

Keywords: Huffman coding, language modelling, text compression, n-gram models, Turkish corpus

Journal of Applied Sciences

Year: 2001 | Volume: 1 | Issue: 4 | Page No.: 446-451
DOI: 10.3923/jas.2001.446.451

Banu Diri
