Science Alert
Information Technology Journal
  Year: 2014 | Volume: 13 | Issue: 14 | Page No.: 2309-2317
DOI: 10.3923/itj.2014.2309.2317
A Fingerprint of Paragraph Generation Approach for Detecting Similar Document
Haitao Wang, Shufeng Liu and Zongpu Jia

This study proposes a paragraph fingerprint generation approach for detecting similar documents. The approach consists of three parts: selecting long sentences, obtaining the feature set of each paragraph by calculating the weight of each sentence, and generating the fingerprint of the paragraph. A document is thereby turned into a set of paragraph fingerprints, and a similarity measure is then applied to calculate the similarity degree between the input document and each document in the given collection. A similarity function based on the cosine function uses that similarity degree to determine whether a document is a near duplicate of the input document. The approach better reveals the characteristics of the document and effectively reduces the influence of noise on feature selection. The experiment demonstrates that the proposed method achieves better precision and recall and a shorter running time compared with the other method.
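The three steps and the cosine comparison can be illustrated with a minimal Python sketch. The abstract does not give the sentence-weight formula, the hash function, or any thresholds, so everything specific here is an assumption made for illustration: the function names, the `min_words` and `top_k` parameters, the term-frequency weighting, the use of MD5, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
import hashlib
import math
import re
from collections import Counter


def split_sentences(paragraph):
    # Naive sentence splitter on ., ! and ? (an assumption, not the paper's method).
    return [s.strip() for s in re.split(r"[.!?]+", paragraph) if s.strip()]


def paragraph_fingerprint(paragraph, min_words=5, top_k=2):
    # Step 1: keep only long sentences (min_words is a hypothetical cutoff).
    sentences = [s.lower().split() for s in split_sentences(paragraph)]
    long_sentences = [words for words in sentences if len(words) >= min_words]
    if not long_sentences:
        return None

    # Step 2: weight each sentence. Assumed weight: sum of the in-paragraph
    # term frequencies of its words. The top_k sentences form the feature set.
    tf = Counter(word for words in long_sentences for word in words)
    ranked = sorted(long_sentences,
                    key=lambda words: sum(tf[w] for w in words),
                    reverse=True)
    features = sorted(" ".join(words) for words in ranked[:top_k])

    # Step 3: hash the feature set into one fingerprint (MD5 is an assumption).
    return hashlib.md5("\n".join(features).encode("utf-8")).hexdigest()


def document_fingerprints(document):
    # A document becomes a multiset of paragraph fingerprints.
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    fingerprints = (paragraph_fingerprint(p) for p in paragraphs)
    return Counter(fp for fp in fingerprints if fp is not None)


def cosine_similarity(a, b):
    # Cosine of the angle between two fingerprint-count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) *
                     sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_near_duplicate(doc_a, doc_b, threshold=0.5):
    # threshold is a hypothetical value; the abstract does not state one.
    similarity = cosine_similarity(document_fingerprints(doc_a),
                                   document_fingerprints(doc_b))
    return similarity >= threshold
```

In this sketch, short sentences never reach the hash, so two paragraphs that differ only in a short trailing sentence yield the same fingerprint. That is one simple way to picture how restricting features to long sentences could reduce the effect of noise.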
How to cite this article:

Haitao Wang, Shufeng Liu and Zongpu Jia, 2014. A Fingerprint of Paragraph Generation Approach for Detecting Similar Document. Information Technology Journal, 13: 2309-2317.

DOI: 10.3923/itj.2014.2309.2317







