Research Article
Programs Similarity Measure Based on Tree Structure and Eigenvector
Dongmei Li, Di Zhang, Zhifang Wei and Jianxin Wang
ABSTRACT
Program similarity measurement detects how similar programs are to one
another. It is widely used in teaching and in the protection of intellectual
property rights. Most current program similarity measurement techniques
suffer from low accuracy. Building on previous studies of program similarity
measurement, this study proposes a method based on tree structure and eigenvector.
First, the actual frequency of keywords in a program is counted using a
hierarchical tree structure. Second, these frequencies are used to generate
the eigenvector of the program, improving on the traditional vector-based
method. Finally, a program similarity measurement system named CPlag is
implemented to measure the similarity of C language programs. Experimental
results indicate that CPlag has clear advantages over the well-known JPlag
in some aspects.
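
The vector-based idea summarized above can be illustrated with a minimal sketch. It is not the CPlag implementation: the abstract does not give the keyword set, the tree-based counting rule, or the comparison formula, so the `KEYWORDS` list, the flat token count, and the use of cosine similarity are all assumptions made for illustration. The hierarchical tree step is not reproduced here, and comments and string literals are not stripped.

```python
import math
import re
from collections import Counter

# Hypothetical subset of C keywords; the set used by the paper is not stated.
KEYWORDS = ("if", "else", "for", "while", "do",
            "switch", "case", "return", "break", "continue")


def keyword_vector(source):
    """Return keyword counts for a C source string, ordered as in KEYWORDS."""
    # Identifier-like tokens only; this is a flat count, not a tree-based one.
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    counts = Counter(t for t in tokens if t in KEYWORDS)
    return [counts[k] for k in KEYWORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumed comparison measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a program with no counted keywords matches nothing
    return dot / (norm_u * norm_v)


def program_similarity(source_a, source_b):
    """Similarity score in [0, 1] for two C source strings."""
    return cosine_similarity(keyword_vector(source_a), keyword_vector(source_b))
```

Under these assumptions, two programs with the same keyword counts score close to 1 even if identifiers are renamed, while programs sharing no counted keywords score 0. A flat count like this ignores where a keyword occurs, which is the kind of information a tree-based count could add.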