Plagiarism has become very common in educational institutions. Students copy other students' assignments, both text and source code, without hesitation, either to finish their work on time or to improve it. Many students seldom put time and effort into doing assignments on their own when copying from someone else is far simpler. However, it is necessary to differentiate original work from plagiarized work.

There is an alarming rise in plagiarism due to the widespread use of the Internet. The Internet is an enormous repository of information that can be accessed easily from almost anywhere, which has made plagiarism very difficult to control. Since manually detecting plagiarism in a large document database is tedious and time-consuming, efforts are continuously being made to automate the process.

There exist many plagiarism detection techniques and numerous tools based on them. Techniques for source code plagiarism detection fall into two main categories: attribute-counting-based and structure-based comparison. Attribute-counting-based techniques count the occurrences of different attributes in a file according to certain criteria, and then apply a similarity measure to those counts to obtain the similarity between files. Structure-based techniques derive information on program structure and obtain similarity scores based on this information. Section 1.4 gives a brief overview of the various plagiarism detection techniques.

Attribute-counting algorithms are simple to implement and execute faster. Structure-based methods, on the other hand, are more reliable since they gather details of program structure for comparison of programs. However, structure-based methods are computationally expensive.
Hence, the aim of this research is to develop a new strategy that combines the advantages of both categories.

1.3 OBJECTIVES OF THE WORK

The objectives of this research are:

1) To provide a review of the existing technologies for source code plagiarism detection.
2) To study and analyze the use of different document retrieval approaches in detecting plagiarism in source code.
3) To combine the advantages of attribute-counting-based and structure-based code plagiarism detection techniques and design a new strategy that can effectively identify plagiarized source code files.
4) To derive a fast and efficient method to detect plagiarism in source code files written in C, C++ and Java.
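
To make the attribute-counting category more concrete, the following is a minimal sketch in Python of how such a comparison might look. It is not the strategy developed in this work. The attribute set (a handful of C-family keywords, single-character operators, and totals for identifiers and numbers), the tokenizing pattern, and the choice of cosine similarity as the similarity measure are all assumptions made for illustration, as are the names `KEYWORDS`, `attribute_counts` and `cosine_similarity`.

```python
import math
import re
from collections import Counter

# Hypothetical attribute set: a few C-family keywords. Operators are counted
# per symbol; identifiers and numbers are counted only as totals.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}

# Crude tokenizer (an assumption): words, digit runs, or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def attribute_counts(source: str) -> Counter:
    """Count occurrences of each attribute in one source file."""
    counts = Counter()
    for token in TOKEN_RE.findall(source):
        if token in KEYWORDS:
            counts["kw:" + token] += 1
        elif token[0].isalpha() or token[0] == "_":
            counts["identifier"] += 1  # identifier names are ignored
        elif token.isdigit():
            counts["number"] += 1
        else:
            counts["op:" + token] += 1
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two attribute-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


original = "int sum(int a, int b) { return a + b; }"
renamed = "int add(int x, int y) { return x + y; }"
unrelated = "void loop() { while (1) { } }"

# The renamed copy has the same attribute counts as the original, so its
# score is (up to rounding) the maximum; the unrelated file scores lower.
print(cosine_similarity(attribute_counts(original), attribute_counts(renamed)))
print(cosine_similarity(attribute_counts(original), attribute_counts(unrelated)))
```

Because only counts are compared, the sketch runs in time linear in the file size, but it ignores the order and nesting of statements. Two structurally different programs with similar counts would therefore score high, which is the weakness that structure-based comparison addresses at greater computational cost.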