Overview PD Methods

December 2019

Authors:

Tomáš Foltýnek

Mendel University in Brno

Norman Meuschke

Georg-August-Universität Göttingen

Bela Gipp

Georg-August-Universität Göttingen

Content uploaded by Norman Meuschke

Fig2_Overview.png (image, 193.43 KB)

Academic Plagiarism Detection: A Systematic Literature Review

Article

October 2019

Tomáš Foltýnek · Norman Meuschke · Bela Gipp

Data

Full-text available

Layer Model

December 2019

Article

Full-text available

Academic Plagiarism Detection: A Systematic Literature Review

October 2019 · ACM Computing Surveys

This article summarizes the research on computational methods to detect academic plagiarism by systematically reviewing 239 research papers published between 2013 and 2018. To structure the presentation of the research contributions, we propose novel technically oriented typologies for plagiarism prevention and detection efforts, the forms of academic plagiarism, and computational plagiarism detection methods. We show that academic plagiarism detection is a highly active research field. Over the period we review, the field has seen major advances regarding the automated detection of strongly obfuscated and thus hard-to-identify forms of academic plagiarism. These improvements mainly originate from better semantic text analysis methods, the investigation of non-textual content features, and the application of machine learning. We identify a research gap in the lack of methodologically thorough performance evaluations of plagiarism detection systems. Concluding from our analysis, we see the integration of heterogeneous analysis methods for textual and non-textual content features using machine learning as the most promising area for future research contributions to improve the detection of academic plagiarism further.

Chapter

Full-text available

Identifying Machine-Paraphrased Plagiarism

January 2022

Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity. To enable the detection of machine-paraphrased text, we evaluate the effectiveness of five pre-trained word embedding models combined with machine learning classifiers and state-of-the-art neural language models. We analyze preprints of research papers, graduation theses, and Wikipedia articles, which we paraphrased using different configurations of the tools SpinBot and SpinnerChief. The best performing technique, Longformer, achieved an average F1 score of 80.99% (F1=99.68% for SpinBot and F1=71.64% for SpinnerChief cases), while human evaluators achieved F1=78.4% for SpinBot and F1=65.6% for SpinnerChief cases. We show that the automated classification alleviates shortcomings of widely-used text-matching systems, such as Turnitin and PlagScan. To facilitate future research, all data (https://doi.org/10.5281/zenodo.3608000), code (https://github.com/jpelhaW/ParaphraseDetection), and two web applications (https://huggingface.co/jpelhaw/longformer-base-plagiarism-detection) showcasing our contributions are openly available.

Preprint

Full-text available

Identifying Machine-Paraphrased Plagiarism

March 2021

Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity. To enable the detection of machine-paraphrased text, we evaluate the effectiveness of five pre-trained word embedding models combined with machine learning classifiers and state-of-the-art neural language models. We analyze preprints of research papers, graduation theses, and Wikipedia articles, which we paraphrased using different configurations of the tools SpinBot and SpinnerChief. The best performing technique, Longformer, achieved an average F1 score of 80.99% (F1=99.68% for SpinBot and F1=71.64% for SpinnerChief cases), while human evaluators achieved F1=78.4% for SpinBot and F1=65.6% for SpinnerChief cases. We show that the automated classification alleviates shortcomings of widely-used text-matching systems, such as Turnitin and PlagScan. To facilitate future research, all data, code, and two web applications showcasing our contributions are openly available.
