A Template Report Detection System Based on Semantic Text Similarity
In academic publishing, the integrity of the peer-review process is critical for maintaining scientific quality. However, this process faces a significant threat from sophisticated fraud, including the use of paraphrased template reports and coordinated misconduct by reviewer rings and paper mills.
Sarah Weber, 2025
Type of thesis: Bachelor Thesis
Client: MDPI
Supervisor: Pustulka, Elzbieta
While many major scientific publishers utilise systems to detect verbatim text reuse, these tools are often architecturally limited, leaving the peer-review process vulnerable to sophisticated threats like paraphrased content and coordinated fraud among different reviewers.
This thesis addresses these limitations by developing a more advanced, scalable detection system capable of understanding semantics. The project centres on a dual-method prototype that combines a lexical n-gram baseline with a semantic similarity model. The semantic model uses embeddings and a vector database to perform a global, cross-reviewer comparison of all reports in a historical corpus.
By operating on the level of semantics instead of just lexical matching, this approach makes it possible to uncover sophisticated misconduct, including paraphrased reports from different reviewers. The prototype's performance was validated through a large-scale test and qualitative interviews with research integrity experts.
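The dual-method idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not name the embedding model, the vector database, or any thresholds, so everything below is an assumption made for illustration. The function names (`word_ngrams`, `jaccard`, `cosine`, `cross_reviewer_matches`), the word-trigram setting, the 0.9 threshold, and the choice to skip pairs written by the same reviewer are all hypothetical. Embedding vectors are taken as precomputed input, and a brute-force pairwise scan stands in for the nearest-neighbour search that a vector database would provide.

```python
from itertools import combinations
from math import sqrt


def word_ngrams(text, n=3):
    """Return the set of word n-grams in a text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b, n=3):
    """Lexical baseline: Jaccard overlap of the two texts' word n-gram sets."""
    ga, gb = word_ngrams(a, n), word_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def cross_reviewer_matches(reports, threshold=0.9):
    """Find report pairs from different reviewers whose embeddings are similar.

    `reports` is a list of dicts with the (hypothetical) keys
    "id", "reviewer" and "vector". A real system would replace this
    quadratic scan with a vector-database query.
    """
    matches = []
    for r1, r2 in combinations(reports, 2):
        if r1["reviewer"] == r2["reviewer"]:
            continue  # assumption: only cross-reviewer pairs are of interest
        score = cosine(r1["vector"], r2["vector"])
        if score >= threshold:
            matches.append((r1["id"], r2["id"], score))
    # Highest-scoring pairs first
    return sorted(matches, key=lambda m: m[2], reverse=True)
```

In this sketch, a paraphrased template would typically score low on `jaccard`, because few word trigrams survive rewording, while its embedding could still lie close to the original's. That gap is what the semantic comparison is meant to close.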
Degree programme: Business Information Technology (Bachelor)
Keywords: Peer-Review Fraud, Template Detection, Coordinated Misconduct, Semantic Similarity, Vector Database
Confidentiality: public