A Template Report Detection System Based on Semantic Text Similarity

In academic publishing, the integrity of the peer-review process is critical for maintaining scientific quality. However, this process faces a significant threat from sophisticated fraud, including the use of paraphrased template reports and coordinated misconduct by reviewer rings and paper mills.

Sarah Weber, 2025

Type of Thesis: Bachelor Thesis
Client: MDPI
Supervisor: Pustulka, Elzbieta
While many major scientific publishers utilise systems to detect verbatim text reuse, these tools are often architecturally limited, leaving the peer-review process vulnerable to sophisticated threats like paraphrased content and coordinated fraud among different reviewers.
This thesis addresses these limitations by developing a more advanced, scalable detection system capable of understanding semantics. The project focuses on a dual-method prototype that combines a lexical n-gram baseline with a semantic similarity model. The semantic model uses embeddings and a vector database to perform a global, cross-reviewer comparison of all reports in a historical corpus.
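To illustrate what a lexical n-gram baseline can and cannot see, here is a minimal sketch. It assumes word trigrams and Jaccard overlap; the thesis does not state which n-gram size or overlap measure it uses, and the function names and example sentences are hypothetical.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a lower-cased, punctuation-free text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Texts shorter than n tokens yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two n-gram sets (assumed measure, not the thesis's)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical review sentences: a verbatim copy and a paraphrase.
original = "The manuscript is well written and the methods are sound"
verbatim = "The manuscript is well written and the methods are sound"
paraphrase = "This paper is clearly composed, and its methodology is robust"

verbatim_score = ngram_similarity(original, verbatim)
paraphrase_score = ngram_similarity(original, paraphrase)
```

In this toy case the verbatim copy scores 1.0 while the paraphrase shares no trigram with the original and scores 0.0, which is the gap a semantic model is meant to close.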
By operating on the level of semantics instead of just lexical matching, this approach makes it possible to uncover sophisticated misconduct, including paraphrased reports from different reviewers. The prototype's performance was validated through a large-scale test and qualitative interviews with research integrity experts.
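The cross-reviewer comparison can be sketched as a similarity search over report embeddings. The sketch below is illustrative only: it uses hand-made three-dimensional vectors in place of a real embedding model, a brute-force loop in place of a vector database, and a threshold of 0.9. All of these, along with the `Report` type and the choice to skip pairs written by the same reviewer, are assumptions and not details from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    report_id: str
    reviewer_id: str
    embedding: tuple[float, ...]  # toy vector standing in for a model embedding


def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_reviewer_matches(
    reports: list[Report], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Flag report pairs from different reviewers whose similarity meets the threshold."""
    matches = []
    for i, a in enumerate(reports):
        for b in reports[i + 1:]:
            if a.reviewer_id == b.reviewer_id:
                continue  # assumption: only cross-reviewer pairs are of interest here
            score = cosine(a.embedding, b.embedding)
            if score >= threshold:
                matches.append((a.report_id, b.report_id, score))
    # Highest-scoring pairs first.
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Hypothetical corpus: r1, r2, r3 are near-duplicates in meaning; r4 is unrelated.
corpus = [
    Report("r1", "reviewer_A", (0.90, 0.10, 0.00)),
    Report("r2", "reviewer_B", (0.88, 0.12, 0.01)),
    Report("r3", "reviewer_A", (0.90, 0.10, 0.00)),
    Report("r4", "reviewer_C", (0.00, 0.20, 0.95)),
]
flagged = cross_reviewer_matches(corpus)
flagged_pairs = {(a, b) for a, b, _ in flagged}
```

The brute-force loop compares every pair and so grows quadratically with corpus size; a vector database replaces it with an indexed nearest-neighbour query, which is what makes a global comparison over a historical corpus practical.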
Study Program: Business Information Technology (Bachelor)
Keywords: Peer-Review Fraud, Template Detection, Coordinated Misconduct, Semantic Similarity, Vector Database
Confidentiality: Public
Thesis Language: English
Location: Basel