Persona- and Knowledge-Driven Synthetic Training Data Generation
Large Language Models (LLMs) are increasingly used in dialogue systems but often fall short in addressing the diverse communication needs of different user personas.
Grob, Annick, 2025
Type of Thesis: Master Thesis
Supervisors: Martin, Andreas; Witschel, Hans Friedrich
This thesis proposes a pipeline for generating persona-specific synthetic training data with the aim of tailoring LLM-based responses to distinct communicative styles, without compromising factual correctness.
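The abstract does not detail how the synthetic data is produced. Purely as an illustration, the following minimal Python sketch shows one way a persona-conditioned generation step could look: a teacher LLM is prompted with a persona style instruction plus source facts, and the answer is stored in chat format for later fine-tuning. The persona descriptions, teacher model, prompts, and file names are hypothetical assumptions, not the thesis's actual setup.

```python
# Hypothetical sketch of persona-conditioned synthetic data generation.
# Personas follow the thesis; all prompts, the teacher model, and file
# names are illustrative placeholders.
import json
from openai import OpenAI

# Style instruction used to steer the teacher model per target audience.
PERSONAS = {
    "citizen": "Answer in plain, accessible language without jargon.",
    "journalist": "Answer concisely with quotable, fact-focused statements.",
    "politician": "Answer diplomatically, emphasising policy implications.",
    "expert": "Answer precisely, using correct domain terminology.",
}

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_example(question: str, facts: str, persona: str) -> dict:
    """Ask a teacher LLM for a persona-styled answer grounded in the given facts."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder teacher model
        messages=[
            {"role": "system",
             "content": f"{PERSONAS[persona]} Use only the provided facts."},
            {"role": "user",
             "content": f"Facts:\n{facts}\n\nQuestion: {question}"},
        ],
    )
    answer = completion.choices[0].message.content
    # Store in chat format so the record can be used directly for fine-tuning.
    return {"messages": [
        {"role": "system", "content": f"Audience: {persona}."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}


if __name__ == "__main__":
    with open("synthetic_train.jsonl", "w", encoding="utf-8") as out:
        for persona in PERSONAS:
            record = generate_example(
                "What does the new regulation change?",
                "The regulation takes effect in 2026 and lowers the reporting "
                "threshold to 50 employees.",
                persona,
            )
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
```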
Following the Design Science Research methodology, an artifact was developed that integrates supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to adapt a language model to four defined personas: citizens, journalists, politicians, and subject matter experts.
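The concrete training setup is likewise not specified here. The sketch below illustrates a two-stage SFT-then-DPO adaptation using the Hugging Face TRL library (assuming a recent TRL version); the base model, dataset files, and hyperparameters are placeholders rather than the thesis's configuration.

```python
# Illustrative two-stage tuning pipeline: supervised fine-tuning (SFT)
# followed by Direct Preference Optimization (DPO). Model, data files,
# and hyperparameters are assumptions, not the thesis's actual values.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

BASE_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Stage 1: supervised fine-tuning on persona-specific synthetic data
# (JSONL records with a chat-format "messages" field).
sft_data = load_dataset("json", data_files="synthetic_train.jsonl", split="train")
sft_trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="sft-personas", num_train_epochs=1),
    train_dataset=sft_data,
    processing_class=tokenizer,
)
sft_trainer.train()
sft_trainer.save_model("sft-personas")

# Stage 2: Direct Preference Optimization on preference pairs, e.g.
# persona-consistent answers as "chosen" and off-style answers as
# "rejected" (JSONL records with "prompt", "chosen", "rejected" fields).
dpo_data = load_dataset("json", data_files="preferences.jsonl", split="train")
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,  # continue from the SFT checkpoint
    args=DPOConfig(output_dir="dpo-personas", beta=0.1, num_train_epochs=1),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
dpo_trainer.save_model("dpo-personas")
```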
The resulting system was evaluated through empirical testing, including interviews and blind model comparisons. While no definitive improvement in output quality could be established, the results indicate that the approach may support better persona consistency and stylistic adaptation in specific contexts. The study contributes a replicable pipeline and lays the groundwork for future research on preference-aligned and audience-sensitive language model tuning.
Study Programme: Business Information Systems (Master)
Confidentiality: public