Persona- and Knowledge-Driven Synthetic Training Data Generation

Large Language Models (LLMs) are increasingly used in dialogue systems but often fall short in addressing the diverse communication needs of different user personas.

Grob, Annick, 2025

This thesis proposes a pipeline for generating persona-specific synthetic training data with the aim of tailoring LLM-based responses to distinct communicative styles, without compromising factual correctness.
Following the Design Science Research methodology, an artifact was developed that integrates supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to adapt a language model to four defined personas: citizens, journalists, politicians, and subject matter experts.
The resulting system was evaluated through empirical testing, including interviews and blind model comparisons. While no definitive improvement in output quality could be established, the results indicate that the approach may support better persona consistency and stylistic adaptation in specific contexts. The study contributes a replicable pipeline and lays the groundwork for future research on preference-aligned and audience-sensitive language model tuning.
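The DPO stage described above trains on preference pairs in which a persona-conditioned prompt is paired with a preferred and a rejected response. As a rough illustration of what such a training record might look like, here is a minimal sketch; the field names follow the common `prompt`/`chosen`/`rejected` convention for DPO datasets, and the helper function, prompt template, and example answers are illustrative assumptions, not the thesis's actual implementation.

```python
# Illustrative sketch of building persona-conditioned preference pairs
# for Direct Preference Optimization (DPO). The four personas come from
# the thesis; everything else here is a hypothetical simplification.

PERSONAS = ["citizen", "journalist", "politician", "subject matter expert"]

def make_preference_pair(persona, question, preferred, rejected):
    """Return one DPO training record.

    The target persona is embedded in the prompt so the model can learn
    audience-specific style while both answers stay factually equivalent.
    """
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona}")
    prompt = f"Answer the following for a {persona}: {question}"
    return {"prompt": prompt, "chosen": preferred, "rejected": rejected}

pair = make_preference_pair(
    "journalist",
    "What does the new policy change?",
    "In short: the policy raises the reporting threshold.",  # concise, quotable
    "The policy, enacted pursuant to the relevant article, effects a modification.",  # stilted
)
```

A corpus of such records, one set per persona, could then be fed to a standard DPO trainer after the SFT stage; the preference signal rewards the stylistically appropriate answer rather than a factually different one.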
Type of Thesis
Master Thesis
Authors
Grob, Annick
Supervisor
Martin, Andreas, Witschel, Hans Friedrich
Publication Year
2025
Thesis Language
English
Confidentiality
Public
Study Program
Business Information Systems (Master)
Location
Olten