Persona- and Knowledge-Driven Synthetic Training Data Generation

Large Language Models (LLMs) are increasingly used in dialogue systems but often fall short in addressing the diverse communication needs of different user personas.

Grob, Annick, 2025

This thesis proposes a pipeline for generating persona-specific synthetic training data with the aim of tailoring LLM-based responses to distinct communicative styles, without compromising factual correctness.
Following the Design Science Research methodology, an artifact was developed that integrates supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to adapt a language model to four defined personas: citizens, journalists, politicians, and subject matter experts.
The resulting system was evaluated through empirical testing, including interviews and blind model comparisons. While no definitive improvement in output quality could be established, the results indicate that the approach may support better persona consistency and stylistic adaptation in specific contexts. The study contributes a replicable pipeline and lays the groundwork for future research on preference-aligned and audience-sensitive language model tuning.
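The DPO stage described above trains on preference pairs in which a persona-conditioned prompt is paired with a preferred and a rejected response. As a rough illustration of what such a training record might look like, here is a minimal sketch; the field names follow the common `prompt`/`chosen`/`rejected` convention for DPO datasets, and the helper function, prompt template, and example answers are illustrative assumptions, not the thesis's actual implementation.

```python
# Illustrative sketch of building persona-conditioned preference pairs
# for Direct Preference Optimization (DPO). The four personas come from
# the thesis; everything else here is a hypothetical simplification.

PERSONAS = ["citizen", "journalist", "politician", "subject matter expert"]

def make_preference_pair(persona, question, preferred, rejected):
    """Return one DPO training record.

    The target persona is embedded in the prompt so the model can learn
    audience-specific style while both answers stay factually equivalent.
    """
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona}")
    prompt = f"Answer the following for a {persona}: {question}"
    return {"prompt": prompt, "chosen": preferred, "rejected": rejected}

pair = make_preference_pair(
    "journalist",
    "What does the new policy change?",
    "In short: the policy raises the reporting threshold.",  # concise, quotable
    "The policy, enacted pursuant to the relevant article, effects a modification.",  # stilted
)
```

A corpus of such records, one set per persona, could then be fed to a standard DPO trainer after the SFT stage; the preference signal rewards the stylistically appropriate answer rather than a factually different one.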
Type of Thesis
Master Thesis
Authors
Grob, Annick
Supervisor
Martin, Andreas, Witschel, Hans Friedrich
Publication Year
2025
Thesis Language
English
Confidentiality
Public
Study Program
Business Information Systems (Master)
Location
Olten