Transform-by-Context - An LLM-based approach to retrieve data mappings and transformation rules for data migrations
Data migration is a complex and resource-intensive task, particularly when transitioning from legacy systems to modern data architectures.
Cueni, Sven, 2025
Type of work: Master Thesis
Client:
Supervising lecturer: Jüngling, Stephan
A central challenge lies in the manual derivation of data mapping and transformation rules, which requires significant domain knowledge and time. This thesis investigates the use of large language models (LLMs) to support the retrieval of transformation rules by exploiting contextual information from source data, target schemas, and metadata. The proposed approach, referred to as transform-by-context, is positioned as an alternative to traditional transform-by-example and transform-by-pattern methods, which struggle with complex, multi-attribute transformations.
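The thesis does not publish its prompt format, so the following is only a minimal sketch of what "contextual information from source data, target schemas, and metadata" might look like once it is serialised for an LLM. The `MigrationContext` class, the `build_prompt` function, the table names and the prompt wording are all hypothetical. The example deliberately uses a multi-attribute target column (`full_name` built from two source columns), which is the kind of case the abstract says example- and pattern-based methods struggle with.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MigrationContext:
    """Hypothetical container for the context a transform-by-context prompt draws on."""
    source_table: str
    source_columns: dict[str, str]  # column name -> declared type
    target_table: str
    target_columns: dict[str, str]
    sample_rows: list[dict] = field(default_factory=list)   # data instances
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. column comments


def build_prompt(ctx: MigrationContext, max_samples: int = 5) -> str:
    """Serialise schemas, a capped number of sample rows and metadata into one prompt."""
    lines = [
        "Derive transformation rules for the following data migration.",
        f"Source table {ctx.source_table}: {json.dumps(ctx.source_columns)}",
        f"Target table {ctx.target_table}: {json.dumps(ctx.target_columns)}",
        "Sample source rows:",
        # default=str lets values such as dates pass through as plain strings
        *(json.dumps(row, default=str) for row in ctx.sample_rows[:max_samples]),
        f"Metadata: {json.dumps(ctx.metadata)}",
        "For each target column, name the source columns used and give an SQL expression.",
    ]
    return "\n".join(lines)


# Example: the target column full_name depends on two source columns.
ctx = MigrationContext(
    source_table="cust_legacy",
    source_columns={"fname": "VARCHAR", "lname": "VARCHAR", "dob": "CHAR(8)"},
    target_table="customer",
    target_columns={"full_name": "VARCHAR", "birth_date": "DATE"},
    sample_rows=[{"fname": "Anna", "lname": "Meier", "dob": "19850412"}],
    metadata={"dob": "birth date encoded as YYYYMMDD"},
)
prompt = build_prompt(ctx)
```

Capping the number of sample rows is one simple way to keep the context small; in practice, which rows and metadata to include is exactly the context-engineering question the thesis raises.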
A structured literature review shows that existing commercial migration tools provide limited automation and that current academic research on LLM-based data migration remains fragmented. Key factors influencing LLM performance include model size, prompt design, context retrieval, and tool integration. Based on these findings, the thesis proposes both a local and a hybrid system architecture for LLM-assisted transformation rule retrieval and defines an experimental setup to evaluate their effectiveness.
Following a design science research methodology, an artefact is developed that integrates conversational LLM interfaces with structured context retrieval using the Model Context Protocol (MCP). The artefact enables LLMs to analyse schemas and data instances and to derive transformation rules iteratively. Evaluation is conducted on one synthetic and one real-world dataset. Quantitative results are compared against a validated ground truth and complemented by qualitative expert feedback.
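The abstract does not name the metrics used for the comparison against the ground truth. One plausible attribute-level measure, assumed here purely for illustration, treats each mapping as a set of (target column, source column) links and computes precision, recall and F1 over those links. The function name `score_mappings` and the example data are hypothetical.

```python
def score_mappings(
    predicted: dict[str, set[str]], ground_truth: dict[str, set[str]]
) -> dict[str, float]:
    """Attribute-level precision, recall and F1 over (target column, source column) pairs."""
    pred_pairs = {(t, s) for t, sources in predicted.items() for s in sources}
    true_pairs = {(t, s) for t, sources in ground_truth.items() for s in sources}
    tp = len(pred_pairs & true_pairs)  # correctly identified source-to-target links
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


ground_truth = {"full_name": {"fname", "lname"}, "birth_date": {"dob"}, "city": {"addr"}}
# The model finds full_name, adds a spurious source to birth_date and misses city.
predicted = {"full_name": {"fname", "lname"}, "birth_date": {"dob", "reg_date"}}
scores = score_mappings(predicted, ground_truth)
```

Here three of the four predicted links are correct and three of the four true links are found, so precision, recall and F1 are all 0.75. This only scores which source attributes were selected; whether the derived transformation expression is correct would have to be checked separately, for example by executing it on sample data.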
The results indicate that LLMs are not yet reliable for fully automated database-level migrations. However, when applied iteratively and with carefully engineered context, they achieve high accuracy in deriving complex transformation rules at table and attribute level, significantly reducing manual effort. The thesis concludes that LLMs are well suited as context-aware assistants in data migration processes and provides practical guidance for their integration into real-world workflows.
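To make "applied iteratively" concrete, the sketch below shows one possible feedback loop: a candidate rule, expressed as an SQL expression, is executed on sample rows in an in-memory SQLite database, and any mismatches with the expected target values are fed back for the next attempt. This is an assumption-laden illustration, not the thesis's MCP-based artefact; `check_rule`, `derive_rule` and `fake_llm` are hypothetical, and `fake_llm` merely stands in for a real LLM call.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical stand-in for an LLM call: takes feedback messages, returns an SQL expression.
Proposer = Callable[[list[str]], str]


def check_rule(expr: str, rows: list[dict], expected: list[object]) -> list[str]:
    """Run an SQL expression over sample rows in in-memory SQLite; return mismatch messages."""
    cols = list(rows[0])
    con = sqlite3.connect(":memory:")
    try:
        con.execute(f"CREATE TABLE src ({', '.join(cols)})")
        con.executemany(
            f"INSERT INTO src VALUES ({', '.join('?' for _ in cols)})",
            [tuple(row[c] for c in cols) for row in rows],
        )
        actual = [r[0] for r in con.execute(f"SELECT {expr} FROM src ORDER BY rowid")]
    except sqlite3.Error as exc:
        return [f"SQL error: {exc}"]
    finally:
        con.close()
    return [
        f"row {i}: expected {e!r}, got {a!r}"
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]


def derive_rule(
    propose: Proposer, rows: list[dict], expected: list[object], max_iterations: int = 3
) -> tuple[Optional[str], int]:
    """Ask for a rule, validate it on the samples, and feed mismatches back until it passes."""
    feedback: list[str] = []
    for attempt in range(1, max_iterations + 1):
        expr = propose(feedback)
        feedback = check_rule(expr, rows, expected)
        if not feedback:
            return expr, attempt
    return None, max_iterations


def fake_llm(feedback: list[str]) -> str:
    # First guess omits the separator; once feedback is non-empty, the retry adds it.
    return "fname || ' ' || lname" if feedback else "fname || lname"


rows = [{"fname": "Anna", "lname": "Meier"}, {"fname": "Luca", "lname": "Rossi"}]
rule, attempts = derive_rule(fake_llm, rows, ["Anna Meier", "Luca Rossi"])
```

Executing generated SQL only against a throwaway in-memory copy of the samples keeps the validation step isolated from any production database. Returning `None` after `max_iterations` leaves the rule to a human, in line with using the LLM as an assistant rather than a fully automated migrator.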
Degree programme: Business Information Systems (Master)
Keywords: Data migrations, data transformation, data mapping, auto-transform, transform-by-context, transform-by-pattern, transform-by-example, large language model, transformation rules, Model Context Protocol, Claude Desktop, Ollama, LM Studio
Confidentiality: public