The Potential of Using Large Language Models for Multimodal Extraction and the Harmonization of Metadata in Audiovisual Broadcasting Archives
This thesis investigates the potential of large language models to enhance multimodal automated metadata extraction and the harmonization of metadata within audiovisual broadcasting archives.
Lauper, Nicolas, 2025
Type of Thesis: Master Thesis
Client
Supervisor: Martin, Andreas
Using a design science research framework, a pipeline was developed that integrates outputs from speech, image, and text analysis modules into schema-compliant JSON records.
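The integration step can be illustrated with a minimal sketch. It is not the thesis implementation: the schema fields, the module output keys (`speakers`, `labels`, `keywords`), and the function names are all hypothetical, chosen only to show how outputs from separate analysis modules might be merged and checked against a fixed schema before being written as JSON.

```python
import json

# Hypothetical target schema (assumption): field name -> required Python type.
SCHEMA = {"title": str, "summary": str, "keywords": list, "speakers": list}


def merge_module_outputs(speech, image, text):
    """Combine per-module outputs into one record (illustrative field mapping)."""
    # Pool keywords from the text module and labels from the image module,
    # normalising case and whitespace so duplicates collapse.
    pooled = text.get("keywords", []) + image.get("labels", [])
    keywords = sorted({k.strip().lower() for k in pooled})
    return {
        "title": text.get("title", ""),
        "summary": text.get("summary", ""),
        "keywords": keywords,
        "speakers": speech.get("speakers", []),
    }


def validate(record, schema=SCHEMA):
    """Return a list of schema violations; an empty list means compliant."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}")
    return errors


# Made-up module outputs, for illustration only.
record = merge_module_outputs(
    speech={"speakers": ["Anchor"]},
    image={"labels": ["Studio", "Map"]},
    text={"title": "Evening news", "summary": "Weather report.",
          "keywords": ["weather", "map"]},
)
assert validate(record) == []
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Validating before serialization keeps malformed records out of the archive; a production pipeline would more likely rely on a full JSON Schema validator than on this type check.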
The evaluation employed a dual methodology combining quantitative similarity metrics with a qualitative expert assessment. Participants compared the automatically generated metadata with the original human-authored archival records, rating it on Likert-scale criteria and providing open-ended commentary.
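The abstract does not name the similarity metrics that were used. As one possible example of a quantitative comparison between generated and human-authored metadata, the sketch below computes Jaccard overlap between two keyword lists. The choice of metric, the function name, and the sample keywords are assumptions made for illustration.

```python
def jaccard(generated, reference):
    """Jaccard overlap of two keyword lists after case/whitespace normalisation."""
    a = {k.strip().lower() for k in generated}
    b = {k.strip().lower() for k in reference}
    if not a and not b:
        return 1.0  # Convention (assumption): two empty lists count as identical.
    return len(a & b) / len(a | b)


# Made-up keyword lists: 2 shared terms out of 4 distinct terms overall.
score = jaccard(["Weather", "Map", "Studio"], ["weather", "forecast", "map"])
print(score)  # 0.5
```

A set-based overlap like this captures only exact keyword matches; it says nothing about factual accuracy or relevance, which is where the qualitative expert assessment comes in.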
The findings demonstrate that large language models can substantially improve the accessibility and structural coherence of metadata. However, consistency, factual accuracy, and keyword relevance remain uneven, limiting the reusability and reliability of the output. Overall assessments were moderate: participants acknowledged practical usefulness and potential time savings while emphasizing that professional integration still requires improvement. Consequently, the study concludes that current LLM-based systems are most suitable as assistive tools within hybrid workflows, where automated outputs serve as a preliminary layer for professional curation rather than as fully autonomous metadata solutions.
Study program: Business Information Systems (Master)
Keywords: large language models; multimodal metadata extraction; audiovisual archives; metadata harmonization; automated metadata extraction; archival workflows
Confidentiality: public