RAG-Assisted Knowledge Graph Construction for Course Recommendation System
The study was conducted under a Design Science Research (DSR) methodology to develop a scalable foundation for AI-driven, skill-based course recommendation. For each education program, the taught skills were automatically derived by configuring a Retrieval-Augmented Generation (RAG) pipeline with GPT-4 and grounding it in program descriptions. The generated skills were modeled as nodes and, together with the corresponding program entities, were loaded into a graph database (Neo4j), thereby instantiating a sustainable, domain-specific Knowledge Graph (KG).
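The graph-ingestion step described above can be illustrated with a minimal sketch. The node labels `Program` and `Skill`, the relationship type `TEACHES`, the `name` property, and the helper `build_rows` are assumptions made for illustration; the thesis record does not state the actual schema.

```python
# Hypothetical schema: (:Program {name})-[:TEACHES]->(:Skill {name}).
# MERGE makes the load idempotent, so re-running it does not create duplicates.
MERGE_QUERY = """
UNWIND $rows AS row
MERGE (p:Program {name: row.program})
MERGE (s:Skill {name: row.skill})
MERGE (p)-[:TEACHES]->(s)
"""


def build_rows(program_skills):
    """Flatten {program: [skill, ...]} into parameter rows for MERGE_QUERY.

    Skill names are stripped of surrounding whitespace; empty names and
    exact duplicates within a program are dropped.
    """
    rows = []
    for program, skills in program_skills.items():
        unique_skills = sorted({s.strip() for s in skills if s.strip()})
        for skill in unique_skills:
            rows.append({"program": program, "skill": skill})
    return rows


# Example input standing in for RAG-generated skills (illustrative data only).
rows = build_rows({"MSc BIS": ["Data modeling", " Data modeling ", "", "SQL"]})

# With the official neo4j Python driver, the rows could then be loaded with
# something like: session.run(MERGE_QUERY, rows=rows)
```

Passing the rows as a query parameter rather than formatting them into the Cypher string keeps the query constant and avoids injection issues with machine-generated skill names.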
Yaman, Ibrahim, 2025
Type of Thesis: Master Thesis
Client
Supervisor: Pustulka, Elzbieta; Fornari, Fabrizio
Data quality assurance and maintenance were addressed through structural/semantic placement checks and similarity-based deduplication. Cross-corpus similarity analyses were performed over (i) Scrambl’s proprietary database, (ii) the database produced in this study, and (iii) the database previously developed for Scrambl by Koller, 2025. Threshold-based vector similarity identified duplicate and near-duplicate skill entries; the ensuing consolidation reduced redundancy and improved consistency across corpora. The resulting dataset constitutes an enterprise-ready catalog of education programs and their machine-generated skill sets, whose alignment with source descriptions was verified.
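As a rough illustration of threshold-based vector similarity deduplication, the sketch below greedily keeps the first occurrence of each skill and maps later near-duplicates onto it. The cosine measure, the threshold of 0.9, the toy three-dimensional vectors, and the function names are assumptions; the record does not specify the embedding model, the similarity metric, or the threshold actually used.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(skills, threshold=0.9):
    """Greedy deduplication over (name, vector) pairs.

    Returns the list of kept skill names and a dict mapping each dropped
    name to the kept name it was merged into.
    """
    kept = []    # (name, vector) pairs retained as representatives
    merged = {}  # dropped name -> representative name
    for name, vec in skills:
        match = next(
            (k_name for k_name, k_vec in kept if cosine(vec, k_vec) >= threshold),
            None,
        )
        if match is None:
            kept.append((name, vec))
        else:
            merged[name] = match
    return [name for name, _ in kept], merged


# Toy vectors standing in for real skill embeddings (illustrative data only).
skills = [
    ("Python programming", [1.0, 0.0, 0.1]),
    ("Programming in Python", [0.98, 0.05, 0.12]),
    ("Project management", [0.0, 1.0, 0.0]),
]
kept_names, merged_map = deduplicate(skills)
```

The greedy pass compares each entry against every kept representative, which is quadratic in the worst case; for large corpora a vector index would be the more practical choice.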
Overall, the research delivers a reproducible pipeline for KG construction and upkeep, from automated skill extraction and graph ingestion to systematic deduplication, supporting explainable, skill-gap-aware recommendations at scale. The artifact and its evaluation demonstrate that dynamically generated, quality-assured skill sets can be sustained within a graph-based architecture suitable for real-world workforce development contexts.
Study program: Business Information Systems (Master)
Keywords
Confidentiality: public