Probabilistic Language Models for the Creation of Enterprise Knowledge Graphs

This master's thesis explores the use of probabilistic large language models (LLMs) for constructing enterprise ontologies and knowledge graphs. In doing so, it addresses the challenges of conventional approaches, such as the high manual effort and the need for specialist expertise. The study adopts an inductive design science research strategy and applies a qualitative research methodology. Data are collected through interviews with knowledge engineering experts and through document analysis.

Mathys, Adrian, 2023

Type of work: Master's thesis

Supervisors: Laurenzi, Emanuele; Martin, Andreas

A literature review lays the foundation, covering recent advances in artificial intelligence and the characteristics of knowledge graphs and probabilistic language models. Drawing on insights from the knowledge engineers and the literature, the thesis develops a novel process for ontology and knowledge graph development. The process comprises six steps: defining informal competency questions, designing the ontology schema, integrating data, validating data, creating instances, and maintaining and expanding the knowledge graph. LLMs are applied to support each of these steps. For each step, scenarios are first proposed, then implemented and finally evaluated with an anonymised data set from a real company.
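One way to picture the six-step process is as an ordered pipeline in which each LLM output is reviewed before it feeds the next step. The sketch below is hypothetical and is not the thesis's implementation. The `llm` and `review` callables are placeholders for a model call (for example to GPT-4 or Claude 2) and for a knowledge engineer's manual check. The prompt template is invented, and so is the rule that the pipeline halts on a rejected output. Passing only the previous step's output forward is a simplification.

```python
from dataclasses import dataclass
from typing import Callable

# The six steps as listed in the abstract, in order.
STEPS = (
    "informal competency questions",
    "ontology schema design",
    "data integration",
    "data validation",
    "instance creation",
    "maintenance and expansion",
)


@dataclass
class StepResult:
    step: str
    prompt: str
    output: str
    approved: bool


def run_pipeline(
    context: str,
    llm: Callable[[str], str],
    review: Callable[[str, str], bool],
) -> list[StepResult]:
    """Run the steps in order, feeding each approved output into the next prompt.

    `llm` stands in for a language-model call and `review` for a manual
    check by a knowledge engineer; both are hypothetical placeholders.
    """
    results: list[StepResult] = []
    previous = context
    for step in STEPS:
        # Hypothetical prompt template; real prompts would be step-specific.
        prompt = f"Task: {step}\nInput:\n{previous}"
        output = llm(prompt)
        approved = review(step, output)
        results.append(StepResult(step, prompt, output, approved))
        if not approved:
            break  # halt so a human can correct the output before continuing
        previous = output
    return results


# Usage with stub functions in place of a real model and reviewer.
def echo_llm(prompt: str) -> str:
    return prompt.splitlines()[0]  # just returns the "Task: ..." line


def reject_validation(step: str, output: str) -> bool:
    return step != "data validation"  # simulate a reviewer rejecting one step


trace = run_pipeline("anonymised company data", echo_llm, reject_validation)
print([(r.step, r.approved) for r in trace])
```

With these stubs, the run stops after the fourth step because the reviewer rejects the data-validation output.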

The findings demonstrate that LLMs can significantly streamline the creation of ontologies and knowledge graphs, which makes them a new tool for knowledge engineers and business users. The two language models GPT-4 and Claude 2 achieve remarkable results in generating competency questions and ontology schemas from both unstructured and structured input. They also perform well at generating RDF, SHACL, SPARQL and SWRL code. The validity of the generated code snippets is confirmed by implementing a knowledge graph on the AllegroGraph platform.

Although using LLMs to create enterprise knowledge graphs shows promise, the approach has some shortcomings. LLM performance varies with the quality and amount of input provided, and the generated output requires manual validation. LLMs may reduce reliance on specialised expertise, but users still need to be familiar with the underlying ontology schema. Concerns also remain about data protection when interacting with LLMs. Nevertheless, if these obstacles can be overcome, LLMs could democratise access to knowledge graphs.
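LLM-generated output still has to be checked against the ontology schema. As a lightweight illustration, the sketch below flags generated triples that use undeclared classes or properties or that violate a property's domain or range. It is a plain-Python stand-in for the kind of constraint a SHACL shape would express, not the validation approach used in the thesis (which relies on an AllegroGraph implementation). The `ex:` schema and triples are invented for illustration. Domain and range are treated as closed-world constraints, as in SHACL, rather than as RDFS inference rules.

```python
# Hypothetical ontology schema (not from the thesis): declared classes and,
# for each property, its expected (domain, range).
CLASSES = {"ex:Employee", "ex:Department"}
PROPERTIES = {
    "ex:worksFor": ("ex:Employee", "ex:Department"),
    "ex:hasName": ("ex:Employee", "xsd:string"),
}


def check_triples(triples: list[tuple[str, str, str]]) -> list[str]:
    """Return human-readable issues for triples that do not fit the schema."""
    # Collect every rdf:type assertion first, since a node may have several types.
    types: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    issues = []
    for s, p, o in triples:
        if p == "rdf:type":
            if o not in CLASSES:
                issues.append(f"{s}: unknown class {o}")
            continue
        if p not in PROPERTIES:
            # e.g. a property the model invented that the schema never declared
            issues.append(f"{s}: unknown property {p}")
            continue
        domain, range_ = PROPERTIES[p]
        if domain not in types.get(s, set()):
            issues.append(f"{s}: {p} expects subject of type {domain}")
        # Only class-valued ranges are checked; literal ranges such as
        # xsd:string are not checked in this sketch.
        if range_ in CLASSES and range_ not in types.get(o, set()):
            issues.append(f"{s}: {p} expects object of type {range_}")
    return issues


# Usage: triples as they might come back from a model, one of them off-schema.
generated = [
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:sales", "rdf:type", "ex:Department"),
    ("ex:alice", "ex:worksFor", "ex:sales"),
    ("ex:alice", "ex:reportsTo", "ex:bob"),  # property not in the schema
]
print(check_triples(generated))  # ['ex:alice: unknown property ex:reportsTo']
```

A check like this only narrows down what a person has to review. It does not replace the manual validation the findings call for.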

Degree programme: Business Information Systems (Master)

Confidentiality: public

Language: English

Location of degree programme: Olten