Evaluating the LLM-Enhanced Dialogue System

With the latest advancements in AI, voice-based systems have improved, but they still face a variety of issues that lead to conversations lacking human-like quality. To address these issues, the Institute of Information Systems developed a new, hybrid variant of its LLM-enhanced dialogue system.

Mateusz Oskedra, 2025

Type of thesis: Bachelor Thesis
Client: FHNW Institute of Information Systems
Supervisor: Zhong, Vivienne Jia
The hybrid variant of the system retains the previous variant's fixed time threshold before responding but combines it with semantic analysis. Instead of relying solely on a fixed waiting time, the hybrid variant evaluates whether the user's sentence is finished and responds faster or slower accordingly, which is meant to reduce both interruptions and slow responses. Other functionality includes, among other things, dynamic interruption handling. The goal of this thesis was to evaluate whether the new variant outperforms the older one and to develop recommendations for its improvement.
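The difference between a fixed silence threshold and a hybrid decision can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the thesis: the class `TurnTakingConfig`, the functions `looks_complete`, `should_respond_vad_only` and `should_respond_hybrid`, the keyword heuristic standing in for semantic analysis, and all threshold values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnTakingConfig:
    # All values are illustrative assumptions, not figures from the thesis.
    base_silence_s: float = 1.0   # fixed threshold used without semantic analysis
    short_silence_s: float = 0.4  # wait this long if the utterance looks complete
    long_silence_s: float = 2.0   # wait this long if the utterance looks unfinished


# Words that suggest the speaker is not done yet (hypothetical toy list).
TRAILING_WORDS = {"and", "but", "or", "because", "so", "um", "uh", "the", "a", "to"}


def looks_complete(transcript: str) -> bool:
    """Toy stand-in for semantic analysis of the partial transcript."""
    text = transcript.strip().lower()
    if not text:
        return False
    if text[-1] in ".?!":
        return True
    return text.split()[-1] not in TRAILING_WORDS


def should_respond_vad_only(silence_s: float,
                            cfg: Optional[TurnTakingConfig] = None) -> bool:
    """Fixed-threshold rule: respond once the silence is long enough."""
    cfg = cfg or TurnTakingConfig()
    return silence_s >= cfg.base_silence_s


def should_respond_hybrid(transcript: str, silence_s: float,
                          cfg: Optional[TurnTakingConfig] = None) -> bool:
    """Hybrid rule: pick a shorter or longer threshold from the transcript."""
    cfg = cfg or TurnTakingConfig()
    threshold = cfg.short_silence_s if looks_complete(transcript) else cfg.long_silence_s
    return silence_s >= threshold
```

Under these assumed values, a finished question followed by 0.5 s of silence triggers a response in the hybrid rule but not in the fixed-threshold rule, while an utterance ending in "to" is given up to 2 s before the system takes the turn.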
First, technologies with similar intended functionality were researched. The hybrid variant itself was then put through initial testing, which examined conversation flow, turn-taking, and input comprehension and recognition, and the two variants were compared. Finally, a user study was conducted in which users tried both variants and their performance was measured and compared.
The user study showed that the hybrid variant improved conversational flow and turn-taking and reduced system interruptions, but it suffered from higher latencies than its counterpart and struggled with backchannel detection. The VAD-only (voice activity detection) variant, in contrast, showed lower response latencies and better detection of user interruptions; however, it interrupted users more often. Both variants could handle the Swiss German dialect, though not perfectly. User feedback showed a preference for the VAD-only variant in most categories, with 70.59% favoring it overall, while the hybrid variant performed better in avoiding interruptions and in perceived knowledgeability. The client is advised to investigate the hybrid variant's latency issues, consider different LLMs for better response quality, and add visual elements to the system when the Pepper robot is not used during testing. For future user testing, it is recommended to recruit more diverse test groups and to conduct post-test interviews with users to collect additional feedback on the system.
Degree programme: Business Information Technology (Bachelor)
Keywords: Dialogue System, Social Robots, Large Language Models, Swiss German
Confidentiality: confidential
Client: FHNW Institute of Information Systems, Basel
Language of thesis: English
Programme location: Basel