Multimodal Human-Robot Interaction Combining Speech, Facial Expressions, and Eye Gaze
Applewhite, Timothy, 2021
Type of work: Master Thesis
Supervising lecturers: Zhong, Vivienne Jia; Dornberger, Rolf
Human-Robot Interaction (HRI) is being applied in an ever-growing number of areas as the underlying technology continues to advance. While early dialogue systems only recognized spoken or typed input, the field has shifted towards multimodal dialogue systems, which capture multiple input channels: in addition to the verbal input, various nonverbal channels such as the human's facial expression. The artifact developed during this Master thesis is a multimodal dialogue system involving Pepper, a humanoid robot developed by SoftBank Robotics (n.d.-a), with input channels capturing the human's speech, facial expression, and eye gaze. Using a modular architecture based on network communication, the collected inputs are combined and sent to Rasa (2021), an open-source conversational agent running on an intermediate server. Upon receiving the response selected by Rasa, Pepper performs a body language animation and displays an emoji matching the social context on the attached tablet, while simultaneously speaking the response back to the interaction partner. The results of the evaluation phase suggest that while speech and eye gaze recognition achieve high levels of accuracy, the facial expression recognition component cannot provide the same reliability. Apart from the facial expression recognition concept proposed by SoftBank Robotics (n.d.-g), two further approaches were defined by the author of this Master thesis and require further evaluation. The overall response times of the multimodal system remain low, with Rasa requiring the majority of the time to select the appropriate response.
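The network hand-off described in the abstract can be pictured with a minimal Python sketch: the combined speech, facial expression, and eye gaze readings are posted as one JSON payload to Rasa's standard REST webhook. The endpoint path and the sender/message/metadata fields follow Rasa's documented REST channel; the sender ID, channel values, and fused payload structure are illustrative assumptions, not the thesis's exact implementation.

```python
import requests

# Hypothetical fused reading from the three input channels; the key names
# inside "metadata" are illustrative, not the thesis's actual payload schema.
fused_input = {
    "sender": "pepper-session-1",          # conversation ID for Rasa
    "message": "Hello, how are you?",      # recognized speech
    "metadata": {
        "facial_expression": "happy",      # from facial expression recognition
        "eye_gaze": "engaged",             # from eye gaze detection
    },
}

# Rasa's standard REST channel endpoint (server started with the REST
# channel enabled); host and port assume a local intermediate server.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"

response = requests.post(RASA_URL, json=fused_input, timeout=5)
for msg in response.json():
    # Each returned message may carry text for Pepper to speak and custom
    # payloads (e.g., which animation or emoji to trigger on the tablet).
    print(msg.get("text"))
```

Keeping sensor fusion on the robot side and the dialogue policy on an intermediate Rasa server matches the modular, network-based architecture the abstract describes.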
Degree programme: Business Information Systems (Master)
Confidentiality: public