Code Quality of AI Chatbots
ChatGPT reached a milestone in the AI chatbot field by attracting 100 million users within two months of its launch, owing to its human-like responses and its "ask me anything" ability to answer a wide range of questions. Google Bard was introduced in March 2023 and is designed to excel at coding tasks, reasoning, and advanced mathematics. Prominent features of both models include code generation in multiple programming languages, bug fixing, and solving complex software tasks.
Chamas, Joun, 2023
Type of thesis: Master Thesis
Client:
Supervising lecturer: Scherb, Christopher
This study aims to evaluate the code quality produced by ChatGPT and Google Bard, identify potential risks, and propose specific recommendations.
To achieve this, two LLM evaluation datasets, comprising 246 prompts and 576 methods for code generation, were used to challenge the chatbots with simple and intricate Python programming tasks. Additionally, software quality metrics covering functional correctness, maintainability, and performance efficiency were defined for a comprehensive assessment.
The results indicate low performance in both models, particularly in intricate and interconnected code scenarios, where ChatGPT and Bard achieved pass rates of 30% and 17%, respectively. When tasked with generating more straightforward, standalone Python methods, ChatGPT achieved a pass rate of 68.90% and Bard 54.88%. Both models showed a high compilation rate, meaning the generated code was syntactically correct and executable. However, the generated code frequently failed to deliver the intended functionality.
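The gap between compilation rate and pass rate can be illustrated with a minimal Python sketch. It is not the evaluation harness used in the thesis: the function `evaluate`, the `SAMPLES` list, and its four toy tasks are hypothetical and only show how code can compile cleanly yet still fail its functional test.

```python
def evaluate(samples):
    """Return (compile_rate, pass_rate) in percent for (code, test) pairs.

    Hypothetical helper for illustration. A real harness would run
    generated code in a sandbox with timeouts instead of calling exec()
    in the current process.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    compiled = passed = 0
    for code, test in samples:
        try:
            # Syntax check only: this succeeds even if the logic is wrong.
            program = compile(code, "<generated>", "exec")
        except SyntaxError:
            continue
        compiled += 1
        namespace = {}
        try:
            exec(program, namespace)  # define the generated function
            exec(test, namespace)     # run the functional test against it
        except Exception:
            continue
        passed += 1
    total = len(samples)
    return 100 * compiled / total, 100 * passed / total


# Made-up samples standing in for chatbot output (assumption, not thesis data).
SAMPLES = [
    ("def add(a, b):\n    return a + b\n", "assert add(2, 3) == 5"),
    ("def double(x):\n    return 2 * x\n", "assert double(4) == 8"),
    # Compiles, but the logic is wrong, so the test fails.
    ("def is_even(n):\n    return n % 2 == 1\n", "assert is_even(4)"),
    # Missing colon: does not compile at all.
    ("def square(x)\n    return x * x\n", "assert square(3) == 9"),
]

compile_rate, pass_rate = evaluate(SAMPLES)
print(f"compile rate: {compile_rate:.2f}%, pass rate: {pass_rate:.2f}%")
```

In this toy run three of the four samples compile, but only two pass their test, which mirrors the pattern reported above: syntactically valid code that does not deliver the intended functionality.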
Degree programme: Business Information Systems (Master)
Keywords: AI Chatbots, ChatGPT, Google Bard, LLM Evaluation Datasets, Software Quality Metrics, Automated Software Testing
Confidentiality: public