Programmers have been embracing AI chatbots like ChatGPT for coding help, but a new study throws cold water on that trust. The research found that ChatGPT gave wrong answers to programming questions a whopping 52% of the time, a troubling figure given how much programmers depend on accuracy and precision in their work.
The study, titled “Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions,” highlights a well-known problem with generative AI: chatbots like ChatGPT can hallucinate, confidently fabricating entirely incorrect answers.
Researchers from Purdue University analyzed how ChatGPT responded to 517 questions from Stack Overflow, a popular programmer forum. Over half (52%) of the chatbot’s answers contained misinformation. Furthermore, 77% were more verbose than the human responses, and 78% were inconsistent with the human answers to the same questions.
The analysis also showed that ChatGPT leans toward formal, analytical language and rarely expresses negative sentiment. Its mistakes are more often conceptual than factual: the study suggests the bot frequently fails to grasp the underlying context of a question, which leads to inaccurate answers.
Interestingly, despite the high error rate, 39% of programmers in a small survey run by the researchers preferred ChatGPT’s answers. Even more concerning, 39% failed to detect the AI-generated errors. The study suggests that ChatGPT’s articulate, comprehensive, and polite style can lull users into a false sense of security, leading them to overlook misinformation. Participants caught errors only when they were obvious; complex or unverifiable answers often slipped through unnoticed, with potentially serious consequences.
This research highlights the limitations of current language models and the need for clearer communication about the reliability of AI-generated content. While some chatbots do mention that their answers may contain errors, the researchers argue that such generic caveats are not enough. They recommend disclaimers that convey the level of uncertainty or the likelihood of error in each answer.
The study emphasizes the value of critical thinking when assessing AI-generated information, particularly in domains that demand a high degree of precision. So while AI assistants can be useful, it’s important to verify their responses before entrusting them with important jobs.