A recent study published in Scientific Reports examines ChatGPT’s ability to answer university-level assessment questions across a range of subjects, including computer science, political studies, engineering, and psychology.
Researchers Talal Rahwan and Yasir Zaki worked with faculty members at New York University Abu Dhabi (NYUAD) who taught a variety of courses. Each instructor contributed ten assessment questions from their course, along with three student-written submissions for each question.
ChatGPT was then prompted to answer the same ten questions, and its responses were evaluated alongside the genuine student answers by a panel of three graders. ChatGPT’s responses received an average grade equal to or higher than the students’ in nine of the thirty-two courses.
ChatGPT’s advantage was most pronounced in the “Introduction to Public Policy” course, where its average grade of 9.56 far exceeded the students’ average of 4.39. Only in mathematics and economics courses did students consistently outperform the ChatGPT-generated responses.
Beyond measuring ChatGPT’s performance, the study surveyed attitudes toward using AI tools like ChatGPT for academic assignments. The survey covered 1,601 participants from Brazil, India, Japan, the US, and the UK, including students and educators from each country. Of the students surveyed, 74% said they would be willing to use ChatGPT in their academic work, suggesting strong interest in AI-powered assistance.
Educators, by contrast, expressed clear reservations about ChatGPT’s role in academia: around 70% said that using ChatGPT for assignments should be treated as plagiarism. This gap between student acceptance and educator skepticism points to potential conflict over the integration of AI tools in educational settings.
The study also tested tools designed to detect AI-generated text. Both GPTZero and an AI text classifier struggled to reliably distinguish ChatGPT-generated responses from human-written answers: GPTZero misclassified the AI-generated responses as human-written 32% of the time, and the AI text classifier fared worse, doing so 49% of the time.