ChatGPT Outperforms Undergrads In Intro-Level Courses, Falls Short Later

Peter Scarfe, a researcher at the University of Reading’s School of Psychology and Clinical Language Sciences, conducted an experiment testing the vulnerability of their examination system to AI-generated work. Using ChatGPT-4, Scarfe’s team submitted over 30 AI-generated answers across multiple undergraduate psychology modules, finding that 94 percent of these submissions went undetected and nearly 84 percent received higher grades than human counterparts. The findings have been published in the journal PLOS One.

Ars Technica reports: Scarfe’s team submitted AI-generated work in five undergraduate modules, covering classes needed during all three years of study for a bachelor’s degree in psychology. The assignments were either 200-word answers to short questions or more elaborate essays, roughly 1,500 words long. “The markers of the exams didn’t know about the experiment. In a way, participants in the study didn’t know they were participating in the study, but we’ve got necessary permissions to go ahead with that,” Scarfe claims. Shorter submissions were prepared simply by copy-pasting the examination questions into ChatGPT-4 along with a prompt to keep the answer under 160 words. The essays were solicited the same way, but the required word count was increased to 2,000. Setting the limits this way, Scarfe’s team could get ChatGPT-4 to produce content close enough to the required length. “The idea was to submit those answers without any editing at all, apart from the essays, where we applied minimal formatting,” says Scarfe.

Overall, Scarfe and his colleagues slipped 63 AI-generated submissions into the examination system. Even with no editing or efforts to hide the AI usage, 94 percent of those went undetected, and nearly 84 percent got better grades (roughly half a grade better) than a randomly selected group of students who took the same exam. “We did a series of debriefing meetings with people marking those exams and they were quite surprised,” says Scarfe. Part of the reason they were surprised was that most of those AI submissions that were detected did not end up flagged because they were too repetitive or robotic — they got flagged because they were too good.

Out of five modules where Scarfe’s team submitted AI work, there was one where it did not receive better grades than human students: the final module taken by students just before they left the university. “Large language models can emulate human critical thinking, analysis, and integration of knowledge drawn from different sources to a limited extent. In their last year at the university, students are expected to provide deeper insights and use more elaborate analytical skills. The AI isn’t very good at that, which is why students fared better,” Scarfe explained. All those good grades ChatGPT-4 got were in the first- and second-year exams, where the questions were easier. “But the AI is constantly improving, so it’s likely going to score better in those advanced assignments in the future. And since AI is becoming part of our lives and we don’t really have the means to detect AI cheating, at some point we are going to have to integrate it into our education system,” argues Scarfe. He said the role of a modern university is to prepare the students for their professional careers, and the reality is they are going to use various AI tools after graduation. So, they’d be better off knowing how to do it properly.

Read more of this story at Slashdot.