

Stochastic parrot? New study suggests ChatGPT plagiarizes beyond just "copy" and "paste"

If you're a student, you may want to think again before using ChatGPT.

Mihai Andrei
March 17, 2023 @ 1:20 pm


In the few months since ChatGPT was introduced publicly, it’s taken the world by storm. It can produce all sorts of text-based content and has even passed exams that are challenging for humans. Naturally, students have started taking notice. You can use ChatGPT to help you with essays and all sorts of homework and assignments, especially since the content it outputs isn’t plagiarized. Or is it?

According to a new study, language models like ChatGPT can plagiarize on multiple levels. Even when they don’t copy text verbatim from other sources, they can rephrase or paraphrase ideas without changing the meaning at all, which is still not acceptable.

“Plagiarism comes in different flavors,” said Dongwon Lee, professor of information sciences and technology at Penn State and co-author of the new study. “We wanted to see if language models not only copy and paste but resort to more sophisticated forms of plagiarism without realizing it.” Lo and behold, they really do.

Image credits: Nick Morrison.

Being a university student nowadays can be pretty challenging. Plenty of things have changed since the pandemic lockdowns: universities are facing staff shortages and mental health problems, and there’s much more online work to do, which can be challenging in multiple ways. In addition to technical challenges, like needing a laptop or computer with a stable enough internet connection, students have had to develop a complementary set of skills, particularly in terms of computer literacy. More and more, you need to know how to manage the online course management system, navigate through lectures and recordings, and edit and submit assignments and essays strictly digitally. A few years ago, you may have gotten away without using things such as Google Drive or a PDF editor, but nowadays that just doesn’t fly.

Understandably, students jumped at the opportunity of having an AI assistant do the work for them. At first glance, it seems safe: despite being trained on existing data, the AI produces new text, which supposedly can’t be flagged as plagiarism. Or so it would seem.

Lee and colleagues focused on identifying three forms of plagiarism:

  • verbatim, or direct copying;
  • paraphrase, or rewording and restructuring content without citing the original source;
  • idea, or using the main idea of a text without proper attribution.

All these are, in essence, plagiarism.

Because the researchers couldn’t construct a pipeline for ChatGPT, they worked with GPT-2, a previous iteration of the language model. They used 210,000 generated texts to test for plagiarism “in pre-trained language models and fine-tuned language models, or models trained further to focus on specific topic areas.” Overall, the team found that the AI engages in all three forms of plagiarism, and the larger the dataset the model was trained on, the more often the plagiarism occurred. This suggests that larger models would be even more predisposed to it.

“People pursue large language models because the larger the model gets, generation abilities increase,” said lead author Jooyoung Lee, doctoral student in the College of Information Sciences and Technology at Penn State. “At the same time, they are jeopardizing the originality and creativity of the content within the training corpus. This is an important finding.”

It’s not the first time something like this has been suggested. A paper published two years ago, which has already been cited more than 1,300 times, claims that this type of AI is a “stochastic parrot”: it simply parrots existing information without truly producing anything new.

It’s still early days for this type of technology and much more research is required to understand problems such as this one, but companies seem eager to release it into the wild before this kind of issue can be understood. According to the study authors, the findings highlight the need for more research into the ethical conundrums that text generators pose.

“Even though the output may be appealing, and language models may be fun to use and seem productive for certain tasks, it doesn’t mean they are practical,” said Thai Le, assistant professor of computer and information science at the University of Mississippi who began working on the project as a doctoral candidate at Penn State. “In practice, we need to take care of the ethical and copyright issues that text generators pose.”

In the meantime, AI text generators are set to trigger an arms race. Plagiarism detection companies are all over this, since being able to detect ChatGPT shenanigans (or shenanigans from any generative AI) is valuable for ensuring academic integrity. But whether they will actually succeed remains to be seen. For now, current tools don’t seem to do a good enough job.

Meanwhile, university students (and not only them) will continue to use ChatGPT for their assignments if they can get away with it. A new dawn of plagiarism may be upon us, and it’s not so easy to tackle.

The researchers will present their findings at the 2023 ACM Web Conference, which takes place April 30-May 4 in Austin, Texas.
