“GAI is a new tool with great potential, but we need to learn how to use it without sacrificing our ethical standards and integrity, nor sacrificing the fundamentals and protections of copyrights.”
To plagiarize means to take someone else’s work and pass it off as your own. Plagiarism can range from blatant copying of someone else’s work to slightly restating it without original input or insight. Essentially, plagiarism is literary theft, and it raises ethical and integrity issues for those who commit it, or are accused of it. Today, evidence of potential plagiarism appears commonplace in many aspects of life, whether in education, journalism or the law, even though it is generally deemed unacceptable.
With the explosion of generative artificial intelligence (GAI) capabilities (such as ChatGPT) and its ability to produce text, video, images and other data, the use of GAI ironically endorses the wholesale copying of content to “inform” its analysis and task completion, while presenting itself as a useful tool to detect plagiarism by others. These days, you can ask a GAI service to draft documents (ChatGPT) or to check for spelling errors and advise you on the clarity of your writing (Grammarly). AI has done wonders for people’s efficiency, but it has the capability to blur the ethical and integrity lines around plagiarism and to create legal issues and concerns under copyright law.
As an example, Turnitin (a leading provider of AI detection software) reported that, after its first year of service, of the 200 million papers it reviewed, 11% had at least 20% AI writing present and 3% had at least 80% AI writing present. The accuracy of Turnitin’s analysis has come under fire because students who submit their papers to Turnitin often also use the grammar-correcting software Grammarly, as recommended by the schools they attend. Turnitin’s subsequent review of such a paper may falsely flag AI writing as present, when in fact the writing was generated by the student. These results have left both students and universities hesitant to rely upon Turnitin.
Trouble with the (Learning) Curve: The Impacts of AI-Generated Content
AI is designed to make life easier for us. The ability to enter a simple prompt and get a detailed response simplifies even the most difficult tasks. However, one must proceed with caution. The content produced by these AI models is creating problems because the results are often inaccurate and sometimes fabricated. The recent horror stories surrounding AI-generated content are concerning, to say the least.
For example, take an early case in which a New York lawyer used ChatGPT to assist him with a federal court filing. What was wrong with this? The cases that ChatGPT generated to support his filing did not exist. According to Forbes, Steven A. Schwartz, the lawyer at issue, had never used ChatGPT before, and when he asked ChatGPT whether the cases were real, the chatbot said they were. Schwartz, along with a few other lawyers at his firm, relied on this information in their case. Per Reuters, Schwartz and company were sanctioned by the court and ordered to pay a $5,000 fine. Putting aside the questionable strategy behind Schwartz’s decision, the bottom line is that AI may create factually inaccurate and nonexistent sources on which people blindly rely every day.
More recently, in May 2024, Google debuted its new GAI-powered search feature, which allowed Google’s AI software to answer a user’s question directly instead of just listing relevant websites. As noted by the Washington Post, the feature was producing strange responses that ranged from “absurd to dangerous,” such as suggesting that users mix glue into pizza ingredients so the cheese does not slide off and telling users that there is no country in Africa that starts with the letter “K.” (If you are still thinking about it, it’s Kenya.) GAI is inherently unpredictable and may produce content that is simply wrong.
People are catching on to the problems with AI-generated content, and the public is growing increasingly wary of AI. According to a survey conducted by the Artificial Intelligence Policy Institute (AIPI), 64% of voters support the government creating an organization tasked with auditing AI, while just 14% oppose it. Additionally, the Center for AI Policy conducted a poll showing that 78% of Americans think AI should be more regulated, with 52% saying AI should be “much more” regulated. People are becoming disenchanted with the current state of AI and its potential impacts. While GAI platforms show promise as a useful tool, there are ample reasons to be skeptical.
Inside the Oracles: How Do AI and Plagiarism Detection Software Work (and How Well?)
The rise in GAI programs has raised the stakes when it comes to verifying both the sources and the author(s) of written work. This has heightened the importance of both plagiarism and AI detection software. Knowing that these types of software work differently is key to understanding the limitations of each. Scribbr, a company offering both AI and plagiarism software, describes the differences between the two types.
How does AI detection software work?
AI detection software measures two specific characteristics of the given text: “perplexity” and “burstiness.” Perplexity measures how predictable the text is to a language model; highly predictable (low-perplexity) text is more likely to be flagged as AI-generated. Burstiness measures the variation in sentence structure and length; human writing tends to mix long and short sentences, while AI-generated text is often more uniform. What’s surprising is that this software does not rely upon a database. It measures only perplexity and burstiness – the writing style and content of the submitted text itself.
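The burstiness idea can be illustrated with a toy sketch. This is a deliberate simplification, not how any commercial detector actually works: real tools use language-model statistics, and the function and sample texts below are hypothetical illustrations only.

```python
import statistics

def burstiness(text: str) -> float:
    """Toy burstiness score: variation in sentence length.

    Human writing tends to mix short and long sentences (high variation);
    AI-generated text is often more uniform (low variation).
    """
    # Crude sentence split on end punctuation.
    sentences = [s.strip()
                 for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    # Coefficient of variation: sentence-length spread relative to the mean.
    return statistics.stdev(lengths) / statistics.mean(lengths)

varied = ("It rained. The trial dragged on for three weeks while the jurors, "
          "restless and unconvinced, waited. Then it ended.")
uniform = "The court ruled today. The parties filed their briefs. The judge issued an order."
```

On these samples, the varied passage scores higher than the uniform one, which is the intuition a detector exploits; actual detectors combine many such signals and, as the studies below show, still misfire.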
How does plagiarism detection software work?
Plagiarism detection software differs from its AI counterpart. This software does rely upon a database of previously published work to determine whether the submitted text has been plagiarized. The software combs through its data to determine whether the submitted text is too similar to any prior published work it has on file. It does not, however, measure specific stylistic characteristics of the submitted text.
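The database-matching approach can be sketched with a toy n-gram overlap check. This is an illustrative assumption about the general technique, not any vendor’s actual algorithm; the function names and sample strings are hypothetical.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Return the set of word n-grams (default: trigrams) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(submitted: str, source: str, n: int = 3) -> float:
    """Fraction of the submitted text's trigrams that also appear in the source."""
    sub = ngrams(submitted, n)
    if not sub:
        return 0.0
    return len(sub & ngrams(source, n)) / len(sub)

source = "plagiarism is literary theft and an ethical and integrity issue"
copied = "essentially plagiarism is literary theft and an ethical problem"
original = "taking credit for work you did not produce raises serious concerns"
```

The lightly edited `copied` sentence shares most of its trigrams with the source and would be flagged, while the `original` sentence shares none. Note the weakness the researchers below identify: thorough paraphrasing destroys the shared n-grams, so this style of matching misses restated text entirely.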
How well does each software work?
A study investigated the effectiveness of AI detection software. After testing 16 different programs, the results showed that only three (Copyleaks, Originality.ai, and Turnitin) had “very high” accuracy when detecting GPT-4 papers. Altogether, the 16 programs used in the study correctly identified only 61% of the GPT-4 papers. The varying results help explain why Turnitin, one of the “very high” accuracy programs in the study, would publicly state that its software is not always accurate.
As for plagiarism detection software, a study published in Applied Medical Informatics found that none of the programs used were able to detect paraphrased text. Consequently, the researchers stated that human supervision should be required even when using plagiarism detection software.
The accuracy of both plagiarism and AI detection programs matters. A missed plagiarized idea or a false positive can have broad implications for anyone in academia, journalism or law. Just ask Marley Stevens, a college student at the University of North Georgia. She recently learned that her paper had been flagged for using AI. The problem? She had never used it. According to MSN, Stevens had used Grammarly before submitting her paper, which caused the school’s AI detection software to flag her work. She received a zero on the paper and consequently found herself on academic probation. Inside Higher Ed, a publication for the higher education community, has warned professors to take the results of AI detection with a grain of salt because “the boundaries are much blurrier now.” With the rise of the “Plagiarism War” and AI, the accuracy of detection software matters more than ever.
Intellectual Property (and Legal) Implications of AI and Plagiarism
U.S. copyright law protects all works of authorship fixed in a tangible medium of expression from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device. 17 U.S.C. § 102. The rise of AI is pushing this statute to its limits. Much of the data on which AI programs were trained is copyrighted. Many AI developers have had to address the potential copyright issues created by the use of these GAI programs by offering copyright infringement commitments and protections. However, not all GAI developers offer such protections. And while such protections may address copyright concerns in the use of AI, they do not guard against the ethical and integrity issues associated with plagiarism – or just plain copying.
Even the practice of law has seen issues associated with copying. Lawyers typically rely on precedents in their practice, whether in litigation or transactional work. Leveraging prior briefs, especially those that contain successful arguments, is encouraged, if not required, in the practice of law. However, lawyers can copy too much of a brief. A recently filed lawsuit between two law firms exemplifies this issue. The plaintiff alleges the defendant copied the entirety of its Rule 12 motion from the plaintiff’s brief. In support of its case, the plaintiff cites Newegg Inc. v. Ezra Sutton, P.A., in which the court granted partial summary judgment, holding that the defendant had failed to establish a fair use defense because the plaintiff had registered its brief with the United States Copyright Office before filing it with the court. Newegg Inc. v. Ezra Sutton, P.A., 2016 U.S. Dist. LEXIS 124981 (C.D. Cal. Sept. 13, 2016). Whether this claim will stand is still uncertain, but it illustrates why IP law is tethered to plagiarism and AI.
Not only are lawyers using GAI in their day-to-day work, but judges are too. Judge Kevin Newsom of the Eleventh Circuit made headlines with his concurrence in a recent decision. Newsom, who agreed with the majority opinion in full, wrote a 32-page concurrence on how he used ChatGPT to help inform his decision in the case. The question before the court was whether an in-ground trampoline installation met the ordinary meaning of “landscaping.” Snell v. United Specialty Ins. Co., No. 22-12581 (11th Cir. May 28, 2024). Newsom asked ChatGPT the pressing question, and out came a response detailing why installing an in-ground trampoline can be considered a part of landscaping. Id. at 9 (Newsom, J., concurring). Newsom contends that large language models (LLMs) are “quite literally ‘taught’ using data that aims to reflect and capture how individuals use language in their everyday lives.” Id. at 11 (Newsom, J., concurring). Newsom believes that LLMs are important instruments for informing inquiries into ordinary meaning, and he urges the legal community to follow suit and embrace this technology in practice.
Many judges disfavor the use of GAI to such an extent that they restrict lawyers’ ability to use it; Judge Newsom appears to be in the minority here. His use of GAI in this case is certainly controversial, but quite possibly innovative. Nevertheless, this is just the latest story demonstrating that this technology is making its way into every facet of life and work – and the law is no exception. Although the implications of Judge Newsom’s actions are yet to be seen, the issues involving AI and plagiarism still loom large over this generational technology.
The Path Forward: Where Will AI, IP and Plagiarism Go from Here?
As AI and other future technologies evolve (see, e.g., quantum computing), how do we manage AI, plagiarism and copyright issues in education, journalism, research and the practice of law? GAI is a new tool with great potential, but we need to learn how to use it without sacrificing our ethical standards and integrity, nor sacrificing the fundamentals and protections of copyrights. As we move forward, we must find a balance between the AI and plagiarism detection programs – which may or may not be right – and the further education of our students, educators, journalists and even lawyers on what is right and what is wrong.
Note: We would like to acknowledge and thank Drew Engel and Logan Woodward for their assistance in researching and drafting this Article.
Image Source: Deposit Photos
Author: IuriiMotov
Image ID: 394981892

Join the Discussion
2 comments so far.
Anon
June 18, 2024 06:48 pm

Given the extremely broad categorization of things “AI,” context — across several vectors — is critical.
For a quick example, Generative AI is a rather specific subset of AI, and one should not expect Generative AI to be factually dispositive; it is the user’s error in not understanding the tool if factually dispositive results are expected (contrast with other non-Generative expert systems).
Pro Say
June 18, 2024 03:03 pm

“a useful tool to detect plagiarism by others.”
Heal first thyself, AI.