“[T]his legal interest could be vindicated by the Northern California district court if it were to order OpenAI to pay a ‘data dividend’ to consumers during the period of misappropriated use.”
On June 28, a group of 16 individuals filed a class action complaint in the Northern District of California against generative artificial intelligence (GAI) developer OpenAI, alleging several violations of federal and state law on privacy, unfair business practices and computer fraud. The complaint's discussion of property interests in consumer data underscores the intellectual property issues that have arisen since the advent of generative AI platforms like ChatGPT, whose developers scrape personal data and IP-protected material to train their GAI systems.
OpenAI’s Shift to Profit Model Based on Unauthorized Use of Consumer Personal Data
Although OpenAI was founded in 2015 as a nonprofit research facility meant to advance AI research for society as a whole, the individual plaintiffs allege that a $1 billion investment from Microsoft in 2019, followed by a $10 billion follow-up investment earlier this year, has shifted OpenAI's organizational structure in concerning ways. The plaintiffs further allege that much of this new profit model is supported by secret web-scraping activities that pull data from copyrighted content, including writings and images, as well as users' social media posts.
In discussing the property and privacy rights at issue, the class action plaintiffs draw parallels between ChatGPT's web-scraping activities and those of facial recognition firm Clearview AI, which entered a settlement last May terminating a lawsuit that alleged its web-scraping violated Illinois' biometric privacy law. That lawsuit also pushed Clearview AI to register as a data broker in California and Vermont, at least; if OpenAI were to register as a data broker, consumers in those states could opt out of its data collection practices.
OpenAI's data scraping activities arguably violate the property right that U.S. consumers hold in their personal data, a premise that U.S. courts have acknowledged in rulings like the Northern District of California's 2021 decision in Calhoun v. Google. The plaintiffs allege that the personal information of most Internet users can be valued at anywhere from $15 to $40 per person, and that the value of an individual's online identity can increase to $1,200 when sold on the dark web.
Court-Ordered Data Dividend Could Vindicate Violation of Consumers’ Data Rights
The Ninth Circuit, the appellate court covering the Northern District of California, has also found that California law recognizes a legal interest in profits unjustly earned from the misappropriation of personal data in In re: Facebook, Inc. Internet Tracking Litigation (2020). The plaintiffs argue that this legal interest could be vindicated by the Northern California district court if it were to order OpenAI to pay a "data dividend" to consumers covering the period of misappropriated use. At a minimum, the plaintiffs contend, OpenAI should pay a fair market value determinable by expert testimony submitted at trial.
The plaintiffs in the class action suit also allege that OpenAI's web-scraping activities violate important privacy interests protected by law. OpenAI's collection of personal information was carried out in secret, preventing at least California citizens from exercising their statutory rights to request deletion of their personal data. The plaintiffs further allege that OpenAI's use of personal data aggregation to attract multi-billion dollar investments from Microsoft qualifies as the sale of personal data that, according to legislative history leading up to 2020's enactment of the California Privacy Rights Act, Californians have the right to control.
Increasing FTC Regulation Will Likely Tighten Rules Around Large Language Models
While the impact of U.S. property and privacy law on AI platforms using large language models has yet to be fully understood, the plaintiffs point to the Federal Trade Commission's (FTC) recent settlement with Amazon over that company's illegal use of children's voice data to train algorithms for its Alexa digital assistant platform. While that case involved alleged federal children's privacy law violations not at issue in the OpenAI class action suit, FTC statements to media following the settlement indicate that the federal agency is looking at large language models being implemented across the tech industry.
OpenAI's use of personal data exceeds the bounds of reasonable consent in several ways, the plaintiffs argue. Not only was the plaintiffs' personal data collected by websites incorporating ChatGPT plug-ins or the ChatGPT API, but the plaintiffs themselves were never asked by OpenAI to consent to ChatGPT's personal data collection when registering to use the platform. OpenAI's direct feeding of personal information into ChatGPT's large language models further fails to meet the FTC's cybersecurity guidelines for businesses, which call for encryption of personal data and its disposal when no longer needed.