“[T]his legal interest could be vindicated by the Northern California district court if it were to order OpenAI to pay a ‘data dividend’ to consumers during the period of misappropriated use.”
On June 28, a group of 16 individuals filed a class action complaint in the Northern District of California against generative artificial intelligence (GAI) developer OpenAI, alleging several violations of federal and state law on privacy, unfair business practices and computer fraud. The complaint's discussion of property interests in consumer data underscores the intellectual property issues that have arisen since the advent of generative AI platforms like ChatGPT, whose developers scrape personal data and IP-protected material to train their GAI systems.
OpenAI’s Shift to Profit Model Based on Unauthorized Use of Consumer Personal Data
Although OpenAI was founded in 2015 as a nonprofit research facility meant to advance AI research for society as a whole, the individual plaintiffs allege that a $1 billion investment from Microsoft in 2019, followed by a $10 billion follow-up investment earlier this year, has shifted OpenAI's organizational structure in concerning ways. The plaintiffs further allege that much of this new profit model is supported by secret web-scraping activities that pull data from copyrighted content, including writings and images, as well as users' social media posts.
In discussing the property and privacy rights at issue, the class action plaintiffs draw parallels between ChatGPT's web-scraping activities and those of facial recognition firm Clearview AI, which entered a settlement last May terminating a lawsuit that alleged its web-scraping violated Illinois' biometric privacy law. That lawsuit also pushed Clearview AI to register as a data broker in California and Vermont, at least; if OpenAI were to register as a data broker, consumers in those states could opt out of its data collection practices.
OpenAI's data scraping activities arguably violate the property right that U.S. consumers hold in their personal data, a premise that U.S. courts have acknowledged in rulings like the Northern District of California's 2021 decision in Calhoun v. Google. The plaintiffs allege that the personal information of most Internet users can be valued at anywhere from $15 to $40 per person, and that the value of an individual's online identity can increase to $1,200 when sold on the dark web.
Court-Ordered Data Dividend Could Vindicate Violation of Consumers’ Data Rights
The Ninth Circuit, the appellate court covering the Northern District of California, has also found that California law recognizes a legal interest in profits unjustly earned from the misappropriation of personal data in In re: Facebook, Inc. Internet Tracking Litigation (2020). The plaintiffs argue that this legal interest could be vindicated by the Northern California district court if it were to order OpenAI to pay a "data dividend" to consumers covering the period of misappropriated use. At a minimum, the plaintiffs contend, OpenAI should pay a fair market value determinable by expert testimony submitted at trial.
The plaintiffs in the class action suit also allege that OpenAI's web-scraping activities violate important privacy interests protected by law. OpenAI's collection of personal information was carried out in secret, preventing at least California citizens from exercising their statutory rights to request deletion of their personal data. The plaintiffs further allege that OpenAI's use of personal data aggregation to attract multi-billion dollar investments from Microsoft qualifies as the sale of personal data that, according to legislative history leading up to 2020's enactment of the California Privacy Rights Act, Californians have the right to control.
Increasing FTC Regulation Will Likely Tighten Rules Around Large Language Models
While the impact of U.S. property and privacy law on AI platforms using large language models has yet to be fully understood, the plaintiffs point to the Federal Trade Commission's (FTC) recent settlement with Amazon over that company's illegal use of children's voice data to train algorithms for its Alexa digital assistant platform. While that case involved alleged federal children's privacy law violations not at issue in the OpenAI class action suit, FTC statements to media following the settlement indicate that the federal agency is looking at large language models being implemented across the tech industry.
OpenAI's use of personal data exceeds the bounds of reasonable consent in several ways, the plaintiffs argue. Not only was the plaintiffs' personal data collected by websites incorporating ChatGPT plug-ins or the ChatGPT API, but the plaintiffs themselves were never asked by OpenAI to consent to ChatGPT's personal data collection when registering to use the platform. OpenAI's direct feeding of personal information into ChatGPT's large language models further fails to meet the FTC's cybersecurity guidelines for businesses, which call for encryption of personal data and its disposal when no longer needed.