The Lawsuits Against the Company Reshaping Tech

Guy DeMarco

2 years ago

Here’s a non-ChatGPT-generated quote from OpenAI CEO Sam Altman: “AI will probably most likely lead to the end of the world, but in the meantime, there’ll be a lot of great companies.”

AI may be a revolutionary advancement that reshapes technology, with the potential to maximize profit and efficiency. However, because there is no settled legal framework for AI and little shared understanding of how it works, questions at the core of these innovations remain open. For now, those questions will likely be resolved through first-of-its-kind litigation.

According to Docket Alarm analysis, OpenAI was involved in only twenty-two federal legal proceedings between ChatGPT’s public release in November 2022 and April 10 of this year. However, these statistics may understate the company’s legal risk, because some of these lawsuits could seriously impact OpenAI.

OpenAI has been no stranger to the press since the public release of ChatGPT, but the company is now receiving unwanted legal attention from one of its founding board members. Elon Musk sued OpenAI in early March, claiming that the company, originally a non-profit, breached its founding agreement. “To this day, OpenAI, Inc.’s website continues to profess that its charter is to ensure that AGI [artificial general intelligence] benefits all of humanity. In reality, however, OpenAI, Inc. has been transformed into a closed-source de facto subsidiary of the largest technology company in the world: Microsoft,” the lawsuit alleges. In response, OpenAI revealed that Musk had attempted to buy the company in 2018 and left after his offer was rejected.

ChatGPT has only been a public phenomenon over the last few years. Docket Alarm analysis of OpenAI’s litigation history shows that artificial intelligence’s legal story is just beginning.

OpenAI is most frequently involved in copyright, contract, and fraud cases.

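Understanding how artificial intelligence tools like ChatGPT operate is critical to understanding how they could infringe copyright law. Artificial intelligence products like ChatGPT are built on large language models (LLMs). These models are trained on enormous collections of text, much of it gathered from the internet, and learn statistical patterns in that text that let them generate new text in response to a prompt. Which works were used for training, and whether their owners consented, is therefore central to the copyright disputes.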

A class action lawsuit brought by anonymous plaintiffs, quoting from a Medium post, says that “Despite OpenAI’s ‘absolute secrecy’ surrounding its data collections and practices, we know at the highest levels that the Company used (at least) five (5) distinct datasets to train ChatGPT: (1) Common Crawl; (2) WebText2, text of webpages from all outbound Reddit links from posts with 3+ upvotes; (3) Books1; (4) Books2; and (5) Wikipedia.”

The complaint explains that “The Common Crawl dataset is owned by a non-profit of the same name, which has been indexing and storing as much of the World Wide Web as it can access, filing away as many as 3 billion webpages every month, for over a decade.” The plaintiffs add that the non-profit shares the data for free, but only for research and educational purposes.

Addressing the intellectual property issues raised by the use of these datasets, including the Reddit links, Reddit cofounder and CEO Steve Huffman commented, “The Reddit corpus of data is really valuable. But we don’t need to give all of that value to some of the largest companies in the world for free.”

According to other lawsuits, brought by institutions such as The New York Times and The Authors Guild and by authors such as Sarah Silverman and Michael Chabon, OpenAI’s business model relies on copyright infringement. The Times’ lawsuit states that “OpenAI quickly became a multi-billion-dollar for-profit business built in large part on the unlicensed exploitation of copyrighted works belonging to The Times and others.”

A series of contract cases against both OpenAI and Microsoft-owned GitHub allege that Copilot, a coding tool GitHub has offered since 2021, is built on unlicensed scraping of code that the plaintiffs host on GitHub, in violation of the open-source licenses governing that code as well as the Digital Millennium Copyright Act. Many open-source licenses allow anyone to use the code freely, provided that the original author is credited. Copilot is alleged to reproduce code that can be traced back to its open-source origin without giving that credit.

On occasion, large language models fabricate answers to questions, and ChatGPT is not immune to this phenomenon. OpenAI refers to these fabrications as “hallucinations,” and they can have legal repercussions. In one instance, in response to a journalist’s questions, ChatGPT falsely accused an unrelated third party of defrauding the Second Amendment Foundation and embezzling its funds, resulting in a defamation lawsuit.

ChatGPT has also gained notoriety in the legal community after lawyers who used it for legal research unknowingly submitted fictitious case citations, hallucinated by the model, to courts. In some cases, this has resulted in fines.

As new technologies threaten to change the world, they reveal gaps in our legal systems: important questions about how to weigh the benefits of those technologies against the harm to the people and businesses they disrupt. OpenAI and other AI companies will almost certainly be challenged by these questions in the coming years.