Authors Sue NVIDIA Over NeMo AI’s Copying Of Copyrighted Works

On Friday, March 8, 2024, three authors filed a class action lawsuit against NVIDIA Corporation in the U.S. District Court for the Northern District of California, alleging that NVIDIA copied their copyrighted works to train its NeMo Megatron-GPT large language models.

Abdi Nazemian, Brian Keene, and Stewart O’Nan are authors who own registered copyrights in certain books. At issue in this lawsuit are Nazemian’s Like a Love Story, Keene’s Ghost Walk, and O’Nan’s Last Night at the Lobster. In September 2022, NVIDIA released its NeMo Megatron-GPT large language models, which the complaint describes as software programs “designed to emit convincingly naturalistic text outputs in response to user prompts.” Like other large language models, NeMo Megatron-GPT is “trained by copying an enormous quantity of textual works and then feeding these copies into the model.” During this training, the program “progressively adjusts its output to more closely approximate the protected expression copied from the training dataset. The [program] records the results of this process in a large set of numbers called weights.” NeMo Megatron-GPT 20B stores 20 billion weights.
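To make the idea of “weights” concrete, the short Python sketch below trains a toy model with just two weights. It is an illustration only and is not drawn from the complaint or from NVIDIA’s software: the function name train, the example data, and the learning-rate and step settings are all assumptions chosen for clarity. The model repeatedly compares its output with the training examples and nudges its weights to shrink the difference, which is the same general pattern the complaint describes at a vastly larger scale.

```python
# Toy illustration only: a model with two weights, not a large language model.
# All names and values here are hypothetical and chosen for clarity.

def train(examples, steps=2000, learning_rate=0.05):
    """Fit y = w * x + b by gradient descent and return the two weights."""
    w, b = 0.0, 0.0  # the weights start out uninformative
    n = len(examples)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in examples:
            error = (w * x + b) - y      # gap between model output and training data
            grad_w += 2 * error * x / n  # how the mean squared error changes with w
            grad_b += 2 * error / n      # how the mean squared error changes with b
        # Progressively adjust the weights so the output moves closer to the data.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Training data generated from the rule y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(examples)
print(round(w, 3), round(b, 3))
```

After training, the two weights should sit very close to 2 and 1, the values that reproduce the example data. A model such as NeMo Megatron-GPT 20B applies the same kind of adjustment to 20 billion weights rather than two.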

The complaint alleges that a large quantity of the material in NVIDIA’s training dataset “comes from copyrighted works . . . that were copied by NVIDIA without consent, without credit, and without compensation.” The training dataset used by NVIDIA is named “The Pile,” which was curated by EleutherAI, a research organization. One of the components of The Pile is a collection of books called Books3, comprising 108 gigabytes of fiction and nonfiction books. The Books3 dataset contains approximately 196,640 books, including the plaintiffs’ books. The NeMo Megatron-GPT models are hosted on a website called Hugging Face, which provides information about each model and states that the models were trained on The Pile dataset. The Books3 dataset was available from the Hugging Face website until October 2023. When the dataset was removed from the site, a message stated that the dataset “is defunct and no longer accessible due to reported copyright infringement.”

The complaint states a cause of action for direct copyright infringement against NVIDIA under 17 U.S.C. § 501. The proposed class for the lawsuit includes anyone who owns a copyright in any work that was used as training data for the NeMo Megatron-GPT large language models. The complaint requests relief in the form of (1) a declaratory judgment; (2) an award of damages; (3) attorneys’ fees; (4) destruction of all copies of protected works made or used by NVIDIA; (5) pre- and post-judgment interest; and (6) a Court-approved notice program paid for by NVIDIA to provide notice to class members.

OpenAI, the maker of the ChatGPT large language model, is involved in similar lawsuits related to the copying of nonfiction books used for training ChatGPT.

Additional Reading

Nvidia is sued by authors over AI use of copyrighted works, Reuters (March 11, 2024)

Nazemian et al v. NVIDIA Corporation (Case No. 3:2024cv01454)

Complaint in Nazemian et al v. NVIDIA Corporation

OpenAI, Microsoft Face New Copyright Infringement Lawsuit, Justia Legal News (November 27, 2023)

Sarah Silverman Sues OpenAI and Meta for Copyright Infringement, Justia Legal News (July 12, 2023)

Photo Credit: Konstantin Savusia /