Meta Platforms is facing a major copyright infringement lawsuit following allegations that CEO Mark Zuckerberg approved the use of pirated content to train the company’s AI models, particularly the Llama large language model.
The lawsuit, filed by authors including Sarah Silverman and Ta-Nehisi Coates, claims that Meta used a dataset known as Library Genesis (LibGen), a collection of pirated books and academic articles, to develop its AI technologies.
Internal communications reveal that Meta’s AI team was aware of the legal risks associated with using LibGen. The decision was escalated to Zuckerberg, who reportedly approved the use of the pirated dataset despite those concerns. One Meta employee referred to LibGen as a “dataset we know to be pirated,” suggesting that the company was aware of the potential infringement.
Further claims suggest that Meta took steps to conceal its use of the pirated dataset by stripping out copyright information from the content. One Meta engineer allegedly created a script to remove attribution details from e-books and scientific articles, an act that could be seen as an effort to cover up the infringement.
Additionally, Meta is accused of torrenting LibGen, that is, downloading it over BitTorrent, a peer-to-peer protocol in which downloaders typically also upload pieces of the files to other users. Internal communications show that one engineer was uneasy about the practice, saying, “torrenting from a [Meta-owned] corporate laptop doesn’t feel right.” However, the company’s head of generative AI, Ahmad Al-Dahle, reportedly downplayed the legal risks, and the team continued with the torrenting activities.
Meta has defended its actions by citing the fair use doctrine, which permits limited use of copyrighted material without the owner’s permission and under which courts weigh, among other factors, whether the use is transformative. However, the court has yet to rule on the case, and past rulings on similar claims against AI developers have shown mixed outcomes.
Judge Vince Chhabria, who is overseeing the case, criticized Meta’s attempt to redact parts of the lawsuit, suggesting that the company was more focused on avoiding negative publicity than protecting sensitive business information. This has drawn further scrutiny to how Meta is handling the situation.
As the case unfolds, it highlights the ongoing legal and ethical challenges surrounding AI development and the use of copyrighted materials. The outcome could have far-reaching implications for the tech industry’s approach to training AI models while respecting intellectual property rights.