Meta Faces Lawsuit Over Alleged Use of Pirated Content to Train AI Models

Meta Platforms is facing a major copyright infringement lawsuit following allegations that CEO Mark Zuckerberg approved the use of pirated content to train the company’s AI models, particularly the Llama large language model.

The lawsuit, filed by authors including Sarah Silverman and Ta-Nehisi Coates, claims that Meta used a dataset known as Library Genesis (LibGen), a collection of pirated books and academic articles, to develop its AI technologies.

According to the lawsuit, internal communications show that Meta’s AI team was aware of the legal risks associated with using LibGen. Despite those concerns, the decision was escalated to Zuckerberg, who reportedly approved the use of the pirated dataset. One Meta employee referred to LibGen as a “dataset we know to be pirated,” a remark that suggests the company was aware of the potential infringement.

Further claims suggest that Meta took steps to conceal its use of the pirated dataset by stripping out copyright information from the content. One Meta engineer allegedly created a script to remove attribution details from e-books and scientific articles, an act that could be seen as an effort to cover up the infringement.

Additionally, Meta is accused of torrenting LibGen, that is, downloading it over a peer-to-peer file-sharing network in which downloaders typically also upload the content to others. Internal communications show that one engineer was uneasy about the practice, saying, “torrenting from a [Meta-owned] corporate laptop doesn’t feel right.” However, the company’s head of generative AI, Ahmad Al-Dahle, reportedly downplayed the legal risks, and the team continued torrenting.

Meta has defended its actions by citing the fair use doctrine, which permits limited use of copyrighted material without permission, particularly for transformative purposes. The court has yet to rule on the case, and past rulings on similar claims against AI developers have been mixed.

Judge Vince Chhabria, who is overseeing the case, criticized Meta’s attempt to redact parts of the lawsuit, suggesting that the company was more focused on avoiding negative publicity than on protecting sensitive business information. This has drawn further scrutiny of how Meta is handling the situation.

As the case unfolds, it highlights the ongoing legal and ethical challenges surrounding AI development and the use of copyrighted materials. The outcome could have far-reaching implications for the tech industry’s approach to training AI models while respecting intellectual property rights.
