Authors sue Anthropic for training AI using pirated books

Image: The Verge

A group of authors has sued Anthropic, accusing it of training its models on pirated books, as reported by Reuters. The proposed class action lawsuit was filed in a California court on Monday and alleges Anthropic “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books.”
In the lawsuit, the authors say that Anthropic used a sprawling, open-source dataset known as “The Pile” to train its family of Claude AI chatbots. Within this dataset is something called Books3, a massive library of pirated ebooks that includes works from Stephen King, Michael Pollan, and thousands of other authors. Earlier this month, Anthropic confirmed to Vox that it used The Pile to train Claude.
“It is apparent that Anthropic downloaded and reproduced copies of The Pile and Books3, knowing that these datasets were comprised of a trove of copyrighted content sourced from pirate websites like Bibiliotik,” the lawsuit reads. The authors want the court to certify their class action lawsuit as well as require Anthropic to pay proposed damages and prevent the company from using copyrighted material in the future. Anthropic didn’t immediately respond to The Verge’s request for comment.
The writers suing Anthropic include Andrea Bartz, the author of We Were Never Here; Charles Graeber, who wrote The Good Nurse; and Kirk Wallace Johnson, the author of The Feather Thief. While the lawsuit acknowledges that Books3 has been removed from the “most official” version of The Pile, the original version is still allegedly available elsewhere online. A recent investigation also found that companies like Anthropic and Apple trained their AI models on thousands of scraped YouTube video subtitles available within The Pile.
Last year, former Arkansas Governor Mike Huckabee and other authors filed a similar lawsuit against Meta, Microsoft, and EleutherAI — the nonprofit behind The Pile — over allegations their work was pirated and used to train AI models. George R.R. Martin, Jodi Picoult, Michael Chabon, and several other authors have also sued OpenAI for its alleged use of their copyrighted content.
