Nonprofit scrubs illegal content from controversial AI training dataset

After backlash, LAION cleans child sex abuse materials from AI training data.

(Credit: Kirillm | iStock / Getty Images Plus)

After Stanford Internet Observatory researcher David Thiel found links to child sexual abuse materials (CSAM) in an AI training dataset, tainting image generators trained on it, the controversial dataset was immediately taken down in 2023.

Now, the LAION (Large-scale Artificial Intelligence Open Network) team has released a scrubbed version of the LAION-5B dataset called Re-LAION-5B and claimed that it “is the first web-scale, text-link to images pair dataset to be thoroughly cleaned of known links to suspected CSAM.”

To scrub the dataset, LAION partnered with the Internet Watch Foundation (IWF) and the Canadian Centre for Child Protection (C3P) to remove 2,236 links that matched hashed images in the online safety organizations’ databases. Removals include all the links flagged by Thiel, as well as content flagged by LAION’s partners and other watchdogs, like Human Rights Watch, which warned of privacy issues after finding photos of real kids included in the dataset without their consent.
