
170,000+ YouTube Videos Were Used by Silicon Valley Giants to Train AI

It never occurred to the tech companies to ask the YouTube creators themselves.

Proof News, in collaboration with Wired, has published an extensive investigative report stating that numerous tech companies, including NVIDIA, Apple, Salesforce, and Anthropic, have used content from thousands of YouTube videos to train their AI models, completely ignoring YouTube's rules against harvesting material from the platform without permission.

According to the investigation, Silicon Valley giants used a dataset called YouTube Subtitles, which contains subtitles from 173,536 YouTube videos sourced from over 48,000 channels, including Khan Academy, MIT, Harvard, The Wall Street Journal, BBC, late-night shows, and popular YouTubers like MrBeast, Marques Brownlee, Jacksepticeye, and PewDiePie.

The subtitles were then used as training data for the companies' generative AI models, showing once again that, when it comes to artificial intelligence, multibillion-dollar companies are content to use tactics of questionable legality to gain an edge over their competitors in the AI race.

"It's theft," commented Nebula CEO Dave Wiskus in response to the findings, stressing that using creators' work without their consent is disrespectful because companies may utilize "generative AI to replace as many of the artists along the way as they can."

"No one came to me and said, 'We would like to use this.' This is my livelihood, and I put time, resources, money, and staff time into creating this content. There's really no shortage of work," added David Pakman of "The David Pakman Show".

The report further claims that representatives from EleutherAI, the creators of the YouTube Subtitles dataset, did not respond to requests for comment regarding the findings, including accusations of using videos without permission. The dataset was found to be part of a larger compilation known as The Pile, which includes not only YouTube video transcripts but also material from the European Parliament, English Wikipedia, and emails from Enron Corporation employees.

Furthermore, Proof News discovered that multiple tech companies, including those mentioned above, have detailed in their research papers how they utilized The Pile to train their AI models. Documents indicate that Apple used The Pile to train OpenELM, a prominent AI model released in April, just weeks before announcing new AI capabilities for iPhones and Macs. Salesforce also confirmed its use of The Pile to develop an AI model for "academic and research purposes".

