
Runway Scraped Thousands of YouTube Channels to Train Its Text-to-Video AI

Hardly surprising, unfortunately.

At this point, few people doubt that many of the methods AI developers use to train their generative models range from barely legal to outright unlawful. However, no blood, no foul, and until concrete proof is found, it could be considered libel to accuse companies like OpenAI and Midjourney of stealing content from the web to build their machines.

Recently, though, 404 Media appears to have uncovered such proof in the form of a massive spreadsheet listing thousands of YouTube channels and other sources that AI company Runway reportedly compiled to train its text-to-video model, Gen-3.

According to 404 Media's report, which cites a former Runway employee, the channels were compiled by the company and then scraped using the open-source tool youtube-dl. The document 404 Media obtained includes 14 separate spreadsheets listing various tags, keywords, websites, and, most importantly, close to five thousand YouTube channels that Runway allegedly used for AI development.

Each channel is accompanied by comments detailing its content type, relevant keywords, and the number of videos available at the time it was added to the list. While it hasn't been confirmed that all these hundreds of thousands of videos were ultimately used to train Gen-3, 404 Media showed that using the names of popular YouTubers along with their video titles and "in the style of" prompts allowed them to generate results strikingly similar to the original videos.

Unsurprisingly, Runway did not release any official statements regarding the matter, and according to the report's author, Samantha Cole, the company ignored multiple requests for comment.

As it stands, the disclosed spreadsheet appears to be one of two things: either it's a long-awaited piece of actual evidence shedding light on the shady practices that AI developers adopt to outdo their competitors, or it's an elaborate hoax that required someone to compile thousands of YouTube channels and tags while also knowing the names of multiple Runway employees who worked on Gen-3. Given what we already know about AI training methods, which of these is more likely is a rhetorical question.

Interestingly, this wouldn't be the first time in recent weeks that YouTube has been identified as a treasure trove for AI makers. Earlier, Proof News released an extensive investigative report revealing that multiple Silicon Valley giants, including NVIDIA, Apple, Salesforce, and Anthropic, used content from thousands of YouTube videos to train their AI models, completely disregarding YouTube's rules against harvesting material from the platform without permission.
