AI Learns to Play Games by Studying YouTube Videos
Subscribe:  iCal  |  Google Calendar
7, Mar — 1, Jun
York US   26, Mar — 29, Mar
Boston US   28, Mar — 1, Apr
Anaheim US   29, Mar — 1, Apr
RALEIGH US   30, Mar — 1, Apr
Latest comments

This is amazing! Please tell us, What programs where used to create these amazing animations?

I am continuing development on WorldKit as a solo endeavor now. Progress is a bit slower as I've had to take a more moderate approach to development hours. I took a short break following the failure of the commercial launch, and now I have started up again, but I've gone from 90 hour work weeks to around 40 or 50 hour work weeks. See my longer reply on the future of WorldKit here: I am hard at work with research and code, and am not quite ready to start the next fund-raising campaign to open-source, so I've been quiet for a while. I hope to have a video out on the new features in the next few weeks.

Someone please create open source world creator already in C/C++.

AI Learns to Play Games by Studying YouTube Videos
31 May, 2018

Google DeepMind’s researchers revealed a new paper that discusses a method of training artificial intelligence to play “infamously hard exploration games” using YouTube videos of human playthroughs. The core idea behind the concept is that it’s quite challenging for deep reinforcement learning algorithms to improve at tasks which take place “where environment rewards are particularly sparse.”


Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent’s exact environment setup and the demonstrator’s action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games MONTEZUMA’S REVENGE, PITFALL! and PRIVATE EYE for the first time, even if the agent is not presented with any environment rewards.

AI can use this kind of videos to learn, but the algorithm tends to play games in a more interesting way. “Specifically, providing a standard RL agent with an imitation reward learnt from a single YouTube video, we are the first to convincingly exceed human-level performance on three of Atari’s hardest exploration games: Montezuma’s Revenge, Pitfall! and Private Eye,” the team pointed out. “Despite the challenges of designing reward functions or learning them using inverse reinforcement learning, we also achieve human-level performance even in the absence of an environment reward signal.”

You can find the full article with a thorough report from the team here


Leave a Reply