
AI Learns to Play Games by Studying YouTube Videos

Google DeepMind’s researchers revealed a new paper that discusses a surprising method of training artificial intelligence to play games.

The paper describes a method of training artificial intelligence to play “infamously hard exploration games” using YouTube videos of human playthroughs. The motivation is that deep reinforcement learning algorithms struggle with tasks “where environment rewards are particularly sparse,” because the agent rarely receives feedback telling it whether it is making progress.

Abstract

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent’s exact environment setup and the demonstrator’s action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games MONTEZUMA’S REVENGE, PITFALL! and PRIVATE EYE for the first time, even if the agent is not presented with any environment rewards.
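
To make the second stage of the abstract more concrete, here is a minimal Python sketch of how a reward could be built from a single embedded demonstration video. It is an illustration under stated assumptions, not the paper's implementation: the embedding network is assumed to exist already and is represented here by plain lists of numbers, and the class name `CheckpointReward` and the parameters `stride`, `threshold`, and `bonus` are hypothetical choices made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class CheckpointReward:
    """Hypothetical imitation reward built from one embedded demonstration.

    Every `stride`-th demonstration embedding is kept as a checkpoint.
    The agent earns `bonus` when its own embedding is similar enough to
    the next unvisited checkpoint, which nudges it to follow the
    demonstration in order without using any environment reward.
    """

    def __init__(self, demo_embeddings, stride=2, threshold=0.9, bonus=1.0):
        # stride, threshold and bonus are illustrative values, not taken from the paper.
        self.checkpoints = demo_embeddings[::stride]
        self.threshold = threshold
        self.bonus = bonus
        self.next_index = 0

    def __call__(self, agent_embedding):
        # No reward is left once every checkpoint has been reached.
        if self.next_index >= len(self.checkpoints):
            return 0.0
        target = self.checkpoints[self.next_index]
        if cosine_similarity(agent_embedding, target) >= self.threshold:
            self.next_index += 1
            return self.bonus
        return 0.0


# Toy 2-D "embeddings" standing in for the output of a learned encoder.
demo = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-0.1, 0.9], [-1.0, 0.0]]
reward_fn = CheckpointReward(demo, stride=2, threshold=0.95)
agent_frames = [[1.0, 0.05], [0.5, 0.5], [0.02, 1.0], [-1.0, 0.0]]
rewards = [reward_fn(frame) for frame in agent_frames]
print(rewards)
```

In this toy run the agent is rewarded only on the frames whose embedding is close to the next checkpoint, so the reward stays sparse but comes from the video rather than from the game.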

The AI can learn from videos of this kind, but the algorithm tends to play games in a more interesting way. “Specifically, providing a standard RL agent with an imitation reward learnt from a single YouTube video, we are the first to convincingly exceed human-level performance on three of Atari’s hardest exploration games: Montezuma’s Revenge, Pitfall! and Private Eye,” the team pointed out. “Despite the challenges of designing reward functions or learning them using inverse reinforcement learning, we also achieve human-level performance even in the absence of an environment reward signal.”

You can find the full paper with a thorough report from the team here.



