AI Locomotion Learning in Complex Environments
11 July, 2017

Researchers from DeepMind are working on complex artificial intelligence solutions that could be used in animation and game development.

Last week Cornell University Library published a very interesting paper with a link to a particular video, which is now making the rounds on the internet. The paper is penned by a team of researchers from DeepMind Technologies: Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, Ali Eslami, Martin Riedmiller, David Silver. The man behind DeepMind is probably one of the most inspiring minds in the field of game design and artificial intelligence. We’re talking, of course, about Demis Hassabis.

Hassabis worked at Bullfrog Productions and Lionhead, where he programmed AI for Black & White – a technological marvel at the time. He later founded Elixir Studios, where he helped to create Republic: The Revolution and Evil Genius. After that studio closed, Hassabis went on to found a new start-up called DeepMind. In 2014 the company was acquired by Google.

DeepMind performs research in various fields, paying most attention to artificial intelligence. The current paper, “Emergence of Locomotion Behaviours in Rich Environments”, deals with learning paradigms. In plain English, it means that researchers taught virtual AI characters to move through complex environments, guided only by a reward for making progress. While hilarious, the video demonstrates the power of this approach and hints at various ways it could be applied. Games and animation are probably the most obvious fields, but the key points of this research could also fuel the performance of future generations of robots, or whatever we’ll be enslaved by in the future.
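To make the “simple reward” idea concrete, here is a minimal, hypothetical sketch of a progress-based locomotion reward in Python. The field names and coefficients are our own illustrative assumptions, not the exact formulation used in the paper.

```python
# Hypothetical sketch of a progress-based locomotion reward.
# Coefficients and fields are illustrative, not the paper's exact reward.

def locomotion_reward(prev_x, curr_x, actions, fell_over, dt=0.02):
    """Reward forward progress, lightly penalise effort and falling."""
    forward_progress = (curr_x - prev_x) / dt            # velocity along the course
    control_cost = 0.005 * sum(a * a for a in actions)    # discourage wasteful thrashing
    alive_bonus = 0.0 if fell_over else 0.1               # small incentive to stay upright
    return forward_progress - control_cost + alive_bonus
```

The striking result is that behaviours like jumping, crouching and turning emerge from a reward this simple, given sufficiently rich environments.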

Here’s a short excerpt from the paper, which you can find over here.

We focus on a set of novel locomotion tasks that go significantly beyond the previous state-of-the-art for agents trained directly from reinforcement learning. They include a variety of obstacle courses for agents with different bodies (Quadruped, Planar Walker, and Humanoid [5, 6]). The courses are procedurally generated such that every episode presents a different instance of the task.

Our environments include a wide range of obstacles with varying levels of difficulty (e.g. steepness, unevenness, distance between gaps). The variations in difficulty present an implicit curriculum to the agent – as it increases its capabilities it is able to overcome increasingly hard challenges, resulting in the emergence of ostensibly sophisticated locomotion skills which may naïvely have seemed to require careful reward design or other instruction. We also show that learning speed can be improved by explicitly structuring terrains to gradually increase in difficulty so that the agent faces easier obstacles first and harder obstacles only when it has mastered the easy ones.

In order to learn effectively in these rich and challenging domains, it is necessary to have a reliable and scalable reinforcement learning algorithm. We leverage components from several recent approaches to deep reinforcement learning. First, we build upon robust policy gradient algorithms, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO) [7, 8], which bound parameter updates to a trust region to ensure stability. Second, like the widely used A3C algorithm [2] and related approaches [3] we distribute the computation over many parallel instances of agent and environment. Our distributed implementation of PPO improves over TRPO in terms of wall clock time with little difference in robustness, and also improves over our existing implementation of A3C with continuous actions when the same number of workers is used.

The paper proceeds as follows. In Section 2 we describe the distributed PPO (DPPO) algorithm that enables the subsequent experiments, and validate its effectiveness empirically. Then in Section 3 we introduce the main experimental setup: a diverse set of challenging terrains and obstacles. We provide evidence in Section 4 that effective locomotion behaviours emerge directly from simple rewards; furthermore we show that terrains with a “curriculum” of difficulty encourage much more rapid progress, and that agents trained in more diverse conditions can be more robust.
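For readers curious what the PPO update mentioned in the excerpt actually optimises, here is a minimal sketch of the clipped surrogate objective in Python/NumPy. It assumes precomputed probability ratios and advantage estimates, and omits the value-function and entropy terms as well as the distributed data collection used in the paper.

```python
import numpy as np

def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (to be maximised).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled transition
    advantages: advantage estimates for the same transitions
    Clipping keeps the policy update inside a trust-region-like bound.
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))
```

In the distributed variant (DPPO) described in the paper, many workers collect trajectories and contribute gradients of this objective in parallel, which is where the wall-clock advantage over TRPO comes from.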

And while most of us will be interested in the solution for pathfinding and similar tasks, what you really want to think about is what kind of reward you should be giving your AI. It’s a long-term question, but in this day and age ‘long-term’ quickly becomes ‘short-term’.
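Alongside the reward itself, the “curriculum” of terrain difficulty described in the excerpt is the other lever worth experimenting with. Below is a rough, hypothetical sketch of how one might ramp obstacle difficulty over training; the parameters are illustrative only and do not reflect the paper’s actual terrain generator.

```python
import random

def sample_terrain(episode, total_episodes, max_gap=2.0, max_height=1.0):
    """Hypothetical curriculum: obstacle difficulty grows with training progress."""
    difficulty = min(1.0, episode / (0.8 * total_episodes))   # ramp up, then plateau
    gap_width = random.uniform(0.0, difficulty * max_gap)      # wider gaps later in training
    hurdle_height = random.uniform(0.0, difficulty * max_height)
    return {"gap_width": gap_width, "hurdle_height": hurdle_height}
```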
