DeepMind has presented a neural network that can imagine a scene from different viewpoints, using just a single image.
DeepMind has presented a neural network that can “imagine” a scene from different viewpoints, using just a single image. The network takes a 2D picture of a scene and generates a 3D view from a different vantage point, rendering the other sides of the objects and changing shadows to maintain the same light source.
This system, called the Generative Query Network (GQN), is said to tease out details from the static images to guess at spatial relationships.
Imagine you’re looking at Mt. Everest, and you move a metre – the mountain doesn’t change size, which tells you something about its distance from you.
But if you look at a mug, it would change position. That’s similar to how this works.
Ali Eslami, Deepmind
The team showed the neural network images of a scene from different viewpoints, and the network tried to predict what something would look like from other angles. The network has also used context to learn about textures, colours, and lighting.
Scene representation—the process of converting visual sensory data into concise descriptions—is a requirement for intelligent behavior. Recent work has shown that neural networks excel at this task when provided with large, labeled datasets. However, removing the reliance on human labeling remains an important open problem. To this end, we introduce the Generative Query Network (GQN), a framework within which machines learn to represent scenes using only their own sensors. The GQN takes as input images of a scene taken from different viewpoints, constructs an internal representation, and uses this representation to predict the appearance of that scene from previously unobserved viewpoints. The GQN demonstrates representation learning without human labels or domain knowledge, paving the way toward machines that autonomously learn to understand the world around them.
It learns a lot like we do being able to control objects in virtual space. The system is also capable of learning different characteristics of similar objects and remembering them.