Microsoft has revealed a new AI technology described as an artist: a “drawing bot.” The bot can generate images from text descriptions of an object, and it can also add details that the text never mentioned, which, according to Microsoft, means the AI has a little imagination of its own.
The new artificial intelligence technology under development in Microsoft’s research labs is programmed to pay close attention to individual words when generating images from caption-like text descriptions. This deliberate focus produced a nearly three-fold boost in image quality compared to the previous state-of-the-art technique for text-to-image generation, according to results on an industry-standard test reported in a research paper posted on arXiv.org.
The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus. Each image contains details that are absent from the text description, suggesting that this artificial intelligence has an artificial imagination.
The research group started with technology that automatically writes photo captions, the CaptionBot, and then moved on to technology that “answers questions humans ask about images,” such as the location or attributes of objects.
This line of research involves “training machine learning models to identify objects, interpret actions and converse in natural language.”
“If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch. These birds may not exist in the real world; they are just an aspect of our computer’s imagination of birds,” said Xiaodong He, a principal researcher and research manager.
The core of Microsoft’s new AI is a technology called a Generative Adversarial Network, or GAN. The network features two machine learning models. The first, the generator, builds images from text descriptions; the second, known as a discriminator, uses the text descriptions to judge whether the generated images are authentic.
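To make the two-model structure concrete, here is a minimal sketch in plain Python. It is not Microsoft’s implementation: the tiny dimensions, the single linear layer in each model, and the names `Generator` and `Discriminator` are assumptions chosen for illustration. The generator turns a caption embedding plus random noise into a flattened “image,” and the discriminator scores an (image, caption) pair with a probability that the pair is real and matching.

```python
import math
import random

# Hypothetical toy sizes; real text-to-image GANs use far larger tensors.
TEXT_DIM, NOISE_DIM, IMAGE_DIM = 4, 3, 8

def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def random_matrix(rows, cols, rng):
    """Small random weights so the untrained models start near zero."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class Generator:
    """Maps (text embedding, random noise) to a flattened 'image' vector."""
    def __init__(self, rng):
        self.w = random_matrix(IMAGE_DIM, TEXT_DIM + NOISE_DIM, rng)

    def __call__(self, text_emb, noise):
        # Concatenate caption and noise, then squash pixels into [-1, 1] with tanh.
        return [math.tanh(x) for x in matvec(self.w, text_emb + noise)]

class Discriminator:
    """Scores how likely an image is real AND matches the caption."""
    def __init__(self, rng):
        self.w = random_matrix(1, IMAGE_DIM + TEXT_DIM, rng)

    def __call__(self, image, text_emb):
        logit = matvec(self.w, image + text_emb)[0]
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> probability

rng = random.Random(0)
gen, disc = Generator(rng), Discriminator(rng)
caption = [0.2, -0.1, 0.5, 0.3]  # hypothetical embedding for a caption like "a blue bird"
fake = gen(caption, [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)])
score = disc(fake, caption)  # untrained, so this is close to 0.5
```

Because the discriminator sees the caption as well as the image, it can in principle reject a realistic picture that does not fit the words, which is what pushes the generator toward images that match the text.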
The drawing bot was trained on datasets of paired images and captions, which allow the models to learn how to match words to their visual representations. The GAN, for example, learns to generate an image of a bird when a caption says “bird” and, likewise, learns what a picture of a bird should look like. “That is a fundamental reason why we believe a machine can learn,” said He.
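Training on paired images and captions can be sketched through the losses the two models try to minimize. The version below is an assumption, not something the article specifies: it uses binary cross-entropy plus a commonly used matching-aware term that also penalizes the discriminator for accepting a real image paired with the wrong caption, so the discriminator has to check that the picture fits the words. The functions take discriminator probabilities directly, so they stand alone.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real_match, d_fake, d_real_mismatch):
    """
    d_real_match:    D's score for a real image with its own caption
    d_fake:          D's score for a generated image with that caption
    d_real_mismatch: D's score for a real image paired with a wrong caption
    D should output 1 only for the real, matching pair (assumed weighting).
    """
    return bce(d_real_match, 1.0) + 0.5 * (bce(d_fake, 0.0) + bce(d_real_mismatch, 0.0))

def generator_loss(d_fake):
    """The generator wants D to judge its image, with the caption, as real."""
    return bce(d_fake, 1.0)

confident = discriminator_loss(0.9, 0.1, 0.1)  # D separates the pairs well
confused = discriminator_loss(0.5, 0.5, 0.5)   # D cannot tell them apart
```

A discriminator that confidently separates real matching pairs from fakes gets a lower loss than a confused one, while the generator’s loss rises whenever the discriminator rejects its images; training alternates between lowering each.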
GANs work well when generating images from simple text descriptions such as a blue bird or an evergreen tree, but the quality suffers with more complex descriptions such as a bird with a green crown, yellow wings and a red belly. That’s because the entire sentence serves as a single input to the generator, so the detailed information in the description is lost. As a result, the generated image is a blurry greenish-yellowish-reddish bird instead of a close, sharp match with the description.
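The difference between feeding the whole sentence as one input and attending to individual words can be shown with toy vectors. This is a simplified dot-product attention written for illustration; the article does not describe the paper’s exact attention formulation, and the one-hot word vectors for “green,” “yellow” and “red” and the image-region feature are hypothetical. Averaging the words gives every part of the image the same blended vector, while attention lets a region that looks like a crown draw mostly on the word that describes it.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sentence_vector(word_vecs):
    """Whole-sentence input: one averaged vector for the entire caption."""
    n = len(word_vecs)
    return [sum(col) / n for col in zip(*word_vecs)]

def word_context(region_feat, word_vecs):
    """Word-level attention: a region weights each word by how well it matches."""
    weights = softmax([dot(region_feat, w) for w in word_vecs])
    context = [sum(wt * w[i] for wt, w in zip(weights, word_vecs))
               for i in range(len(word_vecs[0]))]
    return weights, context

# Hypothetical word vectors: each color word points along its own axis.
words = {"green": [1.0, 0.0, 0.0], "yellow": [0.0, 1.0, 0.0], "red": [0.0, 0.0, 1.0]}
word_vecs = list(words.values())

blended = sentence_vector(word_vecs)  # same for every region: a "greenish-yellowish-reddish" mix
crown_region = [5.0, 0.0, 0.0]        # hypothetical feature for the bird's crown
weights, context = word_context(crown_region, word_vecs)
print([round(w, 3) for w in weights])  # prints [0.987, 0.007, 0.007]
```

With the averaged vector, the crown, wings and belly all receive the same one-third mix of each color; with attention, the crown region’s context is dominated by “green,” which is the kind of per-word focus the researchers credit for the sharper images.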
You can find more details on the bot in the official announcement.