NVIDIA's New Model for Multimodal Conditional Image Synthesis

The new model allows mixing different input modalities, such as text, segmentation, sketch, and style reference.

NVIDIA presented a new synthesis model that lets users combine multiple input modalities to generate new images. The team noted that existing conditional image synthesis frameworks generate images based on user input in a single modality, such as text, segmentation, sketch, or style reference. Because these models are limited to one kind of input, you can condition on text or on a sketch, but not on both at once. To address this limitation, the NVIDIA team developed the Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework, which can generate images conditioned on any desired subset of these modalities.
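To make the "any subset of modalities" idea concrete, here is a minimal, hypothetical sketch of what such a conditional interface could look like. Every name below (`generate`, the argument names, the latent size) is an illustrative assumption, not NVIDIA's actual API; the official code had not been released at the time of writing.

```python
# Hypothetical sketch: a generator conditioned on any subset of modalities.
# All names and shapes here are assumptions for illustration only.
from typing import Optional
import torch

def generate(
    generator: torch.nn.Module,
    text_embedding: Optional[torch.Tensor] = None,  # e.g. from a text encoder
    segmentation: Optional[torch.Tensor] = None,    # one-hot label map
    sketch: Optional[torch.Tensor] = None,          # edge map
    style_image: Optional[torch.Tensor] = None,     # style reference image
) -> torch.Tensor:
    # Collect only the modalities the user actually supplied; the generator
    # is assumed to fuse however many conditions are present.
    conditions = {
        name: tensor
        for name, tensor in {
            "text": text_embedding,
            "segmentation": segmentation,
            "sketch": sketch,
            "style": style_image,
        }.items()
        if tensor is not None
    }
    noise = torch.randn(1, 512)  # latent noise vector (size is an assumption)
    return generator(noise, conditions)
```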

"PoE-GAN consists of a product-of-experts generator and a multimodal multiscale projection discriminator. Through our carefully designed training scheme, PoE-GAN learns to synthesize images with high quality and diversity," wrote the team. Besides advancing the state of the art in multimodal conditional image synthesis, PoE-GAN also outperforms the best existing unimodal conditional image synthesis approaches when tested in the unimodal setting."

A few weeks ago, NVIDIA also presented a similar AI model, GauGAN 2, whose main feature is the ability to turn a simple written phrase or sentence into a photorealistic image using deep learning.

You can learn more about PoE-GAN here (the code is coming soon). Don't forget to join our new Reddit page and our new Telegram channel, and follow us on Instagram and Twitter, where we share breakdowns, the latest news, awesome artwork, and more.
