The tool introduces two features: Paint-with-Words and style-guided image generation.
We've seen plenty of AI-generated art from Stable Diffusion, Midjourney, DALL-E, and other tools; now it's NVIDIA's turn to present its own text-to-image model – eDiffi.
The creators describe the model as "a new generation of generative AI content creation tool that offers unprecedented text-to-image synthesis with instant style transfer and intuitive painting with words capabilities."
The researchers trained an ensemble of expert denoisers, each specialized for a different noise interval of the generative process. Unlike many other text-to-image tools, eDiffi conditions on CLIP text, T5 text, and CLIP image encoders, which allegedly leads to improved synthesis capabilities.
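The routing idea is simple to sketch: at each denoising step, the model dispatches to whichever expert covers the current noise level. Below is a minimal, hypothetical illustration (not NVIDIA's code); the interval split and the experts' behavior are made up purely to show the dispatch logic.

```python
# Minimal sketch of expert-denoiser routing, as eDiffi does conceptually.
# The split point and expert behavior here are illustrative, not real.

def expert_high_noise(x, t):
    # Hypothetical expert for the early, high-noise phase (global layout).
    return [v * 0.5 for v in x]

def expert_low_noise(x, t):
    # Hypothetical expert for the late, low-noise phase (fine detail).
    return [v * 0.9 for v in x]

def denoise_step(x, t, t_max=1000, split=0.5):
    """Pick an expert based on where t falls in the noise schedule."""
    expert = expert_high_noise if t / t_max > split else expert_low_noise
    return expert(x, t)

latent = [1.0, -2.0, 0.5]
print(denoise_step(latent, t=900))  # routed to the high-noise expert
print(denoise_step(latent, t=100))  # routed to the low-noise expert
```

The point of the ensemble is that early steps shape composition while late steps refine texture, so each expert can specialize instead of one network handling every noise level.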
eDiffi offers two interesting features: Paint-with-Words and style-guided image generation. Paint-with-Words allows assigning separate words or phrases from the prompt to different colors so that the model better understands where to put which object.
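To make the idea concrete, here is a tiny, hypothetical sketch of what Paint-with-Words amounts to: the user scribbles colored regions on a canvas, each color is bound to a phrase from the prompt, and the model gets a per-pixel hint about which object goes where. All names and colors below are illustrative, not eDiffi's actual interface.

```python
# Illustrative sketch of the Paint-with-Words idea: each color in a
# rough user-drawn map is bound to a phrase from the prompt, giving
# the model a spatial hint for each object. Everything is hypothetical.

prompt = "a boat on a half-frozen lake under a full moon"
color_to_phrase = {
    (0, 0, 255): "half-frozen lake",
    (255, 255, 0): "full moon",
    (139, 69, 19): "boat",
}

# A tiny 2x3 "painting": each cell is an RGB color the user scribbled.
canvas = [
    [(255, 255, 0), (0, 0, 255), (0, 0, 255)],
    [(0, 0, 255), (139, 69, 19), (0, 0, 255)],
]

# Per-pixel phrase map that could bias the model's cross-attention.
phrase_map = [[color_to_phrase[c] for c in row] for row in canvas]
print(phrase_map[1][1])  # -> boat
print(phrase_map[0][0])  # -> full moon
```

In the real tool this spatial binding steers where each prompt phrase influences the image, so the boat actually ends up in the middle of the lake rather than wherever the model guesses.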
Example prompt: "A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed."
Style-guided generation (style transfer) lets you give the AI a reference picture; the output image adopts the same style.
Overall, the results do look impressive, especially when it comes to rendering text: many AI tools struggle with correct spelling, but eDiffi's outputs seem fine.
If you're interested in more technical details, read the paper here. Also, don't forget to join our Reddit page and our Telegram channel, follow us on Instagram and Twitter, where we share breakdowns, the latest news, awesome artworks, and more.