
ERNIE-ViLG 2.0: the Biggest Text-to-Image Model with 24B Parameters

Version 2.0 incorporates textual and visual knowledge into the diffusion model, improving the quality of images.

The Chinese text-to-image diffusion model ERNIE-ViLG has been upgraded to version 2.0, which produces higher-quality images and is currently the largest text-to-image model. Developer Baidu improved ERNIE-ViLG by incorporating textual and visual knowledge of key elements in the scene and by using different denoising experts at different denoising stages.

According to the creators, ERNIE-ViLG 2.0 achieves state-of-the-art results on Microsoft COCO (a large-scale dataset for object detection, segmentation, and image captioning) with a zero-shot FID score of 6.75. It also reportedly outperforms recent models in image fidelity and image-text alignment.
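For context, FID (Fréchet Inception Distance) compares the statistics of features extracted from real and generated images, so a lower score means the generated images are distributed more like real ones. "Zero-shot" here means the model was not trained on COCO itself. The standard definition is shown below; it is the general metric, not a detail taken from Baidu's evaluation setup:

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
$$

Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception-v3 feature vectors computed over the real and the generated images, respectively.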

The researchers used a text parser and an object detector to extract the key elements of the scene from each input text-image pair and guided the model to pay more attention to their alignment during training. They also divided the denoising steps into several stages and assigned a dedicated denoising “expert” to each stage. This lets the model involve more parameters and better learn the data distribution of each denoising stage without increasing inference time, since only one expert runs at any given step. With this approach, ERNIE-ViLG 2.0 scales up to 24 billion parameters, 10 times more than Stable Diffusion, making it the largest text-to-image model at the time.
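To make the "denoising experts" idea concrete, here is a minimal, dependency-free Python sketch of stage-based routing. It is not Baidu's implementation: the class and helper names (`MixtureOfDenoisingExperts`, `make_counting_expert`, `run_sampling_loop`), the 1,000-step schedule, and the equal-width split of timesteps into stages are all assumptions for illustration. The toy experts just count how often they are called, which shows why adding experts grows the parameter count without adding per-step compute: every timestep is handled by exactly one expert.

```python
from dataclasses import dataclass
from typing import Callable, List

# A denoiser maps (noisy sample, timestep) -> predicted noise.
# Plain lists stand in for tensors so the sketch has no dependencies.
Denoiser = Callable[[List[float], int], List[float]]


@dataclass
class MixtureOfDenoisingExperts:
    """Hypothetical sketch: routes each timestep to exactly one expert."""
    experts: List[Denoiser]
    num_timesteps: int = 1000  # assumed schedule length

    def expert_index(self, t: int) -> int:
        # Assumption: split [0, num_timesteps) into len(experts) equal-width stages.
        if not 0 <= t < self.num_timesteps:
            raise ValueError(f"timestep {t} outside [0, {self.num_timesteps})")
        return t * len(self.experts) // self.num_timesteps

    def __call__(self, x: List[float], t: int) -> List[float]:
        # Only the expert owning this stage runs, so each step costs one expert call.
        return self.experts[self.expert_index(t)](x, t)


def make_counting_expert(i: int, calls: List[int]) -> Denoiser:
    """Toy expert: records how often it is used and predicts zero noise."""
    def expert(x: List[float], t: int) -> List[float]:
        calls[i] += 1
        return [0.0] * len(x)
    return expert


def run_sampling_loop(model: MixtureOfDenoisingExperts, x: List[float],
                      step_size: float = 0.01) -> List[float]:
    # Walk from the noisiest timestep down to 0 with one model call per step.
    # The update below is a placeholder, not a real diffusion sampler.
    for t in reversed(range(model.num_timesteps)):
        eps = model(x, t)
        x = [xi - step_size * ei for xi, ei in zip(x, eps)]
    return x
```

With 10 hypothetical experts over 1,000 steps, running `run_sampling_loop` calls each expert 100 times, for 1,000 calls in total: the same number of network evaluations a single-model sampler would make, even though the combined experts hold roughly 10 times the parameters of one expert.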

ERNIE-ViLG is an important player in the text-to-image "game": it understands prompts in Chinese and generates anime art and captures Chinese culture better than other tools.

You can try the model out here. Check out more images made with it here and don't forget to join our Reddit page and our Telegram channel, follow us on Instagram and Twitter, where we share breakdowns, the latest news, awesome artworks, and more. 
