CogVideo: New Method for Generating GIFs from Text Input

The AI creates 4-second clips of 32 frames.

Take a look at CogVideo, an algorithm that generates short videos from text input. Rather than being trained from scratch, it inherits a pretrained text-to-image model, CogView2.

As with DALL-E, you type a description of what you want and the model generates it, but here the output takes the form of a 4-second video of 32 frames. The model was trained on 5.4 million text-video pairs and produces videos of fairly good quality.
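For a sense of what those numbers mean, here is a quick back-of-the-envelope calculation. It uses only the two figures quoted above (32 frames, 4 seconds) and assumes the frames are evenly spaced, which the article does not state explicitly; the helper name `frame_timestamps` is made up for this illustration.

```python
from typing import List

# Figures taken from the article: 32 frames spanning a 4-second clip.
NUM_FRAMES = 32
CLIP_SECONDS = 4

# Assumption: frames are evenly spaced across the clip.
frames_per_second = NUM_FRAMES / CLIP_SECONDS   # 8.0
seconds_per_frame = CLIP_SECONDS / NUM_FRAMES   # 0.125


def frame_timestamps(num_frames: int, clip_seconds: float) -> List[float]:
    """Return the start time (in seconds) of each frame, assuming even spacing."""
    step = clip_seconds / num_frames
    return [i * step for i in range(num_frames)]


timestamps = frame_timestamps(NUM_FRAMES, CLIP_SECONDS)
print(f"{frames_per_second} fps, one frame every {seconds_per_frame} s")
print("first frames start at:", timestamps[:4])
```

At 8 frames per second the result is closer to an animated GIF than to regular video, which fits the headline.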

The researchers also proposed a multi-frame-rate hierarchical training strategy to better align text with video clips. The original prompts were written in Chinese, but the approach should also work well in other languages.
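To make the idea of "multiple frame rates" more concrete, the sketch below shows one way the same source clip could be sampled at several frame rates. This is only an illustration, not the researchers' implementation: the function name `sample_frame_indices`, the 24 fps source video, and the chosen target rates are all assumptions made for the example.

```python
from typing import List


def sample_frame_indices(source_fps: float, target_fps: float,
                         num_frames: int, start: int = 0) -> List[int]:
    """Pick `num_frames` frame indices from a source video so that,
    played back in order, they correspond to `target_fps`.

    Hypothetical helper for illustration only.
    """
    if target_fps <= 0 or target_fps > source_fps:
        raise ValueError("target_fps must be in the range (0, source_fps]")
    stride = source_fps / target_fps  # source frames skipped per sampled frame
    return [start + round(i * stride) for i in range(num_frames)]


# Assumed 24 fps source video, sampled at three lower frame rates.
for fps in (8, 4, 2):
    indices = sample_frame_indices(source_fps=24, target_fps=fps, num_frames=5)
    print(f"{fps} fps -> source frames {indices}")
```

A lower frame rate spreads the same number of frames over a longer stretch of the source video, so each sample covers more of the action described by the text.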

Check out the research on GitHub.
