Google's Phenaki Model Can Generate Long Videos Based on Text Prompts

The company showed how it is scaling helpful technologies worldwide.

We are pretty used to text-to-image models that pop up more and more frequently, but text-to-video tools are still somewhat novel. Google has presented an AI-generated super-resolution video made using Phenaki – "a model capable of realistic video synthesis given a sequence of textual prompts." What makes it stand out is its ability to create extensive videos – as long as several minutes.

Phenaki was built to address three issues that text-to-video models face: computational cost, the limited quantity of high-quality text-video data, and the variable length of videos. It is a causal model for learning video representations that compresses a video into a small sequence of discrete tokens.

"To generate video tokens from text we are using a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video." 
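The quoted pipeline (mask-based generation of discrete video tokens, then de-tokenization) can be illustrated with a minimal sketch. This is not Phenaki's actual code: the vocabulary size, token count, step count, and the stand-in `toy_transformer` are all illustrative assumptions; a real model would predict tokens conditioned on the text embedding and pick the most confident predictions to keep at each step.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 8192  # size of the discrete video-token codebook (assumed)
NUM_TOKENS = 64    # tokens representing one video chunk (assumed)
STEPS = 4          # iterative masked-decoding steps (assumed)

def toy_transformer(tokens, mask, text_embedding):
    """Stand-in for the bidirectional masked transformer: predicts a
    token id for every masked position. Here we sample random ids;
    a real model would condition on the pre-computed text tokens."""
    preds = rng.integers(0, VOCAB_SIZE, size=tokens.shape)
    return np.where(mask, preds, tokens)

def generate_video_tokens(text_embedding):
    tokens = np.zeros(NUM_TOKENS, dtype=int)
    mask = np.ones(NUM_TOKENS, dtype=bool)  # every position starts masked
    for step in range(STEPS):
        # fill in all currently masked positions
        tokens = toy_transformer(tokens, mask, text_embedding)
        # keep a growing fraction of predictions, re-mask the rest
        keep = NUM_TOKENS * (step + 1) // STEPS
        kept = rng.permutation(NUM_TOKENS)[:keep]  # fake "confidence" pick
        mask = np.ones(NUM_TOKENS, dtype=bool)
        mask[kept] = False
    return tokens  # would then be de-tokenized into video frames

video_tokens = generate_video_tokens(text_embedding=None)
print(video_tokens.shape)
```

In the real system, the final token sequence is passed through a learned de-tokenizer to produce the actual video frames; the causal structure over time is what lets the model keep extending the video as new prompts arrive.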

Hopefully, we'll see more of the tech soon, as Google has promised to bring its text-to-image tools to AI Test Kitchen.

Learn more about Google's AI solutions in this blog post and don't forget to join our Reddit page and our Telegram channel, follow us on Instagram and Twitter, where we share breakdowns, the latest news, awesome artworks, and more. 
