The model can generate and edit videos from image and text prompts.
To improve motion editability, the team proposed a mixed objective that jointly fine-tunes the model with full temporal attention and with temporal attention masking. The developers also introduced a new framework for image animation: an input image is first transformed into a coarse video through simple image processing operations, and the general video editor then animates it.
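The mixed objective can be pictured as a coin flip inside an otherwise standard denoising loop: on some steps the temporal attention layers are masked, so the model learns appearance from the frames as an unordered set, and on the rest it sees full temporal attention and learns the clip's motion too. Below is a minimal PyTorch sketch of one such step; the `mask_temporal_attention` flag, the simplified noise schedule, and the 50/50 split are illustrative assumptions rather than a released API:

```python
import random
import torch
import torch.nn.functional as F

def mixed_objective_step(model, video, text_emb, optimizer, alpha=0.5):
    """One step of the mixed fine-tuning objective (a sketch).

    `model` is a hypothetical text-conditioned video diffusion model that
    accepts a `mask_temporal_attention` flag. `video` is a clean clip of
    shape (B, C, T, H, W). With probability `alpha`, temporal attention is
    masked (fine-tuning on unordered frames, appearance only); otherwise
    full temporal attention is used (appearance + original motion).
    """
    mask_temporal = random.random() < alpha

    # Standard denoising objective: predict the noise added to the clip.
    noise = torch.randn_like(video)
    t = torch.randint(0, 1000, (video.shape[0],), device=video.device)
    a = (1.0 - t.float() / 1000).view(-1, 1, 1, 1, 1)  # toy linear schedule
    noisy_video = a.sqrt() * video + (1 - a).sqrt() * noise

    pred = model(noisy_video, t, text_emb,
                 mask_temporal_attention=mask_temporal)
    loss = F.mse_loss(pred, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```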
"Our method supports multiple applications by application-dependent pre-processing, converting the input content into a uniform video format. For image-to-video, the input image is duplicated and transformed using perspective transformations, synthesizing a coarse video with some camera motion. For subject-driven video generation, the input is omitted - finetuning alone takes care of fidelity," commented the team. "This coarse video is then edited using our general Dreamix Video Editor: we first corrupt the video by downsampling followed by adding noise. We then apply the finetuned text-guided video diffusion model, which upscales the video to the final spatio-temporal resolution."
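As a rough illustration of the pipeline the team describes, here is a minimal PyTorch sketch of the image-to-video pre-processing and the corruption step. The perspective-shift schedule, downsampling factor, and noise level are illustrative guesses, not the paper's parameters, and the finetuned model's denoising pass is omitted:

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def image_to_coarse_video(image, num_frames=16, max_shift=8):
    """Duplicate a single (C, H, W) image into `num_frames` frames, warping
    each copy with a progressively stronger perspective transform to fake
    simple camera motion. Returns a (T, C, H, W) coarse video."""
    _, h, w = image.shape
    corners = [[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]]
    frames = []
    for i in range(num_frames):
        s = int(max_shift * i / max(num_frames - 1, 1))
        # Pull the top corners inward to mimic a slow camera tilt/zoom.
        warped = [[s, s], [w - 1 - s, s], [w - 1, h - 1], [0, h - 1]]
        frames.append(TF.perspective(image, corners, warped))
    return torch.stack(frames)

def corrupt_video(video, scale=4, noise_std=0.3):
    """Corrupt the coarse video as described in the quote: spatially
    downsample, then add Gaussian noise. The finetuned diffusion model
    would start denoising and upscaling from this corrupted clip."""
    t, c, h, w = video.shape
    small = F.interpolate(video, size=(h // scale, w // scale),
                          mode="bilinear", align_corners=False)
    return small + noise_std * torch.randn_like(small)
```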
Learn more here.