
Video-P2P: AI Video Editing Model for Generating New Characters

Swap an object in a video for one described in a text prompt.

Researchers from The Chinese University of Hong Kong, SmartMore, and Adobe presented Video-P2P, a framework for real-world video editing with cross-attention control. Simply put, it can replace an object in a video with one you specify in a text prompt.

The model adapts an image generation diffusion model to complete various video editing tasks. The creators propose to first tune a text-to-set (T2S) model to complete an approximate inversion and then optimize a shared unconditional embedding to achieve accurate video inversion.
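
To make the idea concrete, here is a minimal, self-contained PyTorch sketch of what optimizing a shared unconditional embedding could look like, in the spirit of null-text inversion: a single embedding is tuned so the unconditional branch reproduces a recorded DDIM inversion trajectory for all frames at once. The ToyDenoiser stand-in and every name below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a text-to-set denoiser; in the paper this role is played
# by an adapted text-to-image UNet. Everything here is an illustrative
# assumption, not the authors' API.
class ToyDenoiser(nn.Module):
    def __init__(self, channels=4, emb_dim=8):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.emb_proj = nn.Linear(emb_dim, channels)

    def forward(self, z, t, emb):
        # Condition every frame on the same (broadcast) embedding.
        bias = self.emb_proj(emb).view(1, -1, 1, 1)
        return self.conv(z) + bias

def optimize_shared_embedding(model, ddim_latents, steps=50, lr=1e-2, emb_dim=8):
    """Tune ONE unconditional embedding, shared by all frames, so the
    unconditional branch reproduces the recorded DDIM inversion trajectory."""
    uncond = torch.zeros(emb_dim, requires_grad=True)
    opt = torch.optim.Adam([uncond], lr=lr)
    for _ in range(steps):
        loss = torch.zeros(())
        for t in range(len(ddim_latents) - 1):
            z_t, z_target = ddim_latents[t], ddim_latents[t + 1]
            loss = loss + F.mse_loss(model(z_t, t, uncond), z_target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return uncond.detach()

# Dummy usage: 8 frames, 4 latent channels, a 4-step recorded trajectory.
model = ToyDenoiser()
trajectory = [torch.randn(8, 4, 16, 16) for _ in range(4)]
shared_emb = optimize_shared_embedding(model, trajectory, steps=10)
```

Sharing one embedding across frames is what keeps the memory cost small: the optimization state does not grow with the number of frames.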

"For attention control, we introduce a novel decoupled-guidance strategy, which uses different guidance strategies for the source and target prompts. The optimized unconditional embedding for the source prompt improves reconstruction ability, while an initialized unconditional embedding for the target prompt enhances editability. Incorporating the attention maps of these two branches enables detailed editing."

These designs enable text-driven editing applications such as word swap, prompt refinement, and attention re-weighting. Video-P2P appears to work on real-world videos, generating new characters while preserving the original poses and scenes, though more examples are needed to fully verify the researchers' claims.
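
For readers unfamiliar with these Prompt-to-Prompt-style operations, the sketch below shows what word swap and attention re-weighting mean at the cross-attention level. Function names and shapes are illustrative assumptions, not the released API.

```python
import torch

# Cross-attention maps have shape (heads, pixels, tokens); editing one column
# edits one word's influence on the image. All names here are hypothetical.

def reweight_attention(attn, token_idx, factor):
    """Attention re-weighting: amplify or mute a single prompt token."""
    attn = attn.clone()
    attn[..., token_idx] *= factor
    return attn

def word_swap(attn_src, attn_tgt, t, crossover_t):
    """Word swap: inject the source maps early in sampling (high t) to keep
    the layout, then hand control to the target prompt's maps."""
    return attn_src if t > crossover_t else attn_tgt

# Dummy usage: 8 heads, a 64x64 latent flattened to 4096 pixels, 77 tokens.
maps = torch.softmax(torch.randn(8, 4096, 77), dim=-1)
muted = reweight_attention(maps, token_idx=5, factor=0.3)
```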

Meanwhile, you can read the paper and wait for the code here.
