Experiment on Text-Driven Video Editing with Stable Diffusion

The model can change nearly anything.

Take a look at this experiment with optimizing Neural Atlases through Stable Diffusion. Omer Bar Tal's video shows that you can change almost anything in a clip and still get a clean, temporally consistent result. The experiment is based on "Layered Neural Atlases for Consistent Video Editing", a paper presenting a method that decomposes an input video into a set of layered 2D atlases.

"For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases, giving us a consistent parameterization of the video, along with an associated alpha (opacity) value." 

Edits applied to a 2D atlas are mapped back to the original video frames, preserving occlusions, deformations, and other scene effects such as shadows and reflections.
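To make that mapping concrete, here is a minimal sketch of how an edit painted onto a 2D atlas could be propagated back to every frame, assuming the per-pixel atlas coordinates and alpha maps have already been estimated. The function and tensor names are illustrative placeholders, not the authors' actual code:

```python
import torch
import torch.nn.functional as F

def apply_atlas_edit(frames, uv_fg, uv_bg, alpha, edited_fg_atlas, edited_bg_atlas):
    """Propagate an edit made on the 2D atlases back to every video frame.

    frames:          (T, 3, H, W) original frames (only used for the frame count)
    uv_fg, uv_bg:    (T, H, W, 2) per-pixel coordinates into each atlas, in [-1, 1]
    alpha:           (T, 1, H, W) per-pixel foreground opacity
    edited_*_atlas:  (3, Ha, Wa) edited foreground / background atlases
    returns:         (T, 3, H, W) edited video
    """
    T = frames.shape[0]
    # Sample the edited atlases at each pixel's estimated atlas coordinate,
    # so the edit follows the object's motion and deformation across frames.
    fg = F.grid_sample(edited_fg_atlas.expand(T, -1, -1, -1), uv_fg, align_corners=True)
    bg = F.grid_sample(edited_bg_atlas.expand(T, -1, -1, -1), uv_bg, align_corners=True)
    # Alpha-composite foreground over background so occlusions are preserved.
    return alpha * fg + (1 - alpha) * bg
```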

Omer Bar Tal has worked on a similar project called "Text2LIVE: Text-Driven Layered Image and Video Editing", where the researchers introduced a method for zero-shot, text-driven appearance manipulation in natural images and videos.

The method combines an input image or video with a target text prompt to edit the appearance of existing objects or augment the scene with new visual effects "in a semantically meaningful manner." The key idea is to generate an edit layer that is composited over the original input, which makes it possible to constrain the generation process and maintain high fidelity to the original input via novel text-driven losses applied directly to the edit layer.
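As a rough illustration of that idea, the sketch below composites a hypothetical RGBA edit layer over the input and scores the result against a text prompt with CLIP. The compositing step, resizing, and cosine-similarity loss here are simplified placeholders, not the actual Text2LIVE losses or architecture:

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def composite(image, edit_rgb, edit_alpha):
    # Blend the generated edit layer over the original input; the alpha channel
    # keeps untouched regions identical to the source image.
    return edit_alpha * edit_rgb + (1 - edit_alpha) * image

def text_driven_loss(composited, prompt):
    # Score the composited result against the target text in CLIP space.
    # (CLIP's input normalization is omitted here for brevity.)
    resized = F.interpolate(composited, size=(224, 224), mode="bilinear", align_corners=False)
    image_features = clip_model.encode_image(resized)
    text_features = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # 1 - cosine similarity: lower means a closer match to the prompt.
    return 1 - (image_features * text_features).sum(dim=-1).mean()
```

Because the loss is applied to the composited output while only the edit layer is generated, regions the edit layer leaves transparent stay faithful to the original footage.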

You can find more fascinating results here. Also, don't forget to join our Reddit page and our Telegram channel, and follow us on Instagram and Twitter, where we share breakdowns, the latest news, awesome artworks, and more.
