MoFusion: Automating Human Motion Synthesis
The creators promise text-to-motion, motion completion, and zero-shot mixing of multiple control signals.
Researchers from DAMO Academy and Alibaba Group have presented MoFusion – a diffusion model for unified motion synthesis capable of text-to-motion generation, motion completion, and zero-shot mixing of multiple control signals.
The new generative model uses a Transformer backbone pretrained as a diffusion model, supporting synthesis tasks that range from completing the motion of a single body part to generating whole-body motion. According to the researchers, the results show that "pretraining is vital for scaling the model size without overfitting and demonstrate MoFusion's potential in various tasks."
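To make the setup concrete, here is a minimal sketch of how a Transformer backbone can be trained as a diffusion-based motion denoiser. The layer sizes, pose dimensionality, and function names are illustrative assumptions for this sketch, not MoFusion's actual configuration:

```python
# Illustrative sketch: a Transformer denoiser over pose sequences trained with a
# DDPM-style noise-prediction loss. Dimensions and names are assumptions, not the paper's.
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    def __init__(self, pose_dim=263, d_model=512, n_layers=8, n_heads=8):
        super().__init__()
        self.in_proj = nn.Linear(pose_dim, d_model)
        self.time_emb = nn.Sequential(
            nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model)
        )
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc_layer, n_layers)
        self.out_proj = nn.Linear(d_model, pose_dim)

    def forward(self, noisy_motion, t, cond_tokens=None):
        # noisy_motion: (batch, frames, pose_dim); t: (batch,) integer diffusion step
        h = self.in_proj(noisy_motion) + self.time_emb(t.float().unsqueeze(-1)).unsqueeze(1)
        if cond_tokens is not None:   # e.g. text or music embeddings, (batch, n, d_model)
            h = torch.cat([cond_tokens, h], dim=1)
        h = self.backbone(h)
        return self.out_proj(h[:, -noisy_motion.shape[1]:])   # predicted noise per frame

def training_step(model, motion, alphas_cumprod, cond_tokens=None):
    """One DDPM-style step: noise the clip at a random timestep and regress the noise."""
    b = motion.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=motion.device)
    a = alphas_cumprod[t].view(b, 1, 1)
    noise = torch.randn_like(motion)
    noisy = a.sqrt() * motion + (1 - a).sqrt() * noise
    return nn.functional.mse_loss(model(noisy, t, cond_tokens), noise)
```

In a design like this, conditioning signals such as text or music embeddings can be passed in as extra tokens, which is one way a single pretrained backbone can serve several downstream tasks.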
While there are plenty of tools for human motion synthesis, the creators aim to build a single model that handles several tasks. MoFusion covers a range of jobs, including:
- text-to-motion – generating motion based on text
- music-to-dance – creating dance motion from a piece of music
- motion in-betweening – generating intermediate frames between two keyframes (see the sketch after this list)
- inverse kinematics – finding a set of joint parameters subject to some constraints while maintaining natural poses
- modifying a body part – changing the movement of a body part in specified frames while retaining the content of the others
- mixing control signals – synthesizing a motion clip that matches both a text prompt and a piece of music
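For the completion-style items above (in-betweening and body-part editing), diffusion models are often applied as in-painting: at each denoising step the observed frames or joints are kept fixed and only the masked region is generated. The sketch below illustrates that general recipe under the same assumed denoiser as before; it is an illustration of the technique, not MoFusion's documented interface:

```python
# Illustrative in-painting sampler: the mask layout and the simplified DDIM-style
# update (eta = 0) are assumptions for demonstration, not the paper's exact procedure.
import torch

@torch.no_grad()
def inpaint_sample(model, known_motion, mask, alphas_cumprod, cond_tokens=None):
    # known_motion: (batch, frames, pose_dim); mask is 1 where values are observed, 0 where generated
    x = torch.randn_like(known_motion)
    for t in reversed(range(len(alphas_cumprod))):
        a = alphas_cumprod[t]
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        # keep the observed region consistent with the current noise level
        noisy_known = a.sqrt() * known_motion + (1 - a).sqrt() * torch.randn_like(known_motion)
        x = mask * noisy_known + (1 - mask) * x
        # deterministic update toward the previous step
        eps = model(x, t_batch, cond_tokens)
        x0_hat = (x - (1 - a).sqrt() * eps) / a.sqrt()
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.ones((), device=x.device)
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps
    return mask * known_motion + (1 - mask) * x

# In-betweening: the mask keeps the first and last frames; body-part editing: the mask
# keeps every joint except the ones being regenerated.
```

Constraint-driven tasks such as inverse kinematics are typically handled differently (for example, via guidance terms during sampling), so this sketch covers only the masking-based tasks.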
Find the research here.