This novel model can generate human motion based on an input text prompt.
Over the past few months, we have seen a flood of "text-to-something" AIs for seemingly any task imaginable, including text-to-video, text-to-material, and countless text-to-image programs capable of producing mind-blowing art pieces. With the field advancing and growing in popularity this quickly, it seems that any AI one can imagine is only a matter of time, and is probably already being developed by someone.
A group of researchers from Tel Aviv University has recently proved that point once again by unveiling the Motion Diffusion Model (MDM), a new diffusion-based generative model that acts as a text-to-motion AI, generating human motion from text inputs. According to the team, the model is trained with lightweight resources, yet it can turn text prompts into realistic animations of human figures performing various actions.
"MDM is transformer-based, combining insights from motion generation literature. A notable design choice is the prediction of the sample, rather than the noise, in each diffusion step. This facilitates the use of established geometric losses on the locations and velocities of the motion, such as the foot contact loss," commented the team. "As we demonstrate, MDM is a generic approach, enabling different modes of conditioning, and different generation tasks. We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion."