This New Model Can Extract Motion Data From Internet Videos

Check out SMPLer-X, a novel generalist foundation model for expressive human pose and shape estimation.

Last week, a team of researchers officially introduced SMPLer-X, a generalist foundation model for expressive human pose and shape estimation (EHPS). The model can quickly and easily extract motion data from Internet videos, and that data can then be used to animate virtual characters.

Trained on 4.5M instances from diverse data sources, SMPLer-X delivers strong performance across various test benchmarks and demonstrates remarkable transferability to previously unexplored domains.

According to the research team, the model consistently achieves state-of-the-art results on seven benchmarks, including AGORA (107.2 mm NMVE), UBody (57.4 mm PVE), EgoBody (63.6 mm PVE), and EHF (62.3 mm PVE).
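For readers unfamiliar with the metric, PVE (per-vertex error) is commonly computed as the mean Euclidean distance between predicted and ground-truth mesh vertices, reported in millimeters. The snippet below is a minimal sketch of that idea, not code from the SMPLer-X repository: the function name and toy data are hypothetical, and it assumes both meshes are given in meters, share the same vertex order, and are already aligned. Individual benchmarks may apply their own alignment or normalization steps first.

```python
import math


def per_vertex_error_mm(pred, gt):
    """Mean Euclidean distance between matching vertices, in millimeters.

    Hypothetical helper for illustration: assumes (x, y, z) coordinates
    in meters, identical vertex ordering, and pre-aligned meshes.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("meshes must be non-empty and have the same vertex count")
    # Sum the per-vertex distances, then average and convert m -> mm.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return 1000.0 * total / len(pred)


# Toy data: two vertices, displaced by 3 cm and 5 cm respectively.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.03, 0.0, 0.0), (1.0, 0.05, 0.0)]
print(round(per_vertex_error_mm(pred, gt), 1))
```

Under this reading, a lower number means the predicted body mesh sits closer to the ground truth on average.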

"For the data scaling, we perform a systematic investigation on 32 EHPS datasets, encompassing a wide range of scenarios that a model trained on any single dataset cannot handle," commented the team. "More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities."

"For the model scaling, we take advantage of vision transformers to study the scaling law of model sizes in EHPS. Moreover, our finetuning strategy turns SMPLer-X into specialist models, allowing them to achieve further performance boosts."

Learn more about SMPLer-X here and access the code over here.
