Stanford University's Jonathan Tseng, Rodrigo Castellon, and C. Karen Liu have recently presented EDGE, an awesome new AI capable of generating smooth dance moves based on input music.
EDGE is powered by a transformer-based diffusion model paired with Jukebox, a strong music feature extractor that gives the AI an understanding of music, and was trained with physical realism in mind. Given an input track, it produces short videos of dance moves that the model judges to best match the music. On top of that, EDGE offers powerful editing capabilities well-suited to dance, including joint-wise conditioning, motion in-betweening, and dance continuation.
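These editing modes share a common idea: the user supplies a reference motion together with a binary mask marking which joints or frames are fixed, and the model fills in the rest. As a rough illustration (not EDGE's actual implementation; the function and array shapes here are hypothetical), a constraint like motion in-betweening can be sketched as overwriting the masked region with the reference at each generation step:

```python
import numpy as np

def apply_constraint(x, reference, mask):
    """Keep reference motion where mask == 1; keep generated motion elsewhere.

    x, reference: (frames, joints) motion arrays (shapes are illustrative).
    mask: 1 where the user fixes the motion (e.g. boundary frames for
    in-betweening, or specific joints for joint-wise conditioning).
    """
    return mask * reference + (1 - mask) * x

# Hypothetical in-betweening example: pin the first and last 10 frames
# of a 100-frame clip to a reference; the middle is left to the model.
frames, joints = 100, 24
mask = np.zeros((frames, joints))
mask[:10] = 1.0
mask[-10:] = 1.0
generated = np.random.randn(frames, joints)  # stand-in for model output
reference = np.zeros((frames, joints))       # stand-in for user-given motion
edited = apply_constraint(generated, reference, mask)
```

In a diffusion model, a constraint like this would be applied repeatedly during denoising rather than once at the end, but the masking principle is the same.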
"EDGE uses a frozen Jukebox model to encode input music into embeddings," comment the developers. "A conditional diffusion model learns to map the music embedding into a series of 5-second dance clips. At inference time, temporal constraints are applied to batches of multiple clips to enforce temporal consistency before stitching them into an arbitrary-length full video."
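The long-form generation step can be pictured as overlapping 5-second clips joined at their seams. EDGE enforces agreement on the overlapping frames during sampling itself; the sketch below only approximates the stitching stage with a simple linear cross-fade over an assumed overlap (clip lengths, frame rate, and the `stitch_clips` helper are all illustrative, not from the paper):

```python
import numpy as np

def stitch_clips(clips, overlap):
    """Concatenate motion clips, cross-fading linearly over `overlap` frames."""
    out = clips[0]
    for clip in clips[1:]:
        w = np.linspace(0.0, 1.0, overlap)[:, None]  # blend weights per frame
        blended = (1 - w) * out[-overlap:] + w * clip[:overlap]
        out = np.concatenate([out[:-overlap], blended, clip[overlap:]])
    return out

# Four 5-second clips at an assumed 30 fps, 24 joints each.
clips = [np.random.randn(150, 24) for _ in range(4)]
full = stitch_clips(clips, overlap=30)  # arbitrary-length result
```

Because each clip is generated to agree with its neighbors on the shared frames, the stitched sequence can be extended to any length without visible seams.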