Xinwei Yao, Ohad Fried, Kayvon Fatahalian, and Maneesh Agrawala from Stanford University and IDC Herzliya presented a text-based tool for editing talking-head video that supports an iterative editing workflow. On each iteration, users can edit the wording of the speech, refine the synthesized mouth motions to reduce artifacts if necessary, and manipulate non-verbal aspects of the performance by inserting mouth gestures (e.g., a smile) or changing the overall performance style (e.g., energetic, mumbled). The entire workflow is shown in the example video.
The tool requires only 2–3 minutes of video of the target actor, and it synthesizes the result of each edit in about 40 seconds, allowing users to quickly explore many editing possibilities as they iterate.
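To make the loop concrete, here is a minimal Python sketch of what such an iterative editing session could look like. The `TalkingHeadEditor` class, its method names, and the file names are hypothetical illustrations of the workflow described above, not the authors' actual API or implementation.

```python
# A minimal sketch of the iterative text-based editing loop.
# NOTE: the talking_head_editor package, the TalkingHeadEditor class,
# and all method names below are hypothetical illustrations.
from talking_head_editor import TalkingHeadEditor  # hypothetical package

# Register the target actor from a short reference clip (~2-3 minutes).
editor = TalkingHeadEditor(reference_video="actor_reference.mp4")

# Iteration 1: edit the wording of the speech via the transcript.
edit = editor.edit_transcript(
    old_text="our quarterly results were mixed",
    new_text="our quarterly results were strong",
)

# Iteration 2: refine the synthesized mouth motions to reduce artifacts.
edit.refine_mouth_motion(strength=0.5)

# Iteration 3: adjust non-verbal aspects of the performance.
edit.insert_gesture("smile", at_word="strong")
edit.set_performance_style("energetic")

# Each synthesis pass takes roughly 40 seconds, keeping the loop interactive.
video = edit.synthesize()
video.save("edited_take.mp4")
```

Each call to `synthesize()` in this sketch corresponds to one iteration of the workflow: the roughly 40-second turnaround is what makes trying many wording, gesture, and style variations practical in a single session.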