AutoKeyframe

Autoregressive Keyframe Generation for Human Motion Synthesis and Editing

Bowen Zheng1, Ke Chen1, Yuxin Yao2, Zijiao Zeng3, Xinwei Jiang3,
He Wang4, Joan Lasenby2, Xiaogang Jin1,5*
1State Key Lab of CAD&CG, Zhejiang University
2University of Cambridge    3Tencent Games    4University College London
5ZJU-Tencent Game and Intelligent Graphics Innovation Technology Joint Lab
*Corresponding Author
ACM SIGGRAPH 2025
Teaser Image

Abstract

Keyframing has long been the cornerstone of standard character animation pipelines, offering precise control over detailed postures and dynamics. However, this approach is labor-intensive, necessitating significant manual effort. Automating this process while balancing the trade-off between minimizing manual input and maintaining full motion control has therefore been a central research challenge. In this work, we introduce AutoKeyframe, a novel framework that simultaneously accepts dense and sparse control signals for motion generation by generating keyframes directly. Dense signals govern the overall motion trajectory, while sparse signals define critical key postures at specific timings. This approach substantially reduces manual input requirements while preserving precise control over motion. The generated keyframes can be easily edited to serve as detailed control signals. AutoKeyframe operates by automatically generating keyframes from dense root positions, which can be determined through arc-length parameterization of the trajectory curve. This process is powered by an autoregressive diffusion model, which facilitates keyframe generation and incorporates a skeleton-based gradient guidance technique for sparse spatial constraints and frame editing. Extensive experiments demonstrate the efficacy of AutoKeyframe, achieving high-quality motion synthesis with precise and intuitive control.
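To make the arc-length parameterization mentioned above concrete, here is a minimal NumPy sketch that resamples a user-drawn trajectory polyline into dense root positions spaced evenly along the curve. The function name and details are illustrative only, not taken from the paper's code:

import numpy as np

def arc_length_resample(points: np.ndarray, num_samples: int) -> np.ndarray:
    """points: (N, 3) polyline vertices; returns (num_samples, 3) positions
    spaced at equal arc-length intervals along the polyline."""
    seg = np.diff(points, axis=0)                      # (N-1, 3) segment vectors
    seg_len = np.linalg.norm(seg, axis=1)              # (N-1,) segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])  # cumulative arc length
    targets = np.linspace(0.0, cum[-1], num_samples)   # equally spaced arc lengths
    out = np.empty((num_samples, points.shape[1]))
    for i, s in enumerate(targets):
        j = np.searchsorted(cum, s, side="right") - 1  # segment containing s
        j = min(j, len(seg_len) - 1)
        t = (s - cum[j]) / max(seg_len[j], 1e-8)       # fraction along segment j
        out[i] = points[j] + t * seg[j]
    return out

# Example: resample a quarter-circle trajectory into 60 dense root positions.
theta = np.linspace(0.0, np.pi / 2, 10)
traj = np.stack([np.cos(theta), np.zeros_like(theta), np.sin(theta)], axis=1)
dense_roots = arc_length_resample(traj, 60)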

Video

Pipeline

Given a complete root trajectory of length L, an action label, and sparse spatial constraints as control input, our method generates a sequence of motion keyframes, each located at a user-specified point on the trajectory. This keyframe sequence can then be completed into high-quality motion and serves as a solid foundation for artists to edit. To accomplish this, we train an autoregressive keyframe diffusion model that takes as input the previous keyframe, the action label, and various control signals derived from the trajectory, and learns the conditional distribution of the next keyframe. To enable accurate control and precise editing of the motion, we propose a skeleton-based gradient guidance approach that makes keyframe generation adhere to flexible spatial constraints. To further improve generation quality, we construct a motion keyframe dataset using an adaptive keyframe selection method based on deep reinforcement learning.
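To make this concrete, the sketch below shows one way such an autoregressive sampling loop with skeleton-based gradient guidance could look. It assumes a DDPM-style denoiser model(x, t, prev_kf, action, root_pos) that predicts the clean keyframe, a differentiable forward-kinematics function fk, and a precomputed cumulative noise schedule alpha_bar; all of these names are assumptions for illustration, not the paper's API:

import torch

@torch.no_grad()
def generate_keyframes(model, fk, prev_kf, traj_points, action,
                       constraints, alpha_bar, steps=50, scale=100.0):
    keyframes = []
    for k, root_pos in enumerate(traj_points):   # one keyframe per trajectory point
        x = torch.randn_like(prev_kf)            # start from Gaussian noise
        for t in reversed(range(steps)):
            # Denoise, conditioned on the previous keyframe, the action
            # label, and the target root position on the trajectory.
            x0 = model(x, t, prev_kf, action, root_pos)

            # Skeleton-based gradient guidance: nudge the predicted keyframe
            # so its forward-kinematics joint positions meet any sparse
            # spatial constraints attached to this keyframe.
            cons = constraints.get(k, {})        # {joint index: target (3,)}
            if cons:
                with torch.enable_grad():
                    xg = x0.detach().requires_grad_(True)
                    joints = fk(xg)              # (J, 3) world-space joints
                    loss = sum((joints[j] - p).pow(2).sum()
                               for j, p in cons.items())
                    grad, = torch.autograd.grad(loss, xg)
                x0 = x0 - scale * grad

            if t > 0:                            # crude "predict x0, re-noise" step
                ab_prev = alpha_bar[t - 1]
                x = ab_prev.sqrt() * x0 + (1.0 - ab_prev).sqrt() * torch.randn_like(x)
            else:
                x = x0
        prev_kf = x                              # autoregress on the new keyframe
        keyframes.append(x)
    return torch.stack(keyframes)

In the paper, the same skeleton-based guidance serves both sparse spatial constraints and frame editing, since a user-adjusted joint position is simply another spatial constraint.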

Pipeline Diagram
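The dataset construction step selects keyframes with a deep reinforcement learning policy. Purely as intuition for the selection objective, here is a far simpler greedy stand-in (explicitly not the paper's method): repeatedly add the frame that piecewise-linear interpolation between the current keyframes reconstructs worst.

import numpy as np

def greedy_keyframes(motion: np.ndarray, k: int) -> list[int]:
    """motion: (T, D) pose features. Greedily pick k keyframes so that
    linearly interpolating between them reconstructs the clip well.
    (A simple heuristic stand-in, NOT the paper's RL-based selector.)"""
    T = motion.shape[0]
    keys = [0, T - 1]                           # always keep the endpoints
    while len(keys) < k:
        keys.sort()
        recon = np.empty_like(motion)
        for a, b in zip(keys[:-1], keys[1:]):   # piecewise-linear reconstruction
            w = np.linspace(0.0, 1.0, b - a + 1)[:, None]
            recon[a:b + 1] = (1 - w) * motion[a] + w * motion[b]
        err = np.linalg.norm(recon - motion, axis=1)
        keys.append(int(err.argmax()))          # add the worst-reconstructed frame
    return sorted(keys)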

Results

We generate keyframe sequences and complete them into full motion sequences using existing motion completion methods. We first generate examples using only the root trajectory (yellow spheres) as control input (shown on the left). We then impose additional sparse constraints (green spheres) to edit the motion (shown on the right). The modified part of the motion is highlighted in blue.

Action Label: Fight

In this example, we impose two spatial constraints, on the foot and the wrist, to make the character perform an expressive kick and punch.

First generated

Edited

Action Label: Obstacles

In this example, we fix the penetration and floating artifacts in the first generated result, enabling the character to perform complex interactions with the environment by imposing only four spatial constraints.

First generated

Edited

Comparison - Motion Generation

We compare our method with MDM, HGHOI, and OmniControl for motion generation under trajectory control.

Action Label: Aiming

MDM

HGHOI

OmniControl

Ours

Action Label: FallAndGetUp

MDM

HGHOI

OmniControl

Ours

Action Label: Run

MDM

HGHOI

OmniControl

Ours

Comparison - Motion Generation under Mixed Control

We compare our method with OmniControl for motion generation under mixed control: a dense root trajectory plus sparse spatial constraints on specific joints. We find that, although OmniControl accepts mixed control signals as input, its results often neglect the sparse constraints and prioritize the trajectory control, while our method accommodates both.

Action Label: Aiming

OmniControl

Ours

Action Label: Obstacles

OmniControl

Ours

BibTeX

If you find this project helpful to your research, please consider citing:
@inproceedings{autokeyframe_sig25,
  author    = {Zheng, Bowen and Chen, Ke and Yao, Yuxin and Zeng, Zijiao and Jiang, Xinwei and Wang, He and Lasenby, Joan and Jin, Xiaogang},
  title     = {AutoKeyframe: Autoregressive Keyframe Generation for Human Motion Synthesis and Editing},
  year      = {2025},
  isbn      = {9798400715402},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3721238.3730680},
  doi       = {10.1145/3721238.3730680},
  booktitle = {ACM SIGGRAPH 2025 Conference Proceedings},
  numpages  = {12},
  location  = {Vancouver, BC, Canada},
  series    = {SIGGRAPH '25}
}