Motion-Aware Graph Fusion Network for 3D Human Pose Estimation
Abstract
Recent state-of-the-art (SOTA) methods in 3D human pose estimation (HPE) typically focus on lifting 2D pose coordinates to 3D, but tend to underemphasize generalization to real-world conditions, where 2D inputs come from noisy off-the-shelf 2D detectors. In this paper, we introduce the Graph Attention Fusion Network (GAtFuN), a novel motion-aware framework that integrates our spatial and temporal graph attention mechanisms to explicitly model joint velocities and motion transformations, yielding more stable and coherent 3D pose predictions despite being trained with the same dataset pipeline as other SOTA methods. GAtFuN achieves a 7.8% MPJPE improvement over the current SOTA on the Human3.6M dataset and a 1.9% improvement on the MPI-INF-3DHP dataset, while demonstrating more robust in-the-wild performance on the 3DPW dataset.
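To make the motion-aware idea in the abstract concrete, the sketch below shows one plausible reading: joint velocities are computed as frame-to-frame differences of the 2D poses and concatenated with positions, then fused with a single-head, GAT-style attention restricted to skeleton edges. The toy 5-joint chain skeleton, all dimensions, and function names are illustrative assumptions, not the authors' actual GAtFuN architecture.

```python
# Hypothetical sketch of motion-aware graph attention for 2D-to-3D lifting.
# NOT the authors' GAtFuN implementation; shapes and the skeleton are toy choices.
import numpy as np

J, T = 5, 8  # joints per frame, frames per clip (illustrative)

# Toy chain skeleton adjacency with self-loops.
A = np.eye(J)
for i in range(J - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

def motion_features(poses):
    """poses: (T, J, 2) 2D coordinates -> (T, J, 4) positions || velocities.
    Velocity v_t = p_t - p_{t-1}; the first frame's velocity is zero."""
    vel = np.diff(poses, axis=0, prepend=poses[:1])
    return np.concatenate([poses, vel], axis=-1)

def graph_attention(x, A, W, a):
    """Single-head GAT-style layer masked to skeleton edges.
    x: (J, F) node features, W: (F, H) projection, a: (2H,) attention vector.
    Returns (J, H) fused node features."""
    h = x @ W                                              # project: (J, H)
    Hd = h.shape[1]
    # Additive attention logits e_ij = a[:H]·h_i + a[H:]·h_j.
    e = (h @ a[:Hd])[:, None] + (h @ a[Hd:])[None, :]      # (J, J)
    e = np.where(A > 0, e, -1e9)                           # mask non-edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha *= (A > 0)
    alpha /= alpha.sum(axis=1, keepdims=True)              # rows sum to 1
    return alpha @ h                                       # neighbor-weighted fusion

rng = np.random.default_rng(0)
poses = rng.standard_normal((T, J, 2))
W = rng.standard_normal((4, 8)) * 0.1
a = rng.standard_normal(16) * 0.1

feats = motion_features(poses)                 # (T, J, 4)
fused = np.stack([graph_attention(f, A, W, a) for f in feats])  # (T, J, 8)
```

A full lifter would follow this spatial fusion with temporal attention across the `T` frames and a regression head to 3D coordinates; this sketch only illustrates the velocity-feature and masked-attention ingredients.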