RAT4D: Rig and Animate Objects without Surface Templates in 4D
Abstract
We present a surface-template-free method for reconstructing dense, rigged (re-animatable) 3D models from monocular videos. By combining robust pose optimization with differentiable Gaussian splatting, our approach bridges the gap between flexible template-free methods and the visual quality of template-based ones. Starting from noisy 2D keypoints, we refine 3D poses under kinematic and temporal constraints, then attach Gaussian primitives to the optimized skeleton for differentiable supervision and rendering. We demonstrate reconstructions across humans, animals, insects, and everyday articulated objects, and show empirically that closing the rendering–pose loop improves 3D lifting from noisy landmarks. A key enabler of these template-free reconstructions is our kinematic optimization, which reduces 3D pose error by 20–25\% relative to template-free baselines, while our rendering quality approaches that of template-based methods (PSNR/SSIM within 5\%). We further demonstrate practical utility by detecting and correcting geometric inconsistencies in AI-generated videos. Although limited to articulated subjects with detectable keypoints, the approach offers a drop-in refinement that improves 3D lifting in existing pipelines and enables the creation of rigged 3D assets from casual captures when expensive surface templates (e.g., MoCap-derived) are unavailable.
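To make the optimization structure described above concrete, the following is a minimal, self-contained sketch assuming a PyTorch environment and a toy planar kinematic chain; all names (forward_kinematics, refine_poses, the weights and iteration counts) are illustrative assumptions rather than the paper's actual implementation, and the differentiable Gaussian-splatting photometric term of the full method is indicated only as a comment.

```python
# Toy sketch: refine noisy keypoint observations under a kinematic
# parameterization (fixed bone lengths via forward kinematics) plus a
# temporal-smoothness penalty. Illustrative only, not the paper's code.
import torch

def forward_kinematics(angles, bone_lengths):
    """Toy planar chain: cumulative joint angles -> joint positions.
    angles: (T, J), bone_lengths: (J,) -> joints: (T, J, 2)."""
    cum = torch.cumsum(angles, dim=-1)                        # (T, J)
    seg = torch.stack([torch.cos(cum), torch.sin(cum)], -1)   # (T, J, 2)
    return torch.cumsum(seg * bone_lengths[None, :, None], dim=1)

def refine_poses(keypoints, bone_lengths, iters=300, w_temp=0.1):
    """Gradient-based refinement of joint angles from noisy keypoints."""
    T, J, _ = keypoints.shape
    angles = torch.zeros(T, J, requires_grad=True)
    opt = torch.optim.Adam([angles], lr=5e-2)
    for _ in range(iters):
        joints = forward_kinematics(angles, bone_lengths)
        loss_kp = (joints - keypoints).square().mean()         # data term
        loss_tm = (joints[1:] - joints[:-1]).square().mean()   # smoothness
        # Full method: add a photometric loss from splatting
        # skeleton-attached Gaussians and comparing renders to frames.
        loss = loss_kp + w_temp * loss_tm
        opt.zero_grad(); loss.backward(); opt.step()
    return angles.detach()

if __name__ == "__main__":
    # Synthetic example: noisy observations of a 3-joint chain motion.
    T, J = 20, 3
    bones = torch.tensor([1.0, 0.8, 0.5])
    gt = 0.3 * torch.sin(torch.linspace(0, 3, T))[:, None].repeat(1, J)
    noisy = forward_kinematics(gt, bones) + 0.05 * torch.randn(T, J, 2)
    refined = refine_poses(noisy, bones)
```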