SIAM: Synchronous Interaction Attention for Human Mesh Recovery
Abstract
Conventional 3D human mesh reconstruction methods often adopt decoupling strategies that isolate individual features for separate representation, discarding relational cues among entities. In this paper, we propose SIAM, a novel Synchronous Interaction Attention framework for Human Mesh Recovery. Our framework builds on a high-resolution multi-branch backbone (HRNet) and introduces two key components. First, Synchronous Interaction Attention (SIA) explicitly models spatial relational cues among multiple human instances in live scenes. Second, Feature Decomposition (FD) extracts enriched instance-specific features by leveraging the attributes captured by the SIA module. This integrated approach enhances spatial reasoning, mitigates error accumulation, and yields more accurate 3D human mesh reconstruction. SIAM achieves state-of-the-art performance on several 3D human mesh reconstruction benchmarks, including 3DPW, 3DPW-OCC, AGORA, and CMU-Panoptic. Notably, our model runs at 25 frames per second on video streams, highlighting its potential for real-time applications. The source code will be released publicly.
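The abstract describes SIA only at a high level: attention that shares spatial relational cues across the human instances in a scene. A minimal sketch of such cross-instance attention is shown below, assuming per-instance feature vectors; the function name, the random projections, and the residual update are our own illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def synchronous_interaction_attention(feats, d_k=None):
    """Hypothetical sketch: each of N instance feature vectors
    attends to all instances to gather relational context."""
    n, d = feats.shape
    d_k = d_k or d
    rng = np.random.default_rng(0)
    # Stand-ins for learned projections (random here for illustration).
    w_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    w_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    w_v = rng.standard_normal((d, d)) / np.sqrt(d)
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    scores = q @ k.T / np.sqrt(d_k)            # (N, N) pairwise relation scores
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)    # softmax over instances
    return feats + attn @ v                    # residual update with relational context

feats = np.random.default_rng(1).standard_normal((4, 8))  # 4 instances, 8-dim features
out = synchronous_interaction_attention(feats)
print(out.shape)
```

In this reading, each instance's feature is refined by a weighted sum over all instances, which is one plausible way to realize the "spatial relational cues among multiple human instances" the abstract attributes to SIA.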