Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos
Abstract
Game State Reconstruction (GSR) aims to reconstruct the 2D positions and identities of all athletes from broadcast soccer videos, which requires robust tracking, localization, and identity association under dynamic, unconstrained camera motion. We propose a modular GSR framework that integrates a multi-task keypoint and line detection model with an optimization-based homography estimation module. The framework uses dense geometric cues from lines, circles, and keypoints to localize each frame independently, providing reliable alignment across diverse broadcast scenarios. To maintain identity consistency, we combine appearance-based re-identification with a vision-language-guided tracklet refinement strategy that reduces ID switches and enforces temporal coherence. Comprehensive ablation studies validate the contribution of each component, and our framework achieves state-of-the-art performance on the SoccerNet-GSR benchmark, outperforming existing baselines by a significant margin. The results indicate that the framework is robust, generalizes across scenes, and is practical for structured game understanding in real-world broadcast sports analytics.
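To make the localization step concrete, the following is a minimal sketch of how a per-frame homography can map image detections to 2D pitch coordinates. It is not the optimization-based module proposed here: it substitutes a plain direct linear transform (DLT) over point correspondences and ignores line and circle cues. The function names, the pixel coordinates, and the 105 m x 68 m pitch size are illustrative assumptions.

```python
import numpy as np


def estimate_homography(image_pts, pitch_pts):
    """Estimate a 3x3 homography from >= 4 point correspondences via DLT.

    Illustrative stand-in only: the framework described above uses an
    optimization over keypoints, lines, and circles, not this solver.
    """
    image_pts = np.asarray(image_pts, dtype=float)
    pitch_pts = np.asarray(pitch_pts, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(image_pts, pitch_pts):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    # Fix the scale; assumes H[2, 2] is not (near) zero.
    return H / H[2, 2]


def project_to_pitch(H, image_pts):
    """Map (N, 2) image points to pitch coordinates using homography H."""
    pts = np.asarray(image_pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    # Divide by the homogeneous coordinate to get 2D positions.
    return mapped[:, :2] / mapped[:, 2:3]


if __name__ == "__main__":
    # Hypothetical pixel locations of the four pitch corners in one frame.
    image_corners = [(210, 180), (1710, 190), (1890, 960), (30, 950)]
    # Assumed pitch size of 105 m x 68 m, origin at one corner.
    pitch_corners = [(0, 0), (105, 0), (105, 68), (0, 68)]
    H = estimate_homography(image_corners, pitch_corners)

    # Hypothetical foot point of a detected player (bottom-centre of its box).
    player_feet = [(960, 600)]
    print(project_to_pitch(H, player_feet))
```

In a full pipeline, the projected foot points would then be passed to the tracking and identity-association stages. In practice, a robust estimator and coordinate normalization would be preferable to this bare DLT when detections are noisy.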