Human Pose Aggregation for Multi-View Temporal Video Alignment
Fabien Delattre · Tsung-Wei Huang · Guan-Ming Su · Erik Learned-Miller
Abstract
When multiple videos of a scene are captured from different viewpoints without precise synchronization, it can be difficult to align them temporally after the fact, since the metadata or audio needed to do so is often missing or inaccurate. Human motion in such videos, however, provides a strong signal for identifying matching time points across views through analysis of pose and movement. In this work, we leverage view-invariant human pose features to synchronize videos. Unlike previous human pose-based alignment techniques, our method can align videos containing multiple people without performing tracking or re-identification across views. We achieve this by aggregating pose information from all people in a frame into a single frame descriptor, which also enables a fast $\mathcal{O}(n\log n)$ search for the optimal alignment. This simple but effective strategy yields substantial and consistent improvements over existing pose-based and visual-feature-based temporal alignment methods.
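To make the aggregation and alignment steps concrete, the following is a minimal sketch, not the paper's exact formulation: it assumes per-person view-invariant pose embeddings are mean-pooled into a frame descriptor, and that the optimal time offset is found by FFT-based cross-correlation, which is one standard way to obtain the stated $\mathcal{O}(n\log n)$ search. The pooling choice, the correlation objective, and all function names here are illustrative assumptions.

```python
import numpy as np


def frame_descriptor(pose_embeddings: np.ndarray) -> np.ndarray:
    """Aggregate per-person pose embeddings (shape: people x D) into a
    single frame descriptor (shape: D) by mean pooling. Pooling removes
    the need for tracking or re-identification across views.
    (Mean pooling is an assumption for illustration.)"""
    return pose_embeddings.mean(axis=0)


def best_offset(desc_a: np.ndarray, desc_b: np.ndarray) -> int:
    """Return the integer frame offset s maximizing the cross-correlation
    between two frame-descriptor sequences (shape: n x D), computed with
    FFTs in O(n log n). Positive s means desc_b lags desc_a by s frames."""
    n = max(len(desc_a), len(desc_b))
    # Center each sequence, then zero-pad to length 2n so the circular
    # correlation computed by the FFT matches a linear correlation.
    a = desc_a - desc_a.mean(axis=0)
    b = desc_b - desc_b.mean(axis=0)
    A = np.fft.rfft(a, n=2 * n, axis=0)
    B = np.fft.rfft(b, n=2 * n, axis=0)
    # irfft(conj(A) * B)[k] = sum_t <a[t], b[t + k]>, summed over the
    # descriptor dimension D.
    corr = np.fft.irfft(np.conj(A) * B, n=2 * n, axis=0).sum(axis=1)
    # Map FFT bin indices 0..2n-1 to signed lags 0..n, -(n-1)..-1.
    lags = np.concatenate([np.arange(n + 1), np.arange(-n + 1, 0)])
    return int(lags[np.argmax(corr)])
```

As a usage check, if `desc_b` is `desc_a` delayed by 12 frames, `best_offset(desc_a, desc_b)` returns 12 (up to boundary effects from the zero-padding).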