ART: Actor-Related Tubelet for Detecting Complex-shaped Action Tubes
Abstract
This paper addresses the challenge of detecting complex-shaped action tubes in videos. Existing methods assume that an actor's position changes only slightly within a short video clip. They therefore oversimplify the shape of action tubes, representing them either as cuboids or as learnable positional patterns. However, these solutions may produce an action tube that loses its actor when the actor's trajectory becomes complex, because they rely solely on positional information to determine action tubes and cannot trace the same actor through intricate movement patterns. To address this issue, we propose Actor-related Tubelet (ART), which incorporates actor-specific information when generating action tubes. Regardless of the complexity of an actor's trajectory, ART ensures that an action tube consistently tracks the same actor by relying on actor-specific cues rather than on positional information alone. To assess ART’s effectiveness, we introduce a metric for quantifying tube shape complexity and evaluate ART on three mainstream datasets, MultiSports, UCF101-24, and JHMDB51-21, achieving substantial improvements.