Denoise, Divide, Distill, and Predict ($D^3P$): Towards Forecasting Long-horizon Real-world Anomaly from Normalcy
Quentin Mérilleau · Snehashis Majhi · Antitza Dantcheva · Quan Kong · Lorenzo Garattoni · Gianpiero Francesca · Francois Bremond
Abstract
Forecasting abnormal human behavior (AHB) in unconstrained real-world environments is critical for enabling proactive safety interventions. Unlike short-term anomaly detection, long-horizon forecasting offers a vital reaction window but remains underexplored due to three core challenges: (i) noisy, complex human–agent interactions; (ii) weak temporal coupling between normal observations and distant anomalies; and (iii) data scarcity, which limits the scalability of autoregressive models. To address these, we propose $\mathcal{D}^3\mathcal{P}$ (Denoise, Divide, Distill, and Predict), a novel encoder–decoder framework that bridges denoised pasts with distilled autoregressive futures. Our Differential Past Encoder (DiPE) disentangles scene-level and object-level dynamics via differential attention, suppressing irrelevant interactions and enhancing discriminative cues. The Distilled Future Auto-Regressive Decoder (D-FAD) adopts a divide-and-conquer strategy, segmenting future queries into temporal chunks for sequential prediction while leveraging distillation to balance robustness and latency. We validate our approach on the AHB-F benchmark, the only dataset dedicated to abnormal behavior forecasting, and further integrate D-FAD with several state-of-the-art methods. In all cases, our framework outperforms prior work in both forecasting accuracy and computational efficiency.
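The divide-and-conquer decoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' D-FAD implementation: the function names (`split_into_chunks`, `chunked_autoregressive_decode`, `mean_context_predictor`), the list-of-floats feature representation, and the toy predictor are all assumptions made for illustration, and distillation is not modeled. The sketch only shows the general pattern of splitting future queries into temporal chunks and predicting them sequentially, with each chunk conditioned on the past plus all earlier predictions.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical predictor signature: (context so far, queries of one chunk) -> predictions
ChunkPredictor = Callable[[List[Vector], List[Vector]], List[Vector]]


def split_into_chunks(queries: Sequence[Vector], chunk_size: int) -> List[List[Vector]]:
    """Split the future queries into consecutive temporal chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(queries[i:i + chunk_size]) for i in range(0, len(queries), chunk_size)]


def chunked_autoregressive_decode(
    past: Sequence[Vector],
    future_queries: Sequence[Vector],
    chunk_size: int,
    predict_chunk: ChunkPredictor,
) -> List[Vector]:
    """Predict the future chunk by chunk, feeding earlier outputs back as context."""
    context = list(past)            # starts as the encoded past only
    predictions: List[Vector] = []
    for chunk in split_into_chunks(future_queries, chunk_size):
        out = predict_chunk(context, chunk)
        predictions.extend(out)
        context.extend(out)         # later chunks see earlier predictions
    return predictions


def mean_context_predictor(context: List[Vector], chunk: List[Vector]) -> List[Vector]:
    """Toy stand-in for a learned decoder: query plus the mean of the context."""
    dim = len(chunk[0])
    mean = [sum(v[d] for v in context) / len(context) for d in range(dim)]
    return [[q[d] + mean[d] for d in range(dim)] for q in chunk]


if __name__ == "__main__":
    past = [[0.0], [4.0]]
    queries = [[1.0], [1.0], [1.0]]
    print(chunked_autoregressive_decode(past, queries, 2, mean_context_predictor))
```

In this toy run the third prediction differs from the first two because the second chunk is conditioned on the first chunk's outputs, which is the sequential dependency that chunked autoregressive decoding introduces. A learned decoder would replace `mean_context_predictor`.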