GorillaWatch: An Automated System for In-the-Wild Gorilla Re-Identification and Population Monitoring
Abstract
Monitoring critically endangered Western Lowland Gorillas is hampered by the manual effort required to re-identify individuals in large volumes of camera trap footage. The primary obstacle to automating this process has been the lack of large-scale, "in-the-wild" video datasets suitable for training robust deep learning models. To address this gap, we introduce a benchmark suite of three new datasets: Gorilla Wild, the largest in-the-wild video dataset for primate re-identification to date, designed for challenging cross-encounter evaluation; Gorilla Zoo, for assessing cross-domain generalization; and Gorilla Tracking, a meticulously annotated dataset for evaluating multi-object tracking. Building on these datasets, we present GorillaWatch, an end-to-end pipeline that integrates state-of-the-art detection, tracking, and re-identification. Our technical contributions include a multi-frame self-supervised pretraining strategy that exploits temporal consistency within tracklets to learn domain-specific features without manual labels. We systematically adapt and evaluate foundation models, video transformers, and ensemble techniques, achieving state-of-the-art performance on gorilla re-identification. Furthermore, we introduce a constrained clustering method that uses spatiotemporal metadata to perform unsupervised population counting. Finally, an adaptation of AttnLRP to representation learning provides interpretability, allowing us to verify that the model focuses on biologically meaningful traits.
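The idea behind constrained clustering for population counting can be illustrated with a minimal sketch. It is not the GorillaWatch method. The sketch swaps in average-linkage agglomerative clustering over cosine distances between tracklet embeddings, with one assumed spatiotemporal "cannot-link" rule: two tracklets from the same video that overlap in time must show different individuals. The class `Tracklet`, the functions `cannot_link` and `constrained_clustering`, and the `threshold` parameter are hypothetical names chosen for illustration. The number of resulting clusters serves as the population estimate.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass
class Tracklet:
    # Hypothetical tracklet record: source video, time span (seconds), embedding.
    video_id: str
    start: float
    end: float
    embedding: list


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cannot_link(t1, t2):
    # Assumed constraint: tracklets overlapping in time within the same video
    # are visible simultaneously, so they cannot be the same individual.
    return t1.video_id == t2.video_id and t1.start < t2.end and t2.start < t1.end


def constrained_clustering(tracklets, threshold=0.5):
    n = len(tracklets)
    dist = [[cosine_distance(tracklets[i].embedding, tracklets[j].embedding)
             for j in range(n)] for i in range(n)]
    forbidden = {(i, j) for i, j in combinations(range(n), 2)
                 if cannot_link(tracklets[i], tracklets[j])}
    clusters = [[i] for i in range(n)]  # start with one cluster per tracklet
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # Skip any merge that would join a cannot-link pair.
            if any((min(i, j), max(i, j)) in forbidden for i in ca for j in cb):
                continue
            # Average-linkage distance between the two clusters.
            d = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if d < threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:  # no admissible merge below the threshold remains
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: a < b, so index a is unaffected
    return clusters


tracks = [
    Tracklet("v1", 0, 10, [1.0, 0.0]),
    Tracklet("v1", 5, 15, [0.99, 0.1]),  # looks similar but overlaps track 0
    Tracklet("v2", 0, 10, [1.0, 0.05]),
    Tracklet("v2", 20, 30, [0.0, 1.0]),
]
clusters = constrained_clustering(tracks, threshold=0.3)
print(clusters)  # → [[0, 2], [1], [3]]
print("estimated population:", len(clusters))
```

Without the cannot-link rule, tracklet 1 would merge with tracklets 0 and 2 because its appearance is nearly identical. The spatiotemporal constraint keeps it separate, so the count rises from two to three individuals. This shows how metadata can correct appearance-only clustering.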