An improved architecture for part-based animal re-identification through semantic segmentation distillation
Eugênio Dias Ribeiro Neto · Marc Chaumont · Gérard Subsol · Michel Garine-Wichatitsky · Hélène Guis
Abstract
Wildlife re-identification (Re-ID) is critical for non-invasive monitoring. Yet animal Re-ID performance remains far behind that of person Re-ID, owing to limited datasets and greater fine-grained appearance variability between individuals. One strategy is to adopt part-based methods, which attend more precisely to distinct anatomical regions. To adapt this strategy to animal Re-ID, we propose PAW-ViT (Part-AWare animal re-identification Vision Transformer), a ViT that replaces the standard classification token with $K$ learnable part tokens, each specialized to a specific anatomical region of the animal. Spatial specialization is achieved via feature-based knowledge distillation: each token's attention over image patches is trained to produce a semantic segmentation mask. An additional aggregation token fuses the part embeddings into a single part-aware descriptor. Trained with a multi-task loss, PAW-ViT outperforms state-of-the-art animal Re-ID methods on ATRW (Amur tigers) and YakREID-103 (yaks), particularly under strong viewpoint variation, as in the cross-camera setting.
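To make the idea of supervising part-token attention with a segmentation mask concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the PAW-ViT implementation: the scaled dot-product scoring, the per-patch softmax over the $K$ part tokens, the cross-entropy form of the distillation term, and all names (`part_assignment`, `distillation_loss`, the toy vectors) are hypothetical choices that the abstract does not specify.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def part_assignment(part_tokens, patch_feats):
    """For each patch, return a distribution over the K part tokens.

    Assumption: scaled dot-product scores between part tokens and patch
    features, normalized across tokens for every patch.
    """
    scale = 1.0 / math.sqrt(len(part_tokens[0]))
    probs = []
    for patch in patch_feats:
        logits = [dot(tok, patch) * scale for tok in part_tokens]
        probs.append(softmax(logits))
    return probs

def distillation_loss(probs, seg_labels):
    """Mean cross-entropy between per-patch part distributions and the
    part label assigned to each patch by a (hypothetical) teacher mask."""
    eps = 1e-12
    total = sum(math.log(p[y] + eps) for p, y in zip(probs, seg_labels))
    return -total / len(seg_labels)

# Toy setup: K = 2 part tokens, 3 patches, feature dimension 2.
part_tokens = [[1.0, 0.0], [0.0, 1.0]]
patch_feats = [[4.0, 0.0], [0.0, 4.0], [3.0, 1.0]]
seg_labels = [0, 1, 0]  # teacher's part index for each patch

probs = part_assignment(part_tokens, patch_feats)
loss = distillation_loss(probs, seg_labels)

# Reading the attention back as a mask: the most likely part per patch.
predicted_mask = [max(range(len(p)), key=p.__getitem__) for p in probs]
```

In a multi-task setting, a term like `loss` would be added to the identification objectives with some weighting. The abstract does not state the weights or the exact terms, so that combination is left out here.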