VAST-ReID: A Low-Light Benchmark Dataset for Person Re-Identification with Visual and Attribute-Rich Semantic Tracking
Abstract
The person re-identification (ReID) task is central to the design of intelligent surveillance systems, yet it becomes highly challenging in low-light and low-resolution scenarios. Existing ReID datasets predominantly feature cropped pedestrian images captured in well-lit environments and often lack semantic richness, frame-level temporal continuity, and robustness to adverse conditions. To address these limitations, we introduce VAST-ReID, a new benchmark dataset specifically designed for low-light person ReID in real-world surveillance contexts. VAST-ReID consists of 1,211 surveillance videos collected at 21 different locations, capturing 169 distinct pedestrians across various age groups, with an emphasis on naturally low-light and visually degraded scenes. Each identity is annotated with dense bounding boxes and enriched with auxiliary semantic labels, including pedestrian attributes and LLM-generated descriptions. Although these annotations are not used during supervised training, they provide valuable semantic context for advancing research in language-guided retrieval and attribute-aware modeling. Additionally, we release identity-aligned image crops as the BoxTrack-ReID subset, which contains over 14.3K frames sampled at 1 fps from the raw videos, together with standard training, gallery, and query splits compatible with the Market-1501 evaluation protocol, enabling straightforward benchmarking. We benchmark state-of-the-art (SOTA) methods on the dataset, and our experiments reveal substantial room for improvement in low-light ReID research.