A Dataset and Framework for Learning State-invariant Object Representations
Abstract
We introduce state invariance alongside other common invariances for learning object representations for recognition and retrieval tasks. State invariance refers to robustness against changes in an object’s structural form, such as when an umbrella is folded or a clothing item is tossed on the floor. Humans recognize objects despite such changes, motivating the question of whether neural architectures can achieve similar robustness. To that end, we present ObjectsWithStateChange, a novel dataset designed to facilitate research in fine-grained 3D object recognition and retrieval of objects capable of state changes. The dataset captures variations in state and pose from arbitrary viewpoints to support learning discriminative embeddings that are invariant not only to state changes but also to variations in viewpoint, pose, and illumination.

A key challenge is that different objects, both within and across categories, may appear visually similar under certain state changes, causing their embeddings to lie close together and making discrimination difficult. To address this, we propose a curriculum learning strategy that leverages the similarity relationships learned after each epoch to guide training: it progressively selects object pairs with smaller inter-object distances, gradually sampling harder-to-distinguish examples of visually similar objects, both within and across categories, as training proceeds.

Our ablation study shows that this curriculum learning strategy improves object recognition accuracy by 7.9% and retrieval mAP by 9.2% over state-of-the-art methods. We believe this approach enhances the model’s ability to learn discriminative features for fine-grained tasks involving objects with state changes, leading to improved performance not only on the new dataset we present but also on three other multi-view datasets: ModelNet40, ObjectPI, and FG3D.
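To make the curriculum sampling step concrete, the following is a minimal sketch (in PyTorch) of how distance-driven pair selection with a tightening hardness schedule could be implemented. This is not the paper's released code: the function name `curriculum_pairs`, the linear hardness schedule, and all parameter values are illustrative assumptions.

```python
# A minimal sketch of the curriculum sampling idea described above: after each
# epoch, pairwise distances between learned per-object embeddings are used to
# pick training pairs, with the candidate pool shrunk over epochs so that
# harder (more visually similar) object pairs dominate later training.
# All names and the schedule are illustrative, not taken from the paper.
import torch

def curriculum_pairs(obj_embeddings: torch.Tensor,
                     epoch: int,
                     total_epochs: int,
                     pairs_per_epoch: int = 256) -> torch.Tensor:
    """Select object-index pairs, favoring smaller inter-object distances
    as training progresses.

    obj_embeddings: (num_objects, dim) mean embedding per object,
                    recomputed after each epoch.
    Returns: (pairs_per_epoch, 2) tensor of object index pairs.
    """
    # Pairwise Euclidean distances between object embeddings.
    dists = torch.cdist(obj_embeddings, obj_embeddings)   # (N, N)
    n = dists.size(0)
    iu = torch.triu_indices(n, n, offset=1)               # unique pairs i < j
    pair_dists = dists[iu[0], iu[1]]

    # Curriculum schedule (an assumed linear one): early epochs draw from all
    # pairs; later epochs restrict sampling to the closest (hardest) fraction.
    hard_fraction = 1.0 - 0.9 * (epoch / max(total_epochs - 1, 1))
    k = max(int(hard_fraction * pair_dists.numel()), pairs_per_epoch)
    candidate_idx = torch.argsort(pair_dists)[:k]         # smallest distances first

    # Sample uniformly among the current candidate (hard) pairs.
    chosen = candidate_idx[torch.randperm(candidate_idx.numel())[:pairs_per_epoch]]
    return torch.stack([iu[0][chosen], iu[1][chosen]], dim=1)
```

Under this sketch, the selected pairs would feed a metric-learning objective (e.g., a contrastive or triplet loss) in the next epoch, so the model sees progressively more confusable objects as its embeddings mature.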