Oral Session 2B: Biometrics, Face, Gesture, and Body Pose I
Identity Verification from Human Scent using Channel Representation of 2D Gas Chromatography-Mass Spectrometry Data
Radim Spetlik ⋅ Jan Hlavsa ⋅ Jana Čechová ⋅ Petra Pojmanová ⋅ Jiri Matas ⋅ Štěpán Urban
This study examines the feasibility of using raw two-dimensional gas chromatography/time-of-flight mass spectrometry (GCxGC ToF-MS) data for human scent identity verification. Unlike techniques that require expert-driven identification of compounds, our framework transforms each GCxGC sample into a multi-channel image. We comprehensively assess ten channel-encoding schemes, five spatial-alignment strategies, and ten feature-embedding methods. The evaluation is performed on a newly assembled dataset of 252 individuals, comprising 2,528 raw samples and totaling around 7.5 TB of data. In contrast to conventional methodologies employed in chemical analysis, our research demonstrates that alignment to a common spatial reference frame is unnecessary. The best-performing method reaches an approximately 53% true positive rate at a 5% false positive rate. Although this performance is below that of well-established biometrics (e.g., iris verification), our results underscore the feasibility of raw-odor-based verification for scenarios where direct line-of-sight or cooperation may be limited, thereby revealing opportunities for interdisciplinary research. We will release the code and datasets with the camera-ready version.
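For readers unfamiliar with the reported metric, the following is a minimal sketch of how a true positive rate at a fixed false positive rate can be computed from verification scores. It is not the authors' evaluation code. The function name `tpr_at_fpr`, the convention that a pair is accepted when its score is strictly above the threshold, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np

def tpr_at_fpr(genuine_scores, impostor_scores, target_fpr=0.05):
    """Return (tpr, fpr, threshold) at the operating point whose false
    positive rate does not exceed target_fpr.

    Hypothetical helper: higher scores are assumed to mean "same identity",
    and a pair is accepted when its score is strictly above the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    if genuine.size == 0 or impostor.size == 0:
        raise ValueError("both score sets must be non-empty")

    # Largest number of impostor pairs we may accept without exceeding target_fpr.
    allowed = int(np.floor(target_fpr * impostor.size))

    # Sort impostor scores from highest to lowest; placing the threshold at the
    # (allowed + 1)-th highest score accepts at most `allowed` impostor pairs.
    descending = np.sort(impostor)[::-1]
    threshold = descending[allowed] if allowed < impostor.size else -np.inf

    tpr = float(np.mean(genuine > threshold))
    fpr = float(np.mean(impostor > threshold))
    return tpr, fpr, float(threshold)
```

Under these assumptions, a result of roughly 53% TPR at 5% FPR means that, at a threshold letting through about one in twenty impostor comparisons, about half of the genuine comparisons are accepted.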
milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
Niraj Prakash Kini ⋅ Shiau-Rung Tsai ⋅ Guan-Hsun Lin ⋅ Wen-Hsiao Peng ⋅ Ching-Wen Ma ⋅ Jenq-Neng Hwang
Millimeter-wave radar offers a privacy-preserving and lighting-invariant alternative to RGB sensors for human pose estimation (HPE). However, radar signals are often sparse due to specular reflection, which makes extracting robust features from them highly challenging. To address this, we present milliMamba, a radar-based 2D human pose estimation framework that jointly models spatio-temporal dependencies across both the feature extraction and decoding stages. Specifically, given the high dimensionality of radar inputs, we adopt a Cross-View Fusion Mamba encoder to efficiently extract spatio-temporal features from longer sequences with linear complexity. A Spatio-Temporal-Cross Attention decoder then predicts joint coordinates across multiple frames. Together, this spatio-temporal modeling pipeline enables the model to leverage contextual cues from neighboring frames and joints to infer missing joints caused by specular reflections. To reinforce motion smoothness, we incorporate a velocity loss alongside the standard keypoint loss during training. Experiments on the TransHuPR and HuPR datasets demonstrate that our method achieves significant performance improvements, exceeding the baselines by 11.0 AP and 14.6 AP, respectively, while maintaining reasonable complexity. Our code will be released upon publication.
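The abstract does not specify the form of the velocity loss, so the following is only a sketch of one common formulation: frame-to-frame differences of the predicted joint coordinates are compared with those of the ground truth. The function names, the mean-squared-error form of both terms, the `(frames, joints, 2)` array layout, and the `velocity_weight` parameter are assumptions for illustration, not details taken from milliMamba.

```python
import numpy as np

def keypoint_loss(pred, target):
    """Mean squared error between predicted and ground-truth joint coordinates.

    Assumed stand-in for the "standard keypoint loss"; arrays have shape
    (frames, joints, 2).
    """
    return float(np.mean((pred - target) ** 2))

def velocity_loss(pred, target):
    """Mean squared error between predicted and ground-truth joint velocities,
    where velocity is the difference between consecutive frames."""
    if pred.shape[0] < 2:
        raise ValueError("velocity loss needs at least two frames")
    pred_velocity = np.diff(pred, axis=0)      # shape (frames - 1, joints, 2)
    target_velocity = np.diff(target, axis=0)  # shape (frames - 1, joints, 2)
    return float(np.mean((pred_velocity - target_velocity) ** 2))

def total_loss(pred, target, velocity_weight=1.0):
    """Keypoint loss plus a weighted velocity term (weight is hypothetical)."""
    return keypoint_loss(pred, target) + velocity_weight * velocity_loss(pred, target)
```

In this formulation a prediction that is offset by a constant amount in every frame incurs no velocity penalty, whereas a prediction that jitters between frames does, which is why such a term encourages temporally smooth output.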
OpenCowID: Zero-Shot Visual Identification of Dairy Cows
Omkar Prabhune ⋅ Younghyun Kim
Accurate identification of individual cows is essential to precision dairy farming. While computer vision offers a non-invasive alternative to ear tags and RFID systems, its practical deployment remains limited by the need for zero-shot identification in dynamic herds where test identities are unseen during training. In this work, we propose OpenCowID, a unified framework that addresses this challenge. First, we introduce a stochastic cow coat synthesis pipeline that efficiently generates large-scale, diverse images. Second, using the generated large-scale high-quality data, we present a centroid-guided feature learning strategy that forms a well-structured embedding space using virtual class centroids, enabling generalization to unseen identities. OpenCowID achieves state-of-the-art zero-shot and open-set identification on real-world cow benchmarks, without requiring any real labeled training data. This work contributes to the advancement of automated livestock monitoring, enabling robust, non-invasive identification. The code for reproducing our results is provided in the supplementary material.
QCFace: Image Quality Control for boosting Face Representation & Recognition
Duc-Phuong Doan-Ngo ⋅ Thanh-Dang Diep ⋅ Thanh Nguyen-Duc ⋅ Thanh-Sach LE ⋅ Nam Thoai
Recognizability, a key perceptual factor in human face processing, strongly affects the performance of face recognition (FR) systems in both verification and identification. Effectively using recognizability to enhance feature representation remains challenging. In deep FR, the loss function plays a crucial role in shaping how features are embedded. However, current methods have two main drawbacks: (i) recognizability is only partially captured through soft margin constraints, resulting in weaker quality representation and lower discrimination, especially for low-quality or ambiguous faces; (ii) mutual overlapping gradients between feature direction and magnitude introduce undesirable interactions during optimization, causing instability and confusion in hypersphere planning, which may result in poor generalization and entangled representations where recognizability and identity are not cleanly separated. To address these issues, we introduce a hard-margin strategy, Quality Control Face (QCFace), that overcomes the mutual overlapping gradient problem and enables clear decoupling of recognizability from identity representation. Based on this strategy, a novel hard-margin-based loss function employs a guidance factor for hypersphere planning, simultaneously optimizing for recognition ability and explicit recognizability representation. Extensive experiments confirm that QCFace not only provides robust and quantifiable recognizability encoding but also achieves state-of-the-art performance in both verification and identification benchmarks compared to existing recognizability-based losses.
MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions
Kaen Kazawa (Kogashi) ⋅ Anoop Cherian ⋅ Meng-Yu Jennifer Kuo
Real-world scenes often feature multiple humans interacting with multiple objects in ways that are causal, goal-oriented, or cooperative. Yet existing 3D human-object interaction (HOI) benchmarks consider only a fraction of these complex interactions. To close this gap, we present MMHOI, a large-scale Multi-human Multi-object Interaction dataset consisting of images from 12 everyday scenarios. MMHOI offers complete 3D shape and pose annotations for every person and object, along with labels for 78 action categories and 14 interaction-specific body parts, providing a comprehensive testbed for next-generation HOI research. Building on MMHOI, we present MMHOI-Net, an end-to-end transformer-based neural network for jointly estimating human-object 3D geometries, their interactions, and associated actions. A key innovation in our framework is a structured dual-patch representation for modeling objects and their interactions, combined with action recognition to enhance the interaction prediction. Experiments on MMHOI and the recently proposed CORE4D datasets demonstrate that our approach achieves state-of-the-art performance in multi-HOI modeling, excelling in both accuracy and reconstruction quality.