Eff-GRot: Efficient and Generalizable Rotation Estimation with Transformers
Abstract
We introduce Eff-GRot, an approach for efficient and generalizable 3D pose estimation from RGB images. Given a query image and a set of posed reference images, our method directly predicts the object’s pose in a single forward pass, without requiring object- or category-specific training. At the core of our framework is a transformer that performs a pose-aware comparison in the latent space, jointly processing enriched global representations from multiple posed references alongside a query. This design enables a favorable balance between accuracy and computational efficiency while remaining simple, scalable, and fully end-to-end. Our results demonstrate that Eff-GRot offers a promising direction toward more efficient pose estimation, particularly for latency-sensitive applications.
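To make the idea of a pose-aware comparison in latent space concrete, the following is a minimal NumPy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the feature dimension, the random weights standing in for a trained model, the single attention layer, the linear embedding of reference rotations, and the 6D rotation output head. The helper names `rot6d_to_matrix` and `predict_rotation` are hypothetical, and only the rotation part of the pose is predicted.

```python
import numpy as np

def rot6d_to_matrix(x):
    """Map a 6-vector to a 3x3 rotation matrix via Gram-Schmidt (assumed output head)."""
    a, b = x[:3], x[3:]
    r1 = a / np.linalg.norm(a)
    b = b - np.dot(r1, b) * r1          # remove the component along r1
    r2 = b / np.linalg.norm(b)
    r3 = np.cross(r1, r2)               # right-handed third axis
    return np.stack([r1, r2, r3], axis=1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_rotation(query_feat, ref_feats, ref_rots, params):
    """Single forward pass: one query token attends jointly to N posed reference tokens."""
    n, d = ref_feats.shape
    # Enrich each reference's global feature with an embedding of its known rotation.
    pose_emb = ref_rots.reshape(n, 9) @ params["W_pose"]      # (N, D)
    ref_tokens = ref_feats + pose_emb
    tokens = np.vstack([query_feat[None, :], ref_tokens])     # (N + 1, D)
    # One single-head self-attention layer with a residual connection.
    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(d))
    out = tokens + attn @ v
    # Read the prediction off the query token (row 0).
    return rot6d_to_matrix(out[0] @ params["W_out"])

# Toy usage with random stand-ins for image features and trained weights.
rng = np.random.default_rng(0)
D, N = 16, 4
params = {
    "W_pose": rng.normal(size=(9, D)),
    "W_q": rng.normal(size=(D, D)),
    "W_k": rng.normal(size=(D, D)),
    "W_v": rng.normal(size=(D, D)),
    "W_out": rng.normal(size=(D, 6)),
}
ref_rots = np.stack([rot6d_to_matrix(rng.normal(size=6)) for _ in range(N)])
ref_feats = rng.normal(size=(N, D))
query_feat = rng.normal(size=D)
R = predict_rotation(query_feat, ref_feats, ref_rots, params)
```

With untrained weights the predicted matrix is meaningless as an estimate, but it is always a valid rotation, and the sketch shows why inference needs only one pass: the query and all references are compared inside a single attention computation rather than through per-reference matching or iterative refinement.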