FlyPose: Towards Robust Human Pose Estimation From Aerial Views
Abstract
Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response, and infrastructure inspection. Ensuring safe and reliable operation in these human-populated environments demands accurate perception of human poses and actions from an aerial viewpoint. However, person detection and human pose estimation onboard UAVs present unique challenges due to factors such as low resolution, steep viewing angles, occlusion, and limited computational resources. In this work, we develop \textit{FlyPose}, a lightweight top-down human pose estimation model optimized for aerial imagery and able to run on edge devices. We compare the effectiveness of current approaches and improve person detection and pose estimation results on aerial datasets. Through multi-dataset training, we achieve an average improvement of 12.3 AP in person detection on the test sets of Manipal-UAV, VisDrone, and HIT-UAV, as well as a custom dataset. For 2D human pose estimation, we report an improvement of 16.3 mAP on the challenging UAV-Human dataset. FlyPose runs with an inference latency of 19.5 milliseconds on a Jetson Orin Developer Kit. Our model aims to provide a foundation for human-aware UAV applications with real-time demands and to deliver more accurate human poses for downstream tasks.