Human pose estimation (HPE) is essential in human motion analysis. Nowadays, numerous RGB datasets are available to train deep learning-based HPE models. However, poor lighting and privacy issues pose challenges in the visible domain. Thermal cameras can address these issues as they are illumination-invariant. However, few annotated thermal HPE datasets exist for training deep learning models. In addition, most HPE models are trained for either the thermal or the visible domain, with little exploration of cross-domain knowledge transfer. In this work, we trained YOLO11-pose models for multispectral human pose estimation by fusing the COCO and OpenThermalPose2 datasets. The results show that our models achieve high accuracy in both domains, even outperforming models specialized for each domain. The largest model, YOLO11x-pose, achieved an AP50:95_pose of 95.23% on the test set of OpenThermalPose2, establishing a new benchmark for this dataset. The same model also achieved an AP50:95_pose of 69.89% on the COCO validation set, slightly improving on the original YOLO11x-pose model. We optimized the models and deployed them on an NVIDIA Jetson AGX Orin 64GB. The models in TensorRT format with half-precision floating point achieved the best balance of speed and accuracy, making them suitable for real-time applications.
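The TensorRT half-precision deployment described above can be reproduced with the Ultralytics export API; a minimal sketch, assuming a downloaded checkpoint (the filename below is a placeholder) and TensorRT installed on the target device:

```python
from ultralytics import YOLO

# Load one of the multispectral pose checkpoints
# ("yolo11x-pose-multispectral.pt" is a placeholder filename).
model = YOLO("yolo11x-pose-multispectral.pt")

# Export to TensorRT with FP16 (half precision); this produces
# a .engine file next to the checkpoint. Run the export on the
# deployment device (e.g., the Jetson AGX Orin) so the engine is
# built for that GPU.
model.export(format="engine", half=True)
```

The exported `.engine` file can then be loaded with `YOLO("yolo11x-pose-multispectral.engine")` and used for inference the same way as the PyTorch checkpoint.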
The paper was presented at the 51st Annual Conference of the IEEE Industrial Electronics Society. We will provide a link to the paper here once it is published on IEEE Xplore.
Download checkpoints (nano, small, medium, large, and x-large) from Google Drive.
```
pip install ultralytics
```
```python
from ultralytics import YOLO

# Load one of the models
model = YOLO(PATH_TO_MODEL)

# Predict with the model
results = model.predict(PATH_TO_IMAGE, save=True)
```
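Beyond saving annotated images, the detected keypoints can be read directly from the returned results; a short sketch, assuming a downloaded checkpoint and a local image (both filenames below are placeholders):

```python
from ultralytics import YOLO

# Placeholder names: substitute one of the released checkpoints
# and your own RGB or thermal image.
model = YOLO("yolo11n-pose-multispectral.pt")
results = model.predict("person.jpg")

for result in results:
    # result.keypoints.xy has shape (num_people, num_keypoints, 2)
    # with pixel coordinates; result.keypoints.conf holds the
    # per-keypoint confidence scores.
    for person_xy in result.keypoints.xy:
        print(person_xy)
```

The models follow the COCO keypoint convention (17 keypoints per person: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).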
