A Comparative Analysis of Models for Real-Time Personal Protective Equipment Detection on Edge Devices
Miglionico, Giustino Claudio;Di Rienzo, Francesco;Ducange, Pietro;Marcelloni, Francesco;Vallati, Carlo
2025-01-01
Abstract
The use of personal protective equipment (PPE) is essential to improving workplace safety. Although specific regulations require PPE, workers often neglect to wear it because of inattention, urgency, or convenience. Monitoring the correct use of PPE is especially critical in high-risk tasks, and computer vision based on deep neural models can automate this monitoring. This study investigates how well modern object detection models identify the correct use of PPE, focusing on their accuracy and execution speed. Specifically, the YOLOv11 and RT-DETR models are trained on a real-world PPE dataset. Deployment on low-cost hardware, an NVIDIA Jetson Nano, is evaluated using three deployment frameworks: PyTorch, OpenVINO, and TensorRT. The results show that YOLOv11n, with 2.6 million parameters, provides slightly lower average accuracy than more complex models but stands out for its speed, reaching 6.6 frames per second (FPS) with PyTorch, 2.3 FPS with OpenVINO, and 10.6 FPS with TensorRT. In contrast, YOLOv11l and YOLOv11x, with 46.5 and 86.7 million parameters respectively, offer higher accuracy, which is especially evident in the identification of small classes, where simpler models tend to struggle. However, they show lower throughput, with 1.2 and 0.7 FPS on PyTorch. RT-DETR has competitive accuracy but lower performance on edge devices.
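To put the throughput figures in perspective, the following minimal Python sketch converts the FPS values quoted in the abstract into approximate per-frame latency and relative speedup. It assumes frames are processed strictly one at a time, so that latency is simply the reciprocal of throughput; the abstract does not state how FPS was measured, and the helper names `latency_ms` and `speedup` are illustrative only.

```python
# FPS values are those quoted in the abstract (NVIDIA Jetson Nano).
# Assumption: frames are processed sequentially, so latency = 1 / FPS.
FPS = {
    ("YOLOv11n", "PyTorch"): 6.6,
    ("YOLOv11n", "OpenVINO"): 2.3,
    ("YOLOv11n", "TensorRT"): 10.6,
    ("YOLOv11l", "PyTorch"): 1.2,
    ("YOLOv11x", "PyTorch"): 0.7,
}


def latency_ms(fps: float) -> float:
    """Approximate time per frame, in milliseconds, for a given throughput."""
    return 1000.0 / fps


def speedup(fps: float, baseline_fps: float) -> float:
    """Throughput relative to a baseline configuration."""
    return fps / baseline_fps


if __name__ == "__main__":
    # Use YOLOv11n on PyTorch as the (arbitrarily chosen) reference point.
    baseline = FPS[("YOLOv11n", "PyTorch")]
    for (model, framework), fps in FPS.items():
        print(
            f"{model:9s} {framework:9s} {fps:5.1f} FPS  "
            f"{latency_ms(fps):7.1f} ms/frame  "
            f"{speedup(fps, baseline):4.2f}x vs YOLOv11n/PyTorch"
        )
```

Under this assumption, YOLOv11n at 10.6 FPS with TensorRT corresponds to roughly 94 ms per frame, about 1.6 times the PyTorch throughput of the same model, while YOLOv11x at 0.7 FPS needs more than 1.4 seconds per frame.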


