A Comparative Analysis of Models for Real-Time Personal Protective Equipment Detection on Edge Devices

Miglionico, Giustino Claudio; Di Rienzo, Francesco; Ducange, Pietro; Marcelloni, Francesco; Vallati, Carlo
2025-01-01

Abstract

The use of personal protective equipment (PPE) is essential to improving workplace safety. Despite specific regulations requiring PPE, workers often neglect to wear it because of factors such as inattention, urgency, or convenience. Monitoring the correct use of PPE is especially critical in high-risk tasks. Computer vision, leveraging deep neural networks, can automate this monitoring. This study investigates how well modern object detection models identify the correct use of PPE, focusing on their accuracy and execution speed. Specifically, YOLOv11 and RT-DETR models are trained on a real-world PPE dataset. Deployment on low-cost hardware, specifically an NVIDIA Jetson Nano, is evaluated with three deployment frameworks: PyTorch, OpenVINO, and TensorRT. The results show that YOLOv11n, with 2.6 million parameters, achieves slightly lower average accuracy than more complex models but stands out for its speed, reaching 6.6 frames per second (FPS) with PyTorch, 2.3 FPS with OpenVINO, and 10.6 FPS with TensorRT. Conversely, YOLOv11l and YOLOv11x, with 46.5 and 86.7 million parameters respectively, offer higher accuracy, most evidently in identifying small classes, where simpler models tend to struggle. However, their throughput is lower, at 1.2 and 0.7 FPS with PyTorch. RT-DETR achieves competitive accuracy but runs more slowly on the edge device.
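Throughput figures are easier to compare with real-time requirements when they are turned into per-frame latency. For example, 6.6 FPS means roughly 150 ms per frame. The sketch below does this conversion for the FPS values reported above. It also shows one common way to estimate FPS by timing repeated inference calls. It assumes frames are processed one at a time with no pipelining or batching. The `infer` callable is a hypothetical stand-in for a single detector forward pass, not the authors' benchmarking code.

```python
import time
from typing import Callable, Dict

# FPS values as reported in the abstract (NVIDIA Jetson Nano).
REPORTED_FPS: Dict[str, float] = {
    "YOLOv11n / PyTorch": 6.6,
    "YOLOv11n / OpenVINO": 2.3,
    "YOLOv11n / TensorRT": 10.6,
    "YOLOv11l / PyTorch": 1.2,
    "YOLOv11x / PyTorch": 0.7,
}


def fps_to_latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds, assuming sequential processing."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_fps(infer: Callable[[], None], n_frames: int = 100, warmup: int = 10) -> float:
    """Estimate throughput by timing n_frames sequential calls to `infer`.

    `infer` is a hypothetical placeholder for one detector forward pass.
    Warm-up calls are excluded so one-time setup costs (e.g. lazy
    initialization) do not skew the measurement.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_frames):
        infer()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


if __name__ == "__main__":
    for name, fps in REPORTED_FPS.items():
        print(f"{name}: {fps:.1f} FPS ~ {fps_to_latency_ms(fps):.0f} ms/frame")
    # Relative speedup of TensorRT over PyTorch for the same model.
    speedup = REPORTED_FPS["YOLOv11n / TensorRT"] / REPORTED_FPS["YOLOv11n / PyTorch"]
    print(f"TensorRT vs PyTorch speedup for YOLOv11n: {speedup:.2f}x")
```

Warm-up iterations matter on accelerators such as the Jetson Nano. The first calls often include engine building or memory allocation that would otherwise inflate the measured latency.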

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1342102