Mask-RadarNet: Enhancing Radar Object Detection With Spatio-Temporal Context

Wu, Yuzhi; Liu, Jun; Jiang, Guangfeng; Liu, Weijian; Orlando, Danilo; Xiao, Li
2025-01-01

Abstract

As a cost-effective and robust technology, automotive radar has seen steady improvement in recent years. Radio frequency (RF) images, a radar data format with rich semantic information, have attracted considerable interest in radar object detection. Previous RF-based models rely heavily on convolutional neural networks, which incurs high computational cost. To address this problem, we propose a model called Mask-RadarNet that fully exploits the hierarchical semantic features of RF image sequences. Mask-RadarNet combines interleaved convolution and attention operations in its encoder. In addition, patch shift is introduced to Mask-RadarNet for efficient spatial-temporal feature learning. By shifting a subset of patches along the temporal dimension according to a specific mosaic pattern, Mask-RadarNet achieves competitive performance while reducing the computational burden of spatial-temporal modeling. To capture spatial-temporal semantic contextual information, we design a class masking attention module (CMAM) in the encoder. Moreover, a lightweight auxiliary decoder is added to the model to aggregate the prior maps generated by the CMAM. Experiments on the CRUW dataset demonstrate that the proposed Mask-RadarNet achieves state-of-the-art performance with relatively lower computational complexity and fewer parameters.
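The temporal patch-shift idea mentioned in the abstract can be illustrated with a minimal sketch. The paper does not specify its mosaic pattern here, so the 2x2 checkerboard below (even-even patches take features from the previous frame, odd-odd patches from the next frame) is an illustrative assumption, as are the function name and tensor layout. The point is that mixing frames at the token level gives later spatial attention access to temporal context at essentially zero extra attention cost.

```python
import numpy as np

def temporal_patch_shift(tokens):
    """Shift a subset of patch tokens along the temporal axis.

    tokens: array of shape (T, H, W, C) -- T frames of an H x W grid
    of patch embeddings with C channels.

    Assumed mosaic pattern (for illustration only): patches at
    (even, even) grid positions take features from frame t-1,
    patches at (odd, odd) positions take features from frame t+1,
    and all other patches keep the current frame t.
    """
    shifted = tokens.copy()
    back = np.roll(tokens, shift=1, axis=0)   # back[t] = tokens[t-1] (wraps)
    fwd = np.roll(tokens, shift=-1, axis=0)   # fwd[t]  = tokens[t+1] (wraps)
    shifted[:, 0::2, 0::2, :] = back[:, 0::2, 0::2, :]
    shifted[:, 1::2, 1::2, :] = fwd[:, 1::2, 1::2, :]
    return shifted

# Toy input: 2 frames of a 4x4 patch grid with 1 channel.
x = np.arange(2 * 4 * 4 * 1, dtype=float).reshape(2, 4, 4, 1)
y = temporal_patch_shift(x)
```

After the shift, a plain per-frame attention layer applied to `y` already mixes information across frames, which is the efficiency argument behind patch shift.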
2025
Wu, Yuzhi; Liu, Jun; Jiang, Guangfeng; Liu, Weijian; Orlando, Danilo; Xiao, Li

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11568/1345027
