Leveraging Deep Learning for Image Anonymization in the Insurance Domain

Andreozzi, Alessandra; Riccardi Celsi, L.; Martini, A.

Digital transformation is radically changing the value proposition of the insurance industry, driven by emerging data-driven processes based on deep learning tools. However, unlocking valuable insight from these processes depends on fine-tuning suitable algorithms and on the quality and quantity of the prepared input data. AI encompasses deep learning techniques that are well suited to the object detection task required by image anonymization for insurance purposes: among the several available methods for implementing such computer vision strategies, convolutional neural networks have delivered the best results in recent years [3]. More specifically, object detection requires deep ad hoc architectures, and the way layers are combined into a model is generally tailored to the task to be performed. New ways of combining layers are constantly being released in order to provide improved architectures [3]: among these, YOLO [4, 5], SSD [6], Faster R-CNN [7], and RetinaNet [8, 9] were identified as the most in line with the customer’s needs. Ultimately, RetinaNet was chosen as the most suitable framework for image anonymization, mainly because its focal loss concentrates training on hard examples. Implementing and fine-tuning the most recent deep neural network architectures provide valuable and unprecedented performance in privacy-preserving insurance business processes. In order to effectively evaluate the specific task of anonymizing sensitive data in insurance images, ad hoc performance metrics need to be identified and validated (namely, Intersection-over-Union and recall). The research aim is to design an efficient procedure for anonymizing car images, blurring sensitive data while leaving all other data unaffected (in particular, the part of the image showing the damaged objects).
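As an illustration of why focal loss concentrates training on hard examples, the following minimal sketch compares it with plain cross-entropy for a single binary prediction. It is not the implementation used in this work: the function names are hypothetical, and the values alpha = 0.25 and gamma = 2 are the defaults commonly reported for RetinaNet in the literature, assumed here rather than taken from the study.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction.

    p: predicted probability of the positive class; y: true label (1 or 0).
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(max(p_t, eps))


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    alpha and gamma are assumed defaults, not values from the study.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    # The factor (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):  # easy, ambiguous, and hard positive examples
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

Under these assumptions, a confidently correct prediction (p = 0.9) contributes a far smaller share of its cross-entropy loss than a badly misclassified one (p = 0.1), so the many easy background regions in an image do not dominate training.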
The object detection algorithm anonymizes car images by detecting and blurring sensitive data while leaving all other valuable data unaffected. In particular, the part of the image showing the damaged objects is left untouched so that the car image can subsequently be categorized (e.g., by assessing where the most relevant damage is located and how extensive it is). The deep neural network, designed according to the RetinaNet framework, was trained on a sample of more than 10,000 images stored in the customer’s database; subsequently, several tests on samples of 3,000 images containing sensitive data were carried out to evaluate the algorithm's performance. The results show that the trained object detection algorithm predicts the object classes accounting for sensitive data (e.g., license plates, vehicle identification numbers, and person shapes) with recall and Intersection-over-Union scores both greater than 90%. These scores are consistent with the recent computer vision literature as well as with the customer requirements, thus encouraging the adoption of the proposed tool for automating claim management in the insurance domain.
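To make the two evaluation metrics concrete, the sketch below computes Intersection-over-Union for a pair of boxes and a simple detection recall over one image. It is only an illustration under stated assumptions: boxes are taken to be (x1, y1, x2, y2) tuples, a ground-truth box counts as detected when an unused prediction overlaps it with IoU at or above a threshold of 0.5, and matching is greedy. None of these choices, nor the function names, are specified in the study.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, predictions, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by a distinct prediction.

    Greedy matching with an assumed IoU threshold (hypothetical protocol).
    """
    if not ground_truth:
        return 1.0  # nothing sensitive to miss in this image
    unused = list(predictions)
    matched = 0
    for gt in ground_truth:
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            matched += 1
            unused.remove(best)  # each prediction can match one box only
    return matched / len(ground_truth)


if __name__ == "__main__":
    plates = [(0, 0, 10, 10), (20, 20, 30, 30)]  # made-up annotated boxes
    detected = [(1, 1, 10, 10)]                  # made-up predicted box
    print(iou(plates[0], detected[0]), recall(plates, detected))
```

For anonymization, recall is the more safety-critical of the two, since a missed license plate or person leaves sensitive data visible, whereas IoU indicates how tightly the blurred region fits the sensitive object without covering the damaged area.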