OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Cartella, Giuseppe; Baldrati, Alberto; Morelli, Davide; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita

doi:10.1007/978-3-031-43148-7_21

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either defined a low generalizable supervised learning approach or more reusable CLIP-based techniques while, however, training on closed source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that only adopts open-source fashion data stemming from diverse domains, and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight a significant out-of-domain generalization capability and consistent improvements over state-of-the-art methods both in terms of accuracy and recall. Source code and trained models are publicly available at: https://github.com/aimagelab/open-fashion-clip.

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Cartella, Giuseppe;Baldrati, Alberto;Morelli, Davide;Cornia, Marcella;Bertini, Marco;Cucchiara, Rita

2023-01-01

Abstract

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either defined a low generalizable supervised learning approach or more reusable CLIP-based techniques while, however, training on closed source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that only adopts open-source fashion data stemming from diverse domains, and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight a significant out-of-domain generalization capability and consistent improvements over state-of-the-art methods both in terms of accuracy and recall. Source code and trained models are publicly available at: https://github.com/aimagelab/open-fashion-clip.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2023

Codice ISBN

9783031431470
9783031431487

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/1275193

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

10

7

CINECA IRIS Institutional Research Information System

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Cartella, Giuseppe;Baldrati, Alberto;Morelli, Davide;Cornia, Marcella;Bertini, Marco;Cucchiara, Rita

2023-01-01

Abstract

Scheda breve

Scheda completa

Scheda completa (DC)

Attenzione

Citazioni

social impact

CINECA IRIS Institutional Research Information System

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Cartella, Giuseppe;Baldrati, Alberto;Morelli, Davide;Cornia, Marcella;Bertini, Marco;Cucchiara, Rita

2023-01-01

Abstract

Scheda breve Scheda completa Scheda completa (DC)

Informazioni

Attenzione

Citazioni

social impact

Conferma cancellazione

Scheda breve

Scheda completa

Scheda completa (DC)