
Less is MORE: a MultimOdal system for tag REfinement

Lucia Passaro; Alessandro Lenci
2020-01-01

Abstract

With the proliferation of image-based social media, an extremely large amount of multimodal data is being produced. Very often, image contents are published together with a set of user-defined metadata such as tags and textual descriptions. Despite being very useful for enhancing traditional image retrieval, user-defined tags on social media have proven to be ineffective for indexing images, because they are influenced by the personal experiences of the owners as well as by their desire to promote the published contents. To be analyzed and indexed, multimodal data require algorithms able to jointly deal with textual and visual data. This research presents a multimodal approach to the problem of tag refinement, which consists in separating the relevant descriptors (tags) of images from noisy ones. The proposed method exploits both Natural Language Processing (NLP) and Computer Vision (CV) techniques based on deep learning to find a match between the textual information and the visual content of social media posts. Textual semantic features are represented with (multilingual) word embeddings, while visual ones are obtained with image classification. The proposed system is evaluated on a manually annotated Italian dataset extracted from Instagram, achieving a weighted F1-score of 68%.
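The abstract describes matching tag embeddings against the visual content of an image. As a minimal sketch of that general idea (not the paper's actual pipeline), one could score each user tag by its embedding similarity to the labels an image classifier assigns to the photo and discard tags below a threshold; the toy embedding table, labels, and threshold below are illustrative assumptions.

```python
# Minimal sketch of embedding-based tag refinement (illustrative only):
# keep a user tag if its word embedding is close enough to at least one
# label assigned to the image by a classifier.
import numpy as np

# Toy 3-d word embeddings; the paper instead uses pretrained
# (multilingual) word embeddings.
EMBEDDINGS = {
    "beach":         np.array([0.9, 0.1, 0.0]),
    "sea":           np.array([0.8, 0.2, 0.1]),
    "sunset":        np.array([0.6, 0.5, 0.2]),
    "follow4follow": np.array([0.0, 0.1, 0.9]),  # promotional noise tag
}

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def refine_tags(user_tags, image_labels, threshold=0.5):
    """Keep tags whose best similarity to any classifier label clears the threshold."""
    kept = []
    for tag in user_tags:
        if tag not in EMBEDDINGS:
            continue  # out-of-vocabulary tags cannot be scored
        score = max(
            (cosine(EMBEDDINGS[tag], EMBEDDINGS[lab])
             for lab in image_labels if lab in EMBEDDINGS),
            default=0.0,
        )
        if score >= threshold:
            kept.append((tag, round(score, 3)))
    return kept

# Suppose an image classifier labeled the photo "sea" and "sunset":
print(refine_tags(["beach", "follow4follow"], ["sea", "sunset"]))
# -> [('beach', 0.984)]  'follow4follow' is filtered out as noise
```

The threshold value here is an arbitrary placeholder; in a real system it would be tuned on annotated data such as the Instagram dataset mentioned in the abstract.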
Files in this record:

Passaro-Lenci-2020-Less-is-MORE-a-MultimOdal-system-for-tag-REfinemen-annotated.pdf

Description: Main article
Type: Final published version
Access: open access
License: Creative Commons
Size: 3.54 MB
Format: Adobe PDF


Use this identifier to cite or link to this record: https://hdl.handle.net/11568/1069972
Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: n/a