Uncovering the limits of visual-language models in engineering knowledge representation

Marco Consoloni;Vito Giordano;Federico Andrea Galatolo;Mario Giovanni Cosimo Antonio Cimino;Gualtiero Fantoni
2025-01-01

Abstract

Visual-Language (VL) models offer potential for advancing Engineering Design (ED) by integrating text and visuals from technical documents. We review VL applications across ED phases, highlighting three key challenges: (i) understanding how functional and structural information is complementarily expressed by text and images, (ii) creating large-scale multimodal design datasets, and (iii) improving VL models' ability to represent ED knowledge. We built a dataset of 1.5 million text-image pairs and an evaluation dataset for cross-modal information retrieval from patents. By fine-tuning and testing the CLIP base model on these datasets, we identified significant limitations in VL models' capacity to capture the fine-grained technical details required for precision-driven ED tasks. Based on these findings, we propose future research directions to advance VL models for ED applications.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1336527