Suitable Doesn’t Mean Attractive. Human-Based Evaluation of Automatically Generated Headlines
Michele Cafagna; Lorenzo De Mattei; Davide Bacciu
2019-01-01
Abstract
We train three different models to generate newspaper headlines from a portion of the corresponding article. The articles are obtained from two mainstream Italian newspapers. In order to assess the models’ performance, we set up a human-based evaluation where 30 different native speakers expressed their judgment over a variety of aspects. The outcome shows that (i) pointer networks perform better than standard sequence to sequence models, creating mostly correct and appropriate titles; (ii) the suitability of a headline to its article for pointer networks is on par or better than the gold headline; (iii) gold headlines are still by far more inviting than generated headlines to read the whole article, highlighting the contrast between human creativity and content appropriateness.