Document aboutness via sophisticated syntactic and semantic features
Paolo Ferragina
2017-01-01
Abstract
The document aboutness problem asks for a succinct representation of a document's subject matter in the form of keywords, sentences, or entities drawn from a Knowledge Base. In this paper we propose an approach that improves on the known solutions over all known datasets [4,19]. It is based on a wide and detailed experimental study of syntactic and semantic features extracted from the input document with IR/NLP tools. To encourage and support reproducible experimental results on this task, we will make our system accessible via a public API: this is the first, and best-performing, publicly available tool for the document aboutness problem.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
main.pdf | Open access | Post-print document | All rights reserved | 152.63 kB | Adobe PDF |
Ponza2017_Chapter_version of record.pdf | Authorized users only | Final published version | Not public (private/restricted access) | 489.88 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.