Exploiting Categorization of Online News for Profiling City Areas

Bondielli A.;Ducange P.;Marcelloni F.
2020-01-01

Abstract

Profiling city areas in terms of citizens' behaviour and commercial and social activities is a relevant problem for smart cities, especially in a real-time streaming setting. Several methods have been proposed in the literature, exploiting different data sources. In this paper, we propose an approach for profiling city areas based on articles from local online newspapers, exploiting both the text and metadata such as geo-localization and tags. In particular, we use the tags associated with each article to identify macro-categories through cluster analysis of tag embeddings. We then employ an SVM-based text categorization model to label each new article, represented as a Bag-of-Words, online with one of these categories. The categorization approach has been integrated into a framework recently proposed by the authors for profiling city areas using different web data sources: online newspapers are monitored continuously, producing a news stream to be analysed. We report experiments on the city of Rome, considering data from 2014 to 2018. We discuss the results obtained with different classifiers and show that the best one, an SVM, achieves an accuracy of up to 93% and an F1-score of up to 79%.
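The categorization step described in the abstract (a Bag-of-Words representation fed to a linear SVM) can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration: the toy headlines, the three category names, the helper functions, and the simple subgradient trainer with one-vs-rest voting are not taken from the paper, which does not specify its implementation here. The macro-categories are assumed to be already available; the tag-embedding clustering step is not shown.

```python
from collections import Counter

def bag_of_words(text, vocab):
    """Map a text to a term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]

def score(weights, bias, x):
    """Signed distance-like score of x under a linear model."""
    return sum(w * v for w, v in zip(weights, x)) + bias

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Binary linear SVM (labels +1/-1) trained by subgradient
    descent on the L2-regularised hinge loss. Illustrative only."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score(weights, bias, xi)
            # Regularisation step: shrink the weights slightly.
            weights = [w * (1 - lr * lam) for w in weights]
            if margin < 1:  # hinge loss is active for this sample
                weights = [w + lr * yi * v for w, v in zip(weights, xi)]
                bias += lr * yi
    return weights, bias

def train_one_vs_rest(X, labels):
    """Train one binary SVM per category (category vs. all others)."""
    return {
        c: train_linear_svm(X, [1 if l == c else -1 for l in labels])
        for c in sorted(set(labels))
    }

def predict(models, x):
    """Return the category whose binary SVM gives the highest score."""
    return max(models, key=lambda c: score(*models[c], x))

# Hypothetical training articles with hypothetical macro-categories.
train = [
    ("metro strike delays commuters", "transport"),
    ("bus line closed for roadworks", "transport"),
    ("museum opens new exhibition", "culture"),
    ("concert at the opera theatre", "culture"),
    ("robbery reported near station", "crime"),
    ("police arrest theft suspect", "crime"),
]
vocab = sorted({w for text, _ in train for w in text.lower().split()})
X = [bag_of_words(text, vocab) for text, _ in train]
models = train_one_vs_rest(X, [label for _, label in train])

# Label a "new" article as it arrives from the stream.
print(predict(models, bag_of_words("metro line closed", vocab)))
```

In a streaming setting such as the one the abstract describes, `predict` would be called on each incoming article, while the vocabulary and models stay fixed until retraining. Words outside the training vocabulary are simply ignored by `bag_of_words`.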
2020
978-1-7281-4384-2
Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1055590