Ranking a Stream of News

Del Corso, Gianna Maria; Gullì, A.; Romani, Francesco

doi:10.1145/1060745.1060764

According to a recent survey by Nielsen NetRatings, searching for news articles is one of the most important online activities. Indeed, Google, Yahoo, MSN and many others have proposed commercial search engines for indexing news feeds. Despite this commercial interest, no academic research has focused on ranking a stream of news articles and a set of news sources. In this paper, we introduce this problem by proposing a ranking framework which models: (1) the generation of a stream of news articles, (2) the clustering of news articles by topic, and (3) the evolution of news stories over time. The proposed algorithm ranks news information, finding the most authoritative news sources and identifying the most interesting events in the different categories to which news articles belong. All these ranking measures take time into account and can be obtained without a predefined sliding window of observation over the stream. The complexity of our algorithm is linear in the number of pieces of news still under consideration at the time of a new posting. This allows a continuous online ranking process. Our ranking framework is validated on a collection of more than 300,000 pieces of news, produced in two months by more than 2,000 news sources belonging to 13 different categories (World, U.S., Europe, Sports, Business, etc.). This collection is extracted from the index of comeToMyHead, an academic news search engine available online.