The Anatomy of a Hierarchical Clustering Engine for Web-page, News and Book Snippets

Paolo Ferragina, Antonio Gullì

doi:10.1109/ICDM.2004.10027

Fourth IEEE International Conference on Data Mining (ICDM'04)

Year: 2004, Pages: 395-398

Authors

Paolo Ferragina, Università di Pisa, Italy
Antonio Gullì, Università di Pisa, Italy

Abstract

In this paper, we investigate the web snippet hierarchical clustering problem in its full extent by devising an algorithmic solution and a software prototype, called SnakeT (accessible at http://roquefort.di.unipi.it/), that: (1) draws snippets from 16 Web search engines, the Amazon book collection a9.com, Google News and the blogs of Blogline; (2) builds the clusters on the fly (ephemeral clustering) in response to a user query, without relying on any predefined organization into categories; (3) labels the clusters with sentences of variable length, drawn from the snippets and possibly with some terms missing, provided that the missing terms are not too many;
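
The abstract does not describe how SnakeT extracts labels or forms clusters. The sketch below is therefore only a rough illustration of points (2) and (3): clusters are computed on the fly from the snippets of a single query, and each cluster is labeled by a word sequence taken from the snippets that may skip a few terms. It is not the authors' method and it is flat, not hierarchical. It makes several simplifying assumptions. Labels are restricted to ordered two-word "gapped pairs". Gaps are counted after removing stopwords. A label becomes a cluster when at least `min_support` snippets contain it. The names `STOPWORDS`, `tokenize`, `gapped_pairs`, `ephemeral_clusters`, `max_gap` and `min_support` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, for illustration only (not taken from SnakeT).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "is", "with"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def gapped_pairs(tokens, max_gap):
    """Ordered word pairs (u, v) where v follows u with at most `max_gap`
    (non-stopword) tokens skipped in between, i.e. a two-word label that
    may be "missing some terms" relative to the original snippet."""
    pairs = set()
    for i, u in enumerate(tokens):
        for j in range(i + 1, min(len(tokens), i + 2 + max_gap)):
            if tokens[j] != u:
                pairs.add((u, tokens[j]))
    return pairs


def ephemeral_clusters(snippets, max_gap=1, min_support=2):
    """Group the snippets returned for one query by shared gapped labels.
    Nothing is precomputed: labels and clusters come only from these snippets."""
    support = defaultdict(set)  # label -> ids of snippets containing it
    for sid, text in enumerate(snippets):
        for label in gapped_pairs(tokenize(text), max_gap):
            support[label].add(sid)
    clusters = {" ".join(label): sorted(ids)
                for label, ids in support.items() if len(ids) >= min_support}
    # Larger clusters first; ties broken alphabetically for deterministic output.
    return dict(sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0])))


if __name__ == "__main__":
    snippets = [
        "Data mining conference in Brighton",
        "IEEE data mining conference 2004",
        "Mining of web data streams",
        "Text mining and web clustering",
    ]
    for label, ids in ephemeral_clusters(snippets).items():
        print(label, ids)
```

On these four made-up snippets, the sketch produces the clusters "data conference", "data mining" and "mining conference" (snippets 0 and 1) and "mining web" (snippets 2 and 3). The label "data conference" shows a sentence that skips the term "mining". A real engine would also need to rank, merge and organize such labels into a hierarchy, which this sketch does not attempt.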
