Volume 4 / Issue 9

DOI: 10.3217/jucs-004-09-0719

 

Categorisation by Context

Giuseppe Attardi (Dipartimento di Informatica, Università di Pisa, Italy)

Sergio Di Marco (Dipartimento di Informatica, Università di Pisa, Italy)

Davide Salvi (Dipartimento di Informatica, Università di Pisa, Italy)

Abstract: Assistance in retrieving documents on the World Wide Web is provided either by search engines, through keyword-based queries, or by catalogues, which organise documents into hierarchical collections. Maintaining catalogues manually is becoming increasingly difficult because of the sheer amount of material on the Web, so it will soon be necessary to resort to techniques for the automatic classification of documents. Classification is traditionally performed by extracting the information used to index a document from the document itself. This paper describes the technique of categorisation by context, which exploits the context perceivable from the structure of HTML documents to extract useful information for classifying the documents they refer to. We present the results of experiments with a preliminary implementation of the technique.
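The following rough illustration of the general idea is not the authors' implementation. It treats the anchor text of each link, together with the nearest preceding HTML heading, as that link's "context". It then assigns the referenced URL to a category by counting keyword overlaps. The class name `ContextExtractor`, the function `categorise`, the category names and keyword sets, and the heading-plus-anchor notion of context are all hypothetical simplifications chosen for this sketch.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ContextExtractor(HTMLParser):
    """Hypothetical helper: for each outgoing link, record the anchor text
    prefixed by the text of the most recent heading (h1..h6) seen before it."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.contexts = defaultdict(list)  # target URL -> list of context strings
        self._heading = ""                 # text of the last heading encountered
        self._in_heading = False
        self._href = None                  # href of the <a> element currently open
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self._heading = ""
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor_text = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False
        elif tag == "a":
            if self._href:
                anchor = " ".join(self._anchor_text).strip()
                context = f"{self._heading.strip()} {anchor}".strip()
                self.contexts[self._href].append(context)
            self._href = None

    def handle_data(self, data):
        if self._in_heading:
            self._heading += data
        if self._href is not None:
            self._anchor_text.append(data)


# Hypothetical category profiles: keyword sets invented purely for illustration.
CATEGORIES = {
    "Computers/Databases": {"database", "databases", "sql", "postgresql"},
    "Recreation/Travel": {"travel", "hotels", "flights"},
}


def categorise(context_strings):
    """Return the category whose keywords overlap most with the context
    words, or None when no keyword matches at all."""
    words = {w.lower() for s in context_strings for w in s.split()}
    scores = {cat: len(words & kws) for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


SAMPLE_PAGE = """
<h2>Databases</h2>
<ul><li><a href="http://example.org/pg">PostgreSQL manual</a></li></ul>
<h2>Travel</h2>
<p>Cheap <a href="http://example.org/fly">flights and hotels</a> here.</p>
"""

if __name__ == "__main__":
    extractor = ContextExtractor()
    extractor.feed(SAMPLE_PAGE)
    extractor.close()
    for url, ctx in extractor.contexts.items():
        print(url, ctx, categorise(ctx))
```

The classifier never looks at the content of the target pages; it relies only on what the referring page says about them. That is the property categorisation by context exploits. A realistic system would use richer context, such as surrounding text and list or table structure, aggregate it over many referring pages, and apply a proper text classifier in place of keyword counts.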

Keywords: Web search, hypertext navigation, information retrieval, text categorisation

Categories: H.3.1, H.3.3, H.3.5, H.5.1, H.5.3, I.2.7