Titre du document / Document title
Indexing the invisible web: a survey
Auteur(s) / Author(s)
YANBO RU (1) ;
HOROWITZ Ellis (1) ;
Affiliation(s) du ou des auteurs / Author(s) Affiliation(s)
(1) Department of Computer Science, University of Southern California, Los Angeles, California, United States
Résumé / Abstract
Purpose - The existence and continued growth of the invisible web creates a major challenge for search engines that are attempting to organize all of the material on the web into a form that is easily retrieved by all users. The purpose of this paper is to identify the challenges and problems underlying existing work in this area. Design/methodology/approach - A discussion based on a short survey of prior work, including automated discovery of invisible web site search interfaces, automated classification of invisible web sites, label assignment and form filling, information extraction from the resulting pages, learning the query language of the search interface, building a content summary for an invisible web site, selecting proper databases, integrating invisible web search interfaces, and assessing the performance of an invisible web site. Findings - Existing technologies and tools for indexing the invisible web follow one of two strategies: indexing the web site interface, or examining a portion of the contents of an invisible web site and indexing the results. Originality/value - The paper is of value to those involved with information management.
Revue / Journal Title
Online information review
ISSN 1468-4527
Source / Source
2005, vol. 29, no. 3, pp. 249-265 [17 page(s) (article)] (2 p.1/4)
Langue / Language
English
Editeur / Publisher
Emerald, Bradford, United Kingdom (2000) (Journal)
Mots-clés anglais / English Keywords
Invisible web ;
Survey ;
Method ;
Indexing ;
Information retrieval ;
Search engine ;
World wide web ;
Mots-clés français / French Keywords
Mots-clés espagnols / Spanish Keywords
Web invisible ;
Encuesta ;
Método ;
Indización ;
Búsqueda información ;
Buscador ;
Red WWW ;
Mots-clés d'auteur / Author Keywords
Worldwide web ;
Search engines ;
Information retrieval ;
Indexing ;
Localisation / Location
INIST-CNRS, INIST shelf number: 17093, 35400013818753.0030
Refdoc record no. (ud4): 16892496