Titre du document / Document title
Résumé automatique de texte avec un algorithme d'ordonnancement = Automatic text summarization with a ranking algorithm
Auteur(s) / Author(s)
USUNIER Nicolas (1) ;
AMINI Massih-Reza (1) ;
GALLINARI Patrick (1) ;
Affiliation(s) du ou des auteurs / Author(s) Affiliation(s)
(1) Laboratoire d'Informatique de Paris 6, 8 rue du Capitaine Scott, 75015 Paris, FRANCE
Résumé / Abstract
This paper investigates a new approach to automatic text summarization based on a Machine Learning (ML) ranking algorithm. The goal is to extract the subset of a document's sentences that best reflects its content. Previous ML approaches defined a set of features that were used to produce a vector of scores for each sentence in a given document, and trained a classifier to combine these scores globally. However, recent theoretical results suggest that the classification criterion may be suboptimal for learning scoring functions. We therefore propose to use ranking algorithms, which also combine the scores of the different features but optimize a criterion that tends to reduce the relative misordering of sentences within a document. The features we use are either taken from the state of the art or built upon word clusters. These clusters are groups of words that often co-occur, and they can serve to expand a query or to enrich the representation of the sentences of a document. We show empirically that both the features and the ranking algorithms outperform state-of-the-art approaches on two distinct datasets.
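The contrast the abstract draws between a classification criterion and a ranking criterion can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a linear scoring function and a pairwise logistic loss trained by plain gradient descent, and the names train_ranker and summarize, the two toy features (a query-similarity score and a position score), and all data values are hypothetical.

```python
import math

def score(w, x):
    """Linear combination of the feature scores of one sentence."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_ranker(docs, n_features, epochs=200, lr=0.1):
    """Fit weights so that, within each document, summary sentences
    score higher than the other sentences.

    docs: list of (features, labels) pairs, one per document, where
    features[i] is the feature vector of sentence i and labels[i] is 1
    if the sentence belongs to the reference summary, else 0.
    Assumption: pairwise logistic loss, chosen only for illustration.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for features, labels in docs:
            pos = [x for x, y in zip(features, labels) if y == 1]
            neg = [x for x, y in zip(features, labels) if y == 0]
            # Pairs are formed only inside a document: the criterion
            # penalizes misordered sentence pairs, not wrong class labels.
            for p in pos:
                for n in neg:
                    margin = score(w, p) - score(w, n)
                    # d/d(margin) of log(1 + exp(-margin))
                    coeff = -1.0 / (1.0 + math.exp(margin))
                    for j in range(n_features):
                        grad[j] += coeff * (p[j] - n[j])
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def summarize(w, features, k):
    """Return the indices of the k best-scored sentences, in document order."""
    ranked = sorted(range(len(features)),
                    key=lambda i: score(w, features[i]), reverse=True)
    return sorted(ranked[:k])

# Hypothetical toy data: two documents, two features per sentence.
toy_docs = [
    ([[0.9, 0.8], [0.2, 0.1], [0.7, 0.3], [0.1, 0.4]], [1, 0, 1, 0]),
    ([[0.3, 0.2], [0.8, 0.9], [0.1, 0.1]], [0, 1, 0]),
]
w = train_ranker(toy_docs, n_features=2)
print(summarize(w, toy_docs[0][0], k=2))
```

Because the loss is computed on pairs of sentences from the same document, the learned weights only need to order sentences correctly relative to each other, which is the property the abstract attributes to ranking criteria.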
Revue / Journal Title
Ingénierie des systèmes d'information
ISSN 1633-1311
Source / Source
2006, vol. 11, nº 2 (107 p.) [Document : 21 p.] (1 p.1/4), pp. 71-91 [21 page(s) (article)]
Langue / Language
French
Editeur / Publisher
Lavoisier, Paris, FRANCE (2001) (Revue)
Mots-clés anglais / English Keywords
Learning algorithm ;
Scheduling ;
Sentence ;
Hierarchical classification ;
Abstract ;
Database query ;
Artificial intelligence ;
Text ;
Information system ;
Mots-clés français / French Keywords
Algorithme apprentissage ;
Ordonnancement ;
Phrase ;
Classification hiérarchique ;
Résumé ;
Interrogation base donnée ;
Intelligence artificielle ;
Texte ;
Système information ;
Mots-clés espagnols / Spanish Keywords
Algoritmo aprendizaje ;
Reglamento ;
Frase ;
Clasificación jerarquizada ;
Resumen ;
Interrogación base datos ;
Inteligencia artificial ;
Texto ;
Sistema información ;
Localisation / Location
INIST-CNRS, Cote INIST : 26729, 35400014262316.0040
Nº notice refdoc (ud4) : 17782331