Evaluation of Cross-Language Information Retrieval Systems


Book Description

The second evaluation campaign of the Cross-Language Evaluation Forum (CLEF) for European languages was held from January to September 2001. This campaign proved a great success, and showed an increase in participation of around 70% compared with CLEF 2000. It culminated in a two-day workshop in Darmstadt, Germany, 3–4 September, in conjunction with the 5th European Conference on Digital Libraries (ECDL 2001). On the first day of the workshop, the results of the CLEF 2001 evaluation campaign were reported and discussed in paper and poster sessions. The second day focused on the current needs of cross-language systems and how evaluation campaigns in the future can best be designed to stimulate progress. The workshop was attended by nearly 50 researchers and system developers from both academia and industry. It provided an important opportunity for researchers working in the same area to get together and exchange ideas and experiences. Copies of all the presentations are available on the CLEF web site at http://www.clef-campaign.org. This volume contains thoroughly revised and expanded versions of the papers presented at the workshop and provides an exhaustive record of the CLEF 2001 campaign.

CLEF 2001 was conducted as an activity of the DELOS Network of Excellence for Digital Libraries, funded by the EC Information Society Technologies program to further research in digital library technologies. The activity was organized in collaboration with the US National Institute of Standards and Technology (NIST).




Language Modeling for Information Retrieval


Book Description

A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. Such a definition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined categories. The first statistical language modeler was Claude Shannon. In exploring the application of his newly founded theory of information to human language, Shannon considered language as a statistical source, and measured how well simple n-gram models predicted or, equivalently, compressed natural text. To do this, he estimated the entropy of English through experiments with human subjects, and also estimated the cross-entropy of the n-gram models on natural text. The ability of language models to be quantitatively evaluated in this way is one of their important virtues. Of course, estimating the true entropy of language is an elusive goal, aiming at many moving targets, since language is so varied and evolves so quickly. Yet fifty years after Shannon's study, language models remain, by all measures, far from the Shannon entropy limit in terms of their predictive power. However, this has not kept them from being useful for a variety of text processing tasks, and moreover can be viewed as encouragement that there is still great room for improvement in statistical language modeling.
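
As a rough illustration of what it means to evaluate an n-gram model by its cross-entropy, the sketch below trains a character bigram model on one string and scores another in bits per character. It is not Shannon's procedure or anything taken from the book: the add-one smoothing, the function names (`train_bigram`, `bigram_prob`, `cross_entropy`), and the toy strings are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def train_bigram(text):
    """Count character bigrams and their left contexts in a training string."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    return bigrams, contexts


def bigram_prob(prev, char, bigrams, contexts, vocab_size):
    """Add-one smoothed estimate of P(char | prev).

    The smoothing scheme is an assumption made for this sketch; it keeps
    unseen bigrams from receiving zero probability.
    """
    return (bigrams[(prev, char)] + 1) / (contexts[prev] + vocab_size)


def cross_entropy(text, bigrams, contexts, vocab_size):
    """Average negative log2 probability per predicted character, in bits."""
    pairs = list(zip(text, text[1:]))
    if not pairs:
        raise ValueError("need at least two characters to score")
    total = sum(
        -math.log2(bigram_prob(prev, char, bigrams, contexts, vocab_size))
        for prev, char in pairs
    )
    return total / len(pairs)


if __name__ == "__main__":
    # Toy data, purely illustrative.
    train, held_out = "abababab", "abab"
    vocab_size = len(set(train) | set(held_out))
    bigrams, contexts = train_bigram(train)
    print(cross_entropy(held_out, bigrams, contexts, vocab_size))
```

A lower value means the model predicts the held-out text better, which is the same as saying it could compress that text into fewer bits. With this smoothing, a model that has seen no training data assigns every character probability 1/V and therefore scores exactly log2(V) bits per character for a vocabulary of V symbols, which gives a simple baseline to compare against.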