Expérimentations et évaluations en fouille de textes : Un panorama des campagnes DEFT (Experiments and Evaluations in Text Mining: An Overview of the DEFT Campaigns)


Book Description

Text mining combines computational processing and linguistic data, with the main goal of automatically extracting and organizing the information present in texts. Two families of methods serve this goal: those based on expert knowledge and those relying on supervised machine learning. An evaluation campaign pits the systems developed by several teams against one another on the same dataset and within a limited time. Created in 2005 on the model of English-language campaigns, the défi fouille de textes (DEFT, the text mining challenge) is today the only French-language evaluation campaign in text mining. This book brings together the methods used across the various editions of the challenge. The topics cover the classification of documents by genre and theme, opinion mining, and the identification of a document's publication period.




European Language Grid


Book Description

This open access book provides an in-depth description of the EU project European Language Grid (ELG). Its motivation lies in the fact that Europe is a multilingual society with 24 official European Union Member State languages and dozens of additional languages including regional and minority languages. The only meaningful way to enable multilingualism and to benefit from this rich linguistic heritage is through Language Technologies (LT) including Natural Language Processing (NLP), Natural Language Understanding (NLU), Speech Technologies and language-centric Artificial Intelligence (AI) applications. The European Language Grid provides a single umbrella platform for the European LT community, including research and industry, effectively functioning as a virtual home, marketplace, showroom, and deployment centre for all services, tools, resources, products and organisations active in the field. Today the ELG cloud platform already offers access to more than 13,000 language processing tools and language resources. It enables all stakeholders to deposit, upload and deploy their technologies and datasets. The platform also supports the long-term objective of establishing digital language equality in Europe by 2030 – to create a situation in which all European languages enjoy equal technological support. This is the very first book dedicated to Language Technology and NLP platforms. Cloud technology has only recently matured enough to make the development of a platform like ELG feasible on a larger scale. The book comprehensively describes the results of the ELG project. Following an introduction, the content is divided into four main parts: (I) ELG Cloud Platform; (II) ELG Inventory of Technologies and Resources; (III) ELG Community and Initiative; and (IV) ELG Open Calls and Pilot Projects.




String Processing and Information Retrieval


Book Description

This book constitutes the proceedings of the 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011, held in Pisa, Italy, in October 2011. The 30 long and 10 short papers together with 1 keynote presented were carefully reviewed and selected from 102 submissions. The papers are structured in topical sections on introduction to web retrieval, sequence learning, computational geography, space-efficient data structures, algorithmic analysis of biological data, compression, text and algorithms.




New Trends in Database and Information Systems


Book Description

This book constitutes the proceedings of the 26th European Conference on Advances in Databases and Information Systems, ADBIS 2022, held in Turin, Italy, in September 2022. The 29 short papers presented were carefully reviewed and selected from 90 submissions. The selected short papers are organized in the following sections: data understanding, modeling and visualization; fairness in data processing; data management pipeline, information and process retrieval; data access optimization; data pre-processing and cleaning; data science and machine learning. Further, papers from the following workshops and satellite events are provided in the volume: DOING: 3rd Workshop on Intelligent Data – From Data to Knowledge; K-GALS: 1st Workshop on Knowledge Graphs Analysis on a Large Scale; MADEISD: 4th Workshop on Modern Approaches in Data Engineering and Information System Design; MegaData: 2nd Workshop on Advanced Data Systems Management, Engineering, and Analytics; SWODCH: 2nd Workshop on Semantic Web and Ontology Design for Cultural Heritage; Doctoral Consortium.




Machine Learning: ECML 2003


Book Description

This book constitutes the refereed proceedings of the 14th European Conference on Machine Learning, ECML 2003, held in Cavtat-Dubrovnik, Croatia, in September 2003 in conjunction with PKDD 2003. The 40 revised full papers presented together with 4 invited contributions were carefully reviewed and, together with another 40 papers for PKDD 2003, selected from a total of 332 submissions. The papers address all current issues in machine learning, including support vector machines, inductive inference, feature selection algorithms, reinforcement learning, preference learning, probabilistic grammatical inference, decision tree learning, clustering, classification, agent learning, Markov networks, boosting, statistical parsing, Bayesian learning, supervised learning, and multi-instance learning.




Advances in Information Retrieval


Book Description

This two-volume set LNCS 12035 and 12036 constitutes the refereed proceedings of the 42nd European Conference on IR Research, ECIR 2020, held in Lisbon, Portugal, in April 2020.* The 55 full papers presented together with 8 reproducibility papers, 46 short papers, 10 demonstration papers, 12 invited CLEF papers, 7 doctoral consortium papers, 4 workshop papers, and 3 tutorials were carefully reviewed and selected from 457 submissions. They were organized in topical sections named: Part I: deep learning I; entities; evaluation; recommendation; information extraction; deep learning II; retrieval; multimedia; deep learning III; queries; IR – general; question answering, prediction, and bias; and deep learning IV. Part II: reproducibility papers; short papers; demonstration papers; CLEF organizers lab track; doctoral consortium papers; workshops; and tutorials. *Due to the COVID-19 pandemic, this conference was held virtually.




Text Mining


Book Description

Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives. The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning, and natural language processing can collectively capture, classify, and interpret words and their contexts. As suggested in the preface, text mining is needed when “words are not enough.” This book: Provides state-of-the-art algorithms and techniques for critical tasks in text mining applications, such as clustering, classification, anomaly and trend detection, and stream analysis. Presents a survey of text visualization techniques and looks at the multilingual text classification problem. Discusses the issue of cybercrime associated with chatrooms. Features advances in visual analytics and machine learning along with illustrative examples. Is accompanied by a supporting website featuring datasets. Applied mathematicians, statisticians, practitioners and students in computer science, bioinformatics and engineering will find this book extremely useful.




Semantic Keyword-Based Search on Structured Data Sources


Book Description

This book constitutes the thoroughly refereed post-conference proceedings of the Second COST Action IC1302 International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources, IKC 2016, held in Cluj-Napoca, Romania, in September 2016. The 15 revised full papers and 2 invited papers were reviewed and selected from 18 initial submissions and cover the areas of keyword extraction, natural language searches, graph databases, information retrieval techniques for keyword search, and document retrieval.




Term Variation in Specialised Corpora


Book Description

This book addresses term variation, which has been a very important topic in terminology, computational terminology and natural language processing for up to twenty years. This book presents the first complete inventory of term variants and the linguistic procedures that lead to their formation. It also takes into account issues raised by multilingual applications and presents ways to detect variants in five different languages: French, English, German, Spanish and Russian. The book provides insights into the following issues: What is a variant? What are the main linguistic mechanisms involved in the transformation of base terms into variants? How can variants be automatically detected in texts? Should variation be taken into account in natural language processing applications? This book is targeted at terminologists and linguists interested in term variation, as well as researchers in natural language processing and computer science who must handle term variants in different kinds of applications.




Advances in Artificial Intelligence


Book Description

The 18th conference of the Canadian Society for the Computational Study of Intelligence (CSCSI) continued the success of its predecessors. This set of papers reflects the diversity of the Canadian AI community and its international partners. AI 2005 attracted 135 high-quality submissions: 64 from Canada and 71 from around the world. Of these, eight were written in French. All submitted papers were thoroughly reviewed by at least three members of the Program Committee. A total of 30 contributions, accepted as long papers, and 19 as short papers are included in this volume. We invited three distinguished researchers to give talks about their current research interests: Eric Brill from Microsoft Research, Craig Boutilier from the University of Toronto, and Henry Kautz from the University of Washington. The organization of such a successful conference benefited from the collaboration of many individuals. Foremost, we would like to express our appreciation to the Program Committee members and external referees, who provided timely and significant reviews. To manage the submission and reviewing process we used the Paperdyne system, which was developed by Dirk Peters. We owe special thanks to Kellogg Booth and Tricia d’Entremont for handling the local arrangements and registration. We also thank Bruce Spencer and members of the CSCSI executive for all their efforts in making AI 2005 a successful conference.