Unsupervised Information Extraction by Text Segmentation


Book Description

A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors’ approach relies on information available in pre-existing data to learn how to associate segments in the input string with attributes of a given domain, using a very effective set of content-based features. The effectiveness of the content-based features is also exploited to learn structure-based features directly from test data, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a number of results are produced to address the IETS problem in an unsupervised fashion. In particular, the authors develop, implement and evaluate distinct IETS methods, namely ONDUX, JUDIE and iForm. ONDUX (On Demand Unsupervised Information Extraction) is an unsupervised probabilistic approach for IETS that relies on content-based features to bootstrap the learning of structure-based features. JUDIE (Joint Unsupervised Structure Discovery and Information Extraction) aims at automatically extracting several semi-structured data records that appear as continuous text with no explicit delimiters between them. In comparison with other IETS methods, including ONDUX, JUDIE faces a considerably harder task, that is, extracting information while simultaneously uncovering the underlying structure of the implicit records containing it. iForm applies the authors’ approach to the task of Web form filling. It aims at extracting segments from a data-rich text given as input and associating these segments with fields from a target Web form. All of these methods were evaluated on different experimental datasets, which were used to perform a large set of experiments in order to validate the presented approach and methods.
These experiments indicate that the proposed approach yields high quality results when compared to state-of-the-art approaches and that it is able to properly support IETS methods in a number of real applications. The findings will prove valuable to practitioners in helping them to understand the current state-of-the-art in unsupervised information extraction techniques, as well as to graduate and undergraduate students of web data management.




Mining Text Data


Book Description

Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book covers a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data, which makes the mining process much more challenging. A number of methods, such as transfer learning and cross-lingual mining, have been designed for such cases. Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.




Machine Learning for Text


Book Description

This second edition textbook covers a coherently organized framework for text analytics, which integrates material drawn from the intersecting topics of information retrieval, machine learning, and natural language processing. Particular importance is placed on deep learning methods. The chapters of this book span three broad categories:
1. Basic algorithms: Chapters 1 through 7 discuss the classical algorithms for text analytics such as preprocessing, similarity computation, topic modeling, matrix factorization, clustering, classification, regression, and ensemble analysis.
2. Domain-sensitive learning and information retrieval: Chapters 8 and 9 discuss learning models in heterogeneous settings such as a combination of text with multimedia or Web links. The problem of information retrieval and Web search is also discussed in the context of its relationship with ranking and machine learning methods.
3. Natural language processing: Chapters 10 through 16 discuss various sequence-centric and natural language applications, such as feature engineering, neural language models, deep learning, transformers, pre-trained language models, text summarization, information extraction, knowledge graphs, question answering, opinion mining, text segmentation, and event detection.
Compared to the first edition, this second edition textbook (which targets mostly advanced level students majoring in computer science and math) has substantially more material on deep learning and natural language processing. Significant focus is placed on topics like transformers, pre-trained language models, knowledge graphs, and question answering.




Machine Learning for Text


Book Description

Text analytics is a field that lies on the interface of information retrieval, machine learning, and natural language processing, and this textbook carefully covers a coherently organized framework drawn from these intersecting topics. The chapters of this textbook are organized into three categories:
- Basic algorithms: Chapters 1 through 7 discuss the classical algorithms for machine learning from text such as preprocessing, similarity computation, topic modeling, matrix factorization, clustering, classification, regression, and ensemble analysis.
- Domain-sensitive mining: Chapters 8 and 9 discuss the learning methods from text when combined with different domains such as multimedia and the Web. The problem of information retrieval and Web search is also discussed in the context of its relationship with ranking and machine learning methods.
- Sequence-centric mining: Chapters 10 through 14 discuss various sequence-centric and natural language applications, such as feature engineering, neural language models, deep learning, text summarization, information extraction, opinion mining, text segmentation, and event detection.
This textbook covers machine learning topics for text in detail. Since the coverage is extensive, multiple courses can be offered from the same book, depending on course level. Even though the presentation is text-centric, Chapters 3 to 7 cover machine learning algorithms that are often used in domains beyond text data. Therefore, the book can be used to offer courses not just in text analytics but also from the broader perspective of machine learning (with text as a backdrop). This textbook targets graduate students in computer science, as well as researchers, professors, and industrial practitioners working in these related fields. This textbook is accompanied by a solution manual for classroom teaching.




SAS Text Analytics for Business Applications


Book Description

Extract actionable insights from text and unstructured data. Information extraction is the task of automatically extracting structured information from unstructured or semi-structured text. SAS Text Analytics for Business Applications: Concept Rules for Information Extraction Models focuses on this key element of natural language processing (NLP) and provides real-world guidance on the effective application of text analytics. Using scenarios and data based on business cases across many different domains and industries, the book includes many helpful tips and best practices from SAS text analytics experts to ensure fast, valuable insight from your textual data. Written for a broad audience of beginning, intermediate, and advanced users of SAS text analytics products, including SAS Visual Text Analytics, SAS Contextual Analysis, and SAS Enterprise Content Categorization, this book provides a solid technical reference. You will learn the SAS information extraction toolkit, broaden your knowledge of rule-based methods, and answer new business questions. As your practical experience grows, this book will serve as a reference to deepen your expertise.




Natural Language Processing – IJCNLP 2005


Book Description

This book constitutes the thoroughly refereed proceedings of the Second International Joint Conference on Natural Language Processing, IJCNLP 2005, held in Jeju Island, Korea in October 2005. The 88 revised full papers presented in this volume were carefully reviewed and selected from 289 submissions. The papers are organized in topical sections on information retrieval, corpus-based parsing, Web mining, rule-based parsing, disambiguation, text mining, document analysis, ontology and thesaurus, relation extraction, text classification, transliteration, machine translation, question answering, morphological analysis, text summarization, named entity recognition, linguistic resources and tools, discourse analysis, semantic analysis, NLP applications, tagging, language models, spoken language, and terminology mining.




Knowledge Engineering: Practice and Patterns


Book Description

This book constitutes the refereed proceedings of the 16th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2008, held in Acitrezza, Sicily, Italy, in September/October 2008. The 17 revised full papers and 15 revised short papers presented together with 3 invited talks were carefully reviewed and selected from 102 submissions. The papers are organized in topical sections on knowledge patterns and knowledge representation, matching ontologies and data integration, natural language, knowledge acquisition and annotations, search, query and interaction, as well as ontologies.




Computational Linguistics and Intelligent Text Processing


Book Description

The two-volume set LNCS 13451 and 13452 constitutes revised selected papers from the CICLing 2019 conference, which took place in La Rochelle, France, in April 2019. The total of 95 papers presented in the two volumes was carefully reviewed and selected from 335 submissions. The book also contains 3 invited papers. The papers are organized in the following topical sections: General, Information extraction, Information retrieval, Language modeling, Lexical resources, Machine translation, Morphology, syntax, parsing, Named entity recognition, Semantics and text similarity, Sentiment analysis, Speech processing, Text categorization, Text generation, and Text mining.




Data Mining


Book Description

Data Mining: Opportunities and Challenges presents an overview of state-of-the-art approaches in the new and multidisciplinary field of data mining. The primary objective of this book is to explore the myriad issues regarding data mining, specifically focusing on those areas that explore new methodologies or examine case studies. This book contains numerous chapters written by an international team of forty-four experts representing leading scientists and talented young scholars from seven different countries.




Graph Learning and Network Science for Natural Language Processing


Book Description

Advances in graph-based natural language processing (NLP) and information retrieval tasks have shown the importance of processing using the Graph of Words method. This book covers recent concrete information, from the basics to advanced level, about graph-based learning, such as neural network-based approaches, computational intelligence for learning parameters and feature reduction, and network science for graph-based NLP. It also contains information about language generation based on graphical theories and language models. Features:
- Presents a comprehensive study of the interdisciplinary graphical approach to NLP
- Covers recent computational intelligence techniques for graph-based neural network models
- Discusses advances in random walk-based techniques, semantic webs, and lexical networks
- Explores recent research into NLP for graph-based streaming data
- Reviews advances in knowledge graph embedding and ontologies for NLP approaches
This book is aimed at researchers and graduate students in computer science, natural language processing, and deep and machine learning.