Practical Data Analysis


Book Description

A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark

About This Book

- Learn to use various data analysis tools and algorithms to classify, cluster, visualize, simulate, and forecast your data
- Apply machine learning algorithms to different kinds of data such as social networks, time series, and images
- A hands-on guide to understanding the nature of data and how to turn it into insight

Who This Book Is For

This book is for developers who want to implement data analysis and data-driven algorithms in a practical way. It is also suitable for those without a background in data analysis or data processing. Basic knowledge of Python programming, statistics, and linear algebra is assumed.

What You Will Learn

- Acquire, format, and visualize your data
- Build an image-similarity search engine
- Generate meaningful visualizations anyone can understand
- Get started with analyzing social network graphs
- Find out how to implement sentiment text analysis
- Install data analysis tools such as Pandas, MongoDB, and Apache Spark
- Get to grips with Apache Spark
- Implement machine learning algorithms such as classification or forecasting

In Detail

Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you'll get hands-on experience turning data into insights using machine learning techniques. We will apply data-driven processing to several types of data such as text, images, social network graphs, documents, and time series, showing you how to implement large-scale data processing with MongoDB and Apache Spark.

Style and approach

This is a hands-on guide to data analysis and data processing. The concrete examples are explained with simple code and accessible data.




Practical Data Analysis Cookbook


Book Description

Over 60 practical recipes on data exploration and analysis

About This Book

- Clean dirty data, extract accurate information, and explore the relationships between variables
- Forecast the output of an electric plant and the water flow of American rivers using pandas, NumPy, Statsmodels, and scikit-learn
- Find and extract the most important features from your dataset using the most efficient Python libraries

Who This Book Is For

If you are a beginner or intermediate-level professional who is looking to solve your day-to-day analytical problems with Python, this book is for you. Even with no prior programming and data analytics experience, you will be able to finish each recipe and learn while doing so.

What You Will Learn

- Read, clean, transform, and store your data using pandas and OpenRefine
- Understand your data and explore the relationships between variables using pandas and D3.js
- Explore a variety of techniques to classify and cluster outbound marketing campaign call data of a bank using pandas, mlpy, NumPy, and Statsmodels
- Reduce the dimensionality of your dataset and extract the most important features with pandas, NumPy, and mlpy
- Predict the output of a power plant with regression models and forecast the water flow of American rivers with time series methods using pandas, NumPy, Statsmodels, and scikit-learn
- Explore social interactions and identify fraudulent activities with graph theory concepts using NetworkX and Gephi
- Scrape Internet web pages using urllib and BeautifulSoup and get to know natural language processing techniques to classify movie ratings using NLTK
- Study simulation techniques in an example of a gas station with agent-based modeling

In Detail

Data analysis is the process of systematically applying statistical and logical techniques to describe and illustrate, condense and recap, and evaluate data. Its importance has been most visible in the information and communication technology sector. It is a valuable skill for employees in almost every sector of the economy.

This book provides a rich set of independent recipes that dive into the world of data analytics and modeling using a variety of approaches, tools, and algorithms. You will learn the basics of data handling and modeling, and will build your skills gradually toward more advanced topics such as simulations, raw text processing, social interactions analysis, and more.

First, you will learn some easy-to-follow practical techniques for reading, writing, cleaning, reformatting, exploring, and understanding your data, arguably the most time-consuming (and the most important) tasks for any data scientist. In the second section, different independent recipes delve into intermediate topics such as classification, clustering, predicting, and more. With the help of these easy-to-follow recipes, you will also learn techniques that can easily be expanded to solve other real-life problems such as building recommendation engines or predictive models. In the third section, you will explore more advanced topics, from graph theory through natural language processing and discrete choice modeling to simulations. You will also expand your knowledge of identifying the origin of fraud with the help of a graph, scraping Internet websites, and classifying movies based on their reviews. By the end of this book, you will be able to efficiently use the vast array of tools that the Python environment has to offer.

Style and approach

This hands-on recipe guide is divided into three sections that tackle and overcome real-world data modeling problems faced by data analysts and scientists in their everyday work. Each independent recipe is written in an easy-to-follow, step-by-step fashion.




Practical Data Analysis in Chemistry


Book Description

The majority of modern instruments are computerised and provide incredible amounts of data. Methods that take advantage of the flood of data are now available; importantly, they do not emulate 'graph paper analyses' on the computer. Modern computational methods are able to give us insights into data, but analysis or data fitting in chemistry requires the quantitative understanding of chemical processes. The results of this analysis allow the modelling and prediction of processes under new conditions, therefore saving on extensive experimentation.

Practical Data Analysis in Chemistry exemplifies every aspect of theory applicable to data analysis using a short Matlab program or Excel spreadsheet, enabling the reader to study the programs, play with them and observe what happens. Suitable data are generated for each example in short routines, thus ensuring a clear understanding of the data structure. Chapter 2 includes a brief introduction to matrix algebra and its implementation in Matlab and Excel, while Chapter 3 covers the theory required for the modelling of chemical processes. This is followed by an introduction to linear and non-linear least-squares fitting, each demonstrated with typical applications. Finally, Chapter 5 comprises a collection of several methods for model-free data analyses.

* Includes a solid introduction to the simulation of equilibrium processes and the simulation of complex kinetic processes
* Provides examples of routines that are easily adapted to the processes investigated by the reader
* 'Model-based' analysis (linear and non-linear regression) and 'model-free' analysis are covered




A Practical Guide to Scientific Data Analysis


Book Description

Inspired by the author's need for practical guidance in the processes of data analysis, A Practical Guide to Scientific Data Analysis has been written as a statistical companion for the working scientist. This handbook of data analysis with worked examples focuses on the application of mathematical and statistical techniques and the interpretation of their results. Covering the most common statistical methods for examining and exploring relationships in data, the text includes extensive examples from a variety of scientific disciplines. The chapters are organised logically, from planning an experiment, through examining and displaying the data, to constructing quantitative models. Each chapter is intended to stand alone so that casual users can refer to the section that is most appropriate to their problem.

Written by a highly qualified and internationally respected author, this text:

- Presents statistics for the non-statistician
- Explains a variety of methods to extract information from data
- Describes the application of statistical methods to the design of “performance chemicals”
- Emphasises the application of statistical techniques and the interpretation of their results

Of practical use to chemists, biochemists, pharmacists, biologists and researchers from many other scientific disciplines in both industry and academia.




Practical Data Analysis for Designed Experiments


Book Description

Placing data in the context of the scientific discovery of knowledge through experimentation, Practical Data Analysis for Designed Experiments examines issues of comparing groups and sorting out factor effects and the consequences of imbalance and nesting, then works through more practical applications of the theory. Written in a modern and accessible manner, this book is a useful blend of theory and methods. Exercises included in the text are based on real experiments and real data.




Practical Statistics for Data Scientists


Book Description

Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you’ll learn:

- Why exploratory data analysis is a key preliminary step in data science
- How random sampling can reduce bias and yield a higher quality dataset, even with big data
- How the principles of experimental design yield definitive answers to questions
- How to use regression to estimate outcomes and detect anomalies
- Key classification techniques for predicting which categories a record belongs to
- Statistical machine learning methods that “learn” from data
- Unsupervised learning methods for extracting meaning from unlabeled data




Practical Data Science with R


Book Description

Summary

Practical Data Science with R lives up to its name. It explains basic principles without the theoretical mumbo-jumbo and jumps right to the real use cases you'll face as you collect, curate, and analyze the data crucial to the success of your business. You'll apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Book

Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science tasks without a lot of academic theory or advanced mathematics. Practical Data Science with R shows you how to apply the R programming language and useful statistical techniques to everyday business situations. Using examples from marketing, business intelligence, and decision support, it shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels. This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed.

What's Inside

- Data science for the business professional
- Statistical analysis using the R language
- Project lifecycle, from planning to delivery
- Numerous instantly familiar use cases
- Keys to effective data presentations

About the Authors

Nina Zumel and John Mount are cofounders of a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at win-vector.com.

Table of Contents

PART 1 INTRODUCTION TO DATA SCIENCE
- The data science process
- Loading data into R
- Exploring data
- Managing data

PART 2 MODELING METHODS
- Choosing and evaluating models
- Memorization methods
- Linear and logistic regression
- Unsupervised methods
- Exploring advanced methods

PART 3 DELIVERING RESULTS
- Documentation and deployment
- Producing effective presentations




Qualitative Data Analysis


Book Description

Written by an experienced researcher in the field of qualitative methods, this dynamic new book provides a definitive introduction to analysing qualitative data. It is a clear, accessible and practical guide to each stage of the process, including:

- Designing and managing qualitative data for analysis
- Working with data through interpretive, comparative, pattern and relational analyses
- Developing explanatory theory and coherent conclusions, based on qualitative data

The book pairs theoretical discussion with practical advice using a host of examples from diverse projects across the social sciences. It describes data analysis strategies in actionable steps and helpfully links to the use of computer software where relevant. This is an exciting new addition to the literature on qualitative data analysis and a must-read for anyone who has collected, or is preparing to collect, their own data.




Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications


Book Description

The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase dramatically. This comprehensive professional reference brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis. The Handbook of Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications presents a comprehensive how-to reference that shows the user how to conduct text mining and statistically analyze results. In addition to providing an in-depth examination of core text mining and link detection tools, methods and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection using real-world example tutorials in such varied fields as corporate finance, business intelligence, genomics research, and counterterrorism activities.




Entertainment Science


Book Description

The entertainment industry has long been dominated by legendary screenwriter William Goldman’s “Nobody-Knows-Anything” mantra, which argues that success is the result of managerial intuition and instinct. This book builds the case that combining such intuition with data analytics and rigorous scholarly knowledge provides a source of sustainable competitive advantage – the same recipe for success that is behind the rise of firms such as Netflix and Spotify, but has also fueled Disney’s recent success. Unlocking a large repertoire of scientific studies by business scholars and entertainment economists, the authors identify essential factors, mechanisms, and methods that help a new entertainment product succeed. The book thus offers a timely alternative to “Nobody-Knows” decision-making in the digital era: while coupling a good idea with smart data analytics and entertainment theory cannot guarantee a hit, it systematically and substantially increases the probability of success in the entertainment industry. Entertainment Science is poised to inspire fresh new thinking among managers, students of entertainment, and scholars alike.

Thorsten Hennig-Thurau and Mark B. Houston – two of our finest scholars in the area of entertainment marketing – have produced a definitive research-based compendium that cuts across various branches of the arts to explain the phenomena that provide consumption experiences to capture the hearts and minds of audiences.
Morris B. Holbrook, W. T. Dillard Professor Emeritus of Marketing, Columbia University

Entertainment Science is a must-read for everyone working in the entertainment industry today, where the impact of digital and the use of big data can’t be ignored anymore. Hennig-Thurau and Houston are the scientific frontrunners of knowledge that the industry urgently needs.
Michael Kölmel, media entrepreneur and Honorary Professor of Media Economics at University of Leipzig

Entertainment Science’s winning combination of creativity, theory, and data analytics offers managers in the creative industries and beyond a novel, compelling, and comprehensive approach to support their decision-making. This ground-breaking book marks the dawn of a new Golden Age of fruitful conversation between entertainment scholars, managers, and artists.
Allègre Hadida, Associate Professor in Strategy, University of Cambridge