Synopses for Massive Data


Book Description

Describes basic principles and recent developments in approximate query processing. It focuses on four key synopses: random samples, histograms, wavelets, and sketches. It considers issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance.




Python for Finance


Book Description

The financial industry has recently adopted Python at a tremendous rate, with some of the largest investment banks and hedge funds using it to build core trading and risk management systems. Updated for Python 3, the second edition of this hands-on book helps you get started with the language, guiding developers and quantitative analysts through Python libraries and tools for building financial applications and interactive financial analytics. Using practical examples throughout the book, author Yves Hilpisch also shows you how to develop a full-fledged framework for Monte Carlo simulation-based derivatives and risk analytics, based on a large, realistic case study. Much of the book uses interactive IPython Notebooks.




Using R for Introductory Statistics


Book Description

The second edition of a bestselling textbook, Using R for Introductory Statistics guides students through the basics of R, helping them overcome the sometimes steep learning curve. The author does this by breaking the material down into small, task-oriented steps. The second edition maintains the features that made the first edition so popular, while updating data, examples, and changes to R in line with the current version. See What’s New in the Second Edition: Increased emphasis on more idiomatic R provides a grounding in the functionality of base R. Discussions of the use of RStudio helps new R users avoid as many pitfalls as possible. Use of knitr package makes code easier to read and therefore easier to reason about. Additional information on computer-intensive approaches motivates the traditional approach. Updated examples and data make the information current and topical. The book has an accompanying package, UsingR, available from CRAN, R’s repository of user-contributed packages. The package contains the data sets mentioned in the text (data(package="UsingR")), answers to selected problems (answers()), a few demonstrations (demo()), the errata (errata()), and sample code from the text. The topics of this text line up closely with traditional teaching progression; however, the book also highlights computer-intensive approaches to motivate the more traditional approach. The authors emphasize realistic data and examples and rely on visualization techniques to gather insight. They introduce statistics and R seamlessly, giving students the tools they need to use R and the information they need to navigate the sometimes complex world of statistical computing.




Real-Time Analytics


Book Description

Construct a robust end-to-end solution for analyzing and visualizing streaming data Real-time analytics is the hottest topic in data analytics today. In Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data, expert Byron Ellis teaches data analysts technologies to build an effective real-time analytics platform. This platform can then be used to make sense of the constantly changing data that is beginning to outpace traditional batch-based analysis platforms. The author is among a very few leading experts in the field. He has a prestigious background in research, development, analytics, real-time visualization, and Big Data streaming and is uniquely qualified to help you explore this revolutionary field. Moving from a description of the overall analytic architecture of real-time analytics to using specific tools to obtain targeted results, Real-Time Analytics leverages open source and modern commercial tools to construct robust, efficient systems that can provide real-time analysis in a cost-effective manner. The book includes: A deep discussion of streaming data systems and architectures Instructions for analyzing, storing, and delivering streaming data Tips on aggregating data and working with sets Information on data warehousing options and techniques Real-Time Analytics includes in-depth case studies for website analytics, Big Data, visualizing streaming and mobile data, and mining and visualizing operational data flows. The book's "recipe" layout lets readers quickly learn and implement different techniques. All of the code examples presented in the book, along with their related data sets, are available on the companion website.




Gnuplot in Action


Book Description

Summary Gnuplot in Action, Second Edition is a major revision of this popular and authoritative guide for developers, engineers, and scientists who want to learn and use gnuplot effectively. Fully updated for gnuplot version 5, the book includes four pages of color illustrations and four bonus appendixes available in the eBook. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. About the Technology Gnuplot is an open-source graphics program that helps you analyze, interpret, and present numerical data. Available for Unix, Mac, and Windows, it is well-maintained, mature, and totally free. About the Book Gnuplot in Action, Second Edition is a major revision of this authoritative guide for developers, engineers, and scientists. The book starts with a tutorial introduction, followed by a systematic overview of gnuplot's core features and full coverage of gnuplot's advanced capabilities. Experienced readers will appreciate the discussion of gnuplot 5's features, including new plot types, improved text and color handling, and support for interactive, web-based display formats. The book concludes with chapters on graphical effects and general techniques for understanding data with graphs. It includes four pages of color illustrations. 3D graphics, false-color plots, heatmaps, and multivariate visualizations are covered in chapter-length appendixes available in the eBook. What's Inside Creating different types of graphs in detail Animations, scripting, batch operations Extensive discussion of terminals Updated to cover gnuplot version 5 About the Reader No prior experience with gnuplot is required. This book concentrates on practical applications of gnuplot relevant to users of all levels. About the Author Philipp K. Janert, PhD, is a programmer and scientist. He is the author of several books on data analysis and applied math and has been a gnuplot power user and developer for over 20 years. Table of Contents PART 1 GETTING STARTED Prelude: understanding data with gnuplot Tutorial: essential gnuplot The heart of the matter: the plot command PART 2 CREATING GRAPHS Managing data sets and files Practical matters: strings, loops, and history A catalog of styles Decorations: labels, arrows, and explanations All about axes PART 3 MASTERING TECHNICALITIES Color, style, and appearance Terminals and output formats Automation, scripting, and animation Beyond the defaults: workflow and styles PART 4 UNDERSTANDING DATA Basic techniques of graphical analysis Topics in graphical analysis Coda: understanding data with graphs




Data Mining: Concepts and Techniques


Book Description

Data Mining: Concepts and Techniques provides the concepts and techniques in processing gathered data or information, which will be used in various applications. Specifically, it explains data mining and the tools used in discovering knowledge from the collected data. This book is referred as the knowledge discovery from data (KDD). It focuses on the feasibility, usefulness, effectiveness, and scalability of techniques of large data sets. After describing data mining, this edition explains the methods of knowing, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology. Then, the methods involved in mining frequent patterns, associations, and correlations for large data sets are described. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss the outlier detection and the trends, applications, and research frontiers in data mining. This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining. - Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects - Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields - Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data




High Performance Visualization


Book Description

Visualization and analysis tools, techniques, and algorithms have undergone a rapid evolution in recent decades to accommodate explosive growth in data size and complexity and to exploit emerging multi- and many-core computational platforms. High Performance Visualization: Enabling Extreme-Scale Scientific Insight focuses on the subset of scientifi




Extracting Spatial Information from Historical Maps


Book Description

Historical maps are fascinating documents and a valuable source of information for scientists of various disciplines. Many of these maps are available as scanned bitmap images, but in order to make them searchable in useful ways, a structured representation of the contained information is desirable. This book deals with the extraction of spatial information from historical maps. This cannot be expected to be solved fully automatically (since it involves difficult semantics), but is also too tedious to be done manually at scale. The methodology used in this book combines the strengths of both computers and humans: it describes efficient algorithms to largely automate information extraction tasks and pairs these algorithms with smart user interactions to handle what is not understood by the algorithm. The effectiveness of this approach is shown for various kinds of spatial documents from the 16th to the early 20th century.




Practical SQL, 2nd Edition


Book Description

Analyze data like a pro, even if you’re a beginner. Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language), the standard programming language for defining, organizing, and exploring data in relational databases. Anthony DeBarros, a journalist and data analyst, focuses on using SQL to find the story within your data. The examples and code use the open-source database PostgreSQL and its companion pgAdmin interface, and the concepts you learn will apply to most database management systems, including MySQL, Oracle, SQLite, and others.* You’ll first cover the fundamentals of databases and the SQL language, then build skills by analyzing data from real-world datasets such as US Census demographics, New York City taxi rides, and earthquakes from US Geological Survey. Each chapter includes exercises and examples that teach even those who have never programmed before all the tools necessary to build powerful databases and access information quickly and efficiently. You’ll learn how to: Create databases and related tables using your own data Aggregate, sort, and filter data to find patterns Use functions for basic math and advanced statistical operations Identify errors in data and clean them up Analyze spatial data with a geographic information system (PostGIS) Create advanced queries and automate tasks This updated second edition has been thoroughly revised to reflect the latest in SQL features, including additional advanced query techniques for wrangling data. This edition also has two new chapters: an expanded set of instructions on for setting up your system plus a chapter on using PostgreSQL with the popular JSON data interchange format. Learning SQL doesn’t have to be dry and complicated. Practical SQL delivers clear examples with an easy-to-follow approach to teach you the tools you need to build and manage your own databases. * Microsoft SQL Server employs a variant of the language called T-SQL, which is not covered by Practical SQL.




Interactive Visual Data Analysis


Book Description

In the age of big data, being able to make sense of data is an important key to success. Interactive Visual Data Analysis advocates the synthesis of visualization, interaction, and automatic computation to facilitate insight generation and knowledge crystallization from large and complex data. The book provides a systematic and comprehensive overview of visual, interactive, and analytical methods. It introduces criteria for designing interactive visual data analysis solutions, discusses factors influencing the design, and examines the involved processes. The reader is made familiar with the basics of visual encoding and gets to know numerous visualization techniques for multivariate data, temporal data, geo-spatial data, and graph data. A dedicated chapter introduces general concepts for interacting with visualizations and illustrates how modern interaction technology can facilitate the visual data analysis in many ways. Addressing today’s large and complex data, the book covers relevant automatic analytical computations to support the visual data analysis. The book also sheds light on advanced concepts for visualization in multi-display environments, user guidance during the data analysis, and progressive visual data analysis. The authors present a top-down perspective on interactive visual data analysis with a focus on concise and clean terminology. Many real-world examples and rich illustrations make the book accessible to a broad interdisciplinary audience from students, to experts in the field, to practitioners in data-intensive application domains. Features: Dedicated to the synthesis of visual, interactive, and analysis methods Systematic top-down view on visualization, interaction, and automatic analysis Broad coverage of fundamental and advanced visualization techniques Comprehensive chapter on interacting with visual representations Extensive integration of automatic computational methods Accessible portrayal of cutting-edge visual analytics technology Foreword by Jack van Wijk For more information, you can also visit the author website, where the book's figures are made available under the CC BY Open Access license.