The Data Wrangler's Handbook


Book Description

Like all organizations, libraries are generating more data than ever before and are keen to use it. Data manipulation and analysis are far easier than most people imagine. This book demystifies the process of working with data, familiarizing readers with a small number of simple tools and easily digestible but powerful concepts. Using tools that come with desktop computers, readers will learn to extract, manipulate, and analyze data (and metadata) of any size and complexity. Kyle Banerjee, an experienced author on data and digital library topics, is determined to take the fear out of the command line. This book will be useful to librarians developing their skills, introducing concepts and tools gradually.

Starter topics, most of which can be accomplished with a single-word command, will include:

- how to use the output of one program as input for another
- redirecting the results of that to any file or program
- sorting files of any size by any criteria
- identifying duplicates
- listing the number of occurrences for each entry

As readers develop a firm grasp of the fundamentals, they will learn progressively more sophisticated tasks such as comparing files, converting data from one format to another, reformatting values (e.g. converting inconsistent dates to a consistent format), combining data from multiple files, and communicating with APIs (Application Programming Interfaces) built into their systems. Each chapter includes more examples that power users might appreciate, but that others can skip over without impeding their ability to understand anything else in the book.

Table of Contents

1. Introduction
2. Getting started
3. Directing output - making programs and files work with each other
4. Regular expressions -- the Swiss Army knife of data
5. Understanding data formats Model, namespaces, and validation
6. Application Programming Interfaces (APIs) - talk to programs across the Web
7. Putting it all together
8. More advanced topics
9. One line solutions for common library tasks
10. Command reference
11. Glossary




The Data Wrangler's Handbook


Book Description

Data manipulation and analysis are far easier than you might imagine—in fact, using tools that come standard with your desktop computer, you can learn how to extract, manipulate, and analyze data (and metadata) of any size and complexity.




Data Wrangling with Python


Book Description

How do you take your data analysis skills beyond Excel to the next level? By learning just enough Python to get stuff done. This hands-on guide shows non-programmers like you how to process information that’s initially too messy or difficult to access. You don't need to know a thing about the Python programming language to get started. Through various step-by-step exercises, you’ll learn how to acquire, clean, analyze, and present data efficiently. You’ll also discover how to automate your data process, schedule file-editing and clean-up tasks, process larger datasets, and create compelling stories with data you obtain.

- Quickly learn basic Python syntax, data types, and language concepts
- Work with both machine-readable and human-consumable data
- Scrape websites and APIs to find a bounty of useful information
- Clean and format data to eliminate duplicates and errors in your datasets
- Learn when to standardize data and when to test and script data cleanup
- Explore and analyze your datasets with new Python libraries and techniques
- Use Python solutions to automate your entire data-wrangling process




Principles of Data Wrangling


Book Description

A key task that any aspiring data-driven organization needs to learn is data wrangling, the process of converting raw data into something truly useful. This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, "What are you trying to do and why?" Wrangling data consumes roughly 50-80% of an analyst’s time before any kind of analysis is possible. Written by key executives at Trifacta, this book walks you through the wrangling process by exploring several factors (time, granularity, scope, and structure) that you need to consider as you begin to work with data. You’ll learn a shared language and a comprehensive understanding of data wrangling, with an emphasis on recent agile analytic processes used by many of today’s data-driven organizations. Appreciate the importance, and the satisfaction, of wrangling data the right way.

- Understand what kind of data is available
- Choose which data to use and at what level of detail
- Meaningfully combine multiple sources of data
- Decide how to distill the results to a size and shape that can drive downstream analysis




Data Wrangling with R


Book Description

This guide for practicing statisticians, data scientists, and R users and programmers will teach the essentials of preprocessing data, leveraging the R programming language to easily and quickly turn noisy data into usable pieces of information. Data wrangling, which is also commonly referred to as data munging, transformation, manipulation, janitor work, etc., can be a painstakingly laborious process. Roughly 80% of data analysis is spent on cleaning and preparing data; however, being a prerequisite to the rest of the data analysis workflow (visualization, analysis, reporting), it is essential that one become fluent and efficient in data wrangling techniques. This book will guide the user through the data wrangling process via a step-by-step tutorial approach and provide a solid foundation for working with data in R. The author's goal is to teach the user how to easily wrangle data in order to spend more time on understanding the content of the data.

By the end of the book, the user will have learned:

- How to work with different types of data such as numerics, characters, regular expressions, factors, and dates
- The difference between different data structures and how to create, add additional components to, and subset each data structure
- How to acquire and parse data from locations previously inaccessible
- How to develop functions and use loop control structures to reduce code redundancy
- How to use pipe operators to simplify code and make it more readable
- How to reshape the layout of data and manipulate, summarize, and join data sets




The VES Handbook of Visual Effects


Book Description

Wisdom from the best and the brightest in the industry, this visual effects bible belongs on the shelf of anyone working in or aspiring to work in VFX. The book covers techniques and solutions all VFX artists/producers/supervisors need to know, from breaking down a script and initial bidding, to digital character creation and compositing of both live-action and CG elements. In-depth lessons on stereoscopic moviemaking, color management and digital intermediates are included, as well as chapters on interactive games and full animation authored by artists from EA and DreamWorks respectively. From preproduction to acquisition to postproduction, every aspect of the VFX production workflow is given prominent coverage. VFX legends such as John Knoll, Mike Fink, and John Erland provide you with invaluable insight and lessons from the set, equipping you with everything you need to know about the entire visual effects workflow. Simply a must-have book for anyone working in or wanting to work in the VFX industry.




The VES Handbook of Visual Effects


Book Description

The award-winning VES Handbook of Visual Effects remains the most complete guide to visual effects techniques and best practices available today. This new edition has been updated to include the latest, industry-standard techniques, technologies, and workflows for the ever-evolving, fast-paced world of visual effects. The Visual Effects Society (VES) tasked the original authors to update their areas of expertise, such as AR/VR Moviemaking, Color Management, Cameras, VFX Editorial, Stereoscopic and the Digital Intermediate, as well as provide detailed chapters on interactive games and full animation. Additionally, 56 contributors share their best methods, tips, tricks, and shortcuts developed through decades of trial and error and real-world, hands-on experience. This third edition has been expanded to feature lessons on 2.5D/3D Compositing; 3D Scanning; Digital Cinematography; Editorial Workflow in Animated and Visual Effects Features; Gaming updates; General Geometry Instancing; Lens Mapping for VFX; Native Stereo; Real-Time VFX and Camera Tracking; Shot/Element Pulls and Delivery to VFX; Techvis; VFX Elements and Stereo; Virtual Production; and VR/AR (Virtual Reality / Augmented Reality). A must-have for anyone working in or aspiring to work in visual effects, The VES Handbook of Visual Effects, Third Edition covers essential techniques and solutions for all VFX artists, producers, and supervisors, from pre-production to digital character creation, compositing of both live-action and CG elements, photorealistic techniques, and much more. With subjects and techniques clearly and definitively presented in beautiful four-color, this handbook is a vital resource for any serious VFX artist.




Data Science on AWS


Book Description

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

- Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
- Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
- Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment
- Tie everything together into a repeatable machine learning operations pipeline
- Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
- Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more




Visual Effects Society Handbook


Book Description

Wisdom from the best and the brightest in the industry, this visual effects bible belongs on the shelf of anyone working in or aspiring to work in VFX. The book covers techniques and solutions all VFX artists/producers/supervisors need to know, from breaking down a script and initial bidding, to digital character creation and compositing of both live-action and CG elements. In-depth lessons on stereoscopic moviemaking, color management and digital intermediates are included, as well as chapters on interactive games and full animation authored by artists from EA and DreamWorks respectively. From preproduction to acquisition to postproduction, every aspect of the VFX production workflow is given prominent coverage. VFX legends such as John Knoll, Mike Fink, and John Erland provide you with invaluable insight and lessons from the set, equipping you with everything you need to know about the entire visual effects workflow. Simply a must-have book for anyone working in or wanting to work in the VFX industry.




Data Wrangling on AWS


Book Description

Revamp your data landscape and implement highly effective data pipelines in AWS with this hands-on guide.

Purchase of the print or Kindle book includes a free PDF eBook.

Key Features

- Execute extract, transform, and load (ETL) tasks on data lakes, data warehouses, and databases
- Implement effective Pandas data operation with data wrangler
- Integrate pipelines with AWS data services

Book Description

Data wrangling is the process of cleaning, transforming, and organizing raw, messy, or unstructured data into a structured format. It involves processes such as data cleaning, data integration, data transformation, and data enrichment to ensure that the data is accurate, consistent, and suitable for analysis. Data Wrangling on AWS equips you with the knowledge to reap the full potential of AWS data wrangling tools. First, you’ll be introduced to data wrangling on AWS and will be familiarized with data wrangling services available in AWS. You’ll understand how to work with AWS Glue DataBrew, AWS data wrangler, and AWS SageMaker. Next, you’ll discover other AWS services like Amazon S3, Redshift, Athena, and QuickSight. Additionally, you’ll explore advanced topics such as performing Pandas data operation with AWS data wrangler, optimizing ML data with AWS SageMaker, building the data warehouse with Glue DataBrew, along with security and monitoring aspects. By the end of this book, you’ll be well-equipped to perform data wrangling using AWS services.

What you will learn

- Explore how to write simple to complex transformations using AWS data wrangler
- Use abstracted functions to extract and load data from and into AWS datastores
- Configure AWS Glue DataBrew for data wrangling
- Develop data pipelines using AWS data wrangler
- Integrate AWS security features into Data Wrangler using identity and access management (IAM)
- Optimize your data with AWS SageMaker

Who this book is for

This book is for data engineers, data scientists, and business data analysts looking to explore the capabilities, tools, and services of data wrangling on AWS for their ETL tasks. Basic knowledge of Python and Pandas, and familiarity with AWS tools such as AWS Glue and Amazon Athena, are required to get the most out of this book.