Data Science on AWS


Book Description

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level upyour skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance. Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment Tie everything together into a repeatable machine learning operations pipeline Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more




Data Science on AWS


Book Description

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level upyour skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance. Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment Tie everything together into a repeatable machine learning operations pipeline Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more




Data Analytics in the AWS Cloud


Book Description

A comprehensive and accessible roadmap to performing data analytics in the AWS cloud In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint to storing, processing, analyzing data on the Amazon Web Services cloud platform. In the book, you’ll explore every relevant aspect of data analytics—from data engineering to analysis, business intelligence, DevOps, and MLOps—as you discover how to integrate machine learning predictions with analytics engines and visualization tools. You’ll also find: Real-world use cases of AWS architectures that demystify the applications of data analytics Accessible introductions to data acquisition, importation, storage, visualization, and reporting Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify maintenance A can't-miss for data architects, analysts, engineers and technical professionals, Data Analytics in the AWS Cloud will also earn a place on the bookshelves of business leaders seeking a better understanding of data analytics on the AWS cloud platform.




Effective Data Science Infrastructure


Book Description

Effective Data Science Infrastructure: How to make data scientists more productive is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting edge data infrastructure. In it, you'll master scalable techniques for data storage, computation, experiment tracking, and orchestration that are relevant to companies of all shapes and sizes. You'll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open source software, and idiomatic Python.




Data Science: Neural Networks, Deep Learning, LLMs and Power BI


Book Description

I wrote this book as I got an interview offer for Data Analyst. There they asked me a lot of questions and there was an exam. This helped me a lot to write the book based on the interview questions faced by me and the knowledge gained by working on AI projects. I then added all my other knowledge working as a Data Analyst on my other projects and wrote the book. Technical books need a lot of attention, as they need deep checks, but I tried to do my best. Not everything can be included in detail, it is impossible. I have tried to include everything related to Data Science that is presently going on in the industry and the world.




Data Science at the Command Line


Book Description

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data. To get you started—whether you’re on Windows, OS X, or Linux—author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools. Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line. Obtain data from websites, APIs, databases, and spreadsheets Perform scrub operations on plain text, CSV, HTML/XML, and JSON Explore data, compute descriptive statistics, and create visualizations Manage your data science workflow using Drake Create reusable tools from one-liners and existing Python or R code Parallelize and distribute data-intensive pipelines using GNU Parallel Model data with dimensionality reduction, clustering, regression, and classification algorithms




Big Data, Cloud Computing, Data Science & Engineering


Book Description

This book presents the outcomes of the 3rd IEEE/ACIS International Conference on Big Data, Cloud Computing, Data Science & Engineering (BCD 2018), which was held on July 10–12, 2018 in Kanazawa. The aim of the conference was to bring together researchers and scientists, businesspeople and entrepreneurs, teachers, engineers, computer users, and students to discuss the various fields of computer science, to share their experiences, and to exchange new ideas and information in a meaningful way. All aspects (theory, applications and tools) of computer and information science, the practical challenges encountered along the way, and the solutions adopted to solve them are all explored here. The conference organizers selected the best papers from among those accepted for presentation. The papers were chosen on the basis of review scores submitted by members of the program committee and subsequently underwent further rigorous review. Following this second round of review, 13 of the conference’s most promising papers were selected for this Springer (SCI) book. We eagerly await the important contributions that we know these authors will make to the field of computer and information science.




The Machine Learning Solutions Architect Handbook


Book Description

Build highly secure and scalable machine learning platforms to support the fast-paced adoption of machine learning solutions Key Features Explore different ML tools and frameworks to solve large-scale machine learning challenges in the cloud Build an efficient data science environment for data exploration, model building, and model training Learn how to implement bias detection, privacy, and explainability in ML model development Book DescriptionWhen equipped with a highly scalable machine learning (ML) platform, organizations can quickly scale the delivery of ML products for faster business value realization. There is a huge demand for skilled ML solutions architects in different industries, and this handbook will help you master the design patterns, architectural considerations, and the latest technology insights you’ll need to become one. You’ll start by understanding ML fundamentals and how ML can be applied to solve real-world business problems. Once you've explored a few leading problem-solving ML algorithms, this book will help you tackle data management and get the most out of ML libraries such as TensorFlow and PyTorch. Using open source technology such as Kubernetes/Kubeflow to build a data science environment and ML pipelines will be covered next, before moving on to building an enterprise ML architecture using Amazon Web Services (AWS). You’ll also learn about security and governance considerations, advanced ML engineering techniques, and how to apply bias detection, explainability, and privacy in ML model development. By the end of this book, you’ll be able to design and build an ML platform to support common use cases and architecture patterns like a true professional. What you will learn Apply ML methodologies to solve business problems Design a practical enterprise ML platform architecture Implement MLOps for ML workflow automation Build an end-to-end data management architecture using AWS Train large-scale ML models and optimize model inference latency Create a business application using an AI service and a custom ML model Use AWS services to detect data and model bias and explain models Who this book is for This book is for data scientists, data engineers, cloud architects, and machine learning enthusiasts who want to become machine learning solutions architects. You’ll need basic knowledge of the Python programming language, AWS, linear algebra, probability, and networking concepts before you get started with this handbook.




Amazon SageMaker Best Practices


Book Description

Overcome advanced challenges in building end-to-end ML solutions by leveraging the capabilities of Amazon SageMaker for developing and integrating ML models into production Key FeaturesLearn best practices for all phases of building machine learning solutions - from data preparation to monitoring models in productionAutomate end-to-end machine learning workflows with Amazon SageMaker and related AWSDesign, architect, and operate machine learning workloads in the AWS CloudBook Description Amazon SageMaker is a fully managed AWS service that provides the ability to build, train, deploy, and monitor machine learning models. The book begins with a high-level overview of Amazon SageMaker capabilities that map to the various phases of the machine learning process to help set the right foundation. You'll learn efficient tactics to address data science challenges such as processing data at scale, data preparation, connecting to big data pipelines, identifying data bias, running A/B tests, and model explainability using Amazon SageMaker. As you advance, you'll understand how you can tackle the challenge of training at scale, including how to use large data sets while saving costs, monitoring training resources to identify bottlenecks, speeding up long training jobs, and tracking multiple models trained for a common goal. Moving ahead, you'll find out how you can integrate Amazon SageMaker with other AWS to build reliable, cost-optimized, and automated machine learning applications. In addition to this, you'll build ML pipelines integrated with MLOps principles and apply best practices to build secure and performant solutions. By the end of the book, you'll confidently be able to apply Amazon SageMaker's wide range of capabilities to the full spectrum of machine learning workflows. What you will learnPerform data bias detection with AWS Data Wrangler and SageMaker ClarifySpeed up data processing with SageMaker Feature StoreOvercome labeling bias with SageMaker Ground TruthImprove training time with the monitoring and profiling capabilities of SageMaker DebuggerAddress the challenge of model deployment automation with CI/CD using the SageMaker model registryExplore SageMaker Neo for model optimizationImplement data and model quality monitoring with Amazon Model MonitorImprove training time and reduce costs with SageMaker data and model parallelismWho this book is for This book is for expert data scientists responsible for building machine learning applications using Amazon SageMaker. Working knowledge of Amazon SageMaker, machine learning, deep learning, and experience using Jupyter Notebooks and Python is expected. Basic knowledge of AWS related to data, security, and monitoring will help you make the most of the book.




Data Science and Analytics Strategy


Book Description

This book describes how to establish data science and analytics capabilities in organisations using Emergent Design, an evolutionary approach that increases the chances of successful outcomes while minimising upfront investment. Based on their experiences and those of a number of data leaders, the authors provide actionable advice on data technologies, processes, and governance structures so that readers can make choices that are appropriate to their organisational contexts and requirements. The book blends academic research on organisational change and data science processes with real-world stories from experienced data analytics leaders, focusing on the practical aspects of setting up a data capability. In addition to a detailed coverage of capability, culture, and technology choices, a unique feature of the book is its treatment of emerging issues such as data ethics and algorithmic fairness. Data Science and Analytics Strategy: An Emergent Design Approach has been written for professionals who are looking to build data science and analytics capabilities within their organisations as well as those who wish to expand their knowledge and advance their careers in the data space. Providing deep insights into the intersection between data science and business, this guide will help professionals understand how to help their organisations reap the benefits offered by data. Most importantly, readers will learn how to build a fit-for-purpose data science capability in a manner that avoids the most common pitfalls.