Mastering the Modern Data Stack


Book Description

In the age of digital transformation, becoming overwhelmed by the sheer volume of potential data management, analytics, and AI solutions is common. Then it's all too easy to become distracted by glossy vendor marketing, and then chase the latest shiny tool, rather than focusing on building resilient, valuable platforms that will outperform the competition. This book aims to fix a glaring gap for data professionals: a comprehensive guide to the full Modern Data Stack that's rooted in real-world capabilities, not vendor hype. It is full of hard-earned advice on how to get maximum value from your investments through tangible insights, actionable strategies, and proven best practices. It comprehensively explains how the Modern Data Stack is truly utilized by today's data-driven companies. Mastering the Modern Data Stack: An Executive Guide to Unified Business Analytics is crafted for a diverse audience. It's for business and technology leaders who understand the importance and potential value of data, analytics, and AI—but don’t quite see how it all fits together in the big picture. It's for enterprise architects and technology professionals looking for a primer on the data analytics domain, including definitions of essential components and their usage patterns. It's also for individuals early in their data analytics careers who wish to have a practical and jargon-free understanding of how all the gears and pulleys move behind the scenes in a Modern Data Stack to turn data into actual business value. Whether you're starting your data journey with modest resources, or implementing digital transformation in the cloud, you'll find that this isn't just another textbook on data tools or a mere overview of outdated systems. It's a powerful guide to efficient, modern data management and analytics, with a firm focus on emerging technologies such as data science, machine learning, and AI. If you want to gain a competitive advantage in today’s fast-paced digital world, this TinyTechGuide™ is for you. Remember, it’s not the tech that’s tiny, just the book!™




Mastering the Data Paradox


Book Description

There are two remarkable phenomena that are unfolding almost simultaneously. The first is the emergence of a data-first world, where data has become a central driving force, shaping industries and fueling innovation. The second is the dawn of the AI age, propelled by the advent of Generative AI, that has created the possibility to leverage the data of the world for the first time. The convergence of these two, with data as the common denominator, holds immense promise and the opportunities are boundless. This book provides us with opportunities to push our thinking, to innovate, to transform and to create a better future at all levels—individual, enterprise and the world.




Architecting Modern Data Platforms


Book Description

There’s a lot of information about big data technologies, but splicing these technologies into an end-to-end enterprise data platform is a daunting task not widely covered. With this practical book, you’ll learn how to build big data infrastructure both on-premises and in the cloud and successfully architect a modern data platform. Ideal for enterprise architects, IT managers, application architects, and data engineers, this book shows you how to overcome the many challenges that emerge during Hadoop projects. You’ll explore the vast landscape of tools available in the Hadoop and big data realm in a thorough technical primer before diving into: Infrastructure: Look at all component layers in a modern data platform, from the server to the data center, to establish a solid foundation for data in your enterprise Platform: Understand aspects of deployment, operation, security, high availability, and disaster recovery, along with everything you need to know to integrate your platform with the rest of your enterprise IT Taking Hadoop to the cloud: Learn the important architectural aspects of running a big data platform in the cloud while maintaining enterprise security and high availability




The Informed Company


Book Description

Learn how to manage a modern data stack and get the most out of data in your organization! Thanks to the emergence of new technologies and the explosion of data in recent years, we need new practices for managing and getting value out of data. In the modern, data driven competitive landscape the "best guess" approach—reading blog posts here and there and patching together data practices without any real visibility—is no longer going to hack it. The Informed Company provides definitive direction on how best to leverage the modern data stack, including cloud computing, columnar storage, cloud ETL tools, and cloud BI tools. You'll learn how to work with Agile methods and set up processes that's right for your company to use your data as a key weapon for your success . . . You'll discover best practices for every stage, from querying production databases at a small startup all the way to setting up data marts for different business lines of an enterprise. In their work at Chartio, authors Fowler and David have learned that most businesspeople are almost completely self-taught when it comes to data. If they are using resources, those resources are outdated, so they're missing out on the latest cloud technologies and advances in data analytics. This book will firm up your understanding of data and bring you into the present with knowledge around what works and what doesn't. Discover the data stack strategies that are working for today's successful small, medium, and enterprise companies Learn the different Agile stages of data organization, and the right one for your team Learn how to maintain Data Lakes and Data Warehouses for effective, accessible data storage Gain the knowledge you need to architect Data Warehouses and Data Marts Understand your business's level of data sophistication and the steps you can take to get to "level up" your data The Informed Company is the definitive data book for anyone who wants to work faster and more nimbly, armed with actionable decision-making data.




Mastering Data Engineering and Analytics with Databricks


Book Description

TAGLINE Master Databricks to Transform Data into Strategic Insights for Tomorrow’s Business Challenges KEY FEATURES ● Combines theory with practical steps to master Databricks, Delta Lake, and MLflow. ● Real-world examples from FMCG and CPG sectors demonstrate Databricks in action. ● Covers real-time data processing, ML integration, and CI/CD for scalable pipelines. ● Offers proven strategies to optimize workflows and avoid common pitfalls. DESCRIPTION In today’s data-driven world, mastering data engineering is crucial for driving innovation and delivering real business impact. Databricks is one of the most powerful platforms which unifies data, analytics and AI requirements of numerous organizations worldwide. Mastering Data Engineering and Analytics with Databricks goes beyond the basics, offering a hands-on, practical approach tailored for professionals eager to excel in the evolving landscape of data engineering and analytics. This book uniquely blends foundational knowledge with advanced applications, equipping readers with the expertise to build, optimize, and scale data pipelines that meet real-world business needs. With a focus on actionable learning, it delves into complex workflows, including real-time data processing, advanced optimization with Delta Lake, and seamless ML integration with MLflow—skills critical for today’s data professionals. Drawing from real-world case studies in FMCG and CPG industries, this book not only teaches you how to implement Databricks solutions but also provides strategic insights into tackling industry-specific challenges. From setting up your environment to deploying CI/CD pipelines, you'll gain a competitive edge by mastering techniques that are directly applicable to your organization’s data strategy. By the end, you’ll not just understand Databricks—you’ll command it, positioning yourself as a leader in the data engineering space. WHAT WILL YOU LEARN ● Design and implement scalable, high-performance data pipelines using Databricks for various business use cases. ● Optimize query performance and efficiently manage cloud resources for cost-effective data processing. ● Seamlessly integrate machine learning models into your data engineering workflows for smarter automation. ● Build and deploy real-time data processing solutions for timely and actionable insights. ● Develop reliable and fault-tolerant Delta Lake architectures to support efficient data lakes at scale. WHO IS THIS BOOK FOR? This book is designed for data engineering students, aspiring data engineers, experienced data professionals, cloud data architects, data scientists and analysts looking to expand their skill sets, as well as IT managers seeking to master data engineering and analytics with Databricks. A basic understanding of data engineering concepts, familiarity with data analytics, and some experience with cloud computing or programming languages such as Python or SQL will help readers fully benefit from the book’s content. TABLE OF CONTENTS SECTION 1 1. Introducing Data Engineering with Databricks 2. Setting Up a Databricks Environment for Data Engineering 3. Working with Databricks Utilities and Clusters SECTION 2 4. Extracting and Loading Data Using Databricks 5. Transforming Data with Databricks 6. Handling Streaming Data with Databricks 7. Creating Delta Live Tables 8. Data Partitioning and Shuffling 9. Performance Tuning and Best Practices 10. Workflow Management 11. Databricks SQL Warehouse 12. Data Storage and Unity Catalog 13. Monitoring Databricks Clusters and Jobs 14. Production Deployment Strategies 15. Maintaining Data Pipelines in Production 16. Managing Data Security and Governance 17. Real-World Data Engineering Use Cases with Databricks 18. AI and ML Essentials 19. Integrating Databricks with External Tools Index




Mastering Modern Linux


Book Description

Praise for the First Edition: "This outstanding book ... gives the reader robust concepts and implementable knowledge of this environment. Graphical user interface (GUI)-based users and developers do not get short shrift, despite the command-line interface’s (CLI) full-power treatment. ... Every programmer should read the introduction’s Unix/Linux philosophy section. ... This authoritative and exceptionally well-constructed book has my highest recommendation. It will repay careful and recursive study." --Computing Reviews, August 2011 Mastering Modern Linux, Second Edition retains much of the good material from the previous edition, with extensive updates and new topics added. The book provides a comprehensive and up-to-date guide to Linux concepts, usage, and programming. The text helps the reader master Linux with a well-selected set of topics, and encourages hands-on practice. The first part of the textbook covers interactive use of Linux via the Graphical User Interface (GUI) and the Command-Line Interface (CLI), including comprehensive treatment of the Gnome desktop and the Bash Shell. Using different apps, commands and filters, building pipelines, and matching patterns with regular expressions are major focuses. Next comes Bash scripting, file system structure, organization, and usage. The following chapters present networking, the Internet and the Web, data encryption, basic system admin, as well as Web hosting. The Linux Apache MySQL/MariaDB PHP (LAMP) Web hosting combination is also presented in depth. In the last part of the book, attention is turned to C-level programming. Topics covered include the C compiler, preprocessor, debugger, I/O, file manipulation, process control, inter-process communication, and networking. The book includes many examples and complete programs ready to download and run. A summary and exercises of varying degrees of difficulty can be found at the end of each chapter. A companion website (http://mml.sofpower.com) provides appendices, information updates, an example code package, and other resources for instructors, as well as students.




Mastering Data Structures with Python


Book Description

"Mastering Data Structures with Python: A Practical Guide" offers a comprehensive journey through the essential concepts of data structures, all within the practical framework of Python. Designed for both beginners and experienced programmers, this book provides a thorough understanding of the data structures that are critical to writing efficient, high-performance algorithms. The book begins with a solid introduction to fundamental data structures like arrays, linked lists, stacks, and queues, before moving on to more complex structures such as trees, graphs, and heaps. Each data structure is broken down with easy-to-understand explanations, step-by-step walkthroughs, and Python code examples that bring theory to life. The clear, practical approach ensures that readers can apply what they've learned in real-world programming situations. In addition to covering these essential structures, the book also focuses on the efficiency and performance of algorithms, teaching you how to analyze time and space complexity using Big O notation. This understanding is crucial for writing code that scales and performs well under pressure, a skill that's highly sought after in technical interviews and real-world development. The book goes beyond theory, showcasing real-world applications of data structures in Python, such as how to use them to optimize search algorithms, build complex networks, and manage large datasets. With a focus on practical problem-solving, you'll also learn tips and tricks for optimizing code, managing memory efficiently, and implementing the right data structures for various tasks. Whether you’re a student preparing for coding interviews, a developer wanting to sharpen your skills, or simply curious about data structures, "Mastering Data Structures with Python" serves as a valuable guide. It’s not just about learning Python—it’s about mastering the art of programming itself.




Mastering MLOps Architecture: From Code to Deployment


Book Description

Harness the power of MLOps for managing real time machine learning project cycle KEY FEATURES ● Comprehensive coverage of MLOps concepts, architecture, tools and techniques. ● Practical focus on building end-to-end ML Systems for Continual Learning with MLOps. ● Actionable insights on CI/CD, monitoring, continual model training and automated retraining. DESCRIPTION MLOps, a combination of DevOps, data engineering, and machine learning, is crucial for delivering high-quality machine learning results due to the dynamic nature of machine learning data. This book delves into MLOps, covering its core concepts, components, and architecture, demonstrating how MLOps fosters robust and continuously improving machine learning systems. By covering the end-to-end machine learning pipeline from data to deployment, the book helps readers implement MLOps workflows. It discusses techniques like feature engineering, model development, A/B testing, and canary deployments. The book equips readers with knowledge of MLOps tools and infrastructure for tasks like model tracking, model governance, metadata management, and pipeline orchestration. Monitoring and maintenance processes to detect model degradation are covered in depth. Readers can gain skills to build efficient CI/CD pipelines, deploy models faster, and make their ML systems more reliable, robust and production-ready. Overall, the book is an indispensable guide to MLOps and its applications for delivering business value through continuous machine learning and AI. WHAT YOU WILL LEARN ● Architect robust MLOps infrastructure with components like feature stores. ● Leverage MLOps tools like model registries, metadata stores, pipelines. ● Build CI/CD workflows to deploy models faster and continually. ● Monitor and maintain models in production to detect degradation. ● Create automated workflows for retraining and updating models in production. WHO THIS BOOK IS FOR Machine learning specialists, data scientists, DevOps professionals, software development teams, and all those who want to adopt the DevOps approach in their agile machine learning experiments and applications. Prior knowledge of machine learning and Python programming is desired. TABLE OF CONTENTS 1. Getting Started with MLOps 2. MLOps Architecture and Components 3. MLOps Infrastructure and Tools 4. What are Machine Learning Systems? 5. Data Preparation and Model Development 6. Model Deployment and Serving 7. Continuous Delivery of Machine Learning Models 8. Continual Learning 9. Continuous Monitoring, Logging, and Maintenance




Mastering SaltStack


Book Description

Take charge of SaltStack to automate and configure your enterprise-grade environments About This Book Automate tasks effectively and take charge of your infrastructure Effectively scale Salt to manage thousands of machines and tackle everyday problems Explore Salt's inner workings and advance your knowledge of it Who This Book Is For This book is ideal for IT professionals and ops engineers who already manage groups of servers, but would like to expand their knowledge and gain expertise with SaltStack. This book explains the advanced features and concepts of Salt. A basic knowledge of Salt is required in order to get to grips with advanced Salt features. What You Will Learn Automate tasks effectively, so that your infrastructure can run itself Start building more complex concepts Master user-level internals Build scaling strategies Explore monitoring strategies Learn how to troubleshoot Salt and its subcomponents Explore best practices for Salt In Detail SaltStack is a powerful configuration management and automation suite designed to manage servers and tens of thousands of nodes. This book showcases Salt as a very powerful automation framework. We will review the fundamental concepts to get you in the right frame of mind, and then explore Salt in much greater depth. You will explore Salt SSH as a powerful tool and take Salt Cloud to the next level. Next, you'll master using Salt services with ease in your infrastructure. You will discover methods and strategies to scale your infrastructure properly. You will also learn how to use Salt as a powerful monitoring tool. By the end of this book, you will have learned troubleshooting tips and best practices to make the entire process of using Salt pain-free and easy. Style and approach This book follows a step-by-step conversational tone. Topics are covered in detail through examples and a user-friendly approach.




Data Engineering with dbt


Book Description

Use easy-to-apply patterns in SQL and Python to adopt modern analytics engineering to build agile platforms with dbt that are well-tested and simple to extend and run Purchase of the print or Kindle book includes a free PDF eBook Key Features Build a solid dbt base and learn data modeling and the modern data stack to become an analytics engineer Build automated and reliable pipelines to deploy, test, run, and monitor ELTs with dbt Cloud Guided dbt + Snowflake project to build a pattern-based architecture that delivers reliable datasets Book Descriptiondbt Cloud helps professional analytics engineers automate the application of powerful and proven patterns to transform data from ingestion to delivery, enabling real DataOps. This book begins by introducing you to dbt and its role in the data stack, along with how it uses simple SQL to build your data platform, helping you and your team work better together. You’ll find out how to leverage data modeling, data quality, master data management, and more to build a simple-to-understand and future-proof solution. As you advance, you’ll explore the modern data stack, understand how data-related careers are changing, and see how dbt enables this transition into the emerging role of an analytics engineer. The chapters help you build a sample project using the free version of dbt Cloud, Snowflake, and GitHub to create a professional DevOps setup with continuous integration, automated deployment, ELT run, scheduling, and monitoring, solving practical cases you encounter in your daily work. By the end of this dbt book, you’ll be able to build an end-to-end pragmatic data platform by ingesting data exported from your source systems, coding the needed transformations, including master data and the desired business rules, and building well-formed dimensional models or wide tables that’ll enable you to build reports with the BI tool of your choice.What you will learn Create a dbt Cloud account and understand the ELT workflow Combine Snowflake and dbt for building modern data engineering pipelines Use SQL to transform raw data into usable data, and test its accuracy Write dbt macros and use Jinja to apply software engineering principles Test data and transformations to ensure reliability and data quality Build a lightweight pragmatic data platform using proven patterns Write easy-to-maintain idempotent code using dbt materialization Who this book is for This book is for data engineers, analytics engineers, BI professionals, and data analysts who want to learn how to build simple, futureproof, and maintainable data platforms in an agile way. Project managers, data team managers, and decision makers looking to understand the importance of building a data platform and foster a culture of high-performing data teams will also find this book useful. Basic knowledge of SQL and data modeling will help you get the most out of the many layers of this book. The book also includes primers on many data-related subjects to help juniors get started.