Programmer's Guide to Apache Thrift


Book Description

Summary Programmer's Guide to Apache Thrift provides comprehensive coverage of the Apache Thrift framework along with a developer's-eye view of modern distributed application architecture. Foreword by Jens Geyer. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. About the Technology Thrift-based distributed software systems are built out of communicating components that use different languages, protocols, and message types. Sitting between them is Thrift, which handles data serialization, transport, and service implementation. Thrift supports many client and server environments and a host of languages ranging from PHP to JavaScript, and from C++ to Go. About the Book Programmer's Guide to Apache Thrift provides comprehensive coverage of distributed application communication using the Thrift framework. Packed with code examples and useful insight, this book presents best practices for multi-language distributed development. You'll take a guided tour through transports, protocols, IDL, and servers as you explore programs in C++, Java, and Python. You'll also learn how to work with platforms ranging from browser-based clients to enterprise servers. What's inside Complete coverage of Thrift's IDL Building and serializing complex user-defined types Plug-in protocols, transports, and data compression Creating cross-language services with RPC and messaging systems About the Reader Readers should be comfortable with a language like Python, Java, or C++ and the basics of service-oriented or microservice architectures. About the Author Randy Abernethy is an Apache Thrift Project Management Committee member and a partner at RX-M. Table of Contents PART 1 - APACHE THRIFT OVERVIEW Introduction to Apache Thrift Apache Thrift architecture Building, testing, and debugging PART 2 - PROGRAMMING APACHE THRIFT Moving bytes with transports Serializing data with protocols Apache Thrift IDL User-defined types Implementing services Handling exceptions Servers PART 3 - APACHE THRIFT LANGUAGES Building clients and servers with C++ Building clients and servers with Java Building C# clients and servers with .NET Core and Windows Building Node.js clients and servers Apache Thrift and JavaScript Scripting Apache Thrift Thrift in the enterprise




Programmer's Guide to Apache Thrift


Book Description

Summary Programmer's Guide to Apache Thrift provides comprehensive coverage of the Apache Thrift framework along with a developer's-eye view of modern distributed application architecture. Foreword by Jens Geyer. About the Technology Thrift-based distributed software systems are built out of communicating components that use different languages, protocols, and message types. Sitting between them is Thrift, which handles data serialization, transport, and service implementation. Thrift supports many client and server environments and a host of languages ranging from PHP to JavaScript, and from C++ to Go. About the Book Programmer's Guide to Apache Thrift provides comprehensive coverage of distributed application communication using the Thrift framework. Packed with code examples and useful insight, this book presents best practices for multi-language distributed development. You'll take a guided tour through transports, protocols, IDL, and servers as you explore programs in C++, Java, and Python. You'll also learn how to work with platforms ranging from browser-based clients to enterprise servers. What's inside Complete coverage of Thrift's IDL Building and serializing complex user-defined types Plug-in protocols, transports, and data compression Creating cross-language services with RPC and messaging systems About the Reader Readers should be comfortable with a language like Python, Java, or C++ and the basics of service-oriented or microservice architectures. About the Author Randy Abernethy is an Apache Thrift Project Management Committee member and a partner at RX-M. Table of Contents Introduction to Apache Thrift Apache Thrift architecture Building, testing, and debugging Moving bytes with transports Serializing data with protocols Apache Thrift IDL User-defined types Implementing services Handling exceptions Servers Building clients and servers with C++ Building clients and servers with Java Building C# clients and servers with .NET Core and Windows Building Node.js clients and servers Apache Thrift and JavaScript Scripting Apache Thrift Thrift in the enterprise




Spark: The Definitive Guide


Book Description

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. Youâ??ll explore the basic operations and common functions of Sparkâ??s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Sparkâ??s scalable machine-learning library. Get a gentle overview of big data and Spark Learn about DataFrames, SQL, and Datasetsâ??Sparkâ??s core APIsâ??through worked examples Dive into Sparkâ??s low-level APIs, RDDs, and execution of SQL and DataFrames Understand how Spark runs on a cluster Debug, monitor, and tune Spark clusters and applications Learn the power of Structured Streaming, Sparkâ??s stream-processing engine Learn how you can apply MLlib to a variety of problems, including classification or recommendation




Programming Hive


Book Description

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect—HiveQL—to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem. This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data. Use Hive to create, alter, and drop databases, tables, views, functions, and indexes Customize data formats and storage options, from files to external databases Load and extract data from tables—and use queries, grouping, filtering, joining, and other conventional query methods Gain best practices for creating user defined functions (UDFs) Learn Hive patterns you should use and anti-patterns you should avoid Integrate Hive with other data processing programs Use storage handlers for NoSQL databases and other datastores Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce




Cassandra: The Definitive Guide


Book Description

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition—updated for Cassandra 3.0—provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s non-relational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility. Understand Cassandra’s distributed and decentralized structure Use the Cassandra Query Language (CQL) and cqlsh—the CQL shell Create a working data model and compare it with an equivalent relational model Develop sample applications using client drivers for languages including Java, Python, and Node.js Explore cluster topology and learn how nodes exchange data Maintain a high level of performance in your cluster Deploy Cassandra on site, in the Cloud, or with Docker Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene




Kafka: The Definitive Guide


Book Description

Every enterprise application creates data, whether it’s log messages, metrics, user activity, outgoing messages, or something else. And how to move all of this data becomes nearly as important as the data itself. If you’re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds. Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer. Understand publish-subscribe messaging and how it fits in the big data ecosystem. Explore Kafka producers and consumers for writing and reading messages Understand Kafka patterns and use-case requirements to ensure reliable data delivery Get best practices for building data pipelines and applications with Kafka Manage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasks Learn the most critical metrics among Kafka’s operational measurements Explore how Kafka’s stream delivery capabilities make it a perfect source for stream processing systems




Hadoop: The Definitive Guide


Book Description

Ready to unlock the power of your data? With this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN). Store large datasets with the Hadoop Distributed File System (HDFS) Run distributed computations with MapReduce Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster—or run Hadoop in the cloud Load data from relational databases into HDFS, using Sqoop Perform large-scale data processing with the Pig query language Analyze datasets with Hive, Hadoop’s data warehousing system Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems




Software Engineering at Google


Book Description

Today, software engineers need to know not only how to program effectively but also how to develop proper engineering practices to make their codebase sustainable and healthy. This book emphasizes this difference between programming and software engineering. How can software engineers manage a living codebase that evolves and responds to changing requirements and demands over the length of its life? Based on their experience at Google, software engineers Titus Winters and Hyrum Wright, along with technical writer Tom Manshreck, present a candid and insightful look at how some of the world’s leading practitioners construct and maintain software. This book covers Google’s unique engineering culture, processes, and tools and how these aspects contribute to the effectiveness of an engineering organization. You’ll explore three fundamental principles that software organizations should keep in mind when designing, architecting, writing, and maintaining code: How time affects the sustainability of software and how to make your code resilient over time How scale affects the viability of software practices within an engineering organization What trade-offs a typical engineer needs to make when evaluating design and development decisions




Learning Spark


Book Description

Data is bigger, arrives faster, and comes in a variety of formats—and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to: Learn Python, SQL, Scala, or Java high-level Structured APIs Understand Spark operations and SQL Engine Inspect, tune, and debug Spark operations with Spark configurations and Spark UI Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka Perform analytics on batch and streaming data using Structured Streaming Build reliable data pipelines with open source Delta Lake and Spark Develop machine learning pipelines with MLlib and productionize models using MLflow




gRPC: Up and Running


Book Description

Get a comprehensive understanding of gRPC fundamentals through real-world examples. With this practical guide, you’ll learn how this high-performance interprocess communication protocol is capable of connecting polyglot services in microservices architecture, while providing a rich framework for defining service contracts and data types. Complete with hands-on examples written in Go, Java, Node, and Python, this book also covers the essential techniques and best practices to use gRPC in production systems. Authors Kasun Indrasiri and Danesh Kuruppu discuss the importance of gRPC in the context of microservices development.