SQL & NoSQL Databases


Book Description

This book offers a comprehensive introduction to relational (SQL) and non-relational (NoSQL) databases. The authors thoroughly review the current state of database tools and techniques, and examine coming innovations. The book opens with a broad look at data management, including an overview of information systems and databases, and an explanation of contemporary database types: SQL and NoSQL databases, and their respective management systems The nature and uses of Big Data A high-level view of the organization of data management Data Modeling and Consistency Chapter-length treatment is afforded Data Modeling in both relational and graph databases, including enterprise-wide data architecture, and formulas for database design. Coverage of languages extends from an overview of operators, to SQL and and QBE (Query by Example), to integrity constraints and more. A full chapter probes the challenges of Ensuring Data Consistency, covering: Multi-User Operation Troubleshooting Consistency in Massive Distributed Data Comparison of the ACID and BASE consistency models, and more System Architecture also gets from its own chapter, which explores Processing of Homogeneous and Heterogeneous Data; Storage and Access Structures; Multi-dimensional Data Structures and Parallel Processing with MapReduce, among other topics. Post-Relational and NoSQL Databases The chapter on post-relational databases discusses the limits of SQL – and what lies beyond, including Multi-Dimensional Databases, Knowledge Bases and and Fuzzy Databases. A final chapter covers NoSQL Databases, along with Development of Non-Relational Technologies, Key-Value, Column-Family and Document Stores XML Databases and Graphic Databases, and more The book includes more than 100 tables, examples and illustrations, and each chapter offers a list of resources for further reading. SQL & NoSQL Databases conveys the strengths and weaknesses of relational and non-relational approaches, and shows how to undertake development for big data applications. The book benefits readers including students and practitioners working across the broad field of applied information technology. This textbook has been recommended and developed for university courses in Germany, Austria and Switzerland.




SQL on Big Data


Book Description

Learn various commercial and open source products that perform SQL on Big Data platforms. You will understand the architectures of the various SQL engines being used and how the tools work internally in terms of execution, data movement, latency, scalability, performance, and system requirements. This book consolidates in one place solutions to the challenges associated with the requirements of speed, scalability, and the variety of operations needed for data integration and SQL operations. After discussing the history of the how and why of SQL on Big Data, the book provides in-depth insight into the products, architectures, and innovations happening in this rapidly evolving space. SQL on Big Data discusses in detail the innovations happening, the capabilities on the horizon, and how they solve the issues of performance and scalability and the ability to handle different data types. The book covers how SQL on Big Data engines are permeating the OLTP, OLAP, and Operational analytics space and the rapidly evolving HTAP systems. You will learn the details of: Batch Architectures—Understand the internals and how the existing Hive engine is built and how it is evolving continually to support new features and provide lower latency on queries Interactive Architectures—Understanding how SQL engines are architected to support low latency on large data sets Streaming Architectures—Understanding how SQL engines are architected to support queries on data in motion using in-memory and lock-free data structures Operational Architectures—Understanding how SQL engines are architected for transactional and operational systems to support transactions on Big Data platforms Innovative Architectures—Explore the rapidly evolving newer SQL engines on Big Data with innovative ideas and concepts Who This Book Is For: Business analysts, BI engineers, developers, data scientists and architects, and quality assurance professionals/div




Data Analysis Using SQL and Excel


Book Description

Useful business analysis requires you to effectively transform data into actionable information. This book helps you use SQL and Excel to extract business information from relational databases and use that data to define business dimensions, store transactions about customers, produce results, and more. Each chapter explains when and why to perform a particular type of business analysis in order to obtain useful results, how to design and perform the analysis using SQL and Excel, and what the results should look like.




SQL Server 2019 Revealed


Book Description

Get up to speed on the game-changing developments in SQL Server 2019. No longer just a database engine, SQL Server 2019 is cutting edge with support for machine learning (ML), big data analytics, Linux, containers, Kubernetes, Java, and data virtualization to Azure. This is not a book on traditional database administration for SQL Server. It focuses on all that is new for one of the most successful modernized data platforms in the industry. It is a book for data professionals who already know the fundamentals of SQL Server and want to up their game by building their skills in some of the hottest new areas in technology. SQL Server 2019 Revealed begins with a look at the project's team goal to integrate the world of big data with SQL Server into a major product release. The book then dives into the details of key new capabilities in SQL Server 2019 using a “learn by example” approach for Intelligent Performance, security, mission-critical availability, and features for the modern developer. Also covered are enhancements to SQL Server 2019 for Linux and gain a comprehensive look at SQL Server using containers and Kubernetes clusters. The book concludes by showing you how to virtualize your data access with Polybase to Oracle, MongoDB, Hadoop, and Azure, allowing you to reduce the need for expensive extract, transform, and load (ETL) applications. You will then learn how to take your knowledge of containers, Kubernetes, and Polybase to build a comprehensive solution called Big Data Clusters, which is a marquee feature of 2019. You will also learn how to gain access to Spark, SQL Server, and HDFS to build intelligence over your own data lake and deploy end-to-end machine learning applications. What You Will LearnImplement Big Data Clusters with SQL Server, Spark, and HDFS Create a Data Hub with connections to Oracle, Azure, Hadoop, and other sourcesCombine SQL and Spark to build a machine learning platform for AI applicationsBoost your performance with no application changes using Intelligent PerformanceIncrease security of your SQL Server through Secure Enclaves and Data ClassificationMaximize database uptime through online indexing and Accelerated Database RecoveryBuild new modern applications with Graph, ML Services, and T-SQL Extensibility with JavaImprove your ability to deploy SQL Server on Linux Gain in-depth knowledge to run SQL Server with containers and KubernetesKnow all the new database engine features for performance, usability, and diagnosticsUse the latest tools and methods to migrate your database to SQL Server 2019Apply your knowledge of SQL Server 2019 to Azure Who This Book Is For IT professionals and developers who understand the fundamentals of SQL Server and wish to focus on learning about the new, modern capabilities of SQL Server 2019. The book is for those who want to learn about SQL Server 2019 and the new Big Data Clusters and AI feature set, support for machine learning and Java, how to run SQL Server with containers and Kubernetes, and increased capabilities around Intelligent Performance, advanced security, and high availability.




SQL for Data Analysis


Book Description

With the explosion of data, computing power, and cloud data warehouses, SQL has become an even more indispensable tool for the savvy analyst or data scientist. This practical book reveals new and hidden ways to improve your SQL skills, solve problems, and make the most of SQL as part of your workflow. You'll learn how to use both common and exotic SQL functions such as joins, window functions, subqueries, and regular expressions in new, innovative ways--as well as how to combine SQL techniques to accomplish your goals faster, with understandable code. If you work with SQL databases, this is a must-have reference. Learn the key steps for preparing your data for analysis Perform time series analysis using SQL's date and time manipulations Use cohort analysis to investigate how groups change over time Use SQL's powerful functions and operators for text analysis Detect outliers in your data and replace them with alternate values Establish causality using experiment analysis, also known as A/B testing




SQL for Data Science


Book Description

This textbook explains SQL within the context of data science and introduces the different parts of SQL as they are needed for the tasks usually carried out during data analysis. Using the framework of the data life cycle, it focuses on the steps that are very often given the short shift in traditional textbooks, like data loading, cleaning and pre-processing. The book is organized as follows. Chapter 1 describes the data life cycle, i.e. the sequence of stages from data acquisition to archiving, that data goes through as it is prepared and then actually analyzed, together with the different activities that take place at each stage. Chapter 2 gets into databases proper, explaining how relational databases organize data. Non-traditional data, like XML and text, are also covered. Chapter 3 introduces SQL queries, but unlike traditional textbooks, queries and their parts are described around typical data analysis tasks like data exploration, cleaning and transformation. Chapter 4 introduces some basic techniques for data analysis and shows how SQL can be used for some simple analyses without too much complication. Chapter 5 introduces additional SQL constructs that are important in a variety of situations and thus completes the coverage of SQL queries. Lastly, chapter 6 briefly explains how to use SQL from within R and from within Python programs. It focuses on how these languages can interact with a database, and how what has been learned about SQL can be leveraged to make life easier when using R or Python. All chapters contain a lot of examples and exercises on the way, and readers are encouraged to install the two open-source database systems (MySQL and Postgres) that are used throughout the book in order to practice and work on the exercises, because simply reading the book is much less useful than actually using it. This book is for anyone interested in data science and/or databases. It just demands a bit of computer fluency, but no specific background on databases or data analysis. All concepts are introduced intuitively and with a minimum of specialized jargon. After going through this book, readers should be able to profitably learn more about data mining, machine learning, and database management from more advanced textbooks and courses.




SQL Server Big Data Clusters


Book Description

Get a head-start on learning one of SQL Server 2019’s latest and most impactful features—Big Data Clusters—that combines large volumes of non-relational data for analysis along with data stored relationally inside a SQL Server database. This book provides a first look at Big Data Clusters based upon SQL Server 2019 Release Candidate 1. Start now and get a jump on your competition in learning this important new feature. Big Data Clusters is a feature set covering data virtualization, distributed computing, and relational databases and provides a complete AI platform across the entire cluster environment. This book shows you how to deploy, manage, and use Big Data Clusters. For example, you will learn how to combine data stored on the HDFS file system together with data stored inside the SQL Server instances that make up the Big Data Cluster. Filled with clear examples and use cases, this book provides everything necessary to get started working with Big Data Clusters in SQL Server 2019 using Release Candidate 1. You will learn about the architectural foundations that are made up from Kubernetes, Spark, HDFS, and SQL Server on Linux. You then are shown how to configure and deploy Big Data Clusters in on-premises environments or in the cloud. Next, you are taught about querying. You will learn to write queries in Transact-SQL—taking advantage of skills you have honed for years—and with those queries you will be able to examine and analyze data from a wide variety of sources such as Apache Spark. Through the theoretical foundation provided in this book and easy-to-follow example scripts and notebooks, you will be ready to use and unveil the full potential of SQL Server 2019: combining different types of data spread across widely disparate sources into a single view that is useful for business intelligence and machine learning analysis. What You Will LearnInstall, manage, and troubleshoot Big Data Clusters in cloud or on-premise environments Analyze large volumes of data directly from SQL Server and/or Apache Spark Manage data stored in HDFS from SQL Server as if it were relational data Implement advanced analytics solutions through machine learning and AI Expose different data sources as a single logical source using data virtualization Who This Book Is For For data engineers, data scientists, data architects, and database administrators who want to employ data virtualization and big data analytics in their environment




Seven Databases in Seven Weeks


Book Description

Data is getting bigger and more complex by the day, and so are your choices in handling it. Explore some of the most cutting-edge databases available - from a traditional relational database to newer NoSQL approaches - and make informed decisions about challenging data storage problems. This is the only comprehensive guide to the world of NoSQL databases, with in-depth practical and conceptual introductions to seven different technologies: Redis, Neo4J, CouchDB, MongoDB, HBase, Postgres, and DynamoDB. This second edition includes a new chapter on DynamoDB and updated content for each chapter. While relational databases such as MySQL remain as relevant as ever, the alternative, NoSQL paradigm has opened up new horizons in performance and scalability and changed the way we approach data-centric problems. This book presents the essential concepts behind each database alongside hands-on examples that make each technology come alive. With each database, tackle a real-world problem that highlights the concepts and features that make it shine. Along the way, explore five database models - relational, key/value, columnar, document, and graph - from the perspective of challenges faced by real applications. Learn how MongoDB and CouchDB are strikingly different, make your applications faster with Redis and more connected with Neo4J, build a cluster of HBase servers using cloud services such as Amazon's Elastic MapReduce, and more. This new edition brings a brand new chapter on DynamoDB, updated code samples and exercises, and a more up-to-date account of each database's feature set. Whether you're a programmer building the next big thing, a data scientist seeking solutions to thorny problems, or a technology enthusiast venturing into new territory, you will find something to inspire you in this book. What You Need: You'll need a *nix shell (Mac OS or Linux preferred, Windows users will need Cygwin), Java 6 (or greater), and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.




Next Generation Databases


Book Description

"It’s not easy to find such a generous book on big data and databases. Fortunately, this book is the one." Feng Yu. Computing Reviews. June 28, 2016. This is a book for enterprise architects, database administrators, and developers who need to understand the latest developments in database technologies. It is the book to help you choose the correct database technology at a time when concepts such as Big Data, NoSQL and NewSQL are making what used to be an easy choice into a complex decision with significant implications. The relational database (RDBMS) model completely dominated database technology for over 20 years. Today this "one size fits all" stability has been disrupted by a relatively recent explosion of new database technologies. These paradigm-busting technologies are powering the "Big Data" and "NoSQL" revolutions, as well as forcing fundamental changes in databases across the board. Deciding to use a relational database was once truly a no-brainer, and the various commercial relational databases competed on price, performance, reliability, and ease of use rather than on fundamental architectures. Today we are faced with choices between radically different database technologies. Choosing the right database today is a complex undertaking, with serious economic and technological consequences. Next Generation Databases demystifies today’s new database technologies. The book describes what each technology was designed to solve. It shows how each technology can be used to solve real word application and business problems. Most importantly, this book highlights the architectural differences between technologies that are the critical factors to consider when choosing a database platform for new and upcoming projects. Introduces the new technologies that have revolutionized the database landscape Describes how each technology can be used to solve specific application or business challenges Reviews the most popular new wave databases and how they use these new database technologies




Big Data Processing with Apache Spark


Book Description

Apache Spark is a popular open-source big-data processing framework thatÕs built around speed, ease of use, and unified distributed computing architecture. Not only it supports developing applications in different languages like Java, Scala, Python, and R, itÕs also hundred times faster in memory and ten times faster even when running on disk compared to traditional data processing frameworks. Whether you are currently working on a big data project or interested in learning more about topics like machine learning, streaming data processing, and graph data analytics, this book is for you. You can learn about Apache Spark and develop Spark programs for various use cases in big data analytics using the code examples provided. This book covers all the libraries in Spark ecosystem: Spark Core, Spark SQL, Spark Streaming, Spark ML, and Spark GraphX.