IBM Data Engine for Hadoop and Spark


Book Description

This IBM® Redbooks® publication provides topics to help the technical community take advantage of the resilience, scalability, and performance of the IBM Power SystemsTM platform to implement or integrate an IBM Data Engine for Hadoop and Spark solution for analytics solutions to access, manage, and analyze data sets to improve business outcomes. This book documents topics to demonstrate and take advantage of the analytics strengths of the IBM POWER8® platform, the IBM analytics software portfolio, and selected third-party tools to help solve customer's data analytic workload requirements. This book describes how to plan, prepare, install, integrate, manage, and show how to use the IBM Data Engine for Hadoop and Spark solution to run analytic workloads on IBM POWER8. In addition, this publication delivers documentation to complement available IBM analytics solutions to help your data analytic needs. This publication strengthens the position of IBM analytics and big data solutions with a well-defined and documented deployment model within an IBM POWER8 virtualized environment so that customers have a planned foundation for security, scaling, capacity, resilience, and optimization for analytics workloads. This book is targeted at technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) that are responsible for delivering analytics solutions and support on IBM Power Systems.




Bridging Relational and NoSQL Databases


Book Description

Relational databases have been predominant for many years and are used throughout various industries. The current system faces challenges related to size and variety of data thus the NoSQL databases emerged. By joining these two database models, there is room for crucial developments in the field of computer science. Bridging Relational and NoSQL Databases is an innovative source of academic content on the convergence process between databases and describes key features of the next database generation. Featuring coverage on a wide variety of topics and perspectives such as BASE approach, CAP theorem, and hybrid and native solutions, this publication is ideally designed for professionals and researchers interested in the features and collaboration of relational and NoSQL databases.




IBM Power Systems L and LC Server Positioning Guide


Book Description

This IBM® RedpaperTM publication is written to assist you in locating the optimal server/workload fit within the IBM Power SystemsTM L and IBM OpenPOWER LC product lines. IBM has announced several scale-out servers, and as a partner in the OpenPOWER organization, unique design characteristics that are engineered into the LC line have broadened the suite of available workloads beyond typical client OS hosting. This paper looks at the benefits of the Power Systems L servers and OpenPOWER LC servers, and how they are different, providing unique benefits for Enterprise workloads and use cases.




Apache Spark Implementation on IBM z/OS


Book Description

The term big data refers to extremely large sets of data that are analyzed to reveal insights, such as patterns, trends, and associations. The algorithms that analyze this data to provide these insights must extract value from a wide range of data sources, including business data and live, streaming, social media data. However, the real value of these insights comes from their timeliness. Rapid delivery of insights enables anyone (not only data scientists) to make effective decisions, applying deep intelligence to every enterprise application. Apache Spark is an integrated analytics framework and runtime to accelerate and simplify algorithm development, depoyment, and realization of business insight from analytics. Apache Spark on IBM® z/OS® puts the open source engine, augmented with unique differentiated features, built specifically for data science, where big data resides. This IBM Redbooks® publication describes the installation and configuration of IBM z/OS Platform for Apache Spark for field teams and clients. Additionally, it includes examples of business analytics scenarios.




Enterprise Data Warehouse Optimization with Hadoop on IBM Power Systems Servers


Book Description

Data warehouses were developed for many good reasons, such as providing quick query and reporting for business operations, and business performance. However, over the years, due to the explosion of applications and data volume, many existing data warehouses have become difficult to manage. Extract, Transform, and Load (ETL) processes are taking longer, missing their allocated batch windows. In addition, data types that are required for business analysis have expanded from structured data to unstructured data. The Apache open source Hadoop platform provides a great alternative for solving these problems. IBM® has committed to open source since the early years of open Linux. IBM and Hortonworks together are committed to Apache open source software more than any other company. IBM Power SystemsTM servers are built with open technologies and are designed for mission-critical data applications. Power Systems servers use technology from the OpenPOWER Foundation, an open technology infrastructure that uses the IBM POWER® architecture to help meet the evolving needs of big data applications. The combination of Power Systems with Hortonworks Data Platform (HDP) provides users with a highly efficient platform that provides leadership performance for big data workloads such as Hadoop and Spark. This IBM RedpaperTM publication provides details about Enterprise Data Warehouse (EDW) optimization with Hadoop on Power Systems. Many people know Power Systems from the IBM AIX® platform, but might not be familiar with IBM PowerLinuxTM, so part of this paper provides a Power Systems overview. A quick introduction to Hadoop is provided for those not familiar with the topic. Details of HDP on Power Reference architecture are included that will help both software architects and infrastructure architects understand the design. In the optimization chapter, we describe various topics: traditional EDW offload, sizing guidelines, performance tuning, IBM Elastic StorageTM Server (ESS) for data-intensive workload, IBM Big SQL as the common structured query language (SQL) engine for Hadoop platform, and tools that are available on Power Systems that are related to EDW optimization. We also dedicate some pages to the analytics components (IBM Data Science Experience (IBM DSX) and IBM SpectrumTM Conductor for Spark workload) for the Hadoop infrastructure.




IBM Software Defined Infrastructure for Big Data Analytics Workloads


Book Description

This IBM® Redbooks® publication documents how IBM Platform Computing, with its IBM Platform Symphony® MapReduce framework, IBM Spectrum Scale (based Upon IBM GPFSTM), IBM Platform LSF®, the Advanced Service Controller for Platform Symphony are work together as an infrastructure to manage not just Hadoop-related offerings, but many popular industry offeringsm such as Apach Spark, Storm, MongoDB, Cassandra, and so on. It describes the different ways to run Hadoop in a big data environment, and demonstrates how IBM Platform Computing solutions, such as Platform Symphony and Platform LSF with its MapReduce Accelerator, can help performance and agility to run Hadoop on distributed workload managers offered by IBM. This information is for technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for delivering cost-effective cloud services and big data solutions on IBM Power SystemsTM to help uncover insights among client's data so they can optimize product development and business results.




IBM Reference Architecture for Genomics, Power Systems Edition


Book Description

This IBM® Redbooks® publication introduces the IBM Reference Architecture for Genomics, IBM Power SystemsTM edition on IBM POWER8®. It addresses topics such as why you would implement Life Sciences workloads on IBM POWER8, and shows how to use such solution to run Life Sciences workloads using IBM PlatformTM Computing software to help set up the workloads. It also provides technical content to introduce the IBM POWER8 clustered solution for Life Sciences workloads. This book customizes and tests Life Sciences workloads with a combination of an IBM Platform Computing software solution stack, Open Stack, and third party applications. All of these applications use IBM POWER8, and IBM Spectrum ScaleTM for a high performance file system. This book helps strengthen IBM Life Sciences solutions on IBM POWER8 with a well-defined and documented deployment model within an IBM Platform Computing and an IBM POWER8 clustered environment. This system provides clients in need of a modular, cost-effective, and robust solution with a planned foundation for future growth. This book highlights IBM POWER8 as a flexible infrastructure for clients looking to deploy life sciences workloads, and at the same time reduce capital expenditures, operational expenditures, and optimization of resources. This book helps answer clients' workload challenges in particular with Life Sciences applications, and provides expert-level documentation and how-to-skills to worldwide teams that provide Life Sciences solutions and support to give a broad understanding of a new architecture.




Big Data Analytics with Applications in Insider Threat Detection


Book Description

Today's malware mutates randomly to avoid detection, but reactively adaptive malware is more intelligent, learning and adapting to new computer defenses on the fly. Using the same algorithms that antivirus software uses to detect viruses, reactively adaptive malware deploys those algorithms to outwit antivirus defenses and to go undetected. This book provides details of the tools, the types of malware the tools will detect, implementation of the tools in a cloud computing framework and the applications for insider threat detection.




Big Data Management and Processing


Book Description

From the Foreword: "Big Data Management and Processing is [a] state-of-the-art book that deals with a wide range of topical themes in the field of Big Data. The book, which probes many issues related to this exciting and rapidly growing field, covers processing, management, analytics, and applications... [It] is a very valuable addition to the literature. It will serve as a source of up-to-date research in this continuously developing area. The book also provides an opportunity for researchers to explore the use of advanced computing technologies and their impact on enhancing our capabilities to conduct more sophisticated studies." ---Sartaj Sahni, University of Florida, USA "Big Data Management and Processing covers the latest Big Data research results in processing, analytics, management and applications. Both fundamental insights and representative applications are provided. This book is a timely and valuable resource for students, researchers and seasoned practitioners in Big Data fields. --Hai Jin, Huazhong University of Science and Technology, China Big Data Management and Processing explores a range of big data related issues and their impact on the design of new computing systems. The twenty-one chapters were carefully selected and feature contributions from several outstanding researchers. The book endeavors to strike a balance between theoretical and practical coverage of innovative problem solving techniques for a range of platforms. It serves as a repository of paradigms, technologies, and applications that target different facets of big data computing systems. The first part of the book explores energy and resource management issues, as well as legal compliance and quality management for Big Data. It covers In-Memory computing and In-Memory data grids, as well as co-scheduling for high performance computing applications. The second part of the book includes comprehensive coverage of Hadoop and Spark, along with security, privacy, and trust challenges and solutions. The latter part of the book covers mining and clustering in Big Data, and includes applications in genomics, hospital big data processing, and vehicular cloud computing. The book also analyzes funding for Big Data projects.




Data Analytics for Pandemics


Book Description

Epidemic trend analysis, timeline progression, prediction, and recommendation are critical for initiating effective public health control strategies, and AI and data analytics play an important role in epidemiology, diagnostic, and clinical fronts. The focus of this book is data analytics for COVID-19, which includes an overview of COVID-19 in terms of epidemic/pandemic, data processing and knowledge extraction. Data sources, storage and platforms are discussed along with discussions on data models, their performance, different big data techniques, tools and technologies. This book also addresses the challenges in applying analytics to pandemic scenarios, case studies and control strategies. Aimed at Data Analysts, Epidemiologists and associated researchers, this book: discusses challenges of AI model for big data analytics in pandemic scenarios; explains how different big data analytics techniques can be implemented; provides a set of recommendations to minimize infection rate of COVID-19; summarizes various techniques of data processing and knowledge extraction; enables users to understand big data analytics techniques required for prediction purposes.




Recent Books