IBM Spectrum Scale Best Practices for Genomics Medicine Workloads


Book Description

Advancing the science of medicine by targeting a disease more precisely with treatment specific to each patient relies on access to that patient's genomics information and the ability to process massive amounts of genomics data quickly. As genomics data becomes a critical source for precision medicine, it is expected to create an expanding data ecosystem. Therefore, hospitals, genome centers, medical research centers, and other clinical institutes need to explore new methods of storing, accessing, securing, managing, sharing, and analyzing significant amounts of data. Healthcare and life sciences organizations that are running data-intensive genomics workloads on an IT infrastructure that lacks scalability, flexibility, performance, management, and cognitive capabilities also need to modernize and transform their infrastructure to support current and future requirements. IBM® offers an integrated solution for genomics that is based on composable infrastructure. This solution enables administrators to build an IT environment in a way that disaggregates the underlying compute, storage, and network resources. Such a composable, building-block-based solution for genomics addresses the most complex aspect of data management and allows organizations to store, access, manage, and share huge volumes of genome sequencing data. IBM Spectrum™ Scale is software-defined storage that is used to manage storage and provide massive scale, a global namespace, and high-performance data access with many enterprise features. IBM Spectrum Scale™ is used in clustered environments, provides unified access to data via file protocols (POSIX, NFS, and SMB) and object protocols (Swift and S3), and supports analytic workloads via HDFS connectors.
Deploying IBM Spectrum Scale and IBM Elastic Storage™ Server (IBM ESS) as a composable storage building block in a Genomics Next Generation Sequencing deployment offers key benefits of performance, scalability, analytics, and collaboration via multiple protocols. This IBM Redpaper™ publication describes a composable solution with detailed architecture definitions for storage, compute, and networking services for genomics next generation sequencing that enable solution architects to benefit from tried-and-tested deployments and to quickly plan and design an end-to-end infrastructure deployment. The preferred practices and fully tested recommendations described in this paper are derived from running the GATK Best Practices workflow from the Broad Institute. The scenarios provide all that is required, including ready-to-use configuration and tuning templates for the different building blocks (compute, network, and storage), to simplify deployment and to increase confidence in the performance of genomics workloads. The solution is designed to be elastic in nature, and the disaggregation of the building blocks allows IT administrators to easily and optimally configure the solution with maximum flexibility. The intended audience for this paper is technical decision makers, IT architects, deployment engineers, and administrators who are working in the healthcare domain on genomics-based workloads.




HIPAA Compliance for Healthcare Workloads on IBM Spectrum Scale


Book Description

When technology workloads process healthcare data, it is important to understand Health Insurance Portability and Accountability Act (HIPAA) compliance and what it means for the technology infrastructure in general and storage in particular. HIPAA is US legislation that was signed into law in 1996. HIPAA was enacted to protect health insurance coverage, but was later extended to ensure protection and privacy of electronic health records and transactions. In simple terms, it was instituted to modernize the exchange of healthcare information and the way that the Personally Identifiable Information (PII) maintained by the healthcare and healthcare-related industries is safeguarded. From a technology perspective, one of the core requirements of HIPAA is the protection of Electronic Protected Health Information (ePHI) through physical, technical, and administrative defenses. From a non-compliance perspective, the Health Information Technology for Economic and Clinical Health Act (HITECH) added protections to HIPAA and increased penalties to USD 100 - USD 50,000 per violation. Today, HIPAA-compliant solutions are a norm in the healthcare industry worldwide. This IBM® Redpaper publication describes HIPAA compliance requirements for storage and how security-enhanced software-defined storage is designed to help meet those requirements. We correlate how the security features of software-defined IBM Spectrum® Scale address the safeguards that are specified by the HIPAA Security Rule.




IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences


Book Description

This IBM® Redpaper publication provides an update to the original description of IBM Reference Architecture for Genomics. This paper expands the reference architecture to cover all of the major vertical areas of healthcare and life sciences industries, such as genomics, imaging, and clinical and translational research. The architecture was renamed IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences to reflect the fact that it incorporates key building blocks for high-performance computing (HPC) and software-defined storage, and that it supports an expanding infrastructure of leading industry partners, platforms, and frameworks. The reference architecture defines a highly flexible, scalable, and cost-effective platform for accessing, managing, storing, sharing, integrating, and analyzing big data, which can be deployed on-premises, in the cloud, or as a hybrid of the two. IT organizations can use the reference architecture as a high-level guide for overcoming data management challenges and processing bottlenecks that are frequently encountered in personalized healthcare initiatives, and in compute-intensive and data-intensive biomedical workloads. This reference architecture also provides a framework and context for modern healthcare and life sciences institutions to adopt cutting-edge technologies, such as cognitive life sciences solutions, machine learning and deep learning, Spark for analytics, and cloud computing. To illustrate these points, this paper includes case studies describing how clients and IBM Business Partners alike used the reference architecture in the deployments of demanding infrastructures for precision medicine. This publication targets technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing life sciences solutions and support.




IBM Elastic Storage Server Implementation Guide for Version 5.3


Book Description

This IBM® Redpaper™ publication introduces and describes the IBM Elastic Storage™ Server as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum™ Scale technology, formerly IBM General Parallel File System (GPFS™). IBM Elastic Storage Servers can be implemented for a range of diverse requirements, providing reliability, performance, and scalability. This publication helps you to understand the solution and its architecture and helps you to plan the installation and integration of the environment. The following combination of physical and logical components is required: hardware, operating system, storage, network, and applications. This paper provides guidelines for several usage and integration scenarios. Typical scenarios include Cluster Export Services (CES) integration, disaster recovery, and multicluster integration. This paper addresses the needs of technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who must deliver cost-effective cloud services and big data solutions.




Cloud Data Sharing with IBM Spectrum Scale


Book Description

This IBM® Redpaper™ publication provides information to help you with the sizing, configuration, and monitoring of hybrid cloud solutions using the Cloud data sharing feature of IBM Spectrum Scale™. IBM Spectrum Scale, formerly IBM General Parallel File System (IBM GPFS™), is a scalable data and file management solution that provides a global namespace for large data sets along with several enterprise features. Cloud data sharing allows for the sharing and use of data between various cloud object storage types and IBM Spectrum Scale. Cloud data sharing can help with the movement of data in both directions, between file systems and cloud object storage, so that data is where it needs to be, when it needs to be there. This paper is intended for IT architects, IT administrators, storage administrators, and those who want to learn more about sizing, configuration, and monitoring of hybrid cloud solutions using IBM Spectrum Scale and Cloud data sharing.




Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing Files Globally


Book Description

Economic globalization requires data to be available globally. With most data stored in file systems, solutions to make this data globally available become more important. Files that are in file systems can be protected or shared by replicating these files to another file system that is in a remote location. The remote location might be just around the corner or in a different country. Therefore, the techniques that are used to protect and share files must account for long distances and slow and unreliable wide area network (WAN) connections. IBM® Spectrum Scale is a scalable clustered file system that can be used to store all kinds of unstructured data. It provides open data access by way of Network File System (NFS); Server Message Block (SMB); POSIX; object storage APIs, such as S3 and OpenStack Swift; and the Hadoop Distributed File System (HDFS) for accessing and sharing data. The IBM Aspera® file transfer solution (IBM Aspera Sync) provides predictable and reliable data transfer across large distances for small and large files. The combination of both can be used for global sharing and protection of data. This IBM Redpaper™ publication describes how IBM Aspera Sync can be used to protect and share data that is stored in IBM Spectrum™ Scale file systems across large distances of several hundred to thousands of miles. We also explain the integration of IBM Aspera Sync with IBM Spectrum Scale™ and differentiate it from solutions that are built into IBM Spectrum Scale for protection and sharing. We also describe different use cases for IBM Aspera Sync with IBM Spectrum Scale.




IBM Reference Architecture for Genomics, Power Systems Edition


Book Description

This IBM® Redbooks® publication introduces the IBM Reference Architecture for Genomics, IBM Power Systems™ edition on IBM POWER8®. It addresses topics such as why you would implement Life Sciences workloads on IBM POWER8, and shows how to use such a solution to run Life Sciences workloads using IBM Platform™ Computing software to help set up the workloads. It also provides technical content to introduce the IBM POWER8 clustered solution for Life Sciences workloads. This book customizes and tests Life Sciences workloads with a combination of an IBM Platform Computing software solution stack, OpenStack, and third-party applications. All of these applications use IBM POWER8, and IBM Spectrum Scale™ for a high-performance file system. This book helps strengthen IBM Life Sciences solutions on IBM POWER8 with a well-defined and documented deployment model within an IBM Platform Computing and an IBM POWER8 clustered environment. This system provides clients in need of a modular, cost-effective, and robust solution with a planned foundation for future growth. This book highlights IBM POWER8 as a flexible infrastructure for clients looking to deploy life sciences workloads while reducing capital and operational expenditures and optimizing resources. This book helps answer clients' workload challenges, in particular with Life Sciences applications, and provides expert-level documentation and how-to skills to worldwide teams that provide Life Sciences solutions and support, to give a broad understanding of a new architecture.




IBM Software-Defined Storage Guide


Book Description

Today, new business models in the marketplace coexist with traditional ones and their well-established IT architectures. They generate new business needs and new IT requirements that can only be satisfied by new service models and new technological approaches. These changes are reshaping traditional IT concepts. Cloud in its three main variants (public, hybrid, and private) represents the major and most viable answer to those IT requirements, and software-defined infrastructure (SDI) is its major technological enabler. IBM® technology, with its rich and complete set of storage hardware and software products, supports SDI both in an open standard framework and in other vendors' environments. IBM services are able to deliver solutions to customers, drawing on their extensive knowledge of the topic and the experience gained in partnership with clients. This IBM Redpaper™ publication focuses on software-defined storage (SDS) and IBM Storage Systems product offerings for software-defined environments (SDEs). It also provides use case examples across various industries that cover different client needs, proposed solutions, and results. This paper can help you to understand current organizational capabilities and challenges, and to identify specific business objectives to be achieved by implementing an SDS solution in your enterprise.




Implementation Guide for IBM Elastic Storage System 5000


Book Description

This IBM® Redbooks® publication introduces and describes the IBM Elastic Storage® Server 5000 (ESS 5000) as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum® Scale technology, formerly IBM General Parallel File System (IBM GPFS). ESS is a modern implementation of software-defined storage, making it easier for you to deploy fast, highly scalable storage for AI and big data. With the fast NVMe storage technology and industry-leading file management capabilities of IBM Spectrum Scale, the ESS 3000 and ESS 5000 nodes can grow to yottabyte (YB) scale and can be integrated into a federated global storage system. By consolidating storage requirements from the edge to the core data center (including Kubernetes and Red Hat OpenShift), IBM ESS can reduce inefficiency, lower acquisition costs, simplify storage management, eliminate data silos, support multiple demanding workloads, and deliver high performance throughout your organization. This book provides a technical overview of the ESS 5000 solution and helps you to plan the installation of the environment. We also explain the use cases where we believe it fits best. Our goal is to position this book as the starting-point document for customers who plan to use the ESS 5000 as part of their IBM Spectrum Scale setups. This book is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective storage solutions with ESS 5000.




IBM Spectrum Archive Enterprise Edition V1.3.2.2: Installation and Configuration Guide


Book Description

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum® Archive Enterprise Edition (EE) Version 1.3.2.2 for the IBM TS4500, IBM TS3500, IBM TS4300, and IBM TS3310 tape libraries. IBM Spectrum Archive Enterprise Edition enables the use of the Linear Tape File System (LTFS) for the policy management of tape as a storage tier in an IBM Spectrum Scale based environment. It also helps encourage the use of tape as a critical tier in the storage environment. This is the tenth edition of IBM Spectrum Archive Installation and Configuration Guide. IBM Spectrum Archive EE can run any application that is designed for disk files on physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 9, 8, 7, 6, and 5 tape drives, and the IBM TS1160, TS1155, TS1150, and TS1140 tape drives. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. Using IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM customers, IBM Business Partners, IBM specialist sales representatives, and technical specialists.



