Optimizing Supercompilers for Supercomputers


Book Description

Effective use of a supercomputer requires users to have a good algorithm, to express that algorithm in an appropriate language, and to rely on compilers that generate efficient code. This book investigates several problems facing compiler design for supercomputers: building efficient and comprehensive data dependence graphs, handling recurrence relations, managing compiler temporary variables, and executing WHILE loops. The book first proposes an efficient means of representing the flow of data in a program by labeling the arcs in a data dependence graph with direction vectors that show how the flow of data corresponds to the loop structure of the program. These data dependence direction vectors are then used in several high-level compiler loop optimizations: loop vectorization, loop concurrentization, loop fusion, and loop interchanging. The book shows how to perform these transformations and how to use them to optimize programs for a wide range of supercomputers. The recurrence relations studied include arithmetic recurrences with IF statements and recurrences involving both data and control dependence relations in a cycle; the wavefront method of solving recurrences is also treated. The book discusses ways to make the problem of managing temporary arrays more tractable. It concludes by offering several methods for executing WHILE loops and by describing the general structure of an optimizing compiler for supercomputers, developed from the author's experience with a test-bed compiler.

Michael Wolfe is Associate Professor in the Computer Science and Engineering Department at the Oregon Graduate Center. Optimizing Supercompilers for Supercomputers is included in the series Research Monographs in Parallel Computing. Copublished with Pitman Publishing.
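As a rough illustration of the direction-vector idea sketched above, consider the following C fragment (a hypothetical example of my own, not taken from the book): the single flow dependence in the loop nest has direction vector (<, =), which tells the compiler that the inner loop is vectorizable and that the two loops may legally be interchanged.

    #include <stdio.h>
    #define N 4

    int main(void) {
        double a[N][N], b[N][N];
        /* Initialize the arrays. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { a[i][j] = i + j; b[i][j] = 1.0; }

        /* The write of a[i][j] and the read of a[i-1][j] form a flow
         * dependence with direction vector (<, =): it is carried by the
         * outer i loop and is independent in the inner j loop. */
        for (int i = 1; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = a[i-1][j] + b[i][j];

        /* Since no dependence is carried by the j loop, a compiler may
         * vectorize the inner loop, or interchange the two loops (the
         * direction vector becomes (=, <), still lexicographically
         * positive), without changing the result. */
        printf("a[%d][%d] = %g\n", N-1, N-1, a[N-1][N-1]);
        return 0;
    }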







Compiler Optimizations for Scalable Parallel Systems


Book Description

Scalable parallel systems, or, more generally, distributed-memory systems, offer a challenging model of computing and pose fascinating problems for compiler optimization, ranging from language design to run-time systems. Research in this area is foundational to many challenges, from memory-hierarchy optimization to communication optimization. This unique, handbook-like monograph assesses the state of the art in the area in a systematic and comprehensive way. The 21 coherent chapters by leading researchers provide complete and competent coverage of all relevant aspects of compiler optimization for scalable parallel systems. The book is divided into five parts, on languages, analysis, communication optimizations, code generation, and run-time systems. It will serve as a landmark source of education, information, and reference for students, practitioners, professionals, and researchers interested in updating their knowledge of, or active in, parallel computing.




A Systolic Array Optimizing Compiler


Book Description

This book is a revision of my Ph.D. dissertation, submitted to Carnegie Mellon University in 1987. It documents the research and results of the compiler technology developed for the Warp machine. Warp is a systolic array built out of custom, high-performance processors, each of which can execute up to 10 million floating-point operations per second (10 MFLOPS). Under the direction of H. T. Kung, the Warp machine matured from an academic, experimental prototype to a commercial product of General Electric. The Warp machine demonstrated that the scalable architecture of high-performance, programmable systolic arrays represents a practical, cost-effective solution to present and future computation-intensive applications. The success of Warp led to the follow-on iWarp project, a joint project with Intel, to develop a single-chip 20 MFLOPS processor. The availability of the highly integrated iWarp processor will have a significant impact on parallel computing. One of the major challenges in the development of Warp was to build an optimizing compiler for the machine. First, the processors in the array cooperate at a fine granularity of parallelism, so interaction between processors must be considered in the generation of code for individual processors. Second, the individual processors themselves derive their performance from a VLIW (Very Long Instruction Word) instruction set and a high degree of internal pipelining and parallelism. The compiler therefore contains optimizations pertaining to the array level of parallelism, as well as optimizations for the individual VLIW processors.
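To make the systolic model concrete, here is a minimal C sketch (my own illustration, not Warp code or an example from the book) of the classic systolic convolution that Warp-style linear arrays execute: each cell holds one filter weight, multiplies the input value streaming past it, adds the product to the partial sum arriving from its neighbor, and forwards both values down the array.

    #include <stdio.h>
    #define CELLS 3   /* one cell per filter tap */
    #define NX    8   /* length of the input stream */

    int main(void) {
        double w[CELLS] = {0.25, 0.5, 0.25};        /* weight held in each cell */
        double x[NX]    = {1, 2, 3, 4, 5, 6, 7, 8}; /* input streaming through  */

        /* Each pass of the inner loop models the partial sum y marching
         * through the linear array of cells, picking up one product per
         * cell; one finished result emerges per step. */
        for (int i = 0; i + CELLS <= NX; i++) {
            double y = 0.0;                  /* partial sum entering cell 0 */
            for (int k = 0; k < CELLS; k++)  /* cell k contributes w[k]*x   */
                y += w[k] * x[i + k];
            printf("y[%d] = %g\n", i, y);
        }
        return 0;
    }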




Supercomputing in Engineering Analysis


Book Description

The first volume in this new series has a companion in volume 2 (unseen), Parallel processing in computational mechanics. The first six contributions present general aspects of supercomputing from both hardware and software engineering points of view. Subsequent chapters discuss homotopy algorithms.




Logic for Programming, Artificial Intelligence, and Reasoning


Book Description

This book constitutes the refereed proceedings of the 13th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning, LPAR 2006, held in Phnom Penh, Cambodia in November 2006. The 38 revised full papers presented together with one invited talk were carefully reviewed and selected from 96 submissions.




Compiler Construction


Book Description

The International Workshop on Compiler Construction provides a forum for the presentation and discussion of recent developments in the area of compiler construction. Its scope ranges from compilation methods and tools to implementation techniques for specific requirements of languages and target architectures. This volume contains the papers selected for presentation at the 4th International Workshop on Compiler Construction, CC '92, held in Paderborn, Germany, October 5-7, 1992. The papers present recent developments on such topics as structural and semantic analysis, code generation and optimization, and compilation for parallel architectures and for functional, logical, and application languages.




Euro-Par '96 - Parallel Processing


Book Description

Includes bibliographical references and index.




Intelligent Agents


Book Description

This volume coherently presents 24 thoroughly revised full papers accepted for the ECAI-94 Workshop on Agent Theories, Architectures, and Languages. There is currently considerable interest, from both the AI and the mainstream CS communities, in conceptualizing and building complex computer systems as collections of intelligent agents. This book is devoted to theoretical and practical aspects of architectural and language-related design and implementation issues of software agents. Particularly interesting is the comprehensive survey by the volume editors, which outlines the key issues and indicates, via an extensive bibliography, topics for further reading. In addition, a glossary of key terms in this emerging field and a comprehensive subject index are included.




Automatic Parallelization


Book Description

Distributed-memory multiprocessing systems (DMS), such as Intel's hypercubes, the Paragon, Thinking Machines' CM-5, and the Meiko Computing Surface, have rapidly gained user acceptance and promise to deliver the computing power required to solve the grand challenge problems of science and engineering. These machines are relatively inexpensive to build and are potentially scalable to large numbers of processors. However, they are difficult to program: the non-uniformity of the memory, which makes local accesses much faster than transfers of non-local data via message-passing operations, implies that the locality of algorithms must be exploited in order to achieve acceptable performance. The management of data, with the twin goals of spreading the computational workload and minimizing the delays caused when a processor has to wait for non-local data, becomes of paramount importance. When a code is parallelized by hand, the programmer must distribute the program's work and data to the processors that will execute it. One common approach exploits the regularity of most numerical computations: the so-called Single Program Multiple Data (SPMD), or data-parallel, model of computation. With this method, the data arrays in the original program are each distributed to the processors, establishing an ownership relation, and the computations defining a data item are performed by the processor owning that item.
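A minimal C sketch of the owner-computes rule just described, simulated in a single address space (the block distribution and the names lo, hi, and NPROCS are my own assumptions, not taken from the book; a real DMS version would fetch the boundary element a[lo-1] by message passing):

    #include <stdio.h>
    #define N      16
    #define NPROCS 4

    int main(void) {
        double a[N], b[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 0.0; }

        /* SPMD, data-parallel style: array b is block-distributed, and
         * process p executes only the iterations that define elements it
         * owns.  Here the NPROCS "processes" are simulated by a loop. */
        for (int p = 0; p < NPROCS; p++) {
            int lo = p * (N / NPROCS);       /* first element owned by p  */
            int hi = lo + (N / NPROCS);      /* one past the last element */
            for (int i = lo; i < hi; i++)    /* owner of b[i] computes it */
                if (i > 0)
                    /* a[i-1] may be non-local at a block boundary; on a
                     * real DMS it would arrive as a message from the
                     * neighboring processor. */
                    b[i] = 0.5 * (a[i-1] + a[i]);
        }
        printf("b[%d] = %g\n", N-1, b[N-1]);
        return 0;
    }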