Provenance and Annotation of Data and Processes


Book Description

This book constitutes the thoroughly refereed post-conference proceedings of the Second International Provenance and Annotation Workshop, IPAW 2008, held in Salt Lake City, UT, USA, in June 2008. The 14 revised full papers and 15 revised short and demo papers presented together with 2 keynote lectures were carefully reviewed and selected from 40 submissions. The papers are organized in topical sections on provenance: models and querying; provenance: visualization, failures, identity; provenance and workflows; provenance for streams and collaboration; and applications.




Automated Optimization Methods for Scientific Workflows in e-Science Infrastructures


Book Description

Scientific workflows have emerged as a key technology that assists scientists with the design, management, execution, sharing and reuse of in silico experiments. Workflow management systems simplify the management of scientific workflows by providing graphical interfaces for their development, monitoring and analysis. Today, e-Science combines such workflow management systems with large-scale data and computing resources into complex research infrastructures. For instance, e-Science enables best-practice research in collaborations by providing workflow repositories, which facilitate the sharing and reuse of scientific workflows. However, scientists still face several limitations when reusing workflows. One of the most common challenges is selecting appropriate applications and their individual execution parameters. If scientists do not want to rely on default or experience-based parameters, the best-effort option is to test different workflow set-ups using either trial-and-error approaches or parameter sweeps. The former may be inefficient and the latter time-consuming, especially when a large number of parameters must be tuned. Scientists therefore require an effective and efficient mechanism that automatically and intelligently tests different workflow set-ups and helps them improve their scientific results.

This thesis addresses these limitations by defining and implementing an approach for the optimization of scientific workflows. In the course of this work, scientists' needs are investigated and requirements formulated, resulting in an appropriate optimization concept. This concept is then prototypically implemented by extending a workflow management system with an optimization framework that provides the general mechanisms required to conduct workflow optimization. As optimization is an ongoing research topic, different algorithms are provided as pluggable extensions (plugins) that can be loosely coupled with the framework, resulting in a generic and easily extendable system. An exemplary plugin is introduced that applies a Genetic Algorithm to parameter optimization. To accelerate workflow optimization, and thereby make it feasible at all, e-Science infrastructures are utilized for the parallel execution of scientific workflows; this is enabled by additional extensions for executing applications and workflows on distributed computing resources. The implementation, and therewith the general approach of workflow optimization, is experimentally verified by four use cases in the life-science domain. All workflows were significantly improved, demonstrating the advantage of the proposed workflow optimization. Finally, a new collaboration-based approach is introduced that harnesses optimization provenance to make future optimizations faster and more robust.




Scientific and Statistical Database Management


Book Description

This book constitutes the refereed proceedings of the 21st International Conference on Scientific and Statistical Database Management, SSDBM 2009, held in New Orleans, LA, USA in June 2009. The 29 revised full papers and 12 revised short papers including poster and demo papers presented together with three invited presentations were carefully reviewed and selected from 76 submissions. The papers are organized in topical sections on improving the end-user experience, indexing, physical design, and energy, application experience, workflow, query processing, similarity search, mining, as well as spatial data.




Workflows for e-Science


Book Description

This timely book presents an overview of the current state of the art within established projects, covering many different aspects of workflow, from users to tool builders. It surveys active research from a number of different perspectives, includes theoretical aspects of workflow, and addresses workflow for e-Science as opposed to e-Commerce. The topics covered will be of interest to a wide range of practitioners.




Guide to e-Science


Book Description

This guidebook on e-science presents real-world examples of practices and applications, demonstrating how a range of computational technologies and tools can be employed to build essential infrastructures supporting next-generation scientific research. Each chapter provides introductory material on core concepts and principles, as well as descriptions and discussions of relevant e-science methodologies, architectures, tools, systems, services and frameworks. Features: includes contributions from an international selection of preeminent e-science experts and practitioners; discusses use of mainstream grid computing and peer-to-peer grid technology for “open” research and resource sharing in scientific research; presents varied methods for data management in data-intensive research; investigates issues of e-infrastructure interoperability, security, trust and privacy for collaborative research; examines workflow technology for the automation of scientific processes; describes applications of e-science.




Data Management in Grid and Peer-to-Peer Systems


Book Description

The synergy and convergence of research on grid computing and peer-to-peer (P2P) computing have materialized in the meeting of two research communities: parallel systems and distributed systems. Their main common objective is to harness Internet-connected resources (e.g., CPU, memory, network bandwidth, data sources) at very large scale. In this framework, the Globe Conference seeks to consolidate the bidirectional bridge between grid and P2P systems and large-scale heterogeneous distributed database systems. Today, grid and P2P systems occupy an increasingly important position in the landscape of research on large-scale distributed systems and on applications that require effective management of voluminous, distributed and heterogeneous data. This importance stems from the characteristics these systems offer: autonomy and dynamicity of peers, decentralized control for scalability, and transparent sharing of large-scale distributed resources. The second edition of the International Conference on Data Management in Grid and P2P Systems was held during September 1-2, 2009 in Linz, Austria. Its main objectives were to present the latest results in research and applications, to identify new issues, and to shape future directions.




Data Provenance and Data Management in eScience


Book Description

This book covers important aspects of fundamental research in data provenance and data management (DPDM), including provenance representation and querying, as well as practical applications in such domains as clinical trials, bioinformatics and radio astronomy.




Conceptual Modeling - ER 2005


Book Description

Conceptual modeling is fundamental to any domain where one must cope with complex real-world situations and systems, because it fosters communication between technology experts and those who would benefit from the application of those technologies. Conceptual modeling is the key mechanism for understanding and representing the domains of information system and database engineering, but also increasingly for other domains, including the new “virtual” e-environments and the information systems that support them. The importance of conceptual modeling in software engineering is evidenced by recent interest in “model-driven architecture” and “extreme non-programming”. Conceptual modeling also plays a prominent role in various technical disciplines and in the social sciences. The Annual International Conference on Conceptual Modeling (referred to as the ER Conference) provides a central forum for presenting and discussing current research and applications in which conceptual modeling is the major emphasis. In keeping with this tradition, ER 2005, the 24th ER Conference, spanned the spectrum of conceptual modeling, including research and practice in areas such as theories of concepts and ontologies underlying conceptual modeling, methods and tools for developing and communicating conceptual models, and techniques for transforming conceptual models into effective (information) system implementations. Moreover, new areas of conceptual modeling, including Semantic Web services and the interdependencies of conceptual modeling with knowledge-based, logical and linguistic theories and approaches, were also addressed.




From Active Data Management to Event-Based Systems and More


Book Description

Dedicated to the vision of Prof. Alejandro Buchmann, this collection of work illuminates various facets of data management and reflects the development of the field from its early association with database systems through to today’s wide-ranging applications.




Handbook of Research on Geoinformatics


Book Description

"This book discusses the complete range of contemporary research topics such as computer modeling, geometry, geoprocessing, and geographic information systems"--Provided by publisher.