Chapter 1
Introduction

A computational grid [1], built using grid computing technology, is a network of computing resources that work together as a single, uniform operating environment. It provides pervasive access to resources with advanced computational and storage capabilities, and can be viewed as a virtual supercomputer designed for large-scale applications. These applications are massive computational problems from scientific and engineering domains, such as earthquake simulation [2] and climate/weather modeling [3, 4, 5], that require huge computation power and storage resources. Computational grids are one of the most cost-effective strategies for meeting the resource needs of these applications.

One important characteristic of these applications is that they are no longer developed as monolithic, single-executable codes. Instead, they incorporate multiple dependent computational modules and entail the transfer and storage of large amounts of data. Executing these applications involves the concurrent and sequential execution of multiple modules, and the automatic and timely transfer of data between modules. These applications are often referred to as scientific workflow applications, and a module is often called a task in workflow terms. A very important issue in executing a scientific workflow application in computational grids is how to map and schedule workflow modules onto multiple distributed resources, and how to handle module dependencies in a timely manner, so as to deliver the performance that users expect. The goal of this research is to develop a workflow system that addresses this issue in computational grid environments.

1.1 Workflow Applications in Computational Grid Environments

Deploying workflow applications in computational grid environments involves two main topics: the workflow description method and the workflow scheduler. Workflow description is about how to model the workflow structure, i.e., how to describe the workflow tasks and their dependencies. Most workflow applications are modeled as graphs; the graph vertices denote workflow tasks and the graph edges denote task dependencies. Workflow scheduling is about executing workflow tasks in the right order, on the right resources and at the right time. In the simplest case, it amounts to performing a topological sort [6] and then launching the workflow tasks in topological order. In grid environments, both issues become much more complex than they first appear. For example, we must allocate multiple resources to the workflow tasks in a coordinated way before their execution, and reduce the queue waiting time of workflow tasks that are submitted to resources for execution. A workflow description must therefore also specify resource request information for the workflow tasks, to support resource allocation decisions.
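The simplest scheduling case described above, a topological sort of the task graph, can be sketched as follows. This is only an illustration in Python using Kahn's algorithm, not the scheduler developed in this work; the function name, the task names and the example workflow are hypothetical, and resource allocation and queue waiting time are deliberately ignored.

```python
from collections import deque


def topological_order(tasks, dependencies):
    """Return the tasks in an order that respects every dependency.

    tasks: iterable of task names (the graph vertices).
    dependencies: iterable of (upstream, downstream) pairs (the graph edges).
    """
    tasks = list(tasks)
    successors = {task: [] for task in tasks}
    pending = {task: 0 for task in tasks}  # unfinished predecessors per task
    for upstream, downstream in dependencies:
        successors[upstream].append(downstream)
        pending[downstream] += 1

    # Tasks with no predecessors can be launched immediately.
    ready = deque(task for task in tasks if pending[task] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Completing a task may release its successors.
        for successor in successors[task]:
            pending[successor] -= 1
            if pending[successor] == 0:
                ready.append(successor)

    if len(order) != len(tasks):
        raise ValueError("workflow graph contains a cycle")
    return order


# Hypothetical four-task workflow, for illustration only.
example_tasks = ["preprocess", "simulate", "analyze", "visualize"]
example_dependencies = [
    ("preprocess", "simulate"),
    ("simulate", "analyze"),
    ("simulate", "visualize"),
]
launch_order = topological_order(example_tasks, example_dependencies)
```

Tasks that become ready at the same time, such as the last two tasks in this example, have no dependency on each other and could run concurrently on different resources. Deciding where and when to run them is the part that a grid scheduler must add on top of this ordering.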

There have been many efforts to develop workflow systems and workflow scheduling algorithms for grid environments; a list of these efforts can be found in [7]. In most of the workflow systems, such as DAGMan [8], Taverna [9], Karajan [10] and Triana [11], the workflow description methods focus on expressiveness in describing workflow structures and do not support specifying scheduling-related information. The workflow scheduling strategy is basically an extension of the workflow enact engine with grid job launching and file transfer services. These systems lack the capability of allocating resources for workflow tasks, thus requiring users to specify resource allocation details manually. This is a very inflexible and impractical approach in dynamic and virtual grid environments. In efforts to develop workflow scheduling algorithms, such as ASKALON [12], Pegasus [13], and Gridbus [14], grid resource management issues are not well addressed. For example, the grid scheduling hierarchy, which may introduce significant performance overhead during workflow execution in high-load environments, is not taken into account in the scheduling process. As a result, some assumptions made when evaluating those algorithms are unrealistic in computational grid environments.

Advanced scheduling techniques, such as resource co-allocation, resource negotiation and advance reservation, and performance prediction, have been studied for several decades. Some of them have been implemented in commercial schedulers for computing clusters or supercomputers. More recently, they have become active and open research topics in grid computing. Yet we found very few efforts to study and use those techniques in grid workflow systems. We believe, and show in this work, that those advanced scheduling techniques can greatly improve the overall workflow execution performance.

1.2 Research Goals and Contributions

The goal of this work is to develop a grid workflow system with advanced scheduling techniques, and to study the impact of these techniques on overall workflow performance in computational grid environments. In the workflow system, the description method should support specifying the information required for workflow scheduling. The workflow scheduler should be capable of co-allocating resources for workflow tasks, and should be able to apply different advanced scheduling techniques in the scheduling process to deliver the quality of service that users expect. We summarize the contributions of the work presented in this dissertation as follows:

  1. The definition of a workflow system architecture that integrates a workflow-orchestrated execution planner and resource allocator, a workflow enact engine and a runtime system.
  2. A workflow description language that addresses the lack of scheduling support noted above.
  3. A workflow scheduling (resource allocation and planning) algorithm that applies the advanced scheduling techniques noted above.
  4. A simulation environment that closely models a real computational grid in the aspects relevant to workflow scheduling, together with a performance analysis of our scheduling algorithm in the simulated grid.

In the course of this work, a paper in the Journal of Grid Computing and several conference papers and book chapters have been published or are in progress. A website, whose contents are still being updated, has been set up at “http://www.cs.uh.edu/~gracce” to provide the latest technical documentation and software updates.

1.3 Dissertation Organization

The rest of the dissertation describes this work in detail and is organized as follows. Chapter 2 introduces and discusses the grid computing model, grid architecture and grid resource management issues. Chapter 3 introduces grid scientific workflow applications and workflow systems. Chapter 4 studies related efforts on grid workflow systems and, based on that study, motivates our work. Chapter 5 presents the features and technical details of the workflow description languages designed in this work. Chapter 6 presents our workflow scheduling architecture and the algorithms for workflow resource allocation and execution planning. Chapter 7 shows our simulation results and analyzes the performance improvements that our scheduler brings to workflow executions. Finally, Chapter 8 concludes the dissertation and discusses future work.