WORKS09 Workshop at SC09

Monday Nov. 16 2009


This was the 4th Workshop on Workflows in Support of Large-Scale Science (WORKS09). Sessions were divided into three subjects: Applications, Workflow Representation and Provenance/Scheduling.

Workflow Applications

Workflow Management for Paramedical Emergency Operations within a Mobile-Static Distributed Environment

Describes the use of workflows to integrate mobile and static participants: an emergency center integrates hospitals and ambulances.

Uses Triana, but the approach is said not to be specific to that system.

Abstractions through web services.

Web Enabling Desktop Workflow Applications

The idea is to bundle all relevant data and binaries into a jar/zip file called WHIP (Workflow Hosted in Portals) - WHIP website.

A bundle is self-describing and can potentially be used by any workflow system.
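A self-describing bundle of this kind can be sketched with a zip archive plus a manifest. The layout below (file names, manifest keys) is my own assumption for illustration, not the actual WHIP format:

```python
import io
import json
import zipfile

def make_bundle(files, metadata):
    # Archive the workflow inputs plus a manifest that any workflow
    # system could inspect to discover what the bundle contains.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as bundle:
        bundle.writestr("manifest.json", json.dumps(metadata))
        for name, data in files.items():
            bundle.writestr(name, data)
    return buf

buf = make_bundle(
    {"workflow.xml": "<workflow/>", "input.dat": "1 2 3"},
    {"system": "any", "entry": "workflow.xml"},
)
with zipfile.ZipFile(buf) as bundle:
    manifest = json.loads(bundle.read("manifest.json"))
# manifest["entry"] == "workflow.xml"
```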

Plasma Fusion Code Coupling Using Scalable I/O Services and Scientific Workflows

Uses Kepler.

Provenance is gathered from the I/O system; Kepler gets information from computing element nodes through SSH (this runs on Jaguar).

Workflow Representation

A Data-Driven Workflow Language for Grids Based on Array Programming Principles

Defines the behavior of dot and cross products (iteration strategies) on participant inputs.

Items in input/output vectors represented by an empty symbol.
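The dot/cross semantics can be sketched as iteration strategies: a dot product pairs input vectors element-wise, while a cross product invokes the participant on every combination. A minimal Python sketch (function names are my own, not from the paper):

```python
from itertools import product

def dot(*lists):
    # Dot product: pair items element-wise (zip-like iteration).
    return list(zip(*lists))

def cross(*lists):
    # Cross product: one invocation per combination of inputs.
    return list(product(*lists))

a = ["x1", "x2"]
b = ["y1", "y2"]
dot(a, b)    # [("x1", "y1"), ("x2", "y2")]
cross(a, b)  # 4 combinations: (x1,y1), (x1,y2), (x2,y1), (x2,y2)
```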

Based on the SCUFL language (Taverna).

Project website: GWENDIA

Kepler+Hadoop: A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems

Kepler subworkflows are distributed on the map/reduce phases of Hadoop.

Large overhead to start up the Kepler engine for each subworkflow.
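The idea of running a subworkflow inside each map/reduce task can be sketched with a toy in-memory map/reduce; this is my own illustration of the pattern, not the paper's architecture:

```python
from collections import defaultdict

def run_subworkflow(record):
    # Stand-in for launching a Kepler subworkflow on one input record;
    # in the paper, this is where the per-task engine start-up cost is paid.
    return (record.lower(), 1)

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for key, value in map(mapper, records):            # map phase
        groups[key].append(value)
    return {k: reducer(vs) for k, vs in groups.items()}  # reduce phase

counts = map_reduce(["a", "b", "A"], run_subworkflow, sum)
# counts == {"a": 2, "b": 1}
```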

Scientific Workflow Design with Data Assembly Lines

Workflow representations are usually not enough to express complicated loops. The idea here is to have an XML tree with parameters being passed into a participant, which then consumes part of it and generates a modified XML tree passed to the next participant. This reduces workflow complexity at the higher level, but pushes complexity into the participants.

Implemented using Kepler.

Publish/Subscribe as a Model for Scientific Workflow Interoperability

Interoperability via publish/subscribe combined with web services.

Workflow Representation and Runtime Based on Lazy Functional Streams

Implementation of a library of commands in Haskell to allow the representation and execution of workflows.

(need link here)

Towards Scientific Workflow Patterns


Keynote: Provenance Interoperability with the Open Provenance Model

Conversion of provenance in other formats into OPM for queries.

OPM will possibly become a W3C standard.

Research aspects:
  • OPM accounts
  • Relation between accounts
  • Reasoning with provenance conflicts
  • Reasoning with incomplete provenance records

A Pipeline-Centric Provenance Model

Similar to VDL (Chimera/Swift), products are reconstructed using parameters.

Trade-off: storage space vs. the runtime needed to regenerate products that are not stored.

Possibility of use of WHIP to store parameters needed to generate products.

A Navigation Model for Exploring Scientific Workflow Provenance Graphs

Proposes several levels of provenance views and a navigation scheme between them.

Composing and Executing Parallel Dataflow Graphs with Shell Pipes

Changes the behavior of regular Unix pipes to allow map/reduce-type operations to be composed on the shell command line.
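Plain pipes are strictly linear, so the composition the paper enables can only be illustrated by analogy. A generator-based Python sketch of composing map/reduce-style stages like a pipeline (my illustration, not the paper's extended shell syntax):

```python
from collections import Counter

def split_words(lines):
    # "Map" stage: each input record yields many keys.
    for line in lines:
        yield from line.split()

def count(keys):
    # "Reduce" stage: fold the key stream into counts.
    return Counter(keys)

# Compose the stages like `cat | split | count` on a command line.
result = count(split_words(["a b a", "b"]))
# result == Counter({"a": 2, "b": 2})
```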

A Simulation Toolkit to Investigate the Effects of Grid Characteristics on Workflow Completion Time

Scheduling Data-Intensive Workflows on Storage Constrained Resources

Studies the order in which the workflow manager releases jobs, based on storage constraints. Three techniques are shown: one used by Pegasus and two others based on genetic algorithms. The conclusion is that the GA approaches are equal to or worse than Pegasus' scheme.
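The underlying problem can be sketched as follows: release a job only while its data footprint fits in the remaining storage, so the release order determines how much of the workflow can run before waiting for cleanup. This greedy smallest-first sketch is my own illustration, not Pegasus' actual algorithm:

```python
def release_order(jobs, capacity):
    """jobs: dict of job name -> data footprint.
    Greedily release smallest-footprint jobs first while storage allows."""
    released, used = [], 0
    for name, size in sorted(jobs.items(), key=lambda item: item[1]):
        if used + size <= capacity:  # job's data fits in remaining storage
            released.append(name)
            used += size
    return released

release_order({"j1": 40, "j2": 10, "j3": 30}, capacity=60)
# -> ["j2", "j3"]; j1 must wait, since 10 + 30 + 40 exceeds 60
```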