Project lifecycle

A project manages delivery of a dataset to one or more jobs. Each individual job independently requests files from the project, continuing until there are none left.

Running a project

Choose a project name. It must be unique, so some combination of username and time is a reasonable place to start. Then start the project with the /startProject interface, passing it the station name, project name, and dataset definition. This interface returns the URL to use for future operations. The URL for a running project can also be retrieved with the /findProject interface, so that all batch jobs can obtain the URL as long as they know the project name.
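
As an illustration of the naming advice above, here is a minimal sketch that builds a project name from a username and a UTC timestamp. The function name make_project_name and the exact name format are assumptions made for this example, not something the service prescribes.

```python
import getpass
from datetime import datetime, timezone
from typing import Optional


def make_project_name(username: Optional[str] = None,
                      now: Optional[datetime] = None) -> str:
    """Build a candidate project name from a username and a timestamp.

    Hypothetical helper: the "<user>_<YYYYmmdd_HHMMSS>" layout is only one
    possible convention for getting a name that is likely to be unique.
    """
    user = username or getpass.getuser()
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d_%H%M%S")
    return f"{user}_{stamp}"


if __name__ == "__main__":
    print(make_project_name())
```

The timestamp has one-second resolution, so two projects started by the same user within the same second would collide; adding a process ID or random suffix would be one way to reduce that risk.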

Each separate job calls the /establishProcess interface in order to register itself with the project. This returns the consumer process ID. Each consumer process represents a separate stream of file deliveries and has a different ID.

Each consumer process calls the /getNextFile interface in a loop. The interface can return one of three HTTP status codes. Status 204 means no more files are available for this consumer. Status 202 means no files are currently available but the project is still trying to obtain more; in this case the body is a descriptive text string, followed by a colon, followed by the suggested number of seconds before the client should query again. Status 200 means a file is available; the first line of the response is the access URL for the file (currently only file:// URLs are supported, but this may change in the future).
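
The three outcomes of /getNextFile can be sketched as a small pure function that takes the status code and body text and reports what the client should do next. The names NextFileResult and interpret_next_file are hypothetical, and the sketch assumes the 202 body ends with the number of seconds after the last colon; how the HTTP request itself is made is left out.

```python
from typing import NamedTuple, Optional


class NextFileResult(NamedTuple):
    action: str                           # "file", "wait", or "done"
    file_url: Optional[str] = None        # set when action == "file"
    retry_seconds: Optional[float] = None  # set when action == "wait"
    message: Optional[str] = None         # descriptive text from a 202 body


def interpret_next_file(status_code: int, body: str) -> NextFileResult:
    """Map a /getNextFile status code and body onto a client action."""
    if status_code == 204:
        # No more files will be delivered to this consumer.
        return NextFileResult("done")
    if status_code == 202:
        # Body is "<descriptive text>:<seconds>"; split on the last colon
        # so that colons inside the descriptive text are tolerated.
        message, _, seconds = body.strip().rpartition(":")
        return NextFileResult("wait",
                              retry_seconds=float(seconds),
                              message=message.strip())
    if status_code == 200:
        lines = body.splitlines()
        if not lines:
            raise ValueError("status 200 with an empty body")
        # The first line of the response is the access URL for the file.
        return NextFileResult("file", file_url=lines[0].strip())
    raise ValueError(f"unexpected status code {status_code}")
```

A consumer loop would call this after each request, sleep for retry_seconds on "wait", process the file on "file", and stop on "done".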

When the job has finished processing a file, it must call the /releaseFile interface with the filename and the status of the file. A status of 'ok' marks the file as successfully processed. Any other status marks it as unsuccessful.
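
The release step can be sketched as a wrapper that runs the user's processing code and always reports a status afterwards. Everything named here is an assumption for illustration: process and release_file are callables supplied by the caller (release_file would wrap the actual /releaseFile request), the filename is taken to be the last path component of the file:// URL, and 'failed' is an arbitrary choice of non-'ok' status.

```python
import posixpath
from typing import Callable
from urllib.parse import urlparse


def file_name_from_url(file_url: str) -> str:
    """Return the last path component of an access URL (assumed convention)."""
    return posixpath.basename(urlparse(file_url).path)


def process_and_release(file_url: str,
                        process: Callable[[str], None],
                        release_file: Callable[[str, str], None]) -> str:
    """Process one file, then release it with 'ok' or a failure status.

    release_file(filename, status) is a hypothetical callable standing in
    for the /releaseFile request.
    """
    try:
        process(file_url)
        status = "ok"
    except Exception:
        # Any status other than 'ok' marks the file as unsuccessful.
        status = "failed"
    release_file(file_name_from_url(file_url), status)
    return status
```

Releasing the file even when processing raises an exception keeps the project's record of the file accurate, instead of leaving it without a reported status.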

Processes that have been delivered all of the available files are automatically marked as finished, but the /endProcess interface may be used to end a process explicitly if desired.

When the job has completely finished processing, including tasks such as returning the output to the user, the /setStatus interface may be used to mark it as 'complete' (for recovery purposes).

Once all consumer processes have completed, the /endProject interface should be used to terminate the project. After this is done, the project is no longer usable.