Draft Working Document (content, including pictures, may change at any time)

POMS Overview

The Production Operations Management System (POMS) is a project designed to provide a service that assists experiments' production teams and analysis groups with their Monte Carlo (MC) production and data processing. As the volume of data produced by running experiments keeps growing, simplifying the steps of data processing and management has become increasingly valuable to users.

POMS provides a web service interface that automates job submission to distributed resources according to users' requests, and supports the subsequent monitoring, recovery of failed submissions, debugging and record keeping.

POMS interfaces with the following systems:

  • Jobsub, a service that supports the whole job lifecycle, managing jobs on distributed resources such as the Grid.
  • SAM, the data handling system, to keep track of files, their meta-data and processing.
  • ECL, the Electronic Collaboration Logbook, where experiments can keep track of production and processing operations as a chronologically organized collection of records in the form of "logbook entries".

The ultimate goal is the most efficient use of all computing resources available to experiments, while providing a simple and transparent interface that shields users from the complexity of the Grid.

POMS runs behind a web service interface that provides both interactive pages for users and a REST interface for scripts that interact with it. This means that experiments can use POMS from their web browser to configure and run their production code, or they can use the poms_client and poms_jobsub_wrapper tools to submit jobs from the command line and have POMS track, monitor and help debug them.

A Glossary is provided to help with the terminology used in this document.

Basic Concepts

POMS uses the concept of a “Campaign” to organize data processing. A “Campaign” is a collection of one or more stages executed in a user-defined order.

Configuring a campaign stage involves three main steps, which together define the parameters and actions used to launch the jobs:

  • Compose a login/setup.
  • Compose the job type.
  • Compose one or more campaign stages that use the login/setup template and job types previously defined.

Note: Both the Login/Setup and Job Type definitions can and should be reused.

Getting access: New Experiments, New Accounts, Users and Roles

To use the POMS interface, a user needs an account. Because this account carries a level of authority within the experiment, the experiment must approve its creation. To request an account, the user does the following (we assume the user already has a Fermilab Service account):
  • Open the SNOW web page and log in with the Service account.
  • Select the tile "Request Something".
  • Select the tile "Experiment/Project/Collaboration Computing Account".
  • Choose your experiment.
  • Choose the role "Production" or "Analysis". Users with the Analysis role can view production campaign data but cannot yet create their own.

It can take up to 24 hours after your request is approved for the account to appear in POMS.

When a new experiment wants to use POMS, a Service Desk ticket must be opened with the request. Go to the Service Desk website, Scientific Computing Services, select Scientific Production Processing and, under 'Get Help', select 'Submit a request to service providers'.

POMS uses user roles to control the operations allowed on its components. Users are also assigned roles outside POMS, through Ferry and VOMS. Depending on the role, a user can perform certain actions.

Three roles are provided: coordinator, production and analysis.

  • The highest role is coordinator. A user with this role can modify any job.
  • Those with production role are able to modify ALL production templates.
  • Those with the analysis role can only modify their own templates.

When logging in to POMS for the first time, the user has, by default, the lowest role allowed for the account/experiment.
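The role rules above can be summarized as a permission check. The following Python sketch models only the rules as stated here and is not POMS's implementation; the `Template` fields (`owner`, `is_production`) and the function names are hypothetical. It also assumes, as one reading of the rules, that production users can still modify their own non-production templates.

```python
from dataclasses import dataclass

# Roles ordered from lowest to highest privilege, as described above.
ROLES = ("analysis", "production", "coordinator")

@dataclass(frozen=True)
class Template:
    name: str
    owner: str           # hypothetical: user who created the template
    is_production: bool  # hypothetical: True for production templates

def can_modify(role: str, username: str, template: Template) -> bool:
    """Model of the documented role rules (not the real POMS logic)."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    if role == "coordinator":
        return True  # coordinators can modify anything
    if role == "production" and template.is_production:
        return True  # production users can modify ALL production templates
    return template.owner == username  # otherwise only your own templates

def default_role(allowed_roles):
    """On first login, pick the lowest role the account is allowed."""
    return min(allowed_roles, key=ROLES.index)
```

For example, `can_modify("analysis", "bob", Template("gen", "alice", True))` is `False`, while the same call with role `"coordinator"` is `True`.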

A "Practical Scenario"

Let's consider a plausible scenario for an experiment during the initial simulation phase:

We want to use production processing to simulate and reconstruct events in a portion of the detector to test reconstruction accuracy. Most likely we would use Monte Carlo programs to generate particle events in the detector and store the data in files, which are then used to reconstruct the events and, finally, to analyze the data.

From POMS's point of view, the overall process is a 'campaign' with, possibly, five stages:

Stage 1: Event Generation jobs.
Stage 2: Geant simulation jobs.
Stage 3: Detector Simulation jobs.
Stage 4: Reconstruction jobs.
Stage 5: Analysis jobs.

Basic information needs to be provided for each stage to achieve the goals of the whole campaign; it can be summarized as "who-where-how":
  • who
    • the account used to log in to the host (Unix account)
  • where
    • the host on which the login script will be launched
  • how
    • the script to use to configure the environment
    • the job types to be used for each stage
    • the scripts to be used when submitting the jobs and the parameters to be used
    • the dependency between stages
    • the criterion to determine when a stage is considered finished so we can move on to the next stage

This information is specified and grouped in three types of templates, "Login Template", "Job Types" and "Campaign Stages", which are first shown in the next section and then described in detail.
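The who-where-how information and the three templates can be pictured as a small data model. This Python sketch is an illustration only: the class and field names are hypothetical (they loosely mirror the form fields described in later sections), and the host, account and paths are placeholders. It shows how a single Login/Setup and a single Job Type can be shared by several stages, which is why they should be reused.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class LoginSetup:
    """Login/Setup template: who logs in, where, and how the environment is set."""
    name: str
    host: str     # where: node on which the login/setup script runs
    account: str  # who: Unix account used for the launch
    setup: str    # how: one-line, semicolon-separated shell setup

@dataclass(frozen=True)
class JobType:
    """Job Type: how jobs are launched and which outputs they produce."""
    name: str
    launch_script: str
    output_file_patterns: Tuple[str, ...]

@dataclass
class CampaignStage:
    """A stage combines a Login/Setup, a Job Type and stage-specific settings."""
    name: str
    login_setup: LoginSetup
    job_type: JobType
    depends_on: List[str] = field(default_factory=list)  # upstream stage names

@dataclass
class Campaign:
    name: str
    stages: List[CampaignStage] = field(default_factory=list)

# Placeholder values for illustration only.
login = LoginSetup("generic_fife_launch", "launch-node.example.gov",
                   "myexptpro", "source /path/to/setup_myexpt.sh")
job_type = JobType("generic_fife_process", "fife_launch -c /path/to/launch.cfg",
                   ("%.root",))

campaign = Campaign("eve_mc", [
    CampaignStage("eve_gen", login, job_type),
    CampaignStage("eve_g4", login, job_type, depends_on=["eve_gen"]),
])
```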

The "Big Picture"

Before going into the details of the components, let's examine the following "Big Picture" for a hypothetical campaign called eve_mc, which could represent the 'Practical Scenario' above.
This representation is produced with the Campaign Editor, a built-in POMS tool for creating and editing campaigns. Details on how to use it are available in the Campaign Editor User Guide.

First, here is a brief description of the elements in the campaign layout:

  • Login/Setup: This is the basic component that defines the host from which jobs are launched, the account used to log in to the host, and the environment POMS sets up for the jobs it launches. If you double-click the 'generic_fife_launch' login/setup element you will see the following:

  • Job Type: This defines the type of job used for processing. Strictly speaking, there is no 'field' that stores the job type; instead, the type results from the launch script and parameters used to accomplish the job's purpose. We suggest giving it a meaningful name that reflects its purpose, for example 'Myjob_MC' for a Monte Carlo job.
    If you double-click the 'generic_fife_process' job type element you will see the following:

  • Campaign Stage: A stage consists of a Login/Setup, a Job Type and a set of definitions and parameters used to run the jobs that accomplish one 'stage' of the whole campaign. Campaign stages can be connected by dependencies; for example, files produced by one stage can be used as input for the next stage.
    If you double-click a stage element you will see the following:

  • Campaign Stage Dependency: It defines how the stages depend on each other, typically by specifying the file pattern of the files produced by one stage and used as input by the next stage. If you double-click the arrow between two stages you will see the following:

  • Campaign Defaults: default values used when adding a new stage. If you double click on the arrow between two stages you will see the following:

Campaign stages can branch out, have further dependencies and eventually merge again. For example, building on the previous scenario, after generating the events the user may want to continue the simulation under two different detector configurations, with and without noise, and then compare the results.
The picture below shows this new scenario:
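Independently of the picture, the branching scenario is a directed acyclic graph of stages. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group stages into "waves" that could run once everything before them has finished. The stage names are hypothetical, and the sketch illustrates dependency ordering only, not how POMS itself schedules stages.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each stage maps to the stages it depends on.
dependencies = {
    "eve_gen": set(),
    "eve_g4": {"eve_gen"},
    "eve_detsim_noise": {"eve_g4"},
    "eve_detsim_nonoise": {"eve_g4"},
    "eve_reco_noise": {"eve_detsim_noise"},
    "eve_reco_nonoise": {"eve_detsim_nonoise"},
    "eve_compare": {"eve_reco_noise", "eve_reco_nonoise"},
}

def launch_waves(deps):
    """Group stages so each wave only depends on stages in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

for wave in launch_waves(dependencies):
    print(wave)
```

The two detector configurations end up in the same wave, and the comparison stage only becomes ready after both reconstruction branches are done.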

Now let's look at the components in more detail. The basic components can be created from the links in the Main Menu; Sample Campaigns are also available, which can be cloned and then customized with the Campaign Editor.

Drilling down

As already mentioned, defining a campaign involves defining a login/setup, a job type and one or more stages.
We strongly recommend using the Campaign Editor to define all the components; however, they can also be built from the links in the main menu.
At the time of writing, a new login/setup and a new job type can only be created from the main menu links, as shown in the following sections.

Compose a login/setup script

Campaign setup involves defining a login/setup: a collection of bash commands executed on a host chosen by the experiment's user to establish the environment from which POMS launches all jobsub jobs.

Since a user may belong to more than one experiment, first make sure you select the experiment and role from the pull-down menu in the top right corner.

Templates can be reused, cloned and modified for different campaigns, and they are available to other users as well, so you can use the same template for different processing campaigns.

To check whether a suitable template already exists, follow the 'Compose Login/setup' link in the side panel of the main page. To create a new template, click the 'Add' button.

login template example

Four fields need to be filled in:

  • Name: Define the name of your login template.
  • Host: Define the interactive node or machine from which the login/setup script will be launched.
  • Account: Define the user account used for the launch (e.g. novapro, minervapro, minervacal, uboonepro).
  • Setup: Define the environment variables, setups or scripts you would normally run on your own machine (e.g. setup_nova, launch script) if you were launching the jobs from the shell command line.
    This must be written as a single shell line with commands separated by semicolons, with no line breaks (<CR>); otherwise the script will break.
    Typically, FIFE_UTILS and JOBSUB are set up as part of this step.
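Because the Setup field must be a single line, it can help to keep your setup commands as a list and join them before pasting. This Python helper is a hypothetical convenience, not part of POMS, and the commands in the example are placeholders for your experiment's own setup.

```python
def to_setup_line(commands):
    """Join shell commands into one semicolon-separated line for the Setup field."""
    cleaned = []
    for cmd in commands:
        cmd = cmd.strip().rstrip(";").strip()  # drop surrounding space and trailing ';'
        if not cmd:
            continue  # skip blank entries
        if "\n" in cmd or "\r" in cmd:
            # An embedded line break would break the setup script.
            raise ValueError(f"command contains a line break: {cmd!r}")
        cleaned.append(cmd)
    return "; ".join(cleaned)

if __name__ == "__main__":
    # Placeholder commands; replace with your experiment's setup.
    print(to_setup_line([
        "source /path/to/experiment_setup.sh",
        "setup fife_utils",
        "setup jobsub_client",
    ]))
```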

Note: the Account above needs the following entry in its .k5login file:

poms/cd/pomsgpvm01.fnal.gov@FNAL.GOV
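Kerberos reads .k5login from the home directory of the Account to decide which principals may log in as that user. A small Python check like the following, a hypothetical helper that is not part of POMS, can confirm the POMS principal is listed before you try a launch.

```python
from pathlib import Path

# Principal taken from the note above.
POMS_PRINCIPAL = "poms/cd/pomsgpvm01.fnal.gov@FNAL.GOV"

def has_k5login_entry(path, principal=POMS_PRINCIPAL):
    """Return True if the .k5login file at `path` lists `principal` on its own line."""
    p = Path(path)
    if not p.exists():
        return False
    return any(line.strip() == principal for line in p.read_text().splitlines())

if __name__ == "__main__":
    # Run this as the launch account, e.g. novapro.
    print(has_k5login_entry(Path.home() / ".k5login"))
```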

Compose a job type

The next step is to compose the job type. A job 'type' categorizes a job by its purpose, for example Monte Carlo, Calibration or Reconstruction. Jobs of the same 'type' typically have the same or a similar set of parameters.
Since the Job Type defines the purpose of the processing done in a given stage, different campaign stages can use different job types.
Furthermore, stages are typically connected through the files produced by the previous stage, and before the next stage can start, the workflow needs to know that the previous stage has finished. This is also configured in the Job Type.

You can view, create or clone job types from the 'Compose Job Type' link in the main menu.

The page shows any existing job types; as with login templates, you can modify existing ones or click the ‘Add’ button to create a new one.

job type example

The following fields need to be filled in:

  • Name: A name that describes the kind of job campaign you are running (e.g. Nova raw2root keepup FD, rock_neutrinoMC, minerva_cal).
  • Output file patterns: The patterns of the output files relevant to your campaign (e.g. %.root).
  • Launch Script: The script you would run on your machine to submit jobs.
  • Definition Parameters: The arguments passed to your launch script (the one in the Launch Script field) at submission.
  • Recovery Launches: When jobs complete with errors you may want to re-launch the campaign stage; this field specifies the options used to re-submit jobs based on their failure. Examples of available options: added_files, consumed_status, pending files, proj_status

About the 'Output file patterns':
A user's jobs can read input files and produce output files (it is the responsibility of the user's job to declare the files to SAM for further data handling).
From the POMS perspective, when configuring the Job Type, the user can specify the patterns of the output files produced by the jobs; POMS then uses them to check the completion level of the campaign stage the jobs ran for.
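The '%' in patterns such as %.root looks like a SQL LIKE-style wildcard matching any sequence of characters. Assuming that interpretation (the single-character '_' wildcard is deliberately not handled), this Python sketch shows which filenames a set of output file patterns would select. It is an illustration, not the matching code POMS or SAM actually uses.

```python
import re

def pattern_to_regex(pattern):
    """Translate a '%'-wildcard pattern (assumed LIKE-style) into a compiled regex."""
    literal_parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(literal_parts))

def matching_outputs(filenames, patterns):
    """Return the filenames that fully match at least one output file pattern."""
    regexes = [pattern_to_regex(p) for p in patterns]
    return [name for name in filenames if any(r.fullmatch(name) for r in regexes)]

print(matching_outputs(["gen_001.root", "gen_001.log", "hist.root.bak"], ["%.root"]))
```

Only gen_001.root is selected: the pattern must match the whole name, so hist.root.bak is excluded.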

About the 'Launch Script':
The launch script is used to start the jobs, typically with fife_launch, for example:
fife_launch -c /sbnd/app/users/dbrailsf/poms/soft/srcs/sbndutil/cfg/poms/sbnd_launch.cfg
POMS strongly suggests using fife_launch, a config-file-based job launcher; fife_launch is a front end to jobsub_submit, part of the JOBSUB client library, which performs the actual job submission.
An example config file can be found by clicking the 'Config File Templates' link in the main menu.

Compose a campaign stage

This is the final preparation step, where you combine the information previously defined in the launch template and the Job Type with information specific to the campaign stage before running the campaign.
A campaign can have multiple stages; here you also specify how stages are connected by dependencies, which must be 'satisfied' before the next stage can start, and the criterion used to declare a stage done so that the next one can start.

You can view, create or clone campaign stages from the 'Compose Campaign Stages' link in the main menu.
Whichever action you choose, you are presented with a form that has 3 sections:

  1. general campaign information
  2. specify launch template to use
  3. specify job type to use.

stage example

General campaign section

In this section you will specify the following fields:

  • Name: the name of this stage; we suggest something meaningful for the purpose of the stage.
  • VO Role: the role jobsub uses when submitting jobs for this campaign. It can be "Production", "Analysis" or, in some cases, "Calibration" or others, provided they exist as roles in the experiment's VO.
  • State: Active or Inactive
  • Experiment Software Version: the software version. Experiment software components are typically bundled into a version used by the running jobs.
    The version is set in the metadata of the output files generated by this campaign.
    POMS assumes files have metadata listing their parentage and software application information; it can then use the software version, filename patterns and parentage to define datasets for the output of this campaign layer.
  • Dataset: Dataset this campaign stage will process. If this campaign is only ever run as a later stage in a workflow, this is ignored.
  • Dataset Split Type: specifies how the Dataset is split; please refer to further documentation for details.
  • Completion Type: the criterion used to decide when to move to the next stage. Two options are available:
    • Completed: the campaign layer submissions are complete when their jobs complete, or
    • Located: the layer is complete when the submissions' output files are located.
  • Completion: the percentage associated with the completion type. For example, if the Completion Type is 'Located' and the completion percentage is 75, the campaign moves on to the next stage when 75% of the stage's files are located in SAM.
  • Parameter Overrides: lets you override parameters of your Job Type's launch command. Clicking the edit icon opens a window where you can specify parameters as key-value pairs, which are concatenated and put on the command line. Keys that match those you previously assigned in the Job Type's Definition Parameters replace them.
    Note that the values in the Parameter Overrides undergo VariableSubstitution.
  • Test Parameter Overrides: These are used in the same fashion as the Parameter Overrides but only when Testing the campaign submission (see later).
  • Depends On: defines the dependencies of this campaign layer on other campaign layers. Clicking the edit icon opens a window where you can specify the campaign stage name and the file pattern used to define the dependency. Note that to add a circular dependency (i.e. to make this campaign stage automatically launch the next submission as each one completes), you must have saved the campaign stage at least once, so that it shows up in the list of campaign stages to choose from.
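The Completion and Parameter Overrides fields above can be modelled with a few small functions. This Python sketch reflects only the behaviour described in this section (a percentage threshold on finished jobs or located files; override keys replacing matching Definition Parameters; key-value pairs concatenated onto the command line). It is not POMS's code, it ignores VariableSubstitution, and the function names are hypothetical. It also assumes each key is joined directly to its value, so a key that needs a separator such as `=` carries it.

```python
import shlex

def stage_is_complete(completion_type, completion_pct, total, done):
    """Model of the Completion Type / Completion fields.

    For "completed", total/done count jobs; for "located", they count files
    located in SAM. The threshold check is the same in both cases.
    """
    if completion_type not in ("completed", "located"):
        raise ValueError(f"unknown completion type: {completion_type}")
    if not 0 <= completion_pct <= 100:
        raise ValueError("completion percentage must be between 0 and 100")
    if total == 0:
        return False  # assumption: nothing to count means not complete
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return 100 * done >= completion_pct * total

def merge_parameters(definition_params, overrides):
    """Overrides replace Definition Parameters with the same key; new keys are appended."""
    merged = dict(definition_params)  # dicts keep insertion order
    for key, value in overrides:
        merged[key] = value
    return merged

def build_command(launch_script, params):
    """Concatenate each key with its value and append it to the launch command."""
    return " ".join([launch_script] + [shlex.quote(k + v) for k, v in params.items()])

merged = merge_parameters([("--stage=", "gen"), ("--release=", "v1")],
                          [("--release=", "v2"), ("--nevents=", "100")])
print(build_command("fife_launch -c my.cfg", merged))
print(stage_is_complete("located", 75, 200, 150))
```

The keys `--stage=`, `--release=` and `--nevents=` and the file my.cfg are hypothetical; real keys depend on your launch script and its configuration.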

Launch template section

This section contains the information you entered when creating the launch template.
When you pick the template from the pull-down menu, the remaining fields of this section are automatically filled with the template information.

Job type section

This section contains the information you entered when creating the Job Type to be used for this stage.
When you pick the Job Type from the pull-down menu, the remaining fields of this section are automatically filled with the Job Type information.

Please be aware that if you edit an existing stage, change the information in any section, and then save the stage with the same name, the previous information will be overwritten.

As previously mentioned, you can add, edit or clone a campaign stage.

Let’s see what happens in each case.

Case Add: the form opens:

a) general section: enter the name for the campaign and other information.
b) launch template section: pick a template from the pull-down menu;
the remaining fields are automatically filled with the template information.

c) job type section: pick a job type from the pull-down menu;
the remaining fields are automatically filled with the job type information.

Case Edit: select the edit icon for an existing campaign; the form opens:
a), b) and c) are prefilled with the existing information. You can then make changes as needed.

Case Clone: select the clone icon for an existing campaign; the form opens:

a), b) and c) are prefilled with the existing information.
The name is given the prefix ‘CLONE OF:’ (as in ‘CLONE OF:name’), so you can change it as desired.

Bringing it all together

Using Compose Campaign Stages from the main menu means filling in the form for each stage to build the whole campaign.
This is where the Campaign Editor is very useful: you can create all the stages and define how they depend on each other in one place.
Let's see how to build the campaign eve_mc from the initial example with the Campaign Editor.
Let's assume the Login/setup and the job type are already defined, and that one stage has been created, so we start from the following picture:

As the picture below shows, right-clicking the existing stage opens a pop-up window where you can choose 'Add Node', the editor's generic term for, in our case, 'Add stage'. Replace 'undefined' with the stage name you want (in our case eve_g4), click OK, and you end up with what is shown in the picture on the right.

The second stage has been created with the default values; you can then open its form and change the fields accordingly.
The arrow connecting the two stages represents the dependency on the first stage. Double-clicking the arrow shows the type of dependency: in this case, stage eve_g4 uses files with pattern 'root' created by eve_gen.
Adding further stages in the same way generates the whole campaign.

Running the Campaign, actions and monitoring.

Now that the campaign has been defined, we can launch jobs: in the main menu, under the 'Campaign Data' section, click 'Campaigns'. This shows all the existing campaigns.
Click the campaign you are interested in, and you will be redirected to a page listing the stages of that campaign.
At the time of writing, to start a campaign from the VERY beginning you need to start it from the first stage. POMS also allows starting any stage individually: this is useful when, for example,
a campaign 'breaks' in the middle of a certain stage, so you can restart from the broken stage instead of relaunching from scratch.
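Restarting from a broken stage means that, as described for launching from the first stage, the stages depending on it will follow once it completes. The following Python sketch, assuming the five-stage eve_mc layout from the 'Practical Scenario' with hypothetical stage names, lists the restart stage and everything downstream of it. It only illustrates the dependency logic and is not how POMS schedules stages.

```python
from collections import deque

# Hypothetical names for the five stages of the eve_mc scenario;
# each stage maps to the stages it depends on.
DEPENDENCIES = {
    "eve_gen": [],
    "eve_g4": ["eve_gen"],
    "eve_detsim": ["eve_g4"],
    "eve_reco": ["eve_detsim"],
    "eve_ana": ["eve_reco"],
}

def stages_to_rerun(dependencies, start):
    """Return `start` and every stage downstream of it, in breadth-first order."""
    if start not in dependencies:
        raise KeyError(f"unknown stage: {start}")
    # Invert "stage -> upstream stages" into "stage -> downstream stages".
    downstream = {name: [] for name in dependencies}
    for stage, upstream in dependencies.items():
        for up in upstream:
            downstream[up].append(stage)
    seen, order, queue = {start}, [start], deque([start])
    while queue:
        current = queue.popleft()
        for nxt in sorted(downstream[current]):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

print(stages_to_rerun(DEPENDENCIES, "eve_detsim"))
```

Restarting at eve_detsim leaves eve_gen and eve_g4 untouched, which is the saving compared with relaunching from scratch.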
If you click the first stage of the campaign, you will be redirected to a page with all the information about the stage, which looks like the following:

campaign stage info page

The page has several sections organized by the type of information they present:

  • Job status for the campaign stage
  • Reports
  • Actions
  • Campaign Stage general information
  • Job Type information
  • Login/Setup Information
  • Diagram of the stage's immediate dependencies.

Launching the campaign:

If you are sure you want to start the campaign for production, click 'Launch Campaign Jobs Now';
if you prefer to test first, select 'Launch Campaign Test Jobs Now'.

1) Testing: click 'Launch Campaign Test Jobs Now':
you are taken automatically to the log page showing all the commands executed.

2) Running: click 'Launch Campaign Jobs Now':
you are taken automatically to the log page showing all the commands executed.

The user can check the status of the jobs by choosing among the options in the 'Reports/Status' section.

Two types of reports are available:

  • Produced by Landscape/Fifemon
  • Internal reports like 'Submission Time Bars' and 'Campaign stage Submissions Files'

POMS: Navigation Overview

Let's look at a general description of POMS from the navigation perspective.
Logging in to POMS takes you to the Home Page:

In the top right corner, two selector fields show the experiment and the role based on your account; if you belong to multiple experiments, you can select the one you want.

The Main Menu on the left panel is organized in various sections which allow the user to view, configure and monitor the work:

  • External Links:
    • Logbook: Link to the Electronic Collaboration Logbook for the experiment if available.
    • POMS SNOW Page: Link to POMS Service Desk Page (Service information)
    • Downtime Calendar: Link to the Scientific Services Outage Calendar
  • Administration:
    This is shown only to POMS admins.
  • Campaign Data:
    • Campaigns: link to the list of existing campaigns for the experiment.
    • Campaign Stages: link to the list of existing stages.
    • Sample Campaigns: link to the list of existing samples to be used as templates when creating a new campaign.
  • Configure works:
    • Compose Login/setup: link to the list of existing login templates and possible actions.
    • Compose Job Type: link to the list of existing job types and possible actions.
    • Compose Campaign Stages: link to the list of existing stages and possible actions.
    • Config File Templates: link to some useful templates that can be used when launching jobs.
  • Jobs:
    These links direct to Landscape plots to monitor campaigns and jobs status.

One of the most useful sections is 'Campaign Data'. Under 'Campaigns' you can view all the campaigns and select various actions on them.
In the following pictures, a filter on the campaign name has been applied to narrow down the list.

You can configure what is displayed on the page from the group of check boxes. By default, you will see Active campaigns that belong to you and others with Production role.

Several actions can be performed on the campaigns; a few are worth mentioning:

  • Add a campaign
  • Cloning the campaign
  • GUI editor

All the above actions will direct you to the use of the Campaign Editor.

If, for example, you choose Add a campaign, you are prompted for a name (in this case eve_calib) and then presented with the following page:

A basic campaign skeleton is pre-built with default values: one generic stage that uses the generic login/setup and job type. This example is used in the Campaign Editor documentation to show how to use the tool.

Under 'Campaign Stages' you can view all the stages available.

Campaigns

This section looks in a little more detail at the actions you can perform from the Campaigns page.
We will do the following:

  • Clone an existing campaign.
  • Launch the campaign.
  • Pause the campaign.
  • Resume the campaign.
  • View results.
  • Re-launch and kill it.

Clone an existing campaign

For this purpose, go to the Campaigns page and clone the campaign fake_demo1, which has 3 stages with a different job type for each stage.
You will be prompted for a new name (in our case eve_demo) and then redirected to the Campaign Editor page.

The following picture illustrates the original campaign and the cloned one as they appear in the campaign editor.

As you can see, the new campaign has the newly assigned name, and all the stages have the same names as the original, BUT once saved they become private to the new campaign.
In the bottom section, the Login/setup and job types are NOT cloned. This is intentional, since multiple campaigns can use the same job types and users are encouraged to reuse them. However, if you need to change a field in the job type, you must give it a new name so that a new job type is created for your campaign and nothing is overwritten in the original.
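The cloning behaviour can be illustrated in Python with frozen dataclasses: the cloned stages are new objects, while their job types are the very same objects used by the original campaign, so changing a job type for the new campaign means creating one under a new name. The class names, the original stage and job type names, and the paths are hypothetical; only fake_demo1 and eve_demo come from the example above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class JobType:
    name: str
    launch_script: str

@dataclass(frozen=True)
class Stage:
    name: str
    job_type: JobType

def clone_stages(stages, rename=lambda name: name):
    """Create new stage objects that still reference the original job types."""
    return [replace(stage, name=rename(stage.name)) for stage in stages]

# Hypothetical fake_demo1: three stages, a different job type for each.
steps = ("gen", "sim", "reco")
types = {s: JobType(f"demo_{s}_type", f"fife_launch -c /path/to/{s}.cfg") for s in steps}
original = [Stage(f"demo_{s}", types[s]) for s in steps]

# Clone for eve_demo and rename the stages to suit the new campaign.
cloned = clone_stages(original, lambda name: name.replace("demo_", "eve_demo_", 1))

# To change a job type only for eve_demo, create it under a new name.
private_type = replace(types["gen"], name="eve_demo_gen_type",
                       launch_script="fife_launch -c /path/to/eve_gen.cfg")
cloned[0] = replace(cloned[0], job_type=private_type)
```

After this, the original gen stage still points to demo_gen_type, so nothing in fake_demo1 is overwritten.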

The following picture shows how to change the name of the cloned stage to be more appropriate to the campaign:

The following picture shows the final campaign after changing the stage names.

Launch the campaign

Now that the campaign has been created, to launch it go to the first stage page, in our case eve_demo_gen_v1.
As previously stated, each stage has its own page and can be launched individually, which is useful in case of failure; however, since the stages depend on each other, if no problems occur, launching the first stage executes the whole campaign.
To monitor the jobs, go to 'All Jobs' in Landscape to see the list of running jobs. The following two pictures illustrate this:

In the following fifebatch monitoring page you can see jobs for the first two stages, gen and sim.

The batch job status is also summarized in the plot on the stage page:

If the stage had issues, for example not all jobs succeeded, the batch job status would appear as in the following picture:

Glossary