Pubs and project.py

Production.py

The pubs script that interacts with project.py is called production.py, and is located in subdirectory dstream_prod of the pubs product.

Pubs status codes

Production.py uses a large number of status codes to manage multistage projects, which are executed by project.py on offline computing resources.
Pubs status codes are two-digit numbers, xy, where the first digit x represents the stage and the second digit y represents the processing status within that stage. This convention implies a maximum of ten stages and ten states per stage. So far, we haven't exceeded these limits.

Stage status codes

Value  Symbolic name   Description
0      kDONE           Stage is complete
1      kINITIATED      Stage is started (ready to submit batch job)
2      kTOBEVALIDATED  Not used
3      kSUBMITTED      Batch job has been submitted
4      kRUNNING        Batch job is running
5      kFINISHED       Batch job completed (ready to check output)
6      kTOBERECOVERED  Check failed (ready to submit recovery job)
7      kREADYFORSAM    Check succeeded (ready to declare to sam)
8      kDECLARED       Declared to sam (ready to upload to enstore)
9      kSTORED         Successfully copied to FTS dropbox (location not verified)

The above values and symbolic names are hard-coded in production.py.
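As an illustration of the two-digit convention and the table above, the following minimal sketch splits a pubs status into its stage digit and stage status. The class name StageStatus and the function split_pubs_status are hypothetical and are not taken from production.py; only the numeric values and symbolic names come from the table.

```python
from enum import IntEnum


class StageStatus(IntEnum):
    """Stage status digits, copied from the table (class name is hypothetical)."""
    kDONE = 0
    kINITIATED = 1
    kTOBEVALIDATED = 2
    kSUBMITTED = 3
    kRUNNING = 4
    kFINISHED = 5
    kTOBERECOVERED = 6
    kREADYFORSAM = 7
    kDECLARED = 8
    kSTORED = 9


def split_pubs_status(code):
    """Split a regular two-digit pubs status xy into (x, StageStatus(y)).

    Hypothetical helper: only codes 0..99 follow the two-digit convention,
    so anything else is rejected here.
    """
    if not 0 <= code <= 99:
        raise ValueError("not a regular two-digit pubs status: %r" % code)
    stage_digit, status_digit = divmod(code, 10)
    return stage_digit, StageStatus(status_digit)
```

For example, split_pubs_status(34) gives stage digit 3 and kRUNNING.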

In normal production, stage statuses advance in sequence: 1, 3, 4, 5, 7, 8, 10, 11 (next stage), ....

MC chain stages

Value  Name      Description
0      gen       Generator
10     g4        Geant4
20     detsim    Detector simulation
30     reco1     Stage 1 reconstruction
40     reco2     Stage 2 reconstruction
50     mergeana  Merge + analysis

Stage names and values are defined as resources in the project configuration.
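To make the arithmetic concrete, here is a small sketch that combines a stage value from the table above with a stage status digit to form a pubs status, and recovers the stage name from a pubs status. The dictionary MC_STAGE_VALUES and both function names are assumptions made for this example; in a real project the stage values come from the project configuration.

```python
# Stage values copied from the MC chain table (dictionary name is hypothetical).
MC_STAGE_VALUES = {
    "gen": 0,
    "g4": 10,
    "detsim": 20,
    "reco1": 30,
    "reco2": 40,
    "mergeana": 50,
}


def pubs_status(stage_name, status_digit):
    """Return stage value + status digit, e.g. reco1 + 3 -> 33."""
    if not 0 <= status_digit <= 9:
        raise ValueError("status digit must be in 0..9")
    return MC_STAGE_VALUES[stage_name] + status_digit


def stage_name_of(code):
    """Return the stage name whose value matches the tens digit of code."""
    stage_value = code - code % 10
    for name, value in MC_STAGE_VALUES.items():
        if value == stage_value:
            return name
    raise KeyError("no stage with value %d" % stage_value)
```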

Special pubs statuses

There are some special pubs statuses that do not conform to the above two-digit convention.

Value  Name      Description
100    Dead end  No further processing needed for this (run, subrun, seq, version). Not an error.
>1000  Error     Too many resubmissions for this (run, subrun, seq, version).

Production.py logic

On each invocation of production.py for a given project, the main function (process) has four nested loops over the following indices:

  1. Status within stage (high to low).
  2. Stage (high to low).
  3. Run.
  4. Subrun.

Because these loops cover only the regular two-digit statuses, the special statuses (dead end and error) are never reached, and so no further processing happens for them. For each regular status, production.py queries the pubs database api for the (run, subrun) pairs that are at this status, loops over the resulting runs and subruns, and performs the action appropriate to the status. The action will often involve an invocation of project.py via the python interface (import project). The invocation is done in such a way that, were it done through the command line interface, it would be equivalent to invoking project.py with command line option "--pubs <run> <subrun> <version>".
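The loop structure described above can be sketched as follows. This is not the code of production.py: the function name process_sketch and the callables query and act are hypothetical stand-ins for the pubs database api and for the per-status action, and the sketch assumes that the list of indices is ordered from the outermost loop to the innermost.

```python
from collections import defaultdict


def process_sketch(stage_values, query, act):
    """Visit every regular pubs status and act on each (run, subrun) found.

    stage_values: iterable of stage values (0, 10, 20, ...).
    query(status): hypothetical callable returning (run, subrun) pairs.
    act(status, run, subrun): hypothetical callable performing the action.
    """
    for status_digit in range(9, -1, -1):                    # status, high to low
        for stage_value in sorted(stage_values, reverse=True):  # stage, high to low
            status = stage_value + status_digit
            # Group the query result so runs and subruns can be looped separately.
            subruns_by_run = defaultdict(list)
            for run, subrun in query(status):
                subruns_by_run[run].append(subrun)
            for run in sorted(subruns_by_run):
                for subrun in sorted(subruns_by_run[run]):
                    act(status, run, subrun)
```

Since the largest status formed this way is the highest stage value plus 9, statuses such as 100 or anything above 1000 are never queried in this sketch.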

A recently added capability of production.py and project.py is the ability to process multiple subruns in a single "action" (e.g. a single invocation of project.py). Currently, the only action that has a working multiple-subrun interface is job submission. The multiple-subrun job submission interface works for both file list and sam dataset input.

Project.py pubs mode

When project.py is invoked with option "--pubs <run> <subrun> <version>" (or the equivalent via the python interface), it is said to be running in pubs mode. Compared to a standalone project.py project, pubs mode changes both where project.py gets input files and where it stores output files.

Multiple subruns can be specified on the command line as a list of subruns or subrun ranges, separated by commas and hyphens, with no embedded spaces (for example, 1,3-5,8).
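A subrun specification of this form could be expanded as in the sketch below. This is not the parser used by project.py; the function name parse_subruns is hypothetical, and the sketch assumes that a range such as 3-5 includes both endpoints.

```python
def parse_subruns(spec):
    """Expand a spec such as "1,3-5,8" into a list of subrun numbers.

    Assumptions of this sketch: ranges are inclusive, and embedded
    spaces or reversed ranges are rejected with ValueError.
    """
    if " " in spec:
        raise ValueError("embedded spaces are not allowed: %r" % spec)
    subruns = []
    for part in spec.split(","):
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if high < low:
                raise ValueError("reversed range: %r" % part)
            subruns.extend(range(low, high + 1))
        else:
            subruns.append(int(part))
    return subruns
```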

Effect of pubs mode on output files

Pubs mode affects where project.py stores output files by adding subdirectories <version>/<run>/<subrun> to the output directory, log directory, and work directory specified in the project xml file.
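The resulting directory layout can be illustrated with a short sketch. The function name pubs_subdir and the example paths are hypothetical, and the sketch assumes the version, run, and subrun are written as plain numbers without zero padding, which the text above does not specify.

```python
import posixpath


def pubs_subdir(base_dir, version, run, subrun):
    """Append <version>/<run>/<subrun> to a directory from the project xml file.

    Hypothetical helper; plain str() formatting of the numbers is an assumption.
    """
    return posixpath.join(base_dir, str(version), str(run), str(subrun))


# The same suffix is applied to the output, log, and work directories.
for base in ("/path/to/out", "/path/to/log", "/path/to/work"):
    print(pubs_subdir(base, 1, 1234, 56))
```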

Effect of pubs mode on input files

Pubs mode affects where project.py expects to find input files, depending on how input is specified in the xml file.

  • If a stage has no input (e.g. generator stage), pubs mode has no effect on input.
  • In case input is being daisy-chained from disk from a previous stage, pubs mode causes project.py to construct a reduced input file list, assuming the previous stage has a pubs directory structure.
  • In case input is from sam, pubs mode will cause project.py to generate a new sam dataset definition with additional run and subrun constraints.
  • Pubs input mode is not supported, and does not make sense, for single-file input.
  • Pubs input mode can be defeated by specifying "<pubsinput>0</pubsinput>" in the xml file for a stage. Then project.py will get input files from wherever it would have got them if it were not using pubs mode, e.g. a file list or a sam dataset.
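As a sketch of how the pubsinput switch might be read from a stage, the example below parses an illustrative xml fragment with the standard library. The function name pubs_input_enabled and the surrounding stage element are assumptions made for this example, as is the default of enabled when the element is absent; only the element name pubsinput and the value 0 come from the text above.

```python
import xml.etree.ElementTree as ET


def pubs_input_enabled(stage_element):
    """Return False if the stage contains <pubsinput>0</pubsinput>.

    Assumption of this sketch: a missing or empty element means pubs
    input mode stays enabled.
    """
    text = stage_element.findtext("pubsinput", default="1") or "1"
    return int(text.strip()) != 0


# Illustrative fragment; the real project xml file has more content.
stage = ET.fromstring('<stage name="reco1"><pubsinput>0</pubsinput></stage>')
print(pubs_input_enabled(stage))
```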