Pubs and project.py
The pubs script that interacts with project.py is called production.py. It is located in subdirectory dstream_prod of the pubs product.
Pubs status codes
Production.py uses a large number of status codes to manage multistage projects, which project.py executes on offline computing resources.
Pubs status codes are two-digit numbers, xy. The first digit, x, represents the stage, and the second digit, y, represents the processing status within that stage. This convention allows at most ten stages, each with at most ten statuses. So far, these limits have not been exceeded.
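Under the xy convention, the stage digit and the within-stage digit are just the quotient and remainder of division by 10. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical and are not taken from production.py.

```python
from __future__ import annotations


def split_status(status: int) -> tuple[int, int]:
    """Split a two-digit pubs status xy into (stage digit x, within-stage digit y)."""
    if not 0 <= status <= 99:
        raise ValueError(f"not a two-digit pubs status: {status}")
    return divmod(status, 10)


def join_status(stage: int, state: int) -> int:
    """Combine a stage digit and a within-stage digit into a status xy."""
    if not (0 <= stage <= 9 and 0 <= state <= 9):
        raise ValueError("stage and state must each be a single digit")
    return 10 * stage + state


print(split_status(35))   # (3, 5)
print(join_status(3, 5))  # 35
```

The range checks make the "ten stages, ten statuses" limit explicit. The special statuses described later (100 and above 1000) fall outside it.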
Stage status codes
| Code (y) | Symbolic name | Meaning |
| 0 | kDONE | Stage is complete |
| 1 | kINITIATED | Stage is started (ready to submit batch job) |
| 3 | kSUBMITTED | Batch job has been submitted |
| 4 | kRUNNING | Batch job is running |
| 5 | kFINISHED | Batch job completed (ready to check output) |
| 6 | kTOBERECOVERED | Check failed (ready to submit recovery job) |
| 7 | kREADYFORSAM | Check succeeded (ready to declare to sam) |
| 8 | kDECLARED | Declared to sam (ready to upload to enstore) |
| 9 | kSTORED | Successfully copied to FTS dropbox (location not verified) |
The above values and symbolic names are hard coded in production.py.
In normal production, stage statuses advance in sequence: 1, 3, 4, 5, 7, 8, 10, 11 (next stage), ....
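For readability, the table can be mirrored as a Python `IntEnum`, which turns a raw status into a readable label. This is a hypothetical sketch. production.py hard codes these values in its own way, and the decoding below simply applies the xy convention. The code assumes nothing about value 2, which has no entry in the table.

```python
from enum import IntEnum


class StageStatus(IntEnum):
    """Within-stage status codes, copied from the table above (hypothetical mirror)."""
    kDONE = 0
    kINITIATED = 1
    kSUBMITTED = 3
    kRUNNING = 4
    kFINISHED = 5
    kTOBERECOVERED = 6
    kREADYFORSAM = 7
    kDECLARED = 8
    kSTORED = 9


def describe(status: int) -> str:
    """Render a two-digit pubs status as 'stage <x>: <symbolic name>'."""
    stage, state = divmod(status, 10)
    try:
        name = StageStatus(state).name
    except ValueError:  # e.g. 2, which the table does not list
        name = f"unknown({state})"
    return f"stage {stage}: {name}"


print(describe(35))  # stage 3: kFINISHED
print(describe(41))  # stage 4: kINITIATED
```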
MC chain stages
| Stage value | Stage name | Description |
| 30 | reco1 | Stage 1 reconstruction |
| 40 | reco2 | Stage 2 reconstruction |
| 50 | mergeana | Merge + analysis |
Stage names and values are defined as resources in the project configuration.
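The sketch below assumes that a stage's statuses are its stage value plus the within-stage code, so reco1 statuses would run from 30 to 39. This is consistent with the xy convention. The dictionary stands in for the project configuration, and the helper names are hypothetical. How production.py actually reads these resources is not shown here.

```python
# Stand-in for the stage resources in the project configuration (MC chain table above).
STAGES = {"reco1": 30, "reco2": 40, "mergeana": 50}
STAGE_BY_VALUE = {value: name for name, value in STAGES.items()}


def status_for(stage_name: str, code: int) -> int:
    """Full status for a stage and a within-stage code (assumes value + code)."""
    return STAGES[stage_name] + code


def stage_of(status: int):
    """Stage name owning a full status, or None if no configured stage matches."""
    return STAGE_BY_VALUE.get(status - status % 10)


print(status_for("reco2", 1))  # 41
print(stage_of(57))            # mergeana
```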
Special pubs statuses
There are some special pubs statuses that do not conform to the above two-digit convention.
| Status | Name | Meaning |
| 100 | Dead end | No further processing needed for this (run, subrun, seq, version). Not an error. |
| >1000 | Error | Too many resubmissions for this (run, subrun, seq, version). |
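The special statuses lie outside the two-digit range used by regular stage statuses, so a simple range check tells them apart. This is a hypothetical helper for illustration, not code from production.py. The "unknown" branch covers values that neither the two-digit convention nor this table describes.

```python
DEAD_END = 100


def classify(status: int) -> str:
    """Classify a pubs status as regular, dead end, error, or unknown."""
    if 0 <= status <= 99:
        return "regular"
    if status == DEAD_END:
        return "dead end"
    if status > 1000:
        return "error"
    return "unknown"


print(classify(35))    # regular
print(classify(100))   # dead end
print(classify(1035))  # error
```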
On each invocation of production.py for a given project, the main function (process) runs four nested loops. The outer two loops are over the following indices (the inner two, over runs and subruns, are described below):
- Status within stage (high to low).
- Stage (high to low).
Because these loops cover only the two-digit statuses, the special statuses (dead end and error) are never visited, so no further processing happens for them. For each regular status, production.py uses the pubs database API to query which (run, subrun) pairs are at that status. It then loops over the resulting runs and subruns and performs the action appropriate to that status. The action often involves invoking project.py through its python interface (import project). The invocation is set up so that, if it were done through the command line interface, it would be equivalent to invoking project.py with the command line option "--pubs <run> <subrun> <version>".
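The loop structure described above can be sketched in Python with a toy in-memory database. Everything here is hypothetical:

- `FakePubsDB` stands in for the pubs database API and keys only on (run, subrun), not seq and version.
- `toy_action` stands in for the real per-status actions.
- The sketch assumes the within-stage loop is the outermost one, as the list above suggests, and that a full status is the stage value plus the within-stage code.

```python
class FakePubsDB:
    """Hypothetical stand-in for the pubs database API: (run, subrun) -> status."""

    def __init__(self, statuses):
        self.statuses = dict(statuses)

    def pairs_at(self, status):
        """Return the (run, subrun) pairs currently at the given status."""
        return sorted(key for key, s in self.statuses.items() if s == status)

    def set_status(self, run, subrun, status):
        self.statuses[(run, subrun)] = status


def process(db, stage_values, action):
    """Status within stage (high to low), stage (high to low), then run, then subrun."""
    for state in range(9, -1, -1):
        for stage in sorted(stage_values, reverse=True):
            status = stage + state
            pairs = db.pairs_at(status)  # snapshot taken before any update
            for run in sorted({r for r, _ in pairs}):
                for subrun in sorted(s for r, s in pairs if r == run):
                    db.set_status(run, subrun, action(status, run, subrun))


# Toy action: advance through part of the normal sequence (1 -> 3 -> 4 -> 5 -> 7).
NEXT_STATE = {1: 3, 3: 4, 4: 5, 5: 7}


def toy_action(status, run, subrun):
    stage, state = divmod(status, 10)
    return 10 * stage + NEXT_STATE.get(state, state)


db = FakePubsDB({(1000, 1): 31, (1000, 2): 33, (1001, 1): 41})
process(db, [30, 40, 50], toy_action)
print(db.statuses)  # {(1000, 1): 33, (1000, 2): 34, (1001, 1): 43}
```

Because higher statuses are visited first, an entry advanced during a pass is not picked up again in the same pass. In this sketch, each (run, subrun) moves at most one step per invocation.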
A recently added capability of project.py is the ability to process multiple subruns in a single "action" (i.e. a single invocation of project.py). Currently, job submission is the only action with a working multiple-subrun interface. It works with both file list and sam dataset input.
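The "--pubs <run> <subrun> <version>" form names a single run, so batching subruns into one submission presumably means grouping the (run, subrun) pairs found at a status by run first. The helper below is a hypothetical sketch of that grouping. It is not taken from production.py.

```python
from collections import defaultdict


def group_by_run(pairs):
    """Collect (run, subrun) pairs into {run: sorted list of subruns}."""
    grouped = defaultdict(set)
    for run, subrun in pairs:
        grouped[run].add(subrun)
    return {run: sorted(subruns) for run, subruns in sorted(grouped.items())}


print(group_by_run([(5000, 3), (5000, 1), (5001, 7), (5000, 2)]))
# {5000: [1, 2, 3], 5001: [7]}
```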
project.py pubs mode
When project.py is invoked with option "--pubs <run> <subrun> <version>" (or the equivalent via the python interface), project.py is said to be running in pubs mode. Compared with a standalone project.py project, pubs mode changes both where project.py gets its input files and where it stores its output files.
On the command line, multiple subruns can be given as a list of subruns or subrun ranges. Commas separate the items, hyphens denote ranges, and the list must contain no embedded spaces.
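The subrun list syntax can be illustrated with a small parser and its inverse. This is a sketch of the syntax as described, not project.py's own argument handling. For example, it rejects reversed ranges, and project.py may treat those differently. The list "1,3,5-8" is only an illustrative value.

```python
from __future__ import annotations


def parse_subruns(spec: str) -> list[int]:
    """Expand a comma/hyphen subrun list such as '1,3,5-8' into sorted subruns."""
    if not spec or any(ch.isspace() for ch in spec):
        raise ValueError(f"bad subrun list: {spec!r}")
    subruns = set()
    for item in spec.split(","):
        if "-" in item:
            first, last = (int(x) for x in item.split("-"))
            if first > last:
                raise ValueError(f"bad subrun range: {item!r}")
            subruns.update(range(first, last + 1))
        else:
            subruns.add(int(item))
    return sorted(subruns)


def format_subruns(subruns) -> str:
    """Compress subruns back into the comma/hyphen form."""
    ordered = sorted(set(subruns))
    if not ordered:
        raise ValueError("no subruns given")
    parts = []
    start = prev = ordered[0]
    for sub in ordered[1:]:
        if sub != prev + 1:  # gap: close the current range
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = sub
        prev = sub
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


print(parse_subruns("1,3,5-8"))          # [1, 3, 5, 6, 7, 8]
print(format_subruns([8, 1, 5, 3, 6, 7]))  # 1,3,5-8
```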
Effect of pubs mode on output files
Pubs mode affects where project.py stores output files by appending subdirectories <version>/<run>/<subrun> to the output directory, log directory, and work directory specified in the project xml file.
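The directory layout above amounts to a path join. In the sketch below, the base directories are hypothetical placeholders for the values in the project xml file. The code assumes the run, subrun, and version appear as plain decimal numbers without zero padding, which this page does not confirm.

```python
import posixpath


def pubs_dirs(dirs, version, run, subrun):
    """Append <version>/<run>/<subrun> to each configured directory."""
    suffix = posixpath.join(str(version), str(run), str(subrun))
    return {key: posixpath.join(path, suffix) for key, path in dirs.items()}


# Hypothetical directories standing in for the xml settings.
dirs = {"outdir": "/data/myproj/out", "logdir": "/data/myproj/log", "workdir": "/data/myproj/work"}
print(pubs_dirs(dirs, 1, 5000, 3)["outdir"])  # /data/myproj/out/1/5000/3
```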
Effect of pubs mode on input files
Pubs mode affects where project.py expects to find input files, depending on how input is specified in the xml file.
- If a stage has no input (e.g. generator stage), pubs mode has no effect on input.
- If input is daisy-chained from disk from a previous stage, pubs mode causes project.py to construct a reduced input file list, assuming the previous stage used the pubs directory structure.
- If input is from sam, pubs mode causes project.py to generate a new sam dataset definition with additional run and subrun constraints.
- Pubs input mode is not supported, and does not make sense, for single-file input.
- Pubs input mode can be disabled by specifying "<pubsinput>0</pubsinput>" in the xml file for a stage. project.py then gets input files from wherever it would have got them outside pubs mode, e.g. a file list or a sam dataset.
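The input rules in the list above can be summarized in a short decision function. A companion helper shows one way a reduced file list could be derived from a previous stage laid out in the pubs directory structure. Both are hypothetical sketches:

- The `input_kind` labels are invented for illustration.
- The code does not show how project.py actually builds the reduced list or the new sam dataset definition.
- The code assumes that setting `<pubsinput>0</pubsinput>` also allows single-file input to proceed normally.

```python
import posixpath


def pubs_input_mode(has_input, input_kind=None, pubsinput=True):
    """Summarize how pubs mode treats a stage's input.

    input_kind is a hypothetical label: "previous_stage", "sam", or "single_file".
    """
    if not has_input:
        return "no input: pubs mode has no effect"
    if not pubsinput:
        return "pubs input disabled: use the configured file list or dataset"
    if input_kind == "previous_stage":
        return "reduced file list from the previous stage's pubs directories"
    if input_kind == "sam":
        return "new sam definition with run and subrun constraints"
    raise ValueError(f"pubs input mode not supported for {input_kind!r} input")


def reduce_file_list(paths, version, run, subruns):
    """Keep files whose parent directory ends in <version>/<run>/<subrun>."""
    wanted = {(str(version), str(run), str(sub)) for sub in subruns}
    return [p for p in paths
            if tuple(posixpath.dirname(p).split("/")[-3:]) in wanted]


files = [
    "/data/reco1/out/1/5000/3/a.root",
    "/data/reco1/out/1/5000/4/b.root",
    "/data/reco1/out/1/5001/3/c.root",
]
print(reduce_file_list(files, 1, 5000, [3]))  # ['/data/reco1/out/1/5000/3/a.root']
```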