User guide » History » Version 48
Herbert Greenlee, 01/12/2015 05:09 PM
- Table of contents
- Overview
- Using project.py
- Project File Structure
Overview
Larsoft common batch and workflow tools are contained in the ups product larbatch (this redmine), which is built and distributed as part of larsoft. Larbatch tools are built on top of the Fermilab jobsub_client batch submission tools. For general information about jobsub_client and the Fermilab batch system, refer to the articles on the jobsub wiki and the fife wiki.
No other part of larsoft depends on larbatch, and larbatch is not set up as a dependent of the larsoft umbrella ups product. Rather, larbatch is intended to be a dependent of experiment-specific ups products (see this article for instructions on configuring larbatch for a specific experiment).
After setting up the ups product larbatch, several executable scripts and python modules are available on the execution path and python path. Here is a list of the more important ones.
- project.py
An executable python script that is the main entry point for user interaction. More information can be found below.
- project_utilities.py
A python module, imported by project.py, that implements some of the workflow functionality. End users would not normally interact directly with this module. However, a significant aspect of project_utilities.py is that it supplies hooks for providing experiment-specific implementations of some functionality, as described in an accompanying article on this wiki.
- condor_lar.sh
The main batch script. Condor_lar.sh is a general purpose script that manages a single invocation of an art framework program (lar executable). Condor_lar.sh sets up the run-time environment, fetches input data, interacts with sam, and copies output data. It is not intended that end users will directly invoke condor_lar.sh. However, one can get a general idea of the features and capabilities of condor_lar.sh by viewing the built-in documentation (type "condor_lar.sh -h") or by reading the file header.
- condor_start_project.sh
Batch script for starting a sam project.
- condor_stop_project.sh
Batch script for stopping a sam project.
Using project.py
Project.py is used in conjunction with an xml format project definition file (see below). The concept of a project, as understood by project.py, and as defined by the project definition file, is a multistage linear processing chain involving a specified number of batch workers at each stage.
Use cases
In a typical invocation of project.py, one specifies the project file (via option --xml), the stage name (via option --stage), and one or more action options. Here are some use cases for invoking project.py.
project.py -h or project.py --help
Print built-in help (lists all available command line options).
project.py -xh or project.py --xmlhelp
Print built-in xml help (lists all available elements that can be included in project definition file).
project.py --xml xml-name --status
Print global summary status of the project.
project.py --xml xml-name --stage stage-name --submit
Submit batch jobs for specified stage.
project.py --xml xml-name --stage stage-name --check
Check results from specified stage (identifies failed jobs). This action assumes that the art program produces an artroot output file.
project.py --xml xml-name --stage stage-name --checkana
Check results from specified stage (identifies failed jobs). This version of the check action skips some checks done by --check that only make sense if the art program produces an artroot output file. Use this action to check results from an analyzer-only art program.
project.py --xml xml-name --stage stage-name --makeup
Submit makeup jobs for failed jobs, as identified by a previous --check or --checkana action.
project.py --xml xml-name --stage stage-name --clean
Delete output for the specified stage and later stages. This option can be combined with --submit.
project.py --xml xml-name --stage stage-name --declare
Declare successful artroot files to sam.
project.py --xml xml-name --stage stage-name --upload
Upload successful artroot files to enstore.
project.py --xml xml-name --stage stage-name --define
Create sam dataset definition.
project.py --xml xml-name --stage stage-name --audit
Check the completeness and correctness of a processing stage using sam parentage information. For this action to work, input and output files must be declared to sam.
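The use cases above are typically combined into a submit, check, makeup cycle for each stage. The following Python sketch only builds the corresponding argument lists. The helper project_command, the file name my_project.xml, and the stage name "reco" are invented for illustration and are not part of larbatch; only options listed above are used.

```python
def project_command(xml, stage, *actions):
    """Build a project.py argument list (hypothetical helper, not part of larbatch)."""
    cmd = ["project.py", "--xml", xml]
    if stage is not None:
        cmd += ["--stage", stage]
    # Each action becomes a long option, e.g. "submit" -> "--submit".
    cmd += ["--" + action for action in actions]
    return cmd

# Placeholder project file and stage name.
xml, stage = "my_project.xml", "reco"

workflow = [
    project_command(xml, stage, "submit"),   # submit batch jobs
    project_command(xml, stage, "check"),    # identify failed jobs
    project_command(xml, stage, "makeup"),   # resubmit failed jobs
    project_command(xml, None, "status"),    # global summary, no stage needed
]

for cmd in workflow:
    print(" ".join(cmd))
```

In an environment where larbatch is set up, each list could be passed to subprocess.run(cmd, check=True). The steps are not meant to run back to back: the check action is only meaningful after the submitted batch jobs have finished.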
Project File Structure
The general structure of the project file is that it contains a single root element of type "project" (enclosed in "<project name=project-name>...</project>"). Inside the project element, there are additional subelements, including one or more stage subelements (enclosed in "<stage name=stage-name>...</stage>"). Each stage element defines a group of batch jobs that are submitted together by a single invocation of jobsub_submit.
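As a rough illustration of this nesting, the sketch below parses a bare skeleton with Python's standard xml.etree.ElementTree module and lists the stage names. The project name "my_project" and the stage names "gen" and "reco" are placeholders, and the skeleton is not a complete project file: a real one needs further subelements, and this is not necessarily how project.py reads the file.

```python
import xml.etree.ElementTree as ET

# Bare skeleton: one root "project" element containing stage subelements.
# All names are placeholders; stage contents are omitted.
PROJECT_XML = """<project name="my_project">
  <stage name="gen"></stage>
  <stage name="reco"></stage>
</project>"""

root = ET.fromstring(PROJECT_XML)

# The root element must be of type "project".
assert root.tag == "project"

# Stages appear in document order, matching the linear processing chain.
stage_names = [stage.get("name") for stage in root.findall("stage")]
print(root.get("name"), stage_names)
```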
Examples
Example XML project files used by microboone from ubutil product can be found here.
Internal documentation
Refer to the header of project.py or type "project.py --xmlhelp".
XML header section
The initial lines of an XML project file should follow a standard pattern. Here is a typical example header.
<?xml version="1.0"?>
<!DOCTYPE project [
<!ENTITY release "v02_05_01">
<!ENTITY file_type "mc">
<!ENTITY run_type "physics">
<!ENTITY name "prod_eminus_0.1-2.0GeV_isotropic_uboone">
<!ENTITY tag "mcc5.0">
]>
The significance of the header elements is as follows.
- The XML version
Copy the above version line exactly, namely, <?xml version="1.0"?>
- The document type (DOCTYPE keyword).
The argument following the DOCTYPE keyword specifies the "root element" of the XML file, and should always be "project."
- Entity definitions
Entity definitions, which occur inside the DOCTYPE section, are XML aliases. Any string that occurs repeatedly inside an XML file is a candidate for being defined as an entity. Entities can be substituted inside the body of the XML file by enclosing the entity name inside &...; (e.g. &release;).
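Entity substitution is done by the XML parser itself, so code reading the file sees only the expanded text. The sketch below demonstrates this with Python's standard xml.etree.ElementTree module, reusing two entities from the example header. The body of the document (a project element whose name attribute is &name; and a tag element holding &release;) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Header with two of the entities from the example above, followed by a
# minimal illustrative body that references them.
DOC = """<?xml version="1.0"?>
<!DOCTYPE project [
<!ENTITY release "v02_05_01">
<!ENTITY name "prod_eminus_0.1-2.0GeV_isotropic_uboone">
]>
<project name="&name;">
  <larsoft>
    <tag>&release;</tag>
  </larsoft>
</project>"""

root = ET.fromstring(DOC)

# Both the attribute value and the element text arrive already expanded.
project_name = root.get("name")
release = root.findtext("larsoft/tag")
print(project_name)
print(release)
```

Referencing an entity that is not defined in the DOCTYPE section makes the parser raise an error, which is a quick way to catch a misspelled entity name.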
Project Element
Each project definition file should contain a single project element enclosed in "<project name=project-name>...</project>." The name attribute of the project element is required. The project element contains the following subelements.
- A single subelement with tag "larsoft," which defines the run-time environment.
- Option subelements.
- One or more stage subelements.
Larsoft subelement
Each project element is required to contain a single subelement with tag "larsoft" (enclosed in "<larsoft>...</larsoft>"). The larsoft subelement defines the batch run-time environment. The larsoft subelement contains its own subelements, of which there are currently three:
<tag>...</tag>
Larsoft release version.
<qual>...</qual>
Larsoft release qualifier.
<local>...</local>
Path of user's local test release directory or tarball.
All larsoft subelements should contain only text. The local subelement is optional. Here is how a typical larsoft subelement might appear in a project definition file.
<larsoft>
  <tag>&release;</tag>
  <qual>e6:prof</qual>
</larsoft>
Note in this example that the larsoft version is defined by an entity "release," which should be defined in the DOCTYPE section.
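Because the local subelement is optional, code that reads a larsoft subelement has to allow for its absence. A minimal sketch, assuming the release entity has already been expanded to the literal version string; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The example larsoft subelement, with &release; written out literally.
LARSOFT_XML = """<larsoft>
  <tag>v02_05_01</tag>
  <qual>e6:prof</qual>
</larsoft>"""

larsoft = ET.fromstring(LARSOFT_XML)

tag = larsoft.findtext("tag")      # larsoft release version
qual = larsoft.findtext("qual")    # larsoft release qualifier
local = larsoft.findtext("local")  # None here, since <local> is absent

print(tag, qual, local)
```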
Project options
Project options are subelements of the project element with tags other than "larsoft" or "stage." Project options should contain only text. Here are some project options (this was the full list when this wiki was written). The full list of project options (and all defined XML constructs) can always be found by typing "project.py --xmlhelp."
- <group>...</group>
Should contain the standard experiment name (for microboone use "uboone"). If missing, environment variable $GROUP is used.
- <numevents>...</numevents>
Total number of events to process.
- <numjobs>...</numjobs>
Default number of parallel worker jobs (default 1). Can be overridden in individual stages.
- <os>...</os>
Comma-separated list of allowed batch OSes (e.g. "SL5,SL6"). This option is passed directly to the jobsub_submit command line option --OS. By default, jobsub decides.
- <resource>...</resource>
Specify default jobsub resources (command line option "--resource-provides=usage_model="). Default is "DEDICATED,OPPORTUNISTIC". For OSG specify "OFFSITE." Can be overridden in individual stages.
- <server>...</server>
Specify jobsub server. Expert option, usually not needed.
- <site>...</site>
Specify OSG site (comma-separated list). Use with <resource>OFFSITE</resource>. By default, jobsub decides, which usually means "any site."
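The defaults listed above can be summarized in a short Python sketch. The function read_options and the sample project element are hypothetical and do not reflect how project.py is implemented. Options whose default is "jobsub decides" are represented as None, and the larsoft and stage subelements are omitted for brevity.

```python
import os
import xml.etree.ElementTree as ET

# Sample project element with a few options set (names and values are placeholders).
PROJECT_XML = """<project name="my_project">
  <numevents>1000</numevents>
  <numjobs>10</numjobs>
  <resource>OFFSITE</resource>
</project>"""

def read_options(project, environ=os.environ):
    """Collect project options, applying the defaults described above (illustrative only)."""
    def text(tag, default=None):
        value = project.findtext(tag)
        return value.strip() if value is not None else default

    numevents = text("numevents")
    return {
        # Falls back to the GROUP environment variable when <group> is missing.
        "group": text("group", environ.get("GROUP")),
        # No default is documented for numevents, so a missing value stays None.
        "numevents": int(numevents) if numevents is not None else None,
        "numjobs": int(text("numjobs", "1")),
        "resource": text("resource", "DEDICATED,OPPORTUNISTIC"),
        # None means the choice is left to jobsub.
        "os": text("os"),
        "server": text("server"),
        "site": text("site"),
    }

options = read_options(ET.fromstring(PROJECT_XML), environ={"GROUP": "uboone"})
print(options)
```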