User guide » History » Version 74

Herbert Greenlee, 06/26/2015 09:40 AM

{{toc}}

h1. Overview

Larsoft common batch and workflow tools are contained in the ups product @larbatch@ (this redmine), which is built and distributed as part of @larsoft@.  Larbatch tools are built on top of the Fermilab @jobsub_client@ batch submission tools.  For general information about @jobsub_client@ and the Fermilab batch system, refer to articles on the "jobsub wiki":https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki and the "fife wiki":https://cdcvs.fnal.gov/redmine/projects/fife/wiki/Getting_Started_on_GPCF.

No other part of larsoft depends on @larbatch@, and @larbatch@ is not set up as a dependent of the @larsoft@ umbrella ups product.  Rather, @larbatch@ is intended to be a dependent of experiment-specific ups products (see [[admin_guide|this article]] for instructions on configuring @larbatch@ for a specific experiment).

After setting up the ups product @larbatch@, several executable scripts and python modules are available on the execution path and python path.  Here is a list of the more important ones.

* "project.py":https://cdcvs.fnal.gov/redmine/projects/larbatch/repository/revisions/develop/entry/scripts/project.py
An executable python script that is the main entry point for user interaction.  More information can be found below.

* "project_utilities.py":https://cdcvs.fnal.gov/redmine/projects/larbatch/repository/revisions/develop/entry/python/project_utilities.py
A python module, imported by @project.py@, that implements some of the workflow functionality.  End users would not normally interact directly with this module.  However, a significant aspect of @project_utilities.py@ is that it supplies hooks for providing experiment-specific implementations of some functionality, as described in an [[admin_guide#Experiment-specific hooks|accompanying article]] on this wiki.

* "condor_lar.sh":https://cdcvs.fnal.gov/redmine/projects/larbatch/repository/revisions/develop/entry/scripts/condor_lar.sh
The main batch script.  @Condor_lar.sh@ is a general purpose script that manages a single invocation of an art framework program (@lar@ executable).  @Condor_lar.sh@ sets up the run-time environment, fetches input data, interacts with sam, and copies output data.  End users are not expected to invoke @condor_lar.sh@ directly.  However, one can get a general idea of the features and capabilities of @condor_lar.sh@ by viewing the built-in documentation (type @condor_lar.sh -h@) or by reading the file header.

* "condor_start_project.sh":https://cdcvs.fnal.gov/redmine/projects/larbatch/repository/revisions/develop/entry/scripts/condor_start_project.sh
Batch script for starting a sam project.

* "condor_stop_project.sh":https://cdcvs.fnal.gov/redmine/projects/larbatch/repository/revisions/develop/entry/scripts/condor_stop_project.sh
Batch script for stopping a sam project.

h1. Using @project.py@

@Project.py@ is used in conjunction with an XML-format project definition file (see [[user_guide#Project File Structure|below]]).  The concept of a project, as understood by @project.py@ and as defined by the project definition file, is a multistage linear processing chain involving a specified number of batch workers at each stage.

h2. Internal documentation

Refer to the header of "project.py":https://cdcvs.fnal.gov/redmine/projects/larbatch/repository/revisions/develop/entry/scripts/project.py or type @project.py --help@.  Internal documentation is always kept up to date when @project.py@ command line options are changed.

h2. Use cases

In a typical invocation of @project.py@, one specifies the project file (via option @--xml@), the stage name (via option @--stage@), and one or more action options.  Here are some use cases for invoking @project.py@.

* @project.py -h@ or @project.py --help@
Print built-in help (lists all available command line options).

* @project.py -xh@ or @project.py --xmlhelp@
Print built-in XML help (lists all available elements that can be included in the project definition file).

* @project.py --xml xml-name --status@
Print global summary status of the project.

* @project.py --xml xml-name --stage stage-name --submit@
Submit batch jobs for the specified stage.

* @project.py --xml xml-name --stage stage-name --check@
Check results from the specified stage (identifies failed jobs).  This action assumes that the art program produces an artroot output file.

* @project.py --xml xml-name --stage stage-name --checkana@
Check results from the specified stage (identifies failed jobs).  This version of the check action skips some checks done by @--check@ that only make sense if the art program produces an artroot output file.  Use this action to check results from an analyzer-only art program.

* @project.py --xml xml-name --stage stage-name --makeup@
Submit makeup jobs for failed jobs, as identified by a previous @--check@ or @--checkana@ action.

* @project.py --xml xml-name --stage stage-name --clean@
Delete output for the specified stage and later stages.  This option can be combined with @--submit@.

* @project.py --xml xml-name --stage stage-name --declare@
Declare successful artroot files to sam.

* @project.py --xml xml-name --stage stage-name --upload@
Upload successful artroot files to enstore.

* @project.py --xml xml-name --stage stage-name --define@
Create sam dataset definition.

* @project.py --xml xml-name --stage stage-name --audit@
Check the completeness and correctness of a processing stage using sam parentage information.  For this action to work, input and output files must be declared to sam.
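
As a rough illustration of how the actions above chain together, the following Python sketch builds the @project.py@ command lines for one pass through a stage (submit, check, makeup).  The helper names @project_command@ and @stage_workflow@, the file name @myproject.xml@, and the stage name @gen@ are hypothetical; only @--xml@, @--stage@, and the action flags come from the list above.  The sketch only prints the commands.  In practice each command is run separately, and the check action is only meaningful after the submitted batch jobs have finished.

```python
import shlex

# Action flags taken from the use-case list above.
ACTIONS = ("submit", "check", "checkana", "makeup", "clean",
           "declare", "upload", "define", "audit")


def project_command(xml, stage, action):
    """Return the argv list for one project.py action (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    return ["project.py", "--xml", xml, "--stage", stage, "--" + action]


def stage_workflow(xml, stage, analyzer_only=False):
    """Commands for one typical pass through a stage, in order."""
    # Analyzer-only programs produce no artroot file, so use --checkana.
    check = "checkana" if analyzer_only else "check"
    return [project_command(xml, stage, action)
            for action in ("submit", check, "makeup")]


if __name__ == "__main__":
    # File and stage names are placeholders.
    for argv in stage_workflow("myproject.xml", "gen"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping each command as a list of arguments (rather than one string) makes it straightforward to hand the same list to @subprocess.run@ if the commands are to be executed rather than printed.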

h1. GUI Interface

@Project.py@ has a graphical interface called @projectgui.py@, which is invoked as follows.
<pre>
projectgui.py xml-files
</pre>

Essentially all of the functionality that is available via the command line interface of @project.py@ can be accessed from the GUI, in a (hopefully) obvious fashion.

h1. Project File Structure

The project file is an XML file that contains a single root element of type "@project@" (enclosed in "@<project name=project-name>...</project>@").  Inside the project element, there are additional subelements, including one or more stage subelements (enclosed in "@<stage name=stage-name>...</stage>@").  Each stage element defines a group of batch jobs that are submitted together by a single invocation of @jobsub_submit@.
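
The nesting described above can be checked with a few lines of Python.  This is a minimal sketch using the standard @xml.etree.ElementTree@ module, not the parser used by @project.py@.  The project name @example@ and the stage names @gen@ and @reco@ are made up, and the stage bodies are left empty for brevity.

```python
import xml.etree.ElementTree as ET

# Skeleton project file; names are placeholders and stage options are omitted.
PROJECT_XML = """<?xml version="1.0"?>
<project name="example">
  <stage name="gen"></stage>
  <stage name="reco"></stage>
</project>
"""


def stage_names(xml_text):
    """Return the stage names in document order."""
    root = ET.fromstring(xml_text)
    if root.tag != "project":
        raise ValueError("root element must be 'project', found %r" % root.tag)
    # findall("stage") matches only direct children of the project element.
    return [stage.get("name") for stage in root.findall("stage")]


if __name__ == "__main__":
    print(stage_names(PROJECT_XML))
```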

h2. Examples

Example XML project files used by microboone, from the @ubutil@ product, can be found "here":https://cdcvs.fnal.gov/redmine/projects/ubutil/repository/revisions/master/show/xml/mcc5.0.

h2. Internal documentation

Refer to the header of "project.py":https://cdcvs.fnal.gov/redmine/projects/larbatch/repository/revisions/develop/entry/scripts/project.py or type @project.py --xmlhelp@.  Internal documentation is always kept up to date when XML constructs are added or changed.

h2. XML header section

The initial lines of an XML project file should follow a standard pattern.  Here is a typical example header.

<pre>
<?xml version="1.0"?>
<!DOCTYPE project [
<!ENTITY release "v02_05_01">
<!ENTITY file_type "mc">
<!ENTITY run_type "physics">
<!ENTITY name "prod_eminus_0.1-2.0GeV_isotropic_uboone">
<!ENTITY tag "mcc5.0">
]>
</pre>

The significance of the header elements is as follows.

* The XML version
Copy the above version line exactly, namely,
<pre>
<?xml version="1.0"?>
</pre>

* The document type (DOCTYPE keyword)
The argument following the DOCTYPE keyword specifies the "root element" of the XML file, and should always be "@project@."

* Entity definitions
Entity definitions, which occur inside the DOCTYPE section, are XML aliases.  Any string that occurs repeatedly inside an XML file is a candidate for being defined as an entity.  Entities can be substituted inside the body of the XML file by enclosing the entity name inside @&...;@ (e.g. @&release;@).
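
Entity substitution is standard XML behavior, so it can be demonstrated with any XML parser.  The sketch below uses Python's standard @xml.etree.ElementTree@ module (an illustration only, not the code path used by @project.py@).  The header reuses two of the entities from the example above; the short body that references them is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Two entities from the example header, plus a minimal body that uses them.
PROJECT_XML = """<?xml version="1.0"?>
<!DOCTYPE project [
<!ENTITY release "v02_05_01">
<!ENTITY name "prod_eminus_0.1-2.0GeV_isotropic_uboone">
]>
<project name="&name;">
  <larsoft>
    <tag>&release;</tag>
  </larsoft>
</project>
"""

root = ET.fromstring(PROJECT_XML)

# The parser replaces &name; and &release; with the declared strings,
# both in attribute values and in element text.
project_name = root.get("name")
release = root.findtext("larsoft/tag")
print(project_name, release)
```

Because the substitution happens at parse time, changing the single @release@ entity definition changes every place in the file that refers to @&release;@.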

h2. Project Element

Each project definition file should contain a single project element enclosed in "@<project name=project-name>...</project>@."  The name attribute of the project element is required.

The content of the project element consists of other XML subelements, including the following.
* A single subelement with tag "@larsoft@," which defines the run-time environment.
* Option subelements.
* One or more stage subelements.

h3. Larsoft subelement

Each project element is required to contain a single subelement with tag "@larsoft@" (enclosed in "@<larsoft>...</larsoft>@").  The larsoft subelement defines the batch run-time environment.  The larsoft subelement may contain simple text subelements, of which there are currently three:

* @<tag>...</tag>@
Larsoft release version.

* @<qual>...</qual>@
Larsoft release qualifier.

* @<local>...</local>@
Path of user's local test release directory or tarball.

The @local@ subelement is optional.  Here is how a typical larsoft subelement might appear in a project definition file.

<pre>
<larsoft>
  <tag>&release;</tag>
  <qual>e6:prof</qual>
</larsoft>
</pre>

Note in this example that the larsoft version is defined by an entity "@release@," which should be defined in the DOCTYPE section.

h3. Project options

Project options are text subelements of the project element with tags other than "@larsoft@" or "@stage@."  Here are some project options (this was the full list at the time this wiki was written).  The full list of project options (and all defined XML constructs) can always be found by typing "@project.py --xmlhelp@."

* @<group>...</group>@
Should contain the standard experiment name (for microboone use "@uboone@").  If missing, environment variable @$GROUP@ is used.

* @<numevents>...</numevents>@
Total number of events to process.

* @<numjobs>...</numjobs>@
Number of parallel worker jobs (default 1).  Can be overridden in individual stages.

* @<maxfilesperjob>...</maxfilesperjob>@
Maximum number of files to deliver to a single job.  Useful if you want to limit the output file size or keep a one-to-one correspondence between input and output files.  Can be overridden in individual stages (@<maxfilesperjob>@ inside @<stage>@).

* @<os>...</os>@
Comma-separated list of allowed batch OSes (e.g. "SL5,SL6").  This option is passed directly to @jobsub_submit@ command line option @--OS@.  By default, @jobsub@ decides.

* @<resource>...</resource>@
Specify @jobsub@ resources (command line option "@--resource-provides=usage_model=@").  Default is "@DEDICATED,OPPORTUNISTIC@".  For OSG specify "@OFFSITE@."  Can be overridden in individual stages.

* @<server>...</server>@
Specify @jobsub@ server.  Expert option, usually not needed.

* @<site>...</site>@
OSG site(s) (comma-separated list).  Use with "@<resource>OFFSITE</resource>@."  By default, @jobsub@ decides, which usually means "any site."

* @<filetype>...</filetype>@
Sam file type (e.g. "data" or "mc").  Default none.

* @<runtype>...</runtype>@
Sam run type (e.g. "physics").  Default none.

* @<merge>...</merge>@
Histogram merging program.  Default "@hadd -T@."  Can be overridden in each stage.

* @<fcldir>...</fcldir>@
Specify additional directories in which to search for top-level fcl job files.  @Project.py@ searches @$FHICL_FILE_PATH@ and the current directory by default.

h3. Stage Subelements

Each project element should contain one or more stage subelements enclosed in "@<stage name=stage-name>...</stage>@."  The name attribute of the stage subelement is required, and should be different for each stage.  The stage element should contain stage options in the form of simple text subelements.  Here are the stage options:

* @<fcl>...</fcl>@
Top-level fcl job file (required).  Can be specified as a full or relative path.

* @<outdir>...</outdir>@
Output directory full path (required).  The output directory should be accessible interactively on the submitting node and grid-write-accessible via @ifdh cp@ from the batch worker.

* @<numjobs>...</numjobs>@
Number of parallel worker jobs.  If not specified, inherit from project options.

* @<targetfilesize>...</targetfilesize>@
If specified, this option may reduce the number of workers (option @numjobs@) in order to achieve the estimated target file size.  It never increases the number of workers.

The following options deal with where this processing stage gets its input data.  Specify no more than one input option.  You can also omit the input option entirely, in which case output data from the previous stage is pipelined to this stage, or there is no input.

* @<inputfile>...</inputfile>@
Specify a single input file (full path).

* @<inputlist>...</inputlist>@
Specify an input file list (a file containing a list of input files, one per line, full path).

* @<inputdef>...</inputdef>@
Specify an input sam dataset definition.

* @<inputmode>...</inputmode>@
Specify the input mode, which can be "@textfile@" (do not include the quotes in the XML file) or left unspecified.  Use this option with @<inputfile>...</inputfile>@ or @<inputlist>...</inputlist>@ together with the art producer module @TextFileGen@.  @TextFileGen@ should be configured at location @physics.producers.generator@ in the job configuration fcl file.  One example fcl is "prodtext.fcl":https://cdcvs.fnal.gov/redmine/projects/larsim/repository/revisions/develop/entry/EventGenerator/prodtext.fcl.  The number of files in the @inputlist@ should match the number of grid jobs.
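
The "no more than one input option" rule can be expressed as a small check.  The following Python sketch is an illustration only and is not taken from @project.py@.  The function name @input_source@, the stage name @reco@, and all option values are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three mutually exclusive input options listed above.
INPUT_TAGS = ("inputfile", "inputlist", "inputdef")


def input_source(stage):
    """Return (tag, value) for the stage's input option, or None if none is given.

    Raises ValueError if more than one input option is present.
    """
    found = [(tag, stage.findtext(tag).strip())
             for tag in INPUT_TAGS if stage.find(tag) is not None]
    if len(found) > 1:
        raise ValueError("stage %r has more than one input option: %s"
                         % (stage.get("name"), ", ".join(t for t, _ in found)))
    return found[0] if found else None


# Placeholder stage definition with a single input option.
stage = ET.fromstring("""
<stage name="reco">
  <fcl>reco.fcl</fcl>
  <outdir>/path/to/output</outdir>
  <inputdef>my_input_definition</inputdef>
</stage>
""")
print(input_source(stage))
```

A return value of @None@ corresponds to the case where no input option is given, so the stage either takes its input from the previous stage or has no input at all.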

The following options allow job customization by user-written scripts.  The script location should be specified as an absolute or relative path (relative to the directory in which @project.py@ is invoked).  Any specified job customization scripts are copied to the work directory, and from there are copied to the batch worker.

* @<initscript>...</initscript>@
Worker initialization script (@condor_lar.sh --init-script@).

* @<initsource>...</initsource>@
Worker initialization source script (@condor_lar.sh --init-source@).

* @<endscript>...</endscript>@
Worker finalization script (@condor_lar.sh --end-script@).

Additional options.

* @<defname>...</defname>@
Sam dataset definition name for art output files.

* @<anadefname>...</anadefname>@
Sam dataset definition name for analysis output files.

* @<datatier>...</datatier>@
Sam data tier for art output files.

* @<anadatatier>...</anadatatier>@
Sam data tier for analysis output files.

* @<merge>...</merge>@
Histogram merging program.  If not specified, inherit from project options.

* @<resource>...</resource>@
Specify @jobsub@ resources (command line option "@--resource-provides=usage_model=@").  If not specified, inherit from project options.

* @<lines>...</lines>@
Specify an arbitrary condor command via @jobsub_submit --lines=@ (expert option).

* @<site>...</site>@
OSG site(s) (comma-separated list).  If not specified, inherit from project options.

* @<output>...</output>@
Specify the output file name.

* @<TFileName>...</TFileName>@
Specify the TFile output file name.

* @<jobsub>...</jobsub>@
Arbitrary @jobsub_submit@ command line options (space-separated list).
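
Several stage options (@numjobs@, @merge@, @resource@, @site@) fall back to the project-level value when they are not given in the stage.  The Python sketch below illustrates that lookup order under the assumption that it is a simple "stage first, then project, then default" rule.  It is not the implementation in @project.py@, and the function name @stage_option@, the project and stage names, and the paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder project: numjobs is set at project level and overridden in "reco".
PROJECT_XML = """
<project name="example">
  <numjobs>10</numjobs>
  <resource>DEDICATED,OPPORTUNISTIC</resource>
  <stage name="gen">
    <fcl>gen.fcl</fcl>
    <outdir>/path/to/gen</outdir>
  </stage>
  <stage name="reco">
    <fcl>reco.fcl</fcl>
    <outdir>/path/to/reco</outdir>
    <numjobs>5</numjobs>
  </stage>
</project>
"""


def stage_option(project, stage_name, tag, default=None):
    """Look up an option in the stage first, then in the project, then use default."""
    for stage in project.findall("stage"):
        if stage.get("name") == stage_name:
            break
    else:
        raise KeyError("no stage named %r" % stage_name)
    # findtext(tag) only matches direct children, so the project-level lookup
    # does not pick up values defined inside other stages.
    for element in (stage, project):
        value = element.findtext(tag)
        if value is not None:
            return value.strip()
    return default


project = ET.fromstring(PROJECT_XML)
print(stage_option(project, "gen", "numjobs", default="1"))   # inherited from project
print(stage_option(project, "reco", "numjobs", default="1"))  # overridden in stage
```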