
{{toc}}

Under construction... NOT YET A SOURCE OF INFORMATION!!

h1. Running Jobs

This page describes the job configuration file (often, in jargon: _FHiCL file_) and how to run a job using one.

It is assumed that you have previously read the [[ Quick-start guide to using and developing LArSoft code ]] and the information on [[Using LArSoft on the GPVM nodes]].



h2. The Job Configuration File

Once a base release is set up, it is easy to run a job. The basic unit for running a job is the job-control script, written in the FHiCL language. FHiCL provides a simple mechanism for including parameter set configurations from different files, so that many job configuration files can share the same configuration for a module or service.

There is also a nice "FHiCL quick start guide":https://cdcvs.fnal.gov/redmine/attachments/16021/quick_start.pdf available for more details.

h3. Key Concepts in FHiCL

There are a few key concepts to writing a FHiCL job control script. In order, they are

# Including previously defined configurations for services and modules from other files. This is done using @#include@ statements. *Be sure you don't have any trailing space or tab characters* on the @#include@ line.
# Service block, denoted by @services: { }@. This block contains the configurations for art-specific services such as @TFileService@ and @RandomNumberGenerator@, as well as the configurations of LArSoft-specific services[1].
# Source block, denoted by @source: { }@. This block tells the job what kind of source to expect (@EmptyEvent@ in the case of Monte Carlo generation, @RootInput@ for anything downstream of a Monte Carlo generator or reconstruction, and specific modules for data from the detector), the file name for the input source if appropriate, and how many events to process. Both the file name and the number of events to process can be specified on the command line.
# Output block, denoted by @outputs: { }@. This block tells the job what kind of output to make, i.e. @RootOutput@, and what the name of the output file should be. The output file name can be specified on the command line. It is possible to define more than one output file if one wants to run a job that routes events to different files based on filter criteria, e.g. empty events are put in one file and events with neutrinos in them are put in another. Multiple output files can only be specified in the job configuration file, not from the command line.
# Physics block, denoted by @physics: { }@. This block is where all producer, analyzer, and filter modules are configured. The sequences of producer and filter modules to run are defined in user-named _paths_ in this block. The list of analyzers and output modules to run is defined in a separate user-named path. The block also defines two keyword parameters, @trigger_paths@ and @end_paths@: @trigger_paths@ contains all the producer and filter paths to run, and @end_paths@ contains the analyzer and output paths. A skeleton combining all of these blocks is shown below.

Comments may be included in FHiCL configuration files using the "#" character. @#include@ is a keyword, so the parser knows not to treat what follows it as a comment[2].

fn1. In old FHiCL files you will notice that the LArSoft (and, in general, non-art) service configuration is enclosed in a @user@ block; that block is now deprecated.

fn2. Note that the FHiCL parser can't recognize a comment on the same line as a @#include@ directive.
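
Putting these elements together, the overall shape of a job script is sketched below (the included file names, the process name, and the module and path names are placeholders, not definitive choices):

<pre>
#include "services.fcl"     # pull in previously defined configurations
#include "mymodules.fcl"    # (placeholder file name)

process_name: MyProcess     # no underscores allowed here

services:
{
  # art and LArSoft service configurations go here
}

source:
{
  module_type: RootInput    # or EmptyEvent for Monte Carlo generation
  maxEvents:   10
}

physics:
{
  producers: { }            # producer module configurations
  filters:   { }            # filter module configurations
  analyzers: { }            # analyzer module configurations

  reco:    [ ]              # user-named path of producers/filters
  ana:     [ ]              # user-named path of analyzers
  stream1: [ out1 ]         # user-named path of output modules

  trigger_paths: [ reco ]          # paths that modify the event
  end_paths:     [ ana, stream1 ]  # paths that do not
}

outputs:
{
  out1:
  {
    module_type: RootOutput
    fileName:    "myoutput.root"
  }
}
</pre>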

h3. FHiCL Rules

There are a few rules to keep in mind about FHiCL:
* The value of the @process_name@ parameter may not contain underscores as the process name is used in the ROOT file branch name. Module labels may not contain underscores either, for the same reason.
* Parameter set names may not contain numbers, periods, backslashes, stars, etc. They may contain underscores.
* Put the values for all string parameters in double quotes, @"..."@
* Specify input vectors using @[ , , ]@, e.g. if you want a vector of doubles, write @MyVector: [1.0, 3e-9, -900.]@
* You pick out configurations from the @PROLOG@ section(s), usually defined in the @#include@ files, using the @ @local::@ syntax. The value after the "@::@" is the name of the configuration specified in the @PROLOG@ (see the next bullet).
* You can override the value of an included configuration. For example, imagine there is a configuration called @mymoduleconfig@, specified in an included file, which contains the value @-5@ for the parameter named @myint@. One can load the configuration and then change the value of @myint@ by doing the following:
*# inside the producers block:
<pre>
physics: {
  producers: {
    # ...
    mymod: @local::mymoduleconfig
  }
}
</pre>
*# outside the @physics@ block:
<pre>
physics.producers.mymod.myint: 1
</pre>
The last value assigned to a parameter always wins: if the line above were repeated with the value @2@ instead of @1@, the job would run with @myint@ as @2@.
Also note that in this example the original content of @mymoduleconfig@ is not changed when the content of @mymod@ is.
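
For instance, appending both of the following lines would leave @mymod@ running with @myint@ equal to @2@:

<pre>
physics.producers.mymod.myint: 1
physics.producers.mymod.myint: 2 # the last assignment wins
</pre>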

h3. Example configuration file: @detsimmodules.fcl@

An example of a file with predefined configurations for modules is in the "larsim/DetSim/detsimmodules.fcl":https://cdcvs.fnal.gov/redmine/projects/larsim/repository/revisions/develop/entry/DetSim/detsimmodules.fcl file.
All the definitions are inside a @PROLOG@ block.
The following is taken from LArSoft @v02_03_01@ (experiments now use their specific @detsimmodule_Xxxx.fcl@ configuration file though):

<pre>
###### All files that are parameter set definitions must contain BEGIN_PROLOG as their first line ######
###### This tag tells the FHICL parser that parameter set definitions are coming ######
BEGIN_PROLOG

###### Generic configuration of the analyser for SimWireXxxx producer output ######
standard_simwireana:
{
  module_type:       "SimWireAna"
  DetSimModuleLabel: "daq"
}

###### Configuration of the SimWire module (digitization) for ArgoNeuT ######
# extra comment to test check-in
argoneut_simwire:
{
  module_type:       "SimWireT962"
  DriftEModuleLabel: "largeant"
  ResponseFile:      "shape-argo.root"
  NoiseFact:         0.132  # Noise Scale
  NoiseWidth:        62.4   # Exponential Noise width (kHz)
  LowCutoff:         7.5    # Low frequency filter cutoff (kHz)
  FieldBins:         75
  Col3DCorrection:   2.5
  Ind3DCorrection:   1.5
  ColFieldRespAmp:   0.0354
  IndFieldRespAmp:   0.018
  ShapeTimeConst:    [ 3000., 900. ]
  CompressionType:   "none" # could also be Huffman
}
###### Configuration of the analyser for SimWireXxxx output for ArgoNeuT (copy of generic) ######
argoneut_simwireana: @local::standard_simwireana

###### Configuration of the SimWire module (digitization) for MicroBooNE ######
microboone_simwire:
{
  module_type:       "SimWireMicroBooNE"
  DriftEModuleLabel: "largeant"
  NoiseFact:         0.0132 # Noise Scale
# NoiseFact:         0.15   # Noise Scale to use with histogram
  NoiseWidth:        62.4   # Exponential Noise width (kHz)
  LowCutoff:         7.5    # Low frequency filter cutoff (kHz)
  CompressionType:   "none" # could also be Huffman
  GetNoiseFromHisto: false
  NoiseFileFname:    "uboone_noise_v0.1.root"
  NoiseHistoName:    "NoiseFreq"
}
###### Configuration of the analyser for SimWireXxxx output for MicroBooNE (copy of generic) ######
microboone_simwireana: @local::standard_simwireana

###### Configuration of a different analyser for SimWireXxxx output, generic and experiment-specific ######
standard_wienerfilterana:
{
  module_type:       "WienerFilterAna"
  DetSimModuleLabel: "daq"
}

bo_wienerfilterana:         @local::standard_wienerfilterana
argoneut_wienerfilterana:   @local::standard_wienerfilterana
microboone_wienerfilterana: @local::standard_wienerfilterana

###### All files that are parameter set definitions must contain END_PROLOG as their last line ######
###### This tag tells the FHICL parser that parameter set definitions are ended ######
END_PROLOG
</pre>

h3. Configuring the "message service":https://cdcvs.fnal.gov/redmine/projects/messagefacility/wiki/Using_MessageFacility#Using-MessageFacility

Several standard configurations for the message service are in "lardata/Utilities/messageservice.fcl":https://cdcvs.fnal.gov/redmine/projects/lardata/repository/revisions/develop/entry/Utilities/messageservice.fcl. There is one configuration for each level of message output: Debug, Info, Warning, and Error. Each configuration applies to the specified message level and those of higher priority: for example, the Info configuration prints out Info, Warning, and Error level messages, while the Warning configuration prints out only Warning and Error level messages. The "standard" Debug configurations send the messages to a dedicated output file, @debug.log@. The Error configuration redirects its output to the standard error stream (as @std::cerr@), while the others print to the standard output (@std::cout@). All of them impose some limits on the repetition of the most frequent messages.
Remember that to use one of these "standard" configurations you need to include it in your FCL file: they are standard, not default.
If you want to define your own configuration, please take a look at the comments in "lardata/Utilities/messageservice.fcl":https://cdcvs.fnal.gov/redmine/projects/lardata/repository/revisions/develop/entry/Utilities/messageservice.fcl file to determine how to do so.
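
For example, to adopt the standard Warning-level configuration (the same one used by the reconstruction job example later on this page), include the file and pick the configuration out of its @PROLOG@:

<pre>
#include "messageservice.fcl"

services:
{
  message: @local::standard_warning
  # ... other services ...
}
</pre>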

Examples of how to include the usual use of the message service configurations are in the example files below.

To get a different level of output from just one module (say @DBSCAN@) one would do:

<pre>
services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "reco_hist.root" }
  Timing:       {}
  RandomNumberGenerator: {} # art native random number generator

  # configure the message service with the INFO level for DBSCAN
  # and the WARNING level for everything else
  message: {
    destinations: {
      infomsg: {
        type:      "cout"
        threshold: "INFO"
        append:    true
        categories: {
          DBSCAN: {
            reportEvery: 1
          }
        }
      } # end infomsg
      warningmsg: {
        type:      "cout"
        threshold: "WARNING"
        append:    true
        categories: {
          default: {
            limit:    1000
            timespan: 60
          }
        } # end categories
      } # end warningmsg
    } # end destinations
  } # end message

  user: @local::argoneut_services
}
</pre>

h3. Example job script: @prodgenie.fcl@

An example job script to produce Monte Carlo events is "larsim/EventGenerator/GENIE/prodgenie.fcl":https://cdcvs.fnal.gov/redmine/projects/larsim/repository/revisions/develop/entry/EventGenerator/GENIE/prodgenie.fcl . The job defined by this script will generate neutrino interactions using GENIE, run them through Geant4, do the electron transport and then simulate the electronics.

Comments on the form of the file are included as @###### Comment ######@

<pre>
###### This is how to include configurations from other files ######
#include "services.fcl"
#include "genie.fcl"
#include "largeantmodules.fcl"
#include "detsimmodules.fcl"

###### give the process a name ######
process_name: GenieGen

###### Please note the convention of defining detector specific configurations ######
###### Pick out the configurations from the #include files using the @local:: syntax ######
###### for services from LArSoft, in the user{} block - see definitions for configurations in ######
###### job/geometry.fcl ######
###### job/services.fcl ######
###### job/simulationservices.fcl ######

services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "genie_hist.root" }
  Timing:       {}
  SimpleMemoryCheck: { ignoreTotal: 1 } # default is one
  RandomNumberGenerator: {} # art native random number generator
  user: @local::argoneut_simulation_services
}

###### source is where you get events from - can also be RootInput ######
#Start each new event with an empty event.
source:
{
  module_type: EmptyEvent
  maxEvents:   10 # Number of events to create
  firstRun:    1  # Run number to use for this file
  firstEvent:  1  # number of first event in the file
}

###### physics is the block that controls configuration of modules ######
# Define and configure some modules to do work on each event.
# First modules are defined; they are scheduled later.
# Modules are grouped by type.
physics:
{
  ###### the module labels in the output file will be generator, largeant, and daq ######
  ###### their configuration is taken from ArgoNeuT defaults ######
  producers:
  {
    generator: @local::argoneut_genie_simple_neutrino
    largeant:  @local::argoneut_largeant
    daq:       @local::argoneut_simwire
    rns:       { module_type: "RandomNumberSaver" }
  }

  # define the producer and filter modules for this path: order matters,
  # filters reject all following items. see lines starting physics.producers below
  simulate: [ rns, generator, largeant, daq ]

  # define the output stream; there could be more than one if using filters
  stream1: [ out1 ]

  # trigger_paths is a keyword and contains the paths that modify the art::Event,
  # i.e. filters and producers
  trigger_paths: [ simulate ]

  # end_paths is a keyword and contains the paths that do not modify the art::Event,
  # i.e. analyzers and output streams. these all run simultaneously
  end_paths: [ stream1 ]
}

# block to define where the output goes. if you defined a filter in the physics
# block and put it in the trigger_paths, then you need to put a SelectEvents: { SelectEvents: [XXX] }
# entry in the output stream you want those events to go to, where XXX is the name of the
# path containing the filter(s)
outputs:
{
  out1:
  {
    module_type: RootOutput
    fileName:    "genie_gen.root" # default file name, can override from command line with -o or --output
  }
}

</pre>
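
As the comments in the listing above note, events can be routed to different output files according to filter decisions. The following is a hypothetical sketch rather than a working configuration: the filter module type @MyNeutrinoFilter@, its label @nufilter@, and the path name @filt@ are all invented for illustration.

<pre>
physics:
{
  filters:
  {
    nufilter: { module_type: "MyNeutrinoFilter" } # hypothetical filter module
  }

  filt:    [ nufilter ] # path containing the filter
  stream1: [ outNu ]

  trigger_paths: [ filt ]
  end_paths:     [ stream1 ]
}

outputs:
{
  outNu:
  {
    module_type:  RootOutput
    fileName:     "events_with_neutrinos.root"
    SelectEvents: { SelectEvents: [ filt ] } # keep only events accepted by the 'filt' path
  }
}
</pre>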

Notice that you have not specified anywhere which libraries to load. That is because the build system links each plugin shared library (@.so@) against the libraries it depends upon, so everything needed is loaded automatically.

h3. Example job script: @standard_reco.fcl@

There is an example reconstruction job script available for people to use, "lardata/Utilities/standard_reco.fcl":https://cdcvs.fnal.gov/redmine/projects/lardata/repository/revisions/develop/entry/Utilities/standard_reco.fcl .
This script takes as input either raw data or Monte Carlo that has been processed to produce simulated raw digits, and performs a list of reconstruction tasks.
The version from LArSoft @v02_03_01@ is copied here:

<pre>
#include "services.fcl"
#include "caldata.fcl"
#include "hitfindermodules.fcl"
#include "clustermodules.fcl"
#include "trackfindermodules.fcl"

process_name: Reco

services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "reco_hist.root" }
  Timing:       {}
  RandomNumberGenerator: {} # art native random number generator
  message: @local::standard_warning
  user:    @local::argoneut_services
}

# source is now a ROOT file
source:
{
  module_type: RootInput
  maxEvents:   10 # Number of events to process
}

# Define and configure some modules to do work on each event.
# First modules are defined; they are scheduled later.
# Modules are grouped by type.
physics:
{
  producers:
  {
    caldata:    @local::argoneut_calwire
    ffthit:     @local::argoneut_hitfinder # note: if the job is for MC, use argoneut_mc_hitfinder
    dbcluster:  @local::argoneut_dbcluster
    hough:      @local::argoneut_houghlinefinder
    linemerger: @local::argoneut_linemerger
    track:      @local::argoneut_track
    harris:     @local::argoneut_endpointmod
  }

  analyzers:
  {
    dbclusterana: @local::argoneut_dbclusterana
  }

  # define the producer and filter modules for this path: order matters,
  # filters reject all following items. see lines starting physics.producers below
  reco: [ caldata, ffthit, dbcluster, hough, linemerger, track, harris ]
  ana:  [ dbclusterana ]

  # define the output stream; there could be more than one if using filters
  stream1: [ out1 ]

  # trigger_paths is a keyword and contains the paths that modify the art::Event,
  # i.e. filters and producers
  trigger_paths: [ reco ]

  # end_paths is a keyword and contains the paths that do not modify the art::Event,
  # i.e. analyzers and output streams. these all run simultaneously
  end_paths: [ ana, stream1 ]
}

# block to define where the output goes. if you defined a filter in the physics
# block and put it in the trigger_paths, then you need to put a SelectEvents: { SelectEvents: [XXX] }
# entry in the output stream you want those events to go to, where XXX is the name of the
# path containing the filter(s)
outputs:
{
  out1:
  {
    module_type: RootOutput
    fileName:    "standard_reco.root" # default file name, can override from command line with -o or --output
  }
}

</pre>

h3. How to override a default parameter

If you want to override a default parameter that has been included from a predefined parameter set, you must specify the parameter and its new value as

<pre>
mainBlock.subBlock.label.parameterName: newValue
</pre>

where

* @mainBlock@ can be services or physics
* @subBlock@ can be user, producers, filters, or analyzers
* @label@ is the name of the desired service or module in a producers, filters, or analyzers block
* @parameterName@ is the name of the desired parameter
* @newValue@ is the desired new value; lists or entire blocks (that is, a sequence of @key: value@ pairs in braces) can also be specified

These lines must go after the @mainBlock@ and be outside of any other block.

For example, if one wanted to change the default value of the @fhitsModuleLabel@ parameter of the @dbcluster@ module in the previous section, one would put

<pre>
physics.producers.dbcluster.fhitsModuleLabel: "differentHitModuleLabel"
</pre>

Note that FHiCL allows for completely replacing a value, but not for modifying it in place (e.g., it is not possible to append an element to an existing list).
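
For example, to change the @ShapeTimeConst@ list of the ArgoNeuT digitization configuration shown earlier, the entire sequence must be written anew (the values here are arbitrary, for illustration only):

<pre>
physics.producers.daq.ShapeTimeConst: [ 2500., 800. ]
</pre>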

h3. fhicl Emacs syntax highlighting

If you use Emacs as your editor, you can put the following into the @.emacs@ file in your home directory to have it display @.fcl@ files with syntax highlighting:

<pre>
(setq fclKeywords
      '(
        ;; This, due to poor language design, conflicts with comments and fails
        ("#include" . font-lock-keyword-face)
        ("@local" . font-lock-keyword-face)
        ;; All these names are magic, I think
        ("process_name:\\|services:\\|source:\\|outputs:\\|physics:\\|producers:\\|filters:\\|analyzers:" . font-lock-builtin-face)
        ("true\\|false" . font-lock-builtin-face)
        ;; Variable definitions are followed by colons
        ("[a-zA-Z0-9_]*:" . font-lock-variable-name-face)
        ))

;; Python mode gets us comment handling and indentation at colons;
;; the "FHICL" argument is the mode name shown in the mode line
(define-derived-mode fcl-mode python-mode "FHICL"
  (setq font-lock-defaults '(fclKeywords))
  ;; (setq tab-width 2) ;; Doesn't seem to work
  )

(add-to-list 'auto-mode-alist '("\\.fcl\\'" . fcl-mode))
</pre>

Also, please use _only spaces_ for alignment: it's not the perfect solution, but it makes people using different editors see the same code alignment.



h2. Executable and command line options

Currently there is one executable to run in LArSoft. The executable to run a typical reconstruction or analysis job is @lar@, which is placed in the user's path by the setup script. To see what options are available, do
<pre>
$ lar -h
</pre>
The output for @art@ version @v1_09_03@ is:
<pre>
Usage: lar <-c <config-file>> <other-options> [<source-file>]+

Allowed options:
  -c [ --config ] arg            Configuration file.
  -h [ --help ]                  produce help message
  --process-name arg             art process name.
  -s [ --source ] arg            Source data file (multiple OK).
  -S [ --source-list ] arg       file containing a list of source files to
                                 read, one per line.
  -e [ --estart ] arg            Event # of first event to process.
  -n [ --nevts ] arg             Number of events to process.
  --nskip arg                    Number of events to skip.
  -T [ --TFileName ] arg         File name for TFileService.
  -o [ --output ] arg            Event output stream file.
  --trace                        Activate tracing.
  --notrace                      Deactivate tracing.
  --memcheck                     Activate monitoring of memory use.
  --nomemcheck                   Deactivate monitoring of memory use.
  --default-exceptions           some exceptions may be handled differently by
                                 default (e.g. ProductNotFound).
  --rethrow-default              all exceptions default to rethrow.
  --rethrow-all                  all exceptions overridden to rethrow (cf
                                 rethrow-default).
  --sam-web-uri arg              URI for SAM web service.
  --sam-process-id arg           SAM process ID.
  --sam-application-family arg   SAM application family.
  --sam-app-family arg           SAM application family.
  --sam-application-version arg  SAM application version.
  --sam-app-version arg          SAM application version.
  --sam-file-type arg            File type for SAM metadata.
  --sam-data-tier arg            SAM data tier spec (<module-label>:<tier-spec>).
  --sam-stream-name arg          SAM stream name (<module-label>:<stream-name>).

Art has completed and will exit with status 1.
</pre>
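
For example, to run the generation job defined earlier on this page for 100 events, overriding the event output and histogram file names (the names here are only placeholders):

<pre>
$ lar -c job/prodgenie.fcl -n 100 -o mygenie.root -T mygenie_hist.root
</pre>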

h2. Running a Job

To run the job defined by the script above, do
<pre>
$ lar -c job/prodgenie.fcl
</pre>

One can stop a job in two ways:
# press <Ctrl>+<C> once: the job will complete at the end of the current module. If the job is running in the background, send the equivalent signal with @kill -INT %jobID@ from the command line.
# press <Ctrl>+<C> twice: the job should stop immediately and, depending on the shell settings, might produce a core dump.

If you want your job to keep running even if you get disconnected from a remote session, depending on your shell you may need to start it with:
<pre>
$ nohup lar -c job/prodgenie.fcl >& pg.out &
</pre>

To stop such a job, do
<pre>
$ pgrep lar        # to find the process ID (use ps to disambiguate if more than one lar process is running)
$ kill -INT processID
</pre>

One can print out the full configuration of the job, without actually processing any events, by running:
<pre>
$ ART_DEBUG_CONFIG=1 lar -c prodgenie.fcl
</pre>
in @bash@, or
<pre>
> env ART_DEBUG_CONFIG=1 lar -c prodgenie.fcl
</pre>
in C-shell. This produces output like:
<pre>
** ART_DEBUG_CONFIG is defined: config debug output follows **
all_modules: [ "out1"
             , "daq"
             , "generator"
             , "largeant"
             , "rns"
             ]
outputs: { out1: { fileName: "genie_gen.root"
                   module_label: "out1"
                   module_type: "RootOutput"
                 }
         }
physics: { end_paths: [ "stream1" ]
           producers: { daq: { Col3DCorrection: 2.5
                               ColFieldRespAmp: 3.54e-2
                               CompressionType: "none"
                               DriftEModuleLabel: "largeant"
                               FieldBins: 75
                               Ind3DCorrection: 1.5
                               IndFieldRespAmp: 1.8e-2
                               LowCutoff: 7.5
                               NoiseFact: 1.32e-1
                               NoiseWidth: 6.24e1
                               ResponseFile: "shape-argo.root"
                               ShapeTimeConst: [ 3000
                                               , 900
                                               ]
                               module_label: "daq"
                               module_type: "SimWireT962"
                             }
                        generator: { BeamCenter: [ -1400
                                                 , -350
                                                 , 0
                                                 ]
[...]
</pre>

This functionality is particularly helpful when trying to debug what input parameters were passed to the job.

h2. Why did my job fail?

In case of failure, @art@ usually prints some information about what went wrong.

If a job fails, look at the warnings printed to the screen or any output log files.

Also check in the [[Breaking_Changes|breaking changes]] page for information about problems (and solutions) caused by changes to the code.

You can ask for help on the larsoft@fnal.gov mailing list, open an issue in the appropriate tracker (usually "the LArSoft one":https://cdcvs.fnal.gov/redmine/projects/larsoft/issues), or write to artists@fnal.gov.
The usual bug report rules apply: include at least
* the full command line of the failing command
* a full path to the input files
* which LArSoft release you are using
* the complete error message.

If a bug should be reported to the artists@fnal.gov list, attach the _complete output of the job_ to the email.

h2. Submitting Jobs to the compute farms

We have to find out where the page with the submission instructions went (*TODO*!).