


Running Jobs

This page describes the job configuration file (often called, in jargon, a FHiCL file) and how to run a job using one.

It is assumed that you have previously read the Quick-start guide to using and developing LArSoft code and the information on Using LArSoft on the GPVM nodes.

The Job Configuration

Once a base release is set up, it is easy to run a job. The basic unit for running a job is the job-control script, written in the FHiCL language. The FHiCL language provides a simple mechanism for including parameter set configurations from other files, so that many job configuration files can share the same configuration for a module or service.

There is also a nice FHiCL quick start guide available for more details.

Key Concepts in FHiCL

There are a few key concepts in writing a FHiCL job control script. In order, they are:

  1. Including previously defined configurations for services and modules from other files. This is done using #include statements. Be sure you don't have any trailing space or tab characters on the #include line.
  2. The services block, denoted by services: { }. This block contains the configurations for art-specific services such as the TFileService and the RandomNumberGenerator. It also contains the configuration of LArSoft-specific services [1].
  3. The source block, denoted by source: { }. This block tells the job what kind of source to expect (EmptyEvent in the case of Monte Carlo generation, RootInput for anything downstream of a Monte Carlo generator or of reconstruction, and specific modules for data from the detector), the file name of the input source if appropriate, and how many events to process. Both the file name and the number of events to process can be specified on the command line.
  4. The outputs block, denoted by outputs: { }. This block tells the job what kind of output to make, i.e. RootOutput, and what the name of the output file should be. The output file name can be specified on the command line. It is possible to define more than one output file if one wants to run a job that produces different output files based on filter criteria, e.g. empty events are put in one file and events with neutrinos in them are put in another. Multiple output files can only be specified in the job configuration file, not from the command line.
  5. The physics block, denoted by physics: { }. This block is where all producer, analyzer, and filter modules are configured. The sequences of producer and filter modules to run are defined in user-named paths in this block. The list of analyzers and output modules to run is defined in a separate user-named path. The block also defines two keyword parameters, trigger_paths and end_paths: trigger_paths contains all the producer and filter paths to run, and end_paths contains the analyzer and output paths. A minimal skeleton combining these blocks is sketched below.
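Putting these pieces together, the overall structure of a job configuration looks like the following minimal sketch (the included file, the some_module_config configuration and the mymod label are placeholders, not real LArSoft names):

# "somemodules.fcl" is a hypothetical file defining some_module_config in its PROLOG
#include "somemodules.fcl"

process_name: MyJob          # no underscores allowed here

services:
{
  TFileService: { fileName: "hist.root" }   # art service managing histogram/ntuple files
}

source:
{
  module_type: EmptyEvent    # or RootInput to read an existing art/ROOT file
  maxEvents:   10
}

physics:
{
  producers:
  {
    mymod: @local::some_module_config   # placeholder configuration from the included file
  }

  mypath:  [ mymod ]         # user-named path of producer (and filter) modules
  stream1: [ out1 ]          # user-named path of analyzer and output modules

  trigger_paths: [ mypath ]
  end_paths:     [ stream1 ]
}

outputs:
{
  out1:
  {
    module_type: RootOutput
    fileName:    "output.root"
  }
}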

Comments may be included in FHiCL configuration files using the "#" character. #include is a keyword, so the parser knows not to ignore what comes after "#include" [2].

[1] In old FHiCL files you will notice that the LArSoft (and, in general, non-art) service configuration is enclosed in a user block; that block is now deprecated.

[2] Note that the FHiCL parser cannot recognize a comment on the same line as a #include directive.

FHiCL Rules

There are a few rules to keep in mind about FHiCL:
  • The value of the process_name parameter may not contain underscores, as the process name is used in the ROOT file branch name. Module labels may not contain underscores either, for the same reason.
  • Parameter set names may not contain numbers, periods, backslashes, stars, etc. They may contain underscores.
  • Put the values for all string parameters in double quotes, "..."
  • Specify input vectors using [ , , ]; e.g., if you want a vector of doubles, do MyVector: [1.0, 3e-9, -900.]
  • You pick out configurations from the PROLOG section(s), usually defined in the #include files, using the @local:: syntax. The value after the "::" is the name of the configuration specified in the PROLOG (see the next bullet).
  • You can override the value of an included configuration. For example, imagine there is a configuration specified in an included file, called mymoduleconfig, which contains the value -5 for the parameter named myint. One can load the configuration and then change the value of myint by doing the following:
    1. inside the producers block:
      physics: {
        producers: {
          # ...
          mymod: @local::mymoduleconfig
        }
      }
      
    2. outside the physics block:
      physics.producers.mymod.myint: 1
      

      The last value for a parameter always wins. If the second line was repeated with the value 2 instead of 1, the job would run with myint as 2.
      Also note that in the example the original content of mymoduleconfig is not changed when the content of mymod is.

Example configuration file: detsimmodules.fcl

An example of a file with predefined configurations for modules is in the larsim/DetSim/detsimmodules.fcl file.
All the definitions are inside a PROLOG block (delimited by BEGIN_PROLOG and END_PROLOG).
The following is an excerpt taken from LArSoft v05_00_00 (note, though, that experiments now use their own experiment-specific detsimmodule_Xxxx.fcl configuration files):

BEGIN_PROLOG

standard_simwire:
{
  module_type:        "SimWire" 
  DriftEModuleLabel:  "largeant" 
  NoiseFact:           0.0132      # Noise Scale
  NoiseWidth:         62.4         # Exponential Noise width (kHz)
  LowCutoff:           7.5         # Low frequency filter cutoff (kHz)
  FieldBins:          75
  Col3DCorrection:    2.5
  Ind3DCorrection:    1.5
  ColFieldRespAmp:    0.0354
  IndFieldRespAmp:    0.018
  ShapeTimeConst:     [ 3000., 900. ]
  CompressionType:    "none" 
}

argoneut_simwireana: @local::standard_simwireana

###
### ... and more configurations ...
###

END_PROLOG
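A job configuration that includes this file can then pick up one of the predefined blocks with the @local:: syntax. As a sketch (the daq label matches the one used in the prodgenie.fcl example below; the NoiseFact override value is arbitrary):

#include "detsimmodules.fcl"

physics:
{
  producers:
  {
    daq: @local::standard_simwire    # use the predefined SimWire configuration
  }
}

# individual parameters can still be overridden afterwards, outside the block:
physics.producers.daq.NoiseFact: 0.02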

Configuring the message service

Several standard configurations for the message service are in lardata/Utilities/messageservice.fcl. There is one configuration for each level of message output: Debug, Info, Warning, and Error. Each configuration applies to the specified message level and to those of higher priority. For example, the Info configuration will print out Info, Warning and Error level messages, while the Warning configuration only prints out Warning and Error level messages. The "standard" debug configurations cause the messages to go to a specified output file, debug.log. The Error configuration redirects to the standard error stream (as std::cerr), while the others print to the standard output (std::cout). All of them impose some limits on the repetition of the most frequent messages.
Remember that to use one of these "standard" configurations you need to include it in your FCL file: they are standard, not default.
If you want to define your own configuration, please take a look at the comments in the lardata/Utilities/messageservice.fcl file to determine how to do so.
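For example, here is a minimal sketch of using one of these standard configurations (assuming the Info-level one is named standard_info in messageservice.fcl, in analogy with the standard_warning configuration used in the reconstruction example later on):

#include "messageservice.fcl"

services:
{
  # print Info, Warning and Error level messages to the standard output
  message: @local::standard_info
}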

Examples of the usual use of the message service configurations are included in the example files below.

To get a different level of output from just one module (say, DBSCAN), one would do:

services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "reco_hist.root" }
  Timing:       {}
  RandomNumberGenerator: {} #ART native random number generator

  # configure the message service with INFO level for DBSCAN
  # and WARNING level for everything else
  message: { 
    destinations: {  
      infomsg: {
        type: "cout" 
        threshold: "INFO" 
        append: true
        categories: {
           DBSCAN: {
             reportEvery: 1
           }
        }
      }
      warningmsg: {
        type:      "cout"    
        threshold: "WARNING" 
        append:    true        
        categories: {
          default: {
            limit:       1000   
            timespan:    60    
          }
        } # end categories
      } # end warningmsg
    } # end destinations
  } # end message

  user:         @local::argoneut_services          
}

Debug messages

Note that debugging messages are treated in a special way:
  1. there are two ways to print a debug message in the code: by using mf::LogDebug, and by using the LOG_DEBUG macro; messages using the latter will not be present when a non-debug qualifier (prof or opt) is used. In fact, no code is generated at all from a LOG_DEBUG call unless a debug-qualified build is used
  2. debug output from modules is selectively enabled by the debugModules list; to enable all the messages, use debugModules: [ "*" ]

Also remember that, depending on the configuration, the debug output might not be shown on screen, but only stored in a file (e.g. debug.log); a sketch of such a configuration follows.
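As a sketch, a configuration that enables debug output from all modules and sends it to a debug.log file might look like the following (parameter names are those of the art message facility; the "standard" Debug configuration in messageservice.fcl provides something similar):

services:
{
  message:
  {
    debugModules: [ "*" ]          # allow debug output from every module
    destinations:
    {
      debugmsg:
      {
        type:      "file"
        filename:  "debug.log"     # debug messages go here, not on screen
        threshold: "DEBUG"
        append:    false
      }
    }
  }
}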

Example job script: prodgenie.fcl

An example job script to produce Monte Carlo events is larsim/EventGenerator/GENIE/prodgenie.fcl. The job defined by this script will generate neutrino interactions using GENIE, run them through Geant4, perform the electron transport and then simulate the electronics.

Comments on the form of the file are included as ###### Comment ######

###### This is how to include configurations from other files ######
#include "services.fcl" 
#include "genie.fcl" 
#include "largeantmodules.fcl" 
#include "detsimmodules.fcl" 

###### give the process a name ######
process_name: GenieGen

###### Please note the convention of defining detector specific configurations                ######
###### Pick out the configurations from the #include files using the @local:: syntax          ######
###### for services from LArSoft, in the user{} block - see definitions for configurations in ######
###### job/geometry.fcl                                                                       ######
###### job/services.fcl                                                                       ######
###### job/simulationservices.fcl                                                             ######

services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "genie_hist.root" }
  Timing:       {}
  SimpleMemoryCheck:     { ignoreTotal: 1 } # default is one
  RandomNumberGenerator: {} #ART native random number generator
  user:         @local::argoneut_simulation_services
}

###### source is where you get events from - can also be RootInput ######
#Start each new event with an empty event.
source:
{
  module_type: EmptyEvent
  maxEvents:   10          # Number of events to create
  firstRun:    1           # Run number to use for this file
  firstEvent:  1           # number of first event in the file
}

###### physics is the block that controls configuration of modules ######
# Define and configure some modules to do work on each event.
# First modules are defined; they are scheduled later.
# Modules are grouped by type.
physics:
{

 ###### the module labels in the output file will be generator, largeant, and daq ######
 ###### their configuration is taken from ArgoNeuT defaults                       ######
 producers:
 {
   generator: @local::argoneut_genie_simple_neutrino
   largeant:  @local::argoneut_largeant           
   daq:       @local::argoneut_simwire         
   rns:       { module_type: "RandomNumberSaver" }
 }

 #define the producer and filter modules for this path, order matters, 
 #filters reject all following items.  see lines starting physics.producers below
 simulate: [ rns, generator, largeant, daq ] 

 #define the output stream, there could be more than one if using filters 
 stream1:  [ out1 ]

 #trigger_paths is a keyword and contains the paths that modify the art::event, 
 #ie filters and producers
 trigger_paths: [simulate] 

 #end_paths is a keyword and contains the paths that do not modify the art::Event, 
 #ie analyzers and output streams.  these all run simultaneously
 end_paths:     [stream1]  
}

#block to define where the output goes.  if you defined a filter in the physics
#block and put it in the trigger_paths then you need to put a SelectEvents: {SelectEvents: [XXX]}
#entry in the output stream you want those to go to, where XXX is the label of the filter module(s)
outputs:
{
 out1:
 {
   module_type: RootOutput
   fileName:    "genie_gen.root" #default file name, can override from command line with -o or --output
 }
}

Notice that you have not specified anywhere which libraries to load. That is because the build system links each plugin shared library (.so) against the libraries it depends upon.

Example job script: standard_reco.fcl

There is an example reconstruction job script available for people to use, lardata/Utilities/standard_reco.fcl.
This script takes as input either raw data or Monte Carlo events containing simulated raw digits, and performs a list of reconstruction tasks.
The version from LArSoft v02_03_01 is copied here:

#include "services.fcl" 
#include "caldata.fcl" 
#include "hitfindermodules.fcl" 
#include "clustermodules.fcl" 
#include "trackfindermodules.fcl" 

process_name: Reco

services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "reco_hist.root" }
  Timing:       {}
  RandomNumberGenerator: {} #ART native random number generator
  message:      @local::standard_warning
  user:         @local::argoneut_services          
}

#source is now a root file
source:
{
  module_type: RootInput
  maxEvents:  10        # Number of events to create
}

# Define and configure some modules to do work on each event.
# First modules are defined; they are scheduled later.
# Modules are grouped by type.
physics:
{

 producers:
 {
  caldata:    @local::argoneut_calwire        
  ffthit:     @local::argoneut_hitfinder # note if job is for MC, use argoneut_mc_hitfinder
  dbcluster:  @local::argoneut_dbcluster        
  hough:      @local::argoneut_houghlinefinder
  linemerger: @local::argoneut_linemerger        
  track:      @local::argoneut_track        
  harris:     @local::argoneut_endpointmod    
 }

 analyzers:
 {
  dbclusterana: @local::argoneut_dbclusterana
 }

 #define the producer and filter modules for this path, order matters, 
 #filters reject all following items.  see lines starting physics.producers below
 reco: [ caldata, ffthit, dbcluster, hough, linemerger, track, harris ] 
 ana:  [ dbclusterana ]

 #define the output stream, there could be more than one if using filters 
 stream1:  [ out1 ]

 #trigger_paths is a keyword and contains the paths that modify the art::event, 
 #ie filters and producers
 trigger_paths: [reco] 

 #end_paths is a keyword and contains the paths that do not modify the art::Event, 
 #ie analyzers and output streams.  these all run simultaneously
 end_paths:     [ ana, stream1]  
}

#block to define where the output goes.  if you defined a filter in the physics
#block and put it in the trigger_paths then you need to put a SelectEvents: {SelectEvents: [XXX]}
#entry in the output stream you want those to go to, where XXX is the label of the filter module(s)
outputs:
{
 out1:
 {
   module_type: RootOutput
   fileName:    "standard_reco.root" #default file name, can override from command line with -o or --output
 }
}

How to override a default parameter

If you want to override a default parameter that has been included from a predefined parameter set, you must specify the parameter and its new value as

mainBlock.subBlock.label.parameterName: newValue

where

  • mainBlock can be services or physics
  • subBlock can be producers, filters, or analyzers; it is skipped in the case of services (and was user in the old service configuration)
  • label is the name of the desired service or module instance in a producers, filters, or analyzers block
  • parameterName is the name of the desired parameter
  • newValue is the desired new value; a list or entire block (that is, a brace-enclosed sequence of key: value pairs) can be specified

These lines must go after the mainBlock and be outside of any other block.

For example, if one wanted to change the default value of the HitsModuleLabel parameter in the dbcluster module instance in the previous section, one would put

physics.producers.dbcluster.HitsModuleLabel: "differentHitModuleLabel" 

Note that FHiCL allows a list to be replaced completely, but not to be modified in place (i.e., it is not possible to add an element to an existing list).
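For instance, to change the ShapeTimeConst list of the daq module from the prodgenie.fcl example (the values here are arbitrary), the whole list has to be given again:

physics.producers.daq.ShapeTimeConst: [ 2500., 800. ]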

This behavior is general: FHiCL only allows elements to be replaced. For example, if one wants to change both the eps and epstwo parameters,

physics.producers.dbcluster.DBScanAlg: { # WRONG!!!
  eps:    1.1
  epstwo: 1.6
}
will not work, because it replaces the whole physics.producers.dbcluster.DBScanAlg parameter set, thus removing e.g. the parameter minPts.
The correct way to change the parameters is to think of it as replacing them, and to point exactly at each one:
physics.producers.dbcluster.DBScanAlg.eps:    1.1
physics.producers.dbcluster.DBScanAlg.epstwo: 1.6
will leave the other parameters (e.g., minPts) unchanged.

fhicl Emacs syntax highlighting

If you use Emacs as your editor, you can put the following into your .emacs file in your home directory to have it display .fcl files with syntax highlighting:

(setq fclKeywords
'(
  ;; This, due to poor language design, conflicts with comments and fails
  ("#include" . font-lock-keyword-face)
  ("@local" . font-lock-keyword-face)
  ;; All these names are magic, I think
  ("process_name:\\|services:\\|source:\\|outputs:\\|physics:\\|producers:\\|filters:\\|analyzers:" . font-lock-builtin-face)
  ("true\\|false" . font-lock-builtin-face)
  ;; Variable definitions are followed by colons
  ("[a-zA-Z0-9_]*:" . font-lock-variable-name-face)
 )
)

;; Python mode gets us comment handling and indentation at colons

(define-derived-mode fcl-mode python-mode
 (setq mode-name "FHICL")
 (setq font-lock-defaults '(fclKeywords))
 ;; (setq tab-width 2) ;; Doesn't seem to work
)

(add-to-list 'auto-mode-alist '("\\.fcl\\'" . fcl-mode))

Please use only spaces (no tabs) for alignment: it is not a perfect solution, but it ensures that people using different editors see the same code alignment.

Executable and command line options

Currently there is one executable to run in LArSoft: lar, which runs a typical reconstruction or analysis job and is placed in the user's path by the setup script. To see what options are available, do

$ lar -h

The output for art version v1_17_05 is:
Usage: lar <-c <config-file>> <other-options> [<source-file>]+

Allowed options:
  -c [ --config ] arg                  Configuration file.
  -h [ --help ]                        produce help message
  --process-name arg                   art process name.
  --print-available arg                List all available plugins with the
                                       provided suffix.  Choose from:
                                           'module'
                                           'plugin'
                                           'service'
                                           'source'
  --print-available-modules            List all available modules that can be
                                       invoked in a FHiCL file.
  --print-available-services           List all available services that can be
                                       invoked in a FHiCL file.
  --print-description arg              Print description of specified module,
                                       service, source, or other plugin
                                       (multiple OK).

  -s [ --source ] arg                  Source data file (multiple OK);
                                       precludes -S.
  -S [ --source-list ] arg             file containing a list of source files
                                       to read, one per line; precludes -s.
  -e [ --estart ] arg                  Event # of first event to process.
  -n [ --nevts ] arg                   Number of events to process.
  --nskip arg                          Number of events to skip.
  -T [ --TFileName ] arg               File name for TFileService.
  --tmpdir arg                         Temporary directory for in-progress
                                       output files (defaults to directory of
                                       specified output file names).
  -o [ --output ] arg                  Event output stream file (optionally
                                       specify stream with stream-label:fileNam
                                       e in which case multiples are OK).
  --no-output                          Disable all output streams.
  --trace                              Activate tracing.
  --notrace                            Deactivate tracing.
  --memcheck                           Activate monitoring of memory use.
  --memcheck-db arg                    Output memory use data to SQLite3
                                       database with name <db-file>.
  --nomemcheck                         Deactivate monitoring of memory use.
  --default-exceptions                 Some exceptions may be handled
                                       differently by default (e.g.
                                       ProductNotFound).
  --rethrow-default                    All exceptions default to rethrow.
  --rethrow-all                        All exceptions overridden to rethrow (cf
                                       rethrow-default).
  --errorOnFailureToPut [=arg(=true)]  Global flag that controls the behavior
                                       upon failure to 'put' a product
                                       (declared by 'produces') onto the Event.
                                         If 'true', per-module flags can
                                       override the value of the global flag.
  --errorOnSIGINT [=arg(=true)]        If 'true', a signal received from the
                                       user yields an art return code
                                       corresponding to an error; otherwise
                                       return 0.
  --debug-config arg                   Output post-processed configuration to
                                       <file> and exit. Equivalent to env
                                       ART_DEBUG_CONFIG=<file> lar ...
  --config-out arg                     Output post-processed configuration to
                                       <file> and continue with job.
  --annotate                           Include configuration parameter source
                                       information.
  --prefix-annotate                    Include configuration parameter source
                                       information on line preceding parameter
                                       declaration.
  --sam-web-uri arg                    URI for SAM web service.
  --sam-process-id arg                 SAM process ID.
  --sam-application-family arg         SAM application family.
  --sam-app-family arg                 SAM application family.
  --sam-application-version arg        SAM application version.
  --sam-app-version arg                SAM application version.
  --sam-group arg                      SAM group.
  --sam-file-type arg                  File type for SAM metadata.
  --sam-data-tier arg                  SAM data tier (<spec-label>:<tier-spec>).
  --sam-run-type arg                   Global run-type for SAM metadata.
  --sam-stream-name arg                SAM stream name (<module-label>:<stream-
                                       name>).

Art has completed and will exit with status 1.

Running a Job

To run the job defined by the script above, do

$ lar -c job/prodgenie.fcl

One can stop a job in two ways:
  1. press <Ctrl>+<C> once: the job will complete at the end of the current module. If the job is running in the background, send it the equivalent signal with kill -INT %jobID on the command line.
  2. press <Ctrl>+<C> twice: the job should stop immediately and, depending on the shell settings, might produce a core dump.

If you want your job to keep running even if you get disconnected from a remote session, depending on your shell you might need to start it with:

$ nohup lar -c job/prodgenie.fcl >& pg.out &

To stop such a job, do

$ pgrep lar # to find the job ID (use ps to disambiguate if more than one lar process is running)
$ kill -INT jobID

One can print out the full configuration of the job, without actually running it, by typing:

$ ART_DEBUG_CONFIG=1 lar -c prodgenie.fcl

in bash, or
> env ART_DEBUG_CONFIG=1 lar -c prodgenie.fcl

in C-shell, which produces the output
** ART_DEBUG_CONFIG is defined: config debug output follows **
all_modules: [ "out1" 
             , "daq" 
             , "generator" 
             , "largeant" 
             , "rns" 
             ]
outputs: { out1: { fileName: "genie_gen.root" 
                   module_label: "out1" 
                   module_type: "RootOutput" 
                 }
         }
physics: { end_paths: [ "stream1" ]
           producers: { daq: { Col3DCorrection: 2.5
                               ColFieldRespAmp: 3.54e-2
                               CompressionType: "none" 
                               DriftEModuleLabel: "largeant" 
                               FieldBins: 75
                               Ind3DCorrection: 1.5
                               IndFieldRespAmp: 1.8e-2
                               LowCutoff: 7.5
                               NoiseFact: 1.32e-1
                               NoiseWidth: 6.24e1
                               ResponseFile: "shape-argo.root" 
                               ShapeTimeConst: [ 3000
                                               , 900
                                               ]
                               module_label: "daq" 
                               module_type: "SimWireT962" 
                             }
                        generator: { BeamCenter: [ -1400
                                                 , -350
                                                 , 0
                                                 ]
[...]

This functionality is particularly helpful when trying to debug what input parameters were passed to the job.

Why did my job fail?

In case of failure, art usually provides some information about what went wrong.

If a job fails, look at the warnings printed on the screen and in any output log files.

Also check the breaking changes page for information about problems (and solutions) caused by changes to the code.

You can ask for help on the mailing list, or open an issue in the appropriate tracker (usually the LArSoft one).
The usual bug report rules apply: include at least
  • the full command line of the failing command
  • a full path to the input files
  • which LArSoft release you are using
  • the complete error message.

If a bug is reported to the mailing list, attach the complete output of the job to the email.

Submitting Jobs to the compute farms

We have to find out where the page with the submission instructions went (TODO!).