
{{toc}}

Under construction... NOT YET A SOURCE OF INFORMATION!!

h1. Running Jobs

This page describes the job configuration script and how to run a job using one.

It is assumed that you have previously read the [[ Quick-start guide to using and developing LArSoft code ]] and the information on [[Using LArSoft on the GPVM nodes]].

h2. The Job Configuration Script

Once a base and test release are set up, it is easy to run a job. The basic unit for running a job is the job-control script, written in the FHICL language. FHICL provides a simple mechanism for including parameter set configurations from different files, so that many job configuration files can share the same configuration for a module or service.

There is also a nice "FHICL quick start guide":https://cdcvs.fnal.gov/redmine/attachments/16021/quick_start.pdf available for more details.

h3. Key Concepts in FHICL

There are a few key concepts to writing a FHICL job control script. In order, they are (a minimal skeleton combining all of them follows this list):

# Including previously defined configurations for services and modules from other files. This is done using @#include@ statements. *Be sure you don't have any trailing space or tab characters* on the @#include@ line.
# Services block, denoted by @services: { }@. This block contains configurations for ART-specific services such as the @TFileService@ and the @RandomNumberGenerator@. It also contains a @user: {}@ sub-block where LArSoft-specific services are configured.
# Source block, denoted by @source: { }@. This block tells the job what kind of source to expect (@EmptyEvent@ in the case of Monte Carlo generation, @RootInput@ in the case of anything downstream of a Monte Carlo generator or reconstruction), the file name for the input source if appropriate, and how many events to process. Both the file name and number of events to process can be specified on the command line.
# Outputs block, denoted by @outputs: { }@. This block tells the job what kind of output to make, i.e. @RootOutput@, and what the name of the output file should be. The output file name can be specified on the command line. It is possible to define more than one output file if one wants to run a job that produces different output files based on filter criteria - i.e. empty events are put in one file and events with neutrinos in them are put in another. Multiple output files can only be specified in the job configuration file, not on the command line.
# Physics block, denoted by @physics: { }@. This block is where all producer, analyzer, and filter modules are configured. The sequence of producer and filter modules to run is defined in a user-named path in this block. The list of analyzers to run is defined in a separate user-named path. The block also defines two keyword parameters, @trigger_paths@ and @end_paths@. @trigger_paths@ contains all producer and filter paths to run, and @end_paths@ contains the analyzer paths and output streams.
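Putting these pieces together, a skeleton job script looks like the following (a sketch only: the process name, module configurations, and file names are placeholders, not real LArSoft configurations):

<pre>
#include "services.fcl"

process_name: MySkeletonJob

services:
{
  TFileService: { fileName: "myjob_hist.root" }
  user: {} # LArSoft specific services would be configured here
}

source:
{
  module_type: RootInput
  maxEvents: 10
}

physics:
{
  producers: { mymod: { module_type: "MyProducer" } }
  analyzers: { myana: { module_type: "MyAnalyzer" } }

  reco: [ mymod ]    # user-named path of producers/filters
  anapath: [ myana ] # user-named path of analyzers
  stream1: [ out1 ]  # output stream

  trigger_paths: [ reco ]
  end_paths: [ anapath, stream1 ]
}

outputs:
{
  out1:
  {
    module_type: RootOutput
    fileName: "myjob.root"
  }
}
</pre>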

Comments may be included in FHICL configuration files using the "#" character. @#include@ is a keyword, so the parser knows not to treat what follows "#include" as a comment.
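For instance (a trivial illustration; the included file name is arbitrary):

<pre>
# This entire line is a comment and is ignored by the parser.
#include "services.fcl"
</pre>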

h3. FHICL Rules

There are a few rules to keep in mind about FHICL:
* The value of the process_name parameter may not contain underscores as the process name is used in the ROOT file branch name. Module labels may not contain underscores either, for the same reason.
* Parameter set names may not contain numbers, periods, backslashes, stars, etc. They may contain underscores.
* Put the values for all string parameters in double quotes, @"..."@
* Specify input vectors using @[ , , ]@, e.g. if you want a vector of doubles do @MyVector: [1.0, 3e-9, -900.]@
* You pick out configurations from the @PROLOG@ section(s), usually defined in the @#include@ files, using the @local::@ syntax. The value after the "::" is the name of the configuration specified in the @PROLOG@ (see the next bullet)
* You can override the value of an included configuration. For example, imagine there is a configuration specified in an included file called @mymoduleconfig@ that contains the value @-5@ for the parameter named @myint@. One can load the configuration and then change the value of @myint@ by doing the following:
*# inside the producers block:
<pre>
physics: {
  producers: {
    # ...
    mymod: @local::mymoduleconfig
  }
}
</pre>
*# outside the physics block:
<pre>
physics.producers.mymod.myint: 1
</pre>
The last value given for a parameter always wins. If the override line above were repeated with the value @2@ instead of @1@, the job would run with @myint@ set to @2@.
Also note that in this example the original content of @mymoduleconfig@ is not changed when the content of @mymod@ is.
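To make the "last one wins" rule concrete (using the hypothetical @mymod@ and @myint@ from above):

<pre>
physics.producers.mymod.myint: 1
physics.producers.mymod.myint: 2 # this line wins: the job runs with myint as 2
</pre>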

h3. Configuring the [[Using_the_Framework#Message-Facility-and-MessageLogger|message service]]

Several standard configurations for the message service are in "lardata/Utilities/messageservice.fcl":https://cdcvs.fnal.gov/redmine/projects/lardata/repository/revisions/develop/entry/Utilities/messageservice.fcl. There is one configuration for each level of message output - Debug, Info, Warning, and Error. Each configuration is applied to messages of its level and those of higher priority. For example, the Info configuration prints out Info, Warning, and Error level messages, while the Warning configuration prints out only Warning and Error level messages. The "standard" Debug configuration sends the messages to a specified output file, @debug.log@. The Error configuration redirects to the standard error stream (like @std::cerr@), while the others print to the standard output (@std::cout@). All impose some limits on the repetition of some frequent messages.
Note that to use one of these "standard" configurations, it must be included in the FCL file that uses it: they are standard, not default.
If you want to define your own configuration, take a look at the comments in the "lardata/Utilities/messageservice.fcl":https://cdcvs.fnal.gov/redmine/projects/lardata/repository/revisions/develop/entry/Utilities/messageservice.fcl file to see how to do so.
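For example, to adopt the standard warning-level configuration in a job script (a minimal sketch; it assumes an include along the lines of @#include "messageservice.fcl"@ makes @standard_warning@ available, as in the reconstruction example below):

<pre>
#include "messageservice.fcl"

services:
{
  # print only Warning and Error level messages
  message: @local::standard_warning
}
</pre>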

Examples of the usual use of the message service configurations are in the example files below.

To get a different level of output from just one module (say @DBSCAN@) one would do:

<pre>
services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "reco_hist.root" }
  Timing: {}
  RandomNumberGenerator: {} # ART native random number generator

  # configure the message service with the INFO level for DBSCAN
  # and WARNING level for everything else
  message: {
    destinations: {
      infomsg: {
        type: "cout"
        threshold: "INFO"
        append: true
        categories: {
          DBSCAN: {
            reportEvery: 1
          }
        }
      }
      warningmsg: {
        type: "cout"
        threshold: "WARNING"
        append: true
        categories: {
          default: {
            limit: 1000
            timespan: 60
          }
        } # end categories
      } # end warningmsg
    } # end destinations
  } # end message

  user: @local::argoneut_services
}
</pre>

h3. Example job script: @prodgenie.fcl@

An example job script to produce Monte Carlo events is "larsim/EventGenerator/GENIE/prodgenie.fcl":https://cdcvs.fnal.gov/redmine/projects/larsim/repository/revisions/develop/entry/EventGenerator/GENIE/prodgenie.fcl. The job defined by this script will generate neutrino interactions using GENIE, run them through Geant4, do the electron transport and then simulate the electronics.

Comments on the form of the file are included as ###### Comment ######

<pre>
###### This is how to include configurations from other files ######
#include "services.fcl"
#include "genie.fcl"
#include "largeantmodules.fcl"
#include "detsimmodules.fcl"


###### give the process a name ######
process_name: GenieGen

###### Please note the convention of defining detector specific configurations ######
###### Pick out the configurations from the #include files using the @local:: syntax ######
###### for services from LArSoft, in the user{} block - see definitions for configurations in ######

###### geometry.fcl ######
###### services.fcl ######
###### simulationservices.fcl ######

services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "genie_hist.root" }
  # Timing records the time spent in each module for each event
  Timing: {}
  SimpleMemoryCheck: { ignoreTotal: 1 } # default is one message
  # configure the message service
  message: @local::standard_warning
  # ART native random number generator
  RandomNumberGenerator: {}
  # LArSoft specific services
  user: @local::argoneut_simulation_services
}



###### source is where you get events from - can also be RootInput ######
# Start each new event with an empty event.
source:
{
  module_type: EmptyEvent
  maxEvents: 10 # Number of events to create
  firstRun: 1   # Run number to use for this file
  firstEvent: 1 # number of first event in the file
}

###### physics is the block that controls configuration of modules ######
# Define and configure some modules to do work on each event.
# First modules are defined; they are scheduled later.
# Modules are grouped by type.
physics:
{

  ###### the module labels in the output file will be generator, largeant, daq and rns ######
  ###### their configuration is taken from ArgoNeuT defaults ######
  producers:
  {
    generator: @local::argoneut_genie_simple_neutrino
    largeant: @local::argoneut_largeant
    daq: @local::argoneut_simwire
    rns: { module_type: "RandomNumberSaver" }
  }

  analyzers:
  {
    detsimana: @local::argoneut_simwireana
  }

  # define the producer and filter modules for this path. order matters:
  # filters reject all following items. see lines starting physics.producers below
  simulate: [ rns, generator, largeant, daq ]

  # define a path for any analyzers to use
  anapath: [ detsimana ]

  # define the output stream; there could be more than one if using filters
  stream1: [ out1 ]

  # trigger_paths is a keyword and contains the paths that modify the art::Event,
  # ie filters and producers
  trigger_paths: [ simulate ]

  # end_paths is a keyword and contains the paths that do not modify the art::Event,
  # ie analyzers and output streams. these all run simultaneously
  end_paths: [ anapath, stream1 ]
}

# block to define where the output goes. if you defined a filter in the physics
# block and put it in the trigger_paths then you need to put a SelectEvents: {SelectEvents: [XXX]}
# entry in the output stream you want those to go to, where XXX is the label of the filter module(s)
outputs:
{
  out1:
  {
    module_type: RootOutput
    fileName: "genie_gen.root" # default file name, can override from command line with -o or --output
  }
}

</pre>
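To illustrate the @SelectEvents@ comment in the outputs block above: an output stream that keeps only the events passing a filter path might be configured like this (a sketch; @myfilterpath@ stands for a hypothetical path of filter modules listed in @trigger_paths@):

<pre>
outputs:
{
  out1:
  {
    module_type: RootOutput
    fileName: "filtered_events.root"
    SelectEvents: { SelectEvents: [ myfilterpath ] }
  }
}
</pre>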

Notice that you have not specified anywhere which libraries to load. That is because the SRT build system compiles the plugin shared library files (@.so@) against the ones they depend upon.



h3. Example job script: @standard_reco.fcl@

There is an example reconstruction job script available for people to use, source:trunk/Utilities/standard_reco.fcl. This script takes as input either raw data or MC that has produced simulated raw digits, and performs a list of reconstruction tasks.

<pre>
#include "job/services.fcl"
#include "job/caldata.fcl"
#include "job/fftfinder.fcl"
#include "job/clustermodules.fcl"
#include "job/trackfindermodules.fcl"
#include "job/vertexfindermodules.fcl"

process_name: Reco

services:
{
  # Load the service that manages root files for histograms.
  TFileService: { fileName: "reco_hist.root" }
  scheduler: { wantTracer: true wantSummary: true }
  Timing: {}
  RandomNumberGenerator: {} # ART native random number generator
  message: @local::standard_warning
  user: @local::argoneut_services
}

# source is now a root file
source:
{
  module_type: RootInput
  maxEvents: 10 # Number of events to process
}

# Define and configure some modules to do work on each event.
# First modules are defined; they are scheduled later.
# Modules are grouped by type.
physics:
{

  producers:
  {
    caldata: @local::argoneut_calwire
    ffthit: @local::argoneut_hitfinder
    cluster: @local::argoneut_dbcluster
    hough: @local::argoneut_houghlinefinder
    linemerger: @local::argoneut_linemerger
    track: @local::argoneut_track
    harris: @local::argoneut_harris
  }

  # define the producer and filter modules for this path. order matters:
  # filters reject all following items. see lines starting physics.producers below
  reco: [ caldata, ffthit, cluster, hough, linemerger, track, harris ]

  # define the output stream; there could be more than one if using filters
  stream1: [ out1 ]

  # trigger_paths is a keyword and contains the paths that modify the art::Event,
  # ie filters and producers
  trigger_paths: [ reco ]

  # end_paths is a keyword and contains the paths that do not modify the art::Event,
  # ie analyzers and output streams. these all run simultaneously
  end_paths: [ stream1 ]
}

# block to define where the output goes. if you defined a filter in the physics
# block and put it in the trigger_paths then you need to put a SelectEvents: {SelectEvents: [XXX]}
# entry in the output stream you want those to go to, where XXX is the label of the filter module(s)
outputs:
{
  out1:
  {
    module_type: RootOutput
    fileName: "standard_reco.root" # default file name, can override from command line with -o or --output
  }
}

</pre>

h3. How to override a default parameter

If you want to override a default parameter that has been included from a predefined parameter set, you must specify the parameter and its new value as

<pre><code class="c">
mainBlock.subBlock.label.parameterName: newValue
</code></pre>

where

* @mainBlock@ can be @services@ or @physics@
* @subBlock@ can be @user@, @producers@, @filters@, or @analyzers@
* @label@ is the name of the desired service or module in a @producers@, @filters@, or @analyzers@ block
* @parameterName@ is the name of the desired parameter
* @newValue@ is the desired new value

These lines must go after the mainBlock and be outside of any other mainBlocks.

For example, if one wanted to change the default value of the @fhitsModuleLabel@ parameter of the @DBcluster@ module (labeled @cluster@ in the previous section), one would put

<pre><code class="c">
physics.producers.cluster.fhitsModuleLabel: "differentHitModuleLabel"
</code></pre>
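The same pattern works for services configured in the @user@ block. For example, to change the @FitBins@ parameter of the @LArFFT@ service shown in the configuration dump below (the value @30@ is arbitrary, for illustration only):

<pre><code class="c">
services.user.LArFFT.FitBins: 30
</code></pre>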

h3. Example configuration file: geometry.fcl

An example of a file with predefined configurations for a service is in the source:trunk/Geometry/geometry.fcl file:

<pre>
###### All files that are parameter set definitions must contain BEGIN_PROLOG as their first line ######
###### This tag tells the FHICL parser that parameter set definitions are coming ######
BEGIN_PROLOG

###### The argoneut geometry definition ######
argoneut_geo:
{
  SurfaceY: 130.0e2 # in cm, vertical distance to the surface
  Name: "argoneut"
  GDML: "Geometry/gdml/argoneut.gdml"
  ROOT: "Geometry/gdml/argoneut.root"
}

###### The microboone geometry definition ######
microboone_geo:
{
  SurfaceY: 2.0e2 # in cm, vertical distance to the surface
  Name: "microboone"
  GDML: "Geometry/gdml/microboone.gdml"
  ROOT: "Geometry/gdml/microboone.root"
}

###### The two lbne geometry definitions ######
lbne10kt_geo:
{
  SurfaceY: 0.0e2 # in cm, vertical distance to the surface
  Name: "lbne10kT"
  GDML: "Geometry/gdml/lbne10kT.gdml"
  ROOT: "Geometry/gdml/lbne10kT.root"
  DisableWiresInG4: true
}

lbne35t_geo:
{
  SurfaceY: 0.0e2 # in cm, vertical distance to the surface
  Name: "lbne35t"
  GDML: "Geometry/gdml/lbne35t.gdml"
  ROOT: "Geometry/gdml/lbne35t.root"
  DisableWiresInG4: true
}

###### All files that are parameter set definitions must contain END_PROLOG as their last line ######
###### This tag tells the FHICL parser that parameter set definitions are ended ######
END_PROLOG
</pre>
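A job script would then pick out one of these predefined configurations inside its @user@ block with the @local::@ syntax described above (a minimal sketch, assuming the file is included as @geometry.fcl@):

<pre>
#include "geometry.fcl"

services:
{
  user:
  {
    Geometry: @local::argoneut_geo
  }
}
</pre>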

h3. fhicl Emacs syntax highlighting

If you use Emacs as your editor, you can put the following into your @.emacs@ file in your home directory to display @.fcl@ files with syntax highlighting:

<pre>
(setq fclKeywords
      '(
        ;; This, due to poor language design, conflicts with comments and fails
        ("#include" . font-lock-keyword-face)
        ("@local" . font-lock-keyword-face)
        ;; All these names are magic, I think
        ("process_name:\\|services:\\|source:\\|outputs:\\|physics\\|producers:\\|filters:\\|analyzers:" . font-lock-builtin-face)
        ("true\\|false" . font-lock-builtin-face)
        ;; Variable definitions are followed by colons
        ("[a-zA-Z0-9_]*:" . font-lock-variable-name-face)
        ))

;; Python mode gets us comment handling and indentation at colons
(define-derived-mode fcl-mode python-mode
  (setq mode-name "FHICL")
  (setq font-lock-defaults '(fclKeywords))
  ;; (setq tab-width 2) ;; Doesn't seem to work
  )

(add-to-list 'auto-mode-alist '("\\.fcl\\'" . fcl-mode))
</pre>

h2. Executable and command line options

Currently there is one executable to run in LArSoft. The executable to run a typical reconstruction or analysis job is @lar@, which is placed in the user's path by the setup script. To see what options are available, do

@$ lar -h@

The output is

<pre>
lar <options> [config-file]:
  -T [ --TFileName ] arg  File name for TFileService.
  -c [ --config ] arg     Configuration file.
  -e [ --estart ] arg     Event # of first event to process.
  -h [ --help ]           produce help message
  -n [ --nevts ] arg      Number of events to process.
  --nskip arg             Number of events to skip.
  -o [ --output ] arg     Event output stream file.
  -s [ --source ] arg     Source data file (multiple OK).
  --trace                 Activate tracing.
  --notrace               Deactivate tracing.
</pre>
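For example, to reconstruct 100 events from an existing file, writing the event output and histograms to custom file names (the file names here are only illustrative):

<pre>
$ lar -c job/standard_reco.fcl -s input.root -n 100 -o myreco.root -T myreco_hist.root
</pre>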

h2. Running a Job

To run the job defined by the script above, do

@$ lar -c job/prodgenie.fcl@

One can stop a job in two ways:

# Type ctrl-c once; the job will complete at the end of the current module. If the job is running in the background, type @kill -9 %jobID@ on the command line.
# Type ctrl-c twice; the job will stop immediately and produce a core dump.

If you want to have your job keep running even if you get disconnected from a remote session, you can do

@$ nohup lar -c job/prodgenie.fcl >& pg.out &@

To stop such a job, do
@$ ps aux@ to find the process ID, then
@$ kill -INT processID@
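Put together, stopping a background job looks like this (the process ID @12345@ is hypothetical):

<pre>
$ ps aux | grep lar # find the process ID of the running lar job
$ kill -INT 12345   # replace 12345 with the actual process ID
</pre>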

One can print out the configuration of the job without starting the executable by doing

@$ ART_DEBUG_CONFIG=1 lar -c job/prodgenie.fcl@

in @bash@, or

@> env ART_DEBUG_CONFIG=1 lar -c job/prodgenie.fcl@

in C shell, which produces the output

<pre>** ART_DEBUG_CONFIG is defined: config debug output follows **
all_modules: [ "out1"
, "daq"
, "generator"
, "largeant"
, "rns"
]
outputs: { out1: { fileName: "genie_gen.root"
module_label: "out1"
module_type: "RootOutput"
}
}
physics: { end_paths: [ "stream1" ]
producers: { daq: { Col3DCorrection: 2.5
ColFieldRespAmp: 3.54e-2
CompressionType: "none"
DriftEModuleLabel: "largeant"
FieldBins: 75
Ind3DCorrection: 1.5
IndFieldRespAmp: 1.8e-2
LowCutoff: 7.5
NoiseFact: 1.32e-1
NoiseWidth: 6.24e1
ResponseFile: "shape-argo.root"
ShapeTimeConst: [ 3000
, 900
]
module_label: "daq"
module_type: "SimWireT962"
}
generator: { BeamCenter: [ 2.5e-1
, 0
, 0
]
BeamDirection: [ 0
, 0
, 1
]
BeamName: "numi"
BeamRadius: 3
DebugFlags: 0
DetectorLocation: "MINOS-NearDet"
Environment: [ "GSPLOAD"
, "gxspl-NUMIsmall-R2.6.0.xml"
, "GPRODMODE"
, "YES"
, "GEVGL"
, "Default"
]
EventsPerSpill: 0
FiducialCut: "none"
FluxFiles: [ "argoneut/gsimple_ArgoNeuT_le010z185i_run3_38l0-9r_00001.root" ]
FluxType: "simple_flux"
GenFlavors: [ 12
, 14
, -12
, -14
]
GlobalTimeOffset: 10000
MixerBaseline: 0
MixerConfig: "none"
MonoEnergy: 2
POTPerSpill: 5e13
PassEmptySpills: false
RandomTimeOffset: 10000
SurroundingMass: 0
TopVolume: "volTPCActive"
module_label: "generator"
module_type: "GENIEGen"
}
largeant: { DebugVoxelAccumulation: 0
DisableWireplanes: false
DumpLArVoxelList: false
DumpParticleList: false
GeantCommandFile: "LArG4/LArG4.mac"
SmartStacking: 0
VisualizeEvents: false
module_label: "largeant"
module_type: "LArG4"
}
rns: { module_label: "rns"
module_type: "RandomNumberSaver"
}
}
simulate: [ "generator"
, "largeant"
, "daq"
]
stream1: [ "out1" ]
trigger_paths: [ "simulate" ]
}
process_name: "GenieGen"
services: { RandomNumberGenerator: {}
SimpleMemoryCheck: { ignoreTotal: 1
}
TFileService: { fileName: "genie_hist.root"
}
Timing: {}
message: { destinations: { STDOUT: { categories: { ArtReport: { limit: 100
}
default: { limit: -1
}
}
threshold: "INFO"
type: "cout"
}
}
}
user: { BackTracker: { G4ModuleLabel: "largeant"
}
CatalogInterface: { service_provider: "TrivialFileDelivery"
}
DatabaseUtil: { DBHostName: "fnalpgsdev.fnal.gov"
DBName: "argoneut_dev"
DBUser: "argoneut_reader"
PassFileName: ".apswd"
Port: 5457
ShouldConnect: true
ToughErrorTreatment: false
}
DetectorProperties: { ElectronsToADC: 1.208041e-3
NumberTimeSamples: 2048
ReadOutWindowSize: 2048
SamplingRate: 198
TimeOffsetU: -5.193
TimeOffsetV: 5.85e-1
TimeOffsetW: 0
TriggerOffset: 60
}
FileTransfer: { service_provider: "TrivialFileTransfer"
}
Geometry: { GDML: "Geometry/gdml/argoneut.gdml"
Name: "argoneut"
ROOT: "Geometry/gdml/argoneut.root"
SurfaceY: 13000
}
LArFFT: { FFTOption: "P"
FitBins: 20
}
LArG4Parameters: { CosmogenicK0Bias: 0
CosmogenicXSMNBiasFactor: 1
CosmogenicXSMNBiasOn: 0
DisableWireplanes: false
ElectronClusterSize: 600
EnabledPhysics: [ "Em"
, "SynchrotronAndGN"
, "Ion"
, "Hadron"
, "Decay"
, "HadronElastic"
, "Stopping"
, "NeutronTrackingCut"
]
KeepEMShowerDaughters: false
LongitudinalDiffusion: 6.2e-9
OpticalSimVerbosity: 0
ParticleKineticEnergyCut: 1e-5
StoreTrajectories: true
TransverseDiffusion: 1.63e-8
UseCustomPhysics: false
VisualizationEnergyCut: 1e-2
VisualizeNeutrals: false
}
LArProperties: { AbsLengthEnergies: [ 9.5
, 9.7
, 9.9
]
AbsLengthSpectrum: [ 2000
, 2000
, 2000
]
AtomicMass: 3.9948e1
AtomicNumber: 18
Efield: [ 4.81e-1
, 7e-1
, 8.9e-1
]
Electronlifetime: 750
ExcitationEnergy: 188
FastScintEnergies: [ 9.5
, 9.7
, 9.9
]
FastScintSpectrum: [ 5e-1
, 1
, 5e-1
]
RIndexEnergies: [ 9.5
, 9.7
, 9.9
]
RIndexSpectrum: [ 1.38
, 1.38
, 1.38
]
RadiationLength: 1.955e1
RayleighEnergies: [ 9.5
, 9.7
, 9.9
]
RayleighSpectrum: [ 90
, 90
, 90
]
ReflectiveSurfaceDiffuseFractions: [ [ 5e-1
, 5e-1
, 5e-1
] ]
ReflectiveSurfaceEnergies: [ 9.5
, 9.7
, 9.9
]
ReflectiveSurfaceNames: [ "STEEL_STAINLESS_Fe7Cr2Ni" ]
ReflectiveSurfaceReflectances: [ [ 2.5e-1
, 2.5e-1
, 2.5e-1
] ]
ScintBirksConstant: 3.22e-3
ScintFastTimeConst: 6
ScintResolutionScale: 5e-3
ScintSlowTimeConst: 1590
ScintYield: 24000
ScintYieldRatio: 3e-1
SlowScintEnergies: [ 9.5
, 9.7
, 9.9
]
SlowScintSpectrum: [ 5e-1
, 1
, 5e-1
]
SternheimerA: 1.956e-1
SternheimerCbar: 5.2146
SternheimerK: 3
SternheimerX0: 2e-1
SternheimerX1: 3
Temperature: 8.84e1
}
LArVoxelCalculator: { VoxelEnergyCut: 1e-6
VoxelOffsetT: -2500
VoxelOffsetX: 0
VoxelOffsetY: 0
VoxelOffsetZ: 0
VoxelSizeT: 5000
VoxelSizeX: 3e-2
VoxelSizeY: 3e-2
VoxelSizeZ: 3e-2
}
MagneticField: { ConstantField: [ 0
, 0
, 0
]
MagnetizedVolume: "vWorld"
UseField: false
}
}
}
source: { firstEvent: 1
firstRun: 1
maxEvents: 10
module_label: "source"
module_type: "EmptyEvent"
}
trigger_paths: { trigger_paths: [ "simulate" ] }
</pre>

This functionality is particularly helpful when trying to debug what input parameters were passed to the job.

h2. Why did my job fail?

If a job fails with a segmentation fault, look at the warnings printed to the screen or in any output log files.

If reporting a bug to the artists@fnal.gov list, attach the complete output of the job to the email.

h2. Submitting Jobs to the compute farms

Instructions are on [[Batch_job_submission|this page]].