Submitting NOvA ART Jobs


Many analyses will require users to run a nova ART job on the grid. A suite of tools has been developed to make this task simple and easy. This page describes those tools and their usage. While the tools make the task relatively simple, users should be aware that they are about to harness a sophisticated suite of technology. A solid understanding of this suite will help you and the experts to debug any problems; it may also adequately prepare you for a software engineer position at Google.

In an ideal (i.e. dream) world, users would be able to write a simple configuration and get their work done with little thought. The submission utility was created as an effort to realize that dream. In reality, however, that script relies on a host of other software which users should attempt to understand. Usage of the utility is described in a dedicated section on this page. New users in a hurry are welcome, but not encouraged, to skip straight to that section; those who do will be ill prepared to resolve errors when they occur.


Novice users have often demonstrated a poor understanding of the interplay between grid jobs and SAM. The two are in fact distinct systems which are used in conjunction. Later sections will assume an understanding of these concepts, so new users should read this section carefully.

The Grid

A grid is a large cluster of worker nodes controlled by a submission (or head) node. Each worker node is a CPU with a local disk for temporary file storage. The submission node maintains a queue of jobs which need to be run and distributes those jobs to worker nodes based on a user priority system. Submitting jobs to the grid means adding jobs to the queue. Jobs must be configured to run a specific executable along with any required arguments. On Fermigrid, these configurations are transmitted to the submission node using the jobsub_client system. More details on jobsub_client can be obtained through the official FIFE documentation.


Sequential Access via Metadata (SAM) is a data handling solution developed by Fermilab's Scientific Computing Division (SCD) to efficiently deliver tape-archived files. The tape archive is supplemented by the large dCache disk array, which stores recently used files. Technically, SAM is just a database of file names, locations and metadata; in practice, it's the bit of machinery that ties everything together. One of the key features of SAM is that it insulates users from nitty-gritty file details like names and locations in favor of higher-level information cataloged in the file metadata. Metadata classify files based on their key features, like processing tier, run number, trigger stream, generator type, etc. Files can be grouped using constraints on the file metadata, for instance:

data_tier reco and online.detector fardet and online.runnumber 12942

The interactive SAM Web Cookbook provides a host of examples involving metadata constraints.

SAM dataset definitions can be used to package up a set of constraints. Once a definition exists which suits a user's needs, a SAM project can be created. A project uses a snapshot of a dataset definition as a list of input files. Each job (known as a process in SAM language) communicates with the project to request file locations and provide status updates. Job status can be tracked using the SAM Station Monitor, which lists all recent projects with a link to a page displaying a wealth of information, including the status of each process. The wiki-based SAM Web Cookbook provides other examples involving project functionality.
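The lifecycle above can be sketched with samweb commands; the dataset and project names here are hypothetical:

```shell
# Query files matching a metadata constraint
samweb list-files "data_tier reco and online.detector fardet and online.runnumber 12942"

# Package the constraint into a named dataset definition (name is made up)
samweb create-definition myuser_fardet_reco_r12942 \
    "data_tier reco and online.detector fardet and online.runnumber 12942"

# Start a project over a snapshot of that definition; grid jobs then
# attach to the project as processes and request files from it
samweb start-project --defname=myuser_fardet_reco_r12942 myuser_fardet_r12942_proj1
```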

Generic Grid Executables

Prior to addressing job submission, which is covered in a later section, we must discuss the executables which will run on the worker nodes. A simple example of such an executable would be a shell script which sets up the novasoft environment, copies a file to the local disk and runs a nova ART job over it. In order to prevent scores of users from writing custom scripts which do the same thing, a pair of scripts has been developed to handle a wide variety of processing scenarios. The first of these is a wrapper script which generally serves to set up the environment, fetch files from SAM and run a sub-executable over each file. The prescription described in the submission section uses a dedicated job-runner script as that sub-executable, since it carefully handles the naming of output files prior to running a nova ART job. A more complete description of each of those scripts can be found below.

Initially developed by SCD, the wrapper is a general purpose script for retrieving files cataloged by SAM. A copy of the script was committed to the nova offline repository (currently stored in the Metadata/samUtils package in the novasoft repository) in early 2014 and modified slightly, effectively branching the software from the SCD maintained version. (At some point we may move back to the SCD supplied version. If you notice that we already have, please modify this text accordingly.)

This script is commonly submitted to the grid as the primary executable that will be run on the grid node. The script picks up a SAM project using the $SAM_PROJECT_NAME environment variable, typically exported to the grid nodes using the condor -e argument. A process is established with the SAM station so that file status can be reported and monitored using the Station Monitor web interface. File locations are obtained via SAM and the files fetched to the local scratch space, then passed to an executable supplied through the -X argument. The original intention was for this executable to be an ART executable (e.g. nova), so the -c option is passed along with the path to a fcl job configuration file. The job fcl file is supplied to the wrapper through the --config argument. Common usage for NOvA is to instead use the job-runner script (described below) as the executable, which picks up the -c option and eventually passes it on to the nova ART job.

Other convenient arguments include:

Argument Description
--source Sources an arbitrary bash script. This can be used to set up the software. Arguments can be supplied, but spaces must be replaced with a colon (:) character
--export Exports an environment variable. Usage --export MY_VAR=thing_to_set_it_to
--multifile Run the executable (-X argument) multiple times over separate files.
--limit Maximum number of files to process in --multifile mode.
--getconfig Fetch fcl configuration files from SAM project, useful for MC generation.

Even more arguments can be found using --help.

Although it is possible to use the wrapper alone to run a nova ART job, there are frequent operations which must be performed before and after it is run. A python job-runner script has been developed as a general platform for performing these operations. It features options for naming output files, copying them to a destination and sorting those files within that destination. Copy-out functionality is enabled with the --copyOut argument. File names are controlled with the --outTier, --cafTier and --histTier options.

Output file options:
Argument Description
--copyOut Enables copyback to output directory, determined from $DEST environment variable.
--outTier Enables naming and copy-out for ART event ROOT files. Format is --outTier <output module label>:<extension>. The <module label> must match an ART output module in the job fcl (typically out1). The output file names will end with <extension>.root. This argument can be used multiple times for jobs with more than one output, e.g. --outTier out1:pidpart --outTier out2:lemsum.
--cafTier Enables naming and copy-out for CAF files. Format is --cafTier <module label>:<extension>. The <module label> must match a CAFMaker module label in the job fcl (typically cafmaker). The output file names will end with <extension>.root. In a situation with multiple CAFMaker module instances, this can be used multiple times. (Five points will be awarded to any user who finds a use for that.)
--histTier Enables naming and copy-out for ART "hist" files from the TFileService. Format is --histTier <extension>. The output file names will end with "<extension>.root". This argument can only be used once since the TFileService is only configured once per job; services are singletons.
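For a rough illustration of how the <extension> tag enters the output name, the extension lands just before the .root suffix. The real naming scheme assembled by the job-runner includes additional bookkeeping fields, so this is a simplified sketch with a hypothetical input file name:

```shell
# Simplified sketch only: the real job-runner script assembles names from
# several metadata fields, but the tier extension lands just before .root.
input="fardet_r00012942_s01.raw.root"   # hypothetical input file
extension="pidpart"                     # from --outTier out1:pidpart
base="${input%.root}"                   # strip the .root suffix
output="${base}.${extension}.root"
echo "$output"                          # fardet_r00012942_s01.raw.pidpart.root
```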

Job Submission

Submitting jobs to run over files in a SAM project involves two steps. First, the project must be started using samweb start-project. After that, jobs can be submitted which will establish themselves as processes under that project. A submission script exists which wraps up these two steps into one configurable utility. Users are encouraged to use that script and work with experts on adding any functionality which may be missing. This section will cover the basic features of the generic submission script and show some examples. An appendix breaks down the gory details of an example jobsub_client submission block.

Please note: before you submit any jobs using a SAM definition, you must check whether the files in the definition are cached, and pre-stage them if they aren't. The following instructions explain why.

Most of NOvA's files are not readily available; they are stored on archive tapes in the SCD tape library. To use files in permanent storage like this, they need to be recovered and put into a fast disk cache ("staged") from which they can be accessed in real time. This happens automatically when the files are requested, but it's usually a slow process because a robot has to go and physically pull a tape out of a library of physical tapes and read it to obtain the file. If you submit a grid job over a definition with files that aren't cached, the grid job waits, doing nothing, until the robot can read the tape and transfer the file to the cache. This wastes a lot of grid resources that otherwise would be available for somebody else to use, and if you do this, you'll likely get an automated email from FIFE warning you that your job efficiency was too low. (More information on the tape and cache system is at Tape_and_Cache.) Instead, files should be re-cached in advance, prior to job submission.

If you haven't used a particular definition within the last 30 days, before submitting any jobs with it:

  • Check the cache status of your definition using the cache-checking script with the -d flag, as documented on the Tape_and_Cache page.
  • If your definition isn't 100% cached, it will need to be prestaged. Instructions for prestaging datasets are at the bottom of the SAM_web_cookbook page.
    If your definition contains more than 1000 files, please consult the Production conveners before starting a prestaging process for it; prestaging of a large dataset without coordinating with Production can interfere with the Production schedule.
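Put together, a typical pre-submission check might look like the following. The cache-checking invocation is indicative (see the Tape_and_Cache page for the authoritative usage), while samweb prestage-dataset is the standard prestage command:

```shell
# Check what fraction of the definition is already cached on disk
# (flags as documented on the Tape_and_Cache page)
cache_state.py -d prod_reco_S14-11-25_homestake_genie_nonswap

# If the definition is not 100% cached, trigger a prestage
# (see the bottom of the SAM_web_cookbook page for details)
samweb prestage-dataset --defname=prod_reco_S14-11-25_homestake_genie_nonswap
```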

The general purpose script can be used to submit nova art jobs that require SAM input. This script does extensive error checking to ensure that the arguments supplied are valid (for example that the specified output directory exists), starts up a SAM project and submits jobs to the grid. It uses the new jobsub_client suite for job submission. It is possible to supply all required information through command line arguments or a configuration file. Required and optional arguments are described in an extensive help message (use --help to view). In general, the help message is always the most up-to-date documentation.

$ <submission script> <arguments>

Arguments can be passed to the submitter through the command line. Some users may opt to write these arguments in a shell script. Arguments can also be written in a plain, whitespace-insensitive text file, as will be shown in the example.

Required Arguments

At a minimum, you must specify a job name, the input dataset definition, job fcl, novasoft tagged release (software version) and output destination. Note: submit the jobs from an environment which has the same tagged release set up as the jobs are configured to use, since the script performs a sanity check. You specify this minimum information with the following options:

  --jobname JOBNAME     Job name
  --defname DEFNAME     SAM dataset definition to run over
  --config CONFIG, -c CONFIG
                        FHiCL file to use as configuration for the nova
                        executable. The path given should be relative to the
                        $SRT_PRIVATE_CONTEXT of any test release you submit.
  --tag TAG             Tag of novasoft to use
  --dest DEST           Destination for output files
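Putting the required options together, a minimal submission might look like the following; <submission_script> stands in for the utility described on this page, and the job name, fcl and destination are illustrative:

```shell
# Minimal submission: one value for each required option
<submission_script> \
    --jobname my_reco_pass \
    --defname prod_reco_S14-11-25_homestake_genie_nonswap \
    -c myrecojob.fcl \
    --tag development \
    --dest /pnfs/nova/scratch/users/$USER/myrecojob_output
```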

Debugging Options

You can use the --print_jobsub option to print the jobsub command. The --test option is used to run error checking and print the jobsub command, but does not actually start the SAM project or do the job submission.

  --print_jobsub        Print jobsub command
  --test                Do not actually do anything, just run tests and print
                        jobsub cmd
  --gdb                 Run nova executable under gdb, print full stack trace,
                        then quit gdb.
  --test_submission     Override other arguments given to submit a test to the
                        grid. It will run 1 job with 3 events and write the
                        output to /pnfs/nova/scratch/users/<user>/test_jobs/<d

Job Control Options

For realistic cases, you will most likely want to split the processing into several jobs.

  --njobs NJOBS         Number of jobs to submit
  --maxConcurrent MAXCONCURRENT
                        Run a maximum of N jobs simultaneously
  --files_per_job FILES_PER_JOB
                        Number of files per job - if zero, calculate from
                        number of jobs
  --nevts NEVTS         Number of events per file to process
  --no_multifile        Do not use multifile mode, which is on
                        by default
  --txtfiledef          Use if the input definition is made up of text files,
                        each containing a list of file names
  --opportunistic       Run opportunistically on the fermigrid
  --offsite             Allow to run on offsite resources as well. Implies
                        --opportunistic and --cvmfs.
  --offsite_only        Allow to run solely on offsite resources. Implies
  --amazon              Run at amazon. Implies --cvmfs.
  --site SITE           Specify allowed offsite locations. Omit to allow
                        running at any offsite location
  --recommended_sites   Specify known working offsite locations.
  --os OS               Specify OS version of worker node
  --disk DISK           Local disk space requirement for worker node in MB.
  --memory MEMORY       Local memory requirement for worker node in MB.
  --expected_lifetime EXPECTED_LIFETIME
                        Expected job lifetime (default is 10800s=3h). Valid
                        values are an integer number of seconds or one of
                        "short" (6h), "medium" (12h) or "long" (24h, jobsub
  --dynamic_lifetime LIFETIME
                        Dynamically determine whether a new file should be
                        started based on glidein lifetime. Specify the maximum
                        length expected for a single file to take to process
                        in seconds.
  --group GROUP, -G GROUP
                        Specify batch group GROUP -- mainly used to set job
                        priority. At present, the only supported value is nova
  --role ROLE           Specify role to run on the grid. Can be Analysis
                        (default) or Production. This option is no longer
  --continue_project CONTINUE_PROJECT
                        Don't start a new samweb project, instead continue
                        this one.
  --snapshot_id ID      Use this existing snapshot instead of creating a new
                        one
  --mix MIX             Pass a mixing script to the job to pull in files for
                        job mixing.

Software Options

The following options control nova software.

  --maxopt              Run in maxopt mode
  --testrel TESTREL     Use a test release at location TESTREL. It will be
                        tarred up, and sent to the worker node.
  --user_tarball USER_TARBALL
                        Use existing test release tarball in specified
                        location rather than having jobsub make one for you
                        (conflicts with --testrel)
  --reuse_tarball       Do you want to reuse a tarball that is already in
                        resilient space? If using this option avoid trailing
                        slash in --testrel option. (conflicts with
  --cvmfs               Does nothing (always true), but retained for
                        compatibility: pull software from CVMFS.
  --novasoftups         Use the ups build of novasoft; must be used with a
                        --source script to set up the environment.
  --ngu_test            Setup the test version of NovaGridUtils in the grid
                        job
  --ngu_version NGU_VERSION
                        Setup a specific NovaGridUtils version in the grid
                        job
  --lemBalance          Choose lem server based on (CLUSTER+PROCESS)%2 to
                        balance load
  --lemServer LEMSERVER
                        Specify lem server

File Output Options

Most use cases require a method to copy back output. You can either use the built-in copyOut method by supplying the --copyOut option, or use --copyOutScript COPYOUTSCRIPT to specify a script to copy your output back. If you use the built-in copyOut method, you must also specify at least one of --outTier, --histTier or --cafTier.

  --copyOutScript COPYOUTSCRIPT
                        Use script COPYOUTSCRIPT to copy back your output
  --copyOut             Use the built in copy out mechanism. If used, you must
                        specify --outTier, --cafTier or --histTier
  --logs                Return .log files corresponding to every output
  --zipLogs             Format logs as .bz2 files. Implies --logs
  --outTier OUTTIER     Data tier of the output file, multiple allowed,
                        formatted as <name_in_fcl_outputs>:<data_tier>
  --cafTier CAFTIER     Module label for CAF output, multiple allowed. Format
                        as <cafmaker_module_label>:<data_tier>
  --histTier HISTTIER   File identifier string for TFileService output, only
                        one allowed. Supply as --histTier <id> for
                        output_name.<id>.root, where output_name is assembled
                        based on the input file.
  --outputNumuDeCAF     Make standard numu decafs for all CAF files produced
                        during the job
  --outputNueDeCAF      Make standard nue decafs for all CAF files produced
                        during the job
                        Make standard nue or numu decafs for all CAF files
                        produced during the job
  --outputNusDeCAF      Make standard nus decafs for all CAF files produced
                        during the job
  --npass NPASS         To specify npass (aka nova.subversion)
  --skim SKIM           To specify nova.skim
  --systematic SYSTEMATIC
                        To specify nova.systematic
  --specialName SPECIALNAME
                        To specify nova.special name
  --hashDirs            Use hash directory structure in destination directory.
  --runDirs             Use run directory structure in destination directory,
                        000XYZ/XYZUW for run number XYZUW.
  --noCleanup           Pass the --noCleanup argument to the job runner.
                        Necessary when using a postscript for copyout.
  --jsonMetadata        Create JSON files with metadata corresponding to each
                        output file, and copy them to the same destinations
  --declareFiles        Declare files with metadata on worker node
  --production          Submit production style jobs. Implies "--
                        role=Production --hashDirs --jsonMetadata --zipLogs",
                        and checks that other settings needed for production
                        are specified
  --calibration         Submit calibration style jobs. Implies "--
                        role=Production", and checks that other settings
                        needed for calibration are specified
  --declareLocations    Declare the file output locations to SAM during the
                        copy back of the files

Job Environment Options

There are a handful of methods for controlling the job environment.

  --export EXPORT       Export variable EXPORT to the job environment
  --source SOURCE       Source script SOURCE
  --prescript PRESCRIPT
                        Execute script PRESCRIPT before executing the main job
  --postscript POSTSCRIPT
                        Execute script POSTSCRIPT after executing the main job
  --inputfile INPUTFILE
                        Copy this extra input file into job area before
                        running executable

To export an environment variable, make sure to set and export that variable in your environment before submitting the job. For instance, to set the version number (Nova.SubVersion metadata parameter), do export NPASS=2 in your terminal, and add --export NPASS to your job configuration.
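A minimal sketch of the two halves of that recipe:

```shell
# In the submitting terminal: set and export the variable...
export NPASS=2

# ...then list only its NAME in the job configuration file:
#   --export NPASS
# The submission machinery forwards the current value to the grid job.
echo "$NPASS"   # prints: 2
```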

Support Options

  -h, --help            Show this help message and exit
  -f FILE, --file FILE  Text file containing any arguments to this utility.
                        Multiple allowed. Arguments should look just like they
                        would on the command line, but the parsing of this
                        file is whitespace insensitive. Comments will be
                        identified with the # character and removed.
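As a rough approximation of those parsing rules (the real parser lives in the submission utility; this sketch only shows the comment stripping and whitespace folding):

```shell
# Strip '#' comments, then fold all whitespace, so the file reads as if
# its arguments had been typed on a single command line (approximation).
parse_args_file() {
    sed -e 's/#.*$//' "$1" | tr -s ' \t\n' ' '
}

cat > /tmp/example_args.txt <<'EOF'
--jobname demo_job      # comments are stripped
--njobs   10
EOF

parse_args_file /tmp/example_args.txt
```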

Use a custom fhicl file

An alternative to the --config/-c option described above, when using your very own fhicl file, is to pass the file to the job yourself. In this case, first ensure your fhicl file is copied into dCache somewhere (/pnfs/nova/scratch/users/<your username> is probably the best choice). Then, add these lines to your configuration:

--inputfile /pnfs/path/to/fcl/<fclname>.fcl
-c <fclname>.fcl

Example Submission Configuration

For this example, we will use the text file input method, where the file is passed to the submission script using the -f (or --file) option. The parsing of the file is whitespace insensitive and allows comments escaped with #.

# Example configuration for the submission script
# Usage: <submission script> -f <this file>
# Use --test to run sanity checks without creating the project and submitting

# Job and project options 
--jobname davis_count_argon_atoms                        # Name of your project/jobs, be creative 
--defname prod_reco_S14-11-25_homestake_genie_nonswap    # SAM dataset definition, defines files to be processed 
--njobs 1500                                             # Number of jobs to run 
--files_per_job 20                                       # Maximum number of files to be processed by each job 
--opportunistic                                          # Run in opportunistic mode, i.e. steal non-NOvA nodes, optional 
--print_jobsub                                           # Print jobsub submission block, good for records 

# novasoft options 
-c argoncounterjob.fcl                                   # Job fcl for nova executable 
--testrel /nova/app/users/davis/dev_2014-02-08_chlorine  # Path to test release, optional. Note lack of trailing slash
--reuse_tarball                                          # Option to reuse the newest tarball for the above test release which is stored in /pnfs/nova/resilient/...
--tag development                                        # Tagged release of novasoft to use 
--maxopt                                                 # Run in maxopt, optional 

# Copy-back: options for built-in .  
# Advanced usage can replace this block with the --copyOutScript option
--dest /nova/ana/users/davis/SolarAnomaly/               # Output directory 
--copyOut                                                # Copy back output to --dest location 
--runDirs                                                # Sort output by run number 
--outTier out1:arcount                                   # Extension for ART-ROOT output stream out1: arcount.root                      
--histTier argon_hist                                    # Extension for hist (TFileService) output: argon_hist.root 
--cafTier=cafmaker:caf                                   # Extension for CAFMaker with module label cafmaker: .caf.root

My job is submitted. Now what?

Information on monitoring jobs can be found here: Monitoring Grid Jobs


Anatomy of a jobsub_client Submission

This section does not serve as a replacement for the full jobsub_client documentation, but it does attempt to describe all of the components in an ART/SAM job using the wrapper and job-runner scripts. The jobsub_submit executable is used for submission. A fully configured submission looks as follows:

jobsub_submit \ 
    -N 800 \ 
    --resource-provides=usage_model=DEDICATED,OPPORTUNISTIC \
    -G nova \ 
    --role=Analysis \
      file:///grid/fermiapp/nova/novaart/novasvn/releases/FA14-11-25/Metadata/samUtils/ \
      --multifile \
      --export EXTERNALS='/nusoft/app/externals' \
      --export DEST=/pnfs/nova/scratch/fts/ParticleID_dropbox/ \
      --export CVMFS_DISTRO_BASE='/cvmfs/' \
      --config Production/fcl/prod_pidpart_job.fcl \
      --source /grid/fermiapp/nova/novaart/novasvn/setup/ \
      --limit 100 \
      -X \
        --copyOut \
        --outTier out1:pidpart

That is a bit of a mouthful, so we can take a look at the arguments one by one.

Submit 800 jobs:

    -N 800 \

Run on NOvA dedicated nodes as well as opportunistically on other nodes:

    --resource-provides=usage_model=DEDICATED,OPPORTUNISTIC \

Specify the nova group for accounting purposes:

    -G nova \  

Export a few necessary environment variables. Notably, SAM_PROJECT_NAME tells the wrapper which project to talk to.

Specify role for grid proxy/authentication:

    --role=Analysis \

(Default is Analysis, this is pedantic.)

Tell jobsub_submit which executable to use, in this case the wrapper script:

     file:///grid/fermiapp/nova/novaart/novasvn/releases/FA14-11-25/Metadata/samUtils/ \

Note, the things which follow are no longer arguments to jobsub_submit; they are arguments for the wrapper script.

Run over more than one file per job:

      --multifile \

Export the location of the external software packages:

      --export EXTERNALS='/nusoft/app/externals' \

Export the output destination, passed to the job runner as $DEST:

      --export DEST=/pnfs/nova/scratch/fts/ParticleID_dropbox/ \

Specify the fcl configuration to run:

      --config Production/fcl/prod_pidpart_job.fcl \

Tell the wrapper to run the novasoft setup script:

      --source /grid/fermiapp/nova/novaart/novasvn/setup/ \

Set the limit for the number of files per job:

      --limit 100 \

Tell the wrapper which sub-executable to use via -X:

      -X \

Note, the remaining arguments are not arguments for the wrapper, but for the sub-executable.

Instruct the job runner to copy output files to $DEST and specify which sort of files should be copied out. Note, --histTier and --cafTier are also valid options.

        --copyOut \
        --outTier out1:pidpart

Running Offsite

The script supports running offsite using the --offsite or --offsite_only options. Use --offsite if you don't care where your jobs run. Use --offsite_only if you want to force your jobs to run only on offsite grid nodes. You can target specific sites by using the --site option. The following are the sites available:

  • Caltech.
  • FZU (Prague).
  • Harvard. (DO NOT use)
  • Michigan.
  • MIT. (DO NOT use)
  • MWT2 (Mid-West Tier 2). (DO NOT use)
  • Nebraska.
  • Omaha.
  • OSC (Ohio SuperComputing Center).
  • SMU_HPC.
  • SU-OG.
  • UChicago. (DO NOT use)
  • UCSD.
  • TTU. (DO NOT use)
  • Wisconsin.

For example: --offsite --site Harvard (type the name of the site as listed above) forces your jobs only to run at Harvard. You can specify the --site options multiple times. So: --offsite --site Harvard --site FZU would force your jobs to run at Harvard or at FZU (Prague), but nowhere else.

The performance plot (found at the bottom of this page) helps the user decide which offsite locations are more likely to complete jobs successfully. The performance score takes into account: the fraction of jobs successfully completed, the total time used to complete the full set of submitted jobs, the idle time taken to start the first job, and the average time to process an individual file. The score runs continuously from 1 to 16, where the best possible score is 1. The Offsite bin on the vertical axis indicates the performance of jobs sent using the --offsite_only option. The average version of the performance plot presents the average performance over the last week; the latest version presents the latest test. NOVA-doc-14304 has more detailed metrics of the latest test and the average of the last week. The performance plot is updated regularly.

The following link presents the configurations required to run at each of the non-Fermilab sites. Most sites allocate 2500MB of memory, except for MWT2 and UChicago, which allocate 2000MB, and UCSD and Omaha, which allocate 4096MB and 4000MB respectively. To meet the memory requirement for each site, use the --memory option with the requested memory value. NOvA submission scripts have a default memory value of 4000MB.

The BlueArc disks are not visible at offsite nodes. This means that test releases will not work offsite. It also means that if you want to use a custom fcl file, you will need some extra magic. Keep your fcl file in the directory you are submitting from. Then also add the option --inputfile /absolute/path/to/fcl. In the future, this should be made more user-friendly.

By default, the script only allows you to submit your jobs to a predefined list of sites. If you want to submit to a site not on the list, define the environment variable EXTRA_ALLOWED_SITES as a colon delimited list of additional sites you want to be allowed to use. This is intended as an expert feature to allow testing of new sites without maintaining locally modified copies of the script. If there are additional sites you want added to the list, you should contact nova_production.
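For example, to allow two additional (hypothetical) site names in the submitting shell before running the script:

```shell
# Site names below are placeholders; use the names provided by the experts.
export EXTRA_ALLOWED_SITES="Example_Site_A:Example_Site_B"
echo "$EXTRA_ALLOWED_SITES"   # prints: Example_Site_A:Example_Site_B
```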