Grid Running

Computing division now maintains the documentation for running jobs on the grid. An introduction is provided here: Welcome New Computing Users.

If you are attempting to run ART jobs using SAM input, please use the submit_nova_art.py tool, which has extensive documentation here: Submitting NOvA ART Jobs

Please check the jobsub client user guide for documentation on how to submit, monitor and manage grid jobs. If you are writing your own grid scripts, you'll need to know how to get your input data and copy your results back to a place where you can access them. For documentation on that, consult the REX DH Wiki or the NOvA SAM Wiki.

Run time

You should aim for your jobs to each run for a few hours. Jobs that run for less than an hour are inefficient, since there is a certain amount of overhead involved in getting a grid job started. Jobs that run longer than 24 hours at Fermilab are at risk of being killed by the grid. At sites other than Fermilab, the cut-off time could be considerably shorter (8 or 12 hours). This means you should run a single short test job to estimate how long each job will take before submitting them all.
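Recent versions of the jobsub client also let you declare an expected run time at submission, which helps the scheduler place your jobs at suitable sites. The flag below, and the script name myjob.sh, are assumptions for illustration; check jobsub_submit --help for what your client version actually supports:

jobsub_submit -G nova --expected-lifetime=8h file://myjob.sh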

The original version of this page is still available. It may contain some useful information, but it is also out of date.

Monitoring Grid Jobs

Jobs in the Queue

After submitting jobs, one immediately gets the itch to know the job status. The most important information comes from jobsub_q, which reports the status of the scheduler queue. The jobsub_q output reports whether jobs are running or idle (still waiting). By default, jobsub_q lists every job in the queue, but the --user and --jobid arguments will narrow down the output. Complete documentation can be found on the jobsub_q documentation page.
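For example, to restrict the listing to your own jobs or to a single job (a minimal sketch; <username> and <jobid> are placeholders):

jobsub_q --user=<username>
jobsub_q --jobid=<jobid>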

Job Logs

If a job is no longer visible in the queue, it has probably finished running. In this case, the log should be accessible. The logs can be retrieved using jobsub_fetchlog, which has its own set of documentation. The logs will include two files for each job, .out for stdout and .err for stderr. There is also a .log file, which contains information from condor about when your jobs started and ended, as well as intermediate status.
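For example, to fetch the logs for a particular job (a sketch; the job id comes from jobsub_q or from the output of your submission command, and the logs arrive as an archive containing the .out, .err and .log files):

jobsub_fetchlog --jobid=<jobid>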

SAM Station Monitor

If you happen to be using a SAM project for file delivery, the status for each file delivered can be monitored using the SAM Station Monitor. It is important to note, however, that this monitor only tracks file delivery. Jobs can fail to start or be terminated in a way which causes the station monitor to report misleading information. Always fall back to the queue and logs if you are confused.

The information from the station monitor can also be accessed from the command line using samweb project-summary.
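A minimal example, assuming you know the project name (it is printed when the project is started and also appears on the station monitor):

samweb project-summary <project_name>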

Handling job output

Where to write your files (dCache/pnfs)

The Bluearc filesystem used to hold data for interactive jobs (/nova/ana) is too slow to handle the output from many parallel jobs. Trying to write your output there will cause your jobs to hang for a long time, meaning that they occupy a CPU on the grid without really doing anything.

Instead, you should write your output files to dCache. dCache is not a filesystem in the traditional sense -- you cannot do everything you would do with a normal disk -- but it is fast enough to handle the output of many thousands of jobs simultaneously. The general rule of thumb is that the output of your grid jobs (the argument to --dest in submit_nova_art.py, see here) should go to:

/pnfs/nova/scratch/users/<username>

This area is large, but it is not persistent. As long as you are using your files they will stay there, but once you stop using them they are at risk of deletion (right now lifetimes are on the order of 1 month). You start by putting your files here so you can check that they are correct without using up space on tape. How to do that is described in the following sections.

If you're writing your own job scripts, you will need to use ifdh cp to copy your output to dCache (see below for usage).

More details on the limitations of Bluearc and best practices can be found at the FIFE data handling wiki. More details about NOvA disk systems can be seen at Disks.

Look at a single root file on dCache

If there is a root file you want to look at, you can use the xrootd interface, which allows root to stream files off of dCache. This works even if you are not located at Fermilab. The utility pnfs2xrootd will convert a path in pnfs into a URI that xrootd can understand. You can then use this URI to open the file directly with root:

root `pnfs2xrootd /pnfs/nova/scratch/users/<user>/myjob/job0.root`

Merge root files on dCache

If you want to combine the output of your jobs into a single root file on Bluearc, you can again do that using pnfs2xrootd (this is a recent feature as of 8/2015, so this may not work with older releases):

hadd /nova/ana/users/<user>/combined.root `pnfs2xrootd /pnfs/nova/scratch/users/<user>/myjob/*.root`

Note that the output must go to Bluearc; you cannot write directly to dCache using hadd.

Copy a limited number of files from dCache to Bluearc

If you really need to access the file directly, you can copy an individual file back to Bluearc (note, you should not copy large amounts of data back to Bluearc):

ifdh cp /pnfs/nova/scratch/users/<user>/myjob/job0.txt /nova/ana/users/<user>/job0.txt

Note that the syntax of ifdh cp (documentation here) is not the same as regular cp; specifically, it does not automatically recognize the destination as a directory. To copy a series of files into an existing directory, use:

ifdh cp -D file1 file2 /some/directory/

How to keep your dCache files long-term

If you decide your files are worth keeping in the longer term, you can copy them to our persistent area using ifdh cp (documentation here):

ifdh cp -r /pnfs/nova/scratch/users/<username>/myjob /pnfs/nova/persistent/users/<username>/myjob

However, this area is not cleaned up automatically, so once it fills up it stays full. Please be judicious about what you copy there, and remove what you no longer need.

Anatomy of a Job Script

The job_script is the command that will be run under the condor system. Typically this is an executable shell script written by the user to do some particular unit of processing. Scripts run by the batch system generally take a common form.

Also note that /nova/app (and /nusoft/app) will be mounted read-only but executable. That means scripts, executables and libraries can reside there, but jobs will not be able to write to it. On the other hand, /nova/data and /nova/ana will be mounted writable but no-exec. Your job can write to them (though for grid jobs, only if the directory is group writable), but executables and libraries cannot reside there. Plan accordingly. Also, just because they might be writable doesn't mean one should use an unregulated direct write (see the above discussion of ifdh cp).

a) Setup

This part of the script should set up the desired work environment. Jobs do not run the user's normal .bashrc, .bash_profile or equivalent; on the grid they could not in any case, since home areas are in AFS and AFS is not mounted on the grid nodes, so don't even try to access AFS.

If one has made use of jobsub_submit's -r nova_release or -t test_rel_dir flags, that may be sufficient. One can also forego the -r and -t flags and set up the NOvA (or appropriate) work environment directly in the script, if desired.

b) Work Area

The variable ${_CONDOR_SCRATCH_DIR} defines a working disk area unique for the particular job. This section of the job can create any necessary subdirectories under that area. Processing should be done with this (or a subdirectory) as the current working directory. Output files should not be written directly to BlueArc while being generated; holding open file handles to network storage for long periods is a BadIdea™.

c) Fetch Input

Jobs that need input files should fetch them appropriately. If fetching from BlueArc, remember to use ifdh cp.

d) Processing

This is the heart of the job. It is here that the real processing is done.

e) Return Results

Once the real processing is done, any resulting output must be moved from the job's scratch space to permanent storage (BlueArc or dCache). Use ifdh cp to move files to either.

f) Clean Up

Condor will delete files under ${_CONDOR_SCRATCH_DIR} from the worker node. Depending on the configuration of condor, some files might automatically be transferred back to the /nova/data/condor-tmp/ user area upon completion of the job_script, so it might be wise to delete any unnecessary files at this point.

Example

#! /usr/bin/env bash

# Useful predefined env variables:
#
# ${PROCESS} is the individual job # when multiple jobs are run as a cluster 
#    via "jobsub -N <n> script args"  values [0...n-1]
# ${_CONDOR_SCRATCH_DIR} this job's unique work area
# 
# Note: on the grid $USER and such will not be "you" 
#       it appears ${GRID_USER} is (and set even for local IF batch nodes)
#

#============================================================================
# Section (a):  Setup
# Define the work environment 
# (shown here is an alternative to using jobsub_submit -r & -t flags)

function setup_novaoffline {
  source /grid/fermiapp/nova/novaart/novasvn/setup/setup_nova.sh "$@" 
}

# pick a particular release
setup_novaoffline -r S12-11-11  

# optionally setup a test release
cd /nova/app/users/${GRID_USER}/test_rel_dir  
srt_setup -a

#============================================================================
# Section (b):  Work Area

cd ${_CONDOR_SCRATCH_DIR}
MYSUBDIR=mysubdir
mkdir $MYSUBDIR

#============================================================================
# Section (c):  Fetch Input
# Assume here the 1st arg to the script is the full path specified input file

MYINPUT=$1
ifdh cp $MYINPUT .
MYLOCALINPUT=`basename $MYINPUT`  # just the filename, not the path

#============================================================================
# Section (d):  Processing
# Assume here the 2nd arg is the .fcl file, and 3rd & 4th are output and hist
# file names (w/out directory or extension).  Add ${PROCESS} to the output 
# file names so that separate jobs in the same condor job cluster ( -N <n>
# which otherwise get the same script args) yield distinct filenames

MYFCL=$2
MYOUT=$3.${PROCESS}.root
MYHIST=$4.${PROCESS}.root
nova -c $MYFCL -o $MYOUT -T $MYHIST $MYLOCALINPUT 

#============================================================================
# Section (e):  Return Results
# These will succeed only if mysubdir is group writable

ifdh cp $MYOUT  /nova/ana/users/${GRID_USER}/mysubdir
ifdh cp $MYHIST /nova/ana/users/${GRID_USER}/mysubdir

#============================================================================
# Section (f):  Clean Up

rm -r $MYSUBDIR $MYLOCALINPUT $MYOUT $MYHIST

echo "end-of-job" 

Don't forget that this script must live on /nova/app and must have the execute bit set (chmod +x job_script).

Pre-staging Data from Tape

If you need to prestage data from tape, you can either do so using samweb (see the SAM web cookbook), or by pre-staging your files one by one using their POSIX directory and filename. The per-file method is only recommended when you are prestaging files that are not in SAM -- the samweb method is preferred whenever possible!
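For the samweb route, a minimal sketch, assuming your files are covered by a SAM dataset definition (the definition name below is a placeholder):

samweb prestage-dataset --defname=<dataset_definition>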

For the per-file method, the command you should use is:

folder=/pnfs/path/to/directory
filename=myfavouritefilename.root
touch "$folder/.(fset)($filename)(stage)(1)"

To check the status of your files, you can run:

setup_fnal_security
cache_state.py -s /pnfs/path/to/directory/myfavouritefiles*root