- Table of contents
- Grid Running
- Monitoring Grid Jobs
- Handling job output
Grid Running¶
The Computing Division now maintains the documentation for running jobs on the grid. An introduction is provided here: Welcome New Computing Users.
If you are attempting to run ART jobs using SAM input, please use the submit_nova_art.py tool, which has extensive documentation here: Submitting NOvA ART Jobs
Please check the jobsub client user guide for documentation on how to submit, monitor and manage grid jobs. If you are writing your own grid scripts, you'll need to know how to get your input data and copy your results back to a place where you can access them. For documentation on that, consult the REX DH Wiki or NOvA SAM Wiki.
Run time¶
You should aim for your jobs to each run for a few hours. Jobs that run less than an hour are inefficient since there is a certain amount of overhead involved in getting a grid job started. Jobs that run longer than 24 hours at Fermilab are at risk of being killed by the grid. At sites other than Fermilab, the cut-off time could be considerably shorter (8 or 12 hours). This means you should run a single short job to estimate the time each job will take before submitting them all.
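For example, a hedged sketch of submitting a single timing-test job with jobsub_submit (the -G nova group flag and the script path are placeholders/assumptions; see the jobsub client user guide for the full syntax):
# submit one copy of the job script first, to measure how long a single job takes
jobsub_submit -G nova -N 1 file:///nova/app/users/<username>/my_job_script.sh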
The original version of this page is still available. It may contain some useful information, but it is also out of date.
Monitoring Grid Jobs¶
Jobs in the Queue¶
After submitting jobs, one immediately gets the itch to know the job status. The most important information comes from jobsub_q, which reports the status of the scheduler queue. The jobsub_q output reports whether jobs are running or idle (still waiting). By default, jobsub_q reports every job in the queue, but the --user and --jobid arguments will narrow down the output. Complete documentation can be found on the jobsub_q documentation page.
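For example, a minimal sketch (the -G nova group flag and the placeholders are assumptions; adjust to your own jobs):
jobsub_q -G nova --user <username>
jobsub_q -G nova --jobid <jobid>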
Job Logs¶
If a job is no longer visible in the queue, it has probably finished running. In this case, the log should be accessible. The logs can be retrieved using jobsub_fetchlog, which has its own set of documentation. The logs will include two files for each job: .out for stdout and .err for stderr. There is also a .log file, which contains information from condor about when your jobs started and ended, as well as intermediate status.
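For example, a minimal sketch (the group flag and jobid placeholder are assumptions; see the jobsub_fetchlog documentation for all options):
jobsub_fetchlog -G nova --jobid <jobid>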
SAM Station Monitor¶
If you happen to be using a SAM project for file delivery, the status for each file delivered can be monitored using the SAM Station Monitor. It is important to note, however, that this monitor only tracks file delivery. Jobs can fail to start or be terminated in a way which causes the station monitor to report misleading information. Always fall back to the queue and logs if you are confused.
The information from the station monitor can also be accessed from the command line using samweb project-summary.
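For example (the project name is a placeholder):
samweb project-summary <project_name>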
Handling job output¶
Where to write your files (dCache/pnfs)¶
The Bluearc filesystem used to hold data for interactive jobs (/nova/ana) is too slow to handle the output from many parallel jobs. Trying to write your output there will cause your jobs to sit and hang for a long time, meaning that they take up a CPU on the grid without really doing anything.
Instead, you should write the output of your grid jobs to dCache. dCache is not a filesystem in the traditional sense -- you cannot do everything you would do with a normal disk -- but it is fast enough to handle the output of many thousands of jobs simultaneously. The general rule of thumb is that the output destination of your grid jobs (the argument to --dest in submit_nova_art.py, see here) should be:
/pnfs/nova/scratch/users/<username>
This area is large, but it is not persistent. As long as you are using your files they will stay there, but once you stop using them they are at risk of deletion (right now lifetimes are on the order of 1 month). You start by putting your files here so you can check that they are correct without using up space on tape. How to do that is covered in the next section.
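To check what your jobs wrote there, you can, for example, list the area from an interactive node where /pnfs is mounted (a minimal sketch; the subdirectory name is a placeholder):
ls -lh /pnfs/nova/scratch/users/<username>/myjob/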
If you're writing your own job scripts, you will need to use ifdh cp to copy your output to dCache (see below for usage).
More details on the limitations of Bluearc and best practices can be found at the FIFE data handling wiki. More details about NOvA disk systems can be seen at Disks.
Look at a single root file on dCache¶
If there is a root file you want to look at, you can use the xrootd interface, which allows root to stream files off of dCache. This can work even if you are not located at Fermilab. The utility pnfs2xrootd will convert a path in pnfs into a URI that xrootd can understand. You can then use this URI to open the file directly with root:
root `pnfs2xrootd /pnfs/nova/scratch/users/<user>/myjob/job0.root`
Merge root files on dCache¶
If you want to combine the output of your jobs into a single root file on Bluearc, you can again do that using pnfs2xrootd (this is a recent feature as of 8/2015, so this may not work with older releases):
hadd /nova/ana/users/<user>/combined.root `pnfs2xrootd /pnfs/nova/scratch/users/<user>/myjob/*.root`
Note that the output must go to Bluearc; you cannot write directly to dCache using hadd.
Copy a limited number of files from dCache to Bluearc¶
If you really need to access the file directly, you can copy an individual file back to Bluearc (note, you should not copy large amounts of data back to Bluearc):
ifdh cp /pnfs/nova/scratch/users/<user>/myjob/job0.txt /nova/ana/users/<user>/job0.txt
Note that the syntax of ifdh cp (documentation here) is not the same as regular cp; specifically, it does not automatically recognize the destination as a directory. To copy a series of files into an existing directory, use:
ifdh cp -D file1 file2 /some/directory/
How to keep your dCache files long-term¶
If you decide your files are worth keeping in the longer term, you can copy them to our persistent area using ifdh cp (documentation here):
ifdh cp -r /pnfs/nova/scratch/users/<username>/myjob /pnfs/nova/persistent/users/<username>/myjob
However, this area is not cleaned up automatically, so if we fill it up, it stays full. Please be judicious about what you copy there, and remove what you no longer need.
Anatomy of a Job Script¶
The job_script is the command that will be run under the condor system. Typically this is an executable shell script written by the user to do some particular unit of processing. Scripts run by the batch system generally take a common form.
Also note that /nova/app (and /nusoft/app) will be mounted read-only but executable. That means scripts, executables and libraries can reside there, but jobs will not be able to write to it. On the other hand, /nova/data and /nova/ana will be mounted writable but no-exec: your job can write to them (though for grid jobs, only if the directory is group writable), but executables and libraries cannot reside there. Plan accordingly. Also, just because they might be writable doesn't mean one should use an unregulated direct write (see the above discussion of ifdh cp).
a) Setup¶
This part of the script should set up the desired work environment. Jobs do not run the user's normal .bashrc, .bash_profile or equivalent; on the grid they couldn't, as home areas are in AFS and AFS is not mounted on the grid nodes, so don't even try to access AFS.
If one has made use of jobsub_submit's flags -r nova_release or -t test_rel_dir, that may be sufficient. One can forego the -r and -t flags to jobsub_submit and set up the NOvA (or appropriate) work environment directly if desired.
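For example, a minimal sketch of the direct setup, using the same setup script as the full example below (the release name is a placeholder):
source /grid/fermiapp/nova/novaart/novasvn/setup/setup_nova.sh -r <release>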
b) Work Area¶
The variable ${_CONDOR_SCRATCH_DIR} defines a working disk area unique to the particular job. This section of the job can create any necessary subdirectories under that area. Processing should be done with this (or a subdirectory) as the current working directory. Output files should not be written directly to BlueArc while being generated; holding open file handles to network storage for long periods is a BadIdea™.
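For example, a minimal sketch (the subdirectory name is arbitrary):
cd ${_CONDOR_SCRATCH_DIR}
mkdir -p work
cd work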
c) Fetch Input¶
Jobs that need input files should fetch them appropriately. Worker nodes can no longer access BlueArc directly! Draw input files from /pnfs/nova/scratch/ using ifdh cp.
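For example, a hedged sketch assuming your input already sits in your dCache scratch area (the path is a placeholder):
ifdh cp /pnfs/nova/scratch/users/${GRID_USER}/inputs/myinput.root .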
d) Processing¶
This is the heart of the job. It is here that the real processing is done.
e) Return Results¶
Once the real processing is done, any resulting output must be moved from the job's scratch space to permanent storage (BlueArc or dCache). Use ifdh cp to move files to either.
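For example, a hedged sketch copying a result back to your dCache scratch area (the destination directory is a placeholder and is assumed to already exist):
ifdh cp myout.${PROCESS}.root /pnfs/nova/scratch/users/${GRID_USER}/myjob/myout.${PROCESS}.root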
f) Clean Up¶
Condor will delete files under ${_CONDOR_SCRATCH_DIR} from the worker node. Depending on the configuration of condor, some files might automatically be transferred back to the user's /nova/data/condor-tmp/ area upon completion of the job_script, so it might be wise to delete any unnecessary files at this point.
Example¶
#! /usr/bin/env bash
# Useful predefined env variables:
#
#   ${PROCESS} is the individual job # when multiple jobs are run as a cluster
#     via "jobsub -N <n> script args", values [0...n-1]
#   ${_CONDOR_SCRATCH_DIR} this job's unique work area
#
# Note: on the grid $USER and such will not be "you";
#   it appears ${GRID_USER} is (and set even for local IF batch nodes)
#
#============================================================================
# Section (a): Setup
# Define the work environment
# (shown here is an alternative to using jobsub_submit -r & -t flags)
function setup_novaoffline {
  source /grid/fermiapp/nova/novaart/novasvn/setup/setup_nova.sh "$@"
}
# pick a particular release
setup_novaoffline -r S12-11-11
# optionally setup a test release
cd /nova/app/users/${GRID_USER}/test_rel_dir
srt_setup -a

#============================================================================
# Section (b): Work Area
cd ${_CONDOR_SCRATCH_DIR}
MYSUBDIR=mysubdir
mkdir $MYSUBDIR

#============================================================================
# Section (c): Fetch Input
# Assume here the 1st arg to the script is the full path to the input file
MYINPUT=$1
ifdh cp $MYINPUT .
MYLOCALINPUT=`basename $MYINPUT`   # just the filename, not the path

#============================================================================
# Section (d): Processing
# Assume here the 2nd arg is the .fcl file, and 3rd & 4th are output and hist
# file names (w/out directory or extension). Add ${PROCESS} to the output
# file names so that separate jobs in the same condor job cluster ( -N <n>,
# which otherwise get the same script args) yield distinct filenames
MYFCL=$2
MYOUT=$3.${PROCESS}.root
MYHIST=$4.${PROCESS}.root
nova -c $MYFCL -o $MYOUT -T $MYHIST $MYLOCALINPUT

#============================================================================
# Section (e): Return Results
# These will succeed only if mysubdir is group writable
ifdh cp $MYOUT /nova/ana/users/${GRID_USER}/mysubdir
ifdh cp $MYHIST /nova/ana/users/${GRID_USER}/mysubdir

#============================================================================
# Section (f): Clean Up
rm -r $MYSUBDIR $MYLOCALINPUT $MYOUT $MYHIST
echo "end-of-job"
Don't forget that this script must live on /nova/app and must have the execute bit set (chmod +x job_script).
Pre-staging Data from Tape¶
If you need to prestage data from tape, you can either do so using SAM (see the SAM web cookbook), or by pre-staging your files one by one using their POSIX directory and filename. The file-by-file method is only recommended when you are prestaging files that are not in SAM -- the samweb method is preferred whenever possible!
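For the SAM route, a hedged sketch (the prestage-dataset command and the dataset definition name are assumptions; consult the SAM web cookbook for the authoritative recipe):
samweb prestage-dataset --defname=<dataset_definition>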
For the file-by-file method, the command you should use is:
folder=/pnfs/path/to/directory
filename=myfavouritefilename.root
touch "$folder/.(fset)($filename)(stage)(1)"
To check the status of your files, you can run:
setup_fnal_security
cache_state.py -s /pnfs/path/to/directory/myfavouritefiles*root