BatchSubmissions » History » Version 54

Arthur Kreymer, 05/29/2013 03:49 PM

Batch Submissions


Getting Started

  • Job submission is done from under the minospro account. Make sure you have access to that machine as yourself, as minfarm, and as minospro; the current contact person is Arthur Kreymer <>
  • Set up a cron job in your personal crontab to renew the proxy needed for job submission, for example
    07          1-23/2 * * * /local/scratch25/grid/kproxy
    07          1-23/2 * * * /local/scratch25/grid/kproxy_pro.20130529 -r Production
  • Obtain permissions to write output files to the /pnfs/minos area; the current contact person is Arthur Kreymer <>
  • Update the list of submitters to include your username

.bashrc example for setup

Keep-up with current version

The current version is dogwood6. Keep-up runs daily and is used for calibrations. More accurate physics processing is done in larger batches, after calibration sign-off (see below).

Minos batch at a glance

Cron jobs

Daily keep-up cron jobs are currently running under the minospro account. The main submission job is:

  04 07,15,23 * * * /grid/fermiapp/minos/minfarm/scripts/get_daq_submit.glide -v dogwood6 -F

The other active cron jobs are:


# Use this crontab for jobs that require the vanilla condor_q

# Note that it is the responsibility of the process to do a sufficient setup

# For safety, keep an updated copy of this crontab
  10 0 * * * /usr/bin/crontab -l > $HOME/cron-pro.minos25

# Clean out old logs, submits, and cores -- Run always
  20 22 * * * /grid/fermiapp/minos/minfarm/scripts/rm_logs.glide -F

Other cron jobs running on minos27 under user minospro


# Use this crontab for jobs that don't require condor_q

# For safety, keep an updated copy of this crontab
  10 0 * * * /usr/bin/crontab -l > $HOME/cron-pro.minos27

# Copy logs to AFS
  55 10,22 * * * /grid/fermiapp/minos/minfarm/scripts/copy_logs

# Keep the good_runs, bad_runs, and farmsdb files up-to-date
  00-55/5 * * * * /grid/fermiapp/minos/minfarm/scripts/gather_runs

# And the same for mc
  02-57/5 * * * * /grid/fermiapp/minos/minfarm/scripts/

# Check that data is flowing from the detectors to pnfs
  02 00-22/4 * * * /grid/fermiapp/minos/minfarm/scripts/check_delivery

# Refresh mclist from mcin_data when new stuff is coming in
# 04 06,14,22 * * * /grid/fermiapp/minos/minfarm/scripts/get_multi_mc dogwood5 near daikon_07
# 04 02,10,18 * * * /grid/fermiapp/minos/minfarm/scripts/get_multi_mc dogwood5 far daikon_07
# 04 00,08,16 * * * /grid/fermiapp/minos/minfarm/scripts/get_multi_mc dogwood5 near daikon_08

# Manage the keep-up lists
  58 22 * * *   /grid/fermiapp/minos/minfarm/scripts/keepup_lists B
  10 23 * * Sun /grid/fermiapp/minos/minfarm/scripts/keepup_orphans
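As a reminder of the cron field syntax used in the entries above (standard crontab notation; these lines are comments only):

```
# minute hour day-of-month month day-of-week  command
# 02-57/5 * * * *   -> every 5 minutes, at :02, :07, ... :57
# 04 07,15,23 * * * -> daily at 07:04, 15:04, and 23:04
# 10 23 * * Sun     -> Sundays at 23:10
```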

Submit script options

PRO> /grid/fermiapp/minos/minfarm/scripts/get_daq_submit.glide -h
Usage:  get_daq_submit.glide [-v VSN] [-V VSN2] [options]
Options -h   print this message
        -d   print debug information in analyze
        -g   use root compiled with only -g
        -O,o use root compiled with -g -O2
        -Q n use mysql server minos-$n -- -Q db1 is default
        -b   bypass bfield check -- will produce ERR 100 in analyze
        -n   process near detector only
        -f   process far detector only
        -a   add ATMOS processing
        -c   do COSMIC processing ONLY
        -s   do SPILL processing ONLY
             COSMIC and SPILL processing is the default
        -v   specify a version -- defaults to current_version
        -y   bypass field and beam checks and *do* pass b,B options to analyze
             Use in shutdown when chambers and db updates run = -Bbc
        -B   beam down -- -F and don't signal missing lists
        -F   bypass beam check and run cosmic only -- will produce ERR 101
        -G   bypass beam check and run both passes -- will produce ERR 101
        -L   don't report on missing lists -- used when testing
        -S   do *not* submit jobs, only update bookkeeping
        -T|X do *not* update tarfiles or delete daily list -- TEST MODE
        -Z   -S and don't write to datalist(s) -- supersedes -S
        -V V add nearlist and farlist to alternate datalist.$V

PRO> /grid/fermiapp/minos/minfarm/scripts/cron_submit.glide -h
Usage:   cron_submit.glide [-pn] [-asmoOACNM] [-t F|N] VSN Num_Jobs [List]
Options: -h   - print this list
         -d   - print debug information in analyze
         -D   - allow duplicate submissions
         -g   - use root compiled with only -g
         -O,o - use root compiled with -g -O2
         -Q n - use mysql server minos- -- -Q mysql1 is default
         -b   - override bfield check
         -B   - override beam check
         -p n - override pass check in submit_job and use pass n
         -t f - count only F(ar) or N(ear); default is both
         -m   - allow multiple passes
         -a   - add ATMOS processing
         -c   - do COSMIC processing ONLY
         -s   - do SPILL processing ONLY
The following are generally useful if -c or -s.  If none of these is
    specified, all output streams are written, i.e. '' = -CNM
         -A   - write all output streams (default)
         -C   - write cand output (includes bcnd for FD)
         -N   - write ntuple output (includes bntp for FD)
         -M   - write mrnt output (for spill pass)
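The "'' = -CNM" default can be illustrated with a small sketch. This is a hypothetical re-implementation of just the flag handling, not the actual cron_submit.glide code; select_streams is an invented name:

```shell
# Sketch of the output-stream default: with none of -C/-N/-M given,
# all streams are selected, i.e. '' behaves like -CNM.
select_streams() {
    OPTIND=1
    STREAMS=""
    while getopts ACNM opt "$@"; do
        case $opt in
            A) STREAMS="cand ntuple mrnt" ;;
            C) STREAMS="$STREAMS cand" ;;
            N) STREAMS="$STREAMS ntuple" ;;
            M) STREAMS="$STREAMS mrnt" ;;
        esac
    done
    # no stream flag at all -> write everything
    [ -n "$STREAMS" ] || STREAMS="cand ntuple mrnt"
    echo $STREAMS
}

select_streams        # prints: cand ntuple mrnt
select_streams -C     # prints: cand
select_streams -NM    # prints: ntuple mrnt
```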

PRO> /grid/fermiapp/minos/minfarm/scripts/ -h
Usage: -v date -V date [-pn] [-dgmoOACNM] [-t f|n] VSN NumJobs [InList]
Options: -h   - print this list
         -d   - print debug information in ana_mc
         -m   - allow multiple passes
         -p n - override pass check in submit_job and use pass n
         -t f - count only f(ar), n(ear), F(mock), N(mock); default is all
         -g   - use root compiled with only -g
         -O,o - use root compiled with -g -O2
         -S   - special handling of subrun > 99
         -v s - string specifying start of time range: 'YYYY,MM,DD,hh,mm,ss'
         -V s - string specifying end of time range: 'YYYY,MM,DD,hh,mm,ss'
The following control output streams.  If none of these or -A
    is specified, all output streams are written, i.e. '' = -CNM
         -A   - write all output streams (default)
         -C   - write cand output
         -N   - write ntuple output
         -M   - write mrnt output

Location of output and log files

There are currently no log files for submissions; all output from the submission scripts is sent to mail, which is stored, for example, in


If you prefer to receive actual email, you may add an address to the crontab that runs the job, for example
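One standard way (assuming Vixie/cronie cron, which honors the MAILTO variable; the address below is a placeholder) is to set MAILTO at the top of the crontab:

```
# crontab fragment: deliver cron output to a real mailbox instead of
# the local mail spool -- replace the placeholder address with your own
MAILTO=your_user@example.gov
04 07,15,23 * * * /grid/fermiapp/minos/minfarm/scripts/get_daq_submit.glide -v dogwood6 -F
```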

To see running jobs, run

Output files are written to the cand_data directory by date. For example, for dogwood6 processing of near detector data collected in February 2012:


Log files from grid processing are available while (and after) the job is running on the grid; they are written to


Twice a day, log files are archived to (see the active cron job above)

Old location:

Note that all the files are gzipped. (Please don't unzip them!) To look at them, use 'less' or copy them to another location first.
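The gzipped logs can also be read in place with the standard gzip tools, for example (demonstrated here on a scratch file, since the real log paths vary):

```shell
# View or search a gzipped log without unzipping it.
# Demo on a scratch file:
printf 'ERR 100 in F00049284_0001\n' > /tmp/demo.log
gzip -f /tmp/demo.log                    # produces /tmp/demo.log.gz

zcat /tmp/demo.log.gz | grep -c 'ERR'    # search in place; prints 1
# zless /tmp/demo.log.gz                 # or page through interactively
```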

Runs that crashed will appear in the bad_runs file, for example


The error codes are as follows
   1: Input error, usually an srm problem -- rerun
   2: No output streams
   3: Unable to save an output stream -- dcache or farcat/nearcat -- rerun
   7: Unable to locate loon script -- rerun after adding script to tar
   8: Mysql server not available -- rerun
  15: No asciidb files -- configuration error -- probably obsolete
  90: Job runs extremely long without writing output -- killed by hand
  91: Do not process -- not in measurement list -- manual entry in bad_runs
      Should be caught as a suppressed run -- mostly used with atmos 
  95: Temporary reassignment of 100 to allow flushing if not to be rerun
  96: Temporary reassignment of 101 to allow flushing if not to be rerun
  99: Job runs extremely long and writes massive output -- killed by hand
 100: Gaps in bfield database -- usually rerun after db update
 101: Gaps in beam spill database -- usually rerun after db update
 132: Illegal Instruction
 134: Invalid Data
 136: FPE
 137: Killed by system or user; rerun or manually change to 90 or 99
 139: SEGV
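When triaging a long bad_runs list, it can help to tally the codes above. A sketch, assuming (hypothetically) that the numeric error code is the last whitespace-separated field of each bad_runs line; check the actual file format first, and note tally_codes is an invented helper name:

```shell
# Tally error codes in a bad_runs-style file.
# ASSUMPTION: the error code is the last field on each line.
tally_codes() {
    awk '{print $NF}' "$1" | sort | uniq -c | sort -rn
}

# Demo on a scratch file with a hypothetical layout:
printf 'N00021656_0013 100\nN00021656_0014 100\nF00047650_0004 1\n' > /tmp/bad_demo
tally_codes /tmp/bad_demo
#      2 100
#      1 1
```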

Roundup (Concatenation)

Rashid Mehdiyev <> is currently running roundup under the minospro account. Roundup checks that files for all subruns from a given run are present; if any are missing, the run will not be concatenated. Log files for roundup processing are stored, for example, in


To make a list of missing subruns from the last pass of roundup, you can run something like
cd /grid/fermiapp/minos/minfarm/scripts/
./pend2list d6 n

cd /grid/fermiapp/minos/minfarm/scripts/
./pend2list dogwood6 far

Unless runs have crashed due to missing beam or b-field data in the database, the missing files will appear in the list.

To include runs that have crashed due to missing beam or b-field data (for example, after the data have been filled into the database), run pend2list with the option -k (keep):
./pend2list -k dogwood6 far


To check whether a particular subrun (or a list of subruns) has been processed with a particular version AND already concatenated, follow this example:

minos25$ cat mmm.d4
F00047650_0004 2011-05
F00047670_0006 2011-05
F00047685_0010 2011-05
F00047692_0009 2011-06
F00047949_0016 2011-06
F00048191_0009 2011-08
F00048350_0005 2011-08

minos25$ while read r m; do sam_find -b c -t sntp -v d4 -m $m $r; done <

There is a possibility of duplicate files. When a duplicate file comes into near_cat (before roundup) and it already exists in near_cat, it is moved to


If a duplicate file comes in after roundup has already concatenated the first version of it, the duplicate is found by the roundup script and moved to


There are two routines that clean up duplicates caught by analyze (while submitting jobs): det2dcache and det2cat.

PRO> det2dcache
Usage:   det2dcache [-RNYnd] VSN f|n
Options: -N: Do NOT ask whether to replace non-zero files - DELETE LOCAL
         -Y: Do NOT ask whether to replace non-zero files - DO IT
         -d: Run srmcp with debug=true
         -n: Don't copy or delete -- just show what would be done

(-R is deprecated in favor of -N)

minos25$ det2cat
Usage:  det2cat [-n] VSN F|N
Option: -n - debug mode -- just show what would be done 

Running the det2dcache routine requires srm and sam setup; it should be run under the minospro account and on a machine other than minos27. See the .bashrc example for the setup. You may run
det2dcache -n d6 n

and see what happens, and then run
det2dcache -N d6 n

to actually do the copies/deletions. Similarly, for ntuples, run
det2cat -n d6 n

and then
det2cat d6 n 

Missing files
If a job ran successfully to completion on the grid but was unable to copy a file to /pnfs/minos (for example, due to authorization problems, or problems with pnfs), the run will appear in the good_runs.* list, and the actual file will be moved to the same directories as duplicate files. The cleanup is the same as for duplicates.
/minos/data/minfarm/neardet or /minos/data/minfarm/fardet

To delete files from pnfs

If you need to permanently delete files from sam and pnfs for any reason (for example, if any of the systems fails and duplicates occur during concatenation), use this example

ssh minospro@minos27
PRO> . /minos/app/app/OSG1/
PRO> SRMV2_PATH="srm://fndca1:8443/srm/managerv2?SFN=/pnfs/"
PRO> export X509_USER_PROXY=/minos/data/minfarm/.grid/minospro_proxy
PRO> tdir=sntp
PRO> DIR=reco_far/dogwood6/${tdir}_data/2012-03
PRO> FILE=F00049284_0001.spill.sntp.dogwood6.0.root
PRO> srmrm -2 $SRM_DIR/${FILE}
PRO> FILE=F00049287_0001.spill.sntp.dogwood6.0.root
PRO> srmrm -2 $SRM_DIR/${FILE}
PRO> FILE=F00049287_0001.cosmic.sntp.dogwood6.0.root
PRO> srmrm -2 $SRM_DIR/${FILE}
PRO> tdir=mrnt
PRO> DIR=reco_far/dogwood6/${tdir}_data/2012-03
PRO> FILE=F00049284_0001.spill.mrnt.dogwood6.0.root
PRO> srmrm -2 $SRM_DIR/${FILE}
PRO> FILE=F00049287_0001.spill.mrnt.dogwood6.0.root
PRO> srmrm -2 $SRM_DIR/${FILE}
PRO> tdir=.bntp
PRO> DIR=reco_far/dogwood6/${tdir}_data/2012-03
PRO> FILE=F00049287_0001.spill.bntp.dogwood6.0.root
PRO> srmrm -2 $SRM_DIR/${FILE}
minos53$ sam undeclare F00049284_0001.spill.sntp.dogwood6.0.root
minos53$ sam undeclare F00049287_0001.spill.sntp.dogwood6.0.root
minos53$ sam undeclare F00049287_0001.cosmic.sntp.dogwood6.0.root
minos53$ sam undeclare F00049284_0001.spill.mrnt.dogwood6.0.root
minos53$ sam undeclare F00049287_0001.spill.mrnt.dogwood6.0.root
minos53$ sam undeclare F00049287_0001.spill.bntp.dogwood6.0.root
minos53$ ls /pnfs/minos/reco_far/dogwood6/
.bntp_data/ cand_data/  mrnt_data/  sntp_data/
PRO> cd /minos/data/reco_far/dogwood6/sntp_data/2012-03/
PRO> for f in `ls -l | grep 0001 | awk '{print$9}'`; do echo rm $f; done
rm F00049284_0001.spill.sntp.dogwood6.0.root
rm F00049287_0001.cosmic.sntp.dogwood6.0.root
rm F00049287_0001.spill.sntp.dogwood6.0.root
PRO> rm F00049284_0001.spill.sntp.dogwood6.0.root
PRO> rm F00049287_0001.cosmic.sntp.dogwood6.0.root
PRO> rm F00049287_0001.spill.sntp.dogwood6.0.root
PRO> cd /minos/data/reco_far/dogwood6/mrnt_data/2012-03/
PRO> for f in `ls -l | grep 0001 | awk '{print$9}'`; do echo rm $f; done
rm F00049284_0001.spill.mrnt.dogwood6.0.root
rm F00049287_0001.spill.mrnt.dogwood6.0.root
PRO> rm F00049284_0001.spill.mrnt.dogwood6.0.root
PRO> rm F00049287_0001.spill.mrnt.dogwood6.0.root
PRO> cd /minos/data/reco_far/dogwood6/.bntp_data/2012-03/
PRO> for f in `ls -l | grep 0001 | awk '{print$9}'`; do echo rm $f; done
rm F00049287_0001.spill.bntp.dogwood6.0.root
PRO> rm F00049287_0001.spill.bntp.dogwood6.0.root
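The session above repeats one pattern per output stream: set tdir, build DIR, srmrm each file, sam undeclare it, and rm the /minos/data copy. The directory construction can be captured in a tiny sketch (reco_dir is a hypothetical helper, not an existing script; the echo stands in for the real srmrm/sam/rm steps):

```shell
# Hypothetical helper: build the per-stream reco directory used above.
# usage: reco_dir VSN TDIR MONTH
reco_dir() {
    echo "reco_far/$1/${2}_data/$3"
}

# Loop the cleanup over the streams of one month:
for tdir in sntp mrnt .bntp; do
    DIR=$(reco_dir dogwood6 "$tdir" 2012-03)
    echo "would remove files under $DIR"   # replace with srmrm / sam undeclare / rm
done
```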

Job monitoring on grid

To see a list of running jobs, type


To remove a job
condor_rm jobid#
condor_rm -force jobid#

Physics Processing

When calibrations are done, the decision is made to process a certain time period with dogwood7. You need to make a list of files from the list archives and put it in the lists directory, then add cron_submit to the crontab, for example

04-58/10 * * * * /usr/krb5/bin/kcron "/grid/fermiapp/minos/minfarm/scripts/cron_submit.glide  dogwood7 300 march.dogwood7" 

Monte Carlo Requests Processing

Requests are submitted by email, for example

Files were generated with daikon 07 and  can be found in 

They should be reconstructed with dogwood5, and for this sample please use the following validity dates
  2005-05-21 to 2006-02-25 

The file list is created using a cron job which looks for new files in /pnfs/minos/mcin_data


The job is get_multi_mc; it updates files in lists/mc and writes a measurement list of the form
mclist_near.VSN, for example
minos25$ cat mclist_near.dogwood5

The validity dates are passed to the loon script as -v start_date-time and -V end_date-time, most easily in the form 'YYYY,MM,DD,hh,mm,ss'. Here are some examples of MC submissions:
# 06,26,46 * * * * /usr/krb5/bin/kcron "/grid/fermiapp/minos/minfarm/scripts/ dogwood3 600 mclist_near.dogwood5" 
  02-52/20 * * * * /usr/krb5/bin/kcron "/grid/fermiapp/minos/minfarm/scripts/ -v '2005,05,21,0,0,0' -V '2006,02,25,23,59,59' dogwood5 400 mclist_near.dogwood5" 
# 06-56/10 * * * * /usr/krb5/bin/kcron "/grid/fermiapp/minos/minfarm/scripts/ -v '2011,06,01,0,0,0' -V '2011,06,30,23,59,59' dogwood5 450 mclist_far.NOVA" 
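A convenience for producing the -v/-V strings from an ordinary date (a sketch; assumes GNU date for the -d option, and vdate is an invented name -- note the crontab examples above write unpadded fields like '2005,05,21,0,0,0', whereas this emits zero-padded ones):

```shell
# Convert a date/time into the 'YYYY,MM,DD,hh,mm,ss' form used by -v/-V
# (requires GNU date for -d)
vdate() {
    date -u -d "$1" +'%Y,%m,%d,%H,%M,%S'
}

vdate '2005-05-21 00:00:00'   # prints 2005,05,21,00,00,00
```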

The main run script is ana_mc, analogous to analyze.

One actual example:

* Running MC request

A new CosmicMu sample is ready to reco. It is a daikon07 far detector
sample and it was already transferred to

* Running

 ssh minos25
 minos25$ get_multi_mc dogwood5 far daikon_07
far daikon_07 AtmosNu: old and new have 7931 entries.
far daikon_07 CosmicLE: old and new have 120 entries.
far daikon_07 CosmicMu: new list mc/far_daikon_07_CosmicMu has 880 entries.
It has also been copied to /minos/data/minfarm/lists/new_far_daikon_07_CosmicMu
Appending new_far_daikon_07_CosmicMu to mclist_far.dogwood5 and
removing new_far_daikon_07_CosmicMu
far daikon_07 L010185N_r1: old and new have 2291 entries.
far daikon_07 L010185N_r2: old and new have 1320 entries.
far daikon_07 L010185N_r3: old and new have 3127 entries.
far daikon_07 L010185R: only empty groups.
far daikon_07 L010185R_r4: old and new have 4300 entries.
far daikon_07 L100200N_r7: old and new have 261 entries.
far daikon_07 L100200R_r7: old and new have 264 entries.
far daikon_07 L250200N_r1: old and new have 1144 entries.
Check far daikon_07 L250200N_r2: blocked by NORECO
far daikon_07 L250200N_r7: old and new have 258 entries.
far daikon_07 L250200R_r7: old and new have 263 entries.

cd lists
mv  mclist_far.dogwood5 far_daikon_07_CosmicMu.dogwood5
crontab -e

12-52/20 * * * * /usr/krb5/bin/kcron "/grid/fermiapp/minos/minfarm/scripts/ -v '2007,11,19,0,0,0' -V '2009,06,12,23,59,59' dogwood5 250 far_daikon_07_CosmicMu.dogwood5" 

New Version

Special Processing


If files crashed with error 100 or 101 (missing beam or bfield data), the database might have been updated later. To check whether the data is there now, do the following; if it is, remove the runs from bad_runs and resubmit.

To check if bfield was on for a particular run

source /grid/fermiapp/minos/minossoft/setup/
setup sam

resolve_bfld_fail  -r  N00021656_0013

To check if beam was on for a particular run

source /grid/fermiapp/minos/minossoft/setup/
setup sam

resolve_beam_fail  -r  N00021656_0013