How To Launch Grid Jobs

LAr1ND now has a VO which is a part of the Fermilab VOMS and allows us to run
on the FermiGrid and Open Science Grid.

Launching jobs is done using the jobsub_client package:
https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client

A quick way to launch your job is described below.

First, create your user directory on the dCache scratch space and make it group-writable:

mkdir /pnfs/lar1nd/scratch/users/<your_user_name>
chmod g+w -R /pnfs/lar1nd/scratch/users/<your_user_name>

Your jobs communicate with this directory, since it is visible from the OSG.
All your input files and executables (unless they are ups products) should therefore be placed there,
and the job copies them over to the worker nodes.
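
Because jobs can only write results back if the group-write bit is set, it can help to verify the permissions before submitting. The following is a minimal sketch, not part of jobsub_client; the function name is_group_writable is hypothetical, and it assumes ls -ld behaves on the /pnfs mount as it does on a local disk.

```shell
#!/bin/sh
# Sketch: check that a directory exists and has the group-write bit set,
# which is what the "chmod g+w" step above is meant to guarantee.

# is_group_writable DIR -> exit status 0 if DIR is a directory with g+w set
is_group_writable() {
    [ -d "$1" ] || return 1
    # The first 10 characters of "ls -ld" are the file type and permission
    # bits, e.g. "drwxrwxr-x"; the group-write bit is the 6th character.
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        ?????w????) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage (the path is the scratch area named on this page):
# is_group_writable "/pnfs/lar1nd/scratch/users/$USER" || echo "fix permissions"
```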

To run a job, you first need to set up jobsub_client:

source /grid/fermiapp/products/common/etc/setup
setup jobsub_client v1_0

Then you need a wrapper script. For now you can use the one in:

/lar1nd/app/users/andrzejs/run_job.sh

Copy it over to your working directory (e.g. /lar1nd/app/users/<your_user_name>)
and edit the following lines (starting from l. 55):

####################################
###### setup your needed products here, e.g. geant4 etc...
####################################

# source /grid/fermiapp/products/larsoft/setup
# setup geant4 v4_9_6_p03e -q debug:e6 
# setup geant4 v4_9_6_p03e -q e6:prof    #no debug information, faster. 

####################################
#### This is where you copy all of your executable/necessary files to the worker node 
#### ( If applicable )
####################################

###### this is where you copy your executable - I have a simple hello.out code here.
 ifdh cp /pnfs/lar1nd/scratch/users/andrzejs/hello.out .

####### 
####### ifdh cp does not preserve permissions, so need to add executable. #########
#######
 chmod u+x hello.out

#######
####### launch executable
#######

 ./hello.out

#######
####### Copy results back 
#######

ifdh mkdir ${SCRATCH_DIR}/${GRID_USER}/output_${CLUSTER}.${PROCESS}

ifdh cp test_hello.txt ${SCRATCH_DIR}/${GRID_USER}/output_${CLUSTER}.${PROCESS}/

This is how you would launch the job from your working directory (e.g. BlueArc space such as /lar1nd/app/users/):

jobsub_submit -G lar1nd --role=Analysis -N 3 -M --resource-provides=usage_model=OPPORTUNISTIC --OS=SL5,SL6 file://`pwd`/run_job.sh

Note the options: -N specifies the number of jobs you want, and -M sends you an email every time a subprocess finishes. usage_model=OPPORTUNISTIC requests any available slots; soon we will be able to use DEDICATED,OPPORTUNISTIC, which will give priority to our dedicated slots. --OS specifies the Scientific Linux version, if you care about that (note that SL5 nodes will go out of commission in the next couple of months).
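
If you submit often with different job counts, a small dry-run helper can assemble the command line for inspection before you run it. This is only a sketch: the function name build_submit_cmd is hypothetical, it uses exactly the flags shown above, and it prints the command rather than executing it.

```shell
#!/bin/sh
# Sketch: print (do not run) the jobsub_submit command for a given job count.

# build_submit_cmd NJOBS SCRIPT -> prints the jobsub_submit command line
build_submit_cmd() {
    njobs=$1
    script=$2
    # Reject anything that is not a positive integer.
    case "$njobs" in
        ''|*[!0-9]*) echo "NJOBS must be a positive integer" >&2; return 1 ;;
    esac
    [ "$njobs" -gt 0 ] || { echo "NJOBS must be a positive integer" >&2; return 1; }
    printf 'jobsub_submit -G lar1nd --role=Analysis -N %s -M --resource-provides=usage_model=OPPORTUNISTIC --OS=SL5,SL6 file://%s\n' \
        "$njobs" "$script"
}

# Example: show the command for 3 jobs using run_job.sh in the current directory.
build_submit_cmd 3 "$(pwd)/run_job.sh"
```

Once the printed line looks right, you can paste it into your shell to submit.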

The results of your job will end up in

/pnfs/lar1nd/scratch/users/<your_user_name>/output_${CLUSTER}.${PROCESS}
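
To find the output directory of one particular job, the path above can be derived from the job id. The sketch below assumes that ${CLUSTER} and ${PROCESS} in the wrapper script correspond to the <cluster>.<process> part of a job id such as 77457.0@fifebatch2.fnal.gov; the function name output_dir_for is hypothetical.

```shell
#!/bin/sh
# Sketch: map a job id of the form <cluster>.<process>@<server> to the
# output directory pattern used on this page.

# output_dir_for USER JOBID -> prints the expected output directory
output_dir_for() {
    user=$1
    jobid=$2
    # Strip the "@server" suffix, leaving "<cluster>.<process>".
    cluster_process=${jobid%%@*}
    printf '/pnfs/lar1nd/scratch/users/%s/output_%s\n' "$user" "$cluster_process"
}

# Example with the job id format shown on this page.
output_dir_for "${USER:-your_user_name}" "77457.0@fifebatch2.fnal.gov"
```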

Some other useful commands:

See how your jobs are doing:

jobsub_q -G lar1nd --user=<your_user_name>

Remove a job:

jobsub_rm -G lar1nd --jobid=<job id, which you can get e.g. from the previous command>

Fetch the log files:

jobsub_fetchlog -G lar1nd --jobid <job id specified at runtime, e.g. 77457.0@fifebatch2.fnal.gov>
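
When a submission with -N produced several subprocesses, you may want the logs of each one. The sketch below prints one jobsub_fetchlog command per subprocess. It assumes that the subprocesses of a cluster are numbered from 0 to N-1, which is not stated on this page; the function name fetchlog_cmds is hypothetical, and the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: print one jobsub_fetchlog command per subprocess of a cluster.
# Assumption: subprocesses are numbered 0 .. NJOBS-1.

# fetchlog_cmds CLUSTER SERVER NJOBS
fetchlog_cmds() {
    cluster=$1
    server=$2
    njobs=$3
    i=0
    while [ "$i" -lt "$njobs" ]; do
        printf 'jobsub_fetchlog -G lar1nd --jobid %s.%s@%s\n' "$cluster" "$i" "$server"
        i=$((i + 1))
    done
}

# Example for a 3-job submission with cluster number 77457.
fetchlog_cmds 77457 fifebatch2.fnal.gov 3
```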

For more details go to
https://cdcvs.fnal.gov/redmine/projects/jobsub/wiki/Using_the_Client

Launching LArSoft jobs (also in large numbers)

Coming soon...

Using project.py to launch LArSoft jobs (especially in large numbers)

project.py is a wrapper script that takes much of the tedious grid setup out of the user's hands
and instead reads an xml configuration file.
The general instructions for project.py are here:
https://cdcvs.fnal.gov/redmine/projects/larbatch/wiki/User_guide

To use this package in lar1nd we need to set up the lar1ndcode and lar1ndutil products.
An example shell session could look like this:

$ source /grid/fermiapp/products/lar1nd/setup_lar1nd.sh
$ setup lar1ndcode v00_05_00 -q e6:prof
$ setup lar1ndutil v01_10_01 -q e6:prof
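
A quick way to confirm that the setup commands above worked is to check that project.py is on your PATH. This is a generic shell check, not something provided by larbatch; the helper name have_cmd is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a command is available on the PATH.

# have_cmd NAME -> exit status 0 if NAME can be found by the shell
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd project.py; then
    echo "project.py found"
else
    echo "project.py not found; re-run the setup commands" >&2
fi
```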

Now you should be able to run project.py.
To actually run grid jobs you need two things:
an xml file that configures your job, present locally, and a .fcl file that will be used by the LArSoft instance
and needs to be present in your product/fhicl path.

xml file preparation

Example xml files can be found in the lar1ndutil repository, which you can download by doing:

mrb g lar1ndutil

in your srcs directory. The xml files are, e.g., in lar1ndutil/xml/test/.
These are currently untested. The instructions on the xml file format can be found in:
https://cdcvs.fnal.gov/redmine/projects/larbatch/wiki/User_guide

but the relevant part for lar1ndcode is:

<!ENTITY release "v00_05_00">

i.e. the release should be the version of lar1ndcode you are using. That version has to be present either in
/grid/fermiapp/products/lar1nd or in your localProducts directory. In the latter case you need to use the tarball option in the xml file; you can make the tarball using /lar1ndutil/scripts/make_tar_lar1nd.sh.
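
Since a mismatch between the release entity and the lar1ndcode version you set up is easy to overlook, a small check can catch it before submission. The sketch below only parses the <!ENTITY release "..."> line shown above; the function names xml_release and release_matches are hypothetical and not part of project.py.

```shell
#!/bin/sh
# Sketch: read the "release" entity from a project.py xml file and compare
# it with the lar1ndcode version you expect to use.

# xml_release FILE -> prints the value of <!ENTITY release "...">
xml_release() {
    sed -n 's/.*<!ENTITY[[:space:]]\{1,\}release[[:space:]]\{1,\}"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# release_matches FILE VERSION -> exit status 0 if the entity equals VERSION
release_matches() {
    [ "$(xml_release "$1")" = "$2" ]
}

# Usage (my_job.xml is a placeholder file name):
# release_matches my_job.xml v00_05_00 || echo "release entity does not match"
```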

.fcl file preparation

The .fcl files need to be in your FHICL_FILE_PATH, so again, they need to be present either in a tagged release or in your localProducts directory.
This also means that after each modification of the .fcl file you need to run make install for project.py to pick it up.

An example .fcl file is in:
lar1ndcode/lar1ndcode/JobConfigurations/prod_eminus_0.1_0.9_lar1nd.fcl
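
To see whether a given .fcl file is currently visible, you can search the directories listed in FHICL_FILE_PATH yourself. The sketch below assumes FHICL_FILE_PATH is a colon-separated list of directories and that the first match is the one picked up; the function name find_fcl is hypothetical.

```shell
#!/bin/sh
# Sketch: locate a .fcl file in the colon-separated FHICL_FILE_PATH.

# find_fcl NAME -> prints the first matching path, exit status 1 if none
find_fcl() {
    name=$1
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose so the value is split on ":".
    for dir in ${FHICL_FILE_PATH:-}; do
        # An empty entry is treated as the current directory.
        if [ -f "${dir:-.}/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "${dir:-.}/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage with the example file named above:
# find_fcl prod_eminus_0.1_0.9_lar1nd.fcl || echo "not in FHICL_FILE_PATH"
```

If the file is found in a tagged release rather than in your localProducts directory, your local modifications have probably not been installed yet.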

Running

Once you have these elements you can run:

project.py --xml <path to your xml file> --stage <your defined stage> --submit

and look for results.