Running Grid Jobs using project.py

project.py is a wrapper script that takes much of the tedious grid setup out of the user's hands.
Jobs are configured through an xml file instead.

The general instructions for project.py are here:
https://cdcvs.fnal.gov/redmine/projects/larbatch/wiki/User_guide

Getting project.py

To use this package in LArIAT, set up the lariatutil product (it should set up lariatsoft automatically).
An example shell session could look like this:

source /grid/fermiapp/lariat/setup_lariat.sh
setup lariatutil <current_release> -q <current qualifiers like e7:prof>

List of current lariatsoft releases

To check that you have it set up, try the command

project.py --help

To actually run grid jobs you need two things:
  • an xml file present locally that will configure your job and
  • a working .fcl file for the larsoft job, which must be present in your product/fhicl path or, better yet, locally.

XML file preparation

Get an example XML file

Example XML file for LArIAT batch jobs

Further example xml files can be found in the lariatutil (lardbt-lariatutil) repository. You can bring it into your working area by doing the following in your srcs directory:

mrb g ssh://p-lardbt@cdcvs.fnal.gov/cvs/projects/lardbt-lariatutil

The example xml files are in lardbt-lariatutil/xml/test/.

What you need to change in the XML file

Understanding everything in the xml file:
https://cdcvs.fnal.gov/redmine/projects/larbatch/wiki/User_guide

Output directories

You will probably want to change the destination directory <outdir> and set the working directory <workdir> as you need.

Release version and your local-only code

Make the release match your lariatsoft version:

<!ENTITY release "v01_04_00">

That is, the release must be the lariatsoft version you are using, and it has to be present in /grid/fermiapp/products/lariatsoft.
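
A quick way to catch a mistyped release before submitting is to check for its directory in the products area. The sketch below assumes that each installed version appears as a subdirectory named after the release; release_available is a hypothetical helper, not part of project.py.

```shell
# Hypothetical helper (not part of project.py): succeed if the given
# release has a directory in the lariatsoft products area.
release_available() {
    local release="$1"
    # Default is the products path named in the text; a second argument
    # overrides it (assumption: one subdirectory per installed release).
    local products_dir="${2:-/grid/fermiapp/products/lariatsoft}"
    [ -d "${products_dir}/${release}" ]
}
```

For example, release_available v01_04_00 || echo "release not found" prints a warning when the directory is missing.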

If you are using code which is not in the release, include the optional <local> tag in your <larsoft> section:

<larsoft>
  <tag>&release;</tag>
  <qual>e6:prof</qual>
  <local>/lariat/app/users/USERNAME/lariatOffline/AwesomeNewModule/localProducts_larsoft_RELEASE_QUALIFIERS</local>
</larsoft>

BE SURE TO SET THE ACTUAL VALUES OF THE PATH ABOVE!

Alternatively, use the tarball option in the xml file. You have to make the tarball using the script lardbt-lariatutil/scripts/make_tar_lariat.sh (or, if you don't have a local copy, /grid/fermiapp/products/lariat/lariatutil/(change version)/bin/make_tar_lariat.sh -d localProducts... name_of_the_output.tar).

More at https://cdcvs.fnal.gov/redmine/projects/larbatch/wiki/User_guide

SAM datasets

Rather than listing files individually, you can use a SAM dataset:

 <inputdef> BatchTestRun6326_10Events </inputdef>

A SAM dataset is a selection of files with specific conditions or configurations in common.

More here: SAM datasets

.fcl file preparation

The .fcl files need to be in your FHICL_FILE_PATH, so again they need to be present either in a tagged release or in your localProducts directory.

This also means that after each modification of the .fcl file you need to make install for project.py to pick it up. :(
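
To see which copy of a .fcl file would be picked up, you can walk the search path yourself. This is a minimal sketch that assumes the first match on the colon-separated FHICL_FILE_PATH is the one that gets used; find_fcl is a hypothetical helper, not part of project.py or larsoft.

```shell
# Hypothetical helper: print the first match of a .fcl file name on a
# colon-separated search path (FHICL_FILE_PATH unless a second argument
# is given). Returns non-zero if the file is not found anywhere.
find_fcl() {
    local fcl_name="$1"
    local search_path="${2:-${FHICL_FILE_PATH:-}}"
    local old_ifs="$IFS"
    local dir
    IFS=':'                      # split the search path on colons
    for dir in $search_path; do
        if [ -f "${dir}/${fcl_name}" ]; then
            IFS="$old_ifs"
            printf '%s\n' "${dir}/${fcl_name}"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}
```

If find_fcl myjob.fcl (myjob.fcl being a placeholder name) prints a path inside a tagged release rather than your localProducts directory, your edited copy has probably not been installed yet.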

An example .fcl file is in:
lariatsoft/mccconfigs/

Running your project

Once you have these elements you can run:

project.py --xml <path to your xml file> --stage <your defined stage> --submit

and look for the results in your <outdir>.

Before you run the next stage you need to create a files.list.
Normally this would be done by launching:

project.py --xml <path to your xml file> --stage <your defined stage> --check

However, this currently fails for our files, so the temporary solution is:

ls /path/to/your/output/files/<your defined stage>/CLUSTER_PROCESS/*.root >> /path/to/your/output/files/<your defined stage>/files.list 

Otherwise the submission of the next stage will fail because it cannot find files.list.
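
Because the ls command above appends with >>, running it twice leaves duplicate entries in files.list. A possible variant that rewrites the list on every run is sketched below. It assumes the .root files sit exactly one directory below the stage directory (the CLUSTER_PROCESS subdirectories) and that find supports -mindepth and -maxdepth; make_files_list is a hypothetical helper, not part of project.py.

```shell
# Hypothetical helper: rewrite <stage dir>/files.list with the .root
# files found one level down, one sorted path per line.
make_files_list() {
    local stage_dir="$1"
    # Write to a temporary file first so a failed run leaves no partial list.
    local tmp_list="${stage_dir}/files.list.tmp"
    find "$stage_dir" -mindepth 2 -maxdepth 2 -type f -name '*.root' \
        | sort > "$tmp_list"
    if [ -s "$tmp_list" ]; then
        mv "$tmp_list" "${stage_dir}/files.list"
    else
        rm -f "$tmp_list"
        echo "no .root files found under ${stage_dir}" >&2
        return 1
    fi
}
```

Call it with the same stage output directory used in the ls command, for example make_files_list "/path/to/your/output/files/<your defined stage>".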

Troubleshooting

No results. What went wrong?

If you don't see any output, run jobsub_fetchlog --list.
This lists the available logs, grouped by fifebatch head node.

For the job that failed, run jobsub_fetchlog -J 2586120.0@fifebatch2.fnal.gov --unzipdir=2586120.0 (substituting your own job ID).
This creates the directory 2586120.0/ full of log files.
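
With many log files in the unpacked directory, a recursive grep is a reasonable first pass. The sketch below is only a starting point: the search patterns are assumptions about what a failure looks like, not messages guaranteed to appear in your logs, and scan_logs is a hypothetical helper.

```shell
# Hypothetical helper: print file name, line number and text of every
# line in a log directory that matches a few common failure words.
scan_logs() {
    local log_dir="$1"
    # -r recurse, -i ignore case, -n show line numbers, -E extended regex.
    # The pattern list is an assumption; extend it as needed.
    grep -r -i -n -E 'error|exception|segmentation' "$log_dir"
}
```

For the example above, scan_logs 2586120.0 lists the suspicious lines. The function returns non-zero when nothing matches.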

<numevents> is being ignored in my SAM dataset definition

Set <maxfilesperjob> and <numjobs> instead.
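
To choose the two settings so that a whole dataset is covered, the job count is the file count divided by the files per job, rounded up. The sketch below assumes that each job consumes at most the per-job file limit; jobs_needed is a hypothetical helper, and the file count has to come from your own dataset.

```shell
# Hypothetical helper: smallest job count that covers n_files when each
# job takes at most max_files_per_job files (integer ceiling division).
jobs_needed() {
    local n_files="$1"
    local max_files_per_job="$2"
    echo $(( (n_files + max_files_per_job - 1) / max_files_per_job ))
}
```

For example, a dataset of 10 files with 3 files per job needs jobs_needed 10 3, which is 4 jobs.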