So You Want to Make Some Monte Carlo (A Qwik Start Guide for non-Productioners)

So you need to make some Monte Carlo and perhaps don't have a month to devote to learning all the software properly. This guide is for you. The goal of this page is to get you started on generating some MC with minimal thought and effort.

There are two scripts you need to use to generate MC:
1. make_sim_fcl
- This script generates the fcls you need to process MC. It resides in the "novaproduction" package and is set up automatically, so it runs without you needing to install the package locally. You can verify this by using

$ which make_sim_fcl

2. submit_nova_art.py
- If you don't want to run your jobs interactively, you'll want to use this. This script submits jobs to the grid for you based on an input configuration file that contains the SAM dataset definition you've made from the fcls generated with make_sim_fcl. Much like make_sim_fcl, this script works without you needing to set up the NovaGridUtils package.
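
The `which` check can be done for both scripts in one go. Below is a minimal sketch, not anything shipped with novaproduction or NovaGridUtils: the helper name check_tools is made up for this guide, and it only relies on the shell built-in `command -v`.

```shell
# Hypothetical helper (not part of novaproduction or NovaGridUtils):
# report which of the named commands can be found on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero return value if at least one command was not found.
    return "$missing"
}

# Example, once your novasoft environment is set up:
# check_tools make_sim_fcl submit_nova_art.py
```

If either script is reported as missing, your environment is probably not set up yet.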

Though you don't need to, I would still suggest setting up local copies of both novaproduction and NovaGridUtils (addpkg_svn -h novaproduction) in a test release.

make_sim_fcl

Now let's go through some examples of how you'll be using the scripts above. The examples below are modified versions of the commands used in the Second Analysis Campaign to create the official samples. For a list of basic options, you can use make_sim_fcl -h and for the full list of options, use make_sim_fcl -ah.

The one option you should never use is -fts. It adds the fcls you create to official datasets and gives the Production group a headache, so please don't do that.

examples using make_sim_fcl for ND

These are the three most requested samples for the ND:

ND Genie non-swap no-RW (no-reweight) w/Rock Overlays:

make_sim_fcl --randomsubruns 100 /nova/app/users/projas/runlists/nd_grl_v0_epoch1-3c.txt --events 2000 --detector nd --generator genie --hornpolarity fhc \
  --flavorset nonswap --iteration 1 --outdir ./ --batchnumber 3 --timespec full --geantonly --nogenierw

ND Genie Fluxswap (no overlays):

make_sim_fcl --randomsubruns 100 /nova/app/users/projas/runlists/nd_grl_v0_epoch1-3c.txt --events 2000 --detector nd --generator genie --hornpolarity fhc \
  --flavorset swap --iteration 1 --outdir ./ --batchnumber 1 --timespec full --nogenierw

ND Genie nonswap w/RW (reweight) w/Rock Overlays:

make_sim_fcl --randomsubruns 100 /nova/app/users/projas/runlists/nd_grl_v0_epoch1-3c.txt --events 400 --detector nd --generator genie --hornpolarity fhc \
  --flavorset nonswap --iteration 1 --outdir ./ --batchnumber 4 --timespec full --geantonly

Note that this sample has fewer spills per file than the other two (400 vs 2000). This shortens processing time, since overlays already take more time and the Genie reweighting adds even more.

ND CRY (gainmode 100):

The following command produces 100 files with 1M events apiece:

make_sim_fcl -gm 100 --lastrun 100000 --numsubruns 24 --events 1000000 --generator cry --flavorset all --detector nd --iteration 1 --batchnumber 1 \
  --timespec ideal --outdir ./

examples using make_sim_fcl for FD

FD Genie nonswap:

make_sim_fcl --randomsubruns 100 /nova/app/users/projas/runlists/FD_GRL_v3_020816.txt --events 1000 --detector fd --generator genie --hornpolarity fhc \
  --flavorset nonswap --geantonly --iteration 1 --batchnumber 1 --timespec full --nogenierw --outdir ./

FD Genie fluxswap:

make_sim_fcl --randomsubruns 100 /nova/app/users/projas/runlists/FD_GRL_v3_020816.txt --events 1000 --detector fd --generator genie --hornpolarity fhc \
  --flavorset swap --iteration 1 --batchnumber 1 --timespec full --nogenierw --outdir ./

FD Genie tauswap:

make_sim_fcl --randomsubruns 100 /nova/app/users/projas/runlists/FD_GRL_v3_020816.txt --events 1000 --detector fd --generator genie --hornpolarity fhc \
  --flavorset tau --iteration 1 --batchnumber 1 --timespec full --nogenierw --outdir ./

FD CRY gainmode 100:

make_sim_fcl -gm 100 --lastrun 1000000 --numsubruns 24 --events 200 --generator cry --flavorset all --detector fd --iteration 1 --batchnumber 1 \
  --timespec ideal --outdir ./

FD CRY gainmode 140:

make_sim_fcl -gm 140 --lastrun 2000000 --numsubruns 24 --events 200 --generator cry --flavorset all --detector fd --iteration 1 --batchnumber 1 \
  --timespec ideal --outdir ./

Custom modifications to the fhicl file

make_sim_fcl uses a template to make fhicl files. By default the template lives in $NOVAPRODUCTION_DIR/fcl/templates/sim_fcl.template. If you need to customize the fhicl for your job, you should make a local copy of this template, and then tell the job to use it with:

make_sim_fcl --templatefhicl some_custom_fhcl.template ...
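
Making the local copy is just a `cp`, but it is easy to clobber a template you have already edited. The sketch below guards against that. The helper name copy_template and the local file name are made up for illustration; only the default template path and the --templatefhicl option come from this page.

```shell
# Hypothetical helper: copy a template to a local file without
# overwriting an existing local copy.
copy_template() {
    src=$1
    dest=$2
    if [ ! -r "$src" ]; then
        echo "cannot read template: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite: $dest" >&2
        return 1
    fi
    cp "$src" "$dest"
}

# Example (the local file name is arbitrary):
# copy_template "$NOVAPRODUCTION_DIR/fcl/templates/sim_fcl.template" ./my_sim_fcl.template
# make_sim_fcl --templatefhicl ./my_sim_fcl.template ...
```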

Some extra notes

You should now have everything you need to run jobs. The output of submit_nova_art.py should have given you a jobsubID and a samweb monitoring page. I suggest saving them somewhere so you can access them at a later date if needed. Additionally, don't forget to fill out an ELOG entry in the ECL, so that others may learn from your work and Conveners can track your progress.

If you need to make something different from the above, such as a systematics sample or some other special sample, you'll probably be using the --syst option.

For a list of standard/basic options plus more examples, use:

  make_sim_fcl -h

For a full list of options and brief explanations, use:
  make_sim_fcl -ah

If you know you will be making lots of MC, it would behoove you to learn more about how make_sim_fcl works: see Running make_sim_fcl, set up the head version (described near the top), and read the code.

submit_nova_art.py

This is how you actually submit jobs to the grid. The stages below will step you through how to do this.

Preparation

If you want to make your fhicls into a SAM dataset for processing on the grid using the sample configuration below, use the sam4users tool:

sam_add_dataset -n <name of your dataset> -d <directory containing your fhicl files>
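
Before registering a directory as a dataset, it can be worth checking that it holds the number of fhicl files you expect. This is a rough sketch: count_fcls is a made-up helper name, and it assumes the generated files end in ".fcl" and sit directly in the directory.

```shell
# Hypothetical helper: count the *.fcl files directly inside a directory.
# Assumes the generated fhicl files end in ".fcl".
count_fcls() {
    dir=$1
    n=0
    for f in "$dir"/*.fcl; do
        # An unmatched glob stays literal, so check the file really exists.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example:
# count_fcls ./
```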

The syntax for running jobs is:

submit_nova_art.py -f /<path_to>/<config_file>.cfg

Standard configurations (non-overlays)

I refer to non-overlay MC jobs as standard jobs. This is typically anything made for the FD as well as certain ND samples (Fluxswap, CRY).

For these standard jobs, you'll use a configuration like the one below. It is based on $NOVAGRIDUTILS_DIR/configs/prod_mcgen_template.cfg, but make sure to NOT include -f production.inc if you copy it from there!

#This is a template to help you configure your mc generation jobs on the grid.

# Include standard MC generation configuration
-f mcgen.inc

########################################
# General options that you need to set #
########################################

#Type the name of your job here
--jobname <job_name>

#Type the definition that contains the fcl you want to process
--defname <dataset_definition_name>

#This is where you place which tag you want to generate in
--tag <novasoft_release>

#Number of jobs you'd like to submit
--njobs <num_jobs>

#Where do you want your files to be copied to?
--dest /pnfs/nova/scratch/users/$USER/<somewhere>

####################################
# Default options you might change #
####################################

# By default, run jobs everywhere
-f everywhere.inc
#-f offsite.inc
#-f onsite.inc

# Normally produce artdaq but...
--outTier=out1:artdaq
# ...Switch to g4 for making rock MC and other overlay pieces
#--outTier=out1:g4

# It is worth stealing some things off of production.inc...
--maxopt 
--os SL6
--copyOut 
--print_jobsub
-G nova
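
If you submit several samples, filling in the placeholders by hand gets tedious. Here is a minimal sketch of a shell function that prints the configuration above with your values filled in. The function name write_mcgen_cfg is hypothetical; the option names are only the ones listed in the template above, and I have not checked them against any particular NovaGridUtils version.

```shell
# Hypothetical helper: print a standard (non-overlay) MC generation config.
# Arguments: job name, dataset definition, novasoft tag, number of jobs,
# destination directory. Option names are the ones used in this guide.
write_mcgen_cfg() {
    if [ "$#" -ne 5 ]; then
        echo "usage: write_mcgen_cfg JOBNAME DEFNAME TAG NJOBS DEST" >&2
        return 1
    fi
    cat <<EOF
-f mcgen.inc
--jobname $1
--defname $2
--tag $3
--njobs $4
--dest $5
-f everywhere.inc
--outTier=out1:artdaq
--maxopt
--os SL6
--copyOut
--print_jobsub
-G nova
EOF
}
```

Redirect the output to a file (for example my_job.cfg) and hand that file to submit_nova_art.py -f as shown earlier. Note that the sketch deliberately leaves out -f production.inc.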

Overlay configurations

Running overlays is more complicated, but still possible. If you need to do this, please take a look at the documentation for the production group.

Need Reconstruction?

If you need to reconstruct your files, the easiest way is to again make use of sam4users:

sam_add_dataset -n <name of your dataset> -d <directory containing your mc files>

and use this as input to a "full chain" reconstruction job.

Example configuration for Reco Jobs

Currently (Summer 2018) you will likely have to run two reconstruction steps: Production 3 (artdaq -> PID) and Production 4 (PID -> RePID), as these production campaigns followed on from each other. Once we move on to Production 5 (Summer 2019), we will just have a single reconstruction step.

#This is a template to help you configure your reco jobs on the grid.
########################################
# General options that you need to set #
########################################

##Set the tier of the file that you want to copy back.
# If performing Prod3 Reco
#-f outputs_Prod3_PID.inc 
# If performing Prod4 Reco
#-f outputs_repid.inc
# If you want to configure the output tiers yourself, and not use production ones.
--outTier=out1:<tier>

#Type the name of your job here
--jobname <job_name>

#Type the definition that contains the fcl you want to process
--defname <dataset_definition_name>

#This is where you place which tag you want to generate in
--tag <novasoft_release>

##Which fhicl file are you running?
-c <fhicl-file>
# Prod3 Reco, will likely be `prod_full_chain_numi_job.fcl`
# Prod4 Reco, will likely be `prod_repid_numi_prod4_job.fcl`
# If you want to specify your own fhicl file you will need the following line, plus -c `<fhicl>.fcl`:
#--inputfile /pnfs/path/to/fcl/<fhicl>.fcl

#Number of jobs you'd like to submit. Remember not to submit more jobs than there are files in your definition!
--njobs <num_jobs>
# To submit exactly one job per file use `--files_per_job 1` --> Only use if the definition has fewer than 5k files!

#Where do you want your files to be copied to?
--dest /pnfs/nova/scratch/users/$USER/<somewhere>

####################################
# Default options you might change #
####################################

# By default, run jobs everywhere
-f everywhere.inc
#-f offsite.inc
#-f onsite.inc

# It is worth stealing some things off of production.inc...
--maxopt 
--os SL6
--copyOut 
--print_jobsub
-G nova
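
The two rules of thumb in the template comments (no more jobs than files, and one job per file only below 5k files) are easy to encode. The sketch below is purely illustrative: both function names are made up, and the number of files in your definition has to come from SAM, which is not shown here.

```shell
# Hypothetical helper: never request more jobs than there are files.
# Arguments: requested number of jobs, number of files in the definition.
clamp_njobs() {
    requested=$1
    nfiles=$2
    if [ "$requested" -gt "$nfiles" ]; then
        echo "$nfiles"
    else
        echo "$requested"
    fi
}

# Hypothetical helper: succeeds only if the definition is small enough
# (fewer than 5000 files) for `--files_per_job 1`.
one_job_per_file_ok() {
    [ "$1" -lt 5000 ]
}

# Example: a definition with 120 files and a request for 500 jobs.
# clamp_njobs 500 120
```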

Use a custom fhicl file

Sometimes you need to use a custom FCL file that's not in the novaproduction area for a one-off sample. In that case, first ensure your FCL is copied into dCache somewhere (/pnfs/nova/scratch/users/<your username> is probably the best choice). Then, add these lines to your submit_nova_art.py configuration:

--inputfile /pnfs/path/to/fcl/<fclname>.fcl
-c <fclname>.fcl
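
The -c argument is just the file name part of the --inputfile path, so the two lines can be derived from the path alone. A small sketch (custom_fcl_lines is a made-up name, and the path in the example is a placeholder):

```shell
# Hypothetical helper: given the dCache path of a custom fhicl file,
# print the two config lines described above.
custom_fcl_lines() {
    fcl_path=$1
    echo "--inputfile $fcl_path"
    echo "-c $(basename "$fcl_path")"
}

# Example: append the lines to an existing config file.
# custom_fcl_lines /pnfs/nova/scratch/users/$USER/my_job.fcl >> my_job.cfg
```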

Set the maximum number of running jobs.

Turn the limit off by setting this to 0 or by not specifying the option. (Default is off.)

--maxConcurrent <N>

Need more help? Take a look at the production webpage

There is also some more information about how production runs these jobs here, but I don't think you'll need that link... If you follow it, be sure to remove -f production.inc from your .cfg file, as you will not be running as production.

Some useful things which that page covers:
  • Checking if your definition is cached
    cache_state.py -d <def>
    
  • Prestaging your definition if it isn't 100% cached
    samweb prestage-dataset --defname=<your definition here> --parallel=5
    
    • Contact production before prestaging a definition with more than 500 files
  • Submitting test jobs
    submit_nova_art.py -f <config>.cfg --test_submission
    
  • Submitting additional jobs to an existing project. To do this add the following to your <config>.cfg
    --continue_project <project name>
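
The 500-file rule for prestaging can be turned into a small guard. This is only a sketch under the assumptions stated on this page: prestage_cmd is a made-up name, it prints the prestage command rather than running it, and you have to supply the file count yourself.

```shell
# Hypothetical helper: print the prestage command from this guide, or a
# reminder to contact production for definitions with more than 500 files.
# Arguments: definition name, number of files in the definition.
prestage_cmd() {
    defname=$1
    nfiles=$2
    if [ "$nfiles" -gt 500 ]; then
        echo "contact production before prestaging $defname ($nfiles files)"
        return 1
    fi
    echo "samweb prestage-dataset --defname=$defname --parallel=5"
}

# Example:
# prestage_cmd my_definition 120
```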