Making Simulation fcls

First Guide: Making the FCL Files

This guide will tell you how to produce and register the job files which contain instructions for producing Monte Carlo.

Part 1: Making nova job files using make_sim_fcl

Like most tasks on our experiment, NOvA simulation jobs are performed using the instructions contained in “.fcl” files:

nova -c producesomecosmics.fcl -n <some number of events> -o outputARTfile.root 

Basically, the fcl file specifies which ART modules and services are used to produce the files, along with the various parameters defining how those modules and services are configured. See the general NOvA wiki for more information on how these configurations are formatted. The “-n” parameter sets how many “events” the job should run over / produce, where “event” means the ART definition of “Event” rather than just “a single particle interaction in the detector” - it can be a beam spill, a cosmic trigger window, or a single particle, depending on what the job is doing. The -o output file will be a root file with the ART data structure (i.e., it can be run over by other ART jobs).

In short, for Monte Carlo simulations, a fcl file will contain information on what the Monte Carlo generator is going to make: which detector, which event generator, which event flavor type, the order and type of ART modules to run, etc. Coding these all up by hand would be extremely cumbersome, so instead we use a script called make_sim_fcl to auto-generate these. This is located in the novaproduction package. If you want to take a look at the raw code, you can either grab a version of that package for the release we’re using right now:
addpkg_svn -h novaproduction

Or you can just look at the current version:

less $NOVAPRODUCTION_DIR/bin/make_sim_fcl

make_sim_fcl takes in a bunch of command-line options and uses them to generate files based on an official fcl file template: sim_fcl.template, in the same directory. Using bash, certain lines are added, removed, uncommented, etc. based on what you’ve asked to be produced.

Important Options:

Doing make_sim_fcl --help will print the basic options. Doing make_sim_fcl -ah will bring up the “advanced” options (a longer, more comprehensive list).

A) What to generate?:

-n : How many events to produce per file (an integer)

-det : nd for Near Detector, fd for Far Detector, etc.

-gen : Which generator to use. cry is for the CRY cosmic event generator, genie is for the beam event generator.

-hp : Horn polarity. fhc for Forward Horn Current (our current neutrino beam mode), rhc for Reverse Horn Current (the antineutrino-enriched mode which we’ll switch to later).

-fs : Flavorset. When we produce beam Monte Carlo, we do so separately for different oscillation modes. The first option, nonswap, oscillates no neutrinos (muon neutrino -> muon neutrino, electron neutrino -> electron neutrino, etc.). The second option, swap, oscillates all numus to nues and all nues to numus (these files are usually called “fluxswap” files). The tau option oscillates all the numus to nutaus. These three filesets are later reweighted for POT and oscillation probabilities and then combined to form our Far MC prediction. We also produce a nonswap set for the Near Detector.

-bn / --batchnumber : This is the batch number for the set of files you are creating. For the first batch of files this would be -bn 1, for the second batch -bn 2, etc.

-ts / --timespec : This is the time period corresponding to the run numbers you are creating. If using run numbers from epoch3b, this would be -ts epoch3b. If using run numbers from Run2, -ts run2. For ideal conditions, use -ts ideal.

-sm : Only generate one neutrino interaction or one cosmic interaction per ART event. Otherwise, for an ART event, the generator will simulate every cosmic or neutrino interaction which occurs in the detector during the beam spill or trigger window.

-go : Only run the simulation up through the g4nova (GEANT) step. Usually, the simulation runs through a bunch of different steps to make Monte Carlo. A generator step (CRY/genie, generating the initial event), geantgen (which simulates the passage of particles through matter), photrans (the photon transport model), and daq (DAQ simulation module). Setting geant-only gets rid of the last two steps and only runs generator and geantgen. We usually use this if we later want to “overlay” a second set of interactions on a first (as you’ll later see with the ND genie files).

-ro : (Less common) Only generate neutrinos or cosmic interactions which occur in the rock around the detector.

B) How many files to make, and what to name them?:

If you want to make a single file, just add the following to the end of your make_sim_fcl command:

-f <local file>.fcl 

For instance:

make_sim_fcl -n 100 -det fd -gen cry -fs all -f foo.fcl

Naturally, however, we don’t make one giant simulation file, as this would take forever and be extremely unwieldy. Instead, we make MC files with Run and Subrun numbers, just like our real data. There are two ways to do this.

First, we can make files for “ideal” running conditions, with fake run numbers. This is the simplest approach:

-gm <gain mode> -lr <last run> -ns <number of subruns>

You can choose either -gm 100 or -gm 140. -gm 100 will assume a first run of 1000000 and -gm 140 will assume a first run number of 2000000.
For example:

-gm 100 -lr 1000007 -ns 100

The “last run” is inclusive, so this will make 8 runs x 100 subruns/run = 800 files, with regular numbering.

However, sometimes we want to use “real” run numbers, so that we can later do POT reweighting to match our actual running conditions. In this case, we take a list of real data runs that we’ve determined to be “good” for each detector and randomly select N run and subrun numbers:

-rsr <number of files> <good run list>
For example:

-rsr 1500 /nova/app/users/rbpatter/runlists/fd_numi_subrunlist_20150320_30secadded.vec

This makes 1500 files with run and subrun numbers taken from the official list of Far Detector “good” files. Sometimes there are more Monte Carlo files than good runs and subruns. In this case, make_sim_fcl iterates up a “cycle” version (“c00x”) in the filename.
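As a toy illustration of the cycle bookkeeping (NOT the actual make_sim_fcl code, and the real script draws run/subrun pairs at random from the good-run list rather than walking it in order), here’s how the cycle field increments once the list is exhausted:

```shell
# Toy sketch of the "cycle" (c00x) bookkeeping: once every good run/subrun
# has been used, the cycle number bumps and the list is reused. The list is
# walked sequentially here for clarity; make_sim_fcl selects at random.
printf '10001 00\n10001 01\n10002 00\n' > goodruns.vec   # toy good-run list
NFILES=7
NLINES=$(wc -l < goodruns.vec)
for i in $(seq 0 $((NFILES - 1))); do
  CYCLE=$(( i / NLINES ))                          # bumps when the list wraps
  ENTRY=$(sed -n "$(( i % NLINES + 1 ))p" goodruns.vec)
  RUN=${ENTRY% *}; SUBRUN=${ENTRY#* }
  printf 'r%08d_s%s_c%03d\n' "$RUN" "$SUBRUN" "$CYCLE"
done
```

The first three files come out with c000; the fourth reuses the first run/subrun pair and gets c001, and so on.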

Finally, sometimes you want to add a special tag to your files, both in the names and the metadata, to distinguish them from “official” samples. In this case you use the “special” option (you should use this in the later example tasks I give in this guide):

-sp <label>: Add the string <label> to the NOVA.special metadata parameter. This often gets appended to other “special” strings which have been added by other options in make_sim_fcl.

C) Where do I put the files?

One last important feature is telling make_sim_fcl where to put the output files. I already showed you how to make a single local file (-f). Here’s what you do with a big group of multiple files.

First, let’s say you want to just produce your files locally (i.e., in your test release or somewhere else on bluearc). Usually, you’d do this for checking your code, for testing, etc. In this case, the file names will be automatically generated, and you simply supply make_sim_fcl with an output directory:
-o <directory>: Output the ensemble of files to the given directory.

For “real” running, however, you’re going to want to write stuff to SAM, via
-fts : Send the files directly to FTS.
This will automatically send the fcl files to the /nova/prod dropbox for FTS, to put them in a dataset where they can be accessed by SAM when we run our jobs on the grid. I’ll explain more about this in the next section. For now, though, only do this when you’ve settled on your make_sim_fcl commands and want to make “real” files to run. You must be novapro to write the fcls to /pnfs/. To do this, just type "ksu novapro". Don't forget to type "exit" afterwards.

But wait! Let’s say that you messed up something with your dataset, but you’ve already sent it to FTS. You asked for too many or too few events / file, for the wrong numbering scheme, etc. Running make_sim_fcl again will simply add files to the existing dataset, and since SAM randomly accesses files in the dataset, you’ll later be running a combination of old “bad” and new “good” FTS files. You want to make a new version of the dataset which will supersede the old files. You can do this by requesting a new “iteration” with the -i command:
-i <number>:
For the first time you make a set of files, do -i 1. But, for instance, if you accidentally made a set of files that has 100 instead of 1000 events per file, and don’t want to change anything else, you can run make_sim_fcl with -i 2. This will make a new set of files with “v2” in the name, and a new dataset with v2 appended to the end as well. For testing purposes, you should use -i 0; this way, generated files have no chance of getting mixed in with real datasets.

Part 2: FTS and Samweb

So what happens when you send your files to FTS to use on SAMWeb? Here’s an example of a submission:

make_sim_fcl  -fr 1000001 -lr 1000001 -ns 100 -n 200 -det fd -gen cry -fs all -i 1 --special prodtest -fts

What happens when I put this into my command line? (Don’t do this yourself yet - I’ll have more instructions / a separate example in a minute.) You’ll start by seeing the following for a while:

Creating /nova/prod/FTS_DropBoxes/FCL_dropbox/0/fardet_cosmics_all_prodtest_200_r01000001_s00_c000_FA14-10-03x.a_v1_20150510_135535.fcl
Creating /nova/prod/FTS_DropBoxes/FCL_dropbox/1/fardet_cosmics_all_prodtest_200_r01000001_s01_c000_FA14-10-03x.a_v1_20150510_135535.fcl
Creating /nova/prod/FTS_DropBoxes/FCL_dropbox/2/fardet_cosmics_all_prodtest_200_r01000001_s02_c000_FA14-10-03x.a_v1_20150510_135535.fcl

This is make_sim_fcl making the fcl files themselves and then putting them in the FTS dropbox (/nova/prod/FTS_DropBoxes/FCL_dropbox/0/, etc.). You’ll also see printouts like the following:
VOLUME is: rock_detector
SPECIAL is: prodtest
SPECIAL: prodtest
FCL_SPECIAL: prodtest
base_dimensions: fcl.Version=FA14-10-03x.a and nova.label=beta and nova.detectorID=fd  and simulated.cryflavorset=all and simulated.cryused=true and simulated.genieflavorset=none and simulated.genieused=false and simulated.singlepflavorset=none and simulated.singlepused=false and nova.subversion=1 and simulated.mixingType=pileup and simulated.volume=rock_detector and nova.special=prodtest

These are some of the base options defining what’s in the dataset, and the basic Metadata parameters. Note that a few things get picked up that you didn’t explicitly give to the make_sim_fcl command. In particular, the fcl.Version, which corresponds to the release you’ve set up using setup_nova (here FA14-10-03x.a).
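To see roughly how those pieces fit together, here’s a hedged sketch of assembling a dimensions string from a few of the fields shown above (a simplified subset, not the full list make_sim_fcl builds):

```shell
# Assemble a SAM "dimensions" string from a few of the metadata fields
# shown in the printout above. Simplified subset for illustration only.
VERSION="FA14-10-03x.a"   # picked up from the release you set up
DET="fd"                  # from -det
SPECIAL="prodtest"        # from --special
DIMS="fcl.Version=$VERSION and nova.detectorID=$DET and nova.special=$SPECIAL"
echo "$DIMS"
# Dimension strings like this can be handed to SAM queries, e.g.:
#   samweb -e nova list-files "$DIMS"
```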

Using this information, the scripts look for the correct dataset to stick the files in:

parent dataset doesn't exist, defining it 
Dataset definition 'parent_FA14-10-03x.a_fd_cry_all_prodtest' has been created with id 133871
Dataset definition 'prod_fcl_FA14-10-03x.a_fd_cry_all_prodtest' has been created with id 133891
Dataset definition 'prod_fcl_FA14-10-03x.a_fd_cry_all_prodtest_draining' has been created with id 133911
Dataset definition 'prod_daq_FA14-10-03x.a_fd_cry_all_prodtest' has been created with id 133931
Dataset definition 'prod_daq_FA14-10-03x.a_fd_cry_all_prodtest_draining' has been created with id 133951

Here, a corresponding dataset was not found, so the scripts went and created 5 new ones to correspond to the Metadata info. There’s an umbrella parent dataset, two “prod_fcl” datasets, and two “prod_daq” datasets. The fcl files will go to the “prod_fcl” datasets, and after we run the simulation jobs, the output ART simulation files will get sent to the “prod_daq” ones.

So what are the two “draining” datasets? These are extremely useful datasets which get used for file recovery (I’ll explain more about this in the second how-to, on running jobs). Here’s a quick explanation for now: the non-draining fcl dataset will always hold all existing fcls. As production jobs are successfully run, however, the corresponding fcl file gets removed from the fcl “_draining” set. This means that if some of your jobs fail, you can run over the draining dataset a second time and not re-run anything.
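The bookkeeping is essentially a set difference. As a toy model (NOT the actual SAM mechanics, just the logic):

```shell
# Toy model of the draining dataset: the full fcl list minus the fcls whose
# jobs have already completed successfully. File names are made up.
printf 'job_s00.fcl\njob_s01.fcl\njob_s02.fcl\n' > all_fcls.txt
printf 'job_s01.fcl\n' > completed.txt
# The "draining" set is whatever still needs to run (or be recovered):
grep -v -x -f completed.txt all_fcls.txt
```

Here the second subrun’s job succeeded, so only job_s00.fcl and job_s02.fcl remain in the draining set.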

Due to heavy use, the FTS server can often get severely backed up, so it might take some time for your files to be registered. To check how many of your files made it to the dataset, use the following:
samweb -e nova count-definition-files prod_fcl_FA14-10-03x.a_fd_cry_all_prodtest
(You can list the files using list-definition-files instead.)

You can also see what’s happening on the FTS server by going to the following web page:

Note that you can only access this if you’re on a FNAL machine. You can do this by connecting to Fermilab via VPN, using your Services username and password. Here’s how to do this:

Part 3: An example to try

Here’s a quick example that you can try. If you get any proxy or certificate complaints about these commands, try running the command “kx509” first (I honestly forget if these are necessary just for SAMWeb submissions, or for make_sim_fcl as well; you may as well do it anyway before you try).

Let’s try to make some Far Detector cosmic files with “ideal” run numbers. We’ll start by just making a local fcl file, so you can look at it and see what the various components are. First, set up your release; let’s do the current “official” production release:

setup_nova -r FA14-10-03x.a -b maxopt

(“maxopt” is the optimized version of this build.)

Now let’s make a test release (apologies if you already know how to do this):

newrel -t FA14-10-03x.a FA14-10-03x.a_ProductionTest
cd FA14-10-03x.a_ProductionTest
srt_setup -a

Now, let’s make a single fcl. Do this (you don’t need to download any packages for the moment):

make_sim_fcl -n 200 -det fd -gen cry -fs all --special firstfcltest -f mytest.fcl -i 0

This makes a single fcl file (mytest.fcl) in your local directory. It asks for 200 “events”, which here are trigger windows (-n 200), for cosmic events (-gen cry) in the Far Detector (-det fd). We’re going to run every type of event (-fs all). We’re also going to distinguish it from the official sample with a special label, “firstfcltest” (--special firstfcltest).

Try this, and look at the fcl file code to get a sense for how it works. Try running it interactively! (Although don’t do all 200; let’s just ask to run 10 events instead):

nova -c mytest.fcl -n 10

(If you don’t include the -n 10 option, ART automatically runs whatever number of events are requested in the fcl file.)

Next, let’s make a bunch of these files, and send them to FTS. You may want to run that “kx509” at this point. Now do the following:

make_sim_fcl  -fr 1000001 -lr 1000001 -ns 10 -n 200 -det fd -gen cry -fs all -i 1 --special firstfcltest -fts

We’re making 10 files for run number 1000001, with 10 subruns (-ns 10), first iteration (-i 1). These will go to FTS. See what the output says; it should make a set of datasets with names like prod_fcl_FA14-10-03x.a_fd_cry_all_firstfcltest. To see how many have gotten registered, do this:

samweb -e nova count-definition-files prod_fcl_FA14-10-03x.a_fd_cry_all_firstfcltest

(Assuming that’s what the name is - check the output from make_sim_fcl to see the names of the dataset produced!)

When this reads 10, all of the files have been transferred to the dataset, and we’ll be ready for job submission, the next guide!

Part 4: Generating Official Files

DON’T RUN THESE COMMANDS. These are just showing you what got made for this round of Production, so you know what to do in the future. If you run these, it will add extra files to the “official” set and cause confusion!

All of these are Forward Horn Current (FHC), but we might make Reverse Horn Current (RHC) in the future.

First, the beam/genie Far Monte Carlo. We make 1500 files with subruns taken from the list of good runs for the FD:

make_sim_fcl -rsr 1500 /nova/app/users/rbpatter/runlists/fd_numi_subrunlist_20150320_30secadded.vec -n 1000 -det fd -gen genie -hp fhc -fs nonswap -i 1 -fts
make_sim_fcl -rsr 1500 /nova/app/users/rbpatter/runlists/fd_numi_subrunlist_20150320_30secadded.vec -n 1000 -det fd -gen genie -hp fhc -fs swap -i 1 -fts
make_sim_fcl -rsr 1500 /nova/app/users/rbpatter/runlists/fd_numi_subrunlist_20150320_30secadded.vec -n 1000 -det fd -gen genie -hp fhc -fs tau -i 1 -fts

This made three fcl datasets:

Next, we also made beam/genie Near Detector Monte Carlo. These are slightly special, in that they get run with the “-go” geant-only option. We do this because we’ll later be overlaying the files with ND rock files at the simulation step. These run and subrun numbers are also taken from the list of good runs for the ND:
make_sim_fcl -rsr 20000 /nova/app/users/rbpatter/runlists/nd_numi_subrunlist_20150318_30secadded.vec -n 2000 -det nd -gen genie -hp fhc -fs nonswap -go -i 1 -fts
The following dataset was made:

Sometimes, it is also requested that you make cosmic CRY samples. For this round of production, this was only done for testing, and with ideal running numbers. Here are some examples of submission commands; in the future, ask what you should produce, or look at the previous official CRY production datasets:

make_sim_fcl  -fr 1000001 -lr 1000005 -ns 100 -n 1000000 -det nd -gen cry -fs all -i 1 -fts 
make_sim_fcl  -fr 1000001 -lr 1000001 -ns 100 -n 200 -det fd -gen cry -fs all -i 1 -fts

NearDet Rock Singles

Also, you may be asked to produce ND Rock Singles. These are ND beam events which only interact in the rock around the detector; they are later overlaid on top of the regular ND beam files to make our official ND MC. There are a few special things about these files. First, they are, of course, rock-only (-ro). They are also geant-only, because they’re being used in overlay code (-go). Finally, they are also singles (-sm), with a single interaction per ART event.

Here’s what the submission code looks like:

make_sim_fcl -gm 100 -lr 1000046 -ns 100 -n 20000 -det nd -gen genie -hp fhc -fs nonswap -ro -sm -go -fts -i 1

We don’t always produce this. The number of files to produce depends on a few variables, but in this case this will produce from run 1000000 to run 1000046 with 100 subruns per run --> 4700 files.
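That file count is just the inclusive run range times the subruns per run; a quick sanity check (run numbers as stated above):

```shell
# Sanity-check the file count: inclusive run range times subruns per run.
FIRST_RUN=1000000   # implied by -gm 100
LAST_RUN=1000046    # from -lr (inclusive)
NSUBRUNS=100        # from -ns
NRUNS=$(( LAST_RUN - FIRST_RUN + 1 ))
echo "$(( NRUNS * NSUBRUNS )) files"    # 4700 files
```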

Otherwise, those should be the primary files you’ll get asked to produce!

How many NearDet Rock Singles?

In the past we have made 20,000 NearDet genie files (1e21 POT in total, with 2000 spills/file and 2.5e13 POT/spill - note the half intensity here; be aware of this changing in the future).

In determining how many rock files to make for the NearDet overlays, there are a few variables to keep in mind:

  • Number of events per ND rock file: evts = 20000
  • Total ND events (ND POT/POT per spill): N = 4e7
  • Filter efficiency (i.e., the fraction of events for which the generator produces an actual event): eff = 0.04
  • Desired reuse rate for each rock event: R=200
  • Number of rock singles per primary event (calculated by studies): n_avg= 19.056

The n_avg number comes from the meanPoisson value in NovaSimMixer/prodNDRockMixMerge

So, for a desired event repetition rate of 200, we would need:
  • Number of rock secondaries files = N*n_avg/(eff*R*evts) = ~4700 files

For the code itself, we also need to decide how many secondary files we will use per overlay job. This is a difficult calculation, because the number of events needed is random for each job, and the number of events in each secondary file is also random, since a filter has been applied to remove empty events.
The best one can do is be conservative in the estimates. In the past an arbitrary conservative factor of 1.4 has been applied when deciding how many files to sample per overlay job:

Number of secondary files: N_sec = 1.4*(2000 evts/file)*n_avg/(evts*eff) = ~70 files
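Both formulas above can be checked quickly with awk, plugging in the values straight from the list:

```shell
# Plug the numbers from the text into the two file-count formulas.
awk 'BEGIN {
  evts  = 20000     # events per ND rock file
  N     = 4e7       # total ND events (ND POT / POT per spill)
  eff   = 0.04      # filter efficiency
  R     = 200       # desired reuse rate per rock event
  n_avg = 19.056    # rock singles per primary event

  printf "rock secondaries files ~ %.0f\n", N * n_avg / (eff * R * evts)
  printf "secondary files / job  ~ %.0f\n", 1.4 * 2000 * n_avg / (evts * eff)
}'
```

This gives about 4764 rock files (quoted as ~4700 above) and about 67 secondary files per overlay job (rounded up to ~70).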