Running MC Validation

There are multiple packages for testing the MC. The two main packages are MCCheckOut and MCCheater: MCCheckOut has many modules that create plots of important quantities, and TestTrackIds, within MCCheater, checks much of the truth matching that occurs within MC files.


Standard MCCheckOut Running

Results from standard running appear at this webpage:

  • Make the output area for MCCheckOut jobs
    mkdir /nova/ana/users/<USER>/<TESTREL>/<DATASET>
    chmod 775 /nova/ana/users/<USER>/<TESTREL>/<DATASET>
  • Make and run the configuration file for the MCCheckOut jobs
    cd /nova/app/users/<USER>/<TESTREL>/
    cp Submit_MCCheckOut_<OLD_SET>.cfg Submit_MCCheckOut_<DATASET>.cfg
    # open the new cfg file in your favorite editor
    # find and replace 3 appearances of <OLD_SET> with <DATASET>
    # make sure --inputfile, --testrel, and --dest point to your local <USER> app or ana area
    # alternatively, copy the example file below and replace <USER>, <TESTREL>, and <DATASET> with the appropriate values
    # then submit the jobs, passing the cfg file to the job submission script with:
    #   -f Submit_MCCheckOut_<DATASET>.cfg
  • Check and add the MCCheckOut files
    cd /nova/ana/users/<USER>/<TESTREL>/<DATASET> # the --dest directory from the cfg file
    ls *.root | wc -l # make sure this is 100
    ls -alhFG *.root | awk '{print $4}' # check that the files are roughly the same size; look especially for empty files or mismatched size suffixes (k vs M)
    ls -1 -d *.root > files.txt
    hadd mccheckout_<DATASET>.root @files.txt
  • Generate HTML
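    cd /nova/app/users/<USER>/<TESTREL>/MCCheckOut
    root -l # start ROOT; the next line is typed at the ROOT prompt
    .L HtmlGenieCompare.C+ # compile and load the macro with ACLiC
    # then run the macro on the new and old hadd-ed files (new dataset first) and quit ROOT with .q
    # Back in the terminal, you can open the webpage in your default browser by uncommenting the following (open is the macOS command; on Linux use xdg-open):
    #open html_out/mccheckout_<DATASET>.root_comp/index.html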
  • Make HTML public (if you have novasoft privileges)
    ksu novasoft # You must have novasoft privileges...
    # I have the following line in my .bash_profile:
    #alias cdhtml="cd /nusoft/app/web/htdoc/nova/novasoft"
    cd /nusoft/app/web/htdoc/nova/novasoft # or use the cdhtml alias above
    # If this is the first set of plots from the new dataset, uncomment the following:
    #mkdir <DATASET>
    cp -r /nova/app/users/<USER>/<TESTREL>/MCCheckOut/html_out/mccheckout_<DATASET>.root_comp ./<DATASET>/<DATASET>#_<add additional tags here if necessary>
    cd /nova/app/users/<USER>/<TESTREL>
    exit # Back to your own user
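The manual checks in the "Check and add the MCCheckOut files" step can be wrapped in a small Bash function so that a failed job is caught before hadd-ing. This is a sketch under stated assumptions: check_mccheckout_outputs is a hypothetical helper, not part of MCCheckOut; the default count of 100 simply mirrors --njobs 100 in the example cfg; and the "much smaller than the largest file" threshold of 10% is an arbitrary stand-in for eyeballing k vs M suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MCCheckOut): sanity-check the per-job
# .root files in an output directory before combining them with hadd.
# Usage: check_mccheckout_outputs <dir> [expected_count]
# Returns 0 if everything looks fine, 1 otherwise (problems go to stderr).
check_mccheckout_outputs() {
  local dir="$1"
  local expected="${2:-100}"   # mirrors --njobs 100 in the example cfg
  local f size n=0 max=0 status=0

  # First pass: count the files and find the size of the largest one
  for f in "$dir"/*.root; do
    [ -e "$f" ] || continue    # the glob matched nothing
    n=$((n + 1))
    size=$(wc -c < "$f" | tr -d ' ')
    if [ "$size" -gt "$max" ]; then max=$size; fi
  done

  if [ "$n" -ne "$expected" ]; then
    echo "expected $expected .root files, found $n" >&2
    status=1
  fi

  # Second pass: flag empty files and files far smaller than the largest;
  # the 10% threshold is an arbitrary assumption, tune it as needed
  for f in "$dir"/*.root; do
    [ -e "$f" ] || continue
    size=$(wc -c < "$f" | tr -d ' ')
    if [ "$size" -eq 0 ]; then
      echo "empty file: $f" >&2
      status=1
    elif [ $((size * 10)) -lt "$max" ]; then
      echo "suspiciously small: $f ($size bytes)" >&2
      status=1
    fi
  done

  return "$status"
}
```

Run it in the output directory before the hadd, for example check_mccheckout_outputs . 100 && hadd mccheckout_<DATASET>.root @files.txt. Running it afterwards would count the combined file too, and its much larger size would make every job file look suspiciously small.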

Example cfg File


--jobname MCCheckOut_<DATASET>
--defname <DATASET>
--njobs 100
--files_per_job 1

-c mccheckoutjob.fcl
--inputfile /nova/app/users/<USER>/<TESTREL>/MCCheckOut/mccheckoutjob.fcl
--tag development
--testrel /nova/app/users/<USER>/<TESTREL>/

--dest /nova/ana/users/<USER>/<TESTREL>/<DATASET>/
--histTier mccheckout.hist
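Rather than copying and hand-editing an old cfg, the example above can be written out with the placeholders already filled in. A minimal sketch, assuming Bash; write_mccheckout_cfg is a hypothetical helper, and its output reproduces the example cfg line for line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the example MCCheckOut cfg with the
# <USER>, <TESTREL> and <DATASET> placeholders filled in.
# Usage: write_mccheckout_cfg <user> <testrel> <dataset> > Submit_MCCheckOut_<dataset>.cfg
write_mccheckout_cfg() {
  local user="$1" testrel="$2" dataset="$3"
  # Unquoted EOF so that the ${...} variables below are expanded
  cat <<EOF
--jobname MCCheckOut_${dataset}
--defname ${dataset}
--njobs 100
--files_per_job 1

-c mccheckoutjob.fcl
--inputfile /nova/app/users/${user}/${testrel}/MCCheckOut/mccheckoutjob.fcl
--tag development
--testrel /nova/app/users/${user}/${testrel}/

--dest /nova/ana/users/${user}/${testrel}/${dataset}/
--histTier mccheckout.hist
EOF
}
```

For example, write_mccheckout_cfg "$USER" <TESTREL> <DATASET> > Submit_MCCheckOut_<DATASET>.cfg. Generating the file this way avoids missing one of the three <DATASET> occurrences.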

Replace both occurrences of mccheckoutjob.fcl (in -c and --inputfile) with cosmicanajob.fcl to run the standard cosmic suite of modules.
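Both the find-and-replace of <OLD_SET> with <DATASET> from the submission step and the fcl swap just described can be done with sed instead of an editor. This is a sketch: derive_cfg is a hypothetical helper, and it assumes dataset names contain no '|', '&' or backslash characters, which would need escaping inside a sed expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a new cfg from an existing one with sed.
# Usage: derive_cfg <old_cfg> <old_set> <new_set> [cosmic] > <new_cfg>
derive_cfg() {
  local old_cfg="$1" old_set="$2" new_set="$3" mode="${4:-genie}"
  # Assumption: dataset names contain no '|', '&' or '\'; a '.' in old_set
  # acts as a regex wildcard, which still matches the literal dot
  local expr="s|${old_set}|${new_set}|g"
  if [ "$mode" = cosmic ]; then
    # Also switch -c and --inputfile from the Genie job fcl to the cosmic one
    sed -e "$expr" -e 's|mccheckoutjob\.fcl|cosmicanajob.fcl|g' "$old_cfg"
  else
    sed -e "$expr" "$old_cfg"
  fi
}
```

For example, derive_cfg Submit_MCCheckOut_<OLD_SET>.cfg <OLD_SET> <DATASET> > Submit_MCCheckOut_<DATASET>.cfg, adding cosmic as a fourth argument for the cosmic suite. Check the result by eye before submitting, since sed replaces every match, not only the three expected ones.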

Notes and Discussion

For the online plots, please keep to the current conventions: label directories by the 'new' tag release and append extra tags only if necessary. For example, the directory FA14-10-03_fluxv08 is labeled that way because the ND and FD datasets were generated with tags FA14-10-03x.d and FA14-10-03x.e. Inside the directories, label individual entries by dataset. If the same 'new' dataset is compared to multiple 'old' ones, differentiate them by appending "_comp_<TAG>" to the dataset name. Conventions exist to make things easier for everyone; if you change them, things become messy, people will be less inclined to explore the validation plots, and errors are more likely to propagate. If you do change the conventions, make sure the new ones are robust and handle edge cases such as comparing one 'new' dataset to multiple 'old' ones.
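The naming conventions above can be encoded in a helper so that directory and entry names come out consistent every time. This is a sketch: plot_dir is a hypothetical name, and it assumes that <TAG> in the "_comp_<TAG>" suffix is the tag of the 'old' dataset, which the conventions imply but do not state outright.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<directory>/<entry>" for the public plots.
# Directory: new tag release, plus an extra tag only if necessary.
# Entry: dataset name, plus "_comp_<old_tag>" when one new dataset is
# compared to several old ones.
# Usage: plot_dir <new_tag> <dataset> [extra_tag] [old_tag]
plot_dir() {
  local new_tag="$1" dataset="$2" extra="${3:-}" old_tag="${4:-}"
  local dir="$new_tag" entry="$dataset"
  if [ -n "$extra" ]; then
    dir="${dir}_${extra}"          # e.g. FA14-10-03 + fluxv08
  fi
  if [ -n "$old_tag" ]; then
    entry="${entry}_comp_${old_tag}"
  fi
  printf '%s/%s\n' "$dir" "$entry"
}
```

For example, plot_dir FA14-10-03 <DATASET> fluxv08 gives FA14-10-03_fluxv08/<DATASET>.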

Someone (production, the DetSim convener(s), or the Dataset explorer) will provide you with a dataset. Copy and paste its name rather than retyping it: dataset names are long, and retyping them is error prone.

The MCCheckOut trunk currently (Jan 30, 2016) has four ROOT macros for generating HTML pages: HtmlGenie.C, HtmlGenieCompare.C, HtmlCosmic.C, and HtmlCosmicCompare.C. The versions without "Compare" run on a single dataset and do not generate any ratio plots. The versions with "Compare" take two datasets and make ratio plots; the first input should be the "new" dataset, i.e. the one used for titling the pages. For 1D histograms, red is old and blue is new. For 2D histograms, where no ratios can reasonably be generated, the macros display the new histogram on the left and the old histogram on the right. The Genie versions of the macros correspond to running the standard mccheckoutjob.fcl; the Cosmic versions correspond to running the standard cosmicanajob.fcl.
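The mapping just described, from the job fcl and the number of datasets to the macro to load, fits in a small lookup. html_macro is a hypothetical helper, not part of MCCheckOut; it only encodes the four macro names listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the MCCheckOut HTML macro for a job.
# Usage: html_macro <job_fcl> <n_datasets>   (n_datasets is 1 or 2)
html_macro() {
  local base
  case "$1" in
    mccheckoutjob.fcl) base=HtmlGenie ;;   # standard Genie suite
    cosmicanajob.fcl)  base=HtmlCosmic ;;  # standard cosmic suite
    *) echo "unknown job fcl: $1" >&2; return 1 ;;
  esac
  if [ "$2" -eq 2 ]; then
    echo "${base}Compare.C"   # two datasets: ratio plots, new dataset first
  else
    echo "${base}.C"          # single dataset: no ratio plots
  fi
}
```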

The specific MCCheckOut modules that are run in a job are set in the lowercase <...>job.fcl files. The fcl parameters for the standard modules are set in MCCheckOut.fcl.

Possible Future Improvements

  • A script that combines files (as opposed to hadd-ing them): POTAna produces plots of quantities per spill or per subrun, and these would be better shown with each result in a separate bin after combining, but hadd adds all the results into a single bin. (This means that the plots actually show the quantity over all spills or over all subruns.) It would be useful to have a script that detects the parent directory name, hadds the histograms in most directories, but places each POTAna result in a separate bin.
  • Automating the standard validation: This would require some interface with production. Ideas discussed previously include a recurring crontab job that regularly (probably weekly) generates sim files. Events would be seeded the same way every time, so they should in principle change only if the simulation has changed. Further improvements could include Kolmogorov-Smirnov (KS) tests alongside, or in place of, the ratio plots.
  • HTML page improvements: At the very least, a drop-down menu for finding plots would be useful, since few people enjoy scrolling through long pages to find plots. If possible, it would be very useful to run ROOT within the webpage and store the MCCheckOut ROOT files themselves. The webpage could then offer two drop-down menus for choosing datasets, and the in-page ROOT session could generate comparisons on the fly, allowing any dataset to be compared to any other, letting users zoom or switch histograms to log scale, and saving results directly.


Coming soon!

Gareth Kafka
Harvard University | Physics Ph.D. Candidate