- For each dataset, generate a ROOT file with the same histograms/spectra. These can then be displayed/compared using the source:trunk/Validation webpage-making tools.
- To use generic validation: write a yaml configuration file pointing to your files; choose appropriate names and an accessible location.
README for the Validation documentation: source:trunk/Validation/generic/README.txt
The following refers to four datasets (data/mc, old/new), but can be extended to other cases.
Note that many of the workflow choices below are meant to avoid excessive copy-pasting.
Preparing your macro and running a test¶
- Your macro should be flexible enough to work on both datasets and be runnable as:
cafe -bq NDDataMC.C+ $OUTFILE_OLD true $NDDATA_OLD $NDMC_OLD
cafe -bq NDDataMC.C+ $OUTFILE_NEW true $NDDATA_NEW $NDMC_NEW
Depending on the type of validation, each line might need to be run with different tags/test releases.
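For reference, a minimal sketch of such an entry point; the argument order matches the cafe calls above, but the includes and body are illustrative, not the actual macro:
// NDDataMC.C -- sketch only; the bool splits the event-loop ("make spectra")
// step from the later formatting step.
#include "CAFAna/Core/Spectrum.h"
#include "CAFAna/Core/SpectrumLoader.h"
#include <string>

void NDDataMC(std::string outfile, bool makeSpectra,
              std::string dataDef = "", std::string mcDef = "")
{
  if(makeSpectra){
    ana::SpectrumLoader dataLoader(dataDef); // SAM definition or file name
    ana::SpectrumLoader mcLoader(mcDef);
    // ... declare Spectrum objects against both loaders here ...
    dataLoader.Go();
    mcLoader.Go();
    // ... save all spectra to outfile ...
  }else{
    // ... read spectra back from outfile and apply formatting only ...
  }
}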
- Use variables and cuts defined in header files; these are more likely to be properly maintained.
Frequently used variables have been defined in source:trunk/CAFAna/Vars/Vars.h, source:trunk/CAFAna/Vars/TruthVars.h, source:trunk/CAFAna/Vars/NusVars.h
- In addition, source:trunk/CAFAna/Vars/NueVarsExtra.h and source:trunk/CAFAna/Vars/NueVarsExtra.cxx, as well as our own source:trunk/CAFAna/Vars/NusVars.h, provide functions that return groups of HistDefs: useful structures that already bundle short names, variables, and an appropriate HistAxis. No need to reinvent the wheel.
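As an illustration, inside the make-spectra branch of the macro you can fill one Spectrum per HistDef with a single event loop. This is a sketch only: getNueHistDefs() and the def.axis field are hypothetical stand-ins for whatever the helper functions above actually return, and kMyCut is a placeholder cut:
// Sketch only -- check NueVarsExtra.h for the real interface.
std::vector<std::unique_ptr<ana::Spectrum>> specs;
for(const auto& def : getNueHistDefs()){        // hypothetical group getter
  specs.emplace_back(new ana::Spectrum(mcLoader, def.axis, kMyCut));
}
mcLoader.Go();  // one event loop fills every spectrum at once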
- Test your macro interactively over a single file/run.
cafe -bq NDDataMC.C+ $OUTFILE_OLD_TEST true $NDDATA_OLD_TEST $NDMC_OLD_TEST
cafe -bq NDDataMC.C+ $OUTFILE_NEW_TEST true $NDDATA_NEW_TEST $NDMC_NEW_TEST
Try specific file names or a samweb query like
NDDATA_TEST="dataset_def_name_newest_snapshot "$NDDATA" and run_number 11264 and Online.Subrun 00"
NDMC_TEST="dataset_def_name_newest_snapshot "$NDMC" and run_number 11264 and Simulated.firstSubRun 00"
(make sure files and snapshots exist in all cases).
Validation/generic will use the spectra/histogram names in your <outfile> to write the website. These names include the pretty selectors.
Any "cosmetic" changes can be applied to the same spectra; you shouldn't need to re-run the event loop.
The example macro (e.g. FDDataMC.C) splits the process via a boolean (the true/false argument in the cafe calls above). You might want to limit the number of variables and cuts that are "formatted" in this step in order to quickly generate a preview of the validation.
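As a sketch of what the formatting branch might do, assuming the spectra were saved with Spectrum::SaveTo; the directory and histogram names are illustrative, and LoadFrom's exact signature can differ between CAFAna versions:
// Inside the macro, when the boolean is false:
TFile fin(outfile.c_str(), "UPDATE");
auto spec = ana::Spectrum::LoadFrom(fin.GetDirectory("calE")); // hypothetical name
TH1* h = spec->ToTH1(spec->POT());
h->SetTitle("Calorimetric energy (GeV)"); // "pretty" title shown on the website
h->Write("new_calE"); // histogram names are what Validation/generic picks up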
Running the validation¶
- Start with 4 formatted files: data/mc old/new
- Write the yaml config files and run the validation
sh $SRT_PUBLIC_CONTEXT/Validation/generic/run_validation.sh config.yaml
An example of the configuration file is source:trunk/Validation/generic/test/test_validation.yaml
You will point the output (controlled by the yaml file) to the validation web area, /nusoft/app/web/validation/.
Within this area are users/ directories. For more official validation you can send it to
/nusoft/app/web/validation/nus/ and a designated sub-directory.
- Remember to clear any failed attempts from the web area.
Running over the full dataset¶
- If concats are available, just replace the test datasets above with the filenames.
- For large datasets, run the spectra-saving section on the grid. This example sends 40 jobs:
submit_cafana.py -n 40 -r $RELEASE -t $TESTREL -o $OUTPUT_DIR NDDataMC.C $OUTFILE true "$NDDATA" "$NDMC"
Note that your release and optional test_release must be consistent with your datasets.
Check your progress:
jobsub_q --user USERNAME
- Once all your grid jobs are done, hadd_cafana the results, apply the formatting as you did with the test, and create the validation website:
hadd_cafana $OUTFILE_NEW `pnfs2xrootd /path/to/pnfs/file.*of40.root`
Once you have the hadd'ed output file, you can run the macro again, this time skipping the make-spectra section:
cafe -bq NDDataMC.C+ $OUTFILE_NEW false
Troubleshooting¶
- Low-hanging fruit: are you using the correct tags? datasets? all needed packages in your test release? did you re-compile?
Check the commit history for the package: is the error related to a recent change?
- Did you search in Slack?
- Branch doesn't exist: check your test release for consistency (StandardRecord, CAFAna, MCReweight). You might need to fix the variable definition.
- It says the branch doesn't exist but I can clearly see it: check the branch requirements of the Var; try changing xx.xx.xx.xx to xx.xx.* and report it (see the sketch after this list).
- Complaints about nan/inf: some variable might not be filled with a default value. Identify and report.
- Occasional segfault on the large dataset but not the test: some variable might be ill-defined, or not properly filled. Identify and report.
- Things break with no cut but are OK otherwise: there might be a problem with LID variables.
- Empty new histograms: is the variable being filled in the new CAFs? Is a default value assigned out of range?
- Weight errors: are weights applicable to both datasets?
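For the Var-related items above, a guarded definition can both document the branch requirements and keep nan/inf out of your histograms. A sketch, assuming the branch-list Var constructor; the branch name, member access, and -5 default are illustrative:
#include <cmath>
// Sketch only: "slc.calE" and the sentinel value are placeholders.
const ana::Var kSafeCalE({"slc.calE"}, // branch requirements; wildcards like "slc.*" also work
                         [](const caf::StandardRecord* sr)
                         {
                           const double E = sr->slc.calE;
                           // Return a sentinel instead of letting nan/inf into histograms
                           if(!std::isfinite(E)) return -5.0;
                           return E;
                         });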