In a nutshell¶
- For each dataset, generate a root file with the same histograms/spectra.
These are generated by the standard source:trunk/CAFAna/nue/SecondAna/dataMC/nd_data_mc.C, or by source:branches/feature_caf_size/CAFAna/Validation/nue_data_mc_validation.C
TODO: bring an updated version of CAFAna/Validation/ to the trunk
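For orientation, a minimal sketch of what such a macro does (include paths and the kCaloE Var are quoted from memory and may need adjusting; the real macros above take the dataset names as arguments):

#include <string>
#include "CAFAna/Core/Binning.h"
#include "CAFAna/Core/Cut.h"        // kNoCut
#include "CAFAna/Core/Spectrum.h"
#include "CAFAna/Core/SpectrumLoader.h"
#include "CAFAna/Vars/Vars.h"       // kCaloE (assumed to live here)
#include "TFile.h"

using namespace ana;

void my_validation(const std::string& outfile, const std::string& dataset)
{
  SpectrumLoader loader(dataset); // SAM definition/query or file name

  // One Spectrum per histogram you want on the validation page
  Spectrum sCalE("Calorimetric energy (GeV)", Binning::Simple(50, 0, 5),
                 loader, kCaloE, kNoCut);

  loader.Go(); // a single pass over the files fills all declared Spectra

  TFile fout(outfile.c_str(), "RECREATE");
  sCalE.SaveTo(fout.mkdir("calE")); // this name is what downstream tools see
}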
- To use generic validation: write a json configuration file pointing to your files; choose appropriate names and an accessible location.
README for the Validation documentation: source:trunk/Validation/generic/README.txt
Some notes from the experience of the feature_caf_size validation (source:branches/feature_caf_size/CAFAna/Validation/). The following refers to four datasets, (data, mc) × (old, new), but can be extended to other cases.
Note that many of the workflow choices are meant to avoid excessive copy-pasting.
Preparing your macro and running a test¶
- Your macro should be flexible enough to run on both datasets, driven by command-line arguments:
cafe -bq nue_data_mc_validation.C+ <outfile_old> <> <nddata_old> <ndmc_old>
cafe -bq nue_data_mc_validation.C+ <outfile_new> <> <nddata_new> <ndmc_new>
Depending on the type of validation, each line might need to be run with different tags/test releases
- Use variables and cuts defined in header files; these are more likely to be properly maintained.
Frequently used variables have been defined in source:trunk/CAFAna/Vars/Vars.h, source:trunk/CAFAna/Vars/TruthVars.h, source:trunk/CAFAna/Vars/NueVars.h, and source:trunk/CAFAna/Vars/NueVarsExtra.h
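If you do need a custom Var, the pattern looks roughly like this (the branch name and StandardRecord field are made up; check source:trunk/CAFAna/Core/Var.h for the actual constructor):

#include "CAFAna/Core/Var.h"
#include "StandardRecord/StandardRecord.h"

// The first argument lists the CAF branches the Var reads; these are the
// "requirements of the Var" that the troubleshooting notes below refer to.
const ana::Var kMyCalE({"slc.calE"}, // made-up branch requirement
                       [](const caf::StandardRecord* sr)
                       { return double(sr->slc.calE); }); // made-up field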
- In addition, in source:trunk/CAFAna/Vars/NueVarsExtra.h and source:trunk/CAFAna/Vars/NueVarsExtra.cxx you'll find functions that return groups of HistDefs, useful structures that already carry short names, variables, and an appropriate HistAxis. No need to reinvent the wheel
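A sketch of how such a group can be used; getNueHistDefs() and the HistDef member names here are illustrative guesses, take the real ones from NueVarsExtra.h:

#include <memory>
#include <string>
#include <vector>
#include "CAFAna/Core/Cut.h"        // kNoCut
#include "CAFAna/Core/Spectrum.h"
#include "CAFAna/Core/SpectrumLoader.h"
#include "CAFAna/Vars/NueVarsExtra.h"
#include "TFile.h"

using namespace ana;

void fill_histdefs(const std::string& outfile, const std::string& dataset)
{
  SpectrumLoader loader(dataset);

  std::vector<HistDef> defs = getNueHistDefs(); // hypothetical accessor

  // Each HistDef already bundles a short name and a HistAxis
  std::vector<std::unique_ptr<Spectrum>> specs;
  for(const HistDef& d: defs)
    specs.emplace_back(new Spectrum(loader, d.axis, kNoCut));

  loader.Go();

  TFile fout(outfile.c_str(), "RECREATE");
  for(unsigned int i = 0; i < defs.size(); ++i)
    specs[i]->SaveTo(fout.mkdir(defs[i].shortName.c_str()));
}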
- Test your macro interactively over a single file/run.
cafe -bq nue_data_mc_validation.C+ <outfile_old_test> <> <nddata_old_test> <ndmc_old_test>
cafe -bq nue_data_mc_validation.C+ <outfile_new_test> <> <nddata_new_test> <ndmc_new_test>
Try specific file names or a samweb query like
nddata_test="dataset_def_name_newest_snapshot "$nddata" and run_number 11264 and Online.Subrun 00"
ndmc_test="dataset_def_name_newest_snapshot "$ndmc" and run_number 11264 and Simulated.firstSubRun 00"
(make sure the files and snapshots exist in all cases).
Validation/generic will use the spectra/histogram names in your <outfile> to write the website. These names include the pretty selectors.
Any "cosmetic" changes can be made on the same spectra; there should be no need to re-run.
nue_data_mc_validation.C splits the process via a boolean (a sketch of the formatting half follows). You might want to limit the number of variables and cuts that are "formatted" in this step, to quickly generate a preview of the validation
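The formatting half then re-opens the saved spectra and writes plain histograms with the "pretty" attributes attached; roughly like this (the directory/histogram names are placeholders, and the LoadFrom/ToTH1 signatures are quoted from memory):

#include <memory>
#include <string>
#include "CAFAna/Core/Spectrum.h"
#include "TFile.h"
#include "TH1.h"

using namespace ana;

void format_one(const std::string& infile, const std::string& outfile, double pot)
{
  TFile fin(infile.c_str());
  std::unique_ptr<Spectrum> spec = Spectrum::LoadFrom(fin.GetDirectory("calE"));

  TFile fout(outfile.c_str(), "RECREATE");
  TH1* h = spec->ToTH1(pot); // scale to the exposure used in the comparison
  h->SetTitle("Calorimetric energy;E_{cal} (GeV);Events"); // pretty labels
  h->Write("calE"); // Validation/generic matches histograms on this name
}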
Running the validation¶
- Start with the four formatted files: (data, mc) × (old, new)
- Write the yaml/json config files and run the validation
sh $SRT_PUBLIC_CONTEXT/Validation/generic/run_validation.sh config.json
An example of the configuration file is source:trunk/Validation/generic/test/test_validation.yaml
- The last block of code in source:branches/feature_caf_size/CAFAna/Validation/nue_nd_run_validation.sh writes the json files for data vs data and mc vs mc, taking into account possible changes in file names, etc., to save some precious minutes
- Remember to clear the failed attempts from the output directory
Running over the full dataset¶
- If concats are available, just replace the test inputs above with their filenames
- For large datasets, run the spectra-saving section on the grid. The following sends 40 jobs:
submit_cafana.py -n 40 -r $release -t $testrel -o $output nue_data_mc_validation.C $file true "$nddata" "$ndmc"
Note that your release and optional test_release must be consistent with your datasets.
Check your progress:
jobsub_q --user USERNAME
- Once all your jobs are done, hadd_cafana the results, apply the format as you did with the test, and create the validation website
hadd_cafana <outfile_old> `pnfs2xrootd /path/to/pnfs/file.*of40.root`
Troubleshooting¶
- Low-hanging fruit: are you using the correct tags? The right datasets? All needed packages in your test release? Did you re-compile?
Check the commit history for the package: is the error related to a recent change?
- Did you search in Slack?
- Branch doesn't exist: check your test release for consistency (StandardRecord, CAFAna, MCReweight). You might need to fix the variable definition
- It says that branch doesn't exist but I totally see it: check the requirements of the Var; try relaxing xx.xx.xx.xx to xx.xx.* and report it (see the sketch at the end of this list)
- Complaints about nan/inf: some variable might not be filled with a default value. Identify and report.
- Occasional segfault on the large dataset but not the test: some variable might be ill-defined, or not properly filled. Identify and report
- Things break with no cut, ok otherwise: there might be a problem with LID variables
- Empty new_histograms: is the variable being filled in the new cafs? Is a default value assigned out of range?
- Weight errors: are weights applicable to both datasets?
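On the "branch doesn't exist but I totally see it" item above, a hypothetical illustration of the wildcard workaround, reusing the made-up Var from the preparation section:

// Relax the exact branch requirement to a wildcard while you report it
// (branch and field names are made up, as before)
const ana::Var kMyCalE({"slc.*"}, // was {"slc.calE"}
                       [](const caf::StandardRecord* sr)
                       { return double(sr->slc.calE); });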