# Nue Validation

## In a nutshell

• To use generic validation: write a JSON configuration file pointing to your files; choose appropriate names and an accessible location.
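The exact schema is defined by Validation/generic and is not reproduced here; purely as a hypothetical sketch, a configuration pointing at an old/new pair of output files might look like the following (every key and value is an illustrative assumption, not the real schema):

```json
{
  "title": "Nue old-vs-new CAF validation",
  "old_file": "/path/to/outfile_old.root",
  "new_file": "/path/to/outfile_new.root",
  "webpage": "/nusoft/app/web/..."
}
```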

Some notes from the experience of the feature_caf_size validation (source: branches/feature_caf_size/CAFAna/Validation/). The following refers to four datasets, (data, mc) × (old, new), but can be extended to other cases.

Note that many of the workflow choices below are meant to avoid excessive copy-pasting.

### Preparing your macro and running a test

• Your macro should be flexible enough to work on both datasets, and its output should be hadd_cafana- or hadd-able:
cafe -bq nue_data_mc_validation.C+ <outfile_old> <> <nddata_old> <ndmc_old>
cafe -bq nue_data_mc_validation.C+ <outfile_new> <> <nddata_new> <ndmc_new>


Depending on the type of validation, each line might need to be run with different tags/test releases
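Most of the copy-pasting above can be avoided with a small wrapper loop over the old/new variants. This is a sketch: the dataset definition names are placeholders, the second macro argument is kept as the literal `<>` placeholder from above, and the `echo` makes it a dry run (drop it to actually invoke cafe):

```shell
#!/bin/bash
# Hypothetical dataset definition names -- substitute your own.
declare -A nddata=( [old]="nddata_old_defname" [new]="nddata_new_defname" )
declare -A ndmc=(   [old]="ndmc_old_defname"   [new]="ndmc_new_defname" )

for ver in old new; do
  # Dry run: 'echo' prints the command instead of running cafe.
  echo cafe -bq nue_data_mc_validation.C+ "outfile_${ver}.root" '<>' \
       "${nddata[$ver]}" "${ndmc[$ver]}"
done
```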
• Test your macro interactively over a single file/run.
cafe -bq nue_data_mc_validation.C+ <outfile_old_test> <> <nddata_old_test> <ndmc_old_test>
cafe -bq nue_data_mc_validation.C+ <outfile_new_test> <> <nddata_new_test> <ndmc_new_test>


Try specific file names, or a samweb query like:

nddata_test="dataset_def_name_newest_snapshot "$nddata" and run_number 11264 and Online.Subrun 00"
ndmc_test="dataset_def_name_newest_snapshot "$ndmc" and run_number 11264 and Simulated.firstSubRun 00"


(Make sure the files and snapshots exist in all cases.)

• Validation/generic will use the spectra/histogram names in your <outfile> to build the website. These include the pretty selectors.
Any "cosmetic" changes can be applied to the same spectra, so you shouldn't need to re-run. nue_data_mc_validation.C splits the process via a boolean. You might want to limit the number of variables and cuts that are "formatted" in this step, to quickly generate a preview of the validation.

### Running the validation

• Remember to clear the failed attempts from the /nusoft/app/web area

### Running over the full dataset

• If concats are available, just replace the test datasets above with the concat filenames.
• For large datasets, run the spectra-saving section on the grid. The following sends 40 jobs:
submit_cafana.py -n 40 -r $release -t $testrel -o $output nue_data_mc_validation.C $file true "$nddata" "$ndmc"
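The submission line assumes several shell variables are already defined. A hypothetical dry-run sketch of that plumbing follows; all values are illustrative assumptions, and the `echo` prints the command rather than submitting:

```shell
#!/bin/bash
# All values below are illustrative assumptions -- substitute your own.
release="development"                                # base release matching your datasets
testrel="$HOME/testrel"                              # optional test release, consistent with datasets
output="/pnfs/nova/scratch/users/$USER/validation"   # grid output area
file="outfile_new.root"                              # spectra file name passed to the macro
nddata="nddata_new_defname"                          # hypothetical dataset definitions
ndmc="ndmc_new_defname"

# Dry run: print the submission command instead of sending 40 jobs.
echo submit_cafana.py -n 40 -r "$release" -t "$testrel" -o "$output" \
     nue_data_mc_validation.C "$file" true "$nddata" "$ndmc"
```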


Note that your release and optional test release must be consistent with your datasets.
Check your progress with jobsub_q --user USERNAME.
• Once all your jobs are done, hadd_cafana the results, apply the formatting as you did with the test, and create the validation website:
hadd_cafana <outfile_old> $(pnfs2xrootd /path/to/pnfs/file.*of40.root)
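Before hadding, it is worth verifying that all 40 job outputs actually exist; a missing index means a failed job to resubmit. A self-contained sketch follows — the output directory is simulated in a temp area, with job 17 deliberately missing, so the example runs anywhere; point `outdir` at your real pnfs output instead:

```shell
#!/bin/bash
# Simulate a grid output directory with one job's file missing (illustrative).
outdir=$(mktemp -d)
for i in $(seq 1 40); do
  [ "$i" -eq 17 ] || touch "$outdir/file.${i}of40.root"
done

# Check that every expected output exists before running hadd_cafana.
missing=0
for i in $(seq 1 40); do
  if [ ! -f "$outdir/file.${i}of40.root" ]; then
    echo "missing: file.${i}of40.root"
    missing=$((missing + 1))
  fi
done
echo "$missing missing of 40"   # prints "1 missing of 40" in this simulation
rm -rf "$outdir"
```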


### Troubleshooting

• Low-hanging fruit: are you using the correct tags? The correct datasets? Are all needed packages in your test release? Did you re-compile?
Check the commit history for the package: is the error related to a recent change?
• Did you search in Slack?
• Branch doesn't exist: check your test release for consistency (StandardRecord, CAFAna, MCReweight). You might need to fix the variable definition.
• It says the branch doesn't exist, but I can clearly see it: check the requirements of the Var; try changing xx.xx.xx.xx to xx.xx.* and report it.
• Complaints about nan/inf: some variable might not be filled with a default value. Identify and report.
• Occasional segfaults on the large dataset but not on the test: some variable might be ill-defined or not properly filled. Identify and report.
• Things break with no cut but are fine otherwise: there might be a problem with LID variables.
• Empty new histograms: is the variable being filled in the new CAFs? Is a default value assigned out of range?
• Weight errors: are the weights applicable to both datasets?