Nue Validation

In a nutshell

  • To use generic validation: write a JSON configuration file that points to your files; choose appropriate names and an accessible location.
    The README with the Validation documentation is at source:trunk/Validation/generic/README.txt


Some notes from the experience of the feature_caf_size validation (source:branches/feature_caf_size/CAFAna/Validation/). The following refers to four datasets, (data, mc) x (old, new), but it can be extended to other cases.

Many of the workflow choices below are meant to avoid excessive copy-pasting.

Preparing your macro and running a test

  • Your macro should be flexible enough to run on both datasets, and its output should be mergeable with hadd_cafana or hadd:
    cafe -bq nue_data_mc_validation.C+ <outfile_old> <> <nddata_old> <ndmc_old>
    cafe -bq nue_data_mc_validation.C+ <outfile_new> <> <nddata_new> <ndmc_new>

    Depending on the type of validation, each line might need to be run with a different tag or test release.
  • Test your macro interactively over a single file/run.
    cafe -bq nue_data_mc_validation.C+ <outfile_old_test> <> <nddata_old_test> <ndmc_old_test>
    cafe -bq nue_data_mc_validation.C+ <outfile_new_test> <> <nddata_new_test> <ndmc_new_test>

    Use specific file names or a samweb query such as:

    nddata_test="dataset_def_name_newest_snapshot "$nddata" and run_number 11264 and Online.Subrun 00" 
    ndmc_test="dataset_def_name_newest_snapshot "$ndmc" and run_number 11264 and Simulated.firstSubRun 00" 

    (make sure the files and snapshots exist in all cases).

  • Validation/generic will use the spectra/histogram names in your <outfile> to write the website. These include the pretty selectors.
    Any "cosmetic" changes can be applied to the same saved spectra, so you should not need to re-run the spectra-making step. nue_data_mc_validation.C splits the two steps via a boolean. You might want to limit the number of variables and cuts that are "formatted" in this step, to quickly generate a preview of the validation.

Running the validation

  • Start with 4 formatted files: (data, mc) x (old, new).
  • Remember to clear the failed attempts from the /nusoft/app/web area

Running over the full dataset

  • If concats are available, just replace the test inputs above with the concat filenames.
  • For large datasets, run the spectra-saving section on the grid. For example, these submission options send 40 jobs:
    -n 40 -r $release -t $testrel -o $output nue_data_mc_validation.C $file true "$nddata" "$ndmc"

    Note that your release and optional test_release must be consistent with your datasets.
    Check your progress: jobsub_q --user USERNAME
  • Once all your jobs are done, hadd_cafana the results, apply the formatting as you did with the test, and create the validation website:
    hadd_cafana <outfile_old> `pnfs2xrootd /path/to/pnfs/file.*of40.root`
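
Before merging, it can help to confirm that every grid job actually produced an output file, since a glob will silently merge whatever is there. The following is a minimal sketch, assuming the outputs are named like file.<i>of<N>.root (as the glob above suggests) with <i> running from 1 to N; both the naming and the 1-based numbering are assumptions, and the helper name is hypothetical.

```python
import re

# Matches a trailing ".<i>of<N>.root", e.g. "file.7of40.root" (assumed naming).
PART_RE = re.compile(r"\.(\d+)of(\d+)\.root$")


def missing_parts(filenames, njobs):
    """Return the sorted job indices in 1..njobs that have no output file."""
    seen = set()
    for name in filenames:
        match = PART_RE.search(name)
        if match and int(match.group(2)) == njobs:
            seen.add(int(match.group(1)))
    return sorted(set(range(1, njobs + 1)) - seen)


if __name__ == "__main__":
    # Illustrative list with job 7 missing; in practice, pass your directory listing.
    names = [f"file.{i}of40.root" for i in range(1, 41) if i != 7]
    print("missing jobs:", missing_parts(names, 40))
```

An empty list means all parts are present and the merge can proceed; otherwise resubmit or investigate the listed jobs first.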

Troubleshooting

  • Low-hanging fruit: Are you using the correct tags and datasets? Are all the needed packages in your test release? Did you recompile?
    Check the commit history of the package: is the error related to a recent change?
  • Did you search Slack?
  • "Branch doesn't exist": check your test release for consistency (StandardRecord, CAFAna, MCReweight). You might need to fix the variable definition.
  • It says the branch doesn't exist, but you can clearly see it: check the requirements of the Var; try changing xx.xx.xx.xx to xx.xx.*, and report it.
  • Complaints about nan/inf: some variable might not be filled with a default value. Identify and report it.
  • Occasional segfault on the large dataset but not on the test: some variable might be ill-defined or not properly filled. Identify and report it.
  • Things break with no cut but work otherwise: there might be a problem with the LID variables.
  • Empty new histograms: is the variable being filled in the new CAFs? Is its default value assigned outside the histogram range?
  • Weight errors: are the weights applicable to both datasets?
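
For the nan/inf and empty-histogram items, a quick count of how a variable's values are distributed often identifies the culprit. The following is a minimal sketch that works on a plain list of numbers, however you extract them from the files; the function name is hypothetical, and the half-open range [lo, hi) is assumed to match the usual histogram binning convention.

```python
import math


def summarize_values(values, lo, hi):
    """Count non-finite values and values outside the histogram range [lo, hi)."""
    total = nonfinite = outside = 0
    for v in values:
        total += 1
        if not math.isfinite(v):
            nonfinite += 1   # nan or +/-inf: the variable was probably never filled
        elif v < lo or v >= hi:
            outside += 1     # lands in underflow/overflow, e.g. a default like -5
    inside = total - nonfinite - outside
    return {"total": total, "nonfinite": nonfinite, "outside": outside, "inside": inside}


if __name__ == "__main__":
    # Illustrative values only: one good entry, two non-finite, two out of range.
    sample = [0.5, float("nan"), float("inf"), -5.0, 1.0]
    print(summarize_values(sample, lo=0.0, hi=1.0))
```

If "inside" is zero for the new files but not for the old ones, the variable is either not being filled or is defaulting to a value outside the plotted range.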