Nus Validation


  • To use the generic validation, write a yaml configuration file pointing to your files; choose appropriate names and an accessible location.
    README for the Validation documentation: source:trunk/Validation/generic/README.txt


The following refers to four datasets, (data, mc) crossed with (old, new), but the procedure can be extended to other cases.

Note that many of the workflow choices below are made to avoid excessive copy-pasting.

Preparing your macro and running a test

  • Your macro should be flexible enough to run on both datasets, and its output should be mergeable with hadd_cafana or hadd:
    cafe -bq NDDataMC.C+ $OUTFILE_OLD true $NDDATA_OLD $NDMC_OLD
    cafe -bq NDDataMC.C+ $OUTFILE_NEW true $NDDATA_NEW $NDMC_NEW

    Depending on the type of validation, each line might need to be run with different tags or test releases.
  • Test your macro interactively over a single file/run.

    Try specific file names or a samweb query like

    NDDATA_TEST="dataset_def_name_newest_snapshot "$NDDATA" and run_number 11264 and Online.Subrun 00" 
    NDMC_TEST="dataset_def_name_newest_snapshot "$NDMC" and run_number 11264 and Simulated.firstSubRun 00" 

    (make sure the files and snapshots exist in all cases).

  • Validation/generic will use the spectra/histogram names in your <outfile> to write the website; these include the pretty selectors.
    Any "cosmetic" changes can be applied to the same spectra, so you shouldn't need to re-run the event loop. NDDataMC.C (and FDDataMC.C) splits the process via a boolean. You might want to limit the number of variables and cuts that are "formatted" in this step in order to quickly generate a preview of the validation.

Running the validation

  • Start with the four formatted files: data/mc, old/new

Webpage output

You will point the output (controlled by the yaml file) to the /nusoft/app/web area.
Within this area are users/ directories. For more official validation you can send it to /nusoft/app/web/validation/nus/ and a designated sub-directory.

  • Remember to clear any failed attempts from the /nusoft/app/web area.

Running over the full dataset

  • If concats are available, simply replace the test datasets above with the concat filenames
  • For large datasets, run the spectra-saving section on the grid, passing options like these to the grid-submission script (this example sends 40 jobs):

    -n 40 -r $RELEASE -t $TESTREL -o $OUTPUT_DIR NDDataMC.C $OUTFILE true "$NDDATA" "$NDMC" 

    Note that your release and optional test_release must be consistent with your datasets.
    Check your progress: jobsub_q --user USERNAME
  • Once all your grid jobs are done, hadd_cafana the results, apply the formatting as you did with the test, and create the validation website
    hadd_cafana $OUTFILE_NEW `pnfs2xrootd /path/to/pnfs/file.*of40.root`

Once you have the hadd'ed output file, you can run the macro again, this time skipping the make-spectra section:

cafe -bq NDDataMC.C+ $OUTFILE_NEW false


Troubleshooting

  • Low-hanging fruit: are you using the correct tags? The correct datasets? Are all needed packages in your test release? Did you re-compile?
    Check the commit history for the package: is the error related to a recent change?
  • Did you search in Slack?
  • A branch doesn't exist: check your test release for consistency (StandardRecord, CAFAna, MCReweight). You might need to fix the variable definition
  • It says a branch doesn't exist but you can clearly see it: check the requirements of the Var; try loosening xx.xx.xx.xx to xx.xx.* and report it
  • Complaints about nan/inf: some variable might not be filled with a default value. Identify and report.
  • Occasional segfaults on the large dataset but not the test: some variable might be ill-defined or not properly filled. Identify and report.
  • Things break with no cut but are fine otherwise: there might be a problem with LID variables.
  • Empty new_histograms: is the variable being filled in the new CAFs? Is a default value assigned out of range?
  • Weight errors: are the weights applicable to both datasets?