Offline Data Quality

The goal is to assess the health of the data, run by run, at several levels.

The first level is the Raw data level. This is already checked (at some level) by the online data quality monitoring (DQM), but it will be our first line of defense.

The next level is the Digit data level. This is the data created by the Slicing of the raw data. The plots will therefore check, as we take more data and update versions of LArIATsoft, that the slicing is behaving in an expected and consistent way.

For example, by plotting the number of RawDigit objects (i.e. the number of TPC channels created) per event and per sub-run, we can monitor for good TPC data. We could also check that they all have the expected total number of ticks (we think errors are sometimes reported with this).
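The RawDigit check above could be sketched roughly as follows. This is only an illustration in plain Python, not LArIATsoft code: the function name summarize_raw_digits and the input layout (one list of tick counts per event) are assumptions, and the expected channel and tick numbers must come from the real detector configuration.

```python
from collections import defaultdict


def summarize_raw_digits(events, expected_channels, expected_ticks):
    """Count, per sub-run, the events with an unexpected RawDigit content.

    events: iterable of (subrun, tick_counts), where tick_counts holds the
            number of ticks of every RawDigit in that event (hypothetical
            layout, chosen only for this sketch).
    """
    summary = defaultdict(
        lambda: {"events": 0, "bad_channel_count": 0, "bad_tick_count": 0}
    )
    for subrun, tick_counts in events:
        entry = summary[subrun]
        entry["events"] += 1
        # One RawDigit per TPC channel is expected in every event.
        if len(tick_counts) != expected_channels:
            entry["bad_channel_count"] += 1
        # Every RawDigit should carry the full number of ticks.
        if any(n != expected_ticks for n in tick_counts):
            entry["bad_tick_count"] += 1
    return dict(summary)


# Toy input: 4 channels and 8 ticks stand in for the real detector numbers.
toy_events = [(1, [8, 8, 8, 8]), (1, [8, 8, 8]), (2, [8, 8, 7, 8])]
toy_summary = summarize_raw_digits(toy_events, expected_channels=4, expected_ticks=8)
```

Plotting the two "bad" counters against sub-run would then show at a glance where the TPC data went wrong.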

These plots should also be made for the weekly versions of LArIATsoft (comparing this week's latest release to the previous week's release).

Finally, checking the Reconstructed quantities made from these digits as a function of sub-run, on a week-by-week basis, will give us a check on software versions and on bugs introduced in our software.

For example, the number of reconstructed hits, tracks, Optical Flashes, Wire Chamber Tracks, Time-of-Flight objects, etc. should all be “roughly” flat as a function of sub-run (or run). This will require a standard reconstruction fhicl file, which is run over the newly sliced data, and a simple plotting script, which compares last week's result to this week's result.
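The week-to-week comparison could start from something like the sketch below. It assumes the per-sub-run averages have already been extracted into plain dictionaries; the function name compare_weeks and the 10% tolerance are placeholders, not agreed values.

```python
def compare_weeks(last_week, this_week, rel_tol=0.10):
    """Flag sub-runs whose value moved by more than rel_tol between releases.

    last_week, this_week: dict mapping sub-run number to a mean count per
    event (e.g. hits per event). Returns a list of (subrun, old, new).
    """
    flagged = []
    for subrun in sorted(set(last_week) | set(this_week)):
        old = last_week.get(subrun)
        new = this_week.get(subrun)
        # A sub-run present in only one of the two results is suspicious.
        if old is None or new is None:
            flagged.append((subrun, old, new))
            continue
        scale = max(abs(old), abs(new))
        if scale > 0 and abs(new - old) / scale > rel_tol:
            flagged.append((subrun, old, new))
    return flagged


# Toy numbers, not real data.
toy_last = {1: 100.0, 2: 100.0, 3: 50.0}
toy_this = {1: 102.0, 2: 60.0}
toy_flagged = compare_weeks(toy_last, toy_this)
```

The same function can be called once per quantity (hits, tracks, flashes, ...), so that the plotting script only has to draw the flagged sub-runs differently.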

Plots and quantities of interest:


Number of raw data blocks for this subrun:
  • 1740s
  • 1751s
  • MWPCs
  • Trigger V1495


Number of failures to make each kind of digit, given the parent raw data block:
  • RawDigits
  • OptDetDigits
  • TOFDigits
  • TDCDigits
  • Are there even aerogel digits?
  • TriggerInputDigits (is that a thing?)
  • MuRSDigits
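
The failure bookkeeping for the list above might look like the following sketch. It assumes we can record, for every parent raw data block, which kind of digit was attempted and whether it was produced; the record layout and the name count_digit_failures are invented for illustration only.

```python
from collections import Counter


def count_digit_failures(records):
    """Tally digit-making failures per digit kind.

    records: iterable of (digit_kind, succeeded), one entry per parent raw
             data block (hypothetical layout for this sketch).
    Returns a dict mapping digit_kind to (n_failures, n_attempts).
    """
    attempts = Counter()
    failures = Counter()
    for kind, succeeded in records:
        attempts[kind] += 1
        if not succeeded:
            failures[kind] += 1
    return {kind: (failures[kind], attempts[kind]) for kind in attempts}


# Toy records, not real data.
toy_records = [
    ("RawDigits", True),
    ("RawDigits", False),
    ("TOFDigits", True),
]
toy_failures = count_digit_failures(toy_records)
```

Keeping the number of attempts next to the number of failures lets us plot either the raw count or the failure fraction per sub-run.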


I expect these distributions to be (roughly) flat as a function of sub-run. All of these quantities should be straightforward to extract:
  • Number of Wire Chamber Tracks
  • Number of "Good" TOF objects
  • Number of TPC Hits
  • Number of TPC Tracks
  • Number of Optical Flashes
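
A simple way to test "roughly flat" for the quantities above is to compare each sub-run to the median over all sub-runs, as in this sketch. The 20% tolerance and the function name flag_non_flat are assumptions; the right threshold will have to be tuned on real data.

```python
import statistics


def flag_non_flat(values_by_subrun, rel_tol=0.20):
    """Return the sub-runs whose value is far from the median of all sub-runs.

    values_by_subrun: dict mapping sub-run number to a per-event average
    (e.g. number of TPC tracks per event).
    """
    if not values_by_subrun:
        return []
    median = statistics.median(values_by_subrun.values())
    if median == 0:
        # With a zero median, any non-zero sub-run stands out.
        return sorted(s for s, v in values_by_subrun.items() if v != 0)
    return sorted(
        s
        for s, v in values_by_subrun.items()
        if abs(v - median) / abs(median) > rel_tol
    )


# Toy numbers, not real data.
toy_tracks = {1: 10.0, 2: 10.5, 3: 9.8, 4: 2.0}
toy_outliers = flag_non_flat(toy_tracks)
```

The median is used instead of the mean so that a few bad sub-runs do not shift the reference level.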

I don't know the status of these objects in the reconstruction, so this part likely needs to be fleshed out a bit more:

  • Number of AeroGel Hits
  • Number of Muon Range Stack Hits
  • Number of TPC Showers

Instructions on how to get started with this task
(This is meant to provide a sketch of steps which can be followed to create this module and successfully test that it is working)

  1. Check out the develop version of LArIATsoft
  2. Create a feature branch (git flow feature start feature_branch_name) with some useful title (e.g. OfflineDQM_username)
  3. Push your feature branch to the public (git push -u origin feature_branch_name)
  4. Inside srcs/lariatsoft/LArIATAnaModule create a new analyzer module OfflineDQMAna (see instructions on using artmod to do this...e.g. artmod -A analyzer OfflineDQMAna)
  5. Modify srcs/lariatsoft/LArIATAnaModule/lariatanamodules.fcl to include your new module
  6. Create a fcl file which runs your module (see srcs/lariatsoft/LArIATAnaModule/AnaTree.fcl as an example; yours will look similar). If you can have the histogram file named by date, that will make bookkeeping easier
  7. Add code which makes plots of the quantities we are interested in (it might be easiest to start with the Reconstructed objects, where there is existing code to use as an example, and then work backwards to include digits and raw objects)
  8. Test on some sliced data
  9. Commit and push your changes to the feature branch so we can all have a look and help out.
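
For the date-stamped histogram file mentioned in step 6, a small helper like the one below could be used by whatever script launches the job. The prefix "OfflineDQM" and the name dated_hist_name are only suggestions, and how the resulting name is passed to the fcl file is left open here.

```python
import datetime


def dated_hist_name(prefix="OfflineDQM", day=None):
    """Build a histogram file name such as OfflineDQM_YYYYMMDD.root."""
    if day is None:
        day = datetime.date.today()
    return f"{prefix}_{day:%Y%m%d}.root"


# Fixed date used here only so that the example is reproducible.
example_name = dated_hist_name(day=datetime.date(2015, 6, 1))
```

Putting the year first means that the files sort chronologically in a directory listing.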