MCC8 Analysis checklist

1. Beam Flash Timing Window:

Each sample has a different timing window in which the beam flashes occur; are yours correctly configured?
The following beam windows are all 1.8 us long: the 1.6 us beam spill plus 0.1 us at each end to accommodate time jitter.
  • MC BNB+ MC cosmics: [3.10, 4.90]
  • BNB data: [3.20, 5.00]
  • BNB-EXT data: [3.60, 5.40]
  • MC BNB + data cosmics overlaid: [3.50, 5.30]
  • References: NuMu CC Inclusive Internal Note
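The windows above can be encoded in a small helper. This is a minimal standalone sketch; the sample keys and function names are illustrative, not part of any uboonecode API:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Beam-flash window bounds in microseconds, copied from the list above.
// (Sample keys and helper names are illustrative.)
struct FlashWindow { double lo, hi; };

inline FlashWindow beamWindow(const std::string& sample) {
  static const std::map<std::string, FlashWindow> windows = {
    {"mc_bnb_mc_cosmics",   {3.10, 4.90}},
    {"bnb_data",            {3.20, 5.00}},
    {"bnb_ext_data",        {3.60, 5.40}},
    {"mc_bnb_data_overlay", {3.50, 5.30}},
  };
  auto it = windows.find(sample);
  if (it == windows.end()) throw std::runtime_error("unknown sample: " + sample);
  return it->second;
}

// True if a flash time (us) falls inside the sample's beam window.
inline bool inBeamWindow(double t_us, const std::string& sample) {
  const FlashWindow w = beamWindow(sample);
  return t_us >= w.lo && t_us <= w.hi;
}
```

Note that the same flash time can pass for one sample and fail for another, which is exactly why mismatched windows bias data/MC comparisons.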

2. Reco2 Calibration data:

In MCC8, if you use the standard production reco2 (pandora, trajcluster, PMA, etc.), there are two sets of calorimetry information. Are you using the correct one?
  • "calo" -- does not include the calibration information
  • "cali" -- has had the calibration applied
  • Only MCC8.8 has the correct calibration; MCC8.17 (v06_26_01_21) is planned to also have the correct calibration
  • Technote: MicroBooNE-doc-14754-v3
  • Public note: MicroBooNE-doc-15584-v13
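As a sketch of the distinction, one could filter for the calibrated products by label. The label names in the example below are illustrative; check the producer labels actually present in your reco2 files (e.g. with eventdump.fcl):

```cpp
#include <string>
#include <vector>

// Keep only calibrated ("cali") calorimetry labels; "calo" labels lack the
// calibration.  Example labels are illustrative -- verify against your files.
inline std::vector<std::string> calibratedLabels(const std::vector<std::string>& labels) {
  std::vector<std::string> out;
  for (const auto& l : labels)
    if (l.find("cali") != std::string::npos) out.push_back(l);
  return out;
}
```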

3. Space-charge corrections: Are you applying them?

If you try to match reconstructed quantities to truth-level information, you need to account for the space-charge effects that are applied at the LArG4 stage.

The following snippet works with uboonecode v06_26_01_20 and corrects both for Space Charge and for other time offsets.

#include "larevt/SpaceChargeServices/SpaceChargeService.h"
#include "lardata/DetectorInfoServices/DetectorClocksService.h"
#include "lardata/DetectorInfoServices/DetectorPropertiesService.h"

auto const* SCE = lar::providerFrom<spacecharge::SpaceChargeService>();
auto const* detClocks = lar::providerFrom<detinfo::DetectorClocksService>();
auto const* theDetector = lar::providerFrom<detinfo::DetectorPropertiesService>();

// True neutrino interaction point, taken from the outgoing lepton
double nux = mct.GetNeutrino().Lepton().Vx();
double nuy = mct.GetNeutrino().Lepton().Vy();
double nuz = mct.GetNeutrino().Lepton().Vz();
auto scecorr = SCE->GetPosOffsets(nux, nuy, nuz);  // spatial SCE offsets

// Convert the true interaction time into an additional x (drift) offset
double g4Ticks = detClocks->TPCG4Time2Tick(mct.GetNeutrino().Lepton().T())
               + theDetector->GetXTicksOffset(0, 0, 0) - theDetector->TriggerOffset();
double xOffset = theDetector->ConvertTicksToX(g4Ticks, 0, 0, 0) - scecorr[0];
double yOffset = scecorr[1];
double zOffset = scecorr[2];

// Truth position shifted into the reconstructed coordinate frame
recob::tracking::Point_t mcpos(nux + xOffset, nuy + yOffset, nuz + zOffset);

4. Data-to-MC Normalization: Are you including empty files?

Filtered samples (in both data and MC) may include empty files (files in which no events pass your filter), but you must account for the POT used to produce those files when doing an absolutely normalized comparison.

Additionally, you need to count your POT after processing files (and after applying any DQM requirements). For example, if you have <100% grid efficiency, you will otherwise be incorrectly normalizing your samples for an inefficiency that is not associated with any cut.
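A minimal sketch of this bookkeeping (the FileSummary type and totalPot helper are hypothetical): sum POT over every successfully processed file, including filtered files with zero passing events, and exclude only files lost to processing failures:

```cpp
#include <vector>

// One entry per input file of the sample: its generated POT and whether the
// file was processed successfully (grid job succeeded, DQM passed, ...).
// Types and names are illustrative.
struct FileSummary { double pot; bool processedOk; };

// Filtered files with zero passing events still carry POT and still count;
// files lost to processing failures are excluded, so the grid inefficiency
// does not leak into the normalization.
inline double totalPot(const std::vector<FileSummary>& files) {
  double sum = 0.0;
  for (const auto& f : files)
    if (f.processedOk) sum += f.pot;
  return sum;
}
```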

5. Is your sample Prescaled?

  • If you're using standard BNB, BNB EXT, or NuMI On-Beam samples, it is very unlikely that your events are prescaled! If you're using NuMI EXT or any Unbiased samples, you will likely need to be careful about prescaling.
  • Any prescale correction should be applied on a run-by-run basis. Zarko's POT counting tool, in its most recent (August 2018) form, should automatically account for any prescaling, so this is no longer a worry for analysers.
  • You can check whether your data are prescaled by looking at the software trigger parameters. Constructing a SWTriggerHandle:
      art::Handle<raw::ubdaqSoftwareTriggerData> SWTriggerHandle;
      e.getByLabel("daq", SWTriggerHandle);

where EXT_NUMIwin_FEMBeamTriggerAlgo is one of the many trigger algorithms whose parameters can be queried from the handle; check the algorithms relevant to your sample. Also be sure to include:

 #include "uboone/RawData/utils/ubdaqSoftwareTriggerData.h" 
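The run-by-run correction can be sketched standalone: a prescale factor of N keeps roughly 1/N of the triggers, so each surviving event is weighted by N. The run-to-prescale map and helper name below are illustrative; the actual factors come from the trigger configuration (or Zarko's POT counting tool):

```cpp
#include <map>

// Per-event weight correcting for a software-trigger prescale.  Runs absent
// from the map are assumed unprescaled (weight 1).  Illustrative sketch only.
inline double prescaleWeight(int run, const std::map<int, double>& prescaleByRun) {
  auto it = prescaleByRun.find(run);
  return (it == prescaleByRun.end()) ? 1.0 : it->second;
}
```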

6. Reco to Truth Matching

While looking at some hit-level truth studies, I found a bug in the indirect hit-MCParticle association in the production (MCC8) releases.
I believe this is the cause of the apparent degradation in completeness spotted by the CI validation in v06_26_01_10 (see docdb 15854-v1, slide 12).

The problem goes back to when we slimmed the MC reco2 files by dropping some MC truth information.
The backtracker is used in reco1 to create a Hit-MCParticle association for the "gaushit" collection (direct association, labeled "gaushitTruthMatch").
In reco2, the backtracker is no longer accessible, so for other hit collections (e.g. the "pandoraCosmicHitRemoval" collection, which is used by the pandoraNu pass, or the "trajcluster" collection), an "indirect" association was derived from the direct one (labeled "crHitRemovalTruthMatch" or "trajclusterTruthMatch").
In order to match hits between collections, the buggy version looks for matching contributions between 'StartTick' and 'EndTick' (which correspond to the ROI), while the correct version of the code looks between 'PeakTimeMinusSigma' and 'PeakTimePlusSigma'.

The effect of this bug is that the associated MCParticles from all hits in the same ROI are merged as in this example:
- hitA and hitB are in the same ROI, and thus have the same 'StartTick' and 'EndTick' but different peak time and rms
- when looking at the "gaushitTruthMatch" association hitA is associated to mcp1 and mcp2, and hitB is associated to mcp3
- when looking at the "crHitRemovalTruthMatch" association hitA is associated to mcp1, mcp2, mcp3 and hitB is also associated to mcp1, mcp2, mcp3
In other words a given MCParticle will have more hits associated to it than it should.
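The two matching criteria can be illustrated with a standalone sketch (the HitTimes struct is a hypothetical stand-in for the timing quantities of a recob::Hit). Two hits sharing an ROI but with well-separated peaks are merged by the buggy criterion and kept separate by the correct one:

```cpp
// Stand-in for the timing information of a recob::Hit (illustrative struct).
struct HitTimes { double startTick, endTick, peakTime, sigma; };

inline bool overlaps(double lo1, double hi1, double lo2, double hi2) {
  return lo1 <= hi2 && lo2 <= hi1;
}

// Buggy MCC8 criterion: match on the ROI, so all hits in the same ROI match.
inline bool matchBuggyROI(const HitTimes& a, const HitTimes& b) {
  return overlaps(a.startTick, a.endTick, b.startTick, b.endTick);
}

// Correct criterion: match on the peak window [PeakTime-sigma, PeakTime+sigma].
inline bool matchPeakWindow(const HitTimes& a, const HitTimes& b) {
  return overlaps(a.peakTime - a.sigma, a.peakTime + a.sigma,
                  b.peakTime - b.sigma, b.peakTime + b.sigma);
}
```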

The bug may affect MCC8 analyses in terms of efficiency evaluation, other performance metrics, or simply debugging, and analyzers should evaluate whether and how much it affects their analysis. In general, the bug matters for ROIs containing multiple hits, so it is more likely close to the vertex and at specific angles where the signal is broad. It is hard to imagine it significantly affecting a long muon; for shorter proton tracks or for showers it may become more significant.

A workaround for people using the "pandoraCosmicHitRemoval" collection (i.e. for pandoraNu) can be developed as follows.
Given that the "pandoraCosmicHitRemoval" collection is a subset of the "gaushit" one, each "pandoraCosmicHitRemoval" hit can be mapped onto the corresponding element of the "gaushit" collection, and the direct association can then be used to find the associated MCParticles. A code snippet is provided below [*].

 // gaushit collection and direct association
 art::InputTag HitInputTag("gaushit");
 art::InputTag HitTruthInputTag("gaushitTruthMatch");
 auto inputHits = e.getValidHandle<std::vector<recob::Hit>>(HitInputTag);
 art::FindManyP<simb::MCParticle, anab::BackTrackerHitMatchingData>
   assocMCPart(inputHits, e, HitTruthInputTag);

 // cosmic removal hit collection
 art::InputTag CRHitInputTag("pandoraCosmicHitRemoval");
 auto inputCRHits = e.getValidHandle<std::vector<recob::Hit>>(CRHitInputTag);

 // map the crhits to gaushits -- crHitRemovalTruthMatch is buggy in mcc8!
 std::vector<size_t> crindex(inputCRHits->size()); /// <= this is the map!!!
 for (size_t ic = 0; ic < inputCRHits->size(); ic++) {
   const auto& crhit = inputCRHits->at(ic);
   for (size_t ig = 0; ig < inputHits->size(); ig++) {
     const auto& gahit = inputHits->at(ig);
     // same channel and (numerically) identical peak time => same hit
     if (crhit.Channel() == gahit.Channel() &&
         std::abs(crhit.PeakTime() - gahit.PeakTime()) < 0.000001) {
       crindex[ic] = ig;
       break;
     }
   }
 }

 // now loop over the CR hits and find the associated MCParticles
 for (size_t ih = 0; ih < inputCRHits->size(); ih++) {
   auto assmcp = assocMCPart.at(crindex[ih]); /// <= get the associated MCParticles through the remap!
   // ... use assmcp ...
 }

7. Has your sample been passed through the data-quality monitoring?

While most samples used by analysers are already passed through the basic form of data-quality monitoring (DQM), it never hurts to check.

First look at the sam definition name you're using. Some of them will say "DQM" or "GoodRuns", etc.

One of the ways to check if it has been applied is to simply do:

samweb describe-definition defname

This returns metadata about the definition, including how it is defined. The "Dimensions" field tells you how the definition was constructed. Quite often, when the DQM criteria have been applied, you will see:

and defname: dqm_neutrino2016_bnbextrevised

This clause indicates that the runs included in the sam definition are restricted to runs which pass the DQM.

The notable exception to this is the MCC8.7 overlay sample (prodgenie_bnb_nu_uboone_overlay_mcc8.11_reco2), for which the input EXT unbiased dataset was not passed through DQM prior to generation of the sample. Analyzers should make sure they apply the good runs list, which can be found in plain-text at docdb-13933, to this overlay sample. Analyzers should also double-check this for any future MCC8 overlay samples.

8. If using MCC8 overlay (such as prodgenie_bnb_nu_uboone_overlay_mcc8.11_reco2), have you properly counted the POT in your sample?

There are two important things to keep in mind when doing POT counting for overlay samples.

  • The DQM criteria may not have been applied to the input EXT unbiased dataset used to make an overlay sample. Runs that do not pass DQM should not be analyzed, and therefore should not be counted towards the sample POT.
  • There is a bug in the POT accounting for overlay files with multiple subruns. When the subrun number advances, the sumdata::POTSummary data product that most analyzers use for POT accounting is filled from a POT counter, but that counter is never reset. The POT recorded for each subrun is therefore the running total for all subruns in the file so far (i.e., the POT for the first subrun is correct, the POT for the second subrun is the sum of the first and second, and so on). As a result, the correct POT for a file is the POT recorded for its final subrun, and the sample POT should be calculated from that. Note that extra care must be taken when counting the sample POT based on files produced from multiple input overlay files (e.g., the output of grid jobs run with multiple files per job). (Anyone who figures out a good way to do this should add an outline of their method here.) Somewhat more detailed documentation is available at docdb-17641.
  • Update: a way to fix this is to keep a counter in your analyzer that is reset to zero each time a new input file is opened:

    void yourAnalyzer::respondToOpenInputFile(art::FileBlock const& fb) {
      _sum_pot = 0;
    }

    bool yourAnalyzer::endSubRun(art::SubRun& sr) {
      // potListHandle holds the sumdata::POTSummary retrieved for this subrun
      _pot = potListHandle->totpot - _sum_pot;  // undo the running total
      if (m_isOverlaidSample) _sum_pot += _pot;
      return true;
    }
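The same differencing can be illustrated standalone (helper names are hypothetical): given the cumulative per-subrun POT values written by the buggy accounting, recover the true per-subrun POT and the file total:

```cpp
#include <vector>

// 'recorded' holds the POTSummary totpot value written for each subrun of one
// file; because of the bug these are running totals.  Helper names illustrative.
inline std::vector<double> recoverSubrunPot(const std::vector<double>& recorded) {
  std::vector<double> perSubrun;
  double prev = 0.0;
  for (double r : recorded) { perSubrun.push_back(r - prev); prev = r; }
  return perSubrun;
}

// The correct total POT for the file is the value recorded for the last subrun.
inline double filePot(const std::vector<double>& recorded) {
  return recorded.empty() ? 0.0 : recorded.back();
}
```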