How to do an analysis

This wiki is intended to detail how to perform an analysis of MicroBooNE data and MC. It should cover topics that every analyser will need to deal with.
It is very much under construction. Andy Furmanski is currently building it, so send him ideas for things to include (or just add them!)

Performing a data analysis on MicroBooNE is complex. If you are lost in a sea of new words, see here https://microboone-docdb.fnal.gov/cgi-bin/private/ShowDocument?docid=4476

Analysis Checklist (common pitfalls and how they can be addressed)

Here is a curated list of common analysis pitfalls and how analyses can work around them.

These are shared across all analyses and should be addressed and answered before your analysis enters review:

*Checklist*

Getting the right data and MC, and normalising them

Run Periods

MicroBooNE has several different periods where the detector was in different conditions. This table summarises these.
The Prodhistory page also keeps good track of run periods and raw data epochs.

MC samples

We know that there are large uncertainties when modelling neutrino interactions on argon. For this reason we produce multiple MC samples with different GENIE model configurations. For MCC8, there were 3 "tunes" produced - see [[https://redmine.fnal.gov/redmine/projects/uboonecode/wiki/GenieTunes]] for more information, and/or docdb 14958.
Speak to your conveners if you are unsure which it makes sense to start with (but Tune1 is a pretty good starting point...)

Normalisation of data and MC samples

Instructions on how to normalise data and MC samples are available here: [[https://microboone-docdb.fnal.gov/cgi-bin/private/ShowDocument?docid=15204]].

There are many considerations when comparing data and MC samples with sensible normalisations.

Usually, we like to normalise things by the corresponding POT (protons on target). Calculating this number, however, is not so simple. In addition, one usually needs to subtract off-beam data, or add in cosmic MC, to produce a realistic comparison, and for this you need to calculate an additional scaling which depends on the average beam intensity.
https://microboone-docdb.fnal.gov/cgi-bin/private/ShowDocument?docid=5640 details how these are calculated for the 5e19 open sample. For other samples (smaller or larger) one needs to follow the same method, but use the python script getDataInfo.py to calculate the total number of triggers and POT in each sample.
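
As a rough illustration of the scalings described above, here is a minimal Python sketch. It assumes the common convention that MC is scaled by the ratio of data POT to MC POT, and off-beam data by the ratio of on-beam to off-beam trigger counts. The function names and all input numbers are hypothetical placeholders; the docdb entry above remains the reference for the actual procedure.

```python
def mc_scale(data_pot, mc_pot):
    """Factor that scales an MC sample to the data exposure (assumed: POT ratio)."""
    if mc_pot <= 0:
        raise ValueError("MC POT must be positive")
    return data_pot / mc_pot


def offbeam_scale(onbeam_triggers, offbeam_triggers):
    """Factor that scales off-beam data to the on-beam sample (assumed: trigger ratio)."""
    if offbeam_triggers <= 0:
        raise ValueError("off-beam trigger count must be positive")
    return onbeam_triggers / offbeam_triggers


# Placeholder inputs, not real MicroBooNE numbers.
data_pot = 5.0e19
mc_pot = 2.0e20
onbeam_triggers = 1_000_000
offbeam_triggers = 4_000_000

print("MC scale:", mc_scale(data_pot, mc_pot))
print("off-beam scale:", offbeam_scale(onbeam_triggers, offbeam_triggers))
```

In practice the POT and trigger counts would come from getDataInfo.py rather than being typed in by hand.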

POT counting in data after optical filter using endSubRun

The optical filter removes events where little to no light was deposited in a time window around the beam spill. This is done to reduce the number of events one has to run over. It leaves fewer events in the final files, which may affect the POT counting. To correctly estimate the POT for the analysed sample, one should use the beginSubRun() or endSubRun() methods, which can be called from a producer, filter, or analyser module. Let's assume you have an analyser module for your analysis. You'll have to add void endSubRun(const art::SubRun &sr) override; to the declaration of the required functions. Then, at the end of your module, add

void MyAnalyzerName::endSubRun(const art::SubRun& sr)
{
    std::cout << __PRETTY_FUNCTION__ << " Ended run " << sr.run() << ", subrun " << sr.subRun() << std::endl;
}

Then save the sr.run() and sr.subRun() numbers to a tree (filled once per subrun) or dump them to a text file. This gives you the run and subrun numbers you have run over, which is all one needs to retrieve the POT exposure. Even if the filter removed all events in a subrun, the endSubRun() method is still called, so the run and subrun information is not lost.
This method gives the POT for the events you actually ran over (those that survived the grid submission, for example).
Finally, the POT can be retrieved using Zarko's script getDataInfo.py. For example, if you dumped all the run and subrun numbers to a text file called runsubrun_list.txt, run:
/uboone/app/users/zarko/getDataInfo.py --run-subrun-list runsubrun_list.txt

You can then read the POT under the variable tor860_wcut.
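
If the run and subrun numbers are dumped from many grid jobs, the same pair may appear more than once. The following minimal Python sketch builds a sorted, de-duplicated list. It assumes one whitespace-separated "run subrun" pair per line, which is an assumption about the file format and should be checked against what getDataInfo.py expects; the function name is hypothetical.

```python
def unique_run_subruns(lines):
    """Return sorted, de-duplicated (run, subrun) pairs from text lines."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.add((int(fields[0]), int(fields[1])))
    return sorted(pairs)


# Hypothetical dump from several jobs, with one duplicate and one blank line.
dumped = ["5500 12", "5500 3", "", "5500 12", "5501 0"]
for run, subrun in unique_run_subruns(dumped):
    print(run, subrun)
```

The cleaned pairs can then be written back to a text file and passed to the script with --run-subrun-list.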

POT counting in data after optical filter using SAM

You can use SAM isparentof clauses to find the grandparents of a set of filtered files (grandparents because of separate filter+merge steps):

$ samweb list-files --summary "defname: prod_reco_optfilter_bnb_v11_mcc8 and run_number 5500" 
File count:     11
Total size:     3077314432
Event count:    171

$ samweb list-files --summary "isparentof: ( isparentof: ( defname: prod_reco_optfilter_bnb_v11_mcc8 and run_number 5500 ) with availability anylocation )" 
File count:     11
Total size:     16423167976
Event count:    388

PMT timing offsets

Due to an intricacy of the PMT readout system, there is a delay between the (hardware) trigger time and the start of the unbiased readout window. This delay is different for different trigger types - 2 ticks for BNB, 4 ticks for NUMI, and 26 ticks for EXT.

The software trigger algorithm calculates a number of ticks from the start of the unbiased window, and therefore the software trigger window is in a slightly different place in on-beam and off-beam data. In MC samples the timing is designed to be close to the measured beam time, but due to uncertainties in the measurement it may not be exact.

Because of this, when applying cuts to "in-time" flashes, one must shift the definition of the beam window, as hit and flash times are given relative to the (hardware) trigger time. See slide 3 on https://microboone-docdb.fnal.gov/cgi-bin/private/RetrieveFile?docid=7066&filename=MCC8_validations_xsec_related.pdf&version=5 for MCC7 and MCC8 beam windows.
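
To get a feel for the size of the shift, the tick delays quoted above can be converted into a relative time offset between trigger types. The sketch below is illustrative only: the tick period of 15.625 ns (a 64 MHz optical clock) is an assumption not stated on this page, the function name is hypothetical, and the sign convention for applying the shift should be taken from the slides linked above.

```python
# Delay between hardware trigger and start of the unbiased readout, in ticks
# (values quoted in the text above).
DELAY_TICKS = {"BNB": 2, "NUMI": 4, "EXT": 26}


def relative_delay_us(trigger_a, trigger_b, tick_ns=15.625):
    """Delay of trigger_a relative to trigger_b, in microseconds.

    tick_ns is an assumed optical tick period (64 MHz clock).
    """
    ticks = DELAY_TICKS[trigger_a] - DELAY_TICKS[trigger_b]
    return ticks * tick_ns / 1000.0


# Offset of the off-beam (EXT) window with respect to the on-beam (BNB) window.
print("EXT - BNB offset [us]:", relative_delay_us("EXT", "BNB"))
```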

Understanding the data format inside a LArSoft file

LArSoft uses artroot files. If you don't know how to read information from these, there are some examples at https://cdcvs.fnal.gov/redmine/projects/larexamples/wiki
This covers simple analysis (getting data products) and using LArSoft services and algorithms.

What's in a MicroBooNE final reco file

Getting simple data products (tracks, showers, etc)

Matching reco to truth, and using associations

Current best producers

As of October 2017, the reconstruction group advises using the following producers for analysis:

Flashes - simpleFlashBeam and simpleFlashCosmic
Tracks -
Showers -

Using AnalysisTree

An analysis tree is a plain ROOT TTree containing a large amount of information about each event.
The AnalysisTree_variables page has more information about how they are created, what they contain, and a link to a 2014 tutorial on using analysis trees.

Using Gallery

The art gallery software is a lightweight library that allows one to read LArSoft-written data files using only the experiment-provided ROOT dictionaries for the data classes. For more information on gallery, please go to: https://github.com/marcpaterno/gallery-demo or https://indico.fnal.gov/getFile.py/access?sessionId=16&resId=0&materialId=0&confId=11857.

Event Displays

There are several event displays available for MicroBooNE simulated/physics data:

LArSoft Event Display
Set up a uboonecode release and run

 lar -c evd_ub_data_truncated.fcl path/to/my/artroot/file.root 

_Note that this fcl file is only correct through the MCC8 release; newer releases may need a different fcl file._

Gallery Event Display (no installation required, works with art-root files)
In a clean environment, just run

source /uboone/app/users/cadams/static_evd/setup.sh

then you are good to go.
To open the event display, run
evd.py -t /path/to/my/artroot/file.root

where the -t option tells the event display to use the MicroBooNE geometry and is compatible with showing truncated waveforms.
Or run
evd3D.py -u /path/to/my/artroot/file.root

for the 3D event display. Here the -t option is not available; -u tells the display to use the MicroBooNE geometry.

Argo
Visit http://argo-microboone.fnal.gov/ and look up the data file of interest.

The Overlay samples

The overlay sample is a first-of-a-kind Monte Carlo sample in which the beam signal simulated with GENIE is overlaid on a cosmic background event from real data. You can learn all about it in the Overlay Technical Note.

The new overlay sample is now available under the samweb definition name prodgenie_bnb_nu_uboone_overlay_mcc8.11_reco2. From now on, the production page will be updated whenever a new version is released.

The overlay sample has several special properties an analyser should account for:

  1. Note that the isData flag is true for overlay events.
  2. As the cosmic part of the overlay sample comes from data, it will not have any MC particles associated with its objects (hits, flashes, tracks). One must add the following
    if statement to the analysis module just after finding the art::Ptr<simb::MCParticle> maxp_me associated with a track:
    if (maxp_me.isNull()) { /* unmatched track, assumed to be cosmic */ }
    else { /* matched track, assumed to originate from the beam */ }
    
  3. Due to the backtracker algorithm, some of the cosmic-induced hits will be associated with an MC particle from the beam by accident. We therefore recommend an additional check of each track's purity, defined as the fraction of its
    charge associated with the MC particle, and rejecting the association if the purity is less than 0.1. This can be done easily using the BackTrackerTruthMatch class available
    in uboonecode with the following lines added to your analysis module:
    std::vector< art::Ptr<recob::Hit> > trk_hits_ptrs = hits_per_track.at(i_t);
    BackTrackerTruthMatch backtrackertruthmatch;
    backtrackertruthmatch.MatchToMCParticle(hit_handle,e,trk_hits_ptrs);
    art::Ptr< simb::MCParticle > maxp_me = backtrackertruthmatch.ReturnMCParticle();
    

    This backtracker application will return an associated MC particle only in case
    the purity of this association is larger than 0.1. One can find the value of the purity
    by using the following line:
    double purity = backtrackertruthmatch.ReturnPurity();
    
  4. The beam flash time window is shifted. If you use it as a filter, we recommend
    changing it to, for example:
    physics.producers.NuMuCCSelectionII.NuMuCCSelectionIIAlg.BeamMin : 3.6
    physics.producers.NuMuCCSelectionII.NuMuCCSelectionIIAlg.BeamMax : 5.2
    
  5. The overlay sample uses the data calibration for tracks:
    physics.producers.NuMuCCSelectionII.NuMuCCSelectionIIAlg.GainCorrections :
    @local::microboone_calorimetryalgmcc84data.CalAreaConstants
    

With these minor changes, you're good to go with your regular analysis on the overlay samples.
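
The purity check in item 3 can be illustrated with a small standalone Python sketch. This is only an illustration of the idea (pick the MC particle contributing the most charge to a track, then reject the match if that charge fraction is below a threshold); it is not the BackTrackerTruthMatch implementation, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def match_track(hits, min_purity=0.1):
    """Match a track to the MC particle contributing most of its charge.

    hits: list of (charge, particle_id) tuples; particle_id is None for hits
    with no MC particle (e.g. cosmic hits taken from data).
    Returns (particle_id or None, purity).
    """
    total = sum(charge for charge, _ in hits)
    if total <= 0:
        return None, 0.0
    charge_by_particle = defaultdict(float)
    for charge, pid in hits:
        if pid is not None:
            charge_by_particle[pid] += charge
    if not charge_by_particle:
        return None, 0.0  # no MC particle at all: treat as cosmic
    best = max(charge_by_particle, key=charge_by_particle.get)
    purity = charge_by_particle[best] / total
    if purity < min_purity:
        return None, purity  # accidental association: reject the match
    return best, purity


# A mostly-cosmic track with one accidental beam hit, and a half-matched track.
print(match_track([(100.0, None), (100.0, None), (10.0, 7)]))
print(match_track([(50.0, 7), (50.0, None)]))
```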

Final stages - Calibrating, correcting, and dealing with systematics

Reconstruction corrections

Calibrations

MC reweighting

Correction weights

For MCC8, at production a "correction" weight is calculated to account for the fact that the beam simulation incorrectly accounts for re-decay of muons to produce electron neutrinos (this description might not be 100% accurate, but the point is it's wrong but we can fix it).
This is stored as "bnbcorrection" and should be applied to all events but particularly when making absolutely normalised nu_e event distributions.
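
Applying such a weight usually just means filling histograms with the per-event weight instead of 1. A minimal pure-Python sketch follows; the function name, bin edges, and the energies and weights are made-up placeholders, and how the "bnbcorrection" value is read from your files is not shown.

```python
def weighted_histogram(values, weights, edges):
    """Sum per-event weights into bins defined by ascending bin edges."""
    counts = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += weight
                break  # each event lands in at most one bin
    return counts


# Placeholder neutrino energies (GeV) and per-event correction weights.
energies = [0.3, 0.7, 0.8, 1.5]
correction = [1.0, 0.5, 1.5, 1.0]
edges = [0.0, 0.5, 1.0, 2.0]

print(weighted_histogram(energies, correction, edges))
```

If other weights are in use (for example a POT scale factor), the per-event weight would be the product of all of them.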

Systematic variation weights

A wiki describing how to run the EventWeight package to produce systematic variation weights for GENIE and beam variations can be found here
https://cdcvs.fnal.gov/redmine/projects/uboonecode/wiki/MCEventWeight

Detector variations

To estimate detector systematics at the moment (Nov 2017) the plan is to produce special MC datasets with modified detector parameters. It is key here to use the same events such that there is no statistical variation between the event samples. The workflow for producing these is described at the following link:
https://cdcvs.fnal.gov/redmine/projects/uboonecode/wiki/Generating_MCC8_Detector_Systematic_and_Reco_Variation_Samples
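
When comparing a detector-variation sample with the central-value sample, one way to make sure the comparison uses the same events is to restrict both to the events they have in common. The sketch below assumes events can be identified by a (run, subrun, event) tuple; the function name and the example IDs are hypothetical.

```python
def common_events(central_ids, variation_ids):
    """Return the sorted (run, subrun, event) IDs present in both samples."""
    return sorted(set(central_ids) & set(variation_ids))


# Placeholder event IDs; one event is missing from each sample.
central = [(1, 0, 10), (1, 0, 11), (1, 1, 5)]
variation = [(1, 0, 11), (1, 1, 5), (1, 1, 6)]

shared = common_events(central, variation)
print(len(shared), "events in common:", shared)
```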

Important/useful computing information

How to find the data and MC files you need using SAM

How to submit analysis jobs to the grid

How to use Sam4users to store your files on tape (and best practices)

Other notes on Sam and using xrootd - https://cdcvs.fnal.gov/redmine/projects/uboonecode/wiki/Sam

How to make a merged analysis tree from a samweb definition containing a specific number of events

Running GENIE with different models in LArSoft - https://microboone-docdb.fnal.gov/cgi-bin/private/ShowDocument?docid=6045