Retraining the muon PID

This page needs to be updated! The information below is correct but incomplete. The next person to retrain the PID should smooth this out, fill in the wiki, and commit any new macros they write to the appropriate place.


(Taken from an old document written by M. Baird, found at BreakPointFitter/macros/BPF_training_notes.txt)

==================== START OF NOTES FROM THE OLD TEXT FILE ====================

These notes explain how to run the full Break Point Fitter (BPF) retraining. The steps listed below must be done in order, although you can start at either of the major stages (for example, if you want to keep the current dE/dx histos and only retrain with new GENIE MC). The major stages of training are as follows:

Stage 1: Regenerate the dE/dx log-likelihood histograms (used for the muon PID).

Stage 2: Retrain the BPF muon PID.

The specific steps for each of these stages are outlined below.

STAGE 1: Generating the dE/dx log-likelihood histograms (see docDB #12694 for a description of this method)

The steps listed below are very general. You can put together your own scripts or follow the model and scripts I have here:

/nova/app/users/mbaird42/THESIS/FA14-10-03x.a_dEdxLLhisto_sample_generation/ (used to generate the simulated single particle samples)
/nova/app/users/mbaird42/THESIS/S15-05-04a_dEdxLLhisto_reco_and_ana/ (used for reco and to make the dE/dx LL histos)

NOTE: The above two directories are quite old, and new scripts for the single-particle generation will likely have to be written.

1. Generate the 8 single-particle gen samples needed to make the dE/dx LL histos. The single-particle gen fcl parameters can be found in BreakPointFitter/macros/BPF_dEdxLLhisto_single_part_gen_params.txt

2. Run the appropriate reco over the generated samples. This includes CalHit, Slicer4D, MultiHough, ElasticArms, FuzzyK, and BreakPointFitter, followed by the BPF module BPFdEdxHistoMaker. Ultimately, you only need to keep the TFile output from the BPFdEdxHistoMaker module, so if you are doing all of this in the same tag (and you are feeling brave) you can string everything from the single gen through the dEdx histo making into one job.

3. Once all grid jobs are complete, the last step is to combine the results together. This is done with the following steps:

a) hadd samples 1-4 together into a file of "all muon results"
b) hadd samples 5-8 together into a file of "all proton results"
c) copy the script BreakPointFitter/macros/add_and_normalize_LL_histos.C to the local area with the hadded files and edit it to point to the new files
d) run that script to properly combine the results and normalize the resulting histos
e) copy the file "temp_output.root" to the /nova/ana/users/mbaird42/BreakPoint/official_training_files/{tag} directory and rename it appropriately (see the file names in the other tags for examples)
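
Steps a), b) and d) above can be sketched in Python as follows. This is only an illustration, not the official procedure: the sample and output file names are hypothetical, and the normalize() function assumes that "normalize" means scaling the bin contents to unit sum, which may not be exactly what add_and_normalize_LL_histos.C does. The only external tool assumed is ROOT's hadd, called in its basic form "hadd target source1 source2 ...".

```python
def hadd_command(target, sources):
    """Build the argument list for `hadd target source1 source2 ...`."""
    sources = list(sources)
    if not sources:
        raise ValueError("hadd needs at least one input file")
    return ["hadd", target] + sources


def normalize(bin_contents):
    """Scale bin contents so they sum to 1 (assumed meaning of 'normalize')."""
    total = float(sum(bin_contents))
    if total <= 0.0:
        raise ValueError("cannot normalize an empty histogram")
    return [c / total for c in bin_contents]


# Hypothetical names for the 8 single-particle sample outputs.
sample_files = {i: "dEdxLL_sample%d.root" % i for i in range(1, 9)}

# Samples 1-4 go into the muon file, samples 5-8 into the proton file.
muon_cmd = hadd_command("all_muon_results.root",
                        [sample_files[i] for i in range(1, 5)])
proton_cmd = hadd_command("all_proton_results.root",
                          [sample_files[i] for i in range(5, 9)])

# Print the commands so they can be checked before running them by hand
# (or passed to subprocess.run in an environment where ROOT is set up).
print(" ".join(muon_cmd))
print(" ".join(proton_cmd))
```

Printing the commands first, rather than running them directly, makes it easy to confirm that the right samples end up in the right file before any output is written.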

STAGE 2: Retrain the BPF muon PID

1. Modify BPFTmvaTrainer.fcl and BPFEnergyEstimator.fcl in your local copy of BreakPointFitter to point to the new dEdx LL histo file created in stage 1.

2. Run the BPFTmvaTrainer module with the "TrainMode" fcl parameter set to 0 (for muon training). For this I used the 14db ideal conditions MC only.

3. Once all grid jobs are complete, hadd the results into one file.

4. Use the script "make_BPF_muonPID_KNN.C" (modified to point to the recently hadd-ed file) to make the new kNN weight file.

NOTE: The BPF muon PID is now trained as a BDT, but the above script will only train it as a kNN. See docDB 16575 for more information. A NEW TMVA ROOT MACRO WILL NEED TO BE WRITTEN (the above one can be used as a model). I believe that the current BDT was trained with the "out-of-the-box" TMVA BDT configuration. Someone SHOULD do a study to optimize the training of this BDT (this was never actually done).

5. Copy the file "weights/BPF_muon_PID_KNN_80*.xml" to the /nova/ana/users/mbaird42/BreakPoint/official_training_files/{tag} directory and rename it appropriately (see the file names in the other tags for examples).

NOTE: If the PID was trained as a BDT, the weight file will not have "KNN" in its name, so look for the corresponding BDT weight file instead.
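
The copy-and-rename in step 5 can be sketched in Python as follows. This is a minimal sketch, not an official tool: the function name install_weight_file and its arguments are hypothetical. It takes the glob pattern as an argument because the weight file name depends on whether a kNN or a BDT was trained, and it refuses to guess if the pattern matches zero files or more than one.

```python
import glob
import os
import shutil


def install_weight_file(weights_dir, pattern, dest_dir, new_name):
    """Copy the single weight file matching `pattern` to dest_dir/new_name.

    All four arguments are supplied by the caller; nothing here knows
    the official naming convention for the training files.
    """
    matches = sorted(glob.glob(os.path.join(weights_dir, pattern)))
    if len(matches) != 1:
        # Stop rather than silently pick the wrong training output.
        raise RuntimeError(
            "expected exactly one match for %r, found %d" % (pattern, len(matches))
        )
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, new_name)
    shutil.copy2(matches[0], dest)  # copy2 also keeps the file timestamps
    return dest
```

For the kNN case described above, the call would use weights_dir="weights" and pattern="BPF_muon_PID_KNN_80*.xml", with dest_dir set to the {tag} directory and new_name chosen to match the file names in the other tags.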

If desired, all of the new files can be rolled into a version-controlled external package (contact the current manager of the NOvASoft code on how to do this).

==================== END OF NOTES FROM THE OLD TEXT FILE ====================

Additional info from Jonathan Davies' retraining of rEMiD in the same way can be found in docDB 17482 and 17848.