For separating pion and proton tracks, is it correct that only the mean dE/dx is used? Why don't you consider the full dE/dx vs range profile? This should work well to distinguish pions from protons. Perhaps this issue is taken care of by the BDTG step. We do use the full dE/dx vs range profile in the selection. This is described in Section 2.1, where we discuss the "MIP-like" requirement: a track is flagged as MIP-like if its full dE/dx vs range profile is consistent with expectations for a muon or a pion, although this does not work well for exiting particles. We added the following sentence for clarification: this means that the track was labeled as either a muon or a pion based on the full dE/dx vs range profile. There are two calorimetry-related variables in the BDTG: the number of "MIP-like" tracks and the average dE/dx of the pion candidate track.
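The idea behind the MIP-like flag can be sketched as a profile comparison: the measured dE/dx at each residual range is tested against the expected MIP curve. The sketch below is purely illustrative; the parametrizations, threshold, and function names are hypothetical, not the ones used in the analysis.

```python
import math

# Hypothetical parametrization of the expected MIP dE/dx (MeV/cm) as a
# function of residual range (cm); the real analysis uses tabulated
# muon/pion expectations, not this toy form.
def expected_mip_dedx(residual_range_cm):
    return 1.7 + 0.5 / max(residual_range_cm, 0.5) ** 0.4

# A stopping proton deposits far more charge near the track end, so its
# profile deviates strongly from the MIP expectation (toy form).
def expected_proton_dedx(residual_range_cm):
    return 17.0 / max(residual_range_cm, 0.5) ** 0.42

def is_mip_like(hits, sigma=0.3, chi2_cut=4.0):
    """hits: list of (residual_range_cm, measured_dedx) pairs.
    Flag the track as MIP-like if the mean chi2 against the MIP
    hypothesis is below an illustrative threshold."""
    chi2 = sum(((dedx - expected_mip_dedx(rr)) / sigma) ** 2
               for rr, dedx in hits)
    return chi2 / len(hits) < chi2_cut

# Synthetic tracks: a pion-like flat profile and a proton-like rising one.
pion_track = [(rr, expected_mip_dedx(rr)) for rr in range(1, 20)]
proton_track = [(rr, expected_proton_dedx(rr)) for rr in range(1, 20)]
print(is_mip_like(pion_track))    # pion-like profile passes
print(is_mip_like(proton_track))  # proton-like profile fails
```

This also illustrates why exiting particles are problematic: without a track end, the residual range is unknown and the profile comparison loses its discriminating power.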
Some numbers in Table 1 don't pass a simple sanity check. (Contained and reconstructed as MIP-like track + Not contained and reconstructed as MIP-like track) 37+76 != 100. They are not supposed to add up to 100% since they are independent samples. Rather, "contained and reconstructed as MIP-like track" + "contained and reconstructed as non-MIP-like track" should add up to 100%. The purpose is to demonstrate that most of the contained protons are flagged as protons and most of the exiting protons are flagged as MIP-like particles.
What does "Non-resonant events" refer to in Figure 2? GENIE categorizes events into five channels: QE, resonant pion production, DIS, coherent pion production, and MEC. Here "non-resonant events" means all channels except resonant pion production. This is now clarified in the note: "Non-resonant events" means the events produced in one of the other CC channels: QE, DIS, coherent pion production, and MEC.
I notice that no unfolding is used for any of the measurements reported (muon momentum, muon angle wrt neutrino direction, pion angle wrt neutrino direction, and angle between muon and pion). In order to justify not unfolding, you must show the reconstructed and true distributions in terms of all of these variables for simulated events and demonstrate that unfolding is not necessary. We do unfolding in this analysis, in the sense that the numerator of the efficiency is calculated using reconstructed quantities while the denominator is calculated using truth quantities. This is a standard approach to correct for both efficiency and smearing. Note that we have the following sentence in the note: Using reconstructed information in the numerator and truth information in the denominator, the procedure is able to take into account and correct for detector smearing.
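The correction described above can be sketched in a few lines: the numerator histogram is filled with reconstructed values of selected simulated signal events, the denominator with truth values of all signal events, so the ratio absorbs both selection efficiency and detector smearing. All numbers below are illustrative, not ArgoNeuT values.

```python
# Illustrative histograms (three bins of some kinematic variable).
truth_hist = [100.0, 200.0, 150.0]  # all true CC1pi events per truth bin
reco_hist = [45.0, 110.0, 60.0]     # selected events, binned in the reco variable

# Bin-by-bin efficiency: reconstructed (numerator) over truth (denominator).
efficiency = [r / t for r, t in zip(reco_hist, truth_hist)]

# Applying the correction to a toy background-subtracted data histogram.
data_hist = [9.0, 22.0, 12.0]
corrected = [d / e for d, e in zip(data_hist, efficiency)]
print([round(c, 1) for c in corrected])  # -> [20.0, 40.0, 30.0]
```

A bin-by-bin ratio like this corrects smearing only on average; the referee's requested reco-vs-true comparisons are what justify that the migrations are small enough for this to be adequate.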
Also, it is important to see what the expected resolutions are in terms of the reconstructed variables. We have added the following plots to the note to show the resolutions of the kinematic variables.
Is the efficiency reported in Figure 6, top right, in consideration of all analysis cuts? No. This is the definition of the efficiency in the note: the detection efficiency is defined as the ratio between the distribution of the truth value of a pion variable (momentum) in all truth CC 1$\pi$ events in which the pion is reconstructed as a track and the distribution of the same variable in all truth CC 1$\pi$ events, requiring in both cases only that the neutrino interaction vertex be contained in the ArgoNeuT TPC fiducial volume. It is basically the track reconstruction efficiency. We have tweaked the definition slightly to make this clear.
Why are true NC events considered when reporting efficiency (and in Table 2)? Instead of NC+CC, shouldn't the denominator be "number of true charged current events with an interaction vertex featuring a single pion, with no kaon or neutral pion and no requirement on the number of nucleons"? We include NC events in Table 2 to show the event composition. Most of the NC events are removed by the MINOS matching cuts. NC events are not included in the calculation of efficiency. See the efficiency definition in the above response.
From Table 2, I notice that a lot more neutrino events are removed by the "MINOS match" requirement than antineutrino events. Is this understood? In the antineutrino beam, neutrino events have higher energy and thus more DIS interactions with many tracks. It is hard to reconstruct the muon track in this busy environment, which is why fewer neutrino events pass the MINOS matching cut.
In Figure 9, it seems that the ratio of signal:background is much lower with neutrinos than antineutrinos. According to the text, this seems to simply be an issue with the number of events simulated--and this is corrected by applying a weighting factor of 0.25 to neutrino background events. However, it doesn't seem like this was propagated to the plots in Figure 9(?) The neutrino events are mostly DIS interactions due to their higher energy. For the antineutrino events, roughly 1/3 are resonant pion production. This is why antineutrinos have a much higher signal-to-background ratio than neutrinos. We apply a weight in the BDT training so that the training does not focus entirely on the background events. Figure 9 shows the actual signal/background predictions from GENIE without any weights or tuning, which simply reflects the poor modeling of pion production.
How are correlations between bins accounted for in the TFractionFitter procedure? I don't fully understand the procedure. In particular, it is not clear to me why one needs to invoke a ROOT class to normalize the distributions to match the normalization of the data. TFractionFitter takes into account both data and Monte Carlo statistical uncertainties. This is done through a standard likelihood fit using Poisson statistics; however, the template (MC) predictions are also varied within their statistics, leading to additional contributions to the overall likelihood. We actually tried both TFractionFitter and an alternative method that simply scales the signal and background templates, and obtained consistent results.
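The core of the procedure can be sketched without ROOT as a binned Poisson likelihood fit for the signal fraction, with the total prediction normalized to the data. This omits the MC-statistics term that TFractionFitter adds (letting each template bin fluctuate as well); all numbers are made up for illustration.

```python
import math

# Toy data and unit-normalized signal/background templates (4 bins).
data = [25, 40, 30, 12]
signal_tmpl = [0.10, 0.40, 0.35, 0.15]
background_tmpl = [0.40, 0.30, 0.20, 0.10]
n_total = sum(data)

def nll(f_sig):
    """Negative log-likelihood (up to a constant) for a signal fraction
    f_sig, with the total prediction normalized to the data yield."""
    total = 0.0
    for d, s, b in zip(data, signal_tmpl, background_tmpl):
        mu = n_total * (f_sig * s + (1.0 - f_sig) * b)
        total += mu - d * math.log(mu)  # Poisson log-likelihood term
    return total

# Simple scan over the fraction; a real fit uses a proper minimizer.
best_nll, best_fraction = min(
    (nll(f / 1000.0), f / 1000.0) for f in range(1, 1000))
print(round(best_fraction, 2))
```

In TFractionFitter the same Poisson likelihood is extended with one nuisance term per template bin (the Barlow-Beeston approach), which is how the MC statistical uncertainty enters; bins are otherwise treated as independent.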
Just to reiterate my previous comment: the flux normalization uncertainties for neutrinos and antineutrinos should be treated more carefully. Given central values and a correlation matrix from MINERvA, this should be fairly simple to extract. Even though this is a statistics-dominated measurement, I think it is important to get the largest systematic correct, especially since I don't think it is very difficult to do so (given info from MINERvA). Based on the MINERvA flux paper (https://arxiv.org/abs/1607.00704), we have changed the flux normalization uncertainty from 11% to 9.7% for the neutrino flux and 7.8% for the antineutrino flux. The systematic errors were reduced from +18.7/-18.6% to +18.2/-18.0% for neutrinos and from +11.6/-13.4% to +9.1/-10.0% for antineutrinos.
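If the flux term is combined with the other systematic sources in quadrature, the effect of swapping 11% for the MINERvA-based 9.7% can be sketched as below. The non-flux contribution here is a hypothetical placeholder, not the analysis breakdown, and the note's totals may also include correlations this toy ignores.

```python
import math

def total_uncertainty(flux_pct, other_sources_pct):
    """Combine a flux normalization term with other (assumed independent)
    systematic sources in quadrature; all inputs in percent."""
    return math.sqrt(flux_pct ** 2 + sum(x ** 2 for x in other_sources_pct))

other = [15.0]  # hypothetical combined non-flux systematics, in percent
old_total = total_uncertainty(11.0, other)   # generic 11% flux term
new_total = total_uncertainty(9.7, other)    # MINERvA-based neutrino term
print(round(old_total, 1), round(new_total, 1))  # -> 18.6 17.9
```

Because the flux term dominates, even a modest reduction in it (11% to 9.7%) visibly lowers the quadrature total, consistent with the direction of the change quoted above.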