# ModularExtrap

This page is an in-progress port of the technote in docdb 12563.

This page describes the process used to predict nue and numu spectra in the NOvA far detector using data from the NOvA near detector. Specifically detailed is `ModularExtrap`, which extrapolates decomposed near-detector data into spectra that can be combined with oscillation probabilities to form far-detector predictions. Also presented are descriptions of, and figures demonstrating, the validation of `ModularExtrap`, as well as a note on running `ModularExtrap`.

# Introduction

## Overview

The goal of the nue extrapolation is to make a prediction of far-detector event rates:

$$F_{S_{e}}^{Pred}(B_{j}^{e}) \label{eq:nuegoal}$$

Here, *F* indicates a far-detector event rate, *Pred* indicates that the rate is computed by the extrapolation, *S*_{e} indicates that the nue selection has been applied, and *B*^{e} are the bins of the reconstructed variable of the nue analysis, indexed by *j*. (For a full description of the notation, see Appendix A.)

Similarly, the goal of the numu extrapolation is to compute the rates:

$$F_{S_{\mu}}^{Pred}(B_{j}^{\mu}) \label{eq:numugoal}$$

*S*_{μ} indicates that numu selection has been applied, and *B*^{μ} are the bins of the reconstructed variable of the numu analysis, indexed by *j*.

The far-detector event rates are predicted by extrapolation from the near-detector event rates. This is accomplished in three steps, by three (confusingly named) `CAFAna` classes, in order: `IDecomp`, `ModularExtrap`, and `PredictionExtrap`. Working backward from the goal, they are discussed below in reverse order.

## PredictionExtrap

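The above rates are computed by:

$$F_{S}^{Pred}(B_{j}) = \sum_{\alpha\rightarrow\beta}\sum_{i} P_{\alpha\rightarrow\beta}(E_{i}^{T})\, F_{\alpha\rightarrow\beta,S}^{Pred}(E_{i}^{T},B_{j})$$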

Here, α→β indicates a neutrino-flavor transition, *E*^{T} are the true-energy bins, indexed by *i*, and *P* is an oscillation probability. *S* and *B* represent either the nue or numu selection and binning, respectively.

The oscillation probability depends on the flavor transition, central value of the true-energy bin, and the neutrino-oscillation parameters. The *F*_{α→β}^{Pred} terms are constructed by assuming *P*_{α→β}=1, and are thus independent of the neutrino-oscillation parameters.

Factoring the computation this way is useful because predictions can be made at many different values of the oscillation parameters without recomputing the *F*_{α→β}^{Pred} terms. Predictions at many different values of the oscillation parameters are used when generating surfaces, finding contours, or fitting in parameter space.
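As a rough illustration of why the factorization pays off, the double sum can be sketched in C++ as below. This is a minimal sketch, not the `PredictionExtrap` code: the struct `ChannelSpectrum`, the function `PredictReco`, and the layout `prob[c][i]` (the probability for channel *c*, already evaluated at the center of true-energy bin *i*) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one flavor-transition channel's P = 1 far-detector
// spectrum F^Pred_{a->b,S}(E^T_i, B_j): rows are true-energy bins i,
// columns are reconstructed-variable bins j.
struct ChannelSpectrum {
  std::vector<std::vector<double>> trueByReco;
};

// F^Pred_S(B_j) = sum over channels c, sum over i, of prob[c][i] * F_c(E^T_i, B_j).
std::vector<double> PredictReco(const std::vector<ChannelSpectrum>& channels,
                                const std::vector<std::vector<double>>& prob,
                                std::size_t nReco) {
  assert(channels.size() == prob.size());
  std::vector<double> result(nReco, 0.0);
  for (std::size_t c = 0; c < channels.size(); ++c) {
    const auto& f = channels[c].trueByReco;
    assert(f.size() == prob[c].size());  // one probability per true-energy bin
    for (std::size_t i = 0; i < f.size(); ++i) {
      assert(f[i].size() == nReco);
      for (std::size_t j = 0; j < nReco; ++j) {
        result[j] += prob[c][i] * f[i][j];
      }
    }
  }
  return result;
}
```

Only `prob` depends on the oscillation parameters, so a fit or contour scan re-evaluates it and repeats this cheap sum while the channel spectra stay fixed.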

The double sum is computed by the `CAFAna` class `PredictionExtrap`, which uses input from an oscillation calculator and `ModularExtrap`.

## `ModularExtrap`

It is the goal of `ModularExtrap` to compute the terms:

$$F_{\alpha\rightarrow\beta,S}^{Pred}(E_{i}^{T},B_{j})$$

Computation in `ModularExtrap` is carried out independently for each flavor-transition channel. Each channel is assigned an extrapolation method, but there are fewer methods than channels: many channels reuse the same extrapolation-method code, and this reuse is what makes `ModularExtrap` modular. The methods, and the channels that use them, are described in the sections Nue Extrapolation and Numu Extrapolation.
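The channel-to-method assignment described in the Nue Extrapolation and Numu Extrapolation sections can be summarized as a small dispatch table. The sketch below assumes channels are identified by string labels; the enum `Method`, the labels, and the functions `NueMethod` and `NumuMethod` are hypothetical and are not part of the `ModularExtrap` interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical labels for the three extrapolation methods described on this page.
enum class Method {
  kTrueEnergyReweight,    // signals: reweight by true-energy bin
  kRecoVariableReweight,  // major nue backgrounds: reweight by reco-variable bin
  kMonteCarloOnly         // minor backgrounds: far-detector Monte Carlo as-is
};

// Channel labels such as "mu->e" or "mubar->ebar" are illustrative only.
Method NueMethod(const std::string& channel) {
  if (channel == "mu->e" || channel == "mubar->ebar") {
    return Method::kTrueEnergyReweight;
  }
  if (channel == "e->e" || channel == "mu->mu" || channel == "NC->NC") {
    return Method::kRecoVariableReweight;
  }
  return Method::kMonteCarloOnly;  // e->mu, e->tau, mu->tau, other antineutrino channels
}

Method NumuMethod(const std::string& channel) {
  if (channel == "mu->mu" || channel == "mubar->mubar") {
    return Method::kTrueEnergyReweight;
  }
  return Method::kMonteCarloOnly;  // every numu background is minor
}
```

Three methods cover all thirteen channels in each analysis, so adding a channel means choosing a label rather than writing new reweighting code.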

Input to `ModularExtrap` is provided directly by the Monte Carlo files and by the output of a decomposition implementing the `IDecomp` interface.

## `IDecomp`

`ModularExtrap` does not use near-detector data directly, but instead relies on the result of decompositions. Each decomposition estimates the neutrino-flavor makeup of each reconstructed-variable bin of the near-detector data. `IDecomp` is an abstract interface; `ModularExtrap` is independent of the choice of decompositions.

The nue decomposition, used for the nue analysis, returns terms of the form:

$$N_{\alpha,S_{e}}^{Data}(B_{j}^{e})$$

Here, *N* indicates a near-detector event rate, *Data* indicates decomposed near-detector data, and α indicates neutrino flavor in the near detector.

The numu decomposition, used for both analyses, returns terms of the form:

$$N_{\alpha,S_{\mu}}^{Data}(B_{k}^{\mu})$$

# Nue Extrapolation

## Signals

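There are two signal channels in the nue analysis: μ→e and μ̄→ē. These are extrapolated by reweighting by true-energy bin:

$$F_{\alpha\rightarrow\beta,S_{e}}^{Pred}(E_{i}^{T},B_{j}^{e}) = \frac{N_{\alpha,S_{\mu}}^{Pred}(E_{i}^{T})\,F_{\alpha\rightarrow\beta,S_{e}}^{MC}(E_{i}^{T},B_{j}^{e})}{N_{\alpha,S_{\mu}}^{MC}(E_{i}^{T})}, \qquad \alpha\rightarrow\beta \in \{\mu\rightarrow e,\ \bar{\mu}\rightarrow\bar{e}\} \label{eq:sigrw}$$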

Here, *MC* indicates a quantity computed directly from Monte-Carlo files. In each true-energy bin, the Monte-Carlo far-detector rate is reweighted by the ratio of a computed rate of near-detector numu events (see Equation \ref{eq:ndnumu}) to the Monte-Carlo rate of near-detector numu events.
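Treating the far-detector Monte Carlo for one signal channel as a matrix with true-energy bins *i* as rows and nue reconstructed-variable bins *j* as columns, the true-energy reweighting scales each row by a single factor. This is a hypothetical sketch (`ReweightByTrueEnergy` and the `Matrix` alias are not CAFAna names); it assumes the near-detector predicted and Monte-Carlo numu rates are already binned in the same true-energy bins, and passes rows with an empty near-detector Monte-Carlo bin through unchanged, as described in Appendix C.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows: true bins i; columns: reco bins j

// F^Pred(E_i, B_j) = N^Pred(E_i) * F^MC(E_i, B_j) / N^MC(E_i).
// If N^MC(E_i) == 0 the far-detector Monte Carlo row is kept as-is (no reweighting).
Matrix ReweightByTrueEnergy(const std::vector<double>& ndPred,
                            const std::vector<double>& ndMC,
                            const Matrix& fdMC) {
  assert(ndPred.size() == fdMC.size() && ndMC.size() == fdMC.size());
  Matrix fdPred = fdMC;
  for (std::size_t i = 0; i < fdMC.size(); ++i) {
    if (ndMC[i] == 0.0) continue;  // fallback for an empty true-energy bin
    const double scale = ndPred[i] / ndMC[i];
    for (double& x : fdPred[i]) x *= scale;
  }
  return fdPred;
}
```

Because the scale factor depends only on *i*, the nue reconstructed variable never has to be compared with the numu one, which is the point made in the next paragraph.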

For the signal channels, the far-detector Monte-Carlo rates are not reweighted by reconstructed-variable bin because, for these channels, the near detector uses the numu reconstructed variable, while the far detector uses the nue reconstructed variable. Even if both variables represent energy and have the same binning, the estimators are not guaranteed to be compatible.

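In each true-energy bin, the computed rate of near-detector numu events is given by reweighting by reconstructed-variable bin:

$$N_{\alpha,S_{\mu}}^{Pred}(E_{i}^{T}) = \sum_{k}\frac{N_{\alpha,S_{\mu}}^{Data}(B_{k}^{\mu})\,N_{\alpha,S_{\mu}}^{MC}(E_{i}^{T},B_{k}^{\mu})}{N_{\alpha,S_{\mu}}^{MC}(B_{k}^{\mu})} \label{eq:ndnumu}$$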

Here *k* indexes bins of the numu reconstructed variable. This operation is equivalent to the application of a reconstructed-to-true matrix to the decomposed near-detector spectrum.
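The reco-to-true step can be sketched as a matrix operation: each column *k* of the near-detector Monte-Carlo matrix is normalized to unit sum and weighted by the decomposed data in reconstructed bin *k*. This sketch is hypothetical (`RecoToTrue` is not a CAFAna function) and assumes *N*^{MC}(*B*_{k}) equals the column sum, i.e. that the true-energy bins capture every selected event.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows: true bins i; columns: reco bins k

// N^Pred(E_i) = sum_k N^Data(B_k) * N^MC(E_i, B_k) / N^MC(B_k),
// with N^MC(B_k) taken as the column sum of ndMC.
std::vector<double> RecoToTrue(const std::vector<double>& ndData, const Matrix& ndMC) {
  const std::size_t nTrue = ndMC.size();
  const std::size_t nReco = ndData.size();
  std::vector<double> colSum(nReco, 0.0);
  for (const auto& row : ndMC) {
    assert(row.size() == nReco);
    for (std::size_t k = 0; k < nReco; ++k) colSum[k] += row[k];
  }
  std::vector<double> ndPred(nTrue, 0.0);
  for (std::size_t k = 0; k < nReco; ++k) {
    // An empty Monte-Carlo column means every N^MC(E_i, B_k) is zero too,
    // so the bin contributes nothing; skipping it avoids 0/0.
    if (colSum[k] == 0.0) continue;
    for (std::size_t i = 0; i < nTrue; ++i) {
      ndPred[i] += ndData[k] * ndMC[i][k] / colSum[k];
    }
  }
  return ndPred;
}
```

With fake data equal to the column sums, the output reduces to the row sums of the matrix, which is the in-and-out behaviour described in Appendix B.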

Notice that for these channels, the numu selection has been applied in the near detector and the nue selection has been applied in the far detector. This is because the nue-analysis signal consists of muon neutrinos that have oscillated to electron neutrinos en route to the far detector.

## Major Backgrounds

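The major nue background channels are e→e, μ→μ, and NC→NC. These are extrapolated by reweighting by reconstructed-variable bin:

$$F_{\alpha\rightarrow\alpha,S_{e}}^{Pred}(E_{i}^{T},B_{j}^{e}) = \frac{N_{\alpha,S_{e}}^{Data}(B_{j}^{e})\,F_{\alpha\rightarrow\alpha,S_{e}}^{MC}(E_{i}^{T},B_{j}^{e})}{N_{\alpha,S_{e}}^{MC}(B_{j}^{e})}, \qquad \alpha \in \{e,\ \mu,\ NC\} \label{eq:bgrw}$$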

For the major background channels, the far-detector Monte-Carlo rates are not reweighted by true-energy bin because these channels are populated by events that have been misidentified. There is no reason to expect energy estimators or other reconstructed variables to perform well on these events.

Notice that for these channels, the nue selection has been applied in both detectors. For the e→e channel this is the obvious selection. For the nue analysis, the μ→μ and NC→NC channels consist of misidentified events. Using the nue selection in the near detector for these channels ensures that misidentified events are extrapolated from misidentified events, so the particle-identification failure rate is correctly extrapolated to the far detector.
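For the major backgrounds, the far-detector Monte-Carlo matrix (rows: true-energy bins; columns: nue reconstructed-variable bins) is instead scaled column by column by the near-detector data/Monte-Carlo ratio. Again this is a hypothetical sketch: `ReweightByRecoBin` and `Matrix` are illustrative names, and columns with empty near-detector Monte Carlo fall back to the unreweighted far-detector Monte Carlo, as in Appendix C.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows: true bins i; columns: reco bins j

// F^Pred(E_i, B_j) = N^Data(B_j) * F^MC(E_i, B_j) / N^MC(B_j).
// The true-energy rows are kept so oscillation probabilities can still be applied.
Matrix ReweightByRecoBin(const std::vector<double>& ndData,
                         const std::vector<double>& ndMC,
                         const Matrix& fdMC) {
  assert(ndData.size() == ndMC.size());
  Matrix fdPred = fdMC;
  for (auto& row : fdPred) {
    assert(row.size() == ndMC.size());
    for (std::size_t j = 0; j < row.size(); ++j) {
      if (ndMC[j] == 0.0) continue;  // fallback: keep far-detector Monte Carlo
      row[j] *= ndData[j] / ndMC[j];
    }
  }
  return fdPred;
}
```

The rows are not summed away because `PredictionExtrap` still needs the true-energy dependence to apply the oscillation probability.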

## Minor Backgrounds

The remaining channels (e→μ, e→τ, μ→τ, and all other channels involving antineutrinos) are minor backgrounds, and the far-detector prediction is trivially derived from the Monte Carlo:

$$F_{\alpha\rightarrow\beta,S_{e}}^{Pred}(E_{i}^{T},B_{j}^{e}) = F_{\alpha\rightarrow\beta,S_{e}}^{MC}(E_{i}^{T},B_{j}^{e}) \label{eq:nueminbg}$$

The minor backgrounds have such low event rates that there are insufficient statistics even for the reconstructed-variable reweighting. For the same reason, the effect of the systematic errors left uncorrected by not extrapolating them is also very small.

# Numu Extrapolation

## Signals

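There are two signal channels in the numu analysis: μ→μ and μ̄→μ̄. Just as in the nue analysis (Signals), these channels are extrapolated by reweighting by true-energy bin:

$$F_{\alpha\rightarrow\alpha,S_{\mu}}^{Pred}(E_{i}^{T},B_{j}^{\mu}) = \frac{N_{\alpha,S_{\mu}}^{Pred}(E_{i}^{T})\,F_{\alpha\rightarrow\alpha,S_{\mu}}^{MC}(E_{i}^{T},B_{j}^{\mu})}{N_{\alpha,S_{\mu}}^{MC}(E_{i}^{T})}, \qquad \alpha \in \{\mu,\ \bar{\mu}\} \label{eq:numusigrw}$$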

Here, *N*_{α,Sμ}^{Pred}(*E*_{i}^{T}) is computed exactly as in the nue analysis (Equation \ref{eq:ndnumu}).

## Backgrounds

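All of the backgrounds to the numu analysis are minor, and just as with the minor nue backgrounds (Nue Minor Backgrounds), the far-detector prediction is trivially derived from the Monte Carlo:

$$F_{\alpha\rightarrow\beta,S_{\mu}}^{Pred}(E_{i}^{T},B_{j}^{\mu}) = F_{\alpha\rightarrow\beta,S_{\mu}}^{MC}(E_{i}^{T},B_{j}^{\mu})$$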

# Appendix A: Notation

Extrapolation necessarily deals with two detectors; Monte Carlo, data, and predicted event rates; thirteen different flavor-transition channels; two different selections; and true, nue-reconstructed, and numu-reconstructed energy bins. Notation is almost guaranteed to be unwieldy.

## Symbols

The symbol *N* represents an event rate in the near detector and *F* represents an event rate in the far detector. All rates are normalized to data POT in the appropriate detector.

The superscript *MC* indicates that the rate is directly derived from Monte Carlo. The superscript *Data* indicates that the rate is directly derived from the decomposition, which is essentially near-detector data with estimated flavor information attached. The superscript *Pred* indicates that the rate is a prediction calculated by the extrapolation; the method of such calculation is described explicitly in this document.

The subscript α indicates that the near-detector rate is only for a certain neutrino flavor. For *MC* rates, this selection is made using truth information. For *Data* rates, this selection is done by the decomposition. For the purposes of the extrapolation, neutral current is considered as a separate, non-oscillating flavor. There are vanishingly few τs in the near detector (the Monte Carlo does not include near-detector τs at all), so:

$$\alpha \in \{e,\ \mu,\ \bar{e},\ \bar{\mu},\ NC\}$$

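The subscript α→β indicates that the far-detector rate is for only a certain neutrino-flavor transition channel. The rate is computed under the assumption that the flavor transition occurs with probability one, and is thus independent of any neutrino-oscillation parameters. The far-detector Monte Carlo files are generated under the same assumption, so *MC* rates can be derived directly. In the far detector, neutral current is again considered as a separate, non-oscillating flavor, so:

$$\alpha\rightarrow\beta \in \{e\rightarrow e,\ e\rightarrow\mu,\ e\rightarrow\tau,\ \mu\rightarrow e,\ \mu\rightarrow\mu,\ \mu\rightarrow\tau,\ \bar{e}\rightarrow\bar{e},\ \bar{e}\rightarrow\bar{\mu},\ \bar{e}\rightarrow\bar{\tau},\ \bar{\mu}\rightarrow\bar{e},\ \bar{\mu}\rightarrow\bar{\mu},\ \bar{\mu}\rightarrow\bar{\tau},\ NC\rightarrow NC\}$$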

The subscript *S* indicates that a selection has been applied, encompassing the appropriate pre-selection, selection, and containment. *S*_{e} indicates the nue selection, and *S*_{μ} indicates the numu selection.

Parenthetical terms indicate restriction to energy bins. *E*^{T} are true-energy bins; *B*^{e} and *B*^{μ} are reconstructed-variable bins for nue and numu, respectively. The *B* are fully general: they may be energy bins, PID bins, bins of some other reconstructed variable, or even multi-dimensional bins of several variables. The subscripts *i*, *j*, and *k* index bins.

## Reweighting

Nearly all of the computation in `ModularExtrap` has the following form:

$$F^{Pred} = \frac{N^{Data}\,F^{MC}}{N^{MC}}$$

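Notice that, trivially:

$$\frac{N^{Data}\,F^{MC}}{N^{MC}} = \frac{N^{Data}}{N^{MC}}\,F^{MC} = N^{Data}\,\frac{F^{MC}}{N^{MC}}$$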

The second expression describes a reweighting of far-detector Monte Carlo by a near-detector data/MC ratio. The third expression describes a reweighting of near-detector data by a Monte-Carlo F/N ratio. It is useful to think of these reweightings in both ways simultaneously; the notation of the first expression is chosen for this document to emphasize this duality.
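A one-bin worked example with made-up numbers (not NOvA rates) shows the two readings giving the same prediction. Suppose that in some bin $N^{Data} = 120$, $N^{MC} = 100$, and $F^{MC} = 0.5$:

$$F^{Pred} = \frac{120 \times 0.5}{100} = \underbrace{\frac{120}{100}}_{\text{data/MC}} \times 0.5 = 120 \times \underbrace{\frac{0.5}{100}}_{\text{F/N}} = 0.6$$

Read the first way, a 20% near-detector data excess raises the far-detector Monte Carlo by 20%; read the second way, each near-detector data event maps to 0.005 far-detector events.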

# Appendix B: Validation

Validation of `ModularExtrap` is done in two parts: in-and-out testing and testing on real data. In-and-out tests are performed by passing near-detector Monte Carlo (fake data) to the decompositions in lieu of near-detector data. In this case, the extrapolation should apply no corrections, and the far-detector prediction should match the far-detector Monte Carlo exactly. Real-data testing is performed by running the extrapolation as designed and checking that corrections are applied appropriately. For the remainder of this appendix, figures are presented in pairs, with the left figure generated with fake data and the right figure generated with real data.

The files, cuts, estimators, and decompositions used to generate figures for this appendix are representative of, but not exact matches to, those used for the first analysis. `ModularExtrap` is independent of the exact files, cuts, estimators, and decompositions passed to it. In particular, when passed identical near-detector data and Monte Carlo, `ModularExtrap` will return exactly the far-detector Monte-Carlo prediction, no matter how bizarre the inputs. This property can be seen for the nue analysis from examination of the simple cancellation in Equations \ref{eq:sigrw}, \ref{eq:ndnumu}, and \ref{eq:bgrw}, along with Equation \ref{eq:nueminbg} and the note in Appendix C. The property similarly holds for the numu analysis.
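The in-and-out property can be checked numerically. The sketch below is a hypothetical stand-in, not the CAFAna validation macros: it chains the near-detector reco-to-true step and the far-detector true-energy reweighting for a signal channel, substitutes the near-detector Monte Carlo for the data, and reports the largest deviation of the prediction from the far-detector Monte Carlo. Up to floating-point rounding that deviation should be zero, whatever values the matrices hold.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows: true-energy bins i

// ndMC[i][k] = N^MC(E_i, B_k) (numu reco bins k); fdMC[i][j] = F^MC(E_i, B_j).
// Returns max |F^Pred - F^MC| when the fake data N^Data(B_k) equals N^MC(B_k).
double SignalInAndOutResidual(const Matrix& ndMC, const Matrix& fdMC) {
  const std::size_t nTrue = ndMC.size();
  const std::size_t nReco = ndMC.at(0).size();
  assert(fdMC.size() == nTrue);

  std::vector<double> recoSum(nReco, 0.0);  // N^MC(B_k)
  std::vector<double> trueSum(nTrue, 0.0);  // N^MC(E_i)
  for (std::size_t i = 0; i < nTrue; ++i) {
    assert(ndMC[i].size() == nReco);
    for (std::size_t k = 0; k < nReco; ++k) {
      recoSum[k] += ndMC[i][k];
      trueSum[i] += ndMC[i][k];
    }
  }
  const std::vector<double>& fakeData = recoSum;  // in-and-out: data := MC

  double worst = 0.0;
  for (std::size_t i = 0; i < nTrue; ++i) {
    // Near-detector reco-to-true step: N^Pred(E_i).
    double ndPred = 0.0;
    for (std::size_t k = 0; k < nReco; ++k) {
      if (recoSum[k] != 0.0) ndPred += fakeData[k] * ndMC[i][k] / recoSum[k];
    }
    // Far-detector true-energy reweighting, with the zero-denominator fallback.
    for (double f : fdMC[i]) {
      const double pred = (trueSum[i] != 0.0) ? ndPred * f / trueSum[i] : f;
      worst = std::max(worst, std::abs(pred - f));
    }
  }
  return worst;
}
```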

For the remainder of this appendix, all far-detector plots are scaled to $10^{20}$ protons on target.

## Example Prediction

In this section, the final nue prediction is given as an example. The numu prediction is similar.

Figure \ref{fig:osc} displays predicted far-detector event rates (Equation \ref{eq:nuegoal}), oscillated according to nominal oscillation parameters. Figure \ref{fig:unosc} displays predicted far-detector event rates under the no-oscillation hypothesis.

Far-Detector Prediction, Oscillated \label{fig:osc}

Far-Detector Prediction, Unoscillated \label{fig:unosc}

## Example Signal Channel

In this section, the nue $\mu\rightarrow e$ channel is given as an example. The other signal channels are similar.

Figures \ref{fig:signdreco}, \ref{fig:signdrtt}, and \ref{fig:signdtrue} show $N_{\mu,S_{\mu}}^{Data, \textcolor{red}{MC}}(B_{k}^{\mu})$, $N_{\mu,S_{\mu}}^{MC}(E_{i}^{T},B_{k}^{\mu})$, and $N_{\mu,S_{\mu}}^{\textcolor{blue}{Pred}, \textcolor{red}{MC}}(E_{i}^{T})$, respectively. These figures illustrate the near-detector reco-to-true reweighting (Equation \ref{eq:ndnumu}).

μ→e Channel Near-Detector Numu-Reconstructed-Energy Spectra \label{fig:signdreco}

μ→e Channel Near-Detector Reconstructed-to-True Spectrum (Monte Carlo) \label{fig:signdrtt}

μ→e Channel Near-Detector True-Energy Spectra \label{fig:signdtrue}

Figures \ref{fig:sigfdtrue}, \ref{fig:sigfdttr}, and \ref{fig:sigfdreco} show $F_{\mu\rightarrow e,S_{e}}^{\textcolor{blue}{Pred}, \textcolor{red}{MC}} (E_{i}^{T})$, $F_{\mu\rightarrow e,S_{e}}^{MC}(E_{i}^{T},B_{j}^{e})$, and $F_{\mu\rightarrow e,S_{e}}^{\textcolor{blue}{Pred}, \textcolor{red}{MC}} (B_{j}^{e})$, respectively. These figures illustrate the far-detector reweighting by true energy (Equation \ref{eq:sigrw}).

μ→e Channel Far-Detector True-Energy Spectra \label{fig:sigfdtrue}

μ→e Channel Far-Detector True-to-Reconstructed Spectrum (Monte Carlo) \label{fig:sigfdttr}

μ→e Channel Far-Detector Nue-Reconstructed-Energy Spectra \label{fig:sigfdreco}

## Example Major Background Channel

In this section, the NC→NC channel is given as an example. The other major background channels are similar.

Figures \ref{fig:bgn} and \ref{fig:bgf} show $N_{NC,S_{e}}^{Data, \textcolor{red}{MC}}(B_{j}^{e})$ and $F_{NC\rightarrow NC,S_{e}}^{\textcolor{blue}{Pred}, \textcolor{red}{MC}} (B_{j}^{e})$, respectively. These figures illustrate the far-detector reweighting by reconstructed variable (Equation \ref{eq:bgrw}).

NC→NC Near-Detector Nue-Reconstructed-Energy Spectra \label{fig:bgn}

NC→NC Far-Detector Nue-Reconstructed-Energy Spectra \label{fig:bgf}

Figures \ref{fig:bgfn} and \ref{fig:bgdmc} illustrate that the data/Monte Carlo and far/near ratios match exactly for this channel, as expected from Equation \ref{eq:bgrw}. The apparent discrepancies in Figure \ref{fig:bgfn} are explained by empty bins:

The bin centered at 1.25 GeV contains no events in the near-detector data or Monte Carlo. The far-detector prediction in this bin is just the far-detector Monte Carlo (see Appendix C); a warning is issued.

The bin centered at 3.25 GeV contains no events in the far-detector Monte Carlo and consequently no events in the far-detector prediction. This is the desired result, but of course computation of the prediction/Monte Carlo ratio fails in this case.

NC→NC Ratios to Monte Carlo \label{fig:bgfn}

NC→NC Far-to-Near Ratios \label{fig:bgdmc}

# Appendix C: Code

## Constructors

`ModularExtrap` objects are created using the named-constructor idiom:

```cpp
// Nue analysis: the extra numu decomposition, ND axis, and ND cut serve the signal channels.
static ModularExtrap Nue(Loaders& loaders,
                         const IDecomp& nueDecomp, const IDecomp& numuDecomp,
                         const HistAxis& axis, const HistAxis& axisNumuND,
                         const Cut& fdcut, const Cut& nueNDcut, const Cut& numuNDcut,
                         const SystShifts& shiftMC = kNoShift,
                         const Var& weight = kUnweighted);

// Numu analysis: a single decomposition, axis, and pair of cuts suffice.
static ModularExtrap Numu(Loaders& loaders,
                          const IDecomp& numuDecomp,
                          const HistAxis& axis,
                          const Cut& fdcut, const Cut& ndcut,
                          const SystShifts& shiftMC = kNoShift,
                          const Var& weight = kUnweighted);
```

The arguments are typical for a `CAFAna` analysis object. Note that the nue analysis needs additional information so that the signal channels can handle the numu selection and binning in the near detector. Example `CAFAna` macros are found in `CAFAna/test`.

## Division by Zero

Equations \ref{eq:sigrw}, \ref{eq:ndnumu}, \ref{eq:bgrw}, and \ref{eq:numusigrw} contain division. `ModularExtrap` implements a fallback to no reweighting for bins where the denominator is zero. Using the notation of Appendix \ref{ss:rw} to stand in for the quotient terms of the equations in question:

$$F^{Pred} = \begin{cases} \dfrac{N^{Data}\,F^{MC}}{N^{MC}} & \text{if } N^{MC} \neq 0 \\ F^{MC} & \text{if } N^{MC} = 0 \end{cases}$$

This fallback is applied bin by bin as needed. The effect is very small: only a few bins at the tails of the near-detector Monte Carlo distribution are not fully extrapolated, and higher near-detector Monte Carlo statistics reduce the issue further. A warning is issued for every bin that falls back; the warning is suppressed if *N*^{Data} and *F*^{MC} are also both zero. The zero-division fallback mechanism can be seen in action in the marginal bins of Figure \ref{fig:bgfn}.
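Per bin, the fallback and its warning rule can be sketched as follows. The function name `ReweightBin`, its argument order, and the warning text are hypothetical; only the behaviour (reweight when *N*^{MC} is nonzero, otherwise keep *F*^{MC}, and warn unless *N*^{Data} and *F*^{MC} are both zero) follows this section.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // std::ostringstream, useful for capturing warnings in tests

// Returns N^Data * F^MC / N^MC, or F^MC (no reweighting) when N^MC == 0.
// A warning is written for each fallback unless N^Data and F^MC are both zero.
double ReweightBin(double nData, double fMC, double nMC,
                   std::ostream& warn = std::cerr) {
  if (nMC != 0.0) return nData * fMC / nMC;
  if (nData != 0.0 || fMC != 0.0) {
    warn << "warning: N^MC = 0 in this bin; using unreweighted F^MC\n";
  }
  return fMC;
}
```

Passing a `std::ostringstream` as `warn` lets a test confirm that an all-zero bin stays silent while a bin with far-detector Monte Carlo but no near-detector Monte Carlo produces a warning.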

### Note on equations

The PNG images of the equations in this document are generated with latex2png.com, with "Resolution" set to 150.