Beginning Roadmap

NOTE: there is a much more recent organization of our goals & tasks developing here

Systematics Estimation

We can prioritize work on systematics after we know the relative sizes of each effect.

Five-Parameter Fit Systematics

Precession frequency systematic shifts, for both the ratio fit and the 5-parameter T-wiggle fit:

Systematic | PPM Shift (Ratio) | PPM Shift (5-param) | Effect | DocDB
Pileup     | --                | --                  | Deform (suppress) exponential shape at early times     | --
Gain       | --                | --                  | Inflate/deflate asymmetry due to energy cut?           | --
CBO        | --                | --                  | Degrade sinusoidal fit (equally at early & late times) | --
Lost Muons | --                | --                  | Deform (suppress) exponential shape at late times?     | --

Characterize the scale of each systematic effect by fitting Toy MC data: introduce the effect into the sampling function, fit, and quantify the ppm shift in the precession frequency. Check with N_data at various orders of magnitude (10^5 through 10^9), and check the shift in both the ratio formulation and the 5-parameter T-wiggle fit.
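
As a starting point, here is a minimal standalone sketch of this procedure (not the project's toymc code; the lifetime, asymmetry, frequency, and binning values are typical placeholders): sample positron times from the five-parameter wiggle by accept-reject, refit the binned histogram, and quote the ppm shift in omega_a.

    # Minimal Toy MC sketch: sample the five-parameter wiggle, refit, report ppm shift.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(42)
    TAU, ASYM, OMEGA_A, PHI = 64.4, 0.37, 1.439, 2.0   # tau (us), asymmetry, omega_a (rad/us), phi (rad); assumed truth
    T_MAX, N_EVENTS = 700.0, 10**6

    def wiggle(t, n0, tau, a, omega, phi):
        return n0 * np.exp(-t / tau) * (1.0 + a * np.cos(omega * t + phi))

    # Accept-reject sampling of decay times from the wiggle shape
    times, f_max = [], 1.0 + ASYM                      # envelope: exp(-t/tau) <= 1
    while len(times) < N_EVENTS:
        t = rng.uniform(0.0, T_MAX, 100_000)
        u = rng.uniform(0.0, f_max, 100_000)
        keep = u < np.exp(-t / TAU) * (1.0 + ASYM * np.cos(OMEGA_A * t + PHI))
        times.extend(t[keep].tolist())
    times = np.array(times[:N_EVENTS])

    # Bin and fit (149.2 ns bins are conventional; any fine binning works for a toy)
    counts, edges = np.histogram(times, bins=np.arange(0.0, T_MAX, 0.1492))
    centers = 0.5 * (edges[1:] + edges[:-1])
    ok = counts > 10                                   # skip nearly-empty late bins
    p0 = [counts[ok][0], TAU, ASYM, OMEGA_A, PHI]      # a toy may start from the truth
    popt, _ = curve_fit(wiggle, centers[ok], counts[ok], p0=p0,
                        sigma=np.sqrt(counts[ok]), absolute_sigma=True)
    print("omega_a shift (ppm):", (popt[3] - OMEGA_A) / OMEGA_A * 1e6)

Each systematic check then amounts to distorting the sampling step (dead time, gain, CBO, losses) while keeping the fit model fixed, and repeating at different N_EVENTS.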

Startup Work List

Each task below lists its core deliverable, accessories, critical checks, notes/hints, and the persons working on it.
First (Blinded) Fits
- Deliverable: Naive fits to data
- Accessories: Residuals; goodness-of-fit
- Notes: Results are here (with data and instructions to reproduce). No systematics are taken into account. At 54 million positrons, the ratio fit is good, but the five-parameter and exponential fits are not (as judged by chi-squared P-values). (The ratio construction is sketched below.)
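
For reference, a sketch of the standard ratio construction: randomly split the events into four subsets, shift two of them by plus/minus half a g-2 period, and form R(t) = (v - u)/(v + u), in which the exponential decay largely cancels. The sign and error conventions here are assumptions, not necessarily what pyfitter implements.

    import numpy as np

    def ratio_histogram(times, omega_a_ref, bins, seed=0):
        """times: event times (us); omega_a_ref: reference frequency (rad/us) used for
        the half-period shift; bins: histogram edges. Returns (centers, R, sigma_R)."""
        rng = np.random.default_rng(seed)
        half_period = np.pi / omega_a_ref                 # T_a / 2
        subset = rng.integers(0, 4, size=len(times))      # random quartering
        v1, _ = np.histogram(times[subset == 0], bins=bins)
        v2, _ = np.histogram(times[subset == 1], bins=bins)
        u1, _ = np.histogram(times[subset == 2] + half_period, bins=bins)
        u2, _ = np.histogram(times[subset == 3] - half_period, bins=bins)
        u, v = u1 + u2, v1 + v2
        total = u + v
        good = total > 0
        centers = 0.5 * (bins[1:] + bins[:-1])[good]
        ratio = (v[good] - u[good]) / total[good]         # ~ A*cos(omega_a*t + phi)
        sigma = np.sqrt((1.0 - ratio**2) / total[good])   # usual ratio-method error
        return centers, ratio, sigma

Fitting R(t) then needs only the asymmetry, frequency, and phase, which is part of why the ratio fit is less sensitive to slowly varying multiplicative effects than the five-parameter fit.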
Sync pkl file format between toymc and data
- Critical check: toymc output can be processed by the fitting scripts (one possible format is sketched below)
- Persons: James
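
One possible shared .pkl layout, purely as a sketch (the key names and metadata fields are assumptions to be agreed on, not the current format): a plain dict of numpy arrays plus metadata, written identically by toymc and the data skim so the fitting scripts can load either without caring where it came from.

    import pickle
    import numpy as np

    def write_histogram_pkl(path, bin_edges, counts, meta):
        payload = {"bin_edges": np.asarray(bin_edges),    # us
                   "counts": np.asarray(counts),
                   "meta": dict(meta)}                    # e.g. {"source": "toymc", "n_events": ...}
        with open(path, "wb") as f:
            pickle.dump(payload, f)

    def read_histogram_pkl(path):
        with open(path, "rb") as f:
            return pickle.load(f)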
Pileup Check (ToyMC)
- Deliverable: Build pileup into the Toy MC sampling of the wiggle distribution, and fill in the 'pileup' blank in the systematics table
- Accessories: Plot chi^2 (or P value) vs. N_data for all three fit types; extrapolate to >1 billion clusters
- Critical check: Vary the fit start time & verify that pileup affects early times more
- Notes: Start with fiveparam_model in toymc/util.py and add an option to enforce a 'dead time' (sketched below). NOTE: the dead time must be enforced before dataset quartering.
- Persons: Manolis
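
One way the dead-time option might look, as a sketch (the merging rule and default window are assumptions about how we might model pileup): within one fill, any hit arriving within the dead time of an accepted hit is absorbed into it, so two low-energy positrons can masquerade as a single high-energy cluster. It operates on per-fill hit lists, before any quartering.

    import numpy as np

    def apply_dead_time(t, e, dead_time=0.006):           # us; window size is a placeholder
        """t, e: hit times and energies for ONE fill. Returns merged (times, energies)."""
        order = np.argsort(t)
        t, e = t[order], e[order]
        out_t, out_e = [], []
        for ti, ei in zip(t, e):
            if out_t and (ti - out_t[-1]) < dead_time:
                out_e[-1] += ei                           # pile energy onto the earlier hit
            else:
                out_t.append(ti)
                out_e.append(ei)
        return np.array(out_t), np.array(out_e)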
Gain Check (ToyMC)
- Deliverable: Build gain into the Toy MC sampling of the wiggle distribution, and fill in the 'gain' blank in the systematics table
- Accessories: Profile fits with an energy threshold (chi^2 or similar for all three fit types) using a couple of gain models (one toy model is sketched below)
- Notes: Will need the ToyMC to generate energies, and the fitter will need to handle energy bins / an energy threshold
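
A sketch of one simple gain model (an assumed exponential early-time sag, not a measured response): scale the reconstructed energy by a time-dependent gain factor before the threshold cut, so a mis-modelled gain moves events across the threshold differently at early vs. late times.

    import numpy as np

    def gain_factor(t, sag=0.01, tau_recovery=30.0):      # fractional sag, recovery time (us); placeholders
        return 1.0 - sag * np.exp(-t / tau_recovery)

    def apply_energy_threshold(t, e_true, e_thresh=1.7):  # GeV; typical analysis threshold
        e_reco = e_true * gain_factor(t)
        keep = e_reco > e_thresh
        return t[keep], e_reco[keep]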
CBO Check (ToyMC)
- Deliverable: Build CBO into the Toy MC wiggle distribution, and fill in the 'CBO' blank in the systematics table
- Accessories: Recover the CBO frequency from a Fourier transform of the fit residuals; check the size of the frequency shift (and chi^2) for each fit type and a few different dataset sizes
- Critical checks: Does CBO affect the 5-param fit more than the ratio fit? Does CBO affect the exponent at all? (Check chi^2 for fit degradation with stats.)
- Notes: Start with fiveparam_model in toymc/util.py and add an optional CBO amplitude and frequency (sketched below). Start with the assumed exponential decay for CBO decoherence (or pass a function like math.exp into the fiveparam_model call?).
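
A sketch of the wiggle function with a leading-order CBO modulation added (the amplitude, frequency, and decoherence time below are placeholders); the envelope is the assumed exponential decoherence mentioned in the notes.

    import numpy as np

    def wiggle_with_cbo(t, n0, tau, a, omega_a, phi,
                        a_cbo=0.005, omega_cbo=2.3, phi_cbo=0.0, tau_cbo=150.0):
        # Multiplicative CBO term on top of the five-parameter wiggle
        cbo = 1.0 + a_cbo * np.exp(-t / tau_cbo) * np.cos(omega_cbo * t + phi_cbo)
        return n0 * np.exp(-t / tau) * (1.0 + a * np.cos(omega_a * t + phi)) * cbo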
Energy in ToyMC
- Deliverable: Sample (t, E) instead of just time (sketched below)
- Accessories: Implement an energy threshold cut; fit scans with different values of E_threshold
- Critical check: Set E_threshold = 0 to reproduce the original results?
- Notes: The gain systematic study probably wants this. NOTE: the E_threshold from the figure of merit NA^2 is derived assuming the 5-parameter T-wiggle fit (see TDR section 3.5).
- Persons: Manolis
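
A sketch of what sampling (t, E) pairs could look like; the flat energy spectrum and the linear asymmetry-vs-energy used here are crude placeholders chosen only to show the structure (draw E first, then draw t from a wiggle whose asymmetry depends on E), not physics inputs.

    import numpy as np

    rng = np.random.default_rng(1)
    E_MAX, TAU, OMEGA_A, PHI, T_MAX = 3.1, 64.4, 1.439, 2.0, 700.0   # GeV, us, rad/us, rad, us

    def sample_event():
        e = rng.uniform(0.0, E_MAX)                        # placeholder energy spectrum
        a_of_e = 0.7 * e / E_MAX - 0.1                     # placeholder asymmetry vs. energy
        while True:                                        # accept-reject in time
            t = rng.uniform(0.0, T_MAX)
            u = rng.uniform(0.0, 1.0 + abs(a_of_e))        # envelope >= wiggle everywhere
            if u < np.exp(-t / TAU) * (1.0 + a_of_e * np.cos(OMEGA_A * t + PHI)):
                return t, e

    events = np.array([sample_event() for _ in range(10_000)])
    t_all, e_all = events[:, 0], events[:, 1]
    t_above = t_all[e_all > 1.7]                           # E_threshold cut; 0 keeps everything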
Fitter Extensions
- Deliverable: Add some features to the existing fitting code
- Accessories: P values from chi^2; residuals analysis (including a Fourier transform); an option for ROOT fitting? (currently using scipy.optimize.leastsq). The first two are sketched below.
- Notes: Existing code is in gm2ilratio/fitting/pyfitter. NOTE: some of the fitting code is duplicated in the Toy MC (including fitresult.py and util.py). We might need to merge some of the code in util.py and put it in a directory accessible by both the fitting code and the Toy MC code. While data fitting shouldn't affect Toy MC development, it should pick up improvements from it (like CBO terms in the wiggle function).
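
The first two accessories are small additions; a sketch using library calls already in our dependency stack (scipy and numpy). The function boundaries are assumptions about how pyfitter might expose them.

    import numpy as np
    from scipy import stats

    def fit_p_value(chi2, n_bins, n_params):
        """Chi-squared survival probability for a fit with n_bins - n_params dof."""
        return stats.chi2.sf(chi2, df=n_bins - n_params)

    def residual_spectrum(t_centers, data, model, sigma):
        """(frequency, |FFT|) of the time-ordered pulls; assumes uniform bin spacing.
        With t in us the frequencies come out in MHz, so a CBO-like peak would sit
        at a few hundred kHz."""
        pulls = (data - model) / sigma
        dt = t_centers[1] - t_centers[0]
        freqs = np.fft.rfftfreq(len(pulls), d=dt)
        return freqs, np.abs(np.fft.rfft(pulls))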
Scale data fits to >1 billion clusters
- Deliverable: First pass: skim the data files, write out clusters, and build histograms without exploding memory & time requirements (accumulation pattern sketched below)
- Accessories: Think carefully about what information to write out with the clusters
- Critical check: How does chi^2 (or the P value) degrade with N_data for all three fit types on real data?
- Notes: The 60-hour dataset is ~65 runs, each with ~300 1.5 GB subrun files. A subrun has ~150 'events' (fills), each with a few thousand clusters.
- Persons: James
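
A sketch of the accumulation pattern for the first pass; the file format and the read_subrun_clusters() reader are hypothetical stand-ins. The point is only that streaming one subrun at a time into a fixed-binning histogram keeps memory flat no matter how many files we process.

    import numpy as np

    T_EDGES = np.arange(0.0, 700.0, 0.1492)               # us
    E_EDGES = np.linspace(0.0, 3.2, 65)                    # GeV

    def skim(subrun_paths, read_subrun_clusters):
        """read_subrun_clusters(path) -> (times, energies) arrays for one subrun;
        a hypothetical reader standing in for whatever the skim step provides."""
        hist = np.zeros((len(T_EDGES) - 1, len(E_EDGES) - 1))
        for path in subrun_paths:
            t, e = read_subrun_clusters(path)              # one subrun in memory at a time
            h, _, _ = np.histogram2d(t, e, bins=(T_EDGES, E_EDGES))
            hist += h
        return hist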
60-hour start-time scans
- Deliverable: Fit parameters (omega_a, phi, A) as a function of fit start time (a scan loop is sketched below)
- Accessories: tau and N0 as a function of start time
- Notes: Use data_fit_test.py; scan start times from 30 us to 100 us in 10 us intervals? data_fit_test.py currently overwrites its own fit t_min with the value found in the histogram files, so it needs to be changed.
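
A sketch of the scan loop; fit_wiggle() is a hypothetical stand-in for whichever fitter we use, and data_fit_test.py's hard-coded t_min would need to become an argument first, as noted above.

    import numpy as np

    def start_time_scan(centers, counts, fit_wiggle,
                        t_starts=np.arange(30.0, 100.1, 10.0)):   # us
        """fit_wiggle(t, y) -> dict of fitted parameters (hypothetical fitter hook).
        Returns (t_start, fit_result) pairs for plotting omega_a, phi, A, tau, N0
        against the fit start time."""
        results = []
        for t0 in t_starts:
            sel = centers >= t0
            results.append((t0, fit_wiggle(centers[sel], counts[sel])))
        return results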
Fit data per-calo ...
Fit data per-fill-type ...
Look for periods of unstable field index (CBO frequency) ...
more later...