Project

General

Profile

pyfitter.py

This script is intended to demonstrate the generic loading, fitting, and plotting of pre-compiled histogram data with various options (fit models, Fourier transform time ranges, and others). It will fit histograms for the ratio method, the positron time spectrum, and the ratio denominator (the exponential graph). Data and fits are plotted by default, as well as residuals time and frequency spectra for all three fit types. More specific or complex applications can be based off of this code.

View the code here: source:fitting/pyfitter/pyfitter.py@develop

It is recommended to execute this script with ipython or at least python -i pyfitter.py, unless you only need to see console output and the plots saved to disk.

Pre-Compiled Histograms

Download data histograms here:

You can extract these datasets with tar -xjvf filename.tar.bz2, and simply pass the filenames as arguments. Example: python -i ./pyfitter.py dstskim_9day_5036C_withfullDQC/clusters_*.hist. Note that the first two of these datasets are 'old-style' histograms, which the script automatically detects, and it converts them to the new format before any fitting or plotting. These input files are also readable by data_fit_test.py (source:fitting/pyfitter/data_fit_test.py@develop), which was developed using the old-style histogram formatting.

Use

The full option list can be printed with ./pyfitter.py --help, but you likely only need a few options to get started.

Example:

./pyfitter.py --fit-model 5par_simplecbo --fit-tmin 35000 --fit-tmax 630000 dstskim_endgame_5040A_withfullDQC/clusters_*.hist

This invocation uses the time spectrum model 5par_simplecbo for the T-method fit, and uses time-shifted transformations of this model to fit the ratio and exponential graphs. The fit time min/max are requested parameters, but the true start & stop time is adjusted to lie on a bin edge in the histogram data given as the last argument.

All times are currently in nanoseconds. (Nov 2019)

Alternative call signatures may prefix the above with rm -rf save_fits && (to remove the output directory from a previous invocation) or python -i (for an interactive session after the script has finished executing).

Other commonly-used command-line arguments include
  • --skip-exp-fit: do not perform the exponential fit (to save time)
  • --dataset-profile PROFILE: conveniently plug in defaults like initial parameter guesses and the appropriate muon loss model (TODO: finish implementation for "60h" and "9d")
  • --muloss-model-file: explicitly specify .pkl file containing muon loss time spectrum (TODO: compare precedence to --dataset-profile argument)
  • --FT-tmin TMIN and --FT-tmax TMAX restrict the time range used for the residuals FFT (default TMAX is 250000).

A full list of supported models is available as the list modelgen.available_models. See Models below for more information.

Fit results and plots will be written to the directory specified by --save-fits-dir, which the script refuses to overwrite. The default is ./save_fits/.

Plots can be disabled with --skip-plots. When enabled, they will pop up in their own windows unless the user specifies --no-popups. If you are running the script on a remote server and do not have (or don't wish to initialize) X11 forwarding, then specify --matplotlib-backend svg to prevent the matplotlib module from initializing graphical display utilities.

The script by default copies all terminal standard output to a log file called pyfitter.log, which it puts in the output directory where it saves the fits. Log output can be disabled with --no-log. (If the script has not exited, then the file content may sometimes fall behind the output on your screen due to buffering.)

Full Command-Line Options

Just run ./pyfitter.py --help to see the command-line options:

usage: pyfitter.py [-h] [--fit-model FIT_MODEL] [--skip-Tmethod]
                   [--skip-exp-fit] [--dataset-profile DATASET_PROFILE]
                   [--tracker-model-CBO-Npars TRACKER_MODEL_CBO_NPARS]
                   [--muloss-model-file MULOSS_MODEL_FILE]
                   [--free-ratio-muloss] [--fit-tmin FIT_TMIN]
                   [--fit-tmax FIT_TMAX] [--skip-save-fits]
                   [--save-fits-dir SAVE_FITS_DIR] [--skip-plots]
                   [--no-popups] [--matplotlib-backend MATPLOTLIB_BACKEND]
                   [--no-log] [--amias] [--amias-stem] [--FT-tmin FT_TMIN]
                   [--FT-tmax FT_TMAX]
                   infiles [infiles ...]

Perform ratio data fitting for input files.

Writes fit results to --save-fits-dir (unless --skip-save-fits argument is 
supplied).  This script REFUSES to overwrite SAVE_FITS_DIR, so either move 
the directory, delete it with rm -rf, or specifiy a different directory name.

TODO:
  * finish implementing --dataset-profile arg (muloss model file, tracker-
    derived CBO params, etc
positional arguments:
  infiles               Input data/toyMC files

optional arguments:
  -h, --help            show this help message and exit
  --fit-model FIT_MODEL
                        fiveparam|fiveparam+simplecbo|...coming soon
  --skip-Tmethod        Skip T-method fit
  --skip-exp-fit        Skip exponential fit
  --dataset-profile DATASET_PROFILE
                        endgame, 9day, or 60hour
  --tracker-model-CBO-Npars TRACKER_MODEL_CBO_NPARS
                        Number of free fit parameters in tracker-derived CBO
                        model
  --muloss-model-file MULOSS_MODEL_FILE
                        muon loss model file (input for util.MuonLossProfile)
  --free-ratio-muloss   don't fix muloss scale parameter in ratio fit
  --fit-tmin FIT_TMIN   start fit at this time [ns]
  --fit-tmax FIT_TMAX   stop fit at this time [ns]
  --skip-save-fits      skip saving fit results
  --save-fits-dir SAVE_FITS_DIR
                        subdirectory for saving fit results
  --skip-plots          don't make plots
  --no-popups           don't show plots in popup windows
  --matplotlib-backend MATPLOTLIB_BACKEND
                        backend to use if popup_plots=False
  --no-log              log stdout to file
  --amias               switch on AMIAS output
  --amias-stem          file-stem for AMIAS analysis
  --FT-tmin FT_TMIN     Start time for residuals Fourier transform
  --FT-tmax FT_TMAX     Stop time for residuals Fourier transform

Fitters and Models

Fit classes

There are currently (Oct 2019) two available fitters, both in the util module. These follow some conventions:
  • all fitters must be initialized with at least the param_names and model arguments, but no other arguments are required.
  • constructors are designed to initialize as little of the fitter's state as possible, but still allow full behavior specification in a single call.

LMFit

Create this fitter using the following syntax:
twiggle_fit = util.LMFit(
  residuals_function=util.fitres_yerr_from_model, 
  model=time_spectrum_model, 
  param_names=time_spectrum_param_names, 
  initial_params=time_spectrum_initial_params, 
  obj_fn_args=('model',('xdata','ydata')), 
  xdata=loaded_data['t'], ydata=loaded_data['bin_counts']
)

Notes:
  • arguments can be in any order
  • twiggle_fit.params will be a tuple set to either the initial values or the parameter values after a successful fit, and is the first argument to residuals_function
  • twiggle_fit.lmfit_params is the lmfit.Parameters object after initialization (but is simply the value None before this)
  • there is a convenience function in LMFit itself that retrieves the lmfit.Paramters as if the LMFit class were a dictionary (e.g. fit["param"])
  • if only model and param_names are given, it may be useful to also supply init=False in order to prevent auto-initialization of the underlying lmfit.Parameters object
  • delayed initialization can be invoked explicitly with twiggle_fit.init() (you can check state with the boolean twiggle_fit.inited)
  • obj_fn_args specifies arguments to the residuals_function aside from the fit parameters
  • residuals_function of course takes the actual x and y data (instead of the strings shown here); during initialization the special tokens model, xdata, ydata, xdata_err, and ydata_err are replaced by the LMFit data members of the same name for convenience

Parameter bounds and parameter fixing/releasing can be done AFTER initialization by setting data members of twiggle_fit.lmfit_params like this:

twiggle_fit['N0'].vary = False       # fix N0
twiggle_fit['tau_mu'].value = 64400. # set the value of tau_mu
twiggle_fit['tau_mu'].vary = False   # fix tau_mu
twiggle_fit['tau_cbo'].min = 0.      # put a lower bound on tau_cbo

and the fit will use these settings on the next call to twiggle_fit.fit() (by default). (NOTE: these settings are cleared after a call to twiggle_fit.init(), which re-initializes twiggle_fit.lmfit_params using the values in twiggle_fit.params.)

As with MINUIT, the optimization routine can handle bounded parameters, but it should be used sparingly.

LeastSqFit

Models

Currently-available time spectrum models are:

  • 5par
  • 5par_simplecbo
  • 5par_LM
  • 5par_simplecbo_LM
  • 5par_newtrackercbo
  • 5par_newtrackercbo_LM
  • 5par_pu_newtrackercbo_LM_VW
  • 5par_pu_newtrackercbo_LM_VWtied
  • (...more coming soon...)

These can be given as the command-line argument --fit-model MODEL. They can also be given to the model configuration utility function modelgen.model_config() as detailed below.

modelgen

The module modelgen has utilities which create the arguments used to initialize the fit classes. Any fit models implemented and/or cataloged in this module can be invoked in any context where the fitting classes are used (e.g. ToyMC or a Jupyter notebook). They are invoked through the dispatcher function model_config(), and there are three levels of control:

  1. model_name: Specify a model like model_name='5par_simplecbo' to get a time spectrum function and corresponding parameter names. The bare time spectrum function is provided for the T-method fit, as well as the corresponding time-shifted functions for fitting the ratio and exponential graphs.
  2. model_name and data: Providing the actual data histogram allows the modelgen utility to estimate initial parameter values based on moments and other properties of the given dataset.
  3. model_name, data, and fitclass: Like the above, but also include the full set of arguments needed to initialize a fit of the given class (LMFit or LeastSq).

model_name: model_config(model_name='5par')
Example:

config = modelgen.model_config( model_name='5par' )
twiggle_fit = util.LMFit(
  residuals_function=util.fitres_yerr_from_model, 
  initial_params=time_spectrum_initial_params, 
  leastsq_args=('model',('xdata','ydata')), 
  xdata=loaded_data['t'], ydata=loaded_data['bin_counts'], 
  **config['twiggle']
)

Note that explicitly-named arguments can occur in any order, but the expansion of the arguments in **config must be last.

Include initial_params: like the above, but include initial parameter estimations based on estimates from the data: model_config(model_name='5par',data=some_tdata_dict)
Example:

config = modelgen.model_config( model_name='5par', data=loaded_data )
twiggle_fit = util.LMFit(
  residuals_function=util.fitres_yerr_from_model, 
  leastsq_args=('model',('xdata','ydata')), 
  xdata=loaded_data['t'], ydata=loaded_data['bin_counts'], 
  **config['twiggle']
)

Do everything:

modelconfig = modelgen.model_config( model_name='5par', data=loaded_data, fitclass='lmfit' )
twiggle_fit = util.LMFit( **modelconfig['twiggle'] )

This attempts to create all initialization parameters needed for the LMFit object, but you can override values in the config dictionary like this before creating the fitter:

modelconfig = modelgen.model_config( model_name='5par', data=loaded_data, fitclass='lmfit' )
modelconfig['twiggle']['ydata'] = # ... some other list of values ...
modelconfig['ratio']['model'] = lambda t,pars: pars[0]*np.cos(pars[1]*t + pars[2])  # pure sinuoid with parameters A, omega, and phi

You can see the exact fit initialization parameters are in modelconfig with util.dprint(modelconfig).

You can also tweak parameter properties (bounds, initial values, freedom) by directly modifying elements like modelconfig['twiggle']['params_vary'][7] = False, but the indexing is inconvenient, so it's better to go ahead and create the fitter, and then tweak the lmfit.parameters directly:

modelconfig = modelgen.model_config( model_name='5par', data=loaded_data, fitclass='lmfit' )
ratio_fit = util.LMFit( **modelconfig['ratio'] )
ratio_fit['K'].vary = False

Muon Loss Models

Fitting with a muon loss model requires an additional file specifying a time distribution for muon loss (leaky storage) throughout the fill. The current version of the pyfitter will use files like these:

They will be loaded according to a filename provided to the class util.MuonLossProfile. This object makes an interpolation of the provided profile, and the interpolation is called with the member function MuonLossProfile.correction_factor(t,K). (Note: MuonLossProfile will accept arguments xlist and ylist instead of a filename, using those x-y pairs for the interpolation instead.) The pyfitter.py script accepts an argument --muloss-model-file which can be fed directly to the MuonLossProfile that it creates (or can be passed to modelgen.model_config function for models with a muon loss correction).

Saving and Loading Data

The script grabs all relevant variables out of the global namespace by inspection and puts them in a dictionary whose keys correspond to the variable name:

#### summary data (in case we want to do anything with it) ####
# we save almost everything in the namespace
savedata = {}
for thing in dir():
  if (
      thing[0]=='_' or type(eval(thing))==type(util)
      or thing in ('args','loaded_data','parser','t','thing')
    ): continue
  savedata[thing] = eval(thing)
savedata['args'] = args.__dict__   # save directly as dictionary
savedata['data'] = loaded_data     # save under the name 'data'

if not args.skip_save_fits:
  savedata_outfilename = os.path.join(args.save_fits_dir,'savedata.pkl')
  print 'saving data and run settings to %s...'%savedata_outfilename
  pickle.dump(savedata,open(savedata_outfilename,'wb'))

You can load information from these files in a new Python session by importing the pickle module and ensuring that the environment is set up to import the util module:

james pyfitter $ python
Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle,util
 + ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +
 +                                                                      +
 +           You have chose to blind your fitting according to          +
 +                omega_ref * (1 + (R +/- deltaR) *10^{-6})             +
 +                                                                      +
 + ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +
<Blinders.Blinders.Blinders object at 0x7ffb83888d10>
  blinded: Yes
  fit type: 0
  reference value: 1.43948 rad*MHz
>>> savedata = pickle.load(open('pyfitter_test_fiveparamcbo/savedata.pkl','rb'))

The dictionary called savedata here is the dictionary of variables stored by the previous execution of Python. You can check its type and length

>>> type(savedata)
<type 'dict'>
>>> len(savedata)
9

or get its list of keys, and check the types of each variable stored at each key
>>> savedata.keys()
['ratio_fit', 'args', 'Twiggle_fit', 'exponential_fit', 'savedata', 'param_estimates', 'time_spectrum_plot', 'data_plots', 'data']
>>> for key in savedata.keys(): print key,type(savedata[key])
... 
ratio_fit <class 'util.LeastSqFit'>
args <type 'dict'>
Twiggle_fit <class 'util.LeastSqFit'>
exponential_fit <class 'util.LeastSqFit'>
savedata <type 'dict'>
param_estimates <type 'dict'>
time_spectrum_plot <class 'matplotlib.axes._subplots.AxesSubplot'>
data_plots <type 'dict'>
data <type 'dict'>

or grab individual value out and work with them
>>> print savedata['ratio_fit'].result
--------------------------------------------------------------------------
  Fit status: 1 (success)
  Both actual and predicted relative reductions in the sum of squares
    are at most 0.000000
  parameters:
            A:   0.3314 +-  2.763e-05  (initial: -0.332)
            r:  -22.05 +-  0.9251  (initial: 0     )
        phi_a:   5.222 +-  0.000152  (initial: 5.292 )
        tau_a:  -3.407e+10 +-  9.806e+13  (initial: 6.429e+04)
        A_cbo:  -0.001635 +-  0.000287  (initial: 0.0035)
    omega_cbo:   0.002578 +-  2.722e-06  (initial: 0.0026)
        t_cbo:   3.15e+11 +-  6.978e+15  (initial: 1.8e+05)
      phi_cbo:   9.885 +-  0.3116  (initial: 5.69  )
  DoF:  4272 bins - 8 parameters = 4264
  chi^2:  4756.476862
  reduced chi^2:  1.115496
  P: 1.32e-07
  covariance:
    7.64e-10    9.21e-08    -2.06e-11    -7.44e+07      -8e-11    -2.26e-13    -3.79e+09    4.13e-08    
    9.21e-08       0.856    -0.000116    1.76e+12    -2.9e-06    3.69e-08    8.83e+13      -0.006    
    -2.06e-11    -0.000116    2.31e-08    -2.63e+08    5.97e-10    -8.27e-12    -3.84e+10    1.34e-06    
    -7.44e+07    1.76e+12    -2.63e+08    9.62e+27    -1.08e+09     5.4e+06    -3.44e+28    -1.54e+12    
      -8e-11    -2.9e-06    5.97e-10    -1.08e+09    8.24e-08    -7.86e-11     1.2e+11    7.55e-06    
    -2.26e-13    3.69e-08    -8.27e-12     5.4e+06    -7.86e-11    7.41e-12    -2.31e+08    -7.01e-07    
    -3.79e+09    8.83e+13    -3.84e+10    -3.44e+28     1.2e+11    -2.31e+08    4.87e+31    2.73e+13    
    4.13e-08      -0.006    1.34e-06    -1.54e+12    7.55e-06    -7.01e-07    2.73e+13      0.0971    
--------------------------------------------------------------------------

or even load the dictionary values into the global namespace under their variable names for convenience
>>> for key in savedata.keys(): exec("%s = savedata['%s']"%(key,key))
... 
>>> util.dprint(data)
  bin_counts: [  6.58004300e+06   6.13998500e+06   5.68287500e+06 ...,   4.30000000e+02
     4.10000000e+02   3.71000000e+02]
  tmin: 30000
  D: [  3.64582000e+05  -7.04680000e+04  -5.03681000e+05 ...,   8.80000000e+01
     1.12000000e+02   1.28000000e+02]
  omega_a0: 0.00143779984146
  tmax: 700000
  half_T_a0: 2185.0
  S: [  6.21149200e+06   6.20797600e+06   6.18317100e+06 ...,   3.14000000e+02
     3.18000000e+02   2.68000000e+02]
  tbin_edges: [  30000.    30149.1   30298.2 ...,  699757.2  699906.3  700055.4]
  bin_counts_up: [  1.41515200e+06   1.51693700e+06   1.61878400e+06 ...,   4.50000000e+01
     4.20000000e+01   7.00000000e+00]
  bin_counts_v2: [  1.64356900e+06   1.53458400e+06   1.41968400e+06 ...,   9.10000000e+01
     1.01000000e+02   9.20000000e+01]
  bin_counts_um: [  1.50830300e+06   1.62228500e+06   1.72464200e+06 ...,   6.80000000e+01
     6.10000000e+01   6.30000000e+01]
  bin_counts_v1: [  1.64446800e+06   1.53417000e+06   1.42006100e+06 ...,   1.10000000e+02
     1.14000000e+02   1.06000000e+02]
  tbin_width: 149.1
  t: [  30074.55   30223.65   30372.75 ...,  666582.45  666731.55  666880.65]
  V: [  3.28803700e+06   3.06875400e+06   2.83974500e+06 ...,   2.01000000e+02
     2.15000000e+02   1.98000000e+02]
  U: [  2.92345500e+06   3.13922200e+06   3.34342600e+06 ...,   1.13000000e+02
     1.03000000e+02   7.00000000e+01]
  R: [ 0.05869475 -0.0113512  -0.08145998 ...,  0.28025478  0.35220126
    0.47761194]
  R_err: [ 0.00040055  0.00040133  0.00040082 ...,  0.05417174  0.05248403
    0.05366726]
  N_tbins: 4272
  full: {
    tmin: 0.0
    tmax: 800070.6
    tbin_edges: [  0.00000000e+00   1.49100000e+02   2.98200000e+02 ...,   7.99772400e+05
       7.99921500e+05   8.00070600e+05]
    bin_counts: [ 0.  0.  0. ...,  0.  0.  0.]
    tbin_width: 149.1
    t: [  7.45500000e+01   2.23650000e+02   3.72750000e+02 ...,   7.99697850e+05
       7.99846950e+05   7.99996050e+05]
    N_tbins: 5366
  }