atomcloud.fits package

Submodules

Module contents

Created on Mon Mar 14 19:26:59 2022

@author: hofer

class atomcloud.fits.MultiFit1D(function_names, constraints=None, scipy_length=1000.0, fixed_length=None, max_nfev_scalar=100)[source]

Bases: MultiFunctionFit

Class for doing 1d multi-function fits. Inherits from the base class MultiFunctionFit.

Parameters:
  • function_names (list[str]) –

  • constraints (list[str] | None) –

  • scipy_length (int) –

  • fixed_length (int | None) –

  • max_nfev_scalar (int) –

class atomcloud.fits.MultiFit2D(function_names, constraints=None, scipy_length=1000.0, fixed_length=None, max_nfev_scalar=30)[source]

Bases: MultiFunctionFit

Parameters:
  • function_names (list[str]) –

  • constraints (list[str] | None) –

  • scipy_length (int) –

  • fixed_length (int | None) –

  • max_nfev_scalar (int) –

class atomcloud.fits.MultiFunction1D(function_names, constraints=None, use_jax=False)[source]

Bases: MultiFunc

1D cloud multi-function class which inherits from the base class. It uses the imported dictionary of 1D function objects as its base dictionary of function objects, but also allows the user to add custom function objects to that dictionary.

See base class for more details.

Parameters:
  • function_names (list[str]) –

  • constraints (list[str] | None) –

  • use_jax (bool) –

class atomcloud.fits.MultiFunction2D(function_names, constraints=None, use_jax=False)[source]

Bases: MultiFunc

2D cloud multi-function class which inherits from the base class. It uses the imported dictionary of 2D function objects as its base dictionary of function objects, but also allows the user to add custom function objects to that dictionary.

Parameters:
  • function_names (list[str]) – The keys for the function objects in the registry which will be used in the multi-function

  • func_registry – The registry of function objects

  • constraints (list[str] | None) – A list of constraints which will be applied to the functions in the multi-function (see ConstrainedMultiFunction for more details)

  • use_jax (bool) – If True, the functions in the multi-function will be created using JAX. If False, the functions will be created without JAX.

Returns:

None

class atomcloud.fits.MultiFunctionFit(function_names, multi_func, func_registry, fit_label, max_nfev_scalar=50, constraints=None, scipy_length=1000.0, fixed_length=None)[source]

Bases: ABC

Parameters:
  • function_names (list[str]) –

  • multi_func (object) –

  • func_registry (dict) –

  • fit_label (str) –

  • max_nfev_scalar (int) –

  • constraints (list[str] | None) –

  • scipy_length (int) –

  • fixed_length (str | None) –

fit_object_init(multi_func)[source]

Initialize the JAXFit object and function. This is only done if JAX is installed and SciPy length is not None.

Parameters:

multi_func (object) – multi-function object to be used

Return type:

None

flatten_fit_data(coords, data)[source]

Flatten the data and coordinates to be fit.

Parameters:
  • coords (ndarray | Iterable[ndarray]) – coordinates of the data

  • data (ndarray) – data to be fit (must be same shape as coords)

Returns:
  • flat_coords – flattened coordinates of the data

  • flat_data – flattened data to be fit
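A minimal sketch of what flattening means here, using nested Python lists as a stand-in for ndarray inputs. The helper name `flatten_2d` and the sample values are illustrative assumptions, not the package's implementation.

```python
from itertools import chain


def flatten_2d(grid):
    """Flatten a row-major 2D nested list into a flat list."""
    return list(chain.from_iterable(grid))


# Hypothetical 2x3 image with matching x and y coordinate grids.
x_grid = [[0, 1, 2], [0, 1, 2]]
y_grid = [[0, 0, 0], [1, 1, 1]]
data = [[10, 11, 12], [13, 14, 15]]

# One flat coordinate list per axis, plus the flat data, all of equal length.
flat_coords = [flatten_2d(x_grid), flatten_2d(y_grid)]
flat_data = flatten_2d(data)
```

With NumPy arrays the same row-major flattening is available through `ndarray.ravel()`.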

get_default_bounds()[source]

Get the default bounds for the fit. This is a tuple of two lists, one for the lower bounds and one for the upper bounds. Each of these is itself a list of lists, where each sublist holds the bounds for a single function.

Returns:

bounds – tuple of the lower and upper bounds
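As a rough illustration of the nested bounds layout, the sketch below flattens a `(lower, upper)` pair of lists of lists into the two flat sequences that `curve_fit` accepts. The helper `flatten_bounds` and all numeric values are hypothetical; the package's own conversion may differ.

```python
import math


def flatten_bounds(bounds):
    """Turn (lower, upper) lists-of-lists into two flat lists."""
    lower, upper = bounds
    flat_lower = [value for func_bounds in lower for value in func_bounds]
    flat_upper = [value for func_bounds in upper for value in func_bounds]
    return flat_lower, flat_upper


# Hypothetical bounds for two functions with 3 and 2 parameters each.
bounds = (
    [[0.0, -math.inf, 0.0], [0.0, 0.0]],            # lower bounds per function
    [[math.inf, math.inf, 50.0], [1.0, math.inf]],  # upper bounds per function
)

flat_lower, flat_upper = flatten_bounds(bounds)
```

Infinite entries leave a parameter unbounded on that side, matching the `curve_fit` convention documented further down this page.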

get_default_seed(coords, data)[source]

Get the default seed for the fit. This is a list of lists where each sublist is the seed for a single function.

Parameters:
  • coords (ndarray | Iterable[ndarray]) – coordinates of the data

  • data (ndarray) – data to be fit

Returns:

seed – list of the seed parameters

get_fit(coords, data, seed=None, bounds=None, sigma=None, mask=None, uncertainty=False, verbose=False)[source]

Fit the data to the multi-function.

Parameters:
  • coords (ndarray | Iterable[ndarray]) – coordinates of the data

  • data (ndarray) – data to be fit (must be same shape as coords)

  • seed (list[list[float]] | None) – initial seed for the fit in terms of the multi-function's individual functions (i.e. a list of lists where the top-level list corresponds to the functions and each second-level list holds the parameters of one function).

  • bounds (list[list[float]] | None) – tuple of min and max values for each parameter, but the min and max values are each formatted as a list of lists (see seed for formatting)

  • sigma (ndarray | None) – standard deviation of the data (must be same shape as data) or a covariance matrix with each axis the same length as the data.

  • mask (ndarray | None) – mask to be applied to the data which is the same shape as the data

  • uncertainty (bool) – whether to return the uncertainties of the fit

  • plot_it – whether to plot the fit

  • verbose (bool) – whether to print the fit information to the console.

Returns:
  • params – list of the fit parameters

  • info – dictionary of the fit information

get_fit_metrics(params, func_eval, flat_data, sigma=None)[source]

Get the fit metrics for the fit. Currently only chi squared and reduced chi squared are calculated.

Parameters:
  • params (list[list[float]]) – fit parameters

  • flat_coords – flattened coordinates of the data

  • flat_data (ndarray | Iterable[ndarray]) – flattened data that was fit

  • sigma (ndarray | None) – standard deviation of the data (must be same shape as data) or a covariance matrix with each axis the same length as the data.

  • func_eval (ndarray) –

Returns:

fit_metrics – dictionary of the fit metrics

get_fit_obj(flat_data)[source]

Returns the correct fit package to use based on the length of the data and whether JAX is installed.

Parameters:

flat_data – flattened data to be fit

Returns:
  • curvefit – curvefit function to use

  • fit_func – function to be fit

  • print_label – label to print to console if verbose is enabled
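The length-based choice of fit package can be pictured with the sketch below. It is an assumption-laden illustration: the name `choose_backend`, the string labels, and the rule that data shorter than `scipy_length` goes to SciPy are guesses based on the parameter descriptions on this page, not the package's actual logic.

```python
def choose_backend(data_length, scipy_length, jax_available):
    """Pick a fitting backend label from the data length (assumed rule)."""
    # Assumption: without JAX, or with no cutoff set, always fall back to SciPy.
    if not jax_available or scipy_length is None:
        return "scipy"
    # Assumption: short data sets are fit with SciPy, longer ones with JAX.
    return "scipy" if data_length < scipy_length else "jax"


# Hypothetical calls using the documented default cutoff of 1000.0.
small = choose_backend(500, 1000.0, jax_available=True)
large = choose_backend(5000, 1000.0, jax_available=True)
```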

get_info_dict(fit_parameters, fit_metrics, data_sum)[source]

Get the fit information dictionary. This dictionary is saved and used throughout the package for tasks such as plotting and integrating.

Parameters:
  • fit_parameters (list[list[float]]) – fit parameters

  • fit_metrics (dict) – dictionary of the fit metrics

  • data_sum (float) – sum of the data that was fit

Returns:

fit_dict – dictionary of the fit information

handle_uncertainty(popt, pcov, func_params, uncertainty)[source]

Handle the uncertainty of the fit. If uncertainty is True, then the covariance matrix and the fit parameters are used to create the uncertainty fit parameters. The package is designed to handle these uncertainty parameters throughout, but they are more difficult to work with because fewer operations accept them, so the user might prefer not to use them in their own custom functions.

Parameters:
  • popt (ndarray) – fit parameters

  • pcov (ndarray) – covariance matrix for the fit parameters

  • func_params (list[list[float]]) – fit parameters in the format list of individual functions fit parameters

  • uncertainty (bool) – whether to return the uncertainty parameters

Returns:

save_params – fit parameters to be saved in the info dictionary
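One way to picture this step is to pair each fitted value with its one-standard-deviation uncertainty, taken from the diagonal of the covariance matrix, and regroup the pairs per function. In the sketch below the helper `attach_uncertainty` is hypothetical and plain `(value, std)` tuples stand in for whatever uncertainty objects the package actually stores.

```python
import math


def attach_uncertainty(popt, pcov, func_params):
    """Regroup flat (value, std) pairs to match the shape of func_params."""
    # One standard deviation per parameter: square root of the diagonal of pcov.
    stds = [math.sqrt(pcov[i][i]) for i in range(len(popt))]
    grouped, start = [], 0
    for params in func_params:
        stop = start + len(params)
        grouped.append(list(zip(popt[start:stop], stds[start:stop])))
        start = stop
    return grouped


# Hypothetical fit: two functions with 2 and 1 parameters.
popt = [1.0, 2.0, 3.0]
pcov = [[0.04, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.25]]
func_params = [[1.0, 2.0], [3.0]]

save_params = attach_uncertainty(popt, pcov, func_params)
```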

set_cutoff_length(scipy_length)[source]

Allows the user to change the length of data that will trigger a JAX vs. SciPy fit (see init docstring for more info).

Parameters:

scipy_length (int) – length of data to use scipy over jax

Return type:

None

set_fixed_length(fixed_length)[source]

Set a fixed length, but jaxfit needs to be changed to allow this to work.

Parameters:

fixed_length (int) –

Return type:

None

class atomcloud.fits.SumFit2D(function_names, constraints=None, scipy_length=10000.0, fixed_length=None)[source]

Bases: object

This object sums the image data on the x and y axes and then x and y fits are done on the 1D sums. The x and y fits are then used to construct the 2D function parameters. The functions passed in must be defined in both the 1D function registry and the sumfunction registry, as both sets of objects are used in the fitting process.

Parameters:
  • function_names (list[str]) –

  • constraints (list[str] | None) –

  • scipy_length (int) –

  • fixed_length (int | None) –
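The axis sums that feed the two 1D fits can be sketched as follows. This assumes the image is stored as `image[y][x]` (rows along y, columns along x); the helper `axis_sums` and the sample image are illustrative only, and the package may label the axes differently.

```python
def axis_sums(image):
    """Return (x_sum, y_sum) for an image stored as image[y][x]."""
    # Summing over rows (y) leaves one value per column: a profile along x.
    x_sum = [sum(column) for column in zip(*image)]
    # Summing over columns (x) leaves one value per row: a profile along y.
    y_sum = [sum(row) for row in image]
    return x_sum, y_sum


# Hypothetical 2x3 image (2 rows along y, 3 columns along x).
image = [[1, 2, 3], [4, 5, 6]]
x_sum, y_sum = axis_sums(image)
```

Both profiles integrate to the same total, which is one reason a single `data_sum` can describe the pair of 1D fits.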

convert_2d_params_to_1d(coords, params2d)[source]

Converts the 2D fit parameters to the 1D fit parameters along the x and y axes.

Parameters:
  • coords (Iterable[ndarray]) – x and y coordinates of image data

  • params2d (list[list[float]]) – 2D fit parameters for all fit functions

Returns:
  • xparams – x axis parameters in 1D format for all fit functions

  • yparams – y axis parameters in 1D format for all fit functions

convert_all_sums_2d(coords, xparams, yparams)[source]

Converts the 1D fit parameters for the x and y axes to the 2D fit parameters.

Parameters:
  • coords (Iterable[ndarray]) – x and y coordinates of image data

  • xparams (list[list[float]]) – x axis fit parameters in 1D format

  • yparams (list[list[float]]) – y axis fit parameters in 1D format

Returns:

params_2d – 2D fit parameters for all fit functions

get_bounds(bounds, data_shape)[source]

Bounds handling needs more work. For now this passes None to both of the 1D stage fits, which operate on the x and y axes respectively.

get_fit(coords, data, seed=None, bounds=None, mask=None, verbose=False)[source]

Sums the image data on the x and y axes and then x and y fits are done on the 1D sums. The x and y fits are then used to construct the 2D fit parameters. The x and y fits and the 2D parameters are returned as dictionaries.

Parameters:
  • coords (Iterable[ndarray]) – x and y coordinates of image data

  • data (ndarray) – image data to be fit

  • seed (list[list[float]] | None) – initial guess for fit parameters in 2d format?

  • bounds (list[list[float]] | None) – bounds for fit parameters in 2d format?

  • mask (ndarray | None) – mask for image data

  • verbose (bool) – boolean to print out fit info

Returns:
  • fit_params – x sum fit parameters, y sum fit parameters, and 2D fit parameters in a list

  • fit_dicts – dictionary with x sum fit info, y sum fit info and 2D fit info

get_fit_dicts(params_2d, xfit_dicts, yfit_dicts, data_sum)[source]

Creates a dictionary with 2D fit parameters, as well as the x and y sum fit parameters.

Parameters:
  • params_2d (list[list[float]]) – 2D fit parameters

  • xfit_dicts (dict) – dictionary with x sum fit parameters

  • yfit_dicts (dict) – dictionary with y sum fit parameters

  • data_sum (float) – the average of the x and y sum data

Returns:

all_fit_dicts – dictionary with 2D fit parameters, as well as the x and y sum fit parameters

get_initial_seed(coords, data, mask=None)[source]

Generates the 1D seed parameters based on image data and coords. This simply uses the initial seed functions from each of the respective 1D function classes.

Parameters:
  • coords (Iterable[ndarray]) – x and y coordinates of image data

  • data (ndarray) – image data to be fit

  • mask (ndarray | None) –

Returns:
  • xseeds – x axis seeds in 1D format

  • yseeds – y axis seeds in 1D format

get_seed(seed, coords, data)[source]

Converts the 2D seed parameters to the 1D seed parameters if they are in the 2D format. If they are None, it generates the 1D seed parameters based on image data and coords.

Parameters:
  • seed (None | list[list[float]] | list[list[list[float]]]) – initial guess for fit parameters which is either None or in the 2D format or 1D format.

  • coords (Iterable[ndarray]) – x and y coordinates of image data

  • data (ndarray) – image data to be fit

Returns:
  • xseeds – x axis seeds in 1D format

  • yseeds – y axis seeds in 1D format

get_xy_params(coords, params)[source]

Converts the 2D fit parameters to the 1D fit parameters if they are in the 2D format.

Parameters:
  • coords (Iterable[ndarray]) – x and y coordinates of image data

  • params (list[list[float]] | list[list[list[float]]]) – parameters either in 2D format or 1D format

Returns:
  • xparams – x axis parameters in 1D format

  • yparams – y axis parameters in 1D format

is_axes_params(seeds)[source]

Checks if the seeds are in the 2D format or the 1D format.

Parameters:

seeds (list[list[float]] | list[list[list[float]]]) – seeds for the fit parameters either in 2D format or 1D format.

Returns:

is_axes_params – boolean that is True if the seeds are in the 1D format and False if the seeds are in the 2D format
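Going by the type hints above, the two formats differ in nesting depth, so a depth check is one plausible way to tell them apart. The sketch below assumes the 1D (per-axis) format is a pair `[xparams, yparams]` in which each entry is itself a list of lists; both helper names are hypothetical and the package's own test may work differently.

```python
def nesting_depth(obj):
    """Depth of list nesting, following the first element at each level."""
    depth = 0
    while isinstance(obj, (list, tuple)) and len(obj) > 0:
        depth += 1
        obj = obj[0]
    return depth


def looks_like_axes_params(seeds):
    """True for [xparams, yparams] (depth 3), False for 2D format (depth 2)."""
    return nesting_depth(seeds) == 3


# Hypothetical seeds: one list per function (2D format) ...
seeds_2d = [[1.0, 2.0, 3.0], [0.5]]
# ... versus one list of per-function lists for each axis (1D format).
seeds_axes = [[[1.0, 2.0]], [[1.0, 3.0]]]
```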

atomcloud.fits.calc_chi_squared(num_parameters, actual_data, fit_data, sigma=None)[source]

Calculates the chi squared value and the reduced chi squared value for a fit.

Parameters:
  • num_parameters (int) – The number of parameters in the fit.

  • actual_data (ndarray) – The actual data that was fit.

  • fit_data (ndarray) – Fit corresponding to original coord points based on fit parameters.

  • sigma (ndarray | None) – The standard deviation of the data can be used to weight the chi squared value.

Returns:

The chi squared value and the reduced chi squared value.

Return type:

list[float]
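A minimal sketch of the two metrics, under stated assumptions: the reduced value divides by the number of data points minus `num_parameters`, and a missing `sigma` means unit weights. The function name `chi_squared_sketch` is illustrative and is not the package's implementation.

```python
def chi_squared_sketch(num_parameters, actual_data, fit_data, sigma=None):
    """Return [chi_squared, reduced_chi_squared] for flat sequences."""
    if sigma is None:
        # Assumption: unweighted residuals when no sigma is given.
        sigma = [1.0] * len(actual_data)
    chi_sq = sum(
        ((y - f) / s) ** 2 for y, f, s in zip(actual_data, fit_data, sigma)
    )
    # Assumption: degrees of freedom = number of points - number of parameters.
    dof = len(actual_data) - num_parameters
    return [chi_sq, chi_sq / dof]


# Hypothetical data: four points, a two-parameter model.
actual = [1.0, 2.0, 4.0, 4.0]
fitted = [1.0, 2.0, 3.0, 5.0]
unweighted = chi_squared_sketch(2, actual, fitted)
weighted = chi_squared_sketch(2, actual, fitted, sigma=[1.0, 1.0, 2.0, 2.0])
```

A reduced chi squared near 1 suggests the residuals are consistent with the stated uncertainties, provided `sigma` reflects the real noise level.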

atomcloud.fits.curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=None, bounds=(-inf, inf), method=None, jac=None, *, full_output=False, nan_policy=None, **kwargs)[source]

Use non-linear least squares to fit a function, f, to data.

Assumes ydata = f(xdata, *params) + eps.

Parameters:
  • f (callable) – The model function, f(x, …). It must take the independent variable as the first argument and the parameters to fit as separate remaining arguments.

  • xdata (array_like) – The independent variable where the data is measured. Should usually be an M-length sequence or an (k,M)-shaped array for functions with k predictors, and each element should be float convertible if it is an array like object.

  • ydata (array_like) – The dependent data, a length M array - nominally f(xdata, ...).

  • p0 (array_like, optional) – Initial guess for the parameters (length N). If None, then the initial values will all be 1 (if the number of parameters for the function can be determined using introspection, otherwise a ValueError is raised).

  • sigma (None or M-length sequence or MxM array, optional) –

    Determines the uncertainty in ydata. If we define residuals as r = ydata - f(xdata, *popt), then the interpretation of sigma depends on its number of dimensions:

    • A 1-D sigma should contain values of standard deviations of errors in ydata. In this case, the optimized function is chisq = sum((r / sigma) ** 2).

    • A 2-D sigma should contain the covariance matrix of errors in ydata. In this case, the optimized function is chisq = r.T @ inv(sigma) @ r.

      New in version 0.19.

    None (default) is equivalent of 1-D sigma filled with ones.

  • absolute_sigma (bool, optional) –

    If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values.

    If False (default), only the relative magnitudes of the sigma values matter. The returned parameter covariance matrix pcov is based on scaling sigma by a constant factor. This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity. In other words, sigma is scaled to match the sample variance of the residuals after the fit. Default is False. Mathematically, pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(M-N)

  • check_finite (bool, optional) – If True, check that the input arrays do not contain NaNs or infs, and raise a ValueError if they do. Setting this parameter to False may silently produce nonsensical results if the input arrays do contain NaNs. Default is True if nan_policy is not specified explicitly and False otherwise.

  • bounds (2-tuple of array_like or Bounds, optional) –

    Lower and upper bounds on parameters. Defaults to no bounds. There are two ways to specify the bounds:

    • Instance of Bounds class.

    • 2-tuple of array_like: Each element of the tuple must be either an array with the length equal to the number of parameters, or a scalar (in which case the bound is taken to be the same for all parameters). Use np.inf with an appropriate sign to disable bounds on all or some parameters.

  • method ({'lm', 'trf', 'dogbox'}, optional) –

    Method to use for optimization. See least_squares for more details. Default is ‘lm’ for unconstrained problems and ‘trf’ if bounds are provided. The method ‘lm’ won’t work when the number of observations is less than the number of variables, use ‘trf’ or ‘dogbox’ in this case.

    New in version 0.17.

  • jac (callable, string or None, optional) –

    Function with signature jac(x, ...) which computes the Jacobian matrix of the model function with respect to parameters as a dense array_like structure. It will be scaled according to provided sigma. If None (default), the Jacobian will be estimated numerically. String keywords for ‘trf’ and ‘dogbox’ methods can be used to select a finite difference scheme, see least_squares.

    New in version 0.18.

  • full_output (boolean, optional) –

    If True, this function returns additional information: infodict, mesg, and ier.

    New in version 1.9.

  • nan_policy ({'raise', 'omit', None}, optional) –

    Defines how to handle when input contains nan. The following options are available (default is None):

    • ’raise’: throws an error

    • ’omit’: performs the calculations ignoring nan values

    • None: no special handling of NaNs is performed (except what is done by check_finite); the behavior when NaNs are present is implementation-dependent and may change.

    Note that if this value is specified explicitly (not None), check_finite will be set as False.

    New in version 1.11.

  • **kwargs – Keyword arguments passed to leastsq for method='lm' or least_squares otherwise.

Returns:

  • popt (array) – Optimal values for the parameters so that the sum of the squared residuals of f(xdata, *popt) - ydata is minimized.

  • pcov (2-D array) – The estimated approximate covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters, use perr = np.sqrt(np.diag(pcov)). Note that the relationship between pcov and parameter error estimates is derived based on a linear approximation to the model function around the optimum [1]. When this approximation becomes inaccurate, pcov may not provide an accurate measure of uncertainty.

    How the sigma parameter affects the estimated covariance depends on absolute_sigma argument, as described above.

    If the Jacobian matrix at the solution doesn’t have a full rank, then ‘lm’ method returns a matrix filled with np.inf, on the other hand ‘trf’ and ‘dogbox’ methods use Moore-Penrose pseudoinverse to compute the covariance matrix. Covariance matrices with large condition numbers (e.g. computed with numpy.linalg.cond) may indicate that results are unreliable.

  • infodict (dict (returned only if full_output is True)) – a dictionary of optional outputs with the keys:

    nfev

    The number of function calls. Methods ‘trf’ and ‘dogbox’ do not count function calls for numerical Jacobian approximation, as opposed to ‘lm’ method.

    fvec

    The residual values evaluated at the solution, for a 1-D sigma this is (f(x, *popt) - ydata)/sigma.

    fjac

    A permutation of the R matrix of a QR factorization of the final approximate Jacobian matrix, stored column wise. Together with ipvt, the covariance of the estimate can be approximated. Method ‘lm’ only provides this information.

    ipvt

    An integer array of length N which defines a permutation matrix, p, such that fjac*p = q*r, where r is upper triangular with diagonal elements of nonincreasing magnitude. Column j of p is column ipvt(j) of the identity matrix. Method ‘lm’ only provides this information.

    qtf

    The vector (transpose(q) * fvec). Method ‘lm’ only provides this information.

    New in version 1.9.

  • mesg (str (returned only if full_output is True)) – A string message giving information about the solution.

    New in version 1.9.

  • ier (int (returned only if full_output is True)) – An integer flag. If it is equal to 1, 2, 3 or 4, the solution was found. Otherwise, the solution was not found. In either case, the optional output variable mesg gives more information.

    New in version 1.9.

Raises:
  • ValueError – if either ydata or xdata contain NaNs, or if incompatible options are used.

  • RuntimeError – if the least-squares minimization fails.

  • OptimizeWarning – if covariance of the parameters can not be estimated.

See also

least_squares

Minimize the sum of squares of nonlinear functions.

scipy.stats.linregress

Calculate a linear least squares regression for two sets of measurements.

Notes

Users should ensure that inputs xdata, ydata, and the output of f are float64, or else the optimization may return incorrect results.

With method='lm', the algorithm uses the Levenberg-Marquardt algorithm through leastsq. Note that this algorithm can only deal with unconstrained problems.

Box constraints can be handled by methods ‘trf’ and ‘dogbox’. Refer to the docstring of least_squares for more information.

References

[1] K. Vugrin et al. Confidence region estimation techniques for nonlinear regression in groundwater flow: Three case studies. Water Resources Research, Vol. 43, W03423, doi:10.1029/2005WR004804

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.optimize import curve_fit
>>> def func(x, a, b, c):
...     return a * np.exp(-b * x) + c

Define the data to be fit with some noise:

>>> xdata = np.linspace(0, 4, 50)
>>> y = func(xdata, 2.5, 1.3, 0.5)
>>> rng = np.random.default_rng()
>>> y_noise = 0.2 * rng.normal(size=xdata.size)
>>> ydata = y + y_noise
>>> plt.plot(xdata, ydata, 'b-', label='data')

Fit for the parameters a, b, c of the function func:

>>> popt, pcov = curve_fit(func, xdata, ydata)
>>> popt
array([2.56274217, 1.37268521, 0.47427475])
>>> plt.plot(xdata, func(xdata, *popt), 'r-',
...          label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

Constrain the optimization to the region of 0 <= a <= 3, 0 <= b <= 1 and 0 <= c <= 0.5:

>>> popt, pcov = curve_fit(func, xdata, ydata, bounds=(0, [3., 1., 0.5]))
>>> popt
array([2.43736712, 1.        , 0.34463856])
>>> plt.plot(xdata, func(xdata, *popt), 'g--',
...          label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
>>> plt.xlabel('x')
>>> plt.ylabel('y')
>>> plt.legend()
>>> plt.show()

For reliable results, the model func should not be overparameterized; redundant parameters can cause unreliable covariance matrices and, in some cases, poorer quality fits. As a quick check of whether the model may be overparameterized, calculate the condition number of the covariance matrix:

>>> np.linalg.cond(pcov)
34.571092161547405  # may vary

The value is small, so it does not raise much concern. If, however, we were to add a fourth parameter d to func with the same effect as a:

>>> def func(x, a, b, c, d):
...     return a * d * np.exp(-b * x) + c  # a and d are redundant
>>> popt, pcov = curve_fit(func, xdata, ydata)
>>> np.linalg.cond(pcov)
1.13250718925596e+32  # may vary

Such a large value is cause for concern. The diagonal elements of the covariance matrix, which are related to the uncertainty of the fit, give more information:

>>> np.diag(pcov)
array([1.48814742e+29, 3.78596560e-02, 5.39253738e-03, 2.76417220e+28])  # may vary

Note that the first and last terms are much larger than the other elements, suggesting that the optimal values of these parameters are ambiguous and that only one of these parameters is needed in the model.

atomcloud.fits.nominal_list(param_list)[source]