check_dist

check_dist(actual, expected, std=None, dist='norm', check='dist', label=None, alpha=0.05, size=10000, verbose=True, die=False, stats=False)

Check whether counts match the expected distribution. The distribution can be any of those available in scipy.stats, and its parameters should be supplied via the “expected” argument. The standard deviation of a normal distribution is a special case: it can be supplied separately (via “std”) or calculated from the actual data.
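The description does not say which statistical test is applied. As a hedged illustration only, assuming a one-sample Kolmogorov-Smirnov test (scipy.stats.kstest) for continuous data, the handling of “expected”, “std” and “dist” might look like the sketch below. The helper name sketch_check_continuous is hypothetical and this is not the function's actual implementation.

```python
import numpy as np
from scipy import stats


def sketch_check_continuous(actual, expected, std=None, dist='norm', alpha=0.05):
    """Hypothetical sketch: one-sample KS test of `actual` against a scipy.stats distribution.

    Returns (rejected, pvalue). Illustrative only, not the library's code.
    """
    actual = np.asarray(actual, dtype=float)
    if dist == 'norm':
        # Normal is the special case: use the supplied std, or estimate it from the data
        if std is None:
            std = actual.std(ddof=1)
        args = (expected, std)  # scipy's norm takes (loc, scale)
    else:
        # Other distributions: `expected` holds the distribution's arguments
        args = expected if isinstance(expected, tuple) else (expected,)
    # kstest accepts the name of a scipy.stats distribution plus its arguments
    result = stats.kstest(actual, dist, args=args)
    return result.pvalue < alpha, result.pvalue
```

One design caveat: when std is estimated from the same data being tested, the KS p-value is only approximate (it tends to be conservative).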

Parameters:
  • actual (int, float, or array) – the observed value, or distribution of values

  • expected (int, float, or tuple) – the expected value, or a tuple of arguments for the distribution

  • std (float) – for normal distributions, the standard deviation of the expected value (taken from data if not supplied)

  • dist (str) – the name of the scipy.stats distribution to use (e.g. ‘norm’, ‘poisson’, ‘lognorm’)

  • check (str) – what to check: ‘dist’ = entire distribution (default), ‘mean’ (equivalent to supplying np.mean(actual)), or ‘median’

  • label (str) – the name of the variable being tested

  • alpha (float) – the significance level at which to reject the null hypothesis

  • size (int) – if the distribution is discrete, the size of the sample drawn from the expected distribution for comparison

  • verbose (bool) – print a warning if the null hypothesis is rejected

  • die (bool) – raise an exception if the null hypothesis is rejected

  • stats (bool) – whether to return statistics
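To show how “check”, “size”, “alpha”, “verbose” and “die” could fit together for a discrete distribution, here is a minimal sketch under stated assumptions: it draws “size” samples from the expected distribution, reduces the data to a single value when check is ‘mean’ or ‘median’, uses an empirical two-sided tail probability for a single value, and a two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp) otherwise. The helper name sketch_check_discrete and its seed argument are hypothetical, and the choice of tests is an assumption, not the documented behavior.

```python
import warnings

import numpy as np
from scipy import stats


def sketch_check_discrete(actual, expected, dist='poisson', check='dist', alpha=0.05,
                          size=10000, verbose=True, die=False, seed=None):
    """Hypothetical sketch of how check, size, alpha, verbose and die might interact."""
    args = expected if isinstance(expected, tuple) else (expected,)
    rng = np.random.default_rng(seed)
    # Draw `size` samples from the expected discrete distribution to compare against
    expected_sample = getattr(stats, dist).rvs(*args, size=size, random_state=rng)

    actual = np.atleast_1d(np.asarray(actual, dtype=float))
    if check == 'mean':
        actual = np.array([actual.mean()])      # reduce the data to its mean
    elif check == 'median':
        actual = np.array([np.median(actual)])  # reduce the data to its median

    if actual.size == 1:
        # A single value: empirical two-sided tail probability within the expected sample
        lower = np.mean(expected_sample <= actual[0])
        upper = np.mean(expected_sample >= actual[0])
        pvalue = min(1.0, 2 * min(lower, upper))
    else:
        # Many values: two-sample KS test (conservative when ties are frequent)
        pvalue = stats.ks_2samp(actual, expected_sample).pvalue

    rejected = pvalue < alpha
    if rejected:
        msg = f'Null hypothesis rejected for {dist}{args}: p = {pvalue:.3g}'
        if die:
            raise ValueError(msg)  # die=True takes precedence over verbose
        if verbose:
            warnings.warn(msg)
    return rejected, pvalue
```

For example, sketch_check_discrete([3, 4, 4, 2, 3], expected=3, check='mean') compares the mean 3.2 against draws from a Poisson distribution with mean 3 and does not reject.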

Returns:

If stats is True, returns statistics: whether the null hypothesis is rejected, the p-value, the number of samples, the expected quintiles, the observed quintiles, and the observed quantile.
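The quantile-based statistics in the return value can be illustrated with frozen scipy.stats distributions. This is a hedged sketch: the five quantile levels in LEVELS and the choice of the observed mean for the “observed quantile” are assumptions, since the exact definitions are not given here, and sketch_summary is a hypothetical name. For a normal distribution, pass (mean, std) as the expected tuple; a bare number is treated as the mean with unit scale.

```python
import numpy as np
from scipy import stats

# Hypothetical choice of five quantile levels; the function's actual levels are not documented here
LEVELS = [0.05, 0.25, 0.5, 0.75, 0.95]


def sketch_summary(actual, expected, dist='norm'):
    """Hypothetical: expected vs observed quantiles, plus the quantile of the observed mean."""
    args = expected if isinstance(expected, tuple) else (expected,)
    frozen = getattr(stats, dist)(*args)           # frozen scipy.stats distribution
    actual = np.atleast_1d(np.asarray(actual, dtype=float))
    expected_q = frozen.ppf(LEVELS)                # quantiles of the expected distribution
    observed_q = np.quantile(actual, LEVELS)       # empirical quantiles of the data
    observed_quantile = frozen.cdf(actual.mean())  # where the observed mean falls in the expected distribution
    return expected_q, observed_q, observed_quantile
```

With the lognormal example below (actual=5.5, expected=(1, 5)), the observed value falls at roughly the 24th percentile of the expected distribution.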

Examples:

sp.check_dist(actual=[3,4,4,2,3], expected=3, dist='poisson')
sp.check_dist(actual=[0.14, -3.37,  0.59, -0.07], expected=0, std=1.0, dist='norm')
sp.check_dist(actual=5.5, expected=(1, 5), dist='lognorm')