check_dist

check_dist(actual, expected, std=None, dist='norm', check='dist', label=None, alpha=0.05, size=10000, verbose=True, die=False, stats=False)

Check whether counts match the expected distribution. The distribution can be any of those listed in scipy.stats. The parameters for the distribution should be supplied via the “expected” argument. The standard deviation for a normal distribution is a special case: it can be supplied separately via “std” or calculated from the actual data.

Parameters:

actual (int, float, or array) – the observed value, or distribution of values
expected (int, float, tuple) – the expected value; or, a tuple of arguments
std (float) – for normal distributions, the standard deviation of the expected value (taken from data if not supplied)
dist (str) – the type of distribution to use
check (str) – what to check: ‘dist’ = entire distribution (default), ‘mean’ (equivalent to supplying np.mean(actual)), or ‘median’
label (str) – the name of the variable being tested
alpha (float) – the significance level at which to reject the null hypothesis
size (int) – the size of the sample drawn from the expected distribution for comparison, if the distribution is discrete
verbose (bool) – print a warning if the null hypothesis is rejected
die (bool) – raise an exception if the null hypothesis is rejected
stats (bool) – whether to return statistics

Returns:

If stats is True, returns statistics: whether the null hypothesis is rejected, the p-value, the number of samples, the expected quintiles, the observed quintiles, and the observed quantile.

Examples:

sp.check_dist(actual=[3,4,4,2,3], expected=3, dist='poisson')
sp.check_dist(actual=[0.14, -3.37,  0.59, -0.07], expected=0, std=1.0, dist='norm')
sp.check_dist(actual=5.5, expected=(1, 5), dist='lognorm')
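
As a rough illustration of what a check against the mean of a normal distribution involves, the sketch below computes the quantile of the observed mean under the expected distribution and rejects the null hypothesis if that quantile falls in either tail at significance level alpha. This is an assumption about the general idea, not the library's implementation; sketch_check_mean is a hypothetical helper, and it treats std as the standard deviation of the expected value, as described in the parameters above.

```python
from statistics import NormalDist, mean


def sketch_check_mean(actual, expected, std, alpha=0.05):
    """Two-sided check of the observed mean against Normal(expected, std).

    Hypothetical helper for illustration only; not part of the library.
    """
    observed = mean(actual)
    # Quantile of the observed mean under the expected normal distribution
    quantile = NormalDist(mu=expected, sigma=std).cdf(observed)
    # Reject the null hypothesis if the quantile lies in either tail
    reject = quantile < alpha / 2 or quantile > 1 - alpha / 2
    return reject, quantile


reject, quantile = sketch_check_mean(
    [0.14, -3.37, 0.59, -0.07], expected=0, std=1.0
)
print(reject, round(quantile, 3))
```

A quantile close to 0 or 1 means the observed mean is unlikely under the expected distribution; a smaller alpha makes the check less likely to reject.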