synthpops.sampling module

Sample from distributions, either derived from real-world data or uniform.

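set_seed(seed=None)

Reset the random seed. This takes more than a single call because Numba-compiled functions keep a random state separate from NumPy's, so each generator has to be seeded.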

fast_choice(weights)

Choose an option – quickly – from the provided weights. Weights do not need to be normalized.

Reimplementation of random.choices(), removing everything inessential.

Example

fast_choice([0.1,0.2,0.3,0.2,0.1]) # might return 2
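The idea of a stripped-down weighted choice can be sketched with the standard library alone. This is an illustration of the technique, not the module's actual code: the name weighted_index and the rng argument are hypothetical, and the return value is assumed to be an index into weights, as the example above suggests.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def weighted_index(weights, rng=random):
    """Return index i with probability weights[i] / sum(weights).

    Hypothetical helper; weights do not need to be normalized.
    """
    cumulative = list(accumulate(weights))      # running totals of the weights
    draw = rng.random() * cumulative[-1]        # uniform draw in [0, total)
    # Find the first running total that exceeds the draw. The upper bound
    # guards against a draw that rounds up to the total itself.
    return bisect_right(cumulative, draw, 0, len(cumulative) - 1)


index = weighted_index([0.1, 0.2, 0.3, 0.2, 0.1])  # an integer from 0 to 4
```

Scaling the uniform draw by the total weight is what removes the need to normalize the weights first.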

sample_single_dict(distr_keys, distr_vals)

Sample from a distribution.

Parameters: distr (dict or np.ndarray) – distribution
Returns: A single sampled value from a distribution.

sample_single_arr(distr)

Sample from a distribution.

Parameters: distr (dict or np.ndarray) – distribution
Returns: A single sampled value from a distribution.
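As a rough sketch of single-value sampling, the snippet below draws one key from a distribution given as parallel keys and values. It assumes that distr_keys and distr_vals are matching sequences of outcomes and weights, which the signature of sample_single_dict suggests but the parameter list does not confirm. The helper sample_key, the rng argument, and the numbers are all made up for illustration.

```python
import random


def sample_key(distr_keys, distr_vals, rng=random):
    """Draw one key with probability proportional to its value (hypothetical helper)."""
    return rng.choices(distr_keys, weights=distr_vals, k=1)[0]


# A distribution stored as a dict can be split into two parallel sequences.
household_size_distr = {1: 0.3, 2: 0.4, 3: 0.2, 4: 0.1}  # made-up numbers
keys = list(household_size_distr.keys())
vals = list(household_size_distr.values())

size = sample_key(keys, vals)  # one of 1, 2, 3, 4
```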

resample_age(age_dist_vals, age)

Resample age from a single-year age distribution.

Parameters:
  • age_dist_vals (arr) – single-year age distribution, ordered by age
  • age (int) – age as an integer
Returns:

Resampled age as an integer.

sample_from_range(distr, min_val, max_val)

Sample from a distribution from min_val to max_val, inclusive.

Parameters:
  • distr (dict) – distribution with integer keys
  • min_val (int) – minimum of the range to sample from
  • max_val (int) – maximum of the range to sample from
Returns:

A sampled number from the range min_val to max_val in the distribution distr.
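Range-restricted sampling can be sketched as follows: keep only the keys inside the inclusive range, then draw one of them in proportion to its weight. This is an assumption about the behavior described above, not the module's implementation; sample_in_range, the rng argument, and the example distribution are hypothetical.

```python
import random


def sample_in_range(distr, min_val, max_val, rng=random):
    """Draw a key k of distr with min_val <= k <= max_val, weighted by distr[k].

    Hypothetical helper; random.choices raises an error if no key falls in
    the range or if all weights in the range are zero.
    """
    keys = [k for k in distr if min_val <= k <= max_val]  # inclusive bounds
    weights = [distr[k] for k in keys]                    # need not sum to 1
    return rng.choices(keys, weights=weights, k=1)[0]


age_distr = {0: 0.5, 1: 0.0, 2: 0.3, 3: 0.2}  # made-up numbers
value = sample_in_range(age_distr, 1, 2)       # always 2, since key 1 has weight 0
```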

check_dist(actual, expected, std=None, dist='norm', check='dist', label=None, alpha=0.05, size=10000, verbose=True, die=False, stats=False)

Check whether counts match the expected distribution. The distribution can be any listed in scipy.stats. The parameters for the distribution should be supplied via the “expected” argument. The standard deviation for a normal distribution is a special case; it can be supplied separately or calculated from the (actual) data.

Parameters:
  • actual (int, float, or array) – the observed value, or distribution of values
  • expected (int, float, tuple) – the expected value; or, a tuple of arguments
  • std (float) – for normal distributions, the standard deviation of the expected value (taken from data if not supplied)
  • dist (str) – the type of distribution to use
  • check (str) – what to check: ‘dist’ = entire distribution (default), ‘mean’ (equivalent to supplying np.mean(actual)), or ‘median’
  • label (str) – the name of the variable being tested
  • alpha (float) – the significance level at which to reject the null hypothesis
  • size (int) – the size of the sample from the expected distribution to compare with if distribution is discrete
  • verbose (bool) – print a warning if the null hypothesis is rejected
  • die (bool) – raise an exception if the null hypothesis is rejected
  • stats (bool) – whether to return statistics
Returns:

If stats is True, returns statistics: whether the null hypothesis is rejected, the p-value, the number of samples, the expected quintiles, the observed quintiles, and the observed quantile.

Examples:

sp.check_dist(actual=[3,4,4,2,3], expected=3, dist='poisson')
sp.check_dist(actual=[0.14, -3.37,  0.59, -0.07], expected=0, std=1.0, dist='norm')
sp.check_dist(actual=5.5, expected=(1, 5), dist='lognorm')
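
To make the role of alpha concrete, here is a minimal standard-library sketch of the simplest case: a single observed value tested against a normal distribution with a known mean and standard deviation. It is not how check_dist is implemented; the function reject_normal and its two-sided p-value are assumptions chosen for illustration.

```python
from statistics import NormalDist


def reject_normal(actual, expected, std, alpha=0.05):
    """Two-sided test of one observed value against Normal(expected, std).

    Hypothetical helper: returns (rejected, pvalue).
    """
    quantile = NormalDist(mu=expected, sigma=std).cdf(actual)  # observed quantile
    pvalue = 2 * min(quantile, 1 - quantile)                   # two-sided tail mass
    return pvalue < alpha, pvalue


typical, p_typical = reject_normal(0.1, expected=0, std=1.0)  # not rejected
extreme, p_extreme = reject_normal(3.0, expected=0, std=1.0)  # rejected at alpha=0.05
```

A value three standard deviations from the mean has a two-sided p-value of roughly 0.003, so it is rejected at the default alpha of 0.05, while a value near the mean is not.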

check_normal(*args, **kwargs)

Alias for check_dist with a normal distribution (dist='norm').

check_poisson(*args, **kwargs)

Alias for check_dist(dist='poisson').

check_truncated_poisson(testdata, mu, lowerbound=None, upperbound=None, skipcheck=False, **kwargs)

Test whether the data fit a truncated Poisson distribution between lowerbound and upperbound, using kstest.

Parameters:
  • testdata (array) – data to be tested
  • mu (float) – expected mean for the Poisson distribution
  • lowerbound (float) – lower bound for truncation
  • upperbound (float) – upper bound for truncation
Returns:

(bool) True if the statistical check passed, otherwise False.
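
The two ingredients of this check, a truncated Poisson distribution and a Kolmogorov-Smirnov distance, can be sketched without SciPy. The helpers below are hypothetical and make two assumptions not stated above: the bounds are integers and inclusive. Only the KS statistic is computed; turning it into a p-value is left to a library routine such as scipy.stats.kstest.

```python
import math
from collections import Counter


def truncated_poisson_pmf(mu, lower, upper):
    """Return {k: P(K = k | lower <= K <= upper)} for K ~ Poisson(mu)."""
    raw = {k: math.exp(-mu) * mu**k / math.factorial(k)
           for k in range(lower, upper + 1)}
    total = sum(raw.values())                  # probability mass inside the bounds
    return {k: p / total for k, p in raw.items()}


def ks_statistic(data, pmf):
    """Largest gap between the empirical CDF of data and the CDF implied by pmf.

    Values of data outside the support of pmf are ignored in the running sum.
    """
    counts = Counter(data)
    n = len(data)
    cdf = ecdf = worst = 0.0
    for k in sorted(pmf):
        cdf += pmf[k]                          # theoretical CDF at k
        ecdf += counts[k] / n                  # empirical CDF at k
        worst = max(worst, abs(ecdf - cdf))
    return worst


pmf = truncated_poisson_pmf(mu=2, lower=1, upper=3)
close_fit = ks_statistic([1, 1, 1, 2, 2, 2, 3, 3], pmf)  # data matching the pmf
poor_fit = ks_statistic([3, 3, 3, 3], pmf)               # all mass at the upper bound
```

Because both CDFs are step functions that jump only at integers, comparing them at each integer in the support is enough to find the largest gap.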

statistic_test(expected, actual, test=<function chisquare>, verbose=True, die=False, **kwargs)

Perform a statistical check of expected against actual data, under the null hypothesis that the two distributions are identical. If they differ significantly under the selected test, a warning is printed (verbose) or an exception is raised (die). See https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests.

Parameters:
  • expected (array) – the expected value; or, a tuple of arguments
  • actual (array) – the observed value, or distribution of values
  • test (scipy.stats) – a SciPy statistical test function, for example scipy.stats.chisquare
  • verbose (bool) – print a warning if the null hypothesis is rejected
  • die (bool) – raise an exception if the null hypothesis is rejected
  • **kwargs (dict) – optional arguments for statistical tests
Returns:

None.
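
The verbose and die behavior described above can be illustrated with a hand-rolled chi-square goodness-of-fit check. This sketch departs from the documented function in several labeled ways: chisquare_check and chi2_sf_even_df are hypothetical names, the alpha argument is an assumption (no significance level appears in the signature above), the p-value uses a closed form that is only valid for an even number of degrees of freedom, and the statistic and p-value are returned instead of None.

```python
import math


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


def chisquare_check(expected, actual, alpha=0.05, verbose=True, die=False):
    """Compare observed counts with expected counts (hypothetical helper)."""
    stat = sum((a - e) ** 2 / e for a, e in zip(actual, expected))
    pvalue = chi2_sf_even_df(stat, df=len(expected) - 1)
    if pvalue < alpha:                         # null hypothesis rejected
        msg = f"distributions differ: statistic={stat:.3f}, p={pvalue:.3g}"
        if die:
            raise ValueError(msg)
        if verbose:
            print(f"Warning: {msg}")
    return stat, pvalue


same_stat, same_p = chisquare_check([10, 10, 10], [10, 10, 10], verbose=False)
diff_stat, diff_p = chisquare_check([10, 10, 10], [20, 5, 5], verbose=False)
```

With three categories there are two degrees of freedom, so the second call yields a statistic of 15 and a p-value of exp(-7.5), well below 0.05.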