poliosim.utils module

Numerical utilities for running Poliosim

sample(dist=None, par1=None, par2=None, size=None, **kwargs)[source]

Draw a sample from the distribution specified by the input.

Parameters
  • dist (str) – the distribution to sample from

  • par1 (float) – the “main” distribution parameter (e.g. mean)

  • par2 (float) – the “secondary” distribution parameter (e.g. std)

  • size (int) – the number of samples (default=1)

  • kwargs (dict) – passed to individual sampling functions

Returns

A length N array of samples

Examples:

sample() # returns Unif(0,1)
sample(dist='normal', par1=3, par2=0.5) # returns Normal(μ=3, σ=0.5)

Notes

Lognormal distributions are parameterized with reference to the underlying normal distribution (see: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.lognormal.html), but this function assumes the user wants to specify the mean and variance of the lognormal distribution.
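
The conversion described in this note can be sketched as follows. This is an illustration rather than Poliosim's own code: the helper names underlying_normal_params and lognormal_sample are hypothetical, and the second parameter is assumed to be the standard deviation of the lognormal distribution (following the par2 description above).

```python
import numpy as np

def underlying_normal_params(mean, std):
    """Convert the mean and std of a lognormal distribution into the
    (mu, sigma) of the underlying normal distribution."""
    sigma2 = np.log(1.0 + (std / mean) ** 2)
    mu = np.log(mean) - sigma2 / 2.0
    return mu, np.sqrt(sigma2)

def lognormal_sample(mean, std, size=1, seed=None):
    """Draw lognormal samples whose own mean and std are (mean, std)."""
    rng = np.random.default_rng(seed)
    mu, sigma = underlying_normal_params(mean, std)
    return rng.lognormal(mean=mu, sigma=sigma, size=size)

mu, sigma = underlying_normal_params(3.0, 0.5)
samples = lognormal_sample(3.0, 0.5, size=100_000, seed=1)
```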

set_seed(seed=None)[source]

Reset the random seed – complicated because of Numba, which requires special syntax to reset the seed. This function also resets Python’s built-in random number generator.

Parameters

seed (int) – the random seed
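
A minimal sketch of why Numba needs special handling: Numba-compiled code keeps its own generator state, so it has to be seeded from inside a jitted function. The name set_seed_sketch is hypothetical, Numba is treated as optional, and doing nothing when seed is None is an assumption of this sketch.

```python
import random
import numpy as np

def set_seed_sketch(seed=None):
    """Seed NumPy, Python's random module, and (if installed) Numba."""
    if seed is None:
        return  # assumption: nothing is reset when no seed is given
    np.random.seed(seed)  # NumPy's global generator
    random.seed(seed)     # Python's built-in generator
    try:
        import numba as nb
    except ImportError:
        return  # Numba is optional in this sketch

    @nb.njit
    def _seed_numba(s):
        np.random.seed(s)  # inside jitted code this seeds Numba's own generator

    _seed_numba(seed)

set_seed_sketch(1)
first = (np.random.random(), random.random())
set_seed_sketch(1)
second = (np.random.random(), random.random())
```

After reseeding with the same value, both generators repeat the same draws.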

n_binomial(prob, n)[source]

Perform multiple binomial (Bernoulli) trials

Parameters
  • prob (float) – probability of each trial succeeding

  • n (int) – number of trials (size of array)

Returns

Boolean array of which trials succeeded

Example:

outcomes = ps.n_binomial(0.5, 100) # Perform 100 coin-flips

binomial_filter(prob, arr)[source]

Binomial “filter” – the same as n_binomial, except return the elements of arr that succeeded.

Parameters
  • prob (float) – probability of each trial succeeding

  • arr (array) – the array to be filtered

Returns

Subset of array for which trials succeeded

Example:

inds = ps.binomial_filter(0.5, np.arange(20)**2) # Return which values out of the (arbitrary) array passed the coin flip

binomial_arr(prob_arr)[source]

Binomial (Bernoulli) trials each with different probabilities.

Parameters

prob_arr (array) – array of probabilities

Returns

Boolean array of which trials on the input array succeeded

Example:

outcomes = ps.binomial_arr([0.1, 0.1, 0.2, 0.2, 0.8, 0.8]) # Perform 6 trials with different probabilities
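
The three binomial helpers above share one idea: compare uniform random numbers against the success probability. The following is a plain NumPy sketch of that idea, not the library's implementation; the *_sketch names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def n_binomial_sketch(prob, n):
    """n Bernoulli trials sharing one probability; returns a Boolean array."""
    return rng.random(n) < prob

def binomial_filter_sketch(prob, arr):
    """Keep each element of arr with probability prob."""
    arr = np.asarray(arr)
    return arr[rng.random(len(arr)) < prob]

def binomial_arr_sketch(prob_arr):
    """One Bernoulli trial per entry, each with its own probability."""
    prob_arr = np.asarray(prob_arr, dtype=float)
    return rng.random(prob_arr.shape) < prob_arr

coin_flips = n_binomial_sketch(0.5, 100)
kept = binomial_filter_sketch(0.5, np.arange(20) ** 2)
outcomes = binomial_arr_sketch([0.1, 0.1, 0.2, 0.2, 0.8, 0.8])
```
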
n_multinomial(probs, n)[source]

An array of multinomial trials.

Parameters
  • probs (array) – probability of each outcome, which usually should sum to 1

  • n (int) – number of trials

Returns

Array of integer outcomes

Example:

outcomes = ps.n_multinomial(np.ones(6)/6.0, 50)+1 # Return 50 die-rolls
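
One way to draw integer outcomes from a probability vector is inverse-CDF sampling: locate uniform random numbers within the cumulative probabilities. This is a sketch under that assumption (the library may do it differently), and n_multinomial_sketch is a hypothetical name.

```python
import numpy as np

def n_multinomial_sketch(probs, n, seed=None):
    """Return n integer outcomes, where outcome i has probability probs[i]."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(np.asarray(probs, dtype=float))
    cum /= cum[-1]  # normalize so the final bin edge is exactly 1
    return np.searchsorted(cum, rng.random(n), side='right')

rolls = n_multinomial_sketch(np.ones(6) / 6.0, 50, seed=1) + 1  # 50 die rolls
```
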
poisson(rate)[source]

A Poisson trial.

Parameters

rate (float) – the rate of the Poisson process

Example:

outcome = ps.poisson(100) # Single Poisson trial with mean 100

n_poisson(rate, n)[source]

An array of Poisson trials.

Parameters
  • rate (float) – the rate of the Poisson process (mean)

  • n (int) – number of trials

Example:

outcomes = ps.n_poisson(100, 20) # 20 poisson trials with mean 100

n_neg_binomial(rate, dispersion, n, step=1)[source]

An array of negative binomial trials; with dispersion = ∞, converges to Poisson.

Parameters
  • rate (float) – the rate of the process (mean, same as Poisson)

  • dispersion (float) – amount of dispersion: 0 = infinite, 1 = std is equal to mean, ∞ = Poisson

  • n (int) – number of trials

  • step (float) – the step size to use if non-integer outputs are desired

Example:

outcomes = ps.n_neg_binomial(100, 1, 20) # 20 negative binomial trials with mean 100 and dispersion 1 (std equal to mean)
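
The rate/dispersion parameterization can be mapped onto NumPy's (n, p) form as sketched below. The mapping is an assumption chosen to match the descriptions above: it gives mean = rate and variance = rate + rate²/dispersion, so the variance approaches the Poisson value as dispersion grows. The name n_neg_binomial_sketch is hypothetical.

```python
import numpy as np

def n_neg_binomial_sketch(rate, dispersion, n, step=1, seed=None):
    """Negative binomial trials with mean `rate`, returned in multiples of `step`."""
    rng = np.random.default_rng(seed)
    nbn_n = dispersion
    nbn_p = dispersion / (rate / step + dispersion)
    return rng.negative_binomial(n=nbn_n, p=nbn_p, size=n) * step

samples = n_neg_binomial_sketch(100, 1, 200_000, seed=1)
```

With rate=100 and dispersion=1, the sample mean and standard deviation both come out close to 100.
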
choose(max_n, n)[source]

Choose a subset of items (e.g., people) without replacement.

Parameters
  • max_n (int) – the total number of items

  • n (int) – the number of items to choose

Example:

choices = ps.choose(5, 2) # choose 2 out of 5 people with equal probability (without repeats)

choose_r(max_n, n)[source]

Choose a subset of items (e.g., people), with replacement.

Parameters
  • max_n (int) – the total number of items

  • n (int) – the number of items to choose

Example:

choices = ps.choose_r(5, 10) # choose 10 out of 5 people with equal probability (with repeats)

choose_w(probs, n, unique=True)[source]

Choose n items (e.g. people), each with a probability from the distribution probs.

Parameters
  • probs (array) – list of probabilities, should sum to 1

  • n (int) – number of samples to choose

  • unique (bool) – whether or not to ensure unique indices

Example:

choices = ps.choose_w([0.2, 0.5, 0.1, 0.1, 0.1], 2) # choose 2 out of 5 people with nonequal probability.
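
The three choose functions differ only in replacement and weighting. A rough NumPy equivalent, with hypothetical *_sketch names and probabilities normalized as an extra precaution, might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

def choose_sketch(max_n, n):
    """n distinct indices from range(max_n), equal probability."""
    return rng.choice(max_n, size=n, replace=False)

def choose_r_sketch(max_n, n):
    """n indices from range(max_n), repeats allowed."""
    return rng.integers(0, max_n, size=n)

def choose_w_sketch(probs, n, unique=True):
    """n indices drawn with the given probabilities."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # guard against small rounding errors
    return rng.choice(len(probs), size=n, p=probs, replace=not unique)

pair = choose_sketch(5, 2)
repeats = choose_r_sketch(5, 10)
weighted = choose_w_sketch([0.2, 0.5, 0.1, 0.1, 0.1], 2)
```
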
true(arr)[source]

Returns the indices of the values of the array that are true: just an alias for arr.nonzero()[0].

Parameters

arr (array) – any array

Example:

inds = ps.true(np.array([1,0,0,1,1,0,1]))

false(arr)[source]

Returns the indices of the values of the array that are false.

Parameters

arr (array) – any array

Example:

inds = ps.false(np.array([1,0,0,1,1,0,1]))

defined(arr)[source]

Returns the indices of the values of the array that are not-nan.

Parameters

arr (array) – any array

Example:

inds = ps.defined(np.array([1,np.nan,0,np.nan,1,0,1]))

undefined(arr)[source]

Returns the indices of the values of the array that are nan.

Parameters

arr (array) – any array

Example:

inds = ps.undefined(np.array([1,np.nan,0,np.nan,1,0,1]))
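
For reference, the four index helpers above correspond to the following plain NumPy expressions. Only the arr.nonzero()[0] form is stated in the documentation; the other three are assumed equivalents.

```python
import numpy as np

flags = np.array([1, 0, 0, 1, 1, 0, 1])
vals = np.array([1, np.nan, 0, np.nan, 1, 0, 1])

true_inds = flags.nonzero()[0]                   # [0, 3, 4, 6]
false_inds = np.logical_not(flags).nonzero()[0]  # [1, 2, 5]
defined_inds = (~np.isnan(vals)).nonzero()[0]    # [0, 2, 4, 5, 6]
undefined_inds = np.isnan(vals).nonzero()[0]     # [1, 3]
```
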
itrue(arr, inds)[source]

Returns the indices that are true in the array – name is short for indices[true]

Parameters
  • arr (array) – a Boolean array, used as a filter

  • inds (array) – any other array (usually, an array of indices) of the same size

Example:

inds = ps.itrue(np.array([True,False,True,True]), inds=np.array([5,22,47,93]))

ifalse(arr, inds)[source]

Returns the indices that are false in the array – name is short for indices[false]

Parameters
  • arr (array) – a Boolean array, used as a filter

  • inds (array) – any other array (usually, an array of indices) of the same size

Example:

inds = ps.ifalse(np.array([True,False,True,True]), inds=np.array([5,22,47,93]))

idefined(arr, inds)[source]

Returns the indices that are defined in the array – name is short for indices[defined]

Parameters
  • arr (array) – any array, used as a filter

  • inds (array) – any other array (usually, an array of indices) of the same size

Example:

inds = ps.idefined(np.array([3,np.nan,np.nan,4]), inds=np.array([5,22,47,93]))
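
As the names suggest, itrue, ifalse and idefined apply a same-sized filter array to inds. The NumPy expressions below are assumed equivalents, shown with the inputs from the examples above.

```python
import numpy as np

flags = np.array([True, False, True, True])
vals = np.array([3, np.nan, np.nan, 4])
inds = np.array([5, 22, 47, 93])

itrue_out = inds[flags]                # [5, 47, 93]
ifalse_out = inds[~flags]              # [22]
idefined_out = inds[~np.isnan(vals)]   # [5, 93]
```
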
itruei(arr, inds)[source]

Returns the indices that are true in the array – name is short for indices[true[indices]]

Parameters
  • arr (array) – a Boolean array, used as a filter

  • inds (array) – an array of indices for the original array

Example:

inds = ps.itruei(np.array([True,False,True,True,False,False,True,False]), inds=np.array([0,1,3,5]))

ifalsei(arr, inds)[source]

Returns the indices that are false in the array – name is short for indices[false[indices]]

Parameters
  • arr (array) – a Boolean array, used as a filter

  • inds (array) – an array of indices for the original array

Example:

inds = ps.ifalsei(np.array([True,False,True,True,False,False,True,False]), inds=np.array([0,1,3,5]))

idefinedi(arr, inds)[source]

Returns the indices that are defined in the array – name is short for indices[defined[indices]]

Parameters
  • arr (array) – any array, used as a filter

  • inds (array) – an array of indices for the original array

Example:

inds = ps.idefinedi(np.array([4,np.nan,0,np.nan,np.nan,4,7,4,np.nan]), inds=np.array([0,1,3,5]))
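
The i-suffixed variants first look up arr at the given indices, then keep the indices that pass. Following the "indices[true[indices]]" naming, assumed NumPy equivalents for the examples above are:

```python
import numpy as np

flags = np.array([True, False, True, True, False, False, True, False])
vals = np.array([4, np.nan, 0, np.nan, np.nan, 4, 7, 4, np.nan])
inds = np.array([0, 1, 3, 5])

itruei_out = inds[flags[inds]]                # [0, 3]
ifalsei_out = inds[~flags[inds]]              # [1, 5]
idefinedi_out = inds[~np.isnan(vals[inds])]   # [0, 5]
```
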
load(filename=None, folder=None, verbose=False, die=None, remapping=None, method='pickle', **kwargs)

Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj(). Accepts either a filename (standard usage) or a file object as the first argument. Note that loadobj()/load() are aliases of each other.

Note: be careful when loading pickle files, since a malicious pickle can be used to execute arbitrary code.

When a pickle file is loaded, Python imports any modules that are referenced in it. This is a problem if a module has been renamed. In this case, you can use the remapping argument to point to the new modules or classes.

Parameters
  • filename (str/Path) – the filename (or full path) to load

  • folder (str/Path) – the folder

  • verbose (bool) – print details

  • die (bool) – whether to raise an exception if errors are encountered (otherwise, load as much as possible)

  • remapping (dict) – way of mapping old/unavailable module names to new

  • method (str) – method for loading (usually pickle or dill)

  • kwargs (dict) – passed to pickle.loads()/dill.loads()

Examples:

obj = sc.loadobj('myfile.obj') # Standard usage
old = sc.loadobj('my-old-file.obj', method='dill', ignore=True) # Load classes from saved files
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':cat.Mat}) # If loading a saved object containing a reference to foo.Bar that is now cat.Mat
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':('cat', 'Mat')}) # Equivalent to the above

New in version 1.1.0: “remapping” argument. New in version 1.2.2: ability to load non-gzipped pickles; support for dill; arguments passed to loader
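
To illustrate how remapping of renamed classes can work in principle, the sketch below overrides find_class() on the standard library's pickle.Unpickler. It is not the loader's actual implementation: RemappingUnpickler and Mat are hypothetical names, and the "old" module foo is fabricated at runtime purely to produce a pickle that refers to foo.Bar.

```python
import io
import pickle
import sys
import types

class Mat:
    """Stand-in for the class at its new location (hypothetical)."""

class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects 'module.Class' names found in the remapping."""

    def __init__(self, file, remapping=None):
        super().__init__(file)
        self.remapping = remapping or {}

    def find_class(self, module, name):
        key = f'{module}.{name}'
        if key in self.remapping:
            return self.remapping[key]
        return super().find_class(module, name)

# Build a pickle that refers to foo.Bar, then make foo unavailable
foo = types.ModuleType('foo')
foo.Bar = type('Bar', (), {'__module__': 'foo'})
sys.modules['foo'] = foo
old = foo.Bar()
old.value = 44
data = pickle.dumps(old)
del sys.modules['foo']  # a plain pickle.loads(data) would now fail

restored = RemappingUnpickler(io.BytesIO(data), remapping={'foo.Bar': Mat}).load()
```

The restored object is a Mat instance carrying the attributes saved with the original foo.Bar object.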

save(filename=None, obj=None, compresslevel=5, verbose=0, folder=None, method='pickle', die=True, *args, **kwargs)

Save an object to file as a gzipped pickle – use compression 5 by default, since more is much slower but not much smaller. Once saved, can be loaded with sc.loadobj(). Note that saveobj()/save() are identical.

Parameters
  • filename (str or Path) – the filename to save to; if str, passed to sc.makefilepath()

  • obj (literally anything) – the object to save

  • compresslevel (int) – the level of gzip compression

  • verbose (int) – detail to print

  • folder (str) – passed to sc.makefilepath()

  • method (str) – whether to use pickle (default) or dill

  • die (bool) – whether to fail if no object is provided

  • args (list) – passed to pickle.dumps()

  • kwargs (dict) – passed to pickle.dumps()

Example:

myobj = ['this', 'is', 'a', 'weird', {'object':44}]
sc.saveobj('myfile.obj', myobj)
sc.saveobj('myfile.obj', myobj, method='dill') # Use dill instead, to save custom classes as well

New in version 1.1.1: removed Python 2 support. New in version 1.2.2: automatic swapping of arguments if order is incorrect; correct passing of arguments
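
The gzipped-pickle round trip can be reproduced with the standard library alone. This is a bare-bones sketch of the file format only: save_sketch and load_sketch are hypothetical names, and none of the path handling, dill support, or error handling described above is included.

```python
import gzip
import os
import pickle
import tempfile

def save_sketch(filename, obj, compresslevel=5):
    """Write obj to filename as a gzipped pickle."""
    with gzip.open(filename, 'wb', compresslevel=compresslevel) as f:
        pickle.dump(obj, f)
    return filename

def load_sketch(filename):
    """Read a gzipped pickle back into memory (only for trusted files)."""
    with gzip.open(filename, 'rb') as f:
        return pickle.load(f)

myobj = ['this', 'is', 'a', 'weird', {'object': 44}]
with tempfile.TemporaryDirectory() as folder:
    path = save_sketch(os.path.join(folder, 'myfile.obj'), myobj)
    loaded = load_sketch(path)
```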

date(obj, *args, start_date=None, readformat=None, outformat=None, as_date=True, **kwargs)[source]

Convert any reasonable object – a string, integer, or datetime object, or list/array of any of those – to a date object. To convert an integer to a date, you must supply a start date.

Caution: while this function and readdate() are similar, and indeed this function calls readdate() if the input is a string, in this function an integer is treated as a number of days from start_date, while for readdate() it is treated as a timestamp in seconds. To change

Parameters
  • obj (str, int, date, datetime, list, array) – the object to convert

  • args (str, int, date, datetime) – additional objects to convert

  • start_date (str, date, datetime) – the starting date, if an integer is supplied

  • readformat (str/list) – the format to read the date in; passed to sc.readdate()

  • outformat (str) – the format to output the date in, if returning a string

  • as_date (bool) – whether to return as a datetime date instead of a string

Returns

either a single date object, or a list of them (matching input data type where possible)

Return type

dates (date or list)

Examples:

sc.date('2020-04-05') # Returns datetime.date(2020, 4, 5)
sc.date([35,36,37], start_date='2020-01-01', as_date=False) # Returns ['2020-02-05', '2020-02-06', '2020-02-07']
sc.date(1923288822, readformat='posix') # Interpret as a POSIX timestamp

New in version 1.0.0. New in version 1.2.2: “readformat” argument; renamed “dateformat” to “outformat”
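
The integer-to-date conversion amounts to adding a number of days to the start date. A standard-library sketch, using the hypothetical name days_to_dates and assuming ISO-formatted start dates:

```python
import datetime as dt

def days_to_dates(days, start_date, as_date=True):
    """Convert integer day offsets into dates relative to start_date."""
    start = dt.date.fromisoformat(start_date)
    dates = [start + dt.timedelta(days=int(d)) for d in days]
    return dates if as_date else [d.isoformat() for d in dates]

strings = days_to_dates([35, 36, 37], '2020-01-01', as_date=False)
# ['2020-02-05', '2020-02-06', '2020-02-07']
```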

day(obj, *args, start_date=None, **kwargs)[source]

Convert a string, date/datetime object, or int to a day (int), the number of days since the start day. See also sc.date() and sc.daydiff(). If a start day is not supplied, it returns the number of days into the current year.

Parameters
  • obj (str, date, int, list, array) – convert any of these objects to a day relative to the start day

  • args (list) – additional days

  • start_date (str or date) – the start day; if none is supplied, return days since (supplied year)-01-01.

Returns

the day(s) in simulation time (matching input data type where possible)

Return type

days (int or list)

Examples:

sc.day(sc.now()) # Returns how many days into the year we are
sc.day(['2021-01-21', '2024-04-04'], start_date='2022-02-22') # Days can be positive or negative

New in version 1.0.0. New in version 1.2.2: renamed “start_day” to “start_date”

daydiff(*args)[source]

Convenience function to find the difference between two or more days. With only one argument, calculate days since 2020-01-01.

Examples:

diff  = sc.daydiff('2020-03-20', '2020-04-05') # Returns 16
diffs = sc.daydiff('2020-03-20', '2020-04-05', '2020-05-01') # Returns [16, 26]

New in version 1.0.0.
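
Both day() and daydiff() reduce to date subtraction. The sketch below uses only the standard library, handles ISO-formatted strings only, and uses hypothetical *_sketch names.

```python
import datetime as dt

def day_sketch(date_str, start_date):
    """Number of days from start_date to date_str (negative if earlier)."""
    return (dt.date.fromisoformat(date_str) - dt.date.fromisoformat(start_date)).days

def daydiff_sketch(*dates):
    """Differences between consecutive dates; a single int for two dates."""
    diffs = [day_sketch(later, earlier) for earlier, later in zip(dates[:-1], dates[1:])]
    return diffs[0] if len(diffs) == 1 else diffs

diff = daydiff_sketch('2020-03-20', '2020-04-05')                 # 16
diffs = daydiff_sketch('2020-03-20', '2020-04-05', '2020-05-01')  # [16, 26]
days = [day_sketch(d, '2022-02-22') for d in ['2021-01-21', '2024-04-04']]
```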

date_range(start_date, end_date, inclusive=True, as_date=False, dateformat=None)

Return a list of dates from the start date to the end date. To convert a list of days (as integers) to dates, use sc.date() instead.

Parameters
  • start_date (int/str/date) – the starting date, in any format

  • end_date (int/str/date) – the end date, in any format

  • inclusive (bool) – if True (default), return to end_date inclusive; otherwise, stop the day before

  • as_date (bool) – if True, return a list of datetime.date objects instead of strings

  • dateformat (str) – passed to date()

Example:

dates = sc.daterange('2020-03-01', '2020-04-04')

New in version 1.0.0.
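
The effect of the inclusive and as_date flags can be seen in a standard-library sketch. date_range_sketch is a hypothetical name and accepts ISO-formatted strings only.

```python
import datetime as dt

def date_range_sketch(start_date, end_date, inclusive=True, as_date=False):
    """List every date from start_date to end_date."""
    start = dt.date.fromisoformat(start_date)
    end = dt.date.fromisoformat(end_date)
    n_days = (end - start).days + (1 if inclusive else 0)
    dates = [start + dt.timedelta(days=i) for i in range(n_days)]
    return dates if as_date else [d.isoformat() for d in dates]

dates = date_range_sketch('2020-03-01', '2020-04-04')
open_ended = date_range_sketch('2020-03-01', '2020-04-04', inclusive=False)
```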

load_data(datafile, columns=None, calculate=True, check_date=True, verbose=True, **kwargs)[source]

Load data for comparing to the model output, either from file or from a dataframe.

Parameters
  • datafile (str or df) – if a string, the name of the file to load (either Excel or CSV); if a dataframe, use directly

  • columns (list) – list of column names (otherwise, load all)

  • calculate (bool) – whether to calculate cumulative values from daily counts

  • check_date (bool) – whether to check that a ‘date’ column is present

  • kwargs (dict) – passed to pd.read_excel()

Returns

pandas dataframe of the loaded data

Return type

data (dataframe)

savefig(filename=None, comments=None, **kwargs)[source]

Wrapper for Matplotlib’s savefig() function which automatically stores poliosim metadata in the figure. By default, saves

Parameters
  • filename (str) – name of the file to save to (default, timestamp)

  • comments (str) – additional metadata to save to the figure

  • kwargs (dict) – passed to savefig()

Example:

ps.Sim().run(do_plot=True)
filename = ps.savefig()

get_png_metadata(filename, output=False)[source]

Read metadata from a PNG file. For use with images saved with ps.savefig(). Requires pillow, an optional dependency.

Parameters

filename (str) – the name of the file to load the data from

Example:

ps.Sim().run(do_plot=True)
ps.savefig('poliosim.png')
ps.get_png_metadata('poliosim.png')

git_info(filename=None, check=False, comments=None, old_info=None, die=False, indent=2, verbose=True, **kwargs)[source]

Get current git information and optionally write it to disk. Simplest usage is ps.git_info(__file__)

Parameters
  • filename (str) – name of the file to write to or read from

  • check (bool) – whether or not to compare two git versions

  • comments (str/dict) – additional comments to include in the file

  • old_info (dict) – dictionary of information to check against

  • die (bool) – whether or not to raise an exception if the check fails

  • indent (int) – how many indents to use when writing the file to disk

  • verbose (bool) – detail to print

  • kwargs (dict) – passed to loadjson (if check=True) or savejson (if check=False)

Examples:

ps.git_info() # Return information
ps.git_info(__file__) # Writes to disk
ps.git_info('poliosim_version.gitinfo') # Writes to disk
ps.git_info('poliosim_version.gitinfo', check=True) # Checks that current version matches saved file

check_version(expected, die=False, verbose=True, **kwargs)[source]

Check the current version against the expected version information, and optionally raise an exception if the check fails.

Parameters
  • expected (str) – expected version information

  • die (bool) – whether or not to raise an exception if the check fails

check_save_version(expected=None, filename=None, die=False, verbose=True, **kwargs)[source]

A convenience function that bundles check_version with git_info and saves automatically to disk from the calling file. The idea is to put this at the top of an analysis script, and commit the resulting file, to keep track of which version of poliosim was used.

Parameters
  • expected (str) – expected version information

  • filename (str) – file to save to; if None, guess based on current file name

  • kwargs (dict) – passed to git_info()

Examples:

ps.check_save_version()
ps.check_save_version('1.3.2', filename='script.gitinfo', comments='This is the main analysis script')

compute_gof(actual, predicted, normalize=True, use_frac=False, use_squared=False, as_scalar='none', eps=1e-09, skestimator=None, **kwargs)[source]

Calculate the goodness of fit. By default use normalized absolute error, but highly customizable. For example, mean squared error is equivalent to setting normalize=False, use_squared=True, as_scalar='mean'.

Parameters
  • actual (arr) – array of actual (data) points

  • predicted (arr) – corresponding array of predicted (model) points

  • normalize (bool) – whether to divide the values by the largest value in either series

  • use_frac (bool) – convert to fractional mismatches rather than absolute

  • use_squared (bool) – square the mismatches

  • as_scalar (str) – return as a scalar instead of a time series: choices are sum, mean, median

  • eps (float) – to avoid divide-by-zero

  • skestimator (str) – if provided, use this scikit-learn estimator instead

  • kwargs (dict) – passed to the scikit-learn estimator

Returns

array of goodness-of-fit values, or a single value if as_scalar is 'sum', 'mean', or 'median'

Return type

gofs (arr)

Examples:

x1 = np.cumsum(np.random.random(100))
x2 = np.cumsum(np.random.random(100))

e1 = compute_gof(x1, x2) # Default, normalized absolute error
e2 = compute_gof(x1, x2, normalize=False, use_frac=True) # Fractional error
e3 = compute_gof(x1, x2, normalize=False, use_squared=True, as_scalar='mean') # Mean squared error
e4 = compute_gof(x1, x2, skestimator='mean_squared_error') # Scikit-learn's MSE method
e5 = compute_gof(x1, x2, as_scalar='median') # Normalized median absolute error -- highly robust
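
How the options combine can be sketched in NumPy as follows. This follows the parameter descriptions above rather than the library source, so details are assumptions (for example, that use_frac takes precedence over normalize); the scikit-learn path is omitted and compute_gof_sketch is a hypothetical name.

```python
import numpy as np

def compute_gof_sketch(actual, predicted, normalize=True, use_frac=False,
                       use_squared=False, as_scalar='none', eps=1e-9):
    """Pointwise mismatch between two series, optionally reduced to a scalar."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    gofs = np.abs(actual - predicted)
    if use_frac:
        # Fractional mismatch: divide each point by the larger of the two values
        gofs = gofs / (np.maximum(np.abs(actual), np.abs(predicted)) + eps)
    elif normalize:
        # Divide by the largest value in either series
        largest = max(np.abs(actual).max(), np.abs(predicted).max())
        if largest > 0:
            gofs = gofs / largest
    if use_squared:
        gofs = gofs ** 2
    reducers = {'sum': np.sum, 'mean': np.mean, 'median': np.median}
    return reducers[as_scalar](gofs) if as_scalar in reducers else gofs

rng = np.random.default_rng(1)
x1 = np.cumsum(rng.random(100))
x2 = np.cumsum(rng.random(100))
mse = compute_gof_sketch(x1, x2, normalize=False, use_squared=True, as_scalar='mean')
```
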
diff_sims(sim1, sim2, skip_key_diffs=False, output=False, die=False)[source]

Compute the difference of the summaries of two simulations, and print any values which differ.

Parameters
  • sim1 (sim/dict) – either a simulation object or the sim.summary dictionary

  • sim2 (sim/dict) – ditto

  • skip_key_diffs (bool) – whether to skip keys that don’t match between sims

  • output (bool) – whether to return the output as a string (otherwise print)

  • die (bool) – whether to raise an exception if the sims don’t match

  • require_run (bool) – require that the simulations have been run

Example:

s1 = ps.Sim(beta=0.01)
s2 = ps.Sim(beta=0.02)
s1.run()
s2.run()
ps.diff_sims(s1, s2)
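
The comparison itself can be illustrated on two plain summary dictionaries. This is only a sketch of the idea: diff_summaries is a hypothetical helper, and the keys in the example are invented for illustration, not actual summary fields.

```python
def diff_summaries(summary1, summary2, skip_key_diffs=False):
    """Return a report of keys and values that differ between two summaries."""
    lines = []
    if not skip_key_diffs:
        for key in sorted(set(summary1) ^ set(summary2)):
            lines.append(f'key "{key}" is present in only one summary')
    for key in sorted(set(summary1) & set(summary2)):
        if summary1[key] != summary2[key]:
            lines.append(f'{key}: {summary1[key]} vs {summary2[key]}')
    return '\n'.join(lines)

# Hypothetical summary values
a = {'cum_infections': 120, 'cum_deaths': 3}
b = {'cum_infections': 150, 'cum_deaths': 3, 'cum_tests': 900}
report = diff_summaries(a, b)
print(report)
```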