poliosim.utils module
Numerical utilities for running Poliosim
- sample(dist=None, par1=None, par2=None, size=None, **kwargs)
Draw a sample from the distribution specified by the input.
- Parameters
dist (str) – the distribution to sample from
par1 (float) – the “main” distribution parameter (e.g. mean)
par2 (float) – the “secondary” distribution parameter (e.g. std)
size (int) – the number of samples (default=1)
kwargs (dict) – passed to individual sampling functions
- Returns
A length N array of samples
Examples:
sample() # returns Unif(0,1)
sample(dist='normal', par1=3, par2=0.5) # returns Normal(μ=3, σ=0.5)
Notes
Lognormal distributions are parameterized with reference to the underlying normal distribution (see: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.lognormal.html), but this function assumes the user wants to specify the mean and variance of the lognormal distribution.
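The note above implies a conversion from the desired mean and spread of the lognormal distribution to the parameters of the underlying normal distribution. The following is a minimal standard-library sketch of that conversion; the helper name lognormal_params is hypothetical, and this is not necessarily how sample() implements it.

```python
import math
import random

def lognormal_params(mean, std):
    """Convert the mean/std of a lognormal distribution into the
    mu/sigma of the underlying normal distribution (hypothetical helper)."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)  # variance of the underlying normal
    mu = math.log(mean) - sigma2 / 2.0          # mean of the underlying normal
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_params(mean=3.0, std=0.5)
draw = random.lognormvariate(mu, sigma)  # one sample from a lognormal with mean 3, std 0.5
```

Plugging mu and sigma back into the closed-form lognormal mean, exp(mu + sigma²/2), recovers the requested mean.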
- set_seed(seed=None)
Reset the random seed – complicated because of Numba, which requires special syntax to reset the seed. This function also resets Python’s built-in random number generator.
- Parameters
seed (int) – the random seed
- n_binomial(prob, n)
Perform multiple binomial (Bernoulli) trials
- Parameters
prob (float) – probability of each trial succeeding
n (int) – number of trials (size of array)
- Returns
Boolean array of which trials succeeded
Example:
outcomes = ps.n_binomial(0.5, 100) # Perform 100 coin-flips
- binomial_filter(prob, arr)
Binomial “filter” – the same as n_binomial, except return the elements of arr that succeeded.
- Parameters
prob (float) – probability of each trial succeeding
arr (array) – the array to be filtered
- Returns
Subset of array for which trials succeeded
Example:
inds = ps.binomial_filter(0.5, np.arange(20)**2) # Return which values out of the (arbitrary) array passed the coin flip
- binomial_arr(prob_arr)
Binomial (Bernoulli) trials each with different probabilities.
- Parameters
prob_arr (array) – array of probabilities
- Returns
Boolean array of which trials on the input array succeeded
Example:
outcomes = ps.binomial_arr([0.1, 0.1, 0.2, 0.2, 0.8, 0.8]) # Perform 6 trials with different probabilities
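The three binomial helpers above share one idea: compare a uniform random draw against a success probability. A plain-Python sketch of that idea follows; the function names ending in _sketch are hypothetical and the real functions return NumPy arrays, so treat this as an illustration rather than the library's implementation.

```python
import random

def binomial_arr_sketch(prob_arr):
    """One Bernoulli trial per entry; True where the trial succeeded."""
    return [random.random() < p for p in prob_arr]

def binomial_filter_sketch(prob, arr):
    """Keep each element of arr independently with probability prob."""
    return [x for x in arr if random.random() < prob]

# Probabilities of 0 and 1 make the outcome deterministic
outcomes = binomial_arr_sketch([0.0, 1.0, 0.0, 1.0])
# Each squared value survives with probability 0.5
kept = binomial_filter_sketch(0.5, [i**2 for i in range(20)])
```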
- n_multinomial(probs, n)
An array of multinomial trials.
- Parameters
probs (array) – probability of each outcome, which usually should sum to 1
n (int) – number of trials
- Returns
Array of integer outcomes
Example:
outcomes = ps.n_multinomial(np.ones(6)/6.0, 50)+1 # Return 50 die-rolls
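The die-roll example adds 1 because the outcomes are zero-based indices into probs. A standard-library sketch of the same idea, using random.choices in place of the library's own sampler (an assumption made for illustration only):

```python
import random

probs = [1 / 6] * 6  # six equally likely outcomes, indexed 0..5

# Draw 50 outcome indices, then shift by 1 to get die faces 1..6
rolls = [i + 1 for i in random.choices(range(len(probs)), weights=probs, k=50)]
```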
- poisson(rate)
A Poisson trial.
- Parameters
rate (float) – the rate of the Poisson process
Example:
outcome = ps.poisson(100) # Single Poisson trial with mean 100
- n_poisson(rate, n)
An array of Poisson trials.
- Parameters
rate (float) – the rate of the Poisson process (mean)
n (int) – number of trials
Example:
outcomes = ps.n_poisson(100, 20) # 20 poisson trials with mean 100
- n_neg_binomial(rate, dispersion, n, step=1)
An array of negative binomial trials; with dispersion = ∞, converges to Poisson.
- Parameters
rate (float) – the rate of the process (mean, same as Poisson)
dispersion (float) – amount of dispersion: 0 = infinite, 1 = std is equal to mean, ∞ = Poisson
n (int) – number of trials
step (float) – the step size to use if non-integer outputs are desired
Example:
outcomes = ps.n_neg_binomial(100, 1, 20) # 20 negative binomial trials with mean 100 and dispersion equal to mean
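The limits listed for dispersion are consistent with the common mean/dispersion parameterization of the negative binomial, in which the variance is rate + rate²/dispersion. That parameterization is an assumption here, not something the text above confirms, and the helper name is hypothetical. Under it, the standard deviation behaves as follows:

```python
import math

def neg_binomial_std(rate, dispersion):
    """Standard deviation under the assumed mean/dispersion parameterization."""
    variance = rate + rate**2 / dispersion
    return math.sqrt(variance)

std_k1 = neg_binomial_std(100, 1)       # close to the mean of 100
std_large = neg_binomial_std(100, 1e9)  # close to the Poisson value sqrt(100) = 10
```

As dispersion shrinks toward 0 the variance grows without bound, and as it grows large the variance approaches the Poisson value, which equals the rate.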
- choose(max_n, n)
Choose a subset of items (e.g., people) without replacement.
- Parameters
max_n (int) – the total number of items
n (int) – the number of items to choose
Example:
choices = ps.choose(5, 2) # choose 2 out of 5 people with equal probability (without repeats)
- choose_r(max_n, n)
Choose a subset of items (e.g., people), with replacement.
- Parameters
max_n (int) – the total number of items
n (int) – the number of items to choose
Example:
choices = ps.choose_r(5, 10) # choose 10 out of 5 people with equal probability (with repeats)
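The difference between choose and choose_r is sampling without versus with replacement. A standard-library sketch of the two behaviours follows; the _sketch names are hypothetical and the real functions return NumPy arrays.

```python
import random

def choose_sketch(max_n, n):
    """Pick n distinct indices from range(max_n) (no repeats)."""
    return random.sample(range(max_n), n)

def choose_r_sketch(max_n, n):
    """Pick n indices from range(max_n), allowing repeats."""
    return random.choices(range(max_n), k=n)

without_repeats = choose_sketch(5, 2)   # 2 distinct people out of 5
with_repeats = choose_r_sketch(5, 10)   # 10 picks from 5 people, so repeats are unavoidable
```

Sampling without replacement requires n to be no larger than max_n; sampling with replacement has no such limit.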
- choose_w(probs, n, unique=True)
Choose n items (e.g. people), each with a probability from the distribution probs.
- Parameters
probs (array) – list of probabilities, should sum to 1
n (int) – number of samples to choose
unique (bool) – whether or not to ensure unique indices
Example:
choices = ps.choose_w([0.2, 0.5, 0.1, 0.1, 0.1], 2) # choose 2 out of 5 people with nonequal probability.
- true(arr)
Returns the indices of the values of the array that are true: just an alias for arr.nonzero()[0].
- Parameters
arr (array) – any array
Example:
inds = ps.true(np.array([1,0,0,1,1,0,1]))
- false(arr)
Returns the indices of the values of the array that are false.
- Parameters
arr (array) – any array
Example:
inds = ps.false(np.array([1,0,0,1,1,0,1]))
- defined(arr)
Returns the indices of the values of the array that are not-nan.
- Parameters
arr (array) – any array
Example:
inds = ps.defined(np.array([1,np.nan,0,np.nan,1,0,1]))
- undefined(arr)
Returns the indices of the values of the array that are nan.
- Parameters
arr (array) – any array
Example:
inds = ps.undefined(np.array([1,np.nan,0,np.nan,1,0,1]))
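To make the four index helpers concrete, here is a plain-Python sketch of what each one selects, using lists instead of NumPy arrays. The _sketch names are hypothetical; the real functions operate on arrays.

```python
import math

def true_sketch(arr):
    """Indices of truthy (nonzero) values."""
    return [i for i, v in enumerate(arr) if v]

def false_sketch(arr):
    """Indices of falsy (zero) values."""
    return [i for i, v in enumerate(arr) if not v]

def defined_sketch(arr):
    """Indices of values that are not nan."""
    return [i for i, v in enumerate(arr) if not math.isnan(v)]

def undefined_sketch(arr):
    """Indices of values that are nan."""
    return [i for i, v in enumerate(arr) if math.isnan(v)]

flags = [1, 0, 0, 1, 1, 0, 1]
values = [1, math.nan, 0, math.nan, 1, 0, 1]
print(true_sketch(flags))        # [0, 3, 4, 6]
print(false_sketch(flags))       # [1, 2, 5]
print(defined_sketch(values))    # [0, 2, 4, 5, 6]
print(undefined_sketch(values))  # [1, 3]
```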
- itrue(arr, inds)
Returns the indices that are true in the array – name is short for indices[true]
- Parameters
arr (array) – a Boolean array, used as a filter
inds (array) – any other array (usually, an array of indices) of the same size
Example:
inds = ps.itrue(np.array([True,False,True,True]), inds=np.array([5,22,47,93]))
- ifalse(arr, inds)
Returns the indices that are false in the array – name is short for indices[false]
- Parameters
arr (array) – a Boolean array, used as a filter
inds (array) – any other array (usually, an array of indices) of the same size
Example:
inds = ps.ifalse(np.array([True,False,True,True]), inds=np.array([5,22,47,93]))
- idefined(arr, inds)
Returns the indices that are defined in the array – name is short for indices[defined]
- Parameters
arr (array) – any array, used as a filter
inds (array) – any other array (usually, an array of indices) of the same size
Example:
inds = ps.idefined(np.array([3,np.nan,np.nan,4]), inds=np.array([5,22,47,93]))
- itruei(arr, inds)
Returns the indices that are true in the array – name is short for indices[true[indices]]
- Parameters
arr (array) – a Boolean array, used as a filter
inds (array) – an array of indices for the original array
Example:
inds = ps.itruei(np.array([True,False,True,True,False,False,True,False]), inds=np.array([0,1,3,5]))
- ifalsei(arr, inds)
Returns the indices that are false in the array – name is short for indices[false[indices]]
- Parameters
arr (array) – a Boolean array, used as a filter
inds (array) – an array of indices for the original array
Example:
inds = ps.ifalsei(np.array([True,False,True,True,False,False,True,False]), inds=np.array([0,1,3,5]))
- idefinedi(arr, inds)
Returns the indices that are defined in the array – name is short for indices[defined[indices]]
- Parameters
arr (array) – any array, used as a filter
inds (array) – an array of indices for the original array
Example:
inds = ps.idefinedi(np.array([4,np.nan,0,np.nan,np.nan,4,7,4,np.nan]), inds=np.array([0,1,3,5]))
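The two families above differ in how arr and inds line up. In itrue/ifalse/idefined, arr and inds have the same length and arr acts as a mask over inds. In itruei/ifalsei/idefinedi, inds holds positions in a longer arr, and the result keeps those positions whose arr entry passes the test. A plain-Python sketch with hypothetical _sketch names, reusing the inputs from the examples above:

```python
import math

def itrue_sketch(arr, inds):
    """inds[arr]: arr is a same-length mask over inds."""
    return [ind for flag, ind in zip(arr, inds) if flag]

def itruei_sketch(arr, inds):
    """inds[arr[inds]]: keep the positions in inds where arr is true."""
    return [ind for ind in inds if arr[ind]]

def ifalsei_sketch(arr, inds):
    """Keep the positions in inds where arr is false."""
    return [ind for ind in inds if not arr[ind]]

def idefinedi_sketch(arr, inds):
    """Keep the positions in inds where arr is not nan."""
    return [ind for ind in inds if not math.isnan(arr[ind])]

mask = [True, False, True, True, False, False, True, False]
nan = math.nan
print(itrue_sketch([True, False, True, True], [5, 22, 47, 93]))          # [5, 47, 93]
print(itruei_sketch(mask, [0, 1, 3, 5]))                                 # [0, 3]
print(ifalsei_sketch(mask, [0, 1, 3, 5]))                                # [1, 5]
print(idefinedi_sketch([4, nan, 0, nan, nan, 4, 7, 4, nan], [0, 1, 3, 5]))  # [0, 5]
```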
- load(filename=None, folder=None, verbose=False, die=None, remapping=None, method='pickle', **kwargs)
Load a file that has been saved as a gzipped pickle file, e.g. by sc.saveobj(). Accepts either a filename (standard usage) or a file object as the first argument. Note that loadobj()/load() are aliases of each other.
Note: be careful when loading pickle files, since a malicious pickle can be used to execute arbitrary code.
When a pickle file is loaded, Python imports any modules that are referenced in it. This is a problem if a module has been renamed. In this case, you can use the remapping argument to point to the new modules or classes.
- Parameters
filename (str/Path) – the filename (or full path) to load
folder (str/Path) – the folder
verbose (bool) – print details
die (bool) – whether to raise an exception if errors are encountered (otherwise, load as much as possible)
remapping (dict) – way of mapping old/unavailable module names to new
method (str) – method for loading (usually pickle or dill)
kwargs (dict) – passed to pickle.loads()/dill.loads()
Examples:
obj = sc.loadobj('myfile.obj') # Standard usage
old = sc.loadobj('my-old-file.obj', method='dill', ignore=True) # Load classes from saved files
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':cat.Mat}) # If loading a saved object containing a reference to foo.Bar that is now cat.Mat
old = sc.loadobj('my-old-file.obj', remapping={'foo.Bar':('cat', 'Mat')}) # Equivalent to the above
New in version 1.1.0: “remapping” argument
New in version 1.2.2: ability to load non-gzipped pickles; support for dill; arguments passed to loader
- save(filename=None, obj=None, compresslevel=5, verbose=0, folder=None, method='pickle', die=True, *args, **kwargs)
Save an object to file as a gzipped pickle – use compression 5 by default, since more is much slower but not much smaller. Once saved, can be loaded with sc.loadobj(). Note that saveobj()/save() are identical.
- Parameters
filename (str or Path) – the filename to save to; if str, passed to sc.makefilepath()
obj (literally anything) – the object to save
compresslevel (int) – the level of gzip compression
verbose (int) – detail to print
folder (str) – passed to sc.makefilepath()
method (str) – whether to use pickle (default) or dill
die (bool) – whether to fail if no object is provided
args (list) – passed to pickle.dumps()
kwargs (dict) – passed to pickle.dumps()
Example:
myobj = ['this', 'is', 'a', 'weird', {'object':44}]
sc.saveobj('myfile.obj', myobj)
sc.saveobj('myfile.obj', myobj, method='dill') # Use dill instead, to save custom classes as well
New in version 1.1.1: removed Python 2 support.
New in version 1.2.2: automatic swapping of arguments if order is incorrect; correct passing of arguments
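For readers unfamiliar with the "gzipped pickle" format that save() and load() describe, the following standard-library sketch shows a round trip. The function names are hypothetical and the real functions do considerably more (path handling, dill support, remapping); this only illustrates the file format.

```python
import gzip
import os
import pickle
import tempfile

def save_sketch(filename, obj, compresslevel=5):
    """Pickle obj and write it through a gzip stream."""
    with gzip.open(filename, 'wb', compresslevel=compresslevel) as f:
        pickle.dump(obj, f)

def load_sketch(filename):
    """Read a gzipped pickle; only do this with files you trust."""
    with gzip.open(filename, 'rb') as f:
        return pickle.load(f)

myobj = ['this', 'is', 'a', 'weird', {'object': 44}]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.obj')
    save_sketch(path, myobj)
    restored = load_sketch(path)  # equal to myobj after the round trip
```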
- date(obj, *args, start_date=None, readformat=None, outformat=None, as_date=True, **kwargs)
Convert any reasonable object – a string, integer, or datetime object, or list/array of any of those – to a date object. To convert an integer to a date, you must supply a start date.
Caution: while this function and readdate() are similar, and indeed this function calls readdate() if the input is a string, in this function an integer is treated as a number of days from start_date, while for readdate() it is treated as a timestamp in seconds. To change
- Parameters
obj (str, int, date, datetime, list, array) – the object to convert
args (str, int, date, datetime) – additional objects to convert
start_date (str, date, datetime) – the starting date, if an integer is supplied
readformat (str/list) – the format to read the date in; passed to sc.readdate()
outformat (str) – the format to output the date in, if returning a string
as_date (bool) – whether to return as a datetime date instead of a string
- Returns
either a single date object, or a list of them (matching input data type where possible)
- Return type
dates (date or list)
Examples:
sc.date('2020-04-05') # Returns datetime.date(2020, 4, 5)
sc.date([35,36,37], start_date='2020-01-01', as_date=False) # Returns ['2020-02-05', '2020-02-06', '2020-02-07']
sc.date(1923288822, readformat='posix') # Interpret as a POSIX timestamp
New in version 1.0.0.
New in version 1.2.2: “readformat” argument; renamed “dateformat” to “outformat”
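The integer case described above, where an integer is a number of days after start_date, can be sketched with the standard library alone. The helper name days_to_dates is hypothetical and only covers ISO-formatted start dates.

```python
import datetime as dt

def days_to_dates(days, start_date):
    """Convert day offsets into ISO date strings relative to start_date."""
    start = dt.date.fromisoformat(start_date)
    return [(start + dt.timedelta(days=d)).isoformat() for d in days]

dates = days_to_dates([35, 36, 37], start_date='2020-01-01')
print(dates)  # ['2020-02-05', '2020-02-06', '2020-02-07']
```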
- day(obj, *args, start_date=None, **kwargs)
Convert a string, date/datetime object, or int to a day (int), the number of days since the start day. See also sc.date() and sc.daydiff(). If a start day is not supplied, it returns the number of days into the current year.
- Parameters
obj (str, date, int, list, array) – convert any of these objects to a day relative to the start day
args (list) – additional days
start_date (str or date) – the start day; if none is supplied, return days since (supplied year)-01-01.
- Returns
the day(s) in simulation time (matching input data type where possible)
- Return type
days (int or list)
Examples:
sc.day(sc.now()) # Returns how many days into the year we are
sc.day(['2021-01-21', '2024-04-04'], start_date='2022-02-22') # Days can be positive or negative
New in version 1.0.0.
New in version 1.2.2: renamed “start_day” to “start_date”
- daydiff(*args)
Convenience function to find the difference between two or more days. With only one argument, calculate days since 2020-01-01.
Examples:
diff = sc.daydiff('2020-03-20', '2020-04-05') # Returns 16
diffs = sc.daydiff('2020-03-20', '2020-04-05', '2020-05-01') # Returns [16, 26]
New in version 1.0.0.
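The return values in the examples above are differences between consecutive dates. A standard-library sketch that reproduces them follows; daydiff_sketch is a hypothetical name, it accepts ISO date strings only, and it does not implement the single-argument case.

```python
import datetime as dt

def daydiff_sketch(*days):
    """Differences in days between consecutive ISO date strings."""
    parsed = [dt.date.fromisoformat(d) for d in days]
    diffs = [(later - earlier).days for earlier, later in zip(parsed, parsed[1:])]
    return diffs[0] if len(diffs) == 1 else diffs

print(daydiff_sketch('2020-03-20', '2020-04-05'))                # 16
print(daydiff_sketch('2020-03-20', '2020-04-05', '2020-05-01'))  # [16, 26]
```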
- date_range(start_date, end_date, inclusive=True, as_date=False, dateformat=None)
Return a list of dates from the start date to the end date. To convert a list of days (as integers) to dates, use sc.date() instead.
- Parameters
start_date (int/str/date) – the starting date, in any format
end_date (int/str/date) – the end date, in any format
inclusive (bool) – if True (default), return to end_date inclusive; otherwise, stop the day before
as_date (bool) – if True, return a list of datetime.date objects instead of strings
dateformat (str) – passed to date()
Example:
dates = sc.daterange('2020-03-01', '2020-04-04')
New in version 1.0.0.
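The effect of the inclusive flag can be illustrated with a standard-library sketch. The name date_range_sketch is hypothetical, and only ISO date strings are handled.

```python
import datetime as dt

def date_range_sketch(start_date, end_date, inclusive=True):
    """List of ISO date strings from start_date to end_date."""
    start = dt.date.fromisoformat(start_date)
    end = dt.date.fromisoformat(end_date)
    n_days = (end - start).days + (1 if inclusive else 0)
    return [(start + dt.timedelta(days=i)).isoformat() for i in range(n_days)]

dates = date_range_sketch('2020-03-01', '2020-04-04')                     # 35 dates, ending 2020-04-04
short = date_range_sketch('2020-03-01', '2020-04-04', inclusive=False)    # 34 dates, ending 2020-04-03
```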
- load_data(datafile, columns=None, calculate=True, check_date=True, verbose=True, **kwargs)
Load data for comparing to the model output, either from file or from a dataframe.
- Parameters
datafile (str or df) – if a string, the name of the file to load (either Excel or CSV); if a dataframe, use directly
columns (list) – list of column names (otherwise, load all)
calculate (bool) – whether to calculate cumulative values from daily counts
check_date (bool) – whether to check that a ‘date’ column is present
kwargs (dict) – passed to pd.read_excel()
- Returns
pandas dataframe of the loaded data
- Return type
data (dataframe)
- savefig(filename=None, comments=None, **kwargs)
Wrapper for Matplotlib’s savefig() function which automatically stores poliosim metadata in the figure. By default, saves
- Parameters
filename (str) – name of the file to save to (default, timestamp)
comments (str) – additional metadata to save to the figure
kwargs (dict) – passed to savefig()
Example:
ps.Sim().run(do_plot=True)
filename = ps.savefig()
- get_png_metadata(filename, output=False)
Read metadata from a PNG file. For use with images saved with ps.savefig(). Requires pillow, an optional dependency.
- Parameters
filename (str) – the name of the file to load the data from
Example:
ps.Sim().run(do_plot=True)
ps.savefig('poliosim.png')
ps.get_png_metadata('poliosim.png')
- git_info(filename=None, check=False, comments=None, old_info=None, die=False, indent=2, verbose=True, **kwargs)
Get current git information and optionally write it to disk. Simplest usage is ps.git_info(__file__)
- Parameters
filename (str) – name of the file to write to or read from
check (bool) – whether or not to compare two git versions
comments (str/dict) – additional comments to include in the file
old_info (dict) – dictionary of information to check against
die (bool) – whether or not to raise an exception if the check fails
indent (int) – how many indents to use when writing the file to disk
verbose (bool) – detail to print
kwargs (dict) – passed to loadjson (if check=True) or savejson (if check=False)
Examples:
ps.git_info() # Return information
ps.git_info(__file__) # Writes to disk
ps.git_info('poliosim_version.gitinfo') # Writes to disk
ps.git_info('poliosim_version.gitinfo', check=True) # Checks that current version matches saved file
- check_version(expected, die=False, verbose=True, **kwargs)
Check the current poliosim version against the expected version.
- Parameters
expected (str) – expected version information
die (bool) – whether or not to raise an exception if the check fails
- check_save_version(expected=None, filename=None, die=False, verbose=True, **kwargs)
A convenience function that bundles check_version with git_info and saves automatically to disk from the calling file. The idea is to put this at the top of an analysis script, and commit the resulting file, to keep track of which version of poliosim was used.
- Parameters
expected (str) – expected version information
filename (str) – file to save to; if None, guess based on current file name
kwargs (dict) – passed to git_info()
Examples:
ps.check_save_version()
ps.check_save_version('1.3.2', filename='script.gitinfo', comments='This is the main analysis script')
- compute_gof(actual, predicted, normalize=True, use_frac=False, use_squared=False, as_scalar='none', eps=1e-09, skestimator=None, **kwargs)
Calculate the goodness of fit. By default use normalized absolute error, but highly customizable. For example, mean squared error is equivalent to setting normalize=False, use_squared=True, as_scalar='mean'.
- Parameters
actual (arr) – array of actual (data) points
predicted (arr) – corresponding array of predicted (model) points
normalize (bool) – whether to divide the values by the largest value in either series
use_frac (bool) – convert to fractional mismatches rather than absolute
use_squared (bool) – square the mismatches
as_scalar (str) – return as a scalar instead of a time series: choices are sum, mean, median
eps (float) – to avoid divide-by-zero
skestimator (str) – if provided, use this scikit-learn estimator instead
kwargs (dict) – passed to the scikit-learn estimator
- Returns
array of goodness-of-fit values, or a single value if as_scalar is 'sum', 'mean', or 'median'
- Return type
gofs (arr)
Examples:
x1 = np.cumsum(np.random.random(100))
x2 = np.cumsum(np.random.random(100))
e1 = compute_gof(x1, x2) # Default, normalized absolute error
e2 = compute_gof(x1, x2, normalize=False, use_frac=True) # Fractional error
e3 = compute_gof(x1, x2, normalize=False, use_squared=True, as_scalar='mean') # Mean squared error
e4 = compute_gof(x1, x2, skestimator='mean_squared_error') # Scikit-learn's MSE method
e5 = compute_gof(x1, x2, as_scalar='median') # Normalized median absolute error -- highly robust
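To show how the normalize, use_squared, and as_scalar options combine, here is a plain-Python sketch that follows the parameter descriptions above. The name gof_sketch is hypothetical; the order of operations (normalize, then square, then reduce) and the normalization by the largest absolute value in either series are assumptions drawn from those descriptions, and use_frac, eps, and skestimator are not covered.

```python
from statistics import mean, median

def gof_sketch(actual, predicted, normalize=True, use_squared=False, as_scalar='none'):
    """Absolute-error goodness of fit on plain lists (not poliosim's code)."""
    gofs = [abs(a - p) for a, p in zip(actual, predicted)]
    if normalize:
        # Assumption: scale by the largest absolute value in either series
        largest = max(abs(v) for v in list(actual) + list(predicted))
        if largest > 0:
            gofs = [g / largest for g in gofs]
    if use_squared:
        gofs = [g**2 for g in gofs]
    reducers = {'sum': sum, 'mean': mean, 'median': median}
    return reducers[as_scalar](gofs) if as_scalar in reducers else gofs

actual = [1.0, 2.0, 4.0]
predicted = [1.0, 3.0, 2.0]
nae = gof_sketch(actual, predicted)  # [0.0, 0.25, 0.5]
mse = gof_sketch(actual, predicted, normalize=False, use_squared=True, as_scalar='mean')  # 5/3
```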
- diff_sims(sim1, sim2, skip_key_diffs=False, output=False, die=False)
Compute the difference of the summaries of two simulations, and print any values which differ.
- Parameters
sim1 (sim/dict) – either a simulation object or the sim.summary dictionary
sim2 (sim/dict) – ditto
skip_key_diffs (bool) – whether to skip keys that don’t match between sims
output (bool) – whether to return the output as a string (otherwise print)
die (bool) – whether to raise an exception if the sims don’t match
require_run (bool) – require that the simulations have been run
Example:
s1 = ps.Sim(beta=0.01)
s2 = ps.Sim(beta=0.02)
s1.run()
s2.run()
ps.diff_sims(s1, s2)
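Since diff_sims also accepts the sim.summary dictionaries directly, the core comparison can be sketched on plain dictionaries. The function name and the summary keys below are hypothetical, and the sketch ignores keys that appear in only one dictionary.

```python
def diff_summaries(summary1, summary2):
    """Return {key: (value1, value2)} for shared keys whose values differ."""
    shared = summary1.keys() & summary2.keys()
    return {k: (summary1[k], summary2[k]) for k in sorted(shared) if summary1[k] != summary2[k]}

# Hypothetical summary dictionaries; real sim.summary keys may differ
s1 = {'cum_infections': 120, 'cum_deaths': 3}
s2 = {'cum_infections': 150, 'cum_deaths': 3}

mismatches = diff_summaries(s1, s2)
for key, (v1, v2) in mismatches.items():
    print(f'{key}: {v1} vs {v2}')
```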