fpsim.experiment module¶

Define classes and functions for the Experiment class (running sims and comparing them to data)

class Experiment(pars=None, flags=None, label=None, **kwargs)[source]¶

Bases: prettyobj

Class for running calibration to data. Effectively, it runs a single sim and compares it to data.

Parameters:

pars (dict) – dictionary of parameters
flags (dict) – which analyses to run; see fp.experiment.default_flags for options
label (str) – label of experiment
kwargs (dict) – passed into pars

load_data(key, **kwargs)[source]¶: Load data from various formats

extract_data()[source]¶: Load data

pop_growth_rate(years, population)[source]¶

run_model(pars=None, **kwargs)[source]¶: Create the sim and run the model

post_process_sim()[source]¶

extract_model()[source]¶

model_pop_size()[source]¶

model_mcpr()[source]¶

model_mmr()[source]¶: Calculate maternal mortality in model over most recent 3 years

model_infant_mortality_rate()[source]¶

model_crude_death_rate()[source]¶

model_crude_birth_rate()[source]¶

model_data_tfr()[source]¶

model_data_asfr(ind=-1)[source]¶

extract_skyscrapers()[source]¶

extract_birth_spacing()[source]¶

extract_methods()[source]¶

extract_age_pregnancy()[source]¶

compute_fit(*args, **kwargs)[source]¶: Compute how good the fit is

post_process_results(keep_people=False, compute_fit=True, **kwargs)[source]¶: Compare the model and the data

run(pars=None, keep_people=False, compute_fit=True, **kwargs)[source]¶: Run the model and post-process the results

compare()[source]¶: Create and print a comparison between model and data

summarize(as_df=False)[source]¶

Convert results to a one-number-per-key summary format. Returns summary, also saves to self.summary.

Parameters: as_df (bool) – if True, return a dataframe instead of a dict.

to_json(filename=None, tostring=False, indent=2, verbose=False, **kwargs)[source]¶

Export results as JSON.

Parameters:

filename (str) – if None, return string; else, write to file
tostring (bool) – if not writing to file, whether to write to string (alternative is sanitized dictionary)
indent (int) – if writing to file, how many indents to use per nested level
verbose (bool) – detail to print
kwargs (dict) – passed to savejson()

Returns:

A string containing a JSON representation of the results; if a filename is given, the JSON is written to disk instead

Examples:

results_json = exp.to_json() # Avoid shadowing the standard-library json module
exp.to_json('results.json')

plot(do_show=None, do_save=None, filename='fp_experiment.png', axis_args=None, do_maximize=True)[source]¶: Plot the model against the data

class Fit(data, sim, weights=None, keys=None, custom=None, compute=True, verbose=False, **kwargs)[source]¶

Bases: prettyobj

A class for calculating the fit between the model and the data. Note the following terminology is used here:

fit: nonspecific term for how well the model matches the data

difference: the absolute numerical differences between the model and the data (one time series per result)

goodness-of-fit: the result of passing the difference through a statistical function, such as mean squared error

loss: the goodness-of-fit for each result multiplied by user-specified weights (one time series per result)

mismatches: the sum of all the losses (a single scalar value per time series)

mismatch: the sum of the mismatches – this is the value to be minimized during calibration
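The terminology above can be traced through a small NumPy sketch. This is an illustration under stated assumptions, not the Fit implementation: the result names, values, and weights are made up, and normalized absolute error stands in for the goodness-of-fit function.

```python
import numpy as np

# Hypothetical data and model output: one time series per result key
data  = {'pop_size': np.array([100.0, 110.0, 120.0]), 'mcpr': np.array([0.20, 0.25, 0.30])}
model = {'pop_size': np.array([100.0, 115.0, 114.0]), 'mcpr': np.array([0.20, 0.20, 0.30])}
weights = {'pop_size': 1.0, 'mcpr': 2.0}  # Hypothetical user-specified weights

diffs, gofs, losses, mismatches = {}, {}, {}, {}
for key in data:
    diffs[key] = data[key] - model[key]                        # difference: one time series per result
    gofs[key] = np.abs(diffs[key]) / np.abs(data[key]).max()   # goodness-of-fit: normalized absolute error (assumed)
    losses[key] = gofs[key] * weights[key]                     # loss: weighted goodness-of-fit
    mismatches[key] = losses[key].sum()                        # mismatches: one scalar per result

mismatch = sum(mismatches.values())  # mismatch: the single value to minimize during calibration
```

In this sketch the weight of 2.0 on the second result doubles its contribution to the final mismatch, which is how weights steer calibration toward the results that matter most.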

Parameters:

sim (Sim) – the sim object
weights (dict) – the relative weight to place on each result (by default: 10 for deaths, 5 for diagnoses, 1 for everything else)
keys (list) – the keys to use in the calculation
custom (dict) – a custom dictionary of additional data to fit; format is e.g. {'my_output':{'data':[1,2,3], 'sim':[1,2,4], 'weights':2.0}}
compute (bool) – whether to compute the mismatch immediately
verbose (bool) – detail to print
kwargs (dict) – passed to compute_gof() – see this function for more detail on goodness-of-fit calculation options

Example:

sim = cv.Sim()
sim.run()
fit = sim.compute_fit()
fit.plot()

compute()[source]¶: Perform all required computations

reconcile_inputs(verbose=False)[source]¶: Find matching keys and indices between the model and the data

compute_diffs(absolute=False)[source]¶: Find the differences between the sim and the data

compute_gofs(**kwargs)[source]¶: Compute the goodness-of-fit

compute_losses()[source]¶: Compute the weighted goodness-of-fit

compute_mismatch(use_median=False)[source]¶: Compute the final mismatch

plot(keys=None, width=0.8, font_size=18, fig_args=None, axis_args=None, plot_args=None, do_show=True)[source]¶

Plot the fit of the model to the data. For each result, plot the data and the model; the difference; and the loss (weighted difference). Also plots the loss as a function of time.

Parameters:

keys (list) – which keys to plot (default, all)
width (float) – bar width
font_size (float) – size of font
fig_args (dict) – passed to pl.figure()
axis_args (dict) – passed to pl.subplots_adjust()
plot_args (dict) – passed to pl.plot()
do_show (bool) – whether to show the plot

compute_gof(actual, predicted, normalize=True, use_frac=False, use_squared=False, as_scalar='none', eps=1e-09, skestimator=None, **kwargs)[source]¶

Calculate the goodness of fit. By default, this uses normalized absolute error, but the calculation is highly customizable. For example, mean squared error is equivalent to setting normalize=False, use_squared=True, as_scalar='mean'.

Parameters:

actual (arr) – array of actual (data) points
predicted (arr) – corresponding array of predicted (model) points
normalize (bool) – whether to divide the values by the largest value in either series
use_frac (bool) – convert to fractional mismatches rather than absolute
use_squared (bool) – square the mismatches
as_scalar (str) – return as a scalar instead of a time series: choices are 'sum', 'mean', 'median'; the default, 'none', returns the time series
eps (float) – to avoid divide-by-zero
skestimator (str) – if provided, use this scikit-learn estimator instead
kwargs (dict) – passed to the scikit-learn estimator

Returns:

array of goodness-of-fit values, or a single value if as_scalar is 'sum', 'mean', or 'median'

Return type:

gofs (arr)

Examples:

import numpy as np

x1 = np.cumsum(np.random.random(100))
x2 = np.cumsum(np.random.random(100))

e1 = compute_gof(x1, x2) # Default, normalized absolute error
e2 = compute_gof(x1, x2, normalize=False, use_frac=True) # Fractional error
e3 = compute_gof(x1, x2, normalize=False, use_squared=True, as_scalar='mean') # Mean squared error
e4 = compute_gof(x1, x2, skestimator='mean_squared_error') # Scikit-learn's MSE method
e5 = compute_gof(x1, x2, as_scalar='median') # Normalized median absolute error -- highly robust
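To make the options concrete, here is a minimal sketch of how the normalize, use_squared, and as_scalar options might combine. The function gof_sketch is hypothetical and written only from the parameter descriptions above; it is not the library's implementation and omits use_frac, eps, and skestimator.

```python
import numpy as np

def gof_sketch(actual, predicted, normalize=True, use_squared=False, as_scalar='none'):
    """Hypothetical re-implementation of a subset of the options, for illustration only."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    gofs = np.abs(actual - predicted)  # Absolute error at each point
    if normalize:
        # Divide by the largest value in either series, following the parameter description
        max_val = max(np.abs(actual).max(), np.abs(predicted).max())
        if max_val > 0:
            gofs = gofs / max_val
    if use_squared:
        gofs = gofs**2
    reducers = {'sum': np.sum, 'mean': np.mean, 'median': np.median}
    if as_scalar in reducers:
        return reducers[as_scalar](gofs)  # Collapse the series to a single value
    return gofs  # as_scalar='none': keep one value per point

rng = np.random.default_rng(0)
x1 = np.cumsum(rng.random(100))
x2 = np.cumsum(rng.random(100))

default_gofs = gof_sketch(x1, x2)  # Normalized absolute error, one value per point
mse = gof_sketch(x1, x2, normalize=False, use_squared=True, as_scalar='mean')
mse_direct = np.mean((x1 - x2)**2)  # Textbook mean squared error, for comparison
```

Under these assumptions, mse and mse_direct agree, which is the equivalence the description states for normalize=False, use_squared=True, as_scalar='mean'.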

diff_summaries(sim1, sim2, skip_key_diffs=False, output=False, die=False)[source]¶

Compute the difference of the summaries of two FPsim calibration objects, and print any values which differ.

Parameters:

sim1 (sim/dict) – the calib.summary dictionary, representing a single sim
sim2 (sim/dict) – ditto
skip_key_diffs (bool) – whether to skip keys that don’t match between sims
output (bool) – whether to return the output as a string (otherwise print)
die (bool) – whether to raise an exception if the sims don’t match
require_run (bool) – require that the simulations have been run

Example:

c1 = fp.Calibration()
c2 = fp.Calibration()
c1.run()
c2.run()
fp.diff_summaries(c1.summarize(), c2.summarize())
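The comparison itself can be pictured with plain dictionaries. The sketch below is a hypothetical helper, diff_dicts, written from the parameter descriptions above rather than from the FPsim source; the summary keys and values are made up, and it always returns the report instead of printing it.

```python
def diff_dicts(d1, d2, skip_key_diffs=False, die=False):
    """Hypothetical helper: report keys and values that differ between two summary dicts."""
    lines = []
    if not skip_key_diffs:
        only1 = sorted(set(d1) - set(d2))  # Keys present only in the first summary
        only2 = sorted(set(d2) - set(d1))  # Keys present only in the second summary
        if only1 or only2:
            lines.append(f'Keys do not match: only in first {only1}, only in second {only2}')
    for key in sorted(set(d1) & set(d2)):  # Compare values for the shared keys
        if d1[key] != d2[key]:
            lines.append(f'{key}: {d1[key]} vs {d2[key]}')
    report = '\n'.join(lines)
    if report and die:
        raise ValueError(report)  # Mirror the documented "die" behavior
    return report

s1 = {'pop_size': 1000, 'mcpr': 0.25, 'tfr': 4.1}  # Made-up summary values
s2 = {'pop_size': 1000, 'mcpr': 0.27, 'mmr': 300}
report = diff_dicts(s1, s2)
```

An empty report means the two summaries match; passing die=True turns any difference into an exception, which is convenient in regression tests.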