fpsim.experiment module

Define classes and functions for the Experiment class (running sims and comparing them to data)

class Experiment(pars=None, flags=None, label=None, **kwargs)[source]

Bases: prettyobj

Class for running calibration to data. Effectively, it runs a single sim and compares it to data.

Parameters:
  • pars (dict) – dictionary of parameters

  • flags (dict) – which analyses to run; see fp.experiment.default_flags for options

  • label (str) – label of experiment

  • kwargs (dict) – passed into pars

load_data(key, **kwargs)[source]

Load data from various formats

extract_data()[source]

Load data

pop_growth_rate(years, population)[source]
run_model(pars=None, **kwargs)[source]

Create the sim and run the model

post_process_sim()[source]
extract_model()[source]
model_pop_size()[source]
model_mcpr()[source]
model_mmr()[source]

Calculate maternal mortality in model over most recent 3 years
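As a rough illustration of the calculation described above, the sketch below pools the three most recent years and reports maternal deaths per 100,000 live births, which is the conventional definition of the maternal mortality ratio. The arrays are hypothetical, and pooling the years (rather than averaging annual ratios) is an assumption, not a statement about how model_mmr() is implemented.

```python
import numpy as np

# Hypothetical annual model outputs, oldest year first (not real FPsim results)
maternal_deaths = np.array([4, 5, 3, 6, 4])
live_births = np.array([1000, 1100, 1050, 1200, 1150])

# Select the most recent 3 years
recent = slice(-3, None)

# Maternal deaths per 100,000 live births, pooled over the selected years (assumed)
mmr = maternal_deaths[recent].sum() / live_births[recent].sum() * 100_000
print(f"MMR over most recent 3 years: {mmr:.1f}")
```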

model_infant_mortality_rate()[source]
model_crude_death_rate()[source]
model_crude_birth_rate()[source]
model_data_tfr()[source]
model_data_asfr(ind=-1)[source]
extract_skyscrapers()[source]
extract_birth_spacing()[source]
extract_methods()[source]
extract_age_pregnancy()[source]
compute_fit(*args, **kwargs)[source]

Compute how good the fit is

post_process_results(keep_people=False, compute_fit=True, **kwargs)[source]

Compare the model and the data

run(pars=None, keep_people=False, compute_fit=True, **kwargs)[source]

Run the model and post-process the results

compare()[source]

Create and print a comparison between model and data

summarize(as_df=False)[source]

Convert results to a one-number-per-key summary format. Returns summary, also saves to self.summary.

Parameters:
  • as_df (bool) – if True, return a dataframe instead of a dict.

to_json(filename=None, tostring=False, indent=2, verbose=False, **kwargs)[source]

Export results as JSON.

Parameters:
  • filename (str) – if None, return string; else, write to file

  • tostring (bool) – if not writing to file, whether to write to string (alternative is sanitized dictionary)

  • indent (int) – if writing to file, how many indents to use per nested level

  • verbose (bool) – detail to print

  • kwargs (dict) – passed to savejson()

Returns:

A unicode string containing a JSON representation of the results (or a sanitized dictionary if tostring is False); if filename is provided, the JSON is written to disk instead

Examples:

json_str = exp.to_json() # Return the results as a JSON string
exp.to_json('results.json') # Write the results to a file

plot(do_show=None, do_save=None, filename='fp_experiment.png', axis_args=None, do_maximize=True)[source]

Plot the model against the data

class Fit(data, sim, weights=None, keys=None, custom=None, compute=True, verbose=False, **kwargs)[source]

Bases: prettyobj

A class for calculating the fit between the model and the data. Note the following terminology is used here:

  • fit: nonspecific term for how well the model matches the data

  • difference: the absolute numerical differences between the model and the data (one time series per result)

  • goodness-of-fit: the result of passing the difference through a statistical function, such as mean squared error

  • loss: the goodness-of-fit for each result multiplied by user-specified weights (one time series per result)

  • mismatches: the sum of all the losses (a single scalar value per time series)

  • mismatch: the sum of the mismatches – this is the value to be minimized during calibration
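The terminology above can be read as a pipeline from raw differences to a single number. The following is a minimal sketch of that pipeline in plain NumPy, not the Fit class itself: the result keys, values, and weights are hypothetical, and normalized absolute error is assumed as the goodness-of-fit function.

```python
import numpy as np

# Hypothetical data and model outputs, keyed by result name
data = {"pop_size": np.array([100.0, 110.0, 120.0]),
        "mcpr": np.array([0.20, 0.25, 0.30])}
model = {"pop_size": np.array([100.0, 115.0, 120.0]),
         "mcpr": np.array([0.20, 0.20, 0.30])}
weights = {"pop_size": 1.0, "mcpr": 2.0}  # hypothetical user-specified weights

diffs, gofs, losses, mismatches = {}, {}, {}, {}
for key in data:
    # difference: model minus data, one time series per result
    diffs[key] = model[key] - data[key]
    # goodness-of-fit: here, absolute error divided by the largest value (assumed)
    scale = max(data[key].max(), model[key].max())
    gofs[key] = np.abs(diffs[key]) / scale
    # loss: goodness-of-fit multiplied by the weight for this result
    losses[key] = gofs[key] * weights[key]
    # mismatches: one scalar per result, the sum of its losses
    mismatches[key] = losses[key].sum()

# mismatch: the single value to be minimized during calibration
mismatch = sum(mismatches.values())
print(mismatches, mismatch)
```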

Parameters:
  • sim (Sim) – the sim object

  • weights (dict) – the relative weight to place on each result (by default: 10 for deaths, 5 for diagnoses, 1 for everything else)

  • keys (list) – the keys to use in the calculation

  • custom (dict) – a custom dictionary of additional data to fit; format is e.g. {'my_output':{'data':[1,2,3], 'sim':[1,2,4], 'weights':2.0}}

  • compute (bool) – whether to compute the mismatch immediately

  • verbose (bool) – detail to print

  • kwargs (dict) – passed to compute_gof() – see this function for more detail on goodness-of-fit calculation options

Example:

sim = cv.Sim()
sim.run()
fit = sim.compute_fit()
fit.plot()

compute()[source]

Perform all required computations

reconcile_inputs(verbose=False)[source]

Find matching keys and indices between the model and the data

compute_diffs(absolute=False)[source]

Find the differences between the sim and the data

compute_gofs(**kwargs)[source]

Compute the goodness-of-fit

compute_losses()[source]

Compute the weighted goodness-of-fit

compute_mismatch(use_median=False)[source]

Compute the final mismatch

plot(keys=None, width=0.8, font_size=18, fig_args=None, axis_args=None, plot_args=None, do_show=True)[source]

Plot the fit of the model to the data. For each result, plot the data and the model; the difference; and the loss (weighted difference). Also plots the loss as a function of time.

Parameters:
  • keys (list) – which keys to plot (default, all)

  • width (float) – bar width

  • font_size (float) – size of font

  • fig_args (dict) – passed to pl.figure()

  • axis_args (dict) – passed to pl.subplots_adjust()

  • plot_args (dict) – passed to pl.plot()

  • do_show (bool) – whether to show the plot

compute_gof(actual, predicted, normalize=True, use_frac=False, use_squared=False, as_scalar='none', eps=1e-09, skestimator=None, **kwargs)[source]

Calculate the goodness of fit. By default, use normalized absolute error, but this is highly customizable. For example, mean squared error is equivalent to setting normalize=False, use_squared=True, as_scalar='mean'.

Parameters:
  • actual (arr) – array of actual (data) points

  • predicted (arr) – corresponding array of predicted (model) points

  • normalize (bool) – whether to divide the values by the largest value in either series

  • use_frac (bool) – convert to fractional mismatches rather than absolute

  • use_squared (bool) – square the mismatches

  • as_scalar (str) – return as a scalar instead of a time series: choices are sum, mean, median

  • eps (float) – to avoid divide-by-zero

  • skestimator (str) – if provided, use this scikit-learn estimator instead

  • kwargs (dict) – passed to the scikit-learn estimator

Returns:

array of goodness-of-fit values, or a single value if as_scalar is 'sum', 'mean', or 'median'

Return type:

gofs (arr)
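To make the options above concrete, here is a simplified stand-in written in plain NumPy. The function name sketch_gof is hypothetical, it covers only a subset of the parameters, and the order in which normalization and squaring are applied is an assumption; it is not the library's implementation.

```python
import numpy as np

def sketch_gof(actual, predicted, normalize=True, use_squared=False, as_scalar="none"):
    """Illustrative stand-in for compute_gof (hypothetical, simplified)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    # Start from the absolute error at each point
    gofs = np.abs(actual - predicted)

    # Optionally divide by the largest value in either series
    if normalize:
        largest = max(actual.max(), predicted.max())
        if largest > 0:
            gofs = gofs / largest

    # Optionally square the mismatches
    if use_squared:
        gofs = gofs ** 2

    # Optionally reduce the series to a single number
    reducers = {"sum": np.sum, "mean": np.mean, "median": np.median}
    if as_scalar in reducers:
        return reducers[as_scalar](gofs)
    return gofs

actual = [1.0, 2.0, 4.0]
predicted = [1.0, 3.0, 2.0]
series = sketch_gof(actual, predicted)  # normalized absolute error, one value per point
mse = sketch_gof(actual, predicted, normalize=False, use_squared=True, as_scalar="mean")
```

With these settings, the second call reduces to the mean of the squared differences, which is the equivalence the description mentions.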

Examples:

import numpy as np

x1 = np.cumsum(np.random.random(100))
x2 = np.cumsum(np.random.random(100))

e1 = compute_gof(x1, x2) # Default, normalized absolute error
e2 = compute_gof(x1, x2, normalize=False, use_frac=True) # Fractional error
e3 = compute_gof(x1, x2, normalize=False, use_squared=True, as_scalar='mean') # Mean squared error
e4 = compute_gof(x1, x2, skestimator='mean_squared_error') # Scikit-learn's MSE method
e5 = compute_gof(x1, x2, as_scalar='median') # Normalized median absolute error -- highly robust

diff_summaries(sim1, sim2, skip_key_diffs=False, output=False, die=False)[source]

Compute the difference of the summaries of two FPsim calibration objects, and print any values which differ.

Parameters:
  • sim1 (sim/dict) – the calib.summary dictionary, representing a single sim

  • sim2 (sim/dict) – ditto

  • skip_key_diffs (bool) – whether to skip keys that don’t match between sims

  • output (bool) – whether to return the output as a string (otherwise print)

  • die (bool) – whether to raise an exception if the sims don’t match

  • require_run (bool) – require that the simulations have been run

Example:

c1 = fp.Calibration()
c2 = fp.Calibration()
c1.run()
c2.run()
fp.diff_summaries(c1.summarize(), c2.summarize())
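The comparison itself can be pictured with plain dictionaries. The sketch below is a hypothetical helper (sketch_diff is not part of FPsim) that collects keys whose values differ between two summary dicts; the keys and values shown are made up for illustration.

```python
def sketch_diff(summary1, summary2):
    """Return {key: (value1, value2)} for keys whose values differ (hypothetical helper)."""
    missing = object()  # sentinel for keys present in only one summary
    diffs = {}
    for key in sorted(set(summary1) | set(summary2)):
        v1 = summary1.get(key, missing)
        v2 = summary2.get(key, missing)
        if v1 != v2:
            diffs[key] = (None if v1 is missing else v1,
                          None if v2 is missing else v2)
    return diffs

# Hypothetical summaries from two runs
s1 = {"pop_size": 1000, "mcpr": 0.25, "tfr": 4.1}
s2 = {"pop_size": 1000, "mcpr": 0.27, "tfr": 4.1}

result = sketch_diff(s1, s2)
for key, (v1, v2) in result.items():
    print(f"{key}: {v1} != {v2}")
```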