Calibration

class Calibration(sim, datafiles, calib_pars=None, genotype_pars=None, hiv_pars=None, fit_args=None, extra_sim_result_keys=None, par_samplers=None, n_trials=None, n_workers=None, total_trials=None, name=None, db_name=None, estimator=None, keep_db=None, storage=None, rand_seed=None, sampler=None, label=None, die=False, verbose=True)

Bases: prettyobj

A class to handle calibration of HPVsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org), which must be installed separately (via pip install optuna).

Note: running a calibration does not guarantee a good fit! You must ensure that you run for a sufficient number of iterations, have enough free parameters, and that the parameters have wide enough bounds. Please see the tutorial on calibration for more information.

Parameters:
  • sim (Sim) – the simulation to calibrate

  • datafiles (list) – list of datafile strings to calibrate to

  • calib_pars (dict) – a dictionary of the parameters to calibrate, in the format dict(key1=[best, low, high])

  • genotype_pars (dict) – a dictionary of the genotype-specific parameters to calibrate, in the format dict(genotype=dict(key1=[best, low, high]))

  • hiv_pars (dict) – a dictionary of the HIV-specific parameters to calibrate, in the format dict(key1=[best, low, high])

  • extra_sim_result_keys (list) – list of additional sim result keys to store

  • fit_args (dict) – a dictionary of options that are passed to sim.compute_fit() to calculate the goodness-of-fit

  • par_samplers (dict) – an optional mapping from parameters to the Optuna sampler to use for choosing new points for each; by default, suggest_float

  • n_trials (int) – the number of trials per worker

  • n_workers (int) – the number of parallel workers (default: maximum)

  • total_trials (int) – if n_trials is not supplied, n_trials is calculated by dividing this number by n_workers

  • name (str) – the name of the database (default: ‘hpvsim_calibration’)

  • db_name (str) – the name of the database file (default: ‘hpvsim_calibration.db’)

  • keep_db (bool) – whether to keep the database after calibration (default: False)

  • storage (str) – the location of the database (default: sqlite)

  • rand_seed (int) – if provided, use this random seed to initialize Optuna runs (for reproducibility)

  • label (str) – a label for this calibration object

  • die (bool) – whether to stop if an exception is encountered (default: False)

  • verbose (bool) – whether to print details of the calibration

  • kwargs (dict) – passed to hpv.Calibration()
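Since the optimizer searches between low and high, an entry whose first value (which appears to serve as the starting point) lies outside its own bounds is usually a typo. The sketch below is a hypothetical helper, not part of HPVsim, that checks dicts in the documented [best, low, high] format, including the extra nesting level used by genotype_pars.

```python
def check_bounds(pars, path=()):
    """Hypothetical helper (not part of HPVsim): recursively validate
    [best, low, high] entries and return a list of error messages."""
    errors = []
    for key, val in pars.items():
        where = path + (key,)
        label = ".".join(map(str, where))
        if isinstance(val, dict):
            # genotype_pars nests one level deeper: dict(genotype=dict(key=[...]))
            errors.extend(check_bounds(val, where))
        elif len(val) != 3:
            errors.append(f"{label}: expected [best, low, high]")
        else:
            best, low, high = val
            if not low <= best <= high:
                errors.append(f"{label}: best={best} outside [{low}, {high}]")
    return errors

calib_pars = dict(beta=[0.05, 0.010, 0.20], hpv_control_prob=[.9, 0.5, 1])
genotype_pars = dict(hpv16=dict(prog_time=[3, 3, 10]))
print(check_bounds(calib_pars))                   # []
print(check_bounds(genotype_pars))                # []
print(check_bounds(dict(beta=[0.5, 0.01, 0.2])))  # ['beta: best=0.5 outside [0.01, 0.2]']
```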

Returns:

A Calibration object

Example:

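import hpvsim as hpv

pars = dict(n_agents=10e3, start=1980, end=2020)  # illustrative sim parameters
sim = hpv.Sim(pars, genotypes=[16, 18])
calib_pars = dict(beta=[0.05, 0.010, 0.20], hpv_control_prob=[.9, 0.5, 1])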
calib = hpv.Calibration(sim, calib_pars=calib_pars,
                        datafiles=['test_data/south_africa_hpv_data.xlsx',
                                   'test_data/south_africa_cancer_data.xlsx'],
                        total_trials=10, n_workers=4)
calib.calibrate()
calib.plot()
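The example passes total_trials=10 with n_workers=4, which does not divide evenly. The sketch below shows one plausible way to turn total_trials into a per-worker n_trials; rounding up is an assumption of this sketch (check the HPVsim source for the exact rule), and trials_per_worker is a hypothetical name.

```python
import math

def trials_per_worker(n_trials=None, total_trials=None, n_workers=1):
    """Hypothetical sketch: derive the per-worker trial count.

    Rounding up (math.ceil) is an assumption, not confirmed HPVsim behavior.
    """
    if n_trials is not None:
        return n_trials  # an explicit n_trials takes precedence
    if total_trials is None:
        raise ValueError("supply either n_trials or total_trials")
    return math.ceil(total_trials / n_workers)

# total_trials=10 spread over n_workers=4
print(trials_per_worker(total_trials=10, n_workers=4))  # 3 (12 trials in total)
```

Rounding up guarantees at least total_trials trials overall at the cost of a few extra; rounding down would instead run fewer than requested.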

Methods

run_sim(calib_pars=None, genotype_pars=None, hiv_pars=None, label=None, return_sim=False)

Create and run a simulation

static update_dict_pars(name_pars, value_pars)

Update the parameters in a nested dict (name_pars) with the values found at the same keys in another nested dict (value_pars).
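A minimal sketch of this kind of nested update, assuming both arguments share the same key structure: every leaf of name_pars is replaced by the value at the same key path in value_pars, and keys that exist only in value_pars are ignored. The helpers flatten and update_nested are hypothetical, not HPVsim functions.

```python
import copy

def flatten(d, prefix=()):
    """Yield (key_path, value) pairs for every leaf of a nested dict."""
    for key, val in d.items():
        if isinstance(val, dict):
            yield from flatten(val, prefix + (key,))
        else:
            yield prefix + (key,), val

def update_nested(name_pars, value_pars):
    """Return a copy of name_pars whose leaves take their values from value_pars."""
    new_pars = copy.deepcopy(name_pars)
    for path, _ in flatten(name_pars):
        src, dst = value_pars, new_pars
        for key in path[:-1]:  # walk down to the parent of the leaf
            src, dst = src[key], dst[key]
        dst[path[-1]] = src[path[-1]]
    return new_pars

name_pars = dict(hpv16=dict(prog_time=[3, 3, 10]))
value_pars = dict(hpv16=dict(prog_time=4.2, other=1), hpv18=dict(prog_time=5))
print(update_nested(name_pars, value_pars))  # {'hpv16': {'prog_time': 4.2}}
```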

update_dict_pars_from_trial(name_pars, value_pars)

Update the parameters in a nested dict (name_pars) with the corresponding values from a trial's parameters (value_pars).
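Optuna trials store their parameters in a flat dict keyed by name, so a nested structure such as genotype_pars has to be mapped onto flat names and back. The sketch below assumes a naming convention in which the nested key path is joined with an underscore (for example hpv16 and prog_time become hpv16_prog_time); both that convention and the function name update_from_trial are hypothetical.

```python
def update_from_trial(name_pars, trial_values, prefix="", sep="_"):
    """Fill each leaf of nested name_pars from a flat dict of trial values.

    Assumes flat names are the nested key path joined with `sep`
    (a hypothetical convention, e.g. 'hpv16_prog_time').
    """
    new_pars = {}
    for key, val in name_pars.items():
        flat_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(val, dict):
            new_pars[key] = update_from_trial(val, trial_values, flat_key, sep)
        else:
            new_pars[key] = trial_values[flat_key]
    return new_pars

genotype_pars = dict(hpv16=dict(prog_time=[3, 3, 10]))
trial_values = {"hpv16_prog_time": 6.5}  # e.g. what a trial's params might hold
print(update_from_trial(genotype_pars, trial_values))  # {'hpv16': {'prog_time': 6.5}}
```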

update_dict_pars_init_and_bounds(initial_pars, par_bounds, target_pars)

Update the initial parameter values and parameter bounds from a dict of trial parameters (target_pars).
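A minimal sketch of splitting [best, low, high] entries into separate initial-value and bounds dicts, keeping the documented argument order (initial_pars, par_bounds, target_pars). Flattening nested keys with an underscore is an assumption of this sketch, and init_and_bounds is a hypothetical name.

```python
def init_and_bounds(initial_pars, par_bounds, target_pars, prefix="", sep="_"):
    """Split [best, low, high] entries into initial values and (low, high) bounds.

    Nested keys are flattened by joining them with `sep` (an assumed convention).
    Both output dicts are updated in place and also returned.
    """
    for key, val in target_pars.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(val, dict):
            init_and_bounds(initial_pars, par_bounds, val, name, sep)
        else:
            best, low, high = val
            initial_pars[name] = best
            par_bounds[name] = (low, high)
    return initial_pars, par_bounds

calib_pars = dict(beta=[0.05, 0.010, 0.20])
genotype_pars = dict(hpv16=dict(prog_time=[3, 3, 10]))
init, bounds = init_and_bounds({}, {}, calib_pars)
init, bounds = init_and_bounds(init, bounds, genotype_pars)
print(init)    # {'beta': 0.05, 'hpv16_prog_time': 3}
print(bounds)  # {'beta': (0.01, 0.2), 'hpv16_prog_time': (3, 10)}
```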

get_full_pars(sim=None, calib_pars=None, genotype_pars=None, hiv_pars=None)

Make a full parameter dict from the subsets of regular sim parameters, genotype parameters, and HIV parameters used in calibration.

trial_pars_to_sim_pars(trial_pars=None, which_pars=None, return_full=True)

Create genotype_pars and pars dicts from the trial parameters. Note: this method is not used during calibrate().

Parameters:
  • trial_pars (dict) – dictionary of parameters from a single trial. If not provided, the best parameters will be used

  • return_full (bool) – whether to return a unified par dict ready for use in a sim, or the sim pars and genotype pars separately

Example:

import hpvsim as hpv

sim = hpv.Sim(genotypes=[16, 18])
calib_pars = dict(beta=[0.05, 0.010, 0.20], hpv_control_prob=[.9, 0.5, 1])
genotype_pars = dict(hpv16=dict(prog_time=[3, 3, 10]))
calib = hpv.Calibration(sim, calib_pars=calib_pars, genotype_pars=genotype_pars,
                        datafiles=['test_data/south_africa_hpv_data.xlsx',
                                   'test_data/south_africa_cancer_data.xlsx'],
                        total_trials=10, n_workers=4)
calib.calibrate()
new_pars = calib.trial_pars_to_sim_pars()  # Returns the best parameters from calibration in a format ready for running a sim
sim.update_pars(new_pars)
sim.run()

sim_to_sample_pars()

Convert sim pars to sample pars

trial_to_sim_pars(pardict=None, trial=None)

Take an Optuna trial and sample values for the parameters in pardict, after extracting them from the nested structure in which they are provided.
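The sampling step can be sketched as follows, assuming the parameters have already been flattened into (low, high) bounds and that par_samplers maps a parameter name to the name of an Optuna trial method (default suggest_float, as documented for the constructor). Optuna's Trial does provide suggest_float(name, low, high) and suggest_int(name, low, high); RandomTrial is a hypothetical stand-in so the sketch runs without Optuna installed.

```python
import random

def sample_pars(trial, par_bounds, par_samplers=None):
    """Draw one value per parameter from its (low, high) bounds.

    `trial` only needs Optuna-style suggest_* methods; par_samplers maps a
    parameter name to a method name (assumed format; default suggest_float).
    """
    par_samplers = par_samplers or {}
    sampled = {}
    for name, (low, high) in par_bounds.items():
        method = getattr(trial, par_samplers.get(name, "suggest_float"))
        sampled[name] = method(name, low, high)
    return sampled

class RandomTrial:
    """Hypothetical stand-in for optuna.trial.Trial (uniform random sampling)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
    def suggest_float(self, name, low, high):
        return self.rng.uniform(low, high)
    def suggest_int(self, name, low, high):
        return self.rng.randint(low, high)  # inclusive, like Optuna's suggest_int

bounds = {"beta": (0.01, 0.2), "hpv16_prog_time": (3, 10)}
pars = sample_pars(RandomTrial(seed=1), bounds, par_samplers={"hpv16_prog_time": "suggest_int"})
print(pars)
```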

run_trial(trial, save=True)

Define the objective for Optuna
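Optuna calls the objective once per trial and, for a study created with the default direction, minimizes the value it returns. The sketch below shows the general shape of such an objective under stated assumptions: a toy model and a sum of squared errors stand in for running an HPVsim sim and calling sim.compute_fit(), and make_objective, toy_model, and MidpointTrial are all hypothetical.

```python
def make_objective(run_model, data, par_bounds):
    """Build an Optuna-style objective: trial -> scalar mismatch (lower is better)."""
    def objective(trial):
        pars = {name: trial.suggest_float(name, low, high)
                for name, (low, high) in par_bounds.items()}
        predictions = run_model(pars)
        # Sum of squared errors as a stand-in for sim.compute_fit()
        return sum((predictions[key] - data[key]) ** 2 for key in data)
    return objective

def toy_model(pars):
    """Hypothetical model: prevalence is simply twice beta."""
    return {"prevalence": 2 * pars["beta"]}

class MidpointTrial:
    """Hypothetical stand-in for an Optuna trial that always picks the midpoint."""
    def suggest_float(self, name, low, high):
        return (low + high) / 2

objective = make_objective(toy_model, {"prevalence": 0.2}, {"beta": (0.0, 0.2)})
print(objective(MidpointTrial()))  # 0.0
```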

worker()

Run a single worker

run_workers()

Run multiple workers in parallel
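Optuna lets several workers contribute trials to one study through a shared storage backend, which is why the class takes db_name and storage. The sketch below only illustrates the worker pattern: a thread pool stands in for whatever process-level parallelism HPVsim actually uses, a locked list stands in for the shared study database, and run_workers here is a hypothetical function, not the method itself.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

def run_workers(objective, n_workers, n_trials, seed=0):
    """Run n_workers workers, each evaluating n_trials random trials.

    A locked list stands in for the shared study database.
    """
    results = []
    lock = threading.Lock()

    def worker(worker_id):
        rng = random.Random(seed + worker_id)  # independent stream per worker
        for _ in range(n_trials):
            x = rng.uniform(0, 1)
            value = objective(x)
            with lock:
                results.append((worker_id, x, value))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(worker, range(n_workers)))  # list() re-raises worker errors
    return results

results = run_workers(lambda x: (x - 0.3) ** 2, n_workers=4, n_trials=3)
best = min(results, key=lambda r: r[2])
print(len(results))  # 12
```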

remove_db()

Remove the database file if keep_db is false and the path exists.
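The rule described above (delete only when keep_db is false and the file exists) can be sketched as a standalone function. The function signature and the printed message are hypothetical; the real method reads these settings from the Calibration object.

```python
import os
import tempfile

def remove_db(db_name, keep_db=False, verbose=True):
    """Delete db_name unless keep_db is True; return whether a file was removed."""
    if keep_db or not os.path.exists(db_name):
        return False
    os.remove(db_name)
    if verbose:
        print(f"Removed {db_name}")
    return True

# Usage: create a throwaway file, then remove it
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
removed = remove_db(path, verbose=False)
print(removed, os.path.exists(path))  # True False
```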

make_study()

Make a study, deleting one if it already exists

calibrate(calib_pars=None, genotype_pars=None, hiv_pars=None, verbose=True, load=True, tidyup=True, **kwargs)

Perform the calibration.

Parameters:
  • calib_pars (dict) – if supplied, overwrite stored calib_pars

  • verbose (bool) – whether to print output from each trial

  • kwargs (dict) – if supplied, overwrite stored run_args (n_trials, n_workers, etc.)
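The kwargs override described above amounts to merging new values over the stored run_args. In this sketch, merge_run_args is hypothetical, the run_args contents are illustrative, and rejecting unknown keys (rather than silently accepting them) is a design choice of the sketch, not confirmed HPVsim behavior.

```python
def merge_run_args(stored, **kwargs):
    """Return a copy of stored run_args with any supplied kwargs overriding them.

    Unknown keys raise KeyError so that typos are not silently ignored.
    """
    unknown = set(kwargs) - set(stored)
    if unknown:
        raise KeyError(f"Unknown run_args: {sorted(unknown)}")
    merged = dict(stored)
    merged.update(kwargs)
    return merged

run_args = dict(n_trials=3, n_workers=4)
print(merge_run_args(run_args, n_workers=8))  # {'n_trials': 3, 'n_workers': 8}
```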

parse_study(study)

Parse the study into a data frame – called automatically
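A study's finished trials can be flattened into one row per trial and ranked by fit. Optuna's study.trials returns FrozenTrial objects with number, value, and params attributes; the sketch below uses SimpleNamespace stand-ins with those same attribute names so it runs without Optuna, and the column names index and mismatch are hypothetical.

```python
from types import SimpleNamespace

def parse_trials(trials):
    """Flatten finished trials into rows sorted from best (lowest mismatch) to worst.

    Each trial needs .number, .value and .params, like Optuna's FrozenTrial.
    """
    rows = []
    for t in trials:
        if t.value is None:  # skip trials that failed or never finished
            continue
        rows.append({"index": t.number, "mismatch": t.value, **t.params})
    return sorted(rows, key=lambda row: row["mismatch"])

trials = [
    SimpleNamespace(number=0, value=2.5, params={"beta": 0.12}),
    SimpleNamespace(number=1, value=None, params={"beta": 0.18}),
    SimpleNamespace(number=2, value=0.7, params={"beta": 0.06}),
]
for row in parse_trials(trials):
    print(row)
```

If pandas is available, pandas.DataFrame(rows) turns these rows directly into a data frame.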

to_json(filename=None, indent=2, **kwargs)

Convert the data to JSON.
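A minimal sketch mirroring the filename and indent arguments, assuming the calibration data is already made of plain dicts, lists, and numbers. Forwarding kwargs to json.dumps is an assumption of this sketch, and this to_json is a standalone hypothetical function rather than the method.

```python
import json

def to_json(data, filename=None, indent=2, **kwargs):
    """Return data as a JSON string, or write it to filename if one is given.

    Extra kwargs are forwarded to json.dumps (an assumption of this sketch).
    """
    output = json.dumps(data, indent=indent, **kwargs)
    if filename is None:
        return output
    with open(filename, "w") as f:
        f.write(output)
    return filename

print(to_json({"beta": 0.06}))
```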

plot(res_to_plot=None, fig_args=None, axis_args=None, data_args=None, show_args=None, do_save=None, fig_path=None, do_show=True, plot_type='sns.boxplot', **kwargs)

Plot the calibration results

Parameters:
  • res_to_plot (int) – number of results to plot; if None, plot them all

  • fig_args (dict) – passed to pl.figure()

  • axis_args (dict) – passed to pl.subplots_adjust()

  • data_args (dict) – ‘width’, ‘color’, and ‘offset’ arguments for the data

  • do_save (bool) – whether to save

  • fig_path (str or filepath) – filepath to save to

  • do_show (bool) – whether to show the figure

  • kwargs (dict) – passed to hpv.options.with_style(); see that function for choices