hpvsim.calibration module
Define the calibration class
- class Calibration(sim, datafiles, calib_pars=None, genotype_pars=None, hiv_pars=None, fit_args=None, extra_sim_result_keys=None, par_samplers=None, n_trials=None, n_workers=None, total_trials=None, name=None, db_name=None, estimator=None, keep_db=None, storage=None, rand_seed=None, sampler=None, label=None, die=False, verbose=True)
Bases: prettyobj
A class to handle calibration of HPVsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org), which must be installed separately (via pip install optuna).
Note: running a calibration does not guarantee a good fit! You must ensure that you run for a sufficient number of iterations, have enough free parameters, and that the parameters have wide enough bounds. Please see the tutorial on calibration for more information.
- Parameters:
sim (Sim) – the simulation to calibrate
datafiles (list) – list of datafile strings to calibrate to
calib_pars (dict) – a dictionary of the parameters to calibrate of the format dict(key1=[best, low, high])
genotype_pars (dict) – a dictionary of the genotype-specific parameters to calibrate of the format dict(genotype=dict(key1=[best, low, high]))
hiv_pars (dict) – a dictionary of the HIV-specific parameters to calibrate of the format dict(key1=[best, low, high])
extra_sim_result_keys (list) – list of result strings to store
fit_args (dict) – a dictionary of options that are passed to sim.compute_fit() to calculate the goodness-of-fit
par_samplers (dict) – an optional mapping from parameters to the Optuna sampler to use for choosing new points for each; by default, suggest_float
n_trials (int) – the number of trials per worker
n_workers (int) – the number of parallel workers (default: maximum)
total_trials (int) – if n_trials is not supplied, calculate it by dividing this number by n_workers
name (str) – the name of the database (default: ‘hpvsim_calibration’)
db_name (str) – the name of the database file (default: ‘hpvsim_calibration.db’)
keep_db (bool) – whether to keep the database after calibration (default: False)
storage (str) – the location of the database (default: sqlite)
rand_seed (int) – if provided, use this random seed to initialize Optuna runs (for reproducibility)
label (str) – a label for this calibration object
die (bool) – whether to stop if an exception is encountered (default: False)
verbose (bool) – whether to print details of the calibration
kwargs (dict) – passed to hpv.Calibration()
- Returns:
A Calibration object
Example:
    import hpvsim as hpv

    # pars is assumed to be a dict of simulation parameters defined elsewhere
    sim = hpv.Sim(pars, genotypes=[16, 18])
    calib_pars = dict(beta=[0.05, 0.010, 0.20], hpv_control_prob=[.9, 0.5, 1])
    calib = hpv.Calibration(sim, calib_pars=calib_pars,
                            datafiles=['test_data/south_africa_hpv_data.xlsx',
                                       'test_data/south_africa_cancer_data.xlsx'],
                            total_trials=10, n_workers=4)
    calib.calibrate()
    calib.plot()
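The relationship between total_trials, n_workers, and n_trials described in the parameter list can be sketched as follows. This is a minimal illustration rather than HPVsim's own code: the helper name trials_per_worker is hypothetical, and rounding up is an assumption, since the documentation only says the total is divided by the number of workers.

```python
import math

def trials_per_worker(total_trials, n_workers):
    """Hypothetical helper: split a total trial budget across workers.

    Rounding up is an assumption; the documentation only states that
    total_trials is divided by n_workers.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return math.ceil(total_trials / n_workers)

# Values taken from the example: total_trials=10, n_workers=4
n_trials = trials_per_worker(10, 4)
print(n_trials)
```

Under the round-up assumption, the four workers together would run slightly more than the requested total (12 trials instead of 10).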
- run_sim(calib_pars=None, genotype_pars=None, hiv_pars=None, label=None, return_sim=False)
Create and run a simulation
- static update_dict_pars(name_pars, value_pars)
Update the parameters in a nested dict (name_pars) with the values from another nested dict (value_pars)
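One way to picture a nested-dict update of this kind is the sketch below. It is an assumption about the general technique, not the body of update_dict_pars: the function name update_nested and the key other_par are hypothetical, and the real method may treat missing keys or errors differently.

```python
import copy

def update_nested(name_pars, value_pars):
    """Hypothetical sketch: return a copy of name_pars whose leaves are
    overwritten by the matching leaves of value_pars."""
    new_pars = copy.deepcopy(name_pars)  # leave the input untouched
    for key, val in value_pars.items():
        if isinstance(val, dict) and isinstance(new_pars.get(key), dict):
            # Both sides are dicts: descend one level
            new_pars[key] = update_nested(new_pars[key], val)
        else:
            # Leaf value: overwrite (or add) it
            new_pars[key] = copy.deepcopy(val)
    return new_pars

# other_par is a made-up key used only to show that untouched entries survive
name_pars = {'hpv16': {'prog_time': [3, 3, 10], 'other_par': 1}}
value_pars = {'hpv16': {'prog_time': 5.2}}
updated = update_nested(name_pars, value_pars)
print(updated)
```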
- update_dict_pars_from_trial(name_pars, value_pars)
Update the parameters in a nested dict (name_pars) with the parameter values from a trial (value_pars)
- update_dict_pars_init_and_bounds(initial_pars, par_bounds, target_pars)
Function to update initial parameters and parameter bounds from a trial pars dict
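The [best, low, high] format used by calib_pars, genotype_pars, and hiv_pars can be separated into initial values and bounds roughly as follows. This is a sketch under stated assumptions, not the method's actual code: split_init_and_bounds is a hypothetical name, and the check that the best value lies inside its bounds is an added illustration.

```python
def split_init_and_bounds(pars):
    """Hypothetical sketch: split dict(key=[best, low, high]) entries,
    possibly nested one level per genotype, into initial values and bounds."""
    initial, bounds = {}, {}
    for key, val in pars.items():
        if isinstance(val, dict):
            # Nested entry, e.g. genotype_pars = dict(hpv16=dict(...))
            initial[key], bounds[key] = split_init_and_bounds(val)
        else:
            best, low, high = val
            if not low <= best <= high:
                raise ValueError(f"{key}: best value {best} is outside [{low}, {high}]")
            initial[key] = best
            bounds[key] = (low, high)
    return initial, bounds

# Parameter dicts taken from the examples on this page
calib_pars = dict(beta=[0.05, 0.010, 0.20], hpv_control_prob=[.9, 0.5, 1])
genotype_pars = dict(hpv16=dict(prog_time=[3, 3, 10]))

initial, bounds = split_init_and_bounds(calib_pars)
g_initial, g_bounds = split_init_and_bounds(genotype_pars)
print(initial, bounds)
print(g_initial, g_bounds)
```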
- get_full_pars(sim=None, calib_pars=None, genotype_pars=None, hiv_pars=None)
Make a full pardict from the subset of regular sim parameters, genotype parameters, and HIV parameters used in calibration
- trial_pars_to_sim_pars(trial_pars=None, which_pars=None, return_full=True)
Create genotype_pars and pars dicts from the trial parameters. Note: not used during self.calibrate.
- Parameters:
trial_pars (dict) – dictionary of parameters from a single trial; if not provided, the best parameters will be used
return_full (bool) – whether to return a unified par dict ready for use in a sim, or the sim pars and genotype pars separately
Example:
    import hpvsim as hpv

    sim = hpv.Sim(genotypes=[16, 18])
    calib_pars = dict(beta=[0.05, 0.010, 0.20], hpv_control_prob=[.9, 0.5, 1])
    genotype_pars = dict(hpv16=dict(prog_time=[3, 3, 10]))
    calib = hpv.Calibration(sim, calib_pars=calib_pars, genotype_pars=genotype_pars,
                            datafiles=['test_data/south_africa_hpv_data.xlsx',
                                       'test_data/south_africa_cancer_data.xlsx'],
                            total_trials=10, n_workers=4)
    calib.calibrate()
    # Returns the best parameters from calibration in a format ready for running a sim
    new_pars = calib.trial_pars_to_sim_pars()
    sim.update_pars(new_pars)
    sim.run()
- trial_to_sim_pars(pardict=None, trial=None)
Take in an Optuna trial and sample from pars, after extracting them from the structure they’re provided in
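Sampling from a [best, low, high] structure with a trial object might look like the sketch below. To stay self-contained it uses a stand-in class, MidpointTrial, in place of a real Optuna trial; the stand-in only mimics the suggest_float(name, low, high) call and always returns the midpoint. The function name sample_pars and the genotype_parameter naming of nested keys are assumptions made for illustration.

```python
class MidpointTrial:
    """Stand-in for an Optuna trial (NOT the real class): it mimics only
    suggest_float(name, low, high) and deterministically returns the midpoint."""
    def suggest_float(self, name, low, high):
        return (low + high) / 2

def sample_pars(pardict, trial, prefix=''):
    """Hypothetical sketch: walk a (possibly nested) dict of
    [best, low, high] entries and sample each value between its bounds."""
    sampled = {}
    for key, val in pardict.items():
        name = f'{prefix}{key}'
        if isinstance(val, dict):
            # Nested entry: prefix the sampled names with the outer key (assumed naming)
            sampled[key] = sample_pars(val, trial, prefix=f'{name}_')
        else:
            best, low, high = val  # best is not used when sampling
            sampled[key] = trial.suggest_float(name, low, high)
    return sampled

pardict = dict(beta=[0.05, 0.010, 0.20], hpv16=dict(prog_time=[3, 3, 10]))
sampled = sample_pars(pardict, MidpointTrial())
print(sampled)
```

With Optuna installed, a real trial object passed by a study's objective function could take the place of MidpointTrial, since it exposes a suggest_float method that accepts the same three positional arguments.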
- calibrate(calib_pars=None, genotype_pars=None, hiv_pars=None, verbose=True, load=True, tidyup=True, **kwargs)
Actually perform calibration.
- Parameters:
calib_pars (dict) – if supplied, overwrite stored calib_pars
verbose (bool) – whether to print output from each trial
kwargs (dict) – if supplied, overwrite stored run_args (n_trials, n_workers, etc.)
- plot(res_to_plot=None, fig_args=None, axis_args=None, data_args=None, show_args=None, do_save=None, fig_path=None, do_show=True, plot_type='sns.boxplot', **kwargs)
Plot the calibration results
- Parameters:
res_to_plot (int) – number of results to plot; if None, plot them all
fig_args (dict) – passed to pl.figure()
axis_args (dict) – passed to pl.subplots_adjust()
data_args (dict) – ‘width’, ‘color’, and ‘offset’ arguments for the data
do_save (bool) – whether to save
fig_path (str or filepath) – filepath to save to
do_show (bool) – whether to show the figure
kwargs (dict) – passed to hpv.options.with_style(); see that function for choices