T7 - Calibration

Tutorial 2 demonstrated how to run the model and plot the outputs. But it’s entirely possible that the model outputs won’t look like the data for the country that you wish to model. The default parameter values included in HPVsim are intended as points of departure to be iteratively refined via calibration. The process of model calibration involves finding the model parameters that are the most likely explanation for the observed data. This tutorial gives an introduction to the Calibration object and some recipes for optimization approaches.
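
As a minimal sketch of what calibration means, the toy example below scores a few candidate values of a single parameter against synthetic data and keeps the one with the smallest mismatch. The `toy_model` and `mismatch` functions and all numbers are invented for illustration; they are not part of HPVsim, and HPVsim's own goodness-of-fit calculation may differ.

```python
# Toy illustration only: this stand-in "model" is NOT HPVsim.
def toy_model(beta, years):
    """Return made-up yearly case counts that scale linearly with beta."""
    return [1000 * beta * (1 + 0.1 * i) for i, _ in enumerate(years)]

def mismatch(model_values, data_values):
    """Sum of squared differences between model output and data."""
    return sum((m - d) ** 2 for m, d in zip(model_values, data_values))

years = [2000, 2005, 2010]
data = toy_model(0.08, years)  # synthetic "observations" generated with beta = 0.08

# Score each candidate value and keep the one that best reproduces the data
candidates = [0.02, 0.05, 0.08, 0.12]
scores = {beta: mismatch(toy_model(beta, years), data) for beta in candidates}
best_beta = min(scores, key=scores.get)
print(best_beta, scores[best_beta])
```

Because the synthetic data were generated with `beta = 0.08`, that candidate reproduces them exactly. Real data are noisy, so a real calibration looks for the smallest mismatch rather than a perfect match.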


Data types supported by HPVsim

Data on HPV and cervical disease comes in many different formats. When using HPVsim, the goal is typically to produce population-level estimates of epidemic outputs like:

- age-specific incidence of cancer or high-grade lesions in one or more years;
- number of cases of cancer or high-grade lesions reported in one or more years;
- HPV prevalence over time;
- lifetime incidence of HPV;
- the distribution of genotypes in detected cases of cancer/high-grade lesions;
- sexual behavior metrics like the average age at first marriage, duration of relationships, or number of lifetime partners.

After running HPVsim, estimates of all of these variables are included within the results dictionary. To plot them alongside data, the easiest method is to use the Calibration object.

The Calibration object

Calibration objects contain the following ingredients:

- an hpv.Sim() instance with details of the model configuration;
- two lists of parameters to vary, one for parameters that vary by genotype and one for those that don’t;
- dataframes that hold the calibration targets, which are typically added as csv files;
- a list of any additional results to plot;
- settings that are passed to the Optuna package, an open source hyperparameter optimization framework that automates calibration for HPVsim.

We have included Optuna as a built-in calibration option as we have found that it works reasonably well, but it is also possible to use other methods; we will discuss this a little further down.
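
To illustrate that the search method is interchangeable, here is a sketch of plain random search over bounded parameters. The `objective` function is a hypothetical stand-in for a step that runs the model and compares it to data; the parameter names, bounds, and target values are invented for illustration, and nothing here uses HPVsim or Optuna APIs.

```python
import random

def objective(pars):
    """Hypothetical mismatch: smallest when beta = 0.08 and k = 0.6."""
    return (pars["beta"] - 0.08) ** 2 + (pars["k"] - 0.6) ** 2

# (lower_bound, upper_bound) for each parameter; values chosen for illustration
bounds = {"beta": (0.01, 0.20), "k": (0.2, 1.0)}

def random_search(objective, bounds, n_trials, seed=0):
    """Sample each parameter uniformly within its bounds and keep the best trial."""
    rng = random.Random(seed)
    best_pars, best_value = None, float("inf")
    for _ in range(n_trials):
        pars = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        value = objective(pars)
        if value < best_value:
            best_pars, best_value = pars, value
    return best_pars, best_value

best_pars, best_value = random_search(objective, bounds, n_trials=200)
print(best_pars, best_value)
```

Any optimizer that can call a function of this shape (parameters in, mismatch out) could be substituted. Which one works best will depend on the number of parameters and on how expensive each model run is.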

The example below illustrates the general idea of calibration, and can be adapted for different use cases:

[1]:
# Import HPVsim
import hpvsim as hpv

# Configure a simulation with some parameters
pars = dict(n_agents=10e3, start=1980, end=2020, dt=0.25, location='nigeria')
sim = hpv.Sim(pars)

# Specify some parameters to adjust during calibration.
# The parameters in the calib_pars dictionary don't vary by genotype,
# whereas those in the genotype_pars dictionary do. Both kinds are
# given in the order [best, lower_bound, upper_bound].
calib_pars = dict(
    beta=[0.05, 0.010, 0.20],
)

genotype_pars = dict(
    hpv16=dict(
        sev_fn=dict(k=[0.5, 0.2, 1.0]),
        dur_episomal=dict(par1=[6, 4, 12])
    ),
    hpv18=dict(
        sev_fn=dict(k=[0.5, 0.2, 1.0]),
        dur_episomal=dict(par1=[6, 4, 12])
    )
)

# List the datafiles that contain data that we wish to compare the model to:
datafiles=['nigeria_cancer_cases.csv',
           'nigeria_cancer_types.csv']

# List extra results that we don't have data on, but wish to include in the
# calibration object so we can plot them.
results_to_plot = ['cancer_incidence', 'asr_cancer_incidence']

# Create the calibration object, run it, and plot the results
calib = hpv.Calibration(
    sim,
    calib_pars=calib_pars,
    genotype_pars=genotype_pars,
    extra_sim_result_keys=results_to_plot,
    datafiles=datafiles,
    total_trials=3, n_workers=1
)
calib.calibrate(die=True)
calib.plot(res_to_plot=4);
HPVsim 1.2.5 (2023-09-21) — © 2023 by IDM
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
[I 2023-09-21 06:26:51,306] A new study created in RDB with name: hpvsim_calibration
Could not delete study, skipping...
'Record does not exist.'
Removed existing calibration hpvsim_calibration.db
Initializing sim (resetting people) with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/164) (0.00 s)  ———————————————————— 1%
  Running 1982.5 (10/164) (0.28 s)  •——————————————————— 7%
  Running 1985.0 (20/164) (0.56 s)  ••—————————————————— 13%
  Running 1987.5 (30/164) (0.84 s)  •••————————————————— 19%
  Running 1990.0 (40/164) (1.12 s)  •••••——————————————— 25%
  Running 1992.5 (50/164) (1.41 s)  ••••••—————————————— 31%
  Running 1995.0 (60/164) (1.70 s)  •••••••————————————— 37%
  Running 1997.5 (70/164) (2.00 s)  ••••••••———————————— 43%
  Running 2000.0 (80/164) (2.31 s)  •••••••••——————————— 49%
  Running 2002.5 (90/164) (2.62 s)  •••••••••••————————— 55%
  Running 2005.0 (100/164) (2.95 s)  ••••••••••••———————— 62%
  Running 2007.5 (110/164) (3.30 s)  •••••••••••••——————— 68%
  Running 2010.0 (120/164) (3.65 s)  ••••••••••••••—————— 74%
  Running 2012.5 (130/164) (4.01 s)  •••••••••••••••————— 80%
  Running 2015.0 (140/164) (4.39 s)  •••••••••••••••••——— 86%
  Running 2017.5 (150/164) (4.78 s)  ••••••••••••••••••—— 92%
  Running 2020.0 (160/164) (5.20 s)  •••••••••••••••••••— 98%
Simulation summary:
    5,322,445 infections
            0 dysplasias
            0 pre-cins
    2,343,025 cin1s
    1,120,515 cin2s
      913,650 cin3s
   13,069,797 cins
      212,610 cancers
            0 cancer detections
      149,402 cancer deaths
            0 detected cancer deaths
    3,921,802 reinfections
            0 reactivations
   690,546,048 number susceptible
   17,904,532 number infectious
    1,434,402 number with inactive infection
   211,557,504 number with no cellular changes
   26,351,492 number with episomal infection
          718 number with transformation
    1,434,402 number with cancer
   19,338,934 number infected
   27,785,894 number with abnormal cells
            0 number with latent infection
    1,293,620 number with precin
    2,019,081 number with cin1
    2,947,097 number with cin2
   12,144,655 number with cin3
   17,049,062 number with detectable dysplasia
            0 number with detected cancer
            0 number screened
            0 number treated for precancerous lesions
            0 number treated for cancer
            0 number vaccinated
            0 number given therapeutic vaccine
         0.26 hpv incidence (/100)
            0 cin1 incidence (/100,000)
            0 cin2 incidence (/100,000)
            0 cin3 incidence (/100,000)
            0 dysplasia incidence (/100,000)
          195 cancer incidence (/100,000)
    7,520,377 births
    2,372,474 other deaths
     -359,139 migration
          283 age-adjusted cervical cancer incidence (/100,000)
            0 age-adjusted cervical cancer mortality
            0 newly vaccinated
            0 cumulative number vaccinated
            0 new doses
            0 cumulative doses
            0 new therapeutic vaccine doses
            0 newly received therapeutic vaccine
            0 cumulative therapeutic vaccine doses
            0 total received therapeutic vaccine
            0 new screens
            0 newly screened
            0 new cin treatments
            0 newly treated for cins
            0 new cancer treatments
            0 newly treated for cancer
            0 cumulative screens
            0 cumulative number screened
            0 cumulative cin treatments
            0 cumulative number treated for cins
            0 cumulative cancer treatments
            0 cumulative number treated for cancer
            0 detected cancer incidence (/100,000)
          135 cancer mortality
   211,557,504 number alive
            0 crude death rate
            0 crude birth rate
         2.82 hpv prevalence (/100)
            0 pre-cin prevalence (/100,000)
            0 cin1 prevalence (/100,000)
            0 cin2 prevalence (/100,000)
            0 cin3 prevalence (/100,000)

[I 2023-09-21 06:26:56,888] Trial 0 finished with value: 123.19432011095383 and parameters: {'hpv16_sev_fn_k': 0.7083787348524284, 'hpv16_dur_episomal_par1': 11.108676032838803, 'hpv18_sev_fn_k': 0.921276222780983, 'hpv18_dur_episomal_par1': 5.280273375628316, 'beta': 0.021208988650867337}. Best is trial 0 with value: 123.19432011095383.
Initializing sim (resetting people) with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/164) (0.00 s)  ———————————————————— 1%
  Running 1982.5 (10/164) (0.30 s)  •——————————————————— 7%
  Running 1985.0 (20/164) (0.61 s)  ••—————————————————— 13%
  Running 1987.5 (30/164) (0.91 s)  •••————————————————— 19%
  Running 1990.0 (40/164) (1.25 s)  •••••——————————————— 25%
  Running 1992.5 (50/164) (1.60 s)  ••••••—————————————— 31%
  Running 1995.0 (60/164) (1.97 s)  •••••••————————————— 37%
  Running 1997.5 (70/164) (2.35 s)  ••••••••———————————— 43%
  Running 2000.0 (80/164) (2.79 s)  •••••••••——————————— 49%
  Running 2002.5 (90/164) (3.22 s)  •••••••••••————————— 55%
  Running 2005.0 (100/164) (3.69 s)  ••••••••••••———————— 62%
  Running 2007.5 (110/164) (4.20 s)  •••••••••••••——————— 68%
  Running 2010.0 (120/164) (4.74 s)  ••••••••••••••—————— 74%
  Running 2012.5 (130/164) (5.29 s)  •••••••••••••••————— 80%
  Running 2015.0 (140/164) (5.88 s)  •••••••••••••••••——— 86%
  Running 2017.5 (150/164) (6.52 s)  ••••••••••••••••••—— 92%
  Running 2020.0 (160/164) (7.17 s)  •••••••••••••••••••— 98%
[I 2023-09-21 06:27:04,474] Trial 1 finished with value: 530.0806832681968 and parameters: {'hpv16_sev_fn_k': 0.8369577724726847, 'hpv16_dur_episomal_par1': 6.015663135489066, 'hpv18_sev_fn_k': 0.6414490402461671, 'hpv18_dur_episomal_par1': 9.726765647524854, 'beta': 0.08105211432812606}. Best is trial 0 with value: 123.19432011095383.
Simulation summary:
   37,264,297 infections
            0 dysplasias
            0 pre-cins
   14,408,669 cin1s
    6,606,727 cin2s
    4,846,944 cin3s
   64,607,725 cins
      884,201 cancers
            0 cancer detections
      600,481 cancer deaths
            0 detected cancer deaths
   31,008,091 reinfections
            0 reactivations
   844,832,384 number susceptible
   72,427,632 number infectious
    5,763,468 number with inactive infection
   228,016,832 number with no cellular changes
   87,633,584 number with episomal infection
            0 number with transformation
    5,763,468 number with cancer
   78,191,104 number infected
   93,397,064 number with abnormal cells
            0 number with latent infection
    7,738,015 number with precin
   11,405,547 number with cin1
   13,281,690 number with cin2
   41,207,648 number with cin3
   58,867,252 number with detectable dysplasia
            0 number with detected cancer
            0 number screened
            0 number treated for precancerous lesions
            0 number treated for cancer
            0 number vaccinated
            0 number given therapeutic vaccine
         1.47 hpv incidence (/100)
            0 cin1 incidence (/100,000)
            0 cin2 incidence (/100,000)
            0 cin3 incidence (/100,000)
            0 dysplasia incidence (/100,000)
          738 cancer incidence (/100,000)
    7,506,012 births
    2,815,652 other deaths
     -107,742 migration
          894 age-adjusted cervical cancer incidence (/100,000)
            0 age-adjusted cervical cancer mortality
            0 newly vaccinated
            0 cumulative number vaccinated
            0 new doses
            0 cumulative doses
            0 new therapeutic vaccine doses
            0 newly received therapeutic vaccine
            0 cumulative therapeutic vaccine doses
            0 total received therapeutic vaccine
            0 new screens
            0 newly screened
            0 new cin treatments
            0 newly treated for cins
            0 new cancer treatments
            0 newly treated for cancer
            0 cumulative screens
            0 cumulative number screened
            0 cumulative cin treatments
            0 cumulative number treated for cins
            0 cumulative cancer treatments
            0 cumulative number treated for cancer
            0 detected cancer incidence (/100,000)
          478 cancer mortality
   228,016,832 number alive
            0 crude death rate
            0 crude birth rate
        10.59 hpv prevalence (/100)
            0 pre-cin prevalence (/100,000)
            0 cin1 prevalence (/100,000)
            0 cin2 prevalence (/100,000)
            0 cin3 prevalence (/100,000)

Initializing sim (resetting people) with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/164) (0.00 s)  ———————————————————— 1%
  Running 1982.5 (10/164) (0.29 s)  •——————————————————— 7%
  Running 1985.0 (20/164) (0.60 s)  ••—————————————————— 13%
  Running 1987.5 (30/164) (0.91 s)  •••————————————————— 19%
  Running 1990.0 (40/164) (1.24 s)  •••••——————————————— 25%
  Running 1992.5 (50/164) (1.58 s)  ••••••—————————————— 31%
  Running 1995.0 (60/164) (1.96 s)  •••••••————————————— 37%
  Running 1997.5 (70/164) (2.33 s)  ••••••••———————————— 43%
  Running 2000.0 (80/164) (2.74 s)  •••••••••——————————— 49%
  Running 2002.5 (90/164) (3.16 s)  •••••••••••————————— 55%
  Running 2005.0 (100/164) (3.59 s)  ••••••••••••———————— 62%
  Running 2007.5 (110/164) (4.03 s)  •••••••••••••——————— 68%
  Running 2010.0 (120/164) (4.50 s)  ••••••••••••••—————— 74%
  Running 2012.5 (130/164) (4.99 s)  •••••••••••••••————— 80%
  Running 2015.0 (140/164) (5.52 s)  •••••••••••••••••——— 86%
  Running 2017.5 (150/164) (6.05 s)  ••••••••••••••••••—— 92%
  Running 2020.0 (160/164) (6.61 s)  •••••••••••••••••••— 98%
[I 2023-09-21 06:27:11,455] Trial 2 finished with value: 214.72799841954736 and parameters: {'hpv16_sev_fn_k': 0.2524037838362088, 'hpv16_dur_episomal_par1': 5.029469368030074, 'hpv18_sev_fn_k': 0.23320064495926873, 'hpv18_dur_episomal_par1': 8.329451167081391, 'beta': 0.12478222451785628}. Best is trial 0 with value: 123.19432011095383.
Simulation summary:
   39,979,390 infections
            0 dysplasias
            0 pre-cins
   12,704,912 cin1s
    3,408,950 cin2s
    1,738,953 cin3s
   43,392,647 cins
      363,449 cancers
            0 cancer detections
      260,017 cancer deaths
            0 detected cancer deaths
   33,069,550 reinfections
            0 reactivations
   706,104,000 number susceptible
   62,562,796 number infectious
    2,399,051 number with inactive infection
   216,877,792 number with no cellular changes
   79,427,976 number with episomal infection
       29,449 number with transformation
    2,399,051 number with cancer
   64,961,844 number infected
   81,827,032 number with abnormal cells
            0 number with latent infection
    7,571,375 number with precin
   24,411,416 number with cin1
   15,203,802 number with cin2
   13,300,365 number with cin3
   46,848,288 number with detectable dysplasia
            0 number with detected cancer
            0 number screened
            0 number treated for precancerous lesions
            0 number treated for cancer
            0 number vaccinated
            0 number given therapeutic vaccine
         1.89 hpv incidence (/100)
            0 cin1 incidence (/100,000)
            0 cin2 incidence (/100,000)
            0 cin3 incidence (/100,000)
            0 dysplasia incidence (/100,000)
          324 cancer incidence (/100,000)
    7,513,194 births
    2,577,184 other deaths
     -193,935 migration
          419 age-adjusted cervical cancer incidence (/100,000)
            0 age-adjusted cervical cancer mortality
            0 newly vaccinated
            0 cumulative number vaccinated
            0 new doses
            0 cumulative doses
            0 new therapeutic vaccine doses
            0 newly received therapeutic vaccine
            0 cumulative therapeutic vaccine doses
            0 total received therapeutic vaccine
            0 new screens
            0 newly screened
            0 new cin treatments
            0 newly treated for cins
            0 new cancer treatments
            0 newly treated for cancer
            0 cumulative screens
            0 cumulative number screened
            0 cumulative cin treatments
            0 cumulative number treated for cins
            0 cumulative cancer treatments
            0 cumulative number treated for cancer
            0 detected cancer incidence (/100,000)
          227 cancer mortality
   216,877,792 number alive
            0 crude death rate
            0 crude birth rate
         9.62 hpv prevalence (/100)
            0 pre-cin prevalence (/100,000)
            0 cin1 prevalence (/100,000)
            0 cin2 prevalence (/100,000)
            0 cin3 prevalence (/100,000)

Loading saved results...
    Removed temporary file tmp_calibration_00000.obj
  Loaded trial 0
    Removed temporary file tmp_calibration_00001.obj
  Loaded trial 1
    Removed temporary file tmp_calibration_00002.obj
  Loaded trial 2
Making results structure...
Processed 3 trials; 0 failed
Deleted study hpvsim_calibration in sqlite:///hpvsim_calibration.db
Removed existing calibration hpvsim_calibration.db
/home/docs/checkouts/readthedocs.org/user_builds/institute-for-disease-modeling-hpvsim/envs/v1.2.5/lib/python3.9/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
[Figure: calibration plot produced by calib.plot()]

This isn’t a great fit yet! In general, it will probably be necessary to run many more trials than the 3 we ran here. Moreover, careful consideration should be given to the parameters that you want to adjust during calibration. In HPVsim we have taken the approach that any parameter can be adjusted. As we learn more about which parameters make most sense to calibrate, we will add details here. We would also encourage users to share their experiences with calibration and parameter searches.
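
The flat parameter names in the trial log above (for example `hpv16_sev_fn_k`) appear to be the nested dictionary keys joined with underscores. The sketch below reproduces that naming with a hypothetical helper, `flatten_pars` (not an HPVsim function), and then picks the best of the three trial values reported in the log. It assumes that lower values indicate a better fit, as the log's "Best is trial 0" lines suggest.

```python
# Parameter definitions copied from the example above (hpv16 only, for brevity)
genotype_pars = dict(
    hpv16=dict(
        sev_fn=dict(k=[0.5, 0.2, 1.0]),
        dur_episomal=dict(par1=[6, 4, 12]),
    ),
)
calib_pars = dict(beta=[0.05, 0.010, 0.20])

def flatten_pars(nested, prefix=""):
    """Hypothetical helper: join nested keys with underscores."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_pars(value, name))
        else:
            flat[name] = value  # [best, lower_bound, upper_bound]
    return flat

flat = {**flatten_pars(genotype_pars), **flatten_pars(calib_pars)}
print(list(flat))

# Objective values reported in the trial log above
trial_values = {0: 123.19432011095383, 1: 530.0806832681968, 2: 214.72799841954736}
best_trial = min(trial_values, key=trial_values.get)
print(best_trial)
```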
