T3 - Running scenarios

While running individual sims can be interesting for simple explorations, at some point it will almost always be necessary to run a large number of simulations simultaneously – to explore different scenarios, to perform calibration, or simply to get uncertainty bounds on a single projection. This tutorial explains how to do that.

Click here to open an interactive version of this notebook.

Running with MultiSims

The most common way to run multiple simulations is with the MultiSim object. As the name suggests, this is a relatively simple container for a number of sims. However, it contains powerful methods for plotting, statistics, and running all the sims in parallel.

Running one sim with uncertainty

Making and running a multisim based on a single sim is pretty easy:

[1]:
import hpvsim as hpv
hpv.options(jupyter=True, verbose=0)

sim = hpv.Sim()
msim = hpv.MultiSim(sim)
msim.run(n_runs=5)
msim.plot();
HPVsim 0.4.0 (2022-11-16) — © 2022 by IDM
No genotypes provided, will assume only simulating HPV16 by default
No genotypes provided, will assume only simulating HPV16 by default
No genotypes provided, will assume only simulating HPV16 by defaultNo genotypes provided, will assume only simulating HPV16 by default

No genotypes provided, will assume only simulating HPV16 by default
../_images/tutorials_tut_running_3_1.svg

If you run a multisim with a single sim input as above, it will change the random seed for each sim, which is what leads to the variability you see.

By default, the multisim simply plots each simulation. These simulations are stored in the sims attribute, which is just a simple list of sims:

[2]:
for sim in msim.sims:
    sim.brief()
Sim("Sim 0"; 2015.0 to 2030.0; pop: 20000 default; epi: 1034⚙, 16♋︎)
Sim("Sim 1"; 2015.0 to 2030.0; pop: 20000 default; epi: 1016⚙, 20♋︎)
Sim("Sim 2"; 2015.0 to 2030.0; pop: 20000 default; epi: 1101⚙, 22♋︎)
Sim("Sim 3"; 2015.0 to 2030.0; pop: 20000 default; epi: 1053⚙, 14♋︎)
Sim("Sim 4"; 2015.0 to 2030.0; pop: 20000 default; epi: 1018⚙, 23♋︎)

However, you often don’t care about the individual sims (especially when you run the same parameters with different random seeds); you want to see the statistics for the sims. You can calculate either the mean or the median of the results across all the sims as follows:

[3]:
msim.mean()
msim.plot('total_infections');
../_images/tutorials_tut_running_7_0.svg
[4]:
msim.median()
msim.plot('total_infections');
../_images/tutorials_tut_running_8_0.svg

You can see these are similar, but slightly different. You can also treat each of the individual sims as part of a larger single sim, and “combine” the results into one sim:

[5]:
msim.combine()
msim.plot('total_infections');
../_images/tutorials_tut_running_10_0.svg

Note how now there is no uncertainty and the total number of infections is 5x higher than in the previous plots, since we just added 5 different sims together.

Each of these operations modifies the msim.base_sim object, and does not affect the actual list of stored sims, which is why you can go back and forth between them.

Running different sims

Often you don’t want to run the same sim with different seeds, but instead want to run a set of different sims. That’s also very easy – for example, here’s how you would do a sweep across the relative transmissibility of people with high-grade (CIN3) lesions:

[6]:
import numpy as np

rel_trans_cin3_vals = np.linspace(0.5, 1.5, 5) # Sweep from 0.5 to 1.5 with 5 values
sims = []
for rel_trans_cin3 in rel_trans_cin3_vals:
    sim = hpv.Sim(rel_trans_cin3=rel_trans_cin3, label=f'Rel trans CIN3 = {rel_trans_cin3}')
    sims.append(sim)
msim = hpv.MultiSim(sims)
msim.run()
msim.plot('total_infections');
No genotypes provided, will assume only simulating HPV16 by defaultNo genotypes provided, will assume only simulating HPV16 by default

No genotypes provided, will assume only simulating HPV16 by defaultNo genotypes provided, will assume only simulating HPV16 by default

No genotypes provided, will assume only simulating HPV16 by default
../_images/tutorials_tut_running_12_1.svg

As you would expect, the more transmissible people with CIN3s are, the more infections we get.

Finally, note that you can use multisims to do very compact scenario explorations – here we are using the command hpv.parallel(), which is an alias for hpv.MultiSim().run():

[7]:
def custom_vx(sim):
    if sim.yearvec[sim.t] == 2000:
        target_group = (sim.people.age>9) * (sim.people.age<14)
        sim.people.peak_imm[0, target_group] = 1

pars = dict(
    location = 'tanzania', # Use population characteristics for Japan
    n_agents = 10e3, # Have 50,000 people total in the population
    start = 1980, # Start the simulation in 1980
    n_years = 50, # Run the simulation for 50 years
    burnin = 10, # Discard the first 20 years as burnin period
    verbose = 0, # Do not print any output
)

# Running with multisims -- see Tutorial 3
s1 = hpv.Sim(pars, label='Default')
s2 = hpv.Sim(pars, interventions=custom_vx, label='Custom vaccination')
hpv.parallel(s1, s2).plot(['total_hpv_incidence', 'total_cin_incidence']);
Loading location-specific demographic data for "tanzania"
Loading location-specific demographic data for "tanzania"
No genotypes provided, will assume only simulating HPV16 by default
No genotypes provided, will assume only simulating HPV16 by default
/home/docs/checkouts/readthedocs.org/user_builds/institute-for-disease-modeling-hpvsim/envs/latest/lib/python3.9/site-packages/hpvsim/data/loaders.py:209: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  dd = full_df.groupby("Time").sum()["PopTotal"]
/home/docs/checkouts/readthedocs.org/user_builds/institute-for-disease-modeling-hpvsim/envs/latest/lib/python3.9/site-packages/hpvsim/data/loaders.py:209: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  dd = full_df.groupby("Time").sum()["PopTotal"]
../_images/tutorials_tut_running_14_2.svg

Warning: Because multiprocess pickles the sims when running them, sims[0] (before being run by the multisim) and msim.sims[0] are not the same object. After calling msim.run(), always use sims from the multisim object, not from before. In contrast, if you don’t run the multisim (e.g. if you make a multisim from already-run sims), then sims[0] and msim.sims[0] are indeed exactly the same object.

Advanced usage

Finally, you can also merge or split different multisims together. Here’s an example that’s similar to before, except it shows how to run a multisim of different seeds for the same rel_trans_cin3 value, but then merge multisims for different rel_trans_cin3 values together into one multisim:

[8]:
n_sims = 3
rel_trans_cin3_vals = [0.5, 1.0, 1.5]

msims = []
for rel_trans_cin3 in rel_trans_cin3_vals:
    sims = []
    for s in range(n_sims):
        sim = hpv.Sim(n_agents=10e3, rel_trans_cin3=rel_trans_cin3, rand_seed=s, label=f'Rel trans CIN3 = {rel_trans_cin3}')
        sims.append(sim)
    msim = hpv.MultiSim(sims)
    msim.run()
    msim.mean()
    msims.append(msim)

merged = hpv.MultiSim.merge(msims, base=True)
merged.plot(color_by_sim=True);
No genotypes provided, will assume only simulating HPV16 by defaultNo genotypes provided, will assume only simulating HPV16 by default

No genotypes provided, will assume only simulating HPV16 by default
No genotypes provided, will assume only simulating HPV16 by defaultNo genotypes provided, will assume only simulating HPV16 by default

No genotypes provided, will assume only simulating HPV16 by default
No genotypes provided, will assume only simulating HPV16 by defaultNo genotypes provided, will assume only simulating HPV16 by default

No genotypes provided, will assume only simulating HPV16 by default
../_images/tutorials_tut_running_17_1.svg

As you can see, running this way lets you run not just different values, but run different values with uncertainty. Which brings us to…

Running with Scenarios

Most of the time, you’ll want to run with multisims since they give you the most flexibility. However, in certain cases, Scenario objects let you achieve the same thing more simply. Unlike MultiSims, which are completely agnostic about what sims you include, scenarios always start from the same base sim. They then modify the parameters as you specify, and finally add uncertainty, if desired. For example, this shows how you’d use scenarios to run the example similar to the one above.

[9]:
# Set base parameters -- these will be shared across all scenarios
basepars = {'n_agents':10e3}

# Configure the settings for each scenario
scenarios = {'baseline': {
              'name':'Baseline',
              'pars': {}
              },
            'high_rel_trans_cin3': {
              'name':'High rel trans CIN3 (1.5)',
              'pars': {
                  'rel_trans_cin3': 1.5,
                  }
              },
            'low_rel_trans_cin3': {
              'name':'Low rel trans CIN3 (0.5)',
              'pars': {
                  'rel_trans_cin3': 0.5,
                  }
              },
             }

# Run and plot the scenarios
scens = hpv.Scenarios(basepars=basepars, scenarios=scenarios)
scens.run()
scens.plot();
No genotypes provided, will assume only simulating HPV16 by default
../_images/tutorials_tut_running_20_1.svg