T1 - Getting started
Installing and getting started with HPVsim is quite simple.
HPVsim is a Python package that can be installed by typing pip install hpvsim into a terminal. You can then check that the installation was successful by importing HPVsim with import hpvsim as hpv.
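If you would rather check the installation without importing the package, the standard library can report the installed version. This is a minimal sketch; the helper name installed_version is hypothetical and not part of HPVsim.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is not found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("hpvsim")
if version is None:
    print("hpvsim is not installed in this environment")
else:
    print(f"hpvsim {version} is installed")
```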
The basic design philosophy of HPVsim is: common tasks should be simple. For example:
Defining parameters
Running a simulation
Plotting results
This tutorial walks you through how to do these things.
Hello world
Creating, running, and plotting a sim with default options takes just a few lines:
[1]:
import hpvsim as hpv
sim = hpv.Sim()
sim.run()
fig = sim.plot()
HPVsim 2.0.0 (2023-11-29) — © 2023 by IDM
Loading location-specific demographic data for "nigeria"
Initializing sim with 20000 agents
Loading location-specific data for "nigeria"
Running 1995.0 ( 0/144) (1.06 s) ———————————————————— 1%
Running 1997.5 (10/144) (1.35 s) •——————————————————— 8%
Running 2000.0 (20/144) (1.65 s) ••—————————————————— 15%
Running 2002.5 (30/144) (1.94 s) ••••———————————————— 22%
Running 2005.0 (40/144) (2.25 s) •••••——————————————— 28%
Running 2007.5 (50/144) (2.55 s) •••••••————————————— 35%
Running 2010.0 (60/144) (2.87 s) ••••••••———————————— 42%
Running 2012.5 (70/144) (3.18 s) •••••••••——————————— 49%
Running 2015.0 (80/144) (3.51 s) •••••••••••————————— 56%
Running 2017.5 (90/144) (3.86 s) ••••••••••••———————— 63%
Running 2020.0 (100/144) (4.21 s) ••••••••••••••—————— 70%
Running 2022.5 (110/144) (4.57 s) •••••••••••••••————— 77%
Running 2025.0 (120/144) (4.95 s) ••••••••••••••••———— 84%
Running 2027.5 (130/144) (5.34 s) ••••••••••••••••••—— 91%
Running 2030.0 (140/144) (5.77 s) •••••••••••••••••••— 98%
Simulation summary:
874,111,895 total HPV infections
510,599 total cancers
286,811 total cancer deaths
5.14 mean HPV prevalence (%)
15.21 mean cancer incidence (per 100k)
32.94 mean age of infection (years)
47.79 mean age of cancer (years)
Defining parameters and genotypes, and running simulations
Parameters are defined as a dictionary. Some common parameters to modify are the number of agents in the simulation, the genotypes to simulate, and the start and end dates of the simulation. We can define those as:
[2]:
pars = dict(
    n_agents = 10e3,
    genotypes = [16, 18, 'hr'], # Simulate genotypes 16 and 18, plus all other high-risk HPV genotypes pooled together
    start = 1980,
    end = 2030,
)
Running a sim with the parameters defined above takes just two lines:
[3]:
sim = hpv.Sim(pars)
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
Running 1980.0 ( 0/204) (0.04 s) ———————————————————— 0%
Running 1982.5 (10/204) (0.26 s) •——————————————————— 5%
Running 1985.0 (20/204) (0.47 s) ••—————————————————— 10%
Running 1987.5 (30/204) (0.68 s) •••————————————————— 15%
Running 1990.0 (40/204) (0.90 s) ••••———————————————— 20%
Running 1992.5 (50/204) (1.11 s) •••••——————————————— 25%
Running 1995.0 (60/204) (1.33 s) •••••——————————————— 30%
Running 1997.5 (70/204) (1.55 s) ••••••—————————————— 35%
Running 2000.0 (80/204) (1.78 s) •••••••————————————— 40%
Running 2002.5 (90/204) (2.01 s) ••••••••———————————— 45%
Running 2005.0 (100/204) (2.25 s) •••••••••——————————— 50%
Running 2007.5 (110/204) (2.50 s) ••••••••••—————————— 54%
Running 2010.0 (120/204) (2.76 s) •••••••••••————————— 59%
Running 2012.5 (130/204) (3.03 s) ••••••••••••———————— 64%
Running 2015.0 (140/204) (3.31 s) •••••••••••••——————— 69%
Running 2017.5 (150/204) (3.60 s) ••••••••••••••—————— 74%
Running 2020.0 (160/204) (3.90 s) •••••••••••••••————— 79%
Running 2022.5 (170/204) (4.19 s) ••••••••••••••••———— 84%
Running 2025.0 (180/204) (4.49 s) •••••••••••••••••——— 89%
Running 2027.5 (190/204) (4.80 s) ••••••••••••••••••—— 94%
Running 2030.0 (200/204) (5.12 s) •••••••••••••••••••— 99%
Simulation summary:
635,895,662 total HPV infections
438,150 total cancers
257,144 total cancer deaths
3.46 mean HPV prevalence (%)
10.55 mean cancer incidence (per 100k)
33.25 mean age of infection (years)
44.62 mean age of cancer (years)
[3]:
Sim(<no label>; 1980 to 2030; pop: 10000 default; epi: 6.35896e+08⚙, 438150♋︎)
This will generate a results dictionary sim.results. Results by genotype have names like sim.results['infections_by_genotype'] and are stored as arrays where each row corresponds to a genotype, while totals across all genotypes have names like sim.results['infections'] or sim.results['cancers'].
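To make the "one row per genotype" layout concrete, here is a small sketch using mock numbers in plain Python lists rather than real HPVsim output. The row order, and the idea that a total is the column-wise sum of the genotype rows, are assumptions made for illustration only.

```python
# Mock data only: 3 genotypes (rows) by 4 time points (columns)
genotypes = [16, 18, "hr"]
by_genotype = [
    [120, 130, 125, 110],  # genotype 16
    [60, 65, 70, 55],      # genotype 18
    [200, 210, 190, 180],  # pooled other high-risk genotypes
]

# Look up the time series for one genotype by its position in the list
row_18 = by_genotype[genotypes.index(18)]

# Summing down each column gives one value per time point
totals = [sum(column) for column in zip(*by_genotype)]

print(row_18)  # [60, 65, 70, 55]
print(totals)  # [380, 405, 385, 345]
```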
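Rather than creating a parameter dictionary, any valid parameter can also be passed to the sim directly. For example, the same population size and date range as above can be set with keyword arguments (genotypes is not passed here, so the default genotypes are used and the results differ from the run above):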
[4]:
sim = hpv.Sim(n_agents=10e3, start=1980, end=2030)
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
Running 1980.0 ( 0/204) (0.04 s) ———————————————————— 0%
Running 1982.5 (10/204) (0.27 s) •——————————————————— 5%
Running 1985.0 (20/204) (0.49 s) ••—————————————————— 10%
Running 1987.5 (30/204) (0.70 s) •••————————————————— 15%
Running 1990.0 (40/204) (0.92 s) ••••———————————————— 20%
Running 1992.5 (50/204) (1.15 s) •••••——————————————— 25%
Running 1995.0 (60/204) (1.37 s) •••••——————————————— 30%
Running 1997.5 (70/204) (1.60 s) ••••••—————————————— 35%
Running 2000.0 (80/204) (1.84 s) •••••••————————————— 40%
Running 2002.5 (90/204) (2.08 s) ••••••••———————————— 45%
Running 2005.0 (100/204) (2.34 s) •••••••••——————————— 50%
Running 2007.5 (110/204) (2.59 s) ••••••••••—————————— 54%
Running 2010.0 (120/204) (2.86 s) •••••••••••————————— 59%
Running 2012.5 (130/204) (3.14 s) ••••••••••••———————— 64%
Running 2015.0 (140/204) (3.43 s) •••••••••••••——————— 69%
Running 2017.5 (150/204) (3.72 s) ••••••••••••••—————— 74%
Running 2020.0 (160/204) (4.03 s) •••••••••••••••————— 79%
Running 2022.5 (170/204) (4.34 s) ••••••••••••••••———— 84%
Running 2025.0 (180/204) (4.68 s) •••••••••••••••••——— 89%
Running 2027.5 (190/204) (5.04 s) ••••••••••••••••••—— 94%
Running 2030.0 (200/204) (5.40 s) •••••••••••••••••••— 99%
Simulation summary:
902,840,317 total HPV infections
590,425 total cancers
351,238 total cancer deaths
4.36 mean HPV prevalence (%)
14.15 mean cancer incidence (per 100k)
31.98 mean age of infection (years)
41.29 mean age of cancer (years)
[4]:
Sim(<no label>; 1980 to 2030; pop: 10000 default; epi: 9.0284e+08⚙, 590425♋︎)
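The step counts in the progress output (144 for 1995 to 2030, 204 for 1980 to 2030) follow from simple arithmetic. The sketch below assumes a time step of 0.25 years, which is inferred from the output (10 steps per 2.5 years), and assumes the sim runs through the last quarter of the end year. Both are read off the printed output rather than taken from the HPVsim source, and the function names are hypothetical.

```python
def n_steps(start, end, dt=0.25):
    """Number of time steps, assuming the sim covers the whole of the end year."""
    return int(round((end - start + 1) / dt))


def year_at(step, start, dt=0.25):
    """Calendar year reached at a given step index."""
    return start + step * dt


print(n_steps(1995, 2030))  # 144
print(n_steps(1980, 2030))  # 204
print(year_at(140, 1995))   # 2030.0
```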
You can mix and match too – pass in a parameter dictionary with default options, and then include other parameters as keywords (including overrides; keyword arguments take precedence). For example:
[5]:
sim = hpv.Sim(pars, end=2050) # Use parameters defined above, except set the end date to 2050
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
Running 1980.0 ( 0/284) (0.04 s) ———————————————————— 0%
Running 1982.5 (10/284) (0.26 s) ———————————————————— 4%
Running 1985.0 (20/284) (0.47 s) •——————————————————— 7%
Running 1987.5 (30/284) (0.68 s) ••—————————————————— 11%
Running 1990.0 (40/284) (0.90 s) ••—————————————————— 14%
Running 1992.5 (50/284) (1.11 s) •••————————————————— 18%
Running 1995.0 (60/284) (1.33 s) ••••———————————————— 21%
Running 1997.5 (70/284) (1.55 s) •••••——————————————— 25%
Running 2000.0 (80/284) (1.78 s) •••••——————————————— 29%
Running 2002.5 (90/284) (2.01 s) ••••••—————————————— 32%
Running 2005.0 (100/284) (2.25 s) •••••••————————————— 36%
Running 2007.5 (110/284) (2.49 s) •••••••————————————— 39%
Running 2010.0 (120/284) (2.75 s) ••••••••———————————— 43%
Running 2012.5 (130/284) (3.02 s) •••••••••——————————— 46%
Running 2015.0 (140/284) (3.30 s) •••••••••——————————— 50%
Running 2017.5 (150/284) (3.58 s) ••••••••••—————————— 53%
Running 2020.0 (160/284) (3.88 s) •••••••••••————————— 57%
Running 2022.5 (170/284) (4.17 s) ••••••••••••———————— 60%
Running 2025.0 (180/284) (4.47 s) ••••••••••••———————— 64%
Running 2027.5 (190/284) (4.78 s) •••••••••••••——————— 67%
Running 2030.0 (200/284) (5.10 s) ••••••••••••••—————— 71%
Running 2032.5 (210/284) (5.42 s) ••••••••••••••—————— 74%
Running 2035.0 (220/284) (5.75 s) •••••••••••••••————— 78%
Running 2037.5 (230/284) (6.10 s) ••••••••••••••••———— 81%
Running 2040.0 (240/284) (6.46 s) ••••••••••••••••———— 85%
Running 2042.5 (250/284) (6.82 s) •••••••••••••••••——— 88%
Running 2045.0 (260/284) (7.24 s) ••••••••••••••••••—— 92%
Running 2047.5 (270/284) (7.63 s) •••••••••••••••••••— 95%
Running 2050.0 (280/284) (8.05 s) •••••••••••••••••••— 99%
Simulation summary:
1,225,861,740 total HPV infections
1,009,181 total cancers
636,395 total cancer deaths
3.61 mean HPV prevalence (%)
12.68 mean cancer incidence (per 100k)
34.86 mean age of infection (years)
44.68 mean age of cancer (years)
[5]:
Sim(<no label>; 1980 to 2050; pop: 10000 default; epi: 1.22586e+09⚙, 1.00918e+06♋︎)
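The precedence rule (keyword arguments override the dictionary) can be pictured as an ordinary dictionary merge. This is only an illustration of the behavior described above, not HPVsim's actual implementation, and merge_pars is a hypothetical helper.

```python
def merge_pars(pars=None, **kwargs):
    """Combine a parameter dict with keyword overrides; keywords win on conflicts."""
    merged = dict(pars or {})  # Copy, so the caller's dict is left untouched
    merged.update(kwargs)      # Keyword arguments take precedence
    return merged


pars = dict(n_agents=10e3, genotypes=[16, 18, 'hr'], start=1980, end=2030)
merged = merge_pars(pars, end=2050)

print(merged['end'])  # 2050
print(pars['end'])    # 2030
```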
Plotting results
As you saw above, plotting the results of a simulation is rather easy too:
[6]:
fig = sim.plot()
Full usage example
Many of the details of this example will be explained in later tutorials, but to give you a taste, here’s an example of how you would run two simulations to determine the impact of a custom vaccination intervention that targets agents older than 9 and younger than 14.
[7]:
import hpvsim as hpv

# Custom vaccination intervention
def custom_vx(sim):
    # Apply once, at the time step that falls exactly on the year 2000
    if sim.yearvec[sim.t] == 2000:
        target_group = (sim.people.age>9) * (sim.people.age<14)
        sim.people.peak_imm[0, target_group] = 1

pars = dict(
    location = 'tanzania', # Use population characteristics for Tanzania
    n_agents = 10e3, # Have 10,000 people total in the population
    start = 1980, # Start the simulation in 1980
    n_years = 50, # Run the simulation for 50 years
    burnin = 10, # Discard the first 10 years as burnin period
    verbose = 0, # Do not print any output
)

# Running with multisims -- see Tutorial 3
s1 = hpv.Sim(pars, label='Default')
s2 = hpv.Sim(pars, interventions=custom_vx, label='Custom vaccination')
msim = hpv.MultiSim([s1, s2])
msim.run()
fig = msim.plot(['cancers', 'cins'])
Loading location-specific demographic data for "tanzania"
Loading location-specific demographic data for "tanzania"
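The custom intervention above is a plain function that receives the sim and decides for itself when to act. The toy loop below imitates that pattern without HPVsim: a stand-in object exposes only yearvec and t, and the function is called once per step. The loop, make_toy_sim, and record_year_2000 are hypothetical illustrations, and the 0.25-year step is an assumption.

```python
from types import SimpleNamespace


def make_toy_sim(start=1980, n_years=50, dt=0.25):
    """Build a stand-in object with only the attributes the intervention reads."""
    n_steps = int(n_years / dt)
    yearvec = [start + i * dt for i in range(n_steps)]
    return SimpleNamespace(yearvec=yearvec, t=0, log=[])


def record_year_2000(sim):
    """Toy intervention: act only on the step that lands exactly on the year 2000."""
    if sim.yearvec[sim.t] == 2000:
        sim.log.append(sim.t)


toy = make_toy_sim()
for t in range(len(toy.yearvec)):
    toy.t = t              # Advance the current step index
    record_year_2000(toy)  # Call the intervention once per step

print(toy.log)  # [80]
```

The exact comparison works here because multiples of 0.25 are represented exactly in floating point; other step sizes can call for a tolerance instead.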