T1 - Getting started

Installing and getting started with HPVsim is quite simple.

HPVsim is a Python package that can be pip-installed by typing pip install hpvsim into a terminal. You can then check that the installation was successful by importing HPVsim with import hpvsim as hpv.
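
If you would rather check for the package from a script than wait for an import error, you can ask Python whether the package is discoverable. This is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and is not part of HPVsim.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the named top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None

# Report whether HPVsim is importable in the current environment
if is_installed('hpvsim'):
    print('hpvsim found; "import hpvsim as hpv" should work')
else:
    print('hpvsim not found; try "pip install hpvsim"')
```

This only confirms that the package can be located; actually importing it is still the definitive check.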

The basic design philosophy of HPVsim is: common tasks should be simple. For example:

  • Defining parameters

  • Running a simulation

  • Plotting results

This tutorial walks you through how to do these things.


Hello world

Creating, running, and plotting a sim with default options takes only a few lines:

[1]:
import hpvsim as hpv

sim = hpv.Sim()
sim.run()
fig = sim.plot()
HPVsim 2.0.0 (2023-11-29) — © 2023 by IDM
Loading location-specific demographic data for "nigeria"
Initializing sim with 20000 agents
Loading location-specific data for "nigeria"
  Running 1995.0 ( 0/144) (1.06 s)  ———————————————————— 1%
  Running 1997.5 (10/144) (1.35 s)  •——————————————————— 8%
  Running 2000.0 (20/144) (1.65 s)  ••—————————————————— 15%
  Running 2002.5 (30/144) (1.94 s)  ••••———————————————— 22%
  Running 2005.0 (40/144) (2.25 s)  •••••——————————————— 28%
  Running 2007.5 (50/144) (2.55 s)  •••••••————————————— 35%
  Running 2010.0 (60/144) (2.87 s)  ••••••••———————————— 42%
  Running 2012.5 (70/144) (3.18 s)  •••••••••——————————— 49%
  Running 2015.0 (80/144) (3.51 s)  •••••••••••————————— 56%
  Running 2017.5 (90/144) (3.86 s)  ••••••••••••———————— 63%
  Running 2020.0 (100/144) (4.21 s)  ••••••••••••••—————— 70%
  Running 2022.5 (110/144) (4.57 s)  •••••••••••••••————— 77%
  Running 2025.0 (120/144) (4.95 s)  ••••••••••••••••———— 84%
  Running 2027.5 (130/144) (5.34 s)  ••••••••••••••••••—— 91%
  Running 2030.0 (140/144) (5.77 s)  •••••••••••••••••••— 98%
Simulation summary:
     874,111,895 total HPV infections
         510,599 total cancers
         286,811 total cancer deaths
            5.14 mean HPV prevalence (%)
           15.21 mean cancer incidence (per 100k)
           32.94 mean age of infection (years)
           47.79 mean age of cancer (years)

../_images/tutorials_tut_intro_3_1.svg

Defining parameters and genotypes, and running simulations

Parameters are defined as a dictionary. Some common parameters to modify are the number of agents in the simulation, the genotypes to simulate, and the start and end dates of the simulation. We can define those as:

[2]:
pars = dict(
    n_agents = 10e3,
    genotypes = [16, 18, 'hr'], # Simulate genotypes 16 and 18, plus all other high-risk HPV genotypes pooled together
    start = 1980,
    end = 2030,
)

Running a simulation is pretty easy. In fact, running a sim with the parameters we defined above is just:

[3]:
sim = hpv.Sim(pars)
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/204) (0.04 s)  ———————————————————— 0%
  Running 1982.5 (10/204) (0.26 s)  •——————————————————— 5%
  Running 1985.0 (20/204) (0.47 s)  ••—————————————————— 10%
  Running 1987.5 (30/204) (0.68 s)  •••————————————————— 15%
  Running 1990.0 (40/204) (0.90 s)  ••••———————————————— 20%
  Running 1992.5 (50/204) (1.11 s)  •••••——————————————— 25%
  Running 1995.0 (60/204) (1.33 s)  •••••——————————————— 30%
  Running 1997.5 (70/204) (1.55 s)  ••••••—————————————— 35%
  Running 2000.0 (80/204) (1.78 s)  •••••••————————————— 40%
  Running 2002.5 (90/204) (2.01 s)  ••••••••———————————— 45%
  Running 2005.0 (100/204) (2.25 s)  •••••••••——————————— 50%
  Running 2007.5 (110/204) (2.50 s)  ••••••••••—————————— 54%
  Running 2010.0 (120/204) (2.76 s)  •••••••••••————————— 59%
  Running 2012.5 (130/204) (3.03 s)  ••••••••••••———————— 64%
  Running 2015.0 (140/204) (3.31 s)  •••••••••••••——————— 69%
  Running 2017.5 (150/204) (3.60 s)  ••••••••••••••—————— 74%
  Running 2020.0 (160/204) (3.90 s)  •••••••••••••••————— 79%
  Running 2022.5 (170/204) (4.19 s)  ••••••••••••••••———— 84%
  Running 2025.0 (180/204) (4.49 s)  •••••••••••••••••——— 89%
  Running 2027.5 (190/204) (4.80 s)  ••••••••••••••••••—— 94%
  Running 2030.0 (200/204) (5.12 s)  •••••••••••••••••••— 99%
Simulation summary:
     635,895,662 total HPV infections
         438,150 total cancers
         257,144 total cancer deaths
            3.46 mean HPV prevalence (%)
           10.55 mean cancer incidence (per 100k)
           33.25 mean age of infection (years)
           44.62 mean age of cancer (years)

[3]:
Sim(<no label>; 1980 to 2030; pop: 10000 default; epi: 6.35896e+08⚙, 438150♋︎)

This will generate a results dictionary sim.results. Results by genotype are named things like sim.results['infections_by_genotype'] and stored as arrays where each row corresponds to a genotype, while totals across all genotypes have names like sim.results['infections'] or sim.results['cancers'].
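
To make the "one row per genotype" layout concrete, here is a toy sketch that uses plain Python lists in place of real result arrays. The numbers are made-up placeholders rather than HPVsim output, the variable names are illustrative, and the idea that a total equals the sum over genotype rows is an assumption made for the illustration.

```python
# Made-up placeholder counts, NOT HPVsim output:
# 3 genotypes (rows) x 4 timesteps (columns)
genotypes = [16, 18, 'hr']
infections_by_genotype = [
    [5, 7, 6, 8],     # genotype 16
    [2, 3, 3, 4],     # genotype 18
    [9, 11, 10, 12],  # other high-risk genotypes, pooled
]

# Pick out the time series for one genotype by its row index
row_18 = infections_by_genotype[genotypes.index(18)]
print(row_18)  # [2, 3, 3, 4]

# Assumed for illustration: the total is the sum over rows at each timestep
total_infections = [sum(column) for column in zip(*infections_by_genotype)]
print(total_infections)  # [16, 21, 19, 24]
```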

Rather than creating a parameter dictionary, you can also pass any valid parameter to the sim directly as a keyword argument. For example, the following sets the same population size and dates as above, but leaves the genotypes at their defaults:

[4]:
sim = hpv.Sim(n_agents=10e3, start=1980, end=2030)
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/204) (0.04 s)  ———————————————————— 0%
  Running 1982.5 (10/204) (0.27 s)  •——————————————————— 5%
  Running 1985.0 (20/204) (0.49 s)  ••—————————————————— 10%
  Running 1987.5 (30/204) (0.70 s)  •••————————————————— 15%
  Running 1990.0 (40/204) (0.92 s)  ••••———————————————— 20%
  Running 1992.5 (50/204) (1.15 s)  •••••——————————————— 25%
  Running 1995.0 (60/204) (1.37 s)  •••••——————————————— 30%
  Running 1997.5 (70/204) (1.60 s)  ••••••—————————————— 35%
  Running 2000.0 (80/204) (1.84 s)  •••••••————————————— 40%
  Running 2002.5 (90/204) (2.08 s)  ••••••••———————————— 45%
  Running 2005.0 (100/204) (2.34 s)  •••••••••——————————— 50%
  Running 2007.5 (110/204) (2.59 s)  ••••••••••—————————— 54%
  Running 2010.0 (120/204) (2.86 s)  •••••••••••————————— 59%
  Running 2012.5 (130/204) (3.14 s)  ••••••••••••———————— 64%
  Running 2015.0 (140/204) (3.43 s)  •••••••••••••——————— 69%
  Running 2017.5 (150/204) (3.72 s)  ••••••••••••••—————— 74%
  Running 2020.0 (160/204) (4.03 s)  •••••••••••••••————— 79%
  Running 2022.5 (170/204) (4.34 s)  ••••••••••••••••———— 84%
  Running 2025.0 (180/204) (4.68 s)  •••••••••••••••••——— 89%
  Running 2027.5 (190/204) (5.04 s)  ••••••••••••••••••—— 94%
  Running 2030.0 (200/204) (5.40 s)  •••••••••••••••••••— 99%
Simulation summary:
     902,840,317 total HPV infections
         590,425 total cancers
         351,238 total cancer deaths
            4.36 mean HPV prevalence (%)
           14.15 mean cancer incidence (per 100k)
           31.98 mean age of infection (years)
           41.29 mean age of cancer (years)

[4]:
Sim(<no label>; 1980 to 2030; pop: 10000 default; epi: 9.0284e+08⚙, 590425♋︎)
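
The progress logs above also show how start and end translate into timesteps: step 10 of the 1980 run falls at 1982.5, which suggests a quarter-year timestep, and the run reports 204 steps in total. The sketch below reproduces those counts under the assumption that the sim covers every quarter from the start year through the end year inclusive. The function names and the dt value are inferred from the logs, not taken from the HPVsim API.

```python
def n_timesteps(start, end, dt=0.25):
    """Number of steps if the sim covers the start year through the end year inclusive."""
    return round((end - start + 1) / dt)

def year_at_step(start, step, dt=0.25):
    """Calendar year (as a float) reached after the given number of steps."""
    return start + step * dt

print(n_timesteps(1995, 2030))  # 144, matching the default run
print(n_timesteps(1980, 2030))  # 204, matching the runs above
print(year_at_step(1980, 10))   # 1982.5
```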

You can mix and match too – pass in a parameter dictionary with default options, and then include other parameters as keywords (including overrides; keyword arguments take precedence). For example:

[5]:
sim = hpv.Sim(pars, end=2050) # Use parameters defined above, except set the end date to 2050
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/284) (0.04 s)  ———————————————————— 0%
  Running 1982.5 (10/284) (0.26 s)  ———————————————————— 4%
  Running 1985.0 (20/284) (0.47 s)  •——————————————————— 7%
  Running 1987.5 (30/284) (0.68 s)  ••—————————————————— 11%
  Running 1990.0 (40/284) (0.90 s)  ••—————————————————— 14%
  Running 1992.5 (50/284) (1.11 s)  •••————————————————— 18%
  Running 1995.0 (60/284) (1.33 s)  ••••———————————————— 21%
  Running 1997.5 (70/284) (1.55 s)  •••••——————————————— 25%
  Running 2000.0 (80/284) (1.78 s)  •••••——————————————— 29%
  Running 2002.5 (90/284) (2.01 s)  ••••••—————————————— 32%
  Running 2005.0 (100/284) (2.25 s)  •••••••————————————— 36%
  Running 2007.5 (110/284) (2.49 s)  •••••••————————————— 39%
  Running 2010.0 (120/284) (2.75 s)  ••••••••———————————— 43%
  Running 2012.5 (130/284) (3.02 s)  •••••••••——————————— 46%
  Running 2015.0 (140/284) (3.30 s)  •••••••••——————————— 50%
  Running 2017.5 (150/284) (3.58 s)  ••••••••••—————————— 53%
  Running 2020.0 (160/284) (3.88 s)  •••••••••••————————— 57%
  Running 2022.5 (170/284) (4.17 s)  ••••••••••••———————— 60%
  Running 2025.0 (180/284) (4.47 s)  ••••••••••••———————— 64%
  Running 2027.5 (190/284) (4.78 s)  •••••••••••••——————— 67%
  Running 2030.0 (200/284) (5.10 s)  ••••••••••••••—————— 71%
  Running 2032.5 (210/284) (5.42 s)  ••••••••••••••—————— 74%
  Running 2035.0 (220/284) (5.75 s)  •••••••••••••••————— 78%
  Running 2037.5 (230/284) (6.10 s)  ••••••••••••••••———— 81%
  Running 2040.0 (240/284) (6.46 s)  ••••••••••••••••———— 85%
  Running 2042.5 (250/284) (6.82 s)  •••••••••••••••••——— 88%
  Running 2045.0 (260/284) (7.24 s)  ••••••••••••••••••—— 92%
  Running 2047.5 (270/284) (7.63 s)  •••••••••••••••••••— 95%
  Running 2050.0 (280/284) (8.05 s)  •••••••••••••••••••— 99%
Simulation summary:
   1,225,861,740 total HPV infections
       1,009,181 total cancers
         636,395 total cancer deaths
            3.61 mean HPV prevalence (%)
           12.68 mean cancer incidence (per 100k)
           34.86 mean age of infection (years)
           44.68 mean age of cancer (years)

[5]:
Sim(<no label>; 1980 to 2050; pop: 10000 default; epi: 1.22586e+09⚙, 1.00918e+06♋︎)
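
The precedence rule (keyword arguments override dictionary entries) can be pictured as a simple dictionary merge. This is a sketch of the idea only, not HPVsim's actual implementation, and merge_pars is a hypothetical helper.

```python
def merge_pars(pars=None, **kwargs):
    """Combine a parameter dict with keyword overrides; keywords win on conflict."""
    merged = dict(pars or {})  # Shallow copy, so the caller's dict is left untouched
    merged.update(kwargs)      # Keyword arguments overwrite matching keys
    return merged

pars = dict(n_agents=10e3, genotypes=[16, 18, 'hr'], start=1980, end=2030)
merged = merge_pars(pars, end=2050)

print(merged['end'])  # 2050 (keyword override)
print(pars['end'])    # 2030 (original dict unchanged)
```

Because the original dictionary is not modified, the same pars can be reused for several sims that differ only in their overrides.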

Plotting results

As you saw above, plotting the results of a simulation is rather easy too:

[6]:
fig = sim.plot()
../_images/tutorials_tut_intro_13_0.svg

Full usage example

Many of the details of this example will be explained in later tutorials, but to give you a taste, here’s an example of how you would run two simulations to determine the impact of a custom vaccination intervention that targets agents aged between 9 and 14 in the year 2000.

[7]:
import hpvsim as hpv

# Custom vaccination intervention
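def custom_vx(sim):
    if sim.yearvec[sim.t] == 2000: # Only act on the timestep where the year is exactly 2000
        target_group = (sim.people.age>9) * (sim.people.age<14) # Boolean mask: older than 9 and younger than 14
        sim.people.peak_imm[0, target_group] = 1 # Set row 0 of peak immunity to 1 for the target group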

pars = dict(
    location = 'tanzania', # Use population characteristics for Tanzania
    n_agents = 10e3, # Have 10,000 agents total in the population
    start = 1980, # Start the simulation in 1980
    n_years = 50, # Run the simulation for 50 years
    burnin = 10, # Discard the first 10 years as burnin period
    verbose = 0, # Do not print any output
)

# Running with multisims -- see Tutorial 3
s1 = hpv.Sim(pars, label='Default')
s2 = hpv.Sim(pars, interventions=custom_vx, label='Custom vaccination')
msim = hpv.MultiSim([s1, s2])
msim.run()
fig = msim.plot(['cancers', 'cins'])
Loading location-specific demographic data for "tanzania"
Loading location-specific demographic data for "tanzania"
../_images/tutorials_tut_intro_15_1.svg
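
To see why the intervention acts only once, it can help to call a function like this on every timestep of a stand-in object. The sketch below is a simplified, list-based analogue of custom_vx: the sim here is a hypothetical stub built with SimpleNamespace, the loop is not HPVsim's real run loop, and the agents do not age.

```python
from types import SimpleNamespace

def custom_vx(sim):
    """List-based analogue of the intervention above (the original uses array masks)."""
    if sim.yearvec[sim.t] == 2000:
        for i, age in enumerate(sim.people.age):
            if 9 < age < 14:  # Same bounds as the target_group mask
                sim.people.peak_imm[0][i] = 1

# Hypothetical stand-in for a sim: quarterly steps from 1999.0 to 2001.75, five agents
dt = 0.25
yearvec = [1999 + dt * i for i in range(12)]
people = SimpleNamespace(age=[5.0, 9.5, 12.0, 13.9, 30.0], peak_imm=[[0, 0, 0, 0, 0]])
sim = SimpleNamespace(yearvec=yearvec, t=0, people=people)

# Call the intervention once per timestep, as a simulation loop might
for t in range(len(sim.yearvec)):
    sim.t = t
    custom_vx(sim)

print(sim.people.peak_imm[0])   # [0, 1, 1, 1, 0]
print(sim.yearvec.count(2000))  # 1 (only one timestep matches exactly)
```

The exact comparison with 2000 works here because quarter-year steps are represented exactly in floating point; with other step sizes a tolerance-based check may be safer.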