T1 - Getting started

Installing and getting started with HPVsim is quite simple.

HPVsim is a Python package that can be pip-installed by typing pip install hpvsim into a terminal. You can then check that the installation was successful by importing HPVsim with import hpvsim as hpv.
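One way to confirm the installation from a script is sketched below, using only the standard library. The helper name installed_version is hypothetical and not part of HPVsim; it returns None when the package cannot be found.

```python
import importlib.util
from importlib import metadata


def installed_version(package="hpvsim"):
    """Return the installed version of `package`, or None if it is not importable."""
    if importlib.util.find_spec(package) is None:
        return None  # Package is not installed in this environment
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "unknown"  # Importable, but no distribution metadata was found


version = installed_version()
if version is None:
    print("hpvsim not found; try: pip install hpvsim")
else:
    print(f"hpvsim is installed (version: {version})")
```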

The basic design philosophy of HPVsim is: common tasks should be simple. For example:

  • Defining parameters

  • Running a simulation

  • Plotting results

This tutorial walks you through how to do these things.


Hello world

Creating, running, and plotting a sim with default options takes just a few lines:

[1]:
import hpvsim as hpv

sim = hpv.Sim()
sim.run()
fig = sim.plot()
HPVsim 0.4.0 (2022-11-16) — © 2022 by IDM
No genotypes provided, will assume only simulating HPV16 by default
Initializing sim with 20000 agents
  Running 2015.0 ( 0/80) (0.04 s)  ———————————————————— 1%
  Running 2017.0 (10/80) (0.29 s)  ••—————————————————— 14%
  Running 2019.0 (20/80) (0.54 s)  •••••——————————————— 26%
  Running 2021.0 (30/80) (0.81 s)  •••••••————————————— 39%
  Running 2023.0 (40/80) (1.10 s)  ••••••••••—————————— 51%
  Running 2025.0 (50/80) (1.40 s)  ••••••••••••———————— 64%
  Running 2027.0 (60/80) (1.72 s)  •••••••••••••••————— 76%
  Running 2029.0 (70/80) (2.05 s)  •••••••••••••••••——— 89%
Simulation summary:
        1,034 total infections
          240 total cin1s
          267 total cin2s
          130 total cin3s
          637 total cins
           16 total cancers
            0 total cancer detections
            0 total cancer deaths
            0 total detected cancer deaths
          411 total reinfections
          131 total reactivations
            0 total hiv infections
         5.05 total hpv incidence (/100)
        1,736 total cin1 incidence (/100,000)
        1,931 total cin2 incidence (/100,000)
          940 total cin3 incidence (/100,000)
        4,607 total cin incidence (/100,000)
          116 total cancer incidence (/100,000)
         9.13 total hpv prevalence (/100)

../_images/tutorials_tut_intro_3_1.svg

Defining parameters and genotypes, and running simulations

Parameters are defined as a dictionary. Some common parameters to modify are the number of agents in the simulation, the genotypes to simulate, and the start and end dates of the simulation. We can define those as:

[2]:
pars = dict(
    n_agents = 10e3,
    genotypes = [16, 18, 'hrhpv'], # Simulate genotypes 16 and 18, plus all other high-risk HPV genotypes pooled together
    start = 1980,
    end = 2030,
)
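Note that 10e3 is a float literal in Python (10000.0). If you prefer to pass a whole number explicitly, 10_000 or int(10e3) gives the same count as an integer. A quick plain-Python check, independent of HPVsim:

```python
n_agents = 10e3  # Scientific notation always produces a float

print(type(n_agents).__name__, n_agents)  # float 10000.0
print(n_agents == 10_000)                 # True: the values compare equal
print(int(n_agents))                      # 10000
```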

Running a simulation is straightforward. With the parameters defined above, it takes two lines:

[3]:
sim = hpv.Sim(pars)
sim.run()
Initializing sim with 10000 agents
  Running 1980.0 ( 0/255) (0.03 s)  ———————————————————— 0%
  Running 1982.0 (10/255) (0.18 s)  ———————————————————— 4%
  Running 1984.0 (20/255) (0.34 s)  •——————————————————— 8%
  Running 1986.0 (30/255) (0.51 s)  ••—————————————————— 12%
  Running 1988.0 (40/255) (0.69 s)  •••————————————————— 16%
  Running 1990.0 (50/255) (0.88 s)  ••••———————————————— 20%
  Running 1992.0 (60/255) (1.08 s)  ••••———————————————— 24%
  Running 1994.0 (70/255) (1.28 s)  •••••——————————————— 28%
  Running 1996.0 (80/255) (1.49 s)  ••••••—————————————— 32%
  Running 1998.0 (90/255) (1.70 s)  •••••••————————————— 36%
  Running 2000.0 (100/255) (1.93 s)  •••••••————————————— 40%
  Running 2002.0 (110/255) (2.16 s)  ••••••••———————————— 44%
  Running 2004.0 (120/255) (2.40 s)  •••••••••——————————— 47%
  Running 2006.0 (130/255) (2.64 s)  ••••••••••—————————— 51%
  Running 2008.0 (140/255) (2.89 s)  •••••••••••————————— 55%
  Running 2010.0 (150/255) (3.15 s)  •••••••••••————————— 59%
  Running 2012.0 (160/255) (3.42 s)  ••••••••••••———————— 63%
  Running 2014.0 (170/255) (3.72 s)  •••••••••••••——————— 67%
  Running 2016.0 (180/255) (4.00 s)  ••••••••••••••—————— 71%
  Running 2018.0 (190/255) (4.30 s)  ••••••••••••••—————— 75%
  Running 2020.0 (200/255) (4.61 s)  •••••••••••••••————— 79%
  Running 2022.0 (210/255) (4.93 s)  ••••••••••••••••———— 83%
  Running 2024.0 (220/255) (5.25 s)  •••••••••••••••••——— 87%
  Running 2026.0 (230/255) (5.60 s)  ••••••••••••••••••—— 91%
  Running 2028.0 (240/255) (5.94 s)  ••••••••••••••••••—— 95%
  Running 2030.0 (250/255) (6.31 s)  •••••••••••••••••••— 98%
Simulation summary:
        3,402 total infections
          538 total cin1s
          485 total cin2s
          249 total cin3s
        1,272 total cins
           29 total cancers
            0 total cancer detections
           23 total cancer deaths
            0 total detected cancer deaths
        1,186 total reinfections
          533 total reactivations
            0 total hiv infections
         3.38 total hpv incidence (/100)
        2,522 total cin1 incidence (/100,000)
        2,273 total cin2 incidence (/100,000)
        1,167 total cin3 incidence (/100,000)
        5,963 total cin incidence (/100,000)
          136 total cancer incidence (/100,000)
        14.21 total hpv prevalence (/100)

[3]:
Sim(<no label>; 1980 to 2030; pop: 10000 default; epi: 3402⚙, 29♋︎)

Running the sim generates a results dictionary, sim.results. Results by genotype have names like sim.results['infections'] and are stored as arrays in which each row corresponds to a genotype. Totals across all genotypes have names like sim.results['total_infections'] or sim.results['total_cancers'].
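The step counters in the progress output of this tutorial (for example, 0/255 for a run from 1980 to 2030) are consistent with five steps per year and an inclusive end year. The following is a sketch of that arithmetic under those two assumptions, which are inferred from the printed output rather than taken from the HPVsim source; n_steps is a hypothetical helper.

```python
def n_steps(start, end, steps_per_year=5):
    """Number of timesteps from `start` through the end of year `end` (assumed inclusive)."""
    return (end - start + 1) * steps_per_year


print(n_steps(1980, 2030))  # 255, as in the 1980 to 2030 runs
print(n_steps(1980, 2050))  # 355, as in the 1980 to 2050 run
```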

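Rather than creating a parameter dictionary, you can also pass any valid parameter to the sim directly as a keyword argument. For example, the following sets the same population size and dates as above. Because it omits genotypes, the sim falls back to simulating HPV16 only, so the results differ: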

[4]:
sim = hpv.Sim(n_agents=10e3, start=1980, end=2030)
sim.run()
No genotypes provided, will assume only simulating HPV16 by default
Initializing sim with 10000 agents
  Running 1980.0 ( 0/255) (0.03 s)  ———————————————————— 0%
  Running 1982.0 (10/255) (0.21 s)  ———————————————————— 4%
  Running 1984.0 (20/255) (0.40 s)  •——————————————————— 8%
  Running 1986.0 (30/255) (0.59 s)  ••—————————————————— 12%
  Running 1988.0 (40/255) (0.81 s)  •••————————————————— 16%
  Running 1990.0 (50/255) (1.03 s)  ••••———————————————— 20%
  Running 1992.0 (60/255) (1.26 s)  ••••———————————————— 24%
  Running 1994.0 (70/255) (1.50 s)  •••••——————————————— 28%
  Running 1996.0 (80/255) (1.74 s)  ••••••—————————————— 32%
  Running 1998.0 (90/255) (1.99 s)  •••••••————————————— 36%
  Running 2000.0 (100/255) (2.26 s)  •••••••————————————— 40%
  Running 2002.0 (110/255) (2.52 s)  ••••••••———————————— 44%
  Running 2004.0 (120/255) (2.80 s)  •••••••••——————————— 47%
  Running 2006.0 (130/255) (3.10 s)  ••••••••••—————————— 51%
  Running 2008.0 (140/255) (3.39 s)  •••••••••••————————— 55%
  Running 2010.0 (150/255) (3.70 s)  •••••••••••————————— 59%
  Running 2012.0 (160/255) (4.02 s)  ••••••••••••———————— 63%
  Running 2014.0 (170/255) (4.36 s)  •••••••••••••——————— 67%
  Running 2016.0 (180/255) (4.70 s)  ••••••••••••••—————— 71%
  Running 2018.0 (190/255) (5.06 s)  ••••••••••••••—————— 75%
  Running 2020.0 (200/255) (5.42 s)  •••••••••••••••————— 79%
  Running 2022.0 (210/255) (5.80 s)  ••••••••••••••••———— 83%
  Running 2024.0 (220/255) (6.20 s)  •••••••••••••••••——— 87%
  Running 2026.0 (230/255) (6.60 s)  ••••••••••••••••••—— 91%
  Running 2028.0 (240/255) (7.02 s)  ••••••••••••••••••—— 95%
  Running 2030.0 (250/255) (7.45 s)  •••••••••••••••••••— 98%
Simulation summary:
        1,519 total infections
          362 total cin1s
          333 total cin2s
          182 total cin3s
          877 total cins
           26 total cancers
            0 total cancer detections
           11 total cancer deaths
            0 total detected cancer deaths
          586 total reinfections
          201 total reactivations
            0 total hiv infections
         4.92 total hpv incidence (/100)
        1,659 total cin1 incidence (/100,000)
        1,526 total cin2 incidence (/100,000)
          834 total cin3 incidence (/100,000)
        4,020 total cin incidence (/100,000)
          119 total cancer incidence (/100,000)
         8.26 total hpv prevalence (/100)

[4]:
Sim(<no label>; 1980 to 2030; pop: 10000 default; epi: 1519⚙, 26♋︎)

You can mix and match too – pass in a parameter dictionary with default options, and then include other parameters as keywords (including overrides; keyword arguments take precedence). For example:

[5]:
sim = hpv.Sim(pars, end=2050) # Use parameters defined above, except set the end date to 2050
sim.run()
Initializing sim with 10000 agents
  Running 1980.0 ( 0/355) (0.05 s)  ———————————————————— 0%
  Running 1982.0 (10/355) (0.21 s)  ———————————————————— 3%
  Running 1984.0 (20/355) (0.37 s)  •——————————————————— 6%
  Running 1986.0 (30/355) (0.54 s)  •——————————————————— 9%
  Running 1988.0 (40/355) (0.72 s)  ••—————————————————— 12%
  Running 1990.0 (50/355) (0.92 s)  ••—————————————————— 14%
  Running 1992.0 (60/355) (1.11 s)  •••————————————————— 17%
  Running 1994.0 (70/355) (1.31 s)  ••••———————————————— 20%
  Running 1996.0 (80/355) (1.52 s)  ••••———————————————— 23%
  Running 1998.0 (90/355) (1.74 s)  •••••——————————————— 26%
  Running 2000.0 (100/355) (1.97 s)  •••••——————————————— 28%
  Running 2002.0 (110/355) (2.20 s)  ••••••—————————————— 31%
  Running 2004.0 (120/355) (2.44 s)  ••••••—————————————— 34%
  Running 2006.0 (130/355) (2.69 s)  •••••••————————————— 37%
  Running 2008.0 (140/355) (2.94 s)  •••••••————————————— 40%
  Running 2010.0 (150/355) (3.20 s)  ••••••••———————————— 43%
  Running 2012.0 (160/355) (3.46 s)  •••••••••——————————— 45%
  Running 2014.0 (170/355) (3.75 s)  •••••••••——————————— 48%
  Running 2016.0 (180/355) (4.03 s)  ••••••••••—————————— 51%
  Running 2018.0 (190/355) (4.32 s)  ••••••••••—————————— 54%
  Running 2020.0 (200/355) (4.62 s)  •••••••••••————————— 57%
  Running 2022.0 (210/355) (4.94 s)  •••••••••••————————— 59%
  Running 2024.0 (220/355) (5.26 s)  ••••••••••••———————— 62%
  Running 2026.0 (230/355) (5.60 s)  •••••••••••••——————— 65%
  Running 2028.0 (240/355) (5.95 s)  •••••••••••••——————— 68%
  Running 2030.0 (250/355) (6.32 s)  ••••••••••••••—————— 71%
  Running 2032.0 (260/355) (6.69 s)  ••••••••••••••—————— 74%
  Running 2034.0 (270/355) (7.07 s)  •••••••••••••••————— 76%
  Running 2036.0 (280/355) (7.46 s)  •••••••••••••••————— 79%
  Running 2038.0 (290/355) (7.86 s)  ••••••••••••••••———— 82%
  Running 2040.0 (300/355) (8.27 s)  ••••••••••••••••———— 85%
  Running 2042.0 (310/355) (8.69 s)  •••••••••••••••••——— 88%
  Running 2044.0 (320/355) (9.13 s)  ••••••••••••••••••—— 90%
  Running 2046.0 (330/355) (9.58 s)  ••••••••••••••••••—— 93%
  Running 2048.0 (340/355) (10.05 s)  •••••••••••••••••••— 96%
  Running 2050.0 (350/355) (10.52 s)  •••••••••••••••••••— 99%
Simulation summary:
        4,452 total infections
          599 total cin1s
          631 total cin2s
          335 total cin3s
        1,565 total cins
           33 total cancers
            0 total cancer detections
           27 total cancer deaths
            0 total detected cancer deaths
        1,578 total reinfections
          773 total reactivations
            0 total hiv infections
         3.07 total hpv incidence (/100)
        1,938 total cin1 incidence (/100,000)
        2,042 total cin2 incidence (/100,000)
        1,084 total cin3 incidence (/100,000)
        5,064 total cin incidence (/100,000)
          107 total cancer incidence (/100,000)
        12.87 total hpv prevalence (/100)

[5]:
Sim(<no label>; 1980 to 2050; pop: 10000 default; epi: 4452⚙, 33♋︎)
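The precedence rule described above (keyword arguments override dictionary entries) can be illustrated in plain Python. This is only a sketch of the behavior, not HPVsim's internal implementation; merge_pars is a hypothetical helper.

```python
def merge_pars(pars=None, **kwargs):
    """Combine a parameter dict with keyword overrides; keywords win on conflicts."""
    merged = dict(pars or {})  # Copy, so the caller's dictionary is not modified
    merged.update(kwargs)      # Later values replace earlier ones for shared keys
    return merged


pars = dict(n_agents=10e3, genotypes=[16, 18, 'hrhpv'], start=1980, end=2030)
merged = merge_pars(pars, end=2050)

print(merged['end'])  # 2050: the keyword argument overrides the dictionary
print(pars['end'])    # 2030: the original dictionary is unchanged
```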

Plotting results

As you saw above, plotting the results of a simulation is rather easy too:

[6]:
fig = sim.plot()
../_images/tutorials_tut_intro_13_0.svg

Full usage example

Many of the details of this example will be explained in later tutorials, but to give you a taste, here’s an example of how you would run two simulations to determine the impact of a custom vaccination intervention that targets agents aged between 9 and 14 in the year 2000.

[7]:
import hpvsim as hpv

# Custom vaccination intervention
def custom_vx(sim):
    if sim.yearvec[sim.t] == 2000:
        target_group = (sim.people.age>9) * (sim.people.age<14)
        sim.people.peak_imm[0, target_group] = 1

pars = dict(
    location = 'tanzania', # Use population characteristics for Tanzania
    n_agents = 10e3, # Have 10,000 agents in the population
    start = 1980, # Start the simulation in 1980
    n_years = 50, # Run the simulation for 50 years
    burnin = 10, # Discard the first 10 years as burn-in period
    verbose = 0, # Do not print any output
)

# Running with multisims -- see Tutorial 3
s1 = hpv.Sim(pars, label='Default')
s2 = hpv.Sim(pars, interventions=custom_vx, label='Custom vaccination')
msim = hpv.MultiSim([s1, s2])
msim.run()
fig = msim.plot(['total_cancers', 'total_cins'])
Loading location-specific demographic data for "tanzania"
Loading location-specific demographic data for "tanzania"
No genotypes provided, will assume only simulating HPV16 by default
No genotypes provided, will assume only simulating HPV16 by default
../_images/tutorials_tut_intro_15_2.svg
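The target_group line in custom_vx multiplies two boolean arrays, which acts as an element-wise logical AND. A plain-Python sketch of the same selection rule follows, using made-up ages rather than a real HPVsim population:

```python
ages = [5.0, 9.0, 9.5, 13.9, 14.0, 30.0]  # Hypothetical ages, in years

# Multiplying two booleans gives 1 only when both are True (logical AND)
mask = [bool((age > 9) * (age < 14)) for age in ages]
targeted = [age for age, selected in zip(ages, mask) if selected]

print(mask)      # [False, False, True, True, False, False]
print(targeted)  # [9.5, 13.9]
```

Both bounds are strict, so agents aged exactly 9 or exactly 14 are excluded.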