T1 - Getting started

Installing and getting started with HPVsim is quite simple.

HPVsim is a Python package that can be pip-installed by typing pip install hpvsim into a terminal. You can then check that the installation was successful by importing HPVsim with import hpvsim as hpv.
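If you want to check for the package without importing it (for example, in a setup script), Python's standard `importlib.metadata` module can report the installed version. This is a general-purpose sketch, not an HPVsim feature. The helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Prints a version string such as '1.2.0' if HPVsim is installed, otherwise None
print(installed_version("hpvsim"))
```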

The basic design philosophy of HPVsim is: common tasks should be simple. For example:

  • Defining parameters

  • Running a simulation

  • Plotting results

This tutorial walks you through how to do these things.


Hello world

To create, run, and plot a sim with default options is just:

[1]:
import hpvsim as hpv

sim = hpv.Sim()
sim.run()
fig = sim.plot()
HPVsim 1.2.0 (2023-05-31) — © 2023 by IDM
Loading location-specific demographic data for "nigeria"
Initializing sim with 20000 agents
Loading location-specific data for "nigeria"
  Running 1995.0 ( 0/144) (1.03 s)  ———————————————————— 1%
  Running 1997.5 (10/144) (1.40 s)  •——————————————————— 8%
  Running 2000.0 (20/144) (1.79 s)  ••—————————————————— 15%
  Running 2002.5 (30/144) (2.18 s)  ••••———————————————— 22%
  Running 2005.0 (40/144) (2.61 s)  •••••——————————————— 28%
  Running 2007.5 (50/144) (3.03 s)  •••••••————————————— 35%
  Running 2010.0 (60/144) (3.47 s)  ••••••••———————————— 42%
  Running 2012.5 (70/144) (3.94 s)  •••••••••——————————— 49%
  Running 2015.0 (80/144) (4.43 s)  •••••••••••————————— 56%
  Running 2017.5 (90/144) (4.94 s)  ••••••••••••———————— 63%
  Running 2020.0 (100/144) (5.47 s)  ••••••••••••••—————— 70%
  Running 2022.5 (110/144) (6.00 s)  •••••••••••••••————— 77%
  Running 2025.0 (120/144) (6.55 s)  ••••••••••••••••———— 84%
  Running 2027.5 (130/144) (7.13 s)  ••••••••••••••••••—— 91%
  Running 2030.0 (140/144) (7.76 s)  •••••••••••••••••••— 98%
Simulation summary:
   13,395,204 infections
            0 dysplasias
            0 pre-cins
    3,230,231 cin1s
      173,048 cin2s
       81,717 cin3s
    5,793,372 cins
       20,296 cancers
            0 cancer detections
       17,625 cancer deaths
            0 detected cancer deaths
    9,683,216 reinfections
            0 reactivations
   774,427,776 number susceptible
   16,136,736 number infectious
      120,172 number with inactive infection
   260,436,336 number with no cellular changes
   39,733,240 number with episomal infection
        2,670 number with transformation
      120,172 number with cancer
   16,256,907 number infected
   39,853,408 number with abnormal cells
            0 number with latent infection
    3,388,858 number with precin
    5,666,257 number with cin1
    1,453,284 number with cin2
    1,126,949 number with cin3
    8,122,044 number with detectable dysplasia
            0 number with detected cancer
            0 number screened
            0 number treated for precancerous lesions
            0 number treated for cancer
            0 number vaccinated
            0 number given therapeutic vaccine
         0.58 hpv incidence (/100)
            0 cin1 incidence (/100,000)
            0 cin2 incidence (/100,000)
            0 cin3 incidence (/100,000)
            0 dysplasia incidence (/100,000)
           16 cancer incidence (/100,000)
    9,416,166 births
    2,604,267 other deaths
   -1,089,562 migration
           24 age-adjusted cervical cancer incidence (/100,000)
            0 age-adjusted cervical cancer mortality
            0 newly vaccinated
            0 cumulative number vaccinated
            0 new doses
            0 cumulative doses
            0 new therapeutic vaccine doses
            0 newly received therapeutic vaccine
            0 cumulative therapeutic vaccine doses
            0 total received therapeutic vaccine
            0 new screens
            0 newly screened
            0 new cin treatments
            0 newly treated for cins
            0 new cancer treatments
            0 newly treated for cancer
            0 cumulative screens
            0 cumulative number screened
            0 cumulative cin treatments
            0 cumulative number treated for cins
            0 cumulative cancer treatments
            0 cumulative number treated for cancer
            0 detected cancer incidence (/100,000)
           14 cancer mortality
   260,436,336 number alive
            0 crude death rate
            0 crude birth rate
         2.07 hpv prevalence (/100)
            0 pre-cin prevalence (/100,000)
            0 cin1 prevalence (/100,000)
            0 cin2 prevalence (/100,000)
            0 cin3 prevalence (/100,000)

[Figure: output of sim.plot() for the default simulation]

Defining parameters and genotypes, and running simulations

Parameters are defined as a dictionary. Some common parameters to modify are the number of agents in the simulation, the genotypes to simulate, and the start and end dates of the simulation. We can define those as:

[2]:
pars = dict(
    n_agents = 10e3,
    genotypes = [16, 18, 'hr'], # Simulate genotypes 16 and 18, plus all other high-risk HPV genotypes pooled together
    start = 1980,
    end = 2030,
)
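The progress counters printed during the runs in this tutorial (0/144 for the default run spanning 1995 to 2030, and 0/204 for 1980 to 2030) fit a simple rule. The sim appears to use a quarterly time step and to simulate the final year in full. The quarterly step is inferred from the output, where every 10 steps advance the year by 2.5. The sketch below reproduces that arithmetic under these assumptions. `n_timepoints` is a hypothetical helper, not an HPVsim function:

```python
def n_timepoints(start, end, dt=0.25):
    """Number of time steps if the sim runs from `start` through the end of year `end`.

    Assumes a step of `dt` years (0.25 = quarterly), as suggested by the printed progress output.
    """
    return round((end - start + 1) / dt)

print(n_timepoints(1995, 2030))  # default run: 144 steps
print(n_timepoints(1980, 2030))  # run with the parameters above: 204 steps
```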

Running a simulation is pretty easy. In fact, running a sim with the parameters we defined above is just:

[3]:
sim = hpv.Sim(pars)
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/204) (0.03 s)  ———————————————————— 0%
  Running 1982.5 (10/204) (0.31 s)  •——————————————————— 5%
  Running 1985.0 (20/204) (0.60 s)  ••—————————————————— 10%
  Running 1987.5 (30/204) (0.88 s)  •••————————————————— 15%
  Running 1990.0 (40/204) (1.17 s)  ••••———————————————— 20%
  Running 1992.5 (50/204) (1.46 s)  •••••——————————————— 25%
  Running 1995.0 (60/204) (1.76 s)  •••••——————————————— 30%
  Running 1997.5 (70/204) (2.06 s)  ••••••—————————————— 35%
  Running 2000.0 (80/204) (2.38 s)  •••••••————————————— 40%
  Running 2002.5 (90/204) (2.69 s)  ••••••••———————————— 45%
  Running 2005.0 (100/204) (3.04 s)  •••••••••——————————— 50%
  Running 2007.5 (110/204) (3.39 s)  ••••••••••—————————— 54%
  Running 2010.0 (120/204) (3.75 s)  •••••••••••————————— 59%
  Running 2012.5 (130/204) (4.14 s)  ••••••••••••———————— 64%
  Running 2015.0 (140/204) (4.52 s)  •••••••••••••——————— 69%
  Running 2017.5 (150/204) (4.92 s)  ••••••••••••••—————— 74%
  Running 2020.0 (160/204) (5.35 s)  •••••••••••••••————— 79%
  Running 2022.5 (170/204) (5.77 s)  ••••••••••••••••———— 84%
  Running 2025.0 (180/204) (6.23 s)  •••••••••••••••••——— 89%
  Running 2027.5 (190/204) (6.72 s)  ••••••••••••••••••—— 94%
  Running 2030.0 (200/204) (7.21 s)  •••••••••••••••••••— 99%
Simulation summary:
   10,199,556 infections
            0 dysplasias
            0 pre-cins
    2,185,722 cin1s
      147,965 cin2s
       45,970 cin3s
    5,104,806 cins
        9,338 cancers
            0 cancer detections
        5,746 cancer deaths
            0 detected cancer deaths
    6,981,668 reinfections
            0 reactivations
   775,935,104 number susceptible
   12,546,890 number infectious
       80,447 number with inactive infection
   260,318,592 number with no cellular changes
   31,940,412 number with episomal infection
        1,437 number with transformation
       80,447 number with cancer
   12,627,338 number infected
   32,020,860 number with abnormal cells
            0 number with latent infection
    2,570,001 number with precin
    4,253,646 number with cin1
    1,213,891 number with cin2
      820,274 number with cin3
    6,234,658 number with detectable dysplasia
            0 number with detected cancer
            0 number screened
            0 number treated for precancerous lesions
            0 number treated for cancer
            0 number vaccinated
            0 number given therapeutic vaccine
         0.44 hpv incidence (/100)
            0 cin1 incidence (/100,000)
            0 cin2 incidence (/100,000)
            0 cin3 incidence (/100,000)
            0 dysplasia incidence (/100,000)
            7 cancer incidence (/100,000)
    9,423,815 births
    2,334,406 other deaths
   -1,321,633 migration
           10 age-adjusted cervical cancer incidence (/100,000)
            0 age-adjusted cervical cancer mortality
            0 newly vaccinated
            0 cumulative number vaccinated
            0 new doses
            0 cumulative doses
            0 new therapeutic vaccine doses
            0 newly received therapeutic vaccine
            0 cumulative therapeutic vaccine doses
            0 total received therapeutic vaccine
            0 new screens
            0 newly screened
            0 new cin treatments
            0 newly treated for cins
            0 new cancer treatments
            0 newly treated for cancer
            0 cumulative screens
            0 cumulative number screened
            0 cumulative cin treatments
            0 cumulative number treated for cins
            0 cumulative cancer treatments
            0 cumulative number treated for cancer
            0 detected cancer incidence (/100,000)
            4 cancer mortality
   260,318,592 number alive
            0 crude death rate
            0 crude birth rate
         1.61 hpv prevalence (/100)
            0 pre-cin prevalence (/100,000)
            0 cin1 prevalence (/100,000)
            0 cin2 prevalence (/100,000)
            0 cin3 prevalence (/100,000)

[3]:
Sim(<no label>; 1980 to 2030; pop: 10000 default; epi: 3.72463e+08⚙, 334000♋︎)

This will generate a results dictionary, sim.results. Results by genotype have names like sim.results['infections_by_genotype'] and are stored as arrays where each row corresponds to a genotype. Totals across all genotypes have names like sim.results['infections'] or sim.results['cancers'].
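To picture the by-genotype layout, here is a sketch using a plain NumPy array as a stand-in for a by-genotype result, with one row per genotype and one column per time point. The numbers and the `genotype_labels` list are made up for illustration. Real HPVsim results may be wrapped in result objects rather than stored as bare arrays:

```python
import numpy as np

# Hypothetical stand-in for a by-genotype result: 3 genotypes x 5 time points
genotype_labels = ['hpv16', 'hpv18', 'hr']
infections_by_genotype = np.array([
    [10, 12, 15, 14, 13],
    [ 4,  5,  6,  6,  5],
    [20, 22, 25, 24, 23],
])

# One row = the full time series for a single genotype
hpv18_series = infections_by_genotype[genotype_labels.index('hpv18')]

# One column = all genotypes at a single time point (here, the last one)
final_step = infections_by_genotype[:, -1]

# Summing over rows (axis 0) collapses the genotype dimension into one series
summed = infections_by_genotype.sum(axis=0)

print(infections_by_genotype.shape)  # (3, 5)
```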

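Rather than creating a parameter dictionary, you can also pass any valid parameter to the sim directly as a keyword argument. For example, the following uses the same number of agents, start year, and end year as above. Because genotypes is not passed, the sim uses its default genotypes: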

[4]:
sim = hpv.Sim(n_agents=10e3, start=1980, end=2030)
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/204) (0.04 s)  ———————————————————— 0%
  Running 1982.5 (10/204) (0.31 s)  •——————————————————— 5%
  Running 1985.0 (20/204) (0.59 s)  ••—————————————————— 10%
  Running 1987.5 (30/204) (0.87 s)  •••————————————————— 15%
  Running 1990.0 (40/204) (1.17 s)  ••••———————————————— 20%
  Running 1992.5 (50/204) (1.47 s)  •••••——————————————— 25%
  Running 1995.0 (60/204) (1.79 s)  •••••——————————————— 30%
  Running 1997.5 (70/204) (2.10 s)  ••••••—————————————— 35%
  Running 2000.0 (80/204) (2.45 s)  •••••••————————————— 40%
  Running 2002.5 (90/204) (2.79 s)  ••••••••———————————— 45%
  Running 2005.0 (100/204) (3.15 s)  •••••••••——————————— 50%
  Running 2007.5 (110/204) (3.51 s)  ••••••••••—————————— 54%
  Running 2010.0 (120/204) (3.89 s)  •••••••••••————————— 59%
  Running 2012.5 (130/204) (4.30 s)  ••••••••••••———————— 64%
  Running 2015.0 (140/204) (4.71 s)  •••••••••••••——————— 69%
  Running 2017.5 (150/204) (5.13 s)  ••••••••••••••—————— 74%
  Running 2020.0 (160/204) (5.57 s)  •••••••••••••••————— 79%
  Running 2022.5 (170/204) (6.01 s)  ••••••••••••••••———— 84%
  Running 2025.0 (180/204) (6.48 s)  •••••••••••••••••——— 89%
  Running 2027.5 (190/204) (6.99 s)  ••••••••••••••••••—— 94%
  Running 2030.0 (200/204) (7.50 s)  •••••••••••••••••••— 99%
Simulation summary:
   19,149,308 infections
            0 dysplasias
            0 pre-cins
    4,364,261 cin1s
      280,129 cin2s
       96,968 cin3s
    8,138,096 cins
       12,929 cancers
            0 cancer detections
        6,465 cancer deaths
            0 detected cancer deaths
   13,812,498 reinfections
            0 reactivations
   770,468,992 number susceptible
   21,432,716 number infectious
      100,559 number with inactive infection
   260,562,080 number with no cellular changes
   45,796,728 number with episomal infection
        6,465 number with transformation
      100,559 number with cancer
   21,533,276 number infected
   45,897,288 number with abnormal cells
            0 number with latent infection
    4,971,924 number with precin
    7,750,226 number with cin1
    1,947,253 number with cin2
    1,291,465 number with cin3
   10,820,148 number with detectable dysplasia
            0 number with detected cancer
            0 number screened
            0 number treated for precancerous lesions
            0 number treated for cancer
            0 number vaccinated
            0 number given therapeutic vaccine
         0.83 hpv incidence (/100)
            0 cin1 incidence (/100,000)
            0 cin2 incidence (/100,000)
            0 cin3 incidence (/100,000)
            0 dysplasia incidence (/100,000)
           10 cancer incidence (/100,000)
    9,409,450 births
    2,617,408 other deaths
   -1,070,235 migration
           15 age-adjusted cervical cancer incidence (/100,000)
            0 age-adjusted cervical cancer mortality
            0 newly vaccinated
            0 cumulative number vaccinated
            0 new doses
            0 cumulative doses
            0 new therapeutic vaccine doses
            0 newly received therapeutic vaccine
            0 cumulative therapeutic vaccine doses
            0 total received therapeutic vaccine
            0 new screens
            0 newly screened
            0 new cin treatments
            0 newly treated for cins
            0 new cancer treatments
            0 newly treated for cancer
            0 cumulative screens
            0 cumulative number screened
            0 cumulative cin treatments
            0 cumulative number treated for cins
            0 cumulative cancer treatments
            0 cumulative number treated for cancer
            0 detected cancer incidence (/100,000)
            5 cancer mortality
   260,562,080 number alive
            0 crude death rate
            0 crude birth rate
         2.74 hpv prevalence (/100)
            0 pre-cin prevalence (/100,000)
            0 cin1 prevalence (/100,000)
            0 cin2 prevalence (/100,000)
            0 cin3 prevalence (/100,000)

[4]:
Sim(<no label>; 1980 to 2030; pop: 10000 default; epi: 6.09129e+08⚙, 410855♋︎)

You can also mix and match: pass in a parameter dictionary, then supply additional parameters as keyword arguments. Keyword arguments take precedence, so they can also override values already in the dictionary. For example:

[5]:
sim = hpv.Sim(pars, end=2050) # Use the parameters defined above, but set the end date to 2050
sim.run()
Loading location-specific demographic data for "nigeria"
Initializing sim with 10000 agents
Loading location-specific data for "nigeria"
  Running 1980.0 ( 0/284) (0.04 s)  ———————————————————— 0%
  Running 1982.5 (10/284) (0.32 s)  ———————————————————— 4%
  Running 1985.0 (20/284) (0.60 s)  •——————————————————— 7%
  Running 1987.5 (30/284) (0.88 s)  ••—————————————————— 11%
  Running 1990.0 (40/284) (1.17 s)  ••—————————————————— 14%
  Running 1992.5 (50/284) (1.46 s)  •••————————————————— 18%
  Running 1995.0 (60/284) (1.76 s)  ••••———————————————— 21%
  Running 1997.5 (70/284) (2.06 s)  •••••——————————————— 25%
  Running 2000.0 (80/284) (2.39 s)  •••••——————————————— 29%
  Running 2002.5 (90/284) (2.70 s)  ••••••—————————————— 32%
  Running 2005.0 (100/284) (3.05 s)  •••••••————————————— 36%
  Running 2007.5 (110/284) (3.40 s)  •••••••————————————— 39%
  Running 2010.0 (120/284) (3.77 s)  ••••••••———————————— 43%
  Running 2012.5 (130/284) (4.16 s)  •••••••••——————————— 46%
  Running 2015.0 (140/284) (4.56 s)  •••••••••——————————— 50%
  Running 2017.5 (150/284) (4.96 s)  ••••••••••—————————— 53%
  Running 2020.0 (160/284) (5.39 s)  •••••••••••————————— 57%
  Running 2022.5 (170/284) (5.83 s)  ••••••••••••———————— 60%
  Running 2025.0 (180/284) (6.29 s)  ••••••••••••———————— 64%
  Running 2027.5 (190/284) (6.78 s)  •••••••••••••——————— 67%
  Running 2030.0 (200/284) (7.29 s)  ••••••••••••••—————— 71%
  Running 2032.5 (210/284) (7.79 s)  ••••••••••••••—————— 74%
  Running 2035.0 (220/284) (8.34 s)  •••••••••••••••————— 78%
  Running 2037.5 (230/284) (8.89 s)  ••••••••••••••••———— 81%
  Running 2040.0 (240/284) (9.48 s)  ••••••••••••••••———— 85%
  Running 2042.5 (250/284) (10.07 s)  •••••••••••••••••——— 88%
  Running 2045.0 (260/284) (10.75 s)  ••••••••••••••••••—— 92%
  Running 2047.5 (270/284) (11.40 s)  •••••••••••••••••••— 95%
  Running 2050.0 (280/284) (12.09 s)  •••••••••••••••••••— 99%
Simulation summary:
   13,187,595 infections
            0 dysplasias
            0 pre-cins
    3,104,400 cin1s
      247,806 cin2s
       53,153 cin3s
    7,186,377 cins
       15,802 cancers
            0 cancer detections
        7,901 cancer deaths
            0 detected cancer deaths
    8,985,665 reinfections
            0 reactivations
   1,120,278,016 number susceptible
   16,069,329 number infectious
      128,572 number with inactive infection
   375,473,728 number with no cellular changes
   46,545,896 number with episomal infection
        3,591 number with transformation
      128,572 number with cancer
   16,197,901 number infected
   46,674,464 number with abnormal cells
            0 number with latent infection
    3,232,972 number with precin
    5,606,165 number with cin1
    1,619,718 number with cin2
    1,329,534 number with cin3
    8,518,066 number with detectable dysplasia
            0 number with detected cancer
            0 number screened
            0 number treated for precancerous lesions
            0 number treated for cancer
            0 number vaccinated
            0 number given therapeutic vaccine
         0.39 hpv incidence (/100)
            0 cin1 incidence (/100,000)
            0 cin2 incidence (/100,000)
            0 cin3 incidence (/100,000)
            0 dysplasia incidence (/100,000)
            8 cancer incidence (/100,000)
   13,704,756 births
    3,246,620 other deaths
   -4,905,843 migration
           10 age-adjusted cervical cancer incidence (/100,000)
            0 age-adjusted cervical cancer mortality
            0 newly vaccinated
            0 cumulative number vaccinated
            0 new doses
            0 cumulative doses
            0 new therapeutic vaccine doses
            0 newly received therapeutic vaccine
            0 cumulative therapeutic vaccine doses
            0 total received therapeutic vaccine
            0 new screens
            0 newly screened
            0 new cin treatments
            0 newly treated for cins
            0 new cancer treatments
            0 newly treated for cancer
            0 cumulative screens
            0 cumulative number screened
            0 cumulative cin treatments
            0 cumulative number treated for cins
            0 cumulative cancer treatments
            0 cumulative number treated for cancer
            0 detected cancer incidence (/100,000)
            4 cancer mortality
   375,473,728 number alive
            0 crude death rate
            0 crude birth rate
         1.43 hpv prevalence (/100)
            0 pre-cin prevalence (/100,000)
            0 cin1 prevalence (/100,000)
            0 cin2 prevalence (/100,000)
            0 cin3 prevalence (/100,000)

[5]:
Sim(<no label>; 1980 to 2050; pop: 10000 default; epi: 6.02183e+08⚙, 617001♋︎)
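The precedence rule works like an ordinary dictionary merge in which later values win. The sketch below illustrates only that rule and does not show how HPVsim merges parameters internally. `merge_pars` is a hypothetical helper:

```python
def merge_pars(pars=None, **kwargs):
    """Combine a parameter dict with keyword overrides; keyword values win on conflicts."""
    merged = dict(pars or {})  # copy so the caller's dictionary is left unchanged
    merged.update(kwargs)      # a keyword with the same name replaces the dictionary value
    return merged

pars = dict(n_agents=10e3, genotypes=[16, 18, 'hr'], start=1980, end=2030)
merged = merge_pars(pars, end=2050)
print(merged['start'], merged['end'])  # 1980 2050
```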

Plotting results

As you saw above, plotting the results of a simulation is rather easy too:

[6]:
fig = sim.plot()
[Figure: output of sim.plot() for the simulation running from 1980 to 2050]

Full usage example

Many of the details of this example will be explained in later tutorials, but to give you a taste, here’s how you would run two simulations to determine the impact of a custom vaccination intervention that targets people aged between 9 and 14 in the year 2000.

[7]:
import hpvsim as hpv

# Custom vaccination intervention: in 2000, give everyone aged between 9 and 14
# full peak immunity against the first immunity source (row 0 of peak_imm, i.e. the first genotype)
def custom_vx(sim):
    if sim.yearvec[sim.t] == 2000:
        target_group = (sim.people.age > 9) & (sim.people.age < 14)
        sim.people.peak_imm[0, target_group] = 1

pars = dict(
    location = 'tanzania', # Use population characteristics for Tanzania
    n_agents = 10e3, # Have 10,000 people total in the population
    start = 1980, # Start the simulation in 1980
    n_years = 50, # Run the simulation for 50 years
    burnin = 10, # Discard the first 10 years as burnin period
    verbose = 0, # Do not print any output
)

# Running with multisims -- see Tutorial 3
s1 = hpv.Sim(pars, label='Default')
s2 = hpv.Sim(pars, interventions=custom_vx, label='Custom vaccination')
msim = hpv.MultiSim([s1, s2])
msim.run()
fig = msim.plot(['cancers', 'dysplasias'])
Loading location-specific demographic data for "tanzania"
Loading location-specific demographic data for "tanzania"
[Figure: cancers and dysplasias over time for the 'Default' and 'Custom vaccination' simulations]
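The targeting step inside custom_vx relies on NumPy boolean masks. The standalone sketch below uses made-up ages and mock arrays rather than HPVsim's people object. It shows which agents the age condition selects and how indexing a 2-D immunity array with [row, mask] updates only those agents. The array is assumed to be laid out as [immunity source, agent], as the indexing in custom_vx suggests:

```python
import numpy as np

# Mock stand-ins for sim.people.age and sim.people.peak_imm (hypothetical data)
age = np.array([5.0, 9.0, 9.5, 12.0, 13.9, 14.0, 30.0])
n_sources = 3  # e.g. one row per simulated genotype
peak_imm = np.zeros((n_sources, age.size))

# Same condition as custom_vx: strictly older than 9 and strictly younger than 14
target_group = (age > 9) & (age < 14)

# Only row 0, and only the masked agents, are set to full immunity
peak_imm[0, target_group] = 1

print(np.flatnonzero(target_group))  # [2 3 4]
```

Multiplying the two boolean arrays, as in an `(a > 9) * (a < 14)` style, selects the same agents. `&` simply states the intent (logical AND) more directly.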