hpvsim.base module

Base classes for HPVsim. These classes handle a lot of the boilerplate of the People and Sim classes (e.g. loading, saving, key lookups, etc.), so those classes can be focused on the disease-specific functionality.

class ParsObj(pars)[source]

Bases: FlexPretty

A class based around performing operations on a self.pars dict.

update_pars(pars=None, create=False)[source]

Update internal dict with new pars.

Parameters:
  • pars (dict) – the parameters to update (if None, do nothing)

  • create (bool) – if create is False, then raise a KeyNotFoundError if the key does not already exist
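
The effect of the create flag can be illustrated with a minimal sketch. MiniParsObj and the local KeyNotFoundError class are hypothetical stand-ins written for this example, not the HPVsim implementation; they only mirror the behavior described in the parameter list above.

```python
class KeyNotFoundError(KeyError):
    """Local stand-in for the error named above (an assumption for this sketch)."""


class MiniParsObj:
    """Hypothetical, simplified class that wraps a pars dict."""

    def __init__(self, pars):
        self.pars = dict(pars)

    def update_pars(self, pars=None, create=False):
        if pars is None:  # Nothing to update
            return self
        if not create:  # Reject keys that are not already present
            missing = [key for key in pars if key not in self.pars]
            if missing:
                raise KeyNotFoundError(f"Unknown parameter(s): {missing}")
        self.pars.update(pars)
        return self


obj = MiniParsObj({'n_agents': 1000, 'dt': 0.25})
obj.update_pars({'dt': 0.5})                   # Existing key: accepted
obj.update_pars({'new_key': 1}, create=True)   # New key: allowed because create=True
```

Calling obj.update_pars({'typo_key': 2}) without create=True raises the error, which is how a misspelled parameter name would be caught in this sketch.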

class Result(name=None, npts=None, scale=True, color=None, n_rows=0, n_copies=0)[source]

Bases: object

Stores a single result – by default, acts like an array.

Parameters:
  • name (str) – name of this result, e.g. new_infections

  • npts (int) – if values is None, precreate it to be of this length

  • scale (bool) – whether or not the value scales by population scale factor

  • color (str/arr) – default color for plotting (hex or RGB notation)

Example:

import hpvsim as hpv
r1 = hpv.Result(name='test1', npts=10)
r1[:5] = 20
print(r1.values)

sum()[source]

To allow result.sum() instead of result.values.sum()

mean()[source]

To allow result.mean() instead of result.values.mean()

median()[source]

To allow result.median() instead of result.values.median()

property npts

property shape

class BaseSim(*args, **kwargs)[source]

Bases: ParsObj

The BaseSim class stores various methods useful for the Sim that are not directly related to simulating the epidemic. It is not used outside of the Sim object, so the separation of methods into the BaseSim and Sim classes is purely to keep each one of manageable size.

update_pars(pars=None, create=False, **kwargs)[source]

Ensure that metaparameters get used properly before being updated

set_metadata(simfile)[source]

Set the metadata for the simulation – creation time and filename

property n

Count the number of people – if it fails, assume none

get_t(dates, exact_match=False, return_date_format=None)[source]

Convert a string, date/datetime object, or int to a timepoint (int).

Parameters:
  • dates (str, date, int, or list) – convert any of these objects to a timepoint relative to the simulation’s start day

  • exact_match (bool) – whether or not to demand an exact match to the requested date

  • return_date_format (None, str) – if None, do not return dates; otherwise return them as strings or floats as requested

Returns:

the time point in the simulation closest to the requested date

Return type:

t (int or str)
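
The closest-match and exact_match behavior can be sketched in plain Python. The function closest_timepoint and the quarterly time grid are assumptions made for illustration; the actual method also handles date strings and date objects, which this sketch omits.

```python
import math


def closest_timepoint(year, yearvec, exact_match=False):
    """Return the index of the entry in yearvec that is closest to year."""
    t = min(range(len(yearvec)), key=lambda i: abs(yearvec[i] - year))
    if exact_match and not math.isclose(yearvec[t], year):
        raise ValueError(f"{year} is not directly simulated")
    return t


# Assumed time grid: quarterly steps starting in 2015
yearvec = [2015 + 0.25 * i for i in range(8)]
t_exact = closest_timepoint(2015.5, yearvec)  # Falls exactly on the grid
t_near = closest_timepoint(2015.6, yearvec)   # Snaps to the nearest grid point
```

With exact_match=True, the second call would raise an error instead of snapping to the nearest point.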

Examples:

sim.get_t('2015-03-01') # Get the closest timepoint to the specified date
sim.get_t(3) # Will return 3
sim.get_t('2015') # Can use strings
sim.get_t(['2015.5', '2016.5']) # List of strings, will match as close as possible
sim.get_t(['2015.5', '2016.5'], exact_match=True) # Raises an error since these dates aren't directly simulated

result_keys(which='all')[source]

Get the actual results objects, not other things stored in sim.results.

If which is ‘total’, return only the main results keys. If ‘genotype’, return only genotype keys. If ‘all’, return all keys.

result_types(reskeys)[source]

Figure out what kind of result it is, which determines what plotting style to use

copy()[source]

Returns a deep copy of the sim

export_results(for_json=True, filename=None, indent=2, *args, **kwargs)[source]

Convert results to dict – see also to_json().

The results written to Excel must have a regular table shape, whereas for the JSON output, arbitrary data shapes are supported.

Parameters:
  • for_json (bool) – if False, only data associated with Result objects will be included in the converted output

  • filename (str) – filename to save to; if None, do not save

  • indent (int) – if writing to file, how many indents to use per nested level

  • args (list) – passed to savejson()

  • kwargs (dict) – passed to savejson()

Returns:

dictionary representation of the results

Return type:

resdict (dict)

export_pars(filename=None, indent=2, *args, **kwargs)[source]

Return parameters for JSON export – see also to_json().

This method is required so that interventions can specify their JSON-friendly representation.

Parameters:
  • filename (str) – filename to save to; if None, do not save

  • indent (int) – if writing to file, how many indents to use per nested level

  • args (list) – passed to savejson()

  • kwargs (dict) – passed to savejson()

Returns:

a dictionary containing all the parameter values

Return type:

pardict (dict)

to_json(filename=None, keys=None, tostring=False, indent=2, verbose=False, *args, **kwargs)[source]

Export results and parameters as JSON.

Parameters:
  • filename (str) – if None, return string; else, write to file

  • keys (str or list) – attributes to write to json (default: results, parameters, and summary)

  • tostring (bool) – if not writing to file, whether to write to string (alternative is sanitized dictionary)

  • indent (int) – if writing to file, how many indents to use per nested level

  • verbose (bool) – detail to print

  • args (list) – passed to savejson()

  • kwargs (dict) – passed to savejson()

Returns:

A unicode string containing a JSON representation of the results, or writes the JSON file to disk
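
The filename and keys arguments can be illustrated with a standard-library sketch. The function to_json_sketch and the placeholder data are hypothetical; the real method delegates to savejson() and sanitizes the simulation objects first, which is not reproduced here.

```python
import json
import os
import tempfile


def to_json_sketch(data, filename=None, keys=None, indent=2):
    """Serialize the selected top-level keys of data to JSON."""
    if keys is None:
        keys = list(data)        # Default: export everything
    elif isinstance(keys, str):
        keys = [keys]            # Allow a single key as a string
    selected = {key: data[key] for key in keys}
    if filename is None:
        return json.dumps(selected, indent=indent)  # Return a string
    with open(filename, 'w') as f:
        json.dump(selected, f, indent=indent)       # Write to disk
    return None


# Made-up placeholder values, for illustration only
data = dict(results={'infections': [1, 2, 3]},
            parameters={'dt': 0.25},
            summary={'total': 6})

as_string = to_json_sketch(data, keys='summary')
path = os.path.join(tempfile.mkdtemp(), 'results.json')
to_json_sketch(data, filename=path)
```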

Examples:

json = sim.to_json()
sim.to_json('results.json')
sim.to_json('summary.json', keys='summary')

to_df(date_index=False)[source]

Export results to a pandas dataframe

Parameters:

date_index (bool) – if True, use the date as the index

to_excel(filename=None, skip_pars=None)[source]

Export parameters and results as Excel format

Parameters:
  • filename (str) – if None, return the spreadsheet object; else, write to file

  • skip_pars (list) – if provided, a custom list of parameters to exclude

Returns:

An sc.Spreadsheet with an Excel file, or writes the file to disk

shrink(skip_attrs=None, in_place=True)[source]

“Shrinks” the simulation by removing the people and other memory-intensive attributes (e.g., some interventions and analyzers), and returns a copy of the “shrunken” simulation. Used to reduce the memory required for RAM or for saved files.

Parameters:
  • skip_attrs (list) – a list of attributes to skip (remove) in order to perform the shrinking; default “people”

  • in_place (bool) – whether to perform the shrinking in place (default), or return a shrunken copy instead

Returns:

a Sim object with the listed attributes removed

Return type:

shrunken (Sim)
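
The difference between in-place shrinking and returning a shrunken copy can be sketched as follows. MiniSim is a hypothetical stand-in, and setting the skipped attributes to None is an assumption about how removal is done.

```python
import copy


class MiniSim:
    """Hypothetical stand-in for a sim holding one memory-heavy attribute."""

    def __init__(self):
        self.people = list(range(10_000))  # Placeholder for large per-agent data
        self.summary = {'n_agents': 10_000}

    def shrink(self, skip_attrs=None, in_place=True):
        if skip_attrs is None:
            skip_attrs = ['people']  # Default named in the parameter list above
        # A shallow copy is enough here: attributes are replaced, not mutated
        target = self if in_place else copy.copy(self)
        for attr in skip_attrs:
            setattr(target, attr, None)
        return target


sim = MiniSim()
small = sim.shrink(in_place=False)            # Shrunken copy
original_intact = sim.people is not None      # The original still has its people
sim.shrink()                                  # Default: shrink the original itself
```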

save(filename=None, keep_people=None, skip_attrs=None, **kwargs)[source]

Save to disk as a gzipped pickle.

Parameters:
  • filename (str or None) – the name or path of the file to save to; if None, uses stored

  • kwargs – passed to sc.makefilepath()

Returns:

the validated absolute path to the saved file

Return type:

filename (str)
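
The general idea of a gzipped pickle can be shown with the standard library alone. The helpers save_obj and load_obj are hypothetical, and files written this way are not guaranteed to be interchangeable with those written by sim.save(), which relies on its own helper functions.

```python
import gzip
import os
import pickle
import tempfile


def save_obj(filename, obj):
    """Pickle obj, compress it with gzip, and return the absolute path."""
    with gzip.open(filename, 'wb') as f:
        pickle.dump(obj, f)
    return os.path.abspath(filename)


def load_obj(filename):
    """Load an object from a gzipped pickle."""
    with gzip.open(filename, 'rb') as f:
        return pickle.load(f)


# Placeholder object standing in for a simulation
payload = {'label': 'demo', 'values': [1, 2, 3]}
path = save_obj(os.path.join(tempfile.mkdtemp(), 'example.sim'), payload)
restored = load_obj(path)
```

As with any pickle, only load files from sources you trust, since unpickling can execute arbitrary code.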

Example:

sim.save() # Saves to a .sim file

static load(filename, *args, **kwargs)[source]

Load from disk from a gzipped pickle.

Parameters:
  • filename (str) – the name or path of the file to load from

  • kwargs – passed to hpv.load()

Returns:

the loaded simulation object

Return type:

sim (Sim)

Example:

sim = hpv.Sim.load('my-simulation.sim')

get_interventions(label=None, partial=False, as_inds=False)[source]

Find the matching intervention(s) by label, index, or type. If None, return all interventions. If the label provided is “summary”, then print a summary of the interventions (index, label, type).

Parameters:
  • label (str, int, Intervention, list) – the label, index, or type of intervention to get; if a list, iterate over one of those types

  • partial (bool) – if true, return partial matches (e.g. ‘beta’ will match all beta interventions)

  • as_inds (bool) – if true, return matching indices instead of the actual interventions

Examples:

tp = hpv.test_prob(symp_prob=0.1)
cb1 = hpv.change_beta(days=5, changes=0.3, label='NPI')
cb2 = hpv.change_beta(days=10, changes=0.3, label='Masks')
sim = hpv.Sim(interventions=[tp, cb1, cb2])
cb1, cb2 = sim.get_interventions(hpv.change_beta)
tp, cb2 = sim.get_interventions([0,2])
ind = sim.get_interventions(hpv.change_beta, as_inds=True) # Returns [1,2]
sim.get_interventions('summary') # Prints a summary

get_intervention(label=None, partial=False, first=False, die=True)[source]

Like get_interventions(), find the matching intervention(s) by label, index, or type. If more than one intervention matches, return the last by default. If no label is provided, return the last intervention in the list.

Parameters:
  • label (str, int, Intervention, list) – the label, index, or type of intervention to get; if a list, iterate over one of those types

  • partial (bool) – if true, return partial matches (e.g. ‘beta’ will match all beta interventions)

  • first (bool) – if true, return first matching intervention (otherwise, return last)

  • die (bool) – whether to raise an exception if no intervention is found
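
The interaction of partial, first, and die can be sketched with a simplified lookup. The class Labeled and the function get_one are hypothetical; matching by type is omitted, and the case-sensitive substring test used for partial matches is an assumption.

```python
class Labeled:
    """Hypothetical stand-in for an intervention; only the label matters here."""

    def __init__(self, label):
        self.label = label


def get_one(items, label=None, partial=False, first=False, die=True):
    """Return one matching item: the last match by default, the first if requested."""
    if label is None:
        matches = list(items)                 # No label: consider everything
    elif isinstance(label, int):
        matches = [items[label]]              # Integer: treat as an index
    elif partial:
        matches = [it for it in items if label in it.label]
    else:
        matches = [it for it in items if it.label == label]
    if not matches:
        if die:
            raise ValueError(f"No match found for {label!r}")
        return None
    return matches[0] if first else matches[-1]


items = [Labeled('screening'), Labeled('vaccine_a'), Labeled('vaccine_b')]
last_vaccine = get_one(items, 'vacc', partial=True)
first_vaccine = get_one(items, 'vacc', partial=True, first=True)
nothing = get_one(items, 'missing', die=False)
```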

Examples:

tp = hpv.test_prob(symp_prob=0.1)
cb = hpv.change_beta(days=5, changes=0.3, label='NPI')
sim = hpv.Sim(interventions=[tp, cb])
cb = sim.get_intervention('NPI')
cb = sim.get_intervention('NP', partial=True)
cb = sim.get_intervention(hpv.change_beta)
cb = sim.get_intervention(1)
cb = sim.get_intervention()
tp = sim.get_intervention(first=True)

get_analyzers(label=None, partial=False, as_inds=False)[source]

Same as get_interventions(), but for analyzers.

get_analyzer(label=None, partial=False, first=False, die=True)[source]

Same as get_intervention(), but for analyzers.

class BasePeople(pars)[source]

Bases: FlexPretty

A class to handle all the boilerplate for people – note that as with the BaseSim vs Sim classes, everything interesting happens in the People class, whereas this class exists to handle the less interesting implementation details.

Initialize essential attributes used for filtering

initialize()[source]

Initialize underlying storage and map arrays

set_pars(pars=None)[source]

Re-link the parameters stored in the people object to the sim containing it, and perform some basic validation.

validate(sim_pars=None, verbose=False)[source]

Perform validation on the People object.

Parameters:
  • sim_pars (dict) – dictionary of parameters from the sim to ensure they match the current People object

  • verbose (bool) – detail to print

lock()[source]

Lock the people object to prevent keys from being added

unlock()[source]

Unlock the people object to allow keys to be added
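
One way such a lock can work is to intercept attribute assignment. The class Lockable below is a hypothetical sketch of that pattern and is not taken from the HPVsim source; existing attributes stay writable while the object is locked, and only new names are rejected.

```python
class Lockable:
    """Hypothetical object that can refuse new attributes while locked."""

    def __init__(self):
        object.__setattr__(self, '_locked', False)

    def __setattr__(self, key, value):
        # Existing attributes can always be updated; new ones only when unlocked
        if self._locked and not hasattr(self, key):
            raise AttributeError(f"Cannot add '{key}': object is locked")
        object.__setattr__(self, key, value)

    def lock(self):
        object.__setattr__(self, '_locked', True)

    def unlock(self):
        object.__setattr__(self, '_locked', False)


p = Lockable()
p.age = 30      # Allowed: object starts unlocked
p.lock()
p.age = 31      # Allowed: the attribute already exists
```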

filter_inds(inds)[source]

Store indices to allow for easy filtering of the People object.

Parameters:

inds (array) – filter by these indices

Returns:

A filtered People object, which works just like a normal People object except only operates on a subset of indices.

filter(criteria)[source]

Store indices to allow for easy filtering of the People object.

Parameters:

criteria (array) – a boolean array for the filtering criteria

Returns:

A filtered People object, which works just like a normal People object except only operates on a subset of indices.

unfilter()[source]

Set main simulation attributes to be views of the underlying data

This method should be called whenever the number of agents required changes (regardless of whether or not the underlying arrays have been resized)

addtoself(people2)[source]

Combine two people arrays, avoiding a deep copy (dcp)

set(key, value, die=True)[source]

get(key)[source]

Convenience method – key can be string or list of strings

property is_female

Boolean array of everyone female

property is_female_alive

Boolean array of everyone female and alive

property is_male

Boolean array of everyone male

property is_male_alive

Boolean array of everyone male and alive

property f_inds

Indices of everyone female

property m_inds

Indices of everyone male

property int_age

Return ages as an integer

property round_age

Rounds age up to the next highest integer

property dt_age

Return ages rounded to the nearest whole timestep
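
The three age roundings can be compared on a single value. The functions below are a scalar sketch under stated assumptions: floor is assumed for int_age, and dt_age is assumed to round to the nearest multiple of the timestep dt. The actual properties operate on arrays.

```python
import math


def int_age(age):
    """Age as an integer (assumed here to round down)."""
    return math.floor(age)


def round_age(age):
    """Age rounded up to the next integer."""
    return math.ceil(age)


def dt_age(age, dt):
    """Age rounded to the nearest whole timestep of length dt."""
    return round(age / dt) * dt


age, dt = 30.3, 0.25  # Illustrative values: quarterly timestep
ages = (int_age(age), round_age(age), dt_age(age, dt))
```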

property is_active

Boolean array of everyone sexually active i.e. past debut

property is_female_adult

Boolean array of everyone eligible for screening

property is_virgin

Boolean array of everyone not yet sexually active i.e. pre debut

property alive_inds

Indices of everyone alive

property alive_level0

Boolean array of everyone alive who is a level 0 agent

property alive_level0_inds

Indices of everyone alive who is a level 0 agent

property n_alive

Number of people alive

property n_alive_level0

Number of level 0 agents alive

property infected

Boolean array of everyone infected. Union of infectious and inactive. Includes people with cancer, people with latent infections, and people with active infections

property abnormal

Boolean array of everyone with abnormal cells. Union of episomal, transformed, and cancerous

property latent

Boolean array of everyone with latent infection. By definition, these people have inactive infection and no cancer.

property precin

Boolean array of females with HPV whose disease severity level does not meet the threshold for detectable cell changes

true(key)[source]

Return indices matching the condition

true_by_genotype(key, genotype)[source]

Return indices matching genotype-condition

false_by_genotype(key, genotype)[source]

Return indices not matching genotype-condition

false(key)[source]

Return indices not matching the condition

defined(key)[source]

Return indices of people who are not-nan

undefined(key)[source]

Return indices of people who are nan
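
The four index lookups above come in complementary pairs. The following sketch uses plain lists instead of arrays and takes the values directly rather than a key; the function names mirror the methods, but the code is illustrative only.

```python
import math


def true(values):
    """Indices where the condition holds."""
    return [i for i, v in enumerate(values) if v]


def false(values):
    """Indices where the condition does not hold."""
    return [i for i, v in enumerate(values) if not v]


def defined(values):
    """Indices of entries that are not NaN."""
    return [i for i, v in enumerate(values) if not math.isnan(v)]


def undefined(values):
    """Indices of entries that are NaN."""
    return [i for i, v in enumerate(values) if math.isnan(v)]


# Made-up example data: a boolean state and an event date per person
infected = [True, False, True, False]
date_infected = [3.0, math.nan, 7.0, math.nan]
```

For this data, true(infected) and defined(date_infected) both select people 0 and 2, while false(infected) and undefined(date_infected) select people 1 and 3.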

count(key, weighted=True)[source]

Count the number of people for a given key

count_any(key, weighted=True)[source]

Count the number of people for a given key for a 2D array if any value matches

count_by_genotype(key, genotype, weighted=True)[source]

Count the number of people for a given key and genotype

keys()[source]

Returns keys for all non-derived properties of the people object

person_keys()[source]

Returns keys specific to a person (e.g., their age)

state_keys()[source]

Returns keys for different states of a person (e.g., symptomatic)

imm_keys()[source]

Returns keys for the different immunity states of a person

intv_keys()[source]

date_keys()[source]

Returns keys for different event dates (e.g., date a person became symptomatic)

dur_keys()[source]

Returns keys for different durations (e.g., the duration from exposed to infectious)

layer_keys()[source]

Get the available contact keys – try contacts first, then acts

indices()[source]

The indices of each people array

to_df()[source]

Convert to a Pandas dataframe

to_arr()[source]

Return as numpy array

person(ind)[source]

Method to create person from the people

to_list()[source]

Return all people as a list

from_list(people)[source]

Convert a list of people back into a People object

to_graph()[source]

Convert all people to a networkx MultiDiGraph, including all properties of the people (nodes) and contacts (edges).

Example:

import hpvsim as hpv
import networkx as nx
sim = hpv.Sim(n_agents=50, pop_type='hybrid', contacts=dict(h=3, s=10, w=10, c=5)).run()
G = sim.people.to_graph()
nodes = G.nodes(data=True)
edges = G.edges(keys=True)
node_colors = [n['age'] for i,n in nodes]
layer_map = dict(h='#37b', s='#e11', w='#4a4', c='#a49')
edge_colors = [layer_map[G[i][j][k]['layer']] for i,j,k in edges]
edge_weights = [G[i][j][k]['beta']*5 for i,j,k in edges]
nx.draw(G, node_color=node_colors, edge_color=edge_colors, width=edge_weights, alpha=0.5)

save(filename=None, force=False, **kwargs)[source]

Save to disk as a gzipped pickle.

Note: by default this function raises an exception if trying to save a run or partially run People object, since the changes that happen during a run are usually irreversible.

Parameters:
  • filename (str or None) – the name or path of the file to save to; if None, uses stored

  • force (bool) – whether to allow saving even of a run or partially-run People object

  • kwargs – passed to sc.makefilepath()

Returns:

the validated absolute path to the saved file

Return type:

filename (str)

Example:

sim = hpv.Sim()
sim.initialize()
sim.people.save() # Saves to a .ppl file

static load(filename, *args, **kwargs)[source]

Load from disk from a gzipped pickle.

Parameters:
  • filename (str) – the name or path of the file to load from

  • args (list) – passed to hpv.load()

  • kwargs (dict) – passed to hpv.load()

Returns:

the loaded people object

Return type:

people (People)

Example:

people = hpv.People.load('my-people.ppl')

init_contacts(reset=False)[source]

Initialize the contacts dataframe with the correct columns and data types

add_contacts(contacts, lkey=None, beta=None)[source]

Add new contacts to the array. See also contacts.add_layer().

make_edgelist(contacts)[source]

Parse a list of people with a list of contacts per person and turn it into an edge list.
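
Turning per-person contact lists into an edge list can be sketched as below. The input layout (one list of contact indices per person) and the output keys p1 and p2 are assumptions made for this example; the real method works on People data structures.

```python
def make_edgelist_sketch(contacts_per_person):
    """Flatten per-person contact lists into two parallel lists of endpoints."""
    p1, p2 = [], []
    for person, contacts in enumerate(contacts_per_person):
        for contact in contacts:
            p1.append(person)   # Source of the edge
            p2.append(contact)  # Target of the edge
    return dict(p1=p1, p2=p2)


# Person 0 has contacts 1 and 2, person 1 has contact 2, person 2 has none
edges = make_edgelist_sketch([[1, 2], [2], []])
```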

static remove_duplicates(df)[source]

Sort the dataframe and remove duplicates – note, not extensively tested

class Person(pars=None, uid=None, age=-1, sex=-1, debut=-1, rel_sev=-1, partners=None, current_partners=None, rship_start_dates=None, rship_end_dates=None, n_rships=None)[source]

Bases: prettyobj

Class for a single person. Note: this is largely deprecated since sim.people is now based on arrays rather than being a list of people.

class FlexDict[source]

Bases: dict

A dict that allows more flexible element access: in addition to obj[‘a’], also allow obj[0]. Lightweight implementation of the Sciris odict class.
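
Access by key or by position can be sketched with a small dict subclass. MiniFlexDict is a hypothetical illustration of the idea, not the FlexDict source; it relies on dicts preserving insertion order.

```python
class MiniFlexDict(dict):
    """Hypothetical dict that also accepts an integer position as a key."""

    def __getitem__(self, key):
        if key in self:                       # Normal key lookup comes first
            return dict.__getitem__(self, key)
        if isinstance(key, int):              # Otherwise try positional access
            keys = list(self)
            if -len(keys) <= key < len(keys):
                return dict.__getitem__(self, keys[key])
        raise KeyError(key)


d = MiniFlexDict(a=10, b=20)
by_key = d['a']
by_position = d[0]
```

Because the key lookup runs first, an integer that is itself a key in the dict takes precedence over positional access in this sketch.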

keys()[source]

values()[source]

items()[source]

class Contacts(data=None, layer_keys=None, **kwargs)[source]

Bases: FlexDict

A simple (for now) class for storing different contact layers.

Parameters:
  • data (dict) – a dictionary that looks like a Contacts object

  • layer_keys (list) – if provided, create an empty Contacts object with these layers

  • kwargs (dict) – additional layer(s), merged with data

add_layer(**kwargs)[source]

Small method to add one or more layers to the contacts. Layers should be provided as keyword arguments.

Example:

hospitals_layer = hpv.Layer(label='hosp')
sim.people.contacts.add_layer(hospitals=hospitals_layer)

pop_layer(*args)[source]

Remove the layer(s) from the contacts.

Example:

sim.people.contacts.pop_layer('hospitals')

Note: while included here for convenience, this operation is equivalent to simply popping the key from the contacts dictionary.

to_graph()[source]

Convert all layers to a networkx MultiDiGraph

Example:

import hpvsim as hpv
import networkx as nx
sim = hpv.Sim(n_agents=50, pop_type='hybrid').run()
G = sim.people.contacts.to_graph()
nx.draw(G)

class Layer(*args, label=None, **kwargs)[source]

Bases: FlexDict

A small class holding a single layer of contact edges (connections) between people.

The input is typically arrays including: person 1 of the connection, person 2 of the connection, the weight of the connection, the duration and start/end times of the connection. Connections are undirected; each person is both a source and sink.

This class is usually not invoked directly by the user, but instead is called as part of the population creation.

Parameters:
  • f (array) – an array of N connections, representing people on one side of the connection

  • m (array) – an array of people on the other side of the connection

  • acts (array) – an array of number of acts per timestep for each connection

  • dur (array) – duration of the connection

  • start (array) – start time of the connection

  • end (array) – end time of the connection

  • label (str) – the name of the layer (optional)

  • kwargs (dict) – other keys copied directly into the layer

Note that all arguments (except for label) must be arrays of the same length, although not all have to be supplied at the time of creation (they must all be the same at the time of initialization, though, or else validation will fail).

Examples:

import numpy as np
import hpvsim as hpv

# Generate an average of 10 contacts for 1000 people
n = 10_000
n_people = 1000
p1 = np.random.randint(n_people, size=n)
p2 = np.random.randint(n_people, size=n)
beta = np.ones(n)
layer = hpv.Layer(p1=p1, p2=p2, beta=beta, label='rand')
layer = hpv.Layer(dict(p1=p1, p2=p2, beta=beta), label='rand') # Alternate method

# Convert one layer to another with extra columns
index = np.arange(n)
self_conn = p1 == p2
layer2 = hpv.Layer(**layer, index=index, self_conn=self_conn, label=layer.label)

property members

Return sorted array of all members

meta_keys()[source]

Return the keys for the layer’s meta information – i.e., f, m, beta, any others

validate(force=True)[source]

Check the integrity of the layer: right types, right lengths.

If dtype is incorrect, try to convert automatically; if length is incorrect, do not.

get_inds(inds, remove=False)[source]

Get the specified indices from the edgelist and return them as a dict.

Parameters:

inds (int, array, slice) – the indices to get

pop_inds(inds)[source]

“Pop” the specified indices from the edgelist and return them as a dict. Returns in the right format to be used with layer.append().

Parameters:

inds (int, array, slice) – the indices to be removed

append(contacts)[source]

Append contacts to the current layer.

Parameters:

contacts (dict) – a dictionary of arrays with keys f,m,beta, as returned from layer.pop_inds()
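
The round trip between pop_inds() and append() can be sketched with a dict of plain lists. The two functions below are hypothetical stand-ins that operate on such a dict rather than on a Layer; they accept only a list of indices, not ints or slices.

```python
def pop_inds(layer, inds):
    """Remove the given edge indices from every column and return them as a dict."""
    remove = set(inds)
    popped = {key: [v for i, v in enumerate(vals) if i in remove]
              for key, vals in layer.items()}
    for key in list(layer):
        layer[key] = [v for i, v in enumerate(layer[key]) if i not in remove]
    return popped


def append(layer, contacts):
    """Append the columns in contacts to the matching columns of the layer."""
    for key in layer:
        layer[key].extend(contacts[key])


# Made-up layer with three edges
layer = dict(f=[0, 1, 2], m=[5, 6, 7], beta=[1.0, 1.0, 1.0])
popped = pop_inds(layer, [1])   # Take out the middle edge
after_pop = {key: list(vals) for key, vals in layer.items()}
append(layer, popped)           # Put it back at the end
```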

to_df()[source]

Convert to dataframe

from_df(df, keys=None)[source]

Convert from a dataframe

to_graph()[source]

Convert to a networkx DiGraph

Example:

import hpvsim as hpv
import networkx as nx
sim = hpv.Sim(n_agents=20, pop_type='hybrid').run()
G = sim.people.contacts['h'].to_graph()
nx.draw(G)

find_contacts(inds, as_array=True)[source]

Find all contacts of the specified people

For some purposes (e.g. contact tracing) it’s necessary to find all of the contacts associated with a subset of the people in this layer. Since contacts are bidirectional it’s necessary to check both P1 and P2 for the target indices. The return type is a Set so that there is no duplication of indices (otherwise if the Layer has explicit symmetric interactions, they could appear multiple times). This is also for performance so that the calling code doesn’t need to perform its own unique() operation. Note that this cannot be used for cases where multiple connections count differently than a single infection, e.g. exposure risk.

Parameters:
  • inds (array) – indices of people whose contacts to return

  • as_array (bool) – if true, return as sorted array (otherwise, return as unsorted set)

Returns:

a set of indices for pairing partners

Return type:

contact_inds (array)
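
The bidirectional lookup described above can be sketched with sets. The function below is a standalone illustration that takes the two endpoint lists explicitly and returns a sorted list in place of an array; it is not the Layer method itself.

```python
def find_contacts(p1, p2, inds, as_array=True):
    """Collect everyone connected to any of inds, checking both edge endpoints."""
    targets = set(inds)
    contact_inds = set()  # A set removes duplicates from symmetric edges
    for a, b in zip(p1, p2):
        if a in targets:
            contact_inds.add(b)
        if b in targets:
            contact_inds.add(a)
    return sorted(contact_inds) if as_array else contact_inds


# A small four-edge layer, including one self-connection (4, 4)
p1 = [1, 2, 3, 4]
p2 = [2, 3, 1, 4]
as_set = find_contacts(p1, p2, [1, 3], as_array=False)  # {1, 2, 3}
as_list = find_contacts(p1, p2, [1, 3])                 # [1, 2, 3]
```

Persons 1 and 3 appear in the result because they are contacts of each other, not because they were in the query.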

Example: If there were a layer with

  • P1 = [1,2,3,4]

  • P2 = [2,3,1,4]

Then find_contacts([1,3]) would return {1,2,3}

update(people, frac=1.0)[source]

Regenerate contacts on each timestep.

This method gets called if the layer appears in sim.pars['dynam_layer']. The Layer implements the update procedure so that derived classes can customize the update e.g. implementing over-dispersion/other distributions, random clusters, etc.

Typically, this method also takes in the people object so that the update can depend on person attributes that may change over time (e.g. changing contacts for people that are severe/critical).

Parameters:
  • people (People) – the HPVsim People object, which is usually used to make new contacts

  • frac (float) – the fraction of contacts to update on each timestep