Welcome to SynthPops¶
SynthPops is used to construct synthetic networks of people that satisfy statistical properties of real-world populations (such as the age distribution, household size, etc.). SynthPops can create generic populations with different network characteristics, as well as synthetic populations that interact in different layers of a multilayer contact network. These synthetic populations can then be used with agent-based models like COVID-19 Agent-based Simulator (Covasim) to simulate epidemics. SynthPops is available on GitHub. For more information on Covasim see Covasim on GitHub.
Installation¶
Follow the instructions below to install SynthPops.
Requirements¶
Python 3.6 64-bit. (Note: Python 2 is not supported.)
We also recommend, but do not require, using Python virtual environments. For more information, see documentation for venv or Anaconda.
Installation¶
Complete the following steps to install SynthPops:
Fork and clone the SynthPops GitHub repository.
Open a command prompt and navigate to the SynthPops directory.
Run the following script:
python setup.py develop
Note: Although SynthPops can also be installed from PyPI, that package does not currently include the data files that SynthPops requires to function, so this method is not recommended.
Quick start guide¶
The following code creates a synthetic population for Seattle, Washington:
import synthpops as sp
sp.validate()
datadir = sp.datadir # this should be where your demographics data folder resides
location = 'seattle_metro'
state_location = 'Washington'
country_location = 'usa'
sheet_name = 'United States of America'
level = 'county'
npop = 10000 # how many people in your population
sp.generate_synthetic_population(npop,datadir,location=location,
state_location=state_location,country_location=country_location,
sheet_name=sheet_name,level=level)
SynthPops overview¶
Fundamentally, the population network can be considered a multilayer network with the following qualities:
Nodes are people, with attributes like age.
Edges represent interactions between people, with attributes like the setting in which the interactions take place (for example, household, school, or work). The relationship between the interaction setting and properties governing disease transmission, such as frequency of contact and risk associated with each contact, is mapped separately by Covasim or other agent-based model. SynthPops reports whether the edge exists or not.
If you are using SynthPops with Covasim, note that the relevant value in Covasim is the parameter beta, which captures the probability of transmission via a given edge per time step. The value of this parameter captures both number of effective contacts for disease transmission and transmission probability per contact.
The generated network is a multilayer network in the sense that it is possible for people to be connected by multiple edges each in different layers of the network. The layers are referred to as contact layers. For example, the workplace contact layer is a representation of all of the pairwise connections between people at work, and the household contact layer represents the pairwise connections between household members. Typically these networks are clustered; in other words, everyone within a household interacts with each other, but not with other households. However, they may interact with members of other households via their school or workplace. Some level of community contacts outside of these networks can be configured using Covasim or other model being used with SynthPops.
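As an illustration of the multilayer structure described above, the following minimal sketch stores each person's contacts per layer in a plain dictionary. The layer keys 'H', 'S', and 'W' follow the popdict description in the API reference; the helpers make_person and add_group and the sample IDs and ages are hypothetical and are not part of SynthPops.

```python
from itertools import combinations


def make_person(age):
    """Return a node with an age attribute and one empty contact set per layer."""
    return {"age": age, "contacts": {"H": set(), "S": set(), "W": set()}}


def add_group(popdict, layer, member_ids):
    """Fully connect every pair of members within one group of a layer."""
    for a, b in combinations(member_ids, 2):
        popdict[a]["contacts"][layer].add(b)
        popdict[b]["contacts"][layer].add(a)


# Five people with made-up ages, keyed by integer ID.
popdict = {uid: make_person(age) for uid, age in enumerate([34, 36, 8, 41, 9])}
add_group(popdict, "H", [0, 1, 2])  # one household
add_group(popdict, "H", [3, 4])     # a second household
add_group(popdict, "S", [2, 4])     # the two children share a school
add_group(popdict, "W", [0, 3])     # two adults share a workplace
```

Here person 2 is linked to persons 0 and 1 only through the household layer, and to person 4 only through the school layer, which is how members of different households come into contact.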
SynthPops functions in two stages:
Generate people living in households, and then assign individuals to workplaces and schools. Save the output to a cache file on disk. Implemented in generate_synthetic_population().
Load the cached file and produce a dictionary that can be used by Covasim. Implemented in make_population().
Covasim assigns community contacts at random on a daily basis to reflect the random and stochastic aspect of contacts in many public spaces, such as shopping centers, parks, and community centers.
SynthPops algorithm¶
This topic describes the algorithm used by SynthPops to generate the connections between people in each of the contact layers for a given location in the real world. The fundamental algorithm is the same for homes, schools, and workplaces, but with some variations for each.
The method draws upon the following previously published models to infer high-resolution age-specific contact patterns in different physical settings and locations:
The general idea is to use age-specific contact matrices that describe age mixing patterns for a specific population. By default, SynthPops uses Prem et al.’s (2017) matrices, which project inferred age mixing patterns from the POLYMOD study (Mossong et al. 2008) in Europe to other countries. However, user-specified contact matrices can also be implemented for customizing age mixing patterns for the household, school, and workplace settings (see the social contact data on Zenodo for other empirical contact matrices from survey studies).
The matrices represent the average number of contacts between people for different age bins (the default matrices use 5-year age bins). For example, a household of two individuals is relatively unlikely to consist of a 25-year-old and a 15-year-old, so for the 25-29 year age bin in the household layer, there is a low number of expected contacts with the 15-19 year age bin (cf. Fig. 2c in Prem et al.).
Using SynthPops¶
The overall SynthPops workflow is contained in generate_synthetic_population() and is described below.
The population is generated through households, not a pool of people.
You can provide required data to SynthPops in a variety of formats including .csv, .txt, or Microsoft Excel (.xlsx).
Instantiate a collection of households with sizes drawn from census data. Populations cannot be created outside of the household contact layer.
For each household, sample the age of a “reference” person from data that maps household size to a reference person in those households. The reference person may be referred to as the head of the household, a parent in the household, or some other definition specific to the data being used. If no data mapping household size to ages of reference individuals are available, then the age of the reference person is sampled from the age distribution of adults for the location.
The age bin of the reference person identifies the row of the contact matrix for that location. The remaining household members are then selected by sampling an age from the distribution of contacts for the reference person’s age (in other words, normalizing the values of the row and sampling for a column) and assigning someone with that age to the household.
As households are generated, individuals are given IDs.
After households are constructed, students are chosen according to enrollment data by age to generate the school contact layer.
Students are assigned to schools using a similar method as above, where we select the age of a reference person and then select their contacts in school from an age-specific contact matrix for the school setting and data on school sizes.
With all students assigned to schools, teachers are selected from the labor force according to employment data.
The rest of the labor force are assigned to workplaces in the workplace contact layer by selecting a reference person and their contacts using an age-specific contact matrix and data on workplace sizes.
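The row-normalize-and-sample step used for households, schools, and workplaces can be sketched as follows. This is a minimal illustration using only the standard library and the small 3x3 example matrix shown in the household data section; the function name sample_member_bins is hypothetical, and the actual SynthPops implementation may differ (for example, in how an age within a bin is chosen).

```python
import random

age_bins = ["0-10", "10-20", "20-30"]
contact_matrix = [
    [0.659867911, 0.503965302, 0.214772978],
    [0.314776879, 0.895460015, 0.412465791],
    [0.132821425, 0.405073038, 1.433888594],
]


def sample_member_bins(reference_bin, n_members, rng):
    """Sample age bins for the other group members given the reference person's bin."""
    row = contact_matrix[age_bins.index(reference_bin)]
    total = sum(row)
    probabilities = [value / total for value in row]  # normalize the row
    return rng.choices(age_bins, weights=probabilities, k=n_members)


rng = random.Random(0)
# A 4-person household: one reference person plus three sampled members.
household = ["20-30"] + sample_member_bins("20-30", n_members=3, rng=rng)
```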
Examples¶
Examples live in the examples folder. These can be run as follows:
python examples/make_generic_contacts.py
Creates a dictionary of individuals, each of whom is represented by another dictionary with their contacts contained in the contacts key. Contacts are selected at random with degree distribution following the Erdos-Renyi graph model.
python examples/generate_contact_network_with_microstructure.py
Creates and saves to file households, schools, and workplaces of individuals with unique IDs, and a table mapping IDs to ages. Two versions of each contact layer (households, schools, or workplaces) are saved; one with the unique IDs of each individual in each group (a single household, school or workplace), and one with their ages (for easy viewing of the age mixing patterns created).
python examples/load_contacts_and_show_some_layers.py
Loads a multilayer contact network made of three layers and shows the age and ages of contacts for the first 20 people.
In the tests folder, you can view the following to see examples of additional functionality.
test_synthpop.py
Reads in demographic data and generates populations matching those demographics.
test_contacts.py
Generates random contact networks with individuals matching demographic data or reads in synthetic contact networks with three layers (households, schools, and workplaces).
test_contact_network_generation.py
Generates synthetic contact networks in households, schools, and workplaces with Seattle Metro data (and writes to file).
The other topics in this section walk through the specific data sources and details about the settings for each of the contact layers.
Household contact layer¶
The household contact layer represents the pairwise connections between household members. The population is generated within this contact layer, not as a separate pool of people.
As locations, households are special in the following ways:
Unlike schools and workplaces, everyone must be assigned to a household.
The size of the household is important (for example, a 2-person household looks very different in comparison to a 5- or 6-person household) and some households only have 1 person.
The reference person/head of the household can be well-defined by data.
Data needed¶
The following data sets are required for households:
Age bracket distribution specifying the distribution of people in age bins for the location. For example:
age_bracket , percent
0_4 , 0.0594714358950416
5_9 , 0.06031137308234759
10_14 , 0.05338015778985113
15_19 , 0.054500690394160285
20_24 , 0.06161403846144956
25_29 , 0.08899312471888453
30_34 , 0.0883533486774803
35_39 , 0.07780767611060545
40_44 , 0.07099017823587304
45_49 , 0.06996903280562596
50_54 , 0.06655242534751997
55_59 , 0.06350008343899961
60_64 , 0.05761405140489549
65_69 , 0.04487122889235999
70_74 , 0.030964420778483555
75_100 , 0.05110673396642193
Age distribution of the reference person for each household size
Only the distribution matters, so either absolute counts or normalized values can be provided; each row is normalized. If these data are not available, the age of the reference individual is sampled from the age distribution for adults. For example:
family_size , 18-20 , 20-24 , 25-29 , 30-34 , 35-39 , 40-44 , 45-49 , 50-54 , 55-64 , 65-74 , 75-99
2 , 163 , 999 , 2316 , 2230 , 1880 , 1856 , 2390 , 3118 , 9528 , 9345 , 5584
3 , 115 , 757 , 1545 , 1907 , 2066 , 1811 , 2028 , 2175 , 3311 , 1587 , 588
4 , 135 , 442 , 1029 , 1951 , 2670 , 2547 , 2368 , 1695 , 1763 , 520 , 221
5 , 61 , 172 , 394 , 905 , 1429 , 1232 , 969 , 683 , 623 , 235 , 94
6 , 25 , 81 , 153 , 352 , 511 , 459 , 372 , 280 , 280 , 113 , 49
7 , 24 , 33 , 63 , 144 , 279 , 242 , 219 , 115 , 157 , 80 , 16
Distribution of household sizes:
household_size , percent
1 , 0.2781590909877753
2 , 0.3443313103056699
3 , 0.15759535523004006
4 , 0.13654311541644018
5 , 0.050887858718118274
6 , 0.019738368167953997
7 , 0.012744901174002305
Household contact matrix specifying the number/weight of contacts by age bin:
      0-10 , 10-20 , 20-30
0-10  0.659867911 , 0.503965302 , 0.214772978
10-20 0.314776879 , 0.895460015 , 0.412465791
20-30 0.132821425 , 0.405073038 , 1.433888594
By default, SynthPops uses matrices from a study (Prem et al. 2017) that projected inferred age mixing patterns from the POLYMOD study (Mossong et al. 2008) in Europe to other countries. SynthPops can take in user-specified contact matrices if other age mixing patterns are available for the household, school, and workplace settings (see the social contact data on Zenodo for other empirical contact matrices from survey studies).
In theory, the household contact matrix varies with household size, but in general data at that resolution is unavailable.
Workflow¶
Use these SynthPops functions to instantiate households as follows:
Call generate_synthetic_population() and provide the binned age bracket distribution data described above. This wrapper function calls the following functions:
From the binned age distribution, get_age_n() creates samples of ages from the binned distribution, and then normalizes to create a single-year distribution. This distribution can therefore be gathered using whatever age bins are present in any given dataset.
generate_household_sizes_from_fixed_pop_size() generates empty households with known size based on the distribution of household sizes.
generate_all_households() contains the core implementation and constructs households with individuals of different ages living together. It takes in the remaining data sources above, and then does the following:
Calls generate_living_alone() to populate households with 1 person (either from data on those living alone or, if unavailable, from the adult age distribution).
Calls generate_larger_households() repeatedly with different household sizes to populate those households, first sampling the age of a reference person and then their household contacts as outlined above.
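To make the idea of generating empty households for a fixed population size concrete, here is a minimal sketch that draws household sizes from the example distribution above until the requested number of people is reached. The function name sample_household_sizes and the choice to shrink the final household so the total matches exactly are assumptions made for illustration; generate_household_sizes_from_fixed_pop_size() may handle this differently.

```python
import random

# Household size distribution from the example data above.
household_size_distr = {
    1: 0.2781590909877753,
    2: 0.3443313103056699,
    3: 0.15759535523004006,
    4: 0.13654311541644018,
    5: 0.050887858718118274,
    6: 0.019738368167953997,
    7: 0.012744901174002305,
}


def sample_household_sizes(n_people, size_distr, rng):
    """Draw household sizes until they account for exactly n_people."""
    values = list(size_distr)
    weights = list(size_distr.values())
    sizes = []
    remaining = n_people
    while remaining > 0:
        size = rng.choices(values, weights=weights, k=1)[0]
        size = min(size, remaining)  # assumption: shrink the last household to fit
        sizes.append(size)
        remaining -= size
    return sizes


sizes = sample_household_sizes(1000, household_size_distr, random.Random(1))
```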
School contact layer¶
The school contact layer represents all of the pairwise connections between people in schools, including both students and teachers. Schools are special in that:
Enrollment rates by age determine the probability of an individual being a student given their age.
Staff members such as teachers are chosen from individuals determined to be in the adult labor force.
The current methods in SynthPops treat student and worker status as mutually exclusive. Many young adults may be both students and workers, part time or full time in either status. The ability to select individuals to participate in both activities will be introduced in a later version of the model.
Data needed¶
The following data is required for schools:
School size distribution:
school_size , percent
0-50 , 0.2
51-100 , 0.1
101-300 , 0.3
Enrollment by age specifying the percentage of people of each age attending school. See get_school_enrollment_rates(), but note that this mainly implements parsing a Seattle-specific data file to produce the following data structure, which could equivalently be read directly from a file:
age , percent
0 , 0
1 , 0
2 , 0
3 , 0.529
4 , 0.529
5 , 0.95
6 , 0.95
7 , 0.95
8 , 0.95
9 , 0.95
10 , 0.987
11 , 0.987
12 , 0.987
13 , 0.987
School contact matrix specifying the number/weight of contacts by age bin. This is similar to the household contact matrix. For example:
      0-10 , 10-20 , 20-30
0-10  0.659867911 , 0.503965302 , 0.214772978
10-20 0.314776879 , 0.895460015 , 0.412465791
20-30 0.132821425 , 0.405073038 , 1.433888594
Employment rates by age, which is used when determining who is in the labor force, and thus which adults are available to be chosen as teachers:
Age , Percent
16 , 0.496
17 , 0.496
18 , 0.496
19 , 0.496
20 , 0.838
21 , 0.838
22 , 0.838
Student teacher ratio, which is the average ratio for the location. Methods to use a distribution or vary the ratio for different types of schools may come in later developments of the model:
student_teacher_ratio=30
Typically, contact matrices describing age-specific mixing patterns in schools include the interactions between students and their teachers. These patterns describe multiple types of schools, from possibly preschools to universities.
Workflow¶
Use these SynthPops functions to implement the school contact layer as follows:
get_uids_in_school() uses the enrollment rates to determine which people attend school. This then provides the number of students needing to be assigned to schools.
generate_school_sizes() generates schools according to the school size distribution until there are enough places for every student to be assigned a school.
send_students_to_school() assigns specific students to specific schools.
This function is similar to households in that a reference student is selected, and then the contact matrix is used to fill the remaining spots in the school.
Some particulars in this function deal with ensuring a teacher/adult is less likely to be selected as a reference person, and restricting the age range of sampled people relative to the reference person so that a primary school age reference person will result in the rest of the school being populated with other primary school age children.
get_uids_potential_workers() selects teachers by first getting a pool of working age people that are not students.
get_workers_by_age_to_assign() further filters this population by employment rates resulting in a collection of people that need to be assigned workplaces.
In assign_teachers_to_work(), for each school, work out how many teachers are needed according to the number of students and the student-teacher ratio, and sample those teachers from the pool of adult workers. A minimum and maximum age for teachers can be provided to select teachers from a specified range of ages (this can be used to account for the additional years of education needed to become a teacher in many places).
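The teacher-assignment step can be sketched as below. This is an illustration only: the helper names teachers_needed and pick_teachers, the rounding-up rule, the minimum of one teacher per school, and the sample worker ages are all assumptions, not the behavior of assign_teachers_to_work().

```python
import math
import random


def teachers_needed(n_students, student_teacher_ratio):
    """Students divided by the ratio, rounded up, with at least one teacher (assumed rule)."""
    return max(1, math.ceil(n_students / student_teacher_ratio))


def pick_teachers(worker_ages, n_teachers, min_age, max_age, rng):
    """Sample teacher IDs from workers whose age lies in [min_age, max_age]."""
    eligible = [uid for uid, age in worker_ages.items() if min_age <= age <= max_age]
    return rng.sample(eligible, min(n_teachers, len(eligible)))


# Made-up pool of 200 adult workers with ages between 20 and 64.
worker_ages = {uid: 20 + uid % 45 for uid in range(200)}

n = teachers_needed(n_students=95, student_teacher_ratio=30)
teachers = pick_teachers(worker_ages, n, min_age=25, max_age=75, rng=random.Random(2))
```

Sampling without replacement keeps each teacher in a single school; the selected IDs would then be removed from the worker pool before workplaces are filled.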
Workplace contact layer¶
The workplace contact layer represents all of the pairwise connections between people in workplaces, except for teachers working in schools. After some workers are assigned to the school contact layer as teachers, all remaining workers are assigned to workplaces. Workplaces are special in that there is little/no age structure so workers of all ages may be present in every workplace.
Again, note that work and school are currently exclusive, because the people attending schools are removed from the list of eligible workers. This doesn’t necessarily need to be the case though. In fact, we know that in many countries and cultures around the world, people take on multiple roles as both students and workers, either part-time or full-time in one or both activities.
Data required¶
The following data are required for generating the workplace contact layer:
Workplace size distribution - again, this gets normalized so can be specified as absolute counts or as normalized values:
work_size_bracket , size_count
1-4 , 2947
5-9 , 992
10-19 , 639
20-49 , 430
50-99 , 140
100-249 , 83
250-499 , 26
500-999 , 13
1000-1999 , 12
Work contact matrix specifying the number/weight of contacts by age bin. This is similar to the household contact matrix. For example:
      20-30 , 30-40 , 40-50
20-30 0.659867911 , 0.503965302 , 0.214772978
30-40 0.314776879 , 0.895460015 , 0.412465791
40-50 0.132821425 , 0.405073038 , 1.433888594
Workflow¶
generate_workplace_sizes() generates workplace sizes according to the workplace size distribution until the number of workers is reached.
assign_rest_of_workers() populates workplaces just like for households and schools: randomly selecting the age of a reference person, and then sampling the rest of the workplace using the contact matrix.
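A minimal sketch of drawing workplace sizes from the bracketed counts shown above follows. The function name sample_workplace_sizes, the uniform draw within each bracket, and the trimming of the last workplace are assumptions for illustration; generate_workplace_sizes() may differ on each point.

```python
import random

# Workplace size counts from the example data above (absolute counts are fine).
work_size_counts = {
    "1-4": 2947, "5-9": 992, "10-19": 639, "20-49": 430, "50-99": 140,
    "100-249": 83, "250-499": 26, "500-999": 13, "1000-1999": 12,
}


def sample_workplace_sizes(n_workers, size_counts, rng):
    """Draw workplace sizes until they account for exactly n_workers."""
    brackets = [tuple(int(part) for part in key.split("-")) for key in size_counts]
    weights = list(size_counts.values())  # random.choices normalizes the weights
    sizes = []
    remaining = n_workers
    while remaining > 0:
        low, high = rng.choices(brackets, weights=weights, k=1)[0]
        size = min(rng.randint(low, high), remaining)  # assumption: uniform within bracket
        sizes.append(size)
        remaining -= size
    return sizes


sizes = sample_workplace_sizes(5000, work_size_counts, random.Random(4))
```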
API reference¶
Submodules¶
synthpops.base module¶
The module contains frequently-used functions that do not neatly fit into other areas of the code base.
synthpops.base.norm_dic(dic)¶
Normalize the dictionary dic.
- Parameters
dic (dict) – A dictionary with numerical values.
- Returns
A normalized dictionary.
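As a rough illustration of what normalizing a dictionary means here, the sketch below divides each value by the sum of all values. The function name normalize_dict is hypothetical, and returning the values unchanged when they sum to zero is an assumption; the documentation does not say how norm_dic treats that case.

```python
def normalize_dict(dic):
    """Return a copy of dic whose values sum to 1 (unchanged if the sum is 0)."""
    total = float(sum(dic.values()))
    if total == 0:
        return dict(dic)  # assumption: nothing to normalize
    return {key: value / total for key, value in dic.items()}


normalized = normalize_dict({"a": 1, "b": 3})
```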
synthpops.base.norm_age_group(age_dic, age_min, age_max)¶
Create a normalized dictionary for the range age_min to age_max, inclusive.
- Parameters
age_dic (dict) – A dictionary with numerical values.
age_min (int) – The minimum value of the range for the dictionary.
age_max (int) – The maximum value of the range for the dictionary.
- Returns
A normalized dictionary for keys in the range age_min to age_max, inclusive.
synthpops.base.get_index_by_brackets_dic(brackets)¶
Create a dictionary mapping each item in the value arrays to the key. For example, if brackets are age brackets, then this function will map each age to the age bracket or bin that it belongs to, so that the resulting dictionary will give by_brackets_dic[age_index] = age bracket of age_index.
- Parameters
brackets (dict) – A dictionary mapping bracket or bin keys to the array of values that belong to each bracket.
- Returns
A dictionary mapping indices to the brackets or bins each index belongs to.
- Return type
dict
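The inversion described above can be sketched in a few lines. The function name index_by_brackets and the three sample brackets are hypothetical and are used only to show the shape of the input and output.

```python
def index_by_brackets(brackets):
    """Map every value in each bracket's array back to that bracket's key."""
    return {value: key for key, values in brackets.items() for value in values}


# Three made-up 5-year age brackets keyed by bracket index.
age_brackets = {0: range(0, 5), 1: range(5, 10), 2: range(10, 15)}
age_to_bracket = index_by_brackets(age_brackets)
```

With these brackets, age_to_bracket[7] is 1 because age 7 falls in the second bracket.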
synthpops.base.get_age_by_brackets_dic(age_brackets)¶
Create a dictionary mapping age to the age bracket it falls in.
- Parameters
age_brackets (dict) – A dictionary mapping age bracket keys to age bracket range.
- Returns
A dictionary of age bracket by age.
Example
age_brackets = sp.get_census_age_brackets(sp.datadir,state_location='Washington',country_location='usa')
age_by_brackets_dic = sp.get_age_by_brackets_dic(age_brackets)
synthpops.base.get_ids_by_age_dic(age_by_id_dic)¶
Get lists of IDs that map to each age.
- Parameters
age_by_id_dic (dict) – A dictionary with the age of each individual by their ID.
- Returns
A dictionary listing IDs for each age from a dictionary that maps ID to age.
synthpops.base.count_ages(popdict)¶
Create an age count from a population dictionary.
- Parameters
popdict (dict) – dictionary defining population
- Returns
Dictionary of the age count of the population.
- Return type
dict
synthpops.base.get_aggregate_ages(ages, age_by_brackets_dic)¶
Create a dictionary of the count of ages by age brackets.
- Parameters
ages (dict) – A dictionary of age count by single year.
age_by_brackets_dic (dict) – A dictionary mapping age to the age bracket range it falls within.
- Returns
A dictionary of aggregated age count for specified age brackets.
Example
aggregate_age_count = sp.get_aggregate_ages(age_count, age_by_brackets_dic)
aggregate_matrix = symmetric_matrix.copy()
aggregate_matrix = sp.get_aggregate_matrix(aggregate_matrix, age_by_brackets_dic)
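For readers who want to see the aggregation of single-year counts into bracket counts spelled out, here is a minimal standalone sketch. The function name aggregate_ages and the sample counts are hypothetical; it is meant to mirror the description of get_aggregate_ages(), not to reproduce its implementation.

```python
def aggregate_ages(age_count, age_by_brackets):
    """Sum single-year age counts into counts per age bracket."""
    aggregate = {}
    for age, count in age_count.items():
        bracket = age_by_brackets[age]
        aggregate[bracket] = aggregate.get(bracket, 0) + count
    return aggregate


# Made-up inputs: counts for a few single-year ages and two 5-year brackets.
age_count = {0: 3, 4: 2, 5: 1, 9: 4}
age_by_brackets = {age: age // 5 for age in range(10)}
aggregate_age_count = aggregate_ages(age_count, age_by_brackets)
```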
synthpops.base.get_aggregate_matrix(matrix, age_by_brackets_dic)¶
Aggregate a symmetric matrix to fewer age brackets. Do not use for homogeneous mixing matrix.
- Parameters
matrix (np.ndarray) – A symmetric age contact matrix.
age_by_brackets_dic (dict) – A dictionary mapping age to the age bracket range it falls within.
- Returns
A symmetric contact matrix (np.ndarray) aggregated to age brackets.
Example
age_brackets = sp.get_census_age_brackets(sp.datadir,state_location='Washington',country_location='usa')
age_by_brackets_dic = sp.get_age_by_brackets_dic(age_brackets)
aggregate_age_count = sp.get_aggregate_ages(age_count, age_by_brackets_dic)
aggregate_matrix = symmetric_matrix.copy()
aggregate_matrix = sp.get_aggregate_matrix(aggregate_matrix, age_by_brackets_dic)
asymmetric_matrix = sp.get_asymmetric_matrix(aggregate_matrix, aggregate_age_count)
synthpops.base.get_asymmetric_matrix(symmetric_matrix, aggregate_ages)¶
Get the contact matrix for the average individual in each age bracket.
- Parameters
symmetric_matrix (np.ndarray) – A symmetric age contact matrix.
aggregate_ages (dict) – A dictionary mapping single year ages to age brackets.
- Returns
A contact matrix (np.ndarray) whose elements M_ij describe the contact frequency for the average individual in age bracket i with all possible contacts in age bracket j.
Example
age_brackets = sp.get_census_age_brackets(sp.datadir,state_location='Washington',country_location='usa')
age_by_brackets_dic = sp.get_age_by_brackets_dic(age_brackets)
aggregate_age_count = sp.get_aggregate_ages(age_count, age_by_brackets_dic)
aggregate_matrix = symmetric_matrix.copy()
aggregate_matrix = sp.get_aggregate_matrix(aggregate_matrix, age_by_brackets_dic)
asymmetric_matrix = sp.get_asymmetric_matrix(aggregate_matrix, aggregate_age_count)
synthpops.config module¶
This module sets the location of the data folder and other global settings.
To change the level of log messages displayed, use e.g.
sp.logger.setLevel('CRITICAL')
synthpops.config.checkmem(unit='mb', fmt='0.2f', start=0, to_string=True)¶
For use with logger, check current memory usage.
synthpops.config.set_nbrackets(n)¶
Set the number of census brackets – usually 16 or 20.
synthpops.config.validate(verbose=True)¶
Check that the data folder can be found.
synthpops.config.set_location_defaults(country=None)¶
synthpops.config.get_config_data()¶
synthpops.config.version_info()¶
synthpops.contact_networks module¶
This module generates the household, school, and workplace contact networks.
synthpops.contact_networks.make_contacts_from_microstructure_objects(age_by_uid_dic, homes_by_uids, schools_by_uids=None, teachers_by_uids=None, non_teaching_staff_uids=None, workplaces_by_uids=None, facilities_by_uids=None, facilities_staff_uids=None, use_two_group_reduction=False, average_LTCF_degree=20, with_school_types=False, school_mixing_type='random', average_class_size=20, inter_grade_mixing=0.1, average_student_teacher_ratio=20, average_teacher_teacher_degree=3, average_student_all_staff_ratio=15, average_additional_staff_degree=20, school_type_by_age=None, workplaces_by_industry_codes=None, verbose=False, max_contacts=None)¶
From microstructure objects (dictionary mapping ID to age, lists of lists in different settings, etc.), create a dictionary of individuals. Each key is the ID of an individual which maps to a dictionary for that individual with attributes such as their age, household ID (hhid), school ID (scid), workplace ID (wpid), workplace industry code (wpindcode) if available, and contacts in different layers.
- Parameters
age_by_uid_dic (dict) – dictionary mapping id to age for all individuals in the population
homes_by_uids (list) – A list of lists where each sublist is a household and the IDs of the household members.
schools_by_uids (list) – A list of lists, where each sublist represents a school and the ids of the students and teachers within it
teachers_by_uids (list) – A list of lists, where each sublist represents a school and the ids of the teachers within it
workplaces_by_uids (list) – A list of lists, where each sublist represents a workplace and the ids of the workers within it
facilities_by_uids (list) – A list of lists, where each sublist represents a skilled nursing or long term care facility and the ids of the residents living within it
facilities_staff_uids (list) – A list of lists, where each sublist represents a skilled nursing or long term care facility and the ids of the staff working within it
non_teaching_staff_uids (list) – None or a list of lists, where each sublist represents a school and the ids of the non teaching staff within it
use_two_group_reduction (bool) – If True, create long term care facilities with reduced contacts across both groups
average_LTCF_degree (int) – default average degree in long term care facilities
with_school_types (bool) – If True, creates explicit school types.
school_mixing_type (str or dict) – The mixing type for schools, ‘random’, ‘age_clustered’, or ‘age_and_class_clustered’ if string, and a dictionary of these by school type otherwise. ‘random’ means random graphs for each school, ‘age_clustered’ means random graphs but with students mostly mixing within the age/grade (inter_grade_mixing controls mixing between grades), ‘age_and_class_clustered’ means students cohorted into classes with their own teachers.
average_class_size (float) – The average classroom size.
inter_grade_mixing (float) – The average fraction of mixing between grades in the same school for clustered school mixing types.
average_student_teacher_ratio (float) – The average number of students per teacher.
average_teacher_teacher_degree (float) – The average number of contacts per teacher with other teachers.
average_student_all_staff_ratio (float) – The average number of students per staff members at school (including both teachers and non teachers).
average_additional_staff_degree (float) – The average number of contacts per additional non teaching staff in schools.
school_type_by_age (dict) – A dictionary of probabilities for the school type likely for each age.
workplaces_by_industry_codes (np.ndarray or None) – array with workplace industry code for each workplace
verbose (bool) – If True, print debugging statements.
trimmed_size_dic (dict) – If supplied, trim contacts on creation rather than post hoc.
- Returns
A popdict of people with attributes. Dictionary keys are the IDs of individuals in the population and the values are a dictionary for each individual with their attributes, such as age, household ID (hhid), school ID (scid), workplace ID (wpid), workplace industry code (wpindcode) if available, and the IDs of their contacts in different layers. Different layers available are households (‘H’), schools (‘S’), workplaces (‘W’), and long term care facilities (‘LTCF’). Contacts in these layers are clustered and thus form a network composed of groups of people interacting with each other. For example, all household members are contacts of each other, and everyone in the same school is considered a contact of each other. If use_two_group_reduction is True, then contacts within ‘LTCF’ are reduced from fully connected.
Notes
Methods to trim large groups of contacts down to better approximate a sense of close contacts (such as classroom sizes or smaller work groups) are available via sp.trim_contacts() or sp.create_reduced_contacts_with_group_types(); see these methods for more details.
synthpops.contact_networks.create_reduced_contacts_with_group_types(popdict, group_1, group_2, setting, average_degree=20, p_matrix=None, force_cross_edges=True)¶
Create contacts between members of group 1 and group 2, fixing the average degree, with the probability of an edge between any two groups controlled by p_matrix if provided. If force_cross_edges is True, an inter-group edge is forced for each individual in group 1. This means not everyone in group 2 will have a contact with group 1.
- Parameters
group_1 (list) – list of ids for group 1
group_2 (list) – list of ids for group 2
average_degree (int) – average degree across group 1 and 2
p_matrix (np.ndarray) – probability matrix for edges between any two groups
force_cross_edges (bool) – If True, force each individual to have at least one contact with a member from the other group
- Returns
Popdict with edges added for nodes in the two groups.
Notes
This method uses the Stochastic Block Model algorithm to generate contacts both between nodes in different groups and for nodes within the same group. In the current version, fixing both the average degree and p_matrix (the matrix of probabilities for edges between any two groups) is not supported. Future versions may add support for this.
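For intuition about the Stochastic Block Model mentioned in the note above, the following standalone sketch samples edges for two groups using one probability for pairs within a group and another for pairs across groups. The function name two_group_sbm, its parameters, and the resident and staff IDs are hypothetical; the sketch does not fix the average degree or force cross-group edges, so it is not a reimplementation of create_reduced_contacts_with_group_types().

```python
import random
from itertools import combinations


def two_group_sbm(group_1, group_2, p_within, p_between, rng):
    """Return a set of undirected edges (a, b) with a < b from a two-block model."""
    block = {uid: 0 for uid in group_1}
    block.update({uid: 1 for uid in group_2})
    edges = set()
    for a, b in combinations(sorted(block), 2):
        # Same block: within-group probability; different blocks: between-group.
        p = p_within if block[a] == block[b] else p_between
        if rng.random() < p:
            edges.add((a, b))
    return edges


# Made-up facility with 30 residents and 10 staff members.
residents = list(range(0, 30))
staff = list(range(30, 40))
edges = two_group_sbm(residents, staff, p_within=0.2, p_between=0.5, rng=random.Random(3))
```

Choosing p_within and p_between well below 1 is what reduces the contacts from a fully connected facility to a sparser network.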
synthpops.data_distributions module¶
Read in data distributions.
synthpops.data_distributions.get_relative_path(datadir)¶
synthpops.data_distributions.get_nbrackets()¶
synthpops.data_distributions.get_age_brackets_from_df(ab_file_path)¶
Create a dict of age bracket ranges from ab_file_path.
- Parameters
ab_file_path (string) – file path to get the ends of different age brackets from
- Returns
A dictionary with a np.ndarray of the age range that maps to each age bracket key.
Examples:
get_age_brackets_from_df(ab_file_path) returns a dictionary age_brackets, where age_brackets[0] is the age range for the first age bracket, age_brackets[1] is the age range for the second age bracket, etc.
synthpops.data_distributions.get_age_bracket_distr_path(datadir, location=None, state_location=None, country_location=None, nbrackets=None)¶
Get file_path for age distribution by age brackets.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to the age distribution by age bracket data.
synthpops.data_distributions.read_age_bracket_distr(datadir, location=None, state_location=None, country_location=None, nbrackets=None, file_path=None, use_default=False)¶
A dict of the age distribution by age brackets. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified age bracket distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of the age distribution by age bracket. Keys map to a range of ages in that age bracket.
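The use_default behavior described for this and many of the following functions amounts to a two-step lookup. The sketch below shows that pattern under stated assumptions: the helper name resolve_data_path, the directory layout, and the file name are all hypothetical and do not reflect how SynthPops actually organizes its data folder.

```python
import os
import tempfile


def resolve_data_path(datadir, location, default_location, file_name, use_default=False):
    """Hypothetical helper: prefer location-specific data, optionally fall back."""
    specific = os.path.join(datadir, location, file_name)
    if os.path.isfile(specific):
        return specific
    if use_default:
        # Fall back to the default location only when explicitly allowed.
        return os.path.join(datadir, default_location, file_name)
    raise FileNotFoundError(f"No data for {location!r}; pass use_default=True to fall back.")


with tempfile.TemporaryDirectory() as datadir:
    # Only the default location has data in this toy directory.
    os.makedirs(os.path.join(datadir, "default_city"))
    default_file = os.path.join(datadir, "default_city", "age_distr.dat")
    with open(default_file, "w") as f:
        f.write("0_4,0.06\n")

    found = resolve_data_path(datadir, "my_city", "default_city", "age_distr.dat", use_default=True)
    used_default = found == default_file
```

Raising an error when use_default is False is a design choice of this sketch; it makes a silent substitution of another population's data impossible.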
-
synthpops.data_distributions.
get_smoothed_single_year_age_distr
(datadir, location=None, state_location=None, country_location=None, nbrackets=None, file_path=None, use_default=None, window_length=7)¶ A smoothed dict of the age distribution by single years. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population. A moving window is used to smooth out the age distribution.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified age bracket distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
window_length (int) – length of window, in units of years, over which to average or smooth out age distribution
- Returns
A dictionary of the smoothed age distribution by single year. Keys are single years of age.
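To make the role of window_length concrete, here is a minimal sketch of moving-window smoothing over a single-year age distribution. It is an assumption-laden illustration, not the SynthPops algorithm: the helper name smooth_age_distr is hypothetical, the window is centered and shrinks at the youngest and oldest ages, and the result is renormalized to sum to 1.

```python
def smooth_age_distr(age_distr, window_length=7):
    """Hypothetical helper: centered moving average over single-year ages."""
    ages = sorted(age_distr)
    half = window_length // 2
    smoothed = {}
    for idx, age in enumerate(ages):
        # Shrink the window at the youngest and oldest ages.
        window = ages[max(0, idx - half): idx + half + 1]
        smoothed[age] = sum(age_distr[a] for a in window) / len(window)
    total = sum(smoothed.values())
    # Renormalize so the result is still a probability distribution.
    return {age: value / total for age, value in smoothed.items()}


# A blocky distribution: ages 0-4 share one bracket value, ages 5-9 another.
raw = {age: (0.15 if age < 5 else 0.05) for age in range(10)}
smoothed = smooth_age_distr(raw, window_length=3)
```

In this toy input the sharp step between ages 4 and 5 becomes a gradual decline, which is the effect a larger window_length strengthens.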
-
synthpops.data_distributions.
get_household_size_distr_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for household size distribution.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to the household size distribution data.
-
synthpops.data_distributions.
get_household_size_distr
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False)¶ A dictionary of the distribution of household sizes. If you don’t give the file_path, then supply the location, state_location, and country_location strings. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified household size distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of the household size distribution data. Keys map to the household size as an integer, values are the percent of households of that size.
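A dictionary of this shape can be used directly as sampling weights. The sketch below uses made-up values expressed as fractions and a hypothetical helper, sample_household_sizes, to draw household sizes in proportion to the distribution; it does not call SynthPops.

```python
import random

# Illustrative values only; keys are household sizes, values are shares of households.
household_size_distr = {1: 0.28, 2: 0.34, 3: 0.16, 4: 0.13, 5: 0.06, 6: 0.02, 7: 0.01}


def sample_household_sizes(distr, n_households, seed=None):
    """Hypothetical helper: draw household sizes in proportion to distr."""
    rng = random.Random(seed)
    sizes = list(distr)
    weights = [distr[s] for s in sizes]
    return rng.choices(sizes, weights=weights, k=n_households)


sizes = sample_household_sizes(household_size_distr, n_households=1000, seed=0)
mean_size = sum(sizes) / len(sizes)
```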
-
synthpops.data_distributions.
get_head_age_brackets_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for head of household age brackets.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state
country_location (string) – name of the country the state_location is in
- Returns
A file path to the age brackets for head of household distribution data.
-
synthpops.data_distributions.
get_head_age_brackets
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False)¶ Get a dictionary of head age brackets either from the file_path directly, or using the other parameters to figure out what the file_path should be. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state
country_location (string) – name of the country the state_location is in
file_path (string) – file path to user specified head age brackets data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of the age brackets for head of household distribution data. Keys map to the age bracket as an integer, values are the percent of households with a head of household in that age bracket.
-
synthpops.data_distributions.
get_household_head_age_by_size_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for head of household age by size counts or distribution. If the data doesn’t exist at the state level, only give the country_location.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state
country_location (string) – name of the country the state_location is in
- Returns
A file path to the head of household age by household size count or distribution data.
-
synthpops.data_distributions.
get_household_head_age_by_size_df
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False)¶ Return a pandas df of head of household age by the size of the household. If file_path is given, the data is read from there first. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state
country_location (string) – name of the country the state_location is in
file_path (string) – file path to user specified data for the age of the head of the household by household size
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A pandas DataFrame of head of household age by household size count or distribution data.
-
synthpops.data_distributions.
get_head_age_by_size_distr
(datadir, location=None, state_location=None, country_location=None, file_path=None, household_size_1_included=False, use_default=False)¶ Create an array of head of household age bracket counts (column) given by size (row). If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from the default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state
country_location (string) – name of the country the state_location is in
file_path (string) – file path to user specified age of the head of the household by household size distribution data
household_size_1_included (bool) – if True, the age distribution for people who live alone is included in the head of household age by household size dataframe, so it will be used. Otherwise, assume a uniform distribution for this among all ages of adults.
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
An array where each row s represents the age distribution of the head of households for households of size s-1.
-
synthpops.data_distributions.
get_census_age_brackets_path
(datadir, location=None, state_location=None, country_location=None, nbrackets=None)¶ Get file_path for census age brackets: will depend on the state or country of the source data on the age distribution and age specific contact patterns.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state
country_location (string) – name of the country the state_location is in
- Returns
A file path to the age brackets to be used with census age data in combination with the contact matrix data.
-
synthpops.data_distributions.
get_census_age_brackets
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False, nbrackets=None)¶ Get census age brackets: depends on the country or source of the age distribution and the contact pattern data. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state
country_location (string) – name of the country the state_location is in
file_path (string) – file path to user specified census age brackets
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of the range of ages that map to each age bracket.
-
synthpops.data_distributions.
get_contact_matrix
(datadir, setting_code, sheet_name=None, file_path=None, delimiter=' ', header=None)¶ Get setting specific age contact matrix given sheet name to use. If file_path is given, then delimiter and header should also be specified.
- Parameters
datadir (string) – file path to the data directory
setting_code (string) – name of the physical contact setting: H for households, S for schools, W for workplaces, C for community or other
sheet_name (string) – name of the sheet in the excel file with contact patterns
file_path (string) – file path to user specified age contact matrix
delimiter (string) – delimiter for the contact matrix file
header (int) – row number for the header of the file
- Returns
Matrix of contact patterns where each row i is the average contact patterns for an individual in age bracket i and the columns represent the age brackets of their contacts. The matrix element i,j is then the contact rate, number, or frequency for the average individual in age bracket i with all of their contacts in age bracket j in that physical contact setting.
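The row and column convention can be checked with a toy matrix. In the sketch below the brackets, the numbers, and the helper names bracket_of and contact_rate are all made up for illustration; only the indexing rule (row i for the individual, column j for their contacts) comes from the description above.

```python
# Toy 3-bracket example; the numbers are made up for illustration.
age_brackets = {0: range(0, 20), 1: range(20, 65), 2: range(65, 101)}
household_matrix = [
    [1.2, 1.5, 0.1],  # contacts of someone in bracket 0, by contact's bracket
    [0.9, 1.1, 0.2],  # contacts of someone in bracket 1
    [0.1, 0.4, 0.8],  # contacts of someone in bracket 2
]


def bracket_of(age, brackets):
    """Return the index of the bracket containing age."""
    for index, ages in brackets.items():
        if age in ages:
            return index
    raise ValueError(f"age {age} is not covered by any bracket")


def contact_rate(age_i, age_j, matrix, brackets):
    """Element (i, j): average contacts of a person aged age_i with people in age_j's bracket."""
    return matrix[bracket_of(age_i, brackets)][bracket_of(age_j, brackets)]


rate = contact_rate(10, 40, household_matrix, age_brackets)  # row 0, column 1
```

Note that the matrix need not be symmetric: contact_rate(10, 40, ...) and contact_rate(40, 10, ...) read different elements.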
-
synthpops.data_distributions.
get_contact_matrix_dic
(datadir, sheet_name=None, file_path_dic=None, delimiter=' ', header=None, use_default=False)¶ Create a dict of setting specific age contact matrices. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
setting_code (string) – name of the physical contact setting: H for households, S for schools, W for workplaces, C for community or other
sheet_name (string) – name of the sheet in the excel file with contact patterns
file_path_dic (dict) – dictionary of file paths to user specified age contact matrices, where keys are “H”, “S”, “W”, and “C”.
delimiter (string) – delimiter for the contact matrix file
header (int) – row number for the header of the file
- Returns
A dictionary of the different contact matrices for each population, given by the sheet name. Keys map to the different possible physical contact settings for which data are available.
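The structure of the returned dictionary, one matrix per setting code, can be sketched with in-memory text standing in for the files named in file_path_dic. The helper read_matrix is hypothetical, the matrix values are made up, and the header argument is ignored here; this only illustrates parsing space-delimited matrices keyed by “H”, “S”, “W”, and “C”.

```python
import csv
import io


def read_matrix(handle, delimiter=" "):
    """Hypothetical helper: parse a delimited text matrix into lists of floats."""
    return [[float(x) for x in row] for row in csv.reader(handle, delimiter=delimiter) if row]


# In practice each value would be an open file; StringIO keeps the sketch self-contained.
sources = {
    "H": io.StringIO("1.0 0.5\n0.5 1.0\n"),
    "S": io.StringIO("3.0 0.2\n0.2 0.1\n"),
    "W": io.StringIO("0.0 0.1\n0.1 2.0\n"),
    "C": io.StringIO("0.6 0.4\n0.4 0.6\n"),
}
contact_matrix_dic = {code: read_matrix(handle) for code, handle in sources.items()}
```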
-
synthpops.data_distributions.
get_school_enrollment_rates_path
(datadir, location=None, state_location=None, country_location=None)¶ Get a file_path for enrollment rates by age.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to the school enrollment rates.
-
synthpops.data_distributions.
get_school_enrollment_rates
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False)¶ Get dictionary of enrollment rates by age. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified school enrollment by age data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of school enrollment rates by age.
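One way such a dictionary might be used is to decide, person by person, who is a student. The sketch below is an assumption about usage rather than SynthPops code: the rates are made up, the helper enrolled_ids is hypothetical, and ages missing from the dictionary are treated as not enrolled.

```python
import random

# Made-up enrollment rates by age (probability that a person of that age is a student).
enrollment_rates = {4: 0.5, 10: 1.0, 17: 0.9, 30: 0.0}


def enrolled_ids(ages_by_id, rates, seed=None):
    """Hypothetical helper: decide enrollment per person with a Bernoulli draw."""
    rng = random.Random(seed)
    return [pid for pid, age in ages_by_id.items() if rng.random() < rates.get(age, 0.0)]


students = enrolled_ids({0: 10, 1: 30, 2: 10, 3: 30}, enrollment_rates, seed=3)
```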
-
synthpops.data_distributions.
get_school_size_brackets_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for school size brackets specific to the location under study.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to school size brackets.
-
synthpops.data_distributions.
get_school_size_brackets
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False)¶ Get school size brackets: depends on the source/location of the data. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified school size brackets data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of school size brackets.
-
synthpops.data_distributions.
get_school_size_distr_by_brackets_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for the distribution of school size by brackets.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to the distribution of school sizes by bracket.
-
synthpops.data_distributions.
get_school_size_distr_by_brackets
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False)¶ Get distribution of school sizes by size bracket or bin. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified school size distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of the distribution of school sizes by bracket.
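The school size brackets and the distribution by bracket are naturally used together: first pick a bracket according to its probability, then pick a size inside it. The sketch below uses made-up brackets and shares and a hypothetical helper, sample_school_size; choosing uniformly within a bracket is an assumption of this sketch.

```python
import random

# Made-up brackets (inclusive size ranges) and the share of schools in each.
school_size_brackets = {0: range(20, 51), 1: range(51, 101), 2: range(101, 301)}
school_size_distr = {0: 0.2, 1: 0.5, 2: 0.3}


def sample_school_size(brackets, distr, rng):
    """Hypothetical helper: pick a bracket by probability, then a size within it."""
    keys = sorted(distr)
    bracket = rng.choices(keys, weights=[distr[k] for k in keys], k=1)[0]
    return rng.choice(brackets[bracket])


rng = random.Random(42)
school_sizes = [sample_school_size(school_size_brackets, school_size_distr, rng) for _ in range(5)]
```

The same two-step draw would apply to the workplace size brackets and distribution documented later in this module.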
-
synthpops.data_distributions.
get_default_school_type_age_ranges
()¶ Define and return default school types and the age range for each.
- Returns
A dictionary of default school types and the age range for each.
-
synthpops.data_distributions.
get_default_school_types_distr_by_age
()¶ Define and return default probabilities of school type for each age.
- Returns
A dictionary of default probabilities for the school type likely for each age.
-
synthpops.data_distributions.
get_default_school_types_by_age_single
()¶ Define and return default school type by age by assigning the school type with the highest probability.
- Returns
A dictionary of default school type by age.
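Assigning the school type with the highest probability is an argmax over each age's distribution. The sketch below shows that step on made-up data; the type labels "pk", "es", and "ms" and the probabilities are hypothetical, not the SynthPops defaults.

```python
# Made-up probabilities of each school type for two ages.
school_types_distr_by_age = {
    5: {"pk": 0.3, "es": 0.7, "ms": 0.0},
    11: {"pk": 0.0, "es": 0.4, "ms": 0.6},
}

# For each age, keep the school type with the highest probability.
school_type_by_age = {
    age: max(distr, key=distr.get) for age, distr in school_types_distr_by_age.items()
}
```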
-
synthpops.data_distributions.
get_default_school_size_distr_brackets
()¶ Define and return default school size distribution brackets.
- Returns
A dictionary of school size brackets.
-
synthpops.data_distributions.
get_default_school_size_distr_by_type
()¶ Define and return default school size distribution for each school type. The school size distributions are binned to size groups or brackets.
- Returns
A dictionary of school size distributions binned by size groups or brackets for each type of default school.
-
synthpops.data_distributions.
write_school_type_age_ranges
(datadir, location, state_location, country_location, school_type_age_ranges)¶ Write to file the age range for each school type.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
school_type_age_ranges (dict) – a dictionary with the age range for each school type
- Returns
None.
-
synthpops.data_distributions.
get_school_type_age_ranges_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for the age range by school type.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to the age range for different school types.
-
synthpops.data_distributions.
get_school_type_age_ranges
(datadir, location, state_location, country_location, file_path=None, use_default=None)¶ Get a dictionary of the school types and the age range for each for the location specified.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from Seattle, Washington.
- Returns
A dictionary of default school types and the age range for each.
-
synthpops.data_distributions.
get_school_size_distr_by_type_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for the school size distribution by school type.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to the school size distribution data by different school types for the region specified.
- Return type
str
-
synthpops.data_distributions.
get_school_size_distr_by_type
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=None)¶ Get the school size distribution by school types. If use_default, then we’ll try to look for location specific data first, and if that’s not available we’ll use default data from the set default locations (see sp.config.py). This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in, which should be ‘usa’
file_path (string) – file path to user specified distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from Seattle, Washington.
- Returns
A dictionary of school size distributions binned by size groups or brackets for each type of default school.
-
synthpops.data_distributions.
get_employment_rates_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for employment rates by age.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to employment rates by age.
-
synthpops.data_distributions.
get_employment_rates
(datadir, location, state_location, country_location, file_path=None, use_default=False)¶ Get employment rates by age. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in, which should be ‘usa’
file_path (string) – file path to user specified employment by age data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of employment rates by age.
-
synthpops.data_distributions.
get_workplace_size_brackets_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for workplace size brackets.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to workplace size brackets.
-
synthpops.data_distributions.
get_workplace_size_brackets
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False)¶ Get workplace size brackets. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in, which should be ‘usa’
file_path (string) – file path to user specified workplace size brackets data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of workplace size brackets.
-
synthpops.data_distributions.
get_workplace_size_distr_by_brackets_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for the distribution of workplace size by brackets.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to the distribution of workplace sizes by bracket.
-
synthpops.data_distributions.
get_workplace_size_distr_by_brackets
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=False)¶ Get the distribution of workplace size by brackets. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from default_location, default_state, default_country. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified workplace size distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from default_location, default_state, default_country.
- Returns
A dictionary of the distribution of workplace sizes by bracket.
-
synthpops.data_distributions.
get_state_postal_code
(state_location, country_location)¶ Get the state postal code.
- Parameters
state_location (string) – name of the state
country_location (string) – name of the country the state is in
- Returns
A postal code for the state_location.
- Return type
str
-
synthpops.data_distributions.
get_usa_long_term_care_facility_path
(datadir, state_location=None, country_location=None, part=None)¶ Get file_path for state level data on Long Term Care Providers for the US from 2015-2016.
- Parameters
datadir (string) – file path to the data directory
state_location (string) – name of the state
country_location (string) – name of the country the state is in
part (int) – part 1 or 2 of the table
- Returns
A file path to data on Long Term Care Providers from ‘Long-Term Care Providers and Services Users in the United States - State Estimates Supplement: National Study of Long-Term Care Providers, 2015-2016’. Part 1 or 2 are available.
- Return type
str
-
synthpops.data_distributions.
get_usa_long_term_care_facility_data
(datadir, state_location=None, country_location=None, part=None, file_path=None, use_default=False)¶ Get state level data table from National survey on Long Term Care Providers for the US from 2015-2016.
- Parameters
datadir (string) – file path to the data directory
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
part (int) – part 1 or 2 of the table
file_path (string) – file path to user specified LTCF distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from Seattle, Washington.
- Returns
A pandas DataFrame of the state level data table on Long Term Care Providers for the US from 2015-2016.
- Return type
pandas.DataFrame
-
synthpops.data_distributions.
get_long_term_care_facility_residents_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for the size distribution of residents per facility for Long Term Care Facilities.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to data on the size distribution of residents per facility for Long Term Care Facilities.
-
synthpops.data_distributions.
get_long_term_care_facility_residents_distr
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=None)¶ Get size distribution of residents per facility for Long Term Care Facilities.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified LTCF resident size distribution data
use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none is found, return default data from Seattle, Washington.
- Returns
A dictionary of the distribution of residents per facility for Long Term Care Facilities.
-
synthpops.data_distributions.
get_long_term_care_facility_residents_distr_brackets_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for the size bins for the distribution of residents per facility for Long Term Care Facilities.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to data on the size bins for the distribution of residents per facility for Long Term Care Facilities.
-
synthpops.data_distributions.
get_long_term_care_facility_residents_distr_brackets
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=None)¶ Get size bins for the distribution of residents per facility for Long Term Care Facilities.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in, which should be ‘usa’
file_path (string) – file path to user specified LTCF resident size brackets data
use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from Seattle, Washington.
- Returns
A dictionary of size brackets or bins for residents per facility.
-
synthpops.data_distributions.
get_long_term_care_facility_resident_to_staff_ratios_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for the distribution of resident to staff ratios per facility for Long Term Care Facilities.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to data on the distribution of resident to staff ratios per facility for Long Term Care Facilities.
-
synthpops.data_distributions.
get_long_term_care_facility_resident_to_staff_ratios_distr
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=None)¶ Get size distribution of resident to staff ratios per facility for Long Term Care Facilities.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
file_path (string) – file path to user specified resident to staff ratio distribution data
use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from Seattle, Washington.
- Returns
A dictionary of the distribution of resident to staff ratios per facility for Long Term Care Facilities.
-
synthpops.data_distributions.
get_long_term_care_facility_resident_to_staff_ratios_brackets_path
(datadir, location=None, state_location=None, country_location=None)¶ Get file_path for the size bins for the distribution of resident to staff ratios per facility for Long Term Care Facilities.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
- Returns
A file path to data on the size bins for the distribution of resident to staff ratios per facility for Long Term Care Facilities.
- Return type
str
-
synthpops.data_distributions.
get_long_term_care_facility_resident_to_staff_ratios_brackets
(datadir, location=None, state_location=None, country_location=None, file_path=None, use_default=None)¶ Get size bins for the distribution of resident to staff ratios per facility for Long Term Care Facilities.
- Parameters
datadir (string) – file path to the data directory
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in, which should be ‘usa’
file_path (string) – file path to user specified resident to staff ratio brackets data
use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from Seattle, Washington.
- Returns
A dictionary of size brackets or bins for resident to staff ratios per facility.
-
synthpops.data_distributions.
get_long_term_care_facility_use_rates_path
(datadir, state_location=None, country_location=None)¶ Get file_path for Long Term Care Facility use rates by age for a state.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
- Returns
A file path to the data on the Long Term Care Facility usage rates by age.
- Return type
str
Note
Currently only available for the United States.
-
synthpops.data_distributions.
get_long_term_care_facility_use_rates
(datadir, state_location=None, country_location=None, file_path=None, use_default=None)¶ Get Long Term Care Facility use rates by age for a state.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
file_path (string) – file path to user specified Long Term Care Facility use rates data
use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from Seattle, Washington.
- Returns
A dictionary of the Long Term Care Facility usage rates by age.
- Return type
dict
Note
Currently only available for the United States.
synthpops.households module¶
Functions for generating households
-
synthpops.households.
generate_household_sizes_from_fixed_pop_size
(N, hh_size_distr)¶ Given a number of people and a household size distribution, generate the number of homes of each size needed to place everyone in a household.
- Parameters
N (int) – The number of people in the population.
hh_size_distr (dict) – The distribution of household sizes.
- Returns
An array with the count of households of size s at index s-1.
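The idea of turning a population size and a household size distribution into counts of households can be sketched as follows. This is a minimal illustration, not the SynthPops implementation: the function name `household_size_counts`, the `seed` argument, and the sequential sampling strategy are assumptions made for the example. It assumes the distribution gives size 1 a positive probability so that the last few people can always be placed.

```python
import random

def household_size_counts(n_people, hh_size_distr, seed=0):
    """Return a list where index s-1 holds the number of households of size s.

    Hypothetical helper for illustration only. Assumes hh_size_distr maps
    household size -> probability and that size 1 has positive probability.
    """
    rng = random.Random(seed)
    sizes = sorted(hh_size_distr)
    counts = [0] * max(sizes)
    remaining = n_people
    while remaining > 0:
        # Only consider household sizes that still fit the people left to place.
        feasible = [s for s in sizes if s <= remaining]
        weights = [hh_size_distr[s] for s in feasible]
        size = rng.choices(feasible, weights=weights)[0]
        counts[size - 1] += 1
        remaining -= size
    return counts

counts = household_size_counts(1000, {1: 0.3, 2: 0.35, 3: 0.15, 4: 0.2})
people_placed = sum((index + 1) * count for index, count in enumerate(counts))
```

Because a household is only drawn when it fits the remaining population, the households account for exactly `n_people` individuals.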
-
synthpops.households.
generate_household_head_age_by_size
(hha_by_size_counts, hha_brackets, hh_size, single_year_age_distr)¶ Generate the age of the head of the household, also known as the reference person of the household, conditional on the size of the household.
- Parameters
hha_by_size_counts (matrix) – A matrix in which each row contains the age distribution of the reference person for household size s at index s-1.
hha_brackets (dict) – The age brackets for the heads of household.
hh_size (int) – The household size.
single_year_age_distr (dict) – The age distribution.
- Returns
Age of the head of the household or reference person.
-
synthpops.households.
generate_living_alone
(hh_sizes, hha_by_size_counts, hha_brackets, single_year_age_distr)¶ Generate the ages of those living alone.
- Parameters
hh_sizes (array) – The count of household size s at index s-1.
hha_by_size_counts (matrix) – A matrix in which each row contains the age distribution of the reference person for household size s at index s-1.
hha_brackets (dict) – The age brackets for the heads of household.
single_year_age_distr (dict) – The age distribution.
- Returns
An array of households of size 1 where each household is a row and the value in the row is the age of the household member.
-
synthpops.households.
assign_uids_by_homes
(homes, id_len=16, use_int=True)¶ Assign IDs to everyone in order by their households.
- Parameters
homes (array) – The generated synthetic ages of household members.
id_len (int) – The length of the UID.
use_int (bool) – If True, use ints for the uids of individuals; otherwise use strings of length ‘id_len’.
- Returns
A copy of the generated households with IDs in place of ages, and a dictionary mapping ID to age.
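As a rough sketch of the return value described above, the following assigns consecutive integer IDs household by household and records each ID's age. The function name `assign_ids_by_homes` is hypothetical and only the integer ID case is shown; the library's own handling of `id_len` and string IDs is not reproduced here.

```python
def assign_ids_by_homes(homes):
    """Replace ages with consecutive integer IDs, household by household.

    Hypothetical helper for illustration. `homes` is a list of lists of ages.
    Returns (homes_by_id, age_by_id).
    """
    homes_by_id = []
    age_by_id = {}
    next_id = 0
    for home in homes:
        member_ids = []
        for age in home:
            age_by_id[next_id] = age  # remember the age behind each ID
            member_ids.append(next_id)
            next_id += 1
        homes_by_id.append(member_ids)
    return homes_by_id, age_by_id

homes_by_id, age_by_id = assign_ids_by_homes([[34, 36, 5], [70]])
```

Members of the same household receive adjacent IDs, which is what "in order by their households" suggests.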
-
synthpops.households.
generate_age_count
(n, age_distr)¶ Generate a stochastic count of people for each age given the age distribution (age_distr) and number of people to generate (n).
- Parameters
n (int) – number of people to generate
age_distr (list or np.ndarray) – single year age distribution
- Returns
A dictionary with the count of people to generate for each age given an age distribution and the number of people to generate.
- Return type
dict
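A stochastic age count of this kind can be sketched with the standard library alone. The name `sample_age_count` and the `seed` argument are assumptions for this example, and the sampling shown (independent draws weighted by the age distribution) is one plausible reading of the description rather than the library's exact method.

```python
import random
from collections import Counter

def sample_age_count(n, age_distr, seed=0):
    """Draw n ages from a single year age distribution and count them.

    Hypothetical helper for illustration. `age_distr[a]` is the weight of age a.
    """
    rng = random.Random(seed)
    ages = list(range(len(age_distr)))
    draws = rng.choices(ages, weights=age_distr, k=n)
    counts = Counter(draws)
    # Include ages that were never drawn so every age has an entry.
    return {age: counts.get(age, 0) for age in ages}

age_count = sample_age_count(500, [0.2, 0.3, 0.4, 0.1])
```

The counts always sum to `n`, while the split across ages varies with the random seed.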
-
synthpops.households.
generate_living_alone_method_2
(hh_sizes, hha_by_size, hha_brackets, age_count)¶ Generate the ages of those living alone.
- Parameters
hh_sizes (array) – The count of household size s at index s-1.
hha_by_size (matrix) – A matrix in which each row contains the age distribution of the reference person for household size s at index s-1.
hha_brackets (dict) – The age brackets for the heads of household.
age_distr (dict) – The age distribution.
- Returns
An array of households of size 1 where each household is a row and the value in the row is the age of the household member.
-
synthpops.households.
generate_larger_household_sizes
(hh_sizes)¶ Create a list of the households larger than 1 in random order. Randomizing the order means that, as individuals are placed into homes by age, running out of specific ages is not systematically an issue for any given household size, unless certain sizes greatly outnumber households of other sizes.
- Parameters
hh_sizes (array) – The count of household size s at index s-1.
- Returns
An array of household sizes for the households to be generated and populated.
- Return type
np.ndarray
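The randomized ordering described above can be illustrated by expanding the per-size counts into one entry per household and shuffling. This is a sketch under the documented convention that `hh_sizes[s-1]` is the count of households of size s; the function name `larger_household_sizes` and the `seed` argument are hypothetical.

```python
import random

def larger_household_sizes(hh_sizes, seed=0):
    """List one entry per household of size > 1, in random order.

    Hypothetical helper for illustration. hh_sizes[s-1] is the number of
    households of size s.
    """
    rng = random.Random(seed)
    sizes = []
    for index, count in enumerate(hh_sizes):
        size = index + 1
        if size > 1:  # households of size 1 are handled separately
            sizes.extend([size] * count)
    rng.shuffle(sizes)
    return sizes

order = larger_household_sizes([5, 3, 2, 1])
```

Filling households in this shuffled order avoids always exhausting a given age group on the same household size.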
-
synthpops.households.
generate_larger_households_head_ages
(larger_hh_size_array, hha_by_size, hha_brackets, ages_left_to_assign)¶ Generate the ages of the heads of households for households larger than 1.
-
synthpops.households.
generate_larger_households_method_2
(larger_hh_size_array, larger_hha_chosen, hha_brackets, cm_age_brackets, cm_age_by_brackets_dic, household_matrix, ages_left_to_assign, homes_dic)¶ Assign people to households larger than one person (excluding special residences like long term care facilities or agricultural workers living in shared residential quarters).
- Parameters
hh_sizes (array) – The count of household size s at index s-1.
hha_by_size (matrix) – A matrix in which each row contains the age distribution of the reference person for household size s at index s-1.
hha_brackets (dict) – The age brackets for the heads of household.
cm_age_brackets (dict) – The age brackets for the contact matrix.
cm_age_by_brackets_dic (dict) – A dictionary mapping age to the age bracket range it falls within.
household_matrix (dict) – The age-specific contact matrix for the household contact setting.
larger_homes_age_count (dict) – Age count of people left to place in households larger than one person.
- Returns
A dictionary of households by age indexed by household size.
- Return type
dict
-
synthpops.households.
get_all_households
(homes_dic)¶ Get all households in a list, randomly assorted.
- Parameters
homes_dic (dict) – A dictionary of households by age indexed by household size
- Returns
A random ordering of households with the ages of the individuals.
- Return type
list
synthpops.ltcfs module¶
Modeling Seattle Metro Long Term Care Facilities
-
synthpops.ltcfs.
generate_ltcfs
(n, with_facilities, datadir, country_location, state_location, location, use_default, smooth_ages, window_length)¶ Generate residents living in long term care facilities and their ages.
- Parameters
n (int) – The number of people to create.
with_facilities (bool) – If True, create long term care facilities, currently only available for locations in the US.
datadir (string) – The file path to the data directory.
country_location (string) – name of the country the location is in
state_location (string) – name of the state the location is in
location (string) – name of the location
use_default (bool) – If True, try to first use the other parameters to find data specific to the location under study; otherwise, return default data drawing from default_location, default_state, default_country.
smooth_ages (bool) – If True, use smoothed out age distribution.
window_length (int) – length of window over which to average or smooth out age distribution
- Returns
The number of people expected to live outside long term care facilities, age_brackets, age_by_brackets dictionary, age distribution adjusted for long term care facility residents already sampled, and facilities with people living in them.
-
synthpops.ltcfs.
assign_facility_staff
(datadir, location, state_location, country_location, ltcf_staff_age_min, ltcf_staff_age_max, facilities, workers_by_age_to_assign_count, potential_worker_uids_by_age, potential_worker_uids, facilities_by_uids, age_by_uid_dic, use_default=False)¶ Assign Long Term Care Facility staff to the generated facilities with residents.
- Parameters
datadir (string) – The file path to the data directory.
location (string) – name of the location
state_location (string) – name of the state the location is in
country_location (string) – name of the country the location is in
ltcf_staff_age_min (int) – Long term care facility staff minimum age.
ltcf_staff_age_max (int) – Long term care facility staff maximum age.
facilities (list) – A list of lists where each sublist is a facility with the resident ages
workers_by_age_to_assign_count (dict) – A dictionary mapping age to the count of employed individuals of that age.
potential_worker_uids (dict) – dictionary of potential workers mapping their id to their age
facilities_by_uids (list) – A list of lists where each sublist is a facility with the resident IDs
age_by_uid_dic (dict) – dictionary mapping id to age for all individuals in the population
use_default (bool) – If True, try to first use the other parameters to find data specific to the location under study; otherwise, return default data drawing from default_location, default_state, default_country.
- Returns
A list of lists with the facility staff IDs for each facility.
- Return type
list
-
synthpops.ltcfs.
remove_ltcf_residents_from_potential_workers
(facilities_by_uids, potential_worker_uids, potential_worker_uids_by_age, workers_by_age_to_assign_count, age_by_uid_dic)¶ Remove facilities residents from potential workers
-
synthpops.ltcfs.
ltcf_resample_age
(exp_age_distr, a)¶ Resample younger ages to better match data.
- Parameters
exp_age_distr (dict) – age distribution
a (int) – age as an integer
- Returns
Resampled age as an integer.
Notes
This is not always necessary, but is mostly used to smooth out sharp edges in the age distribution when spsamp.resample_age() produces too many of one year and underproduces the surrounding ages. For example, newborns (0 years old) may be overproduced and 1 year olds underproduced, so this function can be customized to correct for that. It is currently customized to model the age distribution for Seattle, Washington well.
-
synthpops.ltcfs.
generate_larger_households_method_1
(size, hh_sizes, hha_by_size_counts, hha_brackets, cm_age_brackets, cm_age_by_brackets_dic, contact_matrix_dic, single_year_age_distr)¶ Generate ages of those living in households of greater than one individual. Reference individual is sampled conditional on the household size. All other household members have their ages sampled conditional on the reference person’s age and the age mixing contact matrix in households for the population under study.
- Parameters
size (int) – The household size.
hh_sizes (array) – The count of household size s at index s-1.
hha_by_size_counts (matrix) – A matrix in which each row contains the age distribution of the reference person for household size s at index s-1.
hha_brackets (dict) – The age brackets for the heads of household.
cm_age_brackets (dict) – The dictionary mapping age bracket keys to age bracket range matching the household contact matrix.
cm_age_by_brackets_dic (dict) – The dictionary mapping age to the age bracket range it falls within matching the household contact matrix.
contact_matrix_dic (dict) – A dictionary of the age-specific contact matrix for different physical contact settings.
single_year_age_distr (dict) – The age distribution.
- Returns
An array of households of the given size, where each household is a row and the values in the row are the ages of the household members. The first age in the row is the age of the reference individual.
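Sampling a household member's age conditional on the reference person's age can be sketched as a two step draw: pick an age bracket using the row of the household contact matrix that corresponds to the reference person's bracket, then pick an age inside that bracket. The function name `sample_member_age`, the list-of-lists matrix layout, and the uniform draw within a bracket are assumptions for this example and may differ from what the library does.

```python
import random

def sample_member_age(reference_age, age_by_bracket, brackets, household_matrix, rng):
    """Sample one household member's age given the reference person's age.

    Hypothetical helper for illustration.
    age_by_bracket: dict mapping age -> bracket index
    brackets: dict mapping bracket index -> list of ages in that bracket
    household_matrix: list of rows; row i weights the brackets contacted by bracket i
    """
    ref_bracket = age_by_bracket[reference_age]
    row = household_matrix[ref_bracket]
    # Step 1: choose the member's bracket in proportion to the contact matrix row.
    bracket = rng.choices(range(len(row)), weights=row)[0]
    # Step 2 (assumption): choose an age uniformly within that bracket.
    return rng.choice(brackets[bracket])

brackets = {0: list(range(0, 20)), 1: list(range(20, 65)), 2: list(range(65, 100))}
age_by_bracket = {age: b for b, ages in brackets.items() for age in ages}
household_matrix = [[0.5, 0.5, 0.0], [0.4, 0.5, 0.1], [0.0, 0.0, 1.0]]
member_age = sample_member_age(70, age_by_bracket, brackets, household_matrix, random.Random(1))
```

In this toy matrix the oldest bracket only mixes with itself, so a reference person aged 70 always yields a member aged 65 or older.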
-
synthpops.ltcfs.
generate_all_households_method_1
(N, hh_sizes, hha_by_size_counts, hha_brackets, cm_age_brackets, cm_age_by_brackets_dic, contact_matrix_dic, single_year_age_distr)¶ Generate the ages of those living in households together. First create households of people living alone, then larger households. For households larger than 1, a reference individual’s age is sampled conditional on the household size, while all other household members have their ages sampled conditional on the reference person’s age and the age mixing contact matrix in households for the population under study.
- Parameters
N (int) – The number of people in the population.
hh_sizes (array) – The count of household size s at index s-1.
hha_by_size_counts (matrix) – A matrix in which each row contains the age distribution of the reference person for household size s at index s-1.
hha_brackets (dict) – The age brackets for the heads of household.
cm_age_brackets (dict) – The dictionary mapping age bracket keys to age bracket range matching the household contact matrix.
cm_age_by_brackets_dic (dict) – The dictionary mapping age to the age bracket range it falls within matching the household contact matrix.
contact_matrix_dic (dict) – The dictionary of the age-specific contact matrix for different physical contact settings.
single_year_age_distr (dict) – The age distribution.
- Returns
An array of all households where each household is a row and the values in the row are the ages of the household members. The first age in the row is the age of the reference individual. Households are randomly shuffled by size.
Note
This method is not guaranteed to model the population age distribution well automatically. The method called inside, generate_larger_households_method_1, uses the method ltcf_resample_age to fit Seattle, Washington populations with long term care facilities generated. For a method that matches the age distribution well for populations in general, please use generate_all_households_method_2.
-
synthpops.ltcfs.
generate_all_households_method_2
(n_nonltcf, hh_sizes, hha_by_size, hha_brackets, cm_age_brackets, cm_age_by_brackets_dic, contact_matrix_dic, ltcf_adjusted_age_distr)¶ Generate the ages of those living in households together. First create households of people living alone, then larger households. For households larger than 1, a reference individual’s age is sampled conditional on the household size, while all other household members have their ages sampled conditional on the reference person’s age and the age mixing contact matrix in households for the population under study. Fix the count of ages in the population before placing individuals in households so that the age distribution of the generated population is fixed to closely match the age distribution from data on the population.
- Parameters
n_nonltcf (int) – The number of people in the population not living in long term care facilities.
hh_sizes (array) – The count of household size s at index s-1.
hha_by_size (matrix) – A matrix in which each row contains the age distribution of the reference person for household size s at index s-1.
hha_brackets (dict) – The age brackets for the heads of household.
cm_age_brackets (dict) – The dictionary mapping age bracket keys to age bracket range matching the household contact matrix.
cm_age_by_brackets_dic (dict) – The dictionary mapping age to the age bracket range it falls within matching the household contact matrix.
contact_matrix_dic (dict) – The dictionary of the age-specific contact matrix for different physical contact settings.
ltcf_adjusted_age_distr (dict) – The age distribution.
- Returns
An array of all households where each household is a row and the values in the row are the ages of the household members. The first age in the row is the age of the reference individual. Households are randomly shuffled by size.
synthpops.plotting module¶
This module provides plotting methods including methods to plot the age-specific contact matrix in different contact layers.
-
synthpops.plotting.
calculate_contact_matrix
(population, density_or_frequency='density', layer='H')¶ Calculate the symmetric age-specific contact matrix from the connections for all people in the population. density_or_frequency sets the type of contact matrix calculated.
When density_or_frequency is set to ‘frequency’ each person is assumed to have a fixed amount of contact with others they are connected to in a setting so each person will split their contact amount equally among their connections. This means that if a person has links to 4 other individuals then 1/4 will be added to the matrix element matrix[age_i][age_j] for each link, where age_i is the age of the individual and age_j is the age of the contact. This follows the mass action principle such that increased density or number of people a person is in contact with leads to decreased per-link or connection contact rate.
When density_or_frequency is set to ‘density’ the amount of contact each person has with others scales with the number of people they are connected to. This means that a person with connections to 4 individuals has a higher total contact rate than a person with connections to 3 individuals. For this definition if a person has links to 4 other individuals then 1 will be added to the matrix element matrix[age_i][age_j] for each contact. This follows the ‘pseudo’ mass action principle such that the per-link or connection contact rate is constant.
- Parameters
population (dict) – A dictionary of a population with attributes.
density_or_frequency (str) – option for the type of contact matrix calculated.
layer (str) – name of the physical contact setting, see notes.
- Returns
Symmetric age specific contact matrix.
- Return type
np.ndarray
Note
H for households, S for schools, W for workplaces, C for community or other, and ‘LTCF’ for long term care facilities.
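The difference between the two weighting options can be illustrated with a small pure-Python sketch. It follows the description above: under ‘density’ every link adds 1 to matrix[age_i][age_j], while under ‘frequency’ every link adds 1 divided by the person's number of links. The function name `contact_matrix` and the simple dictionary inputs are assumptions for the example; the library operates on its own population dictionary.

```python
def contact_matrix(ages, contacts, max_age, mode="density"):
    """Accumulate an age-by-age contact matrix from a contact network.

    Hypothetical helper for illustration.
    ages: dict mapping person ID -> age
    contacts: dict mapping person ID -> set of contact IDs in one layer
    mode: 'density' adds 1 per link, 'frequency' adds 1/(number of links)
    """
    size = max_age + 1
    matrix = [[0.0] * size for _ in range(size)]
    for uid, links in contacts.items():
        if not links:
            continue
        weight = 1.0 if mode == "density" else 1.0 / len(links)
        for other in links:
            matrix[ages[uid]][ages[other]] += weight
    return matrix

# One household of three people who are all connected to each other.
ages = {0: 30, 1: 32, 2: 5}
contacts = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
density = contact_matrix(ages, contacts, max_age=40, mode="density")
frequency = contact_matrix(ages, contacts, max_age=40, mode="frequency")
```

Here the row for age 30 sums to 2 under ‘density’ (two links, each counted fully) and to 1 under ‘frequency’ (a fixed amount of contact split across two links).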
-
synthpops.plotting.
plot_contacts
(pop, **kwargs)¶ Plot the age mixing matrix for a specific contact layer.
- Parameters
pop (pop object) – population, either synthpops.pop.Pop or dict
**layer (str) – name of the physical contact layer: H for households, S for schools, W for workplaces, C for community or other
**aggregate_flag (bool) – If True, plot the contact matrix for aggregate age brackets, else single year age contact matrix.
**logcolors_flag (bool) – If True, plot heatmap in logscale
**density_or_frequency (str) – If ‘frequency’, each contact counts for 1/(group size - 1) of a person’s contact in a group; if ‘density’, each contact counts fully, which means that more people in a group leads to higher rates of contact/exposure.
**state_location (string) – name of the state the location is in
**country_location (string) – name of the country the location is in
**cmap (str or matplotlib cmap) – colormap
**fontsize (int) – base font size
**rotation (int) – rotation for x axis labels
**title_prefix (str) – optional title prefix for the figure
**fig (matplotlib.figure) – If supplied, use this figure instead of generating one
**ax (matplotlib.axes) – If supplied, use these axes instead of generating one
**do_show (bool) – If True, show the plot
**do_save (bool) – If True, save the plot to disk
- Returns
Matplotlib figure.
-
synthpops.plotting.
plot_ages
(pop, **kwargs)¶ Plot a comparison of the expected and generated age distribution.
- Parameters
pop (pop object) – population, either synthpops.pop.Pop, covasim.people.People, or dict
**left (float) – Matplotlib.figure.subplot.left
**right (float) – Matplotlib.figure.subplot.right
**top (float) – Matplotlib.figure.subplot.top
**bottom (float) – Matplotlib.figure.subplot.bottom
**color_1 (str) – color for expected data
**color_2 (str) – color for data from generated population
**fontsize (float) – Matplotlib.figure.fontsize
**figname (str) – name to save figure to disk
**comparison (bool) – If True, plot comparison to the generated population
- Returns
Matplotlib figure and axes.
Note
If using pop with type covasim.people.People or dict, args must be supplied for the location parameters to get the expected distribution.
Example:
pars = dict(n=10e3, location='seattle_metro', state_location='Washington', country_location='usa')
pop = sp.Pop(**pars)
fig, ax = pop.plot_ages()
popdict = pop.to_dict()
kwargs = pars.copy()
kwargs['datadir'] = sp.datadir
fig, ax = sp.plot_ages(popdict, **kwargs)
-
synthpops.plotting.
plot_school_sizes
(pop, **kwargs)¶ Plot a comparison of the expected and generated school size distribution for each type of school expected.
- Parameters
pop (pop object) – population, either synthpops.pop.Pop, or dict
**with_school_types (bool) – If True, plot school size distributions by type, else plot overall school size distributions
**keys_to_exclude (str or list) – school types to exclude
**left (float) – Matplotlib.figure.subplot.left
**right (float) – Matplotlib.figure.subplot.right
**top (float) – Matplotlib.figure.subplot.top
**bottom (float) – Matplotlib.figure.subplot.bottom
**hspace (float) – Matplotlib.figure.subplot.hspace
**subplot_height (float) – height of subplot in inches
**subplot_width (float) – width of subplot in inches
**screen_height_factor (float) – fraction of the screen height to use for display
**location_text_y (float) – height to add location text to figure
**fontsize (float) – Matplotlib.figure.fontsize
**rotation (float) – rotation angle for xticklabels
**cmap (str) – colormap
**figname (str) – name to save figure to disk
**comparison (bool) – If True, plot comparison to the generated population
- Returns
Matplotlib figure and axes.
Note
If using pop with type covasim.people.People or dict, args must be supplied for the location parameters to get the expected distribution.
Example:
pars = dict(n=10e3, location='seattle_metro', state_location='Washington', country_location='usa')
pop = sp.Pop(**pars)
fig, ax = pop.plot_school_sizes()
popdict = pop.to_dict()
kwargs = pars.copy()
kwargs['datadir'] = sp.datadir
fig, ax = sp.plot_school_sizes(popdict, **kwargs)
-
class
synthpops.plotting.
plotting_kwargs
(*args, **kwargs)¶ Bases:
sciris.sc_odict.objdict
A class to set and operate on plotting kwargs throughout synthpops.
- Parameters
kwargs (dict) – dictionary of plotting parameters to be used.
-
initialize
()¶ Initialize plot settings.
-
set_font
(*args, **font)¶ Set font styles.
-
default_plotting_kwargs
()¶ Define default plotting kwargs to be used in plotting methods.
-
set_figure_display_size
(*args, **kwargs)¶ Update plotting kwargs with new calculated display sizes.
- Parameters
kwargs (sc.objdict) – new values to update with
- Returns
Updated kwargs with recalculated display sizes.
-
set_default_pop_pars
()¶ Check if method has some key pop parameters to call on data. If not, use defaults and warn user of their use and value.
-
restore_defaults
()¶ Reset matplotlib defaults.
-
update_defaults
(method_defaults, kwargs)¶ Update defaults with method defaults and kwargs.
-
property
axis
¶ Dictionary of axis settings.
synthpops.pop module¶
This module provides the layer for communicating with the agent-based model Covasim.
-
class
synthpops.pop.
Pop
(n=None, max_contacts=None, ltcf_pars=None, school_pars=None, with_industry_code=False, with_facilities=False, use_default=False, use_two_group_reduction=True, average_LTCF_degree=20, ltcf_staff_age_min=20, ltcf_staff_age_max=60, with_school_types=False, school_mixing_type='random', average_class_size=20, inter_grade_mixing=0.1, average_student_teacher_ratio=20, average_teacher_teacher_degree=3, teacher_age_min=25, teacher_age_max=75, with_non_teaching_staff=False, average_student_all_staff_ratio=15, average_additional_staff_degree=20, staff_age_min=20, staff_age_max=75, rand_seed=None, country_location=None, state_location=None, location=None, sheet_name=None, household_method='infer_ages', smooth_ages=False, window_length=7, do_make=True)¶ Bases:
sciris.sc_utils.prettyobj
-
generate
(verbose=False)¶ Actually generate the network.
- Parameters
verbose (bool) – If True, print statements about the population and networks as they’re being generated.
- Returns
A dictionary of the full population with ages, connections, and other attributes.
- Return type
network (dict)
-
to_dict
()¶ Export to a dictionary – official way to get the popdict.
Example:
popdict = pop.to_dict()
-
to_json
(filename, indent=2, **kwargs)¶ Export to a JSON file.
Example:
pop.to_json('my-pop.json')
-
save
(filename, **kwargs)¶ Save population to a binary, gzipped object file.
Example:
pop.save('my-pop.pop')
-
static
load
(filename, *args, **kwargs)¶ Load from disk from a gzipped pickle.
- Parameters
filename (str) – the name or path of the file to load from
kwargs – passed to sc.loadobj()
Example:
pop = sp.Pop.load('my-pop.pop')
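For readers unfamiliar with the file format mentioned above, a gzipped pickle round trip can be sketched with the standard library. This only illustrates the general idea: the helper names `save_object` and `load_object` are hypothetical, the toy `popdict` is made up for the example, and the files written by Pop.save() via Sciris may differ in detail.

```python
import gzip
import os
import pickle
import tempfile

def save_object(filename, obj):
    """Write obj to filename as a gzip-compressed pickle (illustrative helper)."""
    with gzip.open(filename, "wb") as handle:
        pickle.dump(obj, handle)

def load_object(filename):
    """Read back an object written by save_object (illustrative helper)."""
    with gzip.open(filename, "rb") as handle:
        return pickle.load(handle)

# A toy two-person population dictionary, made up for this example.
popdict = {
    0: {"age": 34, "contacts": {"H": [1]}},
    1: {"age": 36, "contacts": {"H": [0]}},
}
path = os.path.join(tempfile.mkdtemp(), "my-pop.pop")
save_object(path, popdict)
restored = load_object(path)
```

As with any pickle, only load files from sources you trust.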
-
count_pop_ages
()¶ Create an age count of the generated population.
- Returns
Dictionary of the age count of the generated population.
- Return type
dict
-
get_enrollment_by_school_type
(*args, **kwargs)¶ Get enrollment sizes by school types in popdict.
- Returns
List of generated enrollment sizes by school type.
- Return type
list
-
plot_people
(*args, **kwargs)¶ Placeholder example of plotting the people in a population.
-
plot_contacts
(*args, **kwargs)¶ Plot matrices of the contacts for a given layer or layers.
-
plot_ages
(*args, **kwargs)¶ Plot a comparison of the expected and generated age distribution.
Example:
pars = dict(n=10e3, location='seattle_metro', state_location='Washington', country_location='usa')
pop = sp.Pop(**pars)
fig, ax = pop.plot_ages()
-
plot_enrollment_rates_by_age_comparison
(*args, **kwargs)¶ Plot a comparison of the expected and generated enrollment rates by age.
Example:
pars = dict(n=10e3, location='seattle_metro', state_location='Washington', country_location='usa')
pop = sp.Pop(**pars)
fig, ax = pop.plot_enrollment_rates_by_age_comparison()
-
plot_school_sizes
(*args, **kwargs)¶ Plot a comparison of the expected and generated school size distributions by school type.
Example:
pars = dict(n=10e3, location='seattle_metro', state_location='Washington', country_location='usa')
pop = sp.Pop(**pars)
fig, ax = pop.plot_school_sizes()
-
-
synthpops.pop.
make_population
(*args, **kwargs)¶ Interface to sp.Pop().to_dict(). Included for backwards compatibility.
-
synthpops.pop.
generate_synthetic_population
(*args, **kwargs)¶ For backwards compatibility only.
synthpops.process_census module¶
This module provides functions that process data tables from the US Census Bureau into simple distribution tables that SynthPops functions can talk to.
Also includes functions to process data tables from the National survey on Long Term Care Providers in the US to convert those into rates by age for each US state using SynthPops functions.
-
synthpops.process_census.
process_us_census_age_counts
(datadir, location, state_location, country_location, year, acs_period)¶ Process American Community Survey data for a given year to get an age count for the location binned into 18 age brackets.
- Parameters
datadir (str) – file path to the data directory
location (str) – name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
year (int) – the year for the American Community Survey
acs_period (int) – the number of years for the American Community Survey
- Returns
A dictionary with the binned age count and a dictionary with the age bracket ranges.
-
synthpops.process_census.
process_us_census_age_counts_by_gender
(datadir, location, state_location, country_location, year, acs_period)¶ Process American Community Survey data for a given year to get an age count by gender for the location binned into 18 age brackets.
- Parameters
datadir (str) – file path to the data directory
location (str) – name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
year (int) – the year for the American Community Survey
acs_period (int) – the number of years for the American Community Survey
- Returns
A dictionary with the binned age count by gender and a dictionary with the age bracket ranges.
-
synthpops.process_census.
process_us_census_population_size
(datadir, location, state_location, country_location, year, acs_period)¶ Process American Community Survey data for a given year to get the population size for the location.
- Parameters
datadir (str) – file path to the data directory
location (str) – name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
year (int) – the year for the American Community Survey
acs_period (int) – the number of years for the American Community Survey
- Returns
The population size of the location for a given year estimated from the American Community Survey.
- Return type
int
-
synthpops.process_census.
process_us_census_household_size_count
(datadir, location, state_location, country_location, year, acs_period)¶ Process American Community Survey data for a given year to get a household size count for the location. The last bin represents households of size 7 or higher.
- Parameters
datadir (str) – file path to the data directory
location (str) – name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
year (int) – the year for the American Community Survey
acs_period (int) – the number of years for the American Community Survey
- Returns
A dictionary with the household size count.
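As a rough illustration of the binning described above, the sketch below counts households by size and folds every household of size 7 or more into the last bin. It does not read American Community Survey files; the helper `count_household_sizes` and its input list are hypothetical names used only for this example, not part of SynthPops.

```python
from collections import Counter

def count_household_sizes(household_sizes, max_size=7):
    """Count households by size, folding sizes >= max_size into the last bin."""
    # Start every bin at zero so that empty bins still appear in the result.
    counts = {size: 0 for size in range(1, max_size + 1)}
    for size, n in Counter(household_sizes).items():
        counts[min(size, max_size)] += n
    return counts

# Hypothetical raw household sizes; 9 and 12 land in the "7 or higher" bin.
household_size_count = count_household_sizes([1, 2, 2, 3, 7, 9, 12])
```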
-
synthpops.process_census.
process_us_census_employment_rates
(datadir, location, state_location, country_location, year, acs_period)¶ Process American Community Survey data for a given year to get employment rates by age as a fraction.
- Parameters
datadir (str) – file path to the data directory
location (str) – name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
year (int) – the year for the American Community Survey
acs_period (int) – the number of years for the American Community Survey
- Returns
A dictionary with the employment rates by age as a fraction.
-
synthpops.process_census.
process_us_census_enrollment_rates
(datadir, location, state_location, country_location, year, acs_period)¶ Process American Community Survey data for a given year to get enrollment rates by age as a fraction.
- Parameters
datadir (str) – file path to the data directory
location (str) – name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
year (int) – the year for the American Community Survey
acs_period (int) – the number of years for the American Community Survey
- Returns
A dictionary with the enrollment rates by age as a fraction.
-
synthpops.process_census.
process_us_census_workplace_sizes
(datadir, location, state_location, country_location, year)¶ Process American Community Survey data for a given year to get a count of workplace sizes as the number of employees per establishment.
- Parameters
datadir (str) – file path to the data directory
location (str) – name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
year (int) – the year for the American Community Survey
- Returns
A dictionary with the workplace or establishment size distribution as a count.
-
synthpops.process_census.
process_long_term_care_facility_rates_by_age
(datadir, state_location, country_location)¶ Process the National Long Term Care Providers state data tables from 2016 to get the estimated user rates by age.
- Parameters
datadir (string) – file path to the data directory
state_location (string) – name of the state
country_location (string) – name of the country the state is in
- Returns
A dictionary with the estimated rates of Long Term Care Facility usage by age for the state in 2016.
- Return type
dict
-
synthpops.process_census.
process_usa_ltcf_resident_to_staff_ratios
(datadir, country_location, state_location, location_alias, location_list=[''], save=False)¶ Process the Kaiser Health News (KHN) dashboard data on resident to staff ratios by facility to estimate the ratios for all facilities in the area. Optionally write the result to file.
- Parameters
datadir (string) – file path to the data directory
country_location (string) – name of the country
state_location (string) – name of the state
location_alias (str) – more commonly known name of the location
location_list (list) – list of locations to include
save (bool) – If True, save to file.
- Returns
A dictionary with the probability of resident to staff ratios and the bins.
- Return type
dict
-
synthpops.process_census.
write_age_bracket_distr_18
(datadir, location_alias, state_location, country_location, age_bracket_count, age_brackets)¶ Write age bracket distribution binned to 18 age brackets.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
age_bracket_count (dict) – dictionary of the age count given by 18 brackets
age_brackets (dict) – dictionary of the age range for each bracket
- Returns
None.
-
synthpops.process_census.
write_age_bracket_distr_16
(datadir, location_alias, state_location, country_location, age_bracket_count, age_brackets)¶ Write age bracket distribution binned to 16 age brackets.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
age_bracket_count (dict) – dictionary of the age count given by 18 brackets
age_brackets (dict) – dictionary of the age range for each bracket
- Returns
None.
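The relationship between the 18-bracket input and the 16-bracket output is not spelled out here. One plausible reading, sketched below purely as an assumption, is that the three oldest brackets are merged into a single open-ended bracket. The helper `collapse_to_16` is hypothetical and is not the SynthPops implementation.

```python
def collapse_to_16(age_bracket_count):
    """Fold brackets 15, 16 and 17 into one open-ended last bracket (index 15).

    Assumes the 18 input brackets are keyed 0..17 in increasing age order.
    """
    collapsed = {b: age_bracket_count[b] for b in range(15)}
    collapsed[15] = sum(age_bracket_count[b] for b in range(15, 18))
    return collapsed

# Hypothetical counts: 100 people in each of the 18 brackets.
count_18 = {b: 100 for b in range(18)}
count_16 = collapse_to_16(count_18)
```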
-
synthpops.process_census.
write_gender_age_bracket_distr_18
(datadir, location_alias, state_location, country_location, age_bracket_count_by_gender, age_brackets)¶ Write age bracket by gender distribution data binned to 18 age brackets.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
age_bracket_count_by_gender (dict) – dictionary of the age count by gender given by 18 brackets
age_brackets (dict) – dictionary of the age range for each bracket
- Returns
None.
-
synthpops.process_census.
write_gender_age_bracket_distr_16
(datadir, location_alias, state_location, country_location, age_bracket_count_by_gender, age_brackets)¶ Write age bracket by gender distribution binned to 16 age brackets.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
age_bracket_count_by_gender (dict) – dictionary of the age count by gender given by 18 brackets
age_brackets (dict) – dictionary of the age range for each bracket
- Returns
None.
-
synthpops.process_census.
read_household_size_count
(datadir, location_alias, state_location, country_location)¶ Get household size count dictionary.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
- Returns
A dictionary of the household size count.
- Return type
dict
-
synthpops.process_census.
write_household_size_count
(datadir, location_alias, state_location, country_location, household_size_count)¶ Write household size count.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
household_size_count (dict) – dictionary of the household size count.
- Returns
None.
-
synthpops.process_census.
write_household_size_distr
(datadir, location_alias, state_location, country_location, household_size_count)¶ Write household size distribution.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
household_size_count (dict) – dictionary of the household size count.
- Returns
None.
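This function takes a count and writes a distribution, which suggests that the counts are normalized to fractions along the way. The sketch below shows that step only, with no file output; `normalize_counts` is a hypothetical helper and the exact behavior of the SynthPops function may differ.

```python
def normalize_counts(count_by_key):
    """Convert a dictionary of counts into fractions that sum to 1."""
    total = sum(count_by_key.values())
    if total == 0:
        # Nothing to normalize; return zeros rather than dividing by zero.
        return {key: 0.0 for key in count_by_key}
    return {key: value / total for key, value in count_by_key.items()}

# Hypothetical household size count for 100 households.
household_size_count = {1: 30, 2: 40, 3: 20, 4: 10}
household_size_distr = normalize_counts(household_size_count)
```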
-
synthpops.process_census.
write_employment_rates
(datadir, location_alias, state_location, country_location, employment_rates)¶ Write employment rates by age as a fraction.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
employment_rates (dict) – dictionary of the employment rates by age as a fraction.
- Returns
None.
-
synthpops.process_census.
write_enrollment_rates
(datadir, location_alias, state_location, country_location, enrollment_rates)¶ Write enrollment rates by age as a fraction.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
enrollment_rates (dict) – dictionary of the enrollment rates by age as a fraction.
- Returns
None.
-
synthpops.process_census.
write_long_term_care_facility_use_rates
(datadir, state_location, country_location, ltcf_rates_by_age)¶ Write Long Term Care Facility usage rates by age as a fraction for a state in the United States.
- Parameters
datadir (str) – file path to the data directory
state_location (str) – name of the state
country_location (str) – name of the country the state is in
ltcf_rates_by_age (dict) – dictionary of the long term care facility use rates by age as a fraction.
- Returns
None.
-
synthpops.process_census.
write_workplace_size_counts
(datadir, location_alias, state_location, country_location, size_label_mappings, establishment_size_counts)¶ Write workplace or establishment size count distribution.
- Parameters
datadir (str) – file path to the data directory
location_alias (str) – more commonly known name of the location
state_location (str) – name of the state the location is in
country_location (str) – name of the country the location is in
size_label_mappings (dict) – dictionary of the size labels mapping to the size bin
establishment_size_counts (dict) – dictionary of the count of workplaces by size label
- Returns
None.
synthpops.sampling module¶
Sample distributions, either from real world data or from uniform distributions.
-
synthpops.sampling.
set_seed
(seed=None)¶ Reset the random seed – complicated because of Numba.
-
synthpops.sampling.
fast_choice
(weights)¶ Choose an option – quickly – from the provided weights. Weights do not need to be normalized.
Reimplementation of random.choices(), removing everything inessential.
Example
fast_choice([0.1,0.2,0.3,0.2,0.1]) # might return 2
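To make the idea of choosing from unnormalized weights concrete, here is a minimal pure-Python sketch of cumulative-sum sampling. It is not the SynthPops implementation (which is built around Numba); `weighted_index` is a hypothetical name used only here.

```python
import random

def weighted_index(weights, rng=random):
    """Return an index drawn with probability proportional to its weight."""
    total = sum(weights)
    # Scale a uniform draw by the total, so the weights need not sum to 1.
    threshold = rng.random() * total
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off

index = weighted_index([0.1, 0.2, 0.3, 0.2, 0.1])  # an integer from 0 to 4
```

An entry with zero weight is never selected unless the round-off guard on the last line is reached, because the threshold must be strictly below the running total.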
-
synthpops.sampling.
sample_single_dict
(distr_keys, distr_vals)¶ Sample from a distribution.
- Parameters
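distr_keys (list or np.ndarray) – keys (possible values) of the distribution
distr_vals (list or np.ndarray) – weight of each key, in the same order as distr_keys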
- Returns
A single sampled value from a distribution.
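The signature takes the keys and the values of a distribution as two separate sequences. The sketch below shows how a dictionary might be split and sampled in that style using only the standard library; `sample_key` is a hypothetical stand-in, not the SynthPops function.

```python
import random

def sample_key(distr_keys, distr_vals, rng=random):
    """Return one key, chosen with probability proportional to its value."""
    return rng.choices(distr_keys, weights=distr_vals, k=1)[0]

# Hypothetical household size distribution.
household_size_distr = {1: 0.3, 2: 0.4, 3: 0.2, 4: 0.1}
keys = list(household_size_distr.keys())
vals = list(household_size_distr.values())
sampled_size = sample_key(keys, vals)
```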
-
synthpops.sampling.
sample_single_arr
(distr)¶ Sample from a distribution.
- Parameters
distr (dict or np.ndarray) – distribution
- Returns
A single sampled value from a distribution.
-
synthpops.sampling.
resample_age
(age_dist_vals, age)¶ Resample age from single year age distribution.
- Parameters
age_dist_vals (arr) – single year age distribution, ordered by age
age (int) – age as an integer
- Returns
Resampled age as an integer.
-
synthpops.sampling.
sample_from_range
(distr, min_val, max_val)¶ Sample from a distribution from min_val to max_val, inclusive.
- Parameters
distr (dict) – distribution with integer keys
min_val (int) – minimum of the range to sample from
max_val (int) – maximum of the range to sample from
- Returns
A sampled number from the range min_val to max_val in the distribution distr.
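A minimal sketch of range-restricted sampling, assuming the distribution is a dictionary with integer keys as described above: keep only the keys between min_val and max_val (inclusive) and sample among them in proportion to their weights. The name `sample_in_range` and the error raised for an empty range are choices made for this example.

```python
import random

def sample_in_range(distr, min_val, max_val, rng=random):
    """Sample an integer key of distr restricted to [min_val, max_val]."""
    keys = [k for k in range(min_val, max_val + 1) if distr.get(k, 0) > 0]
    if not keys:
        raise ValueError("no probability mass in the requested range")
    # random.choices renormalizes the weights, so the subset need not sum to 1.
    weights = [distr[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]

# Hypothetical uniform single-year age distribution for ages 0 to 100.
age_distr = {age: 1 / 101 for age in range(101)}
sampled_age = sample_in_range(age_distr, 5, 11)
```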
synthpops.schools module¶
This module generates school contacts by class and grade in flexible ways. Contacts can be clustered into classes and also mixed across the grade and across the school.
H. Guclu et al. (2016) show that mixing across grades is low in public elementary and middle schools. Mixing across grades is, however, higher in high schools.
Functions in this module are flexible to allow users to specify the inter-grade mixing (for ‘age_clustered’ school_mixing_type), and to choose whether contacts are clustered within a grade. Clustering contacts across different grades is not supported because there is no data to suggest that this happens commonly.
-
synthpops.schools.
get_school_type_labels
()¶
-
synthpops.schools.
get_enrollment_by_school_type
(popdict, **kwargs)¶ Get enrollment sizes by school types in popdict.
- Parameters
popdict (dict) – population dictionary
**with_school_types (bool) – If True, return enrollment by school types as defined in the popdict. Otherwise, combine all enrollment sizes for a school type of None.
**keys_to_exclude (list) – school types to exclude
- Returns
Dictionary of generated enrollment sizes by school type.
- Return type
dict
-
synthpops.schools.
get_generated_school_size_distributions
(enrollment_by_school_type, bins)¶ Get school size distributions by type.
- Parameters
enrollment_by_school_type (dict) – generated enrollment sizes by school types
bins (list) – school size bins
- Returns
Dictionary of generated school size distribution by school type.
- Return type
dict
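The sketch below illustrates one way to turn enrollment sizes into a binned distribution per school type. It assumes that `bins` is a list of increasing bin edges and that each bin is half-open on the right. The school type labels 'es' and 'hs' and the helper `size_distribution` are hypothetical.

```python
from bisect import bisect_right

def size_distribution(sizes, bin_edges):
    """Fraction of schools falling in each bin [bin_edges[i], bin_edges[i + 1])."""
    counts = [0] * (len(bin_edges) - 1)
    for size in sizes:
        index = bisect_right(bin_edges, size) - 1
        if 0 <= index < len(counts):  # ignore sizes outside the bins
            counts[index] += 1
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]

# Hypothetical generated enrollment sizes by school type.
enrollment_by_school_type = {'es': [120, 340, 410], 'hs': [900, 1250]}
bin_edges = [0, 300, 600, 1200, 2400]
distr_by_type = {
    school_type: size_distribution(sizes, bin_edges)
    for school_type, sizes in enrollment_by_school_type.items()
}
```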
-
synthpops.schools.
get_bin_edges
(size_brackets)¶ Get the bin edges for size brackets.
- Parameters
size_brackets (dict) – dictionary mapping bracket or bin number to an array of the range of sizes
- Returns
An array of the bin edges.
-
synthpops.schools.
get_bin_labels
(size_brackets)¶ Get the bin labels from the values contained within each bracket or bin.
- Parameters
size_brackets (dict) – dictionary mapping bracket or bin number to an array of the range of sizes
- Returns
A list of bin labels.
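To show how bin edges and labels might be derived from size brackets, the sketch below assumes each bracket is a contiguous, sorted list of integer sizes keyed by bin number. The helper names and the "low-high" label format are assumptions made for illustration.

```python
def edges_from_brackets(size_brackets):
    """First value of every bracket, plus one past the end of the last bracket."""
    ordered = [size_brackets[b] for b in sorted(size_brackets)]
    return [bracket[0] for bracket in ordered] + [ordered[-1][-1] + 1]

def labels_from_brackets(size_brackets):
    """Label each bracket by its smallest and largest contained value."""
    ordered = [size_brackets[b] for b in sorted(size_brackets)]
    return [f"{bracket[0]}-{bracket[-1]}" for bracket in ordered]

# Hypothetical school size brackets.
size_brackets = {
    0: list(range(1, 51)),
    1: list(range(51, 101)),
    2: list(range(101, 301)),
}
edges = edges_from_brackets(size_brackets)    # [1, 51, 101, 301]
labels = labels_from_brackets(size_brackets)  # ['1-50', '51-100', '101-300']
```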
synthpops.version module¶
synthpops.workplaces module¶
-
synthpops.workplaces.
get_uids_potential_workers
(syn_school_uids, employment_rates, age_by_uid_dic)¶ Get IDs for everyone who could be a worker by removing those who are students and those who can’t be employed officially.
- Parameters
syn_school_uids (list) – A list of lists where each sublist represents a school with the IDs of students in the school.
employment_rates (dict) – The employment rates by age.
age_by_uid_dic (dict) – A dictionary mapping ID to age for individuals in the population.
- Returns
A dictionary of potential workers mapping their ID to their age, a dictionary mapping age to the list of IDs for potential workers with that age, and a dictionary mapping age to the count of potential workers left to assign to a workplace for that age.
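The filtering described above can be sketched as follows. This example interprets "can't be employed officially" as "the age has no positive employment rate", which is an assumption; `potential_workers` is a hypothetical helper, not the SynthPops function.

```python
def potential_workers(syn_school_uids, employment_rates, age_by_uid):
    """Return (workers, uids_by_age, count_by_age) for everyone who could work."""
    students = {uid for school in syn_school_uids for uid in school}
    workers = {}
    for uid, age in age_by_uid.items():
        if uid in students:
            continue  # students are excluded from the workforce
        if employment_rates.get(age, 0) <= 0:
            continue  # assumed rule: no employment rate means not employable
        workers[uid] = age
    uids_by_age = {}
    for uid, age in workers.items():
        uids_by_age.setdefault(age, []).append(uid)
    count_by_age = {age: len(uids) for age, uids in uids_by_age.items()}
    return workers, uids_by_age, count_by_age

# Hypothetical population of five people; IDs 0 and 1 are students.
age_by_uid = {0: 8, 1: 16, 2: 35, 3: 35, 4: 70}
employment_rates = {16: 0.3, 35: 0.8, 70: 0.1}
workers, uids_by_age, count_by_age = potential_workers([[0, 1]], employment_rates, age_by_uid)
```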
-
synthpops.workplaces.
generate_workplace_sizes
(workplace_size_distr_by_bracket, workplace_size_brackets, workers_by_age_to_assign_count)¶ Given a number of individuals employed, generate a list of workplace sizes to place everyone in a workplace.
- Parameters
workplace_size_distr_by_bracket (dict) – The distribution of binned workplace sizes.
workplace_size_brackets (dict) – A dictionary of workplace size brackets.
workers_by_age_to_assign_count (dict) – A dictionary mapping age to the count of employed individuals of that age.
- Returns
A list of workplace sizes.
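One way to generate workplace sizes that accommodate every worker is to sample sizes one at a time until the total is reached. The sketch below does that under stated assumptions: brackets hold positive integer sizes, a size is drawn uniformly within the chosen bracket, and the final workplace is truncated to fit. The actual SynthPops procedure may differ, and `sample_workplace_sizes` is a hypothetical name.

```python
import random

def sample_workplace_sizes(size_distr_by_bracket, size_brackets, n_workers, rng=random):
    """Sample workplace sizes until their sum equals n_workers."""
    brackets = sorted(size_distr_by_bracket)
    weights = [size_distr_by_bracket[b] for b in brackets]
    sizes = []
    remaining = n_workers
    while remaining > 0:
        bracket = rng.choices(brackets, weights=weights, k=1)[0]
        size = rng.choice(size_brackets[bracket])  # uniform within the bracket
        size = min(size, remaining)                # truncate the last workplace
        sizes.append(size)
        remaining -= size
    return sizes

# Hypothetical brackets, distribution, and count of workers by age.
size_brackets = {0: list(range(1, 5)), 1: list(range(5, 20)), 2: list(range(20, 100))}
size_distr_by_bracket = {0: 0.6, 1: 0.3, 2: 0.1}
workers_by_age_to_assign_count = {25: 40, 40: 55, 60: 12}
n_workers = sum(workers_by_age_to_assign_count.values())
workplace_sizes = sample_workplace_sizes(size_distr_by_bracket, size_brackets, n_workers)
```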
-
synthpops.workplaces.
get_workers_by_age_to_assign
(employment_rates, potential_worker_ages_left_count, uids_by_age_dic)¶ Get the number of people to assign to a workplace by age using those left who can potentially go to work and employment rates by age.
- Parameters
employment_rates (dict) – A dictionary of employment rates by age.
potential_worker_ages_left_count (dict) – A dictionary of the count of workers to assign by age.
uids_by_age_dic (dict) – A dictionary mapping age to the list of ids with that age.
- Returns
A dictionary with a count of workers to assign to a workplace.
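As an illustration of combining employment rates with the people still available, the sketch below takes the expected number of employed people at each age (rate times the number of people of that age, rounded up) and caps it at the number of potential workers left. Both the rounding rule and the cap are assumptions made for this example; `workers_to_assign` is a hypothetical helper.

```python
import math

def workers_to_assign(employment_rates, potential_left_count, uids_by_age):
    """Count of workers to place by age, capped by those still available."""
    result = {}
    for age, uids in uids_by_age.items():
        expected = math.ceil(employment_rates.get(age, 0) * len(uids))
        result[age] = min(expected, potential_left_count.get(age, 0))
    return result

# Hypothetical inputs: four people at each of two ages.
employment_rates = {20: 0.5, 40: 0.75}
uids_by_age = {20: [1, 2, 3, 4], 40: [5, 6, 7, 8]}
potential_left_count = {20: 3, 40: 2}
to_assign = workers_to_assign(employment_rates, potential_left_count, uids_by_age)
```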
-
synthpops.workplaces.
assign_rest_of_workers
(workplace_sizes, potential_worker_uids, potential_worker_uids_by_age, workers_by_age_to_assign_count, age_by_uid_dic, age_brackets, age_by_brackets_dic, contact_matrix_dic, verbose=False)¶ Assign the rest of the workers to non-school workplaces.
- Parameters
workplace_sizes (list) – list of workplace sizes
potential_worker_uids (dict) – dictionary of potential workers mapping their id to their age
potential_worker_uids_by_age (dict) – dictionary mapping age to the list of worker ids with that age
workers_by_age_to_assign_count (dict) – dictionary of the count of workers left to assign by age
age_by_uid_dic (dict) – dictionary mapping id to age for all individuals in the population
age_brackets (dict) – dictionary mapping age bracket keys to age bracket range
age_by_brackets_dic (dict) – dictionary mapping age to the age bracket range it falls in
contact_matrix_dic (dict) – dictionary of age specific contact matrix for different physical contact settings
verbose (bool) – If True, print statements about the generated workplaces as workers are being added to each workplace.
- Returns
List of lists where each sublist is a workplace with the ages of workers, list of lists where each sublist is a workplace with the ids of workers, dictionary of potential workers left mapping id to age, dictionary mapping age to a list of potential workers left of that age, dictionary mapping age to the count of workers left to assign.
Glossary¶
- contact layers
Each of the layers of the population network that is a representation of all of the pairwise connections between people in a given location, such as school, work, or households.
- node
In network theory, the discrete object being represented. In SynthPops, nodes represent people and can have attributes like age assigned.
- edge
In network theory, the interactions between discrete objects. In SynthPops, edges represent interactions between people, with attributes like the setting in which the interactions take place (for example, household, school, or work). The relationship between the interaction setting and properties governing disease transmission, such as frequency of contact and risk associated with each contact, is mapped separately by Covasim or another agent-based model. SynthPops reports only whether the edge exists.
- agent-based model
A type of simulation that models the actions and interactions of autonomous agents (both individual and collective entities such as organizations or groups).
- time step
A discrete number of hours or days in which the “simulation states” of all “simulation objects” (interventions, infections, immune systems, or individuals) are updated in a simulation. Each time step will complete processing before launching the next one. For example, a time step would process the migration data for populations moving between nodes via rail, airline, and road. The migration of individuals between nodes is the last step of the time step after updating states.
- household contact layer
The layer in the population network that represents all of the pairwise connections between people in households. All people must be part of the household contact layer, though some households may consist of a single person.
- school contact layer
The layer in the population network that represents all of the pairwise connections between people in schools. This includes both students and teachers. The school and workplace contact layers are mutually exclusive: someone cannot be both a student and a worker.
- workplace contact layer
The layer in the population network that represents all of the pairwise connections between people in workplaces, excluding teachers in schools. The school and workplace contact layers are mutually exclusive: someone cannot be both a student and a worker.
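The glossary terms above can be tied together with a small data structure sketch: people are nodes (integer IDs), each contact layer is a set of undirected edges, and the school and workplace layers share no members. For simplicity this example assumes everyone in a group is connected to everyone else, which is not necessarily how SynthPops builds contacts. All names here are hypothetical.

```python
from itertools import combinations

def layer_edges(groups):
    """Return the set of undirected edges implied by fully connected groups."""
    edges = set()
    for members in groups:
        edges.update(combinations(sorted(members), 2))
    return edges

# Hypothetical population of six people (IDs 0 to 5).
households = [[0, 1, 2], [3], [4, 5]]  # person 3 lives alone: no household edges
schools = [[1, 2]]                     # students only, for simplicity
workplaces = [[0, 4, 5]]

layers = {
    'household': layer_edges(households),
    'school': layer_edges(schools),
    'workplace': layer_edges(workplaces),
}

# Students and workers must not overlap.
students = {uid for school in schools for uid in school}
workers = {uid for workplace in workplaces for uid in workplace}
layers_are_exclusive = students.isdisjoint(workers)
```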