synthpops.data_distributions module

Read in data distributions.

get_relative_path(datadir)

Get the path relative to the datadir.

Parameters: datadir (str) – path to a specified data directory
Returns: A path relative to a specified data directory datadir
Return type: str

get_nbrackets()

Return the default number of age brackets.

calculate_which_nbrackets_to_use(location_data, nbrackets=None)

Calculate the number of age brackets to use by default.

Parameters: nbrackets (int) – the number of age brackets to use
Returns: The number of age brackets to use.
Return type: int

sanitize_location(location)

Process and return a valid name for a location.

Parameters: location (str) – name of the location
Returns: A processed location name.
Return type: str

calculate_location_filename(location, state_location, country_location)

Process a location filename.

Parameters:
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
Returns:

A filename for where the location data reside.

Return type:

str

calculate_location_filepath(location, state_location, country_location)

Process a location filepath.

Parameters:
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
Returns:

A file path to where the location data reside.

Return type:

str

load_location(specific_location, state_location, country_location, revert_to_default=None)

Load the JSON object for the location data.

Parameters:
  • specific_location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • revert_to_default (bool) – if True, first try to find location-specific data to return; otherwise use default data specified by the default location
Returns:

A filename for where the location data reside.

Return type:

str

read_age_bracket_distr(datadir=None, location=None, state_location=None, country_location=None, nbrackets=None, file_path=None, use_default=False)

A dict of the age distribution by age brackets. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified age bracket distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from the settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the age distribution by age bracket. Keys map to a range of ages in that age bracket.

Return type:

dict
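
The use_default fallback described above can be sketched without touching any data files. This is a conceptual illustration only, not the library's implementation: load_distr, AVAILABLE_DATA, DEFAULT_KEY, and all values are hypothetical, and raising an error when nothing is found and use_default is False is an assumption.

```python
# In-memory stand-in for data files, keyed by (location, state, country).
# All names and numbers here are invented for the example.
AVAILABLE_DATA = {
    ("default_city", "default_state", "default_country"): {0: 0.2, 1: 0.5, 2: 0.3},
}
DEFAULT_KEY = ("default_city", "default_state", "default_country")


def load_distr(location, state_location, country_location, use_default=False):
    """Return location-specific data, optionally falling back to defaults."""
    key = (location, state_location, country_location)
    if key in AVAILABLE_DATA:
        # Location-specific data always wins when it exists.
        return AVAILABLE_DATA[key]
    if use_default:
        # Fall back to the default location's data.
        return AVAILABLE_DATA[DEFAULT_KEY]
    # Assumption: with no data and no fallback, fail loudly.
    raise FileNotFoundError(f"No data found for {key}")


distr = load_distr("some_city", "some_state", "some_country", use_default=True)
```

Because the fallback silently substitutes another population's data, it may be worth checking which branch was taken before relying on the result.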

get_smoothed_single_year_age_distr(datadir=None, location=None, state_location=None, country_location=None, nbrackets=None, file_path=None, use_default=False, window_length=7)

A smoothed dict of the age distribution by single years, where a moving window of window_length years is used to smooth out the distribution. If use_default, then we’ll first try to look for location-specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified age bracket distribution data
  • use_default (bool) – If True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from the settings.location, settings.state_location, settings.country_location.
  • window_length (int) – length of window, in units of years, over which to average or smooth out age distribution
Returns:

A dictionary of the smoothed age distribution by single year of age.

Return type:

dict
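
As a rough illustration of moving-window smoothing, the sketch below averages a single-year age distribution over a centered window and renormalizes. It does not call synthpops: smooth_single_year_distr is a hypothetical helper, the toy values are invented, and truncating the window at the youngest and oldest ages is an assumption that may differ from what the library does.

```python
def smooth_single_year_distr(age_distr, window_length=7):
    """Smooth a {age: probability} dict with a centered moving average.

    Hypothetical helper for illustration only; the window is truncated at
    the ends of the age range, which is an assumption.
    """
    ages = sorted(age_distr)
    half = window_length // 2
    smoothed = {}
    for i, age in enumerate(ages):
        # Ages inside the window centered on the current age.
        window = ages[max(0, i - half): i + half + 1]
        smoothed[age] = sum(age_distr[a] for a in window) / len(window)
    total = sum(smoothed.values())
    # Renormalize so the smoothed values still sum to 1.
    return {age: value / total for age, value in smoothed.items()}


# A toy distribution over ages 0-9 with a spike at age 5.
raw = {age: 0.05 for age in range(10)}
raw[5] = 0.55
smoothed = smooth_single_year_distr(raw, window_length=3)
```

A longer window spreads the spike over more neighboring ages, at the cost of blurring real features of the distribution.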

get_household_size_distr(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

A dictionary of the distribution of household sizes. If you don’t give the file_path, then supply the location, state_location, and country_location strings. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified household size distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the household size distribution data. Keys map to the household size as an integer, values are the percent of households of that size.

Return type:

dict
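
A dictionary of this shape (household size mapped to its share of households) can be summarized or sampled from directly. The sketch below is a minimal example under stated assumptions: the values are invented, mean_household_size and sample_household_sizes are hypothetical helpers, and the weights are normalized so it works whether the values are fractions or percentages.

```python
import random

# Toy household size distribution (size -> share of households).
# These numbers are invented for the example, not taken from any dataset.
household_size_distr = {1: 0.28, 2: 0.34, 3: 0.16, 4: 0.13, 5: 0.06, 6: 0.02, 7: 0.01}


def mean_household_size(distr):
    """Expected household size under the distribution."""
    total = sum(distr.values())
    return sum(size * share for size, share in distr.items()) / total


def sample_household_sizes(distr, n, seed=0):
    """Draw n household sizes with probability proportional to the values."""
    rng = random.Random(seed)
    sizes = list(distr)
    weights = [distr[size] for size in sizes]
    return rng.choices(sizes, weights=weights, k=n)


average_size = mean_household_size(household_size_distr)
sizes = sample_household_sizes(household_size_distr, n=1000)
```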

get_head_age_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get a dictionary of head age brackets either from the file_path directly, or using the other parameters to figure out what the file_path should be. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state
  • country_location (string) – name of the country the state_location is in
  • file_path (string) – file path to user specified head age brackets data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from the settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the age brackets for head of household distribution data. Keys map to the age bracket as an integer, values are the percent of households with a head of household in that age bracket.

Return type:

dict

get_head_age_by_size_distr(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Create an array of head of household age bracket counts (column) given by size (row). If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from the settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state
  • country_location (string) – name of the country the state_location is in
  • file_path (string) – file path to user specified age of the head of the household by household size distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

An array where each row s represents the age distribution of the head of households for households of size s-1.

Return type:

ndarray

get_census_age_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False, nbrackets=None)

Get census age brackets: depends on the country or source of the age distribution and the contact pattern data. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state
  • country_location (string) – name of the country the state_location is in
  • file_path (string) – file path to user specified census age brackets
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the range of ages that map to each age bracket.

Return type:

dict
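
A bracket dictionary of this kind is often inverted into an age-to-bracket lookup. The following is a small sketch, assuming each value is a sequence of the integer ages in that bracket; the cut points are invented and invert_brackets is a hypothetical helper, not part of the module.

```python
# Toy age brackets (bracket index -> ages in that bracket); the cut points
# are invented for the example.
age_brackets = {
    0: list(range(0, 5)),
    1: list(range(5, 18)),
    2: list(range(18, 65)),
    3: list(range(65, 101)),
}


def invert_brackets(brackets):
    """Build an age -> bracket index lookup from a bracket dict."""
    return {age: index for index, ages in brackets.items() for age in ages}


age_to_bracket = invert_brackets(age_brackets)
bracket_of_30_year_old = age_to_bracket[30]
```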

get_contact_matrix(datadir, setting_code, sheet_name=None, file_path=None, delimiter=' ', header=None)

Get the setting-specific age contact matrix for the given sheet name. If file_path is given, then delimiter and header should also be specified.

Parameters:
  • datadir (string) – file path to the data directory
  • setting_code (string) – name of the physical contact setting: H for households, S for schools, W for workplaces, C for community or other
  • sheet_name (string) – name of the sheet in the Excel file with contact patterns
  • file_path (string) – file path to user specified age contact matrix
  • delimiter (string) – delimiter for the contact matrix file
  • header (int) – row number for the header of the file
Returns:

Matrix of contact patterns where each row i is the average contact patterns for an individual in age bracket i and the columns represent the age brackets of their contacts. The matrix element i,j is then the contact rate, number, or frequency for the average individual in age bracket i with all of their contacts in age bracket j in that physical contact setting.

Return type:

ndarray
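
To make the row and column convention concrete, the sketch below parses a small space-delimited matrix and reads element i,j. It is an illustration only: read_matrix is a hypothetical stand-in rather than the module's reader, it returns nested lists instead of an ndarray, and the matrix values are invented.

```python
import io


def read_matrix(text_stream, delimiter=" "):
    """Parse a delimited text matrix into a list of rows of floats."""
    return [
        [float(value) for value in line.split(delimiter)]
        for line in text_stream.read().splitlines()
        if line.strip()
    ]


# Toy 3x3 contact matrix for three age brackets (values invented).
raw_text = "4.0 1.0 0.5\n1.2 6.0 1.5\n0.4 1.1 2.0\n"
matrix = read_matrix(io.StringIO(raw_text), delimiter=" ")

# Contacts of an average individual in bracket i=1 with people in bracket j=2.
contacts_1_2 = matrix[1][2]

# Total contacts for each bracket: sum across the columns of its row.
row_totals = [sum(row) for row in matrix]
```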

get_contact_matrices(datadir=None, sheet_name=None, file_path_dic=None, delimiter=' ', header=None, use_default=False)

Create a dict of setting specific age contact matrices. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.sheet_name. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • setting_code (string) – name of the physical contact setting: H for households, S for schools, W for workplaces, C for community or other
  • sheet_name (string) – name of the sheet in the Excel file with contact patterns
  • file_path_dic (dict) – dictionary of file paths to user specified age contact matrices, where keys are “H”, “S”, “W”, and “C”.
  • delimiter (string) – delimiter for the contact matrix file
  • header (int) – row number for the header of the file
Returns:

A dictionary of the different contact matrices for each population, given by the sheet name. Keys map to the different possible physical contact settings for which data are available.

Return type:

dict

get_school_enrollment_rates(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get dictionary of enrollment rates by age. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified school enrollment by age data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of school enrollment rates by age.

Return type:

dict
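
One way such a rates-by-age dictionary might be used is to estimate how many people of each age are enrolled. This is a sketch under stated assumptions: the rates and population counts are invented, expected_enrollment is a hypothetical helper, and treating ages with no listed rate as zero enrollment is a choice made for the example.

```python
# Toy enrollment rates by age and population counts by age (values invented).
enrollment_rates = {5: 0.9, 6: 0.95, 17: 0.8, 18: 0.4}
age_counts = {5: 100, 6: 120, 17: 90, 18: 110, 40: 300}


def expected_enrollment(rates, counts):
    """Expected number of students per age; ages without a rate count as 0."""
    return {age: counts[age] * rates.get(age, 0.0) for age in counts}


expected = expected_enrollment(enrollment_rates, age_counts)
total_students = sum(expected.values())
```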

get_school_size_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get school size brackets: depends on the source/location of the data. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified school size brackets data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of school size brackets.

Return type:

dict

get_school_size_distr_by_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get distribution of school sizes by size bracket or bin. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified school size distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the distribution of school sizes by bracket.

Return type:

dict

get_default_school_type_age_ranges()

Define and return default school types and the age range for each.

Returns: A dictionary of default school types and the age range for each.
Return type: dict

get_default_school_types_distr_by_age()

Define and return default probabilities of school type for each age.

Returns: A dictionary of default probabilities for the school type likely for each age.
Return type: dict

get_default_school_types_by_age_single()

Define and return default school type by age by assigning the school type with the highest probability.

Returns: A dictionary of default school type by age.
Return type: dict
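
Assigning the school type with the highest probability amounts to an argmax over the per-age probabilities. The sketch below shows that step on toy data; the school type labels ("pk", "es", "ms", "hs"), the probabilities, and most_likely_school_type are all hypothetical and are not the module's defaults.

```python
# Toy probabilities of school type for each age (labels and values invented).
school_types_distr_by_age = {
    4: {"pk": 0.9, "es": 0.1},
    5: {"pk": 0.3, "es": 0.7},
    11: {"es": 0.2, "ms": 0.8},
    14: {"ms": 0.1, "hs": 0.9},
}


def most_likely_school_type(distr_by_age):
    """Pick, for each age, the school type with the highest probability."""
    return {
        age: max(type_probs, key=type_probs.get)
        for age, type_probs in distr_by_age.items()
    }


school_type_by_age = most_likely_school_type(school_types_distr_by_age)
```
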
get_default_school_size_distr_brackets()

Define and return default school size distribution brackets.

Returns: A dictionary of school size brackets.
Return type: dict

get_default_school_size_distr_by_type()

Define and return default school size distribution for each school type. The school size distributions are binned to size groups or brackets.

Returns: A dictionary of school size distributions binned by size groups or brackets for each type of default school.
Return type: dict

get_school_type_age_ranges(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get a dictionary of the school types and the age range for each for the location specified.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from Seattle, Washington.
Returns:

A dictionary of default school types and the age range for each.

Return type:

dict

get_school_size_distr_by_type(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get the school size distribution by school types. If use_default, then we’ll try to look for location specific data first, and if that’s not available we’ll use default data from the set default locations (see sp.defaults.py). This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location
Returns:

A dictionary of school size distributions binned by size groups or brackets for each type of default school.

Return type:

dict

get_employment_rates(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get employment rates by age. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in, which should be ‘usa’
  • file_path (string) – file path to user specified employment by age data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of employment rates by age.

Return type:

dict

get_workplace_size_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get workplace size brackets. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in, which should be ‘usa’
  • file_path (string) – file path to user specified workplace size brackets data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of workplace size brackets.

Return type:

dict

get_workplace_size_distr_by_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get the distribution of workplace size by brackets. If use_default, then we’ll first try to look for location specific data and if that’s not available we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study so it’s best to provide as much data as you can for the specific population.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified workplace size distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the distribution of workplace sizes by bracket.

Return type:

dict
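
A bracketed size distribution can be combined with the matching size brackets to draw concrete workplace sizes. The sketch below is one possible approach, not the library's method: it picks a bracket by its weight and then a size uniformly within that bracket, which is an assumption. The brackets, weights, and sample_workplace_size are invented for the example.

```python
import random

# Toy workplace size brackets (index -> sizes in bracket) and the share of
# workplaces in each bracket. All values are invented for the example.
size_brackets = {0: list(range(1, 5)), 1: list(range(5, 20)), 2: list(range(20, 100))}
size_distr = {0: 0.6, 1: 0.3, 2: 0.1}


def sample_workplace_size(brackets, distr, rng):
    """Pick a bracket by its weight, then a size uniformly within it."""
    indices = list(distr)
    weights = [distr[index] for index in indices]
    bracket = rng.choices(indices, weights=weights, k=1)[0]
    return rng.choice(brackets[bracket])


rng = random.Random(42)
sizes = [sample_workplace_size(size_brackets, size_distr, rng) for _ in range(500)]
```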

get_state_postal_code(state_location, country_location)

Get the state postal code.

Parameters:
  • state_location (string) – name of the state
  • country_location (string) – name of the country the state is in
Returns:

A postal code for the state_location.

Return type:

str

get_long_term_care_facility_residents_distr(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get size distribution of residents per facility for Long Term Care Facilities.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified LTCF resident size distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the distribution of residents per facility for Long Term Care Facilities.

Return type:

dict

get_long_term_care_facility_residents_distr_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get size bins for the distribution of residents per facility for Long Term Care Facilities.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in, which should be ‘usa’
  • file_path (string) – file path to user specified LTCF resident size brackets data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of size brackets or bins for residents per facility.

Return type:

dict

get_long_term_care_facility_resident_to_staff_ratios_distr(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get size distribution of resident to staff ratios per facility for Long Term Care Facilities.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in
  • file_path (string) – file path to user specified resident to staff ratio distribution data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the distribution of resident to staff ratios per facility for Long Term Care Facilities.

Return type:

dict

get_long_term_care_facility_resident_to_staff_ratios_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get size bins for the distribution of resident to staff ratios per facility for Long Term Care Facilities.

Parameters:
  • datadir (string) – file path to the data directory
  • location (string) – name of the location
  • state_location (string) – name of the state the location is in
  • country_location (string) – name of the country the location is in, which should be ‘usa’
  • file_path (string) – file path to user specified resident to staff ratio brackets data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of size brackets or bins for resident to staff ratios per facility.

Return type:

dict

get_long_term_care_facility_use_rates(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)

Get Long Term Care Facility use rates by age for a state.

Parameters:
  • datadir (str) – file path to the data directory
  • location (str) – more commonly known name of the location
  • state_location (str) – name of the state the location is in
  • country_location (str) – name of the country the location is in
  • file_path (string) – file path to user specified Long Term Care Facility use rates by age data
  • use_default (bool) – if True, try to first use the other parameters to find data specific to the location under study, otherwise returns default data drawing from settings.location, settings.state_location, settings.country_location.
Returns:

A dictionary of the Long Term Care Facility usage rates by age.

Return type:

dict

Note

Currently only available for the United States.