synthpops.data_distributions module
Read in data distributions.
-
get_relative_path(datadir)
Get the path relative to the datadir.
Parameters: datadir (str) – path to a specified data directory
Returns: A path relative to the specified data directory datadir.
Return type: str
-
calculate_which_nbrackets_to_use(location_data, nbrackets=None)
Calculate the number of age brackets to use by default.
Parameters: nbrackets (int) – the number of age brackets to use
Returns: The number of age brackets to use.
Return type: int
-
sanitize_location(location)
Process and return a valid name for a location.
Parameters: location (str) – name of the location
Returns: A processed location name.
Return type: str
-
calculate_location_filename(location, state_location, country_location)
Calculate the filename for a location's data.
Parameters: - location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
Returns: A filename for where the location data reside.
Return type: str
-
calculate_location_filepath(location, state_location, country_location)
Calculate the file path for a location's data.
Parameters: - location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
Returns: A file path to where the location data reside.
Return type: str
-
load_location(specific_location, state_location, country_location, revert_to_default=None)
Load the JSON object for the location data.
Parameters: - specific_location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- revert_to_default (bool) – if True, first try to find location-specific data to return; if none are available, use the default data specified by the default location
Returns: A filename for where the location data reside.
Return type: str
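The revert_to_default and use_default options described in this module follow a "location-specific first, default second" lookup order. A minimal sketch of that order, assuming the candidate file paths have already been built; choose_data_path is a hypothetical helper, not a synthpops function:

```python
import os


def choose_data_path(location_paths, default_path):
    """Return the first existing location-specific path, else default_path.

    Hypothetical helper, not part of synthpops: it only illustrates the
    "location-specific first, default second" lookup order.
    """
    for path in location_paths:
        if os.path.exists(path):
            return path
    # No location-specific file was found, so fall back to the default data
    return default_path


# Usage with made-up paths: the candidate is missing, so the default is chosen
print(choose_data_path(["/no/such/dir/seattle.json"], "defaults/usa.json"))
```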
-
read_age_bracket_distr(datadir=None, location=None, state_location=None, country_location=None, nbrackets=None, file_path=None, use_default=False)
A dict of the age distribution by age brackets. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- nbrackets (int) – the number of age brackets to use
- file_path (string) – file path to user specified age bracket distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of the age distribution by age bracket. Keys map to a range of ages in that age bracket.
Return type: dict
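One way to use an age-bracket distribution is to draw individual ages: pick a bracket by its probability, then an age within it. A minimal sketch, assuming (hypothetically) that the distribution maps a bracket index to a probability and that a companion dictionary, such as the one returned by get_census_age_brackets, maps the same index to the list of ages in that bracket; sample_age is illustrative only:

```python
import random


def sample_age(bracket_distr, age_brackets, rng=random):
    """Draw one age from an age-bracket distribution.

    Assumed (hypothetical) layouts:
      bracket_distr: {bracket_index: probability of that bracket}
      age_brackets:  {bracket_index: list of ages in that bracket}
    """
    brackets = list(bracket_distr)
    weights = [bracket_distr[b] for b in brackets]
    # Pick a bracket in proportion to its probability...
    b = rng.choices(brackets, weights=weights, k=1)[0]
    # ...then an age uniformly within it (a simplifying assumption)
    return rng.choice(age_brackets[b])


# Made-up two-bracket distribution, for illustration only
distr = {0: 0.25, 1: 0.75}
brackets = {0: list(range(0, 20)), 1: list(range(20, 100))}
rng = random.Random(42)
ages = [sample_age(distr, brackets, rng) for _ in range(1000)]
```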
-
get_smoothed_single_year_age_distr(datadir=None, location=None, state_location=None, country_location=None, nbrackets=None, file_path=None, use_default=False, window_length=7)
A smoothed dict of the age distribution by single years. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population. A moving window is used to smooth out the age distribution.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- nbrackets (int) – the number of age brackets to use
- file_path (string) – file path to user specified age bracket distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
- window_length (int) – length of the window, in years, over which to average or smooth out the age distribution
Returns: A dictionary of the smoothed age distribution by single year of age. Keys map to single years of age.
Return type: dict
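The moving-window smoothing described above can be illustrated as follows. This is a sketch of one plausible approach (spread each bracket's probability evenly over its ages, then apply a centered moving average of window_length years), not necessarily what synthpops does internally; smooth_single_year and the dictionary layouts are assumptions:

```python
import numpy as np


def smooth_single_year(bracket_distr, age_brackets, window_length=7):
    """Turn a bracket distribution into a smoothed single-year distribution.

    Hypothetical layouts: bracket_distr maps bracket index -> probability and
    age_brackets maps bracket index -> list of ages. Assumes the number of
    single years is at least window_length (np.convolve with mode="same"
    otherwise returns an array of length window_length).
    """
    max_age = max(max(ages) for ages in age_brackets.values())
    single = np.zeros(max_age + 1)
    for b, ages in age_brackets.items():
        # Split each bracket's probability evenly across its single years
        single[list(ages)] = bracket_distr[b] / len(ages)
    # Centered moving average over window_length years
    kernel = np.ones(window_length) / window_length
    smoothed = np.convolve(single, kernel, mode="same")
    smoothed /= smoothed.sum()  # renormalize: zero padding at the edges loses mass
    return {age: float(p) for age, p in enumerate(smoothed)}
```

Renormalizing at the end keeps the result a probability distribution even though the zero-padded edges of the convolution drop a little mass.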
-
get_household_size_distr(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
A dictionary of the distribution of household sizes. If you don’t give the file_path, then supply the location, state_location, and country_location strings. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified household size distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of the household size distribution data. Keys map to the household size as an integer, values are the percent of households of that size.
Return type: dict
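Since the returned dictionary maps household size to the percent of households of that size, it can serve directly as sampling weights. A minimal sketch using NumPy; sample_household_sizes and the numbers are made up for illustration:

```python
import numpy as np


def sample_household_sizes(size_distr, n, seed=None):
    """Draw n household sizes from a {size: percent} dictionary."""
    keys = sorted(size_distr)
    sizes = np.array(keys)
    weights = np.array([size_distr[s] for s in keys], dtype=float)
    # Normalizing makes this work whether values are percents or fractions
    probs = weights / weights.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(sizes, size=n, p=probs)


# Made-up distribution, for illustration only
draws = sample_household_sizes({1: 28.0, 2: 34.0, 3: 16.0, 4: 14.0, 5: 8.0}, 10_000, seed=0)
```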
-
get_head_age_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get a dictionary of head age brackets, either from the file_path directly or by using the other parameters to figure out what the file_path should be. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state
- country_location (string) – name of the country the state_location is in
- file_path (string) – file path to user specified head age brackets data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of the age brackets for head of household distribution data. Keys map to the age bracket as an integer, values are the percent of households whose head of household is in that age bracket.
Return type: dict
-
get_head_age_by_size_distr(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Create an array of head of household age bracket counts (columns) by household size (rows). If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state
- country_location (string) – name of the country the state_location is in
- file_path (string) – file path to user specified age of the head of the household by household size distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: An array where row s-1 represents the age distribution of the heads of households for households of size s.
Return type: ndarray
-
get_census_age_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False, nbrackets=None)
Get census age brackets, which depend on the country or source of the age distribution and the contact pattern data. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state
- country_location (string) – name of the country the state_location is in
- file_path (string) – file path to user specified census age brackets
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
- nbrackets (int) – the number of age brackets to use
Returns: A dictionary of the range of ages that map to each age bracket.
Return type: dict
-
get_contact_matrix(datadir, setting_code, sheet_name=None, file_path=None, delimiter=' ', header=None)
Get the setting-specific age contact matrix for the given sheet name. If file_path is given, then delimiter and header should also be specified.
Parameters: - datadir (string) – file path to the data directory
- setting_code (string) – name of the physical contact setting: H for households, S for schools, W for workplaces, C for community or other
- sheet_name (string) – name of the sheet in the Excel file with contact patterns
- file_path (string) – file path to user specified age contact matrix
- delimiter (string) – delimiter for the contact matrix file
- header (int) – row number for the header of the file
Returns: A matrix of contact patterns where each row i gives the average contact pattern for an individual in age bracket i and the columns represent the age brackets of their contacts. The matrix element i,j is the contact rate, number, or frequency for the average individual in age bracket i with all of their contacts in age bracket j in that physical contact setting.
Return type: ndarray
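When file_path is used, the delimiter and header parameters suggest a plain delimited text file. Assuming a space-delimited file with no header row (the layout of synthpops' own data files may differ), such a matrix can be read and interpreted with NumPy; the numbers below are made up:

```python
import io

import numpy as np

# Made-up 3x3 contact matrix in space-delimited text with no header row
text = "1.0 0.5 0.1\n0.5 2.0 0.3\n0.1 0.3 1.5\n"
matrix = np.loadtxt(io.StringIO(text), delimiter=" ")

# Column j: contacts that the average person in each bracket i has with bracket j
contacts_with_bracket_2 = matrix[:, 2]
# Row sums: total contacts per person in each bracket (if entries are counts)
total_contacts = matrix.sum(axis=1)
```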
-
get_contact_matrices(datadir=None, sheet_name=None, file_path_dic=None, delimiter=' ', header=None, use_default=False)
Create a dict of setting-specific age contact matrices. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.sheet_name. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- setting_code (string) – name of the physical contact setting: H for households, S for schools, W for workplaces, C for community or other
- sheet_name (string) – name of the sheet in the Excel file with contact patterns
- file_path_dic (dict) – dictionary of file paths to user specified age contact matrices, where keys are “H”, “S”, “W”, and “C”.
- delimiter (string) – delimiter for the contact matrix file
- header (int) – row number for the header of the file
Returns: A dictionary of the different contact matrices for each population, given by the sheet name. Keys map to the different possible physical contact settings for which data are available.
Return type: dict
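The returned dictionary is keyed by setting code. As an illustration of one possible downstream use, the sketch below forms an overall contact matrix as a weighted sum over settings; combine_settings and the weights are hypothetical and are not part of synthpops:

```python
import numpy as np


def combine_settings(matrices, weights):
    """Weighted sum of setting-specific contact matrices.

    matrices: {"H": ..., "S": ..., "W": ..., "C": ...} (square arrays, same shape)
    weights:  hypothetical per-setting weights; missing settings get weight 0
    """
    first = np.asarray(next(iter(matrices.values())), dtype=float)
    total = np.zeros_like(first)
    for code, m in matrices.items():
        total += weights.get(code, 0.0) * np.asarray(m, dtype=float)
    return total


# Made-up 2x2 matrices and weights, for illustration only
mats = {"H": np.eye(2), "S": np.ones((2, 2))}
overall = combine_settings(mats, {"H": 1.0, "S": 0.5})
```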
-
get_school_enrollment_rates(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get a dictionary of enrollment rates by age. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified school enrollment by age data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of school enrollment rates by age.
Return type: dict
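If the enrollment rates are fractions between 0 and 1 (an assumption; the data may instead use percents), each person can be assigned to school with a Bernoulli draw at the rate for their age. assign_enrollment is a hypothetical helper:

```python
import random


def assign_enrollment(ages, enrollment_rates, rng=random):
    """Return True/False per person: is someone of that age enrolled?

    Assumes enrollment_rates maps age -> fraction in [0, 1]; ages missing
    from the dictionary are treated as never enrolled.
    """
    return [rng.random() < enrollment_rates.get(age, 0.0) for age in ages]


# Made-up rates, for illustration only
rng = random.Random(0)
enrolled = assign_enrollment([4, 10, 17, 35], {4: 0.5, 10: 0.98, 17: 0.9}, rng)
```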
-
get_school_size_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get school size brackets, which depend on the source/location of the data. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified school size brackets data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of school size brackets.
Return type: dict
-
get_school_size_distr_by_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get the distribution of school sizes by size bracket or bin. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified school size distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of the distribution of school sizes by bracket.
Return type: dict
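Combined with the brackets from get_school_size_brackets, a bracketed size distribution implies an average school size. A minimal sketch, assuming sizes are equally likely within a bracket and the hypothetical dictionary layouts given in the docstring; the same calculation would apply to the workplace size brackets and distribution:

```python
def expected_school_size(size_distr_by_bracket, size_brackets):
    """Mean school size implied by a bracketed size distribution.

    Assumes size_distr_by_bracket maps bracket -> probability (any scale) and
    size_brackets maps bracket -> the school sizes it covers, with sizes
    equally likely within a bracket.
    """
    total = sum(size_distr_by_bracket.values())
    mean = 0.0
    for b, p in size_distr_by_bracket.items():
        sizes = list(size_brackets[b])
        mean += (p / total) * (sum(sizes) / len(sizes))
    return mean


# Made-up brackets: sizes 1-3 and 4-6, equally likely
avg = expected_school_size({0: 0.5, 1: 0.5}, {0: range(1, 4), 1: range(4, 7)})
```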
-
get_default_school_type_age_ranges()
Define and return default school types and the age range for each.
Returns: A dictionary of default school types and the age range for each.
Return type: dict
-
get_default_school_types_distr_by_age()
Define and return default probabilities of school type for each age.
Returns: A dictionary of the default probability of each school type for each age.
Return type: dict
-
get_default_school_types_by_age_single()
Define and return default school type by age by assigning the school type with the highest probability.
Returns: A dictionary of default school type by age.
Return type: dict
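The rule described for get_default_school_types_by_age_single (take the school type with the highest probability at each age) can be written in a few lines. The {age: {school_type: probability}} layout and the school type labels are placeholder assumptions:

```python
def most_likely_type_by_age(type_distr_by_age):
    """Map each age to the school type with the highest probability.

    Ties resolve to whichever tied type comes first in the inner dictionary.
    """
    return {age: max(probs, key=probs.get) for age, probs in type_distr_by_age.items()}


# Placeholder labels and probabilities, for illustration only
distr = {5: {"pk": 0.3, "es": 0.7}, 12: {"es": 0.2, "ms": 0.8}}
single = most_likely_type_by_age(distr)
```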
-
get_default_school_size_distr_brackets()
Define and return default school size distribution brackets.
Returns: A dictionary of school size brackets.
Return type: dict
-
get_default_school_size_distr_by_type()
Define and return default school size distribution for each school type. The school size distributions are binned to size groups or brackets.
Returns: A dictionary of school size distributions binned by size groups or brackets for each type of default school.
Return type: dict
-
get_school_type_age_ranges(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get a dictionary of the school types and the age range for each, for the location specified.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data for Seattle, Washington.
Returns: A dictionary of school types and the age range for each.
Return type: dict
-
get_school_size_distr_by_type(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get the school size distribution by school type. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from the set default locations (see sp.defaults.py). This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of school size distributions binned by size groups or brackets for each type of default school.
Return type: dict
-
get_employment_rates(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get employment rates by age. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in, which should be ‘usa’
- file_path (string) – file path to user specified employment by age data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of employment rates by age.
Return type: dict
-
get_workplace_size_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get workplace size brackets. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in, which should be ‘usa’
- file_path (string) – file path to user specified workplace size brackets data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of workplace size brackets.
Return type: dict
-
get_workplace_size_distr_by_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get the distribution of workplace sizes by bracket. If use_default, then we’ll first try to look for location-specific data, and if that’s not available, we’ll use default data from settings.location, settings.state_location, settings.country_location. This may not be appropriate for the population under study, so it’s best to provide as much data as you can for the specific population.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified workplace size distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of the distribution of workplace sizes by bracket.
Return type: dict
-
get_state_postal_code(state_location, country_location)
Get the state postal code.
Parameters: - state_location (string) – name of the state
- country_location (string) – name of the country the state is in
Returns: A postal code for the state_location.
Return type: str
-
get_long_term_care_facility_residents_distr(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get the size distribution of residents per facility for Long Term Care Facilities.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified LTCF resident size distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of the distribution of residents per facility for Long Term Care Facilities.
Return type: dict
-
get_long_term_care_facility_residents_distr_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get size bins for the distribution of residents per facility for Long Term Care Facilities.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in, which should be ‘usa’
- file_path (string) – file path to user specified LTCF resident size brackets data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of size brackets or bins for residents per facility.
Return type: dict
-
get_long_term_care_facility_resident_to_staff_ratios_distr(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get the distribution of resident to staff ratios per facility for Long Term Care Facilities.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in
- file_path (string) – file path to user specified resident to staff ratio distribution data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of the distribution of resident to staff ratios per facility for Long Term Care Facilities.
Return type: dict
-
get_long_term_care_facility_resident_to_staff_ratios_brackets(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get size bins for the distribution of resident to staff ratios per facility for Long Term Care Facilities.
Parameters: - datadir (string) – file path to the data directory
- location (string) – name of the location
- state_location (string) – name of the state the location is in
- country_location (string) – name of the country the location is in, which should be ‘usa’
- file_path (string) – file path to user specified resident to staff ratio brackets data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of size brackets or bins for resident to staff ratios per facility.
Return type: dict
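If a resident to staff ratio is read as the number of residents per staff member (an assumption), a facility's staff count follows by division. Rounding up is an illustrative choice; staff_needed is a hypothetical helper:

```python
import math


def staff_needed(n_residents, resident_to_staff_ratio):
    """Number of staff implied by a residents-per-staff-member ratio.

    Hypothetical helper: rounds up so the ratio is never exceeded.
    """
    if resident_to_staff_ratio <= 0:
        raise ValueError("resident_to_staff_ratio must be positive")
    return math.ceil(n_residents / resident_to_staff_ratio)


# Made-up facility: 100 residents at 8 residents per staff member
staff = staff_needed(100, 8)
```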
-
get_long_term_care_facility_use_rates(datadir=None, location=None, state_location=None, country_location=None, file_path=None, use_default=False)
Get Long Term Care Facility use rates by age for a state.
Parameters: - datadir (str) – file path to the data directory
- location (str) – name of the location
- state_location (str) – name of the state the location is in
- country_location (str) – name of the country the location is in
- file_path (string) – file path to user specified Long Term Care Facility use rates by age data
- use_default (bool) – if True, first try to use the other parameters to find data specific to the location under study; if none are available, fall back to default data drawn from settings.location, settings.state_location, settings.country_location.
Returns: A dictionary of the Long Term Care Facility usage rates by age.
Return type: dict
Note
Currently only available for the United States.
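With use rates by age, the expected number of Long Term Care Facility residents in a population is the sum over ages of the number of people at that age times the use rate. A minimal sketch, assuming rates are fractions and using made-up numbers; expected_ltcf_residents is hypothetical:

```python
def expected_ltcf_residents(age_counts, use_rates):
    """Expected LTCF residents: sum over ages of (people at that age) * (use rate).

    Assumes use_rates maps age -> fraction; ages without a rate count as 0.
    """
    return sum(count * use_rates.get(age, 0.0) for age, count in age_counts.items())


# Made-up counts and rates, for illustration only
expected = expected_ltcf_residents({70: 1000, 85: 200, 30: 5000}, {70: 0.01, 85: 0.1})
```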