synthpops.data module

class PopulationAgeDistribution[source]

Bases: jsonobject.api.JsonObject

Class for population age distribution with a specified number of bins.

num_bins
distribution
class SchoolSizeDistributionByType[source]

Bases: jsonobject.api.JsonObject

Class for the school size distribution by school type.

school_type
size_distribution
class SchoolTypeByAge[source]

Bases: jsonobject.api.JsonObject

Class for the school type by age range.

school_type
age_range
class Location[source]

Bases: jsonobject.api.JsonObject

Class for the json object for a location, containing the population data used to generate representative contact networks.

The general use case is to supply a file path, from which the parent data is parsed. The DefaultProperty type handles either a scalar or a json object. A json object is allowed mainly to test inheriting from a parent specified directly in the json.

Most users will want to populate this with a relative or absolute file path.

Note

The population age distribution structures will be updated to be more flexible, taking a parameter for the number of age brackets used to generate the structure.

location_name
data_provenance_notices
citations
notes
parent
population_age_distributions
employment_rates_by_age
enrollment_rates_by_age
household_head_age_brackets
household_head_age_distribution_by_family_size
household_size_distribution
ltcf_resident_to_staff_ratio_distribution
ltcf_num_residents_distribution
ltcf_num_staff_distribution
ltcf_use_rate_distribution
school_size_brackets
school_size_distribution
school_size_distribution_by_type
school_types_by_age
workplace_size_counts_by_num_personnel
get_list_properties()[source]

Get the properties of the location data object as a list.

Returns:A list of the properties of the location json object with data about the location.
Return type:list
get_population_age_distribution(nbrackets)[source]

Get the age distribution of the population aggregated to nbrackets age brackets. If the data doesn’t contain a distribution with the requested number of brackets, an exception is raised.

Parameters:nbrackets (int) – the number of age brackets the age distribution is aggregated to
Returns:A list of the probability age distribution values indexed by the bracket number.
Return type:list
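The lookup described above can be sketched in plain Python. This is a minimal illustration rather than the synthpops implementation: the dict layout, the [age_min, age_max, probability] row order, the helper name find_age_distribution, and the choice of ValueError are all assumptions made for the example.

```python
# Hypothetical stand-in for a location's population_age_distributions field.
# Assumption: each row is [age_min, age_max, probability].
population_age_distributions = [
    {"num_bins": 2, "distribution": [[0, 49, 0.6], [50, 100, 0.4]]},
    {"num_bins": 3, "distribution": [[0, 19, 0.25], [20, 64, 0.55], [65, 100, 0.20]]},
]


def find_age_distribution(distributions, nbrackets):
    """Return the probabilities of the entry with nbrackets bins, or raise."""
    for entry in distributions:
        if entry["num_bins"] == nbrackets:
            # Keep only the probability (last value) of each bracket.
            return [row[-1] for row in entry["distribution"]]
    raise ValueError(f"No age distribution with {nbrackets} brackets is available.")


probs = find_age_distribution(population_age_distributions, 3)
```

Requesting a bracket count that is not present (5, for instance) raises the exception instead of returning an empty list.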
populate_parent_data_from_file_path(location, parent_file_path)[source]

Load a location json object with necessary data fields filled from the parent location, using the parent location file path.

Parameters:
  • location (json) – json object for the location data
  • parent_file_path (str) – file path to the parent location
Returns:

The location json object with necessary data fields filled from the parent location.

Return type:

json

populate_parent_data_from_json_obj(location, parent)[source]

Load a location json object with necessary data fields filled from the parent location json.

Parameters:
  • location (json) – json object for the location data
  • parent (json) – json object for the parent location
Returns:

The location json object with necessary data fields filled from the parent location.

Return type:

json
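
The idea of filling a location from its parent can be sketched with plain dicts. This is only an illustration under stated assumptions: the helper name fill_from_parent is hypothetical, a field is treated as "missing" when it is absent, None, or empty, and location_name is never inherited. The real function works on Location json objects and may apply different rules.

```python
import copy


def fill_from_parent(location, parent):
    """Return a copy of location with empty fields filled from parent."""
    filled = copy.deepcopy(location)
    for key, parent_value in parent.items():
        if key == "location_name":
            continue  # assumption: a location always keeps its own name
        if not filled.get(key):  # absent, None, or empty list
            filled[key] = copy.deepcopy(parent_value)
    return filled


parent = {
    "location_name": "example-country",
    "household_size_distribution": [[1, 0.3], [2, 0.7]],
    "school_size_brackets": [[20, 50]],
}
child = {
    "location_name": "example-region",
    "household_size_distribution": [],
    "school_size_brackets": [[10, 30]],
}

merged = fill_from_parent(child, parent)
```

In this sketch the child keeps its own school_size_brackets and inherits only the household size distribution it was missing.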

populate_parent_data(location)[source]

Populate location json object with fields from the parent location if available.

Parameters:location (json) – json data object for the location
Returns:The location json data object with data fields filled from the parent location.
Return type:json
load_location_from_json(json_obj, check_constraints=None)[source]

Load location data from json object with some checks made.

Parameters:json_obj (json) – json object containing location data
Returns:The json object with location data.
Return type:json
load_location_from_json_str(json_str, check_constraints=None)[source]

Load location data from json str with some checks made.

Parameters:json_str (str) – string version of the json object
Returns:The json object with location data.
Return type:json
get_relative_path(datadir)[source]

Get the relative path for the data folder.

Parameters:datadir (str) – data folder path
Returns:Relative path for the data folder.
Return type:str

Notes

This method may not be necessary anymore…

get_location_attr(location, property_name)[source]

Get the attribute from the json object containing location data given the associated property name.

Parameters:
  • location (json) – the json object with location data
  • property_name (str) – the property name
Returns:

If property_name exists in the location json object, return [True, attribute]. Else, return [False, None].
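
The [found, value] return convention can be illustrated with a small sketch. It assumes, purely for illustration, that the location is a plain dict; the helper name get_attr_sketch is hypothetical, and the real function operates on the location json object.

```python
def get_attr_sketch(location, property_name):
    """Return [True, value] if property_name is present, else [False, None]."""
    if property_name in location:
        return [True, location[property_name]]
    return [False, None]


location = {"location_name": "example-place", "school_size_brackets": [[20, 50]]}

found, brackets = get_attr_sketch(location, "school_size_brackets")
missing, nothing = get_attr_sketch(location, "not_a_property")
```

Returning a pair lets callers tell a missing property apart from one whose stored value is empty.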

load_location_from_filepath(rel_filepath, check_constraints=None)[source]

Loads location data object from provided relative filepath where the file path is relative to defaults.settings.datadir.

Parameters:rel_filepath (str) – relative file path for the location data
Returns:The json object with location data.
Return type:json
save_location_to_filepath(location, abs_filepath)[source]

Saves json object with location data to provided absolute filepath.

Parameters:
  • location (json) – the json object with location data
  • abs_filepath (str) – absolute file path to where the json is saved
Returns:

None.

check_location_constraints_satisfied(location)[source]

Checks a number of constraints that need to be satisfied for the schema.

Parameters:

location (json) – the json object with location data

Returns:

None.

Raises:

RuntimeError with a description if one of the constraints is not satisfied.
are_location_constraints_satisfied(location)[source]

Checks a number of constraints that need to be satisfied for the schema.

Parameters:location (json) – the json object with location data
Returns:[True, None] if all constraints are satisfied. [False, str] if a constraint is violated. The returned str is one of the error messages.
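The two entry points above differ only in how a failure is reported: one returns a [bool, message] pair and the other raises RuntimeError. A minimal sketch of that relationship follows. All function names here are hypothetical, the two example checks are simplified stand-ins, and reporting the first failing check is an assumption about which message is returned.

```python
def check_name(location):
    """Simplified stand-in check: location_name must be a string."""
    if isinstance(location.get("location_name"), str):
        return [True, None]
    return [False, "location_name must be a string"]


def check_brackets(location):
    """Simplified stand-in check: each school size bracket has length 2."""
    for pair in location.get("school_size_brackets", []):
        if len(pair) != 2:
            return [False, f"school_size_brackets entry {pair} should have length 2"]
    return [True, None]


CHECKS = [check_name, check_brackets]


def constraints_satisfied(location):
    """Return [True, None], or [False, message] for the first failing check."""
    for check in CHECKS:
        ok, message = check(location)
        if not ok:
            return [False, message]
    return [True, None]


def require_constraints(location):
    """Raise RuntimeError with the failure message; return None otherwise."""
    ok, message = constraints_satisfied(location)
    if not ok:
        raise RuntimeError(message)


good = {"location_name": "example-place", "school_size_brackets": [[20, 50], [51, 100]]}
bad = {"location_name": "example-place", "school_size_brackets": [[20, 50, 75]]}

good_result = constraints_satisfied(good)
bad_result = constraints_satisfied(bad)
```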
check_array_of_array_entry_lens_arr(array_of_arrays, expected_len)[source]
check_array_of_arrays_entry_lens(location, expected_len, property_name)[source]

Check that each array in an array of arrays has the expected length.

Parameters:
  • location (json) – the json object with location data
  • expected_len (int) – the expected length of each sub array
  • property_name (str) – the property name
Returns:

[True, None] if sub array length checks pass. [False, str] if sub array length checks fail. The returned str is the error message.
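
A length check of this kind can be sketched as follows. The helper name entry_lens_ok and the wording of the error message are assumptions for illustration. Unlike the documented function, this sketch takes the array directly rather than looking it up on a location by property name.

```python
def entry_lens_ok(array_of_arrays, expected_len, property_name="property"):
    """Return [True, None] if every sub array has expected_len entries."""
    for index, entry in enumerate(array_of_arrays):
        if len(entry) != expected_len:
            message = (
                f"Entry {index} of {property_name} has length {len(entry)}; "
                f"expected {expected_len}."
            )
            return [False, message]
    return [True, None]


good = entry_lens_ok([[0, 4], [5, 9]], 2, "household_head_age_brackets")
bad = entry_lens_ok([[0, 4], [5, 9, 0.1]], 2, "household_head_age_brackets")
```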

check_valid_probability_distributions(property_name, valid_properties=None)[source]

Check that the property_name is a valid probability distribution.

Parameters:
  • property_name (str) – the property name
  • valid_properties (str or list) – a list of the valid probability distributions
Returns:

None.

check_probability_distribution_sum_age_distributions(location, arr, tolerance=0.01, **kwargs)[source]

Check that each population age distribution has a sum equal to 1 within some tolerance.

Parameters:
  • location (json) – the json object with location data
  • arr (list) – the list of population age distributions
  • tolerance (float) – difference from the sum of 1 tolerated
  • kwargs (dict) – dictionary of values passed to np.isclose()
Returns:

[True, None] if the sum of the probability distribution is equal to 1 within the tolerance level. [False, str] else. The returned str is the error message with some information about the check.

check_probability_distribution_nonnegative_age_distributions(location, arr)[source]

Check that each population age distribution has all non negative values.

Parameters:
  • location (json) – the json object with location data
  • arr (list) – the list of population age distributions
Returns:

[True, None] if the values of each population age distribution are all non negative. [False, str] else. The returned str is the error message with some information about the check.

check_probability_distribution_sum(location, property_name, tolerance=0.01, valid_properties=None, **kwargs)[source]

Check that fields representing probability distributions have sums equal to 1 within some tolerance.

Parameters:
  • location (json) – the json object with location data
  • property_name (str) – the property name
  • tolerance (float) – difference from the sum of 1 tolerated
  • valid_properties (str or list) – a list of the valid probability distributions
  • kwargs (dict) – dictionary of values passed to np.isclose()
Returns:

[True, None] if the sum of the probability distribution is equal to 1 within the tolerance level. [False, str] else. The returned str is the error message with some information about the check.
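
The sum-to-1 check can be sketched with the standard library. The documentation above refers to np.isclose(); this sketch swaps in math.isclose with an absolute tolerance so that it runs without NumPy. The helper name distribution_sum_ok and the assumption that the probability is the last value of each row are for illustration only.

```python
import math


def distribution_sum_ok(rows, tolerance=0.01):
    """Return [True, None] if the probabilities in rows sum to 1 within tolerance."""
    # Assumption: the probability is the last value of each row.
    total = sum(row[-1] for row in rows)
    if math.isclose(total, 1.0, abs_tol=tolerance):
        return [True, None]
    return [False, f"Distribution sums to {total:.4f}, which is not within {tolerance} of 1."]


ok_result = distribution_sum_ok([[1, 0.3], [2, 0.45], [3, 0.25]])
bad_result = distribution_sum_ok([[1, 0.3], [2, 0.45], [3, 0.15]])
```

Comparing within a tolerance instead of testing for exact equality avoids spurious failures from floating point rounding and from source data rounded to a few decimal places.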

check_probability_distribution_nonnegative(location, property_name, valid_properties=None)[source]

Check that fields representing probability distributions have all non negative values.

Parameters:
  • location (json) – the json object with location data
  • property_name (str) – the property name
  • valid_properties (str or list) – a list of the valid probability distributions
Returns:

[True, None] if the values of the probability distribution are all non negative. [False, str] else. The returned str is the error message with some information about the check.

check_all_probability_distribution_sums(location, tolerance=0.01, die=False, verbose=False, **kwargs)[source]

Checks that each probability distribution available to a location has a sum close to 1.

Parameters:
  • location (json) – the json object with location data
  • tolerance (float) – difference from the sum of 1 tolerated
  • die (bool) – raise an exception if the check fails
  • verbose (bool) – print a warning if the check fails
  • kwargs (dict) – dictionary of values passed to np.isclose()
Returns:

List of checks and a list of associated error messages.

Return type:

list, list

check_all_probability_distribution_nonnegative(location, die=False, verbose=True)[source]

Run checks that each field representing a probability distribution has all non negative values.

Parameters:
  • location (json) – json object with the location data
  • die (bool) – raise an exception if the check fails
  • verbose (bool) – print a warning if the check fails
Returns:

List of checks and a list of associated error messages.

Return type:

list, list
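
How the die and verbose flags might interact with the returned lists can be sketched as below. This is an assumption-laden illustration, not the library's code: the helper name all_nonnegative, the explicit property_names argument, the use of warnings.warn for the printed warning, and ValueError as the exception type are all hypothetical.

```python
import warnings


def all_nonnegative(location, property_names, die=False, verbose=True):
    """Check each named distribution; return (checks, messages) in parallel."""
    checks, messages = [], []
    for name in property_names:
        # Assumption: the probability is the last value of each row.
        negative = [row for row in location.get(name, []) if row[-1] < 0]
        ok = not negative
        message = None if ok else f"{name} has negative values: {negative}"
        checks.append(ok)
        messages.append(message)
        if not ok:
            if die:
                raise ValueError(message)
            if verbose:
                warnings.warn(message)
    return checks, messages


location = {
    "household_size_distribution": [[1, 0.4], [2, 0.6]],
    "ltcf_num_staff_distribution": [[0, 9, 1.2], [10, 19, -0.2]],
}
names = ["household_size_distribution", "ltcf_num_staff_distribution"]

checks, messages = all_nonnegative(location, names, verbose=False)
```

With die=True the same call would raise on the first failing property instead of collecting results.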

check_location_name(location)[source]

Check that the location json data object has a string value in the location_name field.

Parameters:location (json) – the json object with location data
Returns:[True, str] if the location json has a str value in the location_name field. Returned str specifies the location_name. [False, str] if the location json does not have a str value in the location_name field.
check_population_age_distributions(location)[source]

Check that the population age distributions are self-consistent in the number of brackets, and each sub array has length 3.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
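The two conditions named above (bracket count matches num_bins, each sub array has length 3) can be sketched as follows. The helper name age_distributions_ok, the dict layout, and the message wording are assumptions for illustration.

```python
def age_distributions_ok(population_age_distributions):
    """Return [True, None] if every entry is self-consistent, else [False, str]."""
    for entry in population_age_distributions:
        rows = entry["distribution"]
        if len(rows) != entry["num_bins"]:
            return [
                False,
                f"num_bins is {entry['num_bins']} but the distribution has {len(rows)} brackets.",
            ]
        for row in rows:
            if len(row) != 3:
                return [False, f"Bracket {row} should have length 3."]
    return [True, None]


consistent = [{"num_bins": 2, "distribution": [[0, 49, 0.6], [50, 100, 0.4]]}]
inconsistent = [{"num_bins": 3, "distribution": [[0, 49, 0.6], [50, 100, 0.4]]}]

consistent_result = age_distributions_ok(consistent)
inconsistent_result = age_distributions_ok(inconsistent)
```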
check_employment_rates_by_age(location)[source]

Check that the employment rates by age is an array of arrays, where each sub array has length 2.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_enrollment_rates_by_age(location)[source]

Check that the enrollment rates by age is an array of arrays, where each sub array has length 2.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_household_head_age_brackets(location)[source]

Check that the household head age brackets is an array of arrays, where each sub array has length 2.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_household_head_age_distributions_by_family_size(location)[source]

Check that the conditional household head age distribution by household size is an array with length equal to the number of household head age brackets.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_household_size_distribution(location)[source]

Check that the household size distribution is an array of arrays, where each sub array has length 2.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_ltcf_resident_to_staff_ratio_distribution(location)[source]

Check that the long term care facility resident to staff ratio distribution is an array of arrays, where each sub array has length 3.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_ltcf_num_residents_distribution(location)[source]

Check that the long term care facility resident size distribution is an array of arrays, where each sub array has length 3.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_ltcf_num_staff_distribution(location)[source]

Check that the long term care facility staff size distribution is an array of arrays, where each sub array has length 3.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_school_size_brackets(location)[source]

Check that the school size distribution brackets is an array of arrays, where each sub array has length 2.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_school_size_distribution(location)[source]
check_school_size_distribution_by_type(location)[source]

Check that the school size distribution by school type is an array of arrays, where each sub array has length 3.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_school_types_by_age(location)[source]

Check that the school types by age range is an array of arrays, where each sub array has length 2.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
check_workplace_size_counts_by_num_personnel(location)[source]

Check that the workplace size count is an array of arrays, where each sub array has length 3.

Parameters:location (json) – the json object with location data
Returns:[True, None] if checks pass. [False, str] if checks fail.
convert_df_to_json_array(df, cols, int_cols=None)[source]

Convert desired data from a pandas dataframe into a json array.

Parameters:
  • df (pandas dataframe) – the dataframe with data
  • cols (list) – list of the columns to convert to the json array format
  • int_cols (str or list) – a str or list of columns to convert to integer values
Returns:

An array version of the pandas dataframe to be added to synthpops json data objects.

Return type:

array