hiv_workflow.lib.analysis.base_distribution module

class hiv_workflow.lib.analysis.base_distribution.BaseDistribution

Bases: object

exception UnknownDistributionException

Bases: Exception

LOG_FLOAT_TINY = -708.3964185322641
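The value of LOG_FLOAT_TINY matches the natural logarithm of the smallest positive normalized double-precision float, which suggests it acts as a floor for log-scale scores. A minimal check of that correspondence, assuming standard IEEE 754 doubles (the variable names below are illustrative and not part of the module):

```python
import math
import sys

# Smallest positive normalized double (about 2.2250738585072014e-308).
tiny = sys.float_info.min

# Its natural logarithm agrees with the documented constant up to rounding.
log_float_tiny = math.log(tiny)
print(log_float_tiny)
```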

abstract prepare(dfw: hiv_workflow.lib.analysis.data_frame_wrapper.DataFrameWrapper, channel: str, weight_channel: str, additional_keep: List[str]) → hiv_workflow.lib.analysis.data_frame_wrapper.DataFrameWrapper

Prepare a DataFrameWrapper and this distribution object for a subsequent compare() call. Preparation includes verifying and checking the dataframe, adding distribution-specific channels/columns, and trimming the data columns to the minimum needed. Depending on the distribution type, additional attributes may also be set on self (e.g. BetaDistribution sets self.alpha_channel and self.beta_channel, derived from the channel argument).

:param dfw: DataFrameWrapper containing the data that will be used in a future compare() call
:param channel: data channel/column in dfw that the future compare() call will concern
:param weight_channel: an analyzer weighting channel that must be kept, if specified
:param additional_keep: additional columns in the DataFrameWrapper to preserve

Returns: a modified copy of the input DataFrameWrapper
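The verify-then-trim behaviour described for prepare() can be sketched roughly as follows. This is only an illustration under stated assumptions: prepare_columns is a hypothetical function, a plain dict of lists stands in for DataFrameWrapper (whose API is not documented here), and no distribution-specific channels are added.

```python
def prepare_columns(columns, channel, weight_channel=None, additional_keep=()):
    """Hypothetical stand-in for prepare(): verify, then trim to the needed columns."""
    keep = [channel, *additional_keep]
    if weight_channel is not None:
        keep.append(weight_channel)

    # Verification step: every requested column must be present.
    missing = [name for name in keep if name not in columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")

    # Build a new mapping so the caller's data is left unmodified.
    return {name: list(columns[name]) for name in keep}


# Illustrative data; the column names are assumptions, not real channel names.
data = {
    "prevalence": [0.10, 0.20],
    "weight": [1.0, 2.0],
    "year": [2010, 2015],
    "unused": ["a", "b"],
}
trimmed = prepare_columns(data, "prevalence", weight_channel="weight",
                          additional_keep=["year"])
```

Returning a fresh mapping mirrors the documented contract that the result is a modified copy rather than the input itself.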

abstract compare(df: pandas.core.frame.DataFrame, reference_channel: str, data_channel: str) → float

Returns a score between -708.3964 (bad; cf. LOG_FLOAT_TINY) and 100 (good) for how well the simulation data column (data_channel) of the dataframe (df) matches the reference data column (reference_channel).

:param df: pandas DataFrame with columns of data to compare
:param reference_channel: reference data channel in the dataframe
:param data_channel: simulation data channel to compare to the reference data channel

Returns: a floating point score measuring the degree of data/reference fit, also known colloquially as ‘likelihood’
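How a concrete subclass computes its score is not specified here. As a rough sketch only, a log-likelihood-style score could be floored at LOG_FLOAT_TINY so that a vanishing likelihood yields a finite value rather than negative infinity. The function floored_log_score and the Gaussian likelihood below are hypothetical illustrations, not the module's method, and plain lists stand in for DataFrame columns.

```python
import math

LOG_FLOAT_TINY = -708.3964185322641  # value documented on BaseDistribution


def floored_log_score(reference, data, sigma=1.0):
    """Hypothetical score: summed Gaussian log-density, floored at LOG_FLOAT_TINY."""
    total = 0.0
    for ref, sim in zip(reference, data):
        # Log of the normal density of `sim`, centered on `ref` with spread `sigma`.
        total += (-0.5 * math.log(2 * math.pi * sigma ** 2)
                  - (sim - ref) ** 2 / (2 * sigma ** 2))
    # Never report anything below the floor, however poor the fit.
    return max(total, LOG_FLOAT_TINY)


good = floored_log_score([0.10, 0.20], [0.11, 0.19])  # close match
bad = floored_log_score([0.10, 0.20], [50.0, 60.0])   # far off, hits the floor
```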

abstract add_percentile_values(dfw: hiv_workflow.lib.analysis.data_frame_wrapper.DataFrameWrapper, channel: str, p: float) → List[str]

Adds a new data channel to a DataFrameWrapper object that represents a requested probability threshold/value for a specified channel. Useful for creating uncertainty envelopes in plots.

:param dfw: DataFrameWrapper with data to construct percentiles from and to add percentiles to
:param channel: the column in dfw that percentiles will be constructed from/for
:param p: the 0-1 percentile level for the given channel to add

Returns: a list containing the new channel name in dfw
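To make the idea concrete, here is a minimal sketch of adding a percentile channel, assuming a normally distributed channel with a companion standard-deviation column. Everything named below (add_percentile_channel, sd_channel, the column names, the naming scheme for the new channel) is hypothetical, and a dict of lists stands in for DataFrameWrapper; real distribution types such as BetaDistribution would use their own quantile function.

```python
from statistics import NormalDist


def add_percentile_channel(columns, channel, p, sd_channel):
    """Hypothetical helper: add a percentile column for a normal channel."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    new_channel = f"{channel}_p{p}"  # naming scheme is an assumption
    columns[new_channel] = [
        # Quantile of a normal distribution with the row's mean and spread.
        NormalDist(mu=mean, sigma=sd).inv_cdf(p)
        for mean, sd in zip(columns[channel], columns[sd_channel])
    ]
    return [new_channel]


data = {"prevalence": [0.10, 0.20], "prevalence_sd": [0.01, 0.02]}
upper = add_percentile_channel(data, "prevalence", 0.975, "prevalence_sd")
lower = add_percentile_channel(data, "prevalence", 0.025, "prevalence_sd")
```

Calling the helper once for a low and once for a high value of p gives the two edges of an uncertainty envelope around the channel.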

classmethod from_string(distribution_name: str) → hiv_workflow.lib.analysis.base_distribution.BaseDistribution

Loads and returns a distribution object of the type appropriate to the provided name, e.g. BetaDistribution from “beta”.

:param distribution_name: name of the distribution type to load

Returns: a distribution object
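One common way to implement a name-based factory like this is to scan subclasses for a declared name. The sketch below is an assumption about the general pattern, not the module's actual code: Distribution, the name attribute, and GaussianDistribution are hypothetical, and raising UnknownDistributionException for an unrecognized name is presumed rather than documented.

```python
class UnknownDistributionException(Exception):
    """Stand-in for BaseDistribution.UnknownDistributionException."""


class Distribution:
    """Hypothetical base class; not the real BaseDistribution."""

    name = None  # each subclass declares the string it answers to

    @classmethod
    def from_string(cls, distribution_name):
        # Check each direct subclass for a matching declared name.
        for subclass in cls.__subclasses__():
            if subclass.name == distribution_name:
                return subclass()
        raise UnknownDistributionException(distribution_name)


class BetaDistribution(Distribution):
    name = "beta"


class GaussianDistribution(Distribution):  # hypothetical second type
    name = "gaussian"


dist = Distribution.from_string("beta")
print(type(dist).__name__)  # BetaDistribution
```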

classmethod from_uncertainty_channel(uncertainty_channel: str) → hiv_workflow.lib.analysis.base_distribution.BaseDistribution

Loads and returns a distribution object of the type appropriate to the provided uncertainty channel, e.g. BetaDistribution from ‘effective_count’.

WARNING: this method returns the FIRST MATCH found while checking distribution types in a non-guaranteed order, so the result may be ambiguous if two distribution types ever share an uncertainty channel name.

:param uncertainty_channel: name of the uncertainty channel to detect a distribution from

Returns: a distribution object
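Because the lookup returns the first match in no guaranteed order, it can be worth checking that no two distribution types claim the same uncertainty channel. The helper below is a hypothetical guard, not part of the module; in the example registry only the pairing of BetaDistribution with 'effective_count' comes from the documentation above, and the other entries are invented to show a conflict.

```python
from collections import defaultdict


def find_shared_channels(channel_by_type):
    """Return uncertainty channels claimed by more than one distribution type."""
    owners = defaultdict(list)
    for type_name, channel in channel_by_type.items():
        owners[channel].append(type_name)
    # Keep only channels with two or more claimants.
    return {channel: names for channel, names in owners.items() if len(names) > 1}


# Hypothetical registry of distribution type -> uncertainty channel.
registry = {
    "BetaDistribution": "effective_count",
    "GaussianDistribution": "two_sigma",
    "OtherDistribution": "effective_count",
}
conflicts = find_shared_channels(registry)
print(conflicts)
```

An empty result means a first-match lookup is unambiguous for the registered types.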