hiv_workflow.lib.analysis.beta_distribution module

class hiv_workflow.lib.analysis.beta_distribution.BetaDistribution

Bases: hiv_workflow.lib.analysis.base_distribution.BaseDistribution

exception InvalidEffectiveCountException

Bases: Exception

exception InvalidCountChannelException

Bases: Exception

COUNT_CHANNEL = 'effective_count'

UNCERTAINTY_CHANNEL = 'effective_count'

prepare(dfw, channel, weight_channel=None, additional_keep=None)

Prepare a DataFrameWrapper and this distribution object for a subsequent compare() call. Preparation verifies and checks the dataframe, adds distribution-specific channels/columns, and trims the data columns to the minimum needed. Depending on the distribution type, additional attributes may also be set on self (for BetaDistribution, self.alpha_channel and self.beta_channel, both derived from the channel argument).

Parameters:

dfw – DataFrameWrapper containing the data that will be used in a future compare() call

channel – data channel/column in dfw that the future compare() call will evaluate

weight_channel – an analyzer weighting channel that must be kept, if specified

additional_keep – additional columns in the DataFrameWrapper to preserve

Returns: a modified copy of the input DataFrameWrapper

compare(df, reference_channel, data_channel)

Returns a score between -708.3964 (bad) and 100 (good) for how well the simulation data column (data_channel) of the dataframe (df) matches the reference data column (reference_channel).

Parameters:

df – pandas DataFrame with columns of data to compare

reference_channel – reference data channel in the dataframe

data_channel – simulation data channel to compare to the reference data channel

Returns: a floating point score measuring the degree of data/reference fit, also known colloquially as ‘likelihood’
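The lower bound of -708.3964 matches the natural logarithm of the smallest positive normalized double-precision float. One plausible reading, which this page does not confirm, is that the score is a log-likelihood floored so that a vanishing likelihood never produces negative infinity. The sketch below only illustrates that reading; floor_log and LOG_FLOAT_MIN are hypothetical names and not part of hiv_workflow.

```python
import math
import sys

# Natural log of the smallest positive normalized double (about 2.225e-308).
LOG_FLOAT_MIN = math.log(sys.float_info.min)


def floor_log(likelihood):
    """Hypothetical helper: log of a likelihood, floored at LOG_FLOAT_MIN.

    Assumption: the real compare() may clamp differently or not at all.
    """
    if likelihood < sys.float_info.min:
        return LOG_FLOAT_MIN
    return math.log(likelihood)


print(round(LOG_FLOAT_MIN, 4))  # -708.3964
```

Under this reading, a score at the lower bound would mean the likelihood underflowed rather than that the fit is exactly that poor.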

static construct_beta_channel(channel, type)

add_percentile_values(dfw, channel, p)

Adds a new data channel to a DataFrameWrapper object that holds the value of the specified channel at a requested probability level. Useful for creating uncertainty envelopes in plots.

Parameters:

dfw – DataFrameWrapper with the data to construct percentiles from and to add percentiles to

channel – the column in dfw that percentiles will be constructed from/for

p – the percentile level to add for the given channel, expressed as a probability between 0 and 1

Returns: a list containing the new channel name in dfw
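To make the idea of an uncertainty envelope concrete, the following sketch approximates a percentile of a beta distribution by sampling with the standard library. It does not reproduce how add_percentile_values works internally; beta_percentile, its arguments, and the example parameters are hypothetical, and an exact inverse CDF such as scipy.stats.beta.ppf would be the more usual choice where SciPy is available.

```python
import random


def beta_percentile(alpha, beta, p, n_samples=20000, seed=0):
    """Approximate the p-quantile (0 <= p <= 1) of Beta(alpha, beta) by sampling.

    Hypothetical helper for illustration only; not part of hiv_workflow.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_samples))
    index = min(int(p * n_samples), n_samples - 1)
    return draws[index]


# Example: bounds of a 95% envelope for an assumed Beta(31, 71).
lower = beta_percentile(31.0, 71.0, 0.025)
upper = beta_percentile(31.0, 71.0, 0.975)
```

Calling the helper once per probability level (for example 0.025 and 0.975) gives the lower and upper edges of an envelope that can be drawn around the channel in a plot.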

add_beta_parameters(dfw, channel)

Compute the alpha and beta parameters of a beta distribution and add them to the current self._dataframe object. The distribution is computed for the provided channel (data field), using ‘count’. The results are stored in new channels/columns named <channel>–Beta-alpha and <channel>–Beta-beta. If both the alpha and beta channels already exist in the dataframe, nothing is computed.

Parameters:

channel – the data channel/column to compute the beta distribution for

Returns: a list of the channel-associated alpha and beta parameter channel names
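This page does not give the formula used to derive alpha and beta. As a minimal sketch, assuming the channel holds a proportion, the count is its effective sample size, and a uniform Beta(1, 1) prior applies, the parameters could be obtained as shown below. The function beta_parameters and the formula itself are assumptions for illustration, not the library's confirmed behavior.

```python
def beta_parameters(value, effective_count):
    """Hypothetical mapping from a proportion and its count to (alpha, beta).

    Assumes a uniform Beta(1, 1) prior; the library's actual formula may differ.
    """
    if effective_count < 0:
        raise ValueError("effective_count must be non-negative")
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be a proportion in [0, 1]")
    successes = value * effective_count      # implied number of positives
    failures = effective_count - successes   # implied number of negatives
    return 1.0 + successes, 1.0 + failures


# Example: a proportion of 0.3 observed with an effective count of 100.
alpha, beta = beta_parameters(0.3, 100)
```

With these assumptions, a larger effective count yields a narrower distribution around the observed proportion, and a count of zero falls back to the uniform prior.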