hiv_workflow.lib.analysis.base_distribution module
class hiv_workflow.lib.analysis.base_distribution.BaseDistribution
Bases: object
LOG_FLOAT_TINY = -708.3964185322641
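The value of LOG_FLOAT_TINY coincides with the natural logarithm of the smallest positive normalized double-precision float. The short check below illustrates this. That the constant is actually derived this way is an assumption, since the documentation does not state its origin.

```python
import math
import sys

# Smallest positive normalized IEEE 754 double (about 2.225e-308).
tiny = sys.float_info.min

# Natural log of that value. Assumption (not confirmed by the source):
# this is how LOG_FLOAT_TINY was obtained.
log_float_tiny = math.log(tiny)
print(log_float_tiny)
```

A floor of this kind is commonly used so that the log of a vanishing probability stays finite instead of becoming negative infinity.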
abstract prepare(dfw: hiv_workflow.lib.analysis.data_frame_wrapper.DataFrameWrapper, channel: str, weight_channel: str, additional_keep: List[str]) → hiv_workflow.lib.analysis.data_frame_wrapper.DataFrameWrapper
Prepare a DataFrameWrapper, together with this distribution object, for a later compare() call. Preparation includes verifying and checking the dataframe, adding distribution-specific channels/columns, and trimming the data columns to the minimum needed. Depending on the distribution type, additional attributes may also be set on self (e.g. BetaDistribution sets self.alpha_channel and self.beta_channel, both derived from the channel argument).
:param dfw: DataFrameWrapper containing data that will be used in a future compare() call
:param channel: data channel/column in dfw that the future compare() call will concern
:param weight_channel: an analyzer weighting channel that must be kept, if specified
:param additional_keep: additional columns in the DataFrameWrapper to preserve
Returns: a modified copy of the input DataFrameWrapper
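The column-trimming part of prepare() can be sketched as follows. This is only an illustration: the DataFrameWrapper API is not documented here, so a plain dict of column lists stands in for it, and trim_columns is a hypothetical helper rather than part of hiv_workflow.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a wrapped data frame: column name -> values.
Columns = Dict[str, List[float]]


def trim_columns(columns: Columns, channel: str,
                 weight_channel: Optional[str],
                 additional_keep: List[str]) -> Columns:
    """Return a copy holding only the columns a later comparison needs."""
    keep = [channel] + list(additional_keep)
    if weight_channel:
        keep.append(weight_channel)

    # Verification step: fail early if a required column is absent.
    missing = [name for name in keep if name not in columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")

    # Copy each list so the caller's data is left unmodified.
    return {name: list(columns[name]) for name in keep}


data = {"prevalence": [0.1, 0.2], "weight": [1.0, 1.0],
        "year": [2010.0, 2011.0], "unused": [0.0, 0.0]}
trimmed = trim_columns(data, "prevalence", "weight", ["year"])
print(sorted(trimmed))
```

Returning a copy mirrors the documented behaviour of prepare(), which returns a modified copy rather than changing its input.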
abstract compare(df: pandas.core.frame.DataFrame, reference_channel: str, data_channel: str) → float
Returns a score between -708.3964 (bad) and 100 (good) for how well the simulation data column (data_channel) of the dataframe (df) matches the reference data column (reference_channel).
:param df: pandas DataFrame with columns of data to compare
:param reference_channel: reference data channel in dataframe
:param data_channel: simulation data channel to compare to the reference data channel
Returns: a floating point score measuring the degree of data/reference fit, also known colloquially as ‘likelihood’
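The following sketch shows one way a log-likelihood style score with a finite lower bound could be computed. It is not the scoring used by any hiv_workflow distribution: the Gaussian density, the sigma parameter, and the function name are assumptions chosen for illustration, and the documented upper bound of 100 is not modelled.

```python
import math
import sys
from typing import Sequence

# Assumed floor, mirroring the documented LOG_FLOAT_TINY value.
LOG_FLOAT_TINY = math.log(sys.float_info.min)


def gaussian_log_score(reference: Sequence[float], data: Sequence[float],
                       sigma: float = 1.0) -> float:
    """Mean Gaussian log-density of data around reference, floored."""
    if len(reference) != len(data) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Log of the Gaussian normalization constant 1 / sqrt(2 * pi * sigma^2).
    norm = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    total = sum(norm - 0.5 * ((d - r) / sigma) ** 2
                for r, d in zip(reference, data))
    # Clamp so that extremely poor fits stay finite and comparable.
    return max(total / len(reference), LOG_FLOAT_TINY)


good = gaussian_log_score([1.0, 2.0], [1.0, 2.1])
bad = gaussian_log_score([1.0, 2.0], [100.0, 200.0], sigma=0.1)
print(good > bad)
```

Clamping at the floor keeps very poor fits comparable with one another instead of letting the score run off toward negative infinity.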
abstract add_percentile_values(dfw: hiv_workflow.lib.analysis.data_frame_wrapper.DataFrameWrapper, channel: str, p: float) → List[str]
Adds a new data channel to a DataFrameWrapper object that holds the value at a requested probability threshold for a specified channel. Useful for creating uncertainty envelopes in plots.
:param dfw: DataFrameWrapper with the data to construct percentiles from and to add percentiles to
:param channel: the column in dfw that percentiles will be constructed from/for
:param p: the percentile level for the given channel, expressed as a fraction between 0 and 1
Returns: a list containing the new channel name in dfw
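As an illustration of how percentile channels yield an uncertainty envelope, the sketch below adds lower and upper quantile columns under a normal-distribution assumption. The dict stand-in for DataFrameWrapper, the sigma_channel argument, and the naming scheme for the new column are all hypothetical. Each concrete distribution type would use its own quantile function.

```python
from statistics import NormalDist
from typing import Dict, List

# Hypothetical stand-in for a wrapped data frame: column name -> values.
Columns = Dict[str, List[float]]


def add_normal_percentile(columns: Columns, channel: str,
                          sigma_channel: str, p: float) -> List[str]:
    """Add a column holding the p-quantile of Normal(mean, sigma) per row."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    new_channel = f"{channel}_p{p:g}"  # hypothetical naming scheme
    columns[new_channel] = [
        NormalDist(mu, sigma).inv_cdf(p)
        for mu, sigma in zip(columns[channel], columns[sigma_channel])
    ]
    return [new_channel]


data = {"prevalence": [0.10, 0.20], "sigma": [0.01, 0.02]}
lower = add_normal_percentile(data, "prevalence", "sigma", 0.025)
upper = add_normal_percentile(data, "prevalence", "sigma", 0.975)
print(lower, upper)
```

Plotting the two new columns around the original channel gives a 95% envelope in this example.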
classmethod from_string(distribution_name: str) → hiv_workflow.lib.analysis.base_distribution.BaseDistribution
Loads and returns a distribution object of the type appropriate to the provided name, e.g. BetaDistribution from “beta”.
:param distribution_name: name of distribution type to load
Returns: a distribution object
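A name-based factory like from_string() is often built on a registry of subclasses. The sketch below shows that general pattern. The class names, the name attribute, and the registry are hypothetical, and the source does not say how hiv_workflow resolves names internally.

```python
from typing import Dict, Type


class Distribution:
    """Hypothetical base class with a name-keyed registry of subclasses."""
    name: str = ""
    _registry: Dict[str, Type["Distribution"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every named subclass as soon as it is defined.
        if cls.name:
            Distribution._registry[cls.name] = cls

    @classmethod
    def from_string(cls, distribution_name: str) -> "Distribution":
        klass = cls._registry.get(distribution_name)
        if klass is None:
            raise ValueError(f"unknown distribution: {distribution_name!r}")
        return klass()


class Beta(Distribution):
    name = "beta"


class Gaussian(Distribution):
    name = "gaussian"


dist = Distribution.from_string("beta")
print(type(dist).__name__)
```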
classmethod from_uncertainty_channel(uncertainty_channel: str) → hiv_workflow.lib.analysis.base_distribution.BaseDistribution
Loads and returns a distribution object of the type appropriate to the provided uncertainty channel, e.g. BetaDistribution from ‘effective_count’.
WARNING: this method returns the FIRST MATCH found while checking distribution types in a non-guaranteed order, so results may be unpredictable if two distribution types ever share an uncertainty channel name.
:param uncertainty_channel: name of uncertainty channel to detect a distribution from
Returns: a distribution object
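The first-match caveat can be guarded against by collecting all matching types and rejecting ambiguous lookups. The sketch below demonstrates that idea with hypothetical classes. Only the pairing of a beta distribution with ‘effective_count’ comes from the documentation above; the "two_sigma" channel and the strict helper are assumptions.

```python
from typing import List, Type


class Distribution:
    """Hypothetical base class; subclasses declare their uncertainty channel."""
    UNCERTAINTY_CHANNEL = ""


class Beta(Distribution):
    UNCERTAINTY_CHANNEL = "effective_count"


class Gaussian(Distribution):
    UNCERTAINTY_CHANNEL = "two_sigma"  # hypothetical channel name


def matching_types(uncertainty_channel: str) -> List[Type[Distribution]]:
    """Collect every direct subclass that declares the given channel."""
    return [klass for klass in Distribution.__subclasses__()
            if klass.UNCERTAINTY_CHANNEL == uncertainty_channel]


def from_uncertainty_channel_strict(uncertainty_channel: str) -> Distribution:
    """Like a first-match lookup, but refuse missing or ambiguous matches."""
    matches = matching_types(uncertainty_channel)
    if len(matches) != 1:
        names = [klass.__name__ for klass in matches]
        raise ValueError(
            f"expected exactly one type for {uncertainty_channel!r}, "
            f"found {names}")
    return matches[0]()


dist = from_uncertainty_channel_strict("effective_count")
print(type(dist).__name__)
```

A strict lookup trades convenience for safety: adding a second type with the same channel name fails loudly instead of silently changing which type is returned.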