hiv_workflow.lib.analysis.data_frame_wrapper module

Possible future addition: reading xlsx files laid out in a defined format similar to the csv one.

Currently, all files read via .from_directory() are merged into ONE dataframe.
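As a rough illustration of what "merged into ONE dataframe" could mean in practice, the sketch below reads every csv file in a directory and stacks the rows into a single table. It uses only the standard library rather than the real DataFrameWrapper. The helper name read_directory, the file names, and the choice of row-wise concatenation are all assumptions; the actual merging strategy of .from_directory() is not described here.

```python
import csv
import tempfile
from pathlib import Path


def read_directory(directory, file_type="csv"):
    """Hypothetical stand-in: read all matching files into one list of rows."""
    rows = []
    # Sort the paths so the combined row order is deterministic.
    for path in sorted(Path(directory).glob(f"*.{file_type}")):
        with path.open(newline="") as handle:
            # Assumption: files share a header, so rows can simply be stacked.
            rows.extend(csv.DictReader(handle))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    # Two small, made-up input files.
    Path(tmp, "a.csv").write_text("year,prevalence\n2020,0.1\n")
    Path(tmp, "b.csv").write_text("year,prevalence\n2021,0.2\n")
    combined = read_directory(tmp)

print(len(combined))  # rows from both files end up in one table
```

Because everything lands in one table, files with different columns or overlapping rows would need extra handling that this sketch does not attempt.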

class hiv_workflow.lib.analysis.data_frame_wrapper.DataFrameWrapper(filename=None, dataframe=None, stratifiers=None)

Bases: object

exception UnsupportedFileType

Bases: Exception

exception MissingRequiredData

Bases: Exception

exception InconsistentStratification

Bases: Exception

CSV = 'csv'

property channels

Channels are the non-stratifier columns.
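The relationship between columns, stratifiers, and channels can be sketched in plain Python. The function name non_stratifier_columns and the column names are hypothetical, chosen only for illustration; the real property operates on the wrapped dataframe.

```python
def non_stratifier_columns(columns, stratifiers):
    """Return the columns that are not stratifiers, preserving column order."""
    stratifier_set = set(stratifiers)
    return [column for column in columns if column not in stratifier_set]


# Hypothetical column layout: two stratifiers followed by two data channels.
columns = ["year", "min_age", "prevalence", "incidence"]
channels = non_stratifier_columns(columns, stratifiers=["year", "min_age"])
print(channels)
```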

filter(conditions=None, keep_only=None)

Selects rows from the internal dataframe that satisfy all provided conditions. The stratifiers of the result will exclude current-object stratifiers that contain NaN in the resulting rows.

This method should rarely, if ever, be called with neither keep_only nor conditions specified.

The result is always the minimal set of rows and columns that satisfies the inputs, with no NaN values remaining in the dataframe.

Parameters
  • conditions – an iterable (e.g. a list) of triplets specifying, in order, stratifier, operator, and value, e.g. [('min_age', operator.ge, 25)] to select rows where 'min_age' is >= 25.

  • keep_only – if not None, a list of data channels to keep (in addition to the stratifiers) after filtering. Rows with any NaN values are dropped after trimming to these channels.

Returns

An object of the same type as the one this method is called on, with only the selected rows remaining.
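The condition triplets and the keep_only trimming described above can be imitated on a list of dictionaries. This is a minimal sketch of the documented semantics, not the actual implementation: filter_rows, the column names, and the data are hypothetical, and the step that removes NaN-containing stratifiers from the result is not modelled.

```python
import math
import operator


def is_nan(value):
    """True only for float NaN; other values (ints, strings) are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def filter_rows(rows, stratifiers, conditions=None, keep_only=None):
    """Hypothetical plain-Python imitation of the documented filter behaviour."""
    conditions = conditions or []
    # Keep rows that satisfy every (stratifier, operator, value) triplet.
    selected = [
        row for row in rows
        if all(op(row[name], value) for name, op, value in conditions)
    ]
    if keep_only is not None:
        # Trim to stratifiers plus the requested channels ...
        columns = list(stratifiers) + list(keep_only)
        selected = [{c: row[c] for c in columns} for row in selected]
        # ... then drop rows that still contain a NaN.
        selected = [
            row for row in selected
            if not any(is_nan(v) for v in row.values())
        ]
    return selected


nan = float("nan")
rows = [
    {"min_age": 15, "prevalence": 0.10, "incidence": 0.01},
    {"min_age": 25, "prevalence": 0.20, "incidence": nan},
    {"min_age": 25, "prevalence": nan, "incidence": 0.03},
    {"min_age": 35, "prevalence": 0.30, "incidence": 0.02},
]
result = filter_rows(
    rows,
    stratifiers=["min_age"],
    conditions=[("min_age", operator.ge, 25)],
    keep_only=["prevalence"],
)
print(result)
```

In this example the row whose incidence is NaN survives, because incidence is trimmed away before the NaN check, while the row whose prevalence is NaN is dropped.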

merge(other_dfw, index, keep_only=None)

Attempts to merge two DataFrameWrapper objects into one using the provided index list as a multi-index.

Parameters
  • other_dfw – the DataFrameWrapper object to merge with.

  • index – a list of columns to merge on. All are required in both DataFrameWrapper objects.

  • keep_only – a list of columns. Triggers removal of result rows where NaN appears in any of these specified columns. Result will contain these columns AND those from provided index.

Returns

A newly created, merged object of the same type as self, with stratifiers equal to the provided index.
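The merge behaviour described above can be approximated with dictionaries keyed on the index columns. This is only a sketch under stated assumptions: merge_rows and the sample data are hypothetical, an inner join is assumed (the join type is not stated here), and each index key is assumed to appear at most once on the right-hand side.

```python
import math


def is_nan(value):
    """True only for float NaN."""
    return isinstance(value, float) and math.isnan(value)


def merge_rows(left, right, index, keep_only=None):
    """Hypothetical imitation of merging two tables on a list of index columns."""
    # Look up right-hand rows by their index values (assumes unique keys).
    right_by_key = {tuple(row[c] for c in index): row for row in right}
    merged = []
    for left_row in left:
        key = tuple(left_row[c] for c in index)
        if key in right_by_key:  # assumption: inner join
            merged.append({**left_row, **right_by_key[key]})
    if keep_only is not None:
        # Result holds the index columns plus the keep_only columns,
        # and rows with NaN in any keep_only column are removed.
        columns = list(index) + [c for c in keep_only if c not in index]
        merged = [
            {c: row[c] for c in columns}
            for row in merged
            if not any(is_nan(row[c]) for c in keep_only)
        ]
    return merged


nan = float("nan")
left = [
    {"year": 2020, "node": 1, "prevalence": 0.1},
    {"year": 2021, "node": 1, "prevalence": nan},
]
right = [
    {"year": 2020, "node": 1, "population": 1000},
    {"year": 2021, "node": 1, "population": 1100},
    {"year": 2022, "node": 1, "population": 1200},
]
merged = merge_rows(left, right, index=["year", "node"],
                    keep_only=["prevalence", "population"])
print(merged)
```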

verify_required_items(needed, available=None)

Standard method for checking whether the necessary items/channels are available, printing a meaningful error if not.

Parameters
  • needed – channels to look for.

  • available – channel list to look in.

Returns

Nothing.
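A check of this kind might look like the sketch below. It is written as a free function for illustration, and raising an exception is an assumption: the description only promises a meaningful error, and the local MissingRequiredData class merely stands in for the exception of the same name listed on DataFrameWrapper.

```python
class MissingRequiredData(Exception):
    """Local stand-in for the exception of the same name on DataFrameWrapper."""


def verify_required_items(needed, available):
    """Raise with a readable message if any needed item is not available."""
    missing = [item for item in needed if item not in available]
    if missing:
        # Assumption: the error is surfaced by raising, naming what is missing.
        raise MissingRequiredData(
            f"Missing required items: {missing}; available: {sorted(available)}"
        )


# Passes silently when everything needed is present.
verify_required_items(["prevalence"], ["prevalence", "incidence"])

try:
    verify_required_items(["population"], ["prevalence", "incidence"])
except MissingRequiredData as error:
    print(error)
```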

equals(other_dfw)

classmethod from_directory(directory, file_type=None, stratifiers=None)