Samples#
- class Samples(fname, memory_buffer=True, preload=False)[source]#
Bases: object
Stores CSV outputs and summary dataframes
To construct, use Samples.new(). To read an existing one, use Samples(fname). The sample files are just ZIP archives containing plain-text CSV and TXT files, so they can easily be accessed externally as well.
Attributes
columns – Alias summary dataframe columns
id – Return a dictionary with the identifiers and associated values
identifier – Return tuple identifier for this run
index – Alias summary dataframe index
seeds – Return array of all seeds
zipfile
Methods
- copy()[source]#
Shallow copy - shared cache, copied summary
This allows efficient filtering of seeds within runs by removing rows from the copy’s summary, without reloading or duplicating any of the dataframes in memory
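The shared-cache pattern described above can be sketched as follows (a minimal illustration only; the class and attribute names here are hypothetical stand-ins, not the actual implementation):

```python
import copy
import pandas as pd

class _SamplesSketch:
    """Minimal sketch of a shallow copy with a shared cache (hypothetical)."""
    def __init__(self, summary: pd.DataFrame):
        self.summary = summary  # one row per seed
        self._cache = {}        # seed -> loaded dataframe, shared across copies

    def copy(self):
        new = copy.copy(self)              # shallow copy: _cache dict is shared
        new.summary = self.summary.copy()  # summary dataframe is duplicated
        return new

summary = pd.DataFrame({'seed': [0, 1, 2], 'infections': [10, 20, 30]})
a = _SamplesSketch(summary)
b = a.copy()

# Filter seeds in the copy only; the original summary is untouched
b.summary = b.summary[b.summary['seed'] != 1]

assert len(a.summary) == 3 and len(b.summary) == 2
assert b._cache is a._cache  # cached dataframes are never duplicated
```

Dropping rows from the copy's summary excludes those seeds from iteration while both objects keep reading from the same in-memory cache.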
- preload()[source]#
Load all dataframes into cache
This is done based on the seeds in self.seeds; therefore, if some seeds are removed prior to preloading, their dataframes will not be loaded
- property index#
Alias summary dataframe index
- property columns#
Alias summary dataframe columns
- property identifier#
Return tuple identifier for this run
The identifier is something like ('level_1', 3, 'level_5')
Can be used to identify this result set in the context of sweeping over these identifiers, e.g. the identifier could contain the starting level of restrictions. The dimensionality varies depending on the analysis, hence this returns a tuple of arbitrary length; it is expected that all results being analyzed at the same time have the same set of identifiers. The first two index levels are always 'beta' and 'seed', so these are dropped from the ID.
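The level-dropping behaviour can be illustrated with a small pandas sketch (the index level names beyond 'beta' and 'seed' are hypothetical examples, not the library's actual columns):

```python
import pandas as pd

# Hypothetical summary index: the first two levels are always 'beta' and 'seed'
idx = pd.MultiIndex.from_tuples(
    [(0.1, 0, 'level_1', 3, 'level_5')],
    names=['beta', 'seed', 'start_level', 'incursions', 'end_level'],
)
summary = pd.DataFrame({'infections': [10]}, index=idx)

# Dropping 'beta' and 'seed' leaves the remaining levels as the tuple identifier
identifier = tuple(summary.index.droplevel(['beta', 'seed'])[0])
assert identifier == ('level_1', 3, 'level_5')
```

Because every row of a given result set shares the same identifier levels, any single row suffices to recover the tuple.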
- property id#
Return a dictionary with the identifiers and associated values
For example:
>>> result.identifier
(2.0, 'Gradually escalate restrictions', 0.16, 0.5, '95_70', 1)
>>> result.id
{'beta_multiplier': 2.0, 'strategy': 'Gradually escalate restrictions', 'symp_test': 0.16, 'vac_rel_test': 0.5, 'vac_peak_coverage': '95_70', 'incursions_per_day': 1}
Returns: A dictionary {identifier name: value}
- property seeds#
Return array of all seeds
The seeds are ‘registered’ in the “seed” column of the summary dataframe. Therefore, to discard seeds, the corresponding rows can be dropped from the summary dataframe. In that case, iterating over Samples.seeds will skip the excluded runs. Indexing into the Samples object will also fail to retrieve seeds that have been removed from the summary.
Returns: Array of seed values
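Since the seeds are derived from the summary's "seed" column, discarding a run is just a row drop; a minimal pandas sketch (with made-up summary columns):

```python
import pandas as pd

# Hypothetical summary dataframe with the registered 'seed' column
summary = pd.DataFrame({'seed': [0, 1, 2, 3], 'deaths': [5, 7, 6, 8]})

# Discard seed 2 by dropping its row from the summary
summary = summary[summary['seed'] != 2]

# The seeds array now skips the excluded run
seeds = summary['seed'].to_numpy()
assert seeds.tolist() == [0, 1, 3]
```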
- classmethod new(folder, outputs, identifiers, fname=None)[source]#
- Parameters:
folder – The folder name
outputs – A list of tuples (df:pd.DataFrame, summary_row:dict) where the summary row has an entry ‘seed’ for the seed
identifiers – A list of columns to use as identifiers. These should appear in the summary dataframe and should have the same value for all samples.
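Since the sample files are plain ZIP archives of CSVs, the packaging step can be sketched roughly as below (a simplified illustration, assuming one summary CSV plus one CSV per seed; the real file names and layout inside the archive may differ):

```python
import io
import zipfile
import pandas as pd

# Hypothetical outputs: (df, summary_row) tuples, each row carrying a 'seed'
outputs = [
    (pd.DataFrame({'t': [0, 1], 'infections': [1, 2]}), {'seed': 0, 'beta': 0.1}),
    (pd.DataFrame({'t': [0, 1], 'infections': [1, 3]}), {'seed': 1, 'beta': 0.1}),
]

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    # Summary: one row per run, assembled from the summary_row dicts
    summary = pd.DataFrame([row for _, row in outputs])
    zf.writestr('summary.csv', summary.to_csv(index=False))
    # One plain-text CSV per seed, so the archive stays externally readable
    for df, row in outputs:
        zf.writestr(f"{row['seed']}.csv", df.to_csv(index=False))

with zipfile.ZipFile(buffer) as zf:
    names = sorted(zf.namelist())
print(names)  # ['0.csv', '1.csv', 'summary.csv']
```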
- get(seed)[source]#
Retrieve dataframe and summary row
Use Samples[seed] to read only the dataframe. Use Samples.get(seed) to read both the dataframe and summary row
- items()[source]#
Iterate over seeds and dataframes
Example usage
>>> res = Samples(...)
>>> for seed, (row, df) in res.items():
>>>     ...
- Returns: Tuples of (seed, Samples.get(seed)), where Samples.get(seed) is itself a tuple containing
the summary dataframe row for the requested seed
the corresponding CSV output for that run
- apply(fcn, *args, **kwargs)[source]#
Apply/map function to every dataframe
The function will be applied to every individual dataframe in the collection.
- Parameters:
fcn – A function to apply. It should take a dataframe as its first argument
args – Additional positional arguments for fcn
kwargs – Additional keyword arguments for fcn
Returns: A list with the output of fcn for each dataframe
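The map-over-dataframes behaviour amounts to the following (a minimal sketch using a plain dict in place of the object's internal cache, which is a hypothetical simplification):

```python
import pandas as pd

# Stand-in for the collection of per-seed dataframes
dataframes = {
    0: pd.DataFrame({'infections': [1, 2, 3]}),
    1: pd.DataFrame({'infections': [4, 5, 6]}),
}

def apply(fcn, *args, **kwargs):
    """Apply fcn to every dataframe, returning the outputs as a list."""
    return [fcn(df, *args, **kwargs) for df in dataframes.values()]

# Example: total infections per run
totals = apply(lambda df: int(df['infections'].sum()))
print(totals)  # [6, 15]
```

This is convenient for per-run scalar summaries, since the outputs line up with the order of the stored runs.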