rsvsim.analysis module¶
Additional analysis functions that are not part of the core RSVsim workflow, but which are useful for particular investigations.
-
class
rsvsim.analysis.
Analyzer
(label=None)[source]¶ Bases:
sciris.sc_utils.prettyobj
Base class for analyzers. Based on the Intervention class. Analyzers are used to provide more detailed information about a simulation than is available by default – for example, pulling states out of sim.people on a particular timestep before it gets updated in the next timestep.
To retrieve a particular analyzer from a sim, use sim.get_analyzer().
- Parameters
label (str) – a label for the Analyzer (used for ease of identification)
-
finalize
(sim=None)[source]¶ Finalize analyzer
This method is run once as part of sim.finalize(), enabling the analyzer to perform any final operations after the simulation is complete (e.g. rescaling).
-
apply
(sim)[source]¶ Apply analyzer at each time point. The analyzer has full access to the sim object, and typically stores data/results in itself. This is the core method which each analyzer object needs to implement.
- Parameters
sim – the Sim instance
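The apply/finalize pattern described above can be sketched without the library. In this minimal illustration the Analyzer base class and ToySim are hypothetical stand-ins, not RSVsim's real classes; only the method names apply() and finalize() come from the documentation.

```python
class Analyzer:
    """Hypothetical stand-in for the documented base class."""

    def __init__(self, label=None):
        self.label = label

    def apply(self, sim):
        raise NotImplementedError

    def finalize(self, sim=None):
        pass


class CountInfectious(Analyzer):
    """Record the number of infectious people at every timestep."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.counts = []
        self.peak = None

    def apply(self, sim):
        # Called once per timestep; results are stored on the analyzer itself
        self.counts.append(sum(sim.infectious))

    def finalize(self, sim=None):
        # Called once after the run; a natural place for summary statistics
        self.peak = max(self.counts)


class ToySim:
    """Hypothetical sim: one more person becomes infectious each step."""

    def __init__(self, n_people, n_steps, analyzer):
        self.infectious = [False] * n_people
        self.n_steps = n_steps
        self.analyzer = analyzer

    def run(self):
        for t in range(self.n_steps):
            self.infectious[t] = True
            self.analyzer.apply(self)
        self.analyzer.finalize(self)
        return self


analyzer = CountInfectious(label="infectious count")
ToySim(n_people=5, n_steps=3, analyzer=analyzer).run()
print(analyzer.counts, analyzer.peak)
```

The per-step values live in analyzer.counts, while the summary computed in finalize() lives in analyzer.peak.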
-
shrink
(in_place=False)[source]¶ Remove any excess stored data from the analyzer; for use with sim.shrink().
- Parameters
in_place (bool) – whether to shrink the analyzer in place (else shrink a copy)
-
to_json
()[source]¶ Return JSON-compatible representation
Custom classes can’t be directly represented in JSON. This method is a one-way export to produce a JSON-compatible representation of the analyzer. It attempts to JSONify each attribute of the analyzer, skipping any that fail.
- Returns
JSON-serializable representation
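The "try each attribute, skip failures" behaviour can be illustrated with the standard library alone. This is only a sketch of the idea: the helper to_json_compatible() and the Example class are hypothetical, and the real method may convert values differently.

```python
import json


def to_json_compatible(obj):
    """Return a dict of the attributes of obj that json can serialize."""
    output = {}
    for key, value in vars(obj).items():
        try:
            json.dumps(value)  # Raises TypeError for unsupported types
        except (TypeError, ValueError):
            continue  # Skip attributes that cannot be represented
        output[key] = value
    return output


class Example:
    def __init__(self):
        self.label = "my analyzer"
        self.days = [10, 20]
        self.callback = print  # Functions are not JSON-serializable


result = to_json_compatible(Example())
print(result)
```

Because the export is one-way, the skipped callback attribute cannot be recovered from the result.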
-
class
rsvsim.analysis.
snapshot
(days, *args, die=True, **kwargs)[source]¶ Bases:
rsvsim.analysis.Analyzer
Analyzer that takes a “snapshot” of the sim.people array at specified points in time, and saves them to itself. To retrieve them, you can either access the dictionary directly, or use the get() method.
- Parameters
days (list) – list of ints/strings/date objects, the days on which to take the snapshot
args (list) – additional day(s)
die (bool) – whether or not to raise an exception if a date is not found (default true)
kwargs (dict) – passed to Analyzer()
Example:
    sim = cv.Sim(analyzers=cv.snapshot('2020-04-04', '2020-04-14'))
    sim.run()
    snapshot = sim['analyzers'][0]
    people = snapshot.snapshots[0]            # Option 1
    people = snapshot.snapshots['2020-04-04'] # Option 2
    people = snapshot.get('2020-04-14')       # Option 3
    people = snapshot.get(34)                 # Option 4
    people = snapshot.get()                   # Option 5
-
apply
(sim)[source]¶ Apply analyzer at each time point. The analyzer has full access to the sim object, and typically stores data/results in itself. This is the core method which each analyzer object needs to implement.
- Parameters
sim – the Sim instance
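A snapshot has to be a copy, because the state it records is overwritten on later timesteps. The sketch below shows that idea with a hypothetical ToySnapshot class and a plain dictionary in place of sim.people; returning the first requested day when get() has no argument is an assumption of this sketch.

```python
import copy


class ToySnapshot:
    """Hypothetical analyzer that copies the state on requested days."""

    def __init__(self, days):
        self.days = list(days)
        self.snapshots = {}

    def apply(self, t, people):
        if t in self.days:
            # Deep copy, so later timesteps cannot alter the stored state
            self.snapshots[t] = copy.deepcopy(people)

    def get(self, day=None):
        # With no argument, return the first requested day (an assumption)
        if day is None:
            day = self.days[0]
        return self.snapshots[day]


people = {"infected": [False, False, False]}
snap = ToySnapshot(days=[1])
for t in range(3):
    people["infected"][t] = True  # State is mutated in place each step
    snap.apply(t, people)

print(snap.get(1)["infected"], people["infected"])
```

The stored copy keeps the day-1 state even though the live dictionary has moved on by the end of the loop.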
-
class
rsvsim.analysis.
age_histogram
(days=None, states=None, edges=None, datafile=None, sim=None, die=True, **kwargs)[source]¶ Bases:
rsvsim.analysis.Analyzer
Calculate statistics across age bins, including histogram plotting functionality.
- Parameters
days (list) – list of ints/strings/date objects, the days on which to calculate the histograms (default: last day)
states (list) – which states of people to record (default: exposed, tested, diagnosed, dead)
edges (list) – edges of age bins to use (default: 10 year bins from 0 to 100)
datafile (str) – the name of the data file to load in for comparison, or a dataframe of data (optional)
sim (Sim) – only used if the analyzer is being used after a sim has already been run
die (bool) – whether to raise an exception if dates are not found (default true)
kwargs (dict) – passed to Analyzer()
Examples:
    sim = cv.Sim(analyzers=cv.age_histogram())
    sim.run()
    agehist = sim.get_analyzer()
    agehist = cv.age_histogram(sim=sim) # Alternate method
    agehist.plot()
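To make the role of the edges parameter concrete, here is a small standalone sketch of counting people in a given state by age bin. The function age_counts() and the toy data are hypothetical; treating the last edge as the start of an open-ended bin is an assumption, not confirmed library behaviour.

```python
from bisect import bisect_right


def age_counts(ages, flags, edges):
    """Count flagged people per bin; bin i starts at edges[i], the last bin is open-ended."""
    counts = [0] * len(edges)
    for age, flag in zip(ages, flags):
        if flag:
            bin_index = bisect_right(edges, age) - 1
            if bin_index >= 0:  # Ignore ages below the first edge
                counts[bin_index] += 1
    return counts


edges = list(range(0, 101, 10))  # 0, 10, ..., 100
ages = [3, 9, 10, 45, 67, 99, 100]
exposed = [True, True, True, False, True, True, True]
counts = age_counts(ages, exposed, edges)
print(counts)
```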
-
apply
(sim)[source]¶ Apply analyzer at each time point. The analyzer has full access to the sim object, and typically stores data/results in itself. This is the core method which each analyzer object needs to implement.
- Parameters
sim – the Sim instance
-
finalize
(sim)[source]¶ Finalize analyzer
This method is run once as part of sim.finalize(), enabling the analyzer to perform any final operations after the simulation is complete (e.g. rescaling).
-
plot
(windows=False, width=0.8, color='#F8A493', fig_args=None, axis_args=None, data_args=None)[source]¶ Simple method for plotting the histograms.
- Parameters
windows (bool) – whether to plot windows instead of cumulative counts
width (float) – width of bars
color (hex or rgb) – the color of the bars
fig_args (dict) – passed to pl.figure()
axis_args (dict) – passed to pl.subplots_adjust()
data_args (dict) – ‘width’, ‘color’, and ‘offset’ arguments for the data
-
class
rsvsim.analysis.
daily_age_stats
(states=None, edges=None, **kwargs)[source]¶ Bases:
rsvsim.analysis.Analyzer
Calculate daily counts by age, saving for each day of the simulation. Can plot either time series by age or a histogram over all time.
- Parameters
states (list) – which states of people to record (default: ['diagnoses', 'deaths', 'tests', 'severe'])
edges (list) – edges of age bins to use (default: 10 year bins from 0 to 100)
kwargs (dict) – passed to Analyzer()
Examples:
    sim = cv.Sim(analyzers=cv.daily_age_stats())
    sim.run()
    daily_age = sim.get_analyzer()
    daily_age.plot()
    daily_age.plot(total=True)
-
apply
(sim)[source]¶ Apply analyzer at each time point. The analyzer has full access to the sim object, and typically stores data/results in itself. This is the core method which each analyzer object needs to implement.
- Parameters
sim – the Sim instance
-
plot
(total=False, do_show=None, fig_args=None, axis_args=None, plot_args=None, dateformat='%b-%d', width=0.8, color='#F8A493', data_args=None)[source]¶ Plot the results.
- Parameters
total (bool) – whether to plot the total histograms rather than time series
do_show (bool) – whether to show the plot
fig_args (dict) – passed to pl.figure()
axis_args (dict) – passed to pl.subplots_adjust()
plot_args (dict) – passed to pl.plot()
dateformat (str) – the format to use for the x-axes (only used for time series)
width (float) – width of bars (only used for histograms)
color (hex/rgb) – the color of the bars (only used for histograms)
-
class
rsvsim.analysis.
daily_stats
(days=None, verbose=True, reporter=None, save_inds=False, **kwargs)[source]¶ Bases:
rsvsim.analysis.Analyzer
Print out daily statistics about the simulation. Note that this analyzer takes a considerable amount of time, so it should be used primarily for debugging, not in production code. To keep the analyzer but toggle it off, pass an empty list of days.
To show the stats for a day after a run has finished, use e.g. daily_stats.report('2020-04-04').
- Parameters
days (list) – days on which to print out statistics (if None, assume all)
verbose (bool) – whether to print on each timestep
reporter (func) – if supplied, a custom parser of the stats object into a report (see make_report() function for syntax)
save_inds (bool) – whether to save the indices of every infection at every timestep (also recoverable from the infection log)
Example:
    sim = cv.Sim(analyzers=cv.daily_stats())
    sim.run()
    sim['analyzers'][0].plot()
-
intersect
(*args)[source]¶ Compute the intersection between arrays of indices, handling either keys to precomputed indices or lists of indices. With two array inputs, simply performs np.intersect1d(arr1, arr2).
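The mixed "keys or lists" behaviour can be sketched in plain Python. This is an illustration only: the lookup argument is a hypothetical way of supplying the precomputed indices, and sets are used in place of np.intersect1d, which likewise returns the sorted, unique common values.

```python
def intersect(*args, lookup=None):
    """Intersect index collections; string arguments are keys into lookup."""
    lookup = lookup or {}
    sets = []
    for arg in args:
        inds = lookup[arg] if isinstance(arg, str) else arg
        sets.append(set(inds))
    # Sorted, de-duplicated output, mirroring np.intersect1d
    return sorted(set.intersection(*sets))


precomputed = {"infectious": [1, 4, 5, 8], "diagnosed": [4, 8, 9]}
both = intersect("infectious", "diagnosed", lookup=precomputed)
mixed = intersect("infectious", [8, 5, 5, 2], lookup=precomputed)
print(both, mixed)
```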
-
apply
(sim)[source]¶ Apply analyzer at each time point. The analyzer has full access to the sim object, and typically stores data/results in itself. This is the core method which each analyzer object needs to implement.
- Parameters
sim – the Sim instance
-
transpose
(keys=None)[source]¶ Transpose the data from a list-of-dicts-of-dicts to a dict-of-dicts-of-lists
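As an illustration of the reshaping described here, the following standalone sketch converts a list of per-day nested dictionaries into nested dictionaries of per-day lists. The function and the toy keys are hypothetical and do not reflect the analyzer's actual stored fields.

```python
def transpose(records):
    """Turn [{outer: {inner: value}}, ...] into {outer: {inner: [values]}}."""
    output = {}
    for record in records:
        for outer, inner_dict in record.items():
            for inner, value in inner_dict.items():
                output.setdefault(outer, {}).setdefault(inner, []).append(value)
    return output


daily = [
    {"infectious": {"new": 2, "total": 2}, "diagnosed": {"new": 0, "total": 0}},
    {"infectious": {"new": 3, "total": 5}, "diagnosed": {"new": 1, "total": 1}},
]
by_key = transpose(daily)
print(by_key["infectious"]["total"])
```

The transposed form is convenient for plotting, since each innermost list is already a time series.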
-
plot
(fig_args=None, axis_args=None, plot_args=None, do_show=None)[source]¶ Plot the daily statistics recorded. Some overlap with e.g. sim.plot(to_plot='overview').
- Parameters
fig_args (dict) – passed to pl.figure()
axis_args (dict) – passed to pl.subplots_adjust()
plot_args (dict) – passed to pl.plot()
do_show (bool) – whether to show the plot
-
class
rsvsim.analysis.
nab_histogram
(days=None, edges=None, **kwargs)[source]¶ Bases:
rsvsim.analysis.Analyzer
Store a histogram of the log10(NAb) distribution
- Parameters
days (list) – days on which to calculate the NAb histogram (if None, assume last day)
edges (list) – log10 bin edges for histogram
Example:
    sim = cv.Sim(analyzers=cv.nab_histogram())
    sim.run()
    sim['analyzers'][0].plot()
-
class
rsvsim.analysis.
Fit
(sim, weights=None, keys=None, custom=None, compute=True, verbose=False, die=True, **kwargs)[source]¶ Bases:
rsvsim.analysis.Analyzer
A class for calculating the fit between the model and the data. Note the following terminology is used here:
fit: nonspecific term for how well the model matches the data
difference: the absolute numerical differences between the model and the data (one time series per result)
goodness-of-fit: the result of passing the difference through a statistical function, such as mean squared error
loss: the goodness-of-fit for each result multiplied by user-specified weights (one time series per result)
mismatches: the sum of all the losses (a single scalar value per time series)
mismatch: the sum of the mismatches – this is the value to be minimized during calibration
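The chain of terms above can be traced with a small worked example. This is a sketch under stated assumptions: the data are invented, squared error stands in for the goodness-of-fit function purely for illustration, and the weights follow the documented defaults of 10 for deaths and 5 for diagnoses.

```python
data = {"deaths": [1, 2, 3], "diagnoses": [10, 20, 30]}
model = {"deaths": [1, 2, 4], "diagnoses": [12, 20, 27]}
weights = {"deaths": 10, "diagnoses": 5}

differences, gofs, losses, mismatches = {}, {}, {}, {}
for key in data:
    # difference: absolute model-data gap, one time series per result
    differences[key] = [abs(m - d) for m, d in zip(model[key], data[key])]
    # goodness-of-fit: squared error here (an illustrative choice)
    gofs[key] = [diff ** 2 for diff in differences[key]]
    # loss: goodness-of-fit scaled by the weight for this result
    losses[key] = [weights[key] * g for g in gofs[key]]
    # mismatches: one scalar per result
    mismatches[key] = sum(losses[key])

# mismatch: the single value a calibration would minimize
mismatch = sum(mismatches.values())
print(mismatches, mismatch)
```

Here the deaths series contributes 10 and the diagnoses series contributes 65, for a total mismatch of 75.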
- Parameters
sim (Sim) – the sim object
weights (dict) – the relative weight to place on each result (by default: 10 for deaths, 5 for diagnoses, 1 for everything else)
keys (list) – the keys to use in the calculation
custom (dict) – a custom dictionary of additional data to fit; format is e.g. {'my_output':{'data':[1,2,3], 'sim':[1,2,4], 'weights':2.0}}
compute (bool) – whether to compute the mismatch immediately
verbose (bool) – whether to print detailed output
die (bool) – whether to raise an exception if no data are supplied
kwargs (dict) – passed to cv.compute_gof() – see this function for more detail on goodness-of-fit calculation options
Example:
    sim = cv.Sim(datafile='my-data-file.csv')
    sim.run()
    fit = sim.compute_fit()
    fit.plot()
-
plot
(keys=None, width=0.8, fig_args=None, axis_args=None, plot_args=None, date_args=None, do_show=None, fig=None)[source]¶ Plot the fit of the model to the data. For each result, plot the data and the model; the difference; and the loss (weighted difference). Also plots the loss as a function of time.
- Parameters
keys (list) – which keys to plot (default, all)
width (float) – bar width
fig_args (dict) – passed to pl.figure()
axis_args (dict) – passed to pl.subplots_adjust()
plot_args (dict) – passed to pl.plot()
date_args (dict) – passed to cv.plotting.reset_ticks() (handle date format, rotation, etc.)
do_show (bool) – whether to show the plot
fig (fig) – if supplied, use this figure to plot in
- Returns
Figure object
-
class
rsvsim.analysis.
Calibration
(sim, calib_pars=None, fit_args=None, custom_fn=None, par_samplers=None, n_trials=None, n_workers=None, total_trials=None, name=None, db_name=None, storage=None, label=None, verbose=True)[source]¶ Bases:
rsvsim.analysis.Analyzer
A class to handle calibration of RSVsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org), which must be installed separately (via pip install optuna).
Note: running a calibration does not guarantee a good fit! You must ensure that you run for a sufficient number of iterations, have enough free parameters, and that the parameters have wide enough bounds. Please see the tutorial on calibration for more information.
- Parameters
sim (Sim) – the simulation to calibrate
calib_pars (dict) – a dictionary of the parameters to calibrate of the format dict(key1=[best, low, high])
fit_args (dict) – a dictionary of options that are passed to sim.compute_fit() to calculate the goodness-of-fit
par_samplers (dict) – an optional mapping from parameters to the Optuna sampler to use for choosing new points for each; by default, suggest_uniform
custom_fn (func) – a custom function for modifying the simulation; receives the sim and calib_pars as inputs, should return the modified sim
n_trials (int) – the number of trials per worker
n_workers (int) – the number of parallel workers (default: maximum)
total_trials (int) – if n_trials is not supplied, calculate by dividing this number by n_workers
name (str) – the name of the database (default: ‘rsvsim_calibration’)
db_name (str) – the name of the database file (default: ‘rsvsim_calibration.db’)
storage (str) – the location of the database (default: sqlite)
label (str) – a label for this calibration object
verbose (bool) – whether to print details of the calibration
kwargs (dict) – passed to cv.Calibration()
- Returns
A Calibration object
Example:
    sim = cv.Sim(datafile='data.csv')
    calib_pars = dict(beta=[0.015, 0.010, 0.020])
    calib = cv.Calibration(sim, calib_pars, total_trials=100)
    calib.calibrate()
    calib.plot()
New in version 3.0.3.
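Two of the conventions above, the [best, low, high] format of calib_pars and the relationship between total_trials, n_workers and n_trials, can be checked before starting a long calibration. The helpers below are hypothetical and not part of the library; rounding up when the division is not exact is an assumption.

```python
import math


def check_calib_pars(calib_pars):
    """Verify each entry is [best, low, high] with low <= best <= high."""
    for key, (best, low, high) in calib_pars.items():
        if not low <= best <= high:
            raise ValueError(f"{key}: expected low <= best <= high, got {[best, low, high]}")
    return calib_pars


def trials_per_worker(total_trials, n_workers):
    """Split total_trials across workers, rounding up (an assumption)."""
    return math.ceil(total_trials / n_workers)


calib_pars = check_calib_pars(dict(beta=[0.015, 0.010, 0.020]))
n_trials = trials_per_worker(total_trials=100, n_workers=8)
print(calib_pars, n_trials)
```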
-
calibrate
(calib_pars=None, verbose=True, **kwargs)[source]¶ Actually perform calibration.
- Parameters
calib_pars (dict) – if supplied, overwrite stored calib_pars
verbose (bool) – whether to print output from each trial
kwargs (dict) – if supplied, overwrite stored run_args (n_trials, n_workers, etc.)
-
class
rsvsim.analysis.
TransTree
(sim, to_networkx=False, **kwargs)[source]¶ Bases:
rsvsim.analysis.Analyzer
A class for holding a transmission tree. There are several different representations of the transmission tree: “infection_log” is copied from the people object and is the simplest representation. “detailed” includes additional attributes about the source and target. If NetworkX is installed (required for most methods), “graph” includes an NX representation of the transmission tree.
- Parameters
sim (Sim) – the sim object
to_networkx (bool) – whether to convert the graph to a NetworkX object
Example:
    sim = cv.Sim()
    sim.run()
    tt = sim.make_transtree()
    tt.plot()
    tt.plot_histograms()
New in version 2.1.0: tt.detailed is a dataframe rather than a list of dictionaries; for the latter, use tt.detailed.to_dict('records').
-
count_targets
(start_day=None, end_day=None)[source]¶ Count the number of targets each infected person has. If start and/or end days are given, it will only count the targets of people who got infected between those dates (it does not, however, filter on the date the target got infected).
- Parameters
start_day (int/str) – the day on which to start counting people who got infected
end_day (int/str) – the day on which to stop counting people who got infected
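The filtering rule, restricting by when the source was infected and not by when the target was, is easy to misread, so here is a standalone sketch. The log format (dictionaries with source, target and date keys), the integer-only days and the inclusive end_day are all assumptions made for illustration.

```python
from collections import Counter


def count_targets(infection_log, start_day=None, end_day=None):
    """Count targets per source, filtered on when the source was infected."""
    # Map each person to the day they were infected
    date_infected = {entry["target"]: entry["date"] for entry in infection_log}
    counts = Counter()
    for person, day in date_infected.items():
        if start_day is not None and day < start_day:
            continue
        if end_day is not None and day > end_day:
            continue
        counts[person] = 0  # Include people who infected nobody
    for entry in infection_log:
        if entry["source"] in counts:
            counts[entry["source"]] += 1
    return dict(counts)


log = [
    {"source": None, "target": 0, "date": 0},  # Seed infection
    {"source": 0, "target": 1, "date": 2},
    {"source": 0, "target": 2, "date": 3},
    {"source": 1, "target": 3, "date": 5},
]
all_counts = count_targets(log)
early = count_targets(log, end_day=2)
print(all_counts, early)
```

With end_day=2, person 1 still gets credit for infecting person 3 on day 5, because only the source's own infection date is filtered.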
-
count_transmissions
()[source]¶ Iterable over edges corresponding to transmission events
This excludes edges corresponding to seeded infections without a source
-
make_detailed
(people, reset=False)[source]¶ Construct a detailed transmission tree, with additional information for each person
-
r0
(recovered_only=False)[source]¶ Return average number of transmissions per person
This doesn’t include seed transmissions. By default, it also doesn’t adjust for length of infection: people infected towards the end of the simulation will have fewer transmissions, because their infection may extend past the end of the simulation and transmissions after that point are not counted. If recovered_only=True, downstream transmissions are only included for people who recover before the end of the simulation, thus ensuring they all had the same amount of time to transmit.
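The effect of recovered_only can be seen on a toy infection log. This is a sketch, not the library's implementation: the function name, the log format and the date_recovered mapping (which lists only people who recovered before the simulation ended) are hypothetical.

```python
def mean_transmissions(infection_log, date_recovered=None, recovered_only=False):
    """Average number of onward transmissions per infected person."""
    n_targets = {entry["target"]: 0 for entry in infection_log}
    for entry in infection_log:
        if entry["source"] is not None:  # Seed infections have no source
            n_targets[entry["source"]] += 1
    if recovered_only:
        # Keep only people who recovered before the simulation ended
        recovered = date_recovered or {}
        n_targets = {p: n for p, n in n_targets.items() if p in recovered}
    return sum(n_targets.values()) / len(n_targets)


log = [
    {"source": None, "target": 0, "date": 0},  # Seed infection
    {"source": 0, "target": 1, "date": 2},
    {"source": 0, "target": 2, "date": 3},
    {"source": 1, "target": 3, "date": 5},
]
date_recovered = {0: 6, 1: 9}  # People 2 and 3 had not recovered by the end

r0_all = mean_transmissions(log)
r0_recovered = mean_transmissions(log, date_recovered, recovered_only=True)
print(r0_all, r0_recovered)
```

Restricting to recovered people raises the estimate here, because the two late infections had no time to transmit and pull the unrestricted average down.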
-
plot
(fig_args=None, plot_args=None, do_show=None, fig=None)[source]¶ Plot the transmission tree.
- Parameters
fig_args (dict) – passed to pl.figure()
plot_args (dict) – passed to pl.plot()
do_show (bool) – whether to show the plot
fig (fig) – if supplied, use this figure
-
animate
(*args, **kwargs)[source]¶ Animate the transmission tree.
- Parameters
animate (bool) – whether to animate the plot (otherwise, show when finished)
verbose (bool) – print out progress of each frame
markersize (int) – size of the markers
sus_color (list) – color for susceptibles
fig_args (dict) – arguments passed to pl.figure()
axis_args (dict) – arguments passed to pl.subplots_adjust()
plot_args (dict) – arguments passed to pl.plot()
delay (float) – delay between frames in seconds
colors (list) – color of each person
cmap (str) – colormap for each person (if colors is not supplied)
fig (fig) – if supplied, use this figure
- Returns
the figure object
- Return type
fig
-
plot_histograms
(start_day=None, end_day=None, bins=None, width=0.8, fig_args=None, fig=None)[source]¶ Plots a histogram of the number of transmissions.
- Parameters
start_day (int/str) – the day on which to start counting people who got infected
end_day (int/str) – the day on which to stop counting people who got infected
bins (list) – bin edges to use for the histogram
width (float) – width of bars
fig_args (dict) – passed to pl.figure()
fig (fig) – if supplied, use this figure