hpvsim.misc module
Miscellaneous functions that do not belong anywhere else
- date(obj=None, *args, start_date=None, readformat=None, to='date', as_date=None, outformat=None, **kwargs)
Convert any reasonable object – a string, integer, or datetime object, or list/array of any of those – to a date object (or string, pandas, or numpy date).
If the object is an integer, this is interpreted as follows:
With readformat='posix': treat as a POSIX timestamp, in seconds from 1970
With readformat='ordinal'/'matplotlib': treat as an ordinal number of days from 1970 (Matplotlib default)
With start_date provided: treat as a number of days from this date
Note: in this and other date functions, arguments work either with or without underscores (e.g. start_date or startdate).
- Parameters:
obj (str/int/date/datetime/list/array) – the object to convert; if None, return current date
args (str/int/date/datetime) – additional objects to convert
start_date (str/date/datetime) – the starting date, if an integer is supplied
readformat (str/list) – the format to read the date in; passed to sc.readdate() (NB: can also use “format” instead of “readformat”)
to (str) – the output format: ‘date’ (default), ‘datetime’, ‘str’ (or ‘string’), ‘pandas’, or ‘numpy’
as_date (bool) – alternate method of choosing between output format of ‘date’ (True) or ‘str’ (False); if None, use “to” instead
outformat (str) – the format to output the date in, if returning a string
kwargs (dict) – only used for deprecated argument aliases
- Returns:
either a single date object, or a list of them (matching input data type where possible)
- Return type:
dates (date or list)
Examples:
sc.date('2020-04-05') # Returns datetime.date(2020, 4, 5)
sc.date([35,36,37], start_date='2020-01-01', to='str') # Returns ['2020-02-05', '2020-02-06', '2020-02-07']
sc.date(1923288822, readformat='posix') # Interpret as a POSIX timestamp
New in version 1.0.0.
New in version 1.2.2: “readformat” argument; renamed “dateformat” to “outformat”
New in version 2.0.0: support for np.datetime64 objects
New in version 3.0.0: added “to” argument, and support for pd.Timestamp and np.datetime64 output; allow None
New in version 3.1.0: allow “datetime” output
- day(obj, *args, start_date=None, **kwargs)
Convert a string, date/datetime object, or int to a day (int), the number of days since the start day. See also sc.date() and sc.daydiff(). If a start day is not supplied, it returns the number of days into the current year.
- Parameters:
obj (str, date, int, list, array) – convert any of these objects to a day relative to the start day
args (list) – additional days
start_date (str or date) – the start day; if none is supplied, return days since (supplied year)-01-01.
- Returns:
the day(s) in simulation time (matching input data type where possible)
- Return type:
days (int or list)
Examples:
sc.day(sc.now()) # Returns how many days into the year we are
sc.day(['2021-01-21', '2024-04-04'], start_date='2022-02-22') # Days can be positive or negative
New in version 1.0.0.
New in version 1.2.2: renamed “start_day” to “start_date”
- daydiff(*args)
Convenience function to find the difference between two or more days. With only one argument, calculate days since Jan. 1st.
Examples:
diff = sc.daydiff('2020-03-20', '2020-04-05') # Returns 16
diffs = sc.daydiff('2020-03-20', '2020-04-05', '2020-05-01') # Returns [16, 26]
doy = sc.daydiff('2022-03-20') # Returns 79, the number of days since 2022-01-01
New in version 1.0.0.
New in version 3.0.0: Calculated relative days with one argument
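The multi-argument behaviour of daydiff() amounts to taking differences between consecutive dates. The following standard-library sketch shows that idea; day_differences is a hypothetical name, it accepts ISO-format strings only, and it does not cover the single-argument case.

```python
import datetime as dt

def day_differences(*days):
    """Hypothetical helper: differences in days between consecutive ISO dates."""
    dates = [dt.date.fromisoformat(day) for day in days]
    diffs = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    # Mirror the documented examples: a scalar for two dates, a list for more
    return diffs[0] if len(diffs) == 1 else diffs

print(day_differences('2020-03-20', '2020-04-05'))                # 16
print(day_differences('2020-03-20', '2020-04-05', '2020-05-01'))  # [16, 26]
```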
- date_range(start_date=None, end_date=None, interval=None, inclusive=True, as_date=None, readformat=None, outformat=None, **kwargs)
Return a list of dates from the start date to the end date. To convert a list of days (as integers) to dates, use sc.date() instead.
Note: instead of an end date, can also pass one or more of days, months, weeks, or years, which will be added on to the start date via sc.datedelta().
- Parameters:
start_date (int/str/date) – the starting date, in any format
end_date (int/str/date) – the end date, in any format (see also kwargs below)
interval (int/str/dict) – if an int, the number of days; if ‘week’, ‘month’, or ‘year’, one of those; if a dict, passed to dt.relativedelta()
inclusive (bool) – if True (default), return to end_date inclusive; otherwise, stop the day before
as_date (bool) – if True, return a list of datetime.date objects; else, as input type (e.g. strings; note: you can also use “asdate” instead of “as_date”)
readformat (str) – passed to sc.date()
outformat (str) – passed to sc.date()
kwargs (dict) – optionally, use any valid argument to sc.datedelta() to create the end_date
Examples:
dates1 = sc.daterange('2020-03-01', '2020-04-04')
dates2 = sc.daterange('2020-03-01', '2022-05-01', interval=dict(months=2), asdate=True)
dates3 = sc.daterange('2020-03-01', weeks=5)
New in version 1.0.0.
New in version 1.3.0: “interval” argument
New in version 2.0.0: sc.datedelta() arguments
New in version 3.0.0: preserve input type
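To make the inclusive flag and an integer interval concrete, here is a small standard-library sketch. It is an illustration only: simple_date_range and interval_days are hypothetical names, only day-based intervals are handled, and input and output are ISO strings.

```python
import datetime as dt

def simple_date_range(start, end, interval_days=1, inclusive=True):
    """Hypothetical helper: ISO date strings from start to end in fixed day steps."""
    if interval_days < 1:
        raise ValueError('interval_days must be at least 1')
    current = dt.date.fromisoformat(start)
    stop = dt.date.fromisoformat(end)
    if not inclusive:
        stop -= dt.timedelta(days=1)  # Stop the day before the end date
    step = dt.timedelta(days=interval_days)
    dates = []
    while current <= stop:
        dates.append(current.isoformat())
        current += step
    return dates

print(simple_date_range('2020-03-01', '2020-03-04'))
# ['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04']
print(simple_date_range('2020-03-01', '2020-03-04', inclusive=False))
# ['2020-03-01', '2020-03-02', '2020-03-03']
```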
- load_data(datafile, check_date=False, header='infer', calculate=True, **kwargs)
Load data for comparing to the model output, either from file or from a dataframe. Data is expected to be in wide format, with each row representing a year and columns for each variable by genotype/age/sex.
- Parameters:
datafile (str/df) – if a string, the name of the file to load (either Excel or CSV); if a dataframe, use directly
start_year (int) – first year with data available
kwargs (dict) – passed to pd.read_excel()
- Returns:
pandas dataframe of the loaded data
- Return type:
data (dataframe)
- load(*args, update=True, verbose=True, **kwargs)
Convenience method for sc.loadobj() and equivalent to hpv.Sim.load() or hpv.Scenarios.load().
- Parameters:
filename (str) – file to load
do_migrate (bool) – whether to migrate if loading an old object
update (bool) – whether to modify the object to reflect the new version
verbose (bool) – whether to print migration information
args (list) – passed to sc.loadobj()
kwargs (dict) – passed to sc.loadobj()
- Returns:
Loaded object
Examples:
sim = hpv.load('calib.sim') # Equivalent to hpv.Sim.load('calib.sim')
scens = hpv.load(filename='school-closures.scens', folder='schools')
- save(*args, **kwargs)
Convenience method for sc.saveobj() and equivalent to hpv.Sim.save() or hpv.Scenarios.save().
- Parameters:
filename (str) – file to save to
obj (object) – object to save
args (list) – passed to sc.saveobj()
kwargs (dict) – passed to sc.saveobj()
- Returns:
Filename the object is saved to
Examples:
hpv.save('calib.sim', sim) # Equivalent to sim.save('calib.sim')
hpv.save(filename='school-closures.scens', folder='schools', obj=scens)
- savefig(filename=None, comments=None, fig=None, **kwargs)
Wrapper for Matplotlib’s pl.savefig() function which automatically stores HPVsim metadata in the figure.
By default, saves (git) information from both the HPVsim version and the calling function. Additional comments can be added to the saved file as well. These can be retrieved via hpv.get_png_metadata() (or sciris.sc_plotting.loadmetadata()). Metadata can also be stored for PDF, but cannot be automatically retrieved.
- Parameters:
filename (str/list) – name of the file to save to (default, timestamp); can also be a list of names
comments (str/dict) – additional metadata to save to the figure
fig (fig/list) – figure to save (by default, current one); can also be a list of figures
kwargs (dict) – passed to fig.savefig()
Example:
hpv.Sim().run().plot()
hpv.savefig()
- git_info(filename=None, check=False, comments=None, old_info=None, die=False, indent=2, verbose=True, frame=2, **kwargs)
Get current git information and optionally write it to disk. The simplest usage is hpv.git_info(__file__).
- Parameters:
filename (str) – name of the file to write to or read from
check (bool) – whether or not to compare two git versions
comments (dict) – additional comments to include in the file
old_info (dict) – dictionary of information to check against
die (bool) – whether or not to raise an exception if the check fails
indent (int) – how many indents to use when writing the file to disk
verbose (bool) – detail to print
frame (int) – how many frames back to look for caller info
kwargs (dict) – passed to sc.loadjson() (if check=True) or sc.savejson() (if check=False)
Examples:
hpv.git_info() # Return information
hpv.git_info(__file__) # Writes to disk
hpv.git_info('hpvsim_version.gitinfo') # Writes to disk
hpv.git_info('hpvsim_version.gitinfo', check=True) # Checks that current version matches saved file
- check_version(expected, die=False, verbose=True)
Check that the current HPVsim version matches the expected version. The expected version string may optionally start with ‘>=’ or ‘<=’ (== is implied otherwise), but other operators (e.g. ~=) are not supported. Note that e.g. ‘>’ is interpreted to mean ‘>=’.
- Parameters:
expected (str) – expected version information
die (bool) – whether or not to raise an exception if the check fails
Example:
hpv.check_version('>=1.7.0', die=True) # Will raise an exception if an older version is used
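The operator rules described for check_version() can be sketched in plain Python. This is an illustration of the documented behaviour, not the HPVsim source: parse_version and version_matches are hypothetical names, versions are assumed to be purely numeric (no pre-release suffixes), and treating a bare ‘<’ as ‘<=’ is an assumption by analogy with the documented handling of ‘>’.

```python
def parse_version(text):
    """Turn '1.7.0' into (1, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split('.'))

def version_matches(expected, current):
    """Hypothetical helper applying the documented operator rules."""
    expected = expected.strip()
    if expected.startswith(('>=', '<=', '==')):
        op, version = expected[:2], expected[2:]
    elif expected.startswith(('>', '<')):
        # A bare '>' is read as '>=' (and, by assumption, '<' as '<=')
        op, version = expected[0] + '=', expected[1:]
    else:
        op, version = '==', expected  # '==' is implied
    want, have = parse_version(version), parse_version(current)
    # Pad with zeros so that '1.7' and '1.7.0' compare as equal
    width = max(len(want), len(have))
    want += (0,) * (width - len(want))
    have += (0,) * (width - len(have))
    return {'>=': have >= want, '<=': have <= want, '==': have == want}[op]

print(version_matches('>=1.7.0', '1.8.0'))  # True
print(version_matches('>=1.7.0', '1.6.9'))  # False
print(version_matches('1.7.0', '1.7.0'))    # True
```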
- check_save_version(expected=None, filename=None, die=False, verbose=True, **kwargs)
A convenience function that bundles check_version with git_info and saves automatically to disk from the calling file. The idea is to put this at the top of an analysis script, and commit the resulting file, to keep track of which version of HPVsim was used.
- Parameters:
expected (str) – expected version information
filename (str) – file to save to; if None, guess based on current file name
kwargs (dict) – passed to git_info(), and thence to sc.savejson()
Examples:
hpv.check_save_version()
hpv.check_save_version('1.3.2', filename='script.gitinfo', comments='This is the main analysis script')
hpv.check_save_version('1.7.2', folder='gitinfo', comments={'SynthPops':sc.gitinfo(sp.__file__)})
- get_version_pars(version, verbose=True)
Function for loading parameters from the specified version.
Parameters will be loaded for HPVsim ‘as at’ the requested version, i.e. the most recent set of parameters that is <= the requested version. Available parameter values are stored in the regression folder. If parameters are available for versions 1.3 and 1.4, then this function will return the following:
If parameters for version ‘1.3’ are requested, parameters will be returned from ‘1.3’.
If parameters for version ‘1.3.5’ are requested, parameters will be returned from ‘1.3’, since HPVsim at version 1.3.5 would have been using the parameters defined at version 1.3.
If parameters for version ‘1.4’ are requested, parameters will be returned from ‘1.4’.
- Parameters:
version (str) – the version to load parameters from
- Returns:
Dictionary of parameters from that version
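The ‘as at’ lookup rule for get_version_pars() can be expressed as "pick the largest available version that does not exceed the requested one". The sketch below shows only that selection step; as_at_version and parse_version are hypothetical names, the list of available versions is taken from the example in this entry, and loading the parameter files themselves is not shown.

```python
def parse_version(text):
    """Turn '1.3.5' into (1, 3, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split('.'))

def as_at_version(requested, available):
    """Hypothetical helper: latest available version that is <= requested."""
    target = parse_version(requested)
    candidates = [v for v in available if parse_version(v) <= target]
    if not candidates:
        raise ValueError(f'No parameters available at or before {requested}')
    return max(candidates, key=parse_version)

available = ['1.3', '1.4']  # Versions assumed to have stored parameters
print(as_at_version('1.3', available))    # 1.3
print(as_at_version('1.3.5', available))  # 1.3
print(as_at_version('1.4', available))    # 1.4
```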
- get_png_metadata(filename, output=False)
Read metadata from a PNG file. For use with images saved with hpv.savefig(). Requires pillow, an optional dependency. Metadata retrieval for PDF and SVG is not currently supported.
- Parameters:
filename (str) – the name of the file to load the data from
Example:
hpv.Sim().run(do_plot=True)
hpv.savefig('hpvsim.png')
hpv.get_png_metadata('hpvsim.png')
- get_doubling_time(sim, series=None, interval=None, start_day=None, end_day=None, moving_window=None, exp_approx=False, max_doubling_time=100, eps=0.001, verbose=None)
Alternate method to calculate doubling time (one is already implemented in the sim object).
Examples:
hpv.get_doubling_time(sim, interval=[3,30]) # returns the doubling time over the given interval (single float)
hpv.get_doubling_time(sim, interval=[3,30], moving_window=3) # returns doubling times calculated over moving windows (array)
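This entry does not state the formula used. One common approach, assumed here rather than taken from the HPVsim source, treats growth over the interval as exponential, so the doubling time is the interval length times log(2) divided by the log of the growth ratio. The helper name doubling_time and the synthetic series are hypothetical.

```python
import math

def doubling_time(series, start, end):
    """Hypothetical helper: doubling time under assumed exponential growth."""
    ratio = series[end] / series[start]
    if ratio <= 1:
        return math.inf  # No growth over the interval, so no finite doubling time
    return (end - start) * math.log(2) / math.log(ratio)

# Synthetic series constructed to double every 5 days
cumulative = [100 * 2 ** (day / 5) for day in range(31)]
print(round(doubling_time(cumulative, 3, 30), 6))  # 5.0
```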
- compute_gof(actual, predicted, normalize=True, use_frac=False, use_squared=False, as_scalar='none', eps=1e-09, skestimator=None, estimator=None, **kwargs)
Calculate the goodness of fit. By default, uses normalized absolute error, but this is highly customizable. For example, mean squared error is equivalent to setting normalize=False, use_squared=True, as_scalar='mean'.
- Parameters:
actual (arr) – array of actual (data) points
predicted (arr) – corresponding array of predicted (model) points
normalize (bool) – whether to divide the values by the largest value in either series
use_frac (bool) – convert to fractional mismatches rather than absolute
use_squared (bool) – square the mismatches
as_scalar (str) – return as a scalar instead of a time series: choices are sum, mean, median
eps (float) – to avoid divide-by-zero
skestimator (str) – if provided, use this scikit-learn estimator instead
estimator (func) – if provided, use this custom estimator instead
kwargs (dict) – passed to the scikit-learn or custom estimator
- Returns:
array of goodness-of-fit values, or a single value if as_scalar is ‘sum’, ‘mean’, or ‘median’
- Return type:
gofs (arr)
Examples:
x1 = np.cumsum(np.random.random(100))
x2 = np.cumsum(np.random.random(100))
e1 = compute_gof(x1, x2) # Default, normalized absolute error
e2 = compute_gof(x1, x2, normalize=False, use_frac=True) # Fractional error
e3 = compute_gof(x1, x2, normalize=False, use_squared=True, as_scalar='mean') # Mean squared error
e4 = compute_gof(x1, x2, skestimator='mean_squared_error') # Scikit-learn's MSE method
e5 = compute_gof(x1, x2, as_scalar='median') # Normalized median absolute error -- highly robust
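The way normalize, use_squared, and as_scalar combine can be followed in a small pure-Python sketch. It implements only what the parameter descriptions above state, so details may differ from the actual compute_gof(); simple_gof is a hypothetical name, and use_frac and the estimator options are left out.

```python
from statistics import mean, median

def simple_gof(actual, predicted, normalize=True, use_squared=False, as_scalar='none'):
    """Hypothetical helper following the parameter descriptions above."""
    gofs = [abs(a - p) for a, p in zip(actual, predicted)]
    if normalize:
        # Scale by the largest value in either series, as the description states
        largest = max(max(actual), max(predicted))
        if largest > 0:
            gofs = [g / largest for g in gofs]
    if use_squared:
        gofs = [g ** 2 for g in gofs]
    reducers = {'sum': sum, 'mean': mean, 'median': median}
    return reducers[as_scalar](gofs) if as_scalar in reducers else gofs

actual, predicted = [1, 2, 4], [1, 3, 2]
print(simple_gof(actual, predicted))  # [0.0, 0.25, 0.5]
# Mean squared error: (0 + 1 + 4) / 3
print(simple_gof(actual, predicted, normalize=False, use_squared=True, as_scalar='mean'))
```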
- help(pattern=None, source=False, ignorecase=True, flags=None, context=False, output=False)
Get help on HPVsim in general, or search for a word/expression.
- Parameters:
pattern (str) – the word, phrase, or regex to search for
source (bool) – whether to search source code instead of docstrings for matches
ignorecase (bool) – whether to ignore case (equivalent to flags=re.I)
flags (list) – additional flags to pass to re.findall()
context (bool) – whether to show the line(s) of matches
output (bool) – whether to return the dictionary of matches
Examples:
hpv.help()
hpv.help('vaccine')
hpv.help('contact', ignorecase=False, context=True)
hpv.help('lognormal', source=True, context=True)
New in version 3.1.2.