Analyzers
Analyzers compare simulation output against reference data and return a scalar score (likelihood or error) per simulation. They extend idmtools' IAnalyzer interface.
BaseCalibrationAnalyzer
| from idmtools_calibra.analyzers.base_calibration_analyzer import BaseCalibrationAnalyzer
|
Abstract base class for all calibration analyzers. It extends idmtools' IAnalyzer with calibration-specific fields such as reference_data and weight.
Constructor
| BaseCalibrationAnalyzer(
    uid=None,
    working_dir=None,
    parse=True,
    need_dir_map=False,
    filenames=None,
    reference_data=None,
    weight=1
)
|
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| uid | str | None | Unique identifier for this analyzer instance |
| working_dir | str | None | Working directory for output files |
| parse | bool | True | Whether to parse simulation output files |
| need_dir_map | bool | False | Whether to use a directory map |
| filenames | List[str] | None | Output filenames to retrieve from each simulation |
| reference_data | any | None | Reference/observed data to compare against |
| weight | float | 1 | Relative weight of this analyzer when combining scores across multiple analyzers |
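To illustrate what weight is for, here is a minimal sketch of combining per-sample scores from several analyzers. It assumes the combination is a weighted sum, which this page does not confirm. The helper combine_scores and the analyzer names are hypothetical and are not part of idmtools_calibra.

```python
# Hypothetical helper, NOT part of idmtools_calibra.
# Assumption: scores from several analyzers are combined as a weighted sum.
def combine_scores(scores_by_analyzer, weights):
    """Return {sample_id: weighted sum of scores across analyzers}.

    scores_by_analyzer: {analyzer_name: {sample_id: score}}
    weights:            {analyzer_name: weight}
    """
    total = {}
    for name, scores in scores_by_analyzer.items():
        for sample_id, score in scores.items():
            total[sample_id] = total.get(sample_id, 0.0) + weights[name] * score
    return total


# Toy scores for two samples from two made-up analyzers.
scores = {
    "prevalence": {0: -1.0, 1: -2.0},
    "incidence": {0: -4.0, 1: -1.0},
}
weights = {"prevalence": 1.0, "incidence": 0.5}

combined = combine_scores(scores, weights)
```

Under this assumption, halving the weight of one analyzer halves its contribution to each sample's total, so the other analyzer has more influence on which sample ranks best.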
Methods to implement
Subclasses must implement the standard IAnalyzer interface:
| Method | Signature | Description |
| --- | --- | --- |
| map | (data, item) -> any | Extract and align data for a single simulation |
| reduce | (all_data) -> DataFrame | Aggregate map results across all simulations into a score per sample |
Example
| from idmtools_calibra.analyzers.base_calibration_analyzer import BaseCalibrationAnalyzer
import pandas as pd
import numpy as np


class MyAnalyzer(BaseCalibrationAnalyzer):
    def __init__(self, site, weight=1):
        # Reference data is expected to have 'time' and 'reference' columns.
        self.reference = site.get_reference_data()
        super().__init__(filenames=['output/results.csv'], weight=weight)

    def map(self, data, item):
        # Align one simulation's output with the reference data on 'time'.
        sim_df = data['output/results.csv']
        return sim_df.merge(self.reference, on='time')

    def reduce(self, all_data):
        # Compute one score per simulation from the merged frames.
        scores = {}
        for sim_id, merged in all_data.items():
            rmse = np.sqrt(((merged['model'] - merged['reference']) ** 2).mean())
            scores[sim_id] = -rmse  # negative because higher = better
        return pd.DataFrame({'score': scores})
|
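The map and reduce logic above can be tried without idmtools on toy data. The sketch below assumes the same column names as the example ('time', 'model', 'reference'). The simulation ids and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy reference data; column names follow the example above.
reference = pd.DataFrame({"time": [0, 1, 2], "reference": [1.0, 2.0, 3.0]})

# Two toy simulation outputs keyed by hypothetical simulation ids.
sim_outputs = {
    "sim_a": pd.DataFrame({"time": [0, 1, 2], "model": [1.0, 2.0, 3.0]}),
    "sim_b": pd.DataFrame({"time": [0, 1, 2], "model": [2.0, 3.0, 4.0]}),
}

# "map" step: align each simulation with the reference on the shared column.
merged_by_sim = {
    sim_id: df.merge(reference, on="time") for sim_id, df in sim_outputs.items()
}

# "reduce" step: one negative-RMSE score per simulation (higher = better).
scores = {
    sim_id: -float(np.sqrt(((m["model"] - m["reference"]) ** 2).mean()))
    for sim_id, m in merged_by_sim.items()
}
score_df = pd.DataFrame({"score": scores})
```

Here sim_a matches the reference exactly and sim_b is off by one at every time point, so sim_a gets the higher score.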
RMSEAnalyzer
| from idmtools_calibra.analyzers.rmse_analyzer import RMSEAnalyzer
|
A ready-to-use analyzer that computes the Root Mean Squared Error (RMSE) between a simulation output CSV and reference data. RMSESiteSingleChannel creates one automatically.
Constructor
| RMSEAnalyzer(
    site,
    dependent_column,
    independent_column,
    output_filename='output.csv'
)
|
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| site | CalibSite | required | The calibration site providing reference data (via site.get_reference_data()) |
| dependent_column | str | required | Name of the column to compare (the quantity being calibrated) |
| independent_column | str | required | Name of the column used to align reference and model data (e.g. 'time') |
| output_filename | str | 'output.csv' | Model output filename inside the output/ directory |
Custom cost function
By default, RMSEAnalyzer computes the error with scikit-learn's root_mean_squared_error. To replace it for every RMSEAnalyzer instance, set a custom cost function on the class:
| RMSEAnalyzer.set_custom_cost_fn(my_cost_fn)
# my_cost_fn(series_model, series_reference, series_weights) -> float
|
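As an illustration of the signature above, here is a minimal sketch of a replacement cost function: a weighted mean absolute error. The name weighted_mae is hypothetical. Treating a missing series_weights as equal weights is a defensive assumption, since this page does not say whether weights are always supplied.

```python
import numpy as np


def weighted_mae(series_model, series_reference, series_weights):
    """Weighted mean absolute error (hypothetical replacement cost function)."""
    model = np.asarray(series_model, dtype=float)
    reference = np.asarray(series_reference, dtype=float)
    # Assumption: fall back to equal weights if none are provided.
    weights = None if series_weights is None else np.asarray(series_weights, dtype=float)
    return float(np.average(np.abs(model - reference), weights=weights))


# Registration, using the call shown above (requires idmtools_calibra):
# RMSEAnalyzer.set_custom_cost_fn(weighted_mae)

# Standalone check on toy values.
example = weighted_mae([1.0, 2.0, 4.0], [1.0, 3.0, 2.0], [1.0, 1.0, 2.0])
```

Because the function is set on the class, it replaces the default for every RMSEAnalyzer in the process, not only for one instance.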
Example
| from idmtools_calibra.analyzers.rmse_analyzer import RMSEAnalyzer
from idmtools_calibra.rmse_site import RMSESiteSingleChannel

site = RMSESiteSingleChannel(
    name='my_site',
    reference_sources={'data': 'reference/output.csv'}
)

# RMSEAnalyzer is created automatically by RMSESiteSingleChannel.
# To use it directly:
analyzer = RMSEAnalyzer(
    site=site,
    dependent_column='prevalence',
    independent_column='year',
    output_filename='output.csv',
)
|