History Matching: Manual Workflow¶
This notebook shows how to use history matching in manual mode, where you control each iteration individually. Manual mode lets you:
- Inspect results before committing to the next iteration
- Change feature selection or emulator type between iterations
- Evaluate emulator diagnostics at each step
- Gain a detailed understanding of what the algorithm is doing
For the fully automated one-call workflow, see 01_basic_workflow.ipynb.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import historymatching as hm
from model import SIR, generate_observed_data
%matplotlib inline
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1780533729.336724 1035844 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. I0000 00:00:1780533729.445022 1035844 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1780533730.802356 1035844 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
/software/conda/envs/hm/lib/python3.13/site-packages/gpflow/versions.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. import pkg_resources
Generate Synthetic Observed Data¶
We define true parameter values and generate synthetic incidence data that we will try to recover through history matching.
beta_true = 1.3
gamma_true = 0.5
population_size = 10_000
n_seed_infections = 100
incidence_obs, true_model = generate_observed_data(
beta_true=beta_true,
gamma_true=gamma_true,
population_size=population_size,
n_seed_infections=n_seed_infections,
seed=0,
)
data = incidence_obs.to_frame() # DataFrame with column 'incidence', index 'day'
data.head()
| incidence | |
|---|---|
| day | |
| 0 | 0.0 |
| 1 | 177.0 |
| 2 | 415.0 |
| 3 | 762.0 |
| 4 | 1246.0 |
data.plot(style="o")
plt.title("Observed daily incidence")
plt.ylabel("New cases")
plt.xlabel("Day")
plt.grid(linestyle=":")
plt.show()
Define Parameter Space and Observations¶
parameter_bounds = {
"beta": (0.3, 3.0), # narrower than 01 — avoids extinction / burn-through
"gamma": (0.1, 1.0),
}
# One observation per day of the incidence time series
observations_dict = {
f"incidence_{day}": (value, 10.0)
for day, value in enumerate(data["incidence"])
}
print(f"Parameter space: {parameter_bounds}")
print(f"Observations: {len(observations_dict)} features (one per day)")
Parameter space: {'beta': (0.3, 3.0), 'gamma': (0.1, 1.0)}
Observations: 20 features (one per day)
Define Simulation Function¶
The simulation function must accept a DataFrame of parameter samples and return a DataFrame of outputs. Column names must match the observation keys.
def sir_model_wrapper(points: pd.DataFrame) -> pd.DataFrame:
"""Run SIR for each row and return incidence time series."""
results = []
for parameters in points.itertuples():
model = SIR(
beta=parameters.beta,
gamma=parameters.gamma,
s0=population_size - n_seed_infections,
i0=n_seed_infections,
)
results.append(pd.Series(model.get_incidence()))
return pd.concat(results, axis=1).T.add_prefix("incidence_").reset_index(drop=True)
# Test with a few random points to confirm the wrapper works
np.random.seed(42)
test_points = pd.DataFrame({
"beta": np.random.uniform(0.1, 5.0, 5),
"gamma": np.random.uniform(0.1, 1.0, 5),
})
test_results = sir_model_wrapper(test_points)
print(f"Output shape: {test_results.shape} (5 runs x {test_results.shape[1]} days)")
Output shape: (5, 20) (5 runs x 20 days)
Configure the History Matching Engine¶
We use the builder to configure the engine with a manually chosen feature for the first iteration. Feature selection can be updated before each subsequent iteration.
engine = hm.HistoryMatching(
bounds=parameter_bounds,
observations=observations_dict,
function=sir_model_wrapper,
sampling_strategy="lhs",
emulator_type="gpr",
n_samples=500,
max_iterations=3,
implausibility_threshold=3.0,
random_seed=42,
)
print(f"Engine ready.")
print(f" Samples per iteration: {engine.n_samples}")
Engine ready. Samples per iteration: 500
Iteration 1¶
Run the first iteration, inspect the results, then commit.
engine.feature_selection = ["incidence_11"]
print("Running iteration 1...")
result_1 = engine.step()
print(f" Samples generated: {len(result_1.samples)}")
print(f" NROY fraction: {result_1.nroy_fraction:.1%}")
print(f" Outputs emulated: {result_1.emulated_outputs}")
print(f" Emulators trained: {len(result_1.emulators)}")
Running iteration 1...
I0000 00:00:1780533734.775735 1035844 gpu_device.cc:2043] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2175 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
I0000 00:00:1780533740.927591 1036810 cuda_solvers.cc:175] Creating GpuSolver handles for stream 0x5701e31fe940
Samples generated: 500 NROY fraction: 11.6% Outputs emulated: ['incidence_11'] Emulators trained: 1
Inspect Emulator Performance¶
emulator_1 = result_1.get_emulator("incidence_11")
print(f"Emulator type: {type(emulator_1).__name__}")
emulator_1.info()
Emulator type: GPR
... General information:
Number of parameters = 2
Number of samples (total) = 500
Number of training samples = 375
Number of testing samples = 125
... Emulator configuration:
model description:
╒═════════════════════════╤═══════════╤═════════════╤═════════╤═════════════╤═════════╤═════════╤═════════════════════════╕
│ name │ class │ transform │ prior │ trainable │ shape │ dtype │ value │
╞═════════════════════════╪═══════════╪═════════════╪═════════╪═════════════╪═════════╪═════════╪═════════════════════════╡
│ GPR.mean_function.c │ Parameter │ Identity │ │ True │ (1,) │ float64 │ [-0.10196192] │
├─────────────────────────┼───────────┼─────────────┼─────────┼─────────────┼─────────┼─────────┼─────────────────────────┤
│ GPR.kernel.variance │ Parameter │ Softplus │ │ True │ () │ float64 │ 0.9141858201458425 │
├─────────────────────────┼───────────┼─────────────┼─────────┼─────────────┼─────────┼─────────┼─────────────────────────┤
│ GPR.kernel.lengthscales │ Parameter │ Softplus │ │ True │ (2,) │ float64 │ [0.07853018 0.25184734] │
├─────────────────────────┼───────────┼─────────────┼─────────┼─────────────┼─────────┼─────────┼─────────────────────────┤
│ GPR.likelihood.variance │ Parameter │ Softplus │ │ True │ () │ float64 │ 0.01874381345845609 │
╘═════════════════════════╧═══════════╧═════════════╧═════════╧═════════════╧═════════╧═════════╧═════════════════════════╛
optimization logs:
message: CONVERGENCE: RELATIVE REDUCTION OF F <= FACTR*EPSMCH
success: True
status: 0
fun: -58.16664669341981
x: [-2.505e+00 -1.250e+00 4.020e-01 -3.968e+00 -1.020e-01]
nit: 21
jac: [-9.377e-05 -4.812e-05 4.169e-05 -4.363e-05 -5.532e-06]
nfev: 29
njev: 29
hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
This emulator has not been tested yet
emulator_1.test()
emulator_1.plot_diagnostics()
Inspect Parameter Space After Iteration 1¶
samples_1 = result_1.samples
pending_2 = engine.get_pending_next_samples()
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
ax = axes[0]
ax.scatter(samples_1["beta"], samples_1["gamma"], alpha=0.6, s=20, color="gray", label="Iteration 1 samples")
ax.scatter(pending_2["beta"], pending_2["gamma"], alpha=0.6, s=20, color="green", label="Pending for iteration 2")
ax.axvline(beta_true, color="red", linestyle="--", linewidth=2, label=f"True beta={beta_true}")
ax.axhline(gamma_true, color="red", linestyle="--", linewidth=2, label=f"True gamma={gamma_true}")
ax.set_xlabel("beta")
ax.set_ylabel("gamma")
ax.set_title("Parameter space after iteration 1")
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)
ax = axes[1]
sims = sir_model_wrapper(samples_1)
days = np.arange(sims.shape[1])
obs_vals = data["incidence"].values
# Plot each sim as a thin gray line (no legend entries)
for row in sims.values:
ax.plot(days, row, color="gray", alpha=0.15, linewidth=0.8)
ax.plot(days, obs_vals, "ro-", linewidth=2, markersize=4, label="Observed")
# Highlight the emulated time-point(s) for this wave
emulated_days = [int(f.split("_")[1]) for f in result_1.emulated_outputs]
ax.scatter(emulated_days, obs_vals[emulated_days], s=100, zorder=5,
facecolors="none", edgecolors="red", linewidths=2, label="Emulated day(s)")
ax.set_xlabel("Day")
ax.set_ylabel("Incidence")
ax.set_title("Simulated trajectories vs observed")
ax.legend(fontsize=8)
ax.grid(linestyle=":")
plt.tight_layout()
plt.show()
print(f"Pending samples for iteration 2: {len(pending_2)}")
print(f" beta range: [{pending_2['beta'].min():.3f}, {pending_2['beta'].max():.3f}]")
print(f" gamma range: [{pending_2['gamma'].min():.3f}, {pending_2['gamma'].max():.3f}]")
Pending samples for iteration 2: 500 beta range: [0.303, 1.814] gamma range: [0.107, 0.998]
# Accept the first iteration
engine.commit_step()
print(f"Iteration 1 committed. Current iteration: {engine.current_iteration}")
INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204213/checkpoint.pkl
INFO:historymatching.engine:Wave 1 output saved to hm_output/run_20260603_204213/wave1
INFO:historymatching.engine:[Wave 1] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204213
INFO:historymatching.engine:============================================================
Iteration 1 committed. Current iteration: 1
Iteration 2¶
Switch to an earlier time point — day 3 is on the rising limb of the epidemic, so it should constrain different parameter combinations than day 11 (descending limb).
engine.feature_selection = ["incidence_3"]
print("Feature selection updated to ['incidence_3']")
print("Running iteration 2...")
result_2 = engine.step()
print(f" Samples generated: {len(result_2.samples)}")
print(f" NROY fraction: {result_2.nroy_fraction:.1%}")
print(f" Acceptance rate: {engine.acceptance_rate:.3f}")
print(f" Outputs emulated: {result_2.emulated_outputs}")
INFO:historymatching.engine:Feature selection: Manual Selection (1 features)
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:WAVE 2 STARTING
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:[Wave 2] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 1 [0.0s]
INFO:historymatching.engine:[Wave 2] Phase 2/5 SIMULATION: running 500 simulations...
Feature selection updated to ['incidence_3'] Running iteration 2...
INFO:historymatching.engine:[Wave 2] Phase 2/5 SIMULATION: complete — 20 outputs [0.8s]
INFO:historymatching.engine:[Wave 2] Phase 3/5 FEATURE SELECTION: ['incidence_3'] [0.0s]
INFO:historymatching.engine:[Wave 2] Phase 4/5 EMULATOR TRAINING: training 1 emulators...
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_3]: creating (500 samples, 2 params, type=gpr)...
DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: incidence_3
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_3]: training...
INFO:historymatching.emulators.gpr: Training GPR: 375 points, 2 dims
INFO:historymatching.emulators.gpr: Building GPflow model and optimizing hyperparameters...
INFO:historymatching.emulators.gpr: GPR training complete — noise=0.0615, lengthscale range=[0.857, 1.695] [1.1s]
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_3]: trained [1.1s]
INFO:historymatching.engine:[Wave 2] Phase 4/5 EMULATOR TRAINING: complete [1.1s]
INFO:historymatching.engine:[Wave 2] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...
INFO:historymatching.nroy_sampling: LHS rejection: 0/500 (0%) — starting
INFO:historymatching.nroy_sampling: LHS rejection: 38/500 (7%) | 550 tested | rate=6.9091% [0s]
INFO:historymatching.nroy_sampling: LHS rejection: 604/500 (120%) | 7,905 tested | rate=7.6407% [0s]
INFO:historymatching.nroy_sampling: LHS rejection: 500/500 [0.5s]
INFO:historymatching.nroy_sampling: NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [0.5s]
DEBUG:historymatching.engine:NROY fraction: 0.063251
INFO:historymatching.engine:[Wave 2] Phase 5/5 NROY SAMPLING: found 500 candidates [0.5s]
INFO:historymatching.engine:[Wave 2] ALL PHASES COMPLETE [2.4s total]. Committing...
Samples generated: 500 NROY fraction: 6.3% Acceptance rate: 0.118 Outputs emulated: ['incidence_3']
Emulator Diagnostics for Iteration 2¶
emulator_2 = result_2.get_emulator("incidence_3")
emulator_2.test()
# Show GPflow model summary — check likelihood.variance for overconfidence
# (near-zero noise variance → CIs collapse → many "failed" predictions)
emulator_2.info()
emulator_2.plot_diagnostics()
... General information:
Number of parameters = 2
Number of samples (total) = 500
Number of training samples = 375
Number of testing samples = 125
... Emulator configuration:
model description:
╒═════════════════════════╤═══════════╤═════════════╤═════════╤═════════════╤═════════╤═════════╤═════════════════════════╕
│ name │ class │ transform │ prior │ trainable │ shape │ dtype │ value │
╞═════════════════════════╪═══════════╪═════════════╪═════════╪═════════════╪═════════╪═════════╪═════════════════════════╡
│ GPR.mean_function.c │ Parameter │ Identity │ │ True │ (1,) │ float64 │ [7.25208] │
├─────────────────────────┼───────────┼─────────────┼─────────┼─────────────┼─────────┼─────────┼─────────────────────────┤
│ GPR.kernel.variance │ Parameter │ Softplus │ │ True │ () │ float64 │ 60.92584 │
├─────────────────────────┼───────────┼─────────────┼─────────┼─────────────┼─────────┼─────────┼─────────────────────────┤
│ GPR.kernel.lengthscales │ Parameter │ Softplus │ │ True │ (2,) │ float64 │ [0.85660914 1.6946 ] │
├─────────────────────────┼───────────┼─────────────┼─────────┼─────────────┼─────────┼─────────┼─────────────────────────┤
│ GPR.likelihood.variance │ Parameter │ Softplus │ │ True │ () │ float64 │ 0.061463321453403774 │
╘═════════════════════════╧═══════════╧═════════════╧═════════╧═════════════╧═════════╧═════════╧═════════════════════════╛
optimization logs:
message: CONVERGENCE: RELATIVE REDUCTION OF F <= FACTR*EPSMCH
success: True
status: 0
fun: 36.20717402944217
x: [ 3.039e-01 1.492e+00 6.093e+01 -2.758e+00 7.252e+00]
nit: 38
jac: [ 4.797e-07 1.271e-06 -4.230e-08 1.779e-05 3.117e-08]
nfev: 44
njev: 44
hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
... Performance results:
MSE = 5891.911769132801
L-inf = 277.58583394692494
L-1 = 57.64705154488413
R2 = 0.9413891248720488
AIC = 1089.166975347683
BIC = 1104.4802302968924
Parameter Space After Iteration 2¶
# result_2.samples = training data for iter 2 (passed iter 1 only)
# pending_3 = samples that passed BOTH iter 1 and iter 2 emulators
pending_3 = engine.get_pending_next_samples()
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
ax = axes[0]
ax.scatter(result_2.samples["beta"], result_2.samples["gamma"],
alpha=0.3, s=15, color="lightgray", label="Iter 2 training (passed iter 1)")
ax.scatter(pending_3["beta"], pending_3["gamma"],
alpha=0.7, s=20, color="steelblue", label="NROY after iter 2")
ax.axvline(beta_true, color="red", linestyle="--", linewidth=2, label=f"True beta={beta_true}")
ax.axhline(gamma_true, color="red", linestyle="--", linewidth=2, label=f"True gamma={gamma_true}")
ax.set_xlabel("beta")
ax.set_ylabel("gamma")
ax.set_title("Parameter space after iteration 2")
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)
ax = axes[1]
sims_nroy = sir_model_wrapper(pending_3)
days = np.arange(sims_nroy.shape[1])
obs_vals = data["incidence"].values
for row in sims_nroy.values:
ax.plot(days, row, color="steelblue", alpha=0.15, linewidth=0.8)
ax.plot(days, obs_vals, "ro-", linewidth=2, markersize=4, label="Observed")
# Highlight all emulated time-points so far (iterations 1 + 2)
all_emulated = set()
for r in [result_1, result_2]:
all_emulated.update(int(f.split("_")[1]) for f in r.emulated_outputs)
all_emulated = sorted(all_emulated)
ax.scatter(all_emulated, obs_vals[all_emulated], s=100, zorder=5,
facecolors="none", edgecolors="red", linewidths=2, label="Emulated day(s)")
ax.set_xlabel("Day")
ax.set_ylabel("Incidence")
ax.set_title("NROY trajectories after 2 iterations")
ax.legend(fontsize=8)
ax.grid(linestyle=":")
plt.tight_layout()
plt.show()
print(f"NROY samples after iteration 2: {len(pending_3)}")
print(f" beta range: [{pending_3['beta'].min():.3f}, {pending_3['beta'].max():.3f}]")
print(f" gamma range: [{pending_3['gamma'].min():.3f}, {pending_3['gamma'].max():.3f}]")
beta_ok = pending_3["beta"].min() <= beta_true <= pending_3["beta"].max()
gamma_ok = pending_3["gamma"].min() <= gamma_true <= pending_3["gamma"].max()
print(f"True beta in plausible region: {'Yes' if beta_ok else 'No'}")
print(f"True gamma in plausible region: {'Yes' if gamma_ok else 'No'}")
NROY samples after iteration 2: 500 beta range: [0.895, 1.797] gamma range: [0.101, 1.000] True beta in plausible region: Yes True gamma in plausible region: Yes
engine.commit_step()
print(f"Iteration 2 committed. Total iterations: {engine.current_iteration}")
INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204213/checkpoint.pkl
INFO:historymatching.engine:Wave 2 output saved to hm_output/run_20260603_204213/wave2
INFO:historymatching.engine:[Wave 2] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204213
INFO:historymatching.engine:============================================================
Iteration 2 committed. Total iterations: 2
Iteration 3: Multiple Emulators Per Wave¶
A single wave can train one emulator per output — the engine loops over the output list and produces a separate emulator for each one. Implausibility is then computed per-emulator and the maximum across all outputs is used to filter samples, so a point must be consistent with every constraint to survive.
Assign a list of output names to engine.feature_selection to activate this. Here we emulate three time-points simultaneously:
engine.feature_selection = ["incidence_3", "incidence_6", "incidence_11"]
print("Running iteration 3 with 3 simultaneous emulators...")
result_3 = engine.step()
print(f" Samples generated: {len(result_3.samples)}")
print(f" NROY fraction: {result_3.nroy_fraction:.1%}")
print(f" Outputs emulated: {result_3.emulated_outputs}")
print(f" Emulators trained: {len(result_3.emulators)}")
# Save training data for offline debugging
result_3.samples.to_csv("/tmp/wave3_samples.csv", index=False)
result_3.simulation_results.to_csv("/tmp/wave3_sim_results.csv", index=False)
print("\nSaved /tmp/wave3_samples.csv and /tmp/wave3_sim_results.csv")
INFO:historymatching.engine:Feature selection: Manual Selection (3 features)
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:WAVE 3 STARTING
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:[Wave 3] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 2 [0.0s]
INFO:historymatching.engine:[Wave 3] Phase 2/5 SIMULATION: running 500 simulations...
Running iteration 3 with 3 simultaneous emulators...
INFO:historymatching.engine:[Wave 3] Phase 2/5 SIMULATION: complete — 20 outputs [0.7s]
INFO:historymatching.engine:[Wave 3] Phase 3/5 FEATURE SELECTION: ['incidence_3', 'incidence_6', 'incidence_11'] [0.0s]
INFO:historymatching.engine:[Wave 3] Phase 4/5 EMULATOR TRAINING: training 3 emulators...
INFO:historymatching.emulators.factory: Emulator 1/3 [incidence_3]: creating (500 samples, 2 params, type=gpr)...
DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: incidence_3
INFO:historymatching.emulators.factory: Emulator 1/3 [incidence_3]: training...
INFO:historymatching.emulators.gpr: Training GPR: 375 points, 2 dims
INFO:historymatching.emulators.gpr: Building GPflow model and optimizing hyperparameters...
INFO:historymatching.emulators.gpr: GPR training complete — noise=0.3555, lengthscale range=[2.383, 2.725] [1.3s]
INFO:historymatching.emulators.factory: Emulator 1/3 [incidence_3]: trained [1.3s]
INFO:historymatching.emulators.factory: Emulator 2/3 [incidence_6]: creating (500 samples, 2 params, type=gpr)...
DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: incidence_6
INFO:historymatching.emulators.factory: Emulator 2/3 [incidence_6]: training...
INFO:historymatching.emulators.gpr: Training GPR: 375 points, 2 dims
INFO:historymatching.emulators.gpr: Building GPflow model and optimizing hyperparameters...
INFO:historymatching.emulators.gpr: GPR training complete — noise=0.0646, lengthscale range=[0.799, 0.935] [1.1s]
INFO:historymatching.emulators.factory: Emulator 2/3 [incidence_6]: trained [1.1s]
INFO:historymatching.emulators.factory: Emulator 3/3 [incidence_11]: creating (500 samples, 2 params, type=gpr)...
DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: incidence_11
INFO:historymatching.emulators.factory: Emulator 3/3 [incidence_11]: training...
INFO:historymatching.emulators.gpr: Training GPR: 375 points, 2 dims
INFO:historymatching.emulators.gpr: Building GPflow model and optimizing hyperparameters...
INFO:historymatching.emulators.gpr: GPR training complete — noise=0.2854, lengthscale range=[0.885, 0.956] [1.0s]
INFO:historymatching.emulators.factory: Emulator 3/3 [incidence_11]: trained [1.0s]
INFO:historymatching.engine:[Wave 3] Phase 4/5 EMULATOR TRAINING: complete [3.3s]
INFO:historymatching.engine:[Wave 3] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...
INFO:historymatching.nroy_sampling: LHS rejection: 0/500 (0%) — starting
INFO:historymatching.nroy_sampling: LHS rejection: 20/500 (4%) | 550 tested | rate=3.6364% [0s]
INFO:historymatching.nroy_sampling: LHS rejection: 540/500 (108%) | 15,070 tested | rate=3.5833% [1s]
INFO:historymatching.nroy_sampling: LHS rejection: 500/500 [1.3s]
INFO:historymatching.nroy_sampling: NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [1.3s]
DEBUG:historymatching.engine:NROY fraction: 0.033179
INFO:historymatching.engine:[Wave 3] Phase 5/5 NROY SAMPLING: found 500 candidates [1.3s]
INFO:historymatching.engine:[Wave 3] ALL PHASES COMPLETE [5.3s total]. Committing...
Samples generated: 500 NROY fraction: 3.3% Outputs emulated: ['incidence_3', 'incidence_6', 'incidence_11'] Emulators trained: 3 Saved /tmp/wave3_samples.csv and /tmp/wave3_sim_results.csv
Inspect each emulator's quality¶
Iterate over result.emulators (a {feature: emulator} dict) to check fit quality
for every feature before deciding what to keep:
metrics_3 = result_3.get_emulator_quality_metrics()
print(f"{'Output':<20} {'R²':>6} {'MSE':>10} {'n_train':>8}")
print("-" * 50)
for feature in result_3.emulated_outputs:
m = metrics_3[feature]
r2 = m.get("r2", float("nan"))
mse = m.get("mse", float("nan"))
n = m.get("n_train", "?")
flag = " ← poor fit" if r2 < 0.7 else ""
print(f"{feature:<20} {r2:>6.3f} {mse:>10.4g} {n:>8}{flag}")
Output R² MSE n_train -------------------------------------------------- incidence_3 0.612 9474 375 ← poor fit incidence_6 0.935 5107 375 incidence_11 0.761 748.8 375
for feature in result_3.emulated_outputs:
r2 = metrics_3[feature].get("r2", float("nan"))
print(f"\n── {feature} (R²={r2:.3f}) ──")
em = result_3.get_emulator(feature)
em.test()
em.plot_diagnostics()
── incidence_3 (R²=0.612) ──
── incidence_6 (R²=0.935) ──
── incidence_11 (R²=0.761) ──
Drop a poor emulator before committing¶
If one emulator has a low R² it will add noise to the implausibility calculation —
potentially excluding valid parameter regions. Call engine.drop_emulator_from_pending()
to remove it from the pending wave without discarding the other emulators or re-running
any simulations. The remaining emulators are committed as normal.
API note:
drop_emulator_from_pending()must be called afterstep()but beforecommit_step(). Calling it after committing raises aRuntimeError.
R2_THRESHOLD = 0.7
dropped = []
for feature in result_3.emulated_outputs:
r2 = metrics_3[feature].get("r2", 1.0)
if r2 < R2_THRESHOLD:
engine.drop_emulator_from_pending(feature)
dropped.append(feature)
print(f"Dropped '{feature}' (R²={r2:.3f} < {R2_THRESHOLD})")
if not dropped:
print("All emulators meet the R² threshold — keeping all three.")
# Capture NROY samples BEFORE committing — after commit, pending is cleared
pending_4 = engine.get_pending_next_samples()
engine.commit_step()
print(f"\nIteration 3 committed with {len(result_3.emulated_outputs) - len(dropped)} emulators.")
print(f"Total committed iterations: {engine.current_iteration}")
INFO:historymatching.engine:Emulator for 'incidence_3' removed from pending iteration 3.
Dropped 'incidence_3' (R²=0.612 < 0.7)
INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204213/checkpoint.pkl
INFO:historymatching.engine:Wave 3 output saved to hm_output/run_20260603_204213/wave3
INFO:historymatching.engine:[Wave 3] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204213
INFO:historymatching.engine:============================================================
Iteration 3 committed with 2 emulators. Total committed iterations: 3
Parameter Space and Trajectories After Iteration 3¶
# pending_4 = samples that passed ALL 3 waves' emulators (captured before commit)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Left: parameter space — show progression
ax = axes[0]
ax.scatter(result_2.samples["beta"], result_2.samples["gamma"],
alpha=0.3, s=10, color="lightgray", label="Iter 3 training (passed iters 1+2)")
ax.scatter(pending_4["beta"], pending_4["gamma"],
alpha=0.7, s=20, color="steelblue", label="NROY after iter 3")
ax.axvline(beta_true, color="red", linestyle="--", linewidth=2, label=f"True beta={beta_true}")
ax.axhline(gamma_true, color="red", linestyle="--", linewidth=2, label=f"True gamma={gamma_true}")
ax.set_xlabel("beta")
ax.set_ylabel("gamma")
ax.set_title("Parameter space after iteration 3")
ax.legend(fontsize=7)
ax.grid(True, alpha=0.3)
# Right: trajectories from NROY samples
ax = axes[1]
sims_final = sir_model_wrapper(pending_4)
days = np.arange(sims_final.shape[1])
obs_vals = data["incidence"].values
for row in sims_final.values:
ax.plot(days, row, color="steelblue", alpha=0.15, linewidth=0.8)
ax.plot(days, obs_vals, "ro-", linewidth=2, markersize=4, label="Observed")
# Highlight all emulated time-points across all 3 iterations
all_emulated = set()
for r in [result_1, result_2, result_3]:
all_emulated.update(int(f.split("_")[1]) for f in r.emulated_outputs)
all_emulated = sorted(all_emulated)
ax.scatter(all_emulated, obs_vals[all_emulated], s=100, zorder=5,
facecolors="none", edgecolors="red", linewidths=2, label="Emulated day(s)")
ax.set_xlabel("Day")
ax.set_ylabel("Incidence")
ax.set_title("NROY trajectories after 3 iterations")
ax.legend(fontsize=8)
ax.grid(linestyle=":")
plt.tight_layout()
plt.show()
print(f"NROY samples after iteration 3: {len(pending_4)}")
print(f" beta range: [{pending_4['beta'].min():.3f}, {pending_4['beta'].max():.3f}]")
print(f" gamma range: [{pending_4['gamma'].min():.3f}, {pending_4['gamma'].max():.3f}]")
NROY samples after iteration 3: 500 beta range: [1.067, 1.639] gamma range: [0.286, 0.798]
Final Summary¶
print("FINAL HISTORY MATCHING RESULTS")
print("=" * 50)
print(f"True parameters: beta={beta_true}, gamma={gamma_true}, R0={beta_true/gamma_true:.2f}")
print()
# pending_4 = NROY after all 3 waves (captured before final commit)
print(f"Plausible ranges after {engine.current_iteration} iterations:")
print(f" beta: [{pending_4['beta'].min():.3f}, {pending_4['beta'].max():.3f}]")
print(f" gamma: [{pending_4['gamma'].min():.3f}, {pending_4['gamma'].max():.3f}]")
R0_final = pending_4["beta"] / pending_4["gamma"]
print(f" R0: [{R0_final.min():.2f}, {R0_final.max():.2f}]")
print()
# NROY fraction per wave
all_results = engine.get_all_results()
print("NROY fraction per wave:")
for r in all_results:
print(f" Wave {r.iteration}: {r.nroy_fraction:.1%}")
print()
print(f"Engine statistics:")
print(f" Total samples generated: {engine.samples_generated}")
print(f" Total samples accepted: {engine.samples_accepted}")
print(f" Final acceptance rate: {engine.acceptance_rate:.3f}")
print(f" Emulators trained: {engine.emulators_trained}")
beta_reduction = (parameter_bounds["beta"][1] - parameter_bounds["beta"][0]) / (pending_4["beta"].max() - pending_4["beta"].min())
gamma_reduction = (parameter_bounds["gamma"][1] - parameter_bounds["gamma"][0]) / (pending_4["gamma"].max() - pending_4["gamma"].min())
print()
print(f"Parameter space reduced:")
print(f" beta space reduced by factor of {beta_reduction:.1f}x")
print(f" gamma space reduced by factor of {gamma_reduction:.1f}x")
FINAL HISTORY MATCHING RESULTS ================================================== True parameters: beta=1.3, gamma=0.5, R0=2.60 Plausible ranges after 3 iterations: beta: [1.067, 1.639] gamma: [0.286, 0.798] R0: [2.04, 3.73] NROY fraction per wave: Wave 1: 11.6% Wave 2: 6.3% Wave 3: 3.3% Engine statistics: Total samples generated: 27804 Total samples accepted: 3000 Final acceptance rate: 0.090 Emulators trained: 5 Parameter space reduced: beta space reduced by factor of 4.7x gamma space reduced by factor of 1.8x
import tempfile, os
diag_dir = tempfile.mkdtemp(prefix="hm_diagnostics_")
for result in all_results:
result.save(diag_dir, all_results=all_results)
print(f"\nDiagnostics saved to {diag_dir}/")
for f in sorted(os.listdir(diag_dir)):
print(f" {f}")
Diagnostics saved to /tmp/hm_diagnostics_re8jh79s/ wave1 wave2 wave3
Summary¶
This notebook demonstrated the manual history matching workflow:
- Setup: imported
SIRandgenerate_observed_datafrommodel.py - Iteration 1: emulated
incidence_11(descending limb), ranengine.step(), inspected emulator diagnostics and parameter space, then committed - Iteration 2: switched to
incidence_3(rising limb), repeated the step-inspect-commit cycle — this breaks the β/γ degeneracy left by iteration 1 - Iteration 3: selected three outputs simultaneously (
incidence_3,incidence_6,incidence_11), inspected per-output R² metrics, dropped any poor-quality emulators withengine.drop_emulator_from_pending(), then committed
Key API points:
engine.step()runs one iteration and returns anIterationResultengine.commit_step()accepts the result and advances the iteration counterengine.revert_step()discards the pending result (re-runs simulations on nextstep())engine.feature_selection = [...]sets outputs for the next wave — assign a list of any length to train multiple emulators in one waveresult.nroy_fraction— fraction of the prior that remains plausible after this waveresult.get_emulator(name)gives access to individual emulator diagnosticsresult.get_emulator_quality_metrics()returns R², MSE, and training size per outputresult.save(fig_dir, all_results)writes per-output diagnostic figures + metrics JSONengine.drop_emulator_from_pending(output)removes one emulator from the pending wave before committing — useful when an output's emulator has a poor fit
Next steps¶
| Goal | Tutorial |
|---|---|
| Fully automated one-call workflow | 01_basic_workflow.ipynb |
| Advanced builder options, callbacks, checkpointing | 03_advanced_configuration.ipynb |
Select specific (parameter, seed) trajectories from the NROY ensemble |
05_trajectory_selection.ipynb |