Tutorial 5: Trajectory Selection¶
History matching identifies the NROY region — the part of parameter space Not Ruled Out Yet by the available evidence. But for stochastic models, knowing which parameters are plausible is only half the story. The same parameters with different random seeds can produce very different epidemic curves.
Trajectory selection is the post-calibration step that picks specific (parameter set, random seed) pairs whose simulated outputs are consistent with the observed data. The result is a set of plausible trajectories — not just plausible parameters.
This tutorial uses Sampling Importance Resampling (SIR)¹ to select trajectories:
- Sample
(β, γ, seed)triples from the NROY region - Run the stochastic simulator for each triple → get an ensemble of incidence curves
- Weight each trajectory by how well it matches the observations (pseudo-likelihood)
- Resample proportional to those weights → the selected trajectory set
¹ Yes, the acronym matches the epidemic model. This is intentional and delightful.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import historymatching as hm
from model import SIR, generate_observed_data
%matplotlib inline
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1780533969.195809 1041217 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. I0000 00:00:1780533969.260437 1041217 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1780533970.479857 1041217 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
/software/conda/envs/hm/lib/python3.13/site-packages/gpflow/versions.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. import pkg_resources
1. Generate synthetic observations¶
We generate a synthetic epidemic using a known (β, γ) pair. In a real study this would be replaced by actual surveillance data.
POPULATION = 5_000
SEED_INFECTIONS = 50
BETA_TRUE = 0.7
GAMMA_TRUE = 0.4
obs_incidence, obs_model = generate_observed_data(
beta_true = BETA_TRUE,
gamma_true = GAMMA_TRUE,
population_size = POPULATION,
n_seed_infections= SEED_INFECTIONS,
seed=40,
)
print(f"True parameters: β={BETA_TRUE}, γ={GAMMA_TRUE} (R₀ ≈ {BETA_TRUE/GAMMA_TRUE:.2f})")
print(f"Observed peak: {obs_incidence.max():.0f} cases on day {obs_incidence.argmax()}")
print(f"Total cases: {obs_incidence.sum():.0f} (attack rate {obs_incidence.sum()/POPULATION:.1%})")
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(obs_incidence.values, 'ko-', ms=4, lw=1.5, label='Observed incidence')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.set_title('Observed epidemic curve')
ax.legend()
ax.grid(ls=':')
fig.tight_layout()
True parameters: β=0.7, γ=0.4 (R₀ ≈ 1.75) Observed peak: 236 cases on day 12 Total cases: 2852 (attack rate 57.0%)
2. Run history matching to identify the NROY region¶
We calibrate using three summary statistics — peak incidence, attack rate, and incidence at day 5.
The simulation function injects a rand_seed column into the samples DataFrame if one isn't already present. Each (β, γ, seed) triple then maps to a single deterministic output, giving the GPR emulator a clean surface to fit. The engine ignores rand_seed for emulation (it only passes parameter-space columns to the emulator), but stores it on result.samples so the exact triples are available for trajectory selection later.
def run_sir(samples: pd.DataFrame) -> pd.DataFrame:
"""Simulator wrapper — injects rand_seed for reproducibility.
The engine only sees 'beta' and 'gamma' as parameter-space columns.
'rand_seed' is metadata that flows through on result.samples.
"""
df = samples.copy()
if 'rand_seed' not in df.columns:
df['rand_seed'] = np.random.default_rng(42).integers(0, 2**31, size=len(df))
rows = []
for _, row in df.iterrows():
model = SIR(
beta = row['beta'],
gamma = row['gamma'],
s0 = POPULATION - SEED_INFECTIONS,
i0 = SEED_INFECTIONS,
seed = int(row['rand_seed']),
)
inc = model.get_incidence()
rows.append({
'peak_incidence': float(inc.max()),
'attack_rate': float(inc.sum() / POPULATION),
'incidence_5': float(inc[5]) if len(inc) > 5 else 0.0,
})
return pd.DataFrame(rows)
obs_peak = float(obs_incidence.max())
obs_ar = float(obs_incidence.sum() / POPULATION)
obs_inc5 = float(obs_incidence.values[5])
print(f"Observed peak: {obs_peak:.0f} cases/day")
print(f"Observed incidence_5: {obs_inc5:.0f} cases/day")
print(f"Observed attack rate: {obs_ar:.2f}")
engine = hm.HistoryMatching(
bounds = {'beta': (0.3, 2.0), 'gamma': (0.1, 0.8)},
observations = {
'peak_incidence': (obs_peak, obs_peak * 0.05), # ±5%
'attack_rate': (obs_ar, obs_ar * 0.03), # ±3%
'incidence_5': (obs_inc5, obs_inc5 * 0.10), # ±10%
},
function = run_sir,
emulator_type = 'gpr',
n_samples = 500,
max_iterations = 5,
implausibility_threshold = 3.0,
)
results = engine.run()
print(engine)
Observed peak: 236 cases/day Observed incidence_5: 96 cases/day Observed attack rate: 0.57
I0000 00:00:1780533973.463531 1041217 gpu_device.cc:2043] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2175 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
I0000 00:00:1780533978.277694 1041403 cuda_solvers.cc:175] Creating GpuSolver handles for stream 0x609fbc058860
INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl
INFO:historymatching.engine:Wave 1 output saved to hm_output/run_20260603_204612/wave1
INFO:historymatching.engine:[Wave 1] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:WAVE 2 STARTING
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:[Wave 2] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 1 [0.0s]
INFO:historymatching.engine:[Wave 2] Phase 2/5 SIMULATION: running 500 simulations...
INFO:historymatching.engine:[Wave 2] Phase 2/5 SIMULATION: complete — 3 outputs [0.4s]
INFO:historymatching.feature_selection: Feature ranking (mean_sq_z): attack_rate=116.38, peak_incidence=57.05, incidence_5=48.93
INFO:historymatching.feature_selection: Cooldown (skip): ['peak_incidence']
INFO:historymatching.feature_selection: SELECTED: incidence_5 (score=48.93, mean_z=1.38, std_z=6.86)
INFO:historymatching.engine:[Wave 2] Phase 3/5 FEATURE SELECTION: ['incidence_5'] [0.0s]
INFO:historymatching.engine:[Wave 2] Phase 4/5 EMULATOR TRAINING: training 1 emulators...
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_5]: creating (500 samples, 2 params, type=gpr)...
DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: incidence_5
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_5]: training...
INFO:historymatching.emulators.gpr: Training GPR: 375 points, 2 dims
INFO:historymatching.emulators.gpr: Building GPflow model and optimizing hyperparameters...
INFO:historymatching.emulators.gpr: GPR training complete — noise=0.2124, lengthscale range=[0.862, 1.753] [1.0s]
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_5]: trained [1.0s]
INFO:historymatching.engine:[Wave 2] Phase 4/5 EMULATOR TRAINING: complete [1.0s]
INFO:historymatching.engine:[Wave 2] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...
INFO:historymatching.nroy_sampling: LHS rejection: 0/500 (0%) — starting
INFO:historymatching.nroy_sampling: LHS rejection: 79/500 (15%) | 550 tested | rate=14.3636% [0s]
INFO:historymatching.nroy_sampling: LHS rejection: 559/500 (111%) | 3,774 tested | rate=14.8119% [0s]
INFO:historymatching.nroy_sampling: LHS rejection: 500/500 [0.4s]
INFO:historymatching.nroy_sampling: NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [0.4s]
DEBUG:historymatching.engine:NROY fraction: 0.132485
INFO:historymatching.engine:[Wave 2] Phase 5/5 NROY SAMPLING: found 500 candidates [0.4s]
INFO:historymatching.engine:[Wave 2] ALL PHASES COMPLETE [1.8s total]. Committing...
INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl
INFO:historymatching.engine:Wave 2 output saved to hm_output/run_20260603_204612/wave2
INFO:historymatching.engine:[Wave 2] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:WAVE 3 STARTING
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:[Wave 3] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 2 [0.0s]
INFO:historymatching.engine:[Wave 3] Phase 2/5 SIMULATION: running 500 simulations...
INFO:historymatching.engine:[Wave 3] Phase 2/5 SIMULATION: complete — 3 outputs [0.4s]
INFO:historymatching.feature_selection: Feature ranking (mean_sq_z): attack_rate=139.63, peak_incidence=59.79, incidence_5=26.93
INFO:historymatching.feature_selection: Cooldown (skip): ['incidence_5']
INFO:historymatching.feature_selection: SELECTED: attack_rate (score=139.63, mean_z=-6.07, std_z=10.15)
INFO:historymatching.engine:[Wave 3] Phase 3/5 FEATURE SELECTION: ['attack_rate'] [0.0s]
INFO:historymatching.engine:[Wave 3] Phase 4/5 EMULATOR TRAINING: training 1 emulators...
INFO:historymatching.emulators.factory: Emulator 1/1 [attack_rate]: creating (500 samples, 2 params, type=gpr)...
DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: attack_rate
INFO:historymatching.emulators.factory: Emulator 1/1 [attack_rate]: training...
INFO:historymatching.emulators.gpr: Training GPR: 375 points, 2 dims
INFO:historymatching.emulators.gpr: Building GPflow model and optimizing hyperparameters...
INFO:historymatching.emulators.gpr: GPR training complete — noise=0.0767, lengthscale range=[0.506, 0.687] [0.9s]
INFO:historymatching.emulators.factory: Emulator 1/1 [attack_rate]: trained [0.9s]
INFO:historymatching.engine:[Wave 3] Phase 4/5 EMULATOR TRAINING: complete [0.9s]
INFO:historymatching.engine:[Wave 3] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...
INFO:historymatching.nroy_sampling: LHS rejection: 0/500 (0%) — starting
INFO:historymatching.nroy_sampling: LHS rejection: 44/500 (8%) | 550 tested | rate=8.0000% [0s]
INFO:historymatching.nroy_sampling: LHS rejection: 585/500 (117%) | 6,820 tested | rate=8.5777% [1s]
INFO:historymatching.nroy_sampling: LHS rejection: 500/500 [0.7s]
INFO:historymatching.nroy_sampling: NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [0.7s]
DEBUG:historymatching.engine:NROY fraction: 0.073314
INFO:historymatching.engine:[Wave 3] Phase 5/5 NROY SAMPLING: found 500 candidates [0.7s]
INFO:historymatching.engine:[Wave 3] ALL PHASES COMPLETE [2.0s total]. Committing...
INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl
INFO:historymatching.engine:Wave 3 output saved to hm_output/run_20260603_204612/wave3
INFO:historymatching.engine:[Wave 3] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:WAVE 4 STARTING
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:[Wave 4] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 3 [0.0s]
INFO:historymatching.engine:[Wave 4] Phase 2/5 SIMULATION: running 500 simulations...
INFO:historymatching.engine:[Wave 4] Phase 2/5 SIMULATION: complete — 3 outputs [0.4s]
INFO:historymatching.feature_selection: Feature ranking (mean_sq_z): attack_rate=33.00, incidence_5=30.26, peak_incidence=25.22
INFO:historymatching.feature_selection: Cooldown (skip): ['attack_rate']
INFO:historymatching.feature_selection: SELECTED: incidence_5 (score=30.26, mean_z=1.97, std_z=5.14)
INFO:historymatching.engine:[Wave 4] Phase 3/5 FEATURE SELECTION: ['incidence_5'] [0.0s]
INFO:historymatching.engine:[Wave 4] Phase 4/5 EMULATOR TRAINING: training 1 emulators...
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_5]: creating (500 samples, 2 params, type=gpr)...
DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: incidence_5
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_5]: training...
INFO:historymatching.emulators.gpr: Training GPR: 375 points, 2 dims
INFO:historymatching.emulators.gpr: Building GPflow model and optimizing hyperparameters...
INFO:historymatching.emulators.gpr: GPR training complete — noise=0.3334, lengthscale range=[1.129, 1.565] [1.0s]
INFO:historymatching.emulators.factory: Emulator 1/1 [incidence_5]: trained [1.0s]
INFO:historymatching.engine:[Wave 4] Phase 4/5 EMULATOR TRAINING: complete [1.0s]
INFO:historymatching.engine:[Wave 4] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...
INFO:historymatching.nroy_sampling: LHS rejection: 0/500 (0%) — starting
INFO:historymatching.nroy_sampling: LHS rejection: 54/500 (10%) | 550 tested | rate=9.8182% [0s]
INFO:historymatching.nroy_sampling: LHS rejection: 471/500 (94%) | 5,546 tested | rate=8.4926% [1s]
INFO:historymatching.nroy_sampling: LHS rejection: 500/500 (100%) | 5,960 tested | rate=8.3893% [2s]
INFO:historymatching.nroy_sampling: LHS rejection: 500/500 [2.2s]
INFO:historymatching.nroy_sampling: NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [2.2s]
DEBUG:historymatching.engine:NROY fraction: 0.083893
INFO:historymatching.engine:[Wave 4] Phase 5/5 NROY SAMPLING: found 500 candidates [2.3s]
INFO:historymatching.engine:[Wave 4] ALL PHASES COMPLETE [3.6s total]. Committing...
INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl
INFO:historymatching.engine:Wave 4 output saved to hm_output/run_20260603_204612/wave4
INFO:historymatching.engine:[Wave 4] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:WAVE 5 STARTING
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:[Wave 5] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 4 [0.0s]
INFO:historymatching.engine:[Wave 5] Phase 2/5 SIMULATION: running 500 simulations...
INFO:historymatching.engine:[Wave 5] Phase 2/5 SIMULATION: complete — 3 outputs [0.4s]
INFO:historymatching.feature_selection: Feature ranking (mean_sq_z): attack_rate=34.29, incidence_5=26.63, peak_incidence=25.91
INFO:historymatching.feature_selection: Cooldown (skip): ['incidence_5']
INFO:historymatching.feature_selection: SELECTED: attack_rate (score=34.29, mean_z=-0.17, std_z=5.86)
INFO:historymatching.engine:[Wave 5] Phase 3/5 FEATURE SELECTION: ['attack_rate'] [0.0s]
INFO:historymatching.engine:[Wave 5] Phase 4/5 EMULATOR TRAINING: training 1 emulators...
INFO:historymatching.emulators.factory: Emulator 1/1 [attack_rate]: creating (500 samples, 2 params, type=gpr)...
DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: attack_rate
INFO:historymatching.emulators.factory: Emulator 1/1 [attack_rate]: training...
INFO:historymatching.emulators.gpr: Training GPR: 375 points, 2 dims
INFO:historymatching.emulators.gpr: Building GPflow model and optimizing hyperparameters...
INFO:historymatching.emulators.gpr: GPR training complete — noise=0.1423, lengthscale range=[0.807, 1.062] [1.1s]
INFO:historymatching.emulators.factory: Emulator 1/1 [attack_rate]: trained [1.1s]
INFO:historymatching.engine:[Wave 5] Phase 4/5 EMULATOR TRAINING: complete [1.1s]
INFO:historymatching.engine:[Wave 5] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...
INFO:historymatching.nroy_sampling: LHS rejection: 0/500 (0%) — starting
INFO:historymatching.nroy_sampling: LHS rejection: 33/500 (6%) | 550 tested | rate=6.0000% [0s]
INFO:historymatching.nroy_sampling: LHS rejection: 556/500 (111%) | 9,111 tested | rate=6.1025% [1s]
INFO:historymatching.nroy_sampling: LHS rejection: 500/500 [1.1s]
INFO:historymatching.nroy_sampling: NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [1.1s]
DEBUG:historymatching.engine:NROY fraction: 0.054879
INFO:historymatching.engine:[Wave 5] Phase 5/5 NROY SAMPLING: found 500 candidates [1.2s]
INFO:historymatching.engine:[Wave 5] ALL PHASES COMPLETE [2.7s total]. Committing...
INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl
INFO:historymatching.engine:Wave 5 output saved to hm_output/run_20260603_204612/wave5
INFO:historymatching.engine:[Wave 5] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612
INFO:historymatching.engine:============================================================
INFO:historymatching.engine:Automated run completed. 5 iterations executed.
HistoryMatching( parameters=2 ['beta', 'gamma'], outputs=3 ['peak_incidence', 'attack_rate', 'incidence_5'], emulator=gpr, simulator=set, n_samples=500, implausibility_threshold=3.0, state=completed, wave 5/5, acceptance_rate=0.155 )
3. Inspect the NROY region¶
After history matching the parameter samples from the final iteration were drawn by rejection-sampling the full prior against all trained emulators. They represent a draw from the current NROY region.
# Collect samples from all iterations to visualise the space reduction
all_samples = pd.concat(
[r.samples.assign(iteration=r.iteration) for r in results],
ignore_index=True,
)
fig, axs = plt.subplots(1, 2, figsize=(12, 4))
cmap = plt.get_cmap('viridis')
n_iter = len(results)
for it in range(1, n_iter + 1):
df = all_samples[all_samples['iteration'] == it]
axs[0].scatter(df['beta'], df['gamma'],
s=10, alpha=0.5,
color=cmap((it - 1) / n_iter),
label=f'Iteration {it}')
axs[0].scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=200,
c='red', zorder=5, label='True values')
axs[0].set_xlabel('β')
axs[0].set_ylabel('γ')
axs[0].set_title('Parameter space — NROY shrinks each iteration')
axs[0].legend(fontsize=8)
axs[0].grid(ls=':')
# NROY fraction from the engine (what fraction of the prior remains plausible)
iters = [r.iteration for r in results]
nroy_fracs = [r.nroy_fraction for r in results]
axs[1].bar(iters, nroy_fracs, color=[cmap((i) / n_iter) for i in range(n_iter)])
for w, f in zip(iters, nroy_fracs):
axs[1].annotate(f'{f:.0%}', (w, f), textcoords='offset points',
xytext=(0, 5), ha='center', fontsize=9)
axs[1].set_xlabel('Iteration')
axs[1].set_ylabel('NROY fraction')
axs[1].set_title('Fraction of prior space remaining')
axs[1].set_xticks(iters)
axs[1].set_ylim(0, 1)
axs[1].grid(ls=':', axis='y')
fig.tight_layout()
# NROY pool = samples that passed ALL waves' emulators
# These already have rand_seed attached (injected by run_sir)
nroy_samples = engine.get_nroy_samples(10000)
print(f"NROY parameter pool: {len(nroy_samples)} (β, γ, seed) triples")
print(f" β range: [{nroy_samples['beta'].min():.3f}, {nroy_samples['beta'].max():.3f}]")
print(f" γ range: [{nroy_samples['gamma'].min():.3f}, {nroy_samples['gamma'].max():.3f}]")
INFO:historymatching.nroy_sampling: LHS rejection: 0/10000 (0%) — starting
INFO:historymatching.nroy_sampling: LHS rejection: 683/10000 (6%) | 11,000 tested | rate=6.2091% [1s]
INFO:historymatching.nroy_sampling: LHS rejection: 7131/10000 (71%) | 111,000 tested | rate=6.4243% [4s]
INFO:historymatching.nroy_sampling: LHS rejection: 10228/10000 (102%) | 160,124 tested | rate=6.3875% [5s]
INFO:historymatching.nroy_sampling: LHS rejection: 10000/10000 [5.3s]
INFO:historymatching.nroy_sampling: NROY SUMMARY: 10000/10000 [OK] — sources: LHS=10000 [5.3s]
NROY parameter pool: 10000 (β, γ, seed) triples β range: [0.314, 1.125] γ range: [0.100, 0.800]
4. Generate a trajectory ensemble¶
We draw a large NROY pool using engine.get_nroy_samples(10000) — this rejection-samples fresh candidates from the full prior against all trained emulators, cheaply and without running any new simulations. Each NROY point gets a unique seed so the trajectory is deterministic and reproducible.
N_DAYS = obs_incidence.shape[0]
ensemble = []
for i, (_, row) in enumerate(nroy_samples.iterrows()):
model = SIR(
beta = row['beta'],
gamma = row['gamma'],
s0 = POPULATION - SEED_INFECTIONS,
i0 = SEED_INFECTIONS,
seed = i, # unique seed per NROY point
)
inc = model.get_incidence()
ensemble.append({
'beta': row['beta'],
'gamma': row['gamma'],
'seed': i,
'incidence': inc[:N_DAYS],
})
ensemble_df = pd.DataFrame(ensemble)
print(f"Ensemble size: {len(ensemble_df)} trajectories (1 per NROY sample)")
Ensemble size: 10000 trajectories (1 per NROY sample)
inc_matrix = np.vstack(ensemble_df['incidence'].values) # shape (n_traj, n_days)
# Compute calibration features for every trajectory
ensemble_df['peak'] = inc_matrix.max(axis=1)
ensemble_df['ar'] = inc_matrix.sum(axis=1) / POPULATION
ensemble_df['inc5'] = inc_matrix[:, 5]
fig, axs = plt.subplots(2, 2, figsize=(14, 8))
# Top-left: time-series spaghetti with day 5 highlighted
ax = axs[0, 0]
for row in inc_matrix:
ax.plot(row, color='steelblue', alpha=0.05, lw=0.8)
ax.plot(inc_matrix.mean(axis=0), color='steelblue', lw=2, label='Ensemble mean')
ax.plot(obs_incidence.values, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
linewidths=2, zorder=6, label='Emulated day 5')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.set_title(f'Prior NROY ensemble ({len(ensemble_df):,} trajectories)')
ax.legend(fontsize=8)
ax.grid(ls=':')
# Top-right: peak incidence
ax = axs[0, 1]
ax.hist(ensemble_df['peak'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_peak, color='red', lw=2, ls='--', label=f'Observed ({obs_peak:.0f})')
ax.axvspan(obs_peak * (1 - 1.96 * 0.05), obs_peak * (1 + 1.96 * 0.05),
color='red', alpha=0.08, label='HM tolerance (±5%)')
ax.set_xlabel('Peak incidence')
ax.set_ylabel('Count')
ax.set_title('Peak incidence')
ax.legend(fontsize=8)
ax.grid(ls=':')
# Bottom-left: attack rate
ax = axs[1, 0]
ax.hist(ensemble_df['ar'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_ar, color='red', lw=2, ls='--', label=f'Observed ({obs_ar:.2f})')
ax.axvspan(obs_ar * (1 - 1.96 * 0.03), obs_ar * (1 + 1.96 * 0.03),
color='red', alpha=0.08, label='HM tolerance (±3%)')
ax.set_xlabel('Attack rate')
ax.set_ylabel('Count')
ax.set_title('Attack rate')
ax.legend(fontsize=8)
ax.grid(ls=':')
# Bottom-right: incidence day 5
ax = axs[1, 1]
ax.hist(ensemble_df['inc5'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_inc5, color='red', lw=2, ls='--', label=f'Observed ({obs_inc5:.0f})')
ax.axvspan(obs_inc5 * (1 - 1.96 * 0.10), obs_inc5 * (1 + 1.96 * 0.10),
color='red', alpha=0.08, label='HM tolerance (±10%)')
ax.set_xlabel('Incidence day 5')
ax.set_ylabel('Count')
ax.set_title('Incidence day 5')
ax.legend(fontsize=8)
ax.grid(ls=':')
fig.suptitle('Prior NROY ensemble vs HM targets', fontsize=12)
fig.tight_layout()
5. Weight trajectories by pseudo-likelihood¶
We score each trajectory against the observed incidence curve using a Gaussian pseudo-likelihood:
$$w_i = \exp\!\left(-\frac{d_i^2}{2\,\varepsilon^2}\right), \qquad d_i = \sqrt{\frac{1}{T}\sum_{t=1}^T\left(y_i(t) - y_{\mathrm{obs}}(t)\right)^2}$$
where $d_i$ is the RMSE between trajectory $i$ and the observed data, and $\varepsilon$ is a tolerance that controls how strict the selection is.
- Large ε → lenient: weights spread broadly, selected set ≈ random draw from NROY
- Small ε → strict: only trajectories very close to the observed curve get non-negligible weight
A practical starting point is $\varepsilon \approx$ one standard deviation of the observed peak.
def gaussian_weights(inc_matrix: np.ndarray, obs: np.ndarray, epsilon: float) -> np.ndarray:
"""
Compute normalised pseudo-likelihood weights for each trajectory.
Args:
inc_matrix: Array of shape (n_traj, n_days) — simulated trajectories.
obs: Array of shape (n_days,) — observed incidence.
epsilon: Tolerance (RMSE scale). Smaller = stricter selection.
Returns:
Normalised weight array of shape (n_traj,).
"""
rmse = np.sqrt(np.mean((inc_matrix - obs[np.newaxis, :]) ** 2, axis=1))
w = np.exp(-0.5 * (rmse / epsilon) ** 2)
total = w.sum()
if total == 0:
raise ValueError("All weights are zero — try a larger epsilon.")
return w / total
# Tolerance ≈ 10 % of the peak value — a reasonable starting point
epsilon = obs_peak * 0.03
obs_array = obs_incidence.values[:N_DAYS]
weights = gaussian_weights(inc_matrix, obs_array, epsilon=epsilon)
print(f"Tolerance ε = {epsilon:.1f} cases/day")
print(f"Effective sample size: {1 / (weights**2).sum():.0f} "
f"(out of {len(weights):,} total)")
# Scatter: parameter space coloured by weight
fig, ax = plt.subplots(figsize=(6, 5))
sc = ax.scatter(
ensemble_df['beta'], ensemble_df['gamma'],
c=weights, cmap='viridis_r', s=6, alpha=0.6, norm=mcolors.LogNorm(),
)
ax.scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=250, c='red',
edgecolors='k', lw=0.5, zorder=5, label='True values')
plt.colorbar(sc, ax=ax, label='Weight (log scale)')
ax.set_xlabel('β')
ax.set_ylabel('γ')
ax.set_title('Trajectory weights in parameter space')
ax.legend()
ax.grid(ls=':')
fig.tight_layout()
Tolerance ε = 7.1 cases/day Effective sample size: 338 (out of 10,000 total)
6. Importance resampling — select the final trajectory set¶
We draw N_SELECT trajectories without replacement from the ensemble, proportional to their weights. The result is a volume-manageable set of plausible trajectories that can be used for downstream analysis (forecasting quantiles, reporting representative epidemic curves, etc.).
N_SELECT = 15 # number of trajectories to retain
rng = np.random.default_rng(seed=0)
selected_idx = rng.choice(
len(ensemble_df),
size = N_SELECT,
replace = False,
p = weights,
)
selected_df = ensemble_df.iloc[selected_idx].copy()
selected_inc = inc_matrix[selected_idx]
print(f"Selected {N_SELECT} trajectories")
print(f"β range in selected set: {selected_df['beta'].min():.3f} – {selected_df['beta'].max():.3f}")
print(f"γ range in selected set: {selected_df['gamma'].min():.3f} – {selected_df['gamma'].max():.3f}")
# ── Consistency check: re-run selected triples, verify identical outputs ──
print("\nReproducibility check (re-running 5 selected trajectories):")
for check_idx in selected_idx[:5]:
row = ensemble_df.iloc[check_idx]
model = SIR(
beta = row['beta'],
gamma = row['gamma'],
s0 = POPULATION - SEED_INFECTIONS,
i0 = SEED_INFECTIONS,
seed = int(row['seed']),
)
rerun_inc = model.get_incidence()[:N_DAYS]
orig_inc = inc_matrix[check_idx]
match = np.allclose(rerun_inc, orig_inc)
print(f" idx={check_idx:4d} β={row['beta']:.3f} γ={row['gamma']:.3f} "
f"seed={int(row['seed'])} match={match}")
assert match, f"Trajectory {check_idx} not reproducible!"
print("All checks passed — selected trajectories are reproducible.")
Selected 15 trajectories β range in selected set: 0.499 – 0.763 γ range in selected set: 0.260 – 0.487 Reproducibility check (re-running 5 selected trajectories): idx=6447 β=0.560 γ=0.329 seed=6447 match=True idx=2677 β=0.713 γ=0.412 seed=2677 match=True idx= 341 β=0.617 γ=0.382 seed=341 match=True idx= 226 β=0.763 γ=0.469 seed=226 match=True idx=7882 β=0.708 γ=0.400 seed=7882 match=True All checks passed — selected trajectories are reproducible.
7. Visualise the selected trajectories¶
Compare the full prior ensemble (grey) against the selected trajectories (coloured) and the observations (black). The selected set should be visually consistent with the observed curve, whereas the full ensemble is much broader.
selected_peaks = selected_inc.max(axis=1)
selected_ar = selected_inc.sum(axis=1) / POPULATION
selected_inc5 = selected_inc[:, 5]
fig, axs = plt.subplots(2, 3, figsize=(17, 10))
# ── Top row: time-series comparison ──────────────────────────────────────
# Top-left: full prior ensemble
ax = axs[0, 0]
for row in inc_matrix:
ax.plot(row, color='steelblue', alpha=0.04, lw=0.7)
ax.plot(obs_array, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
linewidths=2, zorder=6, label='Emulated day 5')
ax.set_title(f'Prior ensemble\n({len(inc_matrix):,} trajectories)')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.legend(fontsize=8)
ax.grid(ls=':')
# Top-middle: selected trajectories
ax = axs[0, 1]
cmap_sel = plt.get_cmap('plasma')
for k, row in enumerate(selected_inc):
ax.plot(row, color=cmap_sel(k / N_SELECT), alpha=0.6, lw=1.0)
ax.fill_between(
range(N_DAYS),
selected_inc.min(axis=0),
selected_inc.max(axis=0),
alpha=0.15, color='purple', label='Selected range',
)
ax.plot(selected_inc.mean(axis=0), color='purple', lw=2.5, label='Selected mean')
ax.plot(obs_array, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
linewidths=2, zorder=6, label='Emulated day 5')
ax.set_title(f'Selected trajectories\n({N_SELECT} via importance resampling)')
ax.set_xlabel('Day')
ax.legend(fontsize=8)
ax.grid(ls=':')
# Top-right: parameter space
ax = axs[0, 2]
ax.scatter(ensemble_df['beta'], ensemble_df['gamma'],
s=4, alpha=0.2, color='steelblue', label='Prior NROY')
ax.scatter(selected_df['beta'], selected_df['gamma'],
s=20, alpha=0.8, color='purple', label='Selected')
ax.scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=200,
c='red', zorder=5, label='True values')
ax.set_xlabel('β')
ax.set_ylabel('γ')
ax.set_title('Parameter space')
ax.legend(fontsize=8)
ax.grid(ls=':')
# ── Bottom row: feature-space comparison ─────────────────────────────────
# Bottom-left: peak incidence
ax = axs[1, 0]
ax.hist(ensemble_df['peak'], bins=30, density=True, color='steelblue',
alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_peaks, bins=15, density=True, color='purple',
alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_peak, color='red', lw=2, ls='--', label=f'Observed ({obs_peak:.0f})')
ax.set_xlabel('Peak incidence')
ax.set_ylabel('Density')
ax.set_title('Peak incidence')
ax.legend(fontsize=8)
ax.grid(ls=':')
# Bottom-middle: attack rate
ax = axs[1, 1]
ax.hist(ensemble_df['ar'], bins=30, density=True, color='steelblue',
alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_ar, bins=15, density=True, color='purple',
alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_ar, color='red', lw=2, ls='--', label=f'Observed ({obs_ar:.2f})')
ax.set_xlabel('Attack rate')
ax.set_ylabel('Density')
ax.set_title('Attack rate')
ax.legend(fontsize=8)
ax.grid(ls=':')
# Bottom-right: incidence day 5
ax = axs[1, 2]
ax.hist(ensemble_df['inc5'], bins=30, density=True, color='steelblue',
alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_inc5, bins=15, density=True, color='purple',
alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_inc5, color='red', lw=2, ls='--', label=f'Observed ({obs_inc5:.0f})')
ax.set_xlabel('Incidence day 5')
ax.set_ylabel('Density')
ax.set_title('Incidence day 5')
ax.legend(fontsize=8)
ax.grid(ls=':')
fig.suptitle('NROY ensemble → Trajectory selection', fontsize=13)
fig.tight_layout()
# Parameter distributions: NROY pool vs. selected set
fig, axs = plt.subplots(1, 2, figsize=(11, 4))
for ax, param, true_val in zip(axs, ['beta', 'gamma'], [BETA_TRUE, GAMMA_TRUE]):
ax.hist(nroy_samples[param], bins=20, density=True,
alpha=0.5, color='steelblue', label='NROY pool')
ax.hist(selected_df[param], bins=20, density=True,
alpha=0.7, color='purple', label='Selected')
ax.axvline(true_val, color='red', ls='--', lw=1.5, label=f'True ({true_val})')
ax.set_xlabel(param)
ax.set_ylabel('Density')
ax.set_title(f'Distribution of {param}')
ax.legend(fontsize=8)
ax.grid(ls=':')
fig.suptitle('Parameter distributions before and after trajectory selection', fontsize=12)
fig.tight_layout()
8. Effect of tolerance ε¶
The tolerance ε is the key tuning parameter. Here we sweep over several values to show the trade-off between specificity (small ε → selected trajectories are close to the observed data) and coverage (large ε → selected trajectories span the full NROY ensemble).
A useful diagnostic is the effective sample size $N_{\mathrm{eff}} = \left(\sum_i w_i^2\right)^{-1}$. When $N_{\mathrm{eff}} \ll N$ the weight distribution is too concentrated; when $N_{\mathrm{eff}} \approx N$ the selection is almost uniform (lenient).
epsilons = [obs_peak * f for f in [0.02, 0.05, 0.10, 0.20, 0.40]]
fig, axs = plt.subplots(1, len(epsilons), figsize=(16, 4), sharey=True)
for ax, eps in zip(axs, epsilons):
w = gaussian_weights(inc_matrix, obs_array, epsilon=eps)
nef = 1 / (w**2).sum()
idx_sel = rng.choice(len(ensemble_df), size=N_SELECT, replace=False, p=w)
sel_inc = inc_matrix[idx_sel]
ax.fill_between(
range(N_DAYS),
np.percentile(sel_inc, 10, axis=0),
np.percentile(sel_inc, 90, axis=0),
alpha=0.3, color='purple', label='10–90 pct',
)
ax.plot(sel_inc.mean(axis=0), color='purple', lw=2)
ax.plot(obs_array, 'ko-', ms=3, lw=1.5, zorder=5)
ax.set_title(f'ε = {eps:.1f}\nNeff = {nef:.0f}')
ax.set_xlabel('Day')
ax.grid(ls=':')
axs[0].set_ylabel('Daily new cases')
fig.suptitle('Trajectory selection sensitivity to tolerance ε', fontsize=12)
fig.tight_layout()
Summary¶
| Step | What it does | Output |
|---|---|---|
| History matching | Eliminates implausible regions of parameter space using emulators | NROY parameter samples |
| Trajectory ensemble | Runs the stochastic model at NROY parameters × multiple seeds | Large ensemble of candidate trajectories |
| Pseudo-likelihood weighting | Scores each trajectory by how closely it matches the observed data | Weight per (parameter, seed) pair |
| Importance resampling | Draws a tractable set proportional to weights | Selected plausible trajectories |
Key design choices¶
Why use summary statistics for HM but full time series for trajectory selection?
History matching with the full time series would require one emulator per day — high-dimensional and expensive. Summary statistics keep HM tractable. The trajectory selection step then acts as a second, finer filter using the full data.
How to choose ε?
Start with ε ≈ 10% of the observed peak. Check the effective sample size — aim for $N_{\mathrm{eff}} \gtrsim 50$. If the selected trajectories look too broad, tighten ε. If they look implausibly narrow, relax it.
Using the selected trajectories
Each selected (β, γ, seed) triple is a complete, reproducible simulation run. You can re-run these with the full model (e.g., longer time horizon, additional compartments) to generate matched forecasts or counterfactuals.