Tutorial 5: Trajectory Selection¶

History matching identifies the NROY region — the part of parameter space Not Ruled Out Yet by the available evidence. But for stochastic models, knowing which parameters are plausible is only half the story. The same parameters with different random seeds can produce very different epidemic curves.

Trajectory selection is the post-calibration step that picks specific (parameter set, random seed) pairs whose simulated outputs are consistent with the observed data. The result is a set of plausible trajectories — not just plausible parameters.

This tutorial uses Sampling Importance Resampling (SIR)¹ to select trajectories:

Sample (β, γ, seed) triples from the NROY region
Run the stochastic simulator for each triple → get an ensemble of incidence curves
Weight each trajectory by how well it matches the observations (pseudo-likelihood)
Resample proportional to those weights → the selected trajectory set

¹ Yes, the acronym matches the epidemic model. This is intentional and delightful.

In [1]:

Copied!





import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

import historymatching as hm
from model import SIR, generate_observed_data

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

import historymatching as hm
from model import SIR, generate_observed_data

%matplotlib inline

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1780533969.195809 1041217 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
I0000 00:00:1780533969.260437 1041217 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1780533970.479857 1041217 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

/software/conda/envs/hm/lib/python3.13/site-packages/gpflow/versions.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources

1. Generate synthetic observations¶

We generate a synthetic epidemic using a known (β, γ) pair. In a real study this would be replaced by actual surveillance data.

In [2]:

Copied!





POPULATION       = 5_000
SEED_INFECTIONS  = 50
BETA_TRUE        = 0.7
GAMMA_TRUE       = 0.4

obs_incidence, obs_model = generate_observed_data(
    beta_true        = BETA_TRUE,
    gamma_true       = GAMMA_TRUE,
    population_size  = POPULATION,
    n_seed_infections= SEED_INFECTIONS,
    seed=40,
)

print(f"True parameters:  β={BETA_TRUE}, γ={GAMMA_TRUE}  (R₀ ≈ {BETA_TRUE/GAMMA_TRUE:.2f})")
print(f"Observed peak:    {obs_incidence.max():.0f} cases on day {obs_incidence.argmax()}")
print(f"Total cases:      {obs_incidence.sum():.0f}  (attack rate {obs_incidence.sum()/POPULATION:.1%})")

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(obs_incidence.values, 'ko-', ms=4, lw=1.5, label='Observed incidence')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.set_title('Observed epidemic curve')
ax.legend()
ax.grid(ls=':')
fig.tight_layout()
POPULATION       = 5_000
SEED_INFECTIONS  = 50
BETA_TRUE        = 0.7
GAMMA_TRUE       = 0.4

obs_incidence, obs_model = generate_observed_data(
    beta_true        = BETA_TRUE,
    gamma_true       = GAMMA_TRUE,
    population_size  = POPULATION,
    n_seed_infections= SEED_INFECTIONS,
    seed=40,
)

print(f"True parameters:  β={BETA_TRUE}, γ={GAMMA_TRUE}  (R₀ ≈ {BETA_TRUE/GAMMA_TRUE:.2f})")
print(f"Observed peak:    {obs_incidence.max():.0f} cases on day {obs_incidence.argmax()}")
print(f"Total cases:      {obs_incidence.sum():.0f}  (attack rate {obs_incidence.sum()/POPULATION:.1%})")

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(obs_incidence.values, 'ko-', ms=4, lw=1.5, label='Observed incidence')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.set_title('Observed epidemic curve')
ax.legend()
ax.grid(ls=':')
fig.tight_layout()

True parameters:  β=0.7, γ=0.4  (R₀ ≈ 1.75)
Observed peak:    236 cases on day 12
Total cases:      2852  (attack rate 57.0%)

No description has been provided for this image

2. Run history matching to identify the NROY region¶

We calibrate using three summary statistics — peak incidence, attack rate, and incidence at day 5.

The simulation function injects a rand_seed column into the samples DataFrame if one isn't already present. Each (β, γ, seed) triple then maps to a single deterministic output, giving the GPR emulator a clean surface to fit. The engine ignores rand_seed for emulation (it only passes parameter-space columns to the emulator), but stores it on result.samples so the exact triples are available for trajectory selection later.

In [3]:

Copied!





def run_sir(samples: pd.DataFrame) -> pd.DataFrame:
    """Simulator wrapper — injects rand_seed for reproducibility.

    The engine only sees 'beta' and 'gamma' as parameter-space columns.
    'rand_seed' is metadata that flows through on result.samples.
    """
    df = samples.copy()
    if 'rand_seed' not in df.columns:
        df['rand_seed'] = np.random.default_rng(42).integers(0, 2**31, size=len(df))

    rows = []
    for _, row in df.iterrows():
        model = SIR(
            beta  = row['beta'],
            gamma = row['gamma'],
            s0    = POPULATION - SEED_INFECTIONS,
            i0    = SEED_INFECTIONS,
            seed  = int(row['rand_seed']),
        )
        inc = model.get_incidence()
        rows.append({
            'peak_incidence': float(inc.max()),
            'attack_rate':    float(inc.sum() / POPULATION),
            'incidence_5':    float(inc[5]) if len(inc) > 5 else 0.0,
        })
    return pd.DataFrame(rows)
def run_sir(samples: pd.DataFrame) -> pd.DataFrame:
    """Simulator wrapper — injects rand_seed for reproducibility.

    The engine only sees 'beta' and 'gamma' as parameter-space columns.
    'rand_seed' is metadata that flows through on result.samples.
    """
    df = samples.copy()
    if 'rand_seed' not in df.columns:
        df['rand_seed'] = np.random.default_rng(42).integers(0, 2**31, size=len(df))

    rows = []
    for _, row in df.iterrows():
        model = SIR(
            beta  = row['beta'],
            gamma = row['gamma'],
            s0    = POPULATION - SEED_INFECTIONS,
            i0    = SEED_INFECTIONS,
            seed  = int(row['rand_seed']),
        )
        inc = model.get_incidence()
        rows.append({
            'peak_incidence': float(inc.max()),
            'attack_rate':    float(inc.sum() / POPULATION),
            'incidence_5':    float(inc[5]) if len(inc) > 5 else 0.0,
        })
    return pd.DataFrame(rows)

In [4]:

Copied!





obs_peak  = float(obs_incidence.max())
obs_ar    = float(obs_incidence.sum() / POPULATION)
obs_inc5  = float(obs_incidence.values[5])

print(f"Observed peak:        {obs_peak:.0f} cases/day")
print(f"Observed incidence_5: {obs_inc5:.0f} cases/day")
print(f"Observed attack rate: {obs_ar:.2f}")

engine = hm.HistoryMatching(
    bounds = {'beta': (0.3, 2.0), 'gamma': (0.1, 0.8)},
    observations     = {
        'peak_incidence': (obs_peak,  obs_peak * 0.05),   # ±5%
        'attack_rate':    (obs_ar,    obs_ar   * 0.03),   # ±3%
        'incidence_5':    (obs_inc5,  obs_inc5 * 0.10),   # ±10%
    },
    function                 = run_sir,
    emulator_type            = 'gpr',
    n_samples                = 500,
    max_iterations           = 5,
    implausibility_threshold = 3.0,
)

results = engine.run()
print(engine)
obs_peak  = float(obs_incidence.max())
obs_ar    = float(obs_incidence.sum() / POPULATION)
obs_inc5  = float(obs_incidence.values[5])

print(f"Observed peak:        {obs_peak:.0f} cases/day")
print(f"Observed incidence_5: {obs_inc5:.0f} cases/day")
print(f"Observed attack rate: {obs_ar:.2f}")

engine = hm.HistoryMatching(
    bounds = {'beta': (0.3, 2.0), 'gamma': (0.1, 0.8)},
    observations     = {
        'peak_incidence': (obs_peak,  obs_peak * 0.05),   # ±5%
        'attack_rate':    (obs_ar,    obs_ar   * 0.03),   # ±3%
        'incidence_5':    (obs_inc5,  obs_inc5 * 0.10),   # ±10%
    },
    function                 = run_sir,
    emulator_type            = 'gpr',
    n_samples                = 500,
    max_iterations           = 5,
    implausibility_threshold = 3.0,
)

results = engine.run()
print(engine)

Observed peak:        236 cases/day
Observed incidence_5: 96 cases/day
Observed attack rate: 0.57

I0000 00:00:1780533973.463531 1041217 gpu_device.cc:2043] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2175 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6

I0000 00:00:1780533978.277694 1041403 cuda_solvers.cc:175] Creating GpuSolver handles for stream 0x609fbc058860

INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl

INFO:historymatching.engine:Wave 1 output saved to hm_output/run_20260603_204612/wave1

INFO:historymatching.engine:[Wave 1] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:WAVE 2 STARTING

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:[Wave 2] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 1 [0.0s]

INFO:historymatching.engine:[Wave 2] Phase 2/5 SIMULATION: running 500 simulations...

INFO:historymatching.engine:[Wave 2] Phase 2/5 SIMULATION: complete — 3 outputs [0.4s]

INFO:historymatching.feature_selection:  Feature ranking (mean_sq_z): attack_rate=116.38, peak_incidence=57.05, incidence_5=48.93

INFO:historymatching.feature_selection:  Cooldown (skip): ['peak_incidence']

INFO:historymatching.feature_selection:  SELECTED: incidence_5 (score=48.93, mean_z=1.38, std_z=6.86)

INFO:historymatching.engine:[Wave 2] Phase 3/5 FEATURE SELECTION: ['incidence_5'] [0.0s]

INFO:historymatching.engine:[Wave 2] Phase 4/5 EMULATOR TRAINING: training 1 emulators...

INFO:historymatching.emulators.factory:  Emulator 1/1 [incidence_5]: creating (500 samples, 2 params, type=gpr)...

DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: incidence_5

INFO:historymatching.emulators.factory:  Emulator 1/1 [incidence_5]: training...

INFO:historymatching.emulators.gpr:    Training GPR: 375 points, 2 dims

INFO:historymatching.emulators.gpr:    Building GPflow model and optimizing hyperparameters...

INFO:historymatching.emulators.gpr:    GPR training complete — noise=0.2124, lengthscale range=[0.862, 1.753] [1.0s]

INFO:historymatching.emulators.factory:  Emulator 1/1 [incidence_5]: trained [1.0s]

INFO:historymatching.engine:[Wave 2] Phase 4/5 EMULATOR TRAINING: complete [1.0s]

INFO:historymatching.engine:[Wave 2] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...

INFO:historymatching.nroy_sampling:  LHS rejection: 0/500 (0%) — starting

INFO:historymatching.nroy_sampling:  LHS rejection: 79/500 (15%) | 550 tested | rate=14.3636% [0s]

INFO:historymatching.nroy_sampling:  LHS rejection: 559/500 (111%) | 3,774 tested | rate=14.8119% [0s]

INFO:historymatching.nroy_sampling:  LHS rejection: 500/500 [0.4s]

INFO:historymatching.nroy_sampling:  NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [0.4s]

DEBUG:historymatching.engine:NROY fraction: 0.132485

INFO:historymatching.engine:[Wave 2] Phase 5/5 NROY SAMPLING: found 500 candidates [0.4s]

INFO:historymatching.engine:[Wave 2] ALL PHASES COMPLETE [1.8s total]. Committing...

INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl

INFO:historymatching.engine:Wave 2 output saved to hm_output/run_20260603_204612/wave2

INFO:historymatching.engine:[Wave 2] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:WAVE 3 STARTING

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:[Wave 3] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 2 [0.0s]

INFO:historymatching.engine:[Wave 3] Phase 2/5 SIMULATION: running 500 simulations...

INFO:historymatching.engine:[Wave 3] Phase 2/5 SIMULATION: complete — 3 outputs [0.4s]

INFO:historymatching.feature_selection:  Feature ranking (mean_sq_z): attack_rate=139.63, peak_incidence=59.79, incidence_5=26.93

INFO:historymatching.feature_selection:  Cooldown (skip): ['incidence_5']

INFO:historymatching.feature_selection:  SELECTED: attack_rate (score=139.63, mean_z=-6.07, std_z=10.15)

INFO:historymatching.engine:[Wave 3] Phase 3/5 FEATURE SELECTION: ['attack_rate'] [0.0s]

INFO:historymatching.engine:[Wave 3] Phase 4/5 EMULATOR TRAINING: training 1 emulators...

INFO:historymatching.emulators.factory:  Emulator 1/1 [attack_rate]: creating (500 samples, 2 params, type=gpr)...

DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: attack_rate

INFO:historymatching.emulators.factory:  Emulator 1/1 [attack_rate]: training...

INFO:historymatching.emulators.gpr:    Training GPR: 375 points, 2 dims

INFO:historymatching.emulators.gpr:    Building GPflow model and optimizing hyperparameters...

INFO:historymatching.emulators.gpr:    GPR training complete — noise=0.0767, lengthscale range=[0.506, 0.687] [0.9s]

INFO:historymatching.emulators.factory:  Emulator 1/1 [attack_rate]: trained [0.9s]

INFO:historymatching.engine:[Wave 3] Phase 4/5 EMULATOR TRAINING: complete [0.9s]

INFO:historymatching.engine:[Wave 3] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...

INFO:historymatching.nroy_sampling:  LHS rejection: 0/500 (0%) — starting

INFO:historymatching.nroy_sampling:  LHS rejection: 44/500 (8%) | 550 tested | rate=8.0000% [0s]

INFO:historymatching.nroy_sampling:  LHS rejection: 585/500 (117%) | 6,820 tested | rate=8.5777% [1s]

INFO:historymatching.nroy_sampling:  LHS rejection: 500/500 [0.7s]

INFO:historymatching.nroy_sampling:  NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [0.7s]

DEBUG:historymatching.engine:NROY fraction: 0.073314

INFO:historymatching.engine:[Wave 3] Phase 5/5 NROY SAMPLING: found 500 candidates [0.7s]

INFO:historymatching.engine:[Wave 3] ALL PHASES COMPLETE [2.0s total]. Committing...

INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl

INFO:historymatching.engine:Wave 3 output saved to hm_output/run_20260603_204612/wave3

INFO:historymatching.engine:[Wave 3] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:WAVE 4 STARTING

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:[Wave 4] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 3 [0.0s]

INFO:historymatching.engine:[Wave 4] Phase 2/5 SIMULATION: running 500 simulations...

INFO:historymatching.engine:[Wave 4] Phase 2/5 SIMULATION: complete — 3 outputs [0.4s]

INFO:historymatching.feature_selection:  Feature ranking (mean_sq_z): attack_rate=33.00, incidence_5=30.26, peak_incidence=25.22

INFO:historymatching.feature_selection:  Cooldown (skip): ['attack_rate']

INFO:historymatching.feature_selection:  SELECTED: incidence_5 (score=30.26, mean_z=1.97, std_z=5.14)

INFO:historymatching.engine:[Wave 4] Phase 3/5 FEATURE SELECTION: ['incidence_5'] [0.0s]

INFO:historymatching.engine:[Wave 4] Phase 4/5 EMULATOR TRAINING: training 1 emulators...

INFO:historymatching.emulators.factory:  Emulator 1/1 [incidence_5]: creating (500 samples, 2 params, type=gpr)...

DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: incidence_5

INFO:historymatching.emulators.factory:  Emulator 1/1 [incidence_5]: training...

INFO:historymatching.emulators.gpr:    Training GPR: 375 points, 2 dims

INFO:historymatching.emulators.gpr:    Building GPflow model and optimizing hyperparameters...

INFO:historymatching.emulators.gpr:    GPR training complete — noise=0.3334, lengthscale range=[1.129, 1.565] [1.0s]

INFO:historymatching.emulators.factory:  Emulator 1/1 [incidence_5]: trained [1.0s]

INFO:historymatching.engine:[Wave 4] Phase 4/5 EMULATOR TRAINING: complete [1.0s]

INFO:historymatching.engine:[Wave 4] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...

INFO:historymatching.nroy_sampling:  LHS rejection: 0/500 (0%) — starting

INFO:historymatching.nroy_sampling:  LHS rejection: 54/500 (10%) | 550 tested | rate=9.8182% [0s]

INFO:historymatching.nroy_sampling:  LHS rejection: 471/500 (94%) | 5,546 tested | rate=8.4926% [1s]

INFO:historymatching.nroy_sampling:  LHS rejection: 500/500 (100%) | 5,960 tested | rate=8.3893% [2s]

INFO:historymatching.nroy_sampling:  LHS rejection: 500/500 [2.2s]

INFO:historymatching.nroy_sampling:  NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [2.2s]

DEBUG:historymatching.engine:NROY fraction: 0.083893

INFO:historymatching.engine:[Wave 4] Phase 5/5 NROY SAMPLING: found 500 candidates [2.3s]

INFO:historymatching.engine:[Wave 4] ALL PHASES COMPLETE [3.6s total]. Committing...

INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl

INFO:historymatching.engine:Wave 4 output saved to hm_output/run_20260603_204612/wave4

INFO:historymatching.engine:[Wave 4] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:WAVE 5 STARTING

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:[Wave 5] Phase 1/5 SAMPLING: using 500 pre-computed samples from wave 4 [0.0s]

INFO:historymatching.engine:[Wave 5] Phase 2/5 SIMULATION: running 500 simulations...

INFO:historymatching.engine:[Wave 5] Phase 2/5 SIMULATION: complete — 3 outputs [0.4s]

INFO:historymatching.feature_selection:  Feature ranking (mean_sq_z): attack_rate=34.29, incidence_5=26.63, peak_incidence=25.91

INFO:historymatching.feature_selection:  Cooldown (skip): ['incidence_5']

INFO:historymatching.feature_selection:  SELECTED: attack_rate (score=34.29, mean_z=-0.17, std_z=5.86)

INFO:historymatching.engine:[Wave 5] Phase 3/5 FEATURE SELECTION: ['attack_rate'] [0.0s]

INFO:historymatching.engine:[Wave 5] Phase 4/5 EMULATOR TRAINING: training 1 emulators...

INFO:historymatching.emulators.factory:  Emulator 1/1 [attack_rate]: creating (500 samples, 2 params, type=gpr)...

DEBUG:historymatching.emulators.factory:Created gpr emulator for feature: attack_rate

INFO:historymatching.emulators.factory:  Emulator 1/1 [attack_rate]: training...

INFO:historymatching.emulators.gpr:    Training GPR: 375 points, 2 dims

INFO:historymatching.emulators.gpr:    Building GPflow model and optimizing hyperparameters...

INFO:historymatching.emulators.gpr:    GPR training complete — noise=0.1423, lengthscale range=[0.807, 1.062] [1.1s]

INFO:historymatching.emulators.factory:  Emulator 1/1 [attack_rate]: trained [1.1s]

INFO:historymatching.engine:[Wave 5] Phase 4/5 EMULATOR TRAINING: complete [1.1s]

INFO:historymatching.engine:[Wave 5] Phase 5/5 NROY SAMPLING: finding 500 plausible candidates for next wave...

INFO:historymatching.nroy_sampling:  LHS rejection: 0/500 (0%) — starting

INFO:historymatching.nroy_sampling:  LHS rejection: 33/500 (6%) | 550 tested | rate=6.0000% [0s]

INFO:historymatching.nroy_sampling:  LHS rejection: 556/500 (111%) | 9,111 tested | rate=6.1025% [1s]

INFO:historymatching.nroy_sampling:  LHS rejection: 500/500 [1.1s]

INFO:historymatching.nroy_sampling:  NROY SUMMARY: 500/500 [OK] — sources: LHS=500 [1.1s]

DEBUG:historymatching.engine:NROY fraction: 0.054879

INFO:historymatching.engine:[Wave 5] Phase 5/5 NROY SAMPLING: found 500 candidates [1.2s]

INFO:historymatching.engine:[Wave 5] ALL PHASES COMPLETE [2.7s total]. Committing...

INFO:historymatching.engine:Checkpoint saved to hm_output/run_20260603_204612/checkpoint.pkl

INFO:historymatching.engine:Wave 5 output saved to hm_output/run_20260603_204612/wave5

INFO:historymatching.engine:[Wave 5] COMMITTED — diagnostics and checkpoint saved to hm_output/run_20260603_204612

INFO:historymatching.engine:============================================================

INFO:historymatching.engine:Automated run completed. 5 iterations executed.

HistoryMatching(
  parameters=2 ['beta', 'gamma'],
  outputs=3 ['peak_incidence', 'attack_rate', 'incidence_5'],
  emulator=gpr, simulator=set,
  n_samples=500, implausibility_threshold=3.0,
  state=completed, wave 5/5, acceptance_rate=0.155
)

3. Inspect the NROY region¶

After history matching the parameter samples from the final iteration were drawn by rejection-sampling the full prior against all trained emulators. They represent a draw from the current NROY region.

In [5]:

Copied!





# Collect samples from all iterations to visualise the space reduction
all_samples = pd.concat(
    [r.samples.assign(iteration=r.iteration) for r in results],
    ignore_index=True,
)

fig, axs = plt.subplots(1, 2, figsize=(12, 4))

cmap  = plt.get_cmap('viridis')
n_iter = len(results)
for it in range(1, n_iter + 1):
    df = all_samples[all_samples['iteration'] == it]
    axs[0].scatter(df['beta'], df['gamma'],
                   s=10, alpha=0.5,
                   color=cmap((it - 1) / n_iter),
                   label=f'Iteration {it}')

axs[0].scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=200,
               c='red', zorder=5, label='True values')
axs[0].set_xlabel('β')
axs[0].set_ylabel('γ')
axs[0].set_title('Parameter space — NROY shrinks each iteration')
axs[0].legend(fontsize=8)
axs[0].grid(ls=':')

# NROY fraction from the engine (what fraction of the prior remains plausible)
iters = [r.iteration for r in results]
nroy_fracs = [r.nroy_fraction for r in results]

axs[1].bar(iters, nroy_fracs, color=[cmap((i) / n_iter) for i in range(n_iter)])
for w, f in zip(iters, nroy_fracs):
    axs[1].annotate(f'{f:.0%}', (w, f), textcoords='offset points',
                    xytext=(0, 5), ha='center', fontsize=9)
axs[1].set_xlabel('Iteration')
axs[1].set_ylabel('NROY fraction')
axs[1].set_title('Fraction of prior space remaining')
axs[1].set_xticks(iters)
axs[1].set_ylim(0, 1)
axs[1].grid(ls=':', axis='y')

fig.tight_layout()

# NROY pool = samples that passed ALL waves' emulators
# These already have rand_seed attached (injected by run_sir)
nroy_samples = engine.get_nroy_samples(10000)
print(f"NROY parameter pool: {len(nroy_samples)} (β, γ, seed) triples")
print(f"  β range:    [{nroy_samples['beta'].min():.3f}, {nroy_samples['beta'].max():.3f}]")
print(f"  γ range:    [{nroy_samples['gamma'].min():.3f}, {nroy_samples['gamma'].max():.3f}]")
# Collect samples from all iterations to visualise the space reduction
all_samples = pd.concat(
    [r.samples.assign(iteration=r.iteration) for r in results],
    ignore_index=True,
)

fig, axs = plt.subplots(1, 2, figsize=(12, 4))

cmap  = plt.get_cmap('viridis')
n_iter = len(results)
for it in range(1, n_iter + 1):
    df = all_samples[all_samples['iteration'] == it]
    axs[0].scatter(df['beta'], df['gamma'],
                   s=10, alpha=0.5,
                   color=cmap((it - 1) / n_iter),
                   label=f'Iteration {it}')

axs[0].scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=200,
               c='red', zorder=5, label='True values')
axs[0].set_xlabel('β')
axs[0].set_ylabel('γ')
axs[0].set_title('Parameter space — NROY shrinks each iteration')
axs[0].legend(fontsize=8)
axs[0].grid(ls=':')

# NROY fraction from the engine (what fraction of the prior remains plausible)
iters = [r.iteration for r in results]
nroy_fracs = [r.nroy_fraction for r in results]

axs[1].bar(iters, nroy_fracs, color=[cmap((i) / n_iter) for i in range(n_iter)])
for w, f in zip(iters, nroy_fracs):
    axs[1].annotate(f'{f:.0%}', (w, f), textcoords='offset points',
                    xytext=(0, 5), ha='center', fontsize=9)
axs[1].set_xlabel('Iteration')
axs[1].set_ylabel('NROY fraction')
axs[1].set_title('Fraction of prior space remaining')
axs[1].set_xticks(iters)
axs[1].set_ylim(0, 1)
axs[1].grid(ls=':', axis='y')

fig.tight_layout()

# NROY pool = samples that passed ALL waves' emulators
# These already have rand_seed attached (injected by run_sir)
nroy_samples = engine.get_nroy_samples(10000)
print(f"NROY parameter pool: {len(nroy_samples)} (β, γ, seed) triples")
print(f"  β range:    [{nroy_samples['beta'].min():.3f}, {nroy_samples['beta'].max():.3f}]")
print(f"  γ range:    [{nroy_samples['gamma'].min():.3f}, {nroy_samples['gamma'].max():.3f}]")

INFO:historymatching.nroy_sampling:  LHS rejection: 0/10000 (0%) — starting

INFO:historymatching.nroy_sampling:  LHS rejection: 683/10000 (6%) | 11,000 tested | rate=6.2091% [1s]

INFO:historymatching.nroy_sampling:  LHS rejection: 7131/10000 (71%) | 111,000 tested | rate=6.4243% [4s]

INFO:historymatching.nroy_sampling:  LHS rejection: 10228/10000 (102%) | 160,124 tested | rate=6.3875% [5s]

INFO:historymatching.nroy_sampling:  LHS rejection: 10000/10000 [5.3s]

INFO:historymatching.nroy_sampling:  NROY SUMMARY: 10000/10000 [OK] — sources: LHS=10000 [5.3s]

NROY parameter pool: 10000 (β, γ, seed) triples
  β range:    [0.314, 1.125]
  γ range:    [0.100, 0.800]

4. Generate a trajectory ensemble¶

We draw a large NROY pool using engine.get_nroy_samples(10000) — this rejection-samples fresh candidates from the full prior against all trained emulators, cheaply and without running any new simulations. Each NROY point gets a unique seed so the trajectory is deterministic and reproducible.

In [6]:

Copied!





N_DAYS = obs_incidence.shape[0]

ensemble = []
for i, (_, row) in enumerate(nroy_samples.iterrows()):
    model = SIR(
        beta  = row['beta'],
        gamma = row['gamma'],
        s0    = POPULATION - SEED_INFECTIONS,
        i0    = SEED_INFECTIONS,
        seed  = i,   # unique seed per NROY point
    )
    inc = model.get_incidence()
    ensemble.append({
        'beta':      row['beta'],
        'gamma':     row['gamma'],
        'seed':      i,
        'incidence': inc[:N_DAYS],
    })

ensemble_df = pd.DataFrame(ensemble)
print(f"Ensemble size: {len(ensemble_df)} trajectories (1 per NROY sample)")
N_DAYS = obs_incidence.shape[0]

ensemble = []
for i, (_, row) in enumerate(nroy_samples.iterrows()):
    model = SIR(
        beta  = row['beta'],
        gamma = row['gamma'],
        s0    = POPULATION - SEED_INFECTIONS,
        i0    = SEED_INFECTIONS,
        seed  = i,   # unique seed per NROY point
    )
    inc = model.get_incidence()
    ensemble.append({
        'beta':      row['beta'],
        'gamma':     row['gamma'],
        'seed':      i,
        'incidence': inc[:N_DAYS],
    })

ensemble_df = pd.DataFrame(ensemble)
print(f"Ensemble size: {len(ensemble_df)} trajectories (1 per NROY sample)")

Ensemble size: 10000 trajectories (1 per NROY sample)

In [7]:

Copied!





inc_matrix = np.vstack(ensemble_df['incidence'].values)  # shape (n_traj, n_days)

# Compute calibration features for every trajectory
ensemble_df['peak']  = inc_matrix.max(axis=1)
ensemble_df['ar']    = inc_matrix.sum(axis=1) / POPULATION
ensemble_df['inc5']  = inc_matrix[:, 5]

fig, axs = plt.subplots(2, 2, figsize=(14, 8))

# Top-left: time-series spaghetti with day 5 highlighted
ax = axs[0, 0]
for row in inc_matrix:
    ax.plot(row, color='steelblue', alpha=0.05, lw=0.8)
ax.plot(inc_matrix.mean(axis=0), color='steelblue', lw=2, label='Ensemble mean')
ax.plot(obs_incidence.values, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
           linewidths=2, zorder=6, label='Emulated day 5')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.set_title(f'Prior NROY ensemble ({len(ensemble_df):,} trajectories)')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Top-right: peak incidence
ax = axs[0, 1]
ax.hist(ensemble_df['peak'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_peak, color='red', lw=2, ls='--', label=f'Observed ({obs_peak:.0f})')
ax.axvspan(obs_peak * (1 - 1.96 * 0.05), obs_peak * (1 + 1.96 * 0.05),
           color='red', alpha=0.08, label='HM tolerance (±5%)')
ax.set_xlabel('Peak incidence')
ax.set_ylabel('Count')
ax.set_title('Peak incidence')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Bottom-left: attack rate
ax = axs[1, 0]
ax.hist(ensemble_df['ar'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_ar, color='red', lw=2, ls='--', label=f'Observed ({obs_ar:.2f})')
ax.axvspan(obs_ar * (1 - 1.96 * 0.03), obs_ar * (1 + 1.96 * 0.03),
           color='red', alpha=0.08, label='HM tolerance (±3%)')
ax.set_xlabel('Attack rate')
ax.set_ylabel('Count')
ax.set_title('Attack rate')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Bottom-right: incidence day 5
ax = axs[1, 1]
ax.hist(ensemble_df['inc5'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_inc5, color='red', lw=2, ls='--', label=f'Observed ({obs_inc5:.0f})')
ax.axvspan(obs_inc5 * (1 - 1.96 * 0.10), obs_inc5 * (1 + 1.96 * 0.10),
           color='red', alpha=0.08, label='HM tolerance (±10%)')
ax.set_xlabel('Incidence day 5')
ax.set_ylabel('Count')
ax.set_title('Incidence day 5')
ax.legend(fontsize=8)
ax.grid(ls=':')

fig.suptitle('Prior NROY ensemble vs HM targets', fontsize=12)
fig.tight_layout()
inc_matrix = np.vstack(ensemble_df['incidence'].values)  # shape (n_traj, n_days)

# Compute calibration features for every trajectory
ensemble_df['peak']  = inc_matrix.max(axis=1)
ensemble_df['ar']    = inc_matrix.sum(axis=1) / POPULATION
ensemble_df['inc5']  = inc_matrix[:, 5]

fig, axs = plt.subplots(2, 2, figsize=(14, 8))

# Top-left: time-series spaghetti with day 5 highlighted
ax = axs[0, 0]
for row in inc_matrix:
    ax.plot(row, color='steelblue', alpha=0.05, lw=0.8)
ax.plot(inc_matrix.mean(axis=0), color='steelblue', lw=2, label='Ensemble mean')
ax.plot(obs_incidence.values, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
           linewidths=2, zorder=6, label='Emulated day 5')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.set_title(f'Prior NROY ensemble ({len(ensemble_df):,} trajectories)')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Top-right: peak incidence
ax = axs[0, 1]
ax.hist(ensemble_df['peak'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_peak, color='red', lw=2, ls='--', label=f'Observed ({obs_peak:.0f})')
ax.axvspan(obs_peak * (1 - 1.96 * 0.05), obs_peak * (1 + 1.96 * 0.05),
           color='red', alpha=0.08, label='HM tolerance (±5%)')
ax.set_xlabel('Peak incidence')
ax.set_ylabel('Count')
ax.set_title('Peak incidence')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Bottom-left: attack rate
ax = axs[1, 0]
ax.hist(ensemble_df['ar'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_ar, color='red', lw=2, ls='--', label=f'Observed ({obs_ar:.2f})')
ax.axvspan(obs_ar * (1 - 1.96 * 0.03), obs_ar * (1 + 1.96 * 0.03),
           color='red', alpha=0.08, label='HM tolerance (±3%)')
ax.set_xlabel('Attack rate')
ax.set_ylabel('Count')
ax.set_title('Attack rate')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Bottom-right: incidence day 5
ax = axs[1, 1]
ax.hist(ensemble_df['inc5'], bins=30, color='steelblue', alpha=0.6, edgecolor='white')
ax.axvline(obs_inc5, color='red', lw=2, ls='--', label=f'Observed ({obs_inc5:.0f})')
ax.axvspan(obs_inc5 * (1 - 1.96 * 0.10), obs_inc5 * (1 + 1.96 * 0.10),
           color='red', alpha=0.08, label='HM tolerance (±10%)')
ax.set_xlabel('Incidence day 5')
ax.set_ylabel('Count')
ax.set_title('Incidence day 5')
ax.legend(fontsize=8)
ax.grid(ls=':')

fig.suptitle('Prior NROY ensemble vs HM targets', fontsize=12)
fig.tight_layout()

5. Weight trajectories by pseudo-likelihood¶

We score each trajectory against the observed incidence curve using a Gaussian pseudo-likelihood:

$$w_i = \exp\!\left(-\frac{d_i^2}{2\,\varepsilon^2}\right), \qquad d_i = \sqrt{\frac{1}{T}\sum_{t=1}^T\left(y_i(t) - y_{\mathrm{obs}}(t)\right)^2}$$

where $d_i$ is the RMSE between trajectory $i$ and the observed data, and $\varepsilon$ is a tolerance that controls how strict the selection is.

Large ε → lenient: weights spread broadly, selected set ≈ random draw from NROY
Small ε → strict: only trajectories very close to the observed curve get non-negligible weight

A practical starting point is $\varepsilon \approx$ one standard deviation of the observed peak.

In [8]:

Copied!





def gaussian_weights(inc_matrix: np.ndarray, obs: np.ndarray, epsilon: float) -> np.ndarray:
    """
    Compute normalised pseudo-likelihood weights for each trajectory.

    Args:
        inc_matrix: Array of shape (n_traj, n_days) — simulated trajectories.
        obs:        Array of shape (n_days,)         — observed incidence.
        epsilon:    Tolerance (RMSE scale).  Smaller = stricter selection.

    Returns:
        Normalised weight array of shape (n_traj,).
    """
    rmse = np.sqrt(np.mean((inc_matrix - obs[np.newaxis, :]) ** 2, axis=1))
    w    = np.exp(-0.5 * (rmse / epsilon) ** 2)
    total = w.sum()
    if total == 0:
        raise ValueError("All weights are zero — try a larger epsilon.")
    return w / total


# Tolerance ≈ 10 % of the peak value — a reasonable starting point
epsilon = obs_peak * 0.03

obs_array = obs_incidence.values[:N_DAYS]
weights   = gaussian_weights(inc_matrix, obs_array, epsilon=epsilon)

print(f"Tolerance ε = {epsilon:.1f} cases/day")
print(f"Effective sample size: {1 / (weights**2).sum():.0f}  "
      f"(out of {len(weights):,} total)")

# Scatter: parameter space coloured by weight
fig, ax = plt.subplots(figsize=(6, 5))
sc = ax.scatter(
    ensemble_df['beta'], ensemble_df['gamma'],
    c=weights, cmap='viridis_r', s=6, alpha=0.6, norm=mcolors.LogNorm(),
)
ax.scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=250, c='red',
           edgecolors='k', lw=0.5, zorder=5, label='True values')
plt.colorbar(sc, ax=ax, label='Weight (log scale)')
ax.set_xlabel('β')
ax.set_ylabel('γ')
ax.set_title('Trajectory weights in parameter space')
ax.legend()
ax.grid(ls=':')
fig.tight_layout()
def gaussian_weights(inc_matrix: np.ndarray, obs: np.ndarray, epsilon: float) -> np.ndarray:
    """
    Compute normalised pseudo-likelihood weights for each trajectory.

    Args:
        inc_matrix: Array of shape (n_traj, n_days) — simulated trajectories.
        obs:        Array of shape (n_days,)         — observed incidence.
        epsilon:    Tolerance (RMSE scale).  Smaller = stricter selection.

    Returns:
        Normalised weight array of shape (n_traj,).
    """
    rmse = np.sqrt(np.mean((inc_matrix - obs[np.newaxis, :]) ** 2, axis=1))
    w    = np.exp(-0.5 * (rmse / epsilon) ** 2)
    total = w.sum()
    if total == 0:
        raise ValueError("All weights are zero — try a larger epsilon.")
    return w / total


# Tolerance ≈ 10 % of the peak value — a reasonable starting point
epsilon = obs_peak * 0.03

obs_array = obs_incidence.values[:N_DAYS]
weights   = gaussian_weights(inc_matrix, obs_array, epsilon=epsilon)

print(f"Tolerance ε = {epsilon:.1f} cases/day")
print(f"Effective sample size: {1 / (weights**2).sum():.0f}  "
      f"(out of {len(weights):,} total)")

# Scatter: parameter space coloured by weight
fig, ax = plt.subplots(figsize=(6, 5))
sc = ax.scatter(
    ensemble_df['beta'], ensemble_df['gamma'],
    c=weights, cmap='viridis_r', s=6, alpha=0.6, norm=mcolors.LogNorm(),
)
ax.scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=250, c='red',
           edgecolors='k', lw=0.5, zorder=5, label='True values')
plt.colorbar(sc, ax=ax, label='Weight (log scale)')
ax.set_xlabel('β')
ax.set_ylabel('γ')
ax.set_title('Trajectory weights in parameter space')
ax.legend()
ax.grid(ls=':')
fig.tight_layout()

Tolerance ε = 7.1 cases/day
Effective sample size: 338  (out of 10,000 total)

6. Importance resampling — select the final trajectory set¶

We draw N_SELECT trajectories without replacement from the ensemble, proportional to their weights. The result is a volume-manageable set of plausible trajectories that can be used for downstream analysis (forecasting quantiles, reporting representative epidemic curves, etc.).

In [9]:

Copied!





N_SELECT  = 15   # number of trajectories to retain
rng       = np.random.default_rng(seed=0)

selected_idx = rng.choice(
    len(ensemble_df),
    size    = N_SELECT,
    replace = False,
    p       = weights,
)
selected_df  = ensemble_df.iloc[selected_idx].copy()
selected_inc = inc_matrix[selected_idx]

print(f"Selected {N_SELECT} trajectories")
print(f"β range in selected set:  {selected_df['beta'].min():.3f} – {selected_df['beta'].max():.3f}")
print(f"γ range in selected set:  {selected_df['gamma'].min():.3f} – {selected_df['gamma'].max():.3f}")

# ── Consistency check: re-run selected triples, verify identical outputs ──
print("\nReproducibility check (re-running 5 selected trajectories):")
for check_idx in selected_idx[:5]:
    row = ensemble_df.iloc[check_idx]
    model = SIR(
        beta  = row['beta'],
        gamma = row['gamma'],
        s0    = POPULATION - SEED_INFECTIONS,
        i0    = SEED_INFECTIONS,
        seed  = int(row['seed']),
    )
    rerun_inc = model.get_incidence()[:N_DAYS]
    orig_inc  = inc_matrix[check_idx]
    match = np.allclose(rerun_inc, orig_inc)
    print(f"  idx={check_idx:4d}  β={row['beta']:.3f}  γ={row['gamma']:.3f}  "
          f"seed={int(row['seed'])}  match={match}")
    assert match, f"Trajectory {check_idx} not reproducible!"

print("All checks passed — selected trajectories are reproducible.")
N_SELECT  = 15   # number of trajectories to retain
rng       = np.random.default_rng(seed=0)

selected_idx = rng.choice(
    len(ensemble_df),
    size    = N_SELECT,
    replace = False,
    p       = weights,
)
selected_df  = ensemble_df.iloc[selected_idx].copy()
selected_inc = inc_matrix[selected_idx]

print(f"Selected {N_SELECT} trajectories")
print(f"β range in selected set:  {selected_df['beta'].min():.3f} – {selected_df['beta'].max():.3f}")
print(f"γ range in selected set:  {selected_df['gamma'].min():.3f} – {selected_df['gamma'].max():.3f}")

# ── Consistency check: re-run selected triples, verify identical outputs ──
print("\nReproducibility check (re-running 5 selected trajectories):")
for check_idx in selected_idx[:5]:
    row = ensemble_df.iloc[check_idx]
    model = SIR(
        beta  = row['beta'],
        gamma = row['gamma'],
        s0    = POPULATION - SEED_INFECTIONS,
        i0    = SEED_INFECTIONS,
        seed  = int(row['seed']),
    )
    rerun_inc = model.get_incidence()[:N_DAYS]
    orig_inc  = inc_matrix[check_idx]
    match = np.allclose(rerun_inc, orig_inc)
    print(f"  idx={check_idx:4d}  β={row['beta']:.3f}  γ={row['gamma']:.3f}  "
          f"seed={int(row['seed'])}  match={match}")
    assert match, f"Trajectory {check_idx} not reproducible!"

print("All checks passed — selected trajectories are reproducible.")

Selected 15 trajectories
β range in selected set:  0.499 – 0.763
γ range in selected set:  0.260 – 0.487

Reproducibility check (re-running 5 selected trajectories):
  idx=6447  β=0.560  γ=0.329  seed=6447  match=True
  idx=2677  β=0.713  γ=0.412  seed=2677  match=True
  idx= 341  β=0.617  γ=0.382  seed=341  match=True
  idx= 226  β=0.763  γ=0.469  seed=226  match=True
  idx=7882  β=0.708  γ=0.400  seed=7882  match=True
All checks passed — selected trajectories are reproducible.

7. Visualise the selected trajectories¶

Compare the full prior ensemble (grey) against the selected trajectories (coloured) and the observations (black). The selected set should be visually consistent with the observed curve, whereas the full ensemble is much broader.

In [10]:

Copied!





selected_peaks = selected_inc.max(axis=1)
selected_ar    = selected_inc.sum(axis=1) / POPULATION
selected_inc5  = selected_inc[:, 5]

fig, axs = plt.subplots(2, 3, figsize=(17, 10))

# ── Top row: time-series comparison ──────────────────────────────────────

# Top-left: full prior ensemble
ax = axs[0, 0]
for row in inc_matrix:
    ax.plot(row, color='steelblue', alpha=0.04, lw=0.7)
ax.plot(obs_array, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
           linewidths=2, zorder=6, label='Emulated day 5')
ax.set_title(f'Prior ensemble\n({len(inc_matrix):,} trajectories)')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Top-middle: selected trajectories
ax = axs[0, 1]
cmap_sel = plt.get_cmap('plasma')
for k, row in enumerate(selected_inc):
    ax.plot(row, color=cmap_sel(k / N_SELECT), alpha=0.6, lw=1.0)
ax.fill_between(
    range(N_DAYS),
    selected_inc.min(axis=0),
    selected_inc.max(axis=0),
    alpha=0.15, color='purple', label='Selected range',
)
ax.plot(selected_inc.mean(axis=0), color='purple', lw=2.5, label='Selected mean')
ax.plot(obs_array, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
           linewidths=2, zorder=6, label='Emulated day 5')
ax.set_title(f'Selected trajectories\n({N_SELECT} via importance resampling)')
ax.set_xlabel('Day')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Top-right: parameter space
ax = axs[0, 2]
ax.scatter(ensemble_df['beta'], ensemble_df['gamma'],
           s=4, alpha=0.2, color='steelblue', label='Prior NROY')
ax.scatter(selected_df['beta'], selected_df['gamma'],
           s=20, alpha=0.8, color='purple', label='Selected')
ax.scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=200,
           c='red', zorder=5, label='True values')
ax.set_xlabel('β')
ax.set_ylabel('γ')
ax.set_title('Parameter space')
ax.legend(fontsize=8)
ax.grid(ls=':')

# ── Bottom row: feature-space comparison ─────────────────────────────────

# Bottom-left: peak incidence
ax = axs[1, 0]
ax.hist(ensemble_df['peak'], bins=30, density=True, color='steelblue',
        alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_peaks, bins=15, density=True, color='purple',
        alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_peak, color='red', lw=2, ls='--', label=f'Observed ({obs_peak:.0f})')
ax.set_xlabel('Peak incidence')
ax.set_ylabel('Density')
ax.set_title('Peak incidence')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Bottom-middle: attack rate
ax = axs[1, 1]
ax.hist(ensemble_df['ar'], bins=30, density=True, color='steelblue',
        alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_ar, bins=15, density=True, color='purple',
        alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_ar, color='red', lw=2, ls='--', label=f'Observed ({obs_ar:.2f})')
ax.set_xlabel('Attack rate')
ax.set_ylabel('Density')
ax.set_title('Attack rate')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Bottom-right: incidence day 5
ax = axs[1, 2]
ax.hist(ensemble_df['inc5'], bins=30, density=True, color='steelblue',
        alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_inc5, bins=15, density=True, color='purple',
        alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_inc5, color='red', lw=2, ls='--', label=f'Observed ({obs_inc5:.0f})')
ax.set_xlabel('Incidence day 5')
ax.set_ylabel('Density')
ax.set_title('Incidence day 5')
ax.legend(fontsize=8)
ax.grid(ls=':')

fig.suptitle('NROY ensemble  →  Trajectory selection', fontsize=13)
fig.tight_layout()
selected_peaks = selected_inc.max(axis=1)
selected_ar    = selected_inc.sum(axis=1) / POPULATION
selected_inc5  = selected_inc[:, 5]

fig, axs = plt.subplots(2, 3, figsize=(17, 10))

# ── Top row: time-series comparison ──────────────────────────────────────

# Top-left: full prior ensemble
ax = axs[0, 0]
for row in inc_matrix:
    ax.plot(row, color='steelblue', alpha=0.04, lw=0.7)
ax.plot(obs_array, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
           linewidths=2, zorder=6, label='Emulated day 5')
ax.set_title(f'Prior ensemble\n({len(inc_matrix):,} trajectories)')
ax.set_xlabel('Day')
ax.set_ylabel('Daily new cases')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Top-middle: selected trajectories
ax = axs[0, 1]
cmap_sel = plt.get_cmap('plasma')
for k, row in enumerate(selected_inc):
    ax.plot(row, color=cmap_sel(k / N_SELECT), alpha=0.6, lw=1.0)
ax.fill_between(
    range(N_DAYS),
    selected_inc.min(axis=0),
    selected_inc.max(axis=0),
    alpha=0.15, color='purple', label='Selected range',
)
ax.plot(selected_inc.mean(axis=0), color='purple', lw=2.5, label='Selected mean')
ax.plot(obs_array, 'ko-', ms=4, lw=2, label='Observed', zorder=5)
ax.scatter([5], [obs_inc5], s=120, facecolors='none', edgecolors='red',
           linewidths=2, zorder=6, label='Emulated day 5')
ax.set_title(f'Selected trajectories\n({N_SELECT} via importance resampling)')
ax.set_xlabel('Day')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Top-right: parameter space
ax = axs[0, 2]
ax.scatter(ensemble_df['beta'], ensemble_df['gamma'],
           s=4, alpha=0.2, color='steelblue', label='Prior NROY')
ax.scatter(selected_df['beta'], selected_df['gamma'],
           s=20, alpha=0.8, color='purple', label='Selected')
ax.scatter([BETA_TRUE], [GAMMA_TRUE], marker='*', s=200,
           c='red', zorder=5, label='True values')
ax.set_xlabel('β')
ax.set_ylabel('γ')
ax.set_title('Parameter space')
ax.legend(fontsize=8)
ax.grid(ls=':')

# ── Bottom row: feature-space comparison ─────────────────────────────────

# Bottom-left: peak incidence
ax = axs[1, 0]
ax.hist(ensemble_df['peak'], bins=30, density=True, color='steelblue',
        alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_peaks, bins=15, density=True, color='purple',
        alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_peak, color='red', lw=2, ls='--', label=f'Observed ({obs_peak:.0f})')
ax.set_xlabel('Peak incidence')
ax.set_ylabel('Density')
ax.set_title('Peak incidence')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Bottom-middle: attack rate
ax = axs[1, 1]
ax.hist(ensemble_df['ar'], bins=30, density=True, color='steelblue',
        alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_ar, bins=15, density=True, color='purple',
        alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_ar, color='red', lw=2, ls='--', label=f'Observed ({obs_ar:.2f})')
ax.set_xlabel('Attack rate')
ax.set_ylabel('Density')
ax.set_title('Attack rate')
ax.legend(fontsize=8)
ax.grid(ls=':')

# Bottom-right: incidence day 5
ax = axs[1, 2]
ax.hist(ensemble_df['inc5'], bins=30, density=True, color='steelblue',
        alpha=0.4, edgecolor='white', label='Prior NROY')
ax.hist(selected_inc5, bins=15, density=True, color='purple',
        alpha=0.7, edgecolor='white', label='Selected')
ax.axvline(obs_inc5, color='red', lw=2, ls='--', label=f'Observed ({obs_inc5:.0f})')
ax.set_xlabel('Incidence day 5')
ax.set_ylabel('Density')
ax.set_title('Incidence day 5')
ax.legend(fontsize=8)
ax.grid(ls=':')

fig.suptitle('NROY ensemble  →  Trajectory selection', fontsize=13)
fig.tight_layout()

In [11]:

Copied!





# Parameter distributions: NROY pool vs. selected set
fig, axs = plt.subplots(1, 2, figsize=(11, 4))

for ax, param, true_val in zip(axs, ['beta', 'gamma'], [BETA_TRUE, GAMMA_TRUE]):
    ax.hist(nroy_samples[param], bins=20, density=True,
            alpha=0.5, color='steelblue', label='NROY pool')
    ax.hist(selected_df[param], bins=20, density=True,
            alpha=0.7, color='purple', label='Selected')
    ax.axvline(true_val, color='red', ls='--', lw=1.5, label=f'True ({true_val})')
    ax.set_xlabel(param)
    ax.set_ylabel('Density')
    ax.set_title(f'Distribution of {param}')
    ax.legend(fontsize=8)
    ax.grid(ls=':')

fig.suptitle('Parameter distributions before and after trajectory selection', fontsize=12)
fig.tight_layout()
# Parameter distributions: NROY pool vs. selected set
fig, axs = plt.subplots(1, 2, figsize=(11, 4))

for ax, param, true_val in zip(axs, ['beta', 'gamma'], [BETA_TRUE, GAMMA_TRUE]):
    ax.hist(nroy_samples[param], bins=20, density=True,
            alpha=0.5, color='steelblue', label='NROY pool')
    ax.hist(selected_df[param], bins=20, density=True,
            alpha=0.7, color='purple', label='Selected')
    ax.axvline(true_val, color='red', ls='--', lw=1.5, label=f'True ({true_val})')
    ax.set_xlabel(param)
    ax.set_ylabel('Density')
    ax.set_title(f'Distribution of {param}')
    ax.legend(fontsize=8)
    ax.grid(ls=':')

fig.suptitle('Parameter distributions before and after trajectory selection', fontsize=12)
fig.tight_layout()

8. Effect of tolerance ε¶

The tolerance ε is the key tuning parameter. Here we sweep over several values to show the trade-off between specificity (small ε → selected trajectories are close to the observed data) and coverage (large ε → selected trajectories span the full NROY ensemble).

A useful diagnostic is the effective sample size $N_{\mathrm{eff}} = \left(\sum_i w_i^2\right)^{-1}$. When $N_{\mathrm{eff}} \ll N$ the weight distribution is too concentrated; when $N_{\mathrm{eff}} \approx N$ the selection is almost uniform (lenient).

In [12]:

Copied!





epsilons = [obs_peak * f for f in [0.02, 0.05, 0.10, 0.20, 0.40]]

fig, axs = plt.subplots(1, len(epsilons), figsize=(16, 4), sharey=True)

for ax, eps in zip(axs, epsilons):
    w   = gaussian_weights(inc_matrix, obs_array, epsilon=eps)
    nef = 1 / (w**2).sum()

    idx_sel = rng.choice(len(ensemble_df), size=N_SELECT, replace=False, p=w)
    sel_inc = inc_matrix[idx_sel]

    ax.fill_between(
        range(N_DAYS),
        np.percentile(sel_inc, 10, axis=0),
        np.percentile(sel_inc, 90, axis=0),
        alpha=0.3, color='purple', label='10–90 pct',
    )
    ax.plot(sel_inc.mean(axis=0), color='purple', lw=2)
    ax.plot(obs_array, 'ko-', ms=3, lw=1.5, zorder=5)
    ax.set_title(f'ε = {eps:.1f}\nNeff = {nef:.0f}')
    ax.set_xlabel('Day')
    ax.grid(ls=':')

axs[0].set_ylabel('Daily new cases')
fig.suptitle('Trajectory selection sensitivity to tolerance ε', fontsize=12)
fig.tight_layout()
epsilons = [obs_peak * f for f in [0.02, 0.05, 0.10, 0.20, 0.40]]

fig, axs = plt.subplots(1, len(epsilons), figsize=(16, 4), sharey=True)

for ax, eps in zip(axs, epsilons):
    w   = gaussian_weights(inc_matrix, obs_array, epsilon=eps)
    nef = 1 / (w**2).sum()

    idx_sel = rng.choice(len(ensemble_df), size=N_SELECT, replace=False, p=w)
    sel_inc = inc_matrix[idx_sel]

    ax.fill_between(
        range(N_DAYS),
        np.percentile(sel_inc, 10, axis=0),
        np.percentile(sel_inc, 90, axis=0),
        alpha=0.3, color='purple', label='10–90 pct',
    )
    ax.plot(sel_inc.mean(axis=0), color='purple', lw=2)
    ax.plot(obs_array, 'ko-', ms=3, lw=1.5, zorder=5)
    ax.set_title(f'ε = {eps:.1f}\nNeff = {nef:.0f}')
    ax.set_xlabel('Day')
    ax.grid(ls=':')

axs[0].set_ylabel('Daily new cases')
fig.suptitle('Trajectory selection sensitivity to tolerance ε', fontsize=12)
fig.tight_layout()

Summary¶

Step	What it does	Output
History matching	Eliminates implausible regions of parameter space using emulators	NROY parameter samples
Trajectory ensemble	Runs the stochastic model at NROY parameters × multiple seeds	Large ensemble of candidate trajectories
Pseudo-likelihood weighting	Scores each trajectory by how closely it matches the observed data	Weight per `(parameter, seed)` pair
Importance resampling	Draws a tractable set proportional to weights	Selected plausible trajectories

Key design choices¶

Why use summary statistics for HM but full time series for trajectory selection?
History matching with the full time series would require one emulator per day — high-dimensional and expensive. Summary statistics keep HM tractable. The trajectory selection step then acts as a second, finer filter using the full data.

How to choose ε?
Start with ε ≈ 10% of the observed peak. Check the effective sample size — aim for $N_{\mathrm{eff}} \gtrsim 50$. If the selected trajectories look too broad, tighten ε. If they look implausibly narrow, relax it.

Using the selected trajectories
Each selected (β, γ, seed) triple is a complete, reproducible simulation run. You can re-run these with the full model (e.g., longer time horizon, additional compartments) to generate matched forecasts or counterfactuals.