Emulator Showcase¶

This notebook demonstrates the different emulator types available in the history matching library and how to use them effectively within the modern object-oriented API.

Overview¶

Emulators are the heart of history matching - they provide fast statistical approximations of expensive simulations. We'll explore:

Linear Emulators: Fast, simple, good for linear relationships
Bayes Linear: GP-like accuracy and uncertainty, pure numpy/scipy — no TensorFlow needed
Gaussian Process Regression (GPR): Flexible, uncertainty-aware, handles non-linearity
Neural Network Emulators: Deep learning for complex patterns
Comparison Framework: How to choose the right emulator for your problem
Integration with History Matching: Using emulators within the workflow

We'll test emulators on different synthetic models to understand their strengths and limitations.

In [1]:

Copied!





import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
import time

# Import the history matching library
import historymatching as hm
import logging; logging.getLogger("historymatching").propagate = False  # keep INFO logs out of the rendered docs

%matplotlib inline
np.random.seed(42)

print(" Emulator Showcase and Comparison")
print("Testing different emulator types on various synthetic problems")
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
import time

# Import the history matching library
import historymatching as hm
import logging; logging.getLogger("historymatching").propagate = False  # keep INFO logs out of the rendered docs

%matplotlib inline
np.random.seed(42)

print(" Emulator Showcase and Comparison")
print("Testing different emulator types on various synthetic problems")

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1782248332.049429    2982 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
I0000 00:00:1782248332.093158    2982 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1782248333.253748    2982 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.

 Emulator Showcase and Comparison
Testing different emulator types on various synthetic problems

/home/runner/work/historymatching/historymatching/.venv/lib/python3.13/site-packages/gpflow/versions.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources

Test Functions for Emulator Evaluation¶

We'll create several test functions with different characteristics to evaluate emulator performance.

In [2]:

Copied!





class TestFunctions:
    """Collection of test functions for emulator evaluation."""
    
    @staticmethod
    def linear_1d(x, noise_level=5.0):
        """Simple linear function: y = 3x + 2 + noise"""
        y = 3 * x + 2
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x.shape)
        return y
    
    @staticmethod
    def quadratic_1d(x, noise_level=2.0):
        """Quadratic function: y = 0.5x² - 2x + 1 + noise"""
        y = 0.5 * x**2 - 2 * x + 1
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x.shape)
        return y
    
    @staticmethod
    def sinusoidal_1d(x, noise_level=1.0):
        """Sinusoidal function: y = 10*sin(x) + 0.5x + noise"""
        y = 10 * np.sin(x) + 0.5 * x
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x.shape)
        return y
    
    @staticmethod
    def exponential_1d(x, noise_level=0.5):
        """Exponential growth: y = 2*exp(0.3x) + noise"""
        y = 2 * np.exp(0.3 * x)
        if noise_level > 0:
            y += np.random.normal(0, noise_level * y, x.shape)  # Proportional noise
        return y
    
    @staticmethod
    def multimodal_1d(x, noise_level=1.5):
        """Complex multimodal function with multiple peaks"""
        y = (np.sin(x) * np.cos(2*x) * np.exp(-0.1*x**2) * 10 + 
             np.sin(0.5*x) * 5 + 
             0.1 * x**2)
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x.shape)
        return y
    
    @staticmethod
    def linear_2d(x1, x2, noise_level=3.0):
        """2D linear function: y = 2x₁ + 3x₂ + 1 + noise"""
        y = 2 * x1 + 3 * x2 + 1
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x1.shape)
        return y
    
    @staticmethod
    def nonlinear_2d(x1, x2, noise_level=2.0):
        """2D nonlinear function: y = x₁²*x₂ + sin(x₁)*cos(x₂) + noise"""
        y = x1**2 * x2 + np.sin(x1) * np.cos(x2) * 5
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x1.shape)
        return y

# Visualize test functions
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

x_test = np.linspace(-5, 5, 100)

functions_1d = [
    ('Linear', TestFunctions.linear_1d),
    ('Quadratic', TestFunctions.quadratic_1d),
    ('Sinusoidal', TestFunctions.sinusoidal_1d),
    ('Exponential', TestFunctions.exponential_1d),
    ('Multimodal', TestFunctions.multimodal_1d)
]

for i, (name, func) in enumerate(functions_1d):
    ax = axes[i//3, i%3]
    
    # Clean function (no noise)
    y_clean = func(x_test, noise_level=0)
    ax.plot(x_test, y_clean, 'b-', linewidth=2, label='True function')
    
    # Noisy samples
    x_sample = np.linspace(-5, 5, 20)
    y_noisy = func(x_sample, noise_level=func.__defaults__[0] if func.__defaults__ else 1.0)
    ax.scatter(x_sample, y_noisy, color='red', alpha=0.7, s=30, label='Noisy samples')
    
    ax.set_title(f'{name} Function')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    ax.grid(True, alpha=0.3)

# 2D function visualization
ax = axes[1, 2]
x1_grid, x2_grid = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
y_grid = TestFunctions.nonlinear_2d(x1_grid, x2_grid, noise_level=0)
contour = ax.contourf(x1_grid, x2_grid, y_grid, levels=20, cmap='viridis')
ax.set_title('2D Nonlinear Function')
ax.set_xlabel('x₁')
ax.set_ylabel('x₂')
plt.colorbar(contour, ax=ax)

plt.tight_layout()
plt.show()

print(" Test functions created:")
for name, _ in functions_1d:
    print(f"  - {name}: Various complexity levels")
print(" - 2D Nonlinear: Multi-dimensional test case")
class TestFunctions:
    """Collection of test functions for emulator evaluation."""
    
    @staticmethod
    def linear_1d(x, noise_level=5.0):
        """Simple linear function: y = 3x + 2 + noise"""
        y = 3 * x + 2
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x.shape)
        return y
    
    @staticmethod
    def quadratic_1d(x, noise_level=2.0):
        """Quadratic function: y = 0.5x² - 2x + 1 + noise"""
        y = 0.5 * x**2 - 2 * x + 1
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x.shape)
        return y
    
    @staticmethod
    def sinusoidal_1d(x, noise_level=1.0):
        """Sinusoidal function: y = 10*sin(x) + 0.5x + noise"""
        y = 10 * np.sin(x) + 0.5 * x
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x.shape)
        return y
    
    @staticmethod
    def exponential_1d(x, noise_level=0.5):
        """Exponential growth: y = 2*exp(0.3x) + noise"""
        y = 2 * np.exp(0.3 * x)
        if noise_level > 0:
            y += np.random.normal(0, noise_level * y, x.shape)  # Proportional noise
        return y
    
    @staticmethod
    def multimodal_1d(x, noise_level=1.5):
        """Complex multimodal function with multiple peaks"""
        y = (np.sin(x) * np.cos(2*x) * np.exp(-0.1*x**2) * 10 + 
             np.sin(0.5*x) * 5 + 
             0.1 * x**2)
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x.shape)
        return y
    
    @staticmethod
    def linear_2d(x1, x2, noise_level=3.0):
        """2D linear function: y = 2x₁ + 3x₂ + 1 + noise"""
        y = 2 * x1 + 3 * x2 + 1
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x1.shape)
        return y
    
    @staticmethod
    def nonlinear_2d(x1, x2, noise_level=2.0):
        """2D nonlinear function: y = x₁²*x₂ + sin(x₁)*cos(x₂) + noise"""
        y = x1**2 * x2 + np.sin(x1) * np.cos(x2) * 5
        if noise_level > 0:
            y += np.random.normal(0, noise_level, x1.shape)
        return y

# Visualize test functions
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

x_test = np.linspace(-5, 5, 100)

functions_1d = [
    ('Linear', TestFunctions.linear_1d),
    ('Quadratic', TestFunctions.quadratic_1d),
    ('Sinusoidal', TestFunctions.sinusoidal_1d),
    ('Exponential', TestFunctions.exponential_1d),
    ('Multimodal', TestFunctions.multimodal_1d)
]

for i, (name, func) in enumerate(functions_1d):
    ax = axes[i//3, i%3]
    
    # Clean function (no noise)
    y_clean = func(x_test, noise_level=0)
    ax.plot(x_test, y_clean, 'b-', linewidth=2, label='True function')
    
    # Noisy samples
    x_sample = np.linspace(-5, 5, 20)
    y_noisy = func(x_sample, noise_level=func.__defaults__[0] if func.__defaults__ else 1.0)
    ax.scatter(x_sample, y_noisy, color='red', alpha=0.7, s=30, label='Noisy samples')
    
    ax.set_title(f'{name} Function')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    ax.grid(True, alpha=0.3)

# 2D function visualization
ax = axes[1, 2]
x1_grid, x2_grid = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
y_grid = TestFunctions.nonlinear_2d(x1_grid, x2_grid, noise_level=0)
contour = ax.contourf(x1_grid, x2_grid, y_grid, levels=20, cmap='viridis')
ax.set_title('2D Nonlinear Function')
ax.set_xlabel('x₁')
ax.set_ylabel('x₂')
plt.colorbar(contour, ax=ax)

plt.tight_layout()
plt.show()

print(" Test functions created:")
for name, _ in functions_1d:
    print(f"  - {name}: Various complexity levels")
print(" - 2D Nonlinear: Multi-dimensional test case")

No description has been provided for this image

 Test functions created:
  - Linear: Various complexity levels
  - Quadratic: Various complexity levels
  - Sinusoidal: Various complexity levels
  - Exponential: Various complexity levels
  - Multimodal: Various complexity levels
 - 2D Nonlinear: Multi-dimensional test case

Emulator Performance Framework¶

Let's create a framework to systematically test and compare emulator performance.

In [3]:

Copied!





def evaluate_emulator_performance(emulator_type, test_function, x_range, n_train=50, n_test=100, **emulator_kwargs):
    """
    Comprehensive emulator evaluation framework.
    
    Args:
        emulator_type: 'linear', 'gpr', or 'neural_network'
        test_function: Function to emulate
        x_range: (min, max) range for input values
        n_train: Number of training samples
        n_test: Number of test samples
        **emulator_kwargs: Additional emulator configuration
    
    Returns:
        Dictionary with performance metrics and results
    """
    
    # Generate training data
    x_train = np.random.uniform(x_range[0], x_range[1], n_train)
    y_train = test_function(x_train)
    
    # Generate test data
    x_test = np.linspace(x_range[0], x_range[1], n_test)
    y_test_true = test_function(x_test, noise_level=0)  # True function without noise
    
    # Prepare data for history matching format
    X_train = pd.DataFrame({'x': x_train})
    y_train_df = pd.DataFrame({'y': y_train})
    X_test = pd.DataFrame({'x': x_test})
    
    # Create emulator using EmulatorFactory
    factory = hm.EmulatorFactory(default_type=emulator_type, **emulator_kwargs)
    
    # Time the training
    start_time = time.time()
    
    emulators = factory.create_emulators_for_features(
        X_train, y_train_df, ['y']
    )
    emulator = emulators['y']
    
    training_time = time.time() - start_time
    
    # Time the prediction
    start_time = time.time()
    predictions = emulator.predict(X_test)
    prediction_time = time.time() - start_time
    
    y_pred = predictions.get_mean().values
    y_var = predictions.get_variance().values
    
    # Calculate performance metrics
    mse = mean_squared_error(y_test_true, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test_true, y_pred)
    
    # Mean absolute error
    mae = np.mean(np.abs(y_test_true - y_pred))
    
    # Coverage metrics (if uncertainty available)
    coverage_95 = None
    if np.any(y_var > 0):
        y_std = np.sqrt(y_var)
        lower_bound = y_pred - 1.96 * y_std
        upper_bound = y_pred + 1.96 * y_std
        coverage_95 = np.mean((y_test_true >= lower_bound) & (y_test_true <= upper_bound))
    
    return {
        'emulator_type': emulator_type,
        'emulator': emulator,
        'training_time': training_time,
        'prediction_time': prediction_time,
        'mse': mse,
        'rmse': rmse,
        'mae': mae,
        'r2': r2,
        'coverage_95': coverage_95,
        'x_train': x_train,
        'y_train': y_train,
        'x_test': x_test,
        'y_test_true': y_test_true,
        'y_pred': y_pred,
        'y_var': y_var
    }

def plot_emulator_comparison(results_list, title="Emulator Comparison"):
    """
    Plot comparison of multiple emulator results.
    """
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    colors = ['blue', 'red', 'green', 'purple', 'orange']
    
    # Predictions plot
    ax = axes[0, 0]
    for i, result in enumerate(results_list):
        color = colors[i % len(colors)]
        
        # Plot true function
        if i == 0:
            ax.plot(result['x_test'], result['y_test_true'], 'k-', linewidth=2, label='True function')
            ax.scatter(result['x_train'], result['y_train'], color='gray', alpha=0.6, s=20, label='Training data')
        
        # Plot predictions
        ax.plot(result['x_test'], result['y_pred'], color=color, linestyle='--', linewidth=2, 
               label=f"{result['emulator_type']} (R²={result['r2']:.3f})")
        
        # Plot uncertainty if available
        if np.any(result['y_var'] > 0):
            y_std = np.sqrt(result['y_var'])
            ax.fill_between(result['x_test'], 
                           result['y_pred'] - 1.96 * y_std,
                           result['y_pred'] + 1.96 * y_std,
                           color=color, alpha=0.2)
    
    ax.set_title('Predictions Comparison')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Performance metrics
    ax = axes[0, 1]
    metrics = ['rmse', 'mae', 'r2']
    metric_names = ['RMSE', 'MAE', 'R²']
    
    x_pos = np.arange(len(metrics))
    width = 0.8 / len(results_list)
    
    for i, result in enumerate(results_list):
        values = [result[metric] for metric in metrics]
        ax.bar(x_pos + i * width, values, width, label=result['emulator_type'], 
               color=colors[i % len(colors)], alpha=0.7)
    
    ax.set_xlabel('Metrics')
    ax.set_ylabel('Value')
    ax.set_title('Performance Metrics')
    ax.set_xticks(x_pos + width * (len(results_list) - 1) / 2)
    ax.set_xticklabels(metric_names)
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Timing comparison
    ax = axes[1, 0]
    emulator_names = [r['emulator_type'] for r in results_list]
    training_times = [r['training_time'] for r in results_list]
    prediction_times = [r['prediction_time'] for r in results_list]
    
    x_pos = np.arange(len(emulator_names))
    width = 0.35
    
    ax.bar(x_pos - width/2, training_times, width, label='Training time', alpha=0.7)
    ax.bar(x_pos + width/2, prediction_times, width, label='Prediction time', alpha=0.7)
    
    ax.set_xlabel('Emulator Type')
    ax.set_ylabel('Time (seconds)')
    ax.set_title('Training and Prediction Times')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(emulator_names)
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Residuals plot
    ax = axes[1, 1]
    for i, result in enumerate(results_list):
        residuals = result['y_test_true'] - result['y_pred']
        ax.scatter(result['y_pred'], residuals, alpha=0.6, s=20, 
                  color=colors[i % len(colors)], label=result['emulator_type'])
    
    ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
    ax.set_xlabel('Predicted values')
    ax.set_ylabel('Residuals')
    ax.set_title('Residuals vs Predicted')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.suptitle(title, fontsize=16)
    plt.tight_layout()
    plt.show()

print("Emulator evaluation framework ready!")
def evaluate_emulator_performance(emulator_type, test_function, x_range, n_train=50, n_test=100, **emulator_kwargs):
    """
    Comprehensive emulator evaluation framework.
    
    Args:
        emulator_type: 'linear', 'gpr', or 'neural_network'
        test_function: Function to emulate
        x_range: (min, max) range for input values
        n_train: Number of training samples
        n_test: Number of test samples
        **emulator_kwargs: Additional emulator configuration
    
    Returns:
        Dictionary with performance metrics and results
    """
    
    # Generate training data
    x_train = np.random.uniform(x_range[0], x_range[1], n_train)
    y_train = test_function(x_train)
    
    # Generate test data
    x_test = np.linspace(x_range[0], x_range[1], n_test)
    y_test_true = test_function(x_test, noise_level=0)  # True function without noise
    
    # Prepare data for history matching format
    X_train = pd.DataFrame({'x': x_train})
    y_train_df = pd.DataFrame({'y': y_train})
    X_test = pd.DataFrame({'x': x_test})
    
    # Create emulator using EmulatorFactory
    factory = hm.EmulatorFactory(default_type=emulator_type, **emulator_kwargs)
    
    # Time the training
    start_time = time.time()
    
    emulators = factory.create_emulators_for_features(
        X_train, y_train_df, ['y']
    )
    emulator = emulators['y']
    
    training_time = time.time() - start_time
    
    # Time the prediction
    start_time = time.time()
    predictions = emulator.predict(X_test)
    prediction_time = time.time() - start_time
    
    y_pred = predictions.get_mean().values
    y_var = predictions.get_variance().values
    
    # Calculate performance metrics
    mse = mean_squared_error(y_test_true, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test_true, y_pred)
    
    # Mean absolute error
    mae = np.mean(np.abs(y_test_true - y_pred))
    
    # Coverage metrics (if uncertainty available)
    coverage_95 = None
    if np.any(y_var > 0):
        y_std = np.sqrt(y_var)
        lower_bound = y_pred - 1.96 * y_std
        upper_bound = y_pred + 1.96 * y_std
        coverage_95 = np.mean((y_test_true >= lower_bound) & (y_test_true <= upper_bound))
    
    return {
        'emulator_type': emulator_type,
        'emulator': emulator,
        'training_time': training_time,
        'prediction_time': prediction_time,
        'mse': mse,
        'rmse': rmse,
        'mae': mae,
        'r2': r2,
        'coverage_95': coverage_95,
        'x_train': x_train,
        'y_train': y_train,
        'x_test': x_test,
        'y_test_true': y_test_true,
        'y_pred': y_pred,
        'y_var': y_var
    }

def plot_emulator_comparison(results_list, title="Emulator Comparison"):
    """
    Plot comparison of multiple emulator results.
    """
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    colors = ['blue', 'red', 'green', 'purple', 'orange']
    
    # Predictions plot
    ax = axes[0, 0]
    for i, result in enumerate(results_list):
        color = colors[i % len(colors)]
        
        # Plot true function
        if i == 0:
            ax.plot(result['x_test'], result['y_test_true'], 'k-', linewidth=2, label='True function')
            ax.scatter(result['x_train'], result['y_train'], color='gray', alpha=0.6, s=20, label='Training data')
        
        # Plot predictions
        ax.plot(result['x_test'], result['y_pred'], color=color, linestyle='--', linewidth=2, 
               label=f"{result['emulator_type']} (R²={result['r2']:.3f})")
        
        # Plot uncertainty if available
        if np.any(result['y_var'] > 0):
            y_std = np.sqrt(result['y_var'])
            ax.fill_between(result['x_test'], 
                           result['y_pred'] - 1.96 * y_std,
                           result['y_pred'] + 1.96 * y_std,
                           color=color, alpha=0.2)
    
    ax.set_title('Predictions Comparison')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Performance metrics
    ax = axes[0, 1]
    metrics = ['rmse', 'mae', 'r2']
    metric_names = ['RMSE', 'MAE', 'R²']
    
    x_pos = np.arange(len(metrics))
    width = 0.8 / len(results_list)
    
    for i, result in enumerate(results_list):
        values = [result[metric] for metric in metrics]
        ax.bar(x_pos + i * width, values, width, label=result['emulator_type'], 
               color=colors[i % len(colors)], alpha=0.7)
    
    ax.set_xlabel('Metrics')
    ax.set_ylabel('Value')
    ax.set_title('Performance Metrics')
    ax.set_xticks(x_pos + width * (len(results_list) - 1) / 2)
    ax.set_xticklabels(metric_names)
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Timing comparison
    ax = axes[1, 0]
    emulator_names = [r['emulator_type'] for r in results_list]
    training_times = [r['training_time'] for r in results_list]
    prediction_times = [r['prediction_time'] for r in results_list]
    
    x_pos = np.arange(len(emulator_names))
    width = 0.35
    
    ax.bar(x_pos - width/2, training_times, width, label='Training time', alpha=0.7)
    ax.bar(x_pos + width/2, prediction_times, width, label='Prediction time', alpha=0.7)
    
    ax.set_xlabel('Emulator Type')
    ax.set_ylabel('Time (seconds)')
    ax.set_title('Training and Prediction Times')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(emulator_names)
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Residuals plot
    ax = axes[1, 1]
    for i, result in enumerate(results_list):
        residuals = result['y_test_true'] - result['y_pred']
        ax.scatter(result['y_pred'], residuals, alpha=0.6, s=20, 
                  color=colors[i % len(colors)], label=result['emulator_type'])
    
    ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
    ax.set_xlabel('Predicted values')
    ax.set_ylabel('Residuals')
    ax.set_title('Residuals vs Predicted')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.suptitle(title, fontsize=16)
    plt.tight_layout()
    plt.show()

print("Emulator evaluation framework ready!")

Emulator evaluation framework ready!

Linear Function Test: Ideal Case for Linear Emulator¶

In [4]:

Copied!





print(" Test 1: Linear Function - Ideal for Linear Emulator")
print("=" * 50)

# Test all emulator types on linear function
linear_results = []

# Linear emulator
result_linear = evaluate_emulator_performance(
    'linear', 
    TestFunctions.linear_1d, 
    x_range=(-5, 5),
    n_train=30,
    n_test=100
)
linear_results.append(result_linear)

# GPR emulator
result_gpr = evaluate_emulator_performance(
    'gpr',
    TestFunctions.linear_1d,
    x_range=(-5, 5),
    n_train=30,
    n_test=100
)
linear_results.append(result_gpr)

# Bayes Linear emulator
result_bl = evaluate_emulator_performance(
    'bayes_linear',
    TestFunctions.linear_1d,
    x_range=(-5, 5),
    n_train=30,
    n_test=100
)
linear_results.append(result_bl)

# Neural network emulator (if available)
try:
    result_nn = evaluate_emulator_performance(
        'neural_network',
        TestFunctions.linear_1d,
        x_range=(-5, 5),
        n_train=30,
        n_test=100,
        hidden_layers=[10, 5],
        epochs=100
    )
    linear_results.append(result_nn)
except Exception as e:
    print(f"Neural network emulator not available: {e}")

# Plot comparison
plot_emulator_comparison(linear_results, "Linear Function Test")

# Print performance summary
print("\n Performance Summary (Linear Function):")
print(f"{'Emulator':<15} {'RMSE':<8} {'R²':<8} {'Train Time':<12} {'Coverage 95%':<12}")
print("-" * 65)

for result in linear_results:
    coverage_str = f"{result['coverage_95']:.3f}" if result['coverage_95'] is not None else "N/A"
    print(f"{result['emulator_type']:<15} {result['rmse']:<8.3f} {result['r2']:<8.3f} "
          f"{result['training_time']:<12.4f} {coverage_str:<12}")

print("\n Expected: Linear emulator should perform best with fastest training time")
print(" Test 1: Linear Function - Ideal for Linear Emulator")
print("=" * 50)

# Test all emulator types on linear function
linear_results = []

# Linear emulator
result_linear = evaluate_emulator_performance(
    'linear', 
    TestFunctions.linear_1d, 
    x_range=(-5, 5),
    n_train=30,
    n_test=100
)
linear_results.append(result_linear)

# GPR emulator
result_gpr = evaluate_emulator_performance(
    'gpr',
    TestFunctions.linear_1d,
    x_range=(-5, 5),
    n_train=30,
    n_test=100
)
linear_results.append(result_gpr)

# Bayes Linear emulator
result_bl = evaluate_emulator_performance(
    'bayes_linear',
    TestFunctions.linear_1d,
    x_range=(-5, 5),
    n_train=30,
    n_test=100
)
linear_results.append(result_bl)

# Neural network emulator (if available)
try:
    result_nn = evaluate_emulator_performance(
        'neural_network',
        TestFunctions.linear_1d,
        x_range=(-5, 5),
        n_train=30,
        n_test=100,
        hidden_layers=[10, 5],
        epochs=100
    )
    linear_results.append(result_nn)
except Exception as e:
    print(f"Neural network emulator not available: {e}")

# Plot comparison
plot_emulator_comparison(linear_results, "Linear Function Test")

# Print performance summary
print("\n Performance Summary (Linear Function):")
print(f"{'Emulator':<15} {'RMSE':<8} {'R²':<8} {'Train Time':<12} {'Coverage 95%':<12}")
print("-" * 65)

for result in linear_results:
    coverage_str = f"{result['coverage_95']:.3f}" if result['coverage_95'] is not None else "N/A"
    print(f"{result['emulator_type']:<15} {result['rmse']:<8.3f} {result['r2']:<8.3f} "
          f"{result['training_time']:<12.4f} {coverage_str:<12}")

print("\n Expected: Linear emulator should perform best with fastest training time")

 Test 1: Linear Function - Ideal for Linear Emulator
==================================================

E0000 00:00:1782248335.020519    2982 cuda_platform.cc:52] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)

Neural network emulator not available: Unknown default emulator type: 'neural_network'. Available types: ['linear', 'glm', 'gpr', 'gaussian', 'bayes_linear']

 Performance Summary (Linear Function):
Emulator        RMSE     R²       Train Time   Coverage 95%
-----------------------------------------------------------------
linear          1.733    0.961    0.0028       1.000       
gpr             2.016    0.947    1.3585       1.000       
bayes_linear    0.834    0.991    0.0038       1.000       

 Expected: Linear emulator should perform best with fastest training time

Nonlinear Function Tests: Where GPR Excels¶

In [5]:

Copied!





print(" Test 2: Quadratic Function - Moderate Nonlinearity")
print("=" * 50)

# Test on quadratic function
quadratic_results = []

emulator_types = ['linear', 'gpr', 'bayes_linear']
for emulator_type in emulator_types:
    try:
        result = evaluate_emulator_performance(
            emulator_type,
            TestFunctions.quadratic_1d,
            x_range=(-3, 3),
            n_train=40,
            n_test=100
        )
        quadratic_results.append(result)
    except Exception as e:
        print(f"Failed to evaluate {emulator_type}: {e}")

# Add neural network if available
try:
    result_nn = evaluate_emulator_performance(
        'neural_network',
        TestFunctions.quadratic_1d,
        x_range=(-3, 3),
        n_train=40,
        n_test=100,
        hidden_layers=[20, 10],
        epochs=150
    )
    quadratic_results.append(result_nn)
except:
    pass

plot_emulator_comparison(quadratic_results, "Quadratic Function Test")

# Performance summary
print("\n Performance Summary (Quadratic Function):")
print(f"{'Emulator':<15} {'RMSE':<8} {'R²':<8} {'Train Time':<12} {'Coverage 95%':<12}")
print("-" * 65)

for result in quadratic_results:
    coverage_str = f"{result['coverage_95']:.3f}" if result['coverage_95'] is not None else "N/A"
    print(f"{result['emulator_type']:<15} {result['rmse']:<8.3f} {result['r2']:<8.3f} "
          f"{result['training_time']:<12.4f} {coverage_str:<12}")

print("\n Expected: GPR should outperform linear emulator due to nonlinearity")
print(" Test 2: Quadratic Function - Moderate Nonlinearity")
print("=" * 50)

# Test on quadratic function
quadratic_results = []

emulator_types = ['linear', 'gpr', 'bayes_linear']
for emulator_type in emulator_types:
    try:
        result = evaluate_emulator_performance(
            emulator_type,
            TestFunctions.quadratic_1d,
            x_range=(-3, 3),
            n_train=40,
            n_test=100
        )
        quadratic_results.append(result)
    except Exception as e:
        print(f"Failed to evaluate {emulator_type}: {e}")

# Add neural network if available
try:
    result_nn = evaluate_emulator_performance(
        'neural_network',
        TestFunctions.quadratic_1d,
        x_range=(-3, 3),
        n_train=40,
        n_test=100,
        hidden_layers=[20, 10],
        epochs=150
    )
    quadratic_results.append(result_nn)
except:
    pass

plot_emulator_comparison(quadratic_results, "Quadratic Function Test")

# Performance summary
print("\n Performance Summary (Quadratic Function):")
print(f"{'Emulator':<15} {'RMSE':<8} {'R²':<8} {'Train Time':<12} {'Coverage 95%':<12}")
print("-" * 65)

for result in quadratic_results:
    coverage_str = f"{result['coverage_95']:.3f}" if result['coverage_95'] is not None else "N/A"
    print(f"{result['emulator_type']:<15} {result['rmse']:<8.3f} {result['r2']:<8.3f} "
          f"{result['training_time']:<12.4f} {coverage_str:<12}")

print("\n Expected: GPR should outperform linear emulator due to nonlinearity")

 Test 2: Quadratic Function - Moderate Nonlinearity
==================================================

 Performance Summary (Quadratic Function):
Emulator        RMSE     R²       Train Time   Coverage 95%
-----------------------------------------------------------------
linear          1.420    0.857    0.0019       0.530       
gpr             0.427    0.987    0.2714       1.000       
bayes_linear    1.414    0.858    0.0035       1.000       

 Expected: GPR should outperform linear emulator due to nonlinearity

In [6]:

Copied!





print(" Test 3: Sinusoidal Function - High Nonlinearity")
print("=" * 50)

# Test on sinusoidal function
sinusoidal_results = []

for emulator_type in ['linear', 'gpr', 'bayes_linear']:
    try:
        result = evaluate_emulator_performance(
            emulator_type,
            TestFunctions.sinusoidal_1d,
            x_range=(0, 10),
            n_train=50,
            n_test=100
        )
        sinusoidal_results.append(result)
    except Exception as e:
        print(f"Failed to evaluate {emulator_type}: {e}")

plot_emulator_comparison(sinusoidal_results, "Sinusoidal Function Test")

# Performance summary
print("\n Performance Summary (Sinusoidal Function):")
print(f"{'Emulator':<15} {'RMSE':<8} {'R²':<8} {'Train Time':<12} {'Coverage 95%':<12}")
print("-" * 65)

for result in sinusoidal_results:
    coverage_str = f"{result['coverage_95']:.3f}" if result['coverage_95'] is not None else "N/A"
    print(f"{result['emulator_type']:<15} {result['rmse']:<8.3f} {result['r2']:<8.3f} "
          f"{result['training_time']:<12.4f} {coverage_str:<12}")

print("\n Expected: GPR should significantly outperform linear emulator")
print(" Linear emulator will struggle with sinusoidal patterns")
print(" Test 3: Sinusoidal Function - High Nonlinearity")
print("=" * 50)

# Test on sinusoidal function
sinusoidal_results = []

for emulator_type in ['linear', 'gpr', 'bayes_linear']:
    try:
        result = evaluate_emulator_performance(
            emulator_type,
            TestFunctions.sinusoidal_1d,
            x_range=(0, 10),
            n_train=50,
            n_test=100
        )
        sinusoidal_results.append(result)
    except Exception as e:
        print(f"Failed to evaluate {emulator_type}: {e}")

plot_emulator_comparison(sinusoidal_results, "Sinusoidal Function Test")

# Performance summary
print("\n Performance Summary (Sinusoidal Function):")
print(f"{'Emulator':<15} {'RMSE':<8} {'R²':<8} {'Train Time':<12} {'Coverage 95%':<12}")
print("-" * 65)

for result in sinusoidal_results:
    coverage_str = f"{result['coverage_95']:.3f}" if result['coverage_95'] is not None else "N/A"
    print(f"{result['emulator_type']:<15} {result['rmse']:<8.3f} {result['r2']:<8.3f} "
          f"{result['training_time']:<12.4f} {coverage_str:<12}")

print("\n Expected: GPR should significantly outperform linear emulator")
print(" Linear emulator will struggle with sinusoidal patterns")

 Test 3: Sinusoidal Function - High Nonlinearity
==================================================

 Performance Summary (Sinusoidal Function):
Emulator        RMSE     R²       Train Time   Coverage 95%
-----------------------------------------------------------------
linear          6.706    -0.003   0.0019       0.320       
gpr             0.547    0.993    0.2782       1.000       
bayes_linear    0.535    0.994    0.0107       1.000       

 Expected: GPR should significantly outperform linear emulator
 Linear emulator will struggle with sinusoidal patterns

Complex Function Test: Pushing Emulator Limits¶

In [7]:

Copied!





print(" Test 4: Multimodal Function - Complex Nonlinearity")
print("=" * 50)

# Test on complex multimodal function
multimodal_results = []

for emulator_type in ['linear', 'gpr', 'bayes_linear']:
    try:
        result = evaluate_emulator_performance(
            emulator_type,
            TestFunctions.multimodal_1d,
            x_range=(-4, 4),
            n_train=80,  # More training data for complex function
            n_test=150
        )
        multimodal_results.append(result)
    except Exception as e:
        print(f"Failed to evaluate {emulator_type}: {e}")

plot_emulator_comparison(multimodal_results, "Multimodal Function Test")

# Performance summary
print("\n Performance Summary (Multimodal Function):")
print(f"{'Emulator':<15} {'RMSE':<8} {'R²':<8} {'Train Time':<12} {'Coverage 95%':<12}")
print("-" * 65)

for result in multimodal_results:
    coverage_str = f"{result['coverage_95']:.3f}" if result['coverage_95'] is not None else "N/A"
    print(f"{result['emulator_type']:<15} {result['rmse']:<8.3f} {result['r2']:<8.3f} "
          f"{result['training_time']:<12.4f} {coverage_str:<12}")

print("\n Expected: GPR should handle complexity better, but may need more training data")
print(" Linear emulator will perform poorly on this complex function")
print(" Test 4: Multimodal Function - Complex Nonlinearity")
print("=" * 50)

# Test on complex multimodal function
multimodal_results = []

for emulator_type in ['linear', 'gpr', 'bayes_linear']:
    try:
        result = evaluate_emulator_performance(
            emulator_type,
            TestFunctions.multimodal_1d,
            x_range=(-4, 4),
            n_train=80,  # More training data for complex function
            n_test=150
        )
        multimodal_results.append(result)
    except Exception as e:
        print(f"Failed to evaluate {emulator_type}: {e}")

plot_emulator_comparison(multimodal_results, "Multimodal Function Test")

# Performance summary
print("\n Performance Summary (Multimodal Function):")
print(f"{'Emulator':<15} {'RMSE':<8} {'R²':<8} {'Train Time':<12} {'Coverage 95%':<12}")
print("-" * 65)

for result in multimodal_results:
    coverage_str = f"{result['coverage_95']:.3f}" if result['coverage_95'] is not None else "N/A"
    print(f"{result['emulator_type']:<15} {result['rmse']:<8.3f} {result['r2']:<8.3f} "
          f"{result['training_time']:<12.4f} {coverage_str:<12}")

print("\n Expected: GPR should handle complexity better, but may need more training data")
print(" Linear emulator will perform poorly on this complex function")

 Test 4: Multimodal Function - Complex Nonlinearity
==================================================

 Performance Summary (Multimodal Function):
Emulator        RMSE     R²       Train Time   Coverage 95%
-----------------------------------------------------------------
linear          2.966    0.448    0.0019       0.340       
gpr             3.033    0.422    0.2766       1.000       
bayes_linear    2.978    0.443    0.0038       0.913       

 Expected: GPR should handle complexity better, but may need more training data
 Linear emulator will perform poorly on this complex function

Sample Size Sensitivity Analysis¶

In [8]:

Copied!





print(" Test 5: Sample Size Sensitivity Analysis")
print("=" * 50)

# Test how emulators perform with different training sample sizes
sample_sizes = [10, 20, 30, 50, 80, 120]
emulator_types = ['linear', 'gpr', 'bayes_linear']

sensitivity_results = {emulator_type: {'sizes': [], 'rmse': [], 'r2': [], 'times': []} 
                      for emulator_type in emulator_types}

test_function = TestFunctions.quadratic_1d # Use quadratic as test case

for n_train in sample_sizes:
    print(f"  Testing with {n_train} training samples...")
    
    for emulator_type in emulator_types:
        try:
            result = evaluate_emulator_performance(
                emulator_type,
                test_function,
                x_range=(-3, 3),
                n_train=n_train,
                n_test=100
            )
            
            sensitivity_results[emulator_type]['sizes'].append(n_train)
            sensitivity_results[emulator_type]['rmse'].append(result['rmse'])
            sensitivity_results[emulator_type]['r2'].append(result['r2'])
            sensitivity_results[emulator_type]['times'].append(result['training_time'])
            
        except Exception as e:
            print(f"    Failed {emulator_type} with {n_train} samples: {e}")

# Plot sensitivity analysis
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

colors = {'linear': 'blue', 'gpr': 'red', 'bayes_linear': 'green'}

# RMSE vs sample size
ax = axes[0]
for emulator_type in emulator_types:
    if sensitivity_results[emulator_type]['sizes']:
        ax.plot(sensitivity_results[emulator_type]['sizes'], 
               sensitivity_results[emulator_type]['rmse'],
               'o-', color=colors[emulator_type], linewidth=2, markersize=6,
               label=emulator_type)

ax.set_xlabel('Training Sample Size')
ax.set_ylabel('RMSE')
ax.set_title('RMSE vs Training Sample Size')
ax.legend()
ax.grid(True, alpha=0.3)

# R² vs sample size
ax = axes[1]
for emulator_type in emulator_types:
    if sensitivity_results[emulator_type]['sizes']:
        ax.plot(sensitivity_results[emulator_type]['sizes'], 
               sensitivity_results[emulator_type]['r2'],
               'o-', color=colors[emulator_type], linewidth=2, markersize=6,
               label=emulator_type)

ax.set_xlabel('Training Sample Size')
ax.set_ylabel('R²')
ax.set_title('R² vs Training Sample Size')
ax.legend()
ax.grid(True, alpha=0.3)

# Training time vs sample size
ax = axes[2]
for emulator_type in emulator_types:
    if sensitivity_results[emulator_type]['sizes']:
        ax.plot(sensitivity_results[emulator_type]['sizes'], 
               sensitivity_results[emulator_type]['times'],
               'o-', color=colors[emulator_type], linewidth=2, markersize=6,
               label=emulator_type)

ax.set_xlabel('Training Sample Size')
ax.set_ylabel('Training Time (seconds)')
ax.set_title('Training Time vs Sample Size')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n Sensitivity Analysis Complete!")
print(" Observations:")
print(" - Linear emulator: Consistent performance, fast training")
print(" - GPR emulator: Better accuracy but longer training time")
print(" - Both benefit from more training data, but with diminishing returns")
print(" - Bayes Linear: GPR-like accuracy without TensorFlow dependency")
print(" Test 5: Sample Size Sensitivity Analysis")
print("=" * 50)

# Test how emulators perform with different training sample sizes
sample_sizes = [10, 20, 30, 50, 80, 120]
emulator_types = ['linear', 'gpr', 'bayes_linear']

sensitivity_results = {emulator_type: {'sizes': [], 'rmse': [], 'r2': [], 'times': []} 
                      for emulator_type in emulator_types}

test_function = TestFunctions.quadratic_1d # Use quadratic as test case

for n_train in sample_sizes:
    print(f"  Testing with {n_train} training samples...")
    
    for emulator_type in emulator_types:
        try:
            result = evaluate_emulator_performance(
                emulator_type,
                test_function,
                x_range=(-3, 3),
                n_train=n_train,
                n_test=100
            )
            
            sensitivity_results[emulator_type]['sizes'].append(n_train)
            sensitivity_results[emulator_type]['rmse'].append(result['rmse'])
            sensitivity_results[emulator_type]['r2'].append(result['r2'])
            sensitivity_results[emulator_type]['times'].append(result['training_time'])
            
        except Exception as e:
            print(f"    Failed {emulator_type} with {n_train} samples: {e}")

# Plot sensitivity analysis
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

colors = {'linear': 'blue', 'gpr': 'red', 'bayes_linear': 'green'}

# RMSE vs sample size
ax = axes[0]
for emulator_type in emulator_types:
    if sensitivity_results[emulator_type]['sizes']:
        ax.plot(sensitivity_results[emulator_type]['sizes'], 
               sensitivity_results[emulator_type]['rmse'],
               'o-', color=colors[emulator_type], linewidth=2, markersize=6,
               label=emulator_type)

ax.set_xlabel('Training Sample Size')
ax.set_ylabel('RMSE')
ax.set_title('RMSE vs Training Sample Size')
ax.legend()
ax.grid(True, alpha=0.3)

# R² vs sample size
ax = axes[1]
for emulator_type in emulator_types:
    if sensitivity_results[emulator_type]['sizes']:
        ax.plot(sensitivity_results[emulator_type]['sizes'], 
               sensitivity_results[emulator_type]['r2'],
               'o-', color=colors[emulator_type], linewidth=2, markersize=6,
               label=emulator_type)

ax.set_xlabel('Training Sample Size')
ax.set_ylabel('R²')
ax.set_title('R² vs Training Sample Size')
ax.legend()
ax.grid(True, alpha=0.3)

# Training time vs sample size
ax = axes[2]
for emulator_type in emulator_types:
    if sensitivity_results[emulator_type]['sizes']:
        ax.plot(sensitivity_results[emulator_type]['sizes'], 
               sensitivity_results[emulator_type]['times'],
               'o-', color=colors[emulator_type], linewidth=2, markersize=6,
               label=emulator_type)

ax.set_xlabel('Training Sample Size')
ax.set_ylabel('Training Time (seconds)')
ax.set_title('Training Time vs Sample Size')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n Sensitivity Analysis Complete!")
print(" Observations:")
print(" - Linear emulator: Consistent performance, fast training")
print(" - GPR emulator: Better accuracy but longer training time")
print(" - Both benefit from more training data, but with diminishing returns")
print(" - Bayes Linear: GPR-like accuracy without TensorFlow dependency")

 Test 5: Sample Size Sensitivity Analysis
==================================================
  Testing with 10 training samples...

  Testing with 20 training samples...

  Testing with 30 training samples...

  Testing with 50 training samples...

  Testing with 80 training samples...

  Testing with 120 training samples...

 Sensitivity Analysis Complete!
 Observations:
 - Linear emulator: Consistent performance, fast training
 - GPR emulator: Better accuracy but longer training time
 - Both benefit from more training data, but with diminishing returns
 - Bayes Linear: GPR-like accuracy without TensorFlow dependency

Integration with History Matching Workflow¶

Let's demonstrate how different emulator choices affect history matching performance.

In [9]:

Copied!





print("Test 6: Emulator Choice Impact on History Matching")
print("=" * 50)

# Create a simple 2D test case for history matching
def synthetic_model(samples: pd.DataFrame) -> pd.DataFrame:
    """Synthetic model with nonlinear behavior."""
    results = []
    for _, row in samples.iterrows():
        p1, p2 = row['param1'], row['param2']
        output1 = p1**2 + p2 + np.random.normal(0, 0.1)
        output2 = np.sin(p1) * p2 + np.random.normal(0, 0.05)
        output3 = p1 * p2 + 0.5 * p1**2 + np.random.normal(0, 0.1)
        results.append({'output1': output1, 'output2': output2, 'output3': output3})
    return pd.DataFrame(results)

# Generate "observed" data
true_params = pd.DataFrame({'param1': [1.5], 'param2': [2.0]})
true_outputs = synthetic_model(true_params)

observations = {
    'output1': (true_outputs['output1'].iloc[0], 0.2),
    'output2': (true_outputs['output2'].iloc[0], 0.1),
    'output3': (true_outputs['output3'].iloc[0], 0.2),
}
parameter_bounds = {'param1': (0.5, 3.0), 'param2': (1.0, 4.0)}

print(f"True parameters: param1={true_params['param1'].iloc[0]}, param2={true_params['param2'].iloc[0]}")
print(f"Observations: {observations}")

def run_hm_with_emulator(emulator_type, max_iterations=3):
    """Run history matching with specified emulator type."""
    print(f"\nRunning HM with {emulator_type} emulator...")

    try:
        engine = hm.HistoryMatching(
            bounds=parameter_bounds,
            observations=observations,
            function=synthetic_model,
            sampling_strategy='lhs',
            emulator_type=emulator_type,
            n_samples=300,
            max_iterations=max_iterations,
            random_seed=42,
        )

        start_time = time.time()
        results = engine.run()
        end_time = time.time()

        final_samples = engine.get_nroy_samples()
        param1_error = abs(final_samples['param1'].median() - true_params['param1'].iloc[0])
        param2_error = abs(final_samples['param2'].median() - true_params['param2'].iloc[0])

        return {
            'emulator_type':   emulator_type,
            'total_time':      end_time - start_time,
            'iterations':      len(results),
            'final_samples':   len(final_samples),
            'param1_error':    param1_error,
            'param2_error':    param2_error,
            'acceptance_rate': engine.acceptance_rate,
            'samples':         final_samples,
        }

    except Exception as e:
        print(f"  Failed with {emulator_type}: {e}")
        return None

# Test different emulator types
hm_results = []
for emulator_type in ['linear', 'bayes_linear', 'gpr']:
    result = run_hm_with_emulator(emulator_type)
    if result:
        hm_results.append(result)
print("Test 6: Emulator Choice Impact on History Matching")
print("=" * 50)

# Create a simple 2D test case for history matching
def synthetic_model(samples: pd.DataFrame) -> pd.DataFrame:
    """Synthetic model with nonlinear behavior."""
    results = []
    for _, row in samples.iterrows():
        p1, p2 = row['param1'], row['param2']
        output1 = p1**2 + p2 + np.random.normal(0, 0.1)
        output2 = np.sin(p1) * p2 + np.random.normal(0, 0.05)
        output3 = p1 * p2 + 0.5 * p1**2 + np.random.normal(0, 0.1)
        results.append({'output1': output1, 'output2': output2, 'output3': output3})
    return pd.DataFrame(results)

# Generate "observed" data
true_params = pd.DataFrame({'param1': [1.5], 'param2': [2.0]})
true_outputs = synthetic_model(true_params)

observations = {
    'output1': (true_outputs['output1'].iloc[0], 0.2),
    'output2': (true_outputs['output2'].iloc[0], 0.1),
    'output3': (true_outputs['output3'].iloc[0], 0.2),
}
parameter_bounds = {'param1': (0.5, 3.0), 'param2': (1.0, 4.0)}

print(f"True parameters: param1={true_params['param1'].iloc[0]}, param2={true_params['param2'].iloc[0]}")
print(f"Observations: {observations}")

def run_hm_with_emulator(emulator_type, max_iterations=3):
    """Run history matching with specified emulator type."""
    print(f"\nRunning HM with {emulator_type} emulator...")

    try:
        engine = hm.HistoryMatching(
            bounds=parameter_bounds,
            observations=observations,
            function=synthetic_model,
            sampling_strategy='lhs',
            emulator_type=emulator_type,
            n_samples=300,
            max_iterations=max_iterations,
            random_seed=42,
        )

        start_time = time.time()
        results = engine.run()
        end_time = time.time()

        final_samples = engine.get_nroy_samples()
        param1_error = abs(final_samples['param1'].median() - true_params['param1'].iloc[0])
        param2_error = abs(final_samples['param2'].median() - true_params['param2'].iloc[0])

        return {
            'emulator_type':   emulator_type,
            'total_time':      end_time - start_time,
            'iterations':      len(results),
            'final_samples':   len(final_samples),
            'param1_error':    param1_error,
            'param2_error':    param2_error,
            'acceptance_rate': engine.acceptance_rate,
            'samples':         final_samples,
        }

    except Exception as e:
        print(f"  Failed with {emulator_type}: {e}")
        return None

# Test different emulator types
hm_results = []
for emulator_type in ['linear', 'bayes_linear', 'gpr']:
    result = run_hm_with_emulator(emulator_type)
    if result:
        hm_results.append(result)

Test 6: Emulator Choice Impact on History Matching
==================================================
True parameters: param1=1.5, param2=2.0
Observations: {'output1': (4.123550159304504, 0.2), 'output2': (1.90533465058138, 0.1), 'output3': (4.086718281038703, 0.2)}

Running HM with linear emulator...

Running HM with bayes_linear emulator...

Running HM with gpr emulator...

In [10]:

Copied!





# Analyze history matching results
if hm_results:
    print("\n History Matching Results Comparison:")
    print(f"{'Emulator':<15} {'Time(s)':<8} {'Iterations':<12} {'Final Samples':<14} {'Param1 Error':<13} {'Param2 Error':<13} {'Accept Rate':<12}")
    print("-" * 100)
    
    for result in hm_results:
        print(f"{result['emulator_type']:<15} {result['total_time']:<8.2f} {result['iterations']:<12} "
              f"{result['final_samples']:<14} {result['param1_error']:<13.3f} {result['param2_error']:<13.3f} "
              f"{result['acceptance_rate']:<12.3f}")
    
    # Visualize parameter recovery
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    colors = ['blue', 'red', 'green']
    
    # Parameter space comparison
    ax = axes[0]
    for i, result in enumerate(hm_results):
        samples = result['samples']
        ax.scatter(samples['param1'], samples['param2'], 
                  alpha=0.6, s=20, color=colors[i % len(colors)], 
                  label=f"{result['emulator_type']} ({len(samples)} samples)")
    
    # Mark true parameters
    ax.scatter(true_params['param1'].iloc[0], true_params['param2'].iloc[0], 
              color='red', s=200, marker='*', edgecolor='black', linewidth=2, 
              label='True parameters', zorder=5)
    
    ax.set_xlabel('param1')
    ax.set_ylabel('param2')
    ax.set_title('Final Parameter Distributions')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Error comparison
    ax = axes[1]
    emulator_names = [r['emulator_type'] for r in hm_results]
    param1_errors = [r['param1_error'] for r in hm_results]
    param2_errors = [r['param2_error'] for r in hm_results]
    
    x_pos = np.arange(len(emulator_names))
    width = 0.35
    
    ax.bar(x_pos - width/2, param1_errors, width, label='param1 error', alpha=0.7)
    ax.bar(x_pos + width/2, param2_errors, width, label='param2 error', alpha=0.7)
    
    ax.set_xlabel('Emulator Type')
    ax.set_ylabel('Absolute Error')
    ax.set_title('Parameter Recovery Error')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(emulator_names)
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("\n History Matching Insights:")
    print("  - Emulator choice significantly affects parameter recovery accuracy")
    print("  - GPR typically provides better uncertainty quantification")
    print("  - Linear emulators are faster but may miss nonlinear relationships")
    print("  - Consider the complexity of your model when choosing emulators")

else:
    print("❌ No history matching results available")
# Analyze history matching results
if hm_results:
    print("\n History Matching Results Comparison:")
    print(f"{'Emulator':<15} {'Time(s)':<8} {'Iterations':<12} {'Final Samples':<14} {'Param1 Error':<13} {'Param2 Error':<13} {'Accept Rate':<12}")
    print("-" * 100)
    
    for result in hm_results:
        print(f"{result['emulator_type']:<15} {result['total_time']:<8.2f} {result['iterations']:<12} "
              f"{result['final_samples']:<14} {result['param1_error']:<13.3f} {result['param2_error']:<13.3f} "
              f"{result['acceptance_rate']:<12.3f}")
    
    # Visualize parameter recovery
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    colors = ['blue', 'red', 'green']
    
    # Parameter space comparison
    ax = axes[0]
    for i, result in enumerate(hm_results):
        samples = result['samples']
        ax.scatter(samples['param1'], samples['param2'], 
                  alpha=0.6, s=20, color=colors[i % len(colors)], 
                  label=f"{result['emulator_type']} ({len(samples)} samples)")
    
    # Mark true parameters
    ax.scatter(true_params['param1'].iloc[0], true_params['param2'].iloc[0], 
              color='red', s=200, marker='*', edgecolor='black', linewidth=2, 
              label='True parameters', zorder=5)
    
    ax.set_xlabel('param1')
    ax.set_ylabel('param2')
    ax.set_title('Final Parameter Distributions')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Error comparison
    ax = axes[1]
    emulator_names = [r['emulator_type'] for r in hm_results]
    param1_errors = [r['param1_error'] for r in hm_results]
    param2_errors = [r['param2_error'] for r in hm_results]
    
    x_pos = np.arange(len(emulator_names))
    width = 0.35
    
    ax.bar(x_pos - width/2, param1_errors, width, label='param1 error', alpha=0.7)
    ax.bar(x_pos + width/2, param2_errors, width, label='param2 error', alpha=0.7)
    
    ax.set_xlabel('Emulator Type')
    ax.set_ylabel('Absolute Error')
    ax.set_title('Parameter Recovery Error')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(emulator_names)
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("\n History Matching Insights:")
    print("  - Emulator choice significantly affects parameter recovery accuracy")
    print("  - GPR typically provides better uncertainty quantification")
    print("  - Linear emulators are faster but may miss nonlinear relationships")
    print("  - Consider the complexity of your model when choosing emulators")

else:
    print("❌ No history matching results available")

 History Matching Results Comparison:
Emulator        Time(s)  Iterations   Final Samples  Param1 Error  Param2 Error  Accept Rate 
----------------------------------------------------------------------------------------------------
linear          8.39     3            300            0.202         0.511         0.225       
bayes_linear    9.40     3            300            0.007         0.047         0.051       
gpr             11.77    3            300            0.011         0.069         0.045

 History Matching Insights:
  - Emulator choice significantly affects parameter recovery accuracy
  - GPR typically provides better uncertainty quantification
  - Linear emulators are faster but may miss nonlinear relationships
  - Consider the complexity of your model when choosing emulators

Emulator Selection Guidelines¶

In [11]:

Copied!





# Create comprehensive comparison summary
print(" Emulator Selection Guidelines")
print("=" * 50)

guidelines = {
    'Linear Emulator': {
        'Best for': [
            'Linear or near-linear relationships',
            'Fast prototyping and testing',
            'Large-scale problems requiring speed',
            'Well-understood linear models'
        ],
        'Advantages': [
            'Very fast training and prediction',
            'Interpretable coefficients',
            'Minimal memory requirements',
            'Stable and robust'
        ],
        'Limitations': [
            'Cannot capture nonlinear patterns',
            'Poor uncertainty quantification',
            'May underfit complex relationships',
            'Limited expressiveness'
        ],
        'Typical use cases': [
            'Linear regression models',
            'Simple physical relationships',
            'Baseline comparisons',
            'High-dimensional linear problems'
        ]
    },
    
        'Bayes Linear Emulator': {
        'Best for': [
            'Nonlinear relationships with moderate complexity',
            'Problems where TensorFlow/GPflow is not available',
            'Lightweight deployments needing GP-like accuracy',
            'History matching with pure numpy/scipy stack'
        ],
        'Advantages': [
            'Good uncertainty quantification (like GPR)',
            'No TensorFlow dependency — pure numpy/scipy',
            'Faster training than GPR for moderate datasets',
            'Squared-exponential kernel with ARD'
        ],
        'Limitations': [
            'O(n^3) training like GPR (Cholesky factorization)',
            'Less flexible than full GP (fixed OLS trend)',
            'No marginal likelihood for joint hyperparameter tuning',
            'Sequential hyperparameter estimation'
        ],
        'Typical use cases': [
            'History matching workflows',
            'Environments without GPU support',
            'Rapid prototyping before committing to GPR',
            'Scientific computing on minimal dependencies'
        ]
    },
    
    'Gaussian Process Regression (GPR)': {
        'Best for': [
            'Nonlinear relationships',
            'Small to medium datasets (< 10,000 samples)',
            'Problems requiring uncertainty quantification',
            'Smooth, continuous functions'
        ],
        'Advantages': [
            'Excellent uncertainty quantification',
            'Handles nonlinearity well',
            'Principled Bayesian framework',
            'Good interpolation properties'
        ],
        'Limitations': [
            'Slow training on large datasets (O(n³))',
            'Memory intensive',
            'Requires hyperparameter tuning',
            'Can be sensitive to kernel choice'
        ],
        'Typical use cases': [
            'Scientific modeling',
            'Expensive computer experiments',
            'Optimization problems',
            'Medical/biological models'
        ]
    },
    
    'Neural Network Emulator': {
        'Best for': [
            'Highly complex, nonlinear relationships',
            'Large datasets',
            'High-dimensional input spaces',
            'Pattern recognition tasks'
        ],
        'Advantages': [
            'Can learn complex patterns',
            'Scales well with data size',
            'Flexible architecture',
            'Good for high-dimensional problems'
        ],
        'Limitations': [
            'Requires large amounts of training data',
            'Black box (limited interpretability)',
            'Prone to overfitting',
            'Requires careful hyperparameter tuning'
        ],
        'Typical use cases': [
            'Image recognition in simulations',
            'Complex dynamical systems',
            'High-dimensional parameter spaces',
            'Deep learning applications'
        ]
    }
}

for emulator_name, info in guidelines.items():
    print(f"\n {emulator_name}")
    print("-" * (len(emulator_name) + 5))
    
    for category, items in info.items():
        print(f"\n{category}:")
        for item in items:
            print(f"  • {item}")

print("\n\n Decision Framework:")
print("-" * 20)

decision_tree = [
    "1. **Data Size Assessment**:",
    "   • Small (< 100 samples): Bayes Linear or GPR",
    "   • Medium (100-1000 samples): Bayes Linear or GPR",
    "   • Large (> 1000 samples): Bayes Linear (GPR training gets slow)",
    "",
    "2. **Relationship Complexity**:",
    "   • Linear/Simple: Linear Emulator",
    "   • Moderately Nonlinear: Bayes Linear or GPR",
    "   • Highly Complex: GPR",
    "",
    "3. **Uncertainty Quantification Needs**:",
    "   • Critical: GPR or Bayes Linear (both provide good UQ)",
    "   • Moderate: Bayes Linear",
    "   • Not important: Any emulator type",
    "",
    "4. **Computational Constraints**:",
    "   • Speed critical: Linear Emulator",
    "   • No TensorFlow: Bayes Linear (pure numpy/scipy)",
    "   • Balanced: Bayes Linear",
    "   • GPU available: GPR",
    "",
    "5. **Domain Knowledge**:",
    "   • Known linear relationships: Linear Emulator",
    "   • Physical/scientific models: GPR",
    "   • Black-box complex systems: Neural Network"
]

for line in decision_tree:
    print(line)

print("\n\n Quick Start Recommendations:")
print("-" * 30)
print("• **Starting out**: Use Bayes Linear - GPR-like quality, no TensorFlow dependency")
print("• **Production workflows**: Bayes Linear for most cases, GPR if you need maximum flexibility")
print("• **Research applications**: Bayes Linear or GPR for uncertainty quantification")
print("• **Large-scale studies**: Linear for speed, Neural Network for complexity")
print("• **When in doubt**: Run comparison like in this notebook!")
# Create comprehensive comparison summary
print(" Emulator Selection Guidelines")
print("=" * 50)

guidelines = {
    'Linear Emulator': {
        'Best for': [
            'Linear or near-linear relationships',
            'Fast prototyping and testing',
            'Large-scale problems requiring speed',
            'Well-understood linear models'
        ],
        'Advantages': [
            'Very fast training and prediction',
            'Interpretable coefficients',
            'Minimal memory requirements',
            'Stable and robust'
        ],
        'Limitations': [
            'Cannot capture nonlinear patterns',
            'Poor uncertainty quantification',
            'May underfit complex relationships',
            'Limited expressiveness'
        ],
        'Typical use cases': [
            'Linear regression models',
            'Simple physical relationships',
            'Baseline comparisons',
            'High-dimensional linear problems'
        ]
    },
    
        'Bayes Linear Emulator': {
        'Best for': [
            'Nonlinear relationships with moderate complexity',
            'Problems where TensorFlow/GPflow is not available',
            'Lightweight deployments needing GP-like accuracy',
            'History matching with pure numpy/scipy stack'
        ],
        'Advantages': [
            'Good uncertainty quantification (like GPR)',
            'No TensorFlow dependency — pure numpy/scipy',
            'Faster training than GPR for moderate datasets',
            'Squared-exponential kernel with ARD'
        ],
        'Limitations': [
            'O(n^3) training like GPR (Cholesky factorization)',
            'Less flexible than full GP (fixed OLS trend)',
            'No marginal likelihood for joint hyperparameter tuning',
            'Sequential hyperparameter estimation'
        ],
        'Typical use cases': [
            'History matching workflows',
            'Environments without GPU support',
            'Rapid prototyping before committing to GPR',
            'Scientific computing on minimal dependencies'
        ]
    },
    
    'Gaussian Process Regression (GPR)': {
        'Best for': [
            'Nonlinear relationships',
            'Small to medium datasets (< 10,000 samples)',
            'Problems requiring uncertainty quantification',
            'Smooth, continuous functions'
        ],
        'Advantages': [
            'Excellent uncertainty quantification',
            'Handles nonlinearity well',
            'Principled Bayesian framework',
            'Good interpolation properties'
        ],
        'Limitations': [
            'Slow training on large datasets (O(n³))',
            'Memory intensive',
            'Requires hyperparameter tuning',
            'Can be sensitive to kernel choice'
        ],
        'Typical use cases': [
            'Scientific modeling',
            'Expensive computer experiments',
            'Optimization problems',
            'Medical/biological models'
        ]
    },
    
    'Neural Network Emulator': {
        'Best for': [
            'Highly complex, nonlinear relationships',
            'Large datasets',
            'High-dimensional input spaces',
            'Pattern recognition tasks'
        ],
        'Advantages': [
            'Can learn complex patterns',
            'Scales well with data size',
            'Flexible architecture',
            'Good for high-dimensional problems'
        ],
        'Limitations': [
            'Requires large amounts of training data',
            'Black box (limited interpretability)',
            'Prone to overfitting',
            'Requires careful hyperparameter tuning'
        ],
        'Typical use cases': [
            'Image recognition in simulations',
            'Complex dynamical systems',
            'High-dimensional parameter spaces',
            'Deep learning applications'
        ]
    }
}

for emulator_name, info in guidelines.items():
    print(f"\n {emulator_name}")
    print("-" * (len(emulator_name) + 5))
    
    for category, items in info.items():
        print(f"\n{category}:")
        for item in items:
            print(f"  • {item}")

print("\n\n Decision Framework:")
print("-" * 20)

decision_tree = [
    "1. **Data Size Assessment**:",
    "   • Small (< 100 samples): Bayes Linear or GPR",
    "   • Medium (100-1000 samples): Bayes Linear or GPR",
    "   • Large (> 1000 samples): Bayes Linear (GPR training gets slow)",
    "",
    "2. **Relationship Complexity**:",
    "   • Linear/Simple: Linear Emulator",
    "   • Moderately Nonlinear: Bayes Linear or GPR",
    "   • Highly Complex: GPR",
    "",
    "3. **Uncertainty Quantification Needs**:",
    "   • Critical: GPR or Bayes Linear (both provide good UQ)",
    "   • Moderate: Bayes Linear",
    "   • Not important: Any emulator type",
    "",
    "4. **Computational Constraints**:",
    "   • Speed critical: Linear Emulator",
    "   • No TensorFlow: Bayes Linear (pure numpy/scipy)",
    "   • Balanced: Bayes Linear",
    "   • GPU available: GPR",
    "",
    "5. **Domain Knowledge**:",
    "   • Known linear relationships: Linear Emulator",
    "   • Physical/scientific models: GPR",
    "   • Black-box complex systems: Neural Network"
]

for line in decision_tree:
    print(line)

print("\n\n Quick Start Recommendations:")
print("-" * 30)
print("• **Starting out**: Use Bayes Linear - GPR-like quality, no TensorFlow dependency")
print("• **Production workflows**: Bayes Linear for most cases, GPR if you need maximum flexibility")
print("• **Research applications**: Bayes Linear or GPR for uncertainty quantification")
print("• **Large-scale studies**: Linear for speed, Neural Network for complexity")
print("• **When in doubt**: Run comparison like in this notebook!")

 Emulator Selection Guidelines
==================================================

 Linear Emulator
--------------------

Best for:
  • Linear or near-linear relationships
  • Fast prototyping and testing
  • Large-scale problems requiring speed
  • Well-understood linear models

Advantages:
  • Very fast training and prediction
  • Interpretable coefficients
  • Minimal memory requirements
  • Stable and robust

Limitations:
  • Cannot capture nonlinear patterns
  • Poor uncertainty quantification
  • May underfit complex relationships
  • Limited expressiveness

Typical use cases:
  • Linear regression models
  • Simple physical relationships
  • Baseline comparisons
  • High-dimensional linear problems

 Bayes Linear Emulator
--------------------------

Best for:
  • Nonlinear relationships with moderate complexity
  • Problems where TensorFlow/GPflow is not available
  • Lightweight deployments needing GP-like accuracy
  • History matching with pure numpy/scipy stack

Advantages:
  • Good uncertainty quantification (like GPR)
  • No TensorFlow dependency — pure numpy/scipy
  • Faster training than GPR for moderate datasets
  • Squared-exponential kernel with ARD

Limitations:
  • O(n^3) training like GPR (Cholesky factorization)
  • Less flexible than full GP (fixed OLS trend)
  • No marginal likelihood for joint hyperparameter tuning
  • Sequential hyperparameter estimation

Typical use cases:
  • History matching workflows
  • Environments without GPU support
  • Rapid prototyping before committing to GPR
  • Scientific computing on minimal dependencies

 Gaussian Process Regression (GPR)
--------------------------------------

Best for:
  • Nonlinear relationships
  • Small to medium datasets (< 10,000 samples)
  • Problems requiring uncertainty quantification
  • Smooth, continuous functions

Advantages:
  • Excellent uncertainty quantification
  • Handles nonlinearity well
  • Principled Bayesian framework
  • Good interpolation properties

Limitations:
  • Slow training on large datasets (O(n³))
  • Memory intensive
  • Requires hyperparameter tuning
  • Can be sensitive to kernel choice

Typical use cases:
  • Scientific modeling
  • Expensive computer experiments
  • Optimization problems
  • Medical/biological models

 Neural Network Emulator
----------------------------

Best for:
  • Highly complex, nonlinear relationships
  • Large datasets
  • High-dimensional input spaces
  • Pattern recognition tasks

Advantages:
  • Can learn complex patterns
  • Scales well with data size
  • Flexible architecture
  • Good for high-dimensional problems

Limitations:
  • Requires large amounts of training data
  • Black box (limited interpretability)
  • Prone to overfitting
  • Requires careful hyperparameter tuning

Typical use cases:
  • Image recognition in simulations
  • Complex dynamical systems
  • High-dimensional parameter spaces
  • Deep learning applications


 Decision Framework:
--------------------
1. **Data Size Assessment**:
   • Small (< 100 samples): Bayes Linear or GPR
   • Medium (100-1000 samples): Bayes Linear or GPR
   • Large (> 1000 samples): Bayes Linear (GPR training gets slow)

2. **Relationship Complexity**:
   • Linear/Simple: Linear Emulator
   • Moderately Nonlinear: Bayes Linear or GPR
   • Highly Complex: GPR

3. **Uncertainty Quantification Needs**:
   • Critical: GPR or Bayes Linear (both provide good UQ)
   • Moderate: Bayes Linear
   • Not important: Any emulator type

4. **Computational Constraints**:
   • Speed critical: Linear Emulator
   • No TensorFlow: Bayes Linear (pure numpy/scipy)
   • Balanced: Bayes Linear
   • GPU available: GPR

5. **Domain Knowledge**:
   • Known linear relationships: Linear Emulator
   • Physical/scientific models: GPR
   • Black-box complex systems: Neural Network


 Quick Start Recommendations:
------------------------------
• **Starting out**: Use Bayes Linear - GPR-like quality, no TensorFlow dependency
• **Production workflows**: Bayes Linear for most cases, GPR if you need maximum flexibility
• **Research applications**: Bayes Linear or GPR for uncertainty quantification
• **Large-scale studies**: Linear for speed, Neural Network for complexity
• **When in doubt**: Run comparison like in this notebook!

Summary¶

This notebook demonstrated the different emulator types available in the history matching library and provided practical guidance for choosing the right emulator for your specific problem.

Key Findings¶

Linear Emulators: Excel at linear relationships, very fast, but limited for complex patterns
Bayes Linear: GP-like accuracy and uncertainty without TensorFlow — pure numpy/scipy
Gaussian Process Regression: Best balance of flexibility and uncertainty quantification
Neural Networks: Handle complex patterns but require more data and tuning

Performance Insights¶

Training Time: Linear << GPR < Neural Network
Prediction Speed: Linear ≈ GPR < Neural Network
Accuracy on Linear Data: Linear ≈ GPR > Neural Network
Accuracy on Nonlinear Data: Neural Network ≈ GPR >> Linear
Uncertainty Quantification: GPR >> Neural Network > Linear

Practical Recommendations¶

Start with Bayes Linear or GPR for most applications - Bayes Linear if you want to avoid TensorFlow
Use Linear emulators when speed is critical or relationships are known to be linear
Consider Neural Networks for highly complex, high-dimensional problems with lots of data
Always validate emulator performance before using in production workflows
Use the comparison framework in this notebook to test on your specific problem

Integration with History Matching¶

The choice of emulator significantly affects:

Parameter recovery accuracy
Computational efficiency
Uncertainty quantification quality
Workflow reliability

Choose your emulator based on your specific needs, data characteristics, and computational constraints. When in doubt, test multiple options using the framework provided in this notebook!