Calibration#

The phylomodels.calibration module contains wrappers for calibration functions and libraries, as well as any code required for their use. The goals of this module are to provide a standard interface for calibration jobs and to maintain a battery of calibration methods that are well suited to our needs.

There are two types of tasks that can be done using functions in this module:

calibration initialization, which generates one or more initial points for faster convergence of calibration algorithms; and
calibration, which executes an actual calibration job.

Calibration algorithms can use the output of an initialization algorithm, but they are not required to.

File structure#

This directory is organized as a collection of wrappers for calibration methods and calibration initializers. The wrappers are Python functions. The name of each function should match the name of the file that contains it. Additional functions (and files) that are necessary for the execution of a calibration algorithm or initializer must be placed in a separate subdirectory. Optionally, there can be a test script for each wrapper.

For example, a basic configuration of history matching as an initializer could require the following file structure:

init_historyMatching_basic: Wrapper function for history matching.
test_init_historyMatching_basic: Optional script for testing the basic configuration of history matching.
./init_historyMatching: Auxiliary functions for the execution of history matching.

Note

A separate folder with model code is not always necessary. For example, models from a package that is already installed (and whose code is located in a separate folder or folder structure) can be accessed directly from the wrapper.

Naming conventions#

Wrappers#

Wrappers should be named as follows:

[type]_[descriptive name of algorithm/method/library][_configuration name (optional)]

type

Indicates the type of operation that is being done. It can be:

init, for calibration initializer; or
cal, for calibration operations.

descriptive name of algorithm/method/library

Refers to the algorithm or library that is being used by this wrapper.

configuration name

Optional, describes a particular configuration mode used when calling the algorithm.

The following are examples of wrapper names for different configurations of a history matching initializer:

init_historyMatching
init_historyMatching_poissonBasis
init_historyMatching_gaussianBasis
init_historyMatching_rejectionRules
init_historyMatching_rejectionRules_poissonBasis

Tests#

Wrapper test scripts should take the name of the wrapper with the test_ prefix.

Subfolders#

As mentioned above, any other file necessary for interfacing a method or library must be placed in a separate subfolder. The subfolder should be named as follows:

[type]_[descriptive name of algorithm/method/library]

where type and descriptive name of algorithm/method/library follow the descriptions mentioned for the wrappers above.

For example, a subfolder containing files for using history matching would be named init_historyMatching.
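The naming rules for wrappers, tests, and subfolders can be sketched in a few lines of Python. This is only an illustration: the regular expression and the describe_wrapper helper are hypothetical and are not part of phylomodels, and the pattern assumes that the descriptive name of the method contains no underscore.

```python
import re

# Illustrative pattern for [type]_[method][_configuration]. Assumes the method
# name itself contains no underscore; this regex is not part of phylomodels.
WRAPPER_NAME = re.compile(
    r"^(?P<type>init|cal)_(?P<method>[A-Za-z0-9]+)(?:_(?P<config>\w+))?$"
)


def describe_wrapper(name):
    """Return the parts of a wrapper name plus its test and subfolder names."""
    match = WRAPPER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid wrapper name: {name!r}")
    parts = match.groupdict()
    # Test scripts take the wrapper name with the test_ prefix.
    parts["test"] = f"test_{name}"
    # Subfolders drop the optional configuration name.
    parts["subfolder"] = f"{parts['type']}_{parts['method']}"
    return parts


info = describe_wrapper("init_historyMatching_rejectionRules")
print(info["config"], info["test"], info["subfolder"])
```

A name without a configuration, such as init_historyMatching, yields None for the config entry.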

Note

The existence of a subfolder for a given method is optional. A subfolder should only be created if at least one file (other than the wrapper) is necessary for calling the method or library.

Inputs#

Initializers#

Each wrapper receives the following arguments (in this order):

xInfo: Name and range of model parameters. This is a pandas dataframe with 3 columns, namely: name, min, and max. Each row of the dataframe contains the corresponding information for a given model parameter.
y: Observations or measurements. This is a pandas dataframe whose columns are the features used for calibration. The feature names correspond to names of features that can be computed by the features module (the calibration algorithm uses those names to configure calls to features methods). The first row of y contains the observations that the calibration method uses to fit the model. Optionally, the second row may contain the variance of each observation, which initialization and/or calibration algorithms may use.
model: Pointer to model or model wrapper. The interface to the model (or its wrapper) should be the one defined in the phylomodels.models module.
params: Other parameters for the initializer. This argument is a Python dictionary (i.e., key-value pairs with the name of the parameters and their value).

Inputs to initializers must not be modified by the initializer (or any of the functions called internally by the initializer).
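As a rough illustration of these inputs, the sketch below builds the four arguments and wraps an initializer call in a check that the inputs were left untouched. Everything named here is an assumption made for the example: the parameter names (beta, gamma), the feature names, the keys in params, the model placeholder, and the helpers call_without_side_effects and init_midpoint_example. The actual model interface is the one defined in phylomodels.models.

```python
import copy

import pandas as pd

# Parameter names and ranges: one row per model parameter (hypothetical names).
xInfo = pd.DataFrame({
    "name": ["beta", "gamma"],
    "min": [0.0, 0.1],
    "max": [1.0, 0.5],
})

# Observations: one column per feature. Row 0 holds the observed values and
# the optional row 1 holds their variances. Feature names are hypothetical.
y = pd.DataFrame({
    "num_leaves": [120.0, 25.0],
    "tree_height": [3.4, 0.2],
})

# Hypothetical initializer options.
params = {"n_samples": 10, "seed": 0}


def model(x):
    """Placeholder only; the real interface is defined in phylomodels.models."""
    raise NotImplementedError


def init_midpoint_example(xInfo, y, model, params):
    """Hypothetical initializer: return the midpoint of each parameter range."""
    mid = (xInfo["min"] + xInfo["max"]) / 2
    return pd.DataFrame([mid.to_numpy()], columns=xInfo["name"].to_list())


def call_without_side_effects(wrapper, xInfo, y, model, params):
    """Call a wrapper and verify that it did not modify its inputs."""
    snapshot = (xInfo.copy(deep=True), y.copy(deep=True), copy.deepcopy(params))
    result = wrapper(xInfo, y, model, params)
    pd.testing.assert_frame_equal(xInfo, snapshot[0])
    pd.testing.assert_frame_equal(y, snapshot[1])
    assert params == snapshot[2], "wrapper modified params"
    return result


initial_points = call_without_side_effects(
    init_midpoint_example, xInfo, y, model, params
)
```

A guard of this kind could live in a wrapper's test script, where a failed comparison points directly at the input that was changed.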

Calibration solvers#

Each calibration solver receives the following arguments (in this order):

xInfo: Name and range of model parameters (see description of initializer inputs above).
xInit: Initial value(s) or guess of model parameters. This is a pandas dataframe where columns refer (and are named according) to parameters of the model, and rows are initial values for the solver.
y: Observations or measurements (see description of initializer inputs above).
model: Pointer to model or model wrapper (see description of initializer inputs above).
params: Other parameters for the calibration method. This argument is a Python dictionary.

Inputs to calibration functions must not be modified by the calibration function (or by any of the subroutines called internally by the function).
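The relationship between xInfo and xInit can be illustrated with a small consistency check. The helper check_xInit and the parameter names are hypothetical. The range check in particular is an assumption about how the bounds in xInfo are meant to be used, since the text above only requires that the columns of xInit be named after the model parameters.

```python
import pandas as pd

xInfo = pd.DataFrame({
    "name": ["beta", "gamma"],  # hypothetical parameter names
    "min": [0.0, 0.1],
    "max": [1.0, 0.5],
})

# Two candidate starting points; columns are named after the model parameters.
xInit = pd.DataFrame({"beta": [0.2, 0.8], "gamma": [0.15, 0.4]})


def check_xInit(xInfo, xInit):
    """Raise ValueError if xInit columns or values disagree with xInfo."""
    expected = xInfo["name"].to_list()
    if xInit.columns.to_list() != expected:
        raise ValueError(
            f"xInit columns {xInit.columns.to_list()} do not match {expected}"
        )
    # Assumption: initial values should fall inside the ranges in xInfo.
    bounds = xInfo.set_index("name")
    below = xInit.lt(bounds["min"], axis="columns")
    above = xInit.gt(bounds["max"], axis="columns")
    if (below | above).any().any():
        raise ValueError("xInit contains values outside the ranges in xInfo")


check_xInit(xInfo, xInit)  # passes silently for the data above
```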

Outputs#

Initializers#

Each initializer returns a single output. The output is a pandas dataframe containing one or more sets of parameters (i.e., the initial values) that can be used as inputs for a calibration method. Ideally, the output of an initializer either characterizes a reduced parameter space, or provides an initial point (or set of points) for faster convergence of calibration methods. Each column of the output dataframe contains the values for a parameter (the names of the columns are indicated by xInfo["name"]). Each row contains the values of a set of parameters for initialization of a calibration method.
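To make the output format concrete, here is a minimal sketch of an initializer that samples points uniformly within each parameter range. It is not one of the methods provided by this module: the function name, the n_samples and seed keys in params, and the parameter names are all assumptions made for the example, and the sketch ignores y and model entirely.

```python
import numpy as np
import pandas as pd


def init_uniformSampling_sketch(xInfo, y, model, params):
    """Hypothetical initializer: draw points uniformly within each range."""
    rng = np.random.default_rng(params.get("seed"))
    n_samples = params.get("n_samples", 5)
    low = xInfo["min"].to_numpy(dtype=float)
    high = xInfo["max"].to_numpy(dtype=float)
    # One row per initial point, one column per parameter.
    samples = rng.uniform(low, high, size=(n_samples, len(xInfo)))
    return pd.DataFrame(samples, columns=xInfo["name"].to_list())


xInfo = pd.DataFrame({
    "name": ["beta", "gamma"],  # hypothetical parameter names
    "min": [0.0, 0.1],
    "max": [1.0, 0.5],
})
xInit = init_uniformSampling_sketch(
    xInfo, y=None, model=None, params={"n_samples": 4, "seed": 0}
)
print(xInit)
```

Because the columns are named by xInfo["name"], the result can be passed directly as the xInit argument of a calibration solver.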

Calibration solvers#

Each calibration solver returns a single output. The output is a pandas dataframe. Columns of this dataframe are:

Parameters of the model that render the solution. There is one column per parameter. Names of these columns are defined by xInfo["name"].
Distance (or cost) metric. This is a column with the distance metric obtained for each set of parameters.
Any other additional column, as defined by the calibration method.

Rows in the output dataframe are solutions to the calibration problem, sorted from best to worst (that is, in decreasing order of quality). Depending on the calibration method, the output can contain more than one solution.
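The output layout can be illustrated with a toy solver that simply scores every row of xInit and ranks the rows. This is a sketch under several assumptions, not a method from this module: the solver name, the toy_model function and its calling convention (a parameter dataframe in, a feature dataframe out), the column name distance, and the choice of a sum of squared differences as the metric (so that a smaller value means a better solution) are all hypothetical.

```python
import pandas as pd


def toy_model(x):
    """Hypothetical model: maps a parameter dataframe to a feature dataframe."""
    return pd.DataFrame({"f1": x["a"] + x["b"], "f2": x["a"] * x["b"]})


def cal_gridEvaluation_sketch(xInfo, xInit, y, model, params):
    """Hypothetical solver: score every row of xInit and rank the rows."""
    names = xInfo["name"].to_list()
    simulated = model(xInit[names])
    target = y.iloc[0]  # the first row of y holds the observations
    # Sum of squared differences over the features listed in y.
    distance = ((simulated[y.columns] - target) ** 2).sum(axis=1)
    out = xInit[names].copy()  # copy so that xInit itself is not modified
    out["distance"] = distance.to_numpy()
    # Smallest distance first, i.e. from best to worst.
    return out.sort_values("distance").reset_index(drop=True)


xInfo = pd.DataFrame({"name": ["a", "b"], "min": [0.0, 0.0], "max": [2.0, 2.0]})
xInit = pd.DataFrame({"a": [0.5, 1.0, 1.0], "b": [0.5, 2.0, 1.0]})
y = pd.DataFrame({"f1": [3.0], "f2": [2.0]})

solutions = cal_gridEvaluation_sketch(xInfo, xInit, y, toy_model, params={})
print(solutions)
```

A real solver would typically search the parameter space rather than only evaluate the initial points, and could append further columns after the distance column.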