To do#

This topic lists the functionality we plan to add to phyloModels in the future.

Calibration algorithms#

Features and summary statistics#

  • Include multiple correlation coefficients as statistics for evaluating the dispersion of summary statistics.

  • Integrate summary statistics for trees.

History matching#

  • Automatic selection of the GLM basis. This can be done by running a quick test that characterizes the data to fit as one of the bases supported by History Matching (which are the bases supported by statsmodels).

  • Use history matching diagnostic information to decide whether an iteration was successful (and whether it needs to be repeated using a different observation or summary statistic). NOTE: This depends on history matching reporting test information.

  • Add support for using a variable number of model simulations per iteration, as well as guidelines (and functions) for selecting them.

  • Add calibration example calling a model in R.

  • Relax the condition that the number of observations must match the number of outputs rendered by the simulation model. The number of observations used for feature generation should be the minimum of the number of observations provided as an input argument and the number of outputs delivered by the simulation model.

  • Incorporate means for robustly dealing with outliers and for capturing rare events.
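As a minimal sketch of the relaxed observation-count rule in the list above: instead of requiring equal lengths, both series are truncated to the shorter of the two before features are generated. The helper name `align_observations` and the choice to keep the leading elements of each series are assumptions made for illustration; they are not part of the phyloModels API.

```python
from typing import List, Sequence, Tuple


def align_observations(
    observations: Sequence[float], outputs: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Truncate both series to the shorter length (hypothetical helper)."""
    # Number of points usable for feature generation:
    # the minimum of the two lengths.
    n = min(len(observations), len(outputs))
    # Assumption: keep the first n points of each series.
    return list(observations[:n]), list(outputs[:n])


if __name__ == "__main__":
    obs, out = align_observations([1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.1])
    print(len(obs), len(out))
```

Whether the leading or trailing points should be kept when the lengths differ is a design decision that the item above leaves open.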

Parameter sweep#

  • Add support for additional cost metrics.

Others#

  • Integrate other calibration methods: ELFI, ABC.

Development#

General#

  • Document specifications for tests.

  • Characterize the performance of history matching.

Features#

  • Update the naming convention for the outputs of summary statistics. Each output name should be the name of the summary statistic (i.e., feature) followed by an underscore ("_") and an integer.

  • Ensure parallelization for the computation of features. We should be able to take advantage of all available cores for computing features and statistics. Parallelization should be done on each feature or statistic independently.

  • Investigate how to optimize computations so that intermediate data can be shared and reused across multiple features or statistics. For example, dx is used in all series_derivative features.
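The data-sharing and output-naming items above could be combined roughly as follows. This is only a sketch under stated assumptions: the function `derivative_features`, the three feature names, and the zero-based index are hypothetical and do not reflect the current phyloModels implementation. The point it illustrates is that dx is computed once and reused by every derivative-based feature, and that each output is named as the feature name, an underscore, and an integer.

```python
from typing import Dict, Sequence


def derivative_features(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Compute several derivative-based features from one shared dx (hypothetical)."""
    # Shared intermediates: computed once and reused by every feature below.
    # Assumption: x is strictly increasing, so no dx entry is zero.
    dx = [x1 - x0 for x0, x1 in zip(x, x[1:])]
    dy = [y1 - y0 for y0, y1 in zip(y, y[1:])]
    slopes = [b / a for a, b in zip(dx, dy)]

    # Each (hypothetical) feature may return one or more values.
    values = {
        "series_derivative_max": [max(slopes)],
        "series_derivative_min": [min(slopes)],
        "series_derivative_mean": [sum(slopes) / len(slopes)],
    }
    # Naming convention: feature name, underscore, integer index
    # (starting the index at 0 is an assumption).
    return {
        f"{name}_{i}": value
        for name, vals in values.items()
        for i, value in enumerate(vals)
    }


if __name__ == "__main__":
    print(derivative_features([0.0, 1.0, 2.0, 4.0], [0.0, 2.0, 3.0, 3.0]))
```

Because each feature only reads the shared intermediates, the per-feature computations remain independent of one another, which is compatible with parallelizing on each feature separately.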

Visualization#

  • Superimpose marginal distributions in parallel coordinates plots.

  • Superimpose reference curve in parallel coordinates plots.

  • Add support for reversed and logarithmic axes on any coordinate in parallel coordinates plots (these options are currently supported only for the rightmost axes).

Distribution#

  • Package in a Docker image.

  • Make library available in Artifactory.