COMPS scheduling

idmtools supports job scheduling on the COMPS platform and covers several scenarios, depending on your research needs. For example, you can schedule your simulations to run as a single process on one node with a specified number of cores. For more information about this and other supported scenarios, see Scheduling Scenarios. To use the full scheduling capabilities of COMPS, you must add a workorder.json file as a transient asset. This is a one-time task for your project. For more information about scheduling configuration, see Scheduling Configuration. Examples are provided to help you get started. Scheduling Schemas enumerate the options that may be included in workorder.json.

Scheduling scenarios

Choosing the correct scheduling scenario will depend upon your specific research needs and requirements. The following lists some of the common scenarios supported:

  • N cores, N processes - useful for single-threaded or MPI-enabled workloads, such as EMOD.

  • N cores, 1 node, 1 process - useful for models that spawn multiple worker threads (GenEpi) or have large memory usage, where the number of cores serves as an indicator of memory usage.

  • 1 node, N processes - useful for models with high migration and interprocess communication. By running on the same node, MPI can use shared memory instead of slower TCP sockets across multiple nodes. This may be useful for some scenarios using EMOD or other MPI-enabled workloads.
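As a rough illustration, the three scenarios above could be expressed as workorder.json content along the following lines. This is a minimal sketch, not part of idmtools: the helper scenario_work_order and its scenario labels are hypothetical, the option names are taken from the SLURM schema later on this page, and the mapping from each scenario to NumCores, NumNodes, and NumProcesses is an assumption you should check against your cluster.

```python
from typing import Any, Dict


def scenario_work_order(scenario: str, n: int, command: str) -> Dict[str, Any]:
    """Hypothetical helper: build workorder.json content for one scenario.

    The mapping from scenario to options is an assumption for illustration.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    work_order: Dict[str, Any] = {"Command": command, "NumCores": n}
    if scenario == "n_cores_n_processes":
        # One process per core, free to spread over nodes
        work_order["NumProcesses"] = n
    elif scenario == "n_cores_1_node_1_process":
        # A single process that owns all reserved cores on one node
        work_order.update(NumNodes=1, NumProcesses=1)
    elif scenario == "1_node_n_processes":
        # All processes kept on one node so MPI can use shared memory
        work_order.update(NumNodes=1, NumProcesses=n)
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return work_order


threaded = scenario_work_order("n_cores_1_node_1_process", 8, "python3 Assets/model1.py")
print(threaded)
```

The returned dictionary can be serialized with json.dump to produce a workorder.json file.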

Scheduling configuration

By configuring a workorder.json file and adding it as a transient asset, you can take advantage of the full scheduling support provided by COMPS. Scheduling information in the workorder.json file takes precedence over any scheduling information in the idmtools.ini file or scheduling parameters passed to Platform. The following examples show some of the options you can include in a workorder.json file.

Example workorder.json for HPC clusters:

{
  "Command": "python -c \"print('hello test')\"",
  "NodeGroupName": "idm_abcd",
  "NumCores": 1,
  "SingleNode": false,
  "Exclusive": false
}

Example workorder.json for SLURM clusters:

{
  "Command": "python3 Assets/model1.py",
  "NodeGroupName": "idm_abcd",
  "NumCores": 1,
  "NumProcesses": 1,
  "NumNodes": 1,
  "Environment": {
    "key1": "value1",
    "key2": "value2",
    "PYTHONPATH": "$PYTHONPATH:$PWD/Assets:$PWD/Assets/site-packages",
    "PATH": "$PATH:$PWD/Assets:$PWD/Assets/site-packages"
  }
}

In addition to including a workorder.json file, you must also pass the scheduling=True parameter when running simulations, for example:

experiment.run(scheduling=True)
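The precedence rule described in this section (workorder.json over Platform parameters over idmtools.ini) can be pictured with a layered lookup. The sketch below only illustrates the ordering; it is not how idmtools or COMPS resolves settings internally, the setting names and values are made up, and whether an option missing from workorder.json falls back to the lower layers is an assumption.

```python
from collections import ChainMap

# Hypothetical settings from each source, lowest precedence first
ini_settings = {"num_cores": 1, "node_group": "idm_abcd", "priority": "Normal"}
platform_kwargs = {"num_cores": 2}
work_order = {"num_cores": 4}

# ChainMap returns the value from the first mapping that contains the key,
# so the mappings are listed from highest to lowest precedence
effective = ChainMap(work_order, platform_kwargs, ini_settings)

print(effective["num_cores"])  # value from work_order
print(effective["priority"])   # only defined in ini_settings
```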

Add workorder.json as a transient asset

To include the workorder.json file as a transient asset, you can either add an existing workorder.json using add_work_order or dynamically create one using add_schedule_config. Both are provided by the idmtools_platform_comps.utils.scheduling module.

Add existing workorder.json:

add_work_order(ts, file_path=os.path.join(COMMON_INPUT_PATH, "scheduling", "slurm", "WorkOrder.json"))

Dynamically create workorder.json:

add_schedule_config(ts, command="python -c \"print('hello test')\"", node_group_name='idm_abcd', num_cores=2,
                    NumProcesses=1, NumNodes=1,
                    Environment={"key1": "value1", "key2": "value2"})
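To preview the kind of JSON that such a call is meant to describe, the same options can be assembled by hand and serialized. The helper preview_work_order below is hypothetical and is not the library's implementation; it assumes the first three arguments correspond to the Command, NodeGroupName, and NumCores options and that any extra keyword options are written under the names given.

```python
import json
from typing import Any, Dict


def preview_work_order(command: str, node_group_name: str, num_cores: int,
                       **config_opts: Any) -> Dict[str, Any]:
    """Hypothetical helper: assemble workorder.json content from keyword options."""
    work_order: Dict[str, Any] = {
        "Command": command,
        "NodeGroupName": node_group_name,
        "NumCores": num_cores,
    }
    # Extra options are copied through under the names supplied by the caller
    work_order.update(config_opts)
    return work_order


preview = preview_work_order(
    command="python -c \"print('hello test')\"",
    node_group_name="idm_abcd",
    num_cores=2,
    NumProcesses=1,
    NumNodes=1,
    Environment={"key1": "value1", "key2": "value2"},
)
print(json.dumps(preview, indent=2))
```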

Scheduling example

For additional information and specifics on using a workorder.json file within Python, you can begin with the following:

# In this example, we demonstrate how to use WorkOrder.json to create simulations on an MSHPC cluster.
# If WorkOrder.json is used correctly, simulations are created based on the Command in WorkOrder.json, and any
# command from the task is ignored

import os
import sys
from functools import partial
from typing import Any, Dict

from idmtools.builders import SimulationBuilder
from idmtools.core.platform_factory import Platform
from idmtools.entities.experiment import Experiment
from idmtools.entities.simulation import Simulation
from idmtools.entities.templated_simulation import TemplatedSimulations
from idmtools_models.python.json_python_task import JSONConfiguredPythonTask
from idmtools_platform_comps.utils.scheduling import add_work_order

# First define our base task. See the detailed explanation in examples/python_models/python_sim.py
# Without WorkOrder.json, this task would run the simulation command "python Assets/model.py" in COMPS,
# but in this example WorkOrder.json overrides that command, so the task's script can be anything
task = JSONConfiguredPythonTask(script_path=os.path.join("inputs", "python_model_with_deps", "Assets", "model.py"),
                                parameters=(dict(c=0)))

# now let's use this task to create a TemplatedSimulation builder. This will build new simulations from sweep builders
# we will define later. We can also use it to manipulate the base_task or the base_simulation
ts = TemplatedSimulations(base_task=task)

# We can define common metadata like tags across all the simulations using the base_simulation object
ts.base_simulation.tags['tag1'] = 1

# load the local WorkOrder.json file into each simulation via the task. This file contains the actual command run in COMPS
add_work_order(ts, file_path=os.path.join("inputs", "scheduling", "hpc", "WorkOrder.json"))

# Since we have our templated simulation object now, let's define our sweeps
# To do that we need to use a builder
builder = SimulationBuilder()


# define a utility function that will update a single parameter at a
# time on the model and add that param/value pair as a tag on our simulation.
def param_update(simulation: Simulation, param: str, value: Any) -> Dict[str, Any]:
    """
    This function is called during sweeping allowing us to pass the generated sweep values to our Task Configuration

    We always receive a Simulation object. We know that simulations all have tasks and that for our particular set
    of simulations they will all include JSONConfiguredPythonTask. We configure the model with calls to set_parameter
    to update the config. In addition, we can return a dictionary of tags to add to the simulations, so we return
    the output of the 'set_parameter' call since it returns the param/value pair we set

    Args:
        simulation: Simulation we are configuring
        param: Param string passed to use
        value: Value to set param to

    Returns:
        The param/value pair that was set, which is added to the simulation as tags
    """
    return simulation.task.set_parameter(param, value)


# now add the sweep to our builder
builder.add_sweep_definition(partial(param_update, param="a"), range(3))
builder.add_sweep_definition(partial(param_update, param="b"), [1, 2, 3])
ts.add_builder(builder)

# Now we can create our Experiment using our template builder
experiment = Experiment.from_template(ts, name=os.path.split(sys.argv[0])[1])
# Add our own custom tag to the experiment
experiment.tags["tag1"] = 1
# And maybe some custom Experiment Level Assets
experiment.assets.add_directory(assets_directory=os.path.join("inputs", "python_model_with_deps", "Assets"))

with Platform('BELEGOST') as platform:
    # Call run() with 'scheduling=True' to run simulations with scheduling using WorkOrder.json (loaded above)
    # There are a few ways to schedule computation resources in COMPS:
    #    1. add_work_order() adds an existing WorkOrder.json file to simulations as a transient asset
    #    2. add_schedule_config() adds a dynamically created WorkOrder.json to simulations as a transient asset
    #    3. pass additional parameters at Platform creation with Platform(**kwargs)
    #    4. idmtools.ini
    # the order of precedence is WorkOrder.json > Platform() > idmtools.ini
    # with the experiment.run method, you can also pass in other options such as priority='Highest' to override any
    # priority value either passed in from idmtools.ini or defined in Platform(**kwargs)
    experiment.run(wait_until_done=True, scheduling=True, priority='Highest')
    # use system status as the exit code
    sys.exit(0 if experiment.succeeded else -1)

To see the list of platform aliases, such as BELEGOST and CALCULON, use the following CLI command: idmtools info plugins platform-aliases.
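The two sweep definitions in the example can be explored without a platform connection. The sketch below assumes the builder combines sweep definitions as a Cartesian product, so that range(3) and [1, 2, 3] yield nine simulations. It uses a plain dictionary as a stand-in for the task's parameters, and its param_update is a simplified stand-in for the callback in the example, not the idmtools API.

```python
from functools import partial
from itertools import product
from typing import Any, Dict


def param_update(parameters: Dict[str, Any], param: str, value: Any) -> Dict[str, Any]:
    """Stand-in callback: set one parameter and return the pair as a tag."""
    parameters[param] = value
    return {param: value}


# Each sweep pairs a callback with the values it should be called with
sweeps = [
    (partial(param_update, param="a"), range(3)),
    (partial(param_update, param="b"), [1, 2, 3]),
]

simulations = []
# Assumed behavior: one simulation per combination of sweep values
for values in product(*(sweep_values for _, sweep_values in sweeps)):
    parameters: Dict[str, Any] = {"c": 0}  # base parameters from the task
    tags: Dict[str, Any] = {}
    for (callback, _), value in zip(sweeps, values):
        tags.update(callback(parameters, value=value))
    simulations.append((parameters, tags))

print(len(simulations))  # 9
print(simulations[0])    # ({'c': 0, 'a': 0, 'b': 1}, {'a': 0, 'b': 1})
```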

Scheduling schemas

The following schemas, for both HPC and SLURM clusters on COMPS, list the options you can include in the workorder.json file.

HPC:

{
  "title": "MSHPC job WorkOrder Schema",
  "$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "required": [
    "Command"
  ],
  "properties": {
    "Command": {
      "type": "string",
      "minLength": 1,
      "description": "The command to run, including binary and all arguments"
    },
    "NodeGroupName": {
      "type": "string",
      "minLength": 1,
      "description": "The cluster node-group to commission the job to"
    },
    "NumCores": {
      "type": "integer",
      "minimum": 1,
      "description": "The number of cores to reserve"
    },
    "SingleNode": {
      "type": "boolean",
      "description": "A flag to limit all reserved cores to being on the same compute node"
    },
    "Exclusive": {
      "type": "boolean",
      "description": "A flag that controls whether nodes should be exclusively allocated to this job"
    }
  },
  "additionalProperties": false
}

SLURM:

{
  "title": "SLURM job WorkOrder Schema",
  "$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "required": [
    "Command"
  ],
  "properties": {
    "Command": {
      "type": "string",
      "minLength": 1,
      "description": "The command to run, including binary and all arguments"
    },
    "NodeGroupName": {
      "type": "string",
      "minLength": 1,
      "description": "The cluster node-group to commission to"
    },
    "NumCores": {
      "type": "integer",
      "minimum": 1,
      "description": "The number of cores to reserve"
    },
    "NumNodes": {
      "type": "integer",
      "minimum": 1,
      "description": "The number of nodes to schedule"
    },
    "NumProcesses": {
      "type": "integer",
      "minimum": 1,
      "description": "The number of processes to execute"
    },
    "EnableMpi": {
      "type": "boolean",
      "description": "A flag that controls whether to run the job with mpiexec (i.e. whether the job will use MPI)"
    },
    "Environment": {
      "type": "object",
      "description": "Environment variables to set in the job environment; these can be dynamically expanded (e.g. $PATH)",
      "additionalProperties": {
        "type": "string"
      }
    }
  },
  "additionalProperties": false
}
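Before submitting, it can help to check a workorder against the schema locally. The sketch below hand-codes the rules of the SLURM schema above (required Command, known option names, types, minimum values, string-valued Environment) using only the standard library. The function check_slurm_work_order is hypothetical and is not part of idmtools or COMPS; a full JSON Schema validator would be the more general approach.

```python
from typing import Any, Dict, List

# Option name -> expected Python type, transcribed from the SLURM schema
SLURM_OPTIONS = {
    "Command": str,
    "NodeGroupName": str,
    "NumCores": int,
    "NumNodes": int,
    "NumProcesses": int,
    "EnableMpi": bool,
    "Environment": dict,
}


def check_slurm_work_order(work_order: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means no problems."""
    problems: List[str] = []
    if "Command" not in work_order:
        problems.append("missing required option: Command")
    for name, value in work_order.items():
        expected = SLURM_OPTIONS.get(name)
        if expected is None:
            # The schema sets additionalProperties to false
            problems.append(f"unknown option: {name}")
        elif isinstance(value, bool) and expected is not bool:
            # bool is a subclass of int in Python, so reject it explicitly
            problems.append(f"{name} must be of type {expected.__name__}")
        elif not isinstance(value, expected):
            problems.append(f"{name} must be of type {expected.__name__}")
        elif expected is str and not value:
            problems.append(f"{name} must not be empty")
        elif expected is int and value < 1:
            problems.append(f"{name} must be at least 1")
        elif expected is dict and not all(isinstance(v, str) for v in value.values()):
            problems.append(f"{name} values must all be strings")
    return problems


print(check_slurm_work_order({"Command": "python3 Assets/model1.py", "NumCores": 1}))
print(check_slurm_work_order({"NumCores": 0, "SingleNode": False}))
```

The second call reports three problems: the missing Command, a NumCores value below the minimum, and SingleNode, which appears only in the HPC schema.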