COMPS scheduling

idmtools supports job scheduling on the COMPS platform, including several scenarios that cover different research needs and requirements. For example, you could schedule your simulations to run in a single process on the same node with a specified number of cores. For more information about this and other supported scenarios, see Scheduling Scenarios. To use the full scheduling capabilities included within COMPS, you must add a workorder.json file as a transient asset; this is a one-time task for your project. For more information about scheduling configuration, see Scheduling Configuration. Examples are provided that you can leverage to get started and gain a better understanding. Scheduling Schemas enumerate the available options that may be included in workorder.json.

Scheduling scenarios

Choosing the correct scheduling scenario will depend upon your specific research needs and requirements. The following lists some of the common scenarios supported:

  • N cores, N processes - useful for single-threaded or MPI-enabled workloads, such as EMOD.

  • N cores, 1 node, 1 process - useful for models that spawn multiple worker threads (such as GenEpi) or that have large memory usage, where the number of cores reserved serves as a proxy for memory usage.

  • 1 node, N processes - useful for models with heavy migration and interprocess communication. By running on the same node, MPI can use shared memory rather than slower TCP sockets across multiple nodes. This may be useful for some scenarios using EMOD or other MPI-enabled workloads (see the sketch after this list).
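
As an illustration, a SLURM work order targeting the "1 node, N processes" scenario might look like the following sketch. The command and node group name are reused from the configuration examples below, the process count of 4 is an arbitrary placeholder, and the field names are those defined in the SLURM schema under Scheduling Schemas.

{
  "Command": "python3 Assets/model1.py",
  "NodeGroupName": "idm_abcd",
  "NumNodes": 1,
  "NumProcesses": 4,
  "EnableMpi": true
}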

Scheduling configuration

By configuring a workorder.json file and adding it as a transient asset, you can take advantage of the full scheduling support provided with COMPS. Scheduling information included in the workorder.json file takes precedence over any scheduling information in the idmtools.ini file or scheduling parameters passed to Platform. The following examples show some of the options available to include in a workorder.json file.

Example workorder.json for HPC clusters:

{
  "Command": "python -c \"print('hello test')\"",
  "NodeGroupName": "idm_abcd",
  "NumCores": 1,
  "SingleNode": false,
  "Exclusive": false
}
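
Note that SingleNode and Exclusive are specific to the HPC schema; the SLURM schema instead exposes NumNodes, NumProcesses, EnableMpi, and Environment (see Scheduling Schemas below).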

Example workorder.json for SLURM clusters:

{
  "Command": "python3 Assets/model1.py",
  "NodeGroupName": "idm_abcd",
  "NumCores": 1,
  "NumProcesses": 1,
  "NumNodes": 1,
  "Environment": {
    "key1": "value1",
    "key2:": "value2",
    "PYTHONPATH": "$PYTHONPATH:$PWD/Assets:$PWD/Assets/site-packages",
    "PATH": "$PATH:$PWD/Assets:$PWD/Assets/site-packages"
  }
}
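
Note that Environment values can reference existing variables such as $PATH and $PWD; per the SLURM schema below, these are dynamically expanded in the job environment.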

In addition to including a workorder.json file, you must also pass the scheduling=True parameter when running simulations, for example:

experiment.run(scheduling=True)
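
Scheduling can also be configured through Platform(**kwargs) or idmtools.ini; as noted above, any settings in workorder.json take precedence over both (see the precedence comments in the example below).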

Add workorder.json as a transient asset

To include the workorder.json file as a transient asset, you can either add an existing workorder.json using the add_work_order method or dynamically create one using the add_schedule_config method; both are available from the idmtools_platform_comps.utils.scheduling module.

Add existing workorder.json:

add_work_order(ts, file_path=os.path.join(COMMON_INPUT_PATH, "scheduling", "slurm", "WorkOrder.json"))
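
Here ts is a TemplatedSimulations object, as in the full example below, and COMMON_INPUT_PATH is a placeholder for your own input directory.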

Dynamically create workorder.json:

add_schedule_config(ts, command="python -c \"print('hello test')\"", node_group_name='idm_abcd', num_cores=2,
                    NumProcesses=1, NumNodes=1,
                    Environment={"key1": "value1", "key2": "value2"})
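
In this call, command, node_group_name, and num_cores are named parameters of add_schedule_config, while the remaining keyword arguments (NumProcesses, NumNodes, Environment) appear to be passed through into the generated work order using the field names from the schemas below.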

Scheduling example

For additional information and specifics on using a workorder.json file within Python, you can begin with the following:

# In this example, we demonstrate how to use WorkOrder.json to create simulations on an MSHPC cluster.
# When WorkOrder.json is used correctly, simulations are created based on the Command in WorkOrder.json;
# any command from the task is ignored.

import os
import sys
from functools import partial
from typing import Any, Dict

from idmtools.builders import SimulationBuilder
from idmtools.core.platform_factory import Platform
from idmtools.entities.experiment import Experiment
from idmtools.entities.simulation import Simulation
from idmtools.entities.templated_simulation import TemplatedSimulations
from idmtools_models.python.json_python_task import JSONConfiguredPythonTask
from idmtools_platform_comps.utils.scheduling import add_work_order

# First, define our base task. Please see the detailed explanation in examples/python_models/python_sim.py.
# Without WorkOrder.json, this task would create simulations that run "python Assets/model.py" in COMPS,
# but in this example WorkOrder.json overrides that command, so the task's script can be anything.
task = JSONConfiguredPythonTask(script_path=os.path.join("inputs", "python_model_with_deps", "Assets", "model.py"),
                                parameters=dict(c=0))

# now let's use this task to create a TemplatedSimulations builder. This will build new simulations from the sweep
# builders we define later. We can also use it to manipulate the base_task or the base_simulation
ts = TemplatedSimulations(base_task=task)

# We can define common metadata like tags across all the simulations using the base_simulation object
ts.base_simulation.tags['tag1'] = 1

# load the local WorkOrder.json file into each simulation via the task; the actual command run in COMPS comes from this file
add_work_order(ts, file_path=os.path.join("inputs", "scheduling", "hpc", "WorkOrder.json"))

# Since we have our templated simulation object now, let's define our sweeps
# To do that we need to use a builder
builder = SimulationBuilder()


# define a utility function that will update a single parameter at a
# time on the model and add that param/value pair as a tag on our simulation.
def param_update(simulation: Simulation, param: str, value: Any) -> Dict[str, Any]:
    """
    This function is called during sweeping, allowing us to pass the generated sweep values to our task configuration.

    We always receive a Simulation object. We know that simulations all have tasks and that, for our particular set
    of simulations, they will all include a JSONConfiguredPythonTask. We configure the model with calls to set_parameter
    to update the config. In addition, we can return a dictionary of tags to add to the simulations, so we return
    the output of the 'set_parameter' call since it returns the param/value pair we set.

    Args:
        simulation: Simulation we are configuring
        param: Param string passed to us
        value: Value to set the param to

    Returns:
        The dict of param/value pairs returned by 'set_parameter', used to tag the simulation
    """
    return simulation.task.set_parameter(param, value)


# now add the sweep to our builder
builder.add_sweep_definition(partial(param_update, param="a"), range(3))
builder.add_sweep_definition(partial(param_update, param="b"), [1, 2, 3])
ts.add_builder(builder)

# Now we can create our Experiment using our template builder
experiment = Experiment.from_template(ts, name=os.path.split(sys.argv[0])[1])
# Add our own custom tag to simulation
experiment.tags["tag1"] = 1
# And maybe some custom Experiment Level Assets
experiment.assets.add_directory(assets_directory=os.path.join("inputs", "python_model_with_deps", "Assets"))

with Platform('IDMCloud') as platform:
    # Call run() with 'scheduling=True' to run simulations with scheduling, using the WorkOrder.json loaded above.
    # There are a few ways to schedule computation resources in COMPS:
    #    1. the add_work_order() method, which adds a WorkOrder.json file to simulations as a transient asset
    #    2. the add_schedule_config() method, which adds a dynamically created WorkOrder.json to simulations as a transient asset
    #    3. additional parameters passed to Platform creation with Platform(**kwargs)
    #    4. idmtools.ini
    # The order of precedence is WorkOrder.json > Platform() > idmtools.ini.
    # With the experiment.run method, you can also pass in other options such as 'priority=Highest' to override any
    # priority value passed in from idmtools.ini or defined in Platform(**kwargs).
    experiment.run(wait_until_done=True, scheduling=True, priority='Highest')
    # use system status as the exit code
    sys.exit(0 if experiment.succeeded else -1)

To see the list of platform aliases, such as BELEGOST and CALCULON, use the following CLI command: idmtools info plugins platform-aliases.

Scheduling schemas

The following schemas, for both HPC and SLURM clusters on COMPS, list the available options you can include within the workorder.json file.

HPC:

{
  "title": "MSHPC job WorkOrder Schema",
  "$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "required": [
    "Command"
  ],
  "properties": {
    "Command": {
      "type": "string",
      "minLength": 1,
      "description": "The command to run, including binary and all arguments"
    },
    "NodeGroupName": {
      "type": "string",
      "minLength": 1,
      "description": "The cluster node-group to commission the job to"
    },
    "NumCores": {
      "type": "integer",
      "minimum": 1,
      "description": "The number of cores to reserve"
    },
    "SingleNode": {
      "type": "boolean",
      "description": "A flag to limit all reserved cores to being on the same compute node"
    },
    "Exclusive": {
      "type": "boolean",
      "description": "A flag that controls whether nodes should be exclusively allocated to this job"
    }
  },
  "additionalProperties": false
}
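
The HPC example earlier in Scheduling Configuration is a valid instance of this schema: Command, the only required property, is present, and the remaining fields match the properties listed above.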

SLURM:

{
  "title": "SLURM job WorkOrder Schema",
  "$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "required": [
    "Command"
  ],
  "properties": {
    "Command": {
      "type": "string",
      "minLength": 1,
      "description": "The command to run, including binary and all arguments"
    },
    "NodeGroupName": {
      "type": "string",
      "minLength": 1,
      "description": "The cluster node-group to commission to"
    },
    "NumCores": {
      "type": "integer",
      "minimum": 1,
      "description": "The number of cores to reserve"
    },
    "NumNodes": {
      "type": "integer",
      "minimum": 1,
      "description": "The number of nodes to schedule"
    },
    "NumProcesses": {
      "type": "integer",
      "minimum": 1,
      "description": "The number of processes to execute"
    },
    "EnableMpi": {
      "type": "boolean",
      "description": "A flag that controls whether to run the job with mpiexec (i.e. whether the job will use MPI)"
    },
    "Environment": {
      "type": "object",
      "description": "Environment variables to set in the job environment; these can be dynamically expanded (e.g. $PATH)",
      "additionalProperties": {
        "type": "string"
      }
    }
  },
  "additionalProperties": false
}
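
Because both schemas set additionalProperties to false, a misspelled field makes a work order invalid. It can therefore be handy to validate a WorkOrder.json against the matching schema before submitting. The following is a minimal sketch, not part of idmtools, using the third-party jsonschema package; the local schema file name is a hypothetical placeholder.

import json

from jsonschema import Draft4Validator

# Load the work order to check, plus the SLURM schema above saved locally
# (slurm_workorder_schema.json is a hypothetical file name).
with open("WorkOrder.json") as fp:
    work_order = json.load(fp)
with open("slurm_workorder_schema.json") as fp:
    schema = json.load(fp)

Draft4Validator.check_schema(schema)          # sanity-check the schema itself
Draft4Validator(schema).validate(work_order)  # raises ValidationError if the work order is invalid
print("WorkOrder.json is valid against the schema")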