Data analysis
idmtools provides a map-reduce framework for analyzing simulation outputs after your experiment has finished running.
Overview
The analysis pipeline consists of three components:
| Component | Description |
| --- | --- |
| IAnalyzer | Base class you implement to define how to process and aggregate simulation outputs |
| AnalyzeManager | Runs analyzers locally against one or more experiments, suites, or simulations |
| PlatformAnalysis | Runs analyzers remotely as an SSMT work item on COMPS; keeps data on the cluster and avoids large transfers |
How it works
Analysis follows a map-reduce pattern:
    Simulations ──► map() ──► per-simulation result
                                      │
                                      ▼
                              reduce() ──► aggregate output (CSV, plots, etc.)
- map: called once per simulation; receives the simulation's output files and returns any Python object
- reduce: called once after all simulations are mapped; receives {simulation: map_result} and produces the final output
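The flow described above can be sketched in plain Python, without idmtools, to show how the per-simulation results end up in the dictionary that the reduce step receives. This is only an illustration of the pattern: `FakeSimulation`, `outputs`, `map_one`, and `reduce_all` are hypothetical names made up for this sketch, and the numbers are invented sample data, not real simulation output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen so instances are hashable and can be dict keys
class FakeSimulation:
    """Hypothetical stand-in for a simulation object; only carries an id."""
    id: str


# Invented sample data: parsed output files keyed by simulation id, then filename.
simulations = [FakeSimulation("sim-1"), FakeSimulation("sim-2")]
outputs = {
    "sim-1": {"output/result.json": {"infected": 12}},
    "sim-2": {"output/result.json": {"infected": 30}},
}


def map_one(data, simulation):
    """Per-simulation step: extract one value from that simulation's files."""
    return data["output/result.json"]["infected"]


def reduce_all(all_data):
    """Aggregation step: combine every per-simulation result into one summary."""
    values = list(all_data.values())
    return {"count": len(values), "total": sum(values), "mean": sum(values) / len(values)}


# Driver: map each simulation, collect {simulation: map_result}, then reduce once.
all_data = {sim: map_one(outputs[sim.id], sim) for sim in simulations}
summary = reduce_all(all_data)
print(summary)
```

In the real framework the manager plays the role of the driver at the bottom; an analyzer only supplies the two functions.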
Quick example
    from idmtools.analysis.analyze_manager import AnalyzeManager
    from idmtools.core import ItemType
    from idmtools.core.platform_factory import Platform
    from idmtools.entities import IAnalyzer


    class MyAnalyzer(IAnalyzer):
        def __init__(self):
            # Request the output file(s) to retrieve for each simulation
            super().__init__(filenames=["output/result.json"])

        def map(self, data, simulation):
            # Called once per simulation; data is keyed by filename
            return data[self.filenames[0]]

        def reduce(self, all_data):
            # Called once; all_data maps each simulation to its map() result
            for sim, result in all_data.items():
                print(sim.id, result)


    with Platform('CALCULON') as platform:
        manager = AnalyzeManager(
            ids=[('your-experiment-id', ItemType.EXPERIMENT)],
            analyzers=[MyAnalyzer()]
        )
        manager.analyze()
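The reduce step in the example above only prints each result. As a rough sketch of producing a CSV aggregate instead, the helper below turns a `{simulation: map_result}` dictionary into CSV text. It assumes each map result is a flat dictionary of column values, which the source does not state. `FakeSimulation` and `write_results_csv` are hypothetical names for this illustration and are not part of idmtools.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen so instances are hashable and can be dict keys
class FakeSimulation:
    """Hypothetical stand-in for a simulation object; only carries an id."""
    id: str


def write_results_csv(all_data, stream):
    """Write one CSV row per simulation, ordered by simulation id.

    Assumes every map result is a flat dict (an assumption of this sketch).
    """
    rows = [
        {"sim_id": sim.id, **result}
        for sim, result in sorted(all_data.items(), key=lambda item: item[0].id)
    ]
    # Column order: sim_id first, then every other key in sorted order.
    other_columns = sorted({key for row in rows for key in row if key != "sim_id"})
    writer = csv.DictWriter(stream, fieldnames=["sim_id"] + other_columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Invented sample data standing in for what reduce() would receive.
all_data = {
    FakeSimulation("b"): {"infected": 30},
    FakeSimulation("a"): {"infected": 12},
}
buffer = io.StringIO()
write_results_csv(all_data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Inside an analyzer, the body of `reduce` could call a helper like this with an open file handle in place of the in-memory buffer.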
| | AnalyzeManager | PlatformAnalysis |
| --- | --- | --- |
| Where it runs | Your local machine | Remote SSMT worker on COMPS |
| Data transfer | Downloads output files locally | Files stay on the cluster |
| Best for | Development, small datasets | Large datasets, production workflows |
| Platform required | Any idmtools platform | COMPS only |
In this section