
User Guide Tutorial 09: Benchmarks

TemporAI provides benchmarking tools in tempor.benchmarks; this tutorial demonstrates them.

Using tempor.benchmarks.benchmark_models

The tempor.benchmarks.benchmark_models function provides a quick way to benchmark a number of models (plugins) for a particular task.

It takes a list of test cases, each a (name, model) pair in which the model is a plugin or a Pipeline, together with a dataset. It then performs cross-validation (n_splits sets the number of folds) to compute the mean and standard deviation of each metric.
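The sketch below illustrates roughly what that cross-validation step computes. It splits sample indices into n_splits contiguous folds, scores each held-out fold, and reports the mean and standard deviation of each metric across folds. It is plain Python, not TemporAI's implementation. The helpers k_fold_indices, cross_validate and majority_baseline are hypothetical. The unshuffled fold layout follows scikit-learn's KFold convention, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import Callable, Dict, List, Tuple

Split = Tuple[List[int], List[int]]  # (train_indices, test_indices)


def k_fold_indices(n_samples: int, n_splits: int) -> List[Split]:
    """Split range(n_samples) into n_splits contiguous, unshuffled folds.

    The first n_samples % n_splits folds get one extra sample, so 25 samples
    in 3 folds gives test folds of sizes 9, 8 and 8.
    """
    sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        sizes[i] += 1
    splits: List[Split] = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if not start <= i < stop]
        splits.append((train, test))
        start = stop
    return splits


def cross_validate(
    evaluate_fold: Callable[[List[int], List[int]], Dict[str, float]],
    n_samples: int,
    n_splits: int = 3,
) -> Dict[str, Tuple[float, float]]:
    """Score every fold, then return {metric: (mean, population stddev)}."""
    scores: Dict[str, List[float]] = {}
    for train_idx, test_idx in k_fold_indices(n_samples, n_splits):
        for metric, value in evaluate_fold(train_idx, test_idx).items():
            scores.setdefault(metric, []).append(value)
    return {m: (statistics.mean(v), statistics.pstdev(v)) for m, v in scores.items()}


# Hypothetical toy binary labels, one per sample.
labels = [1, 1, 1, 0, 0, 1, 0, 0, 1]


def majority_baseline(train_idx: List[int], test_idx: List[int]) -> Dict[str, float]:
    """'Fit' by taking the training majority class; report held-out accuracy."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    correct = sum(labels[i] == majority for i in test_idx)
    return {"accuracy": correct / len(test_idx)}


print(cross_validate(majority_baseline, n_samples=len(labels), n_splits=3))
```

With these toy labels the baseline scores 0, 1/3 and 1/3 on the three folds, so the reported mean accuracy is 2/9.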

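It returns a tuple (results_readable, results). results_readable is a single table with one column per model and "mean +/- std" entries; results is a dictionary mapping each model name to a table of per-metric mean and stddev values.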
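In the output, every entry of results_readable matches the corresponding mean and stddev from results rounded to three decimal places. For example, 0.601852 and 0.032736 become "0.602 +/- 0.033". A minimal sketch of that conversion follows. It assumes per-model results held as plain dicts of {metric: (mean, stddev)} rather than the objects benchmark_models actually returns, and format_mean_std and to_readable are hypothetical names.

```python
from typing import Dict, Tuple

# Hypothetical plain-dict stand-in for per-model results: {model: {metric: (mean, stddev)}}.
Results = Dict[str, Dict[str, Tuple[float, float]]]


def format_mean_std(mean: float, std: float, digits: int = 3) -> str:
    """Render one cell as 'mean +/- std', rounding both values to `digits` decimals."""
    return f"{round(mean, digits)} +/- {round(std, digits)}"


def to_readable(results: Results) -> Dict[str, Dict[str, str]]:
    """Convert {model: {metric: (mean, std)}} into {model: {metric: 'mean +/- std'}}."""
    return {
        model: {metric: format_mean_std(m, s) for metric, (m, s) in metrics.items()}
        for model, metrics in results.items()
    }


# A few mean/stddev values copied from the tutorial's "Full results" output.
example_results: Results = {
    "model_1": {"accuracy": (0.601852, 0.032736), "f1_score_weighted": (0.452788, 0.039572)},
    "model_2": {"accuracy": (0.518519, 0.105369), "precision_weighted": (0.279964, 0.104057)},
}
print(to_readable(example_results))
```

Python's float formatting drops trailing zeros, which is consistent with entries such as "0.453 +/- 0.04" in the readable table.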

from tempor.benchmarks import benchmark_models
from tempor import plugin_loader

from IPython.display import display

# Load the sine datasource for one-off prediction, with no=25 samples.
dataset = plugin_loader.get("prediction.one_off.sine", plugin_type="datasource", random_state=42, no=25).load()

# Each test case is a (name, model) pair; n_iter sets the training iterations.
results_readable, results = benchmark_models(
    task_type="prediction.one_off.classification",
    tests=[
        ("model_1", plugin_loader.get("prediction.one_off.classification.nn_classifier", n_iter=10)),
        ("model_2", plugin_loader.get("prediction.one_off.classification.ode_classifier", n_iter=100)),
    ],
    data=dataset,
    n_splits=3,  # Number of cross-validation folds.
)

# A single summary table: one column per model, "mean +/- std" cells.
print("Results in easily-readable format:")
display(results_readable)

# One mean/stddev table per model.
print("Full results:\n")
for model, value in results.items():
    print(f"{model}:")
    display(value)
2023-11-16 22:18:41 | INFO     | tempor.benchmarks.benchmark:benchmark_models:104 | Test case: model_1
2023-11-16 22:18:45 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.68637615442276, validation loss: 0.6680578589439392
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6887814402580261, validation loss: 0.6939960718154907
2023-11-16 22:18:47 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6906945109367371, validation loss: 0.6966598033905029
2023-11-16 22:18:47 | INFO     | tempor.benchmarks.benchmark:benchmark_models:104 | Test case: model_2
2023-11-16 22:18:52 | INFO     | tempor.models.ts_ode:_train:617 | Epoch:99| train loss: 0.6465951204299927, validation loss: 0.5632617473602295
2023-11-16 22:18:58 | INFO     | tempor.models.ts_ode:_train:617 | Epoch:99| train loss: 0.913261890411377, validation loss: 0.8132617473602295
2023-11-16 22:19:04 | INFO     | tempor.models.ts_ode:_train:617 | Epoch:99| train loss: 0.7132617831230164, validation loss: 0.8132617473602295
Results in easily-readable format:
                    model_1          model_2
accuracy            0.602 +/- 0.033  0.519 +/- 0.105
f1_score_micro      0.602 +/- 0.033  0.519 +/- 0.105
f1_score_macro      0.375 +/- 0.013  0.338 +/- 0.048
f1_score_weighted   0.453 +/- 0.04   0.361 +/- 0.116
kappa               0.0 +/- 0.0      0.0 +/- 0.0
kappa_quadratic     0.0 +/- 0.0      0.0 +/- 0.0
recall_micro        0.602 +/- 0.033  0.519 +/- 0.105
recall_macro        0.5 +/- 0.0      0.5 +/- 0.0
recall_weighted     0.602 +/- 0.033  0.519 +/- 0.105
precision_micro     0.602 +/- 0.033  0.519 +/- 0.105
precision_macro     0.301 +/- 0.016  0.259 +/- 0.053
precision_weighted  0.363 +/- 0.039  0.28 +/- 0.104
mcc                 0.0 +/- 0.0      0.0 +/- 0.0
aucprc              0.385 +/- 0.028  0.588 +/- 0.292
aucroc              0.311 +/- 0.083  0.583 +/- 0.312
Full results:

model_1:
                    mean      stddev
accuracy            0.601852  0.032736
f1_score_micro      0.601852  0.032736
f1_score_macro      0.375458  0.012951
f1_score_weighted   0.452788  0.039572
kappa               0.000000  0.000000
kappa_quadratic     0.000000  0.000000
recall_micro        0.601852  0.032736
recall_macro        0.500000  0.000000
recall_weighted     0.601852  0.032736
precision_micro     0.601852  0.032736
precision_macro     0.300926  0.016368
precision_weighted  0.363297  0.038647
mcc                 0.000000  0.000000
aucprc              0.385185  0.028154
aucroc              0.311111  0.083148
model_2:
                    mean      stddev
accuracy            0.518519  0.105369
f1_score_micro      0.518519  0.105369
f1_score_macro      0.338162  0.047609
f1_score_weighted   0.360713  0.115623
kappa               0.000000  0.000000
kappa_quadratic     0.000000  0.000000
recall_micro        0.518519  0.105369
recall_macro        0.500000  0.000000
recall_weighted     0.518519  0.105369
precision_micro     0.518519  0.105369
precision_macro     0.259259  0.052684
precision_weighted  0.279964  0.104057
mcc                 0.000000  0.000000
aucprc              0.587731  0.291568
aucroc              0.583333  0.311805
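Since results maps each model name to a table of per-metric mean and stddev, choosing a winner for a given metric is a lookup followed by a max. The sketch below uses plain dicts holding values copied from the output above, so it runs standalone. best_model is a hypothetical helper. If the tables are pandas DataFrames indexed by metric, as the display output suggests, the lookup would become results[name].loc[metric, "mean"].

```python
from typing import Dict, Tuple

# Hypothetical plain-dict mirror of `results`: {model: {metric: (mean, stddev)}}.
MetricTable = Dict[str, Tuple[float, float]]


def best_model(results: Dict[str, MetricTable], metric: str, higher_is_better: bool = True) -> str:
    """Return the name of the model with the best mean value for `metric`."""

    def mean_of(name: str) -> float:
        return results[name][metric][0]  # index 0 holds the mean

    return max(results, key=mean_of) if higher_is_better else min(results, key=mean_of)


# Mean/stddev values copied from the "Full results" output.
example_results: Dict[str, MetricTable] = {
    "model_1": {"accuracy": (0.601852, 0.032736), "aucroc": (0.311111, 0.083148)},
    "model_2": {"accuracy": (0.518519, 0.105369), "aucroc": (0.583333, 0.311805)},
}

for metric in ("accuracy", "aucroc"):
    print(metric, "->", best_model(example_results, metric))
```

The two metrics disagree here: model_1 has the higher mean accuracy, model_2 the higher mean aucroc. Ranking on the mean alone also ignores the spread. With only 3 folds, model_2's aucroc standard deviation of about 0.31 is comparable to the gap between the two models.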

Supported tasks

⚠️ Not all task types are supported by benchmark_models yet.

Supported tasks (values of the task_type argument):
* task_type="prediction.one_off.classification"
* task_type="prediction.one_off.regression"
* task_type="time_to_event"
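When a script benchmarks several task types, one option is to check task_type against this list before building any models. The following is a minimal sketch. It assumes the supported-task list in this section is complete for the installed TemporAI version. SUPPORTED_BENCHMARK_TASKS and check_benchmark_task_type are hypothetical names, and the sketch does not show how benchmark_models itself reacts to other task types.

```python
# Hypothetical constant: the supported task types listed in this tutorial.
SUPPORTED_BENCHMARK_TASKS = frozenset(
    {
        "prediction.one_off.classification",
        "prediction.one_off.regression",
        "time_to_event",
    }
)


def check_benchmark_task_type(task_type: str) -> str:
    """Return task_type unchanged, or raise ValueError if it is not in the list."""
    if task_type not in SUPPORTED_BENCHMARK_TASKS:
        supported = ", ".join(sorted(SUPPORTED_BENCHMARK_TASKS))
        raise ValueError(f"Unsupported task_type {task_type!r}; expected one of: {supported}")
    return task_type


print(check_benchmark_task_type("prediction.one_off.classification"))
```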