User Guide Tutorial 09: AutoML

TemporAI provides AutoML tools in tempor.automl for finding the best model for your use case. These are demonstrated here.

AutoML in TemporAI Overview

TemporAI provides two AutoML approaches (“seekers”) in the tempor.automl.seeker module:

- MethodSeeker: Search the hyperparameter spaces of one or more candidate predictive methods.
- PipelineSeeker: Search the hyperparameter space of a pipeline of the form preprocessing steps -> predictive step.

The optimization strategies are provided by optuna (https://optuna.readthedocs.io/). The currently supported strategies (with the corresponding tuner_type value in parentheses) are:

- Bayesian, specifically Tree-structured Parzen Estimator ("bayesian"),
- Random ("random"),
- CMA-ES ("cmaes"),
- QMC ("qmc"),
- Grid ("grid").
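
To make the difference between two of these strategies concrete, the sketch below contrasts exhaustive grid enumeration with independent random sampling over a small, hypothetical search space shaped like the one used later in this tutorial. It is a stdlib-only illustration of the idea, not how optuna or TemporAI implement their samplers; the adaptive strategies (TPE and CMA-ES) additionally use the scores of earlier trials to choose the next point, which this sketch does not model.

```python
import itertools
import random

# Hypothetical search space: each hyperparameter maps to its list of allowed values.
space = {
    "n_units_hidden": list(range(5, 21)),  # integers 5..20 inclusive
    "lr": [1e-2, 1e-3, 1e-4],
}


def grid_configs(space):
    """Enumerate every combination of values (what a grid search visits)."""
    names = list(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def random_configs(space, n_trials, seed=0):
    """Draw each hyperparameter independently and uniformly, n_trials times."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n_trials)]


grid = list(grid_configs(space))
print(len(grid))  # 16 * 3 = 48 combinations
print(random_configs(space, n_trials=3))
```

A grid over even this small space needs 48 trials to cover it, whereas the examples below use num_iter=3; adaptive samplers such as TPE are intended to make better use of a small budget by learning from earlier trials.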

Using `MethodSeeker`

Use a MethodSeeker to search for the best algorithm and hyperparameters for a particular task. This approach does not carry out any preprocessing (data transformation) steps, so preprocess the data first using tempor.methods.preprocessing, as needed.

A MethodSeeker can be initialized as follows.

from tempor import plugin_loader
from tempor.automl.seeker import MethodSeeker

# Load your dataset.
dataset = plugin_loader.get("prediction.one_off.sine", plugin_type="datasource").load()

seeker = MethodSeeker(
    # Name your AutoML study:
    study_name="my_automl_study",
    # Select the type of task:
    task_type="prediction.one_off.classification",
    # Choose which predictive methods to use in the search:
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    # Choose a metric. Metric maximization/minimization will be determined automatically.
    metric="aucroc",
    # Pass in your dataset.
    dataset=dataset,
    # How many best models to return:
    return_top_k=3,
    # Number of AutoML iterations:
    num_iter=100,
    # Type of AutoML tuner to use:
    tuner_type="bayesian",
    # You can also provide some other options like early stopping patience, number of cross-validation folds etc.
)

2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:_set_up_tuners:341 | Setting up estimators and tuners for study my_automl_study.
2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator cde_classifier.
2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator ode_classifier.
2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator nn_classifier.

You can then run the AutoML search as shown below.

The example below also shows how to provide a custom hyperparameter space, which overrides the default hyperparameter space of a model.

from tempor.methods.core.params import IntegerParams, CategoricalParams

# Provide a custom hyperparameter space to search for each type of model.
# NOTE: For the sake of speed of this example, we limit epochs to 2.
hp_space = {
    "cde_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_temporal_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ode_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "nn_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}

# Initialize a `MethodSeeker` and provide `override_hp_space`.
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    return_top_k=3,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
)

2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:_set_up_tuners:341 | Setting up estimators and tuners for study my_automl_study.
2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator cde_classifier.
2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator ode_classifier.
2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator nn_classifier.

# Execute the search.

best_methods, best_scores = seeker.search()

2023-11-16 22:18:41 | INFO     | tempor.automl.seeker:search:400 | Running  search for estimator 'cde_classifier' 1/3.
2023-11-16 22:18:41 | INFO     | tempor.automl.tuner:tune:228 | Baseline score computation skipped
2023-11-16 22:18:41 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from CDEClassifier:
{'n_iter': 2, 'n_temporal_units_hidden': 13, 'lr': 0.01}
2023-11-16 22:18:48 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from CDEClassifier:
{'n_iter': 2, 'n_temporal_units_hidden': 11, 'lr': 0.0001}
2023-11-16 22:18:53 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from CDEClassifier:
{'n_iter': 2, 'n_temporal_units_hidden': 20, 'lr': 0.001}
2023-11-16 22:18:58 | INFO     | tempor.automl.seeker:search:400 | Running  search for estimator 'ode_classifier' 2/3.
2023-11-16 22:18:58 | INFO     | tempor.automl.tuner:tune:228 | Baseline score computation skipped
2023-11-16 22:18:58 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from ODEClassifier:
{'n_iter': 2, 'n_units_hidden': 13, 'lr': 0.01}
2023-11-16 22:19:01 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from ODEClassifier:
{'n_iter': 2, 'n_units_hidden': 11, 'lr': 0.0001}
2023-11-16 22:19:05 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from ODEClassifier:
{'n_iter': 2, 'n_units_hidden': 20, 'lr': 0.001}
2023-11-16 22:19:08 | INFO     | tempor.automl.seeker:search:400 | Running  search for estimator 'nn_classifier' 3/3.
2023-11-16 22:19:08 | INFO     | tempor.automl.tuner:tune:228 | Baseline score computation skipped
2023-11-16 22:19:08 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from NeuralNetClassifier:
{'n_iter': 2, 'n_units_hidden': 13, 'lr': 0.01}
2023-11-16 22:19:09 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6919505596160889, validation loss: 0.6882655620574951
2023-11-16 22:19:09 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.691156268119812, validation loss: 0.6865323185920715
2023-11-16 22:19:10 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.692247211933136, validation loss: 0.6861070990562439
2023-11-16 22:19:10 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6928325891494751, validation loss: 0.6825112104415894
2023-11-16 22:19:11 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6919053792953491, validation loss: 0.6819757223129272
2023-11-16 22:19:11 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from NeuralNetClassifier:
{'n_iter': 2, 'n_units_hidden': 11, 'lr': 0.0001}
2023-11-16 22:19:12 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6919505596160889, validation loss: 0.691270649433136
2023-11-16 22:19:12 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.691156268119812, validation loss: 0.6910805106163025
2023-11-16 22:19:13 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.692247211933136, validation loss: 0.6900344491004944
2023-11-16 22:19:13 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6928325891494751, validation loss: 0.6906693577766418
2023-11-16 22:19:14 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6919053792953491, validation loss: 0.6910874843597412
2023-11-16 22:19:14 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from NeuralNetClassifier:
{'n_iter': 2, 'n_units_hidden': 20, 'lr': 0.001}
2023-11-16 22:19:14 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6919505596160889, validation loss: 0.6896671056747437
2023-11-16 22:19:15 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.691156268119812, validation loss: 0.6886401772499084
2023-11-16 22:19:15 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.692247211933136, validation loss: 0.6884845495223999
2023-11-16 22:19:16 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6928325891494751, validation loss: 0.6887120604515076
2023-11-16 22:19:16 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6919053792953491, validation loss: 0.6893669962882996
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:search:435 |
Evaluation for cde_classifier scores:
[0.46300505050505053, 0.48396464646464643, 0.4901515151515151].
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:search:435 |
Evaluation for ode_classifier scores:
[0.47992424242424236, 0.48409090909090907, 0.484090909090909].
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:search:435 |
Evaluation for nn_classifier scores:
[0.37064393939393936, 0.5370580808080809, 0.4579545454545454].
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:search:436 |
All estimator definitions searched:
['cde_classifier', 'ode_classifier', 'nn_classifier']
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:search:437 |
Best scores for each estimator searched:
[0.4901515151515151, 0.48409090909090907, 0.5370580808080809]
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:search:438 |
Best hyperparameters for each estimator searched:
[{'n_iter': 2, 'n_temporal_units_hidden': 20, 'lr': 0.001}, {'n_iter': 2, 'n_units_hidden': 11, 'lr': 0.0001}, {'n_iter': 2, 'n_units_hidden': 11, 'lr': 0.0001}]
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:_create_estimator_with_hps:565 |
Selected score 0.5370580808080809 for nn_classifier with hyperparameters:
{'n_iter': 2, 'n_units_hidden': 11, 'lr': 0.0001}
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:_create_estimator_with_hps:565 |
Selected score 0.4901515151515151 for cde_classifier with hyperparameters:
{'n_iter': 2, 'n_temporal_units_hidden': 20, 'lr': 0.001}
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:_create_estimator_with_hps:565 |
Selected score 0.48409090909090907 for ode_classifier with hyperparameters:
{'n_iter': 2, 'n_units_hidden': 11, 'lr': 0.0001}
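
The selection step in the log above can be reproduced by hand: for each estimator, the best score is the maximum of its per-trial evaluation scores (AUCROC is maximized), and the estimators are then ordered by that best score and cut to return_top_k. The sketch below recomputes this from the logged numbers; the helper name `rank_estimators` is hypothetical and not part of TemporAI.

```python
# Per-trial evaluation scores copied from the log above (metric: AUCROC, higher is better).
logged_scores = {
    "cde_classifier": [0.46300505050505053, 0.48396464646464643, 0.4901515151515151],
    "ode_classifier": [0.47992424242424236, 0.48409090909090907, 0.484090909090909],
    "nn_classifier": [0.37064393939393936, 0.5370580808080809, 0.4579545454545454],
}


def rank_estimators(scores_per_estimator, top_k, maximize=True):
    """Return (name, best_score) pairs, best first, keeping only the top_k estimators."""
    pick = max if maximize else min
    best = {name: pick(scores) for name, scores in scores_per_estimator.items()}
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=maximize)
    return ranked[:top_k]


for name, score in rank_estimators(logged_scores, top_k=3):
    print(f"{name}: {score}")
# nn_classifier: 0.5370580808080809
# cde_classifier: 0.4901515151515151
# ode_classifier: 0.48409090909090907
```

The resulting order (nn_classifier, cde_classifier, ode_classifier) agrees with the "Selected score" entries in the log and with the order of best_methods printed below.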

# The best methods are returned, and can be used by calling .predict() and so on.

import rich.pretty  # For pretty printing only.

for method in best_methods:
    rich.pretty.pprint(method, indent_guides=False)

NeuralNetClassifier(
    name='nn_classifier',
    category='prediction.one_off.classification',
    plugin_type='method',
    params={
        'n_static_units_hidden': 100,
        'n_static_layers_hidden': 2,
        'n_temporal_units_hidden': 102,
        'n_temporal_layers_hidden': 2,
        'n_iter': 2,
        'mode': 'RNN',
        'n_iter_print': 10,
        'batch_size': 100,
        'lr': 0.0001,
        'weight_decay': 0.001,
        'window_size': 1,
        'device': None,
        'dataloader_sampler': None,
        'dropout': 0,
        'nonlin': 'relu',
        'random_state': 0,
        'clipping_value': 1,
        'patience': 20,
        'train_ratio': 0.8
    }
)

CDEClassifier(
    name='cde_classifier',
    category='prediction.one_off.classification',
    plugin_type='method',
    params={
        'n_units_hidden': 100,
        'n_layers_hidden': 1,
        'nonlin': 'relu',
        'dropout': 0,
        'atol': 0.01,
        'rtol': 0.01,
        'interpolation': 'cubic',
        'lr': 0.001,
        'weight_decay': 0.001,
        'n_iter': 2,
        'batch_size': 500,
        'n_iter_print': 100,
        'random_state': 0,
        'patience': 10,
        'clipping_value': 1,
        'train_ratio': 0.8,
        'device': None,
        'dataloader_sampler': None
    }
)

ODEClassifier(
    name='ode_classifier',
    category='prediction.one_off.classification',
    plugin_type='method',
    params={
        'n_units_hidden': 11,
        'n_layers_hidden': 1,
        'nonlin': 'relu',
        'dropout': 0,
        'atol': 0.01,
        'rtol': 0.01,
        'interpolation': 'cubic',
        'lr': 0.0001,
        'weight_decay': 0.001,
        'n_iter': 2,
        'batch_size': 500,
        'n_iter_print': 100,
        'random_state': 0,
        'patience': 10,
        'clipping_value': 1,
        'train_ratio': 0.8,
        'device': None,
        'dataloader_sampler': None
    }
)
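
One thing to watch for when overriding the hyperparameter space: the printed parameters above contain no `n_units_hidden` entry for `nn_classifier` (it has `n_static_units_hidden` and `n_temporal_units_hidden`, the latter still at 102), and no `n_temporal_units_hidden` entry for `cde_classifier` (its `n_units_hidden` is still 100). By contrast, `ode_classifier` picked up the searched `n_units_hidden` value of 11. So the sampled values for the two mismatched names do not appear to have reached those models. A quick sanity check is to compare the names in `hp_space` against each method's parameter names before searching. The sketch below does this with plain dictionaries built from the printed output; the helper `unknown_hp_names` is hypothetical.

```python
# A subset of the parameter names from the printed `params` of the returned methods above.
method_param_names = {
    "cde_classifier": {"n_units_hidden", "n_layers_hidden", "lr", "n_iter", "batch_size"},
    "ode_classifier": {"n_units_hidden", "n_layers_hidden", "lr", "n_iter", "batch_size"},
    "nn_classifier": {"n_static_units_hidden", "n_temporal_units_hidden", "lr", "n_iter", "batch_size"},
}

# Hyperparameter names used in the custom `hp_space` of this tutorial.
hp_space_names = {
    "cde_classifier": ["n_iter", "n_temporal_units_hidden", "lr"],
    "ode_classifier": ["n_iter", "n_units_hidden", "lr"],
    "nn_classifier": ["n_iter", "n_units_hidden", "lr"],
}


def unknown_hp_names(hp_names, param_names):
    """Map each method to the searched names that are not among its parameters."""
    problems = {}
    for method, names in hp_names.items():
        missing = [n for n in names if n not in param_names.get(method, set())]
        if missing:
            problems[method] = missing
    return problems


print(unknown_hp_names(hp_space_names, method_param_names))
# {'cde_classifier': ['n_temporal_units_hidden'], 'nn_classifier': ['n_units_hidden']}
```

Renaming those entries (`n_units_hidden` for `cde_classifier`, `n_temporal_units_hidden` for `nn_classifier`) should let the search act on the intended hidden-layer size; the same applies to the `PipelineSeeker` example below.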

Using `PipelineSeeker`

Use a PipelineSeeker to search for the best pipeline (preprocessing steps -> prediction step) for a particular task.

This seeker creates pipelines composed of:

- A static imputer (if at least one candidate is provided in static_imputers),
- A static scaler (if at least one candidate is provided in static_scalers),
- A temporal imputer (if at least one candidate is provided in temporal_imputers),
- A temporal scaler (if at least one candidate is provided in temporal_scalers),
- The final predictor, chosen from the estimator_names options.

For each preprocessing step, the choice among its candidates is sampled as a categorical hyperparameter. The hyperparameter spaces of the selected preprocessing methods and of the final predictor are then sampled as well.
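
The assembly rule can be sketched as follows: a preprocessing step is included only when its candidate list is non-empty, one candidate is chosen per included step, and the predictor comes last. The sketch enumerates every structural variant for the candidate lists used in the search run later in this section, producing the `->`-joined strings that appear in the logs. It is an illustration, not TemporAI's code: the helper `pipeline_sequences` is hypothetical, and the `preprocessing.imputation.temporal` category name is an assumption (that step is empty in this example, so it does not affect the output). In an actual search the candidate choice is sampled by the tuner rather than enumerated.

```python
import itertools

# Category prefixes in pipeline order. Three of these appear in the logs;
# "preprocessing.imputation.temporal" is an assumed name for the temporal imputer step.
STEP_CATEGORIES = [
    ("static_imputers", "preprocessing.imputation.static"),
    ("static_scalers", "preprocessing.scaling.static"),
    ("temporal_imputers", "preprocessing.imputation.temporal"),
    ("temporal_scalers", "preprocessing.scaling.temporal"),
]


def pipeline_sequences(candidates, predictor, predictor_category):
    """List every pipeline string that can be built from the candidate lists."""
    # Keep only the steps that have at least one candidate.
    steps = [
        [f"{category}.{name}" for name in candidates[arg]]
        for arg, category in STEP_CATEGORIES
        if candidates.get(arg)
    ]
    final = f"{predictor_category}.{predictor}"
    return ["->".join(list(combo) + [final]) for combo in itertools.product(*steps)]


# Candidate lists from the PipelineSeeker search example in this section.
candidates = {
    "static_imputers": ["static_tabular_imputer"],
    "static_scalers": ["static_minmax_scaler", "static_standard_scaler"],
    "temporal_imputers": [],
    "temporal_scalers": ["ts_minmax_scaler", "ts_standard_scaler"],
}

seqs = pipeline_sequences(candidates, "nn_classifier", "prediction.one_off.classification")
print(len(seqs))  # 1 * 2 * 2 = 4 variants per predictor
for seq in seqs:
    print(seq)
```

With the default candidate lists shown further below (1 static imputer, 2 static scalers, 3 temporal imputers, 2 temporal scalers), the same rule gives 1 × 2 × 3 × 2 = 12 structural variants per predictor, before any preprocessing or predictor hyperparameters are sampled.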

A PipelineSeeker uses a very similar interface to MethodSeeker, and can be initialized as follows.

from tempor.automl.seeker import PipelineSeeker

seeker = PipelineSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    # The estimators here will be the final step of the pipeline:
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    return_top_k=3,
    num_iter=100,
    tuner_type="bayesian",
    # The following arguments specify the candidates of the different preprocessing steps, e.g.:
    static_imputers=["static_tabular_imputer"],
    static_scalers=[],
    temporal_imputers=["ffill", "bfill"],
    temporal_scalers=["ts_minmax_scaler"],
)

2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:_set_up_tuners:341 | Setting up estimators and tuners for study my_automl_study.
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:_init_estimator:733 | Creating estimator <Pipeline with cde_classifier>.
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:_init_estimator:733 | Creating estimator <Pipeline with ode_classifier>.
2023-11-16 22:19:16 | INFO     | tempor.automl.seeker:_init_estimator:733 | Creating estimator <Pipeline with nn_classifier>.

If you do not specify one of these arguments, the following default preprocessing candidates are used for it:

from tempor.automl.seeker import (
    DEFAULT_STATIC_IMPUTERS,
    DEFAULT_STATIC_SCALERS,
    DEFAULT_TEMPORAL_IMPUTERS,
    DEFAULT_TEMPORAL_SCALERS,
)

print("Static imputer candidates:", DEFAULT_STATIC_IMPUTERS)
print("Static scaler candidates:", DEFAULT_STATIC_SCALERS)
print("Temporal imputer candidates:", DEFAULT_TEMPORAL_IMPUTERS)
print("Temporal scaler candidates:", DEFAULT_TEMPORAL_SCALERS)

Static imputer candidates: ['static_tabular_imputer']
Static scaler candidates: ['static_minmax_scaler', 'static_standard_scaler']
Temporal imputer candidates: ['ffill', 'ts_tabular_imputer', 'bfill']
Temporal scaler candidates: ['ts_minmax_scaler', 'ts_standard_scaler']

You can execute the search as follows.

from tempor.methods.core.params import IntegerParams, CategoricalParams

# Provide a custom hyperparameter space to search for each type of model.
# These can be provided for the final (predictive) step of the pipeline.
# Default hyperparameter space will be sampled for the preprocessing steps.
# NOTE: For the sake of speed of this example, we limit epochs to 2.
hp_space = {
    "cde_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_temporal_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ode_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "nn_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}

# Initialize a `PipelineSeeker` and provide `override_hp_space`.
seeker = PipelineSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    return_top_k=3,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
    # Specify preprocessing candidates:
    static_imputers=["static_tabular_imputer"],
    static_scalers=["static_minmax_scaler", "static_standard_scaler"],
    temporal_imputers=[],
    temporal_scalers=["ts_minmax_scaler", "ts_standard_scaler"],
)

2023-11-16 22:19:17 | INFO     | tempor.automl.seeker:_set_up_tuners:341 | Setting up estimators and tuners for study my_automl_study.
2023-11-16 22:19:17 | INFO     | tempor.automl.seeker:_init_estimator:733 | Creating estimator <Pipeline with cde_classifier>.
2023-11-16 22:19:17 | INFO     | tempor.automl.seeker:_init_estimator:733 | Creating estimator <Pipeline with ode_classifier>.
2023-11-16 22:19:17 | INFO     | tempor.automl.seeker:_init_estimator:733 | Creating estimator <Pipeline with nn_classifier>.

best_pipelines, best_scores = seeker.search()

2023-11-16 22:19:17 | INFO     | tempor.automl.seeker:search:400 | Running  search for estimator '<Pipeline with cde_classifier>' 1/3.
2023-11-16 22:19:17 | INFO     | tempor.automl.tuner:tune:228 | Baseline score computation skipped
2023-11-16 22:19:17 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.cde_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'softimpute'}, 'static_minmax_scaler': {'clip': True}, 'ts_standard_scaler': {}, 'cde_classifier': {'n_iter': 2, 'n_temporal_units_hidden': 17, 'lr': 0.001}}}
2023-11-16 22:19:32 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_minmax_scaler->prediction.one_off.classification.cde_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'mean'}, 'static_minmax_scaler': {'clip': False}, 'ts_minmax_scaler': {'clip': False}, 'cde_classifier': {'n_iter': 2, 'n_temporal_units_hidden': 14, 'lr': 0.001}}}
2023-11-16 22:19:38 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_standard_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.cde_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'most_frequent'}, 'static_standard_scaler': {}, 'ts_standard_scaler': {}, 'cde_classifier': {'n_iter': 2, 'n_temporal_units_hidden': 6, 'lr': 0.0001}}}
2023-11-16 22:19:45 | INFO     | tempor.automl.seeker:search:400 | Running  search for estimator '<Pipeline with ode_classifier>' 2/3.
2023-11-16 22:19:45 | INFO     | tempor.automl.tuner:tune:228 | Baseline score computation skipped
2023-11-16 22:19:45 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.ode_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'softimpute'}, 'static_minmax_scaler': {'clip': True}, 'ts_standard_scaler': {}, 'ode_classifier': {'n_iter': 2, 'n_units_hidden': 17, 'lr': 0.001}}}
2023-11-16 22:20:00 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_minmax_scaler->prediction.one_off.classification.ode_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'mean'}, 'static_minmax_scaler': {'clip': False}, 'ts_minmax_scaler': {'clip': False}, 'ode_classifier': {'n_iter': 2, 'n_units_hidden': 14, 'lr': 0.001}}}
2023-11-16 22:20:06 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_standard_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.ode_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'most_frequent'}, 'static_standard_scaler': {}, 'ts_standard_scaler': {}, 'ode_classifier': {'n_iter': 2, 'n_units_hidden': 6, 'lr': 0.0001}}}
2023-11-16 22:20:12 | INFO     | tempor.automl.seeker:search:400 | Running  search for estimator '<Pipeline with nn_classifier>' 3/3.
2023-11-16 22:20:12 | INFO     | tempor.automl.tuner:tune:228 | Baseline score computation skipped
2023-11-16 22:20:12 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.nn_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'softimpute'}, 'static_minmax_scaler': {'clip': True}, 'ts_standard_scaler': {}, 'nn_classifier': {'n_iter': 2, 'n_units_hidden': 17, 'lr': 0.001}}}
2023-11-16 22:20:14 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6918608546257019, validation loss: 0.6910383105278015
2023-11-16 22:20:17 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6909429430961609, validation loss: 0.6872987151145935
2023-11-16 22:20:20 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.692150354385376, validation loss: 0.6879752278327942
2023-11-16 22:20:23 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6926363706588745, validation loss: 0.6881052851676941
2023-11-16 22:20:26 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6915659308433533, validation loss: 0.6888220310211182
2023-11-16 22:20:27 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_minmax_scaler->prediction.one_off.classification.nn_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'mean'}, 'static_minmax_scaler': {'clip': False}, 'ts_minmax_scaler': {'clip': False}, 'nn_classifier': {'n_iter': 2, 'n_units_hidden': 14, 'lr': 0.001}}}
2023-11-16 22:20:27 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6920319199562073, validation loss: 0.6892786622047424
2023-11-16 22:20:28 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6913070678710938, validation loss: 0.6893746852874756
2023-11-16 22:20:30 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6924894452095032, validation loss: 0.6886722445487976
2023-11-16 22:20:31 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6930248141288757, validation loss: 0.6894596815109253
2023-11-16 22:20:32 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6921922564506531, validation loss: 0.6893032789230347
2023-11-16 22:20:32 | INFO     | tempor.automl.tuner:objective:248 |
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_standard_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.nn_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'most_frequent'}, 'static_standard_scaler': {}, 'ts_standard_scaler': {}, 'nn_classifier': {'n_iter': 2, 'n_units_hidden': 6, 'lr': 0.0001}}}
2023-11-16 22:20:33 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6914154291152954, validation loss: 0.6911771297454834
2023-11-16 22:20:34 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.690178394317627, validation loss: 0.6934390664100647
2023-11-16 22:20:35 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6933540105819702, validation loss: 0.6896734833717346
2023-11-16 22:20:36 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.694333553314209, validation loss: 0.690070390701294
2023-11-16 22:20:37 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.6921719312667847, validation loss: 0.6909489631652832
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:search:435 |
Evaluation for <Pipeline with cde_classifier> scores:
[0.4736742424242424, 0.4737373737373737, 0.4657828282828283].
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:search:435 |
Evaluation for <Pipeline with ode_classifier> scores:
[0.47575757575757577, 0.4779040404040404, 0.4779671717171716].
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:search:435 |
Evaluation for <Pipeline with nn_classifier> scores:
[0.48478535353535346, 0.4361742424242424, 0.5071338383838383].
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:search:436 |
All estimator definitions searched:
['<Pipeline with cde_classifier>', '<Pipeline with ode_classifier>', '<Pipeline with nn_classifier>']
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:search:437 |
Best scores for each estimator searched:
[0.4737373737373737, 0.4779671717171716, 0.5071338383838383]
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:search:438 |
Best hyperparameters for each estimator searched:
[{'<candidates>(preprocessing.imputation.static)': 'static_tabular_imputer', '[static_tabular_imputer](imputer)': 'mean', '<candidates>(preprocessing.scaling.static)': 'static_minmax_scaler', '[static_minmax_scaler](clip)': False, '<candidates>(preprocessing.scaling.temporal)': 'ts_minmax_scaler', '[ts_minmax_scaler](clip)': False, '[cde_classifier](n_iter)': 2, '[cde_classifier](n_temporal_units_hidden)': 14, '[cde_classifier](lr)': 0.001}, {'<candidates>(preprocessing.imputation.static)': 'static_tabular_imputer', '[static_tabular_imputer](imputer)': 'most_frequent', '<candidates>(preprocessing.scaling.static)': 'static_standard_scaler', '[static_minmax_scaler](clip)': False, '<candidates>(preprocessing.scaling.temporal)': 'ts_standard_scaler', '[ts_minmax_scaler](clip)': False, '[ode_classifier](n_iter)': 2, '[ode_classifier](n_units_hidden)': 6, '[ode_classifier](lr)': 0.0001}, {'<candidates>(preprocessing.imputation.static)': 'static_tabular_imputer', '[static_tabular_imputer](imputer)': 'most_frequent', '<candidates>(preprocessing.scaling.static)': 'static_standard_scaler', '[static_minmax_scaler](clip)': False, '<candidates>(preprocessing.scaling.temporal)': 'ts_standard_scaler', '[ts_minmax_scaler](clip)': False, '[nn_classifier](n_iter)': 2, '[nn_classifier](n_units_hidden)': 6, '[nn_classifier](lr)': 0.0001}]
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:_create_estimator_with_hps:748 |
Selected score 0.5071338383838383 for <Pipeline with nn_classifier> with hyperparameters:
{'<candidates>(preprocessing.imputation.static)': 'static_tabular_imputer', '[static_tabular_imputer](imputer)': 'most_frequent', '<candidates>(preprocessing.scaling.static)': 'static_standard_scaler', '[static_minmax_scaler](clip)': False, '<candidates>(preprocessing.scaling.temporal)': 'ts_standard_scaler', '[ts_minmax_scaler](clip)': False, '[nn_classifier](n_iter)': 2, '[nn_classifier](n_units_hidden)': 6, '[nn_classifier](lr)': 0.0001}
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:_create_estimator_with_hps:748 |
Selected score 0.4779671717171716 for <Pipeline with ode_classifier> with hyperparameters:
{'<candidates>(preprocessing.imputation.static)': 'static_tabular_imputer', '[static_tabular_imputer](imputer)': 'most_frequent', '<candidates>(preprocessing.scaling.static)': 'static_standard_scaler', '[static_minmax_scaler](clip)': False, '<candidates>(preprocessing.scaling.temporal)': 'ts_standard_scaler', '[ts_minmax_scaler](clip)': False, '[ode_classifier](n_iter)': 2, '[ode_classifier](n_units_hidden)': 6, '[ode_classifier](lr)': 0.0001}
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:_create_estimator_with_hps:748 |
Selected score 0.4737373737373737 for <Pipeline with cde_classifier> with hyperparameters:
{'<candidates>(preprocessing.imputation.static)': 'static_tabular_imputer', '[static_tabular_imputer](imputer)': 'mean', '<candidates>(preprocessing.scaling.static)': 'static_minmax_scaler', '[static_minmax_scaler](clip)': False, '<candidates>(preprocessing.scaling.temporal)': 'ts_minmax_scaler', '[ts_minmax_scaler](clip)': False, '[cde_classifier](n_iter)': 2, '[cde_classifier](n_temporal_units_hidden)': 14, '[cde_classifier](lr)': 0.001}
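
The best hyperparameters are logged in a flattened form: keys like `<candidates>(preprocessing.scaling.static)` record which candidate was chosen for a step, and keys like `[static_tabular_imputer](imputer)` hold a plugin's hyperparameter. Entries for candidates that were not chosen can still appear (for example `[static_minmax_scaler](clip)` next to the selected `static_standard_scaler`), and the returned pipelines printed below do not include those plugins. The sketch below regroups such a dictionary into per-plugin parameters and drops the inactive entries. The key format is read off the log; the parsing rules and the helper `active_plugin_params` are assumptions for illustration, not TemporAI internals.

```python
import re

# Flattened best hyperparameters of <Pipeline with nn_classifier>, copied from the log above.
flat = {
    "<candidates>(preprocessing.imputation.static)": "static_tabular_imputer",
    "[static_tabular_imputer](imputer)": "most_frequent",
    "<candidates>(preprocessing.scaling.static)": "static_standard_scaler",
    "[static_minmax_scaler](clip)": False,
    "<candidates>(preprocessing.scaling.temporal)": "ts_standard_scaler",
    "[ts_minmax_scaler](clip)": False,
    "[nn_classifier](n_iter)": 2,
    "[nn_classifier](n_units_hidden)": 6,
    "[nn_classifier](lr)": 0.0001,
}

# All preprocessing candidates offered to this search (from the seeker arguments).
candidate_plugins = {
    "static_tabular_imputer",
    "static_minmax_scaler",
    "static_standard_scaler",
    "ts_minmax_scaler",
    "ts_standard_scaler",
}

PARAM_KEY = re.compile(r"^\[(?P<plugin>[^\]]+)\]\((?P<param>[^)]+)\)$")


def active_plugin_params(flat, candidate_plugins):
    """Group '[plugin](param)' entries per plugin, keeping chosen candidates and the predictor."""
    chosen = [v for k, v in flat.items() if k.startswith("<candidates>")]
    result = {name: {} for name in chosen}
    for key, value in flat.items():
        match = PARAM_KEY.match(key)
        if match is None:
            continue  # a '<candidates>(...)' entry, already handled above
        plugin = match["plugin"]
        if plugin in candidate_plugins and plugin not in chosen:
            continue  # parameter of a candidate that was not selected
        result.setdefault(plugin, {})[match["param"]] = value
    return result


print(active_plugin_params(flat, candidate_plugins))
# {'static_tabular_imputer': {'imputer': 'most_frequent'}, 'static_standard_scaler': {},
#  'ts_standard_scaler': {}, 'nn_classifier': {'n_iter': 2, 'n_units_hidden': 6, 'lr': 0.0001}}
```

The regrouped result has the same shape as the `plugin_params` dictionaries in the "Hyperparameters sampled from ..." log lines.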

# The best performing pipelines are returned, and can be used by calling .predict() and so on.

for method in best_pipelines:
    rich.pretty.pprint(method, indent_guides=False)

Pipeline(
    pipeline_seq='preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_standard_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.nn_classifier',
    predictor_category='prediction.one_off.classification',
    params={
        'static_tabular_imputer': {
            'imputer': 'most_frequent',
            'random_state': 0,
            'imputer_params': {'random_state': 0}
        },
        'static_standard_scaler': {'with_mean': True, 'with_std': True},
        'ts_standard_scaler': {'with_mean': True, 'with_std': True},
        'nn_classifier': {
            'n_static_units_hidden': 100,
            'n_static_layers_hidden': 2,
            'n_temporal_units_hidden': 102,
            'n_temporal_layers_hidden': 2,
            'n_iter': 2,
            'mode': 'RNN',
            'n_iter_print': 10,
            'batch_size': 100,
            'lr': 0.0001,
            'weight_decay': 0.001,
            'window_size': 1,
            'device': None,
            'dataloader_sampler': None,
            'dropout': 0,
            'nonlin': 'relu',
            'random_state': 0,
            'clipping_value': 1,
            'patience': 20,
            'train_ratio': 0.8
        }
    }
)

Pipeline(
    pipeline_seq='preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_standard_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.ode_classifier',
    predictor_category='prediction.one_off.classification',
    params={
        'static_tabular_imputer': {
            'imputer': 'most_frequent',
            'random_state': 0,
            'imputer_params': {'random_state': 0}
        },
        'static_standard_scaler': {'with_mean': True, 'with_std': True},
        'ts_standard_scaler': {'with_mean': True, 'with_std': True},
        'ode_classifier': {
            'n_units_hidden': 6,
            'n_layers_hidden': 1,
            'nonlin': 'relu',
            'dropout': 0,
            'atol': 0.01,
            'rtol': 0.01,
            'interpolation': 'cubic',
            'lr': 0.0001,
            'weight_decay': 0.001,
            'n_iter': 2,
            'batch_size': 500,
            'n_iter_print': 100,
            'random_state': 0,
            'patience': 10,
            'clipping_value': 1,
            'train_ratio': 0.8,
            'device': None,
            'dataloader_sampler': None
        }
    }
)

Pipeline(
    pipeline_seq='preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_minmax_scaler->prediction.one_off.classification.cde_classifier',
    predictor_category='prediction.one_off.classification',
    params={
        'static_tabular_imputer': {'imputer': 'mean', 'random_state': 0, 'imputer_params': {'random_state': 0}},
        'static_minmax_scaler': {'feature_range': [0, 1], 'clip': False},
        'ts_minmax_scaler': {'feature_range': [0, 1], 'clip': False},
        'cde_classifier': {
            'n_units_hidden': 100,
            'n_layers_hidden': 1,
            'nonlin': 'relu',
            'dropout': 0,
            'atol': 0.01,
            'rtol': 0.01,
            'interpolation': 'cubic',
            'lr': 0.001,
            'weight_decay': 0.001,
            'n_iter': 2,
            'batch_size': 500,
            'n_iter_print': 100,
            'random_state': 0,
            'patience': 10,
            'clipping_value': 1,
            'train_ratio': 0.8,
            'device': None,
            'dataloader_sampler': None
        }
    }
)

Advanced customization

You may further customize the AutoML tuning behavior by specifying the sampler and pruner, if desired.

See the below example.

# 1. Import a Tuner:
from tempor.automl.tuner import OptunaTuner

# 2. Customize this as needed:
import optuna

custom_tuner = OptunaTuner(
    study_name="my_automl_study",
    direction="maximize",
    # Customized sampler:
    study_sampler=optuna.samplers.TPESampler(seed=12345, n_startup_trials=3),
    # Customized pruner:
    study_pruner=optuna.pruners.MedianPruner(interval_steps=2),
    # Using the default optuna storage here; you may provide a custom one instead, e.g. redis.
    study_storage=None,
)

# 3. Pass the Tuner to the {Method/Pipeline}Seeker:
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    # Like so:
    custom_tuner=custom_tuner,
)

# 4. Execute the search:
# best_methods, best_scores = seeker.search()

2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:_set_up_tuners:341 | Setting up estimators and tuners for study my_automl_study.
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator cde_classifier.
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator ode_classifier.
2023-11-16 22:20:37 | INFO     | tempor.automl.seeker:_init_estimator:557 | Creating estimator nn_classifier.
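
The `MedianPruner` configured above applies the median stopping rule: at a pruning check, a trial is stopped if its best intermediate value so far is worse than the median of the values that earlier trials reported at the same step. The sketch below is a simplified, stdlib-only version of that rule for a maximized metric, written only to show the idea. It leaves out parts of optuna's implementation such as `n_startup_trials`, `n_warmup_steps` and `n_min_trials`, and whether TemporAI's tuner reports intermediate values for a given method is not covered in this tutorial. With `interval_steps=2`, as in the example, the rule is only checked on every second step.

```python
import statistics


def should_prune(current_values, previous_trials, step, interval_steps=1):
    """Simplified median stopping rule for a metric that is maximized.

    current_values: intermediate values of the running trial, indexed by step.
    previous_trials: intermediate-value lists of earlier finished trials.
    """
    if step % interval_steps != 0:
        return False  # not a pruning-check step
    # Values that earlier trials reported at this same step.
    peers = [values[step] for values in previous_trials if len(values) > step]
    if not peers:
        return False  # nothing to compare against yet
    best_so_far = max(current_values[: step + 1])
    return best_so_far < statistics.median(peers)


previous = [[0.50, 0.55, 0.60], [0.52, 0.58, 0.63], [0.45, 0.50, 0.52]]
print(should_prune([0.40, 0.42, 0.44], previous, step=2, interval_steps=2))  # True
print(should_prune([0.40, 0.42, 0.44], previous, step=1, interval_steps=2))  # False
```

In the first call the trial's best value (0.44) is below the median of its peers at step 2 (0.60), so it would be pruned; the second call falls between checks and is never evaluated.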

Supported tasks

⚠️ AutoML supports the same tasks as benchmarking. See the benchmarking tutorial.