User Guide Tutorial 07: Pipeline

This tutorial shows how to use TemporAI Pipelines.

TemporAI Pipeline

A TemporAI Pipeline allows you to combine multiple plugins into one; it is inspired by the scikit-learn pipeline.

  • All but the final plugin in the pipeline must be data transformers (the preprocessing plugin category).

  • The final plugin must be a predictive plugin (any of the prediction, time_to_event, or treatments plugin categories).
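
The two rules above can be checked from the step names alone. The following is a minimal sketch, not part of TemporAI: validate_steps is a hypothetical helper, and it assumes that the plugin category is the dotted prefix of each plugin name, as in the names used in this tutorial.

```python
# Hypothetical helper (not a TemporAI API): checks the step ordering rules
# using only the dotted plugin names.
PREDICTIVE_PREFIXES = ("prediction.", "time_to_event.", "treatments.")


def validate_steps(steps):
    """Raise ValueError if the step names break the pipeline ordering rules."""
    if not steps:
        # Assumption of this sketch: a pipeline has at least its final predictive step.
        raise ValueError("Expected at least one step (the predictive plugin).")
    *transformers, predictor = steps
    for name in transformers:
        if not name.startswith("preprocessing."):
            raise ValueError(f"Non-final step must be a preprocessing plugin: {name!r}")
    if not predictor.startswith(PREDICTIVE_PREFIXES):
        raise ValueError(f"Final step must be a predictive plugin: {predictor!r}")


# A valid definition passes silently.
validate_steps(
    [
        "preprocessing.scaling.temporal.ts_minmax_scaler",
        "prediction.one_off.classification.nn_classifier",
    ]
)
```

Swapping the two steps in the call above would raise a ValueError, because a predictive plugin would then appear before the final position.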

When fitting, each preprocessing step is fitted and applied to the data in sequence, and the final predictive plugin is then fitted on the transformed data.

When predicting, the data is transformed again by the fitted preprocessing steps, and the final predictive plugin makes the prediction.
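
The fit and predict flow described above can be illustrated with a toy example. This is a sketch under stated assumptions, not TemporAI's implementation: ToyPipeline, MinMaxScale, and ThresholdPredictor are hypothetical stand-ins that operate on plain lists of numbers.

```python
# Toy stand-ins (hypothetical, not TemporAI classes) that illustrate the flow.
class MinMaxScale:
    def fit(self, data):
        self.lo, self.hi = min(data), max(data)
        return self

    def transform(self, data):
        span = (self.hi - self.lo) or 1.0  # Avoid division by zero on constant data.
        return [(x - self.lo) / span for x in data]


class ThresholdPredictor:
    def fit(self, data):
        self.threshold = sum(data) / len(data)
        return self

    def predict(self, data):
        return [int(x > self.threshold) for x in data]


class ToyPipeline:
    def __init__(self, transformers, predictor):
        self.transformers = transformers
        self.predictor = predictor

    def fit(self, data):
        # Fit each transformer on the output of the previous one,
        # then fit the predictor on the fully transformed data.
        for step in self.transformers:
            data = step.fit(data).transform(data)
        self.predictor.fit(data)
        return self

    def predict(self, data):
        # Re-apply the already fitted transformers, then predict.
        for step in self.transformers:
            data = step.transform(data)
        return self.predictor.predict(data)


toy = ToyPipeline([MinMaxScale()], ThresholdPredictor()).fit([0.0, 2.0, 4.0, 10.0])
print(toy.predict([1.0, 9.0]))  # prints [0, 1]
```

Note that predict reuses the scaling range learned during fit rather than refitting on the new data, which is the point of keeping the preprocessing steps inside the pipeline.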

Note:

All pipelines follow the PipelineBase interface; see the API reference for details.

Example

Below is an example of a pipeline ending with prediction.one_off.classification.nn_classifier.

Initializing the pipeline takes the following steps:

  1. Use the pipeline() function to create a pipeline class from a list of strings denoting its steps.

  2. Instantiate the pipeline class. The initialization arguments to each component plugin can be passed as a dictionary at this step.

  3. Use the pipeline like any other TemporAI estimator (call .fit(...), .predict(...), and so on).
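
The create-a-class-then-instantiate pattern in the first two steps can be sketched with a small class factory. This is only an illustration of the pattern, not how pipeline() is implemented: make_pipeline_class and its attributes are hypothetical, and keying the parameters by the last component of each step name is an assumption based on the params shown in the printed pipeline instance further down.

```python
def make_pipeline_class(step_names):
    """Hypothetical class factory mirroring the create-then-instantiate pattern."""
    # Assumption: each plugin is addressed by the last dotted component of its name.
    short_names = [name.split(".")[-1] for name in step_names]

    class Pipeline:
        steps = tuple(step_names)

        def __init__(self, params=None):
            params = params or {}
            # Every step gets its own dict of init arguments (empty if none were given).
            self.params = {name: dict(params.get(name, {})) for name in short_names}

    return Pipeline


# Step 1: build the class from the step names.
ToyPipelineClass = make_pipeline_class(
    [
        "preprocessing.imputation.temporal.bfill",
        "prediction.one_off.classification.nn_classifier",
    ]
)
# Step 2: instantiate it, passing per-plugin arguments as a dictionary.
toy_pipe = ToyPipelineClass({"nn_classifier": {"n_iter": 100}})
print(toy_pipe.params)  # prints {'bfill': {}, 'nn_classifier': {'n_iter': 100}}
```

Returning a class rather than an instance means the same pipeline definition can be instantiated several times with different arguments.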

[ ]:
from rich.pretty import pprint  # For fancy printing only.
[ ]:
from tempor.methods.pipeline import pipeline

# 1. Create a pipeline class based on your desired definition of the pipeline.
PipelineClass = pipeline(
    # Provide plugin names for the pipeline, in order.
    [
        # Preprocessing (data transformer) plugins:
        "preprocessing.imputation.temporal.bfill",
        "preprocessing.imputation.static.static_tabular_imputer",
        "preprocessing.imputation.temporal.ts_tabular_imputer",
        "preprocessing.scaling.temporal.ts_minmax_scaler",
        # Prediction plugin:
        "prediction.one_off.classification.nn_classifier",
    ],
)
print("Pipeline class:")
print(PipelineClass)

print("\nPipeline base classes (note `PipelineBase`):")
pprint(PipelineClass.mro())

# 2. Instantiate the pipeline class.
pipe = PipelineClass(
    # Initialization arguments for each plugin are passed as a dictionary keyed by
    # plugin name (the last component of the step string):
    {
        "static_tabular_imputer": {"imputer": "ice", "random_state": 42},
        "nn_classifier": {"n_iter": 100},
    }
)

print("Pipeline instance:")
pprint(pipe)
Pipeline class:
<class 'tempor.methods.pipeline.pipeline.<locals>.Pipeline'>

Pipeline base classes (note `PipelineBase`):
[
<class 'tempor.methods.pipeline.pipeline.<locals>.Pipeline'>,
<class 'tempor.methods.pipeline.PipelineBase'>,
<class 'tempor.methods.prediction.one_off.classification.BaseOneOffClassifier'>,
<class 'tempor.methods.core._base_predictor.BasePredictor'>,
<class 'tempor.methods.core._base_estimator.BaseEstimator'>,
<class 'tempor.core.plugins.Plugin'>,
<class 'abc.ABC'>,
<class 'object'>
]
2023-11-16 22:18:41 | INFO     | hyperimpute.logger:log_and_print:65 | Iteration imputation: select_model_by_column: True, select_model_by_iteration: True
2023-11-16 22:18:41 | INFO     | hyperimpute.logger:log_and_print:65 | Iteration imputation: select_model_by_column: True, select_model_by_iteration: True
Pipeline instance:
Pipeline(
pipeline_seq='preprocessing.imputation.temporal.bfill->preprocessing.imputation.static.static_tabular_imputer->preprocessing.imputation.temporal.ts_tabular_imputer->preprocessing.scaling.temporal.ts_minmax_scaler->prediction.one_off.classification.nn_classifier',
predictor_category='prediction.one_off.classification',
params={
│   │   'bfill': {},
│   │   'static_tabular_imputer': {'imputer': 'ice', 'random_state': 0, 'imputer_params': {'random_state': 0}},
│   │   'ts_tabular_imputer': {'imputer': 'ice', 'random_state': 0, 'imputer_params': {'random_state': 0}},
│   │   'ts_minmax_scaler': {'feature_range': [0, 1], 'clip': False},
│   │   'nn_classifier': {
│   │   │   'n_static_units_hidden': 100,
│   │   │   'n_static_layers_hidden': 2,
│   │   │   'n_temporal_units_hidden': 102,
│   │   │   'n_temporal_layers_hidden': 2,
│   │   │   'n_iter': 100,
│   │   │   'mode': 'RNN',
│   │   │   'n_iter_print': 10,
│   │   │   'batch_size': 100,
│   │   │   'lr': 0.001,
│   │   │   'weight_decay': 0.001,
│   │   │   'window_size': 1,
│   │   │   'device': None,
│   │   │   'dataloader_sampler': None,
│   │   │   'dropout': 0,
│   │   │   'nonlin': 'relu',
│   │   │   'random_state': 0,
│   │   │   'clipping_value': 1,
│   │   │   'patience': 20,
│   │   │   'train_ratio': 0.8
│   │   }
}
)
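
In the printed instance, the keys of params are the last component of each step name in pipeline_seq. A key in the initialization dictionary that matches none of these names is worth catching before instantiation. The following is a minimal sketch of such a check; unknown_param_keys is a hypothetical helper, not a TemporAI function, and it relies on the key naming observed in the output above.

```python
def unknown_param_keys(pipeline_seq, params):
    """Return the keys of params that match no step in a '->'-joined pipeline_seq."""
    # Assumption: parameters are keyed by the last dotted component of each step name.
    known = {step.split(".")[-1] for step in pipeline_seq.split("->")}
    return [key for key in params if key not in known]


seq = (
    "preprocessing.imputation.static.static_tabular_imputer"
    "->prediction.one_off.classification.nn_classifier"
)
print(unknown_param_keys(seq, {"static_imputer": {"random_state": 42}, "nn_classifier": {"n_iter": 100}}))
# prints ['static_imputer']
```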

Using the Pipeline:

[ ]:
from tempor import plugin_loader

dataset = plugin_loader.get("prediction.one_off.sine", plugin_type="datasource", random_state=42).load()

# Fit:
pipe.fit(dataset)

# Predict:
pipe.predict(dataset)  # This will transform the data and then predict.
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > HyperImpute using inner optimization
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 0
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 1
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 2
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 3
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 4
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 5
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 6
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |      >>>> Early stopping on objective diff iteration
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > HyperImpute using inner optimization
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 0
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 1
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 2
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 3
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 4
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 5
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 6
2023-11-16 22:18:43 | INFO     | hyperimpute.logger:log_and_print:65 |      >>>> Early stopping on objective diff iteration
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:0| train loss: 0.688586950302124, validation loss: 0.6886438131332397
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:10| train loss: 0.6998531222343445, validation loss: 0.6888602375984192
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:20| train loss: 0.6862186789512634, validation loss: 0.689547061920166
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:30| train loss: 0.6913825273513794, validation loss: 0.6873278617858887
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:40| train loss: 0.6604955196380615, validation loss: 0.6891502141952515
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:50| train loss: 0.6746630072593689, validation loss: 0.6878217458724976
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:60| train loss: 0.7084730267524719, validation loss: 0.6860195398330688
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:70| train loss: 0.6851019263267517, validation loss: 0.6836397647857666
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:80| train loss: 0.6695641279220581, validation loss: 0.6823516488075256
2023-11-16 22:18:46 | INFO     | tempor.models.ts_model:_train:388 | Epoch:90| train loss: 0.6456663608551025, validation loss: 0.6871115565299988
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > HyperImpute using inner optimization
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 0
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 1
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 2
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 3
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 4
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 5
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 6
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |      >>>> Early stopping on objective diff iteration
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > HyperImpute using inner optimization
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 0
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 1
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 2
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 3
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 4
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 5
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 6
2023-11-16 22:18:47 | INFO     | hyperimpute.logger:log_and_print:65 |      >>>> Early stopping on objective diff iteration

StaticSamples with data:

            feat_0
sample_idx
0              1.0
1              1.0
2              1.0
3              1.0
4              0.0
...            ...
95             1.0
96             1.0
97             1.0
98             1.0
99             1.0

100 rows × 1 columns