
Data Tutorial 02: Datasets

This tutorial walks through the TemporAI dataset classes and shows how to initialize each one from pandas dataframes.

Prepare some example data

[ ]:
import pandas as pd
import numpy as np

# Some time series data:
time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0", "sample_0", "sample_0", "sample_0", "sample_1", "sample_1", "sample_2"],
        "time_idx": [1, 2, 3, 4, 2, 4, 9],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31],
        "t_feat_1": [1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 3.1],
        "t_feat_2": [10, 20, 30, 40, 11, 21, 111],
    }
)
time_series_df.set_index(keys=["sample_idx", "time_idx"], drop=True, inplace=True)

# Some static data:
static_df = pd.DataFrame(
    {
        "s_feat_0": [100, 200, 300],
        "s_feat_1": [-1.1, -1.2, -1.3],
        "s_feat_2": [0, 1, 0],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Some event data:
event_df = pd.DataFrame(
    {
        "e_feat_0": [(10, True), (12, False), (13, True)],
        "e_feat_1": [(10, False), (10, False), (11, True)],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

Preview the dataframes below.

[ ]:
time_series_df
                     t_feat_0  t_feat_1  t_feat_2
sample_idx time_idx
sample_0   1               11       1.1        10
           2               12       1.2        20
           3               13       1.3        30
           4               14       1.4        40
sample_1   2               21       2.1        11
           4               22       2.2        21
sample_2   9               31       3.1       111
[ ]:
static_df
          s_feat_0  s_feat_1  s_feat_2
sample_0       100      -1.1         0
sample_1       200      -1.2         1
sample_2       300      -1.3         0
[ ]:
event_df
             e_feat_0     e_feat_1
sample_0   (10, True)  (10, False)
sample_1  (12, False)  (10, False)
sample_2   (13, True)   (11, True)

CovariatesDataset

A CovariatesDataset contains only covariates: time series and, optionally, static covariates. It holds no predictive data (targets or treatments).

It can be used with preprocessing transformations.

[ ]:
from tempor.data import dataset
[ ]:
# Initialize a CovariatesDataset:
data = dataset.CovariatesDataset(
    time_series=time_series_df,
    static=static_df,  # Optional, can be `None`.
)

data
CovariatesDataset(
    time_series=TimeSeriesSamples([3, *, 3]),
    static=StaticSamples([3, 3])
)
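
The `*` in `TimeSeriesSamples([3, *, 3])` appears to stand for a time dimension whose length varies between samples. Reading the shape as [samples, time steps, features] is an assumption based on the example data, not something this tutorial states. The pandas-only sketch below (it does not call any TemporAI API) counts the three quantities directly from the example dataframe:

```python
import pandas as pd

# Rebuild the example time series from the data preparation step.
time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0"] * 4 + ["sample_1"] * 2 + ["sample_2"],
        "time_idx": [1, 2, 3, 4, 2, 4, 9],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31],
        "t_feat_1": [1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 3.1],
        "t_feat_2": [10, 20, 30, 40, 11, 21, 111],
    }
).set_index(["sample_idx", "time_idx"])

# Number of distinct samples (first level of the MultiIndex).
n_samples = time_series_df.index.get_level_values("sample_idx").nunique()

# Number of time steps recorded for each sample; these differ per sample.
steps_per_sample = time_series_df.groupby(level="sample_idx").size()

# Number of time series features (columns).
n_features = time_series_df.shape[1]

print(n_samples, steps_per_sample.to_dict(), n_features)
```

Here the samples have 4, 2 and 1 time steps respectively, so no single number could describe the middle dimension.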
[ ]:
data.time_series

TimeSeriesSamples with data:

                     t_feat_0  t_feat_1  t_feat_2
sample_idx time_idx
sample_0   1               11       1.1        10
           2               12       1.2        20
           3               13       1.3        30
           4               14       1.4        40
sample_1   2               21       2.1        11
           4               22       2.2        21
sample_2   9               31       3.1       111
[ ]:
data.static

StaticSamples with data:

            s_feat_0  s_feat_1  s_feat_2
sample_idx
sample_0         100      -1.1         0
sample_1         200      -1.2         1
sample_2         300      -1.3         0

OneOffPredictionDataset

A OneOffPredictionDataset contains time series and optionally static covariates.

It also needs StaticSamples prediction targets for estimators to be able to fit on this dataset.

It can be used with prediction.one_off estimators. The task is to predict some one-off value for each sample.

[ ]:
# Initialize a OneOffPredictionDataset:
data = dataset.OneOffPredictionDataset(
    time_series=time_series_df,
    static=static_df.loc[:, :"s_feat_1"],  # Optional, can be `None`.
    targets=static_df.loc[:, ["s_feat_2"]],  # Optional, can be `None` at inference time.
)

data
OneOffPredictionDataset(
    time_series=TimeSeriesSamples([3, *, 3]),
    static=StaticSamples([3, 2]),
    predictive=OneOffPredictionTaskData(targets=StaticSamples([3, 1]))
)
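
The shapes `StaticSamples([3, 2])` and `StaticSamples([3, 1])` follow from how pandas `.loc` selects columns. As a small pandas-only illustration (no TemporAI calls), the sketch below shows that a label slice includes its end label and that a list of labels keeps the result a dataframe:

```python
import pandas as pd

static_df = pd.DataFrame(
    {
        "s_feat_0": [100, 200, 300],
        "s_feat_1": [-1.1, -1.2, -1.3],
        "s_feat_2": [0, 1, 0],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Label slices in .loc include the end label: keeps s_feat_0 AND s_feat_1.
covariates = static_df.loc[:, :"s_feat_1"]

# A list of labels keeps the result a DataFrame with a single column.
targets = static_df.loc[:, ["s_feat_2"]]

# A bare label would return a Series instead of a DataFrame.
as_series = static_df.loc[:, "s_feat_2"]

print(list(covariates.columns), targets.shape, type(as_series).__name__)
# ['s_feat_0', 's_feat_1'] (3, 1) Series
```

The examples in this tutorial pass `["s_feat_2"]` rather than `"s_feat_2"`, so the targets stay a one-column dataframe.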
[ ]:
data.time_series

TimeSeriesSamples with data:

                     t_feat_0  t_feat_1  t_feat_2
sample_idx time_idx
sample_0   1               11       1.1        10
           2               12       1.2        20
           3               13       1.3        30
           4               14       1.4        40
sample_1   2               21       2.1        11
           4               22       2.2        21
sample_2   9               31       3.1       111
[ ]:
data.static

StaticSamples with data:

            s_feat_0  s_feat_1
sample_idx
sample_0         100      -1.1
sample_1         200      -1.2
sample_2         300      -1.3
[ ]:
data.predictive.targets

StaticSamples with data:

            s_feat_2
sample_idx
sample_0           0
sample_1           1
sample_2           0

TemporalPredictionDataset

A TemporalPredictionDataset contains time series and optionally static covariates.

It also needs TimeSeriesSamples prediction targets for estimators to be able to fit on this dataset.

It can be used with prediction.temporal estimators. The task is to predict some time series for each sample.

[ ]:
# Initialize a TemporalPredictionDataset:
data = dataset.TemporalPredictionDataset(
    time_series=time_series_df.loc[:, :"t_feat_1"],
    static=static_df,  # Optional, can be `None`.
    targets=time_series_df.loc[:, ["t_feat_2"]],  # Optional, can be `None` at inference time.
)

data
TemporalPredictionDataset(
    time_series=TimeSeriesSamples([3, *, 2]),
    static=StaticSamples([3, 3]),
    predictive=TemporalPredictionTaskData(
        targets=TimeSeriesSamples([3, *, 1])
    )
)
[ ]:
data.time_series

TimeSeriesSamples with data:

                     t_feat_0  t_feat_1
sample_idx time_idx
sample_0   1               11       1.1
           2               12       1.2
           3               13       1.3
           4               14       1.4
sample_1   2               21       2.1
           4               22       2.2
sample_2   9               31       3.1
[ ]:
data.static

StaticSamples with data:

            s_feat_0  s_feat_1  s_feat_2
sample_idx
sample_0         100      -1.1         0
sample_1         200      -1.2         1
sample_2         300      -1.3         0
[ ]:
data.predictive.targets

TimeSeriesSamples with data:

                     t_feat_2
sample_idx time_idx
sample_0   1               10
           2               20
           3               30
           4               40
sample_1   2               11
           4               21
sample_2   9              111

TimeToEventAnalysisDataset

A TimeToEventAnalysisDataset contains time series and optionally static covariates.

It also needs EventSamples prediction targets for estimators to be able to fit on this dataset.

It can be used with time_to_event estimators. The task is to predict risk scores for each sample.

[ ]:
# Initialize a TimeToEventAnalysisDataset:
data = dataset.TimeToEventAnalysisDataset(
    time_series=time_series_df,
    static=static_df,  # Optional, can be `None`.
    targets=event_df,  # Optional, can be `None` at inference time.
)

data
TimeToEventAnalysisDataset(
    time_series=TimeSeriesSamples([3, *, 3]),
    static=StaticSamples([3, 3]),
    predictive=TimeToEventAnalysisTaskData(targets=EventSamples([3, 2]))
)
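
Each cell of the event dataframe is a two-element tuple. Reading it as (event time, whether the event occurred) is an assumption drawn from the example values, not a definition given in this tutorial. Under that assumption, a pandas-only sketch for unpacking one event feature into separate columns could look like this (the column names `time` and `occurred` are made up for illustration):

```python
import pandas as pd

event_df = pd.DataFrame(
    {
        "e_feat_0": [(10, True), (12, False), (13, True)],
        "e_feat_1": [(10, False), (10, False), (11, True)],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Split one event feature into a time column and an indicator column.
# "time" and "occurred" are illustrative names, not TemporAI terminology.
e0 = pd.DataFrame(
    event_df["e_feat_0"].tolist(),
    columns=["time", "occurred"],
    index=event_df.index,
)

# Samples whose indicator is True for this event feature.
observed = e0[e0["occurred"]].index.tolist()
print(observed)  # ['sample_0', 'sample_2']
```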
[ ]:
data.time_series

TimeSeriesSamples with data:

                     t_feat_0  t_feat_1  t_feat_2
sample_idx time_idx
sample_0   1               11       1.1        10
           2               12       1.2        20
           3               13       1.3        30
           4               14       1.4        40
sample_1   2               21       2.1        11
           4               22       2.2        21
sample_2   9               31       3.1       111
[ ]:
data.static

StaticSamples with data:

            s_feat_0  s_feat_1  s_feat_2
sample_idx
sample_0         100      -1.1         0
sample_1         200      -1.2         1
sample_2         300      -1.3         0
[ ]:
data.predictive.targets

EventSamples with data:

               e_feat_0     e_feat_1
sample_idx
sample_0     (10, True)  (10, False)
sample_1    (12, False)  (10, False)
sample_2     (13, True)   (11, True)

OneOffTreatmentEffectsDataset

A OneOffTreatmentEffectsDataset contains time series and optionally static covariates.

It also needs TimeSeriesSamples prediction targets and EventSamples treatments for estimators to be able to fit on this dataset.

It can be used with treatments.one_off estimators. The task is to predict a time series counterfactual outcome based on a one-off treatment event.

[ ]:
# Initialize a OneOffTreatmentEffectsDataset:
data = dataset.OneOffTreatmentEffectsDataset(
    time_series=time_series_df.loc[:, :"t_feat_1"],
    static=static_df,  # Optional, can be `None`.
    targets=time_series_df.loc[:, ["t_feat_2"]],  # Optional, can be `None` at inference time.
    treatments=event_df.loc[:, ["e_feat_0"]],
)

data
OneOffTreatmentEffectsDataset(
    time_series=TimeSeriesSamples([3, *, 2]),
    static=StaticSamples([3, 3]),
    predictive=OneOffTreatmentEffectsTaskData(
        targets=TimeSeriesSamples([3, *, 1]),
        treatments=EventSamples([3, 1])
    )
)
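
This dataset combines three dataframes (time series, static, events) that all describe the same three samples. Before building such a dataset from your own data, it may help to confirm that the sample IDs line up. The sketch below is a pandas-only sanity check; `sample_ids_match` is a hypothetical helper written for this example and is not part of TemporAI, which may perform its own validation:

```python
import pandas as pd


def sample_ids_match(time_series_df: pd.DataFrame, *per_sample_dfs: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every dataframe covers the same set of sample IDs."""
    ts_ids = set(time_series_df.index.get_level_values("sample_idx"))
    return all(set(df.index) == ts_ids for df in per_sample_dfs)


# A reduced version of the example data, enough to exercise the check.
time_series_df = pd.DataFrame(
    {"t_feat_0": [11, 12, 21, 31]},
    index=pd.MultiIndex.from_tuples(
        [("sample_0", 1), ("sample_0", 2), ("sample_1", 2), ("sample_2", 9)],
        names=["sample_idx", "time_idx"],
    ),
)
sample_ids = ["sample_0", "sample_1", "sample_2"]
static_df = pd.DataFrame({"s_feat_0": [100, 200, 300]}, index=sample_ids)
event_df = pd.DataFrame({"e_feat_0": [(10, True), (12, False), (13, True)]}, index=sample_ids)

ok = sample_ids_match(time_series_df, static_df, event_df)
# Dropping a sample from the static data makes the check fail.
missing = sample_ids_match(time_series_df, static_df.drop(index="sample_2"))
print(ok, missing)  # True False
```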
[ ]:
data.time_series

TimeSeriesSamples with data:

                     t_feat_0  t_feat_1
sample_idx time_idx
sample_0   1               11       1.1
           2               12       1.2
           3               13       1.3
           4               14       1.4
sample_1   2               21       2.1
           4               22       2.2
sample_2   9               31       3.1
[ ]:
data.static

StaticSamples with data:

            s_feat_0  s_feat_1  s_feat_2
sample_idx
sample_0         100      -1.1         0
sample_1         200      -1.2         1
sample_2         300      -1.3         0
[ ]:
data.predictive.targets

TimeSeriesSamples with data:

                     t_feat_2
sample_idx time_idx
sample_0   1               10
           2               20
           3               30
           4               40
sample_1   2               11
           4               21
sample_2   9              111
[ ]:
data.predictive.treatments

EventSamples with data:

               e_feat_0
sample_idx
sample_0     (10, True)
sample_1    (12, False)
sample_2     (13, True)

TemporalTreatmentEffectsDataset

A TemporalTreatmentEffectsDataset contains time series and optionally static covariates.

It also needs TimeSeriesSamples prediction targets and TimeSeriesSamples treatments for estimators to be able to fit on this dataset.

It can be used with treatments.temporal estimators. The task is to predict a time series counterfactual outcome based on a time series treatment.

[ ]:
# Initialize a TemporalTreatmentEffectsDataset:
data = dataset.TemporalTreatmentEffectsDataset(
    time_series=time_series_df.loc[:, :"t_feat_0"],
    static=static_df,  # Optional, can be `None`.
    targets=time_series_df.loc[:, ["t_feat_1"]],  # Optional, can be `None` at inference time.
    treatments=time_series_df.loc[:, ["t_feat_2"]],
)

data
TemporalTreatmentEffectsDataset(
    time_series=TimeSeriesSamples([3, *, 1]),
    static=StaticSamples([3, 3]),
    predictive=TemporalTreatmentEffectsTaskData(
        targets=TimeSeriesSamples([3, *, 1]),
        treatments=TimeSeriesSamples([3, *, 1])
    )
)
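
In this example the covariates, targets and treatments are all carved out of one time series dataframe, one column each. A pandas-only sketch (no TemporAI calls) can confirm that the three pieces share the same (sample_idx, time_idx) index and do not reuse a column; whether TemporAI itself requires identical indexes is not stated here, so treat this as an illustrative check only:

```python
import pandas as pd

time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0"] * 4 + ["sample_1"] * 2 + ["sample_2"],
        "time_idx": [1, 2, 3, 4, 2, 4, 9],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31],
        "t_feat_1": [1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 3.1],
        "t_feat_2": [10, 20, 30, 40, 11, 21, 111],
    }
).set_index(["sample_idx", "time_idx"])

# The same column selections used in the cell above.
covariates = time_series_df.loc[:, :"t_feat_0"]   # slice ends at the first column
targets = time_series_df.loc[:, ["t_feat_1"]]
treatments = time_series_df.loc[:, ["t_feat_2"]]

# All three pieces keep the full (sample_idx, time_idx) index.
same_index = covariates.index.equals(targets.index) and covariates.index.equals(treatments.index)

# No column is used in more than one role.
all_columns = list(covariates.columns) + list(targets.columns) + list(treatments.columns)
no_overlap = len(all_columns) == len(set(all_columns))

print(same_index, no_overlap)  # True True
```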
[ ]:
data.time_series

TimeSeriesSamples with data:

                     t_feat_0
sample_idx time_idx
sample_0   1               11
           2               12
           3               13
           4               14
sample_1   2               21
           4               22
sample_2   9               31
[ ]:
data.static

StaticSamples with data:

            s_feat_0  s_feat_1  s_feat_2
sample_idx
sample_0         100      -1.1         0
sample_1         200      -1.2         1
sample_2         300      -1.3         0
[ ]:
data.predictive.targets

TimeSeriesSamples with data:

                     t_feat_1
sample_idx time_idx
sample_0   1              1.1
           2              1.2
           3              1.3
           4              1.4
sample_1   2              2.1
           4              2.2
sample_2   9              3.1
[ ]:
data.predictive.treatments

TimeSeriesSamples with data:

                     t_feat_2
sample_idx time_idx
sample_0   1               10
           2               20
           3               30
           4               40
sample_1   2               11
           4               21
sample_2   9              111