Data Tutorial 02: Datasets
This tutorial introduces the TemporAI dataset classes and shows how to initialize each one from pandas dataframes.
Prepare some example data
[ ]:
import pandas as pd
import numpy as np
# Some time series data:
time_series_df = pd.DataFrame(
{
"sample_idx": ["sample_0", "sample_0", "sample_0", "sample_0", "sample_1", "sample_1", "sample_2"],
"time_idx": [1, 2, 3, 4, 2, 4, 9],
"t_feat_0": [11, 12, 13, 14, 21, 22, 31],
"t_feat_1": [1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 3.1],
"t_feat_2": [10, 20, 30, 40, 11, 21, 111],
}
)
time_series_df.set_index(keys=["sample_idx", "time_idx"], drop=True, inplace=True)
# Some static data:
static_df = pd.DataFrame(
{
"s_feat_0": [100, 200, 300],
"s_feat_1": [-1.1, -1.2, -1.3],
"s_feat_2": [0, 1, 0],
},
index=["sample_0", "sample_1", "sample_2"],
)
# Some event data:
event_df = pd.DataFrame(
{
"e_feat_0": [(10, True), (12, False), (13, True)],
"e_feat_1": [(10, False), (10, False), (11, True)],
},
index=["sample_0", "sample_1", "sample_2"],
)
Preview the dataframes below.
[ ]:
time_series_df
| sample_idx | time_idx | t_feat_0 | t_feat_1 | t_feat_2 |
|---|---|---|---|---|
| sample_0 | 1 | 11 | 1.1 | 10 |
| | 2 | 12 | 1.2 | 20 |
| | 3 | 13 | 1.3 | 30 |
| | 4 | 14 | 1.4 | 40 |
| sample_1 | 2 | 21 | 2.1 | 11 |
| | 4 | 22 | 2.2 | 21 |
| sample_2 | 9 | 31 | 3.1 | 111 |
[ ]:
static_df
| | s_feat_0 | s_feat_1 | s_feat_2 |
|---|---|---|---|
| sample_0 | 100 | -1.1 | 0 |
| sample_1 | 200 | -1.2 | 1 |
| sample_2 | 300 | -1.3 | 0 |
[ ]:
event_df
| | e_feat_0 | e_feat_1 |
|---|---|---|
| sample_0 | (10, True) | (10, False) |
| sample_1 | (12, False) | (10, False) |
| sample_2 | (13, True) | (11, True) |
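The three dataframes are linked only through their sample IDs: the first level of the time series index and the indexes of the static and event dataframes. The sketch below is a plain pandas sanity check, not part of the TemporAI API, and it uses shortened stand-ins for the tutorial dataframes. It assumes that a dataset should describe the same set of samples in every component.

```python
import pandas as pd

# Shortened stand-ins for the tutorial dataframes (same index layout).
time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0", "sample_0", "sample_1", "sample_2"],
        "time_idx": [1, 2, 2, 9],
        "t_feat_0": [11, 12, 21, 31],
    }
).set_index(["sample_idx", "time_idx"])
static_df = pd.DataFrame(
    {"s_feat_0": [100, 200, 300]},
    index=["sample_0", "sample_1", "sample_2"],
)
event_df = pd.DataFrame(
    {"e_feat_0": [(10, True), (12, False), (13, True)]},
    index=["sample_0", "sample_1", "sample_2"],
)

# Unique sample IDs from the first level of the time series MultiIndex.
ts_samples = set(time_series_df.index.get_level_values("sample_idx"))

# Illustrative check (an assumption, not a documented TemporAI requirement):
# all three dataframes refer to exactly the same samples.
same_samples = ts_samples == set(static_df.index) == set(event_df.index)
print(same_samples)
```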
CovariatesDataset
A CovariatesDataset contains covariates only: time series covariates and, optionally, static covariates. It holds no predictive data (targets or treatments).
It can be used with preprocessing transformations.
[ ]:
from tempor.data import dataset
[ ]:
# Initialize a CovariatesDataset:
data = dataset.CovariatesDataset(
time_series=time_series_df,
static=static_df, # Optional, can be `None`.
)
data
CovariatesDataset(
time_series=TimeSeriesSamples([3, *, 3]),
static=StaticSamples([3, 3])
)
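The printed shape `TimeSeriesSamples([3, *, 3])` lists 3 samples and 3 features, with `*` in the middle position. A plausible reading, stated here as an assumption, is that `*` stands for a number of time steps that differs between samples. The pandas-only sketch below counts the time steps per sample in a one-feature copy of the example time series.

```python
import pandas as pd

# One-feature copy of the tutorial time series, indexed by (sample_idx, time_idx).
time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0"] * 4 + ["sample_1"] * 2 + ["sample_2"],
        "time_idx": [1, 2, 3, 4, 2, 4, 9],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31],
    }
).set_index(["sample_idx", "time_idx"])

# Count rows (time steps) within each sample.
steps_per_sample = time_series_df.groupby(level="sample_idx").size()
print(steps_per_sample.to_dict())
```

Here the samples have 4, 2, and 1 time steps respectively, so no single length could be reported for the middle dimension.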
[ ]:
data.time_series
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_0 | t_feat_1 | t_feat_2 |
|---|---|---|---|---|
| sample_0 | 1 | 11 | 1.1 | 10 |
| | 2 | 12 | 1.2 | 20 |
| | 3 | 13 | 1.3 | 30 |
| | 4 | 14 | 1.4 | 40 |
| sample_1 | 2 | 21 | 2.1 | 11 |
| | 4 | 22 | 2.2 | 21 |
| sample_2 | 9 | 31 | 3.1 | 111 |
[ ]:
data.static
StaticSamples with data:
| sample_idx | s_feat_0 | s_feat_1 | s_feat_2 |
|---|---|---|---|
| sample_0 | 100 | -1.1 | 0 |
| sample_1 | 200 | -1.2 | 1 |
| sample_2 | 300 | -1.3 | 0 |
OneOffPredictionDataset
A OneOffPredictionDataset contains time series and optionally static covariates.
It also needs StaticSamples prediction targets for estimators to be able to fit on this dataset.
It can be used with prediction.one_off estimators. The task is to predict some one-off value for each sample.
[ ]:
# Initialize a OneOffPredictionDataset:
data = dataset.OneOffPredictionDataset(
time_series=time_series_df,
static=static_df.loc[:, :"s_feat_1"], # Optional, can be `None`.
targets=static_df.loc[:, ["s_feat_2"]], # Optional, can be `None` at inference time.
)
data
OneOffPredictionDataset(
time_series=TimeSeriesSamples([3, *, 3]),
static=StaticSamples([3, 2]),
predictive=OneOffPredictionTaskData(targets=StaticSamples([3, 1]))
)
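The split between static covariates and targets above relies on two pandas indexing details, illustrated in the sketch below: a label slice such as `:"s_feat_1"` includes its end label, and selecting with a list (`["s_feat_2"]`) returns a one-column DataFrame rather than a Series. The variable names `covariates`, `targets`, and `as_series` are illustrative only.

```python
import pandas as pd

static_df = pd.DataFrame(
    {
        "s_feat_0": [100, 200, 300],
        "s_feat_1": [-1.1, -1.2, -1.3],
        "s_feat_2": [0, 1, 0],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Label slices in .loc are inclusive, so this keeps s_feat_0 AND s_feat_1.
covariates = static_df.loc[:, :"s_feat_1"]

# A list selector keeps the result two-dimensional (shape (3, 1)).
targets = static_df.loc[:, ["s_feat_2"]]

# A scalar selector would return a Series instead.
as_series = static_df.loc[:, "s_feat_2"]

print(list(covariates.columns))
print(targets.shape, type(as_series).__name__)
```

This matches the shapes reported above: `StaticSamples([3, 2])` for the covariates and `StaticSamples([3, 1])` for the targets.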
[ ]:
data.time_series
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_0 | t_feat_1 | t_feat_2 |
|---|---|---|---|---|
| sample_0 | 1 | 11 | 1.1 | 10 |
| | 2 | 12 | 1.2 | 20 |
| | 3 | 13 | 1.3 | 30 |
| | 4 | 14 | 1.4 | 40 |
| sample_1 | 2 | 21 | 2.1 | 11 |
| | 4 | 22 | 2.2 | 21 |
| sample_2 | 9 | 31 | 3.1 | 111 |
[ ]:
data.static
StaticSamples with data:
| sample_idx | s_feat_0 | s_feat_1 |
|---|---|---|
| sample_0 | 100 | -1.1 |
| sample_1 | 200 | -1.2 |
| sample_2 | 300 | -1.3 |
[ ]:
data.predictive.targets
StaticSamples with data:
| sample_idx | s_feat_2 |
|---|---|
| sample_0 | 0 |
| sample_1 | 1 |
| sample_2 | 0 |
TemporalPredictionDataset
A TemporalPredictionDataset contains time series and optionally static covariates.
It also needs TimeSeriesSamples prediction targets for estimators to be able to fit on this dataset.
It can be used with prediction.temporal estimators. The task is to predict some time series for each sample.
[ ]:
# Initialize a TemporalPredictionDataset:
data = dataset.TemporalPredictionDataset(
time_series=time_series_df.loc[:, :"t_feat_1"],
static=static_df, # Optional, can be `None`.
targets=time_series_df.loc[:, ["t_feat_2"]], # Optional, can be `None` at inference time.
)
data
TemporalPredictionDataset(
time_series=TimeSeriesSamples([3, *, 2]),
static=StaticSamples([3, 3]),
predictive=TemporalPredictionTaskData(
targets=TimeSeriesSamples([3, *, 1])
)
)
[ ]:
data.time_series
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_0 | t_feat_1 |
|---|---|---|---|
| sample_0 | 1 | 11 | 1.1 |
| | 2 | 12 | 1.2 |
| | 3 | 13 | 1.3 |
| | 4 | 14 | 1.4 |
| sample_1 | 2 | 21 | 2.1 |
| | 4 | 22 | 2.2 |
| sample_2 | 9 | 31 | 3.1 |
[ ]:
data.static
StaticSamples with data:
| sample_idx | s_feat_0 | s_feat_1 | s_feat_2 |
|---|---|---|---|
| sample_0 | 100 | -1.1 | 0 |
| sample_1 | 200 | -1.2 | 1 |
| sample_2 | 300 | -1.3 | 0 |
[ ]:
data.predictive.targets
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_2 |
|---|---|---|
| sample_0 | 1 | 10 |
| | 2 | 20 |
| | 3 | 30 |
| | 4 | 40 |
| sample_1 | 2 | 11 |
| | 4 | 21 |
| sample_2 | 9 | 111 |
TimeToEventAnalysisDataset
A TimeToEventAnalysisDataset contains time series and optionally static covariates.
It also needs EventSamples prediction targets for estimators to be able to fit on this dataset.
It can be used with time_to_event estimators. The task is to predict risk scores for each sample.
[ ]:
# Initialize a TimeToEventAnalysisDataset:
data = dataset.TimeToEventAnalysisDataset(
time_series=time_series_df,
static=static_df, # Optional, can be `None`.
targets=event_df, # Optional, can be `None` at inference time.
)
data
TimeToEventAnalysisDataset(
time_series=TimeSeriesSamples([3, *, 3]),
static=StaticSamples([3, 3]),
predictive=TimeToEventAnalysisTaskData(targets=EventSamples([3, 2]))
)
[ ]:
data.time_series
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_0 | t_feat_1 | t_feat_2 |
|---|---|---|---|---|
| sample_0 | 1 | 11 | 1.1 | 10 |
| | 2 | 12 | 1.2 | 20 |
| | 3 | 13 | 1.3 | 30 |
| | 4 | 14 | 1.4 | 40 |
| sample_1 | 2 | 21 | 2.1 | 11 |
| | 4 | 22 | 2.2 | 21 |
| sample_2 | 9 | 31 | 3.1 | 111 |
[ ]:
data.static
StaticSamples with data:
| sample_idx | s_feat_0 | s_feat_1 | s_feat_2 |
|---|---|---|---|
| sample_0 | 100 | -1.1 | 0 |
| sample_1 | 200 | -1.2 | 1 |
| sample_2 | 300 | -1.3 | 0 |
[ ]:
data.predictive.targets
EventSamples with data:
| sample_idx | e_feat_0 | e_feat_1 |
|---|---|---|
| sample_0 | (10, True) | (10, False) |
| sample_1 | (12, False) | (10, False) |
| sample_2 | (13, True) | (11, True) |
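Each cell of the event dataframe is a two-element tuple. The sketch below assumes, without confirmation from this tutorial, that the first element is the event time and the second is a flag for whether the event occurred. Under that assumption, one event feature can be unpacked into two ordinary columns with plain pandas; the column names `time` and `occurred` are hypothetical.

```python
import pandas as pd

event_df = pd.DataFrame(
    {
        "e_feat_0": [(10, True), (12, False), (13, True)],
        "e_feat_1": [(10, False), (10, False), (11, True)],
    },
    index=["sample_0", "sample_1", "sample_2"],
)

# Assumption: each cell is (event time, event occurred).
unpacked = pd.DataFrame(
    {
        "time": event_df["e_feat_0"].map(lambda cell: cell[0]),
        "occurred": event_df["e_feat_0"].map(lambda cell: cell[1]),
    }
)
print(unpacked)
```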
OneOffTreatmentEffectsDataset
A OneOffTreatmentEffectsDataset contains time series and optionally static covariates.
It also needs TimeSeriesSamples prediction targets and EventSamples treatments for estimators to be able to fit on this dataset.
It can be used with treatments.one_off estimators. The task is to predict a time series counterfactual outcome based on a one-off treatment event.
[ ]:
# Initialize a OneOffTreatmentEffectsDataset:
data = dataset.OneOffTreatmentEffectsDataset(
time_series=time_series_df.loc[:, :"t_feat_1"],
static=static_df, # Optional, can be `None`.
targets=time_series_df.loc[:, ["t_feat_2"]], # Optional, can be `None` at inference time.
treatments=event_df.loc[:, ["e_feat_0"]],
)
data
OneOffTreatmentEffectsDataset(
time_series=TimeSeriesSamples([3, *, 2]),
static=StaticSamples([3, 3]),
predictive=OneOffTreatmentEffectsTaskData(
targets=TimeSeriesSamples([3, *, 1]),
treatments=EventSamples([3, 1])
)
)
[ ]:
data.time_series
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_0 | t_feat_1 |
|---|---|---|---|
| sample_0 | 1 | 11 | 1.1 |
| | 2 | 12 | 1.2 |
| | 3 | 13 | 1.3 |
| | 4 | 14 | 1.4 |
| sample_1 | 2 | 21 | 2.1 |
| | 4 | 22 | 2.2 |
| sample_2 | 9 | 31 | 3.1 |
[ ]:
data.static
StaticSamples with data:
| sample_idx | s_feat_0 | s_feat_1 | s_feat_2 |
|---|---|---|---|
| sample_0 | 100 | -1.1 | 0 |
| sample_1 | 200 | -1.2 | 1 |
| sample_2 | 300 | -1.3 | 0 |
[ ]:
data.predictive.targets
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_2 |
|---|---|---|
| sample_0 | 1 | 10 |
| | 2 | 20 |
| | 3 | 30 |
| | 4 | 40 |
| sample_1 | 2 | 11 |
| | 4 | 21 |
| sample_2 | 9 | 111 |
[ ]:
data.predictive.treatments
EventSamples with data:
| sample_idx | e_feat_0 |
|---|---|
| sample_0 | (10, True) |
| sample_1 | (12, False) |
| sample_2 | (13, True) |
TemporalTreatmentEffectsDataset
A TemporalTreatmentEffectsDataset contains time series and optionally static covariates.
It also needs TimeSeriesSamples prediction targets and TimeSeriesSamples treatments for estimators to be able to fit on this dataset.
It can be used with treatments.temporal estimators. The task is to predict a time series counterfactual outcome based on a time series treatment.
[ ]:
# Initialize a TemporalTreatmentEffectsDataset:
data = dataset.TemporalTreatmentEffectsDataset(
time_series=time_series_df.loc[:, :"t_feat_0"],
static=static_df, # Optional, can be `None`.
targets=time_series_df.loc[:, ["t_feat_1"]], # Optional, can be `None` at inference time.
treatments=time_series_df.loc[:, ["t_feat_2"]],
)
data
TemporalTreatmentEffectsDataset(
time_series=TimeSeriesSamples([3, *, 1]),
static=StaticSamples([3, 3]),
predictive=TemporalTreatmentEffectsTaskData(
targets=TimeSeriesSamples([3, *, 1]),
treatments=TimeSeriesSamples([3, *, 1])
)
)
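In this example the covariates, targets, and treatments are all column slices of one dataframe, so they share the same `(sample_idx, time_idx)` rows and have non-overlapping columns. The pandas-only sketch below verifies both properties on a copy of the example data. Whether TemporAI requires this alignment is not stated in this tutorial, so treat the check as an illustrative assumption; the variable names are hypothetical.

```python
import pandas as pd

time_series_df = pd.DataFrame(
    {
        "sample_idx": ["sample_0"] * 4 + ["sample_1"] * 2 + ["sample_2"],
        "time_idx": [1, 2, 3, 4, 2, 4, 9],
        "t_feat_0": [11, 12, 13, 14, 21, 22, 31],
        "t_feat_1": [1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 3.1],
        "t_feat_2": [10, 20, 30, 40, 11, 21, 111],
    }
).set_index(["sample_idx", "time_idx"])

# The same three slices used to build the dataset above.
covariates = time_series_df.loc[:, :"t_feat_0"]
targets = time_series_df.loc[:, ["t_feat_1"]]
treatments = time_series_df.loc[:, ["t_feat_2"]]

# All three parts cover identical (sample_idx, time_idx) rows.
rows_aligned = covariates.index.equals(targets.index) and covariates.index.equals(
    treatments.index
)

# No column appears in more than one part.
all_columns = list(covariates.columns) + list(targets.columns) + list(treatments.columns)
columns_disjoint = len(all_columns) == len(set(all_columns))

print(rows_aligned, columns_disjoint)
```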
[ ]:
data.time_series
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_0 |
|---|---|---|
| sample_0 | 1 | 11 |
| | 2 | 12 |
| | 3 | 13 |
| | 4 | 14 |
| sample_1 | 2 | 21 |
| | 4 | 22 |
| sample_2 | 9 | 31 |
[ ]:
data.static
StaticSamples with data:
| sample_idx | s_feat_0 | s_feat_1 | s_feat_2 |
|---|---|---|---|
| sample_0 | 100 | -1.1 | 0 |
| sample_1 | 200 | -1.2 | 1 |
| sample_2 | 300 | -1.3 | 0 |
[ ]:
data.predictive.targets
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_1 |
|---|---|---|
| sample_0 | 1 | 1.1 |
| | 2 | 1.2 |
| | 3 | 1.3 |
| | 4 | 1.4 |
| sample_1 | 2 | 2.1 |
| | 4 | 2.2 |
| sample_2 | 9 | 3.1 |
[ ]:
data.predictive.treatments
TimeSeriesSamples with data:
| sample_idx | time_idx | t_feat_2 |
|---|---|---|
| sample_0 | 1 | 10 |
| | 2 | 20 |
| | 3 | 30 |
| | 4 | 40 |
| sample_1 | 2 | 11 |
| | 4 | 21 |
| sample_2 | 9 | 111 |