User Guide Tutorial 03: Preprocessing › Scaling
This tutorial shows how to use the TemporAI preprocessing.scaling plugins to rescale static and time series features.
All preprocessing.scaling plugins
To see all the relevant plugins:
from tempor import plugin_loader
plugin_loader.list()["preprocessing"]["scaling"]
{'static': ['static_minmax_scaler', 'static_standard_scaler'],
'temporal': ['ts_minmax_scaler', 'ts_standard_scaler']}
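The category keys and plugin names in this listing combine into the fully qualified names that are passed to plugin_loader.get() later in this tutorial (for example, preprocessing.scaling.static.static_minmax_scaler). A minimal sketch of that naming pattern follows. The helper qualified_names is hypothetical and not part of TemporAI; it only formats strings from a hand-copied version of the dictionary shown above.

```python
# Plugin listing copied by hand from the output above (not fetched from TemporAI).
scaling_plugins = {
    "static": ["static_minmax_scaler", "static_standard_scaler"],
    "temporal": ["ts_minmax_scaler", "ts_standard_scaler"],
}


def qualified_names(plugins, category="preprocessing.scaling"):
    """Hypothetical helper: build '<category>.<kind>.<name>' strings."""
    return [
        f"{category}.{kind}.{name}"
        for kind, names in plugins.items()
        for name in names
    ]


names = qualified_names(scaling_plugins)
for name in names:
    print(name)
```

Each printed string has the same shape as the names used with plugin_loader.get() in the sections below.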
Next, load the data source used in the rest of this tutorial:
SineDataSource = plugin_loader.get_class("prediction.one_off.sine", plugin_type="datasource")
Using a static data scaling plugin
from tempor import plugin_loader
dataset = SineDataSource(static_scale=5.0, random_state=42).load()
print(dataset)
model = plugin_loader.get("preprocessing.scaling.static.static_minmax_scaler")
print(model)
OneOffPredictionDataset(
time_series=TimeSeriesSamples([100, *, 5]),
static=StaticSamples([100, 4]),
predictive=OneOffPredictionTaskData(targets=StaticSamples([100, 1]))
)
StaticMinMaxScaler(
name='static_minmax_scaler',
category='preprocessing.scaling.static',
plugin_type='method',
params={'feature_range': [0, 1], 'clip': False}
)
# Note the scale of static features.
from IPython.display import display
print("Min, max values per feature:")
display(dataset.static.dataframe().describe().T.loc[:, ["min", "max"]]) # type: ignore
dataset.static
Min, max values per feature:
| feature | min | max |
|---|---|---|
| 0 | 0.025308 | 4.818100 |
| 1 | 0.045985 | 4.950269 |
| 2 | 0.102922 | 4.952526 |
| 3 | 0.082939 | 4.858910 |
StaticSamples with data:
| sample_idx | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1.872701 | 4.753572 | 3.659970 | 2.993292 |
| 1 | 0.780093 | 0.779973 | 0.290418 | 4.330881 |
| 2 | 3.005575 | 3.540363 | 0.102922 | 4.849549 |
| 3 | 4.162213 | 1.061696 | 0.909125 | 0.917023 |
| 4 | 1.521211 | 2.623782 | 2.159725 | 1.456146 |
| ... | ... | ... | ... | ... |
| 95 | 0.590824 | 3.483686 | 3.144714 | 4.387360 |
| 96 | 3.675355 | 4.017405 | 1.410173 | 0.887198 |
| 97 | 3.753074 | 4.034174 | 4.952526 | 2.063088 |
| 98 | 1.860090 | 3.882065 | 1.704018 | 4.653787 |
| 99 | 4.292064 | 2.144970 | 3.754355 | 3.772714 |
100 rows × 4 columns
# Note the new scale of static features.
dataset = model.fit_transform(dataset) # Or call fit() then transform().
print("Min, max values per feature:")
display(dataset.static.dataframe().describe().T.loc[:, ["min", "max"]]) # type: ignore
dataset.static
Min, max values per feature:
| feature | min | max |
|---|---|---|
| 0 | 0.0 | 1.0 |
| 1 | 0.0 | 1.0 |
| 2 | 0.0 | 1.0 |
| 3 | 0.0 | 1.0 |
StaticSamples with data:
| sample_idx | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0.385452 | 0.959893 | 0.733472 | 0.609374 |
| 1 | 0.157483 | 0.149662 | 0.038662 | 0.889440 |
| 2 | 0.621823 | 0.712515 | 0.000000 | 0.998040 |
| 3 | 0.863151 | 0.207107 | 0.166241 | 0.174642 |
| 4 | 0.312115 | 0.525621 | 0.424118 | 0.287524 |
| ... | ... | ... | ... | ... |
| 95 | 0.117993 | 0.700959 | 0.627225 | 0.901266 |
| 96 | 0.761570 | 0.809786 | 0.269558 | 0.168397 |
| 97 | 0.777786 | 0.813205 | 1.000000 | 0.414607 |
| 98 | 0.382821 | 0.782190 | 0.330150 | 0.957051 |
| 99 | 0.890244 | 0.427990 | 0.752934 | 0.772571 |
100 rows × 4 columns
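The before and after tables are consistent with the usual min-max formula, x_scaled = (x - min) / (max - min), applied per feature. The sketch below illustrates that formula and the separate fit() and transform() steps mentioned in the code comment. MinMaxSketch is a hypothetical pure-Python class written for this illustration, not TemporAI's implementation; its feature_range and clip arguments only mimic the parameter names printed for the plugin above, and their exact behavior in TemporAI is an assumption here.

```python
class MinMaxSketch:
    """Illustrative min-max scaler for a single feature (hypothetical, not TemporAI code)."""

    def __init__(self, feature_range=(0.0, 1.0), clip=False):
        self.low, self.high = feature_range
        self.clip = clip

    def fit(self, values):
        # Learn the observed minimum and maximum of the feature.
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        # Guard against a constant feature (zero span).
        span = (self.max_ - self.min_) or 1.0
        scaled = [
            self.low + (v - self.min_) / span * (self.high - self.low)
            for v in values
        ]
        if self.clip:
            # Optionally force results back into the target range.
            scaled = [min(max(s, self.low), self.high) for s in scaled]
        return scaled


# Feature 0 before scaling: its reported min and max, plus the values of samples 0 and 1.
feature_0 = [0.025308, 4.818100, 1.872701, 0.780093]
scaler = MinMaxSketch().fit(feature_0)
scaled = scaler.transform([1.872701, 0.780093])
print([round(s, 6) for s in scaled])

# A value outside the fitted range lands outside [0, 1] unless clipping is enabled.
unclipped = scaler.transform([5.5])[0]
clipped = MinMaxSketch(clip=True).fit(feature_0).transform([5.5])[0]
print(unclipped > 1.0, clipped)
```

With the feature 0 minimum and maximum from the first table, the two scaled values agree with the 0.385452 and 0.157483 shown for samples 0 and 1 in the scaled table. Keeping fit() separate from transform() matters when the statistics should be learned on one dataset and reused on another.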
Using a temporal data scaling plugin
from tempor import plugin_loader
dataset = SineDataSource(ts_scale=5.0, random_state=42).load()
print(dataset)
model = plugin_loader.get("preprocessing.scaling.temporal.ts_standard_scaler")
print(model)
OneOffPredictionDataset(
time_series=TimeSeriesSamples([100, *, 5]),
static=StaticSamples([100, 4]),
predictive=OneOffPredictionTaskData(targets=StaticSamples([100, 1]))
)
TimeSeriesStandardScaler(
name='ts_standard_scaler',
category='preprocessing.scaling.temporal',
plugin_type='method',
params={'with_mean': True, 'with_std': True}
)
# Note the scale of time series features.
from IPython.display import display
print("Min, max values per feature:")
display(dataset.time_series.dataframe().describe().T.loc[:, ["min", "max"]])
dataset.time_series
Min, max values per feature:
| feature | min | max |
|---|---|---|
| 0 | -4.999519 | 4.999999 |
| 1 | -4.999982 | 4.999999 |
| 2 | -4.999923 | 4.999992 |
| 3 | -4.999979 | 5.000000 |
| 4 | -4.999970 | 4.999928 |
TimeSeriesSamples with data:
| sample_idx | time_idx | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| 0 | 0 | -0.095075 | -0.240884 | -0.542729 | 2.209324 | 0.122539 |
| 0 | 1 | 1.500152 | 1.822750 | 2.882952 | 4.450264 | 2.673609 |
| 0 | 2 | 2.939520 | 3.567547 | 4.864963 | 4.930896 | 4.464728 |
| 0 | 3 | 4.073484 | 4.688307 | 4.410788 | 3.461105 | 4.986786 |
| 0 | 4 | 4.784229 | 4.988986 | 1.747861 | 0.622269 | 4.091392 |
| ... | ... | ... | ... | ... | ... | ... |
| 99 | 5 | 4.835604 | 0.634449 | 4.634897 | 4.910111 | 4.815565 |
| 99 | 6 | 3.532665 | 2.066645 | 2.845170 | 3.281070 | 4.946122 |
| 99 | 7 | 1.263739 | 3.315606 | 0.121903 | 0.253339 | 4.998853 |
| 99 | 8 | -1.350749 | 4.270597 | -2.641363 | -2.882388 | 4.972930 |
| 99 | 9 | -3.595883 | 4.846944 | -4.537960 | -4.789380 | 4.868760 |
1000 rows × 5 columns
# Note the new scale of time series features.
dataset = model.fit_transform(dataset) # Or call fit() then transform().
print("Min, max values per feature:")
display(dataset.time_series.dataframe().describe().T.loc[:, ["min", "max"]])
dataset.time_series
Min, max values per feature:
| feature | min | max |
|---|---|---|
| 0 | -1.711349 | 1.200819 |
| 1 | -1.724449 | 1.239101 |
| 2 | -1.734762 | 1.230568 |
| 3 | -1.592516 | 1.277314 |
| 4 | -1.728804 | 1.177170 |
TimeSeriesSamples with data:
| sample_idx | time_idx | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| 0 | 0 | -0.283024 | -0.314064 | -0.413046 | 0.476436 | -0.240201 |
| 0 | 1 | 0.181555 | 0.297505 | 0.602791 | 1.119549 | 0.501140 |
| 0 | 2 | 0.600744 | 0.814586 | 1.190527 | 1.257482 | 1.021640 |
| 0 | 3 | 0.930989 | 1.146729 | 1.055848 | 0.835676 | 1.173350 |
| 0 | 4 | 1.137980 | 1.235837 | 0.266196 | 0.020977 | 0.913149 |
| ... | ... | ... | ... | ... | ... | ... |
| 99 | 5 | 1.152942 | -0.054654 | 1.122305 | 1.251517 | 1.123594 |
| 99 | 6 | 0.773486 | 0.369785 | 0.591587 | 0.784009 | 1.161533 |
| 99 | 7 | 0.112705 | 0.739922 | -0.215959 | -0.084900 | 1.176857 |
| 99 | 8 | -0.648715 | 1.022938 | -1.035365 | -0.984802 | 1.169324 |
| 99 | 9 | -1.302567 | 1.193742 | -1.597773 | -1.532077 | 1.139052 |
1000 rows × 5 columns
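Unlike min-max scaling, standard scaling does not map values into a fixed interval, which is why the minimum and maximum above are neither 0 and 1 nor symmetric. The usual definition is z = (x - mean) / std per feature. The sketch below shows this on toy data. The standardize function is hypothetical and not TemporAI code; using the population standard deviation, and pooling all values of a feature into one list, are assumptions made for this illustration. The with_mean and with_std arguments only mimic the parameter names printed for the plugin above.

```python
from statistics import fmean, pstdev


def standardize(values, with_mean=True, with_std=True):
    """Hypothetical helper: z-score a single feature given as a flat list."""
    mean = fmean(values) if with_mean else 0.0
    std = pstdev(values) if with_std else 1.0
    # Guard against a constant feature (zero standard deviation).
    std = std or 1.0
    return [(v - mean) / std for v in values]


# Toy feature values (made up for this example, not taken from the dataset).
feature = [-5.0, -2.5, 0.0, 2.5, 5.0]
scaled = standardize(feature)

print([round(s, 4) for s in scaled])
print(round(fmean(scaled), 6), round(pstdev(scaled), 6))
```

After the transformation the feature has mean 0 and standard deviation 1, while its minimum and maximum depend on how the original values were distributed.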