tempor.data.samples_experimental module

Module with experimental samples implementations.

class tempor.data.samples_experimental.StaticSamplesDask(data: DataFrame | ndarray, **kwargs: Any)

Bases: StaticSamplesBase

Create a StaticSamplesDask object from the data.

Parameters:
data : data_typing.DataContainer

A container with the data.

**kwargs : Any

Any additional keyword arguments to pass to the constructor.

static from_dataframe(dataframe: DataFrame, **kwargs: Any) → StaticSamplesDask

Create a StaticSamplesDask object from a pandas.DataFrame. The rows represent samples and the columns represent features.

Parameters:
dataframe : pd.DataFrame

The dataframe that represents the data.

**kwargs : Any

Any additional keyword arguments to pass to the constructor.

Returns:

The StaticSamplesDask object created from the dataframe.

Return type:

StaticSamplesDask
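The snippet below is a minimal sketch of the input layout described above, built with pandas: one row per sample and one column per feature. The sample IDs, the index name `sample_idx` and the feature names are illustrative only. The commented-out call assumes tempor and dask are installed and is not executed here.

```python
import pandas as pd

# Illustrative static data: one row per sample, one column per feature.
static_df = pd.DataFrame(
    {"age": [63, 47, 55], "sex": [0, 1, 1]},
    index=pd.Index(["a", "b", "c"], name="sample_idx"),
)

# Hypothetical usage (requires tempor and dask, not run here):
# from tempor.data.samples_experimental import StaticSamplesDask
# samples = StaticSamplesDask.from_dataframe(static_df)

# Rows correspond to samples, columns to features.
n_samples, n_features = static_df.shape
print(n_samples, n_features)
```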

static from_numpy(array: ndarray, *, sample_index: list[int] | list[str] | None = None, feature_index: list[str] | None = None, **kwargs: Any) → StaticSamplesDask

Not implemented yet.

numpy(**kwargs: Any) → ndarray

Return the data as a numpy.ndarray.

Parameters:
**kwargs : Any

Any additional keyword arguments. Currently unused.

Returns:

The numpy.ndarray.

Return type:

np.ndarray

dataframe(**kwargs: Any) → DataFrame

Return the data as a pandas.DataFrame.

Parameters:
**kwargs : Any

Any additional keyword arguments. Currently unused.

Returns:

The dataframe.

Return type:

pd.DataFrame

sample_index() → list[int] | list[str]

Return a list representing sample indexes.

Returns:

Sample indexes.

Return type:

data_typing.SampleIndex

property num_samples : int

Return number of samples.

Returns:

Number of samples.

Return type:

int

property num_features : int

Return number of features.

Returns:

Number of features.

Return type:

int

short_repr() → str

A short string representation of the object.

Returns:

The short representation.

Return type:

str

category : ClassVar[plugin_typing.PluginCategory] = 'static_samples'

Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.

name : ClassVar[plugin_typing.PluginName] = 'static_samples_dask'

Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.

plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'

Plugin type, such as 'method'. May optionally be set by the plugin class using @register_plugin; otherwise the default plugin type is used.

tempor.data.samples_experimental.multiindex_df_to_compatible_ddf(df: DataFrame, **kwargs: Any) → DataFrame

Convert a pandas dataframe with a row multiindex to a dask dataframe whose index is a single level of tuples.

tempor.data.samples_experimental.compatible_ddf_to_multiindex_df(ddf: DataFrame) → DataFrame

Convert a dask dataframe whose index is a single level of tuples back to a pandas dataframe with a row multiindex.
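The index conversion that these two helpers describe can be sketched in plain pandas. Dask DataFrames do not support a row MultiIndex, which is presumably why the tuple-index form exists. This is an illustrative round trip, not tempor's implementation. The helper names `multiindex_to_tuple_index` and `tuple_index_to_multiindex` are hypothetical, and the dask step appears only as a comment.

```python
import pandas as pd


def multiindex_to_tuple_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flatten a row MultiIndex into one level of tuples."""
    out = df.copy()
    out.index = df.index.to_flat_index()
    return out


def tuple_index_to_multiindex(df: pd.DataFrame, names: list) -> pd.DataFrame:
    """Hypothetical helper: rebuild a row MultiIndex from a tuple index."""
    out = df.copy()
    out.index = pd.MultiIndex.from_tuples(list(df.index), names=names)
    return out


ts_df = pd.DataFrame(
    {"hr": [80.0, 82.0, 70.0]},
    index=pd.MultiIndex.from_tuples(
        [("a", 0), ("a", 1), ("b", 0)], names=["sample_idx", "time_idx"]
    ),
)

flat = multiindex_to_tuple_index(ts_df)
# A dask dataframe could then be built from the flat frame, for example:
# import dask.dataframe as dd; ddf = dd.from_pandas(flat, npartitions=1)
restored = tuple_index_to_multiindex(flat, names=["sample_idx", "time_idx"])
pd.testing.assert_frame_equal(restored, ts_df)  # round trip is lossless
```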

class tempor.data.samples_experimental.TimeSeriesSamplesDask(data: DataFrame | ndarray, **kwargs: Any)

Bases: TimeSeriesSamplesBase

Create a TimeSeriesSamplesDask object from the data.

Parameters:
data : data_typing.DataContainer

A container with the data.

**kwargs : Any

Any additional keyword arguments to pass to the constructor.

static from_dataframe(dataframe: DataFrame, **kwargs: Any) → TimeSeriesSamplesDask

Create TimeSeriesSamplesDask from a pandas.DataFrame. The row index of the dataframe should be a 2-level multiindex (sample, timestep). The columns should be the features.

Parameters:
dataframe : pd.DataFrame

The dataframe that contains the data.

**kwargs : Any

Any additional keyword arguments to pass to the constructor.

Returns:

The TimeSeriesSamplesDask object created from the dataframe.

Return type:

TimeSeriesSamplesDask
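The following is a minimal sketch of the expected input: a dataframe whose row index is a 2-level multiindex (sample, timestep) and whose columns are features. The level names `sample_idx` and `time_idx`, the feature names and the values are illustrative assumptions. The tempor call is commented out because it needs tempor and dask.

```python
import pandas as pd

# Illustrative time series data. Row index levels: (sample, timestep).
# Samples may have different numbers of timesteps ("a" has 3, "b" has 2).
index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)],
    names=["sample_idx", "time_idx"],
)
ts_df = pd.DataFrame(
    {"hr": [80, 82, 85, 70, 71], "sbp": [120, 118, 121, 110, 112]},
    index=index,
)

# Hypothetical usage (requires tempor and dask, not run here):
# from tempor.data.samples_experimental import TimeSeriesSamplesDask
# samples = TimeSeriesSamplesDask.from_dataframe(ts_df)

assert ts_df.index.nlevels == 2
print(ts_df.index.get_level_values("sample_idx").unique().tolist())
```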

static from_numpy(array: ndarray, **kwargs: Any) → TimeSeriesSamplesDask

Not implemented yet.

numpy(*, padding_indicator: Any = 999.0, **kwargs: Any) → ndarray

Return the data as a numpy.ndarray.

Parameters:
padding_indicator : Any, optional

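Value used to fill the padded positions of samples that have fewer timesteps than the longest sample. Defaults to DATA_SETTINGS.default_padding_indicator (999.0 in the signature above).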

**kwargs : Any

Any additional keyword arguments. Currently unused.

Returns:

The numpy.ndarray.

Return type:

np.ndarray
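The sketch below shows one way variable-length samples could be packed into a single padded array. It assumes an output shape of (num_samples, max_num_timesteps, num_features), with padding appended after each sample's last timestep. Neither detail is stated above. `pad_to_array` is a hypothetical helper, not the tempor implementation.

```python
import numpy as np
import pandas as pd


def pad_to_array(df: pd.DataFrame, padding_indicator: float = 999.0) -> np.ndarray:
    """Hypothetical helper: stack per-sample rows into a padded 3D array."""
    # One 2D (timesteps x features) array per sample, in order of appearance.
    per_sample = [g.to_numpy(dtype=float) for _, g in df.groupby(level=0, sort=False)]
    max_len = max(len(a) for a in per_sample)
    out = np.full((len(per_sample), max_len, df.shape[1]), padding_indicator, dtype=float)
    for i, a in enumerate(per_sample):
        out[i, : len(a), :] = a  # positions past the sample's end keep the indicator
    return out


index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)],
    names=["sample_idx", "time_idx"],
)
ts_df = pd.DataFrame({"hr": [80, 82, 85, 70, 71], "sbp": [120, 118, 121, 110, 112]}, index=index)

arr = pad_to_array(ts_df)
print(arr.shape)  # (2, 3, 2)
```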

dataframe(**kwargs: Any) → DataFrame

Return the data as a pandas.DataFrame.

Parameters:
**kwargs : Any

Any additional keyword arguments. Currently unused.

Returns:

The pandas.DataFrame.

Return type:

pd.DataFrame

sample_index() → list[int] | list[str]

Get a list containing sample indexes.

Returns:

A list containing sample indexes.

Return type:

data_typing.SampleIndex

time_indexes() → list[list[float]] | list[list[int]] | list[list[Timestamp]]

Get a list containing time indexes for each sample. Each time index is represented as a list of time step elements.

Returns:

A list containing time indexes for each sample.

Return type:

data_typing.TimeIndexList

time_indexes_as_dict() → dict[int, list[float] | list[int] | list[Timestamp]] | dict[str, list[float] | list[int] | list[Timestamp]]

Get a dictionary mapping each sample index to its time index. Time index is represented as a list of time step elements.

Returns:

The dictionary mapping each sample index to its time index.

Return type:

data_typing.SampleToTimeIndexDict
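The sketch below computes the same two views, per-sample time indexes as a list and as a dictionary, from a (sample, timestep) multiindexed pandas dataframe. It only illustrates the documented return shapes. The helper names are hypothetical, and tempor's Dask-backed implementation may compute them differently.

```python
import pandas as pd


def time_indexes_as_dict(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each sample to its list of timesteps."""
    return {
        sample: group.index.get_level_values(1).tolist()
        for sample, group in df.groupby(level=0, sort=False)
    }


def time_indexes(df: pd.DataFrame) -> list:
    """Hypothetical helper: the time indexes only, in sample order."""
    return list(time_indexes_as_dict(df).values())


index = pd.MultiIndex.from_tuples(
    [("a", 0.0), ("a", 0.5), ("a", 2.0), ("b", 0.0), ("b", 1.0)],
    names=["sample_idx", "time_idx"],
)
ts_df = pd.DataFrame({"hr": [80, 82, 85, 70, 71]}, index=index)

print(time_indexes_as_dict(ts_df))  # {'a': [0.0, 0.5, 2.0], 'b': [0.0, 1.0]}
```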

time_indexes_float() → list[ndarray]

Return the time indexes with their elements converted to float values.

Date-time time indexes are converted using datetime_time_index_to_float.

Returns:

List of 1D numpy.ndarray objects of float values, one per sample, corresponding to the time indexes.

Return type:

List[np.ndarray]
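This is a minimal sketch of the float conversion. Numeric time indexes are cast directly. For date-time indexes the sketch assumes seconds since the Unix epoch, which is a stand-in: the exact scheme used by datetime_time_index_to_float is not described here and may differ. Both helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def time_index_to_float(values) -> np.ndarray:
    """Hypothetical helper: convert one sample's time index to float values."""
    idx = pd.Index(values)
    if isinstance(idx, pd.DatetimeIndex):
        # Assumption: seconds since the Unix epoch. tempor's
        # datetime_time_index_to_float may use a different reference or unit.
        return ((idx - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)).to_numpy(dtype=float)
    return np.asarray(idx, dtype=float)


def time_indexes_float(df: pd.DataFrame) -> list:
    """Hypothetical helper: one float array per sample, in sample order."""
    return [
        time_index_to_float(group.index.get_level_values(1))
        for _, group in df.groupby(level=0, sort=False)
    ]


index = pd.MultiIndex.from_tuples(
    [
        ("a", pd.Timestamp("2020-01-01 00:00:00")),
        ("a", pd.Timestamp("2020-01-01 00:01:00")),
        ("b", pd.Timestamp("2020-01-01 00:00:30")),
    ],
    names=["sample_idx", "time_idx"],
)
dt_df = pd.DataFrame({"hr": [80, 82, 70]}, index=index)
print(time_indexes_float(dt_df))
```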

num_timesteps() → list[int]

Get the number of timesteps for each sample.

Returns:

List containing the number of timesteps for each sample.

Return type:

List[int]

num_timesteps_as_dict() → dict[int, int] | dict[str, int]

Get a dictionary mapping each sample index to its number of timesteps.

Returns:

Dictionary mapping each sample index to its number of timesteps.

Return type:

data_typing.SampleToNumTimestepsDict

num_timesteps_equal() → bool

Returns True if all samples share the same number of timesteps, False otherwise.

Returns:

Whether all samples share the same number of timesteps.

Return type:

bool
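The three timestep-count methods can be mirrored with a pandas group-by on the sample level of the index. In the sketch below, each sample's timestep count is taken to be its row count. The helper names are hypothetical, and this is not tempor's Dask-based code.

```python
import pandas as pd


def num_timesteps_as_dict(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each sample to its number of rows (timesteps)."""
    return {k: int(v) for k, v in df.groupby(level=0, sort=False).size().items()}


def num_timesteps(df: pd.DataFrame) -> list:
    """Hypothetical helper: timestep counts in sample order."""
    return list(num_timesteps_as_dict(df).values())


def num_timesteps_equal(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True when every sample has the same count."""
    return len(set(num_timesteps(df))) <= 1


index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)],
    names=["sample_idx", "time_idx"],
)
ts_df = pd.DataFrame({"hr": [80, 82, 85, 70, 71]}, index=index)
print(num_timesteps_as_dict(ts_df), num_timesteps_equal(ts_df))
```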

list_of_dataframes() → list[DataFrame]

Return a list of dataframes, one per sample, each containing that sample's data.

Returns:

List of dataframes for each sample.

Return type:

List[pd.DataFrame]
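The following is a minimal sketch of splitting a (sample, timestep) dataframe into one dataframe per sample. Each piece in this sketch keeps both index levels. Whether tempor's version keeps or drops the sample level is not stated above, so treat that detail as an assumption. `list_of_dataframes` here is a hypothetical stand-alone helper.

```python
import pandas as pd


def list_of_dataframes(df: pd.DataFrame) -> list:
    """Hypothetical helper: one dataframe per sample, in order of appearance."""
    # Each group keeps the full (sample, timestep) index in this sketch.
    return [group for _, group in df.groupby(level=0, sort=False)]


index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)],
    names=["sample_idx", "time_idx"],
)
ts_df = pd.DataFrame({"hr": [80, 82, 85, 70, 71], "sbp": [120, 118, 121, 110, 112]}, index=index)

per_sample = list_of_dataframes(ts_df)
print([part.shape for part in per_sample])
```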

property num_samples : int

Return number of samples.

Returns:

Number of samples.

Return type:

int

property num_features : int

Return number of features.

Returns:

Number of features.

Return type:

int

short_repr() → str

A short string representation of the object.

Returns:

The short representation.

Return type:

str

category : ClassVar[plugin_typing.PluginCategory] = 'time_series_samples'

Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.

name : ClassVar[plugin_typing.PluginName] = 'time_series_samples_dask'

Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.

plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'

Plugin type, such as 'method'. May optionally be set by the plugin class using @register_plugin; otherwise the default plugin type is used.

class tempor.data.samples_experimental.EventSamplesDask(data: DataFrame | ndarray, **kwargs: Any)

Bases: EventSamplesBase

Create an EventSamplesDask object from the data.

Parameters:
data : data_typing.DataContainer

A container with the data.

**kwargs : Any

Any additional keyword arguments to pass to the constructor.

static from_dataframe(dataframe: DataFrame, **kwargs: Any) → EventSamplesDask

Create EventSamplesDask from a pandas.DataFrame. The row index of the dataframe should be the sample indexes, and the columns should be the features. Each cell should contain a (time, value) tuple representing the event.

Parameters:
dataframe : pd.DataFrame

The dataframe that contains the data.

**kwargs : Any

Any additional keyword arguments to pass to the constructor.

Returns:

The EventSamplesDask object created from the dataframe.

Return type:

EventSamplesDask
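Here is a minimal sketch of the event layout described above: one row per sample, one column per event feature, and a (time, value) tuple in every cell. The feature names, times and boolean values are illustrative only. The commented-out call assumes tempor and dask are installed.

```python
import pandas as pd

# Illustrative event data: one row per sample; each cell is a (time, value) tuple.
event_df = pd.DataFrame(
    {
        "death": [(12.0, True), (30.5, False)],
        "readmission": [(3.0, True), (7.0, False)],
    },
    index=pd.Index(["a", "b"], name="sample_idx"),
)

# Hypothetical usage (requires tempor and dask, not run here):
# from tempor.data.samples_experimental import EventSamplesDask
# events = EventSamplesDask.from_dataframe(event_df)

# Sanity check on the layout: every cell holds a 2-tuple.
assert all(
    isinstance(cell, tuple) and len(cell) == 2
    for column in event_df.columns
    for cell in event_df[column]
)
```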

static from_numpy(array: ndarray, **kwargs: Any) → EventSamplesDask

Not implemented yet.

numpy(**kwargs: Any) → ndarray

Return the data as a numpy.ndarray.

Parameters:
**kwargs : Any

Any additional keyword arguments. Currently unused.

Returns:

The numpy.ndarray.

Return type:

np.ndarray

dataframe(**kwargs: Any) → DataFrame

Return the data as a pandas.DataFrame.

Parameters:
**kwargs : Any

Any additional keyword arguments. Currently unused.

Returns:

The pandas.DataFrame.

Return type:

pd.DataFrame

sample_index() → list[int] | list[str]

Return a list representing sample indexes.

Returns:

Sample indexes.

Return type:

data_typing.SampleIndex

category : ClassVar[plugin_typing.PluginCategory] = 'event_samples'

Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.

name : ClassVar[plugin_typing.PluginName] = 'event_samples_dask'

Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.

property num_samples : int

Return number of samples.

Returns:

Number of samples.

Return type:

int

plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'

Plugin type, such as 'method'. May optionally be set by the plugin class using @register_plugin; otherwise the default plugin type is used.

property num_features : int

Return number of features.

Returns:

Number of features.

Return type:

int

split(time_feature_suffix: str = '_time') → DataFrame

Return a pandas.DataFrame in which the time component of each event feature is split off into its own column. Each new time column is named "<original column name><time_feature_suffix>" and is inserted immediately before the corresponding <original column name> column, which then contains only the event value.

Parameters:
time_feature_suffix : str, optional

A column name suffix string to identify the time columns that will be split off. Defaults to "_time".

Returns:

The output dataframe.

Return type:

pd.DataFrame
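The column layout documented for split() can be reproduced in pandas. In the sketch below, each "<name><suffix>" time column is placed directly before its "<name>" value column. `split_events` is a hypothetical helper that mirrors the described output, not tempor's code.

```python
import pandas as pd


def split_events(df: pd.DataFrame, time_feature_suffix: str = "_time") -> pd.DataFrame:
    """Hypothetical helper mirroring the documented split() layout."""
    out = pd.DataFrame(index=df.index)
    for column in df.columns:
        # Time column first, then the value column it was split from.
        out[f"{column}{time_feature_suffix}"] = [t for t, _ in df[column]]
        out[column] = [v for _, v in df[column]]
    return out


event_df = pd.DataFrame(
    {"death": [(12.0, True), (30.5, False)], "readmission": [(3.0, True), (7.0, False)]},
    index=pd.Index(["a", "b"], name="sample_idx"),
)
print(split_events(event_df).columns.tolist())
# ['death_time', 'death', 'readmission_time', 'readmission']
```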

split_as_two_dataframes(time_feature_suffix: str = '_time') → tuple[DataFrame, DataFrame]

Analogous to split(), but returns two pandas.DataFrame objects:
  • The first dataframe contains the event times of each feature.

  • The second dataframe contains the event values (True/False) of each feature.

Parameters:
time_feature_suffix : str, optional

A column name suffix string to identify the time columns that will be split off. Defaults to "_time".

Returns:

Two pandas.DataFrame objects containing the event times and values, respectively.

Return type:

Tuple[pd.DataFrame, pd.DataFrame]
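As a sketch of the two-dataframe variant, the helper below returns a times frame and a values frame that share the input's index. The text above does not say how the time columns are named in the first dataframe. This sketch assumes they keep the "<name><suffix>" form used by split(). `split_events_as_two` is a hypothetical helper.

```python
import pandas as pd


def split_events_as_two(df: pd.DataFrame, time_feature_suffix: str = "_time"):
    """Hypothetical helper: (times, values) dataframes sharing df's index."""
    times = pd.DataFrame(
        # Assumption: time columns keep the suffix, as in split().
        {f"{c}{time_feature_suffix}": [t for t, _ in df[c]] for c in df.columns},
        index=df.index,
    )
    values = pd.DataFrame(
        {c: [v for _, v in df[c]] for c in df.columns},
        index=df.index,
    )
    return times, values


event_df = pd.DataFrame(
    {"death": [(12.0, True), (30.5, False)], "readmission": [(3.0, True), (7.0, False)]},
    index=pd.Index(["a", "b"], name="sample_idx"),
)
times, values = split_events_as_two(event_df)
print(times.columns.tolist(), values.columns.tolist())
```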

short_repr() → str

A short string representation of the object.

Returns:

The short representation.

Return type:

str