tempor.data.samples_experimental module¶

Module with experimental samples implementations.

class tempor.data.samples_experimental.StaticSamplesDask(data: DataFrame | ndarray, **kwargs: Any)[source]¶

Bases: StaticSamplesBase

Create a StaticSamplesDask object from the data.

Parameters:¶

data : data_typing.DataContainer¶: A container with the data.
**kwargs : Any: Any additional keyword arguments to pass to the constructor.

static from_dataframe(dataframe: DataFrame, **kwargs: Any) → StaticSamplesDask[source]¶

Create StaticSamplesDask from pandas.DataFrame. The rows represent samples, the columns represent features.

Parameters:¶

dataframe : pd.DataFrame¶: The dataframe that represents the data.
**kwargs : Any: Any additional keyword arguments to pass to the constructor.

Returns:¶

The StaticSamplesDask object created from the dataframe.

Return type:¶

StaticSamplesDask
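As an illustration of the layout described above (rows are samples, columns are features), the following sketch builds a small input frame with plain pandas. The column names, index labels, and values are made up for the example, and the from_dataframe call appears only in a comment because it needs TemporAI and Dask installed.

```python
import pandas as pd

# Rows are samples, columns are features (illustrative values only).
static_df = pd.DataFrame(
    {"age": [34, 51, 47], "weight_kg": [70.5, 82.0, 64.3]},
    index=["patient_a", "patient_b", "patient_c"],
)

# Quantities analogous to num_samples, num_features and sample_index().
num_samples, num_features = static_df.shape
sample_index = static_df.index.tolist()

# With TemporAI and Dask available, the frame would be passed as:
#   samples = StaticSamplesDask.from_dataframe(static_df)
```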

static from_numpy(array: ndarray, *, sample_index: list[int] | list[str] | None = None, feature_index: list[str] | None = None, **kwargs: Any) → StaticSamplesDask[source]¶: Not implemented yet.

numpy(**kwargs: Any) → ndarray[source]¶

Return the data as a numpy.ndarray.

Parameters:¶

**kwargs : Any: Any additional keyword arguments. Currently unused.

Returns:¶

The numpy.ndarray.

Return type:¶

np.ndarray

dataframe(**kwargs: Any) → DataFrame[source]¶

Return the data as a pandas.DataFrame.

Parameters:¶

**kwargs : Any: Any additional keyword arguments. Currently unused.

Returns:¶

The dataframe.

Return type:¶

pd.DataFrame

sample_index() → list[int] | list[str][source]¶

Return a list representing sample indexes.

Returns:¶: Sample indexes.
Return type:¶: data_typing.SampleIndex

property num_samples : int¶

Return number of samples.

Returns:¶: Number of samples.
Return type:¶: int

property num_features : int¶

Return number of features.

Returns:¶: Number of features.
Return type:¶: int

short_repr() → str[source]¶

A short string representation of the object.

Returns:¶: The short representation.
Return type:¶: str

category : ClassVar[plugin_typing.PluginCategory] = 'static_samples'¶: Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.

name : ClassVar[plugin_typing.PluginName] = 'static_samples_dask'¶: Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.

plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'¶: Plugin type, such as 'method'. May be optionally set by the plugin class using @register_plugin, else will set the default plugin type.

tempor.data.samples_experimental.multiindex_df_to_compatible_ddf(df: DataFrame, **kwargs: Any) → DataFrame[source]¶: Convert a multiindex dataframe to a dask dataframe with a single tuple index.

tempor.data.samples_experimental.compatible_ddf_to_multiindex_df(ddf: DataFrame) → DataFrame[source]¶: Convert a dask dataframe with a single tuple index to a multiindex dataframe.
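The idea behind these two helpers can be sketched with pandas alone: a two-level index is flattened into a single index whose labels are tuples, and later rebuilt from those tuples. This is not the library's implementation (which also involves a Dask dataframe); the level names sample_idx and time_idx are assumptions chosen for the example.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("s1", 0), ("s1", 1), ("s2", 0)], names=["sample_idx", "time_idx"]
)
multi_df = pd.DataFrame({"feat": [1.0, 2.0, 3.0]}, index=index)

# Flatten: each row label becomes one (sample, timestep) tuple.
flat_df = multi_df.copy()
flat_df.index = multi_df.index.to_flat_index()

# Restore: rebuild the two-level index from the tuples.
restored_df = flat_df.copy()
restored_df.index = pd.MultiIndex.from_tuples(
    list(flat_df.index), names=list(multi_df.index.names)
)
```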

class tempor.data.samples_experimental.TimeSeriesSamplesDask(data: DataFrame | ndarray, **kwargs: Any)[source]¶

Bases: TimeSeriesSamplesBase

Create a TimeSeriesSamplesDask object from the data.

Parameters:¶

data : data_typing.DataContainer¶: A container with the data.
**kwargs : Any: Any additional keyword arguments to pass to the constructor.

static from_dataframe(dataframe: DataFrame, **kwargs: Any) → TimeSeriesSamplesDask[source]¶

Create TimeSeriesSamplesDask from pandas.DataFrame. The row index of the dataframe should be a 2-level multiindex (sample, timestep). The columns should be the features.

Parameters:¶

dataframe : pd.DataFrame¶: The dataframe that contains the data.
**kwargs : Any: Any additional keyword arguments to pass to the constructor.

Returns:¶

The TimeSeriesSamplesDask object created from the dataframe.

Return type:¶

TimeSeriesSamplesDask
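A minimal sketch of an input frame with the expected 2-level (sample, timestep) row index, built with plain pandas. The level names, feature names, and values are assumptions for illustration only.

```python
import pandas as pd

# Two samples with different numbers of timesteps (illustrative values).
rows = [
    ("s1", 0, 0.1, 10.0),
    ("s1", 1, 0.2, 11.0),
    ("s1", 2, 0.3, 12.0),
    ("s2", 0, 0.9, 20.0),
    ("s2", 1, 0.8, 21.0),
]
ts_df = pd.DataFrame(rows, columns=["sample_idx", "time_idx", "feat_a", "feat_b"])

# Level 0 identifies the sample, level 1 the timestep; columns are features.
ts_df = ts_df.set_index(["sample_idx", "time_idx"])

# With TemporAI and Dask available, the frame would be passed as:
#   samples = TimeSeriesSamplesDask.from_dataframe(ts_df)
```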

static from_numpy(array: ndarray, **kwargs: Any) → TimeSeriesSamplesDask[source]¶: Not implemented yet.

numpy(*, padding_indicator: Any = 999.0, **kwargs: Any) → ndarray[source]¶

Return the data as a numpy.ndarray.

Parameters:¶

padding_indicator : Any, optional¶: Padding indicator value. Defaults to DATA_SETTINGS.default_padding_indicator.
**kwargs : Any: Any additional keyword arguments. Currently unused.

Returns:¶

The numpy.ndarray.

Return type:¶

np.ndarray
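To show what a padding indicator is for, the sketch below pads samples of unequal length into one 3D array. It assumes a (samples, timesteps, features) layout, which the text above does not state, and it is a plain pandas/NumPy illustration rather than the library's code.

```python
import numpy as np
import pandas as pd

PADDING_INDICATOR = 999.0  # the default shown in the signature above

rows = [
    ("s1", 0, 0.1, 10.0),
    ("s1", 1, 0.2, 11.0),
    ("s1", 2, 0.3, 12.0),
    ("s2", 0, 0.9, 20.0),
    ("s2", 1, 0.8, 21.0),
]
ts_df = pd.DataFrame(
    rows, columns=["sample_idx", "time_idx", "feat_a", "feat_b"]
).set_index(["sample_idx", "time_idx"])

# One 2D (timesteps, features) array per sample.
groups = [g.to_numpy(dtype=float) for _, g in ts_df.groupby(level="sample_idx")]
max_len = max(len(g) for g in groups)

# Assumed layout: (samples, timesteps, features); short samples are padded.
padded = np.full((len(groups), max_len, ts_df.shape[1]), PADDING_INDICATOR)
for i, g in enumerate(groups):
    padded[i, : len(g), :] = g
```

Here the second sample has only two timesteps, so its third timestep row is filled with the padding value.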

dataframe(**kwargs: Any) → DataFrame[source]¶

Return the data as a pandas.DataFrame.

Parameters:¶

**kwargs : Any: Any additional keyword arguments. Currently unused.

Returns:¶

The pandas.DataFrame.

Return type:¶

pd.DataFrame

sample_index() → list[int] | list[str][source]¶

Get a list containing sample indexes.

Returns:¶: A list containing sample indexes.
Return type:¶: data_typing.SampleIndex

time_indexes() → list[list[float]] | list[list[int]] | list[list[Timestamp]][source]¶

Get a list containing time indexes for each sample. Each time index is represented as a list of time step elements.

Returns:¶: A list containing time indexes for each sample.
Return type:¶: data_typing.TimeIndexList

Get a dictionary mapping each sample index to its time index. Time index is represented as a list of time step elements.

Returns:¶: The dictionary mapping each sample index to its time index.
Return type:¶: data_typing.SampleToTimeIndexDict
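The relationship between the list form and the dictionary form of the time indexes can be sketched with pandas as follows. The variable names and the level names sample_idx and time_idx are assumptions for the example, not part of the documented API.

```python
import pandas as pd

rows = [
    ("s1", 0, 0.1),
    ("s1", 1, 0.2),
    ("s1", 2, 0.3),
    ("s2", 0, 0.9),
    ("s2", 1, 0.8),
]
ts_df = pd.DataFrame(rows, columns=["sample_idx", "time_idx", "feat_a"]).set_index(
    ["sample_idx", "time_idx"]
)

# Dictionary form: sample index -> list of that sample's time steps.
sample_to_time_index = {
    sample: group.index.get_level_values("time_idx").tolist()
    for sample, group in ts_df.groupby(level="sample_idx")
}

# List form: one time index per sample, in sample order.
time_index_list = list(sample_to_time_index.values())
```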

time_indexes_float() → list[ndarray][source]¶

Return the time indexes with their elements converted to float values.

A date-time time index is converted using datetime_time_index_to_float.

Returns:¶: List of 1D numpy.ndarray objects of float values, one per sample, corresponding to the time indexes.
Return type:¶: List[np.ndarray]

num_timesteps() → list[int][source]¶

Get the number of timesteps for each sample.

Returns:¶: List containing the number of timesteps for each sample.
Return type:¶: List[int]

num_timesteps_as_dict() → dict[int, int] | dict[str, int][source]¶

Get a dictionary mapping each sample index to its number of timesteps.

Returns:¶: Dictionary mapping each sample index to its number of timesteps.
Return type:¶: data_typing.SampleToNumTimestepsDict

num_timesteps_equal() → bool[source]¶

Returns True if all samples share the same number of timesteps, False otherwise.

Returns:¶: Whether all samples share the same number of timesteps.
Return type:¶: bool
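The three timestep-count methods above return the same information in different shapes. A rough pandas equivalent, using assumed level names and made-up data, could look like this:

```python
import pandas as pd

rows = [
    ("s1", 0, 0.1),
    ("s1", 1, 0.2),
    ("s1", 2, 0.3),
    ("s2", 0, 0.9),
    ("s2", 1, 0.8),
]
ts_df = pd.DataFrame(rows, columns=["sample_idx", "time_idx", "feat_a"]).set_index(
    ["sample_idx", "time_idx"]
)

# Number of rows (timesteps) per sample.
counts = ts_df.groupby(level="sample_idx").size()

num_timesteps = counts.tolist()            # list form
num_timesteps_as_dict = counts.to_dict()   # dictionary form
num_timesteps_equal = bool(counts.nunique() == 1)  # True only if all counts match
```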

list_of_dataframes() → list[DataFrame][source]¶

Return a list of dataframes, one per sample, each containing the data for that sample.

Returns:¶: List of dataframes for each sample.
Return type:¶: List[pd.DataFrame]

property num_samples : int¶

Return number of samples.

Returns:¶: Number of samples.
Return type:¶: int

property num_features : int¶

Return number of features.

Returns:¶: Number of features.
Return type:¶: int

short_repr() → str[source]¶

A short string representation of the object.

Returns:¶: The short representation.
Return type:¶: str

category : ClassVar[plugin_typing.PluginCategory] = 'time_series_samples'¶: Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.

name : ClassVar[plugin_typing.PluginName] = 'time_series_samples_dask'¶: Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.

plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'¶: Plugin type, such as 'method'. May be optionally set by the plugin class using @register_plugin, else will set the default plugin type.

class tempor.data.samples_experimental.EventSamplesDask(data: DataFrame | ndarray, **kwargs: Any)[source]¶

Bases: EventSamplesBase

Create an EventSamplesDask object from the data.

Parameters:¶

data : data_typing.DataContainer¶: A container with the data.
**kwargs : Any: Any additional keyword arguments to pass to the constructor.

static from_dataframe(dataframe: DataFrame, **kwargs: Any) → EventSamplesDask[source]¶

Create EventSamplesDask from pandas.DataFrame. The row index of the dataframe should be the sample indexes. The columns should be the features. Each feature should contain a tuple of (time, value) representing the event.

Parameters:¶

dataframe : pd.DataFrame¶: The dataframe that contains the data.
**kwargs : Any: Any additional keyword arguments to pass to the constructor.

Returns:¶

The EventSamplesDask object created from the dataframe.

Return type:¶

EventSamplesDask
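A small sketch of the expected input layout, where every cell holds a (time, value) tuple. The feature names, sample labels, and values are invented for the example.

```python
import pandas as pd

# Each cell is a (time, value) tuple describing one event for one sample.
event_df = pd.DataFrame(
    {
        "death": [(12.5, True), (30.0, False)],
        "relapse": [(4.0, True), (30.0, False)],
    },
    index=["s1", "s2"],
)

# With TemporAI and Dask available, the frame would be passed as:
#   samples = EventSamplesDask.from_dataframe(event_df)
```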

static from_numpy(array: ndarray, **kwargs: Any) → EventSamplesDask[source]¶: Not implemented yet.

numpy(**kwargs: Any) → ndarray[source]¶

Return the data as a numpy.ndarray.

Parameters:¶

**kwargs : Any: Any additional keyword arguments. Currently unused.

Returns:¶

The numpy.ndarray.

Return type:¶

np.ndarray

dataframe(**kwargs: Any) → DataFrame[source]¶

Return the data as a pandas.DataFrame.

Parameters:¶

**kwargs : Any: Any additional keyword arguments. Currently unused.

Returns:¶

The pandas.DataFrame.

Return type:¶

pd.DataFrame

sample_index() → list[int] | list[str][source]¶

Return a list representing sample indexes.

Returns:¶: Sample indexes.
Return type:¶: data_typing.SampleIndex

category : ClassVar[plugin_typing.PluginCategory] = 'event_samples'¶: Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.

name : ClassVar[plugin_typing.PluginName] = 'event_samples_dask'¶: Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.

property num_samples : int¶

Return number of samples.

Returns:¶: Number of samples.
Return type:¶: int

plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'¶: Plugin type, such as 'method'. May be optionally set by the plugin class using @register_plugin, else will set the default plugin type.

property num_features : int¶

Return number of features.

Returns:¶: Number of features.
Return type:¶: int

split(time_feature_suffix: str = '_time') → DataFrame[source]¶

Return a pandas.DataFrame where the time component of each event feature has been split off to its own column. The new columns that contain the times will be named "<original column name><time_feature_suffix>" and will be inserted before each corresponding <original column name> column. The <original column name> columns will contain only the event value.

Parameters:¶

time_feature_suffix : str, optional¶: A column name suffix string to identify the time columns that will be split off. Defaults to "_time".

Returns:¶

The output dataframe.

Return type:¶

pd.DataFrame
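The column arrangement described for split() can be imitated with plain pandas. The helper split_events below is hypothetical and only mirrors the documented behaviour (time column inserted before each value column); it is not the library's implementation.

```python
import pandas as pd

event_df = pd.DataFrame(
    {
        "death": [(12.5, True), (30.0, False)],
        "relapse": [(4.0, True), (30.0, False)],
    },
    index=["s1", "s2"],
)


def split_events(df, time_feature_suffix="_time"):
    """Hypothetical helper: split (time, value) cells into two columns each."""
    out = pd.DataFrame(index=df.index)
    for col in df.columns:
        # The time column comes first, followed by the value-only column.
        out[f"{col}{time_feature_suffix}"] = [t for t, _ in df[col]]
        out[col] = [v for _, v in df[col]]
    return out


split_df = split_events(event_df)
```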

split_as_two_dataframes(time_feature_suffix: str = '_time') → tuple[DataFrame, DataFrame][source]¶

Analogous to split() but returns two pandas.DataFrame objects:

The first dataframe contains the event times of each feature.
The second dataframe contains the event values (True/False) of each feature.

Parameters:¶

time_feature_suffix : str, optional¶: A column name suffix string to identify the time columns that will be split off. Defaults to "_time".

Returns:¶

Two pandas.DataFrame objects containing event times and values respectively.

Return type:¶

Tuple[pd.DataFrame, pd.DataFrame]
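The two-frame variant can be sketched in the same spirit. Whether the time frame's columns carry the suffix is an assumption made here based on the parameter description; the data and variable names are illustrative.

```python
import pandas as pd

event_df = pd.DataFrame(
    {
        "death": [(12.5, True), (30.0, False)],
        "relapse": [(4.0, True), (30.0, False)],
    },
    index=["s1", "s2"],
)
suffix = "_time"

# First frame: event times (columns assumed to carry the suffix).
times_df = pd.DataFrame(
    {f"{c}{suffix}": [t for t, _ in event_df[c]] for c in event_df.columns},
    index=event_df.index,
)

# Second frame: event values (True/False) under the original column names.
values_df = pd.DataFrame(
    {c: [v for _, v in event_df[c]] for c in event_df.columns},
    index=event_df.index,
)
```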

short_repr() → str[source]¶

A short string representation of the object.

Returns:¶: The short representation.
Return type:¶: str