tempor.data.samples module¶
Data handling for the different data sample modalities supported by TemporAI.
- class tempor.data.samples.DataSamples(data: DataFrame | ndarray, **kwargs: Any)[source]¶
The abstract base class for all data samples classes.
- Parameters:¶
- data : data_typing.DataContainer¶
The data container.
- **kwargs : Any
Any additional keyword arguments.
- abstract property modality : DataModality¶
Return the data modality enum corresponding to the class.
- validate() None[source]¶
Validate the data contained.
- Raises:¶
tempor.exc.DataValidationException – Raised if data validation fails.
- abstract static from_numpy(array: ndarray, *, sample_index: list[int] | list[str] | None = None, feature_index: list[str] | None = None, **kwargs: Any) DataSamples[source]¶
Create DataSamples from numpy.ndarray.
- Parameters:¶
- array : np.ndarray¶
The array that represents the data.
- sample_index : Optional[data_typing.SampleIndex], optional¶
List with sample (row) index for each sample. Optional, if None, will be of form [0, 1, ...]. Defaults to None.
- feature_index : Optional[data_typing.FeatureIndex], optional¶
List with feature (column) index for each feature. Optional, if None, will be of form ["feat_0", "feat_1", ...]. Defaults to None.
- **kwargs : Any
Any additional keyword arguments.
- Returns:¶
DataSamples object from array.
- Return type:¶
DataSamples
- abstract static from_dataframe(dataframe: DataFrame, **kwargs: Any) DataSamples[source]¶
Create DataSamples from pandas.DataFrame.
- abstract numpy(**kwargs: Any) ndarray[source]¶
Return numpy.ndarray representation of the data.
- abstract dataframe(**kwargs: Any) DataFrame[source]¶
Return pandas.DataFrame representation of the data.
- class tempor.data.samples.StaticSamplesBase(data: DataFrame | ndarray, **kwargs: Any)[source]¶
Bases: DataSamples
The abstract base class for all data samples classes.
- Parameters:¶
- data : data_typing.DataContainer¶
The data container.
- **kwargs : Any
Any additional keyword arguments.
- property modality : DataModality¶
Return the data modality enum corresponding to the class. Here, STATIC.
- class tempor.data.samples.TimeSeriesSamplesBase(data: DataFrame | ndarray, **kwargs: Any)[source]¶
Bases: DataSamples
The abstract base class for all data samples classes.
- Parameters:¶
- data : data_typing.DataContainer¶
The data container.
- **kwargs : Any
Any additional keyword arguments.
- property modality : DataModality¶
Return the data modality enum corresponding to the class. Here, TIME_SERIES.
- abstract time_indexes() list[list[float]] | list[list[int]] | list[list[Timestamp]][source]¶
Get a list containing time indexes for each sample. Each time index is represented as a list of time step elements.
- abstract time_indexes_as_dict() dict[int, list[float] | list[int] | list[Timestamp]] | dict[str, list[float] | list[int] | list[Timestamp]][source]¶
Get a dictionary mapping each sample index to its time index. Time index is represented as a list of time step elements.
- abstract time_indexes_float() list[ndarray][source]¶
Return the time indexes with their elements converted to float values.
A date-time time index will be converted using datetime_time_index_to_float.
- Returns:¶
List of 1D numpy.ndarrays of float values, corresponding to the time index.
- Return type:¶
List[np.ndarray]
- abstract num_timesteps_as_dict() dict[int, int] | dict[str, int][source]¶
Get a dictionary mapping each sample index to its number of timesteps.
- class tempor.data.samples.EventSamplesBase(data: DataFrame | ndarray, **kwargs: Any)[source]¶
Bases: DataSamples
The abstract base class for all data samples classes.
- Parameters:¶
- data : data_typing.DataContainer¶
The data container.
- **kwargs : Any
Any additional keyword arguments.
- property modality : DataModality¶
Return the data modality enum corresponding to the class. Here, EVENT.
- abstract split(time_feature_suffix: str = '_time') DataFrame[source]¶
Return a pandas.DataFrame where the time component of each event feature has been split off to its own column. The new columns that contain the times will be named "<original column name><time_feature_suffix>" and will be inserted before each corresponding <original column name> column. The <original column name> columns will contain only the event value.
- abstract split_as_two_dataframes(time_feature_suffix: str = '_time') tuple[DataFrame, DataFrame][source]¶
Analogous to split() but returns two pandas.DataFrames:
- class tempor.data.samples.StaticSamples(data: DataFrame | ndarray, *, sample_index: list[int] | list[str] | None = None, feature_index: list[str] | None = None, **kwargs: Any)[source]¶
Bases: StaticSamplesBase
Create a StaticSamples object from the data.
- Parameters:¶
- data : data_typing.DataContainer¶
A container with the data.
- sample_index : Optional[data_typing.SampleIndex], optional¶
Used only if data is a numpy.ndarray. List with sample (row) index for each sample. Optional, if None, will be of form [0, 1, ...]. Defaults to None.
- feature_index : Optional[data_typing.FeatureIndex], optional¶
Used only if data is a numpy.ndarray. List with feature (column) index for each feature. Optional, if None, will be of form ["feat_0", "feat_1", ...]. Defaults to None.
- **kwargs : Any
Any additional keyword arguments to pass to the constructor.
- static from_dataframe(dataframe: DataFrame, **kwargs: Any) StaticSamples[source]¶
Create StaticSamples from pandas.DataFrame. The rows represent samples, the columns represent features.
- Parameters:¶
- dataframe : pd.DataFrame¶
The dataframe that represents the data.
- **kwargs : Any
Any additional keyword arguments to pass to the constructor.
- Returns:¶
StaticSamples object from dataframe.
- Return type:¶
StaticSamples
- static from_numpy(array: ndarray, *, sample_index: list[int] | list[str] | None = None, feature_index: list[str] | None = None, **kwargs: Any) StaticSamples[source]¶
Create StaticSamples from numpy.ndarray. The 0th dimension represents samples, the 1st dimension represents features.
- Parameters:¶
- array : np.ndarray¶
The array with the data.
- sample_index : Optional[data_typing.SampleIndex], optional¶
Sample indices to assign. Defaults to None.
- feature_index : Optional[data_typing.FeatureIndex], optional¶
Feature indices to assign. Defaults to None.
- **kwargs : Any
Any additional keyword arguments to pass to the constructor.
- Returns:¶
StaticSamples object created from the array.
- Return type:¶
StaticSamples
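The layout expected by from_numpy can be sketched with NumPy and pandas alone. The snippet below is only an illustration of the documented defaults for sample_index and feature_index, not TemporAI's implementation. The commented-out call at the end uses the signature listed above and has not been run here.

```python
import numpy as np
import pandas as pd

# Two samples (dim 0) with three features each (dim 1).
array = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# Documented defaults when sample_index / feature_index are None:
# [0, 1, ...] for samples and ["feat_0", "feat_1", ...] for features.
sample_index = list(range(array.shape[0]))
feature_index = [f"feat_{i}" for i in range(array.shape[1])]

# Equivalent tabular view: rows are samples, columns are features.
df = pd.DataFrame(array, index=sample_index, columns=feature_index)
print(df)

# With TemporAI installed, the corresponding call would presumably be:
# from tempor.data.samples import StaticSamples
# samples = StaticSamples.from_numpy(array, sample_index=sample_index, feature_index=feature_index)
```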
- numpy(**kwargs: Any) ndarray[source]¶
Return the data as a numpy.ndarray.
- Parameters:¶
- **kwargs : Any
Any additional keyword arguments. Currently unused.
- Returns:¶
The numpy.ndarray.
- Return type:¶
np.ndarray
- dataframe(**kwargs: Any) DataFrame[source]¶
Return the data as a pandas.DataFrame.
- category : ClassVar[plugin_typing.PluginCategory] = 'static_samples'¶
Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.
- name : ClassVar[plugin_typing.PluginName] = 'static_samples_df'¶
Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.
- plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'¶
Plugin type, such as 'method'. May optionally be set by the plugin class using @register_plugin; otherwise the default plugin type is set.
- tempor.data.samples.workaround_pandera_pd2_1_0_multiindex_compatibility(schema: DataFrameSchema, data: DataFrame) Generator[source]¶
A version compatibility issue exists between pandera and pandas 2.1.0, as reported here: https://github.com/unionai-oss/pandera/issues/1328
The error pertains to multiindex uniqueness validation giving an unexpected error.
This is a workaround that will “manually” throw an error that is expected from pandera.
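As a rough illustration of what a "manual" multiindex uniqueness check can look like, here is a minimal pandas-only sketch. It is not the library's workaround: the helper name check_multiindex_unique, the index level names and the use of ValueError are all assumptions made for this example.

```python
import pandas as pd


def check_multiindex_unique(data: pd.DataFrame) -> None:
    """Hypothetical helper: raise if any (sample, timestep) index pair repeats."""
    duplicated = data.index[data.index.duplicated(keep=False)]
    if len(duplicated) > 0:
        raise ValueError(f"Duplicate index entries found: {sorted(set(duplicated))}")


# Illustrative data: the pair ("a", 1) appears twice.
index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 1)],
    names=["sample_idx", "time_idx"],  # level names are illustrative
)
df = pd.DataFrame({"feat_0": [1.0, 2.0, 3.0]}, index=index)

try:
    check_multiindex_unique(df)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```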
- class tempor.data.samples.TimeSeriesSamples(data: DataFrame | ndarray, *, padding_indicator: Any = None, sample_index: list[int] | list[str] | None = None, time_indexes: list[list[float]] | list[list[int]] | list[list[Timestamp]] | None = None, feature_index: list[str] | None = None, **kwargs: Any)[source]¶
Bases: TimeSeriesSamplesBase
Create a TimeSeriesSamples object from the data.
If data is a pandas.DataFrame, this should be a 2-level multiindex (sample, timestep) dataframe.
If data is a numpy.ndarray, this should be a 3D array, with dimensions (sample, timestep, feature). Optionally, padding values of padding_indicator can be set inside the array to pad out the length of arrays of different samples in case they differ. Padding needs to go at the end of the timesteps (dim 1). Padding must be the same across the feature dimension (dim 2) for each sample.
- Parameters:¶
- data : data_typing.DataContainer¶
A container with the data.
- padding_indicator : Any, optional¶
Padding indicator used in data to indicate padding. Defaults to None.
- sample_index : Optional[data_typing.SampleIndex], optional¶
Used only if data is a numpy.ndarray. List with sample (row) index for each sample. Optional, if None, will be of form [0, 1, ...]. Defaults to None.
- time_indexes : Optional[data_typing.TimeIndexList], optional¶
Used only if data is a numpy.ndarray. List of lists containing timesteps for each sample (outer list should be the same length as dim 0 of data, inner list should contain as many elements as each sample has timesteps). Optional, if None, will be of form [[0, 1, ...], [0, 1, ...], ...]. Defaults to None.
- feature_index : Optional[data_typing.FeatureIndex], optional¶
Used only if data is a numpy.ndarray. List with feature (column) index for each feature. Optional, if None, will be of form ["feat_0", "feat_1", ...]. Defaults to None.
- **kwargs : Any
Any additional keyword arguments to pass to the constructor.
- static from_dataframe(dataframe: DataFrame, **kwargs: Any) TimeSeriesSamples[source]¶
Create TimeSeriesSamples from pandas.DataFrame. The row index of the dataframe should be a 2-level multiindex (sample, timestep). The columns should be the features.
- Parameters:¶
- dataframe : pd.DataFrame¶
The dataframe that contains the data.
- **kwargs : Any
Any additional keyword arguments to pass to the constructor.
- Returns:¶
The TimeSeriesSamples object created from the dataframe.
- Return type:¶
TimeSeriesSamples
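A 2-level (sample, timestep) multiindex dataframe of the kind described above can be built as follows. This is a small sketch: the level names, sample labels and feature names are illustrative assumptions, and the commented-out TemporAI call follows the signature listed above without having been executed here.

```python
import pandas as pd

# Row index: level 0 identifies the sample, level 1 the timestep.
# Samples may have different numbers of timesteps (3 for "s0", 2 for "s1").
index = pd.MultiIndex.from_tuples(
    [("s0", 0), ("s0", 1), ("s0", 2), ("s1", 0), ("s1", 1)],
    names=["sample_idx", "time_idx"],  # illustrative level names
)

# Columns are the features.
df = pd.DataFrame(
    {"feat_0": [0.1, 0.2, 0.3, 1.1, 1.2], "feat_1": [10, 20, 30, 11, 21]},
    index=index,
)
print(df)

# With TemporAI installed, the corresponding call would presumably be:
# from tempor.data.samples import TimeSeriesSamples
# samples = TimeSeriesSamples.from_dataframe(df)
```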
- static from_numpy(array: ndarray, *, padding_indicator: Any | None = None, sample_index: list[int] | list[str] | None = None, time_indexes: list[list[float]] | list[list[int]] | list[list[Timestamp]] | None = None, feature_index: list[str] | None = None, **kwargs: Any) TimeSeriesSamples[source]¶
Create TimeSeriesSamples from numpy.ndarray.
This should be a 3D array, with dimensions (sample, timestep, feature).
Optionally, padding values of padding_indicator can be set inside the array to pad out the length of arrays of different samples in case they differ. Padding needs to go at the end of the timesteps (dim 1). Padding must be the same across the feature dimension (dim 2) for each sample.
- Parameters:¶
- array : np.ndarray¶
The array that contains the data.
- padding_indicator : Any, optional¶
The padding indicator value. Defaults to None.
- sample_index : Optional[data_typing.SampleIndex], optional¶
Sample indexes as a list. Defaults to None.
- time_indexes : Optional[data_typing.TimeIndexList], optional¶
Time indexes as a list of lists (that is, time indexes per sample). Defaults to None.
- feature_index : Optional[data_typing.FeatureIndex], optional¶
Feature indexes as a list. Defaults to None.
- **kwargs : Any
Any additional keyword arguments.
- Returns:¶
The TimeSeriesSamples object created from the array.
- Return type:¶
TimeSeriesSamples
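The padding rules above (padding at the end of dim 1, identical across dim 2) can be illustrated by assembling a padded 3D array from series of different lengths. This is a sketch under assumptions: the padding value -1.0 and the sample data are made up for the example, and the commented-out call has not been run here.

```python
import numpy as np

PAD = -1.0  # hypothetical padding indicator chosen for this example

# Two samples with two features each but different numbers of timesteps.
series = [
    np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]),  # 3 timesteps
    np.array([[4.0, 40.0]]),                            # 1 timestep
]

max_len = max(s.shape[0] for s in series)
n_features = series[0].shape[1]

# Start from an array filled with the padding value, then copy each sample
# into the leading timesteps so that padding only appears at the end of dim 1
# and covers every feature (dim 2) of a padded timestep.
array = np.full((len(series), max_len, n_features), PAD)
for i, s in enumerate(series):
    array[i, : s.shape[0], :] = s

print(array.shape)

# With TemporAI installed, the corresponding call would presumably be:
# from tempor.data.samples import TimeSeriesSamples
# samples = TimeSeriesSamples.from_numpy(array, padding_indicator=PAD)
```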
- numpy(*, padding_indicator: Any = 999.0, **kwargs: Any) ndarray[source]¶
Return the data as a numpy.ndarray.
- Parameters:¶
- padding_indicator : Any, optional¶
Padding indicator value. Defaults to DATA_SETTINGS.default_padding_indicator.
- **kwargs : Any
Any additional keyword arguments. Currently unused.
- Returns:¶
The numpy.ndarray.
- Return type:¶
np.ndarray
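To make the role of padding_indicator concrete, the sketch below converts a (sample, timestep)-indexed dataframe into a padded (sample, timestep, feature) array using pandas and NumPy only. The helper to_padded_array is hypothetical and is not the library's implementation. It only assumes the default value 999.0 shown in the signature above.

```python
import numpy as np
import pandas as pd


def to_padded_array(df: pd.DataFrame, padding_indicator: float = 999.0) -> np.ndarray:
    """Hypothetical helper: pad every sample to the longest sample's length."""
    # One 2D (timestep, feature) block per sample, in order of appearance.
    groups = [g.to_numpy(dtype=float) for _, g in df.groupby(level=0, sort=False)]
    max_len = max(g.shape[0] for g in groups)
    out = np.full((len(groups), max_len, df.shape[1]), padding_indicator, dtype=float)
    for i, g in enumerate(groups):
        out[i, : g.shape[0], :] = g  # padding stays at the end of dim 1
    return out


index = pd.MultiIndex.from_tuples(
    [("s0", 0), ("s0", 1), ("s0", 2), ("s1", 0), ("s1", 1)],
    names=["sample_idx", "time_idx"],  # illustrative level names
)
df = pd.DataFrame(
    {"feat_0": [1.0, 2.0, 3.0, 4.0, 5.0], "feat_1": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=index,
)

padded = to_padded_array(df)
print(padded.shape)
```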
- dataframe(**kwargs: Any) DataFrame[source]¶
Return the data as a pandas.DataFrame.
- Parameters:¶
- **kwargs : Any
Any additional keyword arguments. Currently unused.
- Returns:¶
The pandas.DataFrame.
- Return type:¶
pd.DataFrame
- time_indexes() list[list[float]] | list[list[int]] | list[list[Timestamp]][source]¶
Get a list containing time indexes for each sample. Each time index is represented as a list of time step elements.
- time_indexes_as_dict() dict[int, list[float] | list[int] | list[Timestamp]] | dict[str, list[float] | list[int] | list[Timestamp]][source]¶
Get a dictionary mapping each sample index to its time index. Time index is represented as a list of time step elements.
- time_indexes_float() list[ndarray][source]¶
Return the time indexes with their elements converted to float values.
A date-time time index will be converted using datetime_time_index_to_float.
- Returns:¶
List of 1D numpy.ndarrays of float values, corresponding to the time index.
- Return type:¶
List[np.ndarray]
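The conversion described above can be sketched as follows. The behaviour of datetime_time_index_to_float is not specified here, so the stand-in below assumes seconds since the Unix epoch purely for illustration. The function name time_index_to_float is hypothetical.

```python
import numpy as np
import pandas as pd


def time_index_to_float(time_index: list) -> np.ndarray:
    """Hypothetical stand-in; the unit (seconds since epoch) is an assumption."""
    if len(time_index) > 0 and isinstance(time_index[0], pd.Timestamp):
        return np.array([t.timestamp() for t in time_index], dtype=float)
    # Integer or float time steps only need a dtype conversion.
    return np.asarray(time_index, dtype=float)


# One integer time index and one date-time time index, for illustration.
int_index = [0, 1, 2]
datetime_index = [
    pd.Timestamp("1970-01-01 00:00:00", tz="UTC"),
    pd.Timestamp("1970-01-01 00:01:00", tz="UTC"),
]

floats = [time_index_to_float(ti) for ti in (int_index, datetime_index)]
print(floats)
```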
- num_timesteps_as_dict() dict[int, int] | dict[str, int][source]¶
Get a dictionary mapping each sample index to its number of timesteps.
- num_timesteps_equal() bool[source]¶
Returns True if all samples share the same number of timesteps, False otherwise.
- list_of_dataframes() list[DataFrame][source]¶
Returns a list of dataframes where each dataframe has the data for each sample.
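The quantities returned by num_timesteps_as_dict(), num_timesteps_equal() and list_of_dataframes() can be approximated on a plain multiindex dataframe with pandas groupby. This is an illustrative sketch with made-up data and level names, not the library's implementation.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("s0", 0), ("s0", 1), ("s0", 2), ("s1", 0), ("s1", 1)],
    names=["sample_idx", "time_idx"],  # illustrative level names
)
df = pd.DataFrame({"feat_0": [0.1, 0.2, 0.3, 1.1, 1.2]}, index=index)

# Sample index -> number of timesteps.
num_timesteps = df.groupby(level=0).size().to_dict()

# True only if every sample has the same number of timesteps.
all_equal = len(set(num_timesteps.values())) == 1

# One dataframe per sample.
per_sample = [group for _, group in df.groupby(level=0)]

print(num_timesteps, all_equal, len(per_sample))
```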
- category : ClassVar[plugin_typing.PluginCategory] = 'time_series_samples'¶
Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.
- name : ClassVar[plugin_typing.PluginName] = 'time_series_samples_df'¶
Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.
- plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'¶
Plugin type, such as 'method'. May optionally be set by the plugin class using @register_plugin; otherwise the default plugin type is set.
- class tempor.data.samples.EventSamples(data: DataFrame | ndarray, *, sample_index: list[int] | list[str] | None = None, feature_index: list[str] | None = None, **kwargs: Any)[source]¶
Bases: EventSamplesBase
Create an EventSamples object from the data.
- Parameters:¶
- data : data_typing.DataContainer¶
A container with the data.
- sample_index : Optional[data_typing.SampleIndex], optional¶
Used only if data is a numpy.ndarray. List with sample (row) index for each sample. Optional, if None, will be of form [0, 1, ...]. Defaults to None.
- feature_index : Optional[data_typing.FeatureIndex], optional¶
Used only if data is a numpy.ndarray. List with feature (column) index for each feature. Optional, if None, will be of form ["feat_0", "feat_1", ...]. Defaults to None.
- **kwargs : Any
Any additional keyword arguments to pass to the constructor.
- static from_dataframe(dataframe: DataFrame, **kwargs: Any) EventSamples[source]¶
Create EventSamples from pandas.DataFrame. The row index of the dataframe should be the sample indexes. The columns should be the features. Each feature should contain a tuple of (time, value) representing the event.
- Parameters:¶
- dataframe : pd.DataFrame¶
The dataframe that contains the data.
- **kwargs : Any
Any additional keyword arguments to pass to the constructor.
- Returns:¶
The EventSamples object created from the dataframe.
- Return type:¶
EventSamples
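An event dataframe in the layout described above, with one (time, value) tuple per cell, might look like this. The feature names, sample labels and the use of boolean event values are illustrative assumptions. The commented-out call follows the signature listed above and has not been run here.

```python
import pandas as pd

# Rows are samples, columns are event features, each cell is (time, value).
df = pd.DataFrame(
    {
        "death": [(12.5, True), (30.0, False)],
        "relapse": [(3.0, True), (30.0, False)],
    },
    index=["s0", "s1"],
)
print(df)

# With TemporAI installed, the corresponding call would presumably be:
# from tempor.data.samples import EventSamples
# samples = EventSamples.from_dataframe(df)
```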
- static from_numpy(array: ndarray, *, sample_index: list[int] | list[str] | None = None, feature_index: list[str] | None = None, **kwargs: Any) EventSamples[source]¶
Create EventSamples from numpy.ndarray. The array should be a 2D array, with dimensions (sample, feature). Each element should contain a tuple of (time, value) representing the event.
- Parameters:¶
- array : np.ndarray¶
The array that contains the data.
- sample_index : Optional[data_typing.SampleIndex], optional¶
Sample indexes. Defaults to None.
- feature_index : Optional[data_typing.FeatureIndex], optional¶
Feature index. Defaults to None.
- **kwargs : Any
Any additional keyword arguments to pass to the constructor.
- Returns:¶
The EventSamples object created from the array.
- Return type:¶
EventSamples
- numpy(**kwargs: Any) ndarray[source]¶
Return the data as a numpy.ndarray.
- Parameters:¶
- **kwargs : Any
Any additional keyword arguments. Currently unused.
- Returns:¶
The numpy.ndarray.
- Return type:¶
np.ndarray
- dataframe(**kwargs: Any) DataFrame[source]¶
Return the data as a pandas.DataFrame.
- Parameters:¶
- **kwargs : Any
Any additional keyword arguments. Currently unused.
- Returns:¶
The pandas.DataFrame.
- Return type:¶
pd.DataFrame
- category : ClassVar[plugin_typing.PluginCategory] = 'event_samples'¶
Plugin category, such as 'prediction.one_off.classification'. Must be set by the plugin class using @register_plugin.
- name : ClassVar[plugin_typing.PluginName] = 'event_samples_df'¶
Plugin name, such as 'my_nn_classifier'. Must be set by the plugin class using @register_plugin.
- plugin_type : ClassVar[plugin_typing.PluginTypeArg] = 'dataformat'¶
Plugin type, such as 'method'. May optionally be set by the plugin class using @register_plugin; otherwise the default plugin type is set.
- split(time_feature_suffix: str = '_time') DataFrame[source]¶
Return a pandas.DataFrame where the time component of each event feature has been split off to its own column. The new columns that contain the times will be named "<original column name><time_feature_suffix>" and will be inserted before each corresponding <original column name> column. The <original column name> columns will contain only the event value.
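The column layout that split() describes can be reproduced on a plain dataframe of (time, value) tuples. The function split_events below is a hypothetical pandas-only re-implementation for illustration, with made-up feature names. It is not the library's code.

```python
import pandas as pd


def split_events(df: pd.DataFrame, time_feature_suffix: str = "_time") -> pd.DataFrame:
    """Hypothetical sketch: put each event's time in its own column, placed first."""
    columns = {}
    for col in df.columns:
        # The time column is inserted before the corresponding value column.
        columns[f"{col}{time_feature_suffix}"] = df[col].map(lambda event: event[0])
        columns[col] = df[col].map(lambda event: event[1])
    return pd.DataFrame(columns, index=df.index)


events = pd.DataFrame(
    {
        "death": [(12.5, True), (30.0, False)],
        "relapse": [(3.0, True), (30.0, False)],
    },
    index=["s0", "s1"],
)

split_df = split_events(events)
print(split_df)
```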
- split_as_two_dataframes(time_feature_suffix: str = '_time') tuple[DataFrame, DataFrame][source]¶
Analogous to split() but returns two pandas.DataFrames: