tempor.data.pandera_utils module

Utilities for pandera validation.

tempor.data.pandera_utils.update_schema(schema: DataFrameSchema, **kwargs: Any) → DataFrameSchema

Update a pandera dataframe schema with kwargs.

Parameters:

schema : pa.DataFrameSchema: The pandera dataframe schema to update.
**kwargs : Any: Keyword arguments to update schema with.

Returns:

The updated pandera dataframe schema.

Return type:

pa.DataFrameSchema

tempor.data.pandera_utils.update_index(index: Index, **kwargs: Any) → Index

Update a pandera index with kwargs.

Parameters:

index : pa.Index: The pandera index to update.
**kwargs : Any: Keyword arguments to update index with.

Returns:

The updated pandera index.

Return type:

pa.Index

tempor.data.pandera_utils.update_multiindex(multi_index: MultiIndex, **kwargs: Any) → MultiIndex

Update a pandera multiindex with kwargs.

Parameters:

multi_index : pa.MultiIndex: The pandera multiindex to update.
**kwargs : Any: Keyword arguments to update multi_index with.

Returns:

The updated pandera multiindex.

Return type:

pa.MultiIndex

tempor.data.pandera_utils.PA_DTYPE_MAP : dict[type | Literal[category] | Literal[datetime], DataType] = {<class 'bool'>: DataType(bool), <class 'int'>: DataType(int), <class 'float'>: DataType(float), <class 'str'>: DataType(string), 'category': DataType(category), 'datetime': DataType(timestamp)}: A mapping from dtype specified as Dtype to a pandera.DataType.

tempor.data.pandera_utils.get_pa_dtypes(dtypes: Iterable[type | Literal[category] | Literal[datetime] | DataType]) → list[DataType]: Return a list of pandera.DataType objects corresponding to dtypes. Raises KeyError if a dtype is not found.
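
The lookup described above can be illustrated without pandera. This is a minimal sketch that assumes each entry is resolved through a mapping shaped like PA_DTYPE_MAP and that entries which are already data types pass through unchanged. The names FakeDataType, DTYPE_MAP and lookup_dtypes are hypothetical stand-ins, not part of tempor or pandera.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class FakeDataType:
    """Hypothetical stand-in for pandera.DataType (not the real class)."""

    name: str


# Hypothetical stand-in that mirrors the keys of PA_DTYPE_MAP.
DTYPE_MAP: Dict[Any, FakeDataType] = {
    bool: FakeDataType("bool"),
    int: FakeDataType("int"),
    float: FakeDataType("float"),
    str: FakeDataType("string"),
    "category": FakeDataType("category"),
    "datetime": FakeDataType("timestamp"),
}


def lookup_dtypes(dtypes: Iterable[Any]) -> List[FakeDataType]:
    """Resolve each entry to a data type; unknown keys raise KeyError."""
    result = []
    for dtype in dtypes:
        if isinstance(dtype, FakeDataType):
            # Assumption: entries that are already data types pass through.
            result.append(dtype)
        else:
            # A plain dict lookup raises KeyError for unsupported keys.
            result.append(DTYPE_MAP[dtype])
    return result
```

Under these assumptions, lookup_dtypes([int, "datetime"]) resolves both entries, while an unsupported key such as bytes raises KeyError.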

class tempor.data.pandera_utils.UnionDtype(dtype: Any)

Bases: DataType

Extends pandera DataTypes with a custom UnionDtype, which functions similarly to Union.

See pandera DataType [guide](https://pandera.readthedocs.io/en/stable/dtypes.html) for details.

In this case, rather than wrapping the extension DataType with register_dtype and immutable decorators, we apply these directly to the class returned by __class_getitem__, which dynamically creates the union specified with its dtypes. In this way, pandera’s pandas engine correctly registers each new kind of union as a different dtype.
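
The dynamic class creation described above can be sketched in plain Python. This is an illustration of the __class_getitem__ pattern only: UnionSketch is a hypothetical stand-in, and the pandera register_dtype and immutable decorators are not applied here.

```python
from typing import Any, Tuple


class UnionSketch:
    """Hypothetical stand-in illustrating the pattern; not tempor's UnionDtype."""

    union_dtypes: Tuple[type, ...] = ()

    def __class_getitem__(cls, item: Any) -> type:
        # Subscripting with several types passes a tuple; one type passes the type itself.
        dtypes = item if isinstance(item, tuple) else (item,)
        name = "Union[" + ", ".join(t.__name__ for t in dtypes) + "]"
        # Build a distinct subclass for this combination of dtypes. Per the
        # description above, the real class would apply the register_dtype and
        # immutable decorators to the class created at this point.
        return type(name, (cls,), {"union_dtypes": dtypes, "name": name})


IntOrFloat = UnionSketch[int, float]
StrOnly = UnionSketch[str]
```

Each subscription yields a separate class, which is what allows every kind of union to be treated as its own dtype.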

union_dtypes : list: The list of types in the union.

name : str: The string representation of the data type used for repr.

check(pandera_dtype: DataType, data_container: Any | None = None) → bool | Iterable[bool]

Checks whether the pandera_dtype and, optionally, the data_container satisfy at least one of the union’s union_dtypes.

Parameters:

pandera_dtype : pa_dtypes.DataType: The data type received as part of the check/validation.
data_container : Any: The data container received as part of the check/validation. Defaults to None.

Returns:

A bool stating whether the data type is satisfied, or an iterable thereof (for each item in the data_container).

Return type:

Union[bool, Iterable[bool]]
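
The "at least one member of the union" rule and the two return shapes can be sketched as follows. This is a simplified illustration that uses plain Python types in place of pandera data types; satisfies_union is a hypothetical helper and does not reproduce the actual check logic.

```python
from typing import Any, Iterable, List, Optional, Tuple, Union


def satisfies_union(
    dtype: type,
    union_dtypes: Tuple[type, ...],
    data_container: Optional[Iterable[Any]] = None,
) -> Union[bool, List[bool]]:
    """Return one bool for the dtype, or one bool per item if a container is given."""
    if data_container is None:
        # The dtype passes if it matches at least one member of the union.
        return any(issubclass(dtype, member) for member in union_dtypes)
    # With a container, report per item whether it matches any member.
    return [isinstance(item, union_dtypes) for item in data_container]
```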

coerce(data_container: Any) → NoReturn: The coerce method is not supported and raises NotImplementedError.

tempor.data.pandera_utils.init_schema(data: DataFrame, **kwargs: Any) → DataFrameSchema

Initialize a pandera.DataFrameSchema from data using pandera.infer_schema.

Parameters:

data : pd.DataFrame: Input dataframe.
**kwargs : Any: Keyword arguments to update the schema with after initialization.

Returns:

pandera.DataFrameSchema initialized from data.

Return type:

pa.DataFrameSchema

tempor.data.pandera_utils.add_df_checks(schema: DataFrameSchema, *, checks_list: list[Check]) → DataFrameSchema

Update schema with pandera checks specified in checks_list.

Parameters:

schema : pa.DataFrameSchema: DataFrameSchema to add checks to.
checks_list : List[pa.Check]: The list of checks.

Returns:

DataFrameSchema with checks added.

Return type:

pa.DataFrameSchema

tempor.data.pandera_utils.add_regex_column_checks(schema: DataFrameSchema, *, regex: str = '.*', dtype: Any, nullable: bool, checks_list: list[Check] | None = None) → DataFrameSchema: Update schema with the checks specified in checks_list, applied to all columns matched by regex. The dtype and nullable settings are also applied to all matched columns.

tempor.data.pandera_utils.set_up_index(schema: DataFrameSchema, data: DataFrame, *, dtype: Any, name: str, nullable: bool, unique: bool, coerce: bool, checks_list: list[Check] | None = None) → tuple[DataFrameSchema, DataFrame]

Update schema.index (pandera.Index) with dtype, name, nullable, … schema settings.

In addition, set the index name of data (pandas.DataFrame) to name.

Returns the schema and the dataframe.

tempor.data.pandera_utils.set_up_2level_multiindex(schema: DataFrameSchema, data: DataFrame, *, dtypes: tuple[Any, Any], names: tuple[str, str], nullable: tuple[bool, bool], coerce: bool, unique: tuple[str, ...], checks_list: tuple[list[Check], list[Check]] | None = None) → tuple[DataFrameSchema, DataFrame]

Update schema.index (pandera.MultiIndex), which is expected to have 2 levels, with dtypes, names, nullable, … schema settings.

In addition, set the index names of data (pandas.DataFrame) to names.

Returns the schema and the dataframe.

class tempor.data.pandera_utils.checks

Bases: object

Namespace containing reusable pandera.Check objects.

forbid_multiindex_index = <Check <lambda>: MultiIndex Index not allowed>

forbid_multiindex_columns = <Check <lambda>: MultiIndex Columns not allowed>

require_2level_multiindex_index = <Check <lambda>: Index must be a MultiIndex with 2 levels>

require_element_len_2 = <Check <lambda>: Each item must contain a sequence of length 2>

class configurable

Bases: object

Namespace containing functions that return configurable pandera.Check objects.

static column_index_satisfies_dtype(dtype: Any, *, nullable: bool) → Check

Return a pandera.Check that verifies that the column index satisfies dtype. The nullable attribute is also set.

Parameters:

dtype : Any: The dtype to check against.
nullable : bool: The nullable attribute to set.

Returns:

The pandera.Check defined.

Return type:

pa.Check