tempor.data.pandera_utils module

Utilities for pandera validation.

tempor.data.pandera_utils.update_schema(schema: DataFrameSchema, **kwargs: Any) DataFrameSchema[source]

Update a pandera dataframe schema with kwargs.

Parameters:
schema : pa.DataFrameSchema

pandera dataframe schema.

**kwargs : Any

keyword arguments to update schema with.

Returns:

The updated pandera dataframe schema.

Return type:

pa.DataFrameSchema

tempor.data.pandera_utils.update_index(index: Index, **kwargs: Any) Index[source]

Update a pandera index with kwargs.

Parameters:
index : pa.Index

pandera index.

**kwargs : Any

keyword arguments to update index with.

Returns:

The updated pandera index.

Return type:

pa.Index

tempor.data.pandera_utils.update_multiindex(multi_index: MultiIndex, **kwargs: Any) MultiIndex[source]

Update a pandera multiindex with kwargs.

Parameters:
multi_index : pa.MultiIndex

pandera multiindex.

**kwargs : Any

keyword arguments to update multi_index with.

Returns:

The updated pandera multiindex.

Return type:

pa.MultiIndex

tempor.data.pandera_utils.PA_DTYPE_MAP : dict[type | Literal[category] | Literal[datetime], DataType] = {<class 'bool'>: DataType(bool), <class 'int'>: DataType(int), <class 'float'>: DataType(float), <class 'str'>: DataType(string), 'category': DataType(category), 'datetime': DataType(timestamp)}

A mapping from a dtype specifier (a Python type, or the string 'category' or 'datetime') to the corresponding pandera.DataType.

tempor.data.pandera_utils.get_pa_dtypes(dtypes: Iterable[type | Literal[category] | Literal[datetime] | DataType]) list[DataType][source]

Return a list of pandera.DataType corresponding to dtypes. Raises KeyError if a dtype is not found.
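The lookup described above can be sketched with a plain dictionary. This is a minimal stand-in, not the module's implementation: DTYPE_MAP_SKETCH and get_dtypes_sketch are hypothetical names, the values are plain strings rather than pandera.DataType objects, and the handling of inputs that are already DataType instances is not modelled.

```python
from typing import Any, Iterable

# Hypothetical stand-in for PA_DTYPE_MAP: the keys mirror the documented ones,
# but the values are plain strings instead of pandera.DataType objects.
DTYPE_MAP_SKETCH = {
    bool: "bool",
    int: "int",
    float: "float",
    str: "string",
    "category": "category",
    "datetime": "timestamp",
}


def get_dtypes_sketch(dtypes: Iterable[Any]) -> list:
    """Map each dtype specifier to its entry; an unknown specifier raises KeyError."""
    return [DTYPE_MAP_SKETCH[d] for d in dtypes]


resolved = get_dtypes_sketch([int, "datetime"])

try:
    get_dtypes_sketch([bytes])  # bytes is not a key of the mapping
    unknown_raises = False
except KeyError:
    unknown_raises = True
```

Raising KeyError on an unknown specifier, rather than silently skipping it, surfaces a misconfigured dtype list at the point of lookup.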

class tempor.data.pandera_utils.UnionDtype(dtype: Any)[source]

Bases: DataType

Extend pandera DataTypes with a custom UnionDtype, which functions similarly to Union.

See pandera DataType [guide](https://pandera.readthedocs.io/en/stable/dtypes.html) for details.

In this case, rather than wrapping the extension DataType with register_dtype and immutable decorators, we apply these directly to the class returned by __class_getitem__, which dynamically creates the union specified with its dtypes. In this way, pandera’s pandas engine correctly registers each new kind of union as a different dtype.

union_dtypes : list

The list of types in the union.

name : str

The string representation of the data type used for repr.

check(pandera_dtype: DataType, data_container: Any | None = None) bool | Iterable[bool][source]

Checks whether the pandera_dtype and, optionally, the data_container satisfy at least one of the union’s union_dtypes.

Parameters:
pandera_dtype : pa_dtypes.DataType

The data type received as part of the check/validation.

data_container : Any

The data container received as part of the check/validation. Defaults to None.

Returns:

A bool stating whether the data type is satisfied, or an iterable thereof (for each item in the data_container).

Return type:

Union[bool, Iterable[bool]]

coerce(data_container: Any) NoReturn[source]

The coerce method is not supported and will raise a NotImplementedError.

tempor.data.pandera_utils.init_schema(data: DataFrame, **kwargs: Any) DataFrameSchema[source]

Initialize a pandera.DataFrameSchema from data using pandera.infer_schema.

Parameters:
data : pd.DataFrame

Input dataframe.

**kwargs : Any

Keyword arguments to update the schema with after initialization.

Returns:

pandera.DataFrameSchema initialized from data.

Return type:

pa.DataFrameSchema

tempor.data.pandera_utils.add_df_checks(schema: DataFrameSchema, *, checks_list: list[Check]) DataFrameSchema[source]

Update schema with pandera checks specified in checks_list.

Parameters:
schema : pa.DataFrameSchema

DataFrameSchema to add checks to.

checks_list : List[pa.Check]

The list of checks.

Returns:

DataFrameSchema with checks added.

Return type:

pa.DataFrameSchema

tempor.data.pandera_utils.add_regex_column_checks(schema: DataFrameSchema, *, regex: str = '.*', dtype: Any, nullable: bool, checks_list: list[Check] | None = None) DataFrameSchema[source]

Update schema with checks specified in checks_list, applied to all columns specified by regex. dtype and nullable can also be specified and will apply to all columns.

tempor.data.pandera_utils.set_up_index(schema: DataFrameSchema, data: DataFrame, *, dtype: Any, name: str, nullable: bool, unique: bool, coerce: bool, checks_list: list[Check] | None = None) tuple[DataFrameSchema, DataFrame][source]

Update schema.index (pandera.Index) with dtype, name, nullable, … schema settings.

In addition, set the index name of data (pandas.DataFrame) to name.

Returns the schema and the dataframe.

tempor.data.pandera_utils.set_up_2level_multiindex(schema: DataFrameSchema, data: DataFrame, *, dtypes: tuple[Any, Any], names: tuple[str, str], nullable: tuple[bool, bool], coerce: bool, unique: tuple[str, ...], checks_list: tuple[list[Check], list[Check]] | None = None) tuple[DataFrameSchema, DataFrame][source]

Update schema.index (pandera.MultiIndex), which is expected to have 2 levels, with dtypes, names, nullable, … schema settings.

In addition, set the index names of data (pandas.DataFrame) to names.

Returns the schema and the dataframe.

class tempor.data.pandera_utils.checks[source]

Bases: object

Namespace containing reusable pandera.Check objects.

forbid_multiindex_index = <Check <lambda>: MultiIndex Index not allowed>
forbid_multiindex_columns = <Check <lambda>: MultiIndex Columns not allowed>
require_2level_multiindex_index = <Check <lambda>: Index must be a MultiIndex with 2 levels>
require_element_len_2 = <Check <lambda>: Each item must contain a sequence of length 2>
class configurable[source]

Bases: object

Namespace containing functions to get configurable pandera.Check objects.

static column_index_satisfies_dtype(dtype: Any, *, nullable: bool) Check[source]

Return a pandera.Check that checks that the column index satisfies dtype. Optionally, also set the nullable attribute.

Parameters:
dtype : Any

The dtype to check against.

nullable : bool

The nullable attribute to set.

Returns:

The pandera.Check defined.

Return type:

pa.Check