tempor.benchmarks.evaluation module

Module with helpers for evaluating the performance of the methods.

tempor.benchmarks.evaluation.OutputMetric

Summary statistics and other information reported for each metric across the cross-validation runs.

Possible values:

"min":
The min score of the metric
"max":
The max score of the metric
"mean":
The mean score of the metric
"stddev":
The standard deviation of the metric scores
"median":
The median score of the metric
"iqr":
The interquartile range of the metric
"rounds":
Number of folds
"errors":
Number of errors encountered
"durations":
Average duration of a fold evaluation

alias of Literal["min", "max", "mean", "stddev", "median", "iqr", "rounds", "errors", "durations"]

tempor.benchmarks.evaluation.output_metrics = ('min', 'max', 'mean', 'stddev', 'median', 'iqr', 'rounds', 'errors', 'durations'): A tuple of all possible values of OutputMetric.
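To make the meaning of each value concrete, here is a minimal, standard-library-only sketch that collapses per-fold scores into one value per OutputMetric. The helper `summarize_folds` is hypothetical, and the choice of population standard deviation and of the inclusive quartile method are assumptions; tempor may compute these statistics differently.

```python
import statistics
from typing import Dict, List

# Mirrors the documented tuple; redefined here so the sketch is self-contained.
output_metrics = ("min", "max", "mean", "stddev", "median", "iqr", "rounds", "errors", "durations")


def summarize_folds(scores: List[float], durations: List[float], errors: int = 0) -> Dict[str, float]:
    """Hypothetical helper: collapse per-fold scores into one row of OutputMetric values."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.fmean(scores),
        "stddev": statistics.pstdev(scores),  # assumption: population standard deviation
        "median": statistics.median(scores),
        "iqr": q3 - q1,  # assumption: inclusive quartile method
        "rounds": len(scores),
        "errors": errors,
        "durations": statistics.fmean(durations),
    }


# Three folds, matching the documented default n_splits=3.
row = summarize_folds(scores=[0.5, 0.75, 1.0], durations=[1.0, 2.0, 3.0])
```

In the dataframes returned by the helpers below, each such row would correspond to one metric, with the keys above as column names.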

tempor.benchmarks.evaluation.evaluate_prediction_oneoff_classifier(estimator: Any, data: PredictiveDataset, n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) → DataFrame

Helper for evaluating one-off classification tasks using cross-validation.

Parameters:

estimator : Any: Baseline model to evaluate; must be unfitted.
data : dataset.PredictiveDataset: The dataset.
n_splits : int, optional: Number of cross-validation folds. Defaults to 3.
random_state : int, optional: Random state. Defaults to 0.
raise_exceptions : bool, optional: Whether to raise exceptions during evaluation. If False, exceptions are swallowed and the evaluation continues; the exception count is reported in the "errors" column of the resulting dataframe. Defaults to False.
silence_warnings : bool, optional: Whether to silence warnings raised during evaluation. Defaults to False.
**kwargs : Any: Currently unused.
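The raise_exceptions behaviour described above can be pictured with a short, standard-library-only sketch. The names `run_folds`, `ok_fold`, and `failing_fold` are hypothetical and this is not tempor's implementation; in particular, how tempor records the score of a failed fold is not stated here, so the sketch simply skips it.

```python
from typing import Callable, List, Tuple


def run_folds(
    fold_fns: List[Callable[[], float]], raise_exceptions: bool = False
) -> Tuple[List[float], int]:
    """Hypothetical loop: evaluate each fold, either re-raising or counting failures."""
    scores: List[float] = []
    errors = 0
    for fold_fn in fold_fns:
        try:
            scores.append(fold_fn())
        except Exception:
            if raise_exceptions:
                raise  # propagate the first failure to the caller
            errors += 1  # swallow the failure and keep evaluating
    return scores, errors


def ok_fold() -> float:
    return 0.9


def failing_fold() -> float:
    raise ValueError("simulated failure in one fold")


# Default behaviour: the failure is counted, the remaining folds still run.
scores, errors = run_folds([ok_fold, failing_fold, ok_fold])
```

With `raise_exceptions=True`, the same call would stop at the second fold and raise the `ValueError`, which is usually more convenient while debugging an estimator.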

Returns:

DataFrame containing the results.

The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.

The index of the dataframe contains all the metrics registered:

>>> from tempor import plugin_loader
>>> plugin_loader.list(plugin_type="metric")["prediction"]["one_off"]["classification"]
[...]

Return type:

pd.DataFrame

tempor.benchmarks.evaluation.evaluate_prediction_oneoff_regressor(estimator: Any, data: PredictiveDataset, n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) → DataFrame

Helper for evaluating one-off regression tasks using cross-validation.

Parameters:

estimator : Any: Baseline model to evaluate; must be unfitted.
data : dataset.PredictiveDataset: The dataset.
n_splits : int, optional: Number of cross-validation folds. Defaults to 3.
random_state : int, optional: Random state. Defaults to 0.
raise_exceptions : bool, optional: Whether to raise exceptions during evaluation. If False, exceptions are swallowed and the evaluation continues; the exception count is reported in the "errors" column of the resulting dataframe. Defaults to False.
silence_warnings : bool, optional: Whether to silence warnings raised during evaluation. Defaults to False.
**kwargs : Any: Currently unused.
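As an illustration of how n_splits and random_state interact, the following standard-library-only sketch builds a seeded k-fold split. The function `kfold_indices` is hypothetical, and the splitting strategy tempor actually uses is not described on this page; the point is only that a fixed random state makes the folds reproducible.

```python
import random
from typing import List, Tuple


def kfold_indices(
    n_samples: int, n_splits: int = 3, random_state: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical seeded k-fold: return (train_indices, test_indices) per fold."""
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(random_state).shuffle(indices)
    # Deal the shuffled indices round-robin into n_splits folds.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    splits = []
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, test))
    return splits


splits_a = kfold_indices(n_samples=7, n_splits=3, random_state=0)
splits_b = kfold_indices(n_samples=7, n_splits=3, random_state=0)
```

Because both calls use the same random state, `splits_a` and `splits_b` are identical, which is what makes repeated evaluation runs comparable.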

Returns:

DataFrame containing the results.

The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.

Return type:

pd.DataFrame

tempor.benchmarks.evaluation.evaluate_time_to_event(estimator: Any, data: TimeToEventAnalysisDataset, horizons: list[float] | list[int] | list[Timestamp], n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) → DataFrame

Helper for evaluating time-to-event tasks.

Parameters:

estimator : Any: Baseline model to evaluate; must be unfitted.
data : dataset.TimeToEventAnalysisDataset: The dataset.
horizons : data_typing.TimeIndex: Time horizons at which to make predictions.
n_splits : int, optional: Number of cross-validation folds. Defaults to 3.
random_state : int, optional: Random state. Defaults to 0.
raise_exceptions : bool, optional: Whether to raise exceptions during evaluation. If False, exceptions are swallowed and the evaluation continues; the exception count is reported in the "errors" column of the resulting dataframe. Defaults to False.
silence_warnings : bool, optional: Whether to silence warnings raised during evaluation. Defaults to False.
**kwargs : Any: Currently unused.
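The effect of silence_warnings can be sketched with Python's built-in warnings module. The function `count_visible_warnings` is a hypothetical stand-in for a single fold evaluation, and the mechanism tempor uses internally is an assumption; the sketch only shows the observable difference between the two settings.

```python
import warnings


def count_visible_warnings(silence_warnings: bool = False) -> int:
    """Hypothetical stand-in for one fold evaluation that emits a warning."""
    with warnings.catch_warnings(record=True) as caught:
        # "ignore" drops every warning; "always" lets each one through.
        warnings.simplefilter("ignore" if silence_warnings else "always")
        warnings.warn("simulated convergence warning", RuntimeWarning)
    # Only warnings that passed the filter were recorded.
    return len(caught)


visible = count_visible_warnings(silence_warnings=False)
silenced = count_visible_warnings(silence_warnings=True)
```

Here `visible` counts the one warning that got through, while `silenced` is zero because the filter discarded it before it could be recorded.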

Returns:

DataFrame containing the results.

The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.

The index of the dataframe contains all the metrics registered:

>>> from tempor import plugin_loader
>>> plugin_loader.list(plugin_type="metric")["time_to_event"]
[...]

Return type:

pd.DataFrame