tempor.benchmarks.evaluation module

Module with helpers for evaluating the performance of the methods.

tempor.benchmarks.evaluation.OutputMetric

The output statistics, and other information, reported for each metric across the cross-validation runs.

Possible values:
  • "min":

    The minimum score of the metric

  • "max":

    The max score of the metric

  • "mean":

    The mean score of the metric

  • "stddev":

    The standard deviation of the metric scores

  • "median":

    The median score of the metric

  • "iqr":

    The interquartile range of the metric

  • "rounds":

    Number of folds

  • "errors":

    Number of errors encountered

  • "durations":

    Average duration for the fold evaluation.

alias of Literal[min, max, mean, stddev, median, iqr, rounds, errors, durations]

tempor.benchmarks.evaluation.output_metrics = ('min', 'max', 'mean', 'stddev', 'median', 'iqr', 'rounds', 'errors', 'durations')

A tuple of all possible values of OutputMetric.
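
As a rough illustration of what each OutputMetric value summarizes, the sketch below computes the nine statistics from a list of per-fold scores using only the Python standard library. The helper `summarize_folds` is hypothetical and not part of tempor. Whether the library uses the population or the sample standard deviation, and which quantile method it uses for the interquartile range, are assumptions made here.

```python
import statistics
from typing import Dict, List, Literal, Tuple, get_args

# Mirrors the documented alias; redefined locally so the sketch is self-contained.
OutputMetric = Literal["min", "max", "mean", "stddev", "median", "iqr", "rounds", "errors", "durations"]
output_metrics: Tuple[str, ...] = get_args(OutputMetric)


def summarize_folds(scores: List[float], durations: List[float], errors: int = 0) -> Dict[str, float]:
    """Hypothetical helper: summarize per-fold scores into one value per OutputMetric."""
    # Assumption: linear-interpolation ("inclusive") quartiles.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.fmean(scores),
        "stddev": statistics.pstdev(scores),  # assumption: population standard deviation
        "median": statistics.median(scores),
        "iqr": q3 - q1,
        "rounds": len(scores),
        "errors": errors,
        "durations": statistics.fmean(durations),  # average duration per fold
    }


summary = summarize_folds([0.70, 0.80, 0.90], durations=[1.0, 2.0, 3.0])
```

Each key of `summary` corresponds to one column of the dataframes returned by the evaluation helpers in this module.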

tempor.benchmarks.evaluation.evaluate_prediction_oneoff_classifier(estimator: Any, data: PredictiveDataset, n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) -> DataFrame

Helper for evaluating classifiers.

Parameters:
estimator : Any

Baseline model to evaluate - must be unfitted.

data : dataset.PredictiveDataset

The dataset.

n_splits : int, optional

Cross-validation folds. Defaults to 3.

random_state : int, optional

Random state. Defaults to 0.

raise_exceptions : bool, optional

Whether to raise exceptions during evaluation. If False, the exceptions will be swallowed and the evaluation will continue - exception count will be reported in the "errors" column of the resultant dataframe. Defaults to False.

silence_warnings : bool, optional

Whether to silence warnings raised. Defaults to False.

**kwargs : Any

Currently unused.

Returns:

DataFrame containing the results.

The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.

The index of the dataframe contains all the metrics registered:

>>> from tempor import plugin_loader
>>> plugin_loader.list(plugin_type="metric")["prediction"]["one_off"]["classification"]
[...]

Return type:

pd.DataFrame
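
The `raise_exceptions` behaviour described above can be pictured with a minimal sketch. This is not the library's implementation: `run_folds`, `good_fold` and `bad_fold` are hypothetical names, and each "fold" is reduced to a callable that returns a score or raises.

```python
from typing import Callable, Dict, List


def run_folds(fold_fns: List[Callable[[], float]], raise_exceptions: bool = False) -> Dict[str, object]:
    """Hypothetical loop: evaluate each fold, either re-raising or counting failures."""
    scores: List[float] = []
    errors = 0
    for fold_fn in fold_fns:
        try:
            scores.append(fold_fn())
        except Exception:
            if raise_exceptions:
                raise  # surface the failure to the caller immediately
            errors += 1  # swallow the exception and continue with the next fold
    return {"scores": scores, "errors": errors}


def good_fold() -> float:
    return 0.9


def bad_fold() -> float:
    raise ValueError("fold failed")


result = run_folds([good_fold, bad_fold, good_fold])
```

With `raise_exceptions=False` the failing fold only increments the error count. With `raise_exceptions=True` the same `ValueError` would propagate out of `run_folds`.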

tempor.benchmarks.evaluation.evaluate_prediction_oneoff_regressor(estimator: Any, data: PredictiveDataset, n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) -> DataFrame

Helper for evaluating regression tasks.

Parameters:
estimator : Any

Baseline model to evaluate - must be unfitted.

data : dataset.PredictiveDataset

The dataset.

n_splits : int, optional

Cross-validation folds. Defaults to 3.

random_state : int, optional

Random state. Defaults to 0.

raise_exceptions : bool, optional

Whether to raise exceptions during evaluation. If False, the exceptions will be swallowed and the evaluation will continue - exception count will be reported in the "errors" column of the resultant dataframe. Defaults to False.

silence_warnings : bool, optional

Whether to silence warnings raised. Defaults to False.

**kwargs : Any

Currently unused.

Returns:

DataFrame containing the results.

The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.

The index of the dataframe contains all the metrics registered:

>>> from tempor import plugin_loader
>>> plugin_loader.list(plugin_type="metric")["prediction"]["one_off"]["regression"]
[...]

Return type:

pd.DataFrame
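
To make the roles of `n_splits` and `random_state` concrete, here is a minimal sketch of a seeded k-fold split over sample indices. The function `kfold_indices` is hypothetical. The source does not state which splitter the library uses, so this only illustrates the general idea of reproducible cross-validation folds.

```python
import random
from typing import List, Tuple


def kfold_indices(n_samples: int, n_splits: int = 3, random_state: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: return (train, test) index lists for each fold."""
    indices = list(range(n_samples))
    # A dedicated, seeded generator makes the shuffle reproducible.
    random.Random(random_state).shuffle(indices)
    # Deal the shuffled indices round-robin into n_splits disjoint folds.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    splits = []
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, test))
    return splits


splits = kfold_indices(10, n_splits=3, random_state=0)
```

Calling `kfold_indices` again with the same `random_state` yields the same folds, which is what makes repeated evaluations comparable.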

tempor.benchmarks.evaluation.evaluate_time_to_event(estimator: Any, data: TimeToEventAnalysisDataset, horizons: list[float] | list[int] | list[Timestamp], n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) -> DataFrame

Helper for evaluating time-to-event tasks.

Parameters:
estimator : Any

Baseline model to evaluate - must be unfitted.

data : dataset.TimeToEventAnalysisDataset

The dataset.

horizons : data_typing.TimeIndex

Time horizons at which to make predictions.

n_splits : int, optional

Cross-validation folds. Defaults to 3.

random_state : int, optional

Random state. Defaults to 0.

raise_exceptions : bool, optional

Whether to raise exceptions during evaluation. If False, the exceptions will be swallowed and the evaluation will continue - exception count will be reported in the "errors" column of the resultant dataframe. Defaults to False.

silence_warnings : bool, optional

Whether to silence warnings raised. Defaults to False.

**kwargs : Any

Currently unused.

Returns:

DataFrame containing the results.

The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.

The index of the dataframe contains all the metrics registered:

>>> from tempor import plugin_loader
>>> plugin_loader.list(plugin_type="metric")["time_to_event"]
[...]

Return type:

pd.DataFrame
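
As an informal illustration of what "predictions at time horizons" means, the sketch below evaluates a toy exponential survival model at a list of horizons. This is purely illustrative and unrelated to any tempor estimator: `exponential_risk` and the constant `rate` are hypothetical, and the model assumes a constant hazard.

```python
import math
from typing import Dict, List


def exponential_risk(rate: float, horizons: List[float]) -> Dict[float, float]:
    """Hypothetical toy model: probability that the event has occurred by each horizon.

    Under a constant hazard `rate`, the risk at time t is 1 - exp(-rate * t).
    """
    return {t: 1.0 - math.exp(-rate * t) for t in horizons}


risks = exponential_risk(rate=0.1, horizons=[1.0, 5.0, 10.0])
```

A time-to-event estimator plays the role of `exponential_risk` here: it produces one risk estimate per horizon, and the metrics are then computed from those estimates.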