tempor.benchmarks.evaluation module
Module with helpers for evaluating the performance of the methods.
- tempor.benchmarks.evaluation.OutputMetric
The output statistics, and other information about the cross-validation runs, reported for each evaluated metric.
- Possible values:
"min": The min score of the metric
"max": The max score of the metric
"mean": The mean score of the metric
"stddev": The standard deviation of the metric scores
"median": The median score of the metric
"iqr": The interquartile range of the metric
"rounds": Number of folds
"errors": Number of errors encountered
"durations": Average duration for the fold evaluation.
alias of Literal["min", "max", "mean", "stddev", "median", "iqr", "rounds", "errors", "durations"]
- tempor.benchmarks.evaluation.output_metrics = ('min', 'max', 'mean', 'stddev', 'median', 'iqr', 'rounds', 'errors', 'durations')
A tuple of all possible values of OutputMetric.
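To make the meaning of each OutputMetric value concrete, the sketch below reduces a list of per-fold scores to one value per statistic using only the standard library. The helper name summarize_folds, the choice of population standard deviation, and the "inclusive" quartile method are assumptions made for illustration; the library's own computation may differ.

```python
import statistics
from typing import Dict, Literal, Sequence, get_args

# Mirrors the documented alias; redefined here so the sketch runs without tempor installed.
OutputMetric = Literal["min", "max", "mean", "stddev", "median", "iqr", "rounds", "errors", "durations"]
output_metrics = get_args(OutputMetric)


def summarize_folds(scores: Sequence[float], durations: Sequence[float], errors: int = 0) -> Dict[str, float]:
    """Hypothetical helper: reduce per-fold scores to one value per OutputMetric."""
    # Quartiles via the "inclusive" method; the library's IQR convention may differ.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.fmean(scores),
        # Population standard deviation is an assumption, not confirmed by the docs.
        "stddev": statistics.pstdev(scores),
        "median": statistics.median(scores),
        "iqr": q3 - q1,
        "rounds": len(scores),
        "errors": errors,
        "durations": statistics.fmean(durations),
    }


# Three folds with made-up scores and per-fold durations in seconds.
summary = summarize_folds(scores=[0.5, 0.75, 1.0], durations=[1.0, 2.0, 3.0])
```

In the evaluation helpers on this page, one such set of values would correspond to a single row of the returned dataframe, with one row per registered metric.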
- tempor.benchmarks.evaluation.evaluate_prediction_oneoff_classifier(estimator: Any, data: PredictiveDataset, n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) -> DataFrame
Helper for evaluating classifiers.
- Parameters:
- estimator : Any
Baseline model to evaluate; must be unfitted.
- data : dataset.PredictiveDataset
The dataset.
- n_splits : int, optional
Number of cross-validation folds. Defaults to 3.
- random_state : int, optional
Random state. Defaults to 0.
- raise_exceptions : bool, optional
Whether to raise exceptions during evaluation. If False, exceptions are swallowed and the evaluation continues; the exception count is reported in the "errors" column of the resulting dataframe. Defaults to False.
- silence_warnings : bool, optional
Whether to silence warnings raised. Defaults to False.
- **kwargs : Any
Currently unused.
- Returns:
DataFrame containing the results.
The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.
The index of the dataframe contains all the metrics registered:
>>> from tempor import plugin_loader
>>> plugin_loader.list(plugin_type="metric")["prediction"]["one_off"]["classification"]
[…]
- Return type:
pd.DataFrame
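The raise_exceptions behaviour described above can be pictured with a small stand-alone loop. This is only a sketch of the documented semantics, not the library's code: run_folds and flaky_fold are hypothetical names, and the per-fold fit-and-score step is replaced by a stub that fails on purpose.

```python
from typing import Callable, Dict, List


def run_folds(
    evaluate_fold: Callable[[int], float],
    n_splits: int = 3,
    raise_exceptions: bool = False,
) -> Dict[str, object]:
    """Hypothetical loop: score each fold, counting failures instead of aborting."""
    scores: List[float] = []
    errors = 0
    for fold in range(n_splits):
        try:
            scores.append(evaluate_fold(fold))
        except Exception:
            if raise_exceptions:
                raise  # Propagate the first failure to the caller.
            errors += 1  # Swallow the failure and continue with the next fold.
    return {"scores": scores, "errors": errors}


def flaky_fold(fold: int) -> float:
    """Stand-in for fit-and-score on one fold; fold 1 fails on purpose."""
    if fold == 1:
        raise ValueError("simulated failure in fold 1")
    return 1.0 - 0.25 * fold


# With the default raise_exceptions=False, the failing fold is counted, not raised.
result = run_folds(flaky_fold, n_splits=3)
```

With the default setting, a nonzero "errors" value is the only sign that some folds did not contribute a score, so it is worth checking that column before comparing models.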
- tempor.benchmarks.evaluation.evaluate_prediction_oneoff_regressor(estimator: Any, data: PredictiveDataset, n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) -> DataFrame
Helper for evaluating regression tasks.
- Parameters:
- estimator : Any
Baseline model to evaluate; must be unfitted.
- data : dataset.PredictiveDataset
The dataset.
- n_splits : int, optional
Number of cross-validation folds. Defaults to 3.
- random_state : int, optional
Random state. Defaults to 0.
- raise_exceptions : bool, optional
Whether to raise exceptions during evaluation. If False, exceptions are swallowed and the evaluation continues; the exception count is reported in the "errors" column of the resulting dataframe. Defaults to False.
- silence_warnings : bool, optional
Whether to silence warnings raised. Defaults to False.
- **kwargs : Any
Currently unused.
- Returns:
DataFrame containing the results.
The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.
The index of the dataframe contains all the metrics registered:
>>> from tempor import plugin_loader
>>> plugin_loader.list(plugin_type="metric")["prediction"]["one_off"]["regression"]
[…]
- Return type:
pd.DataFrame
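As a rough illustration of how n_splits and random_state interact, the sketch below builds reproducible train/test index splits with the standard library. The function kfold_indices is hypothetical and deliberately simple; the actual splitting strategy used by these helpers (for example, whether folds are stratified) is not stated on this page.

```python
import random
from typing import List, Tuple


def kfold_indices(
    n_samples: int, n_splits: int = 3, random_state: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical splitter: shuffle sample indices with a seed, then cut into folds."""
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(random_state).shuffle(indices)
    # Every n_splits-th shuffled index (offset by k) forms the test set of fold k.
    folds = [indices[k::n_splits] for k in range(n_splits)]
    splits = []
    for k in range(n_splits):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


# Ten samples, three folds: each sample appears in exactly one test set.
splits = kfold_indices(n_samples=10, n_splits=3, random_state=0)
```

Fixing random_state in this way means that two estimators evaluated with the same seed are scored on the same folds, which keeps their result tables comparable.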
- tempor.benchmarks.evaluation.evaluate_time_to_event(estimator: Any, data: TimeToEventAnalysisDataset, horizons: list[float] | list[int] | list[Timestamp], n_splits: int = 3, random_state: int = 0, raise_exceptions: bool = False, silence_warnings: bool = False, **kwargs: Any) -> DataFrame
Helper for evaluating time-to-event tasks.
- Parameters:
- estimator : Any
Baseline model to evaluate; must be unfitted.
- data : dataset.TimeToEventAnalysisDataset
The dataset.
- horizons : data_typing.TimeIndex
Time horizons at which to make predictions.
- n_splits : int, optional
Number of cross-validation folds. Defaults to 3.
- random_state : int, optional
Random state. Defaults to 0.
- raise_exceptions : bool, optional
Whether to raise exceptions during evaluation. If False, exceptions are swallowed and the evaluation continues; the exception count is reported in the "errors" column of the resulting dataframe. Defaults to False.
- silence_warnings : bool, optional
Whether to silence warnings raised. Defaults to False.
- **kwargs : Any
Currently unused.
- Returns:
DataFrame containing the results.
The columns of the dataframe contain details about the cross-validation repeats: one column for each OutputMetric.
The index of the dataframe contains all the metrics registered:
>>> from tempor import plugin_loader
>>> plugin_loader.list(plugin_type="metric")["time_to_event"]
[…]
- Return type:
pd.DataFrame