tempor.benchmarks.benchmark module
The main benchmarking module.
- tempor.benchmarks.benchmark.print_score(mean: Series, std: Series) -> Series
Print the mean and standard deviation of a metric in a human-readable format.
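The exact string format is not stated for print_score; the return description of benchmark_models mentions the form MEAN +/- STDDEV. A minimal sketch of that kind of formatting, assuming pandas and three decimal places. The helper name format_score and the precision are hypothetical and not part of tempor:

```python
import pandas as pd


def format_score(mean: pd.Series, std: pd.Series, decimals: int = 3) -> pd.Series:
    """Combine per-metric mean and std into 'MEAN +/- STDDEV' strings.

    Hypothetical stand-in for illustration; not tempor's print_score.
    """
    # Align on the shared index so each metric's mean is paired with its own std.
    mean, std = mean.align(std, join="inner")
    formatted = [f"{m:.{decimals}f} +/- {s:.{decimals}f}" for m, s in zip(mean, std)]
    return pd.Series(formatted, index=mean.index, dtype=object)


# Placeholder numbers, for illustration only.
mean = pd.Series({"accuracy": 0.8123, "f1_score": 0.75})
std = pd.Series({"accuracy": 0.0312, "f1_score": 0.05})
readable = format_score(mean, std)
print(readable["accuracy"])  # 0.812 +/- 0.031
```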
- tempor.benchmarks.benchmark.benchmark_models(task_type: Literal["prediction.one_off.classification"] | Literal["prediction.one_off.regression"] | Literal["prediction.temporal.classification"] | Literal["prediction.temporal.regression"] | Literal["time_to_event"] | Literal["treatments.one_off.classification"] | Literal["treatments.one_off.regression"] | Literal["treatments.temporal.classification"] | Literal["treatments.temporal.regression"], tests: list[tuple[str, Any]], data: PredictiveDataset, n_splits: int = 3, random_state: int = 0, horizons: list[float] | list[int] | list[Timestamp] | None = None, raise_exceptions: bool = False, silence_warnings: bool = True) -> tuple[DataFrame, dict[str, DataFrame]]
Benchmark the performance of several algorithms.
- Parameters:
- task_type : PredictiveTaskType
The type of problem, used to evaluate the downstream models with the correct metrics. The options are any of PredictiveTaskType.
- tests : List[Tuple[str, Any]]
Tuples of the form (test_name: str, plugin: BasePredictor/Pipeline).
- data : dataset.PredictiveDataset
The evaluation dataset to use for cross-validation.
- n_splits : int, optional
Number of splits used for cross-validation. Defaults to 3.
- random_state : int, optional
Random seed. Defaults to 0.
- horizons : Optional[data_typing.TimeIndex], optional
Time horizons for making predictions, if applicable to the task.
- raise_exceptions : bool, optional
Whether to raise exceptions during evaluation. If False, exceptions are swallowed and the evaluation continues; the exception count is reported in the "errors" column of the resulting dataframe. Defaults to False.
- silence_warnings : bool, optional
Whether to silence warnings. Some dependencies (e.g. xgbse) may circumvent this and raise warnings regardless. Defaults to True.
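The raise_exceptions behaviour described above can be pictured with a small stand-alone sketch. This is not tempor's implementation: evaluate_folds and the fold callables are hypothetical names, and only the standard library is used.

```python
from typing import Callable, List, Tuple


def evaluate_folds(
    folds: List[Callable[[], float]],
    raise_exceptions: bool = False,
) -> Tuple[List[float], int]:
    """Run each fold; either propagate failures or count them and carry on."""
    scores: List[float] = []
    errors = 0
    for run_fold in folds:
        try:
            scores.append(run_fold())
        except Exception:
            if raise_exceptions:
                raise  # Surface the first failure to the caller.
            errors += 1  # Swallow it and keep evaluating the remaining folds.
    return scores, errors


def failing_fold() -> float:
    # Hypothetical fold that fails, standing in for a model that cannot be fitted.
    raise ValueError("model failed to fit on this fold")


scores, errors = evaluate_folds([lambda: 0.8, failing_fold, lambda: 0.9])
print(scores, errors)  # [0.8, 0.9] 1
```

With raise_exceptions=True the same call would stop at the failing fold and re-raise its ValueError, which tends to be more useful while debugging a single plugin.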
- Returns:
The benchmarking results, given as (readable_dataframe: pd.DataFrame, results: Dict[str, pd.DataFrame]), where:
- readable_dataframe: a dataframe with metric names as index and test names as columns, whose values are readable string representations of each evaluation metric, of the form MEAN +/- STDDEV.
- results: a dictionary mapping each test name to a dataframe with metric names as index and ["mean", "stddev"] columns, whose values are the float mean and standard deviation of each metric.
- Return type:
Tuple[pd.DataFrame, Dict[str, pd.DataFrame]]
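Because results holds float values rather than formatted strings, it lends itself to programmatic comparison. The sketch below builds a results-shaped dictionary by hand and picks the test with the highest mean for each metric. It assumes pandas; the test names, metric names, and numbers are made up for illustration, and "higher is better" is an assumption that does not hold for error-type metrics.

```python
import pandas as pd

# Hand-built stand-in for the `results` return value (placeholder numbers).
results = {
    "plugin1": pd.DataFrame(
        {"mean": [0.81, 0.75], "stddev": [0.03, 0.05]},
        index=["accuracy", "f1_score"],
    ),
    "plugin2": pd.DataFrame(
        {"mean": [0.78, 0.79], "stddev": [0.02, 0.04]},
        index=["accuracy", "f1_score"],
    ),
}

# One column of means per test, with metric names as the shared index.
means = pd.DataFrame({name: frame["mean"] for name, frame in results.items()})

# For each metric (row), the name of the test with the largest mean.
best = means.idxmax(axis=1)
print(best)
```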
- tempor.benchmarks.benchmark.visualize_benchmark(results: dict[str, DataFrame], palette: str = 'viridis', plot_block: bool = True) -> Any
Visualize the benchmarking results.
- Parameters:
- results : Dict[str, pd.DataFrame]
The results dictionary returned by benchmark_models.
- palette : str, optional
seaborn color palette for the visualization. Defaults to "viridis".
- plot_block : bool, optional
Whether the generated matplotlib chart blocks the execution flow. Defaults to True.
- Returns:
The list of matplotlib axes objects with the generated plots.
- Return type:
Any
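How visualize_benchmark arranges its plots internally is not described here. For custom plots, seaborn's functions generally work best with long-form data, so one possible preparation step is to flatten the results dictionary into a single table. This is a sketch under that assumption, using pandas only; the input values are placeholders and the column names "test" and "metric" are chosen for illustration.

```python
import pandas as pd

# Hand-built stand-in for the `results` dictionary (placeholder numbers).
results = {
    "plugin1": pd.DataFrame(
        {"mean": [0.81, 0.75], "stddev": [0.03, 0.05]},
        index=["accuracy", "f1_score"],
    ),
    "plugin2": pd.DataFrame(
        {"mean": [0.78, 0.79], "stddev": [0.02, 0.04]},
        index=["accuracy", "f1_score"],
    ),
}

# Stack the per-test frames; dictionary keys become the outer index level.
stacked = pd.concat(results, names=["test", "metric"])

# One row per (test, metric) pair with "mean" and "stddev" as value columns.
long_form = stacked.reset_index()
print(long_form)
```

A table in this shape can be passed to a plotting function with the test name as the grouping variable, which is where a palette such as "viridis" would apply.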