boosters-eval#
A framework for benchmarking gradient boosting libraries.
boosters-eval makes it easy to compare Boosters against XGBoost and LightGBM
on a shared set of datasets, keeping hyperparameters consistent across libraries so comparisons stay fair.
Installation#
# From the repository root
pip install -e packages/boosters-eval
# Or using uv
uv pip install -e packages/boosters-eval
Quick Start#
# Run a quick benchmark
boosters-eval quick
# Compare specific libraries
boosters-eval compare -d california -l boosters -l xgboost
# Generate a report
boosters-eval report -s quick -o benchmark.md
from boosters_eval import compare, run_suite, QUICK_SUITE
# Quick comparison
results = compare(["california"], seeds=[42])
print(results.to_markdown())
# Run predefined suite
results = run_suite(QUICK_SUITE)
print(results.summary())
CLI Commands#
quick#
Run a quick benchmark suite (3 seeds, 2 datasets, 50 trees):
boosters-eval quick
boosters-eval quick -o results.md
full#
Run the full benchmark suite (5 seeds, all datasets, 100 trees):
boosters-eval full
boosters-eval full -o results.md
boosters-eval full --booster gblinear # Test linear booster
compare#
Compare specific libraries on selected datasets:
# Compare all libraries on california dataset
boosters-eval compare -d california
# Customize comparison
boosters-eval compare \
  -d california \
  -d breast_cancer \
  -l boosters \
  -l xgboost \
  --trees 100 \
  --seeds 5
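The compare helper from the Quick Start drives the same comparison from Python. A minimal sketch using only the arguments shown there; note the CLI's --seeds takes a count while the Python API takes explicit seed values, and since the Quick Start passes no library filter, this presumably compares all available libraries:
from boosters_eval import compare

# Same two datasets as the CLI example above; seeds are passed as
# explicit values rather than a count.
results = compare(["california", "breast_cancer"], seeds=[42, 123, 456])
print(results.to_markdown())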
baseline#
Record and check baselines for CI regression detection:
# Record baseline
boosters-eval baseline record -o baseline.json -s quick
# Check against baseline
boosters-eval baseline check baseline.json -s quick --tolerance 0.02
report#
Generate markdown reports that embed a machine fingerprint (a record of the hardware and software environment the benchmark ran on):
boosters-eval report -s quick -o docs/benchmarks/report.md
boosters-eval report -s full --title "Release 0.1.0 Benchmark"
list-*#
List available resources:
boosters-eval list-datasets
boosters-eval list-libraries
boosters-eval list-tasks
Python API#
Custom Suites#
from boosters_eval import SuiteConfig, run_suite, BoosterType
suite = SuiteConfig(
    name="custom",
    description="My custom benchmark",
    datasets=["california", "breast_cancer"],
    n_estimators=100,
    seeds=[42, 123, 456],
    libraries=["boosters", "xgboost", "lightgbm"],
    booster_type=BoosterType.GBDT,
)
results = run_suite(suite)
print(results.to_markdown())
Ablation Studies#
Compare different hyperparameter settings:
from boosters_eval import QUICK_SUITE, create_ablation_suite, run_suite
# Compare different tree depths
depth_variants = {
    "depth_4": {"max_depth": 4},
    "depth_6": {"max_depth": 6},
    "depth_8": {"max_depth": 8},
}
depth_suites = create_ablation_suite("depth_study", QUICK_SUITE, depth_variants)
for suite in depth_suites:
    results = run_suite(suite)
    print(f"\n{suite.name}:")
    print(results.to_markdown())
Baseline Regression Testing#
Detect performance regressions in CI:
from pathlib import Path

from boosters_eval import (
    QUICK_SUITE,
    check_baseline,
    load_baseline,
    record_baseline,
    run_suite,
)
# Record baseline
results = run_suite(QUICK_SUITE)
baseline = record_baseline(results, output_path=Path("baseline.json"))
# Later: check for regressions
current_results = run_suite(QUICK_SUITE)
baseline = load_baseline(Path("baseline.json"))
report = check_baseline(current_results, baseline, tolerance=0.02)
if report.has_regressions:
    for reg in report.regressions:
        print(f"⚠️ Regression: {reg['config']} {reg['metric']}")
Available Datasets#
| Dataset | Task | Rows | Features |
|---|---|---|---|
| california | Regression | 20,640 | 8 |
| breast_cancer | Binary Classification | 569 | 30 |
| iris | Multiclass Classification | 150 | 4 |
| synthetic_reg_* | Synthetic Regression | Various | Configurable |
| synthetic_bin_* | Synthetic Binary | Various | Configurable |
Supported Libraries#
| Library | Booster Types | Notes |
|---|---|---|
| boosters | gbdt, gblinear | Native Rust implementation |
| xgboost | gbdt, gblinear | Industry standard |
| lightgbm | gbdt, linear_trees | Leaf-wise growth, histogram-based |
CI Integration#
Add baseline regression testing to your CI pipeline:
# .github/workflows/benchmark.yml
name: Benchmark Regression Check

on:
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install -e packages/boosters-eval
          pip install xgboost lightgbm
      - name: Check baseline
        run: |
          boosters-eval baseline check \
            tests/baselines/quick.json \
            -s quick \
            --tolerance 0.02
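The workflow above checks against a committed baseline at tests/baselines/quick.json, so the baseline has to be recorded and committed first. A sketch using the Python API shown earlier, assuming the file it writes is the same format the CLI check consumes:
from pathlib import Path

from boosters_eval import QUICK_SUITE, record_baseline, run_suite

# Run the quick suite once and write the baseline file the CI
# workflow above checks against; commit the JSON to the repository.
results = run_suite(QUICK_SUITE)
record_baseline(results, output_path=Path("tests/baselines/quick.json"))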
See Also#
Full Benchmark Report — Latest benchmark results
Hyperparameters — Understanding hyperparameters
XGBoost vs LightGBM: Approach Comparison — Comparison of GBDT libraries