Choosing an API#

boosters provides two Python APIs designed for different use cases:

API

Best For

Key Features

sklearn API

Most users, ML pipelines

Works with Pipeline, cross_val_score, GridSearchCV

Core API

Full control, custom workflows

Explicit Dataset, callbacks, advanced configuration

sklearn API#

Use the sklearn API when you want:

  • Integration with scikit-learn pipelines and tools

  • Familiar estimator interface (fit/predict/score)

  • Quick experimentation and prototyping

  • Hyperparameter tuning with GridSearchCV or RandomizedSearchCV

Example:

from boosters.sklearn import GBDTRegressor, GBDTClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple usage
model = GBDTRegressor(n_estimators=100, max_depth=6)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# With sklearn tools
scores = cross_val_score(model, X, y, cv=5)

# In a pipeline
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', GBDTRegressor(n_estimators=100)),
])
pipe.fit(X_train, y_train)

Available estimators:

Class

Use Case

GBDTRegressor

Regression with decision trees

GBDTClassifier

Classification with decision trees

GBLinearRegressor

Regression with linear boosting (sparse data)

GBLinearClassifier

Classification with linear boosting

Core API#

Use the Core API when you want:

  • Full control over the training process

  • Custom objectives or metrics

  • Access to training callbacks and logging

  • Advanced configuration options

  • Direct control over the Dataset object

Example:

import boosters as bst

# Create dataset explicitly
train_data = bst.Dataset(X_train, y_train)
val_data = bst.Dataset(X_val, y_val)

# Configure with all options
config = bst.GBDTConfig(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6,
    objective=bst.Objective.squared(),
    metric=bst.Metric.rmse(),
    early_stopping_rounds=10,
    l2=1.0,
    subsample=0.8,
    colsample_bytree=0.8,
)

# Train with validation set
model = bst.GBDTModel.train(
    train_data,
    evals=[(val_data, "val")],
    config=config,
)

# Predict
predictions = model.predict(bst.Dataset(X_test))

Core API components:

Class

Description

Dataset

Wraps feature matrix and optional labels/weights

GBDTModel

Gradient boosted decision trees

GBLinearModel

Gradient boosted linear model

GBDTConfig

Configuration for GBDT training

GBLinearConfig

Configuration for GBLinear training

Objective

Loss functions (squared, logistic, softmax, etc.)

Metric

Evaluation metrics (RMSE, AUC, accuracy, etc.)

Key Differences#

Feature

sklearn API

Core API

Interface

fit(X, y) / predict(X)

train(dataset) / predict(dataset)

Input format

NumPy arrays directly

Explicit Dataset objects

Configuration

Constructor kwargs

GBDTConfig / GBLinearConfig

Validation

Use sklearn’s cross_val_score

Pass evals list to train()

sklearn tools

✅ Full support

❌ Not directly compatible

Callbacks

❌ Not available

✅ Available

Custom objectives

Via objective= param

Via config.objective

When to Use Each#

Choose sklearn API if:

  • You’re already using scikit-learn

  • You want to use Pipeline, GridSearchCV, etc.

  • You’re doing quick experiments

  • You don’t need advanced control

Choose Core API if:

  • You need maximum control over training

  • You’re implementing custom training loops

  • You want to use callbacks or custom logging

  • You’re optimizing for production performance

Both APIs produce the same models — the difference is in how you interact with them.