Python Quickstart

This guide gets you training your first boosters model in under 5 minutes.

Prerequisites

boosters installed (see Installation)
Basic Python and machine learning knowledge

Which Model Should I Use?

boosters offers three model types, each optimized for different use cases:

Model	Best For	Strengths	Considerations
GBDT (default)	General-purpose ML	High accuracy, handles complex patterns	Cannot extrapolate beyond training data
GBDT + Linear Leaves	Time series, extrapolation	Can extrapolate trends, good for forecasting	Needs local linear relationships
GBLinear	Sparse/linear data	Fast, interpretable, memory efficient	Limited to linear relationships
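The extrapolation row of the table follows from how leaves predict. A standard decision tree outputs one constant per leaf, so any input beyond the training range lands in an edge leaf and receives that leaf's constant. The NumPy sketch below makes this concrete with a single hand-built split on toy data (y = 2x + 1). It does not call boosters, and the split point and data are arbitrary illustrations.

```python
import numpy as np

# Toy 1-D data with a clear upward trend: y = 2x + 1 on x in [0, 10].
x = np.linspace(0.0, 10.0, 101)
y = 2.0 * x + 1.0

# One split at x = 5 with CONSTANT leaves: each leaf predicts the mean target
# of the training rows that fall into it.
split = 5.0
left, right = x <= split, x > split
const_left, const_right = y[left].mean(), y[right].mean()

def predict_constant(x_new):
    return np.where(x_new <= split, const_left, const_right)

# The same split with LINEAR leaves: each leaf fits y = a*x + b by least squares.
coef_left = np.polyfit(x[left], y[left], deg=1)
coef_right = np.polyfit(x[right], y[right], deg=1)

def predict_linear(x_new):
    return np.where(x_new <= split,
                    np.polyval(coef_left, x_new),
                    np.polyval(coef_right, x_new))

x_far = np.array([20.0])        # well outside the training range [0, 10]
print(predict_constant(x_far))  # stuck at the right leaf's mean target
print(predict_linear(x_far))    # follows the trend: about 2*20 + 1 = 41
```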

Quick decision guide:

🎯 Start with GBDT, which works well for most problems
📈 Need to predict beyond training range? → Try Linear Leaves
⚡ High-dimensional sparse data? → Consider GBLinear
🔍 Need feature coefficients? → Use GBLinear

For detailed comparisons, see RFC-0010: GBLinear (linear models) and RFC-0011: Linear Leaves (linear tree leaves).

Basic GBDT Training

The simplest way to use boosters is through the sklearn-compatible interface:

from boosters.sklearn import GBDTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate sample data
X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a model
model = GBDTRegressor(n_estimators=100, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Evaluate
print(f"R² score: {model.score(X_test, y_test):.4f}")
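The n_estimators and learning_rate arguments above correspond to the usual gradient-boosting loop. The model starts from a constant, then repeatedly fits a small tree to the current residuals and adds a shrunken copy of that tree. The following from-scratch NumPy sketch shows that loop for squared error. To keep it short, it uses depth-1 trees (stumps) instead of max_depth=6. It illustrates the general technique only and is not boosters' implementation; RFC-0008 describes the library's actual training algorithm.

```python
import numpy as np

def fit_stump(X, residual):
    """Find the single (feature, threshold) split that best fits the residuals."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # skip the max so both sides are non-empty
            mask = X[:, j] <= t
            left, right = residual[mask].mean(), residual[~mask].mean()
            err = ((residual - np.where(mask, left, right)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, left, right)
    _, j, t, left, right = best
    return lambda Z: np.where(Z[:, j] <= t, left, right)

def boost(X, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting for squared error with depth-1 trees."""
    base = y.mean()  # initial constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_estimators):
        residual = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        tree = fit_stump(X, residual)
        pred = pred + learning_rate * tree(X)  # shrink each tree's contribution
        trees.append(tree)
    return lambda Z: base + learning_rate * sum(tree(Z) for tree in trees)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]
model = boost(X, y)
print("baseline MSE:", np.mean((y - y.mean()) ** 2))
print("boosted MSE: ", np.mean((y - model(X)) ** 2))
```

A smaller learning_rate shrinks each tree's contribution, so more trees are usually needed to reach the same training error. This is the standard trade-off between the two parameters.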

Using the Core API

For more control, use the core GBDTModel API:

import numpy as np
from boosters import Dataset, GBDTModel, GBDTConfig, Objective

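# Prepare data: float32 features, with a target that depends on the first two features plus noise
rng = np.random.default_rng(42)
X_train = rng.standard_normal((1000, 10)).astype(np.float32)
y_train = (2.0 * X_train[:, 0] - X_train[:, 1] + 0.1 * rng.standard_normal(1000)).astype(np.float32)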

# Create dataset
train_data = Dataset(X_train, y_train)

# Configure model
config = GBDTConfig(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    objective=Objective.squared(),
)

# Train
model = GBDTModel.train(config, train_data)

# Predict (on the training data for brevity; evaluate on held-out data in practice)
predictions = model.predict(X_train)
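Objective.squared() selects squared-error loss. Libraries in the XGBoost and LightGBM family work with each objective's per-row gradient and Hessian. They set each leaf's value to a regularized Newton step, -sum(g) / (sum(h) + lambda). The sketch below computes these quantities in NumPy for squared error. Two details are assumptions for illustration: whether boosters uses exactly this formula, and the parameter name reg_lambda.

```python
import numpy as np

def squared_grad_hess(y_true, y_pred):
    """Gradient and Hessian of 0.5 * (y_pred - y_true)**2 with respect to y_pred."""
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)  # constant curvature for squared error
    return grad, hess

def newton_leaf_value(grad, hess, reg_lambda=1.0):
    """Second-order leaf weight: -sum(g) / (sum(h) + lambda). reg_lambda is hypothetical."""
    return -grad.sum() / (hess.sum() + reg_lambda)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.zeros(4)  # e.g. the rows in one leaf before any tree is added
g, h = squared_grad_hess(y_true, y_pred)
print(newton_leaf_value(g, h, reg_lambda=0.0))  # 2.5, the mean residual
print(newton_leaf_value(g, h, reg_lambda=1.0))  # 2.0, shrunk toward zero
```

For squared error with no regularization, the Newton step reduces to the mean residual in the leaf. This is why plain residual fitting and second-order boosting agree for this objective.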

Classification

For classification tasks, use GBDTClassifier:

from boosters.sklearn import GBDTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GBDTClassifier(n_estimators=100, max_depth=6)
model.fit(X_train, y_train)

# Get probabilities
probas = model.predict_proba(X_test)

# Get class predictions
predictions = model.predict(X_test)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
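For a binary problem, predict and predict_proba are normally two views of the same model output. Assume GBDTClassifier uses the standard logistic (log-loss) setup for two classes. Then the summed tree outputs form a raw score, the sigmoid turns that score into P(class 1), and the predicted label is the most probable class. The NumPy sketch below shows this relationship on made-up scores. This guide does not document how boosters breaks a tie at exactly 0.5, so the tie-breaking here is NumPy's argmax behavior only.

```python
import numpy as np

def sigmoid(z):
    """Logistic link: maps a raw boosted score to P(class 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores (sums of tree outputs) for five samples.
raw_scores = np.array([-2.0, -0.1, 0.0, 0.3, 3.0])
p1 = sigmoid(raw_scores)
probas = np.column_stack([1.0 - p1, p1])  # one column per class, like predict_proba
labels = probas.argmax(axis=1)            # most probable class, like predict
print(labels)  # [0 0 0 1 1]; the exact tie at 0.5 goes to class 0 via argmax
```

In scikit-learn's convention, column k of predict_proba corresponds to the class label classes_[k].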

GBLinear (Linear Boosting)

For high-dimensional or sparse data with linear relationships:

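from boosters.sklearn import GBLinearRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Regression data (the classification example above left class labels in y_train)
X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# GBLinear is especially good for sparse/linear data
model = GBLinearRegressor(n_estimators=100, learning_rate=0.5)
model.fit(X_train, y_train)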

# Access learned coefficients
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")

See Tutorial 06: GBLinear & Sparse Data for a complete tutorial.
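GBLinear boosts a linear model rather than trees, which is why it exposes coef_ and intercept_. One common way to boost a linear model, in the spirit of XGBoost's gblinear booster, is to take a damped Newton step on the bias and then on one coefficient at a time, every round. The NumPy sketch below does this for squared error. It illustrates the general technique, not boosters' own updater, and the reg_lambda parameter is hypothetical.

```python
import numpy as np

def gblinear_fit(X, y, n_estimators=100, learning_rate=0.5, reg_lambda=0.0):
    """Boost a linear model under squared error with damped coordinate-wise Newton steps."""
    n, d = X.shape
    coef = np.zeros(d)
    intercept = 0.0
    for _ in range(n_estimators):
        # Bias update: gradient is the sum of residuals, Hessian is n (squared error).
        grad = X @ coef + intercept - y
        intercept -= learning_rate * grad.sum() / n
        # Weight updates, one feature at a time, refreshing the gradient after each.
        for j in range(d):
            grad = X @ coef + intercept - y
            g = grad @ X[:, j] + reg_lambda * coef[j]  # L2 term on the weight (hypothetical)
            h = X[:, j] @ X[:, j] + reg_lambda
            coef[j] -= learning_rate * g / h
    return coef, intercept

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0
coef, intercept = gblinear_fit(X, y)
print(coef, intercept)              # close to [1.5, -2.0, 0.5] and 3.0
predictions = X @ coef + intercept  # a linear model predicts with a dot product
```

Because the final model is a single weight vector plus a bias, prediction is one dot product per row and memory use does not grow with n_estimators. This matches the "fast, memory efficient" entry in the comparison table.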

GBDT with Linear Leaves

Linear leaves replace each leaf's constant prediction with a small linear model, which lets predictions follow trends beyond the training range:

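from boosters.sklearn import GBDTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Regression data (the classification example above left class labels in y_train)
X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Enable linear leaves for extrapolation capability
model = GBDTRegressor(
    n_estimators=50,
    max_depth=4,
    linear_leaves=True,  # Key parameter: fit a linear model in each leaf
    linear_l2=0.01,      # L2 regularization for the linear leaf models
)
model.fit(X_train, y_train)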

See Tutorial 10: GBDT with Linear Leaves for when and how to use linear leaves.
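Conceptually, a linear leaf replaces the leaf's constant with a small regularized linear regression fitted on the rows that reach it, and linear_l2 presumably sets the strength of that regularization. The NumPy sketch below fits one such leaf by ridge regression on toy targets and evaluates it outside the training range. It is a simplification, and two of its choices are assumptions. First, real implementations (LightGBM's linear_tree, for example) fit the leaf model inside the boosting loop from gradient statistics and may restrict it to a subset of features. Second, mapping linear_l2 to the l2 argument used here is an assumption.

```python
import numpy as np

def fit_linear_leaf(X_leaf, r_leaf, l2=0.01):
    """Ridge fit r ≈ X @ w + b on the rows in one leaf; the bias b is not penalized."""
    n, d = X_leaf.shape
    A = np.hstack([X_leaf, np.ones((n, 1))])  # constant column for the bias
    penalty = l2 * np.eye(d + 1)              # l2 plays the role assumed for linear_l2
    penalty[-1, -1] = 0.0                     # leave the bias unregularized
    solution = np.linalg.solve(A.T @ A + penalty, A.T @ r_leaf)
    return solution[:-1], solution[-1]

rng = np.random.default_rng(0)
X_leaf = rng.uniform(0.0, 1.0, size=(100, 2))     # rows that fell into this leaf
r_leaf = 4.0 * X_leaf[:, 0] - X_leaf[:, 1] + 2.0  # toy targets with a linear trend
w, b = fit_linear_leaf(X_leaf, r_leaf, l2=0.01)

x_new = np.array([5.0, 5.0])  # far outside the [0, 1] training range
print(w, b)
print(x_new @ w + b)  # roughly 4*5 - 5 + 2 = 17; a constant leaf would stay near 3.5
```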

Next Steps

Tutorials — Hands-on tutorials for common tasks
Hyperparameters — Understanding hyperparameters
API Reference — Complete API reference
RFC-0008: GBDT Training — GBDT training algorithm details