Note
This tutorial is available as a Jupyter notebook.
Tutorial 01: Basic GBDT Training
🟢 Beginner — No prior boosting experience needed
In this tutorial, you’ll learn how to train your first Gradient Boosted Decision Tree (GBDT) model with boosters.
What you’ll learn
Create a dataset from NumPy arrays
Configure and train a GBDT model
Make predictions
Evaluate model performance
Setup
First, import the required packages (this assumes numpy, scikit-learn, and boosters are already installed):
[1]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import boosters
Generate Sample Data
We’ll use scikit-learn to generate a synthetic regression dataset:
[2]:
# Generate synthetic regression data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Features: {X_train.shape[1]}")
Training samples: 800
Test samples: 200
Features: 10
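The 800/200 split above follows from test_size=0.2 on 1000 samples, and random_state makes the shuffle repeatable. The sketch below illustrates that idea in plain Python. It is a simplified stand-in and not scikit-learn's implementation: the function name shuffled_split is hypothetical, and the indices it produces will not match those chosen by train_test_split.

```python
import random

def shuffled_split(n_samples, test_size, seed):
    """Shuffle sample indices with a fixed seed, then cut off a test portion."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # same seed -> same order
    n_test = int(round(n_samples * test_size)) # simplification of the size rule
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)

train_idx, test_idx = shuffled_split(1000, 0.2, seed=42)
print(f"Training samples: {len(train_idx)}")
print(f"Test samples: {len(test_idx)}")

# Re-running with the same seed reproduces the split exactly.
train_again, test_again = shuffled_split(1000, 0.2, seed=42)
print(f"Reproducible: {train_idx == train_again and test_idx == test_again}")
```

Fixing the seed matters for tutorials like this one: it keeps the reported metrics comparable from run to run.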
Create a Dataset
boosters uses a Dataset object to wrap your data for efficient training:
[3]:
# Create boosters Dataset objects
train_data = boosters.Dataset(X_train, y_train)
test_data = boosters.Dataset(X_test, y_test)
print(f"Train dataset: {train_data}")
print(f"Test dataset: {test_data}")
Train dataset: Dataset(n_samples=800, n_features=10, has_labels=true, categorical_features=0)
Test dataset: Dataset(n_samples=200, n_features=10, has_labels=true, categorical_features=0)
Configure the Model
Create a configuration for your GBDT model:
[4]:
# Configure the GBDT model
config = boosters.GBDTConfig(
    n_estimators=100,                        # Number of trees
    max_depth=6,                             # Maximum tree depth
    learning_rate=0.1,                       # Learning rate (shrinkage)
    objective=boosters.Objective.squared(),  # Regression objective (L2 loss)
)
print("Configuration created!")
print(config)
Configuration created!
GBDTConfig(n_estimators=100, learning_rate=0.1, objective=Squared)
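To make the roles of n_estimators and learning_rate concrete, here is a minimal sketch of gradient boosting for squared loss, written in plain Python with depth-1 trees (stumps) on a single feature. It illustrates the general technique only and is not how boosters is implemented; the helper names (fit_stump, predict_stump, boost, mse) and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Pick the threshold whose two leaf means best fit the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep the right leaf non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def boost(x, y, n_estimators, learning_rate):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    preds = [sum(y) / len(y)] * len(y)
    for _ in range(n_estimators):
        # For squared loss, the negative gradient is the residual y - pred.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        # Shrinkage: add only a fraction of each new tree's output.
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return preds

def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.2, 2.9, 3.1, 5.0, 5.2]
preds_10 = boost(x, y, n_estimators=10, learning_rate=0.1)
preds_100 = boost(x, y, n_estimators=100, learning_rate=0.1)
print(f"Training MSE after 10 trees:  {mse(y, preds_10):.4f}")
print(f"Training MSE after 100 trees: {mse(y, preds_100):.4f}")
```

A smaller learning rate makes each tree contribute less, so more trees are needed to reach the same training error; that trade-off is why the two parameters are usually tuned together.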
Train the Model
Train the model using the GBDTModel.train() method:
[5]:
# Train the model
model = boosters.GBDTModel.train(train_data, config=config)
print("Model trained!")
print(f"Number of trees: {model.n_trees}")
print(f"Number of features: {model.n_features}")
Model trained!
Number of trees: 100
Number of features: 10
Make Predictions
Use the trained model to predict on the test set:
[6]:
# Make predictions; the core API expects a Dataset, so wrap X_test first
y_pred = model.predict(boosters.Dataset(X_test))
print(f"Predictions shape: {y_pred.shape}")
print(f"First 5 predictions: {y_pred[:5].flatten()}")
Predictions shape: (200, 1)
First 5 predictions: [ 2.1480764e+01 6.7458168e+01 -2.0529093e-01 -2.7172467e+02
3.0354942e+01]
Evaluate Performance
Calculate standard regression metrics:
[7]:
# Flatten the (200, 1) predictions to 1-D so they match y_test, then compute metrics
y_pred_flat = y_pred.flatten()
mse = mean_squared_error(y_test, y_pred_flat)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_flat)
print(f"Mean Squared Error: {mse:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
print(f"R² Score: {r2:.4f}")
Mean Squared Error: 1243.9708
Root Mean Squared Error: 35.2700
R² Score: 0.9263
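For reference, the three metrics above can be computed by hand. The sketch below spells out the standard formulas in plain Python on a small made-up example; the function name regression_metrics and the sample values are illustrative and not part of boosters or scikit-learn.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) using the textbook definitions."""
    n = len(y_true)
    # Residual sum of squares: total squared prediction error.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation of the targets from their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    rmse = math.sqrt(mse)        # same units as the target
    r2 = 1.0 - ss_res / ss_tot   # 1.0 is a perfect fit
    return mse, rmse, r2

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
print(f"R² Score: {r2:.4f}")
```

The same relationship holds for the results in this tutorial: the reported RMSE of 35.2700 is the square root of the reported MSE of 1243.9708.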
Summary
In this tutorial, you learned how to:
✅ Create datasets from NumPy arrays
✅ Configure a GBDT model with basic hyperparameters
✅ Train the model
✅ Make predictions and evaluate performance
Next Steps
Tutorial 02: sklearn Integration — Use boosters with sklearn pipelines
Tutorial 03: Classification — Train classification models