Early Stopping#
Early stopping prevents overfitting by monitoring validation performance and stopping training when the model stops improving.
Basic Usage#
sklearn API:
from boosters.sklearn import GBDTRegressor
model = GBDTRegressor(
n_estimators=1000, # Maximum trees
early_stopping_rounds=10, # Stop after 10 rounds without improvement
)
# Provide validation data via eval_set
model.fit(
X_train, y_train,
eval_set=[(X_val, y_val)],
)
# Check how many trees were actually trained
print(f"Trained {model.n_trees_} trees (max was 1000)")
Core API:
import boosters as bst
config = bst.GBDTConfig(
n_estimators=1000,
early_stopping_rounds=10,
objective=bst.Objective.squared(),
metric=bst.Metric.rmse(),
)
train_data = bst.Dataset(X_train, y_train)
val_data = bst.Dataset(X_val, y_val)
model = bst.GBDTModel.train(
train_data,
evals=[(val_data, "validation")],
config=config,
)
How Early Stopping Works#
Train — Add trees one by one
Evaluate — Compute metric on validation set after each tree
Check — If no improvement for
early_stopping_rounds, stopReturn — Use the best model (not the last one)
Iteration 1: Train RMSE: 0.85 Val RMSE: 0.90
Iteration 2: Train RMSE: 0.72 Val RMSE: 0.82 ← improved
Iteration 3: Train RMSE: 0.65 Val RMSE: 0.78 ← improved
...
Iteration 50: Train RMSE: 0.12 Val RMSE: 0.75 ← best so far
Iteration 51: Train RMSE: 0.11 Val RMSE: 0.76 ← no improvement (1/10)
Iteration 52: Train RMSE: 0.10 Val RMSE: 0.77 ← no improvement (2/10)
...
Iteration 60: Train RMSE: 0.05 Val RMSE: 0.82 ← no improvement (10/10)
→ Stopping! Returning model from iteration 50.
Choosing early_stopping_rounds#
Value |
Behavior |
When to Use |
|---|---|---|
5-10 |
Aggressive stopping |
Quick experiments, small datasets |
10-20 |
Balanced (recommended) |
Most use cases |
50-100 |
Conservative stopping |
Large datasets, slow learning rates |
Rule of thumb: Use higher values with lower learning rates.
Validation Set Strategies#
Hold-out validation (simple, fast):
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(
X, y, test_size=0.2, random_state=42
)
Time-based split (for time series):
# Use last 20% as validation
split_idx = int(len(X) * 0.8)
X_train, X_val = X[:split_idx], X[split_idx:]
y_train, y_val = y[:split_idx], y[split_idx:]
K-fold with early stopping (use with caution):
from sklearn.model_selection import KFold
kf = KFold(n_splits=5, shuffle=True, random_state=42)
models = []
for train_idx, val_idx in kf.split(X):
model = GBDTRegressor(n_estimators=1000, early_stopping_rounds=10)
model.fit(
X[train_idx], y[train_idx],
eval_set=[(X[val_idx], y[val_idx])],
)
models.append(model)
Monitoring Multiple Metrics#
You can monitor multiple metrics, but only one is used for early stopping:
config = bst.GBDTConfig(
n_estimators=1000,
early_stopping_rounds=10,
metric=bst.Metric.rmse(), # This one controls early stopping
)
Common Pitfalls#
Don’t use test set for early stopping:
# ❌ Wrong: Using test set for early stopping
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
# ✅ Right: Use a separate validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
Don’t forget to set n_estimators high enough:
# ❌ Wrong: Low n_estimators limits potential
model = GBDTRegressor(n_estimators=50, early_stopping_rounds=10)
# ✅ Right: High n_estimators, let early stopping decide
model = GBDTRegressor(n_estimators=1000, early_stopping_rounds=10)
See Also#
Tutorial 05: Early Stopping — Early stopping tutorial with visualizations
Hyperparameters — Understanding all hyperparameters