We’ve covered the algorithms. Now for the practical question: how do you configure all these parameters?
Gradient boosting has many hyperparameters, each affecting model behavior differently. This post provides a systematic approach to tuning, organized by what each parameter controls.
The Core Trade-Off: Bias vs Variance
Every hyperparameter affects the bias-variance trade-off:
Bias (underfitting): Model is too simple to capture patterns
Variance (overfitting): Model memorizes training data, fails on new data
Gradient boosting defaults often lean toward overfitting (deep trees, no subsampling). Tuning is mostly about adding the right amount of regularization.
Learning Rate and Number of Trees
These two parameters work together as the most important knobs.
learning_rate (eta)
Controls how much each tree contributes: $F_m = F_{m-1} + \eta \cdot h_m$.
| Value | Effect | Use Case |
|---|---|---|
| 0.3 (default) | Aggressive | Quick experiments |
| 0.1 | Balanced | Good starting point |
| 0.01-0.05 | Conservative | Maximum accuracy |
Lower learning rates need more trees but typically generalize better.
n_estimators
Number of boosting rounds (trees).
The Golden Rule
Set a high n_estimators and use early stopping. Let the validation loss determine when to stop.
Early stopping finds the optimal number of trees automatically, avoiding both underfitting and overfitting.
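A minimal sketch of this pattern with XGBoost's scikit-learn interface (assuming xgboost >= 1.6, where early_stopping_rounds is a constructor argument; the synthetic data and split are placeholders for your own):

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real training set
X, y = make_regression(n_samples=5000, n_features=20, noise=0.3, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Set a generous tree budget and let early stopping pick the actual count
model = xgb.XGBRegressor(
    n_estimators=5000,          # deliberately high; early stopping will truncate
    learning_rate=0.05,
    max_depth=6,
    early_stopping_rounds=50,   # stop if val loss hasn't improved in 50 rounds
    eval_metric="rmse",
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

print("Best iteration:", model.best_iteration)
```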
The Relationship
Lower learning rate + more trees = better generalization (given enough trees).
| Configuration | Typical n_estimators | Notes |
|---|---|---|
| learning_rate=0.3 | 100-300 | Fast, may overfit |
| learning_rate=0.1 | 300-1000 | Balanced |
| learning_rate=0.01 | 1000-5000 | Best accuracy, slow |
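To see the relationship directly, a rough sketch that fits the same data at several learning rates under a shared early-stopping rule and reports how many trees each run actually used (LightGBM here, with illustrative values):

```python
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=5000, n_features=20, noise=0.3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for lr in [0.3, 0.1, 0.01]:
    model = lgb.LGBMRegressor(n_estimators=5000, learning_rate=lr)
    model.fit(
        X_train, y_train,
        eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds=50, verbose=False)],
    )
    # Lower learning rates should stop later (more trees) but score better
    print(f"learning_rate={lr}: stopped at {model.best_iteration_} trees")
```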
Tree Structure Parameters
These control how complex each individual tree can be.
max_depth
Maximum levels in each tree.
| Value | Effect |
|---|---|
| 3-4 | Simple interactions (linear + pairwise) |
| 6 (default) | Moderate complexity |
| 8-10 | Complex interactions |
| > 10 | Usually overkill, high overfitting risk |
Rule of thumb: Start with 6, reduce if overfitting, increase if underfitting.
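One way to apply that rule of thumb is a small cross-validated sweep around the default depth; a sketch with a placeholder dataset and metric:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)

# Compare a few depths around the default of 6
for depth in [3, 4, 6, 8, 10]:
    model = xgb.XGBClassifier(
        n_estimators=300,
        learning_rate=0.1,
        max_depth=depth,
        eval_metric="logloss",
    )
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"max_depth={depth}: CV AUC = {score:.4f}")
```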
num_leaves (LightGBM)
Alternative to max_depth for leaf-wise growth. Maximum leaves per tree.
| num_leaves | Rough max_depth equivalent |
|---|---|
| 31 (default) | ~5 |
| 63 | ~6 |
| 127 | ~7 |
| 255 | ~8 |
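In LightGBM you control complexity through num_leaves directly; the rough equivalents above come from the fact that a tree of depth d has at most 2^d leaves. A sketch of a typical configuration (values are illustrative):

```python
import lightgbm as lgb

# A depth-d binary tree has at most 2**d leaves, so choosing num_leaves
# well below that bound keeps leaf-wise growth roughly comparable to a
# depth-wise tree of the corresponding depth.
for depth in [5, 6, 7, 8]:
    print(f"max_depth={depth} -> at most {2**depth} leaves")

# Typical LightGBM setup: control complexity via num_leaves and leave
# max_depth unlimited (-1), or cap both as a safety net.
model = lgb.LGBMClassifier(
    num_leaves=63,     # roughly comparable to max_depth=6
    max_depth=-1,      # -1 means no depth limit (LightGBM's default)
    learning_rate=0.1,
    n_estimators=500,
)
```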
min_child_weight / min_data_in_leaf
Minimum sum of Hessians (XGBoost's min_child_weight) or minimum number of samples (LightGBM's min_data_in_leaf) required in a leaf.
Higher values = more conservative; splits that would create small leaves are rejected
Lower values = more flexible; trees can create smaller, more specialized leaves
Typical range: 1-100
Hessian vs Samples
For squared error, Hessian = 1 per sample, so min_child_weight ≈ minimum samples.
For logistic loss, Hessian = $p(1-p)$, which peaks at 0.25 for maximally uncertain predictions (p = 0.5) and shrinks toward zero for confident ones.
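A quick numeric sketch of why the two constraints coincide for squared error but diverge for logistic loss (plain NumPy, no library specifics assumed):

```python
import numpy as np

# Squared error: the Hessian w.r.t. the prediction is 1 per sample,
# so the sum of Hessians in a leaf equals the number of samples in it.
squared_error_hessians = np.ones(5)
print("Squared error, 5 samples -> Hessian sum:", squared_error_hessians.sum())

# Logistic loss: the Hessian is p*(1 - p), largest (0.25) at p = 0.5
# and near zero for confident predictions.
p = np.array([0.5, 0.9, 0.99])
print("Logistic Hessians:", p * (1 - p))

# A leaf full of confidently predicted samples can have a tiny Hessian sum,
# so min_child_weight acts more conservatively there than a plain
# min-samples constraint like LightGBM's min_data_in_leaf.
```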
min_split_gain (gamma)
Minimum gain required to make a split. Pruning parameter.
Split if: $\text{Gain} > \gamma$
0 (default): Allow all positive-gain splits
0.1-1.0: Mild pruning
1-5: Aggressive pruning
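This parameter is called gamma (alias min_split_loss) in XGBoost and min_split_gain in LightGBM's scikit-learn API; a sketch of mild pruning in both, with illustrative values rather than recommendations:

```python
import xgboost as xgb
import lightgbm as lgb

# XGBoost: gamma is the minimum loss reduction required to keep a split
xgb_model = xgb.XGBClassifier(
    n_estimators=500,
    learning_rate=0.1,
    max_depth=6,
    gamma=0.5,            # mild pruning; 0 allows every positive-gain split
)

# LightGBM: the same idea goes by min_split_gain
lgb_model = lgb.LGBMClassifier(
    n_estimators=500,
    learning_rate=0.1,
    num_leaves=63,
    min_split_gain=0.5,   # mild pruning
)
```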
Regularization Parameters
Beyond tree structure, you can apply explicit regularization to the leaf weights themselves.
reg_lambda (L2)
L2 regularization on leaf weights. Appears in the optimal weight formula: