Hyperparameters#
This guide explains all hyperparameters in boosters, their effects, and tuning recommendations.
General Parameters#
These apply to both GBDT and GBLinear:
n_estimators (default: 100)#
Number of boosting iterations (trees for GBDT, rounds for GBLinear).
Too low: Underfitting, model doesn’t learn patterns
Too high: Overfitting, diminishing returns, slower training
Typical range: 50–1000
Tuning: Use early stopping to find optimal value
learning_rate (default: 0.3)#
Shrinkage applied to each weak learner. Lower values require more iterations but often generalize better.
Too low: Need many iterations, slow training
Too high: Overfitting, unstable training
Typical range: 0.01–0.3
Rule of thumb: Lower learning rate + more trees = better generalization
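For example, the two configurations below trade step size for number of rounds. This is a minimal sketch; it assumes GBDTRegressor accepts these names as constructor arguments, as in the cross-validation example later in this guide.

from boosters.sklearn import GBDTRegressor

# Fast but coarse: few trees, large shrinkage steps.
fast = GBDTRegressor(n_estimators=100, learning_rate=0.3)

# Slower but usually generalizes better: many small steps.
# Pair this with early stopping so unneeded trees are not built.
careful = GBDTRegressor(n_estimators=1000, learning_rate=0.05)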
objective (default: “reg:squarederror”)#
Loss function to optimize. See Classification Objective Research for details.
reg:squarederror: Regression (MSE)
binary:logistic: Binary classification (log loss)
multi:softmax: Multiclass classification
reg:quantile: Quantile regression
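A minimal sketch, assuming objective is passed as a constructor argument like the other parameters in this guide:

from boosters.sklearn import GBDTRegressor

# Default: squared-error regression.
model = GBDTRegressor(objective="reg:squarederror")

# Quantile regression; how the target quantile is set is not shown here.
quantile_model = GBDTRegressor(objective="reg:quantile")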
seed (default: None)#
Random seed for reproducibility. Affects:
Feature subsampling
Row subsampling
Initial tree structure (for some implementations)
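A minimal sketch of reproducible training, assuming seed and subsample are constructor arguments as described in this guide:

from boosters.sklearn import GBDTRegressor

# With the same seed and the same data, both models make identical
# row/column subsampling decisions and should produce identical trees.
model_a = GBDTRegressor(subsample=0.8, seed=42)
model_b = GBDTRegressor(subsample=0.8, seed=42)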
GBDT-Specific Parameters#
Tree Structure#
max_depth (default: 6)#
Maximum depth of each tree.
Deeper trees: More complex patterns, higher overfitting risk
Shallower trees: Simpler patterns, needs more trees
Typical range: 3–10
Note: The number of nodes grows as O(2^depth), so deep trees are expensive to build
max_leaves (default: None)#
Maximum number of leaf nodes. Alternative to max_depth for controlling tree complexity.
If set, overrides max_depth
Allows asymmetric trees
Typical range: 16–256
min_child_weight (default: 1.0)#
Minimum sum of Hessians required in a child node. Regularization parameter that prevents splits creating too-small leaf nodes.
Higher values: More conservative, prevents overfitting
For squared-error regression: Equivalent to a minimum number of samples per leaf (each sample's hessian is 1)
For classification: Accounts for class imbalance
Typical range: 1–10
min_split_loss (gamma) (default: 0.0)#
Minimum loss reduction required to make a split. Pruning parameter.
Higher values: Fewer splits, simpler trees
Typical range: 0–5
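Putting the tree-structure parameters above together, a deliberately conservative configuration might look like the sketch below (parameter names follow this guide and are assumed to be constructor arguments):

from boosters.sklearn import GBDTRegressor

conservative = GBDTRegressor(
    max_depth=4,           # shallow trees capture simpler patterns
    min_child_weight=5.0,  # require substantial hessian mass in each leaf
    min_split_loss=1.0,    # prune splits that barely reduce the loss
)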
Regularization#
reg_lambda (default: 1.0)#
L2 regularization on leaf weights.
Prevents extreme leaf weights
Typical range: 0–10
reg_alpha (default: 0.0)#
L1 regularization on leaf weights. Promotes sparse solutions.
Useful when many features are irrelevant
Typical range: 0–1
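A minimal sketch under the same constructor-argument assumption:

from boosters.sklearn import GBDTRegressor

# L2 (reg_lambda) keeps leaf weights small; L1 (reg_alpha) pushes some
# of them to exactly zero, which helps when many features are irrelevant.
regularized = GBDTRegressor(reg_lambda=5.0, reg_alpha=0.1)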
Subsampling#
subsample (default: 1.0)#
Row subsampling ratio. Each tree sees a random subset of training data.
Lower values: More regularization, faster training
Reduces overfitting through stochastic effects
Typical range: 0.5–1.0
colsample_bytree (default: 1.0)#
Column subsampling ratio per tree.
Each tree considers a random subset of features
Typical range: 0.5–1.0
colsample_bylevel (default: 1.0)#
Column subsampling ratio per tree level.
More aggressive feature regularization
Typical range: 0.5–1.0
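A minimal sketch combining the three subsampling knobs (same constructor-argument assumption):

from boosters.sklearn import GBDTRegressor

stochastic = GBDTRegressor(
    subsample=0.8,          # each tree is fit on 80% of the rows
    colsample_bytree=0.8,   # each tree considers 80% of the columns
    colsample_bylevel=0.8,  # each level re-samples 80% of those columns
)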
GBLinear-Specific Parameters#
feature_selector (default: “cyclic”)#
Strategy for selecting features to update:
| Selector | When to Use |
|---|---|
| cyclic | Default, deterministic, good for most cases |
| shuffle | Breaks feature correlations, often better generalization |
| greedy | Sparse data, when few features are relevant |
| thrifty | High dimensions where greedy is too slow |
top_k (default: 0)#
For greedy/thrifty selectors: number of top features to consider.
0 means use all features
Useful for very high dimensional data
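A sketch of a high-dimensional, sparse-data setup. GBLinearRegressor is a hypothetical class name used here for illustration (see the GBLinear page for the actual estimator), and the arguments are assumed from this guide:

# Hypothetical estimator name; consult the GBLinear page for the real one.
from boosters.sklearn import GBLinearRegressor

sparse_linear = GBLinearRegressor(
    feature_selector="thrifty",  # approximately greedy, cheaper in high dimensions
    top_k=100,                   # only consider the 100 most promising features
    reg_alpha=0.1,               # L1 encourages a sparse coefficient vector
)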
Tuning Strategy#
Start Simple#
Use defaults
Set learning_rate=0.1, n_estimators=1000 with early stopping
Let early stopping find the right number of trees
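A sketch of this starting point. The early-stopping arguments (eval_set, early_stopping_rounds) are assumptions about the fit API, not confirmed names; check the boosters documentation for the actual interface. X and y are your training data.

from sklearn.model_selection import train_test_split
from boosters.sklearn import GBDTRegressor

# Hold out a validation set to drive early stopping.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = GBDTRegressor(learning_rate=0.1, n_estimators=1000)

# Assumed early-stopping interface: stop once the validation score has
# not improved for 50 rounds.
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    early_stopping_rounds=50,
)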
Tune Tree Structure#
Start with max_depth=6
If overfitting: reduce max_depth, increase min_child_weight
If underfitting: increase max_depth, reduce regularization
Add Regularization#
Add subsampling (subsample=0.8, colsample_bytree=0.8)
Increase reg_lambda if still overfitting
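For instance, moving from the defaults toward a more regularized configuration (a sketch under the same constructor-argument assumption):

from boosters.sklearn import GBDTRegressor

# Steps 2 and 3 combined: shallower trees, heavier leaves,
# row/column subsampling, and stronger L2 on the leaf weights.
regularized = GBDTRegressor(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=4,
    min_child_weight=5.0,
    subsample=0.8,
    colsample_bytree=0.8,
    reg_lambda=5.0,
)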
Cross-Validation#
Use cross-validation to evaluate hyperparameter choices:
from sklearn.model_selection import cross_val_score
from boosters.sklearn import GBDTRegressor

model = GBDTRegressor(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
)

# X, y: your training features and targets.
# cross_val_score returns negative MSE, so negate and take the square
# root per fold before aggregating into an RMSE estimate.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
rmse = (-scores) ** 0.5
print(f"RMSE: {rmse.mean():.4f} ± {rmse.std():.4f}")
See Also#
Gradient Boosting — Theory overview
GBLinear — GBLinear details
Classification Objective Research — Loss functions and metrics