Hyperparameters#

This guide explains all hyperparameters in boosters, their effects, and tuning recommendations.

General Parameters#

These apply to both GBDT and GBLinear:

n_estimators (default: 100)#

Number of boosting iterations (trees for GBDT, rounds for GBLinear).

  • Too low: Underfitting, model doesn’t learn patterns

  • Too high: Overfitting, diminishing returns, slower training

  • Typical range: 50–1000

  • Tuning: Use early stopping to find optimal value
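
As a rough way to see this trade-off before reaching for early stopping, a held-out validation set can be used to compare a few values directly. This is only a sketch: it uses the scikit-learn-style GBDTRegressor shown later in this guide and a synthetic dataset.

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from boosters.sklearn import GBDTRegressor

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Compare a few boosting lengths on a held-out validation set.
for n in (50, 100, 300, 1000):
    model = GBDTRegressor(n_estimators=n, learning_rate=0.1)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_valid, model.predict(X_valid))
    print(f"n_estimators={n}: validation MSE = {mse:.3f}")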

learning_rate (default: 0.3)#

Shrinkage applied to each weak learner. Lower values require more iterations but often generalize better.

\[F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)\]
  • Too low: Need many iterations, slow training

  • Too high: Overfitting, unstable training

  • Typical range: 0.01–0.3

  • Rule of thumb: Lower learning rate + more trees = better generalization
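
For example, the two configurations below trade learning rate against the number of trees; the second, slower one often generalizes better. This is a sketch using the GBDTRegressor estimator from this guide, with illustrative values.

from boosters.sklearn import GBDTRegressor

# Aggressive: few trees, large steps per tree.
fast = GBDTRegressor(n_estimators=100, learning_rate=0.3)

# Conservative: many small steps -- usually generalizes better, trains longer.
slow = GBDTRegressor(n_estimators=600, learning_rate=0.05)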

objective (default: “reg:squarederror”)#

Loss function to optimize. See Classification Objective Research for details.

  • reg:squarederror: Regression (MSE)

  • binary:logistic: Binary classification (log loss)

  • multi:softmax: Multiclass classification

  • reg:quantile: Quantile regression
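
A sketch of selecting an objective. GBDTRegressor appears elsewhere in this guide; the GBDTClassifier counterpart used for the classification objective is an assumption here, so check the API reference for the actual class name.

from boosters.sklearn import GBDTRegressor
# GBDTClassifier is assumed by analogy with GBDTRegressor (not shown in this guide).
from boosters.sklearn import GBDTClassifier

# Regression with mean squared error (the default).
reg = GBDTRegressor(objective="reg:squarederror")

# Binary classification with log loss.
clf = GBDTClassifier(objective="binary:logistic")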

seed (default: None)#

Random seed for reproducibility. Affects:

  • Feature subsampling

  • Row subsampling

  • Initial tree structure (for some implementations)
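
For instance, two runs with the same seed and subsampling enabled should produce identical predictions. A small sketch using GBDTRegressor and a synthetic dataset:

import numpy as np
from sklearn.datasets import make_regression
from boosters.sklearn import GBDTRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

# Same seed -> same subsampling decisions -> identical models.
preds_a = GBDTRegressor(subsample=0.8, seed=42).fit(X, y).predict(X)
preds_b = GBDTRegressor(subsample=0.8, seed=42).fit(X, y).predict(X)
assert np.allclose(preds_a, preds_b)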

GBDT-Specific Parameters#

Tree Structure#

max_depth (default: 6)#

Maximum depth of each tree.

  • Deeper trees: More complex patterns, higher overfitting risk

  • Shallower trees: Simpler patterns, need more trees to compensate

  • Typical range: 3–10

  • Note: The number of potential nodes grows as O(2^depth), so deep trees are expensive

max_leaves (default: None)#

Maximum number of leaf nodes. Alternative to max_depth for controlling tree complexity.

  • If set, overrides max_depth

  • Allows asymmetric trees

  • Typical range: 16–256

min_child_weight (default: 1.0)#

Minimum sum of Hessians required in a child node. Regularization parameter that prevents splits creating too-small leaf nodes.

  • Higher values: More conservative, prevents overfitting

  • For regression: Similar to minimum samples per leaf

  • For classification: Hessians are p(1 − p), so this acts as a probability-weighted count rather than a raw sample count

  • Typical range: 1–10

min_split_loss (gamma) (default: 0.0)#

Minimum loss reduction required to make a split. Pruning parameter.

\[\text{Split if Gain} > \gamma\]
  • Higher values: Fewer splits, simpler trees

  • Typical range: 0–5
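
Putting the tree-structure parameters together, a deliberately conservative configuration might look like the sketch below (values are illustrative starting points, not recommendations):

from boosters.sklearn import GBDTRegressor

model = GBDTRegressor(
    max_depth=4,           # shallow trees capture simpler patterns
    min_child_weight=5.0,  # require a larger Hessian sum in each child
    min_split_loss=1.0,    # prune splits whose gain does not exceed gamma
)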

Regularization#

reg_lambda (default: 1.0)#

L2 regularization on leaf weights.

\[w^* = -\frac{\sum_i g_i}{\sum_i h_i + \lambda}\]
  • Prevents extreme leaf weights

  • Typical range: 0–10
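
To see the effect numerically, the optimal leaf weight above shrinks toward zero as λ grows. A quick worked example with made-up gradient and Hessian sums:

# Suppose a leaf collects gradients summing to -12.0 and Hessians summing to 3.0.
G, H = -12.0, 3.0

for lam in (0.0, 1.0, 10.0):
    w = -G / (H + lam)
    print(f"lambda={lam:>4}: leaf weight = {w:.3f}")
# lambda= 0.0: leaf weight = 4.000
# lambda= 1.0: leaf weight = 3.000
# lambda=10.0: leaf weight = 0.923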

reg_alpha (default: 0.0)#

L1 regularization on leaf weights. Promotes sparse solutions.

  • Useful when many features are irrelevant

  • Typical range: 0–1

Subsampling#

subsample (default: 1.0)#

Row subsampling ratio. Each tree sees a random subset of training data.

  • Lower values: More regularization, faster training

  • Reduces overfitting by injecting randomness into each tree (stochastic gradient boosting)

  • Typical range: 0.5–1.0

colsample_bytree (default: 1.0)#

Column subsampling ratio per tree.

  • Each tree considers a random subset of features

  • Typical range: 0.5–1.0

colsample_bylevel (default: 1.0)#

Column subsampling ratio per tree level.

  • More aggressive feature regularization

  • Typical range: 0.5–1.0
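
A common stochastic configuration combines row and column subsampling; the values below are typical starting points rather than tuned results (a sketch using GBDTRegressor):

from boosters.sklearn import GBDTRegressor

model = GBDTRegressor(
    subsample=0.8,          # each tree sees 80% of the rows
    colsample_bytree=0.8,   # each tree sees 80% of the features
    colsample_bylevel=0.8,  # features are re-sampled at every tree level
    seed=42,                # make the subsampling reproducible
)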

GBLinear-Specific Parameters#

feature_selector (default: “cyclic”)#

Strategy for selecting features to update:

Selector   When to Use
cyclic     Default, deterministic, good for most cases
shuffle    Breaks feature correlations, often better generalization
greedy     Sparse data, when few features are relevant
thrifty    High dimensions where greedy is too slow

top_k (default: 0)#

For greedy/thrifty selectors: number of top features to consider.

  • 0 means use all features

  • Useful for very high dimensional data
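
A sketch of a GBLinear setup for high-dimensional data. The GBLinearRegressor class name is an assumption (only GBDTRegressor is shown in this guide); feature_selector and top_k are the parameters documented above.

# GBLinearRegressor is assumed by analogy with GBDTRegressor -- check the API reference.
from boosters.sklearn import GBLinearRegressor

model = GBLinearRegressor(
    feature_selector="thrifty",  # cheaper than greedy in high dimensions
    top_k=100,                   # only consider the 100 most promising features
)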

Tuning Strategy#

Start Simple#

  1. Use defaults

  2. Set learning_rate=0.1, n_estimators=1000 with early stopping

  3. Let early stopping find the right number of trees
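
The exact early-stopping interface depends on the library version. As a hedged sketch, assuming the estimator accepts eval_set and early_stopping_rounds arguments in the style of other gradient-boosting libraries:

from sklearn.model_selection import train_test_split
from boosters.sklearn import GBDTRegressor

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = GBDTRegressor(learning_rate=0.1, n_estimators=1000)
# eval_set / early_stopping_rounds are assumed argument names; consult the
# boosters API reference for the exact early-stopping interface.
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    early_stopping_rounds=50,
)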

Tune Tree Structure#

  • Start with max_depth=6

  • If overfitting: reduce max_depth, increase min_child_weight

  • If underfitting: increase max_depth, reduce regularization
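
One straightforward way to search this space is a small grid over max_depth and min_child_weight with scikit-learn's GridSearchCV. The ranges below are illustrative, and the sketch assumes GBDTRegressor follows scikit-learn estimator conventions.

from sklearn.model_selection import GridSearchCV
from boosters.sklearn import GBDTRegressor

param_grid = {
    "max_depth": [3, 6, 9],
    "min_child_weight": [1.0, 5.0, 10.0],
}
search = GridSearchCV(
    GBDTRegressor(learning_rate=0.1, n_estimators=200),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)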

Add Regularization#

  • Add subsampling (subsample=0.8, colsample_bytree=0.8)

  • Increase reg_lambda if still overfitting

Cross-Validation#

Use cross-validation to evaluate hyperparameter choices:

import numpy as np
from sklearn.model_selection import cross_val_score
from boosters.sklearn import GBDTRegressor

model = GBDTRegressor(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
rmse_per_fold = np.sqrt(-scores)  # convert negative MSE back to RMSE per fold
print(f"RMSE: {rmse_per_fold.mean():.4f} ± {rmse_per_fold.std():.4f}")

See Also#