GBDT (Gradient Boosted Decision Trees)#
GBDT uses decision trees as the weak learners in gradient boosting. Each tree is trained to predict the pseudo-residuals (negative gradients) of the current ensemble, progressively reducing the overall loss.
This is the most widely used form of gradient boosting, implemented in XGBoost, LightGBM, CatBoost, and other popular libraries.
The Core Idea#
GBDT builds an ensemble of trees sequentially:
Prediction = Tree₁(x) + Tree₂(x) + Tree₃(x) + ... + Treeₘ(x)
Each tree is fit to the negative gradient of the loss function with respect to the current predictions. For squared error loss, this is simply the residual (actual − predicted). For other losses, the gradient still tells us which direction to move each prediction in order to reduce the error.
Modern implementations (XGBoost, LightGBM) use a second-order Taylor expansion of the loss. For each training sample \(i\), we compute:
- Gradient: \(g_i = \frac{\partial L(y_i, \hat{y}_i)}{\partial \hat{y}_i}\)
- Hessian: \(h_i = \frac{\partial^2 L(y_i, \hat{y}_i)}{\partial \hat{y}_i^2}\)
The optimal leaf weight for the set of samples \(I\) falling in a leaf is:
\[ w^* = -\frac{\sum_{i \in I} g_i}{\sum_{i \in I} h_i + \lambda} \]
where \(\lambda\) is the L2 regularization term.
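As a concrete illustration, here is a minimal NumPy sketch (not any library's internal code; the function names are ours) of the gradients and Hessians for two common losses, and the leaf weight they imply:

```python
import numpy as np

def grad_hess_squared_error(y, y_pred):
    # L = 0.5 * (y - y_pred)^2  ->  g = y_pred - y,  h = 1
    return y_pred - y, np.ones_like(y)

def grad_hess_logistic(y, y_pred):
    # y in {0, 1}; y_pred is the raw score (log-odds).
    # L = -[y*log(p) + (1 - y)*log(1 - p)] with p = sigmoid(y_pred)
    p = 1.0 / (1.0 + np.exp(-y_pred))
    return p - y, p * (1.0 - p)

def leaf_weight(g, h, reg_lambda=1.0):
    # w* = -sum(g) / (sum(h) + lambda) for the samples in one leaf
    return -g.sum() / (h.sum() + reg_lambda)

# Tiny usage check: a single-leaf "tree" on four labels, starting from score 0
y = np.array([1.0, 0.0, 1.0, 1.0])
g, h = grad_hess_logistic(y, np.zeros_like(y))
print(leaf_weight(g, h))   # positive: pulls predictions toward the majority class
```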
Why Trees?#
Decision trees have properties that make them excellent weak learners:
| Property | Benefit |
|---|---|
| Non-linear | Can model complex relationships |
| Automatic feature selection | Splits on most informative features |
| Handle mixed types | Numerical and categorical (with encoding) |
| Missing value handling | Can learn default directions |
| Fast inference | Just follow decision rules |
| Interpretable splits | “If age > 30 AND income > 50k” |
The combination of many weak trees (high bias, low variance individually) creates a strong learner with low bias and controlled variance.
Training Overview#
GBDT training follows this high-level loop:
```
Algorithm: GBDT Training
─────────────────────────────────────────
Input:  Training data (X, y), loss function L, num_trees M
Output: Ensemble of trees {T₁, T₂, ..., Tₘ}

1. Initialize predictions: F₀(x) = base_score
2. For m = 1 to M:
   a. Compute gradients: gᵢ = ∂L/∂F for each sample
   b. Compute hessians:  hᵢ = ∂²L/∂F² for each sample
   c. Build tree Tₘ by:
      - Finding best splits using gradient statistics
      - Computing leaf weights from gradients/hessians
   d. Update predictions: Fₘ = Fₘ₋₁ + η · Tₘ
3. Return ensemble {T₁, ..., Tₘ}
```
See Training for detailed algorithms.
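For intuition, here is a minimal first-order sketch of this loop for squared-error loss, using scikit-learn's DecisionTreeRegressor as the weak learner (an assumption for illustration; real GBDT libraries build each tree directly from gradient/Hessian statistics as described above):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_gbdt(X, y, num_trees=100, learning_rate=0.1, max_depth=3):
    """First-order GBDT for squared-error loss: each tree fits the residuals."""
    base_score = float(np.mean(y))
    F = np.full(len(y), base_score)            # F_0(x)
    trees = []
    for _ in range(num_trees):
        residuals = y - F                      # negative gradient of 0.5*(y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        F = F + learning_rate * tree.predict(X)   # F_m = F_{m-1} + eta * T_m
        trees.append(tree)
    return base_score, trees

def predict_gbdt(X, base_score, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], base_score)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred
```

Because the loss here is squared error, fitting the residuals is the same as fitting the negative gradients.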
Modern Optimizations#
Modern GBDT implementations use several key optimizations:
Histogram-Based Training#
Instead of sorting feature values to find splits (O(n log n)), modern implementations:
1. Quantize features into discrete bins
2. Build gradient histograms per bin
3. Find splits by scanning histograms (O(bins))
See Histogram Method.
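A condensed sketch of both steps, using the second-order gain implied by the leaf-weight formula above (the constant 1/2 factor and the complexity penalty γ are omitted since they do not change which split wins; function names and the 256-bin default are illustrative):

```python
import numpy as np

def quantize(col, n_bins=256):
    """Map a numeric column to integer bin ids using quantile boundaries."""
    edges = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, col).astype(np.int32)

def find_best_split(binned_col, g, h, n_bins=256, reg_lambda=1.0):
    """Scan one pre-binned feature column for the best split threshold."""
    # Accumulate gradient/hessian sums per bin: this is the histogram.
    G = np.bincount(binned_col, weights=g, minlength=n_bins)
    H = np.bincount(binned_col, weights=h, minlength=n_bins)
    G_total, H_total = G.sum(), H.sum()
    parent_score = G_total ** 2 / (H_total + reg_lambda)

    # Sweep bins left to right, keeping running left/right sums.
    best_gain, best_bin = 0.0, None
    G_left = H_left = 0.0
    for b in range(n_bins - 1):
        G_left += G[b]
        H_left += H[b]
        G_right, H_right = G_total - G_left, H_total - H_left
        gain = (G_left ** 2 / (H_left + reg_lambda)
                + G_right ** 2 / (H_right + reg_lambda)
                - parent_score)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```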
Histogram Subtraction#
For the two children of a binary split, histogram(child) = histogram(parent) − histogram(sibling). Building a histogram only for the smaller child and deriving the other by subtraction nearly halves the work.
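In terms of the per-bin (G, H) arrays from the sketch above (names are ours), the subtraction is a single vectorized operation:

```python
def sibling_histogram(parent_G, parent_H, small_G, small_H):
    """Derive the larger child's histogram from the parent minus the smaller child."""
    return parent_G - small_G, parent_H - small_H
```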
Tree Growth Strategies#
- Depth-wise (XGBoost default): Grow all nodes at the same level before going deeper
- Leaf-wise (LightGBM default): Always split the leaf with the highest gain
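For example, in the scikit-learn wrappers of the two libraries the growth strategy is controlled roughly like this (parameter names as in recent versions; the values are illustrative, not recommendations):

```python
import xgboost as xgb
import lightgbm as lgb

# XGBoost grows depth-wise by default; "lossguide" switches to leaf-wise growth.
xgb_model = xgb.XGBRegressor(
    max_depth=6,                # caps depth-wise growth
    grow_policy="lossguide",    # split the leaf with the highest gain
    max_leaves=31,
)

# LightGBM grows leaf-wise by default; num_leaves is the main complexity knob.
lgb_model = lgb.LGBMRegressor(
    num_leaves=31,
    max_depth=-1,               # -1 means no explicit depth limit
)
```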
Sampling#
- Row subsampling: Use a random fraction of the rows for each tree
- GOSS: Keep large-gradient samples, randomly sample small-gradient samples
See Sampling Strategies.
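A rough NumPy sketch of GOSS under its usual formulation (the function and argument names here are ours; top_rate and other_rate mirror the names LightGBM uses for its two GOSS ratios):

```python
import numpy as np

def goss_sample(g, top_rate=0.2, other_rate=0.1, rng=None):
    """Pick rows for one tree: keep large-|gradient| rows, sample the rest."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(g)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    order = np.argsort(-np.abs(g))        # rows sorted by |gradient|, descending
    top = order[:n_top]                   # always kept
    sampled = rng.choice(order[n_top:], size=n_other, replace=False)

    idx = np.concatenate([top, sampled])
    weights = np.ones(len(idx))
    # Up-weight the sampled small-gradient rows so gradient sums stay unbiased.
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return idx, weights
```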
Contents#
Training#
Algorithms for building trees efficiently:
| Document | Description |
|---|---|
|  | Core histogram-based split finding |
|  | Feature binning and discretization |
|  | Gain formula and split enumeration |
|  | Depth-wise vs leaf-wise strategies |
|  | GOSS and subsampling |
Inference#
Efficient prediction with trained models:
| Document | Description |
|---|---|
|  | Block-based prediction, cache efficiency |
|  | Multi-class and multi-target |
Data Structures#
Key representations for training and inference:
| Document | Description |
|---|---|
|  | Bin boundaries and quantile sketches |
|  | Binned feature storage |
|  | AoS vs SoA tree layouts |
Library Implementations#
| Library | Source Files |
|---|---|
| XGBoost |  |
| LightGBM |  |
See Library Comparison for a detailed feature comparison.