Research

Deep dives into algorithms, data structures, and techniques used in gradient boosting.

Research documents are educational: they explain how gradient boosting algorithms work, the optimizations they use, and their trade-offs. They inform implementation decisions but do not prescribe our library's specific design.

Foundations

Algorithms

GBDT (Gradient Boosted Decision Trees)

GBLinear (Linear Gradient Boosting)

GBLinear Overview

Cross-Cutting Topics

Reference

Research vs RFCs

| Research | RFCs |
| --- | --- |
| “How does gradient boosting work?” | “How will we build it?” |
| Algorithm documentation | Design decisions |
| External focus (XGBoost, LightGBM) | Internal focus (Boosters) |
| Educational | Prescriptive |
| Can cite academic papers | Should be self-contained |

Primary Sources

These documents synthesize information from:

XGBoost (github.com/dmlc/xgboost)

  • Primary reference for histogram-based training (see the sketch after this list)

  • JSON model format compatibility

LightGBM (github.com/microsoft/LightGBM)

  • Leaf-wise growth strategy

  • GOSS sampling

  • Native categorical handling

Academic Papers

  • Chen & Guestrin (2016): XGBoost: A Scalable Tree Boosting System

  • Ke et al. (2017): LightGBM: A Highly Efficient Gradient Boosting Decision Tree
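
To give a flavor of the material these sources cover, below is a minimal sketch of histogram-based split finding as described in the XGBoost paper: per-feature gradient and hessian histograms are built in one pass, then bin boundaries are scanned as candidate splits. The function name and signature are hypothetical, and the ½ scaling and γ complexity penalty from the paper's gain formula are omitted for brevity; this is an illustration, not the Boosters implementation.

```python
import numpy as np

def best_split_histogram(bin_idx, grad, hess, n_bins, lam=1.0):
    """Pick the best split point for one feature from gradient histograms.

    bin_idx : (n_samples,) integer bin assigned to each sample for this feature
    grad, hess : (n_samples,) first- and second-order gradients of the loss
    lam : L2 regularization on leaf weights (lambda in the XGBoost paper)
    """
    # One pass over the data: accumulate gradient statistics per bin.
    g_hist = np.bincount(bin_idx, weights=grad, minlength=n_bins)
    h_hist = np.bincount(bin_idx, weights=hess, minlength=n_bins)

    g_total, h_total = g_hist.sum(), h_hist.sum()
    parent_score = g_total ** 2 / (h_total + lam)

    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    # Scan bin boundaries left to right; each running prefix sum is a
    # candidate split "bins <= b go left, the rest go right".
    for b in range(n_bins - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left ** 2 / (h_left + lam)
                + g_right ** 2 / (h_right + lam)
                - parent_score)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain

# Toy usage: 4 samples already bucketed into 3 bins for a single feature.
bin_idx = np.array([0, 1, 2, 1])
grad = np.array([-0.5, 0.2, 0.8, -0.1])
hess = np.ones(4)
print(best_split_histogram(bin_idx, grad, hess, n_bins=3))
```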