# Research
Deep dives into algorithms, data structures, and techniques used in gradient boosting.
Research documents are educational: they explain how gradient boosting algorithms work, their optimizations, and their trade-offs. They inform implementation decisions but are not prescriptive about our library's specific design.
## Foundations

### Algorithms

- GBDT (Gradient Boosted Decision Trees)
    - GBDT Overview
    - Training Pipeline
    - Inference
- GBLinear (Linear Gradient Boosting)
    - GBLinear Overview
    - GBLinear Details
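The core loop that the GBDT training documents walk through can be sketched in a few lines: each boosting round fits a weak learner to the current residuals (the negative gradient of squared error) and adds a damped copy of it to the ensemble. This is an illustrative toy on a single 1-D feature with depth-1 stumps, not the library's actual pipeline; the names `gbdt_fit`, `fit_stump`, and the defaults are invented for the sketch, and real implementations use histogram-based split finding over many features.

```python
import numpy as np

def fit_stump(x, residual):
    # Exhaustive search for the single threshold on a 1-D feature that
    # minimizes squared error of the two leaf means (a depth-1 tree).
    best_sse, best = np.inf, None
    for t in np.unique(x)[:-1]:  # skip the max value: its right leaf would be empty
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, left.mean(), right.mean())
    return best

def predict_stump(stump, x):
    t, left_val, right_val = stump
    return np.where(x <= t, left_val, right_val)

def gbdt_fit(x, y, n_rounds=50, lr=0.1):
    # Each round fits a stump to the residual, which for squared-error
    # loss equals the negative gradient, then shrinks it by the learning rate.
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred += lr * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps

def gbdt_predict(base, stumps, x, lr=0.1):
    # Inference replays the same additive sum over the fitted stumps.
    pred = np.full(len(x), base)
    for stump in stumps:
        pred += lr * predict_stump(stump, x)
    return pred
```

On a step function such as `y = (x > 0.5)`, fifty rounds at `lr=0.1` shrink the residual by roughly `0.9` per round, so predictions converge close to the targets.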
## Cross-Cutting Topics

## Reference
## Research vs RFCs

| Research | RFCs |
|---|---|
| “How does gradient boosting work?” | “How will we build it?” |
| Algorithm documentation | Design decisions |
| External focus (XGBoost, LightGBM) | Internal focus (Boosters) |
| Educational | Prescriptive |
| Can cite academic papers | Should be self-contained |
## Primary Sources

These documents synthesize information from:

- **XGBoost** — github.com/dmlc/xgboost
    - Primary reference for histogram-based training
    - JSON model format compatibility
- **LightGBM** — github.com/microsoft/LightGBM
    - Leaf-wise growth strategy
    - GOSS sampling
    - Native categorical handling
- **Academic papers**
    - Chen & Guestrin (2016): *XGBoost: A Scalable Tree Boosting System*
    - Ke et al. (2017): *LightGBM: A Highly Efficient Gradient Boosting Decision Tree*
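As a concrete taste of one technique listed above, Gradient-based One-Side Sampling (GOSS, introduced in the LightGBM paper) keeps the top `a` fraction of rows by gradient magnitude, uniformly samples a `b` fraction of the remainder, and up-weights the sampled small-gradient rows by `(1 - a) / b` so that gradient statistics stay approximately unbiased. The sketch below is illustrative only, not LightGBM's code; the function name and default ratios are invented here.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    # GOSS: retain all large-gradient rows, subsample the small-gradient
    # rest, and reweight the subsample to keep gradient sums unbiased.
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(gradients)
    top_n = int(a * n)
    rand_n = int(b * n)
    order = np.argsort(-np.abs(gradients))   # descending by |gradient|
    top_idx = order[:top_n]                  # always kept
    rest = order[top_n:]                     # candidates for subsampling
    sampled = rng.choice(rest, size=rand_n, replace=False)
    idx = np.concatenate([top_idx, sampled])
    weights = np.ones(len(idx))
    weights[top_n:] = (1.0 - a) / b          # compensate for the subsampling
    return idx, weights
```

With `a=0.2` and `b=0.1`, only 30% of rows enter each tree-building pass, while the `(1 - a) / b = 8x` weight on the sampled rows preserves the expected contribution of the discarded 80%.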