Research

Deep dives into algorithms, data structures, and techniques used in gradient boosting.

Research documents are educational: they explain how gradient boosting algorithms work, the optimizations they use, and their trade-offs. They inform implementation decisions but do not prescribe our library's specific design.

Foundations

Algorithms

GBDT (Gradient Boosted Decision Trees)

GBLinear (Linear Gradient Boosting)

GBLinear Overview

Cross-Cutting Topics

Reference

Research vs RFCs

| Research | RFCs |
| --- | --- |
| “How does gradient boosting work?” | “How will we build it?” |
| Algorithm documentation | Design decisions |
| External focus (XGBoost, LightGBM) | Internal focus (Boosters) |
| Educational | Prescriptive |
| Can cite academic papers | Should be self-contained |

Primary Sources

These documents synthesize information from:

XGBoost (github.com/dmlc/xgboost)

  • Primary reference for histogram-based training (see the sketch after this list)

  • JSON model format compatibility

LightGBM (github.com/microsoft/LightGBM)

  • Leaf-wise growth strategy

  • GOSS sampling

  • Native categorical handling

Academic Papers

  • Chen & Guestrin (2016): XGBoost: A Scalable Tree Boosting System

  • Ke et al. (2017): LightGBM: A Highly Efficient Gradient Boosting Decision Tree
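
To give a flavor of the material these sources cover, below is a minimal sketch of histogram-based split finding as described in the XGBoost paper: per-feature gradient and hessian histograms are built in one pass, then bin boundaries are scanned as candidate splits. The function name and signature are hypothetical, and the ½ scaling and γ complexity penalty from the paper's gain formula are omitted for brevity; this is an illustration, not the Boosters implementation.

```python
import numpy as np

def best_split_histogram(bin_idx, grad, hess, n_bins, lam=1.0):
    """Pick the best split point for one feature from gradient histograms.

    bin_idx : (n_samples,) integer bin assigned to each sample for this feature
    grad, hess : (n_samples,) first- and second-order gradients of the loss
    lam : L2 regularization on leaf weights (lambda in the XGBoost paper)
    """
    # One pass over the data: accumulate gradient statistics per bin.
    g_hist = np.bincount(bin_idx, weights=grad, minlength=n_bins)
    h_hist = np.bincount(bin_idx, weights=hess, minlength=n_bins)

    g_total, h_total = g_hist.sum(), h_hist.sum()
    parent_score = g_total ** 2 / (h_total + lam)

    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    # Scan bin boundaries left to right; each running prefix sum is a
    # candidate split "bins <= b go left, the rest go right".
    for b in range(n_bins - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left ** 2 / (h_left + lam)
                + g_right ** 2 / (h_right + lam)
                - parent_score)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain

# Toy usage: 4 samples already bucketed into 3 bins for a single feature.
bin_idx = np.array([0, 1, 2, 1])
grad = np.array([-0.5, 0.2, 0.8, -0.1])
hess = np.ones(4)
print(best_split_histogram(bin_idx, grad, hess, n_bins=3))
```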