XGBoost vs LightGBM: Approach Comparison#
How XGBoost and LightGBM solve the same problems differently.
This document compares approaches and trade-offs, not feature checklists. Use it to understand why certain design decisions were made.
Tree Growth Strategy#
The problem: In what order should we split tree nodes?
XGBoost: Depth-wise (Level-by-Level)#
Split all nodes at the current level before moving to the next:
Level 0: [root] ← Split first
Level 1: [A] [B] ← Split BOTH before going deeper
Level 2: [a][b] [c][d] ← Split ALL FOUR
Properties:
- Produces balanced trees
- Easy to parallelize (entire level at once)
- May waste work on low-gain splits
LightGBM: Leaf-wise (Best-First)#
Always split the leaf with the highest gain:
[root] → [highest-gain] → [highest-gain] → ...
Properties:
- Lower loss for the same number of leaves
- More efficient (skips low-gain regions)
- Risk of overfitting without a max_depth constraint
- Produces unbalanced trees
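The growth policy shows up directly in each library's hyperparameters. Below is a minimal configuration sketch, assuming the standard Python packages; the values are illustrative defaults, not tuning advice.

```python
# XGBoost: depth-wise by default; "lossguide" switches to leaf-wise (best-first) growth.
xgb_depthwise = {"tree_method": "hist", "grow_policy": "depthwise", "max_depth": 6}
xgb_leafwise = {"tree_method": "hist", "grow_policy": "lossguide", "max_leaves": 31}

# LightGBM: always leaf-wise; num_leaves is the main complexity knob, and a
# positive max_depth (default -1 = unlimited) guards against overfitting.
lgb_params = {"num_leaves": 31, "max_depth": -1}
```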
Trade-off Summary#
| Criterion | Depth-wise | Leaf-wise |
|---|---|---|
| Accuracy (same leaves) | Lower | Higher |
| Overfitting risk | Lower | Higher |
| Parallelization | Easier | Harder |
| Best for | Small data, shallow trees | Large data, efficiency |
Split Finding#
The problem: How to efficiently find the best split point?
Both: Histogram-Based#
Both libraries quantize features into bins and use histogram aggregation.
| Aspect | XGBoost | LightGBM |
|---|---|---|
| Default bins | 256 | 255 |
| Histogram subtraction | ✅ | ✅ |
| Missing value handling | Learned direction | Learned direction |
Difference: Histogram Build Strategy#
XGBoost: Row-wise by default: iterate over samples, accumulating their statistics into per-feature histograms.
LightGBM: Adaptive: auto-selects row-wise or column-wise building based on a cost and cache analysis.
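Both the bin resolution and LightGBM's build-strategy override are ordinary parameters. A small sketch, assuming the Python APIs; the values shown are the documented defaults.

```python
# XGBoost histogram method with its default bin count.
xgb_params = {"tree_method": "hist", "max_bin": 256}

# LightGBM default bin count, plus the switches that bypass its automatic
# row-wise vs column-wise selection.
lgb_params = {
    "max_bin": 255,
    "force_row_wise": False,  # True forces row-wise histogram building
    "force_col_wise": False,  # True forces column-wise histogram building
}
```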
Sampling Strategies#
The problem: How to reduce training data while maintaining quality?
XGBoost: Random Subsampling#
Uniform random selection of rows per tree.
subsample = 0.8 → Use random 80% of rows per tree
Simple, unbiased, but doesn’t prioritize informative samples.
LightGBM: GOSS (Gradient-based One-Side Sampling)#
Keep samples with large gradients, randomly sample the rest:
1. Keep top 20% by |gradient| (informative)
2. Random sample 10% of rest (maintain distribution)
3. Upweight sampled rest (correct for sampling bias)
Insight: Large gradient = model is wrong = more to learn.
Trade-off: More complex, but better sample efficiency for large datasets.
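The selection and re-weighting step is simple enough to sketch in NumPy; the helper below is illustrative only, not LightGBM's implementation. The LightGBM parameter names shown follow recent releases (older versions enabled GOSS via boosting_type="goss").

```python
import numpy as np

# GOSS as a LightGBM configuration (defaults shown for top_rate / other_rate).
lgb_goss_params = {"data_sample_strategy": "goss", "top_rate": 0.2, "other_rate": 0.1}

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Illustrative GOSS selection: keep large-gradient rows, subsample the rest."""
    rng = np.random.default_rng(seed)
    gradients = np.asarray(gradients)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))  # sort by |gradient|, descending
    n_top = int(top_rate * n)
    top_idx = order[:n_top]                 # always keep the informative rows
    other_idx = rng.choice(order[n_top:], size=int(other_rate * n), replace=False)
    weights = np.ones(n)
    weights[other_idx] = (1.0 - top_rate) / other_rate  # upweight to correct the bias
    idx = np.concatenate([top_idx, other_idx])
    return idx, weights[idx]
```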
Categorical Feature Handling#
The problem: How to split on categorical features efficiently?
XGBoost: One-Hot or Approximate Partitions#
- Traditional: one-hot encode and treat as numerical
- v1.5+: approximate partition-based splits via enable_categorical
LightGBM: Native Gradient-Sorted Partitions#
For high-cardinality categoricals:
- Sort categories by gradient statistics
- Find the optimal partition in O(k log k) for k categories
Key difference: LightGBM finds optimal binary partitions (subset vs complement), not just one-vs-rest splits.
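A minimal usage sketch, assuming the Python APIs and a hypothetical toy DataFrame with one categorical column:

```python
import pandas as pd
import xgboost as xgb
import lightgbm as lgb

df = pd.DataFrame({
    "city": pd.Categorical(["a", "b", "c", "a", "b", "c"]),
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
y = [0, 1, 1, 0, 1, 0]

# XGBoost 1.5+: keep the pandas categorical dtype and opt in explicitly.
dtrain = xgb.DMatrix(df, label=y, enable_categorical=True)

# LightGBM: declare categorical columns by name (or rely on the pandas dtype).
train_set = lgb.Dataset(df, label=y, categorical_feature=["city"])
```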
Sparse Data Optimization#
The problem: How to handle datasets with many zeros/missing values?
Both: Learned Default Direction#
During split finding, try missing values going left AND right, pick better.
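A toy sketch of the idea, using the usual second-order gain G²/(H + λ) with the 1/2 factor and complexity penalty omitted; G_*/H_* are gradient/hessian sums, and none of this is either library's actual code.

```python
def leaf_score(G, H, lam=1.0):
    # Simplified leaf objective: G^2 / (H + lambda).
    return G * G / (H + lam)

def split_gain_with_missing(G_left, H_left, G_right, H_right, G_miss, H_miss, lam=1.0):
    parent = leaf_score(G_left + G_right + G_miss, H_left + H_right + H_miss, lam)
    # Try sending the missing-value bucket to each child and keep the better gain.
    gain_left = leaf_score(G_left + G_miss, H_left + H_miss, lam) + leaf_score(G_right, H_right, lam) - parent
    gain_right = leaf_score(G_left, H_left, lam) + leaf_score(G_right + G_miss, H_right + H_miss, lam) - parent
    # The winning direction is stored on the node and reused at prediction time.
    return ("left", gain_left) if gain_left >= gain_right else ("right", gain_right)
```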
LightGBM Extra: Exclusive Feature Bundling (EFB)#
Bundle mutually exclusive features (if A ≠ 0 → B = 0):
Features A, B, C never overlap → bundle them into a single feature
Reduces histogram memory and computation for sparse/one-hot data.
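A NumPy sketch of the bundling idea (illustrative only, not LightGBM's internal EFB code): each mutually exclusive feature is shifted into its own value range so that a single histogram covers all of them.

```python
import numpy as np

def bundle_exclusive(columns):
    """Merge mutually exclusive, non-negative features into one bundled feature."""
    bundled = np.zeros(len(columns[0]))
    offset = 0.0
    for col in columns:
        hit = col != 0
        bundled[hit] = col[hit] + offset  # shift this feature into its own value range
        offset += col.max()               # the next feature starts above this range
    return bundled

# Three non-overlapping one-hot columns collapse into a single feature.
a = np.array([1, 0, 0, 0])
b = np.array([0, 1, 0, 0])
c = np.array([0, 0, 1, 0])
print(bundle_exclusive([a, b, c]))  # [1. 2. 3. 0.]
```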
Gradient Precision#
The problem: How to reduce memory for gradient storage?
XGBoost: GPU-Only Quantization#
Full precision (float32/64) on CPU. GPU uses 16-bit packed gradients.
LightGBM: CPU + GPU Quantization#
Adaptive precision based on leaf size:
- Large leaves: 32-bit (grad: 16 + hess: 16)
- Small leaves: 16-bit (grad: 8 + hess: 8)
Trade-off: Minor accuracy loss, significant memory/bandwidth savings.
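In LightGBM 4.0+ the CPU-side quantization is opt-in through training parameters; the names below are taken from the LightGBM parameter list, but check your installed version. XGBoost's GPU-side 16-bit packing has no user-facing switch.

```python
# Hedged sketch of LightGBM quantized-gradient training parameters (4.0+).
lgb_quant_params = {
    "use_quantized_grad": True,  # enable low-precision gradient/hessian histograms
    "num_grad_quant_bins": 4,    # number of quantization bins for gradients
}
```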
Multi-Output Handling#
The problem: How to handle multi-class or multi-target problems?
XGBoost: Vector Leaves (Optional)#
A single tree can output K values via size_leaf_vector:
Leaf stores: [v₀, v₁, ..., vₖ₋₁]
LightGBM: Separate Trees per Output#
Train num_tree_per_iteration = K trees, one per class:
Iteration i: [Tree_class0, Tree_class1, ..., Tree_classK]
Trade-off:
- Vector leaves: shared structure, single traversal
- Separate trees: more flexible, independent tree shapes
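A configuration sketch for a hypothetical 3-class problem, assuming the Python APIs. XGBoost selects vector leaves via multi_strategy (2.0+, hist tree method); LightGBM derives its trees-per-iteration count from num_class.

```python
# XGBoost 2.0+: one tree per iteration with K-dimensional vector leaves.
xgb_params = {
    "objective": "multi:softprob",
    "num_class": 3,
    "tree_method": "hist",
    "multi_strategy": "multi_output_tree",
}

# LightGBM: K separate trees per boosting iteration, one per class.
lgb_params = {"objective": "multiclass", "num_class": 3}
```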
Model Serialization#
The problem: How to save and load trained models?
| Format | XGBoost | LightGBM |
|---|---|---|
| Primary | JSON (structured) | Text (simple) |
| Binary | UBJSON | Binary |
| Schema | Fully specified | Implicit |
XGBoost’s JSON format has a documented schema, making it easier for third-party implementations to load models.
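A small end-to-end sketch on toy data, assuming both Python packages are installed; in XGBoost the file extension selects the format, while LightGBM's save_model always writes its text format.

```python
import numpy as np
import xgboost as xgb
import lightgbm as lgb

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)  # toy binary labels

xgb_booster = xgb.train({"objective": "binary:logistic"}, xgb.DMatrix(X, label=y), num_boost_round=5)
xgb_booster.save_model("model.json")  # structured JSON with a documented schema
xgb_booster.save_model("model.ubj")   # binary UBJSON, same schema

lgb_booster = lgb.train({"objective": "binary"}, lgb.Dataset(X, label=y), num_boost_round=5)
lgb_booster.save_model("model.txt")   # LightGBM plain-text format
```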
Summary: Design Philosophy#
| Aspect | XGBoost | LightGBM |
|---|---|---|
| Philosophy | Correctness, compatibility | Speed, efficiency |
| Tree growth | Balanced (depth-wise) | Aggressive (leaf-wise) |
| Sampling | Simple random | Gradient-aware (GOSS) |
| Categoricals | Approximate | Native optimal |
| Gradients | Full precision CPU | Quantized CPU+GPU |
| Documentation | Extensive | Good |
What We Learn from Each#
From XGBoost:
- Clear JSON model format
- Monotonic constraint implementation
- Comprehensive parameter documentation

From LightGBM:
- Leaf-wise growth efficiency
- Native categorical handling
- CPU gradient quantization
- GOSS sampling strategy