Tree Growth Strategies#
Tree growth strategy determines the order in which nodes are expanded during training. The two main approaches — depth-wise and leaf-wise — have different trade-offs in terms of efficiency, accuracy, and overfitting risk.
Overview#
Depth-wise builds balanced trees by splitting all nodes at each level before moving to the next level. This is the traditional approach.
Leaf-wise always splits the node (leaf) with the highest gain, regardless of depth. This focuses computation on the most informative splits.
Depth-wise:

```
Level 0:        [root]          ← Split this first
                  │
Level 1:     [A]     [B]        ← Split BOTH before going deeper
              │       │
Level 2:   [C] [D] [E] [F]      ← Split ALL four
```

Leaf-wise:

```
[root] → [highest-gain leaf] → [highest-gain leaf] → ...
(may produce unbalanced trees)
```
Depth-Wise Growth#
Algorithm#
```
Algorithm: Depth-Wise Tree Growth
─────────────────────────────────────────
Input:  Training data, max_depth, other constraints
Output: Decision tree

1. Create root node with all samples
2. For depth = 0 to max_depth - 1:
   a. For each leaf node at the current depth:
      - Build histogram
      - Find best split
      - If gain > 0 and constraints satisfied:
        - Create two child nodes
        - Partition samples
   b. If no splits were made, stop
3. Compute leaf weights for all leaves
```
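To make the pseudocode concrete, here is a minimal sketch in Python. It is not XGBoost's implementation: it grows on a single feature and uses variance reduction in place of gradient/Hessian histograms, and the `best_split` helper and node dictionaries are illustrative only.

```python
# A minimal depth-wise grower: split every leaf at the current level before
# descending.  Variance reduction stands in for histogram-based gain.
import numpy as np

def best_split(x, y):
    """Return (gain, threshold) of the best variance-reducing split, or (0.0, None)."""
    order = np.argsort(x)
    x_s, y_s = x[order], y[order]
    parent_sse = np.sum((y - y.mean()) ** 2)
    best_gain, best_thr = 0.0, None
    for i in range(1, len(x_s)):
        if x_s[i] == x_s[i - 1]:
            continue                                   # no valid threshold between equal values
        left, right = y_s[:i], y_s[i:]
        children_sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        gain = parent_sse - children_sse
        if gain > best_gain:
            best_gain, best_thr = gain, (x_s[i - 1] + x_s[i]) / 2
    return best_gain, best_thr

def grow_depth_wise(x, y, max_depth=3, min_gain=1e-6):
    """Grow one tree level by level; returns a flat list of node dicts."""
    nodes, level = [], [{"idx": np.arange(len(x)), "depth": 0}]
    while level:
        next_level = []
        for node in level:                             # 2a. every leaf at this depth
            idx = node["idx"]
            gain, thr = (0.0, None)
            if node["depth"] < max_depth:
                gain, thr = best_split(x[idx], y[idx]) # "build histogram, find best split"
            if thr is not None and gain > min_gain:    # constraints satisfied -> split
                node["threshold"] = thr
                next_level.append({"idx": idx[x[idx] <= thr], "depth": node["depth"] + 1})
                next_level.append({"idx": idx[x[idx] > thr], "depth": node["depth"] + 1})
            else:
                node["leaf_value"] = y[idx].mean()     # 3. leaf weight
            nodes.append(node)
        level = next_level                             # 2b. descend, or stop if nothing split
    return nodes
```

Calling `grow_depth_wise(np.random.rand(100), np.random.rand(100))` returns the flat node list of one depth-3 tree; the same `best_split` helper is reused in the sketches that follow.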
Characteristics#
| Aspect | Depth-Wise |
|---|---|
| Tree shape | Balanced (same depth everywhere) |
| Parallelism | Easy: process an entire level together |
| Memory | Predictable: at most 2^d nodes at depth d |
| Best for | Shallow trees, distributed training |
| Default in | XGBoost |
Level-Parallel Optimization#
All nodes at the same depth are independent, enabling parallelism:
```
Level 2:   [C]   [D]   [E]   [F]
            ↓     ↓     ↓     ↓      ← All can be processed in parallel
      Build histograms, find splits simultaneously
```
This is particularly valuable in distributed settings where histogram aggregation can be done level-by-level.
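A schematic sketch of that independence, reusing the `best_split` helper and node dicts from the depth-wise sketch above; a real engine parallelizes histogram construction and then aggregates, whereas here the whole split search is simply fanned out per node:

```python
# Nodes at one level are independent, so their split searches can run concurrently.
from concurrent.futures import ThreadPoolExecutor

def find_splits_for_level(x, y, level_nodes):
    """Return [(gain, threshold), ...] for every node at the current level."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda node: best_split(x[node["idx"]], y[node["idx"]]),
                           level_nodes)
    return list(results)
```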
Leaf-Wise Growth#
Algorithm#
```
Algorithm: Leaf-Wise Tree Growth
─────────────────────────────────────────
Input:  Training data, num_leaves, other constraints
Output: Decision tree

1. Create root node, add to priority queue with gain = ∞
2. While the number of leaves < num_leaves:
   a. Pop the leaf with the highest gain from the queue
   b. Build histogram for this leaf
   c. Find best split
   d. If gain > 0 and constraints satisfied:
      - Split into two children
      - Compute potential gains for the children
      - Add children to the priority queue
   e. Else: remove the leaf from the queue (it won't be split)
3. Compute leaf weights for all leaves
```
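A minimal sketch of the same loop in Python (not LightGBM's implementation), driven by a max-heap of candidate leaves. It assumes the `best_split` helper from the depth-wise sketch above is in scope.

```python
# Leaf-wise growth: always expand the leaf with the highest split gain,
# stopping at a fixed leaf budget.
import heapq
import itertools

import numpy as np

def grow_leaf_wise(x, y, num_leaves=8, min_gain=1e-6):
    """Grow one tree by repeatedly splitting the leaf with the highest gain."""
    tie = itertools.count()                 # tie-breaker so the heap never compares dicts
    nodes, heap = [], []

    def push(idx):
        gain, thr = best_split(x[idx], y[idx])        # 2b/2c. histogram + best split
        heapq.heappush(heap, (-gain, next(tie), {"idx": idx, "gain": gain, "thr": thr}))

    push(np.arange(len(x)))                 # 1. root enters the queue
    n_leaves = 1
    while heap and n_leaves < num_leaves:   # 2. stop at the leaf budget
        _, _, node = heapq.heappop(heap)    # 2a. leaf with the highest gain
        idx, gain, thr = node["idx"], node["gain"], node["thr"]
        if thr is None or gain <= min_gain:
            nodes.append({"idx": idx, "leaf_value": y[idx].mean()})   # 2e. finalize leaf
            continue
        node["threshold"] = thr             # 2d. split; children go back on the queue
        nodes.append(node)
        push(idx[x[idx] <= thr])
        push(idx[x[idx] > thr])
        n_leaves += 1                       # one leaf replaced by two => net +1
    for _, _, node in heap:                 # 3. whatever remains stays a leaf
        nodes.append({"idx": node["idx"], "leaf_value": y[node["idx"]].mean()})
    return nodes
```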
Characteristics#
| Aspect | Leaf-Wise |
|---|---|
| Tree shape | Unbalanced (deeper where the data is complex) |
| Parallelism | Harder: must track the best leaf globally |
| Memory | Variable: priority queue of candidate leaves |
| Best for | Large datasets, when efficiency matters |
| Default in | LightGBM |
Why Leaf-Wise is More Efficient#
Consider 8 samples with this structure:
Depth-wise must split all nodes at each level:
```
        [8]              ← Split (gain = 10)
       /   \
     [4]   [4]           ← Split BOTH (gains = 8, 2)
    /   \  /   \
 [2][2]  [2][2]          ← Split ALL FOUR
```
Leaf-wise only splits high-gain nodes:
```
        [8]              ← Split (gain = 10)
       /   \
     [4]   [4]           ← Only split the left node (gain = 8)
    /   \     \
 [2]   [2]    [4]        ← Only split a [2] whose gain exceeds the threshold
```
With the same number of leaves, leaf-wise achieves lower loss because it prioritizes the most impactful splits.
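As a rough tally under the gains drawn above (and a hypothetical gain of 5 for the best remaining [2] split): a budget of four leaves allows three splits. Depth-wise takes them in level order for a total gain of 10 + 8 + 2 = 20, while leaf-wise takes the three largest available gains, 10 + 8 + 5 = 23, so it reduces the training loss more with the same leaf count.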
Comparison#
| Criterion | Depth-Wise | Leaf-Wise |
|---|---|---|
| Accuracy (same #leaves) | Lower | Higher |
| Training speed | Slower | Faster |
| Overfitting risk | Lower | Higher (needs max_depth) |
| Distributed training | Easier | Harder |
| GPU efficiency | Better | Okay |
| Tree interpretability | Balanced structure | Varied depths |
When to Use Depth-Wise#
- Small datasets (< 10k samples): less overfitting risk
- Distributed training: level-sync is natural
- GPU training: a uniform tree structure helps
- When max_depth is small (< 6)
When to Use Leaf-Wise#
- Large datasets (> 100k samples): efficiency matters
- When training time is a bottleneck
- When deep trees are beneficial
- Single-machine training
Hybrid Approaches#
Leaf-Wise with Max Depth#
LightGBM uses leaf-wise but respects max_depth:
Parameters:

```
num_leaves = 31   # Primary constraint
max_depth  = -1   # No depth limit (default)
# or
max_depth  = 8    # Additional constraint
```
This gives leaf-wise efficiency with depth-based regularization.
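A minimal sketch with the LightGBM Python API on synthetic data (the parameter values simply mirror the listing above and are illustrative, not recommendations):

```python
# Sketch: leaf-wise growth with an added depth cap in LightGBM's Python API.
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.normal(size=1000)

params = {
    "objective": "regression",
    "num_leaves": 31,   # primary complexity control (leaf budget)
    "max_depth": 8,     # extra depth-based regularization (-1 = unlimited)
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
```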
Loss-Guided in XGBoost#
XGBoost's `grow_policy = 'lossguide'` enables leaf-wise growth:

Parameters:

```
grow_policy = 'lossguide'
max_leaves  = 31
```
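A corresponding sketch with the XGBoost Python API (synthetic data; values are illustrative):

```python
# Sketch: leaf-wise ("lossguide") growth in XGBoost's Python API.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.normal(size=1000)

params = {
    "tree_method": "hist",        # lossguide growth is used with the histogram tree method
    "grow_policy": "lossguide",   # expand the leaf with the highest loss reduction
    "max_leaves": 31,
    "max_depth": 0,               # 0 removes the depth cap for lossguide growth
}
booster = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=100)
```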
Stopping Criteria#
Both strategies use similar stopping conditions:
| Criterion | Description |
|---|---|
| `max_depth` | Don't grow beyond this depth |
| `num_leaves` / `max_leaves` | Maximum leaf count |
| `min_child_weight` | Minimum Hessian sum in a child |
| `min_split_gain` | Minimum gain to accept a split |
| `min_data_in_leaf` | Minimum samples in a leaf |
The difference is which splits are prioritized, not which are allowed.
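As a rough mapping of these criteria onto the two libraries' parameter names (the values are illustrative, not recommendations; `gamma` is XGBoost's name for the minimum split gain):

```python
# Illustrative stopping/regularization settings for each library.
xgb_params = {
    "max_depth": 6,            # depth cap
    "max_leaves": 31,          # leaf-count cap (with grow_policy='lossguide')
    "min_child_weight": 1.0,   # minimum Hessian sum in a child
    "gamma": 0.1,              # minimum loss reduction to accept a split
}

lgb_params = {
    "max_depth": 6,            # depth cap (-1 = unlimited)
    "num_leaves": 31,          # leaf-count cap
    "min_child_weight": 1e-3,  # minimum Hessian sum in a leaf
    "min_split_gain": 0.0,     # minimum gain to accept a split
    "min_data_in_leaf": 20,    # minimum samples in a leaf
}
```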
Implementation Considerations#
Depth-Wise#
Simpler implementation:
- Process the tree level by level
- All nodes at a level share the same depth, so they can be handled uniformly
- Natural batch processing
Data structures:
- Two arrays: current_level_nodes, next_level_nodes
- Swap arrays each level
Leaf-Wise#
More complex implementation:
- Priority queue of candidate leaves
- Track potential gain for each leaf
- May need to recompute gains after sibling splits
Data structures:
- Priority queue (max-heap by gain)
- Gain cache per leaf
Theoretical Analysis#
Loss Reduction#
For a fixed number of leaves \(L\), leaf-wise greedily allocates every split to the current highest-gain leaf, so for the same leaf budget it typically reduces training loss at least as much as depth-wise, which may spend splits on low-gain regions.
Overfitting Risk#
Leaf-wise can create very deep paths for outliers, leading to overfitting on small
datasets. The max_depth constraint mitigates this.
Computational Complexity#
Both strategies: O(samples × features × depth) per tree.
Leaf-wise has lower constant factors because:
- Histogram subtraction is more effective (large siblings, whose histograms come free via subtraction, are more common); see the sketch below
- Fewer total nodes are visited
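A minimal NumPy sketch of the subtraction trick; the `(n_bins, 2)` layout of per-bin gradient/Hessian sums is an assumption for illustration, not either library's internal format:

```python
# Build the histogram only for the smaller child; the larger sibling's
# histogram is the parent's histogram minus the smaller child's.
import numpy as np

def child_histograms(parent_hist, grad, hess, bin_idx, small_child_rows):
    """parent_hist: (n_bins, 2) array of per-bin [sum_grad, sum_hess]."""
    small = np.zeros_like(parent_hist)
    np.add.at(small[:, 0], bin_idx[small_child_rows], grad[small_child_rows])
    np.add.at(small[:, 1], bin_idx[small_child_rows], hess[small_child_rows])
    large = parent_hist - small          # sibling comes "for free"
    return small, large
```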
Source References#
XGBoost#
- `src/tree/updater_quantile_hist.cc`: `DepthWise` and `LossGuide` drivers
- Parameter: `grow_policy = 'depthwise'` (default) or `'lossguide'`
LightGBM#
- `src/treelearner/serial_tree_learner.cpp`: `Train()` method (leaf-wise)
- Parameters: `num_leaves`, `max_depth`