Tree Growth Strategies#

The tree growth strategy determines the order in which nodes are expanded during training. The two main approaches, depth-wise and leaf-wise, make different trade-offs among efficiency, accuracy, and overfitting risk.


Overview#

Depth-wise builds balanced trees by splitting all nodes at each level before moving to the next level. This is the traditional approach.

Leaf-wise always splits the node (leaf) with the highest gain, regardless of depth. This focuses computation on the most informative splits.

Depth-wise:
Level 0:    [root]           ← Split this first
               │
Level 1:  [A]   [B]          ← Split BOTH before going deeper
           │     │
Level 2: [C][D] [E][F]       ← Split ALL four

Leaf-wise:
[root] → [highest-gain leaf] → [highest-gain leaf] → ...
May produce unbalanced trees

Depth-Wise Growth#

Algorithm#

Algorithm: Depth-Wise Tree Growth
─────────────────────────────────────────
Input: Training data, max_depth, other constraints
Output: Decision tree

1. Create root node with all samples
2. For depth = 0 to max_depth - 1:
   a. For each leaf node at current depth:
      - Build histogram
      - Find best split
      - If gain > 0 and constraints satisfied:
        - Create two child nodes
        - Partition samples
   b. If no splits made, stop
3. Compute leaf weights for all leaves
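A minimal Python sketch of this loop, not any library's actual implementation: find_best_split (returning a (gain, split) pair or None) and partition are hypothetical stand-ins for histogram construction and sample routing.

def grow_depth_wise(samples, max_depth, find_best_split, partition):
    root = {"samples": samples, "children": None}
    level = [root]
    for _ in range(max_depth):
        next_level = []
        for node in level:                       # every leaf at the current depth
            best = find_best_split(node["samples"])
            if best is None or best[0] <= 0:
                continue                         # node stays a leaf
            gain, split = best
            left, right = partition(node["samples"], split)
            node["children"] = [{"samples": left,  "children": None},
                                {"samples": right, "children": None}]
            next_level.extend(node["children"])
        if not next_level:                       # no splits made at this level: stop
            break
        level = next_level                       # move down one level
    return root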

Characteristics#

Aspect        Depth-Wise
────────────  ─────────────────────────────────────────
Tree shape    Balanced (same depth everywhere)
Parallelism   Easy: process an entire level together
Memory        Predictable: at most 2^d nodes at depth d
Best for      Shallow trees, distributed training
Default in    XGBoost
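For reference, depth-wise is what XGBoost does out of the box; the snippet below just makes the policy explicit (real XGBoost parameters, synthetic data):

import numpy as np
import xgboost as xgb

X, y = np.random.rand(1000, 10), np.random.rand(1000)
booster = xgb.train(
    {"tree_method": "hist", "grow_policy": "depthwise", "max_depth": 6},
    xgb.DMatrix(X, label=y),
    num_boost_round=10,
)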

Level-Parallel Optimization#

All nodes at the same depth are independent, enabling parallelism:

Level 2: [C] [D] [E] [F]
         ↓   ↓   ↓   ↓    ← All can be processed in parallel
         Build histograms, find splits simultaneously

This is particularly valuable in distributed settings where histogram aggregation can be done level-by-level.
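As an illustration of that independence, a thread-pool sketch of per-level parallelism; build_histogram_and_find_split is a hypothetical per-node worker (real implementations parallelize in native code, e.g. with OpenMP):

from concurrent.futures import ThreadPoolExecutor

def process_level(nodes, build_histogram_and_find_split):
    # Nodes at the same depth share no samples, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(build_histogram_and_find_split, nodes))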


Leaf-Wise Growth#

Algorithm#

Algorithm: Leaf-Wise Tree Growth
─────────────────────────────────────────
Input: Training data, num_leaves, other constraints
Output: Decision tree

1. Create root node with all samples; compute its best split
   and gain; push it onto a priority queue ordered by gain
2. While leaf count < num_leaves and the queue is not empty:
   a. Pop the leaf with the highest gain
   b. If gain ≤ 0 or a constraint is violated, discard it (it stays a leaf)
   c. Otherwise split it into two children and partition the samples
   d. Build each child's histogram and find its best split and gain
   e. Push both children onto the priority queue
3. Compute leaf weights for all leaves
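A minimal Python sketch of this loop with heapq as the priority queue (again with hypothetical find_best_split and partition helpers; heapq is a min-heap, so gains are negated):

import heapq
import itertools

def grow_leaf_wise(samples, num_leaves, find_best_split, partition):
    tie = itertools.count()                  # tie-breaker so the heap never compares dicts
    root = {"samples": samples, "children": None}
    heap = []
    best = find_best_split(samples)          # (gain, split) or None
    if best is not None:
        heapq.heappush(heap, (-best[0], next(tie), root, best[1]))
    leaves = 1
    while heap and leaves < num_leaves:
        neg_gain, _, node, split = heapq.heappop(heap)
        if -neg_gain <= 0:
            break                            # best remaining gain is non-positive
        node["children"] = []
        for part in partition(node["samples"], split):   # (left, right) subsets
            child = {"samples": part, "children": None}
            node["children"].append(child)
            b = find_best_split(part)        # potential gain if this child is split later
            if b is not None:
                heapq.heappush(heap, (-b[0], next(tie), child, b[1]))
        leaves += 1                          # each split adds one net leaf
    return root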

Characteristics#

Aspect        Leaf-Wise
────────────  ─────────────────────────────────────────
Tree shape    Unbalanced (deeper where the data is complex)
Parallelism   Harder: must track the best leaf globally
Memory        Variable: priority queue of split candidates
Best for      Large datasets, when efficiency matters
Default in    LightGBM
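And the LightGBM counterpart, where leaf-wise is the default and num_leaves is the main capacity knob (real LightGBM parameters, synthetic data):

import lightgbm as lgb
import numpy as np

X, y = np.random.rand(1000, 10), np.random.rand(1000)
booster = lgb.train(
    {"objective": "regression", "num_leaves": 31},
    lgb.Dataset(X, label=y),
    num_boost_round=10,
)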

Why Leaf-Wise is More Efficient#

Consider 8 samples with this structure:

Depth-wise must split all nodes at each level:
      [8]            ← Split (gain = 10)
     /    \
   [4]    [4]        ← Split BOTH (gains = 8, 2)
   / \    / \
 [2][2] [2][2]       ← Split ALL FOUR

Leaf-wise only splits high-gain nodes:
      [8]            ← Split (gain = 10)
     /    \
   [4]    [4]        ← Split only the left child (gain = 8)
   / \      \
 [2][2]    [4]       ← Next, split whichever leaf now has the highest gain

With the same number of leaves, leaf-wise reaches a training loss that is lower or equal, because it spends every split where the gain is highest. In the example above, four leaves give depth-wise a total gain of 10 + 8 + 2 = 20; leaf-wise can trade the gain-2 split for any higher-gain split deeper in the left subtree, so it is never worse.
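A toy calculation of the cumulative gain under both policies, on a tree like the one above; all gain values are invented for illustration:

import heapq

# Hypothetical per-node split gains: node -> (gain, children)
TREE = {
    "root": (10, ["L", "R"]),
    "L":    (8,  ["LL", "LR"]),
    "R":    (2,  ["RL", "RR"]),
    "LL":   (5,  ["A", "B"]),          # a deep, high-gain region
    "LR":   (1, []), "RL": (0, []), "RR": (0, []), "A": (0, []), "B": (0, []),
}

def depth_wise_gain(levels):
    frontier, total = ["root"], 0
    for _ in range(levels):            # split every splittable node at each level
        nxt = []
        for n in frontier:
            gain, kids = TREE[n]
            if kids:
                total += gain
                nxt.extend(kids)
        frontier = nxt
    return total

def leaf_wise_gain(budget):
    heap = [(-TREE["root"][0], "root")]
    leaves, total = 1, 0
    while heap and leaves < budget:    # always split the best remaining leaf
        neg_gain, n = heapq.heappop(heap)
        _, kids = TREE[n]
        if not kids:
            continue                   # unsplittable leaf
        total += -neg_gain
        leaves += 1
        for k in kids:
            heapq.heappush(heap, (-TREE[k][0], k))
    return total

print(depth_wise_gain(2))   # 10 + 8 + 2 = 20, with 4 leaves
print(leaf_wise_gain(4))    # 10 + 8 + 5 = 23, also with 4 leaves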


Comparison#

Criterion                Depth-Wise           Leaf-Wise
───────────────────────  ───────────────────  ────────────────────────
Accuracy (same #leaves)  Lower                Higher
Training speed           Slower               Faster
Overfitting risk         Lower                Higher (needs max_depth)
Distributed training     Easier               Harder
GPU efficiency           Better               Okay
Tree interpretability    Balanced structure   Varied depths

When to Use Depth-Wise#

  • Small datasets (< 10k samples) — less overfitting risk

  • Distributed training — level-sync is natural

  • GPU training — uniform tree structure helps

  • When max_depth is small (< 6)

When to Use Leaf-Wise#

  • Large datasets (> 100k samples) — efficiency matters

  • When training time is a bottleneck

  • Deep trees are beneficial

  • Single-machine training


Hybrid Approaches#

Leaf-Wise with Max Depth#

LightGBM uses leaf-wise but respects max_depth:

Parameters:
  num_leaves = 31     # Primary constraint
  max_depth = -1      # No depth limit (default)
  # or
  max_depth = 8       # Additional constraint

This gives leaf-wise efficiency with depth-based regularization.
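In the scikit-learn wrapper this is a single constructor call (real LightGBM parameters):

from lightgbm import LGBMRegressor

# num_leaves stays the primary constraint; max_depth adds a depth cap on top
model = LGBMRegressor(num_leaves=31, max_depth=8)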

Loss-Guided in XGBoost#

XGBoost’s grow_policy = 'lossguide' enables leaf-wise growth:

Parameters:
  grow_policy = 'lossguide'
  max_leaves = 31
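A fuller parameter set for loss-guided growth, to be passed to xgb.train as usual; note that grow_policy requires the histogram-based tree method (real XGBoost parameters):

params = {
    "tree_method": "hist",        # lossguide requires the histogram implementation
    "grow_policy": "lossguide",
    "max_leaves": 31,             # analogous to LightGBM's num_leaves
    "max_depth": 0,               # 0 disables the depth limit under lossguide
}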

Stopping Criteria#

Both strategies use similar stopping conditions:

Criterion                Description
───────────────────────  ────────────────────────────────
max_depth                Don't grow beyond this depth
num_leaves / max_leaves  Maximum leaf count
min_child_weight         Minimum Hessian sum in a child
min_split_gain           Minimum gain to accept a split
min_data_in_leaf         Minimum samples in a leaf

The difference is which splits are prioritized, not which are allowed.
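The same criteria under each library's parameter names (both are real parameter sets; the values shown are only illustrative):

lightgbm_params = {
    "num_leaves": 31,
    "max_depth": 8,
    "min_sum_hessian_in_leaf": 1e-3,   # LightGBM's name for the minimum Hessian sum
    "min_gain_to_split": 0.0,          # LightGBM's name for min_split_gain
    "min_data_in_leaf": 20,
}

xgboost_params = {
    "max_leaves": 31,
    "max_depth": 8,
    "min_child_weight": 1.0,           # minimum Hessian sum in a child
    "gamma": 0.0,                      # XGBoost's name for minimum split gain
}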


Implementation Considerations#

Depth-Wise#

Simpler implementation:
- Process level by level
- All nodes at same level have same parent depth
- Natural batch processing

Data structures:
- Two arrays: current_level_nodes, next_level_nodes
- Swap arrays each level

Leaf-Wise#

More complex implementation:
- Priority queue of candidate leaves
- Track potential gain for each leaf
- May need to recompute gains after sibling splits

Data structures:
- Priority queue (max-heap by gain)
- Gain cache per leaf
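One common way to keep the gain cache and heap consistent, sketched below with a hypothetical data layout: leave stale entries in the heap and skip them lazily at pop time, rather than deleting them eagerly.

import heapq

gain_cache = {}                        # leaf id -> currently valid gain

def pop_best_leaf(heap):
    while heap:
        neg_gain, leaf_id = heapq.heappop(heap)
        if gain_cache.get(leaf_id) == -neg_gain:
            return leaf_id             # entry is still current
        # otherwise the gain was recomputed since this entry was pushed; skip it
    return None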

Theoretical Analysis#

Loss Reduction#

For a fixed number of leaves \(L\), leaf-wise greedily assigns each of the \(L - 1\) splits to the leaf with the highest available gain. Depth-wise may waste splits on low-gain regions simply because they sit at the current level.
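In symbols: a tree with \(L\) leaves contains \(L - 1\) splits, and its total training-loss reduction is the sum of their gains,

\[ \Delta\mathcal{L}_{\text{tree}} = \sum_{i=1}^{L-1} G_i \]

where \(G_i\) is the gain of the \(i\)-th split made; leaf-wise greedily maximizes each successive term.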

Overfitting Risk#

Leaf-wise can create very deep paths for outliers, leading to overfitting on small datasets. The max_depth constraint mitigates this.

Computational Complexity#

Both strategies: O(samples × features × depth) per tree.

Leaf-wise has lower constants because:

  • Histogram subtraction saves more work, since its splits are often unbalanced and the smaller child that must be built is very small (see the sketch below)

  • Fewer total nodes visited
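A sketch of the subtraction trick itself, assuming (n_bins, 2) arrays of per-bin (gradient sum, Hessian sum):

import numpy as np

def sibling_histogram(parent_hist, small_child_hist):
    # Per bin, the parent's sums equal the two children's sums, so the larger
    # child's histogram comes from an O(n_bins) subtraction instead of an
    # O(samples) rebuild. Build the full histogram only for the smaller child.
    return parent_hist - small_child_hist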


Source References#

XGBoost#

  • src/tree/updater_quantile_hist.cc: DepthWise and LossGuide drivers

  • Parameter: grow_policy = 'depthwise' (default) or 'lossguide'

LightGBM#

  • src/treelearner/serial_tree_learner.cpp: Train() method (leaf-wise)

  • Parameters: num_leaves, max_depth