# Explainability Research
This document covers the theory and implementation approaches for model explainability in gradient boosting systems.
## Overview

Model explainability in GBDT/GBLinear systems operates at two levels:

- **Global explanations** - understanding overall model behavior
  - Feature importance (gain, split count, cover)
  - Feature interactions
- **Local explanations** - understanding individual predictions
  - SHAP values (Shapley additive explanations)
  - Per-prediction feature contributions
## Feature Importance

### Types of Feature Importance
| Type | Description | Notes |
|---|---|---|
| Split count | Number of times a feature is used for splitting | Simple and fast, but can be misleading for high-cardinality features |
| Gain | Total/average gain from splits using this feature | Better indicator of predictive power |
| Cover | Number of samples covered by splits on this feature | Indicates breadth of feature influence |
| Permutation | Loss increase when the feature is randomly shuffled | Most reliable, but computationally expensive (see the sketch below the table) |
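To make the permutation approach concrete, here is a minimal C++ sketch. Everything in it (the `PredictFn` interface, the MSE loss, the row-major `X` layout) is an illustrative assumption, not any library's API:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical predictor interface: scores one row of feature values.
using PredictFn = std::function<double(const std::vector<double>&)>;

// Mean squared error between predictions and targets.
static double mse(const std::vector<double>& pred, const std::vector<double>& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) s += (pred[i] - y[i]) * (pred[i] - y[i]);
    return s / static_cast<double>(y.size());
}

// Permutation importance: loss increase after shuffling one feature column.
// X is taken by value because columns are shuffled in place and restored.
std::vector<double> permutation_importance(const PredictFn& predict,
                                           std::vector<std::vector<double>> X,
                                           const std::vector<double>& y,
                                           std::mt19937& rng) {
    const std::size_t n_rows = X.size();
    const std::size_t n_features = X.empty() ? 0 : X[0].size();

    auto eval_loss = [&]() {
        std::vector<double> pred(n_rows);
        for (std::size_t r = 0; r < n_rows; ++r) pred[r] = predict(X[r]);
        return mse(pred, y);
    };

    const double base_loss = eval_loss();
    std::vector<double> importance(n_features, 0.0);
    for (std::size_t f = 0; f < n_features; ++f) {
        std::vector<double> saved(n_rows);
        for (std::size_t r = 0; r < n_rows; ++r) saved[r] = X[r][f];

        std::vector<double> shuffled = saved;
        std::shuffle(shuffled.begin(), shuffled.end(), rng);
        for (std::size_t r = 0; r < n_rows; ++r) X[r][f] = shuffled[r];

        importance[f] = eval_loss() - base_loss;  // positive = feature matters

        for (std::size_t r = 0; r < n_rows; ++r) X[r][f] = saved[r];  // restore
    }
    return importance;
}
```

Shuffling in place and restoring the column afterward avoids copying the whole matrix once per feature.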
### XGBoost Implementation

XGBoost computes feature importance via `XGBoosterFeatureScore`:

```cpp
// Types: weight, gain, cover, total_gain, total_cover
// For tree models: walks the tree structure
// For linear models: returns normalized coefficients
```
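The tree walk can be sketched as a single pass that accumulates gain, cover, and split count per feature; the flat `Node` array and its field names below are illustrative assumptions, not XGBoost's actual internal layout:

```cpp
#include <vector>

// Illustrative flat tree node; not XGBoost's real internal representation.
struct Node {
    int feature = -1;    // Split feature, or -1 for a leaf
    double gain = 0.0;   // Loss reduction achieved by this split
    double cover = 0.0;  // Sum of hessians (samples) reaching this node
};

// Accumulate per-feature total gain, total cover, and split count ("weight").
void accumulate_importance(const std::vector<Node>& tree,
                           std::vector<double>& total_gain,
                           std::vector<double>& total_cover,
                           std::vector<int>& split_count) {
    for (const Node& n : tree) {
        if (n.feature < 0) continue;  // Skip leaves
        total_gain[n.feature] += n.gain;
        total_cover[n.feature] += n.cover;
        split_count[n.feature] += 1;
    }
}
```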
### LightGBM Implementation

LightGBM computes feature importance via `LGBM_BoosterFeatureImportance`:

- `importance_type = 0`: split count
- `importance_type = 1`: gain
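A hedged usage sketch against LightGBM's C API might look as follows; `model.txt` is a placeholder path, and most error handling is elided for brevity:

```cpp
#include <LightGBM/c_api.h>

#include <cstdio>
#include <vector>

// Sketch: load a saved LightGBM model and query gain-based importance.
int main() {
    BoosterHandle booster = nullptr;
    int num_iterations = 0;
    if (LGBM_BoosterCreateFromModelfile("model.txt", &num_iterations, &booster) != 0) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    int n_features = 0;
    LGBM_BoosterGetNumFeature(booster, &n_features);

    std::vector<double> importance(n_features, 0.0);
    // importance_type: 0 = split count, 1 = gain; num_iteration <= 0 uses all.
    LGBM_BoosterFeatureImportance(booster, /*num_iteration=*/0,
                                  /*importance_type=*/1, importance.data());

    for (int f = 0; f < n_features; ++f)
        std::printf("feature %d: %.4f\n", f, importance[f]);

    LGBM_BoosterFree(booster);
    return 0;
}
```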
### Categorical Feature Handling

For categorical features with one-hot encoding or native handling:

- Sum importance across all categories
- Optionally provide a per-category breakdown
### Linear Model Feature Importance

For GBLinear and linear leaves:

- Weight magnitude: `|w_i|` (absolute coefficient value)
- Standardized weight: `|w_i| * std(x_i)` (accounts for feature scale; see the sketch below)
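A minimal sketch of the standardized variant, assuming a row-major training matrix (the population standard deviation is used for simplicity):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standardized weight importance: |w_i| * std(x_i).
std::vector<double> linear_importance(const std::vector<double>& weights,
                                      const std::vector<std::vector<double>>& X) {
    const std::size_t n_rows = X.size();
    const std::size_t n_features = weights.size();
    std::vector<double> importance(n_features, 0.0);
    for (std::size_t f = 0; f < n_features; ++f) {
        double mean = 0.0, var = 0.0;
        for (std::size_t r = 0; r < n_rows; ++r) mean += X[r][f];
        mean /= static_cast<double>(n_rows);
        for (std::size_t r = 0; r < n_rows; ++r) {
            const double d = X[r][f] - mean;
            var += d * d;
        }
        var /= static_cast<double>(n_rows);
        importance[f] = std::fabs(weights[f]) * std::sqrt(var);
    }
    return importance;
}
```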
## SHAP Values (Shapley Additive Explanations)

### Theory

SHAP values are based on game theory’s Shapley values. For a prediction, the model output decomposes as a sum over \(M\) features:

\[
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i
\]

Where:

- \(\phi_0\) = expected model output (base value)
- \(\phi_i\) = contribution of feature \(i\) to the prediction
The Shapley value for feature \(i\) is:

\[
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ v(S \cup \{i\}) - v(S) \right]
\]

where \(F\) is the set of all features and \(v(S)\) is the expected model output when only the features in \(S\) are known.
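To ground the formula, the brute-force sketch below evaluates it directly by enumerating all subsets. The bitmask encoding of \(S\) and the `ValueFn` signature are illustrative assumptions; the enumeration is exponential in the number of features, so this is viable only for tiny feature counts:

```cpp
#include <functional>

// v(S) = expected model output when only the features in S are known.
// S is encoded as a bitmask over feature indices; how v is estimated
// (conditional vs interventional) is discussed later in this document.
using ValueFn = std::function<double(unsigned subset_mask)>;

static double factorial(int n) {
    double f = 1.0;
    for (int k = 2; k <= n; ++k) f *= k;
    return f;
}

static int popcount(unsigned x) {
    int c = 0;
    while (x != 0) { x &= x - 1; ++c; }
    return c;
}

// Exact Shapley value for feature i, enumerating all subsets of F \ {i}.
// Exponential in n_features -- an illustration, not a practical algorithm.
double shapley_value(const ValueFn& v, int n_features, int i) {
    double phi = 0.0;
    for (unsigned S = 0; S < (1u << n_features); ++S) {
        if ((S & (1u << i)) != 0) continue;  // S must exclude feature i
        const int s = popcount(S);
        const double weight =
            factorial(s) * factorial(n_features - s - 1) / factorial(n_features);
        phi += weight * (v(S | (1u << i)) - v(S));
    }
    return phi;
}
```

TreeSHAP, described next, computes the same quantity for tree models without the exponential enumeration.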
### TreeSHAP Algorithm

For tree ensembles, TreeSHAP provides an efficient \(O(TLD^2)\) algorithm, where:

- \(T\) = number of trees
- \(L\) = max leaves per tree
- \(D\) = max depth
The key insight is that we can compute exact Shapley values by tracking paths through trees.
#### Path-Based Algorithm (Lundberg et al.)

```text
TREESHAP(tree, x):
    φ = [0] * num_features

    function recurse(node, path, zero_weight):
        if node is leaf:
            for each unique feature in path:
                # Compute contribution based on path weights
                φ[feature] += contribution
        else:
            # Split path based on the node's feature
            left_path  = extend_path(path, go_left(x, node))
            right_path = extend_path(path, go_right(x, node))
            recurse(node.left, left_path, ...)
            recurse(node.right, right_path, ...)

    recurse(root, empty_path, 1.0)
    return φ
```
### Interventional vs Conditional SHAP

- **Conditional (TreeSHAP)**: uses the tree structure to marginalize out features
  - Faster; leverages tree paths
  - Can produce counterintuitive results for correlated features
- **Interventional**: treats features as independent
  - More causally meaningful
  - Requires a different algorithm (GPUTreeSHAP supports this)
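Equivalently, the two modes differ only in the value function \(v(S)\) plugged into the Shapley formula above. Stated informally (notation here is an assumption, not from the source):

\[
v_{\text{cond}}(S) = E\big[f(X) \mid X_S = x_S\big]
\qquad\text{vs.}\qquad
v_{\text{int}}(S) = E\big[f(x_S, X_{\bar{S}})\big]
\]

The conditional expectation respects correlations between features (TreeSHAP approximates it from per-node cover statistics), while the interventional form breaks those correlations by sampling the remaining features \(X_{\bar{S}}\) from a background dataset, independently of \(x_S\).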
### GPUTreeSHAP Approach

The GPUTreeSHAP library (used by XGBoost) uses:

1. Path extraction: convert each tree into a list of `PathElement` structures
2. Parallel processing: one thread per (row, path) pair
3. An efficient memory layout for coalesced access
```cpp
struct PathElement {
  size_t path_idx;       // Unique path index
  int64_t feature_idx;   // Feature for this split (-1 for the root)
  int group;             // Output group (multiclass)
  SplitCondition split;  // Threshold info
  double zero_fraction;  // P(taking this path when the feature is "unknown")
  float v;               // Leaf value
};
```
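Path extraction can be sketched as a depth-first walk that emits one `PathElement` per edge of every root-to-leaf path. The `TreeNode` type and the stand-in `SplitCondition`/`PathElement` definitions below are illustrative, not GPUTreeSHAP's exact types:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-ins for the GPUTreeSHAP types above.
struct SplitCondition { float threshold; };
struct PathElement {
    std::size_t path_idx;
    std::int64_t feature_idx;
    int group;
    SplitCondition split;
    double zero_fraction;
    float v;
};

struct TreeNode {
    int feature = -1;           // -1 marks a leaf
    float threshold = 0.0f;
    int left = -1, right = -1;  // Child indices into the node array
    double cover = 0.0;         // Training samples reaching this node
    float leaf_value = 0.0f;
};

// Depth-first walk that emits one PathElement per edge of every
// root-to-leaf path, stamping each finished path with a unique index.
void extract_paths(const std::vector<TreeNode>& tree, int node,
                   std::size_t& next_path_idx,
                   std::vector<PathElement> prefix,
                   std::vector<PathElement>& out) {
    const TreeNode& n = tree[node];
    if (n.feature < 0) {  // Leaf: finalize and emit the accumulated path
        for (PathElement& e : prefix) {
            e.path_idx = next_path_idx;
            e.v = n.leaf_value;  // Every element of a path carries the leaf value
        }
        out.insert(out.end(), prefix.begin(), prefix.end());
        ++next_path_idx;
        return;
    }
    for (int child : {n.left, n.right}) {
        std::vector<PathElement> extended = prefix;
        // zero_fraction: probability of taking this edge when the split
        // feature is "unknown", estimated from training cover ratios.
        extended.push_back({0, n.feature, /*group=*/0, {n.threshold},
                            tree[child].cover / n.cover, 0.0f});
        extract_paths(tree, child, next_path_idx, extended, out);
    }
}
```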
### SHAP for Linear Models

For linear models, SHAP values are exact and have a closed form:

\[
\phi_i = w_i \left( x_i - E[X_i] \right)
\]

Where:

- \(w_i\) = coefficient for feature \(i\)
- \(x_i\) = feature value for this prediction
- \(E[X_i]\) = mean of feature \(i\) over the training data
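A direct translation of the closed form, with the base value stored in the trailing slot (matching the `n_features + 1` convention used later in this document); the function shape is an assumption for illustration, not an established API:

```cpp
#include <cstddef>
#include <vector>

// Closed-form linear SHAP: phi_i = w_i * (x_i - E[X_i]).
// feature_means must be computed over the training data and stored
// alongside the model; the trailing slot holds the base value E[f(X)].
std::vector<double> linear_shap(const std::vector<double>& weights,
                                const std::vector<double>& x,
                                const std::vector<double>& feature_means,
                                double bias) {
    std::vector<double> phi(weights.size() + 1, 0.0);
    double base = bias;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        phi[i] = weights[i] * (x[i] - feature_means[i]);
        base += weights[i] * feature_means[i];  // E[f(X)] = w . E[X] + b
    }
    phi.back() = base;
    return phi;
}
```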
#### Linear Leaves in GBDT

For trees with linear leaves:

1. First compute the tree contribution (which leaf was reached)
2. Within that leaf, compute linear SHAP values
3. Total contribution = tree path contribution + linear leaf contribution (see the sketch below)
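In code, step 3 is just an element-wise sum of the two attribution vectors; the vector layout here is assumed to match the linear SHAP sketch above:

```cpp
#include <cstddef>
#include <vector>

// Hybrid attribution for a tree with linear leaves: element-wise sum of
// the tree-path contributions and the leaf's linear SHAP contributions.
// Both vectors use the (n_features + 1) layout, last slot = base value.
std::vector<double> combine_contributions(const std::vector<double>& phi_tree,
                                          const std::vector<double>& phi_leaf) {
    std::vector<double> total(phi_tree.size(), 0.0);
    for (std::size_t i = 0; i < total.size(); ++i)
        total[i] = phi_tree[i] + phi_leaf[i];
    return total;
}
```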
### Multi-Output Models

For multiclass or multi-output regression:

- Compute SHAP values per output
- Return shape: `(n_samples, n_features + 1, n_outputs)`
- The `+1` slot holds the expected value (base value)
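Assuming a dense row-major buffer with that shape, indexing could look like this; the layout itself is an assumption, not a fixed API:

```cpp
#include <cstddef>

// Flat index into a dense (n_samples, n_features + 1, n_outputs) buffer.
// feature_or_base ranges over [0, n_features]; the value n_features itself
// addresses the expected-value (base) slot for that row/output.
inline std::size_t shap_index(std::size_t row, std::size_t feature_or_base,
                              std::size_t output,
                              std::size_t n_features, std::size_t n_outputs) {
    return (row * (n_features + 1) + feature_or_base) * n_outputs + output;
}
```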
## Implementation Strategy

### Phase 1: Feature Importance

- Tree-based models
  - Walk the tree structure during/after training
  - Accumulate gain/split count per feature
  - Handle categorical features (sum across categories or report separately)
- Linear models
  - Return coefficient magnitudes
  - Optionally standardize by feature scale
### Phase 2: SHAP Values

- Tree-based models (CPU)
  - Implement the path-based TreeSHAP algorithm
  - Support both conditional and interventional modes
  - Handle categoricals, missing values, and default directions
- Linear models
  - Implement the closed-form solution
  - Requires storing feature means from training
- Linear leaves
  - Hybrid approach: tree path + linear contribution
### GPU Consideration (Future)

For GPU acceleration:

- Extract paths to contiguous memory
- Use the GPUTreeSHAP algorithm
- Batch rows for efficient kernel execution
## References

- Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” NeurIPS 2017.
- Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. “Consistent individualized feature attribution for tree ensembles.” arXiv:1802.03888 (2018).
- Mitchell, Rory, Eibe Frank, and Geoffrey Holmes. “GPUTreeShap: Massively parallel exact calculation of SHAP scores for tree ensembles.” arXiv:2010.13972 (2022).
- XGBoost SHAP implementation: `xgboost/src/tree/tree_shap.cc`
- GPUTreeSHAP: `xgboost/gputreeshap/GPUTreeShap/gpu_treeshap.h`
- LightGBM feature importance: `LightGBM/src/boosting/gbdt.cpp`