# GBLinear
GBLinear uses linear models as the weak learners instead of decision trees. Because a sum of linear models is itself linear, the boosted ensemble collapses into a single linear model, fitted through iterative coordinate descent.
## How GBLinear Works
At each boosting iteration:

1. Compute gradients and Hessians for all training samples.
2. Update one or more feature weights using coordinate descent; each update minimizes the second-order approximation of the loss.
3. Apply learning-rate shrinkage.
The final model is linear:

\[
\hat{y} = w_0 + \sum_{j=1}^{d} w_j x_j
\]

where \(w_0\) is the bias and \(w_j\) are the feature weights.
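As a quick NumPy illustration of what this means for inference (the parameter values here are made up):

```python
import numpy as np

# Hypothetical fitted parameters: bias w0 plus one weight per feature.
w0 = 0.5
w = np.array([0.1, -2.0, 0.0, 3.5])

X = np.random.rand(8, 4)  # 8 samples, 4 features
y_hat = w0 + X @ w        # one dot product per sample: O(features)
```

Because the boosted updates collapse into a single weight vector, inference cost is independent of the number of boosting rounds.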
## Weight Updates
For a single feature \(j\), the optimal weight update is:

\[
\Delta w_j = -\frac{\sum_i g_i x_{ij}}{\sum_i H_i x_{ij}^2 + \lambda}
\]

where:

- \(g_i\) is the gradient for sample \(i\)
- \(H_i\) is the Hessian for sample \(i\)
- \(x_{ij}\) is the value of feature \(j\) for sample \(i\)
- \(\lambda\) is the L2 regularization strength
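As a minimal sketch, assuming squared-error loss (where \(g_i = \hat{y}_i - y_i\) and \(H_i = 1\)), the update for one coordinate looks like this in NumPy. The helper name and signature are illustrative, not the library's API:

```python
import numpy as np

def coordinate_update(X, y, w, w0, j, lam=1.0, learning_rate=0.3):
    """One coordinate-descent step on feature j under squared-error loss."""
    y_hat = w0 + X @ w
    g = y_hat - y           # gradients g_i
    h = np.ones_like(y)     # Hessians H_i = 1 for squared error
    x_j = X[:, j]
    # Second-order optimal step, then learning-rate shrinkage.
    delta = -np.sum(g * x_j) / (np.sum(h * x_j ** 2) + lam)
    w[j] += learning_rate * delta
    return w
```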
## Feature Selectors
GBLinear supports different strategies for selecting which features to update:
| Selector | Description |
|---|---|
| `cyclic` | Cycle through features in order (deterministic) |
| `shuffle` | Random feature order each round (breaks correlations) |
| `greedy` | Pick the feature with the largest gradient (good for sparse data) |
| `thrifty` | Approximate greedy (faster for high dimensions) |
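To make the strategies concrete, here is a hedged NumPy sketch of how each selector might choose an update order; this is an illustration of the ideas above, not the library's internals:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 6
# Proxy for per-feature gradient magnitude |sum_i g_i x_ij| (made-up values).
grad_magnitude = np.abs(rng.normal(size=n_features))

cyclic_order = np.arange(n_features)          # deterministic sweep
shuffle_order = rng.permutation(n_features)   # reshuffled every round
greedy_pick = int(np.argmax(grad_magnitude))  # single best feature by gradient
# Approximate greedy: rank once by the cheap proxy instead of re-scoring
# after every individual weight update.
thrifty_order = np.argsort(-grad_magnitude)
```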
## When to Use GBLinear
GBLinear is ideal for:

- **High-dimensional sparse data**: text, click-through prediction
- **Linear relationships**: when the true relationship is mostly linear
- **Fast inference**: linear prediction is O(features)
- **Interpretability**: feature weights directly show importance
## Advantages
- Very fast inference
- Memory-efficient for sparse data
- Easily interpretable (linear coefficients)
- Good for high-dimensional data
- Built-in L1/L2 regularization
## Disadvantages
- Cannot capture non-linear patterns
- Cannot capture feature interactions
- Assumes linear separability for classification
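A quick way to see the interaction limitation, using the same estimator as in the example below (a sketch; exact numbers will vary):

```python
import numpy as np
from boosters.sklearn import GBLinearRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = X[:, 0] * X[:, 1]  # pure interaction: no linear signal to find

model = GBLinearRegressor(n_estimators=100, learning_rate=0.3)
model.fit(X, y)
# The fitted weights stay near zero: no linear combination of x1 and x2
# approximates x1 * x2, so the model cannot improve on predicting the mean.
print("Weights:", model.coef_)
```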
## GBDT vs GBLinear
| Aspect | GBDT | GBLinear |
|---|---|---|
| Relationships | Non-linear, interactions | Linear only |
| Inference speed | O(trees × depth) | O(features) |
| Sparse data | OK | Excellent |
| Interpretability | Feature importance | Direct coefficients |
| Best for | Tabular data | High-dimensional linear data |
## Key Hyperparameters
| Parameter | Default | Effect |
|---|---|---|
| `n_estimators` | 100 | Number of boosting rounds |
| `learning_rate` | 0.3 | Step size for weight updates |
| `l2` | 1.0 | L2 regularization |
| `l1` | 0.0 | L1 regularization |
## Example
```python
from boosters.sklearn import GBLinearRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data (a stand-in for your own dataset).
X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GBLinearRegressor(
    n_estimators=100,
    learning_rate=0.5,
    l2=0.1,  # L2 regularization
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Access linear coefficients
print("Weights:", model.coef_)
print("Bias:", model.intercept_)
```
## See Also
- RFC-0010: GBLinear — Design document with algorithm details
- Gradient Boosting — Theory overview
- Tutorial 06: GBLinear & Sparse Data — GBLinear tutorial
- Hyperparameters — Complete parameter guide