GBLinear#

GBLinear uses a linear model as the base learner instead of decision trees: each boosting round refines the coefficients of a single linear model via coordinate descent, so the final model is itself linear.

How GBLinear Works#

At each boosting iteration:

  1. Compute gradients and Hessians for all training samples

  2. Update one or more feature weights via coordinate descent; each update minimizes the second-order approximation of the loss

  3. Apply learning-rate shrinkage to the updates

The final model is linear:

\[\hat{y} = w_0 + \sum_{j=1}^{p} w_j x_j\]

where \(w_0\) is the bias and \(w_j\) are feature weights.
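
In vector form, prediction is a single dot product per sample. A minimal NumPy illustration (the weights and data below are placeholders):

import numpy as np

w0 = 0.5                                 # bias w_0
w = np.array([0.2, -1.3, 0.7])           # feature weights w_j
X = np.random.default_rng(0).normal(size=(4, 3))
y_hat = w0 + X @ w                       # \hat{y} for each row of X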

Weight Updates#

For a single feature \(j\), the optimal weight update is:

\[\Delta w_j = -\frac{\sum_i g_i x_{ij}}{\sum_i H_i x_{ij}^2 + \lambda}\]

where:

  • \(g_i\) is the gradient for sample \(i\)

  • \(H_i\) is the Hessian for sample \(i\)

  • \(x_{ij}\) is the value of feature \(j\) for sample \(i\)

  • \(\lambda\) is the L2 regularization strength
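
As a concrete illustration, here is a minimal NumPy sketch of one cyclic coordinate-descent round for squared-error loss, where \(g_i = \hat{y}_i - y_i\) and \(H_i = 1\). The function name and learning-rate handling are illustrative, not the library's internals:

import numpy as np

def coordinate_descent_round(X, y, w, bias, lam=1.0, learning_rate=0.5):
    # One cyclic pass: apply the update formula above to each weight,
    # shrunk by the learning rate, keeping predictions in sync.
    y_hat = bias + X @ w
    for j in range(X.shape[1]):
        g = y_hat - y                    # gradients g_i (squared error)
        H = np.ones_like(y)              # Hessians H_i = 1 for squared error
        delta = -(g @ X[:, j]) / (H @ X[:, j] ** 2 + lam)
        w[j] += learning_rate * delta
        y_hat += learning_rate * delta * X[:, j]
    return w, y_hat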

Feature Selectors#

GBLinear supports different strategies for selecting which features to update:

| Selector | Description |
|---|---|
| cyclic | Cycle through features in order (deterministic) |
| shuffle | Random feature order each round (breaks correlations) |
| greedy | Pick feature with largest gradient (suits sparse data) |
| thrifty | Approximate greedy (faster for high dimensions) |
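
A rough sketch of how the first three selectors choose an update order (simplified; thrifty approximates the greedy ranking at lower cost and is omitted here):

import numpy as np

def feature_order(X, g, selector="cyclic", lam=1.0, rng=None):
    # cyclic: fixed order; shuffle: new random order each round;
    # greedy: rank features by magnitude of their optimal update.
    if selector == "cyclic":
        return np.arange(X.shape[1])
    if selector == "shuffle":
        rng = rng or np.random.default_rng()
        return rng.permutation(X.shape[1])
    if selector == "greedy":
        scores = np.abs(X.T @ g) / (np.sum(X**2, axis=0) + lam)
        return np.argsort(-scores)       # largest expected update first
    raise ValueError(f"unknown selector: {selector}")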

When to Use GBLinear#

GBLinear is ideal for:

  • High-dimensional sparse data: Text, click-through prediction

  • Linear relationships: When the true relationship is mostly linear

  • Fast inference: Linear prediction is O(features)

  • Interpretability: Feature weights directly show importance
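
For instance, a sparse, high-dimensional fit might look like the sketch below. It assumes GBLinearRegressor accepts SciPy CSR input, which this page does not confirm:

import numpy as np
from scipy.sparse import random as sparse_random
from boosters.sklearn import GBLinearRegressor

# Assumption: fit() accepts SciPy sparse matrices.
X = sparse_random(1_000, 50_000, density=0.001, format="csr", random_state=0)
y = np.asarray(X @ np.ones(X.shape[1])).ravel()  # purely linear target
model = GBLinearRegressor(n_estimators=50)
model.fit(X, y)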

Advantages#

  • Very fast inference

  • Memory efficient for sparse data

  • Easily interpretable (linear coefficients)

  • Good for high-dimensional data

  • L1/L2 regularization built-in

Disadvantages#

  • Cannot capture non-linear patterns

  • Cannot capture feature interactions

  • Assumes a linear decision boundary for classification
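
To make the interaction limitation concrete, here is a small illustrative check: a target that is a pure two-feature interaction carries no linear signal, so the fitted weights should stay near zero:

import numpy as np
from boosters.sklearn import GBLinearRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 2))
y = X[:, 0] * X[:, 1]  # pure interaction: uncorrelated with each feature alone

model = GBLinearRegressor(n_estimators=200)
model.fit(X, y)
print(model.coef_)     # expected near zero: no linear fit can capture x1*x2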

GBDT vs GBLinear#

| Aspect | GBDT | GBLinear |
|---|---|---|
| Relationships | Non-linear, interactions | Linear only |
| Inference speed | O(trees × depth) | O(features) |
| Sparse data | OK | Excellent |
| Interpretability | Feature importance | Direct coefficients |
| Best for | Tabular data | High-dimensional linear data |

Key Hyperparameters#

| Parameter | Default | Effect |
|---|---|---|
| n_estimators | 100 | Number of boosting rounds |
| learning_rate | 0.3 | Step size for weight updates |
| l2 | 1.0 | L2 regularization |
| l1 | 0.0 | L1 regularization |
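
For example, raising l1 drives some weights to exactly zero, which you can check via coef_. A sketch, reusing X_train and y_train from the example below; the l1 value needed to zero out weights depends on your data:

import numpy as np
from boosters.sklearn import GBLinearRegressor

sparse_model = GBLinearRegressor(n_estimators=100, l1=5.0)
sparse_model.fit(X_train, y_train)

n_zero = int(np.sum(sparse_model.coef_ == 0))
print(f"{n_zero} of {sparse_model.coef_.size} weights are exactly zero")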

Example#

from boosters.sklearn import GBLinearRegressor

model = GBLinearRegressor(
    n_estimators=100,
    learning_rate=0.5,
    l2=0.1,  # L2 regularization
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Access linear coefficients
print("Weights:", model.coef_)
print("Bias:", model.intercept_)

See Also#