Note

This tutorial is available as a Jupyter notebook.

Tutorial 11: Sample Weighting#

🟡 Intermediate — Familiarity with ML concepts helpful

Learn how to use sample weights to handle class imbalance, emphasize recent data, and downweight outliers.

What you’ll learn#

  1. Handle class imbalance with balanced sample weights

  2. Use temporal weighting for concept drift

  3. Downweight outliers and noisy samples

  4. Integrate sample weights with both sklearn and core APIs

[1]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

import boosters
from boosters.sklearn import GBDTClassifier, GBDTRegressor

Why Sample Weights?#

Sample weights allow you to control how much each training example contributes to the model:

  • Class imbalance: Give minority class samples higher weights

  • Temporal importance: Weight recent data more heavily for concept drift

  • Data quality: Downweight known outliers or noisy samples

  • Business logic: Emphasize high-value customers or critical cases
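To make "how much each training example contributes" concrete, the sketch below computes a weighted mean squared error with plain NumPy. It is a generic illustration of how weights typically enter a loss, not a description of boosters' internals; `weighted_mse` is a hypothetical helper written for this example.

```python
import numpy as np


def weighted_mse(y_true, y_pred, sample_weight):
    """Weighted mean squared error: sum(w * err^2) / sum(w)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    w = np.asarray(sample_weight, dtype=np.float64)
    return float(np.sum(w * (y_true - y_pred) ** 2) / np.sum(w))


y_true = np.array([0.0, 0.0, 0.0, 10.0])
y_pred = np.array([0.0, 0.0, 0.0, 0.0])  # the prediction misses the last sample

# With uniform weights the single large error dominates the loss.
uniform = weighted_mse(y_true, y_pred, [1, 1, 1, 1])

# Downweighting that sample shrinks its share of the loss.
downweighted = weighted_mse(y_true, y_pred, [1, 1, 1, 0.01])

print(f"uniform weights:      {uniform:.3f}")
print(f"last sample at 0.01:  {downweighted:.3f}")
```

A model trained to minimize the second loss has far less incentive to chase the last sample, which is the mechanism behind every use case in the list above.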

1. Handling Class Imbalance#

When classes are imbalanced, the model tends to favor the majority class, because errors on the few minority samples contribute little to the total loss. Sample weights can counteract this by giving minority samples more importance.

[2]:
# Create imbalanced dataset (95% class 0, 5% class 1)
rng = np.random.default_rng(42)
n_samples = 1000
n_minority = 50

X = rng.standard_normal((n_samples, 5)).astype(np.float32)

# Create imbalanced labels
y = np.zeros(n_samples, dtype=np.float32)
y[:n_minority] = 1.0

# Shuffle
perm = rng.permutation(n_samples)
X, y = X[perm], y[perm]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Class distribution in training: {np.bincount(y_train.astype(int))}")
print(f"Class distribution in test:     {np.bincount(y_test.astype(int))}")
Class distribution in training: [760  40]
Class distribution in test:     [190  10]

Without Sample Weights#

The model will likely predict only the majority class:

[3]:
clf_no_weights = GBDTClassifier(n_estimators=50, verbose=0)
clf_no_weights.fit(X_train, y_train)
pred_no_weights = clf_no_weights.predict(X_test)

print("Classification Report (no weights):")
print(classification_report(y_test, pred_no_weights, zero_division=0))
Classification Report (no weights):
              precision    recall  f1-score   support

         0.0       0.95      1.00      0.97       190
         1.0       0.00      0.00      0.00        10

    accuracy                           0.95       200
   macro avg       0.47      0.50      0.49       200
weighted avg       0.90      0.95      0.93       200

With Balanced Sample Weights#

sklearn’s compute_sample_weight calculates weights inversely proportional to class frequency:

[4]:
# Compute balanced weights
weights = compute_sample_weight("balanced", y_train)

print(f"Weight for class 0 samples: {weights[y_train == 0][0]:.3f}")
print(f"Weight for class 1 samples: {weights[y_train == 1][0]:.3f}")
print(f"Ratio: {weights[y_train == 1][0] / weights[y_train == 0][0]:.1f}x")
Weight for class 0 samples: 0.526
Weight for class 1 samples: 10.000
Ratio: 19.0x
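The values printed above follow from the "balanced" heuristic documented by scikit-learn: each sample gets n_samples / (n_classes * count of its class). The sketch below reproduces that arithmetic with NumPy for the 760/40 training split; `balanced_weights` is a hypothetical helper for illustration, not a replacement for `compute_sample_weight`.

```python
import numpy as np


def balanced_weights(y):
    """Per-sample weights n_samples / (n_classes * class_count)."""
    y = np.asarray(y)
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    per_class = len(y) / (len(classes) * counts)
    # Map each sample's class index back to that class's weight.
    return per_class[inverse]


# Same class counts as the training split above: 760 majority, 40 minority.
y_demo = np.array([0] * 760 + [1] * 40)
w = balanced_weights(y_demo)

print(f"class 0 weight: {w[0]:.3f}")          # 800 / (2 * 760)
print(f"class 1 weight: {w[-1]:.3f}")         # 800 / (2 * 40)
print(f"total weight per class: {w[y_demo == 0].sum():.1f}, {w[y_demo == 1].sum():.1f}")
```

With these weights both classes carry the same total weight (400 each here), which is what "balanced" means in this context.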
[5]:
clf_weighted = GBDTClassifier(n_estimators=50, verbose=0)
clf_weighted.fit(X_train, y_train, sample_weight=weights)
pred_weighted = clf_weighted.predict(X_test)

print("Classification Report (with balanced weights):")
print(classification_report(y_test, pred_weighted, zero_division=0))
Classification Report (with balanced weights):
              precision    recall  f1-score   support

         0.0       0.95      0.93      0.94       190
         1.0       0.00      0.00      0.00        10

    accuracy                           0.89       200
   macro avg       0.47      0.47      0.47       200
weighted avg       0.90      0.89      0.89       200

In this run the weights do not improve minority-class recall, which stays at 0.00 while majority-class recall drops from 1.00 to 0.93. That is expected for this synthetic dataset: the features are independent random noise with no relationship to the labels, so there is no signal for any weighting scheme to recover. Weighting shifts predictions toward the minority class, but it only helps when the features actually carry information about that class.

2. Temporal Weighting for Concept Drift#

When the data distribution changes over time (concept drift), you may want to emphasize recent observations. This is common in:

  • Financial markets

  • User behavior prediction

  • Fraud detection

[6]:
# Simulate temporal data with concept drift
rng = np.random.default_rng(42)
n_samples = 500

X = rng.standard_normal((n_samples, 3)).astype(np.float32)

# Target relationship changes over time:
# Early: y depends mostly on X[:, 0]
# Late: y depends mostly on X[:, 1]
time_factor = np.linspace(0, 1, n_samples)
y = ((1 - time_factor) * X[:, 0] + time_factor * X[:, 1]).astype(np.float32)
y += rng.standard_normal(n_samples).astype(np.float32) * 0.1

# Split chronologically (not randomly!)
train_size = 400
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

print(f"Training on first {train_size} samples")
print(f"Testing on last {n_samples - train_size} samples (most recent)")
Training on first 400 samples
Testing on last 100 samples (most recent)
[7]:
# Uniform weights baseline
reg_uniform = GBDTRegressor(n_estimators=50, verbose=0)
reg_uniform.fit(X_train, y_train)
pred_uniform = reg_uniform.predict(X_test)
rmse_uniform = np.sqrt(np.mean((pred_uniform - y_test) ** 2))

# Exponential decay weights - recent samples have higher weight
decay_rate = 3.0  # Higher = more emphasis on recent data
temporal_weights = np.exp(decay_rate * np.linspace(0, 1, train_size)).astype(np.float32)

reg_weighted = GBDTRegressor(n_estimators=50, verbose=0)
reg_weighted.fit(X_train, y_train, sample_weight=temporal_weights)
pred_weighted = reg_weighted.predict(X_test)
rmse_weighted = np.sqrt(np.mean((pred_weighted - y_test) ** 2))

print(f"RMSE with uniform weights:   {rmse_uniform:.4f}")
print(f"RMSE with temporal weights:  {rmse_weighted:.4f}")
print(f"Improvement: {(rmse_uniform - rmse_weighted) / rmse_uniform * 100:.1f}%")
RMSE with uniform weights:   0.7219
RMSE with temporal weights:  0.5716
Improvement: 20.8%
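Choosing `decay_rate` is easier with two diagnostics: the ratio between the newest and oldest weight, which is exp(decay_rate) for time normalized to [0, 1], and the effective sample size, which shows how much data the weighting effectively keeps. The sketch below computes both with NumPy. Kish's formula (sum w)^2 / sum(w^2) is a standard diagnostic that is brought in here as an assumption; it is not part of the original tutorial, and both helper names are hypothetical.

```python
import numpy as np


def exponential_weights(n, decay_rate):
    """Weights exp(decay_rate * t) for t evenly spaced in [0, 1]."""
    return np.exp(decay_rate * np.linspace(0.0, 1.0, n))


def effective_sample_size(w):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=np.float64)
    return float(w.sum() ** 2 / np.sum(w ** 2))


for rate in (0.0, 1.0, 3.0, 6.0):
    w = exponential_weights(400, rate)
    print(
        f"rate={rate:>3}: newest/oldest={w[-1] / w[0]:7.1f}, "
        f"effective n={effective_sample_size(w):6.1f}"
    )
```

A rate of 0 reproduces uniform weights and an effective sample size of 400. Larger rates focus on recent data at the cost of a smaller effective sample, so a very aggressive decay can increase variance even when it reduces bias from drift.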
[8]:
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Plot weight distribution
ax1.plot(temporal_weights)
ax1.set_xlabel("Sample index (time)")
ax1.set_ylabel("Weight")
ax1.set_title("Exponential Temporal Weights")
ax1.grid(True, alpha=0.3)

# Plot predictions vs actual
ax2.scatter(y_test, pred_uniform, alpha=0.5, label=f"Uniform (RMSE={rmse_uniform:.3f})")
ax2.scatter(y_test, pred_weighted, alpha=0.5, label=f"Temporal (RMSE={rmse_weighted:.3f})")
ax2.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "k--")
ax2.set_xlabel("Actual")
ax2.set_ylabel("Predicted")
ax2.set_title("Predictions on Test Set")
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
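[Figure: left panel, "Exponential Temporal Weights" (weight vs. sample index); right panel, "Predictions on Test Set" (predicted vs. actual for uniform and temporal weights)]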

3. Downweighting Outliers#

If you know certain samples are outliers or noisy, you can reduce their influence:

[9]:
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 4)).astype(np.float32)
y = (X[:, 0] + X[:, 1] * 0.5 + rng.standard_normal(200) * 0.1).astype(np.float32)

# Inject some outliers
outlier_idx = [10, 50, 100, 150]
y[outlier_idx] = y[outlier_idx] + 10.0  # Shift by 10.0 (about 9 standard deviations of y)

# Create weights - downweight outliers
weights = np.ones(200, dtype=np.float32)
weights[outlier_idx] = 0.01  # Very low weight for known outliers

print(f"Number of outliers: {len(outlier_idx)}")
print(f"Outlier weight: {weights[outlier_idx[0]]:.2f}")  # format to hide float32 rounding noise
Number of outliers: 4
Outlier weight: 0.01
[10]:
# Without outlier handling
reg_no_handling = GBDTRegressor(n_estimators=50, verbose=0)
reg_no_handling.fit(X, y)
pred_no_handling = reg_no_handling.predict(X)

# With outlier downweighting
reg_downweight = GBDTRegressor(n_estimators=50, verbose=0)
reg_downweight.fit(X, y, sample_weight=weights)
pred_downweight = reg_downweight.predict(X)

# Calculate RMSE on the non-outlier training samples only (in-sample fit, not held-out error)
mask = ~np.isin(np.arange(len(y)), outlier_idx)
rmse_no = np.sqrt(np.mean((pred_no_handling[mask] - y[mask]) ** 2))
rmse_dw = np.sqrt(np.mean((pred_downweight[mask] - y[mask]) ** 2))

print(f"RMSE on clean samples (no handling):    {rmse_no:.4f}")
print(f"RMSE on clean samples (with weights):   {rmse_dw:.4f}")
RMSE on clean samples (no handling):    0.2987
RMSE on clean samples (with weights):   0.1037
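The example above assumes the outlier indices are already known. When they are not, one common heuristic is to flag samples with a large robust z-score based on the median absolute deviation (MAD). The sketch below is a minimal version of that idea under stated assumptions: `mad_outlier_weights`, the threshold of 5.0, and the low weight of 0.01 are illustrative choices, not part of boosters. In practice the same check is often applied to residuals from a preliminary fit rather than to the raw target.

```python
import numpy as np


def mad_outlier_weights(y, threshold=5.0, low_weight=0.01):
    """Assign low_weight to samples whose robust z-score exceeds threshold."""
    y = np.asarray(y, dtype=np.float64)
    weights = np.ones(len(y), dtype=np.float32)
    median = np.median(y)
    mad = np.median(np.abs(y - median))
    if mad == 0:
        # Degenerate case: no spread to measure against, keep uniform weights.
        return weights
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    robust_z = np.abs(y - median) / (1.4826 * mad)
    weights[robust_z > threshold] = low_weight
    return weights


rng = np.random.default_rng(0)
y_demo = rng.standard_normal(200)
y_demo[[10, 50, 100, 150]] += 10.0  # inject the same kind of shift as above

w = mad_outlier_weights(y_demo)
flagged = np.flatnonzero(w < 1)
print(f"Flagged indices: {flagged.tolist()}")
```

The resulting array can be passed as `sample_weight` in the same way as the hand-built weights above.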

4. Core API with Sample Weights#

You can also use sample weights directly with the core Dataset and GBDTModel API:

[11]:
from boosters import Dataset, GBDTConfig, GBDTModel, Objective

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 4)).astype(np.float32)
y = (X[:, 0] + X[:, 1] * 0.5 + rng.standard_normal(200) * 0.1).astype(np.float32)

# Create custom weights
weights = np.ones(200, dtype=np.float32)
weights[np.abs(y) > 2] = 0.1  # Downweight extreme values

print(f"Samples with reduced weight: {np.sum(weights < 1)}")
Samples with reduced weight: 17
[12]:
# Create dataset WITH weights
train_data = Dataset(
    X, y,
    weights=weights,
    feature_names=["feat_a", "feat_b", "feat_c", "feat_d"]
)

config = GBDTConfig(
    n_estimators=50,
    max_depth=4,
    learning_rate=0.1,
    objective=Objective.squared(),
)

model = GBDTModel.train(train_data, config=config)
predictions = model.predict(Dataset(X))

rmse = np.sqrt(np.mean((predictions.flatten() - y) ** 2))
print(f"Training RMSE: {rmse:.4f}")
print(f"Feature names stored in model: {model.feature_names}")
Training RMSE: 0.2142
Feature names stored in model: ['feat_a', 'feat_b', 'feat_c', 'feat_d']
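Before handing a weight array to either API, it can help to check it for common mistakes such as a length mismatch, negative values, or NaNs. Whether boosters performs such checks itself is not stated here, so the sketch below is a defensive, library-independent helper; `validate_weights` is a hypothetical name introduced for this example.

```python
import numpy as np


def validate_weights(weights, n_samples):
    """Return weights as a 1-D float32 array, or raise ValueError if unusable."""
    w = np.asarray(weights, dtype=np.float32)
    if w.ndim != 1 or w.shape[0] != n_samples:
        raise ValueError(f"expected {n_samples} weights, got shape {w.shape}")
    if not np.all(np.isfinite(w)):
        raise ValueError("weights must be finite (no NaN or inf)")
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    if not np.any(w > 0):
        raise ValueError("at least one weight must be positive")
    return w


checked = validate_weights([1.0, 0.1, 1.0], n_samples=3)
print(checked.dtype, checked.shape)
```

The returned array is float32, matching the dtype used for weights throughout this tutorial.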

Summary#

Sample weights are a powerful tool for:

Use Case              Weight Strategy
Class imbalance       Inverse class frequency (compute_sample_weight("balanced"))
Concept drift         Exponential decay (recent → higher weight)
Known outliers        Low fixed weight (e.g., 0.01)
Business importance   Custom weights per sample
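
The strategies in the table are not mutually exclusive. One simple way to combine them, offered here as an assumption rather than a recommendation from this tutorial, is to multiply the individual weight arrays elementwise and rescale the result to mean 1. `combine_weights` is a hypothetical helper for illustration.

```python
import numpy as np


def combine_weights(*weight_arrays):
    """Multiply weight arrays elementwise and rescale the product to mean 1."""
    combined = np.ones(len(weight_arrays[0]), dtype=np.float64)
    for w in weight_arrays:
        combined = combined * np.asarray(w, dtype=np.float64)
    return combined * len(combined) / combined.sum()


class_w = np.array([0.5, 0.5, 0.5, 5.0])          # last sample is minority class
time_w = np.exp(2.0 * np.linspace(0.0, 1.0, 4))   # later samples weigh more
outlier_w = np.array([1.0, 0.01, 1.0, 1.0])       # second sample is a known outlier

combined = combine_weights(class_w, time_w, outlier_w)
print(np.round(combined, 4))
```

Multiplication keeps each factor's relative effect: the known outlier ends up with the smallest weight and the recent minority sample with the largest.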

Key Points#

  1. sklearn API: Pass sample_weight to .fit()

  2. Core API: Pass weights to Dataset()

  3. Weights are relative — they don’t need to sum to 1

  4. Use compute_sample_weight("balanced") for quick class balancing

  5. For temporal weighting, use exponential decay: exp(rate * normalized_time)
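
The point that weights are relative can be checked on a weighted average, where multiplying every weight by a constant leaves the result unchanged. The sketch below uses only NumPy. It illustrates the general principle; in boosting libraries that use regularization terms or minimum-weight thresholds, rescaling may still shift results somewhat, and whether that applies to boosters is not stated here.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 10.0])
w = np.array([1.0, 1.0, 1.0, 0.1])

# Scaling every weight by the same constant does not change a weighted mean.
mean_original = np.average(y, weights=w)
mean_scaled = np.average(y, weights=100.0 * w)

# A common convention is to rescale weights so that they average to 1.
w_normalized = w * len(w) / w.sum()

print(f"weighted mean:        {mean_original:.4f}")
print(f"after scaling by 100: {mean_scaled:.4f}")
print(f"normalized weights average to {w_normalized.mean():.1f}")
```

Rescaling to mean 1 keeps the total weight equal to the number of samples, which makes weighted and unweighted runs easier to compare.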

Next Steps#