Tutorial 03: Binary Classification#

🟢 Beginner — No prior boosting experience needed

Learn how to train a binary classifier with boosters and evaluate it using ROC curves and AUC.

What you’ll learn#

Train a binary classifier
Get probability predictions
Plot ROC curves and calculate AUC
Choose classification thresholds

[1]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc, precision_recall_curve, classification_report

from boosters.sklearn import GBDTClassifier

Generate Data#

Create a binary classification dataset:

[2]:

# Generate binary classification data
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=10,
    n_redundant=5,
    n_classes=2,
    weights=[0.7, 0.3],  # Slight imbalance
    random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training samples: {len(X_train)}")
print(f"Class distribution: {np.bincount(y_train)}")

Training samples: 1600
Class distribution: [1121  479]

Train the Classifier#

[3]:

# Train classifier
clf = GBDTClassifier(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
)
clf.fit(X_train, y_train)

print(f"Accuracy: {clf.score(X_test, y_test):.4f}")

Accuracy: 0.9300

Probability Predictions#

[4]:

# Get probability predictions
y_proba = clf.predict_proba(X_test)
y_score = y_proba[:, 1]  # Probability of positive class

print(f"Probability shape: {y_proba.shape}")
print(f"Sample probabilities: {y_score[:5]}")

Probability shape: (400, 2)
Sample probabilities: [0.00366234 0.00452526 0.6404448  0.05477158 0.8995941 ]

ROC Curve and AUC#

[5]:

# Calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

# Plot
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], color='gray', lw=1, linestyle='--', label='Random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc='lower right')
plt.grid(True, alpha=0.3)
plt.show()

print(f"AUC: {roc_auc:.4f}")

../_images/tutorials_03-classification_9_0.png

AUC: 0.9799

Precision-Recall Curve#

[6]:

# Calculate precision-recall curve
precision, recall, thresholds_pr = precision_recall_curve(y_test, y_score)
pr_auc = auc(recall, precision)

# Plot
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, color='blue', lw=2, label=f'PR curve (AUC = {pr_auc:.3f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc='lower left')
plt.grid(True, alpha=0.3)
plt.show()

../_images/tutorials_03-classification_11_0.png

Classification Report#

[7]:

# Get class predictions
y_pred = clf.predict(X_test)

# Print classification report
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))

              precision    recall  f1-score   support

    Negative       0.93      0.97      0.95       275
    Positive       0.92      0.85      0.88       125

    accuracy                           0.93       400
   macro avg       0.93      0.91      0.92       400
weighted avg       0.93      0.93      0.93       400

Summary#

In this tutorial, you learned how to:

✅ Train a binary classifier
✅ Get probability predictions with predict_proba
✅ Plot ROC curves and calculate AUC
✅ Analyze precision-recall trade-offs

Next Steps#

Tutorial 04: Multiclass Classification — Handle multiple classes
Tutorial 05: Early Stopping — Prevent overfitting