boosters.sklearn.GBDTClassifier

class boosters.sklearn.GBDTClassifier

Bases: _GBDTEstimatorBase, ClassifierMixin

Gradient Boosted Decision Tree Classifier.

A sklearn-compatible wrapper around GBDTModel for classification.

Parameters:
  • n_estimators (int, default=100) – Number of boosting rounds.

  • learning_rate (float, default=0.1) – Learning rate.

  • max_depth (int, default=6) – Maximum depth of each tree.

  • min_child_weight (float, default=1.0) – Minimum sum of instance weight in a child node.

  • max_leaves (int, default=31) – Maximum number of leaves per tree.

  • grow_strategy (GrowthStrategy, default=GrowthStrategy.Depthwise) – Tree growing strategy.

  • colsample_bytree (float, default=1.0) – Subsample ratio of columns for each tree.

  • subsample (float, default=1.0) – Subsample ratio of training instances.

  • gamma (float, default=0.0) – Minimum loss reduction required for split.

  • reg_alpha (float, default=0.0) – L1 regularization.

  • reg_lambda (float, default=1.0) – L2 regularization.

  • early_stopping_rounds (int or None, default=None) – Stop if no improvement for this many rounds.

  • seed (int, default=42) – Random seed.

  • n_threads (int, default=0) – Number of threads (0 = auto).

  • objective (Objective or None, default=None) – Loss function. Must be a classification objective. If None, auto-detects: Objective.logistic() for binary, Objective.softmax() for multiclass.

  • metric (Metric or None, default=None) – Evaluation metric. If None, uses Metric.logloss().

Attributes:
  • model_ (GBDTModel) – The fitted core model.

  • classes_ (ndarray) – Unique class labels.

  • n_classes_ (int) – Number of classes.

  • n_features_in_ (int) – Number of features seen during fit.

  • feature_importances_ (ndarray of shape (n_features,)) – Feature importance scores.
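The early_stopping_rounds rule in the parameter list can be read as a patience counter. The sketch below is a plain-Python illustration of that reading, not the library's implementation: the function name early_stopping_round is hypothetical, and it assumes that lower metric values are better and that any strict decrease counts as an improvement.

```python
def early_stopping_round(val_losses, patience):
    """Return the 0-based round at which training would stop, or None.

    Assumption: training stops once `patience` consecutive rounds have
    passed without a strictly lower validation loss.
    """
    best = float("inf")
    rounds_since_best = 0
    for round_index, loss in enumerate(val_losses):
        if loss < best:
            # New best score: reset the patience counter.
            best = loss
            rounds_since_best = 0
        else:
            rounds_since_best += 1
            if rounds_since_best >= patience:
                return round_index
    # The loss kept improving often enough; all rounds are used.
    return None


# Best loss is reached at round 2; rounds 3, 4 and 5 bring no improvement.
stop_at = early_stopping_round([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5], patience=3)
```

Under these assumptions the later improvement at round 6 is never seen, which is the usual trade-off of a small patience value.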

classes_: ndarray[tuple[Any, ...], dtype[Any]]

n_classes_: int

predict(X)

Predict class labels.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

Return type:

ndarray[tuple[Any, ...], dtype[Any]]

predict_proba(X)

Predict class probabilities.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probability estimates.

Return type:

ndarray[tuple[Any, ...], dtype[float32]]
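A common convention among scikit-learn classifiers is that predict returns the class with the highest predicted probability, with the columns of predict_proba ordered like classes_. Whether GBDTClassifier follows this exactly is an assumption here. The helper labels_from_proba below is hypothetical and uses plain lists so that it runs without the library:

```python
def labels_from_proba(proba, classes):
    """Map each row of class probabilities to the most probable class label.

    Assumption: column j of `proba` corresponds to classes[j].
    """
    labels = []
    for row in proba:
        # Index of the largest probability in this row (first one on ties).
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_index])
    return labels


predicted = labels_from_proba([[0.2, 0.8], [0.9, 0.1]], classes=["neg", "pos"])
```

With a fitted estimator, the equivalent call would pass the output of predict_proba(X) and the classes_ attribute.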

__init__(n_estimators=100, learning_rate=0.1, max_depth=6, min_child_weight=1.0, max_leaves=31, grow_strategy=GrowthStrategy.Depthwise, colsample_bytree=1.0, subsample=1.0, gamma=0.0, reg_alpha=0.0, reg_lambda=1.0, early_stopping_rounds=None, seed=42, n_threads=0, verbose=1, objective=None, metric=None)

Return type:

None

property feature_importances_: ndarray[tuple[Any, ...], dtype[float32]]

Return feature importances (gain-based).

fit(X, y, eval_set=None, sample_weight=None)

Fit the estimator.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training input samples.

  • y (array-like of shape (n_samples,)) – Target values.

  • eval_set (tuple of (X, y), optional) – Validation set as (X_val, y_val) tuple.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

Returns:

self – Fitted estimator.

Return type:

Self
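The eval_set argument takes a single (X_val, y_val) tuple. As a minimal sketch of how a caller might build one, the hypothetical helper fit_with_holdout below holds out the tail of the data and forwards it to fit. It relies only on the fit signature documented above and works with any object exposing that signature. The unshuffled tail split is a simplification for illustration.

```python
def fit_with_holdout(clf, X, y, holdout_fraction=0.2):
    """Fit `clf` on the head of the data, validating on the held-out tail.

    `clf` is any estimator whose fit accepts eval_set=(X_val, y_val).
    X and y must support len() and slicing (lists or arrays).
    """
    n_val = max(1, int(len(X) * holdout_fraction))
    X_train, X_val = X[:-n_val], X[-n_val:]
    y_train, y_val = y[:-n_val], y[-n_val:]
    # fit returns the fitted estimator itself, so the call can be chained.
    return clf.fit(X_train, y_train, eval_set=(X_val, y_val))
```

In practice the data should be shuffled or stratified before splitting. Combined with early_stopping_rounds, the held-out tuple is presumably what the stopping rule is evaluated on, although the source does not state this explicitly.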

get_feature_importance(importance_type=ImportanceType.Gain)

Get feature importance scores.

Parameters:
  • importance_type (ImportanceType, default=ImportanceType.Gain) – Type of feature importance to compute.

Returns:

importance (ndarray of shape (n_features,)) – Feature importance scores.

Return type:

ndarray[tuple[Any, ...], dtype[float32]]
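The returned scores are positional, one per input feature. A small sketch of turning them into a ranked list follows. The helper rank_features and the default names f0, f1, ... are hypothetical, and the input can be any sequence of numbers, including the array returned by get_feature_importance.

```python
def rank_features(importances, feature_names=None):
    """Return (name, score) pairs sorted from most to least important."""
    if feature_names is None:
        # Fall back to positional names when none are supplied.
        feature_names = [f"f{i}" for i in range(len(importances))]
    pairs = [(name, float(score)) for name, score in zip(feature_names, importances)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranking = rank_features([0.1, 0.7, 0.2])
```

Here the second feature comes first in the ranking because it has the largest score.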

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because each sample's entire label set must be predicted correctly.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float
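For reference, mean accuracy with optional sample weights can be written out in a few lines. The function weighted_accuracy below is a hypothetical stand-alone sketch of the metric for single-label targets, not the code that score runs internally.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, weighted per sample."""
    if sample_weight is None:
        # Unweighted case: every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    correct_weight = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct_weight / sum(sample_weight)


plain = weighted_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
weighted = weighted_accuracy([0, 1, 1, 0], [0, 1, 0, 0], sample_weight=[1, 1, 2, 1])
```

Doubling the weight of the one misclassified sample lowers the score from 0.75 to 0.6.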

set_fit_request(*, eval_set='$UNCHANGED$', sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (GBDTClassifier)

Returns:

self – The updated object.

Return type:

object
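The four request options listed above can be summarized as a small decision function. This is a simplified model for illustration only: route_metadata is a hypothetical name, scikit-learn's actual routing involves more cases (for example, metadata that no consumer requests), and the error raised here is a plain ValueError.

```python
def route_metadata(request, name, provided):
    """Decide which keyword arguments are forwarded for one parameter.

    request:  the value configured for `name` (True, False, None or a str alias)
    provided: dict of metadata the user passed to the meta-estimator
    """
    if request is True:
        # Requested: forward it when present, silently skip it otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and providing it is treated as an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: look the value up under the alias, forward under `name`.
        return {name: provided[request]} if request in provided else {}
    raise TypeError(f"unsupported request value: {request!r}")


forwarded = route_metadata("w", "sample_weight", {"w": [3.0, 1.0]})
```

In this model, an estimator configured with the alias "w" receives the user's w values as its sample_weight argument.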

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
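The <component>__<parameter> naming can be illustrated by splitting keys on the first double underscore. The sketch below is a simplified illustration of the convention, not scikit-learn's implementation. The helper group_nested_params and the component name clf are hypothetical.

```python
def group_nested_params(params):
    """Separate top-level parameters from nested <component>__<parameter> ones."""
    top_level, nested = {}, {}
    for key, value in params.items():
        # Split on the first "__" only; the remainder may itself be nested.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            top_level[name] = value
    return top_level, nested


top, nested = group_nested_params(
    {"clf__max_depth": 4, "clf__learning_rate": 0.05, "verbose": False}
)
```

In a Pipeline whose step is named clf, the nested group would be applied to that step while verbose would be set on the pipeline itself.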

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (GBDTClassifier)

Returns:

self – The updated object.

Return type:

object

model_: GBDTModel

n_features_in_: int