aif360.sklearn.postprocessing.PostProcessingMeta

class aif360.sklearn.postprocessing.PostProcessingMeta(estimator, postprocessor=CalibratedEqualizedOdds(), needs_proba=None, prefit=False, val_size=0.25, **options)

A meta-estimator which wraps a given estimator with a post-processing step.

The post-processor is trained on a held-out portion of the training data, separate from the portion used to fit the estimator, to prevent leakage.

Note

Because the meta-estimator splits the dataset internally, any required Pipeline should be passed in as the estimator of this meta-estimator, not wrapped around it.

Variables:
  • estimator_ – Fitted estimator.
  • postprocessor_ – Fitted postprocessor.
  • needs_proba_ (bool) – Determined from the postprocessor type if needs_proba is None.
Parameters:
  • estimator (sklearn.BaseEstimator) – Original estimator.
  • postprocessor – Post-processing algorithm.
  • needs_proba (bool) – Use self.estimator_.predict_proba() instead of self.estimator_.predict() as input to postprocessor. If None, defaults to True if the postprocessor supports it.
  • prefit (bool) – If True, it is assumed that estimator has been fitted already and all data is used to train postprocessor.
  • val_size (int or float) – Size of validation set used to fit the postprocessor. The estimator fits on the remainder of the training set. See train_test_split() for details.
  • **options – Keyword options passed through to train_test_split(). Note: ‘train_size’ and ‘test_size’ will be ignored in favor of ‘val_size’.
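The interaction between val_size and **options can be illustrated with a minimal sketch. It assumes only the behavior described above (val_size replaces any test_size, and train_size is dropped). resolve_split_options is a hypothetical helper written for illustration and is not part of aif360.

```python
from sklearn.model_selection import train_test_split


def resolve_split_options(val_size=0.25, **options):
    """Hypothetical helper: build train_test_split kwargs from val_size."""
    opts = dict(options)
    opts.pop("train_size", None)  # ignored in favor of val_size
    opts["test_size"] = val_size  # val_size overrides any test_size
    return opts


# Toy data: 20 samples with balanced binary labels.
X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

# test_size=0.9 is overridden; random_state and stratify pass through.
opts = resolve_split_options(val_size=0.25, test_size=0.9,
                             random_state=0, stratify=y)
X_est, X_post, y_est, y_post = train_test_split(X, y, **opts)

n_est, n_post = len(X_est), len(X_post)
print(n_est, n_post)
```

Under these assumptions, the estimator would be fitted on the larger portion (X_est, y_est) and the post-processor on the validation portion (X_post, y_post).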

Methods

  • fit – Splits the training samples with train_test_split() and uses the resultant ‘train’ portion to train the estimator.
  • get_params – Get parameters for this estimator.
  • predict – Predict class labels for the given samples.
  • predict_log_proba – Log of probability estimates.
  • predict_proba – Probability estimates.
  • score – Returns the output of the post-processor’s score function on the given test data and labels.
  • set_params – Set the parameters of this estimator.
fit(X, y, sample_weight=None, **fit_params)

Splits the training samples with train_test_split() and uses the resulting ‘train’ portion to train the estimator. The estimator then predicts on the ‘test’ portion of the split, and the post-processor is trained on the resulting pairs of predictions and ground-truth targets.

Parameters:
  • X (array-like) – Training samples.
  • y (pandas.Series) – Training labels.
  • sample_weight (array-like, optional) – Sample weights.
  • **fit_params – Parameters passed to the post-processor fit() method. Note: these do not need to be prefixed with __ notation.
Returns:

self
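The procedure described for fit() can be sketched with scikit-learn alone. This is an illustration under stated assumptions, not the library's implementation: ThresholdPostprocessor is a hypothetical stand-in for a real post-processing algorithm, the data is synthetic, and the fixed random_state values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


class ThresholdPostprocessor:
    """Hypothetical post-processor: picks the score threshold that
    maximizes accuracy on the data it is fitted on."""

    def fit(self, y_score, y_true):
        pos = y_score[:, 1]  # probability of the positive class
        candidates = np.linspace(0.1, 0.9, 17)
        accs = [np.mean((pos >= t) == y_true) for t in candidates]
        self.threshold_ = float(candidates[int(np.argmax(accs))])
        return self

    def predict(self, y_score):
        return (y_score[:, 1] >= self.threshold_).astype(int)


X, y = make_classification(n_samples=400, n_features=5, random_state=0)

# 1. Split: the 'train' part is for the estimator, the 'test' part
#    (val_size=0.25 here) is held out for the post-processor.
X_est, X_post, y_est, y_post = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 2. Fit the estimator on its own portion only.
estimator = LogisticRegression().fit(X_est, y_est)

# 3. Predict on the held-out portion and fit the post-processor on the
#    resulting (prediction, ground truth) pairs.
y_score = estimator.predict_proba(X_post)
postprocessor = ThresholdPostprocessor().fit(y_score, y_post)
```

Because the post-processor never sees predictions on samples the estimator was trained on, its fit is not biased by the estimator's overconfidence on its own training data.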

predict(X)

Predict class labels for the given samples.

First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True), then returns the post-processed output from those predictions.

Parameters: X (pandas.DataFrame) – Test samples.
Returns: numpy.ndarray – Predicted class label per sample.
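The two-stage prediction flow described for predict() can be sketched as follows. This is a hedged illustration rather than the library's code: postprocessed_predict and CutoffPostprocessor are hypothetical names, and the needs_proba argument plays the role of self.needs_proba_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


class CutoffPostprocessor:
    """Hypothetical post-processor: thresholds scores at a fixed cutoff."""

    def __init__(self, cutoff=0.5):
        self.cutoff = cutoff

    def predict(self, y_score):
        y_score = np.asarray(y_score, dtype=float)
        # 2-D input: class-probability columns; 1-D input: hard labels.
        pos = y_score[:, 1] if y_score.ndim == 2 else y_score
        return (pos >= self.cutoff).astype(int)


def postprocessed_predict(estimator, postprocessor, X, needs_proba):
    """Hypothetical helper mirroring the described flow."""
    # Stage 1: the wrapped estimator produces labels or probabilities.
    y_score = (estimator.predict_proba(X) if needs_proba
               else estimator.predict(X))
    # Stage 2: the post-processor maps them to final labels.
    return postprocessor.predict(y_score)


X, y = make_classification(n_samples=200, n_features=5, random_state=0)
estimator = LogisticRegression().fit(X, y)
post = CutoffPostprocessor(cutoff=0.5)

from_labels = postprocessed_predict(estimator, post, X, needs_proba=False)
from_scores = postprocessed_predict(estimator, post, X, needs_proba=True)
```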
predict_log_proba(X)

Log of probability estimates.

First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True), then returns the post-processed output from those predictions.

The returned estimates are ordered by class label.

Parameters: X (pandas.DataFrame) – Test samples.
Returns: array – Log-probability of each sample for each class in the model, with classes ordered as they are in self.classes_.
predict_proba(X)

Probability estimates.

First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True), then returns the post-processed output from those predictions.

The returned estimates are ordered by class label.

Parameters: X (pandas.DataFrame) – Test samples.
Returns: numpy.ndarray – Probability of each sample for each class in the model, with classes ordered as they are in self.classes_.
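The column-ordering convention referred to above can be illustrated with a plain scikit-learn classifier. This sketch does not use the meta-estimator itself. It only assumes that the same classes_ convention applies, with column j of the output corresponding to classes_[j].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny toy problem with string labels; classes_ is sorted by label.
X = np.array([[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]])
y = np.array(["no", "no", "no", "yes", "yes", "yes"])

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)          # shape (n_samples, n_classes)
log_proba = clf.predict_log_proba(X)  # elementwise log of proba

# Look up a class's column through classes_ rather than hard-coding it.
yes_col = list(clf.classes_).index("yes")
p_yes = proba[:, yes_col]
```

Indexing through classes_ keeps downstream code correct even if the label set or its sort order changes.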
score(X, y, sample_weight=None)

Returns the output of the post-processor’s score function on the given test data and labels.

First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True), then post-processes those predictions and scores the result.

Parameters:
  • X (pandas.DataFrame) – Test samples.
  • y (array-like) – True labels for X.
  • sample_weight (array-like, optional) – Sample weights.
Returns:

float – Score value.