aif360.sklearn.postprocessing.PostProcessingMeta

class aif360.sklearn.postprocessing.PostProcessingMeta(estimator, postprocessor=CalibratedEqualizedOdds(), needs_proba=None, prefit=False, val_size=0.25, **options)

A meta-estimator which wraps a given estimator with a post-processing step.

To prevent leakage, the post-processor is trained on a held-out portion of the training data that the estimator is not fitted on.

Note

Because of the dataset splitting, if a Pipeline is necessary, it should be passed as the estimator input to this meta-estimator, not the other way around.

Variables:

estimator_ – Fitted estimator.
postprocessor_ – Fitted postprocessor.
needs_proba_ (bool) – Whether the postprocessor receives probability estimates. Determined from the postprocessor type if needs_proba is None.

Parameters:

estimator (sklearn.BaseEstimator) – Original estimator.
postprocessor – Post-processing algorithm.
needs_proba (bool) – Use self.estimator_.predict_proba() instead of self.estimator_.predict() as input to postprocessor. If None, defaults to True if the postprocessor supports it.
prefit (bool) – If True, it is assumed that estimator has been fitted already and all data is used to train postprocessor.
val_size (int or float) – Size of validation set used to fit the postprocessor. The estimator fits on the remainder of the training set. See train_test_split() for details.
**options – Keyword options passed through to train_test_split(). Note: ‘train_size’ and ‘test_size’ will be ignored in favor of ‘val_size’.
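
The interaction between val_size and **options can be pictured with a small sketch. This is not the library's code: `split_for_postprocessing` is a hypothetical helper, and silently dropping the conflicting keys is an assumption about how "ignored in favor of val_size" might be realized. Only `train_test_split()` from scikit-learn is a real call.

```python
from sklearn.model_selection import train_test_split


def split_for_postprocessing(X, y, val_size=0.25, **options):
    """Hypothetical helper: split data so that val_size always wins."""
    # Assumption: conflicting size options are simply discarded.
    options = {k: v for k, v in options.items()
               if k not in ("train_size", "test_size")}
    # The 'test' part of the split becomes the post-processor's training data.
    return train_test_split(X, y, test_size=val_size, **options)


X = list(range(20))
y = [i % 2 for i in X]

# test_size=0.5 is passed on purpose to show that val_size takes precedence.
X_est, X_post, y_est, y_post = split_for_postprocessing(
    X, y, val_size=0.25, test_size=0.5, random_state=0)
```

With 20 samples and val_size=0.25, the estimator portion holds 15 samples and the post-processor portion holds 5, regardless of the test_size that was passed.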

Methods

`fit`	Splits the training samples with `train_test_split()` and uses the resultant ‘train’ portion to train the estimator.
`get_params`	Get parameters for this estimator.
`predict`	Predict class labels for the given samples.
`predict_log_proba`	Log of probability estimates.
`predict_proba`	Probability estimates.
`score`	Returns the output of the post-processor’s score function on the given test data and labels.
`set_params`	Set the parameters of this estimator.

__init__(estimator, postprocessor=CalibratedEqualizedOdds(), needs_proba=None, prefit=False, val_size=0.25, **options)

Parameters:

estimator (sklearn.BaseEstimator) – Original estimator.
postprocessor – Post-processing algorithm.
needs_proba (bool) – Use self.estimator_.predict_proba() instead of self.estimator_.predict() as input to postprocessor. If None, defaults to True if the postprocessor supports it.
prefit (bool) – If True, it is assumed that estimator has been fitted already and all data is used to train postprocessor.
val_size (int or float) – Size of validation set used to fit the postprocessor. The estimator fits on the remainder of the training set. See train_test_split() for details.
**options – Keyword options passed through to train_test_split(). Note: ‘train_size’ and ‘test_size’ will be ignored in favor of ‘val_size’.

fit(X, y, sample_weight=None, **fit_params)

Splits the training samples with train_test_split() and uses the resultant ‘train’ portion to train the estimator. The estimator then predicts on the ‘test’ portion of the split data, and the post-processor is trained on the resulting pairs of predictions and ground-truth targets.

Parameters:

X (array-like) – Training samples.
y (pandas.Series) – Training labels.
sample_weight (array-like, optional) – Sample weights.
**fit_params – Parameters passed to the post-processor `fit()` method. Note: these do not need to be prefixed with `__` notation.
Returns:	self
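
The two-stage flow described for fit() can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `ThresholdPostprocessor` and `fit_with_holdout` are hypothetical names, the toy post-processor only tunes a score cutoff, and the data is synthetic. The scikit-learn calls are real.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


class ThresholdPostprocessor:
    """Hypothetical stand-in: picks the score cutoff with the best accuracy."""

    def fit(self, y_score, y_true):
        candidates = np.linspace(0.05, 0.95, 19)
        accuracies = [np.mean((y_score >= t) == y_true) for t in candidates]
        self.threshold_ = float(candidates[int(np.argmax(accuracies))])
        return self

    def predict(self, y_score):
        return (y_score >= self.threshold_).astype(int)


def fit_with_holdout(estimator, postprocessor, X, y, val_size=0.25, seed=0):
    # 1. Split: the estimator and the post-processor see disjoint samples.
    X_est, X_post, y_est, y_post = train_test_split(
        X, y, test_size=val_size, random_state=seed)
    # 2. Train the estimator on the 'train' portion only.
    estimator.fit(X_est, y_est)
    # 3. Predict on the held-out portion and train the post-processor
    #    on the resulting (prediction, ground truth) pairs.
    scores = estimator.predict_proba(X_post)[:, 1]
    postprocessor.fit(scores, y_post)
    return estimator, postprocessor


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

est, post = fit_with_holdout(LogisticRegression(), ThresholdPostprocessor(), X, y)
```

Because the post-processor is fitted on predictions for samples the estimator never trained on, its cutoff is not tuned to overly optimistic in-sample scores.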

predict(X)

Predict class labels for the given samples.

First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True) then returns the post-processed output from those predictions.

Parameters:	X (pandas.DataFrame) – Test samples.
Returns:	numpy.ndarray – Predicted class label per sample.
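
A minimal sketch of the chaining described above, using stub objects so that it runs without any dependencies. `StubEstimator`, `StrictPostprocessor`, and `predict_post` are hypothetical names introduced for illustration; the real post-processors have their own input requirements.

```python
class StubEstimator:
    """Hypothetical estimator over 1-D inputs in [0, 1]."""

    def predict(self, X):
        return [int(x >= 0.5) for x in X]

    def predict_proba(self, X):
        return [[1.0 - x, x] for x in X]


class StrictPostprocessor:
    """Hypothetical post-processor: positive only if the input score >= 0.8."""

    def predict(self, y_pred):
        # Accepts hard labels (ints) or [p0, p1] probability rows.
        scores = [row[1] if isinstance(row, list) else row for row in y_pred]
        return [int(s >= 0.8) for s in scores]


def predict_post(estimator, postprocessor, X, needs_proba):
    # Choose which estimator output is handed to the post-processor.
    y_pred = estimator.predict_proba(X) if needs_proba else estimator.predict(X)
    return postprocessor.predict(y_pred)


X = [0.2, 0.6, 0.9]
from_proba = predict_post(StubEstimator(), StrictPostprocessor(), X, needs_proba=True)
from_labels = predict_post(StubEstimator(), StrictPostprocessor(), X, needs_proba=False)
print(from_proba)   # [0, 0, 1]
print(from_labels)  # [0, 1, 1]
```

The second sample shows why the flag matters: the post-processor can only revise a borderline decision when it receives the probability, not the already-thresholded label.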

predict_log_proba(X)

Log of probability estimates.

First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True) then returns the post-processed output from those predictions.

The returned estimates for all classes are ordered by the label of classes.

Parameters:	X (pandas.DataFrame) – Test samples.
Returns:	array – Returns the log-probability of the sample for each class in the model, where classes are ordered as they are in `self.classes_`.

predict_proba(X)

Probability estimates.

First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True) then returns the post-processed output from those predictions.

The returned estimates for all classes are ordered by the label of classes.

Parameters:	X (pandas.DataFrame) – Test samples.
Returns:	numpy.ndarray – Returns the probability of the sample for each class in the model, where classes are ordered as they are in `self.classes_`.
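
As a small illustration of the column ordering and of how the log-probabilities relate to the probabilities, consider the sketch below. The arrays are made-up values standing in for a `predict_proba()` result and for `self.classes_`; nothing here calls the library itself.

```python
import numpy as np

# Made-up values: column j corresponds to classes[j].
classes = np.array([0, 1])
proba = np.array([[0.90, 0.10],
                  [0.25, 0.75]])

# Log-probabilities are the element-wise natural log of the probabilities.
log_proba = np.log(proba)

# The most likely class per sample is found through the column index.
predicted = classes[np.argmax(log_proba, axis=1)]
```

Since the logarithm is monotonic, taking the argmax over either array selects the same class.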

score(X, y, sample_weight=None)

Returns the output of the post-processor’s score function on the given test data and labels.

First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True) then gets the post-processed output from those predictions and scores it.

Parameters:

X (pandas.DataFrame) – Test samples.
y (array-like) – True labels for X.
sample_weight (array-like, optional) – Sample weights.
Returns:	float – Score value.
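
To show how sample_weight can influence a score, here is a sketch that assumes the score is a weighted accuracy. That is an assumption for illustration only: the value actually returned depends on the post-processor's own score function, and `weighted_accuracy` is a hypothetical helper.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Hypothetical score: fraction of total weight that is predicted correctly."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [1, 0, 1, 1]
y_post = [1, 0, 0, 1]           # post-processed predictions (made up)
weights = [1.0, 1.0, 2.0, 1.0]  # the misclassified sample counts double

unweighted = weighted_accuracy(y_true, y_post)
weighted = weighted_accuracy(y_true, y_post, weights)
print(unweighted)  # 0.75
print(weighted)    # 0.6
```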
