aif360.sklearn.postprocessing.PostProcessingMeta

class aif360.sklearn.postprocessing.PostProcessingMeta(estimator, postprocessor=CalibratedEqualizedOdds(), needs_proba=None, prefit=False, val_size=0.25, **options)

A meta-estimator which wraps a given estimator with a post-processing step.
The post-processor is trained on a different split of the training data than the estimator, to prevent leakage.
Note
Because of the dataset splitting, if a Pipeline is needed, it should be passed to this meta-estimator as the estimator, not the other way around.
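The leakage-avoiding split can be illustrated with a small plain-Python sketch. The helper split_for_postprocessing() below is hypothetical and not part of aif360; it only mimics the val_size convention of scikit-learn's train_test_split() (a float is a fraction of the samples, rounded up; an int is an absolute count) to show that the estimator and the post-processor end up with disjoint samples.

```python
import math
import random


def split_for_postprocessing(n_samples, val_size=0.25, seed=0):
    """Hypothetical helper (not aif360): split sample indices into an
    estimator part and a disjoint post-processor (validation) part.

    A float val_size is treated as a fraction (rounded up); an int
    val_size is treated as an absolute number of samples.
    """
    if isinstance(val_size, float):
        n_val = math.ceil(val_size * n_samples)
    else:
        n_val = int(val_size)
    if not 0 < n_val < n_samples:
        raise ValueError("val_size must leave samples for both parts")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    post_idx = sorted(indices[:n_val])  # used only to fit the post-processor
    est_idx = sorted(indices[n_val:])   # used only to fit the estimator
    return est_idx, post_idx


est_idx, post_idx = split_for_postprocessing(10, val_size=0.25)
print(len(est_idx), len(post_idx))        # 7 3
print(set(est_idx).isdisjoint(post_idx))  # True
```

Because no index appears in both parts, the post-processor is never fit on predictions the estimator made for its own training samples.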
Variables:
- estimator_ – Fitted estimator.
- postprocessor_ – Fitted postprocessor.
- needs_proba_ (bool) – Determined from the postprocessor type if needs_proba is None.
Parameters:
- estimator (sklearn.BaseEstimator) – Original estimator.
- postprocessor – Post-processing algorithm.
- needs_proba (bool) – Use self.estimator_.predict_proba() instead of self.estimator_.predict() as input to the postprocessor. If None, defaults to True if the postprocessor supports it.
- prefit (bool) – If True, it is assumed that estimator has already been fitted, and all data is used to train the postprocessor.
- val_size (int or float) – Size of the validation set used to fit the postprocessor. The estimator is fit on the remainder of the training set. See train_test_split() for details.
- **options – Keyword options passed through to train_test_split(). Note: 'train_size' and 'test_size' will be ignored in favor of 'val_size'.
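The option handling and the needs_proba default described above can be sketched as follows. Both helpers are hypothetical, not aif360 code. In particular, the supports_proba attribute is an assumption made for this sketch; the documentation only says the default depends on whether the postprocessor supports probabilities, not how that is detected.

```python
def resolve_split_kwargs(val_size=0.25, **options):
    """Hypothetical helper: build keyword arguments for train_test_split().

    'train_size' and 'test_size' are discarded; val_size becomes the size
    of the held-out split that the post-processor is fit on.
    """
    kwargs = {k: v for k, v in options.items()
              if k not in ("train_size", "test_size")}
    kwargs["test_size"] = val_size
    return kwargs


def resolve_needs_proba(needs_proba, postprocessor):
    """Hypothetical helper: an explicit True/False wins; None falls back to
    whether the post-processor accepts probabilities. `supports_proba` is
    an assumed attribute for this sketch, not an aif360 API."""
    if needs_proba is not None:
        return bool(needs_proba)
    return bool(getattr(postprocessor, "supports_proba", False))


class ScorePostprocessor:
    """Hypothetical post-processor that consumes probability scores."""
    supports_proba = True


print(resolve_split_kwargs(0.3, test_size=0.5, train_size=0.5, random_state=0))
# {'random_state': 0, 'test_size': 0.3}
print(resolve_needs_proba(None, ScorePostprocessor()))   # True
print(resolve_needs_proba(False, ScorePostprocessor()))  # False
```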
Methods
- fit – Splits the training samples with train_test_split() and uses the resultant 'train' portion to train the estimator.
- get_params – Get parameters for this estimator.
- predict – Predict class labels for the given samples.
- predict_log_proba – Log of probability estimates.
- predict_proba – Probability estimates.
- score – Returns the output of the post-processor's score function on the given test data and labels.
- set_params – Set the parameters of this estimator.
fit(X, y, sample_weight=None, **fit_params)

Splits the training samples with train_test_split() and uses the resultant 'train' portion to train the estimator. The estimator then predicts on the 'test' portion of the split, and the post-processor is trained on those prediction/ground-truth pairs.

Parameters:
- X (array-like) – Training samples.
- y (pandas.Series) – Training labels.
- sample_weight (array-like, optional) – Sample weights.
- **fit_params – Parameters passed to the post-processor's fit() method. Note: these do not need to be prefixed with the __ notation.

Returns: self
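The two-stage fit can be sketched end to end with toy components. Everything below (MeanThresholdClassifier, RelabelPostprocessor, ToyPostProcessingMeta) is hypothetical and much simpler than aif360: it works on 1-D lists, accepts only a fractional val_size, and does not clone the wrapped estimators. It only shows the data flow: fit the estimator on one split, predict on the held-out split, and fit the post-processor on those predictions and the true labels; with prefit=True, all data goes to the post-processor.

```python
import math
import random
from collections import Counter, defaultdict


class MeanThresholdClassifier:
    """Toy estimator on 1-D inputs: predicts 1 when x exceeds the training mean."""

    def fit(self, X, y):
        self.threshold_ = sum(X) / len(X)
        self.n_fit_ = len(X)  # remembers how many samples it was fit on
        return self

    def predict(self, X):
        return [int(x > self.threshold_) for x in X]


class RelabelPostprocessor:
    """Toy post-processor: maps each predicted label to the true label most
    often seen with it on the data the post-processor is fit on."""

    def fit(self, y_pred, y_true):
        counts = defaultdict(Counter)
        for p, t in zip(y_pred, y_true):
            counts[p][t] += 1
        self.mapping_ = {p: c.most_common(1)[0][0] for p, c in counts.items()}
        return self

    def predict(self, y_pred):
        return [self.mapping_.get(p, p) for p in y_pred]


class ToyPostProcessingMeta:
    """Hypothetical, simplified stand-in for PostProcessingMeta.fit()."""

    def __init__(self, estimator, postprocessor, prefit=False, val_size=0.25, seed=0):
        self.estimator = estimator
        self.postprocessor = postprocessor
        self.prefit = prefit
        self.val_size = val_size  # fraction only in this sketch
        self.seed = seed

    def fit(self, X, y):
        if self.prefit:
            # Estimator is already fitted: all data trains the post-processor.
            self.estimator_ = self.estimator
            X_post, y_post = X, y
        else:
            idx = list(range(len(X)))
            random.Random(self.seed).shuffle(idx)
            n_val = math.ceil(self.val_size * len(X))
            post, est = idx[:n_val], idx[n_val:]
            # The estimator only sees the 'train' portion...
            self.estimator_ = self.estimator.fit([X[i] for i in est], [y[i] for i in est])
            # ...and the post-processor only the held-out portion.
            X_post, y_post = [X[i] for i in post], [y[i] for i in post]
        y_pred = self.estimator_.predict(X_post)
        self.postprocessor_ = self.postprocessor.fit(y_pred, y_post)
        return self


X = list(range(20))
y = [int(x >= 10) for x in X]
meta = ToyPostProcessingMeta(MeanThresholdClassifier(), RelabelPostprocessor()).fit(X, y)
print(meta.estimator_.n_fit_)  # 15 (5 of the 20 samples are held out)
```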
predict(X)

Predict class labels for the given samples.

First runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True), then returns the post-processed output from those predictions.

Parameters: X (pandas.DataFrame) – Test samples.

Returns: numpy.ndarray – Predicted class label per sample.
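The prediction path described above amounts to a dispatch on needs_proba followed by the post-processor. The sketch below uses hypothetical toy classes and a hypothetical meta_predict() function, not aif360 code. It also shows why needs_proba matters: a post-processor that expects probabilities cannot consume hard labels.

```python
def meta_predict(estimator, postprocessor, X, needs_proba):
    """Hypothetical sketch of the predict() data flow (not aif360's code)."""
    scores = estimator.predict_proba(X) if needs_proba else estimator.predict(X)
    return postprocessor.predict(scores)


class FixedScoreEstimator:
    """Toy fitted estimator: P(y=1) is the input itself, clipped to [0, 1]."""

    def predict_proba(self, X):
        return [[1 - min(max(x, 0.0), 1.0), min(max(x, 0.0), 1.0)] for x in X]

    def predict(self, X):
        return [int(p1 >= 0.5) for _, p1 in self.predict_proba(X)]


class ThresholdPostprocessor:
    """Toy post-processor that consumes probabilities and applies its own cut-off."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, proba):
        return [int(p1 >= self.threshold) for _, p1 in proba]


class PassthroughPostprocessor:
    """Toy post-processor that consumes hard labels and returns them unchanged."""

    def predict(self, labels):
        return list(labels)


X = [0.2, 0.45, 0.7]
print(meta_predict(FixedScoreEstimator(), ThresholdPostprocessor(0.4), X, needs_proba=True))
# [0, 1, 1]
print(meta_predict(FixedScoreEstimator(), PassthroughPostprocessor(), X, needs_proba=False))
# [0, 0, 1]
```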
predict_log_proba(X)

Log of probability estimates.

First runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True), then returns the post-processed output from those predictions. The returned estimates for all classes are ordered by class label.

Parameters: X (pandas.DataFrame) – Test samples.

Returns: array – The log-probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
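Assuming predict_log_proba() is the elementwise natural log of the predict_proba() output (the documentation says "log of probability estimates" but does not show the computation), the relationship can be illustrated in plain Python. The helper log_proba_from_proba() is hypothetical. A zero probability maps to -inf, matching what numpy.log returns for 0.

```python
import math


def log_proba_from_proba(proba):
    """Hypothetical helper: elementwise natural log of per-class probabilities.
    A zero probability maps to -inf, as numpy.log(0) would give."""
    return [[math.log(p) if p > 0 else float("-inf") for p in row] for row in proba]


proba = [[0.25, 0.75], [1.0, 0.0]]
log_proba = log_proba_from_proba(proba)
print(log_proba[1])  # [0.0, -inf]
```

Column order is unchanged by the log, so both methods order classes the same way.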
predict_proba(X)

Probability estimates.

First runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True), then returns the post-processed output from those predictions. The returned estimates for all classes are ordered by class label.

Parameters: X (pandas.DataFrame) – Test samples.

Returns: numpy.ndarray – The probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
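"Ordered by class label" means column j of the output corresponds to the j-th entry of the sorted class labels, which is how scikit-learn estimators typically populate classes_. The helper to_proba_matrix() below is hypothetical; it only illustrates that column ordering for per-sample label-to-probability mappings.

```python
def to_proba_matrix(per_sample, classes=None):
    """Hypothetical helper: arrange per-sample {label: probability} dicts into
    rows whose columns follow sorted class labels (like self.classes_)."""
    if classes is None:
        classes = sorted({label for d in per_sample for label in d})
    return classes, [[d.get(c, 0.0) for c in classes] for d in per_sample]


classes, proba = to_proba_matrix([{"yes": 0.9, "no": 0.1}, {"no": 0.6, "yes": 0.4}])
print(classes)  # ['no', 'yes']
print(proba)    # [[0.1, 0.9], [0.6, 0.4]]
```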
score(X, y, sample_weight=None)

Returns the output of the post-processor's score function on the given test data and labels.

First runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True), then gets the post-processed output from those predictions and scores it.

Parameters:
- X (pandas.DataFrame) – Test samples.
- y (array-like) – True labels for X.
- sample_weight (array-like, optional) – Sample weights.

Returns: float – Score value.
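Scoring is delegated rather than computed by the meta-estimator itself. In the sketch below, the classes, meta_score(), and the score(y_pred, y_true, sample_weight) signature are all hypothetical; the real post-processor's score function may take different arguments. The toy post-processor scores with weighted accuracy, so the same estimator output yields different values depending on sample_weight.

```python
class ConstantEstimator:
    """Toy fitted estimator that always predicts class 1."""

    def predict(self, X):
        return [1] * len(X)


class PassthroughScorer:
    """Toy post-processor: returns labels unchanged; score() is the weighted
    accuracy of its post-processed output against the true labels."""

    def predict(self, y_pred):
        return list(y_pred)

    def score(self, y_pred, y_true, sample_weight=None):
        y_post = self.predict(y_pred)
        w = [1.0] * len(y_true) if sample_weight is None else list(sample_weight)
        hits = sum(wi for p, t, wi in zip(y_post, y_true, w) if p == t)
        return hits / sum(w)


def meta_score(estimator, postprocessor, X, y, sample_weight=None, needs_proba=False):
    """Hypothetical sketch of score(): get the estimator's output, then let the
    post-processor post-process and score it."""
    scores = estimator.predict_proba(X) if needs_proba else estimator.predict(X)
    return postprocessor.score(scores, y, sample_weight=sample_weight)


X, y = [[0], [0], [0], [0]], [1, 1, 0, 1]
print(meta_score(ConstantEstimator(), PassthroughScorer(), X, y))  # 0.75
print(meta_score(ConstantEstimator(), PassthroughScorer(), X, y, sample_weight=[1, 1, 2, 0]))  # 0.5
```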