aif360.sklearn.postprocessing.PostProcessingMeta
class aif360.sklearn.postprocessing.PostProcessingMeta(estimator, postprocessor=CalibratedEqualizedOdds(), needs_proba=None, prefit=False, val_size=0.25, **options)
A meta-estimator which wraps a given estimator with a post-processing step.
The post-processor trains on a separate training set from the estimator to prevent leakage.
Note
Because of the dataset splitting, if a Pipeline is necessary, it should be used as the input to this meta-estimator, not the other way around.
Variables: - estimator_ – Fitted estimator.
- postprocessor_ – Fitted postprocessor.
- needs_proba_ (bool) – Determined depending on the postprocessor type if needs_proba is None.
Parameters: - estimator (sklearn.BaseEstimator) – Original estimator.
- postprocessor – Post-processing algorithm.
- needs_proba (bool) – Use self.estimator_.predict_proba() instead of self.estimator_.predict() as input to postprocessor. If None, defaults to True if the postprocessor supports it.
- prefit (bool) – If True, it is assumed that estimator has been fitted already and all data is used to train postprocessor.
- val_size (int or float) – Size of validation set used to fit the postprocessor. The estimator fits on the remainder of the training set. See train_test_split() for details.
- **options – Keyword options passed through to train_test_split(). Note: ‘train_size’ and ‘test_size’ will be ignored in favor of ‘val_size’.
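The rule that ‘val_size’ takes precedence over ‘train_size’ and ‘test_size’ can be sketched roughly as below. This is only an illustration, not the library's code: the helper name resolve_split_options is hypothetical, and it assumes the validation fraction is forwarded to train_test_split() as test_size.

```python
def resolve_split_options(val_size=0.25, **options):
    """Build keyword arguments for a train/validation split.

    Hypothetical helper: mirrors the documented rule that 'train_size' and
    'test_size' are ignored in favor of 'val_size'.
    """
    # Drop the two keys that val_size overrides; keep everything else.
    kwargs = {k: v for k, v in options.items()
              if k not in ("train_size", "test_size")}
    # Assumption: the validation portion plays the role of the 'test' split.
    kwargs["test_size"] = val_size
    return kwargs


# A user-supplied test_size is discarded; random_state passes through.
split_kwargs = resolve_split_options(val_size=0.3, test_size=0.5, random_state=0)
print(split_kwargs)
```

Other options, such as random_state or shuffle, would pass through unchanged under this assumption.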
Methods
- fit – Splits the training samples with train_test_split() and uses the resultant ‘train’ portion to train the estimator.
- get_params – Get parameters for this estimator.
- predict – Predict class labels for the given samples.
- predict_log_proba – Log of probability estimates.
- predict_proba – Probability estimates.
- score – Returns the output of the post-processor’s score function on the given test data and labels.
- set_params – Set the parameters of this estimator.
__init__(estimator, postprocessor=CalibratedEqualizedOdds(), needs_proba=None, prefit=False, val_size=0.25, **options)
Parameters: - estimator (sklearn.BaseEstimator) – Original estimator.
- postprocessor – Post-processing algorithm.
- needs_proba (bool) – Use self.estimator_.predict_proba() instead of self.estimator_.predict() as input to postprocessor. If None, defaults to True if the postprocessor supports it.
- prefit (bool) – If True, it is assumed that estimator has been fitted already and all data is used to train postprocessor.
- val_size (int or float) – Size of validation set used to fit the postprocessor. The estimator fits on the remainder of the training set. See train_test_split() for details.
- **options – Keyword options passed through to train_test_split(). Note: ‘train_size’ and ‘test_size’ will be ignored in favor of ‘val_size’.
fit(X, y, sample_weight=None, **fit_params)
Splits the training samples with train_test_split() and uses the resultant ‘train’ portion to train the estimator. Then the estimator predicts on the ‘test’ portion of the split data and the post-processor is trained with those prediction-ground-truth target pairs.
Parameters: - X (array-like) – Training samples.
- y (pandas.Series) – Training labels.
- sample_weight (array-like, optional) – Sample weights.
- **fit_params – Parameters passed to the post-processor fit() method. Note: these do not need to be prefixed with __ notation.
Returns: self
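The split-then-fit flow described for fit() can be illustrated with a small standard-library sketch. Everything here is a stand-in: MajorityEstimator, RecordingPostprocessor, and fit_with_holdout are hypothetical names, and the shuffle-based split only approximates what train_test_split() does.

```python
import random


class MajorityEstimator:
    """Toy stand-in for an estimator: always predicts the most common label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class RecordingPostprocessor:
    """Toy stand-in for a post-processor: stores the pairs it was fitted on."""

    def fit(self, y_pred, y_true):
        self.pairs_ = list(zip(y_pred, y_true))
        return self


def fit_with_holdout(estimator, postprocessor, X, y, val_size=0.25, seed=0):
    """Hypothetical sketch of the documented fit() flow."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(val_size * len(X)))
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    # 1. The estimator sees only the 'train' portion.
    estimator.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    # 2. The estimator predicts on the held-out portion.
    y_pred = estimator.predict([X[i] for i in val_idx])
    # 3. The post-processor is fitted on (prediction, ground truth) pairs.
    postprocessor.fit(y_pred, [y[i] for i in val_idx])
    return estimator, postprocessor, train_idx, val_idx


X = [[i] for i in range(8)]
y = [0, 0, 0, 1, 0, 1, 0, 0]
est, post, train_idx, val_idx = fit_with_holdout(
    MajorityEstimator(), RecordingPostprocessor(), X, y)
```

Because the two index sets are disjoint, the post-processor never sees a sample the estimator was trained on, which is the leakage guard the class description refers to.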
predict(X)
Predict class labels for the given samples.
First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True) then returns the post-processed output from those predictions.
Parameters: X (pandas.DataFrame) – Test samples.
Returns: numpy.ndarray – Predicted class label per sample.
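The two-stage prediction described above (estimator output first, post-processing second) might look like the following sketch. ToyEstimator, ThresholdPostprocessor, and postprocessed_predict are hypothetical stand-ins written for illustration; real post-processors in the library may expect differently shaped inputs.

```python
class ToyEstimator:
    """Toy stand-in: the single input feature, clipped to [0, 1], is P(class 1)."""

    def predict_proba(self, X):
        rows = []
        for x in X:
            p1 = min(max(x[0], 0.0), 1.0)
            rows.append([1.0 - p1, p1])  # [P(class 0), P(class 1)]
        return rows

    def predict(self, X):
        return [int(row[1] >= 0.5) for row in self.predict_proba(X)]


class ThresholdPostprocessor:
    """Toy post-processor: relabels using its own threshold (hypothetical)."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold

    def predict(self, y_pred):
        out = []
        for p in y_pred:
            # Accept either a probability row or a hard label.
            score = p[1] if isinstance(p, (list, tuple)) else float(p)
            out.append(int(score >= self.threshold))
        return out


def postprocessed_predict(estimator, postprocessor, X, needs_proba):
    # Pick the estimator output that the post-processor expects.
    if needs_proba:
        y_pred = estimator.predict_proba(X)
    else:
        y_pred = estimator.predict(X)
    return postprocessor.predict(y_pred)


X = [[0.2], [0.6], [0.9]]
est, post = ToyEstimator(), ThresholdPostprocessor(threshold=0.7)
labels_from_scores = postprocessed_predict(est, post, X, needs_proba=True)
labels_from_labels = postprocessed_predict(est, post, X, needs_proba=False)
```

In this toy setup the middle sample is labeled differently depending on whether the post-processor receives scores or hard labels, which is why the needs_proba setting matters.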
predict_log_proba(X)
Log of probability estimates.
First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True) then returns the post-processed output from those predictions.
The returned estimates for all classes are ordered by the label of classes.
Parameters: X (pandas.DataFrame) – Test samples.
Returns: array – Returns the log-probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
predict_proba(X)
Probability estimates.
First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True) then returns the post-processed output from those predictions.
The returned estimates for all classes are ordered by the label of classes.
Parameters: X (pandas.DataFrame) – Test samples.
Returns: numpy.ndarray – Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
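As a rough illustration of how probability and log-probability outputs relate, and of what "ordered as they are in self.classes_" means for column lookup, consider the sketch below. It assumes the post-processor follows the usual scikit-learn convention that log-probabilities are the element-wise natural log of the probabilities; the helper log_proba and the sample values are made up for the example.

```python
import math


def log_proba(proba_rows):
    """Element-wise natural log of probability rows (hypothetical helper)."""
    return [[math.log(p) if p > 0 else float("-inf") for p in row]
            for row in proba_rows]


# Column j of every row corresponds to classes[j].
classes = [0, 1]
proba = [[0.25, 0.75], [1.0, 0.0]]
log_p = log_proba(proba)

# Look up the column for the positive class by its position in `classes`.
pos_col = classes.index(1)
positive_scores = [row[pos_col] for row in proba]
```

Looking up the column through the class list, rather than hard-coding an index, keeps the code correct if the label ordering differs.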
score(X, y, sample_weight=None)
Returns the output of the post-processor’s score function on the given test data and labels.
First, runs self.estimator_.predict() (or predict_proba() if self.needs_proba_ is True) then gets the post-processed output from those predictions and scores it.
Parameters: - X (pandas.DataFrame) – Test samples.
- y (array-like) – True labels for X.
- sample_weight (array-like, optional) – Sample weights.
Returns: float – Score value.
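The effect of sample_weight on a score can be seen in a small sketch. It assumes the post-processor's score function is (weighted) accuracy, which is the scikit-learn default for classifiers but is not confirmed here for any particular post-processor; weighted_accuracy is a hypothetical helper.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of total weight carried by correctly labeled samples."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_post = [0, 1, 0, 0]  # post-processed labels; the third sample is wrong

plain = weighted_accuracy(y_true, y_post)
weighted = weighted_accuracy(y_true, y_post, sample_weight=[1, 1, 2, 1])
```

Up-weighting the misclassified sample lowers the score, so weights should reflect how much each sample is meant to count.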