aif360.sklearn.postprocessing.RejectOptionClassifier
- class aif360.sklearn.postprocessing.RejectOptionClassifier(prot_attr=None, threshold=0.5, margin=0.1)[source]
Reject option based classification (ROC) post-processor.
Reject option classification is a post-processing technique that gives favorable outcomes to unprivileged groups and unfavorable outcomes to privileged groups in a confidence band around the decision boundary with the highest uncertainty [1].
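The rule described above can be sketched for a single sample in plain Python. This is a minimal illustration, not the library's implementation: the function name `reject_option_label`, the `neg_label` argument, and the choice to treat the edges of the band as inside the critical region are all assumptions made for this sketch.

```python
def reject_option_label(score, is_privileged, threshold=0.5, margin=0.1,
                        pos_label=1, neg_label=0):
    """Label one sample from its positive-class score and group membership.

    Hypothetical helper for illustration only.
    """
    if abs(score - threshold) <= margin:
        # Inside the critical region: the score is "rejected" and the label
        # is assigned by group instead (favorable for the unprivileged group,
        # unfavorable for the privileged group).
        return neg_label if is_privileged else pos_label
    # Outside the critical region: ordinary thresholding.
    return pos_label if score > threshold else neg_label


# Two samples with the same uncertain score receive different labels.
near_boundary_priv = reject_option_label(0.55, is_privileged=True)
near_boundary_unpriv = reject_option_label(0.55, is_privileged=False)
# A confident score is unaffected by group membership.
confident_priv = reject_option_label(0.9, is_privileged=True)
```

Only scores close to the threshold are reassigned, so predictions made with high confidence keep the label the underlying estimator would have given them.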
Note
A Pipeline expects a single estimation step but this class requires an estimator’s predictions as input. See PostProcessingMeta for a workaround.
See also
PostProcessingMeta, RejectOptionClassifierCV
References
- Variables:
prot_attr_ (str or list(str)) – Protected attribute(s) used for post-processing.
groups_ (array, shape (2,)) – A list of group labels known to the classifier. Note: this algorithm requires a binary division of the data.
classes_ (array, shape (num_classes,)) – A list of class labels known to the classifier. Note: this algorithm treats all non-positive outcomes as negative (binary classification only).
pos_label_ (scalar) – The label of the positive class.
priv_group_ (scalar) – The label of the privileged group.
Examples
RejectOptionClassifier can be easily paired with GridSearchCV to find the best threshold and margin with respect to a fairness measure:
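>>> import numpy as np
>>> from sklearn.metrics import make_scorer
>>> from sklearn.model_selection import GridSearchCV
>>> from aif360.sklearn.metrics import statistical_parity_difference
>>> from aif360.sklearn.postprocessing import RejectOptionClassifier
>>> roc = RejectOptionClassifier()
>>> param = [{'threshold': [t],
...           'margin': np.arange(0.05, min(t, 1-t)+0.025, 0.05)}
...          for t in np.arange(0.05, 1., 0.05)]
>>> stat_par = make_scorer(statistical_parity_difference)
>>> scoring = {'bal_acc': 'balanced_accuracy', 'stat_par': stat_par}
>>> def refit(cv_res):
...     # Best balanced accuracy among candidates whose mean statistical
...     # parity difference is not below -0.1.
...     return np.ma.array(cv_res['mean_test_bal_acc'],
...                        mask=cv_res['mean_test_stat_par'] < -0.1).argmax()
...
>>> grid = GridSearchCV(roc, param, scoring=scoring, refit=refit)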
Or, alternatively, this can be done in one step with RejectOptionClassifierCV:
>>> grid = RejectOptionClassifierCV(scoring='statistical_parity')
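The `refit` callable in the GridSearchCV example masks out every candidate whose mean statistical parity difference is below -0.1 and returns the index of the best balanced accuracy among the rest. The same selection rule can be sketched without NumPy. The function name `select_best` and the scores below are made up for illustration, and raising an error when no candidate qualifies is a choice of this sketch.

```python
def select_best(bal_acc, stat_par, min_stat_par=-0.1):
    """Index of the highest balanced accuracy among sufficiently fair candidates.

    Hypothetical helper; the inputs stand in for the 'mean_test_bal_acc' and
    'mean_test_stat_par' entries of a cross-validation result.
    """
    eligible = [i for i, s in enumerate(stat_par) if s >= min_stat_par]
    if not eligible:
        raise ValueError("no candidate satisfies the fairness constraint")
    return max(eligible, key=lambda i: bal_acc[i])


# Illustrative scores for four (threshold, margin) candidates.
bal_acc = [0.80, 0.78, 0.75, 0.70]
stat_par = [-0.25, -0.12, -0.08, -0.02]
best = select_best(bal_acc, stat_par)
```

Here the two most accurate candidates are excluded by the fairness constraint, so the third candidate (index 2) is selected.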
- Parameters:
prot_attr (single label or list-like, optional) – Protected attribute(s) to use in the post-processing. If more than one attribute, all combinations of values (intersections) are considered. Default is None meaning all protected attributes from the dataset are used. Note: This algorithm requires there be exactly 2 groups (privileged and unprivileged).
threshold (scalar) – Classification threshold. Probability estimates greater than this value are considered positive. Must be between 0 and 1.
margin (scalar) – Half width of the critical region. Estimates within the critical region are “rejected” and assigned according to their group. Must be between 0 and min(threshold, 1-threshold).
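The constraints on threshold and margin keep the critical region inside the interval from 0 to 1. A small sketch of that check follows; the helper name `check_params` and the floating-point tolerance `tol` are assumptions for illustration, and the library's own validation may differ.

```python
import math


def check_params(threshold, margin, tol=1e-9):
    """Validate threshold and margin, then return the critical region bounds.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    limit = min(threshold, 1.0 - threshold)
    # The tolerance absorbs rounding error, e.g. 1 - 0.9 is slightly below 0.1.
    if margin < 0.0 or margin > limit + tol:
        raise ValueError("margin must be between 0 and min(threshold, 1-threshold)")
    return threshold - margin, threshold + margin


lower, upper = check_params(threshold=0.5, margin=0.1)
```

With the default parameters the critical region spans roughly 0.4 to 0.6; a threshold of 0.2 would allow a margin of at most 0.2.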
Methods
fit
This is essentially a no-op; it simply validates the inputs and stores them for predict.
fit_predict
Predict class labels for the given scores.
get_metadata_routing
Get metadata routing of this object.
get_params
Get parameters for this estimator.
predict
Predict class labels for the given scores.
predict_proba
Probability estimates.
score
Return the mean accuracy on the given test data and labels.
set_fit_request
Request metadata passed to the fit method.
set_params
Set the parameters of this estimator.
set_score_request
Request metadata passed to the score method.
- __init__(prot_attr=None, threshold=0.5, margin=0.1)[source]
- Parameters:
prot_attr (single label or list-like, optional) – Protected attribute(s) to use in the post-processing. If more than one attribute, all combinations of values (intersections) are considered. Default is None meaning all protected attributes from the dataset are used. Note: This algorithm requires there be exactly 2 groups (privileged and unprivileged).
threshold (scalar) – Classification threshold. Probability estimates greater than this value are considered positive. Must be between 0 and 1.
margin (scalar) – Half width of the critical region. Estimates within the critical region are “rejected” and assigned according to their group. Must be between 0 and min(threshold, 1-threshold).
- fit(X, y, labels=None, pos_label=1, priv_group=1, sample_weight=None)[source]
This is essentially a no-op; it simply validates the inputs and stores them for predict.
- Parameters:
X (array-like) – Ignored.
y (array-like) – Ground-truth (correct) target values. Note: one of X or y must contain protected attribute information.
labels (list, optional) – The ordered set of label values. Must match the order of columns in X if provided. By default, all labels in y are used in sorted order.
pos_label (scalar, optional) – The label of the positive class.
priv_group (scalar, optional) – The label of the privileged group.
sample_weight (array-like, optional) – Ignored.
- Returns:
self
- fit_predict(X, y=None, **fit_params)[source]
Predict class labels for the given scores.
In general, it is not necessary to fit and predict separately so this method may be used instead. For subsequent predicts, it may be easier to use the predict method, though.
- Parameters:
X (pandas.DataFrame) – Probability estimates of the targets as returned by a predict_proba() call or equivalent. Note: must include protected attributes in the index.
y (array-like, optional) – Ground-truth (correct) target values. Note: if not provided, labels must be provided in **fit_params. See fit for details.
**fit_params – See fit for details.
- Returns:
numpy.ndarray – Predicted class label per sample.
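The relationship between fit, predict, and fit_predict can be illustrated with a toy stand-in class. Everything here is an assumption made for the sketch: `ToyRejectOption` is not the aif360 class, it takes a list of (group, score) pairs instead of a DataFrame with protected attributes in the index, and it requires exactly two labels in y.

```python
class ToyRejectOption:
    """Toy stand-in showing that fit only records conventions for predict."""

    def __init__(self, threshold=0.5, margin=0.1):
        self.threshold = threshold
        self.margin = margin

    def fit(self, X, y, pos_label=1, priv_group=1):
        # Nothing is learned from the data; the label set and the
        # positive-class and privileged-group conventions are stored.
        self.classes_ = sorted(set(y))
        if len(self.classes_) != 2:
            raise ValueError("this toy needs exactly two class labels")
        self.pos_label_ = pos_label
        self.priv_group_ = priv_group
        return self

    def predict(self, X):
        # X is a list of (group, positive-class score) pairs in this toy.
        neg_label = next(c for c in self.classes_ if c != self.pos_label_)
        labels = []
        for group, score in X:
            if abs(score - self.threshold) <= self.margin:
                # Critical region: assign by group membership.
                is_priv = group == self.priv_group_
                labels.append(neg_label if is_priv else self.pos_label_)
            else:
                above = score > self.threshold
                labels.append(self.pos_label_ if above else neg_label)
        return labels

    def fit_predict(self, X, y, **fit_params):
        # Equivalent to calling fit and then predict on the same scores.
        return self.fit(X, y, **fit_params).predict(X)


X = [(1, 0.55), (0, 0.45), (1, 0.9), (0, 0.1)]
y = [1, 0, 1, 0]
preds = ToyRejectOption().fit_predict(X, y)
```

In this toy run the first two samples fall in the critical region and are relabeled by group, while the last two keep their thresholded labels.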
- predict(X)[source]
Predict class labels for the given scores.
- Parameters:
X (pandas.DataFrame) – Probability estimates of the targets as returned by a predict_proba() call or equivalent. Note: must include protected attributes in the index.
- Returns:
numpy.ndarray – Predicted class label per sample.
- predict_proba(X)[source]
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
- Parameters:
X (pandas.DataFrame) – Probability estimates of the targets as returned by a predict_proba() call or equivalent. Note: must include protected attributes in the index.
- Returns:
numpy.ndarray – Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
- set_fit_request(*, labels: bool | None | str = '$UNCHANGED$', pos_label: bool | None | str = '$UNCHANGED$', priv_group: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') RejectOptionClassifier [source]
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
labels (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for labels parameter in fit.
pos_label (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for pos_label parameter in fit.
priv_group (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for priv_group parameter in fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self (object) – The updated object.
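The four request options listed for set_fit_request can be modeled with a toy function for a single parameter. This is only a sketch of the documented rules, not scikit-learn's routing code: the name `resolve_request`, the dictionary-based interface, and the use of ValueError are assumptions for illustration.

```python
def resolve_request(request, name, provided):
    """Return the keyword arguments forwarded to fit for one parameter.

    Hypothetical toy model: `request` is True, False, None, or an alias
    string, and `provided` maps the names the user passed to their values.
    """
    if request is True:
        # Requested: forwarded if present, silently skipped otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and providing it anyway is an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: the user passes the value under a different name.
        return {name: provided[request]} if request in provided else {}
    raise TypeError("request must be True, False, None, or a str alias")


forwarded = resolve_request(True, "priv_group", {"priv_group": 1})
aliased = resolve_request("group", "priv_group", {"group": 1})
dropped = resolve_request(False, "priv_group", {"priv_group": 1})
```

The alias case shows why a string request is useful: the caller passes `group`, and the estimator still receives it as `priv_group`.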
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RejectOptionClassifier [source]
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self (object) – The updated object.