aif360.sklearn.postprocessing.CalibratedEqualizedOdds
class aif360.sklearn.postprocessing.CalibratedEqualizedOdds(prot_attr=None, cost_constraint='weighted', random_state=None)
Calibrated equalized odds post-processor.
Calibrated equalized odds is a post-processing technique that optimizes over calibrated classifier score outputs to find probabilities with which to change output labels with an equalized odds objective [1].
Note
A Pipeline expects a single estimation step, but this class requires an estimator’s predictions as input. See PostProcessingMeta for a workaround.
References
[1] G. Pleiss, M. Raghavan, F. Wu, J. Kleinberg, and K. Q. Weinberger, “On Fairness and Calibration,” Conference on Neural Information Processing Systems, 2017. Adapted from: https://github.com/gpleiss/equalized_odds_and_calibration/blob/master/calib_eq_odds.py
Variables:
- prot_attr (str or list(str)) – Protected attribute(s) used for post-processing.
- groups (array, shape (2,)) – A list of group labels known to the classifier. Note: this algorithm requires a binary division of the data.
- classes (array, shape (num_classes,)) – A list of class labels known to the classifier. Note: this algorithm treats all non-positive outcomes as negative (binary classification only).
- pos_label (scalar) – The label of the positive class.
- mix_rates (array, shape (2,)) – The interpolation parameters, i.e. for each group, the probability of randomly returning the group’s base rate instead of the score. The mix rate of the group with the higher cost is set to 0.
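As a rough illustration of what a mix rate means, the sketch below replaces each score with the group's base rate with probability equal to the mix rate. This is a minimal NumPy sketch under stated assumptions, not the library's implementation; the helper name `apply_mix_rate` and the sample values are hypothetical.

```python
import numpy as np

def apply_mix_rate(scores, base_rate, mix_rate, rng):
    """Hypothetical helper: swap each score for the group's base rate
    with probability `mix_rate`, otherwise keep the original score."""
    scores = np.asarray(scores, dtype=float)
    # Draw one uniform number in [0, 1) per sample to decide the swap.
    swap = rng.random(scores.shape) < mix_rate
    return np.where(swap, base_rate, scores)

rng = np.random.default_rng(0)
scores = np.array([0.9, 0.2, 0.7, 0.4])  # made-up scores for one group
base_rate = 0.5                          # made-up base rate for that group

unchanged = apply_mix_rate(scores, base_rate, 0.0, rng)    # mix rate 0: scores kept
all_swapped = apply_mix_rate(scores, base_rate, 1.0, rng)  # mix rate 1: all base rate
```

A mix rate of 0 leaves the group's scores untouched, which matches the description above of the higher-cost group being left alone.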
Parameters:
- prot_attr (single label or list-like, optional) – Protected attribute(s) to use in the post-processing. If more than one attribute, all combinations of values (intersections) are considered. Default is None, meaning all protected attributes from the dataset are used. Note: This algorithm requires there be exactly 2 groups (privileged and unprivileged).
- cost_constraint ('fpr', 'fnr', or 'weighted') – Which equal-cost constraint to satisfy: generalized false positive rate (‘fpr’), generalized false negative rate (‘fnr’), or a weighted combination of both (‘weighted’).
- random_state (int or numpy.random.RandomState, optional) – Seed of pseudo-random number generator for sampling from the mix rates.
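To make the three cost constraints concrete, the following sketch computes a generalized false positive rate (mean score over true negatives), a generalized false negative rate (mean of one minus the score over true positives), and one possible weighted combination. The function bodies are written here for illustration only. Weighting by the base rate is an assumption, and the library's exact weighting may differ.

```python
import numpy as np

def generalized_fpr(y_true, scores, pos_label=1):
    """Mean predicted score over samples whose true label is negative."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    return scores[y_true != pos_label].mean()

def generalized_fnr(y_true, scores, pos_label=1):
    """Mean of (1 - score) over samples whose true label is positive."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    return (1.0 - scores[y_true == pos_label]).mean()

def weighted_cost(y_true, scores, pos_label=1):
    """Assumed weighting: combine both rates using the base rate."""
    base_rate = np.mean(np.asarray(y_true) == pos_label)
    fpr = generalized_fpr(y_true, scores, pos_label)
    fnr = generalized_fnr(y_true, scores, pos_label)
    return fpr * (1.0 - base_rate) + fnr * base_rate

y_true = np.array([1, 1, 0, 0])          # made-up labels
scores = np.array([0.8, 0.6, 0.4, 0.2])  # made-up positive-class scores

gfpr = generalized_fpr(y_true, scores)
gfnr = generalized_fnr(y_true, scores)
wcost = weighted_cost(y_true, scores)
```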
Methods
- fit – Compute the mixing rates required to satisfy the cost constraint.
- get_params – Get parameters for this estimator.
- predict – Predict class labels for the given scores.
- predict_proba – The returned estimates for all classes are ordered by the label of classes.
- score – Score the predictions according to the cost constraint specified.
- set_params – Set the parameters of this estimator.
__init__(prot_attr=None, cost_constraint='weighted', random_state=None)
Parameters:
- prot_attr (single label or list-like, optional) – Protected attribute(s) to use in the post-processing. If more than one attribute, all combinations of values (intersections) are considered. Default is None, meaning all protected attributes from the dataset are used. Note: This algorithm requires there be exactly 2 groups (privileged and unprivileged).
- cost_constraint ('fpr', 'fnr', or 'weighted') – Which equal-cost constraint to satisfy: generalized false positive rate (‘fpr’), generalized false negative rate (‘fnr’), or a weighted combination of both (‘weighted’).
- random_state (int or numpy.random.RandomState, optional) – Seed of pseudo-random number generator for sampling from the mix rates.
fit(X, y, labels=None, pos_label=1, sample_weight=None)
Compute the mixing rates required to satisfy the cost constraint.
Parameters:
- X (array-like) – Probability estimates of the targets as returned by a predict_proba() call or equivalent.
- y (pandas.Series) – Ground-truth (correct) target values.
- labels (list, optional) – The ordered set of label values. Must match the order of columns in X if provided. By default, all labels in y are used in sorted order.
- pos_label (scalar, optional) – The label of the positive class.
- sample_weight (array-like, optional) – Sample weights.
Returns: self
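The relationship between labels, pos_label, and the columns of X can be pictured with a small NumPy sketch. It only illustrates the "sorted order" default described above and is not the library's internal code; the label values and probabilities are made up.

```python
import numpy as np

y = np.array(["no", "yes", "yes", "no"])  # made-up ground-truth labels
proba = np.array([[0.9, 0.1],             # made-up probability estimates,
                  [0.3, 0.7],             # one column per label
                  [0.2, 0.8],
                  [0.6, 0.4]])

# Default ordering: all labels found in y, sorted.
labels = np.unique(y)

# The positive-class scores are the column whose position matches pos_label.
pos_label = "yes"
pos_idx = int(np.flatnonzero(labels == pos_label)[0])
pos_scores = proba[:, pos_idx]
```

If the columns of X follow a different order, passing labels explicitly keeps the column lookup consistent.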
predict(X)
Predict class labels for the given scores.
Parameters: X (pandas.DataFrame) – Probability estimates of the targets as returned by a predict_proba() call or equivalent. Note: must include protected attributes in the index.
Returns: numpy.ndarray – Predicted class label per sample.
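The requirement that X carry protected attributes in its index can be met by wrapping the probability array in a DataFrame with a suitable index. The sketch below uses plain pandas; the level names `id` and `sex` and all values are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up probability estimates, one column per class label.
proba = np.array([[0.7, 0.3],
                  [0.2, 0.8],
                  [0.6, 0.4]])

# Hypothetical index: a sample id plus one protected attribute level.
index = pd.MultiIndex.from_arrays(
    [[0, 1, 2], ["Male", "Female", "Female"]],
    names=["id", "sex"],
)
X = pd.DataFrame(proba, index=index, columns=[0, 1])

# The group of each sample can then be read back from the index.
groups = X.index.get_level_values("sex")
female_rows = X[groups == "Female"]
```

The same shape of input applies to predict_proba and score, which state the same index requirement.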
predict_proba(X)
The returned estimates for all classes are ordered by the label of classes.
Parameters: X (pandas.DataFrame) – Probability estimates of the targets as returned by a predict_proba() call or equivalent. Note: must include protected attributes in the index.
Returns: numpy.ndarray – Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
score(X, y, sample_weight=None)
Score the predictions according to the cost constraint specified.
Parameters:
- X (pandas.DataFrame) – Probability estimates of the targets as returned by a predict_proba() call or equivalent. Note: must include protected attributes in the index.
- y (array-like) – Ground-truth (correct) target values.
- sample_weight (array-like, optional) – Sample weights.
Returns: float – Absolute value of the difference in cost function for the two groups (e.g. generalized_fpr() if self.cost_constraint is ‘fpr’).
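The returned value can be pictured as the gap between the two groups' costs. The sketch below computes that gap for the ‘fpr’ case with plain NumPy, assuming the generalized false positive rate is the mean score over true negatives. The helper names and data are hypothetical, and sample weights are ignored for brevity.

```python
import numpy as np

def generalized_fpr(y_true, scores, pos_label=1):
    """Mean predicted score over samples whose true label is negative."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    return scores[y_true != pos_label].mean()

def cost_gap(y_true, scores, group, cost_fn):
    """Hypothetical helper: absolute cost difference between two groups."""
    y_true, scores, group = map(np.asarray, (y_true, scores, group))
    g0, g1 = np.unique(group)  # expects exactly two group labels
    cost0 = cost_fn(y_true[group == g0], scores[group == g0])
    cost1 = cost_fn(y_true[group == g1], scores[group == g1])
    return abs(cost0 - cost1)

y_true = np.array([0, 0, 1, 0, 0, 1])                # made-up labels
scores = np.array([0.2, 0.4, 0.9, 0.5, 0.7, 0.8])    # made-up scores
group = np.array([0, 0, 0, 1, 1, 1])                 # made-up binary groups

gap = cost_gap(y_true, scores, group, generalized_fpr)
```

A gap near 0 indicates that the two groups incur similar cost under the chosen constraint.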