aif360.metrics.BinaryLabelDatasetMetric

class aif360.metrics.BinaryLabelDatasetMetric(dataset, unprivileged_groups=None, privileged_groups=None)

    Class for computing metrics based on a single BinaryLabelDataset.

    Parameters:
        - dataset (BinaryLabelDataset) – A BinaryLabelDataset.
        - privileged_groups (list(dict)) – Privileged groups. Format is a list of dicts where the keys are protected_attribute_names and the values are values in protected_attributes. Each dict element describes a single group. See examples for more details.
        - unprivileged_groups (list(dict)) – Unprivileged groups in the same format as privileged_groups.

    Raises:
        TypeError – dataset must be a BinaryLabelDataset type.

Methods
    base_rate – Compute the base rate, \(Pr(Y = 1) = P/(P+N)\), optionally conditioned on protected attributes.
    consistency – Individual fairness metric from [1] that measures how similar the labels are for similar instances.
    difference – Compute difference of the metric for unprivileged and privileged groups.
    disparate_impact – \(Pr(Y = 1 | D = \text{unprivileged}) / Pr(Y = 1 | D = \text{privileged})\).
    mean_difference – Alias of statistical_parity_difference().
    num_instances – Compute the number of instances, \(n\), in the dataset conditioned on protected attributes if necessary.
    num_negatives – Compute the number of negatives, \(N = \sum_{i=1}^n \mathbb{1}[y_i = 0]\), optionally conditioned on protected attributes.
    num_positives – Compute the number of positives, \(P = \sum_{i=1}^n \mathbb{1}[y_i = 1]\), optionally conditioned on protected attributes.
    ratio – Compute ratio of the metric for unprivileged and privileged groups.
    rich_subgroup – Audit dataset with respect to rich subgroups defined by linear thresholds of sensitive attributes.
    smoothed_empirical_differential_fairness – Smoothed EDF from [2].
    statistical_parity_difference – \(Pr(Y = 1 | D = \text{unprivileged}) - Pr(Y = 1 | D = \text{privileged})\).
__init__(dataset, unprivileged_groups=None, privileged_groups=None)

    Parameters:
        - dataset (BinaryLabelDataset) – A BinaryLabelDataset.
        - privileged_groups (list(dict)) – Privileged groups. Format is a list of dicts where the keys are protected_attribute_names and the values are values in protected_attributes. Each dict element describes a single group. See examples for more details.
        - unprivileged_groups (list(dict)) – Unprivileged groups in the same format as privileged_groups.

    Raises:
        TypeError – dataset must be a BinaryLabelDataset type.
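To illustrate the list-of-dicts group format, here is a minimal plain-Python sketch (the records and the `in_groups` helper are illustrative, not AIF360 code): a record belongs to a group set if it matches every key of any dict in the list.

```python
# Hypothetical records with one protected attribute, 'sex'
# (1 = privileged, 0 = unprivileged), matching the list-of-dicts format.
records = [
    {'sex': 1, 'label': 1},
    {'sex': 1, 'label': 0},
    {'sex': 0, 'label': 1},
    {'sex': 0, 'label': 0},
    {'sex': 0, 'label': 0},
]

privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

def in_groups(record, groups):
    # A record belongs if it matches every key of any dict in the list.
    return any(all(record[k] == v for k, v in g.items()) for g in groups)

priv = [r for r in records if in_groups(r, privileged_groups)]
unpriv = [r for r in records if in_groups(r, unprivileged_groups)]
print(len(priv), len(unpriv))  # 2 3
```

Each dict describes one group; listing several dicts treats their union as the (un)privileged population.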
base_rate(privileged=None)

    Compute the base rate, \(Pr(Y = 1) = P/(P+N)\), optionally conditioned on protected attributes.

    Parameters:
        privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None meaning this metric is computed over the entire dataset.

    Returns:
        float – Base rate (optionally conditioned).
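The unconditioned case can be sketched directly from the formula \(Pr(Y = 1) = P/(P+N)\) (illustrative labels; not the library implementation):

```python
# Minimal sketch of the base rate Pr(Y = 1) = P / (P + N)
# over binary labels (illustrative data).
labels = [1, 1, 0, 1, 0]

P = sum(1 for y in labels if y == 1)  # number of positives
N = sum(1 for y in labels if y == 0)  # number of negatives
base_rate = P / (P + N)
print(base_rate)  # 0.6
```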
consistency(n_neighbors=5)

    Individual fairness metric from [1] that measures how similar the labels are for similar instances.

    \[1 - \frac{1}{n\cdot\text{n_neighbors}}\sum_{i=1}^n |\hat{y}_i - \sum_{j\in\mathcal{N}_{\text{n_neighbors}}(x_i)} \hat{y}_j|\]

    Parameters:
        n_neighbors (int, optional) – Number of neighbors for the kNN computation.

    References:
        [1] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, “Learning Fair Representations,” International Conference on Machine Learning, 2013.
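The sum in the formula can be made concrete with a toy sketch that transcribes the displayed expression literally (the data and the hand-picked neighbor sets are illustrative; AIF360 itself finds \(\mathcal{N}_{\text{n_neighbors}}(x_i)\) with a kNN search over the features):

```python
# Literal transcription of the displayed consistency formula on toy data.
y_hat = [1, 1, 0, 0]
# N_k(x_i): indices of the n_neighbors nearest instances to x_i,
# hand-picked here instead of a real kNN search.
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}
k = 1  # n_neighbors

n = len(y_hat)
total = sum(abs(y_hat[i] - sum(y_hat[j] for j in neighbors[i]))
            for i in range(n))
consistency = 1 - total / (n * k)
print(consistency)  # 1.0 -- every instance agrees with its neighbor
```

Labels that agree with their neighbors give a consistency near 1; disagreement drives it down.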
disparate_impact()

    \[\frac{Pr(Y = 1 | D = \text{unprivileged})}{Pr(Y = 1 | D = \text{privileged})}\]
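A plain-Python sketch of this ratio (illustrative group labels; not the library implementation):

```python
# Disparate impact: ratio of positive-outcome rates for the
# unprivileged and privileged groups (illustrative data).
unpriv_labels = [1, 0, 0, 0]   # Pr(Y=1 | D=unprivileged) = 0.25
priv_labels = [1, 1, 0, 1]     # Pr(Y=1 | D=privileged)   = 0.75

rate_unpriv = sum(unpriv_labels) / len(unpriv_labels)
rate_priv = sum(priv_labels) / len(priv_labels)
disparate_impact = rate_unpriv / rate_priv
print(disparate_impact)  # 0.3333...
```

Values below 1 mean the unprivileged group receives positive outcomes less often; a common rule of thumb flags values below 0.8.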
mean_difference()

    Alias of statistical_parity_difference().
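The aliased quantity, \(Pr(Y = 1 | D = \text{unprivileged}) - Pr(Y = 1 | D = \text{privileged})\), can be sketched the same way (illustrative data; not the library implementation):

```python
# Statistical parity difference (what mean_difference aliases):
# Pr(Y=1 | D=unprivileged) - Pr(Y=1 | D=privileged), illustrative data.
unpriv_labels = [1, 0, 0, 0]
priv_labels = [1, 1, 0, 1]

rate_unpriv = sum(unpriv_labels) / len(unpriv_labels)  # 0.25
rate_priv = sum(priv_labels) / len(priv_labels)        # 0.75
mean_difference = rate_unpriv - rate_priv
print(mean_difference)  # -0.5
```

A value of 0 indicates parity; a negative value means the privileged group receives positive outcomes at a higher rate.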
num_negatives(privileged=None)

    Compute the number of negatives, \(N = \sum_{i=1}^n \mathbb{1}[y_i = 0]\), optionally conditioned on protected attributes.

    Parameters:
        privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None meaning this metric is computed over the entire dataset.

    Raises:
        AttributeError – privileged_groups or unprivileged_groups must be provided at initialization to condition on them.
num_positives(privileged=None)

    Compute the number of positives, \(P = \sum_{i=1}^n \mathbb{1}[y_i = 1]\), optionally conditioned on protected attributes.

    Parameters:
        privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None meaning this metric is computed over the entire dataset.

    Raises:
        AttributeError – privileged_groups or unprivileged_groups must be provided at initialization to condition on them.
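The conditioning behavior of these two counters can be sketched in plain Python (illustrative (sex, label) pairs, with sex=1 privileged; the helper functions are hypothetical, not the library API):

```python
# Sketch of num_positives / num_negatives with optional group conditioning.
data = [(1, 1), (1, 0), (0, 1), (0, 0), (0, 0)]  # (sex, label) pairs

def num_positives(privileged=None):
    # privileged=None counts the whole dataset; True/False restricts
    # to the privileged (sex=1) or unprivileged (sex=0) group.
    return sum(1 for sex, y in data
               if y == 1 and (privileged is None or sex == int(privileged)))

def num_negatives(privileged=None):
    return sum(1 for sex, y in data
               if y == 0 and (privileged is None or sex == int(privileged)))

print(num_positives())                 # 2  (whole dataset)
print(num_positives(privileged=True))  # 1  (sex == 1 only)
print(num_negatives(privileged=False)) # 2  (sex == 0 only)
```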
rich_subgroup(predictions, fairness_def='FP')

    Audit dataset with respect to rich subgroups defined by linear thresholds of sensitive attributes.

    Parameters:
        - predictions – A hashable tuple of predictions. Typically the labels attribute of a GerryFairClassifier.
        - fairness_def – 'FP' or 'FN' for rich subgroup with respect to false positive or false negative rate.

    Returns:
        The gamma disparity with respect to the fairness_def.

    Examples: see examples/gerry_plots.ipynb
smoothed_empirical_differential_fairness(concentration=1.0)

    Smoothed EDF from [2].

    Parameters:
        concentration (float, optional) – Concentration parameter for Dirichlet smoothing. Must be non-negative.

    Examples:
        To use with non-binary protected attributes, the column must be converted to ordinal:

        >>> mapping = {'Black': 0, 'White': 1, 'Asian-Pac-Islander': 2,
        ...            'Amer-Indian-Eskimo': 3, 'Other': 4}
        >>> def map_race(df):
        ...     df['race-num'] = df.race.map(mapping)
        ...     return df
        ...
        >>> adult = AdultDataset(protected_attribute_names=['sex', 'race-num'],
        ...                      privileged_classes=[['Male'], [1]],
        ...                      categorical_features=['workclass', 'education',
        ...                          'marital-status', 'occupation', 'relationship',
        ...                          'native-country', 'race'],
        ...                      custom_preprocessing=map_race)
        >>> metric = BinaryLabelDatasetMetric(adult)
        >>> metric.smoothed_empirical_differential_fairness()
        1.7547611985549287

    References:
        [2] J. R. Foulds, R. Islam, K. N. Keya, and S. Pan, “An Intersectional Definition of Fairness,” arXiv preprint arXiv:1807.08362, 2018.