aif360.metrics.BinaryLabelDatasetMetric

class aif360.metrics.BinaryLabelDatasetMetric(dataset, unprivileged_groups=None, privileged_groups=None)[source]

Class for computing metrics based on a single BinaryLabelDataset.

Parameters:
  • dataset (BinaryLabelDataset) – A BinaryLabelDataset.

  • privileged_groups (list(dict)) – Privileged groups. Format is a list of dicts where the keys are protected_attribute_names and the values are values in protected_attributes. Each dict element describes a single group. See examples for more details.

  • unprivileged_groups (list(dict)) – Unprivileged groups in the same format as privileged_groups.

Raises:

TypeError – dataset must be a BinaryLabelDataset type.
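
For example, groups for the German credit data can be specified as follows (a minimal sketch; it assumes the default GermanDataset encoding, in which sex is privileged at the value 1):

>>> from aif360.datasets import GermanDataset
>>> from aif360.metrics import BinaryLabelDatasetMetric
>>> german = GermanDataset()
>>> metric = BinaryLabelDatasetMetric(german,
...     unprivileged_groups=[{'sex': 0}],
...     privileged_groups=[{'sex': 1}])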

Methods

base_rate

Compute the base rate, \(Pr(Y = 1) = P/(P+N)\), optionally conditioned on protected attributes.

consistency

Individual fairness metric from [1] that measures how similar the labels are for similar instances.

difference

Compute difference of the metric for unprivileged and privileged groups.

disparate_impact

Compute the ratio of base rates between the unprivileged and privileged groups.

mean_difference

Alias of statistical_parity_difference().

num_instances

Compute the number of instances, \(n\), in the dataset conditioned on protected attributes if necessary.

num_negatives

Compute the number of negatives, \(N = \sum_{i=1}^n \mathbb{1}[y_i = 0]\), optionally conditioned on protected attributes.

num_positives

Compute the number of positives, \(P = \sum_{i=1}^n \mathbb{1}[y_i = 1]\), optionally conditioned on protected attributes.

ratio

Compute ratio of the metric for unprivileged and privileged groups.

rich_subgroup

Audit the dataset with respect to rich subgroups defined by linear thresholds of the sensitive attributes.

smoothed_empirical_differential_fairness

Smoothed EDF from [2].

statistical_parity_difference

Compute the difference in base rates between the unprivileged and privileged groups.

__init__(dataset, unprivileged_groups=None, privileged_groups=None)[source]
Parameters:
  • dataset (BinaryLabelDataset) – A BinaryLabelDataset.

  • privileged_groups (list(dict)) – Privileged groups. Format is a list of dicts where the keys are protected_attribute_names and the values are values in protected_attributes. Each dict element describes a single group. See examples for more details.

  • unprivileged_groups (list(dict)) – Unprivileged groups in the same format as privileged_groups.

Raises:

TypeError – dataset must be a BinaryLabelDataset type.

base_rate(privileged=None)[source]

Compute the base rate, \(Pr(Y = 1) = P/(P+N)\), optionally conditioned on protected attributes.

Parameters:

privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None, meaning this metric is computed over the entire dataset.

Returns:

float – Base rate (optionally conditioned).
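
Examples

A minimal sketch on a hypothetical five-row toy dataset (not from the AIF360 distribution); the expected values are worked out in the comments:

>>> import pandas as pd
>>> from aif360.datasets import BinaryLabelDataset
>>> from aif360.metrics import BinaryLabelDatasetMetric
>>> df = pd.DataFrame({'sex':   [0, 0, 1, 1, 1],
...                    'label': [0, 1, 0, 1, 1]})
>>> toy = BinaryLabelDataset(df=df, label_names=['label'],
...     protected_attribute_names=['sex'])
>>> metric = BinaryLabelDatasetMetric(toy,
...     unprivileged_groups=[{'sex': 0}],
...     privileged_groups=[{'sex': 1}])
>>> metric.base_rate()                  # P/(P+N) = 3/5 = 0.6
>>> metric.base_rate(privileged=True)   # 2/3 among sex == 1
>>> metric.base_rate(privileged=False)  # 1/2 among sex == 0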

consistency(n_neighbors=5)[source]

Individual fairness metric from [1] that measures how similar the labels are for similar instances.

\[1 - \frac{1}{n}\sum_{i=1}^n |\hat{y}_i - \frac{1}{\text{n\_neighbors}} \sum_{j\in\mathcal{N}_{\text{n\_neighbors}}(x_i)} \hat{y}_j|\]
Parameters:

n_neighbors (int, optional) – Number of neighbors for the knn computation.
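
Examples

Continuing the toy-dataset sketch from the base_rate() example above (the lone sex column doubles as the feature space for the nearest-neighbor search, and n_neighbors is reduced to suit five instances):

>>> # 1 minus the mean gap between each label and its neighbors' mean label
>>> metric.consistency(n_neighbors=2)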

References

[1] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, "Learning Fair Representations," International Conference on Machine Learning, 2013.

disparate_impact()[source]

Compute the ratio of base rates between the unprivileged and privileged groups:

\[\frac{Pr(Y = 1 | D = \text{unprivileged})} {Pr(Y = 1 | D = \text{privileged})}\]
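
Examples

With the toy dataset from the base_rate() example above, the unprivileged base rate is 1/2 and the privileged base rate is 2/3:

>>> metric.disparate_impact()  # (1/2) / (2/3) = 0.75
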
mean_difference()[source]

Alias of statistical_parity_difference().

num_negatives(privileged=None)[source]

Compute the number of negatives, \(N = \sum_{i=1}^n \mathbb{1}[y_i = 0]\), optionally conditioned on protected attributes.

Parameters:

privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None, meaning this metric is computed over the entire dataset.

Raises:

AttributeError – privileged_groups or unprivileged_groups must be provided at initialization to condition on them.

num_positives(privileged=None)[source]

Compute the number of positives, \(P = \sum_{i=1}^n \mathbb{1}[y_i = 1]\), optionally conditioned on protected attributes.

Parameters:

privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None, meaning this metric is computed over the entire dataset.

Raises:

AttributeError – privileged_groups or unprivileged_groups must be provided at initialization to condition on them.
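
Examples

Again with the toy dataset from the base_rate() example above:

>>> metric.num_positives()                  # 3.0 favorable labels overall
>>> metric.num_positives(privileged=True)   # 2.0 among sex == 1
>>> metric.num_negatives(privileged=False)  # 1.0 among sex == 0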

rich_subgroup(predictions, fairness_def='FP')[source]

Audit the dataset with respect to rich subgroups defined by linear thresholds of the sensitive attributes.

Parameters:
  • predictions (tuple) – A hashable tuple of predictions. Typically, the labels attribute of a GerryFairClassifier.

  • fairness_def (str) – 'FP' or 'FN' for rich subgroup auditing with respect to the false positive or false negative rate.

Returns:

The gamma disparity with respect to the fairness_def.

Examples

See examples/gerry_plots.ipynb.
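
A minimal call sketch, reusing the toy metric from the base_rate() example above (the prediction tuple here is a hypothetical stand-in; in practice it comes from a trained GerryFairClassifier, as in the notebook):

>>> preds = tuple(toy.labels.ravel())  # hypothetical 0/1 predictions
>>> metric.rich_subgroup(preds, fairness_def='FP')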

smoothed_empirical_differential_fairness(concentration=1.0)[source]

Smoothed EDF from [2].

Parameters:

concentration (float, optional) – Concentration parameter for Dirichlet smoothing. Must be non-negative.

Examples

To use with non-binary protected attributes, the column must be converted to ordinal:

>>> from aif360.datasets import AdultDataset
>>> from aif360.metrics import BinaryLabelDatasetMetric
>>> mapping = {'Black': 0, 'White': 1, 'Asian-Pac-Islander': 2,
...            'Amer-Indian-Eskimo': 3, 'Other': 4}
>>> def map_race(df):
...     df['race-num'] = df.race.map(mapping)
...     return df
...
>>> adult = AdultDataset(protected_attribute_names=['sex',
... 'race-num'], privileged_classes=[['Male'], [1]],
... categorical_features=['workclass', 'education',
... 'marital-status', 'occupation', 'relationship',
... 'native-country', 'race'], custom_preprocessing=map_race)
>>> metric = BinaryLabelDatasetMetric(adult)
>>> metric.smoothed_empirical_differential_fairness()
1.7547611985549287

References

[2] J. R. Foulds, R. Islam, K. N. Keya, and S. Pan, "An Intersectional Definition of Fairness," arXiv preprint arXiv:1807.08362, 2018.

statistical_parity_difference()[source]

Compute the difference in base rates between the unprivileged and privileged groups:

\[Pr(Y = 1 | D = \text{unprivileged}) - Pr(Y = 1 | D = \text{privileged})\]
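
Examples

With the toy dataset from the base_rate() example above:

>>> metric.statistical_parity_difference()  # 1/2 - 2/3 = -1/6 ≈ -0.167
>>> metric.mean_difference()                # same value, via the alias above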