aif360.metrics.BinaryLabelDatasetMetric

class aif360.metrics.BinaryLabelDatasetMetric(dataset, unprivileged_groups=None, privileged_groups=None)[source]

Class for computing metrics based on a single BinaryLabelDataset.

Parameters:
  • dataset (BinaryLabelDataset) – A BinaryLabelDataset.

  • privileged_groups (list(dict)) – Privileged groups. Format is a list of dicts where the keys are protected_attribute_names and the values are values in protected_attributes. Each dict element describes a single group. See examples for more details.

  • unprivileged_groups (list(dict)) – Unprivileged groups in the same format as privileged_groups.

Raises:

TypeError – dataset must be a BinaryLabelDataset type.
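
For example, groups for the German credit data can be specified as follows (a minimal sketch; it assumes the default GermanDataset encoding, in which sex is privileged at the value 1):

>>> from aif360.datasets import GermanDataset
>>> from aif360.metrics import BinaryLabelDatasetMetric
>>> german = GermanDataset()
>>> metric = BinaryLabelDatasetMetric(german,
...     unprivileged_groups=[{'sex': 0}],
...     privileged_groups=[{'sex': 1}])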

Methods

base_rate

Compute the base rate, \(Pr(Y = 1) = P/(P+N)\), optionally conditioned on protected attributes.

consistency

Individual fairness metric from [1] that measures how similar the labels are for similar instances.

difference

Compute difference of the metric for unprivileged and privileged groups.

disparate_impact

Compute the ratio of base rates between the unprivileged and privileged groups.

mean_difference

Alias of statistical_parity_difference().

num_instances

Compute the number of instances, \(n\), in the dataset conditioned on protected attributes if necessary.

num_negatives

Compute the number of negatives, \(N = \sum_{i=1}^n \mathbb{1}[y_i = 0]\), optionally conditioned on protected attributes.

num_positives

Compute the number of positives, \(P = \sum_{i=1}^n \mathbb{1}[y_i = 1]\), optionally conditioned on protected attributes.

ratio

Compute ratio of the metric for unprivileged and privileged groups.

rich_subgroup

Audit the dataset with respect to rich subgroups defined by linear thresholds of the sensitive attributes.

smoothed_empirical_differential_fairness

Smoothed EDF from [2].

statistical_parity_difference

Compute the difference in base rates between the unprivileged and privileged groups.

__init__(dataset, unprivileged_groups=None, privileged_groups=None)[source]
Parameters:
  • dataset (BinaryLabelDataset) – A BinaryLabelDataset.

  • privileged_groups (list(dict)) – Privileged groups. Format is a list of dicts where the keys are protected_attribute_names and the values are values in protected_attributes. Each dict element describes a single group. See examples for more details.

  • unprivileged_groups (list(dict)) – Unprivileged groups in the same format as privileged_groups.

Raises:

TypeError – dataset must be a BinaryLabelDataset type.

base_rate(privileged=None)[source]

Compute the base rate, \(Pr(Y = 1) = P/(P+N)\), optionally conditioned on protected attributes.

Parameters:

privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None, meaning this metric is computed over the entire dataset.

Returns:

float – Base rate (optionally conditioned).
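
Examples

A minimal sketch on a hypothetical five-row toy dataset (not from the AIF360 distribution); the expected values are worked out in the comments:

>>> import pandas as pd
>>> from aif360.datasets import BinaryLabelDataset
>>> from aif360.metrics import BinaryLabelDatasetMetric
>>> df = pd.DataFrame({'sex':   [0, 0, 1, 1, 1],
...                    'label': [0, 1, 0, 1, 1]})
>>> toy = BinaryLabelDataset(df=df, label_names=['label'],
...     protected_attribute_names=['sex'])
>>> metric = BinaryLabelDatasetMetric(toy,
...     unprivileged_groups=[{'sex': 0}],
...     privileged_groups=[{'sex': 1}])
>>> metric.base_rate()                  # P/(P+N) = 3/5 = 0.6
>>> metric.base_rate(privileged=True)   # 2/3 among sex == 1
>>> metric.base_rate(privileged=False)  # 1/2 among sex == 0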

consistency(n_neighbors=5)[source]

Individual fairness metric from [1] that measures how similar the labels are for similar instances.

\[1 - \frac{1}{n}\sum_{i=1}^n |\hat{y}_i - \frac{1}{\text{n\_neighbors}} \sum_{j\in\mathcal{N}_{\text{n\_neighbors}}(x_i)} \hat{y}_j|\]
Parameters:

n_neighbors (int, optional) – Number of neighbors for the knn computation.
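
Examples

Continuing the toy-dataset sketch from the base_rate() example above (the lone sex column doubles as the feature space for the nearest-neighbor search, and n_neighbors is reduced to suit five instances):

>>> # 1 minus the mean gap between each label and its neighbors' mean label
>>> metric.consistency(n_neighbors=2)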

References

[1] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, "Learning Fair Representations," International Conference on Machine Learning, 2013.

disparate_impact()[source]

Compute the ratio of base rates between the unprivileged and privileged groups:

\[\frac{Pr(Y = 1 | D = \text{unprivileged})} {Pr(Y = 1 | D = \text{privileged})}\]
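
Examples

With the toy dataset from the base_rate() example above, the unprivileged base rate is 1/2 and the privileged base rate is 2/3:

>>> metric.disparate_impact()  # (1/2) / (2/3) = 0.75
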
mean_difference()[source]

Alias of statistical_parity_difference().

num_negatives(privileged=None)[source]

Compute the number of negatives, \(N = \sum_{i=1}^n \mathbb{1}[y_i = 0]\), optionally conditioned on protected attributes.

Parameters:

privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None, meaning this metric is computed over the entire dataset.

Raises:

AttributeError – privileged_groups or unprivileged_groups must be provided at initialization to condition on them.

num_positives(privileged=None)[source]

Compute the number of positives, \(P = \sum_{i=1}^n \mathbb{1}[y_i = 1]\), optionally conditioned on protected attributes.

Parameters:

privileged (bool, optional) – Boolean prescribing whether to condition this metric on the privileged_groups, if True, or the unprivileged_groups, if False. Defaults to None, meaning this metric is computed over the entire dataset.

Raises:

AttributeError – privileged_groups or unprivileged_groups must be provided at initialization to condition on them.
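
Examples

Again with the toy dataset from the base_rate() example above:

>>> metric.num_positives()                  # 3.0 favorable labels overall
>>> metric.num_positives(privileged=True)   # 2.0 among sex == 1
>>> metric.num_negatives(privileged=False)  # 1.0 among sex == 0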

rich_subgroup(predictions, fairness_def='FP')[source]

Audit the dataset with respect to rich subgroups defined by linear thresholds of the sensitive attributes.

Parameters:
  • predictions (tuple) – A hashable tuple of predictions. Typically, the labels attribute of a GerryFairClassifier.

  • fairness_def (str) – 'FP' or 'FN' for rich subgroup auditing with respect to the false positive or false negative rate.

Returns:

The gamma disparity with respect to the fairness_def.

Examples

See examples/gerry_plots.ipynb.
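
A minimal call sketch, reusing the toy metric from the base_rate() example above (the prediction tuple here is a hypothetical stand-in; in practice it comes from a trained GerryFairClassifier, as in the notebook):

>>> preds = tuple(toy.labels.ravel())  # hypothetical 0/1 predictions
>>> metric.rich_subgroup(preds, fairness_def='FP')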

smoothed_empirical_differential_fairness(concentration=1.0)[source]

Smoothed EDF from [2].

Parameters:

concentration (float, optional) – Concentration parameter for Dirichlet smoothing. Must be non-negative.

Examples

To use with non-binary protected attributes, the column must be converted to ordinal:

>>> from aif360.datasets import AdultDataset
>>> from aif360.metrics import BinaryLabelDatasetMetric
>>> mapping = {'Black': 0, 'White': 1, 'Asian-Pac-Islander': 2,
...            'Amer-Indian-Eskimo': 3, 'Other': 4}
>>> def map_race(df):
...     df['race-num'] = df.race.map(mapping)
...     return df
...
>>> adult = AdultDataset(protected_attribute_names=['sex',
... 'race-num'], privileged_classes=[['Male'], [1]],
... categorical_features=['workclass', 'education',
... 'marital-status', 'occupation', 'relationship',
... 'native-country', 'race'], custom_preprocessing=map_race)
>>> metric = BinaryLabelDatasetMetric(adult)
>>> metric.smoothed_empirical_differential_fairness()
1.7547611985549287

References

[2] J. R. Foulds, R. Islam, K. N. Keya, and S. Pan, "An Intersectional Definition of Fairness," arXiv preprint arXiv:1807.08362, 2018.

statistical_parity_difference()[source]

Compute the difference in base rates between the unprivileged and privileged groups:

\[Pr(Y = 1 | D = \text{unprivileged}) - Pr(Y = 1 | D = \text{privileged})\]
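
Examples

With the toy dataset from the base_rate() example above:

>>> metric.statistical_parity_difference()  # 1/2 - 2/3 = -1/6 ≈ -0.167
>>> metric.mean_difference()                # same value, via the alias above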