Base Class

class aif360.datasets.StandardDataset(df, label_name, favorable_classes, protected_attribute_names, privileged_classes, instance_weights_name='', scores_name='', categorical_features=[], features_to_keep=[], features_to_drop=[], na_values=[], custom_preprocessing=None, metadata=None)

Base class for every BinaryLabelDataset provided out of the box by aif360.

It is not strictly necessary to inherit from this class when adding custom datasets, but it may be useful.

This class is very loosely based on code from https://github.com/algofairness/fairness-comparison.

Subclasses of StandardDataset should perform the following before calling super().__init__:

1. Load the dataframe from a raw file.

Then, this class will go through a standard preprocessing routine which:

2. (optional) Performs some dataset-specific preprocessing (e.g. renaming columns/values, handling missing data).

3. Drops unrequested columns (see features_to_keep and features_to_drop for details).

4. Drops rows with NA values.

5. Creates a one-hot encoding of the categorical variables.

6. Maps protected attributes to binary privileged/unprivileged values (1/0).

7. Maps labels to binary favorable/unfavorable labels (1/0).
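As an illustration of the ordering (not aif360's implementation, which operates on a pandas.DataFrame), the optional custom preprocessing, column dropping, NA removal and one-hot encoding steps can be sketched in plain Python over a list of row dictionaries. The function name `standard_preprocess`, the row-dict representation and the `column=value` naming of indicator columns are assumptions made for this example, and the rows are toy values.

```python
from typing import Callable, Collection, Dict, List, Optional

Row = Dict[str, object]


def standard_preprocess(
    rows: List[Row],
    keep: Collection[str],
    categorical: Collection[str],
    na_values: Collection[str] = (),
    custom: Optional[Callable[[List[Row]], List[Row]]] = None,
) -> List[Row]:
    """Hypothetical sketch of the early preprocessing steps (not aif360 code)."""
    # Optional dataset-specific preprocessing.
    if custom is not None:
        rows = custom(rows)

    # Drop unrequested columns.
    rows = [{k: v for k, v in row.items() if k in keep} for row in rows]

    # Drop every row that still contains a missing value (None or an NA marker).
    def is_na(value: object) -> bool:
        return value is None or (isinstance(value, str) and value in na_values)

    rows = [row for row in rows if not any(is_na(v) for v in row.values())]

    # One-hot encode each categorical column into "column=value" indicator columns
    # (this naming scheme is an assumption of the sketch).
    levels = {c: sorted({str(row[c]) for row in rows}) for c in categorical}
    encoded: List[Row] = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in categorical}
        for c in categorical:
            for level in levels[c]:
                out[f"{c}={level}"] = 1 if str(row[c]) == level else 0
        encoded.append(out)
    return encoded


# Toy rows, loosely shaped like the Adult columns.
raw = [
    {"age": 39, "workclass": "State-gov", "fnlwgt": 1000, "sex": "Male"},
    {"age": 50, "workclass": "?", "fnlwgt": 2000, "sex": "Male"},
    {"age": 38, "workclass": "Private", "fnlwgt": 3000, "sex": "Female"},
]
result = standard_preprocess(
    raw, keep={"age", "workclass", "sex"}, categorical=["workclass"], na_values=["?"]
)
print(len(result))  # 2: the row with workclass "?" is dropped
print(result[0])
# {'age': 39, 'sex': 'Male', 'workclass=Private': 0, 'workclass=State-gov': 1}
```

Because columns are dropped before NA removal, a missing value in a column that is not kept does not cause its row to be dropped.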

Parameters:

df (pandas.DataFrame) – DataFrame on which to perform standard processing.
label_name – Name of the label column in df.
favorable_classes (list or function) – Label values which are considered favorable or a boolean function which returns True if favorable. All others are unfavorable. Label values are mapped to 1 (favorable) and 0 (unfavorable) if they are not already binary and numerical.
protected_attribute_names (list) – List of names corresponding to protected attribute columns in df.
privileged_classes (list(list or function)) – Each element is a list of values which are considered privileged or a boolean function which returns True if privileged for the corresponding column in protected_attribute_names. All others are unprivileged. Values are mapped to 1 (privileged) and 0 (unprivileged) if they are not already numerical.
instance_weights_name (optional) – Name of the instance weights column in df.
scores_name (optional) – Name of the scores column in df.
categorical_features (optional, list) – List of column names in the DataFrame which are to be expanded into one-hot vectors.
features_to_keep (optional, list) – Column names to keep. All others are dropped except those present in protected_attribute_names, categorical_features, label_name or instance_weights_name. Defaults to all columns if not provided.
features_to_drop (optional, list) – Column names to drop. Note: this overrides features_to_keep.
na_values (optional) – Additional strings to recognize as NA. See pandas.read_csv() for details.
custom_preprocessing (function) – A function object which acts on and returns a DataFrame (f: DataFrame -> DataFrame). If None, no extra preprocessing is applied.
metadata (optional) – Additional metadata to append.
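The interaction between features_to_keep, features_to_drop and the always-kept columns can be sketched as set operations. `columns_to_keep` is a hypothetical helper, not part of aif360; applying features_to_drop last (so that it can also remove an otherwise forced column) and returning the columns in their original order are assumptions of this sketch.

```python
from typing import List, Optional, Sequence


def columns_to_keep(
    all_columns: Sequence[str],
    label_name: str,
    protected_attribute_names: Sequence[str],
    categorical_features: Sequence[str] = (),
    features_to_keep: Sequence[str] = (),
    features_to_drop: Sequence[str] = (),
    instance_weights_name: Optional[str] = None,
) -> List[str]:
    """Hypothetical sketch of the features_to_keep / features_to_drop rules."""
    # An empty features_to_keep means "start from every column".
    keep = set(features_to_keep or all_columns)
    # Protected attributes, categorical features, the label and the
    # instance-weights column are kept even if they are not listed.
    keep |= set(protected_attribute_names) | set(categorical_features) | {label_name}
    if instance_weights_name:
        keep.add(instance_weights_name)
    # Applied last, so features_to_drop overrides everything above (assumption).
    keep -= set(features_to_drop)
    # This sketch returns the surviving columns in their original order.
    return [c for c in all_columns if c in keep]


columns = ["age", "workclass", "fnlwgt", "education-num", "sex", "income-per-year"]
print(columns_to_keep(columns, "income-per-year", ["sex"],
                      features_to_keep=["age", "education-num"]))
# ['age', 'education-num', 'sex', 'income-per-year']
print(columns_to_keep(columns, "income-per-year", ["sex"],
                      categorical_features=["workclass"], features_to_drop=["fnlwgt"]))
# ['age', 'workclass', 'education-num', 'sex', 'income-per-year']
```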

Adult Dataset

class aif360.datasets.AdultDataset(label_name='income-per-year', favorable_classes=['>50K', '>50K.'], protected_attribute_names=['race', 'sex'], privileged_classes=[['White'], ['Male']], instance_weights_name=None, categorical_features=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'native-country'], features_to_keep=[], features_to_drop=['fnlwgt'], na_values=['?'], custom_preprocessing=None, metadata={'label_maps': [{1.0: '>50K', 0.0: '<=50K'}], 'protected_attribute_maps': [{1.0: 'White', 0.0: 'Non-white'}, {1.0: 'Male', 0.0: 'Female'}]})

Adult Census Income Dataset.

See aif360/data/raw/adult/README.md.

See StandardDataset for a description of the arguments.

Examples

The following will instantiate a dataset which uses the fnlwgt feature as instance weights:

>>> import numpy as np
>>> from aif360.datasets import AdultDataset
>>> ad = AdultDataset(instance_weights_name='fnlwgt',
...                   features_to_drop=[])
WARNING:root:Missing Data: 3620 rows removed from dataset.
>>> not np.all(ad.instance_weights == 1.)
True

To instantiate a dataset which utilizes only numerical features and a single protected attribute, run:

>>> single_protected = ['sex']
>>> single_privileged = [['Male']]
>>> ad = AdultDataset(protected_attribute_names=single_protected,
...                   privileged_classes=single_privileged,
...                   categorical_features=[],
...                   features_to_keep=['age', 'education-num'])
>>> print(ad.feature_names)
['education-num', 'age', 'sex']
>>> print(ad.label_names)
['income-per-year']

Note: the protected_attribute_names and label_name are kept even if they are not explicitly given in features_to_keep.

In some cases, it may be useful to keep track of a mapping from float -> str for protected attributes and/or labels. If our use case differs from the default, we can modify the mapping stored in metadata:

>>> label_map = {1.0: '>50K', 0.0: '<=50K'}
>>> protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
>>> ad = AdultDataset(protected_attribute_names=['sex'],
...                   privileged_classes=[['Male']],
...                   metadata={'label_maps': [label_map],
...                             'protected_attribute_maps': protected_attribute_maps})

Now this information will stay attached to the dataset and can be used for more descriptive visualizations.
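The metadata maps are plain dictionaries keyed by the float codes: one for the label and one per protected attribute, listed in the same order as protected_attribute_names in the defaults shown above. The sketch below decodes codes back into readable names using the default AdultDataset metadata copied from its signature; the `decode` helper is hypothetical and not part of aif360.

```python
from typing import Dict, List, Sequence

# Default AdultDataset metadata, copied from the signature above.
ADULT_METADATA = {
    "label_maps": [{1.0: ">50K", 0.0: "<=50K"}],
    "protected_attribute_maps": [
        {1.0: "White", 0.0: "Non-white"},  # race
        {1.0: "Male", 0.0: "Female"},      # sex
    ],
}


def decode(codes: Sequence[float], mapping: Dict[float, str]) -> List[str]:
    """Hypothetical helper: turn binary float codes back into readable names."""
    return [mapping[float(c)] for c in codes]


print(decode([1.0, 0.0, 0.0], ADULT_METADATA["label_maps"][0]))
# ['>50K', '<=50K', '<=50K']
print(decode([1.0, 1.0, 0.0], ADULT_METADATA["protected_attribute_maps"][0]))
# ['White', 'White', 'Non-white']
```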

Bank Dataset

class aif360.datasets.BankDataset(label_name='y', favorable_classes=['yes'], protected_attribute_names=['age'], privileged_classes=[<function BankDataset.<lambda>>], instance_weights_name=None, categorical_features=['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'day_of_week', 'poutcome'], features_to_keep=[], features_to_drop=[], na_values=['unknown'], custom_preprocessing=None, metadata=None)

Bank marketing Dataset.

See aif360/data/raw/bank/README.md.

See StandardDataset for a description of the arguments.

By default, this code converts the ‘age’ attribute to a binary value where privileged is age >= 25 and unprivileged is age < 25, as in GermanDataset.
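BankDataset's default privileged_classes entry is a boolean function rather than a list of values. The plain-Python sketch below shows how either form can decide privileged (1.0) versus unprivileged (0.0) membership. `binarize` is hypothetical, the age lambda mirrors the age >= 25 rule described above, and for simplicity the sketch always maps to 1.0/0.0, ignoring the rule that values which are already numerical are not remapped. The last lines pair one specification with each protected attribute, the way GermanDataset's defaults combine a list for sex with a function for age.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Union

# A privileged_classes entry: a list of privileged values or a boolean function.
Spec = Union[Sequence[object], Callable[[object], bool]]


def binarize(values: Iterable[object], privileged: Spec) -> List[float]:
    """Hypothetical sketch: 1.0 for privileged values, 0.0 for all others."""
    if callable(privileged):
        # A boolean function decides, e.g. an age threshold.
        return [1.0 if privileged(v) else 0.0 for v in values]
    # Otherwise membership in the list of privileged values decides.
    return [1.0 if v in privileged else 0.0 for v in values]


print(binarize([19, 24, 25, 61], lambda age: age >= 25))  # [0.0, 0.0, 1.0, 1.0]

# One specification per protected attribute, as in GermanDataset's defaults.
columns: Dict[str, List[object]] = {"sex": ["male", "female"], "age": [30, 22]}
specs: Dict[str, Spec] = {"sex": ["male"], "age": lambda age: age >= 25}
mapped = {name: binarize(columns[name], spec) for name, spec in specs.items()}
print(mapped)  # {'sex': [1.0, 0.0], 'age': [1.0, 0.0]}
```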

Compas Dataset

class aif360.datasets.CompasDataset(label_name='two_year_recid', favorable_classes=[0], protected_attribute_names=['sex', 'race'], privileged_classes=[['Female'], ['Caucasian']], instance_weights_name=None, categorical_features=['age_cat', 'c_charge_degree', 'c_charge_desc'], features_to_keep=['sex', 'age', 'age_cat', 'race', 'juv_fel_count', 'juv_misd_count', 'juv_other_count', 'priors_count', 'c_charge_degree', 'c_charge_desc', 'two_year_recid'], features_to_drop=[], na_values=[], custom_preprocessing=<function default_preprocessing>, metadata={'label_maps': [{1.0: 'Did recid.', 0.0: 'No recid.'}], 'protected_attribute_maps': [{0.0: 'Male', 1.0: 'Female'}, {1.0: 'Caucasian', 0.0: 'Not Caucasian'}]})

ProPublica COMPAS Dataset.

See aif360/data/raw/compas/README.md.

See StandardDataset for a description of the arguments.

Note: The label value 0 in this case is considered favorable (no recidivism).
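This follows from the label rule in StandardDataset: labels are only remapped to 1/0 when they are not already binary and numerical. The plain-Python sketch below illustrates that rule. `binarize_labels` is hypothetical, and taking the favorable label to be the first entry of favorable_classes (with the other value as the unfavorable label) is an assumption of the sketch.

```python
from typing import Callable, List, Sequence, Tuple, Union

LabelSpec = Union[Sequence[object], Callable[[object], bool]]


def binarize_labels(
    labels: Sequence[object], favorable_classes: LabelSpec
) -> Tuple[List[object], object, object]:
    """Hypothetical sketch; returns (labels, favorable_label, unfavorable_label)."""
    if callable(favorable_classes):
        # A boolean function: True -> 1.0 (favorable), False -> 0.0.
        return [1.0 if favorable_classes(y) else 0.0 for y in labels], 1.0, 0.0
    distinct = set(labels)
    numeric = all(
        isinstance(y, (int, float)) and not isinstance(y, bool) for y in labels
    )
    if numeric and len(distinct) == 2:
        # Already binary and numerical: keep the original codes.
        favorable = favorable_classes[0]
        unfavorable = (distinct - {favorable}).pop()
        return list(labels), favorable, unfavorable
    # Otherwise favorable values become 1.0 and everything else 0.0.
    return [1.0 if y in favorable_classes else 0.0 for y in labels], 1.0, 0.0


# COMPAS-style labels: 0 (no recidivism) is favorable and stays 0.
print(binarize_labels([0, 1, 1, 0], [0]))  # ([0, 1, 1, 0], 0, 1)
# Adult-style string labels are remapped to 1.0/0.0.
print(binarize_labels([">50K", "<=50K", ">50K."], [">50K", ">50K."]))
# ([1.0, 0.0, 1.0], 1.0, 0.0)
```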

Examples

In some cases, it may be useful to keep track of a mapping from float -> str for protected attributes and/or labels. If our use case differs from the default, we can modify the mapping stored in metadata:

>>> label_map = {1.0: 'Did recid.', 0.0: 'No recid.'}
>>> protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
>>> cd = CompasDataset(protected_attribute_names=['sex'],
...                    privileged_classes=[['Male']],
...                    metadata={'label_maps': [label_map],
...                              'protected_attribute_maps': protected_attribute_maps})

Now this information will stay attached to the dataset and can be used for more descriptive visualizations.

German Dataset

class aif360.datasets.GermanDataset(label_name='credit', favorable_classes=[1], protected_attribute_names=['sex', 'age'], privileged_classes=[['male'], <function GermanDataset.<lambda>>], instance_weights_name=None, categorical_features=['status', 'credit_history', 'purpose', 'savings', 'employment', 'other_debtors', 'property', 'installment_plans', 'housing', 'skill_level', 'telephone', 'foreign_worker'], features_to_keep=[], features_to_drop=['personal_status'], na_values=[], custom_preprocessing=<function default_preprocessing>, metadata={'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}], 'protected_attribute_maps': [{1.0: 'Male', 0.0: 'Female'}, {1.0: 'Old', 0.0: 'Young'}]})

German credit Dataset.

See aif360/data/raw/german/README.md.

See StandardDataset for a description of the arguments.

By default, this code converts the ‘age’ attribute to a binary value where privileged is age >= 25 and unprivileged is age < 25 as proposed by Kamiran and Calders [1].

References

[1]	F. Kamiran and T. Calders, “Classifying without discriminating,” 2nd International Conference on Computer, Control and Communication, 2009.

Examples

In some cases, it may be useful to keep track of a mapping from float -> str for protected attributes and/or labels. If our use case differs from the default, we can modify the mapping stored in metadata:

>>> label_map = {1.0: 'Good Credit', 2.0: 'Bad Credit'}
>>> protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
>>> gd = GermanDataset(protected_attribute_names=['sex'],
...                    privileged_classes=[['male']],
...                    metadata={'label_maps': [label_map],
...                              'protected_attribute_maps': protected_attribute_maps})

Now this information will stay attached to the dataset and can be used for more descriptive visualizations.