aif360.datasets.AdultDataset

class aif360.datasets.AdultDataset(label_name='income-per-year', favorable_classes=['>50K', '>50K.'], protected_attribute_names=['race', 'sex'], privileged_classes=[['White'], ['Male']], instance_weights_name=None, categorical_features=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'native-country'], features_to_keep=[], features_to_drop=['fnlwgt'], na_values=['?'], custom_preprocessing=None, metadata={'label_maps': [{0.0: '<=50K', 1.0: '>50K'}], 'protected_attribute_maps': [{0.0: 'Non-white', 1.0: 'White'}, {0.0: 'Female', 1.0: 'Male'}]})

Adult Census Income Dataset.

See aif360/data/raw/adult/README.md.

See StandardDataset for a description of the arguments.
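As a rough illustration of what privileged_classes and favorable_classes do, the sketch below reduces raw column values to the 1.0/0.0 encoding that the default metadata maps describe (1.0 = 'White', 'Male', '>50K'). It is a simplified pure-Python stand-in, not the StandardDataset code; the helper name binarize and the toy rows are hypothetical. Both '>50K' and '>50K.' are listed as favorable because the labels in the raw test split end with a period.

```python
from typing import Iterable, List


def binarize(values: Iterable[str], positive: Iterable[str]) -> List[float]:
    """Map each value to 1.0 if it is in `positive`, else 0.0.

    Hypothetical helper: mimics the privileged/unprivileged and
    favorable/unfavorable split, not aif360's actual implementation.
    """
    positive_set = set(positive)
    return [1.0 if v in positive_set else 0.0 for v in values]


# Toy rows, made up for illustration (not taken from the Adult data).
race = ["White", "Black", "Asian-Pac-Islander", "White"]
sex = ["Male", "Female", "Male", "Female"]
income = [">50K", "<=50K", ">50K.", "<=50K."]

race_bin = binarize(race, ["White"])             # privileged_classes[0]
sex_bin = binarize(sex, ["Male"])                # privileged_classes[1]
label_bin = binarize(income, [">50K", ">50K."])  # favorable_classes

print(race_bin)   # [1.0, 0.0, 0.0, 1.0]
print(sex_bin)    # [1.0, 0.0, 1.0, 0.0]
print(label_bin)  # [1.0, 0.0, 1.0, 0.0]
```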

Examples

The following will instantiate a dataset that uses the fnlwgt column as instance weights:

>>> import numpy as np
>>> from aif360.datasets import AdultDataset
>>> ad = AdultDataset(instance_weights_name='fnlwgt',
... features_to_drop=[])
WARNING:root:Missing Data: 3620 rows removed from dataset.
>>> not np.all(ad.instance_weights == 1.)
True
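Passing features_to_drop=[] matters here because the default drops fnlwgt. Once instance_weights_name='fnlwgt' is set, each row carries its census weight instead of the default 1.0, so weighted statistics can differ from plain row counts. The sketch below compares an unweighted and a weighted favorable-outcome rate on made-up labels and weights; weighted_rate is a hypothetical helper, not part of aif360.

```python
from typing import Sequence


def weighted_rate(labels: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted share of rows whose label is the favorable value 1.0."""
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    total = sum(weights)
    favorable = sum(w for y, w in zip(labels, weights) if y == 1.0)
    return favorable / total


labels = [1.0, 0.0, 0.0, 1.0]
unit_weights = [1.0] * len(labels)  # default: every row weighs 1.0
census_weights = [300000.0, 100000.0, 100000.0, 100000.0]  # made-up fnlwgt-style values

print(weighted_rate(labels, unit_weights))    # 0.5
print(weighted_rate(labels, census_weights))  # 0.6666666666666666
```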

To instantiate a dataset which utilizes only numerical features and a single protected attribute, run:

>>> single_protected = ['sex']
>>> single_privileged = [['Male']]
>>> ad = AdultDataset(protected_attribute_names=single_protected,
... privileged_classes=single_privileged,
... categorical_features=[],
... features_to_keep=['age', 'education-num'])
>>> print(ad.feature_names)
['education-num', 'age', 'sex']
>>> print(ad.label_names)
['income-per-year']

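Note: columns named in protected_attribute_names, categorical_features, label_name and instance_weights_name are kept even if they are not listed in features_to_keep; this is why the example above passes categorical_features=[].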
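A simplified sketch of that selection rule as set arithmetic: start from features_to_keep (or every column when it is empty), add the always-kept columns, then remove features_to_drop. This is our reading of the StandardDataset argument descriptions, not its code; select_columns and the column list (a subset of the Adult columns) are hypothetical, and column order is ignored. The label column survives selection but is reported in label_names rather than feature_names.

```python
from typing import Iterable, Optional, Set


def select_columns(
    all_columns: Iterable[str],
    label_name: str,
    protected_attribute_names: Iterable[str],
    categorical_features: Iterable[str],
    features_to_keep: Iterable[str],
    features_to_drop: Iterable[str],
    instance_weights_name: Optional[str] = None,
) -> Set[str]:
    """Return the set of columns that survive selection (order ignored)."""
    # An empty features_to_keep means "keep every column".
    keep = set(features_to_keep) or set(all_columns)
    # These columns are always kept, whatever features_to_keep says.
    keep |= set(protected_attribute_names) | set(categorical_features) | {label_name}
    if instance_weights_name:
        keep.add(instance_weights_name)
    return keep - set(features_to_drop)


columns = ["age", "workclass", "fnlwgt", "education", "education-num",
           "race", "sex", "hours-per-week", "income-per-year"]

kept = select_columns(columns, "income-per-year", ["sex"], [],
                      ["age", "education-num"], ["fnlwgt"])
print(sorted(kept))  # ['age', 'education-num', 'income-per-year', 'sex']

# With a categorical feature listed, it is kept as well.
with_cat = select_columns(columns, "income-per-year", ["sex"], ["workclass"],
                          ["age", "education-num"], ["fnlwgt"])
print("workclass" in with_cat)  # True
```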

In some cases, it may be useful to keep track of a mapping from float -> str for protected attributes and/or labels. If our use case differs from the default, we can modify the mapping stored in metadata:

>>> label_map = {1.0: '>50K', 0.0: '<=50K'}
>>> protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
>>> ad = AdultDataset(protected_attribute_names=['sex'],
... categorical_features=['workclass', 'education', 'marital-status',
... 'occupation', 'relationship', 'native-country', 'race'],
... privileged_classes=[['Male']], metadata={'label_maps': [label_map],
... 'protected_attribute_maps': protected_attribute_maps})

Note that race is now listed in categorical_features, since it is no longer used as a protected attribute. The custom mappings stay attached to the dataset and can be used for more descriptive visualizations.
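Because the encoded columns hold only 1.0/0.0, maps like these are what turn the numbers back into readable names. The sketch below applies maps shaped like the ones in the example to a few made-up encoded values and tallies them, as a plotting routine might; decode is a hypothetical helper, not an aif360 function.

```python
from typing import Dict, List, Sequence


def decode(values: Sequence[float], mapping: Dict[float, str]) -> List[str]:
    """Translate encoded floats back to descriptive strings."""
    return [mapping[v] for v in values]


label_map = {1.0: '>50K', 0.0: '<=50K'}
protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]

encoded_sex = [1.0, 0.0, 0.0]     # made-up encoded protected attribute
encoded_labels = [0.0, 1.0, 0.0]  # made-up encoded labels

print(decode(encoded_sex, protected_attribute_maps[0]))  # ['Male', 'Female', 'Female']
print(decode(encoded_labels, label_map))                 # ['<=50K', '>50K', '<=50K']

# Group counts keyed by readable names, e.g. for a bar chart.
counts: Dict[str, int] = {}
for name in decode(encoded_sex, protected_attribute_maps[0]):
    counts[name] = counts.get(name, 0) + 1
print(counts)  # {'Male': 1, 'Female': 2}
```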

Methods

align_datasets

Align the other dataset's features, labels and protected_attributes to this dataset.

convert_to_dataframe

Convert the StructuredDataset to a pandas.DataFrame.

copy

Convenience method to return a copy of this dataset.

export_dataset

Export the dataset and supporting attributes. TODO: The preferred file format is HDF.

import_dataset

Import the dataset and supporting attributes. TODO: The preferred file format is HDF.

split

Split this dataset into multiple partitions.

subset

Subset of the dataset based on position. Parameter indexes: an iterable containing row indexes.

temporarily_ignore

Temporarily add the fields provided to ignore_fields.

validate_dataset

Error checking and type validation.
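As we understand split, it accepts either a number of approximately equal folds or a list of fractions used as cut points, so ad.split([0.7], shuffle=True) would give a roughly 70/30 train/test pair. The pure-Python sketch below only mimics those semantics on row indices; split_indices is hypothetical, aif360 shuffles with NumPy's random generator (so shuffled orders will not match), and its exact rounding may differ.

```python
import random
from typing import List, Optional, Sequence, Union


def split_indices(
    n: int,
    num_or_size_splits: Union[int, Sequence[float]],
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Partition row indices 0..n-1 into folds.

    An int k gives k approximately equal folds; a list of fractions (< 1.0)
    gives cut points at int(fraction * n).
    """
    order = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(order)

    if isinstance(num_or_size_splits, int):
        base, extra = divmod(n, num_or_size_splits)
        # The first `extra` folds get one extra row (like numpy.array_split).
        sizes = [base + 1 if i < extra else base for i in range(num_or_size_splits)]
        cuts, total = [], 0
        for size in sizes[:-1]:
            total += size
            cuts.append(total)
    else:
        cuts = [int(f * n) for f in num_or_size_splits]

    bounds = [0] + cuts + [n]
    return [order[a:b] for a, b in zip(bounds, bounds[1:])]


train_idx, test_idx = split_indices(10, [0.7])
print(train_idx, test_idx)  # [0, 1, 2, 3, 4, 5, 6] [7, 8, 9]
print([len(f) for f in split_indices(10, 3)])  # [4, 3, 3]
```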
