aif360.datasets.AdultDataset
- class aif360.datasets.AdultDataset(label_name='income-per-year', favorable_classes=['>50K', '>50K.'], protected_attribute_names=['race', 'sex'], privileged_classes=[['White'], ['Male']], instance_weights_name=None, categorical_features=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'native-country'], features_to_keep=[], features_to_drop=['fnlwgt'], na_values=['?'], custom_preprocessing=None, metadata={'label_maps': [{0.0: '<=50K', 1.0: '>50K'}], 'protected_attribute_maps': [{0.0: 'Non-white', 1.0: 'White'}, {0.0: 'Female', 1.0: 'Male'}]})
Adult Census Income Dataset.
See aif360/data/raw/adult/README.md.
See StandardDataset for a description of the arguments.
Examples
The following will instantiate a dataset which uses the fnlwgt feature as instance weights:
>>> import numpy as np
>>> from aif360.datasets import AdultDataset
>>> ad = AdultDataset(instance_weights_name='fnlwgt',
...                   features_to_drop=[])
WARNING:root:Missing Data: 3620 rows removed from dataset.
>>> not np.all(ad.instance_weights == 1.)
True
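As a rough illustration of why non-unit instance weights matter, the sketch below compares an unweighted and a weighted favorable-outcome rate. It is plain Python and not part of aif360: the helper name favorable_rate and the toy values are assumptions made for this example only.

```python
def favorable_rate(labels, weights=None, favorable=1.0):
    """Share of favorable labels, optionally weighted per instance."""
    if weights is None:
        weights = [1.0] * len(labels)  # unweighted: every row counts once
    if len(weights) != len(labels):
        raise ValueError("labels and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    favorable_weight = sum(w for y, w in zip(labels, weights) if y == favorable)
    return favorable_weight / total


# Toy values for illustration; not taken from the Adult data.
labels = [1.0, 0.0, 0.0, 1.0]
weights = [100.0, 300.0, 500.0, 100.0]

unweighted = favorable_rate(labels)         # 2 of 4 rows are favorable
weighted = favorable_rate(labels, weights)  # 200 of 1000 weight units are favorable
```

With weights in play, rows that represent more of the population contribute more to the statistic, so the two rates can differ substantially.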
To instantiate a dataset which utilizes only numerical features and a single protected attribute, run:
>>> single_protected = ['sex']
>>> single_privileged = [['Male']]
>>> ad = AdultDataset(protected_attribute_names=single_protected,
...                   privileged_classes=single_privileged,
...                   categorical_features=[],
...                   features_to_keep=['age', 'education-num'])
>>> print(ad.feature_names)
['education-num', 'age', 'sex']
>>> print(ad.label_names)
['income-per-year']
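The feature list printed above contains sex even though only age and education-num were requested. A minimal sketch of one selection rule that produces this result is given below. It is an assumption about the behavior, not aif360's implementation: the helper resolve_kept_columns is hypothetical, it treats an empty features_to_keep as "keep everything", and it applies features_to_drop last.

```python
def resolve_kept_columns(all_columns, features_to_keep, features_to_drop,
                         protected_attribute_names, label_name,
                         categorical_features=()):
    """Return the retained columns in their original order (illustrative only)."""
    # Assumption: an empty features_to_keep means "keep every column".
    requested = list(features_to_keep) or list(all_columns)
    # Protected attributes, categorical features, and the label are always retained.
    keep = (set(requested) | set(protected_attribute_names)
            | set(categorical_features) | {label_name})
    # Assumption: explicit drops are applied after the union above.
    keep -= set(features_to_drop)
    return [c for c in all_columns if c in keep]


# A shortened, hypothetical column list for illustration.
columns = ['age', 'workclass', 'fnlwgt', 'education-num',
           'race', 'sex', 'income-per-year']

kept = resolve_kept_columns(
    columns,
    features_to_keep=['age', 'education-num'],
    features_to_drop=['fnlwgt'],
    protected_attribute_names=['sex'],
    label_name='income-per-year',
)
```

In this sketch the label column is still part of the retained set; in the dataset itself it is reported separately under label_names rather than feature_names.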
Note: the protected_attribute_names and label_name are kept even if they are not explicitly given in features_to_keep.
In some cases, it may be useful to keep track of a mapping from float -> str for protected attributes and/or labels. If our use case differs from the default, we can modify the mappings stored in metadata:
>>> label_maps = [{1.0: '>50K', 0.0: '<=50K'}]
>>> protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
>>> ad = AdultDataset(protected_attribute_names=['sex'],
...                   categorical_features=['workclass', 'education', 'marital-status',
...                                         'occupation', 'relationship', 'native-country', 'race'],
...                   privileged_classes=[['Male']],
...                   metadata={'label_maps': label_maps,
...                             'protected_attribute_maps': protected_attribute_maps})
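To make the float -> str mapping concrete, here is a small sketch of how such maps might be used to turn encoded values back into readable strings, for example when labeling a plot. It uses the list-of-dicts layout shown in the default metadata of the signature. The decode helper and the sample values are hypothetical and not part of aif360.

```python
# Maps in the same layout as the default metadata: one dict per label or attribute.
label_maps = [{1.0: '>50K', 0.0: '<=50K'}]
protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]


def decode(values, mapping):
    """Translate float codes to their descriptive strings."""
    return [mapping[float(v)] for v in values]


# Hypothetical encoded values for three rows.
labels = [1.0, 0.0, 0.0]
sex = [0.0, 1.0, 1.0]

decoded_labels = decode(labels, label_maps[0])
decoded_sex = decode(sex, protected_attribute_maps[0])
```

A code that is missing from the map raises KeyError here, which makes an incomplete mapping visible early.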
Note that race is now passed in categorical_features. The mappings in metadata stay attached to the dataset and can be used for more descriptive visualizations.
Methods
align_datasets: Align the other dataset's features, labels, and protected_attributes to this dataset.
convert_to_dataframe: Convert the StructuredDataset to a pandas.DataFrame.
copy: Convenience method to return a copy of this dataset.
export_dataset: Export the dataset and supporting attributes. TODO: The preferred file format is HDF.
import_dataset: Import the dataset and supporting attributes. TODO: The preferred file format is HDF.
split: Split this dataset into multiple partitions.
subset: Return a subset of the dataset based on position. The indexes parameter is an iterable which contains row indexes.
temporarily_ignore: Temporarily add the fields provided to ignore_fields.
validate_dataset: Error checking and type validation.
- __init__(label_name='income-per-year', favorable_classes=['>50K', '>50K.'], protected_attribute_names=['race', 'sex'], privileged_classes=[['White'], ['Male']], instance_weights_name=None, categorical_features=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'native-country'], features_to_keep=[], features_to_drop=['fnlwgt'], na_values=['?'], custom_preprocessing=None, metadata={'label_maps': [{0.0: '<=50K', 1.0: '>50K'}], 'protected_attribute_maps': [{0.0: 'Non-white', 1.0: 'White'}, {0.0: 'Female', 1.0: 'Male'}]})
See StandardDataset for a description of the arguments.