aif360.sklearn.datasets.fetch_adult
- aif360.sklearn.datasets.fetch_adult(subset='all', *, data_home=None, cache=True, binary_race=True, usecols=None, dropcols=None, numeric_only=False, dropna=True)
Load the Adult Census Income Dataset.
Binarizes ‘race’ to ‘White’ (privileged) or ‘Non-white’ (unprivileged). The other protected attribute is ‘sex’ (‘Male’ is privileged and ‘Female’ is unprivileged). The outcome variable is ‘annual-income’: ‘>50K’ (favorable) or ‘<=50K’ (unfavorable).
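The grouping described above can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: the helper names `binarize_race` and `privilege_flags` are hypothetical, and the sample labels are placeholders chosen only to show how the privileged and unprivileged groups are defined.

```python
# Illustrative only: hypothetical helpers, not part of aif360.

def binarize_race(race):
    """Map any race label to 'White' or 'Non-white'."""
    return 'White' if race == 'White' else 'Non-white'


def privilege_flags(race, sex):
    """Return one privilege flag per protected attribute."""
    return {
        'race': binarize_race(race) == 'White',  # 'White' is privileged
        'sex': sex == 'Male',                    # 'Male' is privileged
    }


# Placeholder (race, sex) pairs, made up for this illustration.
samples = [('White', 'Male'), ('Black', 'Female'), ('Other', 'Male')]
flags = [privilege_flags(race, sex) for race, sex in samples]
```

Each entry in `flags` reports the two protected attributes separately, since a row can be privileged on one attribute and unprivileged on the other.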
Note
By default, the data is downloaded from OpenML. See the OpenML page for the adult dataset for details.
- Parameters:
subset ({'train', 'test', or 'all'}, optional) – Select the dataset to load: ‘train’ for the training set, ‘test’ for the test set, ‘all’ for both.
data_home (string, optional) – Specify another download and cache folder for the datasets. By default all AIF360 datasets are stored in ‘aif360/sklearn/data/raw’ subfolders.
cache (bool) – Whether to cache downloaded datasets.
binary_race (bool, optional) – Group all non-white races together. Only the protected attribute is affected, not the feature column, unless numeric_only is True.
usecols (list-like, optional) – Feature column(s) to keep. All others are dropped.
dropcols (list-like, optional) – Feature column(s) to drop.
numeric_only (bool) – Drop all non-numeric feature columns.
dropna (bool) – Drop rows with NAs.
- Returns:
namedtuple – Tuple containing X, y, and sample_weights for the Adult dataset, accessible by index or name.
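As a rough sketch of what "accessible by index or name" means, the snippet below builds a stand-in namedtuple using only the standard library. The class name `AdultLike`, the third field's name, and all values are assumptions made for illustration. On a real return value, `._fields` shows the actual field names.

```python
from collections import namedtuple

# Hypothetical stand-in for the returned tuple. The class name and the name
# of the third field are assumptions; inspect `._fields` on the real object.
AdultLike = namedtuple('AdultLike', ['X', 'y', 'sample_weight'])

# Placeholder values, not real Adult data.
data = AdultLike(
    X=[[39, 13], [50, 13]],
    y=['<=50K', '>50K'],
    sample_weight=[1.0, 1.0],
)

features = data.X        # access by name
labels = data[1]         # access by index
X, y, weights = data     # unpack all three fields at once
```

Unpacking is convenient when all three parts are needed, while access by name keeps call sites readable when only one part is used.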
Examples
>>> adult = fetch_adult()
>>> adult.X.shape
(45222, 13)
>>> adult_num = fetch_adult(numeric_only=True)
>>> adult_num.X.shape
(48842, 5)
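The keyword-only parameters can be combined so that several subsets are loaded with identical options. The helper below is a sketch: `load_adult_splits` is a hypothetical name, not part of aif360, and it uses only parameters from the signature documented here. Calling it requires aif360 to be installed and, presumably, network access on the first download.

```python
def load_adult_splits(data_home=None, usecols=None):
    """Fetch the 'train' and 'test' subsets with the same options.

    Hypothetical convenience wrapper around fetch_adult.
    """
    # Imported lazily so that defining this helper does not require aif360.
    from aif360.sklearn.datasets import fetch_adult

    # Shared keyword arguments keep the two subsets consistent.
    options = dict(
        data_home=data_home,  # None uses the default cache location
        usecols=usecols,      # None keeps all feature columns
        numeric_only=True,    # drop non-numeric feature columns
        dropna=True,          # drop rows with NAs
    )
    train = fetch_adult('train', **options)
    test = fetch_adult('test', **options)
    return train, test
```

A call such as `train, test = load_adult_splits()` would then give two tuples whose `X` attributes share the same columns, which is usually what a downstream estimator expects.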