aif360.sklearn.datasets.fetch_adult

aif360.sklearn.datasets.fetch_adult(subset='all', *, data_home=None, cache=True, binary_race=True, usecols=None, dropcols=None, numeric_only=False, dropna=True)

Load the Adult Census Income Dataset.

When binary_race is True (the default), ‘race’ is binarized to ‘White’ (privileged) or ‘Non-white’ (unprivileged). The other protected attribute is ‘sex’ (‘Male’ is privileged and ‘Female’ is unprivileged). The outcome variable is ‘annual-income’: ‘>50K’ (favorable) or ‘<=50K’ (unfavorable).
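The grouping described above can be sketched in plain Python. This is only an illustration of the convention, not AIF360's implementation: the helper name binarize_race and the constants are hypothetical, and the race labels are assumed to be the five categories of the original Adult data.

```python
# Illustrative sketch only; binarize_race, PRIVILEGED and FAVORABLE_LABEL
# are hypothetical names, not part of the AIF360 API.
RACE_CATEGORIES = [
    "White", "Black", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other",
]

# Privileged value of each protected attribute, as described in the text.
PRIVILEGED = {"race": "White", "sex": "Male"}

# Favorable value of the outcome variable 'annual-income'.
FAVORABLE_LABEL = ">50K"


def binarize_race(race):
    """Map a raw race label to 'White' or 'Non-white'."""
    return "White" if race == PRIVILEGED["race"] else "Non-white"


binary = [binarize_race(r) for r in RACE_CATEGORIES]
print(binary)
```

Every category other than ‘White’ collapses into the single ‘Non-white’ group, which is why the protected attribute ends up with exactly two values.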

Note

By default, the data is downloaded from OpenML. See the OpenML ‘adult’ dataset page for details.

Parameters:
  • subset ({'train', 'test', or 'all'}, optional) – Select the dataset to load: ‘train’ for the training set, ‘test’ for the test set, ‘all’ for both.

  • data_home (string, optional) – Specify another download and cache folder for the datasets. By default all AIF360 datasets are stored in ‘aif360/sklearn/data/raw’ subfolders.

  • cache (bool) – Whether to cache downloaded datasets.

  • binary_race (bool, optional) – Group all non-white races together. Only the protected attribute is affected, not the feature column, unless numeric_only is True.

  • usecols (list-like, optional) – Feature column(s) to keep. All others are dropped.

  • dropcols (list-like, optional) – Feature column(s) to drop.

  • numeric_only (bool) – Drop all non-numeric feature columns.

  • dropna (bool) – Drop rows with NAs.
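As a rough sketch of how subset and usecols might be combined, the helper below fetches the train and test splits restricted to the same feature columns. The helper name load_adult_splits is hypothetical, and the column names are an assumption based on the OpenML ‘adult’ dataset; calling it requires aif360 to be installed and, on first use, network access.

```python
def load_adult_splits():
    """Fetch train/test splits limited to a few numeric feature columns."""
    # Imported lazily so that defining this helper does not require aif360.
    from aif360.sklearn.datasets import fetch_adult

    # Assumed column names; check them against the X columns of your install.
    numeric_cols = ["age", "education-num", "capital-gain",
                    "capital-loss", "hours-per-week"]

    # Passing the same usecols to both calls keeps the feature sets aligned.
    train = fetch_adult(subset="train", usecols=numeric_cols)
    test = fetch_adult(subset="test", usecols=numeric_cols)
    return train, test
```

Listing the columns once and reusing the list for both calls avoids a mismatch between the training and test features.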

Returns:

namedtuple – Tuple containing X, y, and sample_weights for the Adult dataset accessible by index or name.
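To show what "accessible by index or name" means in practice, here is a minimal sketch using a stand-in namedtuple. The type name AdultLike and the field names are assumptions made for illustration (inspect the _fields attribute of the real return value to confirm them), and the values are made-up placeholders rather than rows from the dataset.

```python
from collections import namedtuple

# Hypothetical stand-in for the tuple returned by fetch_adult;
# the field names here are assumed, not taken from the library.
AdultLike = namedtuple("AdultLike", ["X", "y", "sample_weight"])

# Placeholder values, not real Adult data.
data = AdultLike(
    X=[[39, 13], [50, 13]],
    y=["<=50K", ">50K"],
    sample_weight=[1.0, 1.0],
)

# Access by name and by index refer to the same objects.
same_object = data[0] is data.X

# Positional unpacking also works, as with any tuple.
X, y, sample_weight = data

print(data._fields)
```

Unpacking is convenient when the pieces are passed straight to an estimator, while access by name tends to read better in longer scripts.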

Examples

>>> from aif360.sklearn.datasets import fetch_adult
>>> adult = fetch_adult()
>>> adult.X.shape
(45222, 13)
>>> adult_num = fetch_adult(numeric_only=True)
>>> adult_num.X.shape
(48842, 5)