aif360.sklearn.datasets.fetch_compas

aif360.sklearn.datasets.fetch_compas(subset='all', *, data_home=None, cache=True, binary_race=False, usecols=['sex', 'age', 'age_cat', 'race', 'juv_fel_count', 'juv_misd_count', 'juv_other_count', 'priors_count', 'c_charge_degree', 'c_charge_desc'], dropcols=None, numeric_only=False, dropna=True)

Load the COMPAS Recidivism Risk Scores dataset.

Optionally binarizes ‘race’ to ‘Caucasian’ (privileged) or ‘African-American’ (unprivileged). The other protected attribute is ‘sex’ (‘Male’ is unprivileged and ‘Female’ is privileged). The outcome variable is ‘Survived’ (favorable) if the person was not accused of a crime within two years or ‘Recidivated’ (unfavorable) if they were.

Note

If numeric_only is True, the ‘sex’ variable is encoded as 1 for ‘Female’ and 0 for ‘Male’, the opposite of the convention used by other datasets.
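
As a minimal sketch of the encoding described in this note, the lookup below spells out which integer corresponds to which group. The names `COMPAS_SEX_CODES`, `PRIVILEGED_SEX_CODE` and `decode_sex` are hypothetical helpers introduced here for illustration; they are not part of aif360.

```python
# Encoding stated in the note above for numeric_only=True.
COMPAS_SEX_CODES = {"Female": 1, "Male": 0}

# 'Female' is documented as the privileged group, so its code is the privileged value.
PRIVILEGED_SEX_CODE = COMPAS_SEX_CODES["Female"]


def decode_sex(code):
    """Map a numeric 'sex' code back to its label (hypothetical helper)."""
    for label, value in COMPAS_SEX_CODES.items():
        if value == code:
            return label
    raise ValueError(f"unexpected 'sex' code: {code!r}")
```

Keeping the mapping in one place can help avoid accidentally treating 1 as ‘Male’ when code written for other datasets is reused.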

Parameters:
  • subset ({'all' or 'violent'}) – Use the violent recidivism or full version of the dataset. Note: ‘violent’ is not a strict subset of ‘all’ – there are four samples in ‘violent’ which do not show up in ‘all’.

  • data_home (string, optional) – Specify another download and cache folder for the datasets. By default all AIF360 datasets are stored in ‘aif360/sklearn/data/raw’ subfolders.

  • cache (bool) – Whether to cache downloaded datasets.

  • binary_race (bool, optional) – Keep only White (‘Caucasian’) and Black (‘African-American’) defendants.

  • usecols (single label or list-like, optional) – Feature column(s) to keep. All others are dropped.

  • dropcols (single label or list-like, optional) – Feature column(s) to drop.

  • numeric_only (bool) – Drop all non-numeric feature columns.

  • dropna (bool) – Drop rows with NAs.
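
A call combining several of these parameters might look like the sketch below. It assumes aif360 is installed and that the raw data can be downloaded on first use. `COMPAS_KWARGS` and `fetch_violent_compas` are illustrative names, not part of the library, and the column selection is an arbitrary subset of the default `usecols`.

```python
# Keyword arguments taken from the signature above; the specific values are illustrative.
COMPAS_KWARGS = {
    "subset": "violent",        # violent-recidivism version of the dataset
    "binary_race": True,        # restrict 'race' to the two documented groups
    "usecols": ["age", "priors_count", "c_charge_degree"],
    "dropna": True,             # drop rows with NAs
}


def fetch_violent_compas(data_home=None):
    """Fetch the 'violent' COMPAS data with the settings above (hypothetical wrapper)."""
    # Imported lazily so this module loads even where aif360 is not installed.
    from aif360.sklearn.datasets import fetch_compas

    return fetch_compas(data_home=data_home, cache=True, **COMPAS_KWARGS)
```

Passing a `data_home` path to `fetch_violent_compas` stores the download there instead of the default location.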

Returns:

namedtuple – Tuple containing X and y for the COMPAS dataset accessible by index or name.
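
The "by index or name" access can be sketched as follows. To keep the snippet runnable without a download, it uses a stand-in `Dataset` namedtuple with placeholder values; the stand-in, the helper `unpack_dataset`, and the assumption that X comes before y in the tuple are illustrative and not taken from the library's source.

```python
from collections import namedtuple


def unpack_dataset(dataset):
    """Return (X, y) from a namedtuple with fields X and y (hypothetical helper)."""
    X = dataset[0]   # access by index (assumes X is the first field)
    y = dataset.y    # access by name
    return X, y


# Stand-in tuple with placeholder values; a real one would come from fetch_compas().
Dataset = namedtuple("Dataset", ["X", "y"])
demo = Dataset(X=[[25, 0]], y=["Survived"])
features, labels = unpack_dataset(demo)

# With aif360 installed and network access, the same helper could be used as:
# from aif360.sklearn.datasets import fetch_compas
# X, y = unpack_dataset(fetch_compas())
```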