aif360.sklearn.datasets.fetch_german

aif360.sklearn.datasets.fetch_german(*, data_home=None, cache=True, binary_age=True, usecols=None, dropcols=None, numeric_only=False, dropna=True)

Load the German Credit Dataset.

Protected attributes are ‘sex’ (‘male’ is privileged and ‘female’ is unprivileged) and ‘age’ (binarized by default as recommended by [1]: age >= 25 is considered privileged and age < 25 is considered unprivileged; see the binary_age flag to keep this continuous). The outcome variable is ‘credit-risk’: ‘good’ (favorable) or ‘bad’ (unfavorable).
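The age rule stated above can be illustrated with a minimal, standalone sketch. It only mirrors the documented threshold (age >= 25 is privileged); the names AGE_THRESHOLD and is_privileged_age are hypothetical and are not part of AIF360, whose internal implementation may differ.

```python
# Threshold taken from the description above; the constant name is hypothetical.
AGE_THRESHOLD = 25


def is_privileged_age(age):
    """Return True when `age` falls in the privileged group (age >= 25)."""
    return age >= AGE_THRESHOLD


# A few illustrative ages, not drawn from the actual dataset.
ages = [19, 24, 25, 40, 67]
groups = [is_privileged_age(a) for a in ages]
print(groups)
```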

Note

By default, the data is downloaded from OpenML. See the credit-g page for details.

Parameters:
  • data_home (string, optional) – Specify another download and cache folder for the datasets. By default all AIF360 datasets are stored in ‘aif360/sklearn/data/raw’ subfolders.

  • cache (bool) – Whether to cache downloaded datasets.

  • binary_age (bool, optional) – If True, binarize the protected attribute ‘age’ into ‘aged’ (privileged) and ‘youth’ (unprivileged). Only the protected attribute is binarized; the ‘age’ feature column remains continuous.

  • usecols (list-like, optional) – Column name(s) to keep. All others are dropped.

  • dropcols (list-like, optional) – Column name(s) to drop.

  • numeric_only (bool) – Drop all non-numeric feature columns.

  • dropna (bool) – Drop rows with NAs.

Returns:

namedtuple – Tuple containing X and y for the German dataset accessible by index or name.
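Access "by index or name" works as it does for any Python namedtuple. The sketch below uses a hypothetical stand-in type, Dataset, with toy values; only the field names X and y come from the description above, and the real return type and contents may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for the returned tuple; only the field names
# X and y follow the documentation above.
Dataset = namedtuple("Dataset", ["X", "y"])

# Toy values in place of the real feature table and target.
data = Dataset(X=[[1, 2], [3, 4]], y=["good", "bad"])

by_name = data.X   # access by field name
by_index = data[0]  # access by position
X, y = data         # tuple unpacking, as in `X, y = fetch_german()`
```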

References

Examples

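>>> from sklearn.linear_model import LogisticRegression
>>> from aif360.sklearn.datasets import fetch_german
>>> from aif360.sklearn.metrics import disparate_impact_ratio
>>> german = fetch_german()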
>>> german.X.shape
(1000, 21)
>>> german_num = fetch_german(numeric_only=True)
>>> german_num.X.shape
(1000, 7)
>>> X, y = fetch_german(numeric_only=True)
>>> y_pred = LogisticRegression().fit(X, y).predict(X)
>>> disparate_impact_ratio(y, y_pred, prot_attr='age', priv_group=True,
... pos_label='good')
0.9483094846144106
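To help read the ratio in the last example, here is a rough pure-Python sketch of disparate impact, assuming the common definition: the rate of favorable predictions in the unprivileged group divided by the rate in the privileged group. The helper names and toy data are hypothetical, and the sketch does not reproduce everything the library function does (it takes no y_true or sample weights, for instance).

```python
def selection_rate(y_pred, mask, pos_label):
    """Fraction of favorable predictions among the rows where mask is True."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    return sum(p == pos_label for p in selected) / len(selected)


def disparate_impact_sketch(y_pred, privileged, pos_label):
    """Unprivileged selection rate divided by privileged selection rate."""
    unprivileged = [not p for p in privileged]
    return (selection_rate(y_pred, unprivileged, pos_label)
            / selection_rate(y_pred, privileged, pos_label))


# Toy predictions: 3 of 4 privileged rows and 2 of 4 unprivileged rows
# receive the favorable label 'good'.
y_pred = ["good", "good", "bad", "good", "good", "bad", "bad", "good"]
privileged = [True, True, True, True, False, False, False, False]

ratio = disparate_impact_sketch(y_pred, privileged, pos_label="good")
print(ratio)
```

Under this definition, a value of 1 means both groups receive the favorable prediction at the same rate, and values below 1 mean the unprivileged group receives it less often.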