aif360.sklearn.datasets.fetch_german(data_home=None, binary_age=True, usecols=[], dropcols=[], numeric_only=False, dropna=True)

Load the German Credit Dataset.

Protected attributes are ‘sex’ (‘male’ is privileged and ‘female’ is unprivileged) and ‘age’. By default, ‘age’ is binarized as recommended by [1]: age >= 25 is considered privileged and age < 25 is considered unprivileged. Set binary_age=False to keep this attribute continuous. The outcome variable is ‘credit-risk’: ‘good’ (favorable) or ‘bad’ (unfavorable).
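
The age threshold described above can be sketched in plain Python. This only illustrates the stated rule and is not the library's implementation; the helper `binarize_age` and the sample ages are hypothetical.

```python
# Hypothetical helper illustrating the age >= 25 rule described above;
# it is not part of AIF360.
def binarize_age(age, threshold=25):
    """Return True (privileged) if age >= threshold, else False (unprivileged)."""
    return age >= threshold

sample_ages = [19, 24, 25, 40]  # made-up values for illustration
privileged = [binarize_age(a) for a in sample_ages]
print(privileged)  # [False, False, True, True]
```

Note that the boundary value 25 falls in the privileged group, since the comparison is inclusive.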


By default, the data is downloaded from OpenML. See the credit-g page for details.

Parameters:
  • data_home (string, optional) – Specify another download and cache folder for the datasets. By default, all AIF360 datasets are stored in ‘aif360/sklearn/data/raw’ subfolders.
  • binary_age (bool, optional) – If True, split the protected attribute ‘age’ into ‘aged’ (privileged) and ‘youth’ (unprivileged). The ‘age’ feature column remains continuous.
  • usecols (single label or list-like, optional) – Column name(s) to keep. All others are dropped.
  • dropcols (single label or list-like, optional) – Column name(s) to drop.
  • numeric_only (bool) – Drop all non-numeric feature columns.
  • dropna (bool) – Drop rows with NAs.

Returns: namedtuple – Tuple containing X and y for the German dataset, accessible by index or by name.
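
As a rough illustration of what "accessible by index or name" means, the following sketch uses a stand-in namedtuple from the standard library. The type name `Dataset` and the toy values are assumptions for illustration only; the real object is produced by AIF360.

```python
from collections import namedtuple

# Hypothetical stand-in for the returned object; the actual type name and
# contents come from AIF360.
Dataset = namedtuple("Dataset", ["X", "y"])
data = Dataset(X=[[1, 2], [3, 4]], y=["good", "bad"])  # toy values

X, y = data               # tuple unpacking, as in `X, y = fetch_german()`
assert data[0] is data.X  # the same field, by index or by name
assert data[1] is data.y
```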


[1] F. Kamiran and T. Calders, “Classifying without discriminating,” 2nd International Conference on Computer, Control and Communication, 2009.


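>>> from sklearn.linear_model import LogisticRegression
>>> from aif360.sklearn.datasets import fetch_german
>>> from aif360.sklearn.metrics import disparate_impact_ratio
>>> german = fetch_german()
>>> german.X.shape
(1000, 21)
>>> german_num = fetch_german(numeric_only=True)
>>> german_num.X.shape
(1000, 7)
>>> X, y = fetch_german(numeric_only=True)
>>> y_pred = LogisticRegression().fit(X, y).predict(X)
>>> disparate_impact_ratio(y, y_pred, prot_attr='age', priv_group=True,
...                        pos_label='good')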
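
For intuition about the metric in the last call, the disparate impact ratio is commonly defined as the rate of favorable predictions in the unprivileged group divided by the same rate in the privileged group. The sketch below computes it in plain Python on made-up labels. The helper `selection_rate` and the toy data are hypothetical, and this is not AIF360's implementation.

```python
def selection_rate(y_pred, group, value, pos_label="good"):
    """Fraction of samples with `group == value` that are predicted `pos_label`."""
    preds = [p for p, g in zip(y_pred, group) if g == value]
    return sum(p == pos_label for p in preds) / len(preds)

# Made-up predictions and a binarized protected attribute (True = privileged).
y_pred = ["good", "good", "bad", "good", "bad", "good", "bad", "bad"]
is_priv = [True, True, True, True, False, False, False, False]

unpriv_rate = selection_rate(y_pred, is_priv, False)  # 1 of 4 predicted 'good'
priv_rate = selection_rate(y_pred, is_priv, True)     # 3 of 4 predicted 'good'
ratio = unpriv_rate / priv_rate
print(ratio)
```

A ratio of 1 means both groups receive favorable predictions at the same rate, while values below 1 mean the unprivileged group receives them less often.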