aif360.sklearn.datasets.fetch_german
- aif360.sklearn.datasets.fetch_german(*, data_home=None, cache=True, binary_age=True, usecols=None, dropcols=None, numeric_only=False, dropna=True)
Load the German Credit Dataset.
Protected attributes are ‘sex’ (‘male’ is privileged and ‘female’ is unprivileged) and ‘age’ (binarized by default, as recommended by [1]: age >= 25 is considered privileged and age < 25 is considered unprivileged; set binary_age=False to keep it continuous). The outcome variable is ‘credit-risk’: ‘good’ (favorable) or ‘bad’ (unfavorable).
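As a rough illustration of the encoding just described, the plain-Python sketch below maps raw values to privileged/unprivileged groups and favorable/unfavorable outcomes. It is not aif360's implementation: the helper names `age_group`, `is_privileged_sex` and `is_favorable` are hypothetical, the cut-off follows the age >= 25 rule stated above, and the ‘aged’/‘youth’ labels are taken from the binary_age description.

```python
# Hypothetical sketch of the group and label encoding described above;
# not aif360's implementation.

AGE_THRESHOLD = 25  # age >= 25 is privileged, age < 25 unprivileged


def age_group(age: float) -> str:
    """Binarize a continuous age into 'aged' (privileged) or 'youth' (unprivileged)."""
    return "aged" if age >= AGE_THRESHOLD else "youth"


def is_privileged_sex(sex: str) -> bool:
    """'male' is the privileged value of 'sex'; 'female' is unprivileged."""
    return sex == "male"


def is_favorable(credit_risk: str) -> bool:
    """'good' is the favorable value of 'credit-risk'; 'bad' is unfavorable."""
    return credit_risk == "good"


if __name__ == "__main__":
    # print the group assigned to a few sample ages around the threshold
    for age in (19, 24, 25, 67):
        print(age, age_group(age))
```

Note that the boundary value 25 falls in the privileged group under this rule.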
Note
By default, the data is downloaded from OpenML. See the credit-g page for details.
- Parameters:
data_home (string, optional) – Specify another download and cache folder for the datasets. By default all AIF360 datasets are stored in ‘aif360/sklearn/data/raw’ subfolders.
cache (bool) – Whether to cache downloaded datasets.
binary_age (bool, optional) – If True, split the protected attribute ‘age’ into ‘aged’ (privileged) and ‘youth’ (unprivileged). The ‘age’ feature remains continuous.
usecols (list-like, optional) – Column name(s) to keep. All others are dropped.
dropcols (list-like, optional) – Column name(s) to drop.
numeric_only (bool) – Drop all non-numeric feature columns.
dropna (bool) – Drop rows with NAs.
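The column-selection parameters above can be pictured with a small stand-in table. The sketch below is hypothetical: `select_columns` is not part of aif360, rows are plain dicts with `None` marking a missing value, and the order in which the four options are applied (usecols, then dropcols, then numeric_only, then dropna) is an assumption rather than documented behavior.

```python
# Hypothetical sketch of how usecols, dropcols, numeric_only and dropna could
# narrow a table; the order of the four steps is an assumption.
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def select_columns(
    rows: List[Row],
    usecols: Optional[Sequence[str]] = None,
    dropcols: Optional[Sequence[str]] = None,
    numeric_only: bool = False,
    dropna: bool = True,
) -> List[Row]:
    if not rows:
        return []
    columns = list(rows[0])
    if usecols is not None:
        # keep only the requested columns, in the order requested
        columns = [c for c in usecols if c in columns]
    if dropcols is not None:
        # remove the listed columns
        columns = [c for c in columns if c not in dropcols]
    if numeric_only:
        # keep a column only if all of its non-missing values are numbers
        columns = [
            c for c in columns
            if all(isinstance(r[c], (int, float)) for r in rows if r[c] is not None)
        ]
    result = [{c: r[c] for c in columns} for r in rows]
    if dropna:
        # drop any row that still has a missing value in a kept column
        result = [r for r in result if all(v is not None for v in r.values())]
    return result


# Toy rows using a few credit-g style column names; None marks a missing value.
example_rows = [
    {"age": 30, "duration": 12, "purpose": "car", "credit_amount": None},
    {"age": 22, "duration": 24, "purpose": "education", "credit_amount": 1500.0},
]

if __name__ == "__main__":
    print(select_columns(example_rows, numeric_only=True))
```

With these toy rows, numeric_only=True keeps age, duration and credit_amount, and dropna then removes the first row because its credit_amount is missing.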
- Returns:
namedtuple – Tuple containing X and y for the German dataset, accessible by index or by name.
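Because the return value is a namedtuple, its two fields can be read by name, by position, or by unpacking. The sketch below uses a hypothetical `Dataset` namedtuple with toy lists in place of the real X and y objects; the actual type name used by aif360 may differ.

```python
# Hypothetical stand-in for the (X, y) namedtuple returned by the loader;
# toy lists replace the real feature and label objects.
from collections import namedtuple

Dataset = namedtuple("Dataset", ["X", "y"])

german = Dataset(X=[[30, 12], [22, 24]], y=["good", "bad"])

features = german.X  # access by name
labels = german[1]   # access by index (same object as german.y)
X, y = german        # unpacking, as in `X, y = fetch_german(...)`

if __name__ == "__main__":
    print(features is X, labels is y)
```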
References
Examples
>>> from aif360.sklearn.datasets import fetch_german
>>> german = fetch_german()
>>> german.X.shape
(1000, 21)
>>> german_num = fetch_german(numeric_only=True)
>>> german_num.X.shape
(1000, 7)
>>> from sklearn.linear_model import LogisticRegression
>>> from aif360.sklearn.metrics import disparate_impact_ratio
>>> X, y = fetch_german(numeric_only=True)
>>> y_pred = LogisticRegression().fit(X, y).predict(X)
>>> disparate_impact_ratio(y, y_pred, prot_attr='age', priv_group=True,
...                        pos_label='good')
0.9483094846144106
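For intuition about the last example, the disparate impact ratio compares the rate of favorable predictions in the unprivileged group with that in the privileged group; a value of 1 means equal rates. The pure-Python sketch below is a simplified stand-in, not aif360's `disparate_impact_ratio`: the name `disparate_impact`, the toy data, and the omission of sample weights are all assumptions made here.

```python
# Simplified, hypothetical sketch of a disparate impact ratio (no sample
# weights, no guard against empty groups or a privileged group with no
# favorable predictions).
from typing import Hashable, List, Sequence


def selection_rate(y_pred: Sequence[Hashable], pos_label: Hashable) -> float:
    """Fraction of predictions equal to pos_label."""
    return sum(1 for p in y_pred if p == pos_label) / len(y_pred)


def disparate_impact(
    y_pred: Sequence[Hashable],
    groups: Sequence[Hashable],
    priv_group: Hashable,
    pos_label: Hashable,
) -> float:
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    priv: List[Hashable] = [p for p, g in zip(y_pred, groups) if g == priv_group]
    unpriv: List[Hashable] = [p for p, g in zip(y_pred, groups) if g != priv_group]
    return selection_rate(unpriv, pos_label) / selection_rate(priv, pos_label)


# Toy predictions and a boolean group label (True = privileged), invented here.
y_pred = ["good", "good", "good", "good", "bad", "bad"]
age = [True, True, True, False, False, False]

if __name__ == "__main__":
    # privileged rate is 3/3, unprivileged rate is 1/3, so the ratio is about 0.333
    print(disparate_impact(y_pred, age, priv_group=True, pos_label="good"))
```

Values below 1 indicate that the unprivileged group receives the favorable label less often than the privileged group.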