aif360.sklearn.datasets.fetch_german
aif360.sklearn.datasets.fetch_german(data_home=None, binary_age=True, usecols=[], dropcols=[], numeric_only=False, dropna=True)
Load the German Credit Dataset.
Protected attributes are ‘sex’ (‘male’ is privileged and ‘female’ is unprivileged) and ‘age’ (binarized by default as recommended by [1]: age >= 25 is considered privileged and age < 25 is considered unprivileged; set binary_age=False to keep it continuous). The outcome variable is ‘credit-risk’: ‘good’ (favorable) or ‘bad’ (unfavorable).
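The age rule above can be illustrated with a tiny sketch. The helper `binarize_age` and its `threshold` parameter are hypothetical and not part of AIF360; the sketch only encodes the rule stated here (age >= 25 privileged, age < 25 unprivileged), not the library's actual implementation.

```python
def binarize_age(ages, threshold=25):
    """Hypothetical helper (not part of AIF360).

    Returns True (privileged) for each age >= threshold and
    False (unprivileged) for each age < threshold.
    """
    return [age >= threshold for age in ages]


print(binarize_age([19, 24, 25, 67]))  # [False, False, True, True]
```

Only the protected attribute is binarized this way; per the binary_age description, the ‘age’ feature column itself stays continuous.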
Note
By default, the data is downloaded from OpenML. See the OpenML credit-g dataset page for details.
Parameters:
- data_home (string, optional) – Specify another download and cache folder for the datasets. By default, all AIF360 datasets are stored in ‘aif360/sklearn/data/raw’ subfolders.
- binary_age (bool, optional) – If True, split the protected attribute, ‘age’, into ‘aged’ (privileged) and ‘youth’ (unprivileged). The ‘age’ feature remains continuous.
- usecols (single label or list-like, optional) – Column name(s) to keep. All others are dropped.
- dropcols (single label or list-like, optional) – Column name(s) to drop.
- numeric_only (bool) – Drop all non-numeric feature columns.
- dropna (bool) – Drop rows with NAs.
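To make the interaction of usecols, dropcols, numeric_only and dropna concrete, here is a stdlib-only sketch that operates on a toy table stored as a dict of columns. The function name `select_data`, the table representation, and the order in which the filters are applied (usecols, then dropcols, then numeric_only, then dropna on the remaining columns) are assumptions for illustration; AIF360 itself works on pandas objects and may apply these steps differently.

```python
from typing import Dict, List

# A toy table: column name -> list of values; None marks a missing value.
Table = Dict[str, List[object]]


def _as_list(cols) -> List[str]:
    # Accept a single label or a list-like, mirroring the parameter types above.
    if isinstance(cols, str):
        return [cols]
    return list(cols)


def select_data(table: Table, usecols=(), dropcols=(),
                numeric_only: bool = False, dropna: bool = True) -> Table:
    """Hypothetical sketch of the usecols/dropcols/numeric_only/dropna filters."""
    # An empty usecols means "keep every column".
    keep = _as_list(usecols) or list(table)
    drop = set(_as_list(dropcols))
    cols = [c for c in keep if c not in drop]
    if numeric_only:
        # Keep columns whose non-missing values are all int or float
        # (bool also passes, since bool subclasses int in Python).
        cols = [c for c in cols
                if all(isinstance(v, (int, float))
                       for v in table[c] if v is not None)]
    out = {c: list(table[c]) for c in cols}
    if dropna and out:
        # Drop every row that has a missing value in any remaining column.
        n_rows = len(next(iter(out.values())))
        rows = [i for i in range(n_rows)
                if all(out[c][i] is not None for c in out)]
        out = {c: [out[c][i] for i in rows] for c in out}
    return out


german_like = {
    "age": [67, 22, None],
    "sex": ["male", "female", "male"],
    "duration": [6, 48, 12],
}
print(select_data(german_like, numeric_only=True))
# {'age': [67, 22], 'duration': [6, 48]}
```

Because this sketch runs dropna after column selection, a row whose only missing value sits in a dropped column survives; that ordering is a design choice of the sketch, not a documented AIF360 guarantee.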
Returns: namedtuple – Tuple containing X and y for the German dataset, accessible by index or by name.
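A namedtuple supports attribute access, positional indexing, and unpacking, which is why both `german.X` and `X, y = fetch_german(...)` work in the examples. The stdlib sketch below mimics that access pattern; the type name `GermanData` and the placeholder contents are hypothetical stand-ins, not AIF360's actual return type or data.

```python
from collections import namedtuple

# Hypothetical stand-in for the result type: a namedtuple with fields X and y.
GermanData = namedtuple("GermanData", ["X", "y"])

result = GermanData(X=[[67, 6], [22, 48]], y=["good", "bad"])

# Access by name ...
print(result.X)        # [[67, 6], [22, 48]]
# ... by index ...
print(result[1])       # ['good', 'bad']
# ... or by unpacking into two variables.
X, y = result
print(X is result.X)   # True
```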
References
[1] F. Kamiran and T. Calders, “Classifying without discriminating,” 2nd International Conference on Computer, Control and Communication, 2009.
Examples
>>> from aif360.sklearn.datasets import fetch_german
>>> german = fetch_german()
>>> german.X.shape
(1000, 21)
>>> german_num = fetch_german(numeric_only=True)
>>> german_num.X.shape
(1000, 7)
>>> from sklearn.linear_model import LogisticRegression
>>> from aif360.sklearn.metrics import disparate_impact_ratio
>>> X, y = fetch_german(numeric_only=True)
>>> y_pred = LogisticRegression().fit(X, y).predict(X)
>>> disparate_impact_ratio(y, y_pred, prot_attr='age', priv_group=True,
...     pos_label='good')
0.9483094846144106
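The disparate impact ratio in the last example compares favorable-outcome rates between groups: the rate of predicted pos_label outcomes in the unprivileged group divided by the rate in the privileged group, so 1.0 means parity and values below 1.0 mean the unprivileged group receives the favorable label less often. A minimal stdlib sketch of that ratio follows; the function name `di_ratio` and its list-based inputs are hypothetical, and unlike the library function it takes group labels as an explicit list rather than looking them up via prot_attr.

```python
from collections.abc import Hashable, Sequence


def di_ratio(y_pred: Sequence[Hashable], groups: Sequence[Hashable],
             priv_group: Hashable, pos_label: Hashable) -> float:
    """Hypothetical sketch of a disparate impact ratio.

    Returns P(y_pred == pos_label | unprivileged) / P(y_pred == pos_label | privileged),
    where every sample whose group differs from priv_group counts as unprivileged.
    """
    if len(y_pred) != len(groups):
        raise ValueError("y_pred and groups must have the same length")
    priv = [y == pos_label for y, g in zip(y_pred, groups) if g == priv_group]
    unpriv = [y == pos_label for y, g in zip(y_pred, groups) if g != priv_group]
    if not priv or not unpriv:
        raise ValueError("both the privileged and unprivileged groups need samples")
    priv_rate = sum(priv) / len(priv)        # favorable rate, privileged group
    unpriv_rate = sum(unpriv) / len(unpriv)  # favorable rate, unprivileged group
    # Raises ZeroDivisionError if the privileged group never gets pos_label.
    return unpriv_rate / priv_rate


# Toy data: True = age >= 25 (privileged), False = age < 25 (unprivileged).
y_pred = ["good", "good", "good", "good", "good", "bad", "good", "bad"]
age = [True, True, True, True, False, False, False, False]
print(di_ratio(y_pred, age, priv_group=True, pos_label="good"))  # 0.5
```

Here the privileged group is predicted ‘good’ 4 out of 4 times and the unprivileged group 2 out of 4 times, giving 0.5 / 1.0 = 0.5.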