aif360.datasets.GermanDataset

class aif360.datasets.GermanDataset(label_name='credit', favorable_classes=[1], protected_attribute_names=['sex', 'age'], privileged_classes=[['male'], <function GermanDataset.<lambda>>], instance_weights_name=None, categorical_features=['status', 'credit_history', 'purpose', 'savings', 'employment', 'other_debtors', 'property', 'installment_plans', 'housing', 'skill_level', 'telephone', 'foreign_worker'], features_to_keep=[], features_to_drop=['personal_status'], na_values=[], custom_preprocessing=<function default_preprocessing>, metadata={'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}], 'protected_attribute_maps': [{1.0: 'Male', 0.0: 'Female'}, {1.0: 'Old', 0.0: 'Young'}]})

German credit dataset.

See aif360/data/raw/german/README.md.

See StandardDataset for a description of the arguments.

By default, this code converts the ‘age’ attribute to a binary value where privileged is age > 25 and unprivileged is age <= 25, as proposed by Kamiran and Calders [1].
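
As a minimal sketch of this default behavior (it assumes the raw data files described in the README above are in place, and the threshold function shown approximates the default lambda rather than quoting the exact source code):

>>> from aif360.datasets import GermanDataset
>>> gd = GermanDataset()             # 'sex' and 'age' are protected by default
>>> gd.protected_attribute_names
['sex', 'age']
>>> # The age rule described above corresponds to passing a threshold
>>> # function for 'age', e.g. something like:
>>> gd = GermanDataset(privileged_classes=[['male'], lambda x: x > 25])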

References

[1] F. Kamiran and T. Calders, “Classifying without discriminating,” 2nd International Conference on Computer, Control and Communication, 2009.

Examples

In some cases, it may be useful to keep track of a mapping from float -> str for protected attributes and/or labels. If our use case differs from the default, we can modify the mapping stored in metadata:

>>> label_map = {1.0: 'Good Credit', 0.0: 'Bad Credit'}
>>> protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
>>> gd = GermanDataset(protected_attribute_names=['sex'],
...     privileged_classes=[['male']],
...     metadata={'label_maps': [label_map],
...               'protected_attribute_maps': protected_attribute_maps})

Now this information will stay attached to the dataset and can be used for more descriptive visualizations.
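
As a quick follow-up sketch (it assumes the constructor call above succeeded and that the passed dict is exposed as the dataset's metadata attribute), the maps can be read back to label printed summaries or plots:

>>> gd.metadata['label_maps'][0][1.0]          # name of the favorable label
'Good Credit'
>>> gd.metadata['protected_attribute_maps'][0][1.0]
'Male'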

Methods

align_datasets        Align the other dataset's features, labels and protected_attributes to this dataset.
convert_to_dataframe  Convert the StructuredDataset to a pandas.DataFrame.
copy                  Convenience method to return a copy of this dataset.
export_dataset        Export the dataset and supporting attributes (TODO: the preferred file format is HDF).
import_dataset        Import the dataset and supporting attributes (TODO: the preferred file format is HDF).
split                 Split this dataset into multiple partitions (usage sketched after this list).
subset                Return a subset of the dataset based on the given row indexes.
temporarily_ignore    Temporarily add the fields provided to ignore_fields.
validate_dataset      Error checking and type validation.
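
As a rough usage sketch of a few of these methods (hedged: the authoritative signatures live in StandardDataset/StructuredDataset, and the raw data files must be in place for the constructor to succeed):

>>> gd = GermanDataset()
>>> train, test = gd.split([0.7], shuffle=True)   # 70/30 partition
>>> df, attrs = gd.convert_to_dataframe()         # DataFrame plus a dict of supporting attributes
>>> gd_copy = gd.copy(deepcopy=True)              # independent copy, safe to transform
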
__init__(label_name='credit', favorable_classes=[1], protected_attribute_names=['sex', 'age'], privileged_classes=[['male'], <function GermanDataset.<lambda>>], instance_weights_name=None, categorical_features=['status', 'credit_history', 'purpose', 'savings', 'employment', 'other_debtors', 'property', 'installment_plans', 'housing', 'skill_level', 'telephone', 'foreign_worker'], features_to_keep=[], features_to_drop=['personal_status'], na_values=[], custom_preprocessing=<function default_preprocessing>, metadata={'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}], 'protected_attribute_maps': [{1.0: 'Male', 0.0: 'Female'}, {1.0: 'Old', 0.0: 'Young'}]})

See StandardDataset for a description of the arguments.
