aif360.sklearn.detectors.FACTS

class aif360.sklearn.detectors.FACTS(clf, prot_attr, categorical_features=None, freq_itemset_min_supp=0.1, feature_weights={}, feats_allowed_to_change=None, feats_not_allowed_to_change=None)[source]

Fairness aware counterfactuals for subgroups (FACTS) detector.

FACTS is an efficient, model-agnostic, highly parameterizable, and explainable framework for evaluating subgroup fairness through counterfactual explanations [1].

This class is a wrapper for the various methods exposed by the FACTS framework.

References

Parameters:
  • clf (sklearn.base.BaseEstimator) – A trained, ready-to-use classifier that implements predict(X), where X is the matrix of features, and whose predictions are either 0 or 1. In practice, any fitted scikit-learn classifier.

  • prot_attr (str) – the name of the column that represents the protected attribute.

  • categorical_features (list(str), optional) – the list of categorical features. The default is to choose (dynamically, inside fit) the columns of the dataset with types “object” or “category”.

  • freq_itemset_min_supp (float, optional) – minimum support for all the runs of the frequent itemset mining algorithm (specifically, FP Growth). We mine frequent itemsets to generate candidate subpopulation groups and candidate actions. For more information, see paper [1]. Defaults to 0.1 (10%).

  • feature_weights (dict(str, float), optional) – the weights for each feature. Used in the calculation of the cost of a suggested change. Specifically, the term corresponding to each feature is multiplied by this weight. Defaults to 1, for all features.

  • feats_allowed_to_change (list(str), optional) – if provided, only allows these features to change value in the suggested recourses. Default: no frozen features. Note: providing both feats_allowed_to_change and feats_not_allowed_to_change is currently treated as an error.

  • feats_not_allowed_to_change (list(str), optional) – if provided, prevents these features from changing at all in any given recourse. Default: no frozen features. Note: providing both feats_allowed_to_change and feats_not_allowed_to_change is currently treated as an error.
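As a rough illustration of how feature_weights and the two feats_* options described above could interact, the following is a minimal sketch, not the library's implementation. The helper names (check_change_constraints, action_cost) are hypothetical, and the per-feature cost term (absolute difference for numeric features, 1 for any other change) is an assumption made only for this example.

```python
from typing import Dict, List, Optional


def check_change_constraints(
    feats_allowed_to_change: Optional[List[str]],
    feats_not_allowed_to_change: Optional[List[str]],
) -> None:
    """Reject the case where both options are supplied (hypothetical helper)."""
    if feats_allowed_to_change is not None and feats_not_allowed_to_change is not None:
        raise ValueError(
            "feats_allowed_to_change and feats_not_allowed_to_change "
            "cannot be given simultaneously."
        )


def action_cost(
    before: Dict[str, object],
    after: Dict[str, object],
    feature_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Sum the weighted per-feature terms of a suggested change.

    Assumed per-feature term: |new - old| for numeric features, 1.0 otherwise.
    Features missing from feature_weights get weight 1.
    """
    weights = feature_weights or {}
    total = 0.0
    for feat, old in before.items():
        new = after[feat]
        if old == new:
            continue  # unchanged features contribute nothing
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            term = float(abs(new - old))
        else:
            term = 1.0
        total += weights.get(feat, 1.0) * term
    return total


before = {"age": 30, "job": "clerk", "hours": 40}
after = {"age": 30, "job": "manager", "hours": 45}

base_cost = action_cost(before, after)                      # all weights 1
weighted_cost = action_cost(before, after, {"hours": 2.0})  # hours counts double
```

With these inputs the unweighted cost is 1 (job change) plus 5 (hours), and doubling the weight of hours raises only that feature's term.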

Methods

bias_scan

Examines generated subgroups and calculates the top_count most unfair ones, with respect to the chosen metric.

fit

Calculates subpopulation groups, actions, and their respective effectiveness.

get_metadata_routing

Get metadata routing of this object.

get_params

Get parameters for this estimator.

print_recourse_report

Prints a nicely formatted report of the results (subpopulation groups and recourses) discovered by the bias_scan method.

set_fit_request

Request metadata passed to the fit method.

set_params

Set the parameters of this estimator.

bias_scan(metric: str = 'equal-effectiveness', viewpoint: str = 'macro', sort_strategy: str = 'max-cost-diff-decr', top_count: int = 10, filter_sequence: List[str] = [], phi: float = 0.5, c: float = 0.5)[source]

Examines generated subgroups and calculates the top_count most unfair ones, with respect to the chosen metric.

Stores the final groups in instance variable self.top_rules and the respective subgroup costs in self.subgroup_costs (or self.unfairness for the “fair-tradeoff” metric).

Parameters:
  • metric (str, optional) –

    one of the following choices

    • “equal-effectiveness”

    • “equal-choice-for-recourse”

    • “equal-effectiveness-within-budget”

    • “equal-cost-of-effectiveness”

    • “equal-mean-recourse”

    • “fair-tradeoff”

    Defaults to “equal-effectiveness”.

    For explanation of each of those metrics, refer either to the paper [1] or the demo_FACTS notebook.

  • viewpoint (str, optional) –

    “macro” or “micro”. Refers to the notions of “macro viewpoint” and “micro viewpoint” defined in section 2.2 of the paper [1].

    As a short explanation, consider a set of actions A and a subgroup (cohort / set of individuals) G. Metrics with the macro viewpoint interpretation are constrained to always apply one action from A to the entire G, while metrics with the micro interpretation are allowed to give each individual in G the min-cost action from A which changes the individual’s class.

    Note that not all combinations of metric and viewpoint are valid, e.g. “Equal Choice for Recourse” only has a macro interpretation.

    Defaults to “macro”.

  • sort_strategy (str, optional) –

    one of the following choices

    • "max-cost-diff-decr": simply rank the groups in descending order according to the unfairness metric.

    • "max-cost-diff-decr-ignore-forall-subgroups-empty": ignore groups for which we have no available actions whatsoever.

    • "max-cost-diff-decr-ignore-exists-subgroup-empty": ignore groups for which at least one protected subgroup has no available actions.

    Defaults to “max-cost-diff-decr”.

  • top_count (int, optional) – the number of subpopulation groups that the algorithm will keep. Defaults to 10.

  • filter_sequence (List[str], optional) –

    List of various filters applied on the groups and / or actions. Available filters are:

    • "remove-contained": does not show groups which are subsumed by other shown groups. By “subsumed” we mean that the group is defined by extra feature values, but those values are not changed by any action.

    • "remove-below-thr-corr": does not show actions which are below the given effectiveness threshold. Refer also to the documentation of parameter phi below.

    • "remove-above-thr-cost": does not show actions that cost more than the given cost budget. Refer also to the documentation of parameter c below.

    • "keep-rules-until-thr-corr-reached":

    • "remove-fair-rules": do not show groups which do not exhibit bias.

    • "keep-only-min-change": for each group shown, show only the suggested actions that have minimum cost, ignore the others.

    Defaults to [].

  • phi (float, optional) – effectiveness threshold. Real number in [0, 1]. Applicable for the “equal-choice-for-recourse” and “equal-cost-of-effectiveness” metrics. For these two metrics, an action is considered to achieve recourse for a subpopulation group if at least a fraction phi of the group’s individuals achieve recourse. Defaults to 0.5.

  • c (float, optional) – cost budget. Real number. Applicable for the “equal-effectiveness-within-budget” metric. Specifies the maximum cost that can be paid for an action (by the individual, by a central authority, etc.). Defaults to 0.5.
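The macro and micro viewpoints and the phi threshold described above can be illustrated with a small, self-contained sketch. It does not use the library; the data layout (a mapping from each action to the set of individuals whose prediction it flips) and all function names are assumptions chosen for the example.

```python
from typing import Dict, List, Optional, Set


def macro_effectiveness(flips: Dict[str, Set[str]], group_size: int) -> Dict[str, float]:
    """Macro viewpoint: one action is applied to the whole group, so each
    action is scored by the fraction of the group it moves to the positive class."""
    return {action: len(people) / group_size for action, people in flips.items()}


def micro_assignment(
    flips: Dict[str, Set[str]],
    costs: Dict[str, float],
    individuals: List[str],
) -> Dict[str, Optional[str]]:
    """Micro viewpoint: each individual gets the min-cost action that
    changes their own prediction (None if no action works for them)."""
    assignment: Dict[str, Optional[str]] = {}
    for person in individuals:
        effective = [a for a, people in flips.items() if person in people]
        assignment[person] = min(effective, key=lambda a: costs[a]) if effective else None
    return assignment


def actions_meeting_threshold(
    flips: Dict[str, Set[str]], group_size: int, phi: float = 0.5
) -> List[str]:
    """Actions that achieve recourse for at least a fraction phi of the group."""
    eff = macro_effectiveness(flips, group_size)
    return [action for action, value in eff.items() if value >= phi]


individuals = ["p1", "p2", "p3", "p4"]
flips = {
    "raise_hours": {"p1", "p2"},       # flips the prediction for p1 and p2
    "change_job": {"p2", "p3", "p4"},  # flips the prediction for p2, p3 and p4
}
costs = {"raise_hours": 1.0, "change_job": 3.0}

macro = macro_effectiveness(flips, len(individuals))
micro = micro_assignment(flips, costs, individuals)
above_default = actions_meeting_threshold(flips, len(individuals))        # phi = 0.5
above_strict = actions_meeting_threshold(flips, len(individuals), 0.6)    # phi = 0.6
```

In this toy group, individual p2 could use either action; the micro viewpoint gives p2 the cheaper one, while the macro viewpoint only asks how much of the group a single action covers.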

fit(X: DataFrame, verbose: bool = True)[source]

Calculates subpopulation groups, actions, and their respective effectiveness.

Parameters:
  • X (DataFrame) – Dataset given as a pandas.DataFrame. As in standard scikit-learn convention, it is expected to contain one instance per row and one feature / explanatory variable per column (labels not needed, we already have an ML model).

  • verbose (bool) – whether to print intermediate messages and progress bar. Defaults to True.

Raises:
  • ValueError – feats_allowed_to_change and feats_not_allowed_to_change cannot be given simultaneously.

  • Exception – when unreachable code is executed.

Returns:

FACTS – Returns self.
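To give some intuition for how a minimum support such as freq_itemset_min_supp prunes candidate subpopulation groups during fit, here is a minimal sketch. It uses brute-force counting rather than FP Growth, works on plain dictionaries rather than a DataFrame, and the function name and the max_len cap are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, object]  # a (feature, value) pair


def frequent_itemsets(
    rows: List[Dict[str, object]], min_supp: float = 0.1, max_len: int = 2
) -> Dict[FrozenSet[Item], float]:
    """Return every set of (feature, value) pairs whose support, i.e. the
    fraction of rows containing it, is at least min_supp.

    Brute-force counting for clarity; not the FP Growth algorithm.
    """
    counts: Counter = Counter()
    for row in rows:
        items = sorted(row.items(), key=lambda kv: kv[0])  # order by feature name
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    n_rows = len(rows)
    return {
        itemset: count / n_rows
        for itemset, count in counts.items()
        if count / n_rows >= min_supp
    }


rows = [
    {"sex": "F", "job": "clerk"},
    {"sex": "F", "job": "clerk"},
    {"sex": "M", "job": "clerk"},
    {"sex": "M", "job": "manager"},
]

# With a 50% minimum support, rare combinations such as job=manager are dropped.
candidates = frequent_itemsets(rows, min_supp=0.5)
```

Raising the minimum support yields fewer, larger candidate groups; lowering it yields more, smaller ones at the price of more computation.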

print_recourse_report(population_sizes=None, missing_subgroup_val='N/A', show_subgroup_costs=False, show_action_costs=False, show_cumulative_plots=False, show_bias=None, show_unbiased_subgroups=True, correctness_metric=False)[source]

Prints a nicely formatted report of the results (subpopulation groups and recourses) discovered by the bias_scan method.

Parameters:
  • population_sizes (dict(str, int), optional) – Number of individuals that are given the negative prediction by the model, for each subgroup. If given, it is included in the report together with some coverage percentages.

  • missing_subgroup_val (str, optional) – Optionally specify a value of the protected attribute which denotes that it is missing and should not be included in the printed results. Defaults to “N/A”.

  • show_subgroup_costs (bool, optional) – Whether to show the costs assigned to each protected subgroup. Defaults to False.

  • show_action_costs (bool, optional) – Whether to show the costs assigned to each specific action. Defaults to False.

  • show_cumulative_plots (bool, optional) – If true, shows, for each subgroup, a graph of the effectiveness cumulative distribution, as it is called in [1]. Defaults to False.

  • show_bias (str, optional) – Specify which value of the protected attribute corresponds to the subgroup against which we want to find unfairness. Mainly useful for when the protected attribute is not binary (e.g. race). Defaults to None.

  • correctness_metric (bool, optional) – if True, the metric is considered to quantify utility, i.e. the greater it is for a group, the more beneficial it is for the individuals of the group. Defaults to False.

  • metric_name (str, optional) – If given, it is added to the printed message for unfairness in a subpopulation group, i.e. the method prints “Bias against females due to <metric_name>”.

Raises:

RuntimeError – if costs for groups and subgroups are empty. Most likely the bias_scan method was not run.

set_fit_request(*, verbose: bool | None | str = '$UNCHANGED$') → FACTS[source]

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

verbose (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for verbose parameter in fit.

Returns:

self (object) – The updated object.