aif360.sklearn.detectors.FACTS
- class aif360.sklearn.detectors.FACTS(clf, prot_attr, categorical_features=None, freq_itemset_min_supp=0.1, feature_weights={}, feats_allowed_to_change=None, feats_not_allowed_to_change=None)
Fairness aware counterfactuals for subgroups (FACTS) detector.
FACTS is an efficient, model-agnostic, highly parameterizable, and explainable framework for evaluating subgroup fairness through counterfactual explanations [1].
This class is a wrapper for the various methods exposed by the FACTS framework.
References
- Parameters:
clf (sklearn.base.BaseEstimator) – A trained, ready-to-use classifier that implements predict(X), where X is the matrix of features; predictions returned by predict(X) are either 0 or 1. In other words, a fitted scikit-learn classifier.
prot_attr (str) – the name of the column that represents the protected attribute.
categorical_features (list(str), optional) – the list of categorical features. By default, the columns of the dataset with types “object” or “category” are chosen dynamically inside fit.
freq_itemset_min_supp (float, optional) – minimum support for all runs of the frequent itemset mining algorithm (specifically, FP Growth). Frequent itemsets are mined to generate candidate subpopulation groups and candidate actions. For more information, see the paper [1]. Defaults to 0.1 (10%).
feature_weights (dict(str, float), optional) – the weight of each feature, used when calculating the cost of a suggested change: the term corresponding to each feature is multiplied by its weight. Defaults to 1 for all features.
feats_allowed_to_change (list(str), optional) – if provided, only these features are allowed to change value in the suggested recourses. Default: no frozen features. Note: providing both feats_allowed_to_change and feats_not_allowed_to_change is currently treated as an error.
feats_not_allowed_to_change (list(str), optional) – if provided, these features are prevented from changing at all in any given recourse. Default: no frozen features. Note: providing both feats_allowed_to_change and feats_not_allowed_to_change is currently treated as an error.
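The role of feature_weights described above can be illustrated with a minimal sketch. This is not the library's cost function: the per-feature term used here (absolute difference for numeric values, 1 for any categorical change) and the helper name weighted_change_cost are assumptions made only for illustration. What the sketch does take from the description is that each feature's term is multiplied by its weight, and that a missing weight counts as 1.

```python
from typing import Any, Dict, Optional


def weighted_change_cost(
    before: Dict[str, Any],
    after: Dict[str, Any],
    feature_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Toy cost of moving an individual from `before` to `after`."""
    weights = feature_weights or {}
    total = 0.0
    for feat, old in before.items():
        new = after[feat]
        if new == old:
            continue  # unchanged features contribute nothing
        # ASSUMPTION: per-feature term is |new - old| for numbers, 1 otherwise.
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            term = float(abs(new - old))
        else:
            term = 1.0
        # Each term is scaled by the feature's weight (1 when not specified).
        total += weights.get(feat, 1.0) * term
    return total


# Hypothetical individual and suggested change.
before = {"education": "HS", "hours_per_week": 30}
after = {"education": "Bachelors", "hours_per_week": 40}

cost_default = weighted_change_cost(before, after)
cost_weighted = weighted_change_cost(before, after, {"education": 5.0})
```

With the default weights both changes count equally per unit; raising the weight of "education" makes recourses that change it proportionally more expensive.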
Methods
bias_scan
Examines generated subgroups and calculates the top_count most unfair ones, with respect to the chosen metric.
fit
Calculates subpopulation groups, actions and respective effectiveness.
get_metadata_routing
Get metadata routing of this object.
get_params
Get parameters for this estimator.
print_recourse_report
Prints a nicely formatted report of the results (subpopulation groups and recourses) discovered by the bias_scan method.
set_fit_request
Request metadata passed to the fit method.
set_params
Set the parameters of this estimator.
- bias_scan(metric: str = 'equal-effectiveness', viewpoint: str = 'macro', sort_strategy: str = 'max-cost-diff-decr', top_count: int = 10, filter_sequence: List[str] = [], phi: float = 0.5, c: float = 0.5)
Examines generated subgroups and calculates the top_count most unfair ones, with respect to the chosen metric.
Stores the final groups in the instance variable self.top_rules and the respective subgroup costs in self.subgroup_costs (or self.unfairness for the “fair-tradeoff” metric).
- Parameters:
metric (str, optional) – one of the following choices:
"equal-effectiveness"
"equal-choice-for-recourse"
"equal-effectiveness-within-budget"
"equal-cost-of-effectiveness"
"equal-mean-recourse"
"fair-tradeoff"
Defaults to "equal-effectiveness".
For an explanation of each of these metrics, refer either to the paper [1] or the demo_FACTS notebook.
viewpoint (str, optional) – “macro” or “micro”. Refers to the notions of “macro viewpoint” and “micro viewpoint” defined in section 2.2 of the paper [1].
As a short explanation, consider a set of actions A and a subgroup (cohort / set of individuals) G. Metrics with the macro interpretation must apply a single action from A to the entire G, while metrics with the micro interpretation may give each individual in G the minimum-cost action from A that changes that individual’s class.
Note that not all combinations of metric and viewpoint are valid, e.g. “Equal Choice for Recourse” only has a macro interpretation.
Defaults to “macro”.
sort_strategy (str, optional) – one of the following choices:
"max-cost-diff-decr": simply rank the groups in descending order according to the unfairness metric.
"max-cost-diff-decr-ignore-forall-subgroups-empty": ignore groups for which we have no available actions whatsoever.
"max-cost-diff-decr-ignore-exists-subgroup-empty": ignore groups for which at least one protected subgroup has no available actions.
Defaults to "max-cost-diff-decr".
top_count (int, optional) – the number of subpopulation groups that the algorithm will keep. Defaults to 10.
filter_sequence (List[str], optional) – list of filters applied to the groups and / or actions. Available filters are:
"remove-contained": does not show groups which are subsumed by other shown groups. By “subsumed” we mean that the group is defined by extra feature values, but those values are not changed by any action.
"remove-below-thr-corr": does not show actions which are below the given effectiveness threshold. Refer also to the documentation of parameter phi below.
"remove-above-thr-cost": does not show actions that cost more than the given cost budget. Refer also to the documentation of parameter c below.
"keep-rules-until-thr-corr-reached":
"remove-fair-rules": does not show groups which do not exhibit bias.
"keep-only-min-change": for each group shown, shows only the suggested actions that have minimum cost and ignores the others.
Defaults to [].
phi (float, optional) – effectiveness threshold. Real number in [0, 1]. Applicable to the “equal-choice-for-recourse” and “equal-cost-of-effectiveness” metrics. For these two metrics, an action is considered to achieve recourse for a subpopulation group if at least a fraction phi of the group’s individuals achieve recourse. Defaults to 0.5.
c (float, optional) – cost budget. Real number. Applicable to the “equal-effectiveness-within-budget” metric. Specifies the maximum cost that can be paid for an action (by the individual, by a central authority, etc.). Defaults to 0.5.
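The macro and micro viewpoints described under viewpoint can be made concrete with a small pure-Python sketch. It does not use the library: the toy data (flips, costs) and the helper names are hypothetical, and taking the best single action as the macro effectiveness is an assumption about how the viewpoint is aggregated. The last line shows how a threshold such as phi could then be compared against an effectiveness value.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: for each individual in subgroup G, the actions from A
# that would flip that individual's prediction to the favourable class.
flips: List[Set[str]] = [
    {"a1"},        # helped only by a1
    {"a2"},        # helped only by a2
    {"a1", "a2"},  # helped by either action
    set(),         # helped by no action in A
]
# Hypothetical cost of each action in A.
costs: Dict[str, float] = {"a1": 1.0, "a2": 2.0}


def macro_effectiveness(flips: List[Set[str]], actions: List[str]) -> float:
    """Macro: one action is applied to all of G; report the best such action."""
    return max(sum(a in f for f in flips) / len(flips) for a in actions)


def micro_assignment(
    flips: List[Set[str]], costs: Dict[str, float]
) -> List[Optional[str]]:
    """Micro: each individual gets their own min-cost flipping action, if any."""
    return [min(f, key=costs.__getitem__) if f else None for f in flips]


macro = macro_effectiveness(flips, list(costs))
chosen = micro_assignment(flips, costs)
micro = sum(a is not None for a in chosen) / len(chosen)

# An effectiveness threshold in [0, 1], in the spirit of the phi parameter.
phi = 0.5
macro_reaches_phi = macro >= phi
```

In this toy subgroup no single action helps more than half of the individuals, while choosing an action per individual helps three out of four, which is why the two viewpoints can rank subgroups differently.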
- fit(X: DataFrame, verbose: bool = True)
Calculates subpopulation groups, actions and their respective effectiveness.
- Parameters:
X (DataFrame) – Dataset given as a pandas.DataFrame. As in standard scikit-learn convention, it is expected to contain one instance per row and one feature / explanatory variable per column (labels are not needed, since a trained ML model is already available).
verbose (bool) – whether to print intermediate messages and a progress bar. Defaults to True.
- Raises:
ValueError – feats_allowed_to_change and feats_not_allowed_to_change cannot be given simultaneously.
Exception – when unreachable code is executed.
- Returns:
FACTS – Returns self.
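The ValueError listed above follows from the two feature-freezing parameters being mutually exclusive. The sketch below shows one way such a check and the resulting set of changeable features might be computed. The helper resolve_changeable_features is hypothetical and is not part of aif360; the exact condition the library tests is an assumption.

```python
from typing import List, Optional


def resolve_changeable_features(
    all_features: List[str],
    feats_allowed_to_change: Optional[List[str]] = None,
    feats_not_allowed_to_change: Optional[List[str]] = None,
) -> List[str]:
    """Return the features that recourse actions may modify."""
    # ASSUMPTION: "given simultaneously" means both arguments are not None.
    if feats_allowed_to_change is not None and feats_not_allowed_to_change is not None:
        raise ValueError(
            "feats_allowed_to_change and feats_not_allowed_to_change "
            "cannot be given simultaneously."
        )
    if feats_allowed_to_change is not None:
        allowed = set(feats_allowed_to_change)
        return [f for f in all_features if f in allowed]
    if feats_not_allowed_to_change is not None:
        frozen = set(feats_not_allowed_to_change)
        return [f for f in all_features if f not in frozen]
    # Neither given: no frozen features.
    return list(all_features)


# Hypothetical column names.
features = ["age", "education", "hours_per_week"]
only_education = resolve_changeable_features(features, feats_allowed_to_change=["education"])
all_but_age = resolve_changeable_features(features, feats_not_allowed_to_change=["age"])
```

Passing both lists raises ValueError, mirroring the documented behavior of fit.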
- print_recourse_report(population_sizes=None, missing_subgroup_val='N/A', show_subgroup_costs=False, show_action_costs=False, show_cumulative_plots=False, show_bias=None, show_unbiased_subgroups=True, correctness_metric=False)
Prints a nicely formatted report of the results (subpopulation groups and recourses) discovered by the bias_scan method.
- Parameters:
population_sizes (dict(str, int), optional) – Number of individuals that are given the negative prediction by the model, for each subgroup. If given, it is included in the report together with some coverage percentages.
missing_subgroup_val (str, optional) – Optionally specify a value of the protected attribute which denotes that it is missing and should not be included in the printed results. Defaults to “N/A”.
show_subgroup_costs (bool, optional) – Whether to show the costs assigned to each protected subgroup. Defaults to False.
show_action_costs (bool, optional) – Whether to show the costs assigned to each specific action. Defaults to False.
show_cumulative_plots (bool, optional) – If True, shows, for each subgroup, a graph of the effectiveness cumulative distribution, as it is called in [1]. Defaults to False.
show_bias (str, optional) – Specify which value of the protected attribute corresponds to the subgroup against which we want to find unfairness. Mainly useful when the protected attribute is not binary (e.g. race). Defaults to None.
correctness_metric (bool, optional) – If True, the metric is considered to quantify utility, i.e. the greater it is for a group, the more beneficial it is for the individuals of the group. Defaults to False.
metric_name (str, optional) – If given, it is added to the printed message for unfairness in a subpopulation group, i.e. the method prints “Bias against females due to <metric_name>”.
- Raises:
RuntimeError – if costs for groups and subgroups are empty. Most likely the bias_scan method was not run.
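The call order implied by the RuntimeError above (fit, then bias_scan, then print_recourse_report) can be sketched as follows. The wrapper run_facts_scan is a hypothetical helper, and clf, X and prot_attr are assumed to be supplied by the caller as described in the parameter lists. Only arguments documented on this page are used. The import sits inside the function so the snippet can be defined even where aif360 is not installed; calling it requires aif360.

```python
def run_facts_scan(clf, X, prot_attr, top_count=5):
    """Fit a FACTS detector, scan for biased subgroups and print a report.

    clf: fitted binary classifier exposing predict(X)
    X: pandas.DataFrame of features (no labels needed)
    prot_attr: name of the protected-attribute column in X
    """
    # Imported lazily; requires the aif360 package at call time.
    from aif360.sklearn.detectors import FACTS

    detector = FACTS(clf=clf, prot_attr=prot_attr, freq_itemset_min_supp=0.1)

    # Step 1: compute subpopulation groups, actions and their effectiveness.
    detector.fit(X, verbose=False)

    # Step 2: rank subgroups by unfairness; must run before the report.
    detector.bias_scan(
        metric="equal-effectiveness",
        viewpoint="macro",
        top_count=top_count,
    )

    # Step 3: print the groups and recourses found by bias_scan.
    detector.print_recourse_report(show_subgroup_costs=True)
    return detector
```

Skipping step 2 is the situation in which print_recourse_report is documented to raise RuntimeError.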
- set_fit_request(*, verbose: bool | None | str = '$UNCHANGED$') → FACTS
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
verbose (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the verbose parameter in fit.
- Returns:
self (object) – The updated object.
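The four request options listed above can be summarized with a toy emulation in plain Python. This is not scikit-learn's routing code: the function resolve_fit_metadata, its arguments, and the use of ValueError for the None case are assumptions made for illustration. It only mirrors the documented outcome of each option from the point of view of a meta-estimator deciding what to forward to fit.

```python
from typing import Any, Dict, Union


def resolve_fit_metadata(
    request: Union[bool, None, str],
    provided: Dict[str, Any],
    name: str = "verbose",
) -> Dict[str, Any]:
    """Return the keyword arguments a meta-estimator would forward to fit."""
    if request is True:
        # Requested: forwarded if provided, silently skipped otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and providing it is treated as an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but is not requested")
        return {}
    # A string is an alias: look the value up under the alias,
    # but forward it to fit under the original parameter name.
    return {name: provided[request]} if request in provided else {}


forwarded_true = resolve_fit_metadata(True, {"verbose": False})
forwarded_false = resolve_fit_metadata(False, {"verbose": False})
forwarded_alias = resolve_fit_metadata("fit_verbose", {"fit_verbose": True})
```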