aif360.datasets.RegressionDataset
class aif360.datasets.RegressionDataset(df, dep_var_name, protected_attribute_names, privileged_classes, instance_weights_name='', categorical_features=[], na_values=[], custom_preprocessing=None, metadata=None)
Base class for regression datasets.
Subclasses of RegressionDataset should perform the following before calling super().__init__:
- Load the dataframe from a raw file.
Then, this class will go through a standard preprocessing routine which:
- (optional) Performs some dataset-specific preprocessing (e.g. renaming columns/values, handling missing data).
- Drops rows with NA values.
- Creates a one-hot encoding of the categorical variables.
- Maps protected attributes to binary privileged/unprivileged values (1/0).
- Normalizes the values in df.
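The routine above can be sketched with plain pandas. This is a minimal illustration of the listed steps, not the library's implementation: the helper name standard_preprocess and the sample columns are hypothetical, and min-max scaling is an assumption, since the text only says that values are normalized.

```python
import pandas as pd


def standard_preprocess(df, protected_attribute_names, privileged_classes,
                        categorical_features=(), custom_preprocessing=None):
    """Hypothetical re-creation of the documented steps (not aif360 code)."""
    # (optional) dataset-specific preprocessing supplied by the caller
    if custom_preprocessing is not None:
        df = custom_preprocessing(df)

    # Drop rows with NA values
    df = df.dropna()

    # One-hot encode the categorical variables
    df = pd.get_dummies(df, columns=list(categorical_features), dtype=float)

    # Map protected attributes to 1 (privileged) / 0 (unprivileged)
    for name, privileged in zip(protected_attribute_names, privileged_classes):
        if callable(privileged):
            is_privileged = df[name].apply(privileged)
        else:
            is_privileged = df[name].isin(privileged)
        df[name] = is_privileged.astype(float)

    # Normalize every column; min-max scaling is an ASSUMPTION of this sketch
    df = df.astype(float)
    span = df.max() - df.min()
    return (df - df.min()) / span.where(span != 0, 1.0)


# Hypothetical sample data: one row has a missing protected attribute
raw = pd.DataFrame({
    "sex": ["M", "F", "F", "M", None],
    "region": ["north", "south", "north", "south", "north"],
    "age": [20, 30, 40, 50, 60],
    "price": [100.0, 200.0, 150.0, 300.0, 250.0],
})
clean = standard_preprocess(raw, ["sex"], [["M"]],
                            categorical_features=["region"])
```

In this sketch the incomplete row is dropped before encoding, so the scaling bounds are computed only from the rows that survive.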
Parameters:
- df (pandas.DataFrame) – DataFrame on which to perform standard processing.
- dep_var_name – Name of the dependent variable column in df.
- protected_attribute_names (list) – List of names corresponding to protected attribute columns in df.
- privileged_classes (list(list or function)) – Each element is either a list of values which are considered privileged or a boolean function which returns True if privileged, for the corresponding column in protected_attribute_names. All others are unprivileged. Values are mapped to 1 (privileged) and 0 (unprivileged) if they are not already numerical.
- instance_weights_name (optional) – Name of the instance weights column in df.
- categorical_features (optional, list) – List of column names in the DataFrame which are to be expanded into one-hot vectors.
- na_values (optional) – Additional strings to recognize as NA. See pandas.read_csv() for details.
- custom_preprocessing (function) – A function object which acts on and returns a DataFrame (f: DataFrame -> DataFrame). If None, no extra preprocessing is applied.
- metadata (optional) – Additional metadata to append.
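The two accepted forms of a privileged_classes element (a list of values or a boolean function) can be illustrated in pure Python. The helper to_binary and the sample values are hypothetical and only mirror the description above. As a simplification, the sketch always maps to 1/0 and ignores the "already numerical" caveat.

```python
def to_binary(values, privileged):
    """Return 1.0 for privileged values and 0.0 for all others.

    `privileged` is either a list of privileged values or a boolean
    function, mirroring one element of privileged_classes.
    Hypothetical helper, not part of aif360.
    """
    if callable(privileged):
        return [1.0 if privileged(v) else 0.0 for v in values]
    allowed = set(privileged)
    return [1.0 if v in allowed else 0.0 for v in values]


# List form: only "Male" is treated as privileged
sex = to_binary(["Male", "Female", "Male"], ["Male"])

# Function form: ages of 25 and above are treated as privileged
age = to_binary([17, 42, 65], lambda x: x >= 25)
```

The function form is convenient for continuous attributes such as age, where enumerating every privileged value is impractical.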
Methods
- align_datasets – Align the other dataset features, labels and protected_attributes to this dataset.
- convert_to_dataframe – Convert the StructuredDataset to a pandas.DataFrame.
- copy – Convenience method to return a copy of this dataset.
- export_dataset – Export the dataset and supporting attributes. TODO: The preferred file format is HDF.
- import_dataset – Import the dataset and supporting attributes. TODO: The preferred file format is HDF.
- split – Split this dataset into multiple partitions.
- subset – Subset of dataset based on position. Parameter indexes: iterable which contains row indexes.
- temporarily_ignore – Temporarily add the fields provided to ignore_fields.
- validate_dataset – Error checking and type validation.
__init__(df, dep_var_name, protected_attribute_names, privileged_classes, instance_weights_name='', categorical_features=[], na_values=[], custom_preprocessing=None, metadata=None)
Initializes the dataset using the preprocessing routine and parameters described above for the class.