aif360.datasets.RegressionDataset

class aif360.datasets.RegressionDataset(df, dep_var_name, protected_attribute_names, privileged_classes, instance_weights_name='', categorical_features=[], na_values=[], custom_preprocessing=None, metadata=None)

Base class for regression datasets.

Subclasses of RegressionDataset should perform the following before calling super().__init__:

  1. Load the dataframe from a raw file.

This class then applies a standard preprocessing routine that:

  1. (optional) Performs some dataset-specific preprocessing (e.g. renaming columns/values, handling missing data).

  2. Drops rows with NA values.

  3. Creates a one-hot encoding of the categorical variables.

  4. Maps protected attributes to binary privileged/unprivileged values (1/0).

  5. Normalizes the DataFrame values.
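The five steps above can be sketched in pandas as follows. This is a hypothetical illustration, not aif360's source. The helper name `preprocess`, the `col=value` dummy-column naming, and especially the column-wise min-max scaling used for step 5 are assumptions, because the docstring only says the values are normalized. The sketch also assumes that every non-categorical column is numeric by the time step 5 runs.

```python
import pandas as pd


def preprocess(df, protected_attribute_names, privileged_classes,
               categorical_features=(), custom_preprocessing=None):
    """Hypothetical sketch of the five preprocessing steps (not aif360's source)."""
    # 1. Optional dataset-specific preprocessing (f: DataFrame -> DataFrame).
    if custom_preprocessing is not None:
        df = custom_preprocessing(df)

    # 2. Drop rows that contain NA values.
    df = df.dropna()

    # 3. One-hot encode categorical columns as float "col=value" indicators.
    df = pd.get_dummies(df, columns=list(categorical_features),
                        prefix_sep="=", dtype=float)

    # 4. Map protected attributes to 1.0 (privileged) / 0.0 (unprivileged).
    df = df.copy()
    for attr, spec in zip(protected_attribute_names, privileged_classes):
        if callable(spec):
            # Boolean function: True means privileged.
            df[attr] = df[attr].apply(spec).astype(float)
        elif not pd.api.types.is_numeric_dtype(df[attr]):
            # List of privileged values: membership means privileged.
            df[attr] = df[attr].isin(spec).astype(float)
        # Already-numeric columns with a list spec are left unchanged.

    # 5. Normalize. ASSUMPTION: column-wise min-max scaling to [0, 1];
    #    constant columns get a span of 1 to avoid division by zero.
    df = df.astype(float)
    col_min = df.min()
    span = (df.max() - col_min).replace(0, 1)
    return (df - col_min) / span


raw = pd.DataFrame({
    "sex": ["Male", "Female", "Female", None],  # last row is dropped in step 2
    "race": ["A", "B", "A", "B"],
    "age": [20, 40, 30, 50],
    "gpa": [2.0, 4.0, 3.0, 3.5],
})
out = preprocess(raw, ["sex", "age"], [["Male"], lambda a: a >= 30],
                 categorical_features=["race"])
print(out)
```

The privileged classes here mix both accepted forms: a list of values for `sex` and a boolean function for `age`.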

Parameters:
  • df (pandas.DataFrame) – DataFrame on which to perform standard processing.

  • dep_var_name – Name of the dependent variable column in df.

  • protected_attribute_names (list) – List of names corresponding to protected attribute columns in df.

  • privileged_classes (list(list or function)) – Each element is either a list of values considered privileged or a boolean function that returns True if a value is privileged, for the corresponding column in protected_attribute_names. All other values are unprivileged. Values are mapped to 1 (privileged) and 0 (unprivileged) if they are not already numerical.

  • instance_weights_name (optional) – Name of the instance weights column in df.

  • categorical_features (optional, list) – List of column names in the DataFrame which are to be expanded into one-hot vectors.

  • na_values (optional) – Additional strings to recognize as NA. See pandas.read_csv() for details.

  • custom_preprocessing (optional, function) – A function object that acts on and returns a DataFrame (f: DataFrame -> DataFrame). If None, no extra preprocessing is applied.

  • metadata (optional) – Additional metadata to append.
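The sketch below shows how a subclass might prepare some of these arguments before calling super().__init__. It assumes that the subclass forwards na_values to pandas.read_csv when it loads the raw file. The CSV contents, column names, and the recoding done in `custom_preprocessing` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical raw file contents; "?" marks a missing value in this file.
RAW_CSV = """sex,age,gpa
M,23,3.1
F,?,3.6
F,31,2.9
"""

NA_VALUES = ["?"]  # plays the role of the na_values argument


def custom_preprocessing(df: pd.DataFrame) -> pd.DataFrame:
    """Example f: DataFrame -> DataFrame that recodes values before standard processing."""
    df = df.copy()
    df["sex"] = df["sex"].map({"M": "Male", "F": "Female"})
    return df


# The step a subclass performs itself: load the raw file, forwarding na_values
# so that "?" is parsed as NaN and later removed by the NA-dropping step.
df = pd.read_csv(io.StringIO(RAW_CSV), na_values=NA_VALUES)
df = custom_preprocessing(df)

# One privileged_classes entry per name in protected_attribute_names, same order.
protected_attribute_names = ["sex"]
privileged_classes = [["Male"]]
print(df)
```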

Methods

align_datasets

Align the other dataset's features, labels, and protected_attributes to this dataset.

convert_to_dataframe

Convert the StructuredDataset to a pandas.DataFrame.

copy

Convenience method to return a copy of this dataset.

export_dataset

Export the dataset and supporting attributes. TODO: The preferred file format is HDF.

import_dataset

Import the dataset and supporting attributes. TODO: The preferred file format is HDF.

split

Split this dataset into multiple partitions.

subset

Subset of the dataset based on position, where indexes is an iterable containing the row indexes.

temporarily_ignore

Temporarily add the fields provided to ignore_fields.

validate_dataset

Error checking and type validation.
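Both split and subset work with row positions. AIF360's tutorials commonly call split as dataset.split([0.7], shuffle=True) to get a roughly 70/30 partition. The numpy sketch below illustrates that idea on a plain feature array. `split_positions` is a hypothetical helper, not part of aif360, and it treats each list entry as a cumulative cut fraction (an assumption). A real dataset partition would also need to carry the matching labels, protected attributes, and instance weights.

```python
import numpy as np


def split_positions(n_rows, cut_fractions, shuffle=False, seed=None):
    """Hypothetical helper: partition row positions at cumulative cut fractions."""
    rng = np.random.default_rng(seed)
    # Optionally shuffle the row order before cutting.
    order = rng.permutation(n_rows) if shuffle else np.arange(n_rows)
    # Convert fractions such as [0.7] into integer cut points such as [7].
    cut_points = [int(f * n_rows) for f in cut_fractions]
    return np.array_split(order, cut_points)


features = np.arange(20).reshape(10, 2)  # 10 rows, 2 feature columns
train_idx, test_idx = split_positions(len(features), [0.7])
train, test = features[train_idx], features[test_idx]

# Positional subset: keep the rows at positions 0, 2 and 5.
subset = features[[0, 2, 5]]
print(train.shape, test.shape, subset.tolist())
```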
