aif360.datasets
.MEPSDataset20
- class aif360.datasets.MEPSDataset20(label_name='UTILIZATION', favorable_classes=[1.0], protected_attribute_names=['RACE'], privileged_classes=[['White']], instance_weights_name='PERWT15F', categorical_features=['REGION', 'SEX', 'MARRY', 'FTSTU', 'ACTDTY', 'HONRDC', 'RTHLTH', 'MNHLTH', 'HIBPDX', 'CHDDX', 'ANGIDX', 'MIDX', 'OHRTDX', 'STRKDX', 'EMPHDX', 'CHBRON', 'CHOLDX', 'CANCERDX', 'DIABDX', 'JTPAIN', 'ARTHDX', 'ARTHTYPE', 'ASTHDX', 'ADHDADDX', 'PREGNT', 'WLKLIM', 'ACTLIM', 'SOCLIM', 'COGLIM', 'DFHEAR42', 'DFSEE42', 'ADSMOK42', 'PHQ242', 'EMPST', 'POVCAT', 'INSCOV'], features_to_keep=['REGION', 'AGE', 'SEX', 'RACE', 'MARRY', 'FTSTU', 'ACTDTY', 'HONRDC', 'RTHLTH', 'MNHLTH', 'HIBPDX', 'CHDDX', 'ANGIDX', 'MIDX', 'OHRTDX', 'STRKDX', 'EMPHDX', 'CHBRON', 'CHOLDX', 'CANCERDX', 'DIABDX', 'JTPAIN', 'ARTHDX', 'ARTHTYPE', 'ASTHDX', 'ADHDADDX', 'PREGNT', 'WLKLIM', 'ACTLIM', 'SOCLIM', 'COGLIM', 'DFHEAR42', 'DFSEE42', 'ADSMOK42', 'PCS42', 'MCS42', 'K6SUM42', 'PHQ242', 'EMPST', 'POVCAT', 'INSCOV', 'UTILIZATION', 'PERWT15F'], features_to_drop=[], na_values=[], custom_preprocessing=<function default_preprocessing>, metadata={'label_maps': [{0.0: '< 10 Visits', 1.0: '>= 10 Visits'}], 'protected_attribute_maps': [{0.0: 'Non-White', 1.0: 'White'}]})
MEPS Dataset.

See aif360/data/raw/meps/README.md.

Subclasses of StandardDataset should perform the following before calling super().__init__():

1. Load the dataframe from a raw file.

Then, this class will go through a standard preprocessing routine which:

1. (optional) Performs some dataset-specific preprocessing (e.g. renaming columns/values, handling missing data).
2. Drops unrequested columns (see features_to_keep and features_to_drop for details).
3. Drops rows with NA values.
4. Creates a one-hot encoding of the categorical variables.
5. Maps protected attributes to binary privileged/unprivileged values (1/0).
6. Maps labels to binary favorable/unfavorable labels (1/0).
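The preprocessing routine above can be illustrated in plain pandas. This is a minimal sketch on toy data, not AIF360's actual implementation; the sample values are hypothetical, and only the MEPS label map (1.0 = '>= 10 Visits') and protected-attribute map (1.0 = 'White') are taken from the signature above.

```python
import pandas as pd

# Toy frame standing in for the raw MEPS data (hypothetical values).
df = pd.DataFrame({
    'RACE': ['White', 'Non-White', 'White', None],
    'SEX': ['F', 'M', 'F', 'M'],
    'UTILIZATION': [12, 3, 0, 15],
})

# Drop rows with NA values.
df = df.dropna()

# One-hot encode the categorical features.
df = pd.get_dummies(df, columns=['SEX'])

# Map the protected attribute to binary privileged (1) / unprivileged (0).
df['RACE'] = (df['RACE'] == 'White').astype(float)

# Map the label to binary favorable (1) / unfavorable (0): >= 10 visits.
df['UTILIZATION'] = (df['UTILIZATION'] >= 10).astype(float)

print(df)
```

The real class additionally restricts columns to features_to_keep and records PERWT15F as instance weights rather than a feature.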
- Parameters:
  - df (pandas.DataFrame) – DataFrame on which to perform standard processing.
  - label_name – Name of the label column in df.
  - favorable_classes (list or function) – Label values which are considered favorable, or a boolean function which returns True if favorable. All others are unfavorable. Label values are mapped to 1 (favorable) and 0 (unfavorable) if they are not already binary and numerical.
  - protected_attribute_names (list) – List of names corresponding to protected attribute columns in df.
  - privileged_classes (list(list or function)) – Each element is a list of values which are considered privileged, or a boolean function which returns True if privileged, for the corresponding column in protected_attribute_names. All others are unprivileged. Values are mapped to 1 (privileged) and 0 (unprivileged) if they are not already numerical.
  - instance_weights_name (optional) – Name of the instance weights column in df.
  - categorical_features (optional, list) – List of column names in the DataFrame which are to be expanded into one-hot vectors.
  - features_to_keep (optional, list) – Column names to keep. All others are dropped except those present in protected_attribute_names, categorical_features, label_name or instance_weights_name. Defaults to all columns if not provided.
  - features_to_drop (optional, list) – Column names to drop. Note: this overrides features_to_keep.
  - na_values (optional) – Additional strings to recognize as NA. See pandas.read_csv() for details.
  - custom_preprocessing (function) – A function object which acts on and returns a DataFrame (f: DataFrame -> DataFrame). If None, no extra preprocessing is applied.
  - metadata (optional) – Additional metadata to append.
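favorable_classes (and likewise each element of privileged_classes) may be given either as a list of values or as a boolean predicate. The two forms are equivalent when the predicate selects the same values; a minimal sketch with hypothetical label values:

```python
import pandas as pd

labels = pd.Series([2, 15, 0, 30])

# List form: enumerate the favorable label values explicitly.
favorable_values = [15, 30]
binary_from_list = labels.isin(favorable_values).astype(float)

# Function form: a predicate returning True for favorable values.
favorable_fn = lambda v: v >= 10
binary_from_fn = labels.apply(favorable_fn).astype(float)

# Both yield the same 1 (favorable) / 0 (unfavorable) mapping.
assert binary_from_list.equals(binary_from_fn)
print(binary_from_fn.tolist())  # [0.0, 1.0, 0.0, 1.0]
```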
Methods

- align_datasets – Align the other dataset's features, labels and protected_attributes to this dataset.
- convert_to_dataframe – Convert the StructuredDataset to a pandas.DataFrame.
- copy – Convenience method to return a copy of this dataset.
- export_dataset – Export the dataset and supporting attributes. TODO: The preferred file format is HDF.
- import_dataset – Import the dataset and supporting attributes. TODO: The preferred file format is HDF.
- split – Split this dataset into multiple partitions.
- subset – Return a subset of the dataset based on position. Parameter indexes: an iterable which contains row indexes.
- temporarily_ignore – Temporarily add the fields provided to ignore_fields.
- validate_dataset – Error checking and type validation.
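The idea behind split — partitioning rows at given fractions, optionally after shuffling — can be sketched on a plain DataFrame. This is an illustration of the concept with a hypothetical helper, not AIF360's StructuredDataset.split implementation.

```python
import numpy as np
import pandas as pd

def split_frame(df, fractions, shuffle=False, seed=None):
    """Partition df at cumulative fractions of its length.

    Hypothetical helper mirroring the idea of StructuredDataset.split,
    not its actual code.
    """
    n = len(df)
    order = np.arange(n)
    if shuffle:
        order = np.random.default_rng(seed).permutation(n)
    cuts = [int(f * n) for f in fractions]
    return [df.iloc[chunk] for chunk in np.array_split(order, cuts)]

df = pd.DataFrame({'x': range(10)})
train, test = split_frame(df, [0.7])
print(len(train), len(test))  # 7 3
```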