Measuring Discrimination in Socially-Sensitive Decision Records
Pedreschi, Dino; Ruggieri, Salvatore; Turini, Franco
2009-01-01
Abstract
Discrimination in the social sense (e.g., against minorities and disadvantaged groups) is the subject of many laws worldwide, and it has been extensively studied in the social and economic sciences. We tackle the problem of determining, given a dataset of historical decision records, a precise measure of the degree of discrimination suffered by a given group (e.g., an ethnic minority) in a given context (e.g., a geographic area) with respect to a given decision (e.g., credit denial). In our approach, this problem is rephrased in a classification-rule-based setting, and a collection of quantitative measures of discrimination is introduced on the basis of existing norms and regulations. The measures are defined as functions of the contingency table of a classification rule, and their statistical significance is assessed by relying on a large body of statistical inference methods for proportions. Building on this basic method, we are then able to address the more general problems of: (1) unveiling all discriminatory decision patterns hidden in the historical data, by combining discrimination analysis with association rule mining; (2) unveiling discrimination in classifiers that learn from training data biased by discriminatory decisions; and (3) in the case of rule-based classifiers, sanitizing discriminatory rules by correcting their confidence. Our approach is validated on the German credit dataset and on the CPAR classifier.
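To make the idea of a measure "defined as a function of the contingency table" concrete, the following is a minimal illustrative sketch in Python. It is not the paper's implementation: the abstract does not name the specific measures, so risk difference and risk ratio are used here as two common choices, and a Wald interval for a difference of proportions stands in for the significance assessment. The class `ContingencyTable`, its field names, and the example counts are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ContingencyTable:
    """2x2 counts for one context (hypothetical layout, not from the paper).

    a: protected group, negative decision (e.g., credit denied)
    b: protected group, positive decision
    c: rest of the population, negative decision
    d: rest of the population, positive decision
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def p1(self) -> float:
        # Proportion of negative decisions within the protected group.
        return self.a / (self.a + self.b)

    @property
    def p2(self) -> float:
        # Proportion of negative decisions within the rest of the population.
        return self.c / (self.c + self.d)


def risk_difference(t: ContingencyTable) -> float:
    """Difference between the two denial proportions."""
    return t.p1 - t.p2


def risk_ratio(t: ContingencyTable) -> float:
    """Ratio between the two denial proportions (requires p2 > 0)."""
    return t.p1 / t.p2


def risk_difference_ci(t: ContingencyTable, z: float = 1.96) -> tuple[float, float]:
    """Wald confidence interval for the risk difference.

    z = 1.96 corresponds to an approximate 95% level; this is one simple
    choice among the many inference methods available for proportions.
    """
    n1 = t.a + t.b
    n2 = t.c + t.d
    se = sqrt(t.p1 * (1 - t.p1) / n1 + t.p2 * (1 - t.p2) / n2)
    rd = risk_difference(t)
    return rd - z * se, rd + z * se


# Made-up counts, for illustration only.
table = ContingencyTable(a=30, b=70, c=20, d=180)
low, high = risk_difference_ci(table)
print(f"risk difference: {risk_difference(table):.3f}")
print(f"risk ratio:      {risk_ratio(table):.3f}")
print(f"approx. 95% CI for the risk difference: ({low:.3f}, {high:.3f})")
```

In this sketch, an interval for the risk difference that lies entirely above zero would suggest that the gap between the two groups is unlikely to be due to chance alone. The Wald interval is known to behave poorly for small counts or extreme proportions, so a real analysis would likely call for a more robust method.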