A new paper on so-called ‘black-box’ AI models reveals the dangers of our increasing dependence on opaque systems – and offers up a method for combating prejudiced AI.
Just last week, we reported on the UK Financial Stability Board’s warning that the use of AI and machine learning could compound any future financial crisis. Now, research published on the academic preprint repository arXiv has highlighted another harmful aspect of our dependency on automated risk assessment. In short, it could be prejudiced, to harmful effect.
Black-box risk scoring AIs can be found throughout our financial and criminal justice systems. They are adept at processing vast quantities of data to determine whether an individual meets the desired risk criteria to qualify for a loan or be granted bail. Through machine learning, such systems evolve over time, identifying trends and making associations within the information they’re processing.
Are we making AIs prejudiced?
However, these AIs are only as capable as their models (and the data they are fed) permit them to be. While it is usually illegal to consider factors such as race in these decisions, black-box AIs are typically opaque in their methods, and an algorithm can learn that education levels or addresses correlate with protected attributes such as race – effectively reintroducing them as proxies.
The institutions using them either don’t fully understand their AI’s methods or are using proprietary products, the workings of which suppliers refuse to divulge. There is a very real danger that the limited data sets and methods used by these systems are resulting in unethical bias.
This latest report, Detecting Bias in Black-Box Models Using Transparent Model Distillation, was led by Sarah Tan of Cornell University and proposes a means of detecting that prejudice in our AIs.
Model distillation is a method of improving the performance of almost any machine learning algorithm by training numerous different models on the same data and then averaging their predictions. The output of these ‘teacher models’ is distilled into a faster, simpler ‘student model’, without significant loss of accuracy.
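The idea can be sketched in a few lines. This is not the authors’ code – just a minimal illustration of generic distillation using scikit-learn, with arbitrary model choices and synthetic data standing in for a real risk data set:

```python
# Minimal sketch of model distillation: several "teacher" models are
# trained on the same data, their predictions are averaged, and a
# simpler, faster "student" model is trained to mimic that average.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a labelled risk data set.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Train two teacher models on the same data (choices are illustrative).
teachers = [
    RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y),
    GradientBoostingClassifier(random_state=0).fit(X, y),
]

# Average the teachers' predicted probabilities into "soft labels".
soft_labels = np.mean([t.predict_proba(X)[:, 1] for t in teachers], axis=0)

# Distil into a small student model trained to reproduce the ensemble.
student = DecisionTreeRegressor(max_depth=4).fit(X, soft_labels)
```

The student here is a shallow decision tree purely for illustration; the point is that it learns from the teachers’ averaged outputs rather than from the original labels.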
How can we better understand AI?
Tan’s method differs in that it uses two labels to train the AI – a risk score and the actual outcome the risk score was intended to predict. Her team then examines how these two labels relate to each other in order to expose bias: they assess whether the contributions of protected features to the risk score are statistically different from their contributions to the actual outcome.
In the past, more transparent models such as this have resulted in reduced prediction accuracy – creating tension between less transparent but more accurate models and clearer but less precise solutions. When the decision could determine whether an individual is granted bail or a loan, it’s a tricky choice with high-stakes implications.
This latest development allows users of black-box AIs to retrain those systems against the actual outcomes. “Here, we train a transparent student model to mimic a black-box risk score teacher. We intentionally include all features that may or may not be originally used in the creation of the black-box risk score, even protected features, specifically because we are interested in examining what the model learns from these variables,” describes the report. “Then, we train another transparent model to predict the actual outcome that the risk score was intended to predict.”
In other words, the black-box risk score (such as a credit score) is compared to the actual outcome (whether a loan defaulted). Any systematic differences between the risk scoring model and the actual outcome are then identified as bias – those variables from the initial data set that weren’t factors in the outcome.
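The comparison above can be sketched as follows. The paper uses transparent generalized additive models; this hedged illustration swaps in plain linear models and simulated data, where a hypothetical black-box score leans on a protected feature (age) that the true outcome does not depend on:

```python
# Sketch of the two-student comparison (not the authors' code):
# one transparent model mimics the black-box risk score, another
# predicts the true outcome. A protected feature whose contribution
# differs sharply between the two models is flagged as potential bias.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 5000
age = rng.normal(size=n)      # protected feature (illustrative)
income = rng.normal(size=n)   # legitimate feature (illustrative)

# The simulated true outcome depends only on income...
outcome = (income + 0.1 * rng.normal(size=n) > 0).astype(int)
# ...but the hypothetical black-box score also leans on age.
risk_score = income + 0.8 * age + 0.1 * rng.normal(size=n)

X = np.column_stack([age, income])

# Student 1: transparent model mimicking the risk score.
mimic = LinearRegression().fit(X, risk_score)
# Student 2: transparent model predicting the actual outcome.
truth = LogisticRegression(max_iter=1000).fit(X, outcome)

print("age contribution to score model:", mimic.coef_[0])
print("age contribution to outcome model:", truth.coef_[0][0])
# A large gap between the two contributions suggests the score
# uses age in a way the real outcome does not justify.
```

In this toy setup the score-mimicking model recovers a substantial weight on age, while the outcome model assigns it almost none – exactly the kind of systematic difference the method treats as evidence of bias.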
Tan and her colleagues trialed the method on loan risks and default rates from the peer-to-peer company LendingClub. It identified that the lender’s current model was probably also ignoring the purpose of the loans for which it was calculating risk – an important variable that has been proven to correlate with risk.
They also tested their method against COMPAS, a proprietary score used in the criminal justice system to predict recidivism risk (and the subject of scrutiny for racial bias). Its proponents argue that it is race-blind – that is, not prejudiced – because it doesn’t use race as an input.
However, ProPublica previously analyzed and released data on COMPAS scores and true recidivism outcomes of defendants in Broward County, Florida. They found that, “black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent versus 23 percent).”
Tan’s method was able to back this up: the model mimicking COMPAS demonstrated biases against certain age groups and races, while the second model, trained on the true outcomes, showed no evidence to justify them.
With further testing and development, the Cornell team’s solution could serve to please everyone – from the institutions that employ AI to the individuals who must live with its conclusions.
Most importantly, it introduces transparency to critical AI models while retaining accuracy. As we become ever more dependent on AI, across all walks of life, it’s vital that we understand how these systems reach their conclusions – or we risk blind acceptance of prejudiced decisions.