MIT researchers combat bias in AI training data

Bias in artificial intelligence (AI) systems can lead to inaccurate predictive models, inadequate search results, and discrimination, but a team of researchers at MIT claim to have found a way to overcome such biases.

Their new method demonstrates that the main perpetrator is not the algorithms themselves, but how the data is collected. Vitally, the approach can reduce bias without affecting the accuracy of predictive results.

According to Irene Chen, a PhD student who co-authored the research paper, computer scientists are often quick to say that the way to make these systems less biased is to simply design better algorithms.

“But algorithms are only as good as the data they’re using, and our research shows that you can often make a bigger difference with better data,” she explained.

By looking at specific examples, researchers were able to both identify potential causes for differences in accuracy and quantify each factor’s individual impact on the data.

They then showed how changing the way they collected data could reduce each type of bias, while maintaining the same level of predictive accuracy.

Research paper co-author MIT professor David Sontag said the research was a “toolbox” for helping machine learning engineers figure out what questions to ask of their data in order to diagnose why their systems may be making unfair predictions.

‘More’ doesn’t mean ‘better’

One of the biggest misconceptions is that more data is always better, Chen said:

“Getting more participants does not necessarily help, since drawing from the exact same population often leads to the same subgroups being under-represented.

“Even the popular image database ImageNet, with its many millions of images, has been shown to be biased towards the Northern Hemisphere.”

Sontag said it was important to get more data from those under-represented groups. For example, the team looked at an income-prediction system and found that it was twice as likely to misclassify female employees as low-income and male employees as high-income.

However, if they increased the dataset by a factor of 10, those mistakes would happen 40 percent less often.
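A gap of the kind described above can be measured by computing error rates separately for each group. The following is a minimal Python sketch, not the researchers' code: the function name `error_rates_by_group`, the record layout, and the toy records are all assumptions invented for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute per-group misclassification rates.

    `records` is an iterable of (group, true_label, predicted_label)
    tuples, where labels are "high" or "low" income (hypothetical layout).
    Returns {group: {"false_low": rate, "false_high": rate}}.
    """
    counts = defaultdict(
        lambda: {"high": 0, "low": 0, "false_low": 0, "false_high": 0}
    )
    for group, truth, pred in records:
        c = counts[group]
        c[truth] += 1  # how many people truly belong to each income class
        if truth == "high" and pred == "low":
            c["false_low"] += 1  # high earner wrongly labelled low-income
        elif truth == "low" and pred == "high":
            c["false_high"] += 1  # low earner wrongly labelled high-income

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "false_low": c["false_low"] / c["high"] if c["high"] else float("nan"),
            "false_high": c["false_high"] / c["low"] if c["low"] else float("nan"),
        }
    return rates


# Toy records, invented purely to exercise the function.
records = [
    ("female", "high", "low"),
    ("female", "high", "high"),
    ("female", "low", "low"),
    ("female", "low", "low"),
    ("male", "high", "high"),
    ("male", "high", "high"),
    ("male", "low", "high"),
    ("male", "low", "low"),
]
rates = error_rates_by_group(records)
```

Comparing `rates["female"]["false_low"]` with `rates["male"]["false_low"]` shows whether one group is wrongly labelled low-income more often than the other, which is the sort of disparity the team reported.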

In another dataset, the researchers found that a system’s ability to predict intensive care unit (ICU) mortality was less accurate for Asian patients.

Existing approaches for reducing discrimination would basically just make the non-Asian predictions less accurate, which is problematic when you’re talking about settings, like healthcare, that can have life-or-death implications.

Chen said that their approach allowed them to look at a dataset and determine how many more participants from different populations are needed to improve accuracy for the group with lower accuracy, while still preserving accuracy for the group with higher accuracy.

“We can plot trajectory curves to see what would happen if we added 2,000 more people versus 20,000, and from that figure out what size the dataset should be if we want to have the best of all worlds.

“With a more nuanced approach like this, hospitals and other institutions would be better equipped to do cost-benefit analyses to see if it would be useful to get more data.”
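The idea of a trajectory curve can be sketched in a few lines of Python. The example below assumes, purely for illustration, that a group's error rate falls as a simple power law of the training set size; the article does not say which curve the researchers fitted. The function names and the measurements are hypothetical, and a real curve would also level off at some irreducible error, which this sketch ignores.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error = a * n**(-b) by least squares in log-log space.

    Returns (a, b). Assumes all sizes and errors are positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def predict_error(a, b, n):
    """Extrapolate the fitted curve to a training set of size n."""
    return a * n ** (-b)


# Hypothetical error rates for one under-represented group, measured by
# training on progressively larger subsamples of the existing data.
sizes = [500, 1000, 2000, 4000]
errors = [0.30, 0.26, 0.23, 0.20]

a, b = fit_power_law(sizes, errors)
current = sizes[-1]
error_plus_2k = predict_error(a, b, current + 2_000)
error_plus_20k = predict_error(a, b, current + 20_000)
```

Comparing `error_plus_2k` with `error_plus_20k` gives a rough sense of how much accuracy each extra tranche of participants might buy, which is the kind of input a cost-benefit analysis needs.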

The results will be presented next month at Neural Information Processing Systems (NIPS) in Montreal.

Internet of Business says

We’ve previously reported on research out of MIT on the implications of using biased data to train AI. Taken to the extreme, Norman the psychopathic AI demonstrates that a sound algorithm can lead to dark results, if trained on problematic information.

The implications of the ‘Norman’ research are valuable – and troubling – because they reveal that some AI systems may simply present us with the results that we, consciously or unconsciously, already want to see.

In the same way, a Google image search, for example, will present whatever pictures internet users have tagged in a certain way – including tags that may be partial or biased.

This opens up the real possibility that we may begin to use AI to ‘prove’ things that we already believe to be the case. In such a world, confirmation bias could become endemic, yet carry a veneer of neutrality and evidenced fact.

Elsewhere, IBM recently announced a new artificial intelligence (AI) ‘Trust and Transparency’ service, which it claims gives businesses greater insight into AI decision-making and bias.

The new Watson-based cloud service is designed to not only ‘open the black box’ of complex AI systems, but also to reinforce organisations’ trust in their own AI-based decisions – and data – by showing the workings.

Yet more needs to be done to ensure that, as we become more dependent on AI across many industries, we aren’t shaping AI models with unbalanced data or our own prejudices – particularly in life-changing fields, such as the justice system.