MIT Researchers Develop New Method to Reduce Bias in AI Models

By Phil Britt
A new MIT study explores how targeted data removal can reduce AI bias without compromising accuracy, offering a potential breakthrough in machine learning.

Bias in AI is a well-known challenge, with platform developers working on several potential ways to minimize the issue.

Bias can creep in throughout the design of an AI model: certain groups may be underrepresented in the underlying data, human annotators may bring their own biases when labeling data, and the training data itself may reflect broader societal biases.

MIT researchers think they have found a solution to this challenge: a technique that identifies and removes the specific points in a training dataset that contribute most to a model’s failures on minority subgroups. Because it removes far fewer data points than other approaches, the technique maintains the model’s overall accuracy while improving its performance for underrepresented groups, according to an MIT News report.

How MIT's Approach to Reducing Bias Works 

MIT's new approach can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data is far more prevalent than labeled data for many applications.

The technique involves identifying and removing specific training examples that cause models to perform poorly on certain subgroups — specific, identifiable subpopulations within a larger dataset. Once these problematic examples are removed, the model is retrained on the debiased dataset, leading to improved performance across different subgroups. 
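The MIT News report describes this loop at a high level rather than as code, but a rough picture helps. The sketch below is a minimal, illustrative Python version of the general identify-remove-retrain idea, not the researchers' actual algorithm: it crudely scores each training example by how often its presence coincides with high error on the worst-performing subgroup (here via repeated retraining on random subsets, a stand-in for the paper's more efficient scoring), drops the highest-scoring examples and retrains. The function names and the use of scikit-learn's LogisticRegression are assumptions for illustration, and unlike the MIT method it assumes subgroup labels are available on a validation set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def worst_group_error(model, X, y, groups):
    """Error rate on the subgroup the model serves worst."""
    errors = []
    for g in np.unique(groups):
        mask = groups == g
        errors.append(1.0 - model.score(X[mask], y[mask]))
    return max(errors)

def harmfulness_scores(X_train, y_train, X_val, y_val, val_groups,
                       n_rounds=20, seed=0):
    """Crude estimate of how much each training example hurts the worst group:
    retrain on random halves of the data and average the worst-group error of
    the runs in which each example was included."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    error_sum = np.zeros(n)
    included = np.zeros(n)
    for _ in range(n_rounds):
        subset = rng.random(n) < 0.5  # random half of the training set
        model = LogisticRegression(max_iter=1000).fit(X_train[subset], y_train[subset])
        err = worst_group_error(model, X_val, y_val, val_groups)
        error_sum[subset] += err      # attribute this run's error to the points present
        included[subset] += 1
    return error_sum / np.maximum(included, 1)

def debias_by_removal(X_train, y_train, X_val, y_val, val_groups, k=50):
    """Drop the k most harmful training examples, then retrain on the rest."""
    scores = harmfulness_scores(X_train, y_train, X_val, y_val, val_groups)
    keep = np.argsort(scores)[:-k]    # keep everything except the k worst offenders
    return LogisticRegression(max_iter=1000).fit(X_train[keep], y_train[keep])
```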

Maryam Meseha, founding partner and co-chair of privacy and data security at Pierson Ferdinand, said MIT's advancements improve on legacy techniques. “Traditional AI models often rely on biased datasets, leading to skewed outcomes, particularly in decision-making areas like hiring, lending and law enforcement."

She added that new techniques, like those developed by MIT, improve upon legacy approaches by:

  • Re-weighting datasets dynamically, ensuring underrepresented data points are properly accounted for without distorting accuracy (a simple weighting sketch follows this list)
  • Leveraging counterfactual fairness methods, where models analyze how outcomes would change if demographic attributes were different
  • Integrating adversarial debiasing, refining models to mitigate bias while maintaining robust decision-making capabilities
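As a concrete illustration of the first item, here is a minimal re-weighting sketch in Python. It is a static, inverse-frequency weighting rather than the dynamic re-weighting Meseha describes, and it assumes subgroup labels are available; it simply gives examples from rare subgroups proportionally more weight in training.

```python
import numpy as np

def inverse_frequency_weights(groups):
    """Weight each example inversely to its subgroup's size, so that every
    subgroup contributes roughly equally to the training loss."""
    unique, counts = np.unique(groups, return_counts=True)
    per_group = {g: len(groups) / (len(unique) * c) for g, c in zip(unique, counts)}
    return np.array([per_group[g] for g in groups])

# Usage with any estimator that accepts sample_weight, e.g. scikit-learn:
# weights = inverse_frequency_weights(train_groups)
# model = LogisticRegression(max_iter=1000).fit(X_train, y_train, sample_weight=weights)
```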

MIT's approach can be used even when the specific underperforming groups are not initially known, making it a versatile method for addressing and reducing bias in various machine learning tasks. 

Bias Reduction Is No Easy Feat

Reducing bias in AI isn’t a simple matter, said Meseha. Ensuring more reliable and inclusive AI requires organizations to:

  • Adopt transparent AI auditing practices to regularly assess and refine models
  • Align with legal and ethical standards, such as GDPR and NIST AI frameworks, to uphold fairness in real-world applications
  • Include multidisciplinary teams (ethicists, lawyers, technologists) in AI development to mitigate unintended consequences

There are many ways that AI model training can go awry, according to Gil Irizarry, chief of innovation at Babel Street. “For example, we use AI to convert foreign languages to English. Sophia is originally a Greek name, and the character in the middle of it is the Greek letter phi. My daughter’s name is spelled S-o-p-h-i-a, but you’ll also see the spelling S-o-f-i-a.

“So if I were to train a model, and every time I had a phi I only showed it words like Sophia and hydrophilia that use 'ph,' the model would interpret that phi as 'ph.' That would be an example of bias, because I gave it only those examples for training, but we as humans know there are a lot of words where an 'f' is the appropriate transliteration.” The goal, Irizarry added, would be to come up with a representative set of examples so the model learns how often each spelling should appear.

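Irizarry's point reduces to simple counting. The toy sketch below (not Babel Street's actual system) builds a frequency table from transliteration examples: trained only on "ph" spellings, it assigns zero probability to "f" and can never produce "Sofia"; adding a few "f" examples restores that option.

```python
from collections import Counter

def transliteration_probs(pairs):
    """Estimate how often the Greek letter phi maps to each Latin spelling."""
    counts = Counter(latin for _, latin in pairs)
    total = sum(counts.values())
    return {latin: n / total for latin, n in counts.items()}

# Every training example uses "ph" (Sophia, hydrophilia, ...): "f" is never learned.
skewed = [("φ", "ph"), ("φ", "ph"), ("φ", "ph")]
# A representative set also includes Sofia-style spellings.
balanced = [("φ", "ph"), ("φ", "ph"), ("φ", "f")]

print(transliteration_probs(skewed))    # {'ph': 1.0}
print(transliteration_probs(balanced))  # {'ph': 0.67, 'f': 0.33} (approximately)
```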
Irizarry's company uses another AI model to identify common name components and rare name components. For example, names like John Smith have two fairly common and equal name components, John and Smith. A name like John Rutherford has unequal components. The less common one needs to carry more weight in the AI model design. “How we aggregate the data to train that model is going to say a lot about what the model is going to put out, which really is what the MIT research is talking about,” he explained.
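The idea of weighting rare name components more heavily can be illustrated with an IDF-style calculation; the sketch below is illustrative only, not Babel Street's model. Rare components such as "Rutherford" get a higher weight than common ones such as "John" when names are compared or matched.

```python
import math
from collections import Counter

# Toy name corpus; in practice this would be a large dataset of real names.
names = [("John", "Smith"), ("John", "Rutherford"), ("Mary", "Smith"), ("John", "Lee")]

component_counts = Counter(part for name in names for part in name)
total = sum(component_counts.values())

def rarity_weight(component):
    """IDF-style weight: the rarer a name component, the more it should count."""
    return math.log(total / component_counts[component])

for part in ("John", "Smith", "Rutherford"):
    print(part, round(rarity_weight(part), 2))   # John 0.98, Smith 1.39, Rutherford 2.08
```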

MIT researchers preface the paper by underscoring the fact that machine learning algorithms rely on accurate labeling, Irizarry noted. “The human brain learns by combining experience and knowledge over time; machines only learn what they’ve been instructed (data). Therefore, bad annotation leads to biased algorithms. Additionally, data labeling must be fair and unbiased for the AI model to be free of bias.”

About the Author
Phil Britt

Phil Britt is a veteran journalist who has spent the last 40 years working with newspapers, magazines and websites covering marketing, business, technology, financial services and a variety of other topics. He has operated his own editorial services firm, S&P Enterprises, Inc., since the end of 1993. He is a 1978 graduate of Purdue University with a degree in Mass Communications.

Main image: Leka on Adobe Stock