The Gist
- Addressing bias. AI fairness requires robust data governance and bias prevention.
- Leveraging policies. Apply stringent access controls to data used in AI models.
- Promoting fairness. Enhance AI diversity, equity, and inclusion through conscientious use of data.
Several years ago, Ann Cavoukian defined Privacy by Design with seven elegant and complete principles. Now that we are in the "Age of AI," where data bias can determine whether you are considered for a job, receive appropriate credit card offers or get the right attention from customer service, I want to suggest an eighth principle. This principle addresses machine learning data bias, and it should matter to CMOs, heads of customer service and heads of HR, simply because the way artificial intelligence (AI) operates affects brand loyalty, business decision-making and the ability to attract a diverse workforce.
Unraveling AI Bias: The Need for Data Governance and Ethical Standards
Companies are increasingly embedding AI into their products, services, business processes and decision-making strategies. As they do, they would be wise to shift their attention to precisely how the software uses data. Can their algorithms be prevented from developing the same biases as their human creators?
Because much of today's AI and machine learning relies on unsupervised learning or backpropagation neural networks, the main control point for bias is the data organizations feed into their machines. Numerous examples of algorithmic bias have surfaced over the last several years. Apple’s credit card algorithm was accused of discriminating against women, triggering an investigation by New York’s Department of Financial Services. Meanwhile, Amazon's résumé screening system demonstrated a bias against women, effectively excluding female candidates from consideration. How can managers build diversity if their algorithms get in the way?
Clearly, AI’s ability to consider customers and employees at a granular level results in people being treated differently, and that differentiated treatment can amount to bias or unfairness. A recent Harvard Business Review article, “AI Regulation Is Coming,” says, “AI increases the potential scale of bias. Any flaw can affect millions, exposing companies to class-action lawsuits.” This means organizations need to control not only the internal or external release of private information but also how that information is applied to data models.
The cases above show that the data provisioned to machine learning models can lead to biased outcomes. In a personal conversation, Tom Davenport told me that “bias comes from data not models.” This is particularly the case with supervised learning. Joi Ito, former director of the MIT Media Lab, says, “any biases or errors in the data the engineers use to teach machines will result in outcomes that reflect those biases.”
The EU's General Data Protection Regulation (GDPR) already created “the right … to obtain an explanation of the decision reached.” But what happens when an algorithm, rather than a person, makes the decision? How do you explain it? More recently, in its white paper and proposed AI regulation, the EU has identified explainability as a key factor in building trust in AI.
Without question, as AI proliferates, new standards will become a necessity. We need to drive ethical AI by controlling model access to sensitive data. Interestingly, this is wise marketing as well. Years ago, Daniel Yankelovich argued in a Harvard Business Review article titled "New Criteria for Marketing Segmentation" that “demographics are not an effective basis for marketing segmentation,” and he was right. Controlling how such data is applied prevents both bias and privacy loss. Now the responsible use of the technology must evolve.
AI Bias: Bridging the Gap With Data Governance and Ethical Practices
Amazon is experimenting with a fairness metric it calls “conditional demographic disparity.” But there is no agreed-upon definition of fairness, nor is it possible to categorize the general conditions that determine equitable outcomes.
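For a rough sense of what a disparity metric measures, here is a minimal Python sketch of a demographic-disparity-style calculation on a pandas DataFrame. The column names and example data are hypothetical, and this is not Amazon's implementation; the conditional version averages the same quantity within strata of a confounding attribute.

```python
import pandas as pd

def demographic_disparity(df: pd.DataFrame, group_col: str, group_value, outcome_col: str) -> float:
    """Share of rejections attributed to a group minus its share of acceptances.
    Values well above zero suggest the group is over-represented among rejections."""
    rejected = df[df[outcome_col] == 0]
    accepted = df[df[outcome_col] == 1]
    return (rejected[group_col] == group_value).mean() - (accepted[group_col] == group_value).mean()

# Hypothetical credit card applications with an approval flag.
applications = pd.DataFrame({
    "gender":   ["F", "F", "M", "M", "F", "M"],
    "approved": [0,   1,   1,   1,   0,   1],
})
print(demographic_disparity(applications, "gender", "F", "approved"))  # 1.0 - 0.25 = 0.75
```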
I want to suggest an easier way forward: generative AI data governance. Here, you use policies just as you do to keep sensitive data away from unauthorized people for compliance purposes. In this case, you reduce algorithmic bias by applying fine-grained controls to profile markers. For this to succeed, you need to determine which data can lead a machine learning model to bias before you turn the model loose on training data, as sketched below.
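A minimal sketch of what those fine-grained controls might look like, assuming a governance team maintains a list of bias-prone profile markers and a policy step strips them before training. The column names are hypothetical, and a real deployment would typically enforce this in the data platform rather than in notebook code.

```python
import pandas as pd

# Hypothetical governance policy: profile markers the model is not allowed to see.
BLOCKED_MARKERS = {"gender", "race", "age", "marital_status", "religion"}

def apply_access_policy(training_data: pd.DataFrame) -> pd.DataFrame:
    """Strip policy-blocked profile markers before the model ever sees the data."""
    blocked = BLOCKED_MARKERS & set(training_data.columns)
    if blocked:
        print(f"Policy removed columns: {sorted(blocked)}")
    return training_data.drop(columns=list(blocked))
```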
To be fair, this may be a bit more complex. The potential for inferred bias also needs to be addressed: as Privacy by Design suggests, you need to examine your datasets and mask data that could lead to bias by inference.
Let me provide a couple of examples from companies I have direct experience with. A large telecom company my firm works with noticed its data scientists were bringing their own data into a sandbox and wanted that data scanned for sensitivity before it could be used in a data model. External data may contain personally identifiable information (PII) or other sensitive attributes; without proper scrutiny and safeguards, the company risked inadvertently using or exposing that information, creating privacy and ethical problems.
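As a simplified illustration of that kind of sensitivity scan, the sketch below uses a few regular expressions to flag likely PII (emails, US-style Social Security numbers, phone numbers) before data is admitted to the sandbox. The patterns are assumptions for illustration only; production scanners use much richer detection such as named-entity recognition, dictionaries and checksums.

```python
import re
import pandas as pd

# Hypothetical, illustrative patterns; real scanners are far more thorough.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scan_for_pii(df: pd.DataFrame) -> dict:
    """Return {column: [pii types found]} for text columns that look sensitive."""
    findings = {}
    for col in df.select_dtypes(include="object").columns:
        values = df[col].dropna().astype(str)
        hits = [name for name, pattern in PII_PATTERNS.items()
                if values.str.contains(pattern).any()]
        if hits:
            findings[col] = hits
    return findings
```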
A financial data company wanted data scanned before it went into training models so it could not introduce bias into creditworthiness calculations. Such data often carries inherent and indirect biases and proxy variables, and the training process can lack contextual understanding. By adopting a holistic approach that combines data scanning with algorithmic fairness techniques, proxy variable analysis, representative data exercises, and regular audits and monitoring, the company can work toward reducing bias and promoting fairness in creditworthiness calculations.
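One of those strategies, proxy variable analysis, can be sketched as a simple association check: before training, measure how strongly each candidate feature tracks a protected attribute and flag features that could act as stand-ins for it. The threshold and column handling below are assumptions for illustration; a real program would use more careful statistical tests.

```python
import pandas as pd

def flag_proxy_variables(df: pd.DataFrame, protected_col: str, threshold: float = 0.4) -> list:
    """Flag numeric features whose correlation with a protected attribute exceeds a threshold."""
    protected = pd.get_dummies(df[protected_col], drop_first=True).astype(float)
    flagged = []
    for col in df.select_dtypes(include="number").columns:
        if col == protected_col:
            continue
        corr = protected.corrwith(df[col]).abs().max()
        if corr >= threshold:
            flagged.append((col, round(float(corr), 2)))
    return flagged
```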
For any industry, the first step in the generative AI data governance process is to employ computational data governance: governing all of the data that goes into data models. You prevent biased models by removing unwarranted access to data before it ever reaches the model. Simply put, models cannot be trusted with all data. Models need access policies just like the data scientists who create them. As Tom Davenport says in "All in on AI," “AI should have policies, governance, and leadership roles.”
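A sketch of treating the model itself as a governed principal might look like the following: each model is registered with an allow-list of fields, and a check runs before any training data is released to it. The policy structure, model name and fields here are hypothetical.

```python
# Hypothetical per-model access policies, analogous to per-user data entitlements.
MODEL_POLICIES = {
    "credit_risk_v2": {"allowed_fields": {"income", "debt_ratio", "payment_history"}},
}

def authorize_training_request(model_id: str, requested_fields: set) -> set:
    """Return only the fields the model's policy permits; refuse unregistered models."""
    policy = MODEL_POLICIES.get(model_id)
    if policy is None:
        raise PermissionError(f"No governance policy registered for model '{model_id}'")
    denied = requested_fields - policy["allowed_fields"]
    if denied:
        print(f"Denied fields for {model_id}: {sorted(denied)}")
    return requested_fields & policy["allowed_fields"]
```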
Crafting Ethical Policies: Ensuring Fairness in AI Data Utilization
The problem above can be solved by implementing the right policies and controls, but that means data scientists and marketers need to build in design protections by deciding which data it is fair for a model to consider. This depends, of course, on the model's end purpose, and its expected output is also critical. Let's look at some specific cases.
Take credit card data, for example. It isn't appropriate for algorithms to have access to data in ways that violate the Equal Credit Opportunity Act: credit decisions cannot be based on race, color, religion, national origin, marital status, age, gender or orientation, so models shouldn't see those attributes, including data that can be used to infer them. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a good illustration here because it outlines 18 identifiers that can be used to single out each one of us. Data scientists need to mask anything that may be used to determine or infer who I am. It then makes sense to exclude these same variables from algorithms designed to gate candidates in or out of a position, unless the intent is to encourage diversity and the variables are weighted positively rather than negatively.
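As a sketch of the masking step, the snippet below pseudonymizes a few HIPAA-style direct identifiers with a salted hash so records cannot be read back to a person. The column list is a hypothetical subset of the 18 identifiers, not the full Safe Harbor list, and key management is assumed to sit with the governance team.

```python
import hashlib
import pandas as pd

# Hypothetical subset of direct identifiers to pseudonymize before modeling.
IDENTIFIER_COLUMNS = ["name", "email", "phone", "street_address"]
SALT = "replace-with-a-managed-secret"  # assumption: issued by the governance team

def mask_identifiers(df: pd.DataFrame) -> pd.DataFrame:
    """Replace direct identifiers with salted hashes before the data reaches a model."""
    masked = df.copy()
    for col in IDENTIFIER_COLUMNS:
        if col in masked.columns:
            masked[col] = masked[col].astype(str).map(
                lambda value: hashlib.sha256((SALT + value).encode()).hexdigest()[:12]
            )
    return masked
```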
With this said, for some marketing activities, considering gender may be perfectly appropriate. For example, I would expect to get offers from retailers for men's clothing. And even though I do buy clothing for my wife during the holidays, I would not expect to receive offers aimed at her unless the AI were smart enough to connect her birthday with things she would like. In other words, data models need contextual intelligence.
In the same vein, however, companies need to work actively to avoid what happened to Target a few years ago. Its algorithm inferred that a household likely included a pregnant mother and started sending offers for baby products. The expectant mother turned out to be a teenager, and her father found out only after the algorithm did. The application of AI needs to consider the unthinkable and be limited to cases that do not create harm. Just because you can detect something does not mean you should use that data.
AI and Bias: The Urgent Need for an 8th Privacy Principle
So I want to suggest an eighth principle for Privacy by Design: make sure your organization does not use AI, or the private information within it, in ways that can lead to harm or bias for customers or employees. The potential for bias and harm is crystal clear. Policies, governance and, more importantly, conscious controls over how data can be used must be in place.
Even in cases where differentiation may be reasonable, organizations should make sure diversity, equity and inclusion initiatives are not harmed. Consciously governing the data that AI and machine learning can access will help ensure a more equitable data ecosystem.