At every level of the artificial intelligence (AI) process, across every AI and AI-adjacent technology, one thing is true: No AI solution will meet your goals if your organization does not feed it accurate, relevant and trusted (ART) information. But how do you get there?
Predictive Analytics Showed the Way
Cleaning data to achieve accurate results is nothing new. Ever since the advent of big data and the drive toward predictive analytics, clean data has been a requirement for success.
Many projects were delayed or cancelled when organizations discovered that their data was not as pristine as expected and learned the truth of the phrase, “Garbage in, garbage out.” Analytics solutions produced results that were inaccurate or biased toward previous outcomes, even when those outcomes were not desired.
As current efforts shift to LLMs and beyond, structured data is the appetizer for new models. Unstructured data, documents and other content act as the main course as the full power of AI is brought to bear. The challenge that resulting models face is that information governance has lagged behind data governance.
Most organizations that have begun using AI know how to identify and fix bad structured data. Unstructured data looms as the next hurdle.
1. Use Intelligent Document Processing
Organizations do not have to start from scratch when working to identify a body of content that is ART. Intelligent Document Processing (IDP) solutions are great at sorting through documents and making sense of them. They are built upon established technologies that scanning and e-discovery vendors have used for years. Robotic Process Automation (RPA) vendors used the same technologies as they rose to prominence.
IDP solutions do require training to learn your business. With planning, they learn fast, and organizations move from high levels of human-in-the-loop workflows to minimal oversight as accuracy and confidence grow. The resulting information set is ART and ready to feed into an AI model.
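The shift from heavy human-in-the-loop review to minimal oversight can be pictured as a confidence threshold. The following is a minimal sketch, not any vendor's API: the `Extraction` record, the `route` and `review_rate` functions, and the 0.95 cutoff are all hypothetical names and values chosen for illustration, assuming your IDP tool reports a confidence score between 0 and 1 for each document.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    """One document processed by a hypothetical IDP tool."""
    doc_id: str
    doc_type: str
    confidence: float  # assumed to be a score from 0.0 to 1.0


def route(extraction: Extraction, auto_accept_at: float = 0.95) -> str:
    """Send low-confidence results to a person, accept the rest."""
    if extraction.confidence >= auto_accept_at:
        return "auto_accept"
    return "human_review"


def review_rate(extractions, auto_accept_at: float = 0.95) -> float:
    """Fraction of a batch that still needs human review."""
    if not extractions:
        return 0.0
    flagged = sum(
        1 for e in extractions if route(e, auto_accept_at) == "human_review"
    )
    return flagged / len(extractions)


# Illustrative data only; these scores are made up for the example.
batch = [
    Extraction("doc-1", "invoice", 0.99),
    Extraction("doc-2", "contract", 0.72),
    Extraction("doc-3", "invoice", 0.96),
    Extraction("doc-4", "proposal", 0.80),
]
```

Tracking `review_rate(batch)` over time is one way to see whether oversight is actually shrinking as the system learns. The threshold is a business decision, and a cautious team might start high and lower it only as reviewers confirm the accepted results.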
2. Consider the Information’s Sensitivity
One important aspect of using the resulting AI models is ensuring that sensitive information is not ingested without controls. The last thing any organization wants is for confidential information to appear in AI-drafted content and be shared with people who lack appropriate access.
Understanding the level of sensitivity for information and teaching your model to respect those sensitivity levels helps prevent unfortunate incidents that could lead to legal consequences. There are methods of classifying the information entering the system and controlling the output. Even with those enhancements, it is still necessary to invest in training and to keep people involved in reviewing any output before it is shared. No matter how well the model is set up and designed, having that review helps prevent accidents.
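One way to picture classifying what enters the system and controlling what comes out is a sensitivity label on every document. This is only a sketch under stated assumptions: the four levels, the `Document` record, and the `allowed_for_ingestion` and `visible_to` functions are hypothetical and would need to match your own classification scheme and access model.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class Document:
    doc_id: str
    text: str
    sensitivity: Sensitivity


def allowed_for_ingestion(docs, ceiling=Sensitivity.INTERNAL):
    """Input control: keep only documents at or below the ceiling."""
    return [d for d in docs if d.sensitivity <= ceiling]


def visible_to(docs, clearance):
    """Output control: keep only sources the reader is cleared to see."""
    return [d for d in docs if d.sensitivity <= clearance]


# Illustrative data only.
library = [
    Document("handbook", "Employee handbook text", Sensitivity.INTERNAL),
    Document("press", "Published press release", Sensitivity.PUBLIC),
    Document("merger", "Draft acquisition terms", Sensitivity.RESTRICTED),
]

ingest = allowed_for_ingestion(library)
public_view = visible_to(ingest, Sensitivity.PUBLIC)
```

Here `ingest` excludes the restricted draft, and `public_view` narrows the result further for a reader with only public access. Filters like these reduce risk but do not replace the human review described above.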
3. Look at Your Information
No matter what you’re doing, bad data can derail you. The ability of LLMs to create accurate and unbiased results depends upon the incoming information. Without information that is prepared and ART, the results will not meet the needs of the organization.
Take the time to consider the inputs to your AI models. Understand the implicit biases and inaccuracies in the source material. Do you want to make a claim based upon a reviewed design document or an optimistic sales proposal? Working to clean the source material will go a long way to ensuring that everything your LLMs create is ART.