Editorial

3 Steps to Using Unstructured Data With AI

By Laurence Hart
Without clean, well-governed inputs, even the most advanced AI tools will fail to deliver reliable results.

At every level of the artificial intelligence (AI) process, across every AI and AI-adjacent technology, one thing is true: No AI solution will meet your goals if your organization does not feed it accurate, relevant and trusted (ART) information. But how do you get there?

Predictive Analytics Showed the Way

Cleaning data to achieve accurate results is nothing new. Since the advent of big data and the drive toward predictive analytics, clean data has been a requirement for success.

Many projects were delayed or cancelled when organizations discovered that their data was not as pristine as expected and learned the truth of the phrase, “Garbage in, garbage out.” Analytics solutions produced results that were inaccurate or biased toward previous outcomes, even if those outcomes were not desired.

As current efforts shift to LLMs and beyond, structured data is the appetizer for new models. Unstructured data, documents and other content act as the main course as the full power of AI is brought to bear. The challenge that resulting models face is that information governance has lagged behind data governance.

Most organizations that have begun using AI know how to identify and fix bad structured data. Unstructured data looms as the next hurdle. 

1. Use Intelligent Document Processing

Organizations do not have to start from scratch when working to identify a body of content that is ART. Intelligent Document Processing (IDP) solutions are great at sorting through documents and making sense of them. They are built upon established technologies that scanning and e-discovery vendors have used for years. Robotic Process Automation (RPA) vendors used the same technologies as they rose to prominence.

IDP solutions do require training to learn your business. With planning, they learn fast, and organizations move from heavy human-in-the-loop workflows to minimal oversight as accuracy and confidence grow. The resulting information set is one that is ART, perfect for feeding into an AI model.
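The move from heavy human review to minimal oversight can be pictured as a confidence threshold. The following is a minimal sketch, not any vendor's API: it assumes a hypothetical IDP tool that reports a confidence score between 0 and 1 for each document, and the names `ExtractedDocument`, `route` and `review_rate` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExtractedDocument:
    """One document as returned by a hypothetical IDP tool (assumed shape)."""
    doc_id: str
    doc_type: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def route(doc: ExtractedDocument, threshold: float = 0.9) -> str:
    """Send low-confidence documents to a person, accept the rest."""
    return "auto_accept" if doc.confidence >= threshold else "human_review"


def review_rate(docs: list[ExtractedDocument], threshold: float = 0.9) -> float:
    """Share of documents that still need a human in the loop."""
    if not docs:
        return 0.0
    flagged = sum(1 for d in docs if route(d, threshold) == "human_review")
    return flagged / len(docs)


if __name__ == "__main__":
    batch = [
        ExtractedDocument("a", "invoice", 0.97),
        ExtractedDocument("b", "contract", 0.62),
    ]
    for doc in batch:
        print(doc.doc_id, route(doc))
    print("review rate:", review_rate(batch))
```

Tracking the review rate over time is one way to see whether accuracy and confidence are actually growing before oversight is reduced. The 0.9 threshold is an arbitrary starting point, not a recommendation.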

2. Consider the Information’s Sensitivity

One important aspect of using the resulting AI models is ensuring that the information being ingested is not sensitive in nature. The last thing any organization wants is for confidential information to appear in AI-drafted content and be shared with people who lack appropriate access.

Understanding the sensitivity level of information, and teaching your model to respect those levels, helps prevent unfortunate incidents that could lead to legal consequences. Methods exist for classifying the information entering the system and for controlling the output. Even with those controls, it is still necessary to increase training and human review for any output before it is shared. No matter how well the model is set up and designed, having that review helps prevent accidents.
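As an illustration of classifying information on the way in and holding output for review, here is a minimal sketch. It assumes documents already carry a sensitivity label. The three levels and the names `Sensitivity`, `Document`, `select_for_ingestion` and `needs_human_review` are hypothetical, not taken from any particular product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical three-level scheme; real schemes vary by organization."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass
class Document:
    doc_id: str
    text: str
    sensitivity: Sensitivity


def select_for_ingestion(
    docs: list[Document], max_level: Sensitivity = Sensitivity.INTERNAL
) -> list[Document]:
    """Keep only documents at or below the allowed sensitivity level."""
    return [d for d in docs if d.sensitivity <= max_level]


def needs_human_review(sources: list[Document]) -> bool:
    """Flag a draft for review if any source is above PUBLIC."""
    return any(d.sensitivity > Sensitivity.PUBLIC for d in sources)


if __name__ == "__main__":
    corpus = [
        Document("a", "Press release", Sensitivity.PUBLIC),
        Document("b", "Team handbook", Sensitivity.INTERNAL),
        Document("c", "Salary data", Sensitivity.CONFIDENTIAL),
    ]
    allowed = select_for_ingestion(corpus)
    print([d.doc_id for d in allowed])
    print(needs_human_review(allowed))
```

A filter like this only controls what enters the system. It does not replace the human review of output described above.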

3. Look at Your Information

No matter what you’re doing, bad data can derail you. The ability of LLMs to create accurate and unbiased results depends upon the incoming information. Without information that is prepared and ART, the results will not meet the needs of the organization.

Take the time to consider the inputs to your AI models. Understand the implicit biases and inaccuracies in the source material. Do you want to make a claim based upon a reviewed design document or an optimistic sales proposal? Working to clean the source material will go a long way toward ensuring that everything your LLMs create is ART.


About the Author
Laurence Hart

Laurence Hart is a VP of consulting services at CGI Federal, with a focus on leading digital transformation efforts that drive his clients’ success. A proven leader in content management and information governance, Laurence has over two decades of experience solving the challenges organizations face as they implement and deploy information solutions.

Main image: Lindsay Henwood | unsplash