Editorial

Enterprises Are Building a New AI QA Playbook

By Chris Sheehan
Autonomy without oversight is a liability. Here’s how leading organizations are using red teaming, hybrid evaluation and expert validation.

In my last VKTR article, I discussed how agentic AI brings both autonomy AND risk, and that we’ll need new quality assurance strategies and testing to provide a strong, scalable, trustworthy foundation. 

Fortunately, the industry isn't standing still. I see progress and investment in a number of new or enhanced strategies. 

LLMs Trained to Increase Relevancy and Accuracy — While Reducing Bias 

This is done by keeping humans in the loop, from both a data-sourcing AND a testing perspective. You may have noticed that I really enjoy writing about this topic.

The notion that we can “set it and forget it” with AI agents isn’t realistic, for many reasons. While a properly trained agent is truly intelligent and adaptable, it doesn’t have the human capacity for judgment, which is critical in the unexpected or imperfect situations that arise all the time in the real world. Hybrid evaluations that combine human-in-the-loop and automated assessments give organizations a comprehensive approach to testing that they can adapt to their own business cases.

More Organizations Understand Rigorous Security Testing Is Table Stakes

If there’s any topic that I like writing about more than “humans in the loop,” it would have to be red team testing. I referenced this concept in my very first VKTR article in 2024, and in three other posts as well.

Red team testing is not a new concept, and neither are its roots in cybersecurity or its effectiveness in securing AI systems. However, red teaming is not yet a requirement of AI development for most internal teams, which are undoubtedly balancing aggressive launch deadlines with limited expertise and resources. A 2025 report found that only 33% of organizations use this QA best practice.

But cutting corners on quality can have severe consequences down the road. We’re seeing our clients (some of the world’s largest brands) embed red team expertise and execution into the software development life cycle (SDLC), understanding that it is the most efficient way to guard against post-launch headaches and very real costs. Sophisticated safety evaluations and red teaming are becoming an essential part of the AI development cycle for all companies.

Expertise Makes Large Language Models Better

I wrote previously about how subject-matter experts can help fine-tune AI for specific use cases and evaluate accuracy, tone and coherence. (Note: ANOTHER human-in-the-loop plug.) We’re seeing many organizations improve their AI data with the help of experts and generalists alike.

In a recent example, a financial software company approached my company to evaluate and test its model for safety, accuracy and potential harms. The company required financial experts to test the model against various criteria and provide feedback. We ended up recruiting dozens of CFOs and running moderated studies with them. They were willing to provide critical feedback and insights, and to flag issues, from their own data generated by the agent being tested.

Roller Coaster or Ferris Wheel? 

AI strategies will continue to evolve, and more changes are on the horizon. Are we boarding another roller coaster, or are we in for a gentler ride?

Businesses that want to ride out these twists and turns should start with the strongest foundation (a diverse human dataset), embed rigorous testing throughout the SDLC and use the best experts (on-hand or within reach) to validate and optimize their findings.

If there’s one thing that is true about AI, it’s that its users are ultimately human. As leaders, we should all strive to minimize AI’s risks (inaccuracy, bias, toxicity) and maximize AI’s incredible value (greater efficiency, higher productivity).

About the Author
Chris Sheehan

As Applause’s SVP and GM of strategic accounts, Chris Sheehan leads Applause’s strategic account business, including strategy, sales and operations, to ensure the continued growth and success of its largest customers. Since joining Applause in 2015, Sheehan has held roles on multiple teams, including software delivery, product strategy and customer success.

Main image: Paweł Michałowski | Adobe Stock