Pawn looking in the mirror and seeing a king
Editorial

How and Why Agentic AI Changes the Game

3 minute read
Chris Sheehan avatar
By
SAVED
Agentic AI brings autonomy and systemic risks that demand new QA strategies, making rigorous testing essential for trust, safety and business outcomes.

Many of my prior articles on VKTR focused on where and why AI implementations are hitting roadblocks and why quality assurance NEEDS to be part of rolling out any kind of new technology like AI. Whether we’re ready for it or not, AI is here.

In just the last week, Shopify rolled out new AI tools for developers, Wells Fargo and Google announced that Wells Fargo employees will gain access to expanded AI agents and tools and AI provider Cohere rolled out enterprise enhancements to its North platform. By the time this article goes live, a search of “today’s AI news” will likely yield entirely different results but show that momentum is growing, and fast. 

Recent reports from UIPath and PwC validate that executives want to keep AI in the headlines. UIPath found that 93% of US IT executives are extremely or very interested in agentic AI, with 32% noting they are planning to invest in the next six months or less. In PwC’s survey of senior executives, 88% said their team or function planned to increase AI budgets in the next 12 months due to agentic AI. And 79% said their company already adopted AI agents. 

Executives and boards want results. Product teams want speed. Customers expect magic. And yet, the quality needs improvement to overcome hallucinations, bias, inconsistent outputs and privacy gaps that are all too common. If this is a problem now with traditional and generative AI, it’s about to be a much bigger issue with agentic AI. 

What’s So Different About Agentic AI? 

Agentic AI is fundamentally different from traditional rule-based automation because we’re asking the technology to make decisions on its own. McKinsey noted that this introduces “a new class of systemic risks that traditional gen AI architectures, designed primarily for isolated LLM-centric use cases, were never built to handle: uncontrolled autonomy, fragmented system access, lack of observability and traceability, expanding surface of attack and agent sprawl and duplication.”

Traditional testing approaches clearly won’t work here. To start, what can we even measure when an agentic AI model is capable of generating an infinite number of responses? The scope is enormous. Instead, companies need new QA strategies that focus on intent, experience and risk. It’s not enough to ask if the model returns a result; you have to ask if it returns a reasonable, fair, accurate and safe one.

Related Article: AI Agent vs. Agentic AI: What’s the Difference — And Why It Matters

Red-Teaming and Other Advanced Testing Methods

In a prior article, I outlined best practices on red team testing, or red teaming. This adversarial technique designed to find failure points focuses on common problems related to security, safety, accuracy, functionality and performance, forcing organizations to think about which losses and missteps might present the highest risks. Red team testing enables teams to look beyond code and think about behavior, complex reasoning chains and patterns — essentially, to think like the AI.

A diverse team of testers can “launch attacks” and uncover issues, testing both agent communications and actions for harmful behaviors and weaknesses. These “attacks” can include: adversarial prompt injections to test if prompts can bypass safety filters, contextual framing exploits to check if agents are following harmful instructions when assuming roles or changing contexts, token-level manipulation to validate whether odd token patterns trigger unsafe outputs, agent action leakage to prevent an agent from revealing data or exposing its underlying properties when prompted or toxicity detection to leverage LLMs to flag biased, racist or other toxic outputs. 

Questions to Consider When Testing Agentic AI 

  • Did the agent do the task it was supposed to do? 
  • Did the agent behave ethically in how it handled the task it was asked to do? 
  • Has the agent’s tone and role remained consistent across interactions? Does the agent align to its specific use case? 
  • Can I verify the agent’s decision-making process and final output are grounded in truth? 
  • Are the agent’s reasoning and actions cost-efficient? Does the agent align with organizational behaviors? 
  • Is my agent interoperable with my organization’s other business functions? 

While these are broad questions, they can be tailored to specific use cases. For example, if you’re testing an integrated voice response (IVR) system and there are a number of disconnections or incorrect routing, then no, the agent is not completing the task it was supposed to. Or, if you’re testing an online booking system and the task is completed, but it’s a multistep process between the user and the agent, then the workflow is not cost-efficient. The point is that your organization can tailor testing to your unique needs and quantify threats as low, mid or high based on their potential negative impact. 

These sophisticated and holistic testing approaches are evolving just as fast as AI itself, but are necessary in order to provide a strong, scalable foundation built on trust. 

Related Article: Do's, Don'ts and Must-Haves for Agentic AI

Learning Opportunities

Testing as Strategic Necessity

When it comes to testing AI, we’re dealing with an entirely different animal than traditional technologies. It’s unpredictable and powerful and extremely complex. It’s a machine-powered conversation that has to factor in subjective judgment and cultural nuance to ensure that it’s supporting the organization’s overall mission and goals.

Testing isn’t a luxury. It’s key to making sure that the conversation is a good one. And that good user experience leads to customer retention, loyalty and revenue. Testing also prevents a bad conversation from damaging your brand. Simply put, it’s not simple — agentic AI is a game-changer that is going to impact every organization one way or another.

fa-solid fa-hand-paper Learn how you can join our contributor community.

About the Author
Chris Sheehan

As Applause's SVP and GM of strategic accounts, Chris Sheehan enables the success of Applause’s strategic account business, including strategy, sales and operations to ensure continued growth and customer success of its largest customers. Since joining Applause in 2015, Sheehan has held roles on multiple teams, including software delivery, product strategy and customer success. Connect with Chris Sheehan:

Main image: Feydzhet Shabanov on Adobe Stock, Generated With AI
Featured Research