How and Why Agentic AI Changes the Game

Many of my prior articles on VKTR focused on where and why AI implementations are hitting roadblocks and why quality assurance NEEDS to be part of rolling out any kind of new technology like AI. Whether we’re ready for it or not, AI is here.

In just the last week, Shopify rolled out new AI tools for developers, Wells Fargo and Google announced that Wells Fargo employees will gain access to expanded AI agents and tools and AI provider Cohere rolled out enterprise enhancements to its North platform. By the time this article goes live, a search of “today’s AI news” will likely yield entirely different results but show that momentum is growing, and fast.

Recent reports from UIPath and PwC validate that executives want to keep AI in the headlines. UIPath found that 93% of US IT executives are extremely or very interested in agentic AI, with 32% noting they are planning to invest in the next six months or less. In PwC’s survey of senior executives, 88% said their team or function planned to increase AI budgets in the next 12 months due to agentic AI. And 79% said their company already adopted AI agents.

Executives and boards want results. Product teams want speed. Customers expect magic. And yet, the quality needs improvement to overcome hallucinations, bias, inconsistent outputs and privacy gaps that are all too common. If this is a problem now with traditional and generative AI, it’s about to be a much bigger issue with agentic AI.

What’s So Different About Agentic AI?

Agentic AI is fundamentally different from traditional rule-based automation because we’re asking the technology to make decisions on its own. McKinsey noted that this introduces “a new class of systemic risks that traditional gen AI architectures, designed primarily for isolated LLM-centric use cases, were never built to handle: uncontrolled autonomy, fragmented system access, lack of observability and traceability, expanding surface of attack and agent sprawl and duplication.”

Traditional testing approaches clearly won’t work here. To start, what can we even measure when an agentic AI model is capable of generating an infinite number of responses? The scope is enormous. Instead, companies need new QA strategies that focus on intent, experience and risk. It’s not enough to ask if the model returns a result; you have to ask if it returns a reasonable, fair, accurate and safe one.

Red-Teaming and Other Advanced Testing Methods

In a prior article, I outlined best practices on red team testing, or red teaming. This adversarial technique designed to find failure points focuses on common problems related to security, safety, accuracy, functionality and performance, forcing organizations to think about which losses and missteps might present the highest risks. Red team testing enables teams to look beyond code and think about behavior, complex reasoning chains and patterns — essentially, to think like the AI.

A diverse team of testers can “launch attacks” and uncover issues, testing both agent communications and actions for harmful behaviors and weaknesses. These “attacks” can include: adversarial prompt injections to test if prompts can bypass safety filters, contextual framing exploits to check if agents are following harmful instructions when assuming roles or changing contexts, token-level manipulation to validate whether odd token patterns trigger unsafe outputs, agent action leakage to prevent an agent from revealing data or exposing its underlying properties when prompted or toxicity detection to leverage LLMs to flag biased, racist or other toxic outputs.

Questions to Consider When Testing Agentic AI

Did the agent do the task it was supposed to do?
Did the agent behave ethically in how it handled the task it was asked to do?
Has the agent’s tone and role remained consistent across interactions? Does the agent align to its specific use case?
Can I verify the agent’s decision-making process and final output are grounded in truth?
Are the agent’s reasoning and actions cost-efficient? Does the agent align with organizational behaviors?
Is my agent interoperable with my organization’s other business functions?

While these are broad questions, they can be tailored to specific use cases. For example, if you’re testing an integrated voice response (IVR) system and there are a number of disconnections or incorrect routing, then no, the agent is not completing the task it was supposed to. Or, if you’re testing an online booking system and the task is completed, but it’s a multistep process between the user and the agent, then the workflow is not cost-efficient. The point is that your organization can tailor testing to your unique needs and quantify threats as low, mid or high based on their potential negative impact.

These sophisticated and holistic testing approaches are evolving just as fast as AI itself, but are necessary in order to provide a strong, scalable foundation built on trust.

Related Article: Do's, Don'ts and Must-Haves for Agentic AI

Learning Opportunities

Webinar

Sep

CMS Briefing: A Live Look at What’s Next in AI-Driven Platforms

Learn how leading organizations are using AI‑driven tools to publish faster, personalize smarter and stay secure.

Webinar

Oct

Agentic AI Playbook: Real-World Customer Service Use Cases You Can Deploy Now

Boost self-service by 30% and slash call volume by 63% with agentic AI.

Webinar

On demand

Ready or Not: How Data-First Organizations Are Unlocking Agentforce Potential

Learn how to cut through the noise, activate Agentforce and build a Salesforce AI strategy that actually delivers.

Watch Now

Webinar

On demand

AI in Customer Service: Faster Resolutions, Happier Customers

Don’t let rising demand burn out your team. See how to build a smarter, more resilient support org.

Watch Now

Webinar

On demand

From Hype to High-Impact CX Strategies That Actually Scale

Turn buzzworthy AI and outsourcing trends into measurable CX wins with fresh 2025 data.

Watch Now

Webinar

On demand

Insights to Action Rethinking the Contact Center for Real Business Impact

Join our exclusive webinar to hear CX executives share their innovative strategies for transforming service delivery.

Watch Now

Webinar

Sep

CMS Briefing: A Live Look at What’s Next in AI-Driven Platforms

Learn how leading organizations are using AI‑driven tools to publish faster, personalize smarter and stay secure.

Webinar

Oct

Agentic AI Playbook: Real-World Customer Service Use Cases You Can Deploy Now

Boost self-service by 30% and slash call volume by 63% with agentic AI.

Webinar

On demand

Ready or Not: How Data-First Organizations Are Unlocking Agentforce Potential

Learn how to cut through the noise, activate Agentforce and build a Salesforce AI strategy that actually delivers.

Watch Now

Testing as Strategic Necessity

When it comes to testing AI, we’re dealing with an entirely different animal than traditional technologies. It’s unpredictable and powerful and extremely complex. It’s a machine-powered conversation that has to factor in subjective judgment and cultural nuance to ensure that it’s supporting the organization’s overall mission and goals.

Testing isn’t a luxury. It’s key to making sure that the conversation is a good one. And that good user experience leads to customer retention, loyalty and revenue. Testing also prevents a bad conversation from damaging your brand. Simply put, it’s not simple — agentic AI is a game-changer that is going to impact every organization one way or another.

fa-solid fa-hand-paper Learn how you can join our contributor community.