Feature

Who Watches the AI? Why Agentic AI Needs Observability Platforms

By David Barry
Agentic AI has greater potential but also higher risks than traditional LLMs. Observability platforms play a part in making sure things don't go off the rails.

Agentic AI promises to transform industries by enabling systems to perceive, plan, adapt, learn and move beyond traditional rules-based systems. Unlike static systems, agentic AI uses generative AI for deeper insights and autonomous decision-making.

But who watches the agents when they’re at work? Who watches exactly where agentic AI goes and what information or data the agents use to complete a task?

Turns out, a decades-old solution may be the answer to this most modern problem.

Observability Platforms to the Rescue

An observability platform is a comprehensive toolset designed to provide deep visibility into the performance, health and behavior of complex systems and applications. It enables IT professionals to collect, store, analyze and visualize telemetry from various sources, including metrics, events, logs and traces (known by the acronym "MELT").

This centralized approach allows teams to gain real-time insights into system behavior, diagnose issues and make informed decisions to enhance performance and reliability. Some of the key features of observability platforms include:

  • Log management: Aggregates and analyzes system logs to identify patterns and anomalies.
  • Metrics monitoring: Tracks critical system metrics such as CPU usage and memory consumption.
  • Distributed tracing: Provides visibility into user requests across multiple services to identify bottlenecks.
  • Dashboards and visualizations: Presents data in an easily digestible format for quick assessments.
  • Anomaly detection: Uses AI/ML algorithms to identify irregularities in real time.

In a nutshell, observability platforms facilitate proactive monitoring and troubleshooting of complex IT environments, ensuring optimal application performance and improved user experiences.
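To make the metrics-monitoring and anomaly-detection features above concrete, here is a minimal sketch in Python. It flags a sample when it sits far from the average of the samples just before it (a rolling z-score). The function name, window size, threshold and CPU readings are all illustrative assumptions; commercial platforms typically use more sophisticated models.

```python
from collections import deque
from statistics import mean, stdev

def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate sharply from the preceding window."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(samples):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # A perfectly flat window has sigma == 0; skip it to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        recent.append(value)
    return anomalies

# Hypothetical CPU usage readings (percent), one per minute.
cpu_percent = [41, 43, 40, 42, 44, 42, 95, 43, 41]
print(find_anomalies(cpu_percent))  # prints [6]
```

Only the spike to 95 is flagged. The readings after it are not, because the spike itself inflates the window's spread, which is one reason a simple rule like this is a starting point rather than a finished detector.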

The synergy between agentic AI and observability is critical for ensuring responsible AI implementation. Observability offers insight into what is happening, while agentic AIOps (Artificial Intelligence for IT Operations) explains why it is happening and takes action to fix it. This combination transforms raw signals into actionable insights and automated responses, which reduces complexity and accelerates resolution.

AIOps uses AI, including machine learning, and analytics to automate, streamline and optimize IT operations. It helps IT teams quickly identify and resolve slowdowns and outages, often proactively, by providing end-to-end visibility and context across diverse IT environments. With generative AI, it also introduces autonomous decision-making capabilities, enabling systems to detect, diagnose and resolve issues without human intervention.

Observability and AIOps work together to revolutionize IT operations. Observability platforms gather comprehensive data on system health, while AIOps uses AI and machine learning to analyze this data, providing actionable insights and automating responses.

This combination improves incident management by reducing detection and resolution times, enabling proactive problem-solving through predictive analytics and supporting real-time monitoring and autonomous operations. Observability provides the visibility, while AIOps delivers the intelligence and automation needed to effectively manage complex IT environments.

Agentic AI Observability Challenges

One of the biggest challenges with observability data analysis is the variability of the data, Chronosphere Field CTO Bill Hineline told Reworked.

He suggests that businesses choosing among AI models to drive their observability (o11y) work adopt standard data models like OpenTelemetry (OTel) to make telemetry less variable. Tools employing agentic AI will also need to be more transparent about their answers, both to build user confidence and to let their reasoning be refined. Vendors that take a black-box approach, hiding how the technology arrives at an answer, will lose users' confidence, he says.

In short, tools need to show their work to be trustworthy.

“Observability continues to explode, and the size of environments means that data volumes and analysis overhead will continue to explode,” he said. “Without AI-powered solutions, we may reach a point where rapid interpretation may become impossible and could be the differentiation between competitors.”

That said, he still believes there are many use cases for AI to drive better observability, many of which rely on existing AI capabilities. Pattern-matching, anomaly detection and other use cases are possible today if companies have adopted standard telemetry like OTel to reduce the variability of data across their environment.
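The point about variability can be illustrated with a small Python sketch. Two services report the same facts under different field names, and a shared schema lets one query cover both. This is not OpenTelemetry itself, which defines its own conventions and SDKs; every field name and the alias table below are hypothetical.

```python
# Hypothetical mapping from source-specific field names to one shared schema.
FIELD_ALIASES = {
    "svc": "service",
    "service_name": "service",
    "lat_ms": "duration_ms",
    "elapsed": "duration_ms",
    "status": "status_code",
    "http_status": "status_code",
}

def normalize(record):
    """Rename known aliases; keep unknown fields under their original name."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}

# Two services describing the same kind of event in different vocabularies.
raw = [
    {"svc": "checkout", "lat_ms": 120, "status": 200},
    {"service_name": "search", "elapsed": 95, "http_status": 500},
]
normalized = [normalize(r) for r in raw]

# Once the fields agree, a single rule works across every source.
slow_or_failed = [
    r["service"] for r in normalized
    if r["duration_ms"] > 100 or r["status_code"] >= 500
]
print(slow_or_failed)  # prints ['checkout', 'search']
```

Without the normalization step, the same rule would have to be rewritten for each source, which is the kind of analysis overhead a standard data model is meant to remove.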

Platform Challenges

Then, there are platform-specific challenges. Traditional platforms can be slow, expensive and ineffective at providing real-time insights. “This becomes a major issue when dealing with AI-driven automation, where speed and accuracy are critical,” Coralogix VP of AI Liran Hason told Reworked.

Hason points to four key challenges he sees businesses face in this area:

  1. Scalability issues: As companies expand their AI capabilities, they need to ask: Is our observability system built to handle this? If not, they may need to switch providers, which can be a costly and time-consuming process.
  2. Integration headaches: Many observability platforms don’t easily integrate with existing AI and automation tools. Businesses need solutions that seamlessly fit into their tech stack without requiring a complete overhaul.
  3. Vendor lock-in risk: Some platforms are closed-source, meaning businesses are tied to one provider with limited flexibility. If that provider doesn't meet their AI observability needs, switching can be difficult and expensive.
  4. Balancing cost, storage and insights: Observability for AI is already an investment, so companies need to ensure they’re getting real value, not just paying for unnecessary storage or delayed insights.

“In short, businesses need to carefully evaluate their observability platform to ensure it’s cost-effective, scalable and flexible enough to support AI-driven automation without creating unnecessary roadblocks,” Hason said.

Observability Evolves in Parallel With AI Models

Unlike static or narrowly focused artificial intelligence models, agentic AI systems require tracking not only performance indicators but also goal changes, decision rationales and emergent behaviors to guarantee safety and dependability, Rogers Jeffrey Leo John, co-founder and CTO of DataChat, said.

Another major challenge with agentic AI arises when unexpected behavior and feedback loops cause compounding errors or negative effects. To identify potential issues and stop risky adaptations in real time, effective monitoring must concentrate on continuous behavioral analysis, anomaly detection and trajectory prediction.
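One narrow slice of behavioral analysis is spotting an agent that keeps repeating the same step, a common sign of a feedback loop. The Python sketch below checks the most recent actions for excessive repetition. The action names, window and repeat limit are assumptions made for illustration, not settings from any particular platform.

```python
from collections import Counter

def detect_loop(actions, window=6, max_repeats=3):
    """Return the action repeated too often in the last `window` steps, or None."""
    recent = actions[-window:]
    if not recent:
        return None
    action, count = Counter(recent).most_common(1)[0]
    return action if count >= max_repeats else None

# Hypothetical agent trajectories.
stuck = ["search_docs", "call_api", "retry_api", "retry_api", "retry_api", "retry_api"]
healthy = ["plan", "search_docs", "read_page", "write_summary"]

print(detect_loop(stuck))    # prints retry_api
print(detect_loop(healthy))  # prints None
```

A monitor built on a check like this could pause the agent or raise an alert before the repeated retries compound into wasted spend or unwanted side effects.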

“Gaining meaningful insights necessitates looking beyond basic measures to understand how models change, how artificial intelligence decisions are made, and whether the system complements the business's goals,” John said.

John also emphasizes the need for transparency, traceability and ethical alignment, especially in cases when agentic artificial intelligence is deployed in high-stakes situations. These systems need to be observed for alignment with long-term goals and human values while providing justification for their choices. To foster responsibility and trust, observability systems should center on value alignment checks, intent monitoring and explainable decision logs.
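An explainable decision log can be as simple as one structured record per agent decision, written as a line of JSON. The Python sketch below shows one possible shape. The field names and the example values are hypothetical, chosen to capture the goal, the action taken and the stated justification.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    agent_id: str
    goal: str        # what the agent was trying to achieve
    action: str      # what it decided to do
    rationale: str   # the justification it gave for that choice
    inputs: list = field(default_factory=list)  # data sources it consulted
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def to_log_line(record):
    """Serialize a decision as one JSON line, suitable for a log pipeline."""
    return json.dumps(asdict(record), sort_keys=True)

record = DecisionRecord(
    agent_id="agent-7",
    goal="resolve customer refund request",
    action="escalate_to_human",
    rationale="refund amount exceeds the limit the agent may approve",
    inputs=["order_history", "refund_policy"],
)
line = to_log_line(record)
print(line)
```

Because each line carries the goal and the rationale alongside the action, a reviewer can later trace not just what the agent did but why it said it did so.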

AI observability platforms are evolving as quickly as the AI models they monitor, with the aim of providing real-time insights into data provenance, decision paths and biases.

“With observability systems needed now more than ever, we can probably start to see standards for telemetry data collection that will be widely embraced,” John said. “These standards will simplify integration and lower barriers for new observability tools. Observability tools will heavily leverage AI and ML models. These AI models will improve over time, learning from previous incidents to fine-tune detection algorithms and reduce false positives.”

Final Words on AI Observability

Agentic AI has greater potential but also higher risks than traditional LLMs, SmartBear VP of AI and Architecture Fitz Nowlan told Reworked. And while observability can help, it too carries risk. 

"When considering observability, just as the AI itself can pick the wrong action, it can be hard for an observability system to determine whether the AI is performing incorrectly and an alert should be fired. Another challenge is choosing the right level of abstraction for the observation and tracking. There may be too much data to monitor every single AI interaction," said Nowlan.

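One common way to cope with too much data is sampling: record every failed interaction but only a fraction of the successful ones. The Python sketch below is a minimal version of that idea. The function name, the 10% default rate and the ID format are assumptions for illustration.

```python
import hashlib

def should_record(interaction_id, is_error, sample_rate=0.1):
    """Keep every error; keep a stable fraction of everything else."""
    if is_error:
        return True
    # Hash the ID so the same interaction always gets the same decision.
    digest = hashlib.sha256(interaction_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sample_rate * 10_000

# Hypothetical interaction IDs.
ids = [f"interaction-{n}" for n in range(1000)]
kept = [i for i in ids if should_record(i, is_error=False)]
print(f"kept {len(kept)} of {len(ids)} successful interactions")
```

Hashing the ID rather than drawing a random number means every component of a system makes the same keep-or-drop decision for a given interaction, so sampled records stay complete.
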
Nowlan does see observability helping to answer questions about AI response drift, latency and input distribution.
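A shift in input distribution can be quantified with a statistic such as the population stability index (PSI), which compares how often each category appears in a baseline period versus a current one. The Python sketch below is one simple way to compute it; the request topics and the smoothing constant are illustrative assumptions.

```python
import math
from collections import Counter

def psi(baseline, current, epsilon=1e-6):
    """Population stability index between two lists of categorical values."""
    base_counts, cur_counts = Counter(baseline), Counter(current)
    score = 0.0
    for category in set(baseline) | set(current):
        # Floor each share at epsilon so a missing category does not produce log(0).
        b = max(base_counts[category] / len(baseline), epsilon)
        c = max(cur_counts[category] / len(current), epsilon)
        score += (c - b) * math.log(c / b)
    return score

# Hypothetical request topics seen by an agent last month versus this week.
last_month = ["billing"] * 50 + ["shipping"] * 50
this_week = ["billing"] * 20 + ["shipping"] * 30 + ["refund"] * 50

print(psi(last_month, last_month))  # prints 0.0
print(psi(last_month, this_week))
```

Identical distributions score zero and the score grows as the mix of inputs moves away from the baseline, which gives a dashboard a single number to track over time.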

Aible founder and CEO Arijit Sengupta also sees value in observability. Without it, AI agents can go off track early and waste resources or, worse, take unintended and potentially harmful actions. Businesses must avoid the trap of treating these agents as infallible: they are powerful tools but require careful monitoring and specialization.

To enhance explainability, observability platforms should incorporate reasoning models that require agents to articulate their plans upfront, allowing users to review and approve them before execution. This approach ensures greater transparency and control.
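The plan-then-approve pattern can be sketched in a few lines of Python. The agent submits its full plan, a reviewer accepts or rejects it, and nothing runs without approval. The plan steps, the reviewer rule and the function names are all hypothetical; in practice the reviewer would usually be a person rather than a function.

```python
def run_with_approval(plan, approve, execute):
    """Execute the plan's steps only if the reviewer approves the whole plan first."""
    if not approve(plan):
        return []
    return [execute(step) for step in plan]

def reviewer(steps):
    # Hypothetical policy: reject any plan that sends messages on its own.
    return not any(step.startswith("send") for step in steps)

def execute(step):
    # Stand-in for real work; just reports the step as done.
    return f"done: {step}"

plan = ["read open tickets", "draft replies", "send replies"]

print(run_with_approval(plan, reviewer, execute))      # prints []
print(run_with_approval(plan[:2], reviewer, execute))  # runs the first two steps
```

The full plan is blocked because it includes sending replies, while the trimmed plan runs. Reviewing the plan as a whole, rather than step by step, lets the reviewer see where the agent is heading before any action is taken.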

Looking ahead, Sengupta said, organizations are prioritizing fine-tuned, specialized models for agents to improve task reliability while also seeking reasoning frameworks that enhance AI interpretability.

However, a growing challenge is the lack of fine-tuning options for leading proprietary models, which could drive a shift toward open-source alternatives like Stanford's s1 project.

About the Author
David Barry

David is a European-based journalist of 35 years who has spent the last 15 following the development of workplace technologies, from the early days of document management, enterprise content management and content services. Now, with the development of new remote and hybrid work models, he covers the evolution of technologies that enable collaboration, communications and work and has recently spent a great deal of time exploring the far reaches of AI, generative AI and General AI.

Main image: Tobias Tullius | unsplash