Poetiq’s AI Reasoning Layer Hits 54% on ARC-AGI-2 at Half the Cost

By Michelle Hawley
Meta-system breaks the 50% barrier on the benchmark while cutting per-problem costs by more than half.

Key Takeaways

  • Poetiq scored 54% on ARC-AGI-2, a new state-of-the-art result.
  • New meta-system reduced per-problem cost by over 50%.
  • AI and data leaders may gain advanced, cost-effective reasoning tools.

A new AI reasoning overlay has set a record on the ARC-AGI-2 benchmark, potentially marking a shift toward cost-efficient intelligence layers that enhance models without retraining.

Poetiq's meta-system achieved a new state-of-the-art result on the ARC-AGI-2 Semi-Private Test Set, with ARC Prize officially verifying a 54% accuracy score at $30.57 per problem. The previous best of 45% accuracy was set by Gemini 3 Deep Think at a cost of $77.16 per problem.
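
As a quick check on the "more than half" claim, the cost reduction can be worked out directly from the two per-problem figures reported here. The snippet below is a minimal sketch; the function and variable names are illustrative and do not come from Poetiq or ARC Prize.

```python
def percent_reduction(old_cost: float, new_cost: float) -> float:
    """Return the percentage drop from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost * 100


# Per-problem costs (USD) reported for the ARC-AGI-2 Semi-Private Test Set
gemini_deep_think_cost = 77.16
poetiq_cost = 30.57

reduction = percent_reduction(gemini_deep_think_cost, poetiq_cost)
print(f"Cost reduction: {reduction:.1f}%")  # prints "Cost reduction: 60.4%"
```

On these figures the drop is roughly 60%, which is consistent with the "over 50%" description.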

According to company officials, the system is the first to break the 50% barrier on ARC-AGI-2, establishing what they describe as "an entirely new Pareto frontier" for cost-effective reasoning. The company released a pure Gemini-based configuration for official evaluation.
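
To make the "Pareto frontier" idea concrete: one result dominates another if it is at least as good on both accuracy and cost and strictly better on one of them, and the frontier is the set of results nothing else dominates. The sketch below applies that definition to the only two data points given in this article; the class and function names are hypothetical, and the actual leaderboard contains more entries than are shown here.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    accuracy: float  # percent, higher is better
    cost: float      # USD per problem, lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only the results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# The two results reported in this article
results = [
    Result("Poetiq (Gemini-based)", 54.0, 30.57),
    Result("Gemini 3 Deep Think", 45.0, 77.16),
]

frontier = pareto_frontier(results)
print([r.name for r in frontier])  # prints ['Poetiq (Gemini-based)']
```

Because the Poetiq result is both more accurate and cheaper, it dominates the earlier result outright rather than trading one axis for the other.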

Reasoning Models Replace Scaling-First Strategy

Pretraining approaches are declining as a primary development strategy. Major providers, including OpenAI, Anthropic, Google and DeepSeek, now offer reasoning models designed to handle complex tasks in math and coding.

Traditional scaling methods tend to run into data limitations. Rather than simply feeding models more compute and data, developers are building systems that reason through problems. Enterprise testing shows strong interest in reasoning models and the agentic capabilities they enable, as these systems allow organizations to "solve newer, more complex use cases," opening avenues for value creation that were previously unavailable.

Organizations are increasingly comfortable hosting directly with model providers to access the latest model with the best performance as soon as it's available.

Fine-Tuning Fades, Model-Agnostic Validation Layers Gain Traction

Fine-tuning with proprietary data is becoming less necessary as model capabilities improve. Many organizations find that "dumping data into a long context" with off-the-shelf models yields nearly equivalent results. 

Independent validation systems are also emerging as important architectural components. Experts recommend enterprises avoid relying solely on model providers, instead implementing:

  • Continuous AI purple teaming for customized guardrails
  • Third-party validations with agents as judges
  • Autocorrection layers that dynamically review outputs with audit trails

These model-agnostic approaches create scalable governance without requiring model retraining. 
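
One way to picture such a layer is as a thin wrapper that treats the model as a black box, has a separate judge review each output, and records every attempt. The following is a minimal sketch under stated assumptions: `ValidationLayer`, the toy model and the toy judge are all hypothetical stand-ins, the retry loop is only the simplest form of autocorrection, and none of this reflects any particular vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Model = Callable[[str], str]        # any text-in, text-out model
Judge = Callable[[str, str], bool]  # (prompt, output) -> acceptable?


@dataclass
class ValidationLayer:
    model: Model
    judge: Judge
    max_attempts: int = 3
    audit_trail: list[dict] = field(default_factory=list)

    def run(self, prompt: str) -> Optional[str]:
        """Call the model, have the judge review each output, log every attempt."""
        for attempt in range(1, self.max_attempts + 1):
            output = self.model(prompt)
            approved = self.judge(prompt, output)
            self.audit_trail.append(
                {"attempt": attempt, "output": output, "approved": approved}
            )
            if approved:
                return output
        return None  # no output passed review


def make_toy_model() -> Model:
    """Stand-in model: returns an empty answer first, then a real one."""
    answers = iter(["", "42"])
    return lambda prompt: next(answers, "")


# Stand-in judge: accepts any non-empty answer
layer = ValidationLayer(model=make_toy_model(), judge=lambda p, o: o.strip() != "")
result = layer.run("What is 6 * 7?")
print(result, len(layer.audit_trail))  # prints "42 2"
```

Because the wrapper only depends on the two callables, the underlying model or the judge can be swapped without retraining anything, and the audit trail preserves rejected outputs alongside the accepted one.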

With fewer than one billion people currently using LLMs, adoption remains early-stage. As capabilities converge, competitive advantages are shifting toward distinct strategic positions rather than pure model performance. 

Poetiq's Meta-System Capabilities

"Our meta-system improves with every task that it solves by learning how the task was solved."

Poetiq Officials 

Poetiq's platform delivers reasoning enhancements without model retraining. Key capabilities include:

| Capability | Description |
| --- | --- |
| Learned test time reasoning | First system to exceed 50% on ARC-AGI-2 |
| Model-agnostic deployment | Works with any frontier model |
| Automatic system creation | Builds task-specific solutions |
| Cost optimization | Reduces per-problem costs by 60% |
| Self-improvement | Learns from each solved task |

Poetiq Background

Poetiq targets enterprise technology leaders, AI product teams and research organizations with a model-agnostic intelligence layer for large language models. The company offers an "on-top" reasoning system that enhances the accuracy and reliability of LLM outputs without modifying base models.

About the Author
Michelle Hawley

Michelle Hawley is an experienced journalist who specializes in reporting on the impact of technology on society. As editorial director at Simpler Media Group, she oversees the day-to-day operations of VKTR, covering the world of enterprise AI and managing a network of contributing writers. She's also the host of CMSWire's CMO Circle and co-host of CMSWire's CX Decoded. With an MFA in creative writing and background in both news and marketing, she offers unique insights on the topics of tech disruption, corporate responsibility, changing AI legislation and more. She currently resides in Pennsylvania with her husband and two dogs.

Main image: ImagesRouges | Adobe Stock