Earlier this month IBM introduced the IBM z17, a next-generation mainframe powered by the IBM Telum II processor, which acts as an "AI accelerator." The z17 can process more than 450 billion AI inference operations a day — a 50% increase over its predecessor, the z16 — with a response time of one millisecond. The mainframe supports more than 250 AI use cases, including risk assessment, medical image analysis, chatbot management and fraud detection, according to the company.
IBM has been in the mainframe business since 1952. And while mainframes may sound like relics of a bygone era, modern machines come equipped with AI-optimized processors and hardware accelerators, making them well suited to running generative AI models on-premises. The footprint has shrunk, too: where mainframes in the '50s required 2,000 to 10,000 square feet, the z17 takes up about as much space as a large refrigerator.
The result is that the z17 offers the performance, scalability and reliability needed to process large datasets securely and in real time — making it a strong fit for enterprises with strict data governance and uptime requirements, and specifically for running AI agents.
IBM z17 Mainframe Moves AI On Premises
The IBM z17 will get added capabilities and security support from a number of complementary products:
- IBM Spyre Accelerator — A PCIe card expected in late 2025. The expansion card provides secure data processing, so AI applications can run without transferring sensitive information off the platform.
- watsonx Code Assistant for Z and watsonx Assistant for Z — These system management tools offer real-time code suggestions and assist with incident resolution while integrating with IBM Z Operations Unite to help teams quickly detect and address issues using live system data.
- z/OS 3.2 operating system — Slated for release in Q3 2025, z/OS 3.2 brings hardware-accelerated AI, support for hybrid cloud data environments and compatibility with modern data access methods.
- IBM Vault — An identity-based security solution built on the company's HashiCorp acquisition to manage an organization's sensitive data across its full IT estate.
- IBM Storage DS8000 — Optimized for mission-critical workloads, DS8000 complements the z17 by delivering consistent performance, a modular architecture and robust data security.
Notably, the company calls out the mainframe's quantum-safe cryptography, which addresses a growing security concern: the threat future quantum computers pose to today's encryption. The IBM z17's release date is set for June 18.
The Benefits of GenAI On Premises
Rob Lubeck, CRO of enterprise AI consultancy RTS Labs, thinks IBM’s move to bring generative AI on-premises points to a growing trend. “It just makes a lot of sense,” he said. As companies grow more focused on data control and real-time performance, they're looking to harness AI without compromising sensitive information. “We want the power of AI, but we don’t want our data leaving our four walls.”
The shift is especially noticeable in organizations balancing speed and security, Lubeck said. With on-prem systems, “you get real-time answers without sending data out and waiting for it to come back.” The benefits are clear: more control, local data for privacy and compliance, and the ability to fine-tune systems beyond what managed cloud environments offer.
"There’s just more flexibility," he said.
On-premises AI fits naturally into hybrid cloud strategies, where early AI development and scaling happen in the cloud and sensitive workloads that demand more control are moved on-premises.
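To make that split concrete, here is a minimal Python sketch of how such a hybrid policy might route inference requests. The endpoints, field names and routing rule are hypothetical illustrations, not part of any IBM or vendor API.

```python
# Minimal sketch of a hybrid routing policy. The endpoint URLs and
# "sensitive field" list are assumptions for illustration only.
ON_PREM_ENDPOINT = "https://llm.internal.example.com/v1/generate"
CLOUD_ENDPOINT = "https://api.cloud-llm.example.com/v1/generate"

# Fields that mark a request as sensitive and pin it on-premises.
SENSITIVE_FIELDS = {"ssn", "account_number", "diagnosis"}

def pick_endpoint(payload: dict) -> str:
    """Route to the on-prem model when the payload carries sensitive fields."""
    if SENSITIVE_FIELDS & payload.keys():
        return ON_PREM_ENDPOINT
    return CLOUD_ENDPOINT

print(pick_endpoint({"prompt": "Summarize this press release"}))             # cloud
print(pick_endpoint({"prompt": "Flag anomalies", "account_number": "123"}))  # on-prem
```

In practice the routing rule would hang off a proper data classification service rather than a field list, but the shape — develop and scale anywhere, keep sensitive inference inside the four walls — is the same.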
“IBM’s recent introduction of the Spyre Accelerator and the Telum II processor signals a clear direction: a deeper investment in purpose-built AI hardware,” Lubeck said. “It’s a strong indicator that enterprise AI isn’t just about innovation — it’s about building the right foundation for real-world deployment at scale.”
IBM's Mainframe Experience Gives It a Leg Up With On-Prem GenAI
As organizations see the potential benefits of integrating GenAI models into their own infrastructure, the shift toward on-prem solutions is gaining traction, Appvance CEO Kevin Surace told Reworked.
With on-premises hosting of AI models, the enterprise assumes full responsibility for their operation, upgrades and maintenance. This provides a significant level of control, especially in industries such as healthcare and banking where data privacy and security are paramount.
AI agents are at the core of this transformation, Surace added. While the potential for these agents is clear, concerns stand in the way of adoption.
“AI agents can execute tasks on behalf of its keeper,” he said. “From watching for trip points to interacting with back-office tasks, it is like having another person working for you. However, so far capabilities and trust are big stoppers. And on-prem will alleviate much of this concern.”
On-prem or private cloud deployment isn't just a preference for regulated industries; it's a necessity, Surace said. While these enterprises may shift toward hybrid cloud or fully cloud-based solutions in the future, for now the need is for on-premises models.
IBM's extensive legacy in mainframe computing puts it in a prime position to dominate the enterprise AI space. “IBM is in every major enterprise with mainframes for legacy applications,” Surace added. “And many enterprises have thousands of these. So never write off IBM as a major force in the enterprise. Leading with AI-accelerated hardware options puts them in a lead position in most enterprises where trust and on-prem and hybrid cloud rule ... and IBM already has a major footprint.”
On Premises Delivers the Low Latency AI Agents Need
On-premises GenAI provides increased ownership and governance, proximity to data, ultra-low latency and the ability to enforce custom security boundaries — all critical for high-stakes flows like fraud detection, contract validation or real-time inventory optimization, where even milliseconds matter, Rierino co-founder and CMO Mine Bayrak Ozmen told Reworked.
Latency isn't just a performance concern; it's a functional constraint that keeps agentic AI from performing correctly or at its best, she said. Even small delays can break the value loop of context-aware AI, with a significant impact on the business bottom line. In e-commerce, for example, shoppers expect instant results when searching or receiving AI-assisted recommendations.
“A delay of a few hundred milliseconds can degrade the user experience or directly impact conversions. In those cases, AI adoption becomes difficult, not because the models lack intelligence, but because they cannot respond fast enough to be useful,” she said.
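The constraint she describes amounts to a hard latency budget. The following minimal Python sketch — an illustration, not IBM code — shows an agent step that degrades to a cached answer when model inference misses its budget; the 200 ms figure, function names and fallback logic are all assumptions for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as InferenceTimeout

LATENCY_BUDGET_S = 0.200  # assumed interactive budget; real budgets vary by flow

pool = ThreadPoolExecutor(max_workers=4)

def model_inference(prompt: str) -> str:
    """Stand-in for a model call; the sleep mimics an off-platform round trip."""
    time.sleep(0.350)
    return f"fresh recommendation for {prompt!r}"

def cached_fallback(prompt: str) -> str:
    """Cheap, degraded answer used when inference misses the budget."""
    return f"cached recommendation for {prompt!r}"

def recommend(prompt: str) -> str:
    future = pool.submit(model_inference, prompt)
    try:
        return future.result(timeout=LATENCY_BUDGET_S)
    except InferenceTimeout:
        return cached_fallback(prompt)  # too slow to be useful: degrade gracefully

print(recommend("running shoes"))  # the 350 ms round trip blows the 200 ms budget
```

Inference co-located with the data would simply make the timeout branch rare; the business logic stays the same.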
IBM z17 addresses this by bringing AI inference closer to the data layer via the Telum II processor and Spyre Accelerator, she said. This tight integration reduces data movement and enables AI agents to interact with core systems in real time without leaving the secure mainframe environment.
The focus on enabling agentic AI is a smart move, she added. It feeds into the broader trend of AI moving beyond passive prompting to acting autonomously within enterprise workflows.
The Downsides of Hosting LLMs On Premises
On-premises GenAI hosting has trade-offs, including cost, complexity and model agility, Bayrak Ozmen said. Running generative AI locally means investing in specialized hardware, power and cooling — and in the operational expertise to manage them well, she continued.
She also notes the challenge of keeping models current and well-functioning unless they are integrated into a broader update lifecycle. Another consideration is the pace of innovation in the LLM space. "On-prem models may lag behind the latest capabilities available in the cloud, e.g., in terms of reasoning performance or instruction tuning, unless there is special effort put into continuous improvement and evaluation," Bayrak Ozmen said.
Lubeck also acknowledges on-premises isn’t for everyone. The upfront investment, skilled personnel and long-term strategy may make it a less appealing proposition for some. "Smaller companies or teams that just want to test and experiment might find the cloud a better starting point," he said. But for those building production-level systems and needing tight data control, it’s a smart fit.
Bayrak Ozmen sees the future of secure AI as distributed, where certain workloads, especially those involving sensitive data, run on-prem. She also expects a shift toward proximity-based execution, where inference happens wherever the data lives. “As agentic AI becomes more and more embedded in business logic, on-premises will be critical for this kind of secure and context-aware automation,” she concluded.
Editor's Note: Read more about how businesses are balancing security and innovation with AI:
- The Hidden Cracks: How AI Integration Is Testing Workplace Resilience — AI effectiveness depends on integration with business systems. But the more embedded AI becomes, the more it exposes an organization to potential problems.
- Prep Your Infrastructure for the Agentic AI Future — Want to deploy agentic AI? You'll probably have to upgrade your digital infrastructure.
- The Potential and Risks of AI-Powered Data Integration — Companies are racing to AI-driven data integration, with good reasons. But going too fast carries security risks, too.