OpenAI Wants to Build Science's Most Tireless Researcher

OpenAI is pivoting its research agenda around a sweeping new ambition: building a fully autonomous AI scientist that can run experiments, generate hypotheses and push the frontiers of human knowledge — largely on its own.

The San Francisco company has declared this vision its guiding priority for the coming years, with chief scientist Jakub Pachocki describing a future in which a "whole research lab" exists inside a data center.

The Roadmap: From Intern to Full Researcher
The Building Blocks Already Exist
Skepticism From the Field
The Safety Problem No One Has Solved
What's Actually at Stake

The Roadmap: From Intern to Full Researcher

OpenAI has laid out a two-stage plan with firm deadlines:

Milestone	Target Date	Description
AI Research Intern	September 2026	An autonomous agent capable of handling a small number of specific research problems by itself
Full AI Researcher	2028	A multi-agent system capable of tackling problems too large or complex for humans alone

The "intern" will represent a meaningful jump beyond what current tools can do. "What we're really looking at for an automated research intern is a system that you can delegate tasks [to] that would take a person a few days," Pachocki told MIT Technology Review.

The 2028 system is a far grander vision — a platform that could theoretically generate new mathematical proofs, crack open problems in biology and chemistry or weigh in on business and policy challenges.

The Building Blocks Already Exist

OpenAI's January release of Codex, a coding agent capable of spinning up and executing code on demand, is considered the proof-of-concept for this vision. Pachocki describes it as an "early version" of the AI researcher.

According to the company, the majority of its technical staff now use Codex regularly in their work. "Our jobs are now totally different than they were even a year ago," Pachocki said. "Nobody really edits code all the time anymore. Instead, you manage a group of Codex agents."

The system's foundation rests on several converging research threads:

Reasoning models: LLMs trained to work through problems step by step, backtracking when they hit dead ends
Long-context capability: Training on complex puzzles (math olympiads, coding contests) that teach models to manage large chunks of information and break tasks into subtasks
Agent architecture: Multi-system coordination that allows models to run in parallel on sub-problems

Skepticism From the Field

Not everyone is convinced the timeline is realistic.

Doug Downey, a research scientist at the Allen Institute for AI, notes that while the core idea is compelling, chaining research tasks together introduces compounding error risk.

"If you have to chain tasks together, then the odds that you get several of them right in succession tend to go down," he explained. His team tested several top LLMs on scientific tasks last summer and found that even the best-performing models — including OpenAI's own — made frequent errors.

That said, Downey acknowledged that the field moves fast: "Those results might already be stale."

Pachocki himself is candid about the unevenness of today's AI tools. He noted that he personally resisted using even basic AI autocomplete until recently, preferring to write code manually in Vim. What changed his mind was watching the models handle tasks that would have consumed a week of his time: "I can have it run experiments in a weekend that previously would have taken me like a week to code."

The Safety Problem No One Has Solved

A system capable of running complex, multi-day research autonomously raises serious safety questions — ones Pachocki doesn't shy away from.

"If you believe that AI is about to substantially accelerate research, including AI research, that's a big change in the world," he said. "And it comes with some serious unanswered questions. If it's so smart and capable, if it can run an entire research program, what if it does something bad?"

OpenAI's primary safeguard right now is chain-of-thought monitoring — training models to narrate their reasoning in a kind of internal scratchpad, which can then be audited by other AI systems looking for signs of unwanted behavior. The company released new details about its use of this technique within Codex earlier this month.

Pachocki described it as the linchpin of safe autonomy: "Once we get to systems working mostly autonomously for a long time in a big data center, I think this will be something that we're really going to depend on."

Critics and researchers in the AI safety community have noted that chain-of-thought monitoring, while promising, is far from a complete solution. LLMs are not yet well enough understood to be fully controlled, and a model that is both capable and opaque presents risks that no single technique has resolved.

What's Actually at Stake

The vision Pachocki is describing — a fully automated research lab capable of generating novel scientific knowledge — would mean a complete upending in how humanity expands its understanding of the world. Whether it's achievable on OpenAI's timeline, or whether the 2028 target will slip as past ambitious AI milestones have, remains to be seen.

Learning Opportunities

Webinar

Mar

Content Leaders Collective: Navigating Content Decisions at Scale

Discover how content leaders are modernizing content operations, avoiding costly missteps and preparing for scale and AI.

Webinar

Apr

The State of Enterprise Site Search: Moving Beyond "Good Enough"

Join CMSWire and SearchStax for a conversation about how enterprise IT and marketing leaders are moving beyond basic site search.

Webinar

On demand

Content Strategy Leaders Live: Scaling for Speed, Complexity and AI in High Tech

A candid roundtable on how high-tech leaders are rethinking content at scale.

Watch Now

Webinar

On demand

Do More with Less: Modernizing the Cloud Contact Center for 2026

Learn how to leverage cloud platforms without adding a single hire to personalize every customer interaction.

Watch Now

Webinar

Complex, internal combustion engine or fine clockwork.

On demand

Cut the Noise: Deploying AI That Actually Moves the Needle

Learn how to turn AI experimentation into concrete revenue operations.

Watch Now

Webinar

On demand

Ditch the Desk Phones: How Modern Teams Drive AI-First Communications

Find out how one team finally pulled the plug on a legacy phone system. And built something smarter.

Watch Now

Webinar

Mar

Content Leaders Collective: Navigating Content Decisions at Scale

Discover how content leaders are modernizing content operations, avoiding costly missteps and preparing for scale and AI.

Webinar

Apr

The State of Enterprise Site Search: Moving Beyond "Good Enough"

Join CMSWire and SearchStax for a conversation about how enterprise IT and marketing leaders are moving beyond basic site search.

Webinar

On demand

Content Strategy Leaders Live: Scaling for Speed, Complexity and AI in High Tech

A candid roundtable on how high-tech leaders are rethinking content at scale.

Watch Now

What's clear is that OpenAI views this as its defining challenge. Not just building a better chatbot, but building a machine that can think its way through the problems that have stumped human scientists for generations.

"Just looking at these models coming up with ideas that would take most PhD [students] weeks," Pachocki said, "makes me expect that we'll see much more acceleration coming from this technology in the near future."

OpenAI Wants to Build Science's Most Tireless Researcher — By 2028

Table of Contents

The Roadmap: From Intern to Full Researcher

The Building Blocks Already Exist

Skepticism From the Field

The Safety Problem No One Has Solved

What's Actually at Stake