OpenAI's Codex open on smartphone
News

OpenAI Wants to Build Science's Most Tireless Researcher — By 2028

3 minute read
Michelle Hawley avatar
By
SAVED
OpenAI's new "North Star" is an autonomous research system capable of tackling problems too complex for humans.

OpenAI is pivoting its research agenda around a sweeping new ambition: building a fully autonomous AI scientist that can run experiments, generate hypotheses and push the frontiers of human knowledge — largely on its own.

The San Francisco company has declared this vision its guiding priority for the coming years, with chief scientist Jakub Pachocki describing a future in which a "whole research lab" exists inside a data center.

Table of Contents

The Roadmap: From Intern to Full Researcher

OpenAI has laid out a two-stage plan with firm deadlines:

MilestoneTarget DateDescription
AI Research InternSeptember 2026An autonomous agent capable of handling a small number of specific research problems by itself
Full AI Researcher2028A multi-agent system capable of tackling problems too large or complex for humans alone

The "intern" will represent a meaningful jump beyond what current tools can do. "What we're really looking at for an automated research intern is a system that you can delegate tasks [to] that would take a person a few days," Pachocki told MIT Technology Review.

The 2028 system is a far grander vision — a platform that could theoretically generate new mathematical proofs, crack open problems in biology and chemistry or weigh in on business and policy challenges.

The Building Blocks Already Exist

OpenAI's January release of Codex, a coding agent capable of spinning up and executing code on demand, is considered the proof-of-concept for this vision. Pachocki describes it as an "early version" of the AI researcher.

According to the company, the majority of its technical staff now use Codex regularly in their work. "Our jobs are now totally different than they were even a year ago," Pachocki said. "Nobody really edits code all the time anymore. Instead, you manage a group of Codex agents."

The system's foundation rests on several converging research threads:

  • Reasoning models: LLMs trained to work through problems step by step, backtracking when they hit dead ends
  • Long-context capability: Training on complex puzzles (math olympiads, coding contests) that teach models to manage large chunks of information and break tasks into subtasks
  • Agent architecture: Multi-system coordination that allows models to run in parallel on sub-problems

Skepticism From the Field

Not everyone is convinced the timeline is realistic.

Doug Downey, a research scientist at the Allen Institute for AI, notes that while the core idea is compelling, chaining research tasks together introduces compounding error risk.

"If you have to chain tasks together, then the odds that you get several of them right in succession tend to go down," he explained. His team tested several top LLMs on scientific tasks last summer and found that even the best-performing models — including OpenAI's own — made frequent errors.

That said, Downey acknowledged that the field moves fast: "Those results might already be stale."

Pachocki himself is candid about the unevenness of today's AI tools. He noted that he personally resisted using even basic AI autocomplete until recently, preferring to write code manually in Vim. What changed his mind was watching the models handle tasks that would have consumed a week of his time: "I can have it run experiments in a weekend that previously would have taken me like a week to code."

The Safety Problem No One Has Solved

A system capable of running complex, multi-day research autonomously raises serious safety questions — ones Pachocki doesn't shy away from.

"If you believe that AI is about to substantially accelerate research, including AI research, that's a big change in the world," he said. "And it comes with some serious unanswered questions. If it's so smart and capable, if it can run an entire research program, what if it does something bad?"

OpenAI's primary safeguard right now is chain-of-thought monitoring — training models to narrate their reasoning in a kind of internal scratchpad, which can then be audited by other AI systems looking for signs of unwanted behavior. The company released new details about its use of this technique within Codex earlier this month.

Pachocki described it as the linchpin of safe autonomy: "Once we get to systems working mostly autonomously for a long time in a big data center, I think this will be something that we're really going to depend on."

Critics and researchers in the AI safety community have noted that chain-of-thought monitoring, while promising, is far from a complete solution. LLMs are not yet well enough understood to be fully controlled, and a model that is both capable and opaque presents risks that no single technique has resolved.

What's Actually at Stake

The vision Pachocki is describing — a fully automated research lab capable of generating novel scientific knowledge — would mean a complete upending in how humanity expands its understanding of the world. Whether it's achievable on OpenAI's timeline, or whether the 2028 target will slip as past ambitious AI milestones have, remains to be seen.

Learning Opportunities

What's clear is that OpenAI views this as its defining challenge. Not just building a better chatbot, but building a machine that can think its way through the problems that have stumped human scientists for generations.

"Just looking at these models coming up with ideas that would take most PhD [students] weeks," Pachocki said, "makes me expect that we'll see much more acceleration coming from this technology in the near future."

About the Author
Michelle Hawley

Michelle Hawley is an experienced journalist who specializes in reporting on the impact of technology on society. As editorial director at Simpler Media Group, she oversees the day-to-day operations of VKTR, covering the world of enterprise AI and managing a network of contributing writers. She's also the host of CMSWire's CMO Circle and co-host of CMSWire's CX Decoded. With an MFA in creative writing and background in both news and marketing, she offers unique insights on the topics of tech disruption, corporate responsibility, changing AI legislation and more. She currently resides in Pennsylvania with her husband and two dogs. Connect with Michelle Hawley:

Main image: Robert | Adobe Stock
Featured Research