OpenAI Releases GPT-5.5, the First Fully Retrained Base Model Since GPT-4.5, With 1M-Token Context and State-of-the-Art Agentic Benchmarks

Overview

OpenAI on April 23, 2026 released GPT-5.5, describing it as the first fully retrained base model since GPT-4.5 and a significant architectural departure from the post-training iterations that produced GPT-5.1 through GPT-5.4. The model rolls out immediately to ChatGPT Plus, Pro, Business, and Enterprise users, with API access to follow after additional safety review. A more capable GPT-5.5 Pro variant is available to Pro, Business, and Enterprise subscribers at substantially higher cost.

What the Model Offers

GPT-5.5 ships with a 922,000-token context window — approximately 1,383 A4 pages of text — and accepts text and image inputs while producing text output, according to Fello AI’s technical summary. OpenAI describes the model as redesigned for agentic workflows, meaning it “takes a sequence of actions, uses tools, checks its own work” rather than operating primarily as a single-pass response generator, as detailed by OFox AI.

The model runs on NVIDIA GB200 NVL72 infrastructure, which OpenAI reports delivers 35 times lower cost per million tokens compared to previous generations. Despite increased capability, the company claims GPT-5.5 maintains the same per-token latency as GPT-5.4 while using fewer tokens to complete equivalent tasks.

OpenAI President Greg Brockman characterized the release carefully, stating: “This model is a real step forward towards the kind of computing that we expect in the future — but it is one step,” as reported by TechCrunch. Chief Scientist Jakub Pachocki added a note of self-criticism, telling reporters: “I think the last two years have been surprisingly slow.”

Benchmark Performance

OpenAI highlights several benchmark results to support its claims of state-of-the-art agentic performance. On Terminal-Bench 2.0, which measures command-line tool usage, GPT-5.5 standard scores 82.7 percent, a 13-point lead over Claude Opus 4.7’s 69.4 percent, according to Interesting Engineering. On GDPval, a benchmark spanning economically valuable tasks across 44 occupations, GPT-5.5 reaches 84.9 percent versus Opus 4.7’s 80.3 percent. On OSWorld-Verified, which tests whether a model can operate real desktop environments, GPT-5.5 records 78.7 percent, just above Opus 4.7’s 78.0 percent, as detailed by Fello AI.

The GPT-5.5 Pro tier achieves 39.6 percent on FrontierMath Tier 4, a postdoctoral-level mathematics benchmark, nearly double Claude Opus 4.7’s 22.9 percent on the same test, as reported by SiliconAngle.

The picture is more mixed on software engineering tasks. On SWE-Bench Pro, which measures resolution of real GitHub issues, GPT-5.5 scores 58.6 percent, behind Claude Opus 4.7’s 64.3 percent. Anthropic’s model also outperforms GPT-5.5 on multilingual question answering, where Opus 4.7 reaches 91.5 percent compared to GPT-5.5’s 83.2 percent. On hallucination resistance, GPT-5.5 scores 86 percent on the AA-Omniscience benchmark against Opus 4.7’s 36 percent — though this benchmark’s inversion means a higher score here reflects lower factual accuracy, per Fello AI.

Overall, Fello AI places GPT-5.5 at number two on the Artificial Analysis Intelligence Index with a score of 59 out of 141 evaluated models.

On the competitive fringe, Anthropic’s still-restricted Mythos model leads GPT-5.5 on six overlapping benchmarks, but Mythos remains available only to approximately 50 partner organizations. GPT-5.5 represents, in the assessment of analysts quoted by Fello AI, “the best model most people can actually use.”

Notable Applied Results

OpenAI reports two applied demonstrations intended to show the model’s research capabilities. In collaboration with academic mathematicians, GPT-5.5 reportedly helped researchers identify a new proof related to Ramsey numbers in combinatorics, a longstanding area of discrete mathematics. The company also reports that the model optimized internal infrastructure management software, increasing token generation speeds by over 20 percent, according to SiliconAngle. Neither result has been independently peer-reviewed at the time of release.

OpenAI also reports the model can process large-scale structured documents — the company cited an example involving more than 71,000 pages of tax forms — and produce what it characterizes as PhD-quality academic output on complex research tasks, per Interesting Engineering.

Pricing and Availability

GPT-5.5 doubles the per-token price of its predecessor. The standard API tier costs $5 per million input tokens and $30 per million output tokens, compared to GPT-5.4’s $2.50 and $15 respectively. The Pro tier raises those figures to $30 input and $180 output per million tokens. Batch and flexible pricing carry a 50 percent discount, according to OFox AI. The doubling represents the largest single-release price increase OpenAI has made across the GPT-5.x series.

API availability is listed as coming “very soon,” pending completion of OpenAI’s safety and deployment review process, as noted by OFox AI.

What Remains Unclear

OpenAI has not released technical details about the training compute or dataset scale used for GPT-5.5, making it impossible to externally verify the claim that it represents a ground-up retrain rather than a particularly large post-training run. The company’s SWE-Bench Pro score of 58.6 percent lags behind Claude Opus 4.7, and OpenAI has not addressed whether this gap reflects architectural choices, training data differences, or benchmark framing.

The practical significance of the Ramsey number proof contribution also remains to be assessed by the mathematics community. OpenAI has described the result as an AI-assisted discovery but has not provided the paper for independent review.

The relationship between GPT-5.5 and the previously reported internal project code-named Spud — which OpenAI CEO Sam Altman had described as a “very strong model” in internal communications as recently as late March — has not been officially confirmed or denied by the company.