GPT-5.5 scores 62.5 on Every’s engineer test

OraCore Editors

[MODEL] May 23, 2026 · 3 min read

Every says GPT-5.5 beat Opus 4.7 on its Senior Engineer Benchmark, scoring 62.5 on its best run and positioning itself as OpenAI’s model for professional work.

Every says GPT-5.5 is OpenAI’s fastest new work model and tops its Senior Engineer Benchmark.

OpenAI released GPT-5.5 on April 23, 2026, and Every says the model hit 62.5 on its best run on the publication’s Senior Engineer Benchmark. That put it well ahead of Opus 4.7, which scored in the low 30s, though still below human senior engineers, who score in the high 80s and low 90s.

Item	Value
Release date	April 23, 2026
Best Senior Engineer Benchmark score	62.5
Opus 4.7 comparison score	Low 30s
Human senior engineer range	High 80s to low 90s
Context window	1 million tokens
Input pricing	$5 per 1M tokens
Output pricing	$30 per 1M tokens
GPT-5.5 Pro output pricing	$180 per 1M tokens

What changed

Every’s review frames GPT-5.5 as a new pre-train, not just a better wrapper around the same base model. The result, according to the piece, is a model that feels faster, steadier, and easier to work with than Anthropic’s Opus 4.7 for many professional tasks.

The article says GPT-5.5 launches first in ChatGPT and Codex, with API access coming later after more safety and security checks. It also keeps a 1 million-token context window, supports prompt caching, and defaults to medium reasoning instead of none.

Best benchmark run: 62.5 on Every’s Senior Engineer Benchmark
Opus 4.7: low 30s at a similar reasoning level
Human senior engineers: high 80s to low 90s
API pricing: $5 in, $30 out per 1 million tokens
GPT-5.5 Pro pricing: $30 in, $180 out per 1 million tokens
Launch surface: ChatGPT and Codex first, API later
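
As a rough illustration of what the listed rates mean per request, the sketch below computes the cost of a single call at the prices quoted above. The token counts are made-up example values, and the dictionary keys and function name are hypothetical labels, not identifiers from any OpenAI SDK. It also ignores prompt-caching discounts, since the article does not give those rates.

```python
# Per-1M-token prices in USD, as quoted in the article.
# The keys are illustrative labels, not confirmed API model names.
PRICES = {
    "gpt-5.5": {"input": 5.0, "output": 30.0},
    "gpt-5.5-pro": {"input": 30.0, "output": 180.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 200k tokens of context in, 10k tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 200_000, 10_000):.2f}")
```

At these example sizes the input side dominates the bill, which is why a long-context workload is where the gap between the standard and Pro rates shows up most.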

Every also says GPT-5.5 is better at sustained engineering, writing, dashboards, curricula, run-of-show docs, and transcript-based work. But it still trails Opus 4.7 on some product and design tasks, plus Ruby, PowerPoint, and spatial composition.

Why it matters

The practical shift is less about a single benchmark win and more about where OpenAI wants to compete. Every says GPT-5.5 is OpenAI’s clearest bid to reclaim coding and professional work, areas where Anthropic has been the default for many teams.

For developers, the pitch is simple: fewer retries, more planning, and a model you can keep in the loop on long tasks. If that holds up in production, GPT-5.5 could become the cheaper-to-finish option even though its per-token price is higher than GPT-5.4’s.
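
The cheaper-to-finish argument comes down to simple arithmetic. If each attempt succeeds independently with probability p, the expected number of attempts is 1/p, so the expected cost per finished task is the per-attempt cost divided by p. The sketch below uses GPT-5.5's listed rates for one side of the comparison; the other model's prices, both success rates, and the token counts are invented placeholders, not figures from Every or OpenAI.

```python
def cost_per_finished_task(in_price, out_price, in_tokens, out_tokens, success_rate):
    """Expected USD cost to get one successful result.

    Assumes attempts are independent and each succeeds with probability
    `success_rate`, so the expected number of attempts is 1 / success_rate.
    Prices are USD per 1M tokens.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    per_attempt = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_attempt / success_rate


# GPT-5.5 at the article's listed rates; the 80% success rate is assumed.
pricier = cost_per_finished_task(5.0, 30.0, 50_000, 5_000, success_rate=0.8)
# Hypothetical cheaper model: prices and the 35% success rate are placeholders.
cheaper = cost_per_finished_task(2.5, 15.0, 50_000, 5_000, success_rate=0.35)

print(f"pricier model: ${pricier:.3f} per finished task")
print(f"cheaper model: ${cheaper:.3f} per finished task")
```

With these placeholder numbers the higher-priced model comes out cheaper per finished task. The result flips once the cheaper model's success rate rises above 40 percent, so the claim depends entirely on how large the reliability gap is in practice.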

The bigger question is whether speed and reliability will outweigh Opus 4.7’s edge in planning, product taste, and presentation work. For now, Every’s take is that GPT-5.5 is the safer daily driver for code and knowledge work, while Opus still has the sharper creative finish.

The takeaway: GPT-5.5 looks like OpenAI’s strongest move yet to turn ChatGPT into a work model, but the real test is whether teams trust it on unfinished, messy jobs.
