[MODEL] 3 min read | OraCore Editors

GPT-5.5 scores 62.5 on Every’s engineer test

Every says GPT-5.5 beat Opus 4.7 on its Senior Engineer Benchmark, scoring 62.5 on its best run and positioning itself as OpenAI’s model for professional work.

Every says GPT-5.5 is OpenAI’s fastest new work model and tops its Senior Engineer Benchmark.

OpenAI released GPT-5.5 on April 23, 2026, and Every says the model hit 62.5 on its best run on the publication’s Senior Engineer Benchmark. That put it well ahead of Opus 4.7, which scored in the low 30s, though still below human senior engineers, who score in the high 80s and low 90s.

Item                                    Value
Release date                            April 23, 2026
Best Senior Engineer Benchmark score    62.5
Opus 4.7 comparison score               Low 30s
Human senior engineer range             High 80s to low 90s
Context window                          1 million tokens
Input pricing                           $5 per 1M tokens
Output pricing                          $30 per 1M tokens
GPT-5.5 Pro output pricing              $180 per 1M tokens

What changed

Every’s review frames GPT-5.5 as a new pre-train, not just a better wrapper around the same base model. The result, according to the piece, is a model that feels faster, steadier, and easier to work with than Anthropic’s Opus 4.7 for many professional tasks.

The article says GPT-5.5 launches first in ChatGPT and Codex, with API access coming later after more safety and security checks. It also keeps a 1 million-token context window, supports prompt caching, and defaults to medium reasoning instead of none.

  • Best benchmark run: 62.5 on Every’s Senior Engineer Benchmark
  • Opus 4.7: low 30s at a similar reasoning level
  • Human senior engineers: high 80s to low 90s
  • API pricing: $5 in, $30 out per 1 million tokens
  • GPT-5.5 Pro pricing: $30 in, $180 out per 1 million tokens
  • Launch surface: ChatGPT and Codex first, API later
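To make the per-token prices above concrete, here is a minimal Python sketch of the cost arithmetic. The prices are the ones quoted in the article; the dictionary keys, the function name `request_cost`, and the token counts are illustrative assumptions, not API identifiers or measured usage. The sketch also ignores any prompt-caching discount, since the article does not state one.

```python
# Per-million-token prices in USD, as quoted in the article.
# The keys are illustrative labels, not confirmed API model names.
PRICES = {
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, ignoring prompt-caching discounts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200k tokens of context in, 10k tokens out.
    print(f"{request_cost('gpt-5.5', 200_000, 10_000):.2f}")      # 1.30
    print(f"{request_cost('gpt-5.5-pro', 200_000, 10_000):.2f}")  # 7.80
```

On these assumed token counts, the same request costs six times as much on GPT-5.5 Pro, which matches the 6x ratio between the two price lists.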

Every also says GPT-5.5 is better at sustained engineering, writing, dashboards, curricula, run-of-show docs, and transcript-based work. But it still trails Opus 4.7 on some product and design tasks, plus Ruby, PowerPoint, and spatial composition.

Why it matters

The practical shift is less about a single benchmark win and more about where OpenAI wants to compete. Every says GPT-5.5 is OpenAI’s clearest bid to reclaim coding and professional work, areas where Anthropic has been the default for many teams.

For developers, the pitch is simple: fewer retries, more planning, and a model you can keep in the loop on long tasks. If that holds up in production, GPT-5.5 could become the cheaper-to-finish option even when its token price is higher than GPT-5.4.
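The "cheaper to finish" argument can be sketched with simple expected-value arithmetic. If each attempt succeeds independently with probability p, the expected number of attempts is 1/p, so the expected spend per finished task is the per-attempt cost divided by p. The function name and every number below are hypothetical and chosen only to illustrate the reasoning; the article gives no success rates and no GPT-5.4 prices.

```python
def expected_cost_to_finish(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until the first success, assuming independent attempts.

    With success probability p per attempt, the expected number of
    attempts is 1 / p (mean of a geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical inputs, not measurements: a pricier model that succeeds more
# often versus a cheaper model that needs more retries.
pricier_but_steadier = expected_cost_to_finish(1.30, 0.80)
cheaper_but_flakier = expected_cost_to_finish(1.00, 0.50)

print(pricier_but_steadier < cheaper_but_flakier)  # True
```

Under these made-up inputs the model with the higher per-attempt cost still finishes the task for less on average. Whether that holds in practice depends on real retry rates, which only production use can establish.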

The bigger question is whether speed and reliability will outweigh Opus 4.7’s edge in planning, product taste, and presentation work. For now, Every’s take is that GPT-5.5 is the safer daily driver for code and knowledge work, while Opus still has the sharper creative finish.

The takeaway: GPT-5.5 looks like OpenAI’s strongest move yet to make ChatGPT a tool for professional work, but the real test is whether teams trust it on unfinished, messy jobs.