GPT-5.4

Last Updated: 7 Mar 2026

GPT-5.4 is a frontier large language model from OpenAI, built for complex professional work such as coding, research, automation, document processing and agent-based workflows. It is optimized for real-world tasks that involve spreadsheets, presentations, documents, software tools and long, multi-step workflows.

Key characteristics of GPT-5.4 include:

  • 1-million-token context window (experimental)
  • Native computer-use capabilities
  • Advanced tool calling and tool search
  • Improved agent workflows
  • Higher benchmark performance across knowledge work and coding
  • Improved token efficiency compared to GPT-5.2

Evolution of GPT Models

The GPT model family has evolved gradually, with each version improving reasoning, coding, and tool-interaction capabilities.

| Model | Key Focus | Major Improvement |
|---|---|---|
| GPT-4 | Multimodal reasoning | Image + text understanding |
| GPT-5.1 | Faster general model | Improved latency |
| GPT-5.2 | Reasoning model | Better multi-step reasoning |
| GPT-5.3-Codex | Coding-focused model | Strong software engineering ability |
| GPT-5.4 | Unified frontier model | Combines reasoning, coding, and agents |

GPT-5.4 combines the reasoning strengths of GPT-5.2 with the coding capabilities of GPT-5.3-Codex.

Context Window and Token Limits

The context window is the maximum amount of text, measured in tokens, that the model can process in a single request. GPT-5.4 offers one of the largest context windows among frontier AI models.

| Model | Context Window | Notes |
|---|---|---|
| GPT-5.2 | 272K tokens | Standard reasoning model |
| GPT-5.3-Codex | ~272K tokens | Coding optimized |
| GPT-5.4 | 1M tokens (experimental) | Enables long workflows and large documents |

Large context windows allow the model to process:

  • Large research documents
  • Full code repositories
  • Long conversation histories
  • Large datasets
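A rough way to check whether an input will fit is to estimate its token count and compare it with the window, leaving room for the reply. The Python sketch below is a minimal illustration under stated assumptions: the ~4 characters per token ratio is only a rule of thumb for English text (a real tokenizer gives exact counts), "272K" is read as 272,000 tokens, and the 8,000-token output reserve and all helper names are hypothetical.

```python
# Window sizes from the context-window table; "272K" is read as 272,000 tokens (assumption).
CONTEXT_WINDOWS = {
    "GPT-5.2": 272_000,
    "GPT-5.3-Codex": 272_000,
    "GPT-5.4": 1_000_000,
}

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(model: str, text: str, reserved_output: int = 8_000) -> bool:
    """True if the estimated prompt plus a reserved reply budget fits the window."""
    return estimate_tokens(text) + reserved_output <= CONTEXT_WINDOWS[model]


def split_for_model(model: str, text: str, reserved_output: int = 8_000) -> list[str]:
    """Split text into chunks that each fit the model's window, by the same estimate."""
    budget_tokens = CONTEXT_WINDOWS[model] - reserved_output
    chunk_chars = budget_tokens * CHARS_PER_TOKEN
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


if __name__ == "__main__":
    document = "x" * 2_000_000  # stand-in for a document of roughly 500K estimated tokens
    for model in CONTEXT_WINDOWS:
        chunks = split_for_model(model, document)
        print(model, fits_in_context(model, document), len(chunks))
```

Chunking is only a fallback: parts of a long report or repository often refer to each other, and a window large enough to hold the whole input avoids losing those links.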

Key Technical Improvements in GPT-5.4

1. Planning-Based Reasoning

GPT-5.4 Thinking can present an initial plan before generating the final answer, and users can modify this plan mid-response to steer the result. Benefits include:

  • Better control over responses
  • Reduced back-and-forth iterations
  • More accurate outputs

2. Improved Knowledge Work Capabilities

GPT-5.4 shows major improvements in professional tasks such as:

  • financial modeling
  • presentation generation
  • spreadsheet creation
  • document analysis

On the GDPval benchmark, which evaluates tasks across 44 occupations, GPT-5.4 achieved the highest score among the models OpenAI compared.

[Figure: GDPval results (image credit: OpenAI)]

This benchmark measures how well models perform professional knowledge tasks across industries.

3. Improved Spreadsheet and Document Work

GPT-5.4 performs substantially better on practical business tasks, such as building and editing spreadsheets, presentations and documents.

These improvements make GPT-5.4 suitable for professional workflows involving:

  • financial spreadsheets
  • reports
  • presentations
  • business documents

4. Reduced Hallucinations

GPT-5.4 is more factually accurate than its predecessor. Compared with GPT-5.2, it produces:

  • 33% fewer incorrect claims
  • 18% fewer responses containing errors

This improvement increases reliability for professional work.
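Both figures are most naturally read as relative reductions against GPT-5.2, not percentage-point drops. The short Python sketch below makes that reading concrete; the baseline numbers are hypothetical and chosen only for illustration.

```python
def after_relative_reduction(baseline: float, percent_fewer: float) -> float:
    """Value remaining after an 'X% fewer' relative reduction."""
    return baseline * (1 - percent_fewer / 100)


# Hypothetical baseline: GPT-5.2 makes 300 incorrect claims on some test set.
# 33% fewer leaves about 201 incorrect claims.
print(round(after_relative_reduction(300, 33), 2))

# Hypothetical baseline: 10% of GPT-5.2 responses contain an error.
# 18% fewer lowers that rate to about 8.2% (10 x 0.82), not to a negative number.
print(round(after_relative_reduction(10.0, 18), 2))
```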

Computer Use Capabilities

GPT-5.4 introduces native computer-use capabilities, allowing agents to operate software interfaces directly. Capabilities include:

  • reading screenshots
  • issuing mouse clicks
  • sending keyboard commands
  • browser automation

This enables automation of tasks such as:

  • sending emails
  • scheduling calendar events
  • filling forms
  • navigating software interfaces
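Computer-use agents generally run an observe-act loop: capture the screen, let the model choose the next action, execute it, and repeat until the task is finished. The Python sketch below shows only that control flow. The action vocabulary (`click`, `type`, `keypress`, `done`), the `FakeScreen` class and the scripted policy are hypothetical stand-ins; a real harness would send screenshots to the model and execute its chosen actions through an automation tool, and none of this is OpenAI's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str       # hypothetical vocabulary: "click", "type", "keypress" or "done"
    x: int = 0      # screen coordinates for clicks
    y: int = 0
    text: str = ""  # text to type or key to press


@dataclass
class FakeScreen:
    """Stand-in for a real desktop or browser; it only records applied actions."""
    applied: list[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would capture image bytes; a text summary keeps this runnable.
        return f"screen after {len(self.applied)} actions"

    def apply(self, action: Action) -> None:
        self.applied.append(action)


Policy = Callable[[str, int], Action]


def run_agent(policy: Policy, screen: FakeScreen, max_steps: int = 20) -> list[Action]:
    """Observe, ask the policy for an action, act; stop on 'done' or at the step limit."""
    for step in range(max_steps):
        observation = screen.screenshot()
        action = policy(observation, step)
        if action.kind == "done":
            break
        screen.apply(action)
    return screen.applied


def scripted_policy(observation: str, step: int) -> Action:
    """Hypothetical stand-in for the model: fill an email field, press Enter, stop."""
    script = [
        Action("click", x=120, y=340),
        Action("type", text="alice@example.com"),
        Action("keypress", text="Enter"),
    ]
    return script[step] if step < len(script) else Action("done")


if __name__ == "__main__":
    for action in run_agent(scripted_policy, FakeScreen()):
        print(action)
```

The step limit matters in practice: it stops an agent that misreads the screen from acting indefinitely.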

Computer Use Benchmarks

Computer-use performance is measured with the OSWorld-Verified benchmark, which evaluates how well a model can operate a computer to complete tasks.

[Figure: OSWorld-Verified results (image credit: OpenAI)]

On this benchmark, GPT-5.4 scores above the reported human performance baseline.

Vision and Document Understanding

GPT-5.4 improves visual perception and document understanding. Example benchmark:

[Figure: OmniDocBench results (image credit: OpenAI)]

The model also accepts high-resolution image inputs, with the following limits:

  • Up to 10.24 million pixels in total
  • Maximum side length of 6,000 pixels

This improves tasks such as:

  • document parsing
  • chart understanding
  • UI interaction
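If an image exceeds these limits, a client can downscale it before sending it. The Python sketch below computes a target size that satisfies both limits while preserving the aspect ratio. It assumes "10.24 million" means 10,240,000 pixels and that both limits apply together; it says nothing about how the API itself treats oversized images, and `fit_within_limits` is a hypothetical helper.

```python
import math

MAX_PIXELS = 10_240_000  # "up to 10.24 million pixels" (read as decimal millions)
MAX_SIDE = 6000          # "maximum side length of 6,000 pixels"


def fit_within_limits(width: int, height: int,
                      max_pixels: int = MAX_PIXELS,
                      max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Scale (width, height) down, keeping aspect ratio, so both limits hold.

    Images already within the limits are returned unchanged; nothing is upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(
        1.0,                                       # never enlarge
        max_side / max(width, height),             # longest-side limit
        math.sqrt(max_pixels / (width * height)),  # total pixel-count limit
    )
    # Rounding down keeps the result within both limits.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    for size in [(1920, 1080), (8000, 1000), (4000, 3000)]:
        w, h = fit_within_limits(*size)
        print(size, "->", (w, h), f"{w * h:,} pixels")
```

The actual resize would then be done with an imaging library, for example Pillow's `Image.resize((w, h))`.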

Coding Performance

GPT-5.4 integrates the coding strengths of GPT-5.3-Codex. Key improvements include:

  • better code generation
  • improved debugging
  • faster execution
  • lower latency

Benchmark comparison:

[Figure: SWE-Bench results (image credit: OpenAI)]

GPT-5.4 also performs well on Terminal-Bench 2.0, scoring 75.1%.

Tool Use Improvements

GPT-5.4 improves how the model interacts with external tools and APIs.

1. Tool Search

Previously, the definitions of all available tools had to be included in the prompt up front. GPT-5.4 introduces tool search, which lets the model fetch a tool's definition only when it needs that tool. Benefits include:

  • reduced token usage
  • faster responses
  • better scaling with large tool ecosystems
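The idea can be sketched in Python as a toy comparison of prompt sizes. Everything here is hypothetical: `ToolDef`, `search_tools` and the keyword matching are illustrative stand-ins, token counts use a rough 4-characters-per-token heuristic, and this is not how OpenAI's tool search is implemented or invoked.

```python
from dataclasses import dataclass


@dataclass
class ToolDef:
    name: str
    description: str  # short text used for searching
    schema: str       # full definition (parameters, docs): the expensive part


# Hypothetical stub: the only definition always present under tool search.
SEARCH_STUB = "search_tools(query): return definitions of tools relevant to query"


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return -(-len(text) // 4)


def eager_prompt(tools: list[ToolDef]) -> str:
    """Older pattern: every full definition is sent with every request."""
    return "\n".join(t.schema for t in tools)


def search_tools(tools: list[ToolDef], query: str, limit: int = 2) -> list[ToolDef]:
    """Naive keyword overlap, standing in for whatever retrieval the real feature uses."""
    words = {w for w in query.lower().split() if len(w) > 2}  # drop tiny words like "a"
    scored = [(len(words & set(t.description.lower().split())), t) for t in tools]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in matches[:limit]]


def deferred_prompt(tools: list[ToolDef], query: str) -> str:
    """Tool-search pattern: the search stub plus only the definitions that matched."""
    loaded = search_tools(tools, query)
    return "\n".join([SEARCH_STUB] + [t.schema for t in loaded])


def token_savings(before: int, after: int) -> float:
    """Fraction of tokens saved, e.g. 0.47 for a 47% reduction."""
    return 1 - after / before


if __name__ == "__main__":
    padding = " parameter docs" * 30  # makes each schema much longer than its description
    tools = [
        ToolDef("send_email", "send an email message", "send_email:" + padding),
        ToolDef("create_event", "create a calendar event", "create_event:" + padding),
        ToolDef("query_db", "run a database query", "query_db:" + padding),
        ToolDef("resize_image", "resize an image file", "resize_image:" + padding),
    ]
    query = "schedule a calendar event"
    before = estimate_tokens(eager_prompt(tools))
    after = estimate_tokens(deferred_prompt(tools, query))
    print([t.name for t in search_tools(tools, query)])
    print(before, after, f"{token_savings(before, after):.0%}")
```

The savings this toy prints depend entirely on its made-up tool sizes and are unrelated to the figure OpenAI reports; the point is that the gap grows with the number of tools that are defined but not needed for a given request.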

Example evaluation results:

[Figure: Example token savings from tool search (image credit: OpenAI)]

In this example, tool search reduced token usage by 47%.

2. Agentic Tool Calling

GPT-5.4 is also better at deciding which tool to call at each step. Example benchmark:

[Figure: Toolathlon results (image credit: OpenAI)]

Toolathlon measures how well AI agents use tools to complete multi-step tasks.

3. Web Search Performance

GPT-5.4 improves persistent, multi-step web search. Benchmark:

[Figure: BrowseComp results (image credit: OpenAI)]

This benchmark evaluates the ability to find difficult information across multiple web pages.

Academic and Reasoning Benchmarks

GPT-5.4 shows strong improvements on reasoning benchmarks.

| Benchmark | GPT-5.4 | GPT-5.2 |
|---|---|---|
| Frontier Science Research | 33.0% | 25.2% |
| FrontierMath Tier 1-3 | 47.6% | 40.7% |
| ARC-AGI-1 | 93.7% | 86.2% |
| ARC-AGI-2 | 73.3% | 52.9% |

These benchmarks measure advanced reasoning ability.
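Because the benchmarks start from different baselines, similar-looking gains can mean different things. A quick Python calculation over the reasoning-benchmark table's own numbers separates absolute gains (percentage points) from relative gains; it adds no data beyond the table.

```python
# Scores (%) from the reasoning-benchmark table: (GPT-5.4, GPT-5.2).
SCORES = {
    "Frontier Science Research": (33.0, 25.2),
    "FrontierMath Tier 1-3": (47.6, 40.7),
    "ARC-AGI-1": (93.7, 86.2),
    "ARC-AGI-2": (73.3, 52.9),
}


def gains(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return round(new - old, 1), round((new - old) / old * 100, 1)


if __name__ == "__main__":
    for name, (new, old) in SCORES.items():
        points, relative = gains(new, old)
        print(f"{name}: +{points} points ({relative}% relative)")
```

ARC-AGI-2 stands out: a gain of 20.4 points, roughly 39% relative to GPT-5.2.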

Long Context Benchmarks

GPT-5.4 supports very long input contexts, although accuracy drops sharply at the longest lengths:

| Benchmark (context length) | GPT-5.4 |
|---|---|
| Graphwalks BFS (0-128K) | 93.0% |
| Graphwalks BFS (256K-1M) | 21.4% |
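Graphwalks, as OpenAI has described it, fills the prompt with a large directed graph (an edge list over hash-like node IDs) and asks the model which nodes lie at a given breadth-first-search depth from a start node; the 0-128K and 256K-1M labels presumably refer to prompt length. The Python sketch below computes the ground-truth answer to such a question with an ordinary BFS. The edge-list format and the `nodes_at_depth` name are assumptions for illustration, not the benchmark's actual harness.

```python
from collections import deque


def nodes_at_depth(edges: list[tuple[str, str]], start: str, depth: int) -> set[str]:
    """Return all nodes whose shortest distance from `start` is exactly `depth`."""
    graph: dict[str, list[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # nothing deeper than the target depth is needed
        for nxt in graph.get(node, []):
            if nxt not in dist:  # in BFS, the first visit gives the shortest distance
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n for n, d in dist.items() if d == depth}


if __name__ == "__main__":
    edges = [("a1f3", "b7c2"), ("a1f3", "c9d0"), ("b7c2", "d4e8"),
             ("c9d0", "d4e8"), ("d4e8", "e2a6")]
    print(sorted(nodes_at_depth(edges, "a1f3", 2)))
```

The algorithm is trivial for a program; the benchmark tests whether a model can carry it out over an edge list spread across hundreds of thousands of tokens, which is where the table shows the score falling.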

Safety and Deployment

GPT-5.4 is classified under OpenAI’s High Cyber Capability category. Security measures include:

  • monitoring systems
  • trusted access controls
  • request-level blocking for high-risk requests
  • improved safety classifiers

OpenAI also continues research on Chain-of-Thought monitoring to detect potential misuse.
