GPT-5.4 is a frontier large language model released by OpenAI, designed for complex professional work such as coding, research, automation, document processing and agent-based workflows. It is optimized for real-world tasks that require working with spreadsheets, presentations, documents, software tools and long workflows.
Key characteristics of GPT-5.4 include:
- Large 1 million token context window
- Native computer-use capabilities
- Advanced tool calling and tool search
- Improved agent workflows
- Higher benchmark performance across knowledge work and coding
- Improved token efficiency compared to GPT-5.2
Evolution of GPT Models
The GPT model family has evolved gradually, with each version improving reasoning, coding, and tool interaction capabilities.
| Model | Key Focus | Major Improvement |
|---|---|---|
| GPT-4 | Multimodal reasoning | Image + text understanding |
| GPT-5.1 | Faster general model | Improved latency |
| GPT-5.2 | Reasoning model | Better multi-step reasoning |
| GPT-5.3-Codex | Coding-focused model | Strong software engineering ability |
| GPT-5.4 | Unified frontier model | Combines reasoning, coding, and agents |
GPT-5.4 combines the reasoning strengths of GPT-5.2 with the coding capabilities of GPT-5.3-Codex.
Context Window and Token Limits
The context window is the amount of text, measured in tokens, that the model can process in a single request. GPT-5.4 introduces one of the largest context windows available in frontier AI models.
| Model | Context Window | Notes |
|---|---|---|
| GPT-5.2 | 272K tokens | Standard reasoning model |
| GPT-5.3-Codex | ~272K tokens | Coding optimized |
| GPT-5.4 | 1M tokens (experimental) | Enables long workflows and large documents |
Large context windows allow the model to process:
- Large research documents
- Full code repositories
- Long conversation histories
- Large datasets
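As a rough illustration of what the larger window means in practice, the sketch below estimates whether a document fits in a given model's context window. The window sizes come from the table above. The 4-characters-per-token ratio and the helper names `estimate_tokens` and `fits_in_context` are assumptions made for this example, not part of any official API. A real tokenizer would give different counts.

```python
# Context window sizes in tokens, taken from the table above.
CONTEXT_WINDOWS = {
    "GPT-5.2": 272_000,
    "GPT-5.4": 1_000_000,
}

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary, so treat the result as a rough estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, model: str, reserved_for_output: int = 0) -> bool:
    """Return True if the estimated input still leaves room for the reserved output."""
    budget = CONTEXT_WINDOWS[model] - reserved_for_output
    return estimate_tokens(text) <= budget


document = "x" * 2_000_000  # about 500K estimated tokens under the assumption above
print(fits_in_context(document, "GPT-5.2"))  # False
print(fits_in_context(document, "GPT-5.4"))  # True
```

Reserving part of the budget for the model's output (`reserved_for_output`) is a common precaution, since the input and the response usually share the same window.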
Key Technical Improvements in GPT-5.4
1. Planning-Based Reasoning
GPT-5.4 Thinking can present an initial plan before generating the final answer, and users can modify this plan mid-response. This offers:
- Better control over responses
- Reduced back-and-forth iterations
- More accurate outputs
2. Improved Knowledge Work Capabilities
GPT-5.4 shows major improvements in professional tasks such as:
- financial modeling
- presentation generation
- spreadsheet creation
- document analysis
On the GDPval benchmark, which measures how well models perform professional knowledge tasks across 44 occupations in multiple industries, GPT-5.4 achieved the highest performance.
3. Improved Spreadsheet and Document Work
GPT-5.4 significantly improves performance on real business tasks.
These improvements make GPT-5.4 suitable for professional workflows involving:
- financial spreadsheets
- reports
- presentations
- business documents
4. Reduced Hallucinations
GPT-5.4 improves factual accuracy. Compared to GPT-5.2, it produces:
- 33% fewer incorrect claims
- 18% fewer responses containing errors
This improvement increases reliability for professional work.
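The percentages above read as relative reductions, so their practical effect depends on the baseline error rate. The short sketch below makes the arithmetic concrete. The baseline counts are hypothetical values chosen for illustration and do not come from any published evaluation.

```python
def after_relative_reduction(baseline_count: int, reduction_pct: int) -> float:
    """Apply a relative reduction; reduction_pct=33 means '33% fewer'."""
    return baseline_count * (100 - reduction_pct) / 100


# Hypothetical GPT-5.2 baselines, chosen only to make the arithmetic concrete.
baseline_incorrect_claims = 300
baseline_responses_with_errors = 100

print(after_relative_reduction(baseline_incorrect_claims, 33))       # 201.0
print(after_relative_reduction(baseline_responses_with_errors, 18))  # 82.0
```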
Computer Use Capabilities
GPT-5.4 introduces native computer interaction capabilities, allowing agents to perform tasks directly on software interfaces. Capabilities include:
- reading screenshots
- issuing mouse clicks
- sending keyboard commands
- automating browsers
This enables automation of tasks such as:
- sending emails
- scheduling calendar events
- filling forms
- navigating software interfaces
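A computer-use agent typically runs an observe, decide, act loop: it takes a screenshot, asks the model for the next action, executes that action, and repeats. The sketch below shows that loop shape only. The `Action` schema, the `FakeDesktop` class, and the `scripted_policy` function are all hypothetical stand-ins. A real harness would send screenshots to the model and drive an actual desktop or browser.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical schema)."""
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real desktop: records actions instead of performing them."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image bytes; a text summary is enough here.
        return f"screen after {len(self.log)} actions"

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x}, {action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def scripted_policy(observation: str, step: int) -> Action:
    """Placeholder for the model call: replays a fixed script of actions."""
    script = [
        Action("click", x=120, y=48),
        Action("type", text="Quarterly report"),
        Action("done"),
    ]
    return script[step]


def run_agent(desktop: FakeDesktop, max_steps: int = 10) -> list:
    """Observe, decide, act; stop when the policy signals completion."""
    for step in range(max_steps):
        observation = desktop.screenshot()
        action = scripted_policy(observation, step)
        if action.kind == "done":
            break
        desktop.execute(action)
    return desktop.log


print(run_agent(FakeDesktop()))
```

Capping the loop with `max_steps` is a simple safeguard against an agent that never signals completion.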
Computer Use Benchmarks
Performance improvements are measured using the OSWorld-Verified benchmark, which evaluates the ability to operate computers.

GPT-5.4 surpasses human performance on this benchmark.
Vision and Document Understanding
GPT-5.4 improves visual perception and document understanding. Example benchmarks:

The model also supports high-resolution image inputs. Image limits:
- Up to 10.24 million pixels
- Maximum dimension 6000 pixels
This improves tasks such as:
- document parsing
- chart understanding
- UI interaction
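The image limits listed above can be checked on the client before an image is sent. The following is a minimal sketch, assuming the 10.24 million figure refers to total pixels (width times height) and that the 6000-pixel cap applies to the longer side. The function names are illustrative. How the API itself treats oversized images is not covered here.

```python
MAX_TOTAL_PIXELS = 10_240_000  # 10.24 million pixels (assumed: width * height)
MAX_DIMENSION = 6000           # assumed: cap on the longer side, in pixels


def downscale_factor(width: int, height: int) -> float:
    """Return a scale factor in (0, 1] that satisfies both limits."""
    by_dimension = MAX_DIMENSION / max(width, height)
    by_area = (MAX_TOTAL_PIXELS / (width * height)) ** 0.5
    return min(1.0, by_dimension, by_area)


def fit_to_limits(width: int, height: int) -> tuple:
    """Return (width, height) scaled down, if needed, to respect the limits."""
    scale = downscale_factor(width, height)
    # Truncate toward zero so rounding never pushes the size back over a limit.
    return int(width * scale), int(height * scale)


print(fit_to_limits(4000, 2000))   # already within limits: (4000, 2000)
print(fit_to_limits(12000, 100))   # limited by the longer side: (6000, 50)
print(fit_to_limits(8000, 4000))   # limited by total pixel count
```

Both constraints are checked because either can bind: a long, thin image hits the dimension cap first, while a large, squarer image hits the pixel-count cap first.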
Coding Performance
GPT-5.4 integrates the coding strengths of GPT-5.3-Codex. Key improvements include:
- better code generation
- improved debugging
- faster execution
- lower latency
Benchmark comparison:

GPT-5.4 also performs well on Terminal-Bench 2.0, scoring 75.1%.
Tool Use Improvements
GPT-5.4 improves how models interact with external tools and APIs.
1. Tool Search
Previously, all tool definitions had to be included in the prompt. GPT-5.4 introduces tool search, which allows the model to fetch tool definitions only when needed. Benefits include:
- reduced token usage
- faster responses
- better scaling with large tool ecosystems
Example evaluation results:

In this evaluation, tool search produced 47% token savings.
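The idea behind tool search can be sketched with a small local catalogue: instead of placing every tool definition in the prompt, only definitions matching the current task are loaded. The example below is a toy illustration and not OpenAI's mechanism. The tool names, the per-definition token costs, and the keyword-overlap matching are all assumptions made for this sketch, and the savings it prints are unrelated to the 47% figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDefinition:
    name: str
    description: str
    schema_tokens: int  # assumed token cost of including the full definition


# Hypothetical tool catalogue; names and token costs are illustrative only.
CATALOGUE = [
    ToolDefinition("send_email", "send an email message to a recipient", 400),
    ToolDefinition("create_event", "schedule a calendar event", 350),
    ToolDefinition("query_sales_db", "run a query against the sales database", 600),
    ToolDefinition("resize_image", "resize an image file", 250),
]


def search_tools(query: str, catalogue=CATALOGUE) -> list:
    """Return definitions whose name or description shares a word with the query."""
    words = set(query.lower().split())
    matches = []
    for tool in catalogue:
        haystack = set(tool.description.lower().split()) | set(tool.name.split("_"))
        if words & haystack:
            matches.append(tool)
    return matches


def prompt_cost(tools) -> int:
    """Total assumed token cost of placing these definitions in the prompt."""
    return sum(tool.schema_tokens for tool in tools)


upfront = prompt_cost(CATALOGUE)                         # every definition included
on_demand = prompt_cost(search_tools("calendar event"))  # only matching definitions
savings_pct = 100 * (upfront - on_demand) / upfront

print(upfront, on_demand, savings_pct)
```

Plain keyword overlap is used here only to keep the sketch short. A production system would more likely rank tools by semantic similarity, and the relative savings would grow with the size of the catalogue.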
2. Agentic Tool Calling
GPT-5.4 also improves decision-making when choosing tools. Example benchmark:

Toolathlon measures how well AI agents use tools to complete multi-step tasks.
3. Web Search Performance
GPT-5.4 improves persistent web search capabilities. Benchmark:

This benchmark evaluates the ability to find difficult information across multiple web pages.
Academic and Reasoning Benchmarks
GPT-5.4 shows strong improvements on reasoning benchmarks.
| Benchmark | GPT-5.4 | GPT-5.2 |
|---|---|---|
| Frontier Science Research | 33.0% | 25.2% |
| FrontierMath Tier 1-3 | 47.6% | 40.7% |
| ARC-AGI-1 | 93.7% | 86.2% |
| ARC-AGI-2 | 73.3% | 52.9% |
These benchmarks measure advanced reasoning ability.
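To put the table in perspective, the gains can be expressed both as absolute percentage-point differences and as relative improvements over GPT-5.2. The sketch below derives both from the scores listed above. The `improvement` helper is introduced only for this example.

```python
# (GPT-5.4 score, GPT-5.2 score) in percent, copied from the table above.
SCORES = {
    "Frontier Science Research": (33.0, 25.2),
    "FrontierMath Tier 1-3": (47.6, 40.7),
    "ARC-AGI-1": (93.7, 86.2),
    "ARC-AGI-2": (73.3, 52.9),
}


def improvement(new: float, old: float) -> tuple:
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = round(new - old, 1)
    relative = round(100 * (new - old) / old, 1)
    return absolute, relative


for benchmark, (gpt_54, gpt_52) in SCORES.items():
    points, percent = improvement(gpt_54, gpt_52)
    print(f"{benchmark}: +{points} points ({percent}% relative)")
```

On these figures, ARC-AGI-2 shows the largest gain (20.4 points, about 38.6% relative), while ARC-AGI-1 shows the smallest relative gain because GPT-5.2 already scored above 86%.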
Long Context Benchmarks
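GPT-5.4 supports extremely long input contexts, although the Graphwalks results below show accuracy falling sharply once inputs exceed 256K tokens.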
| Benchmark | GPT-5.4 |
|---|---|
| Graphwalks BFS (0-128K) | 93.0% |
| Graphwalks BFS (256K-1M) | 21.4% |
Safety and Deployment
GPT-5.4 is classified under OpenAI’s High Cyber Capability category. Security measures include:
- monitoring systems
- trusted access controls
- request-level blocking for high-risk requests
- improved safety classifiers
OpenAI also continues research on Chain-of-Thought monitoring to detect potential misuse.