Aquileo | GPT-5.4 - GeeksforGeeks

GPT-5.4 is a frontier large language model released by OpenAI, designed for complex professional work such as coding, research, automation, document processing and agent-based workflows. It is optimized for real-world tasks that require working with spreadsheets, presentations, documents, software tools and long workflows.

Key characteristics of GPT-5.4 include:

Large 1 million token context window
Native computer-use capabilities
Advanced tool calling and tool search
Improved agent workflows
Higher benchmark performance across knowledge work and coding
Improved token efficiency compared to GPT-5.2

Evolution of GPT Models

The GPT model family has evolved gradually, with each version improving reasoning, coding, and tool interaction capabilities.

Model	Key Focus	Major Improvement
GPT-4	Multimodal reasoning	Image + text understanding
GPT-5.1	Faster general model	Improved latency
GPT-5.2	Reasoning model	Better multi-step reasoning
GPT-5.3-Codex	Coding-focused model	Strong software engineering ability
GPT-5.4	Unified frontier model	Combines reasoning, coding, and agents

GPT-5.4 combines the reasoning strengths of GPT-5.2 with the coding capabilities of GPT-5.3-Codex.

Context Window and Token Limits

The context window is the amount of text, measured in tokens, that the model can process in a single request. GPT-5.4 introduces one of the largest context windows available in frontier AI models.

Model	Context Window	Notes
GPT-5.2	272K tokens	Standard reasoning model
GPT-5.3-Codex	~272K tokens	Coding optimized
GPT-5.4	1M tokens (experimental)	Enables long workflows and large documents

Large context windows allow the model to process:

Large research documents
Full code repositories
Long conversation histories
Large datasets
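To make the limits above concrete, here is a minimal sketch that estimates whether a piece of text fits in a given context window. The limits come from the table above; the 4-characters-per-token ratio and the function names are assumptions for illustration, not part of any official API. A real tokenizer should be used for exact counts.

```python
# Context limits taken from the table above, in tokens.
CONTEXT_LIMITS = {
    "GPT-5.2": 272_000,
    "GPT-5.4": 1_000_000,
}

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary by language and content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, model: str, reserved_for_output: int = 0) -> bool:
    """Return True if the estimated prompt plus reserved output tokens fits."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_LIMITS[model]
```

Under this heuristic, a 2,000,000-character document is estimated at 500,000 tokens, which exceeds the 272K limit but fits within the 1M window.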

Key Technical Improvements in GPT-5.4

1. Planning-Based Reasoning

GPT-5.4 Thinking can present an initial plan before generating the final answer, and users can modify this plan mid-response. This provides:

Better control over responses
Reduced back-and-forth iterations
More accurate outputs

2. Improved Knowledge Work Capabilities

GPT-5.4 shows major improvements in professional tasks such as:

financial modeling
presentation generation
spreadsheet creation
document analysis

On the GDPval benchmark, which evaluates tasks across 44 occupations, GPT-5.4 achieved the highest performance.

[Figure: GDPval benchmark results. Image credit: OpenAI]

This benchmark measures how well models perform professional knowledge tasks across industries.

3. Improved Spreadsheet and Document Work

GPT-5.4 significantly improves performance on real business tasks.

These improvements make GPT-5.4 suitable for professional workflows involving:

financial spreadsheets
reports
presentations
business documents

4. Reduced Hallucinations

GPT-5.4 improves factual accuracy. Compared to GPT-5.2:

33% fewer incorrect claims
18% fewer responses containing errors

This improvement increases reliability for professional work.

Computer Use Capabilities

GPT-5.4 introduces native computer interaction capabilities, allowing agents to perform tasks directly on software interfaces. Capabilities include:

reading screenshots
mouse clicks
keyboard commands
browser automation

This enables automation of tasks such as:

sending emails
scheduling calendar events
filling forms
navigating software interfaces
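The capabilities listed above usually combine into an observe, decide, act loop. The following is a minimal sketch of that loop against a simulated form, not the actual GPT-5.4 interface: `FakeForm`, `Action`, `scripted_policy`, and `run_agent` are all hypothetical names, and the scripted policy stands in for the model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    payload: str = ""  # target element for a click, text for typing


@dataclass
class FakeForm:
    """Stand-in for a real UI: one text field and a submit button."""
    text: str = ""
    submitted: bool = False
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive pixels; this returns structured state.
        return {"text": self.text, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        self.log.append(action.kind)
        if action.kind == "type":
            self.text += action.payload
        elif action.kind == "click" and action.payload == "submit":
            self.submitted = True


def scripted_policy(observation: dict) -> Action:
    """Placeholder for the model: picks the next action from the observation."""
    if not observation["text"]:
        return Action("type", "hello")
    if not observation["submitted"]:
        return Action("click", "submit")
    return Action("done")


def run_agent(env: FakeForm, policy, max_steps: int = 10) -> bool:
    """Observe, decide, act, and repeat until the policy reports completion."""
    for _ in range(max_steps):
        action = policy(env.screenshot())
        if action.kind == "done":
            return True
        env.apply(action)
    return False
```

The `max_steps` cap is a design choice in this sketch: it keeps a confused policy from acting on the interface indefinitely.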

Computer Use Benchmarks

Performance improvements are measured using the OSWorld-Verified benchmark, which evaluates the ability to operate computers.

GPT-5.4 surpasses human performance in this benchmark.

Vision and Document Understanding

GPT-5.4 improves visual perception and document understanding.

The model also supports high-resolution image inputs. Image limits:

Up to 10.24 million pixels
Maximum dimension 6000 pixels

This improves tasks such as:

document parsing
chart understanding
UI interaction
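As an illustration of the image limits above, the sketch below computes dimensions that satisfy both the 10.24 million pixel cap and the 6000-pixel maximum dimension. It assumes the caller downscales images before upload while preserving aspect ratio; how the API itself treats oversized images is not described here, and `fit_image` is a hypothetical helper.

```python
import math

MAX_PIXELS = 10_240_000   # 10.24 million pixels, from the limits above
MAX_DIMENSION = 6000      # longest allowed side in pixels, from the limits above


def fit_image(width: int, height: int) -> tuple[int, int]:
    """Return dimensions scaled down (never up) to satisfy both limits."""
    scale = min(
        1.0,
        MAX_DIMENSION / max(width, height),
        math.sqrt(MAX_PIXELS / (width * height)),
    )
    # Floor so rounding never pushes the result back over a limit.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
```

For example, a 1920 x 1080 screenshot is returned unchanged, while an 8000 x 1000 panorama is limited by its longest side and becomes 6000 x 750.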

Coding Performance

GPT-5.4 integrates the coding strengths of GPT-5.3-Codex. Key improvements include:

better code generation
improved debugging
faster execution
lower latency

On Terminal-Bench 2.0, GPT-5.4 scores 75.1%.

Tool Use Improvements

GPT-5.4 improves how models interact with external tools and APIs.

1. Tool Search

Previously, all tool definitions had to be included in the prompt. GPT-5.4 introduces tool search, which allows the model to fetch tool definitions only when needed. Benefits include:

reduced token usage
faster responses
better scaling with large tool ecosystems

[Figure: example token savings from tool search. Image credit: OpenAI]

In this example evaluation, tool search results in 47% token savings.
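The idea behind tool search can be sketched as follows. This is an illustrative model of deferred tool loading, not OpenAI's implementation: the tool catalog, the schemas, and the function names are all hypothetical, and the substring match stands in for whatever retrieval the real feature uses.

```python
import json

# Hypothetical tool catalog; names and schemas are illustrative only.
TOOLS = {
    "send_email": {
        "description": "Send an email to one recipient.",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "create_event": {
        "description": "Create a calendar event.",
        "parameters": {"title": "string", "start": "string", "end": "string"},
    },
    "get_weather": {
        "description": "Look up the current weather for a city.",
        "parameters": {"city": "string"},
    },
}


def upfront_prompt(tools: dict) -> str:
    """Older approach: every full definition is serialized into the prompt."""
    return json.dumps(tools)


def index_prompt(tools: dict) -> str:
    """Deferred approach: the prompt lists tool names only."""
    return json.dumps(sorted(tools))


def search_tools(tools: dict, query: str) -> dict:
    """Return full definitions whose name or description mentions the query."""
    q = query.lower()
    return {
        name: spec
        for name, spec in tools.items()
        if q in name.lower() or q in spec["description"].lower()
    }
```

Here `index_prompt(TOOLS)` is shorter than `upfront_prompt(TOOLS)`, and `search_tools(TOOLS, "email")` returns only the `send_email` definition. The gap between the two prompt sizes grows with the number of tools, which is why the approach scales better for large tool ecosystems.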

2. Agentic Tool Calling

GPT-5.4 also improves decision-making when choosing tools.

Toolathlon measures how well AI agents use tools to complete multi-step tasks.

3. Web Search Performance

GPT-5.4 improves persistent web search capabilities. The associated benchmark evaluates the ability to find difficult information across multiple web pages.

Academic and Reasoning Benchmarks

GPT-5.4 shows strong improvements on reasoning benchmarks.

Benchmark	GPT-5.4	GPT-5.2
Frontier Science Research	33.0%	25.2%
FrontierMath Tier 1-3	47.6%	40.7%
ARC-AGI-1	93.7%	86.2%
ARC-AGI-2	73.3%	52.9%

These benchmarks measure advanced reasoning ability.
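To compare the two columns in the table above, the following sketch computes the absolute gain in percentage points and the relative gain for each benchmark. The scores are copied from the table; the helper name `improvement` is an illustrative choice.

```python
# Scores copied from the table above: (GPT-5.4, GPT-5.2), in percent.
SCORES = {
    "Frontier Science Research": (33.0, 25.2),
    "FrontierMath Tier 1-3": (47.6, 40.7),
    "ARC-AGI-1": (93.7, 86.2),
    "ARC-AGI-2": (73.3, 52.9),
}


def improvement(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    return round(new - old, 1), round((new - old) / old * 100, 1)


for name, (new, old) in SCORES.items():
    points, relative = improvement(new, old)
    print(f"{name}: +{points} points ({relative}% relative)")
```

The largest gain is on ARC-AGI-2, at 20.4 percentage points.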

Long Context Benchmarks

GPT-5.4 supports extremely long input contexts, although the results below show accuracy dropping sharply once inputs exceed 256K tokens.

Benchmark	GPT-5.4
Graphwalks BFS (0-128K)	93.0%
Graphwalks BFS (256K-1M)	21.4%
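As the name suggests, the Graphwalks BFS task involves breadth-first search over a graph supplied in the context. The sketch below shows the underlying computation under the assumption that the task asks for all nodes at a given depth from a start node; the exact task format is not described here, and `nodes_at_depth` is a hypothetical helper.

```python
from collections import deque


def nodes_at_depth(edges: list[tuple[str, str]], start: str, depth: int) -> set[str]:
    """Return nodes whose shortest distance from start is exactly depth."""
    # Build a directed adjacency list from the edge pairs.
    graph: dict[str, list[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # no need to expand beyond the target depth
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {n for n, d in distance.items() if d == depth}
```

The algorithm is simple for a program, but a model has to carry it out over edges scattered across hundreds of thousands of tokens, which is what makes the long-range setting difficult.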

Safety and Deployment

GPT-5.4 is classified under OpenAI’s High Cyber Capability category. Security measures include:

monitoring systems
trusted access controls
request-level blocking for high-risk requests
improved safety classifiers

OpenAI also continues research on Chain-of-Thought monitoring to detect potential misuse.
