OpenAI Just Launched GPT-5.3-Codex: A Faster Model for Agentic Coding That Combines Frontier Coding Performance and Expert Reasoning in a Single System

OpenAI recently introduced GPT-5.3-Codex, a new agentic coding model that extends Codex from writing and updating code to handling a broader range of tasks on a computer. The model combines GPT-5.2-Codex's frontier coding performance with GPT-5.2's reasoning ability and technical knowledge in a single system, and runs 25% faster for Codex users thanks to infrastructure and model improvements.
For developers, GPT-5.3-Codex is positioned as a coding agent that can carry out long-running tasks involving research, tooling, and complex operations, while remaining steerable like a 'colleague' throughout the run.
Frontier agentic performance and benchmark results
OpenAI evaluates GPT-5.3-Codex on four key benchmarks that target real-world coding and agentic behavior: SWE-Bench Pro, Terminal-Bench 2.0, OSWorld-Verified, and GDPval.

On SWE-Bench Pro, a contamination-resistant benchmark built from real GitHub issues and pull requests across four languages, GPT-5.3-Codex achieves 56.8% at xhigh reasoning effort, a slight improvement over GPT-5.2-Codex and GPT-5.2 at the same effort level. Terminal-Bench 2.0, which measures the command-line skills coding agents need, shows a larger gap: GPT-5.3-Codex reaches 77.3%, well above previous models.


On OSWorld-Verified, a computer-use benchmark where agents complete productivity tasks in a virtual desktop environment, GPT-5.3-Codex scores 64.7%. Humans score around 72% on this benchmark, providing a rough human reference point.
For professional knowledge work, GPT-5.3-Codex is evaluated with GDPval, an assessment introduced in 2025 that measures performance on well-specified tasks across 44 occupations. GPT-5.3-Codex achieves a 70.9% win-or-tie rate on GDPval, matching GPT-5.2 at higher reasoning effort. These tasks include creating presentations, spreadsheets, and other work products that mirror a professional's typical deliverables.
A notable efficiency detail is that GPT-5.3-Codex achieves these results with fewer tokens than previous models, allowing users to "build more" within the same context window and budget.
Beyond coding: GDPval and OSWorld
OpenAI emphasizes that software developers, designers, product managers, and data scientists perform many tasks beyond generating code. GPT-5.3-Codex is designed to assist across the entire software life cycle: debugging, deployment, monitoring, writing PRDs, copy editing, user research, tests, and metrics.
Using scaffolding similar to that used in earlier GDPval evaluations, GPT-5.3-Codex produces complete work products. Examples on the official OpenAI blog include financial advisory slide decks, a sales training document, an NPV analysis spreadsheet, and a fashion presentation. Each GDPval task is designed by a domain expert and reflects real work from that occupation.


On OSWorld, GPT-5.3-Codex shows stronger computer-use ability than previous GPT models. OSWorld-Verified requires the model to use vision to complete varied tasks in a desktop environment, closely matching how agents operate real applications and tools rather than just generating text.
An interactive agent in the Codex app
As models become more capable, OpenAI points to new challenges such as human oversight and controlling multiple agents working in parallel. The Codex app is designed to make managing and directing agents easier, and with GPT-5.3-Codex it gains interactive behavior.
Codex now provides regular updates during a run so users can see important decisions and progress. Instead of waiting for one final output, users can ask questions, discuss approaches, and steer the model in real time. GPT-5.3-Codex explains what it is doing and answers questions while preserving its working context. This interactive behavior can be configured in the Codex app settings.
A model that helped train and deploy itself
GPT-5.3-Codex is the first Codex model that helped build itself: OpenAI used earlier versions of GPT-5.3-Codex to optimize its training runs, manage deployments, and analyze test and evaluation results.
The OpenAI research team used Codex to monitor and adjust training runs, track patterns throughout the training process, analyze interaction quality, suggest fixes, and build apps that visualize behavioral differences against previous models. The development team used Codex to optimize and debug the serving harness, identify serving bugs, find the causes of low cache hit rates, and dynamically balance GPU clusters to keep latency stable under increased traffic.
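The latency-balancing part of that workflow can be illustrated with a small sketch. The Python snippet below shows one simple way to route requests toward GPU clusters reporting lower latency; the cluster names, latency figures, and inverse-latency weighting are illustrative assumptions, not OpenAI's internal tooling.

```python
import random

# Illustrative sketch of latency-aware routing across GPU clusters, in the spirit
# of the dynamic load balancing described above. Cluster names, latency numbers,
# and the inverse-latency weighting are assumptions, not OpenAI's internal tooling.

def pick_cluster(latencies_ms: dict[str, float]) -> str:
    """Pick a cluster for the next request, favoring lower observed latency."""
    names = list(latencies_ms)
    # Weight each cluster by the inverse of its moving-average latency.
    weights = [1.0 / max(latencies_ms[n], 1e-6) for n in names]
    return random.choices(names, weights=weights, k=1)[0]

if __name__ == "__main__":
    # Example moving-average latencies (ms) per cluster, e.g. from serving metrics.
    observed = {"cluster-a": 120.0, "cluster-b": 95.0, "cluster-c": 210.0}
    picks = [pick_cluster(observed) for _ in range(10_000)]
    print({name: picks.count(name) for name in observed})
```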
During alpha testing, researchers asked GPT-5.3-Codex to measure how much additional work was completed in each run and its impact on productivity. The model generated regex-based classifiers to measure the frequency of questions, positive and negative responses, and task progress, ran them over session logs, and produced a report. Codex also helped build new data pipelines and rich visualizations where standard dashboard tools fell short, and summarized findings from thousands of data points in under three minutes.
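As a rough illustration of the kind of regex-based session-log classifier described above, here is a minimal Python sketch; the label names, patterns, and sample logs are assumptions for demonstration, not OpenAI's actual analysis code.

```python
import re
from collections import Counter

# Hypothetical regex classifiers over agent session logs, in the spirit of the
# alpha-testing analysis described above. Labels and patterns are illustrative.
CLASSIFIERS = {
    "question": re.compile(r"\?\s*$", re.MULTILINE),
    "positive_feedback": re.compile(r"\b(thanks|great|perfect|nice work)\b", re.IGNORECASE),
    "negative_feedback": re.compile(r"\b(wrong|broken|revert)\b", re.IGNORECASE),
    "task_progress": re.compile(r"\b(done|completed|passing|merged)\b", re.IGNORECASE),
}

def classify_sessions(session_logs: list[str]) -> Counter:
    """Count how many session logs match each regex classifier."""
    counts: Counter = Counter()
    for log in session_logs:
        for label, pattern in CLASSIFIERS.items():
            if pattern.search(log):
                counts[label] += 1
    return counts

if __name__ == "__main__":
    sample_logs = [
        "Can you also add tests for the parser?",
        "Perfect, the pipeline is passing now. Thanks!",
        "This is wrong, the cache hit rate regressed. Revert it.",
    ]
    print(classify_sessions(sample_logs))
```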
Cybersecurity capabilities and defenses
GPT-5.3-Codex is the first model that OpenAI classifies as 'High capability' for cybersecurity-related tasks under its Preparedness Framework, and the first model it has specifically trained to identify software vulnerabilities. OpenAI says it has no definitive evidence that the model can carry out end-to-end cyberattacks, but it is taking a precautionary approach with its most comprehensive cybersecurity safety stack to date.
Mitigations include safety training, automated monitoring, trusted-access controls for advanced capabilities, and enforcement pipelines informed by threat intelligence. OpenAI is launching a 'Trusted Cyber Access' pilot, expanding the private beta of Aardvark, its security research agent, and offering free codebase scans for widely used open-source projects such as Next.js, where Codex has recently been used to identify vulnerabilities.
Key Takeaways
- A unified frontier model for coding and agentic work: GPT-5.3-Codex combines GPT-5.2-Codex's coding capabilities with GPT-5.2's reasoning and domain knowledge in a single agent model, and runs about 25% faster in Codex.
- Stronger coding and agentic benchmarks: The model sets new highs on SWE-Bench Pro (56.8% at xhigh reasoning effort) and Terminal-Bench 2.0 (77.3%), and scores 64.7% on OSWorld-Verified and 70.9% wins-or-ties on GDPval, typically using fewer tokens than previous models.
- Long-horizon web and app development: In runs spanning millions of tokens, GPT-5.3-Codex has autonomously built complex projects such as racing and diving web games, iterating through build-and-debug cycles and demonstrating sustained multi-step development.
- A tool in its own training and deployment: Earlier versions of GPT-5.3-Codex were used to debug training runs, analyze model behavior, improve the serving stack, build custom pipelines, and summarize large alpha-test logs, making it the first Codex model that was instrumental in creating itself.
- A high-capability cyber model with gated access: GPT-5.3-Codex is the first OpenAI model rated 'High capability' for cybersecurity and the first trained specifically to identify software vulnerabilities. OpenAI pairs this with a Trusted Cyber Access pilot, an expanded Aardvark beta, and free codebase scans for projects such as Next.js.




