
OpenAI Releases GPT-5.3-Codex-Spark Research Preview: 15x Faster AI Code Model Delivers Over 1000 Tokens per Second on Cerebras Hardware

OpenAI recently launched a new research preview called GPT-5.3-Codex-Spark. The model is designed for one thing: speed. While the standard GPT-5.3 Codex focuses on deep reasoning, Spark is built for near-instant response times. It is the result of deep hardware-software co-design between OpenAI and Cerebras.

The results are striking. Spark is 15x faster than the flagship GPT-5.3 Codex, consistently delivering over 1,000 tokens per second. This speed effectively eliminates the delay between a developer's intent and the model's code output.
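To put that throughput in perspective, here is a quick back-of-the-envelope calculation. The tokens-per-second figures come from this article; the 400-token function size is an illustrative assumption:

```python
# Rough time to generate a ~400-token function at each model's speed.
# Throughput figures are from the article; the function size is illustrative.
SPARK_TPS = 1000       # GPT-5.3-Codex-Spark: 1,000+ tokens/second
FLAGSHIP_TPS = 70      # flagship GPT-5.3 Codex: ~70 tokens/second
FUNCTION_TOKENS = 400  # assumed size of a medium-length function

spark_seconds = FUNCTION_TOKENS / SPARK_TPS
flagship_seconds = FUNCTION_TOKENS / FLAGSHIP_TPS

print(f"Spark:    {spark_seconds:.1f}s")     # sub-second, effectively instant
print(f"Flagship: {flagship_seconds:.1f}s")  # a noticeable pause
# ~14x at these round numbers, in line with the quoted 15x
print(f"Speedup:  {flagship_seconds / spark_seconds:.0f}x")
```

At these numbers the wait drops from several seconds to under half a second, which is why the article frames the difference as qualitative rather than incremental.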

Hardware: Wafer-Scale Engineering

This jump in performance is enabled by the Cerebras Wafer-Scale Engine 3 (WSE-3). Traditional AI models run on clusters of GPUs that must communicate over external interconnects, and that communication creates a bottleneck that slows the model down.

The WSE-3 is different. It is a single, enormous chip the size of an entire silicon wafer. Because the whole model resides on one piece of silicon, there are no off-chip interconnects to slow it down. This design provides:

  • Massive on-chip memory.
  • Extremely high memory bandwidth.
  • Low compute latency.

By running on Cerebras CS-3 systems, OpenAI can serve inference at speeds that traditional GPU clusters cannot reach.
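Why does on-chip memory matter so much? Autoregressive decoding is largely memory-bandwidth-bound: every generated token requires streaming the model's weights through the compute units, so the bandwidth sets a ceiling on tokens per second. A rough roofline sketch makes the point. The bandwidth figures below are public vendor specs, not from this article, and the 30B-parameter fp16 model is purely an illustrative assumption:

```python
# Roofline estimate: decode-speed ceiling ~= memory bandwidth / bytes per token.
# Bandwidth figures are public vendor specs (not from the article); the
# hypothetical 30B-parameter fp16 model (60 GB of weights) is an assumption.
HBM_BW = 3.35e12    # NVIDIA H100 SXM HBM3 bandwidth, bytes/s (~3.35 TB/s)
SRAM_BW = 2.1e16    # Cerebras WSE-3 on-chip SRAM bandwidth, bytes/s (~21 PB/s)
MODEL_BYTES = 60e9  # 30B params x 2 bytes (fp16), illustrative only

print(f"GPU ceiling:   {HBM_BW / MODEL_BYTES:,.0f} tokens/s")
print(f"WSE-3 ceiling: {SRAM_BW / MODEL_BYTES:,.0f} tokens/s")
```

The single-GPU ceiling lands in the tens of tokens per second, roughly where the flagship's ~70 tokens/s sits, while on-chip SRAM bandwidth is several orders of magnitude higher. Real systems batch, parallelize, and cache, so these ceilings are not achieved speeds, but they show why eliminating off-chip memory traffic changes the game.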

Software Design for Low Latency

The speed is not limited to the chip. OpenAI has also redesigned how the model communicates with your machine, moving away from the traditional request/response pattern to a persistent WebSocket connection.

This change leads to several technological improvements:

  1. Round-Trip Time (RTT): Client-server overhead is reduced by 80%.
  2. Time-to-First-Token (TTFT): This improves by 50%, which means code starts appearing almost as soon as you hit enter.
  3. Per-Token Overhead: The internal processing time for each token is cut by 30%.
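Taken together, these savings compound. Here is a toy model of one request's end-to-end latency; only the percentage reductions come from the article, and every baseline millisecond figure is an illustrative assumption:

```python
# Toy latency model for one streamed response. The percentage reductions
# (80% RTT, 50% TTFT, 30% per-token overhead) are from the article; every
# baseline millisecond figure below is an illustrative assumption.
BASE_RTT_MS = 50        # assumed per-request HTTP connection overhead
BASE_TTFT_MS = 400      # assumed server-side time to first token
BASE_PER_TOKEN_MS = 1.0 # assumed per-token processing overhead
TOKENS = 200            # assumed response length

before = BASE_RTT_MS + BASE_TTFT_MS + BASE_PER_TOKEN_MS * TOKENS
after = (BASE_RTT_MS * (1 - 0.80)                     # persistent WebSocket
         + BASE_TTFT_MS * (1 - 0.50)                  # faster first token
         + BASE_PER_TOKEN_MS * (1 - 0.30) * TOKENS)   # leaner token loop

print(f"before: {before:.0f} ms, after: {after:.0f} ms")
```

Under these assumed baselines the total roughly halves, and crucially the first token arrives much sooner, which is what the developer actually perceives as responsiveness.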

This setup enables 'real-time steering.' You can interrupt the model while it is writing and redirect its reasoning without waiting for the full block to finish.
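The interaction pattern behind real-time steering can be sketched with plain asyncio: a streaming task that the client cancels mid-generation and immediately restarts with a revised instruction. This is a simulation of the pattern only; the token stream and prompts are invented for illustration and are not the actual Codex wire protocol.

```python
import asyncio

received: list[str] = []  # tokens the "client" has displayed so far

async def stream_tokens(prompt: str) -> None:
    """Simulate a model streaming ~1,000 tokens/s for `prompt`.
    Stand-in for a persistent WebSocket stream, not the real Codex protocol."""
    for i in range(40):
        received.append(f"{prompt}-tok{i}")
        await asyncio.sleep(0.001)

async def main() -> None:
    # Start generating against the first instruction.
    task = asyncio.create_task(stream_tokens("use-recursion"))
    await asyncio.sleep(0.01)  # a few tokens arrive on screen...
    task.cancel()              # ...and the developer interrupts mid-stream
    try:
        await task
    except asyncio.CancelledError:
        pass
    # Steer immediately with a revised instruction - no waiting for the
    # original block to finish.
    await stream_tokens("use-iteration")

asyncio.run(main())
print(received[0], "...", received[-1])
```

The key property is that the cancel-and-redirect round trip is cheap: because the connection stays open and tokens arrive continuously, interrupting costs almost nothing, turning generation into a live, two-way exchange rather than a batch job.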

Trade-offs: Speed vs. Reasoning

GPT-5.3-Codex-Spark is optimized for efficiency, not deep complexity. It is a smaller model than the flagship GPT-5.3 Codex and, as a result, has a shallower reasoning depth.

Developers should be aware of these performance differences:

  • Benchmarks: Spark scores lower on SWE-Bench Pro and Terminal-Bench 2.0 than the flagship model. It can struggle with very complex changes that span many files or require architectural judgment.
  • Security: Under OpenAI's Preparedness Framework, the flagship GPT-5.3 Codex is rated 'High' capability in cybersecurity. Spark does not meet this threshold and should not be used for sensitive security reasoning or automated authentication work.

Availability and Access

Spark is available now to ChatGPT Pro users and developers. You can access it through the following tools:

  • Codex app: Use the model picker to select 'Spark.'
  • VS Code extension: It is integrated directly into the editor.
  • CLI: Access it with the command codex --model gpt-5.3-codex-spark.

| Feature | GPT-5.3-Codex-Spark | GPT-5.3 Codex (Flagship) |
| --- | --- | --- |
| Tokens per second | 1,000+ | ~70 |
| Context window | 128k | 128k |
| Hardware | Cerebras WSE-3 | NVIDIA GPU clusters |
| Best for | Rapid iteration | Deep reasoning / security |

Key Takeaways

  • Extreme speed: Spark is 15x faster than the flagship GPT-5.3 Codex, delivering an unprecedented 1,000+ tokens per second for near-instant code generation.
  • Custom silicon infrastructure: This is the first OpenAI model to run on Cerebras Wafer-Scale Engine 3 (WSE-3) hardware rather than traditional NVIDIA GPUs, using wafer-scale on-chip memory to eliminate data-movement bottlenecks.
  • Dramatic latency reduction: A persistent WebSocket connection reduces client-server round-trip overhead by 80% and improves time-to-first-token by 50%.
  • Real-time steering: Designed for 'micro-iterations,' the model's speed lets developers interrupt and redirect its logic in real time, changing the workflow from batch processing to a live, two-way exchange.
  • Targeted capability trade-off: Although much faster, Spark has a shallower reasoning depth than the flagship model and does not meet the 'High' cybersecurity capability threshold in OpenAI's Preparedness Framework, making it unsuitable for sensitive auth or security operations.
