Technology & AI

Anthropic Claude Opus 4.6 Release With 1M Context, Agentic Code, Adaptive Logical Controls, and Expanded Security Tools Capabilities

Anthropic introduced Claude Opus 4.6, its most powerful model to date, focusing on long-range content visualization, agent coding, and high-value information work. The model builds on Claude Opus 4.5 and is now available on claude.ai, the Claude API, and major cloud providers under ID. claude-opus-4-6.

The focus of the model: the activity of the agent, not a single response

Opus 4.6 is designed for multi-step tasks where the model must plan, execute, and update over time. According to the Anthropic team, they use it in Claude Code and report that it focuses more on the most difficult parts of the work, handles abstract problems with better judgment, and remains productive for longer periods of time.

The model tends to think more deeply and revisits their reasoning before responding. This improves performance on difficult problems but can increase cost and delay on simple ones. Anthropic reveals a /effort a parameter with 4 levels – low, medium, high (automatic), and maximum – so that developers can clearly trade off the depth of thinking against the speed and the cost in each area or use.

Beyond coding, Opus 4.6 addresses practical information tasks:

  • conducting financial analysis
  • conducting research by retrieving and browsing
  • use and create documents, spreadsheets, and presentations

Within Cowork, Anthropic’s autonomous workspace, the model can implement a multi-step workflow that integrates these artifacts without continuous human instruction.

Long context capabilities and developer controls

Opus 4.6 is the first Opus-class model with a 1M token context window in beta. For more than 200k tokens in this 1M content mode, the price goes up to $10 for 1M input tokens and $37.50 for 1M output tokens. The model supports up to 128k output tokens, which is enough for very long reports, code reviews, or structured editing of multiple files in a single response.

To make long-running agents more manageable, Anthropic is shipping several platform features around Opus 4.6:

  • Adaptive thinking: the model can decide when to use extended thinking based on the complexity of the task and the context, instead of always working with greater depth of thinking.
  • Effort controls: 4 different effort levels (low, medium, high, high) reveal a clean control area of ​​latency versus imaging quality.
  • Content integration (beta): the platform automatically summarizes and replaces old parts of the conversation as the configurable context limit is approached, reducing the need for a custom termination concept.
  • US view only: workloads that have to stay in US states can work with 1.1× token values.

These controls target a common real-world pattern: an agent workflow that accumulates hundreds of thousands of tokens while interacting with tools, documents, and code in multiple steps.

Product integration: Claude Code, Excel, and PowerPoint

Anthropic has enhanced its product stack so that Opus 4.6 can drive realistic workflows for developers and analysts.

In Claude Code, a new ‘agent groups’ mode (research preview) allows users to create multiple agents that work in parallel and automatically coordinate. This is intended for heavy tasks such as codebase updates. Each sub-agent can be taken in conjunction, including with tmuxequivalent to a centralized engineering workflow.

Claude in Excel now edits before taking action, can import unstructured data and infer structure, and can apply multi-step transformations in one pass. When paired with Claude in PowerPoint, users can go from raw data in Excel to structured, unbranded slide decks. The model learns layouts, fonts, and slide masters so that the generated decks stay consistent with existing templates. Claude at PowerPoint is currently in research previews for the Max, Team, and Enterprise plans.

Benchmark profile: coding, searching, retrieval of long content

The Anthropic team ranks Opus 4.6 as state of the art in several key external benchmarks for coding agents, search agents, and professional decision support.

Key results include:

  • GDPval-AA (economically important knowledge work in finance, law, and related fields): Opus 4.6 outperforms OpenAI’s GPT-5.2 with 144 Elo points and Claude Opus 4.5 with 190 points. This means that, in a head-to-head comparison, Opus 4.6 beats GPT-5.2 in this test about 70% of the time.
  • Terminal-Bench 2.0: Opus 4.6 achieves the highest reported score in this agent coding and system performance benchmark.
  • Humanity’s Final Test: in this multidisciplinary thought test with tools (web search, coding, etc.), Opus 4.6 leads other frontier models, including GPT-5.2 and Gemini 3 Pro, under the written harness.
  • BrowseComp: Opus 4.6 performs better than any other model in this agent search benchmark. When Claude’s models are combined with a multi-agent harness, the score increases to 86.8%.

Long content retrieval is a moderate improvement. On the 8-needle 1M variant of MRCR v2 — a ‘needle-in-a-haystack’ benchmark where facts are hidden within 1M tokens of text — Opus 4.6 scores 76%, compared to 18.5% for Claude Sonnet 4.5. Anthropic defines this as a qualitative variable in how well the model can use context without context decay.

Additional benefits of working in:

  • root cause analysis in complex software failures
  • writing in many languages
  • compliance with long-term planning
  • cybersecurity operations
  • life sciences, where Opus 4.6 performs almost 2× better than Opus 4.5 in computational biology, structural biology, organic chemistry, and phylogenetics tests

In Vending-Bench 2, a long-horizon economic performance benchmark, the Opus 4.6 earns $3,050.53 more than the Opus 4.5 under the reported setup.

Key Takeaways

  • Opus 4.6 is the ultimate Anthropic mod with 1M token core (beta): Supports 1M input tokens and up to 128k output tokens, with a premium price of over 200k tokens, making it suitable for very long codebases, scripts, and multi-step agent workflows.
  • Obvious controls for depth of thought and cost with effort and flexible thinking: Developers can tune /effort (low, medium, high, high) and let ‘flexible thinking’ decide when extended thinking is needed, revealing a clear delay versus accuracy versus cost trade-off for different routes and functions.
  • Strong benchmark performance in coding, search, and economic value operations: Opus 4.6 leads in GDPval-AA, Terminal-Bench 2.0, Humanity’s Last Exam, BrowseComp, and MRCR v2 1M, with major advantages over Claude Opus 4.5 and GPT class base in long context and improved tool thinking.
  • Tight integration with Claude code, Excel, and PowerPoint for real workloads: Agent groups in Claude Code, systematic conversion of Excel, and PowerPoint production position that recognizes the template Opus 4.6 as the core of practical engineering and analyst workflow, not just to discuss.

Check it out Technical details and documentation. Also, feel free to follow us Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to Our newspaper. Wait! are you on telegram? now you can join us on telegram too.


Max is an AI analyst at MarkTechPost, based in Silicon Valley, who is actively shaping the future of technology. He teaches robots at Brainvyne, fights spam with ComplyEmail, and uses AI every day to translate complex technological advances into clear and understandable information.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button