MiniMax Just Open Sourced MiniMax M2.7: An Agentic Model That Scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2

MiniMax has officially open-sourced MiniMax M2.7, making the model weights publicly available on Hugging Face. First announced on March 18, 2026, MiniMax M2.7 is MiniMax's most powerful open-source model to date. It is also the company's first model to actively participate in its own development cycle, a notable shift in how large language models are built and iterated.
What is MiniMax M2.7?
MiniMax M2.7 is part of the Mixture-of-Experts (MoE) MiniMax M2 series. MoE is an architectural design in which only a subset of the total parameters is active during any inference pass, making the model much faster and cheaper to run than a dense model of comparable output quality.
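To make the sparse-activation idea concrete, here is a minimal NumPy sketch of top-k MoE routing. This is an illustration of the general technique only; the function names, dimensions, and router design are assumptions, not details of MiniMax M2.7's actual architecture.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route a token through only the top-k experts (illustrative sketch).

    x        : (d,) token representation
    experts  : list of callables, each a small feed-forward "expert"
    gate_w   : (num_experts, d) router weights
    k        : number of experts activated per token
    """
    logits = gate_w @ x                      # router scores, one per expert
    top = np.argsort(logits)[-k:]            # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only k experts run; all other expert parameters stay idle for this token.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

d, num_experts = 8, 4
rng = np.random.default_rng(0)
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d)))
           for _ in range(num_experts)]
gate_w = rng.normal(size=(num_experts, d))
out = moe_forward(rng.normal(size=d), experts, gate_w, k=2)
print(out.shape)  # (8,)
```

With `k=2` of 4 experts active, only half the expert parameters participate in each forward pass, which is the source of the speed and cost advantage described above.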
MiniMax M2.7 is built around three main areas: professional software engineering, professional office work, and what MiniMax calls Agent Teams, i.e. native multi-agent collaboration. The model can build complex agent harnesses and complete detailed production tasks, supporting capabilities such as Agent Groups, complex Skills, and dynamic tool search.
SOTA Benchmark Performance: SWE-Pro and Terminal Bench 2
On SWE-Pro, which spans multiple programming languages, MiniMax M2.7 achieved an accuracy of 56.22%, matching GPT-5.3-Codex. SWE-Pro's long-horizon tasks (log analysis, troubleshooting, code security reviews, and debugging machine-learning workflows) are much closer to the messy reality of production systems than standard algorithmic coding tests.
On Terminal Bench 2 (57.0%) and NL2Repo (39.8%), both of which demand strong system-level understanding, MiniMax M2.7 also performs well. The model does not just succeed at code generation; it demonstrates a deep grasp of how software systems behave and interact.
On the VIBE-Pro repo-level code-generation benchmark, MiniMax M2.7 scored 55.6%, almost on par with Opus 4.6, meaning that whether a requirement involves Web, Android, iOS, or simulation tasks, it can be handed directly to MiniMax M2.7 to complete. It also shows a strong advantage on benchmarks close to real-world engineering: SWE Multilingual (76.5) and Multi-SWE-Bench (52.7).
Production Debugging in Under Three Minutes
When faced with a production alert, MiniMax M2.7 can correlate monitoring metrics with deployment timelines to make causal inferences, run statistical analysis on trace samples to form accurate hypotheses, connect to the database to verify the cause, identify a missing index-migration file in the code repository, and propose a non-blocking fix to stop the bleeding. The MiniMax team reports that in most cases this has cut recovery time for live production incidents to under three minutes. From metric analysis and database inspection to SRE-level decision making, this positions MiniMax M2.7 as more than just a code-generation model.
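The alert-to-mitigation flow described above can be sketched as a simple pipeline. Everything here is hypothetical: the step names, the `tools` callables, and the stub results are illustrations of the described workflow, not MiniMax's actual agent code or APIs.

```python
def triage_incident(alert, tools):
    """Hypothetical sketch of the alert-to-mitigation flow described above.

    `tools` maps step names to callables (monitoring correlation, trace
    analysis, database access, repo search); none of these are real APIs.
    """
    deploy = tools["correlate"](alert["metric"], alert["time"])   # metrics vs. deploy timeline
    hypothesis = tools["analyze_traces"](alert["trace_sample"])   # statistical trace analysis
    confirmed = tools["verify_db"](hypothesis)                    # query the live database
    culprit = tools["search_repo"](hypothesis)                    # e.g. a missing index migration
    if confirmed and culprit:
        return tools["propose_fix"](culprit)                      # non-blocking mitigation
    return "escalate to human SRE"

# Toy run with stub tools standing in for real monitoring/DB integrations.
tools = {
    "correlate": lambda m, t: "deploy-42",
    "analyze_traces": lambda s: "slow query on orders table",
    "verify_db": lambda h: True,
    "search_repo": lambda h: "missing_index_migration.sql",
    "propose_fix": lambda c: f"apply {c}",
}
fix = triage_incident({"metric": "p99", "time": 0, "trace_sample": []}, tools)
print(fix)  # apply missing_index_migration.sql
```

The key design point is the fallback: when the model cannot confirm a cause, it escalates rather than acting, which matches the human-in-the-loop posture described elsewhere in this article.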
Self-Evolution Architecture
To test the limits of autonomous development, MiniMax M2.7 was tasked with improving the performance of its own agent scaffold within MiniMax's internal framework. It worked autonomously, running the iterative loop 'analyze failure modes → design changes → modify scaffold code → run evaluation → compare results → decide to keep or revert changes' for over 100 cycles. Along the way, MiniMax M2.7 discovered a winning configuration on its own: systematically searching for the right combination of sampling parameters such as temperature, frequency penalty, and presence penalty; designing specific workflow rules (such as automatically searching other files for the same bug pattern after a fix); and adding loop detection to the agent scaffold. The result was a 30% performance improvement on internal test sets.
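The keep-or-revert loop above amounts to hill-climbing over a configuration. Below is a minimal, generic sketch of that loop; the function names, the toy `evaluate`, and the temperature-tuning example are assumptions for illustration, not MiniMax's actual evaluation harness.

```python
import copy
import random

def self_evolve(config, evaluate, propose_change, cycles=100):
    """Keep-or-revert improvement loop (illustrative sketch only).

    config         : dict of tunables (e.g. sampling params, workflow rules)
    evaluate       : config -> score on a test set (higher is better)
    propose_change : (config, failures) -> modified candidate config
    """
    best, best_score = config, evaluate(config)
    for _ in range(cycles):
        candidate = propose_change(copy.deepcopy(best), failures=None)
        score = evaluate(candidate)
        if score > best_score:        # keep the change...
            best, best_score = candidate, score
        # ...otherwise revert by simply keeping `best` unchanged
    return best, best_score

# Toy run: pretend the test set rewards a temperature near 0.7.
evaluate = lambda c: -abs(c["temperature"] - 0.7)
propose = lambda c, failures: {**c,
                               "temperature": c["temperature"] + random.uniform(-0.1, 0.1)}
random.seed(0)
cfg, score = self_evolve({"temperature": 1.0}, evaluate, propose, cycles=100)
print(round(cfg["temperature"], 2))
```

Over 100 cycles the accepted changes walk the temperature toward the optimum, while every regression is silently reverted, which is exactly the property that makes this loop safe to run unattended.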
Within the workflow of the MiniMax reinforcement-learning team, M2.7 now handles 30–50% of end-to-end workflows, with human researchers stepping in only for critical decisions and discussions.
MLE-Bench Lite: Testing Automated ML Engineering
The MiniMax team also evaluated MiniMax M2.7 on MLE-Bench Lite, an OpenAI benchmark of 22 machine-learning competitions, run here on a single A30 GPU and covering nearly every stage of the ML workflow.
For this experiment, the MiniMax team designed a simple three-part harness: short-term memory, self-critique, and self-direction. After each iteration, the agent writes a short-term-memory note file, critiques its current results, and sets configuration directions for the next cycle. Three runs were performed, each a 24-hour window of repeated evolution.
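One cycle of such a three-part harness might look like the sketch below. This is a generic reconstruction under stated assumptions: the JSON memory format, the `run_experiment`/`critique` callables, and the toy learning-rate example are all hypothetical, since MiniMax has not published its harness.

```python
import json
import pathlib
import tempfile

def run_cycle(cycle, run_experiment, critique, memory_path):
    """One iteration of a memory / self-critique / self-direction harness (sketch)."""
    # 1. Short-term memory: load the note file left by the previous cycle.
    memory = json.loads(memory_path.read_text()) if memory_path.exists() else {}
    # 2. Run the experiment with the configuration directions chosen last time.
    result = run_experiment(memory.get("next_config", {}))
    # 3. Self-critique: note what went wrong and decide what to try next cycle.
    note, next_config = critique(result)
    memory_path.write_text(json.dumps(
        {"cycle": cycle, "note": note, "next_config": next_config}))
    return result

# Toy usage: each critique halves the (scaled) learning rate based on the loss.
mem = pathlib.Path(tempfile.mkdtemp()) / "memory.json"
run = lambda cfg: {"loss": cfg.get("lr", 0.1) * 10}
crit = lambda res: ("loss too high", {"lr": res["loss"] / 100 * 0.5})
for c in range(3):
    out = run_cycle(c, run, crit, mem)
print(round(out["loss"], 4))  # 0.0025
```

Because all state lives in the memory file, each cycle can run in a fresh context window, which is what makes 24-hour unattended evolution windows practical.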
The best run won 9 gold medals, 5 silver medals, and 1 bronze medal. The average medal rate across the three runs was 66.6%, behind Opus-4.6 (75.7%) and GPT-5.4 (71.2%) and tied with Gemini-3.1 (66.6%).
Technical and Financial Office Work
Beyond software engineering, MiniMax M2.7 is aimed at professional office tasks. On GDPval-AA, which measures domain expertise across 45 models, MiniMax M2.7 achieved an ELO score of 1495: the highest among open-source models, behind only Opus 4.6, Sonnet 4.6, and GPT-5.4, and ahead of GPT-5.3.
On Toolathon, MiniMax M2.7 achieved an accuracy of 46.3%, a state-of-the-art result. On MM Claw, a MiniMax-built test based on real-world usage patterns from the OpenClaw personal-agent platform, MiniMax M2.7 maintained a 97% skill-compliance rate across 40 complex skills (each over 2,000 tokens) and achieved an overall accuracy of 62.7%.
In finance, MiniMax M2.7 can automatically read company annual reports and earnings-call transcripts, cross-reference research reports, independently develop ideas and build an income-forecast model, and generate PPT and Word research reports from templates, covering understanding, decision making, and output generation like a junior analyst.
Key Takeaways
- MiniMax M2.7 is now officially open source, with weights available on Hugging Face, making a frontier-grade agent model freely accessible for developers to use and build upon.
- MiniMax M2.7 achieves SOTA performance on real-world software-engineering benchmarks: it scored 56.22% on SWE-Pro (matching GPT-5.3-Codex) and 57.0% on Terminal Bench 2, tests that measure production-level thinking, not just code production.
- MiniMax M2.7 is the first MiniMax model to actively participate in its own development: it ran over 100 autonomous rounds of scaffold improvement and achieved a 30% performance gain, an early, concrete example of AI-assisted development in practice.
- The model is designed for real agent deployment: it maintains 97% skill adherence across 40 complex skills (each exceeding 2,000 tokens), supports native Agent Teams with stable role boundaries, and autonomously handles 30–50% of MiniMax's internal RL team workflows.
- MiniMax M2.7 is the highest-scoring open-source model on GDPval-AA, with an ELO of 1495 across 45 models, demonstrating strong professional skills spanning office-document work, financial analysis, and high-quality multi-task delivery.
Check out the technical details and the model weights.



