MiniMax Releases MMX-CLI: A Command-Line Interface That Gives AI Agents Native Access to Image, Video, Speech, Music, Vision, and Search

MiniMax, the AI research company behind the MiniMax omni-modal model stack, has released MMX-CLI, a Node.js-based command-line interface that exposes the capabilities of the MiniMax AI platform both to human engineers working at the terminal and to AI agents running in tools like Cursor, Claude Code, and OpenCode.
What Problem Does MMX-CLI Solve?
Most large language model (LLM) based agents today can read and write text. They can consult documents, generate code, and follow multi-step instructions. But they have no direct way to produce media: there is no built-in way to synthesize speech, compose music, create video, or understand an image without a separate integration layer such as the Model Context Protocol (MCP).
Building that integration often means writing custom API wrappers, configuring a server-side tool, and handling authentication differently for each agent framework. MMX-CLI is positioned as an alternative: it exposes all of those capabilities as shell commands that an agent can invoke directly, the same way a developer would from a terminal – with zero MCP glue required.
The Seven Command Groups
MMX-CLI wraps the full MiniMax model stack into seven top-level command groups – `mmx text`, `mmx image`, `mmx video`, `mmx speech`, `mmx music`, `mmx vision`, and `mmx search` – plus supporting utilities (`mmx auth`, `mmx config`, `mmx quota`, `mmx update`).
- The `mmx text` command supports multi-turn chat, streamed output, system prompts, and a JSON output mode. It accepts a `--model` flag to target a specific MiniMax model variant such as `MiniMax-M2.7-highspeed`, with `MiniMax-M2.7` as the default.
- The `mmx image` command generates images from text prompts with aspect-ratio control (`--aspect-ratio`) and batch generation (`--n`). It also supports a `--subject-ref` subject reference parameter, which keeps a character or object consistent across multiple generated images – useful for workflows that require visual continuity.
- The `mmx video` command uses `MiniMax-Hailuo-2.3` as its default model, with `MiniMax-Hailuo-2.3-Fast` available as an alternative. By default, `mmx video generate` submits the job and polls until the video is ready. Passing `--async` or `--no-wait` changes this behavior: the command returns the task ID immediately, and the caller can check progress separately with `mmx video task get --task-id`. The command also supports a `--first-frame` flag for image-to-video mode, where a supplied image is used as the opening frame of the output video.
- The `mmx speech` command provides text-to-speech (TTS) with more than 30 voices, speed, volume, and pitch controls, subtitle extraction with timing data (`--subtitles`), and playback support for piping audio to a media player. The default model is `speech-2.8-hd`, with `speech-2.6` and `speech-02` available as alternatives. Input is capped at 10,000 characters.
- The `mmx music` command, backed by the `music-2.5` model, generates music from text prompts with fine-grained compositional controls including `--vocals` (e.g. `"warm male baritone"`), `--genre`, `--mood`, `--instruments`, `--tempo`, `--bpm`, `--key`, and `--structure`. The `--instrumental` flag produces music without vocals. An `--aigc-watermark` flag is also available to embed an AI-generated-content watermark in the output audio.
- The `mmx vision` command handles image understanding with a vision language model (VLM). It accepts a local file path or a remote URL – base64-encoding local files – or a pre-uploaded MiniMax file ID. A `--prompt` flag lets you ask a specific question about the image; the default prompt is `"Describe the image."`
- The `mmx search` command runs a web search query against the MiniMax search infrastructure and returns results as text or JSON.
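Taken together, these groups are designed to compose in ordinary shell workflows. The sketch below is illustrative only: the flags shown are the ones documented above, but the positional prompt arguments and exact subcommand shapes are assumptions, not verified against the tool.

```shell
# Illustrative sketch: flags are the documented ones; prompt placement
# and subcommand shapes are assumptions.

# Submit a video job without blocking; --async returns a task ID.
mmx video generate --async "A paper boat drifting down a rainy street"

# Later, poll the job with the returned ID (placeholder value).
mmx video task get --task-id <TASK_ID>

# Instrumental background music with compositional controls.
mmx music --instrumental --genre lofi --mood calm --bpm 80 "late-night study"

# Ask a targeted question about a local image
# (the CLI base64-encodes local files automatically).
mmx vision ./screenshot.png --prompt "What error message is shown?"
```

Because every capability is a plain shell command with clean stdout, an agent that can run a terminal can use all of them without any additional tool registration.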
Technical Architecture
MMX-CLI is written almost entirely in TypeScript (99.8% of the codebase) with strict mode enabled. It uses Bun as the native runtime for development and testing, while being distributed on npm for compatibility with Node.js 18+ environments. Configuration schema validation uses Zod, and configuration resolution follows a defined precedence – CLI flags → environment variables → ~/.mmx/config.json → defaults – so the tool can be deployed directly into containerized or CI environments. Dual-region support is built into the API client layer, routing global users to api.minimax.io and CN users to api.minimaxi.com; the region can be switched with `mmx config set --key region --value cn`.
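The region switch is the one configuration command the project documents explicitly; a short sketch of how it fits the stated precedence order follows. The comments describing override behavior are inferred from that precedence, not from tested output.

```shell
# Persist the CN region in ~/.mmx/config.json (documented key/value).
mmx config set --key region --value cn   # routes requests to api.minimaxi.com

# Per the documented precedence (flags → env vars → config file → defaults),
# a CLI flag or environment variable set for a single invocation would
# override this stored value without modifying the file.
```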
Key Takeaways
- MMX-CLI is the official open command-line interface for MiniMax, giving AI agents native access to seven capability groups – text, image, video, speech, music, vision, and search – without requiring MCP integration.
- AI agents running in tools like Cursor, Claude Code, and OpenCode can be set up with two shell commands and one natural-language instruction, after which the agent learns the full command surface on its own from the bundled SKILL.md file.
- The CLI is designed for programmatic and agent use, with dedicated flags for non-interactive operation, clean stdout/stderr separation for safe piping, structured exit codes for error handling, and a schema-export feature that lets agent frameworks register mmx commands as JSON tool definitions.
- For AI developers already building agent-based systems, it significantly lowers the integration barrier by combining text, image, video, speech, music, vision, and search into a single, well-documented CLI that agents can learn and operate on their own.
Check out the repo for installation details and full documentation.
Shobha is a data analyst with a proven track record of developing machine learning solutions that drive business value.



