MiniMax M3 vs GPT-5.5 and Gemini 3.1 Pro: 1M Context and Native-Multimodality

M3 is very competitive with GPT-5.5 and Gemini 3.1 Pro, and it uses MiniMax Sparse Attention (MSA) to make long context practical.

The numbers are interesting but the architecture is more interesting.

Personally, I think there is finally a cheap model with vision, long context, and coding-agent capability, but I can't help asking: "Is this benchmark-optimized but weaker in real use? Will it run locally, or is it too large for local hardware?"

M3 is a strong signal that the agent stack is moving from closed frontier APIs toward hybrid, cheaper, multimodal, long-context systems that engineering teams can actually route, evaluate, and eventually self-host.

Here's an independent DeepSWE run:

Let's walk through what shipped, what did not ship yet, how to start using it, and what an M3-based agent workflow should look like if you care about reliability.

If you have started replacing GPT or Gemini models with MiniMax, Kimi, or GLM variants, I'd love to hear about your experience in the comments too.

The 30-second brief

MiniMax M3 was released on June 1, 2026, and the API is available now.

You can call it through:

OpenAI-compatible Chat Completions at https://api.minimax.io/v1
Anthropic-compatible Messages at https://api.minimax.io/anthropic
Claude Code-style tools
OpenCode
Cursor, Cline, Roo Code, TRAE, and other tools that accept custom OpenAI-compatible or Anthropic-compatible endpoints

The model ID is:

MiniMax-M3

The OpenAI-compatible Chat Completions endpoint supports text, image, video, and tool-call content for M3.
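As a minimal sketch of what a mixed text-and-image request might look like, the helper below builds a messages list in the standard OpenAI content-part format. It assumes M3 accepts image_url content parts the same way other OpenAI-compatible vision endpoints do, so check MiniMax's API reference for the exact schema (especially for video) before relying on it. The function name and the example URL are illustrative only.

```python
from __future__ import annotations

from typing import Any


def build_vision_messages(prompt: str, image_url: str) -> list[dict[str, Any]]:
    """Build a single-turn user message with one text part and one image part.

    Assumption: the endpoint accepts OpenAI-style content parts
    ({"type": "text"} and {"type": "image_url"}).
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


messages = build_vision_messages(
    "Describe the layout bug in this screenshot.",
    "https://example.com/screenshot.png",  # placeholder URL
)
# `messages` can then be passed to
# client.chat.completions.create(model="MiniMax-M3", messages=messages)
```

Keeping the payload construction separate from the network call makes it easy to unit test the request shape without spending tokens.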

Pricing has a 512K input-token boundary: calls at or below 512K input tokens are billed at the standard rate, and calls above 512K are billed at a higher long-context rate.
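To make the boundary concrete, here is a rough cost estimator. The article does not quote per-token rates, so both rates are parameters you fill in from the pricing page. The sketch also assumes that "512K" means 512,000 tokens and that a call above the boundary is billed entirely at the long-context rate. Both are assumptions to verify against MiniMax's billing docs.

```python
LONG_CONTEXT_BOUNDARY = 512_000  # input tokens; assumes "512K" means 512,000


def input_cost_usd(
    input_tokens: int,
    standard_rate_per_mtok: float,
    long_context_rate_per_mtok: float,
) -> float:
    """Estimate the input cost of one call in USD.

    Assumption: once a call exceeds the boundary, all of its input tokens
    are billed at the long-context rate (not only the tokens above it).
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens <= LONG_CONTEXT_BOUNDARY:
        rate = standard_rate_per_mtok
    else:
        rate = long_context_rate_per_mtok
    return input_tokens / 1_000_000 * rate


# Placeholder rates (USD per million input tokens), not real prices.
at_boundary = input_cost_usd(512_000, 1.0, 2.0)
just_over = input_cost_usd(512_001, 1.0, 2.0)
```

With these placeholder rates, one extra token roughly doubles the bill, which is a good reason to enforce the boundary in your context builder rather than discover it on the invoice.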

MiniMax's launch lists Token Plan tiers of Plus at $20/month for about 1.7B tokens, Max at $50/month for about 5.1B tokens, and Ultra at $120/month for about 9.8B tokens.
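A quick bit of arithmetic on those tiers, using only the figures quoted above. The token counts are approximate ("about"), so treat the results as rough.

```python
# Token Plan tiers as quoted above: (monthly price in USD, approx. tokens in billions)
TIERS = {
    "Plus": (20, 1.7),
    "Max": (50, 5.1),
    "Ultra": (120, 9.8),
}


def usd_per_billion_tokens(price_usd: float, tokens_billions: float) -> float:
    """Effective price per one billion tokens for a tier."""
    return price_usd / tokens_billions


for name, (price, tokens) in TIERS.items():
    print(f"{name}: ${usd_per_billion_tokens(price, tokens):.2f} per 1B tokens")
```

On the quoted figures, Max works out cheapest per token (about $9.80 per billion), with Plus near $11.76 and Ultra near $12.24.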

Here are the available pieces today:

The most important technical detail is MSA

The big architectural idea in M3 is MiniMax Sparse Attention.

Full attention has the familiar long-context problem: cost grows badly as sequence length increases. For simple product usage, this is annoying. For agents, it is structural.

Agent traces are long by default.

With a normal short-context model, you summarize aggressively and lose detail
With a naive long-context model, you keep everything and pay for it
With a weak retrieval layer, you select the wrong files and the model confidently edits the wrong abstraction

MSA addresses context scaling at the attention layer. It partitions KV into blocks more precisely than approaches like DSA and MoBA, uses a "KV outer gather Q" operator strategy, reads each block once with contiguous memory access, and is more than 4x faster than the open-source Flash-Sparse-Attention and flash-moba under M3's head configuration.
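To build intuition for what "partition KV into blocks and attend to only some of them" means, here is a toy block-sparse attention in pure Python. To be clear, this is not MSA: it is a generic, MoBA-style illustration that scores each KV block by its mean key and keeps the top-k blocks for one query. All function names are mine, and nothing here reflects MiniMax's actual kernel or its "KV outer gather Q" strategy.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention for one query over the given keys/values."""
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def block_sparse_attend(q, keys, values, block_size, top_k):
    """Attend only to the top_k KV blocks whose mean key best matches q."""
    if top_k < 1 or block_size < 1:
        raise ValueError("top_k and block_size must be >= 1")
    blocks = [
        range(start, min(start + block_size, len(keys)))
        for start in range(0, len(keys), block_size)
    ]

    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(len(q))]
        return dot(q, mean_key)

    chosen = sorted(blocks, key=block_score, reverse=True)[:top_k]
    idx = sorted(i for block in chosen for i in block)
    # Only the selected blocks are ever read, which is where the savings come from.
    return attend(q, [keys[i] for i in idx], [values[i] for i in idx])


q = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sparse_out = block_sparse_attend(q, keys, values, block_size=2, top_k=1)
dense_out = attend(q, keys, values)
```

With top_k equal to the number of blocks, the sparse path reduces to dense attention. With top_k=1 in this example, only the first block (the one aligned with the query) is read. Per-query cost scales with top_k times the block size instead of the full sequence length, which is the basic reason sparse attention helps long agent traces.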

Before long context became efficient, you had to build small-context agents around aggressive retrieval. The agent would search, select snippets, summarize, and operate inside a small working set.

That can work, but it creates failure modes: bad retrieval, missing invariants, stale summaries, and hallucinated file relationships.

With a usable 1M context, you can give the model more raw evidence. But if you simply dump everything into the prompt, you create a different failure mode: scope drift.

The model sees too much, reasons over irrelevant details, and burns tokens.

The right architecture is hybrid:

Use retrieval to build a high-signal context bundle.
Use long context for files, logs, specs, and traces that truly need to remain lossless.
Preserve tool-call history only when it affects the next decision.
Summarize low-value history, not high-value evidence.
Keep a hard token budget per task stage.
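The rules above can be sketched as a small context packer. This is one possible shape under stated assumptions: the ContextItem fields, the 4-characters-per-token estimate, and plain truncation as a stand-in for real summarization are all mine. Swap in a real tokenizer and summarizer for anything serious.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    priority: int           # lower number = more important
    lossless: bool = False  # True: include whole or not at all


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def build_bundle(
    items: list[ContextItem], budget_tokens: int
) -> tuple[list[ContextItem], list[str]]:
    """Greedily pack items by priority under a hard token budget.

    Returns (included_items, skipped_names).
    """
    included: list[ContextItem] = []
    skipped: list[str] = []
    remaining = budget_tokens
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if cost <= remaining:
            included.append(item)
            remaining -= cost
        elif not item.lossless and remaining > 0:
            # Low-value history may be shortened; truncation stands in for a summary.
            clipped = item.text[: remaining * 4]
            included.append(ContextItem(item.name, clipped, item.priority))
            remaining -= estimate_tokens(clipped)
        else:
            # High-value evidence is never cut; it is skipped and reported instead.
            skipped.append(item.name)
    return included, skipped


items = [
    ContextItem("spec", "s" * 400, priority=0, lossless=True),
    ContextItem("ci_log", "l" * 800, priority=1, lossless=True),
    ContextItem("chat_history", "c" * 400, priority=2),
]
included, skipped = build_bundle(items, budget_tokens=150)
```

Returning the skipped names matters: the harness can then decide to raise the budget for this stage, fetch the item through a tool call, or fail loudly, instead of silently reasoning without it.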

The slogan should be:

1M context does not replace context engineering. It raises the ceiling for context engineering.

Benchmarks are useful, but scaffolding is the hidden variable

M3 scores 59.0% on SWE-Bench Pro, above GPT-5.5 and Gemini 3.1 Pro and near Opus 4.7 in MiniMax's published comparison.

M3 also surpasses Opus 4.7 on SVG-Bench and scores above Gemini 3.1 Pro on OmniDocBench.

Those numbers are worth paying attention to, but the methodology section is even more important.

SWE-Bench Verified and SWE-Bench Pro were tested on internal infrastructure using Claude Code as scaffolding, with the default system prompt overridden
Terminal-Bench 2.1 used Terminus 2 as scaffolding
SWE Atlas-Codebase QNA used Mini-SWE-Agent for some models
NL2Repo used Claude Code scaffolding for Claude Opus, MiniMax-M2.7, M3, and Gemini 3.1 Pro
GPT-5.5 used Codex scaffolding
OfficeQA Pro used Claude Code scaffolding

So you should not interpret benchmark scores as pure model intelligence because they measure a combination of:

base model capability
system prompt quality
tool schema design
file access strategy
sandbox reliability
retry logic
benchmark harness assumptions
evaluator behavior
timeout limits
max output token settings
context truncation policy

That is why two teams can use the same model and get very different production behavior.

A weak harness can make a strong model look sloppy, and a strong harness can make a cheaper model feel much closer to frontier quality.

This is exactly why Claude Code feels better than "Claude in a text box".
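One practical consequence: when you run your own evals, record the harness next to the score. Below is a minimal sketch of a config record with a fingerprint, so two results are only compared when everything except the model matches. The field names and example values are hypothetical, and this is not how MiniMax or any benchmark actually tracks its setup.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class HarnessConfig:
    model: str
    scaffold: str            # e.g. the agent scaffold used for the run
    system_prompt_sha: str   # hash of the system prompt, not the prompt itself
    max_output_tokens: int
    timeout_s: int
    max_retries: int
    truncation_policy: str

    def fingerprint(self) -> str:
        """Short stable hash of the full configuration."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def comparable(a: HarnessConfig, b: HarnessConfig) -> bool:
    """Two runs are comparable only if everything except the model matches."""
    da, db = asdict(a), asdict(b)
    da.pop("model")
    db.pop("model")
    return da == db


base = HarnessConfig(
    model="MiniMax-M3",
    scaffold="claude-code",
    system_prompt_sha="abc123",  # placeholder value
    max_output_tokens=4096,
    timeout_s=600,
    max_retries=2,
    truncation_policy="drop-oldest-tool-output",
)
other_model = replace(base, model="some-other-model")  # placeholder model name
other_scaffold = replace(base, scaffold="codex")
```

Here base and other_model are comparable, while base and other_scaffold are not, which mirrors the situation in the methodology list above where different models ran under different scaffolding.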

Where M3 fits in a serious agent stack

M3 is best viewed as a high-capability worker model for agentic tasks that need a combination of code understanding, long context, and visual grounding.

Good fits:

Bad fits without extra controls:

The pragmatic stack is a router:

You can think of agents as distributed systems, and of model routing as load balancing for cognition.
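A router can start as a few lines of rules. The sketch below is purely illustrative: "MiniMax-M3" is the model ID from this article, but the other model names, the Task fields, the 32K threshold, and the idea of sending high-stakes work to a separate review model are my assumptions, not a recommendation from MiniMax.

```python
from dataclasses import dataclass


@dataclass
class Task:
    input_tokens: int
    has_images: bool = False
    needs_tools: bool = False
    high_stakes: bool = False  # e.g. touches production or needs human sign-off


# "MiniMax-M3" is the real model ID; the other two names are placeholders.
WORKER_MODEL = "MiniMax-M3"
SMALL_MODEL = "small-fast-model"
REVIEW_MODEL = "frontier-review-model"

LONG_INPUT_THRESHOLD = 32_000  # hypothetical cutoff for "needs long context"


def route(task: Task) -> str:
    """Pick a model for a task using simple, auditable rules."""
    if task.high_stakes:
        return REVIEW_MODEL
    if task.has_images or task.needs_tools or task.input_tokens > LONG_INPUT_THRESHOLD:
        return WORKER_MODEL
    return SMALL_MODEL


choices = [
    route(Task(input_tokens=2_000)),
    route(Task(input_tokens=200_000, needs_tools=True)),
    route(Task(input_tokens=5_000, has_images=True)),
    route(Task(input_tokens=5_000, high_stakes=True)),
]
```

Rule-based routing is crude, but every decision is explainable, and you can log the chosen model per task and tune the thresholds against your own eval results.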

A minimal M3 agent harness

The most useful developer exercise is to build a small harness that makes the model operate through tools.

Do not start with a huge framework.

Start with four tools:

list_files
read_file
search_repo
run_tests

Then add write_file only after you have guardrails.

The goal is to make the model's decisions observable.
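One cheap way to get that observability is an append-only JSONL trace of every tool call. The helper below is a sketch. The record fields and the 200-character preview are arbitrary choices of mine, and you would call it from wherever your loop executes a tool.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from typing import Any


def log_tool_call(
    trace_path: Path, step: int, name: str, args: dict[str, Any], result: str
) -> dict[str, Any]:
    """Append one tool call to a JSONL trace file and return the record."""
    record = {
        "ts": time.time(),
        "step": step,
        "tool": name,
        "args": args,
        "result_chars": len(result),
        "result_preview": result[:200],  # keep traces small; full output stays in messages
    }
    with trace_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

One line per call means you can grep a trace, diff two runs of the same task, or count how often the model reads a file it never uses.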

Here is a compact Python harness using the OpenAI-compatible API. It is intentionally narrow. It only reads files, searches the repo, and runs tests. It does not write code automatically. You can add patch application later after you trust the traces.

# mini_m3_agent.py
# Minimal read/search/test harness for MiniMax-M3.
# Requires: pip install openai

from __future__ import annotations

import json
import os
import subprocess
from pathlib import Path
from typing import Any

from openai import OpenAI

ROOT = Path(os.environ.get("AGENT_REPO", ".")).resolve()
MAX_FILE_CHARS = 20_000
MAX_TOOL_OUTPUT_CHARS = 12_000
MAX_STEPS = 8

client = OpenAI(
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.minimax.io/v1"),
    api_key=os.environ["MINIMAX_API_KEY"],
)

def safe_path(path: str) -> Path:
    candidate = (ROOT / path).resolve()
    # Compare path components, not string prefixes: "/repo-evil" starts with "/repo".
    if candidate != ROOT and ROOT not in candidate.parents:
        raise ValueError(f"Path escapes repo root: {path}")
    return candidate

def truncate(text: str, limit: int = MAX_TOOL_OUTPUT_CHARS) -> str:
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n\n[truncated: {len(text) - limit} chars omitted]"

def list_files(pattern: str = "") -> str:
    cmd = ["git", "ls-files"]
    result = subprocess.run(cmd, cwd=ROOT, capture_output=True, text=True, timeout=10)
    if result.returncode != 0:
        return truncate(result.stderr)

    files = result.stdout.splitlines()
    if pattern:
        files = [f for f in files if pattern.lower() in f.lower()]
    return truncate("\n".join(files), 30_000)

def read_file(path: str) -> str:
    p = safe_path(path)
    if not p.exists() or not p.is_file():
        return f"File not found: {path}"
    return truncate(p.read_text(errors="replace"), MAX_FILE_CHARS)

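def search_repo(query: str) -> str:
    result = subprocess.run(
        # "--" stops option parsing so a query starting with "-" is treated as a pattern.
        # The explicit "." keeps rg from trying to read stdin when run without a terminal.
        ["rg", "-n", "--hidden", "--glob", "!node_modules", "--glob", "!.git", "--", query, "."],
        cwd=ROOT,
        capture_output=True,
        text=True,
        timeout=20,
    )
    if result.returncode == 1:
        # ripgrep exits with 1 when nothing matched.
        return "No matches."
    output = result.stdout if result.stdout else result.stderr
    return truncate(output)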

def run_tests(command: str = "pytest -q") -> str:
    # Allowlist on parsed tokens and run without a shell, so input such as
    # "pytest; rm -rf ." cannot chain extra commands.
    import shlex  # local import keeps this function self-contained

    allowed_prefixes = ["pytest", "npm test", "pnpm test", "go test", "cargo test"]
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        return f"Rejected command: {command}. Could not parse: {exc}"
    if not any(argv[: len(p.split())] == p.split() for p in allowed_prefixes):
        return f"Rejected command: {command}. Allowed prefixes: {allowed_prefixes}"

    result = subprocess.run(
        argv,
        cwd=ROOT,
        capture_output=True,
        text=True,
        timeout=120,
    )
    return truncate(
        f"exit_code={result.returncode}\n\nSTDOUT:\n{result.stdout}\n\nSTDERR:\n{result.stderr}"
    )

TOOL_IMPL = {
    "list_files": list_files,
    "read_file": read_file,
    "search_repo": search_repo,
    "run_tests": run_tests,
}

TOOLS: list[dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "list_files",
            "description": "List tracked repository files. Optionally filter by substring.",
            "parameters": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the repository root.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_repo",
            "description": "Search the repository using ripgrep.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run a constrained test command and return stdout/stderr.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string", "default": "pytest -q"}},
                "required": [],
            },
        },
    },
]

def run_agent(user_task: str) -> str:
    messages: list[dict[str, Any]] = [
        {
            "role": "system",
            "content": (
                "You are a senior software engineer operating through tools. "
                "Do not guess file contents. Inspect before concluding. "
                "Prefer small, reversible changes. "
                "When you have enough evidence, return a concise plan and the exact files likely needing edits."
            ),
        },
        {"role": "user", "content": user_task},
    ]

    for step in range(MAX_STEPS):
        response = client.chat.completions.create(
            model="MiniMax-M3",
            messages=messages,
            tools=TOOLS,
            max_completion_tokens=4096,
            temperature=0.3,
            extra_body={"thinking": {"type": "adaptive"}, "reasoning_split": True},
        )

        msg = response.choices[0].message

        # MiniMax docs note that full assistant messages should be preserved in multi-turn tool conversations.
        messages.append(msg.model_dump(exclude_none=True))

        if not msg.tool_calls:
            return msg.content or ""

        for call in msg.tool_calls:
            name = call.function.name

            if name not in TOOL_IMPL:
                tool_result = f"Unknown tool: {name}"
            else:
                try:
                    # Parse inside the try so malformed JSON arguments are reported
                    # back to the model instead of crashing the loop.
                    args = json.loads(call.function.arguments or "{}")
                    tool_result = TOOL_IMPL[name](**args)
                except Exception as exc:
                    tool_result = f"Tool error: {type(exc).__name__}: {exc}"

            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call.id,
                    "name": name,
                    "content": tool_result,
                }
            )

    return "Stopped: max agent steps reached. Review the trace and narrow the task."

if __name__ == "__main__":
    import sys

    task = " ".join(sys.argv[1:]) or "Find the most likely cause of the failing tests."
    print(run_agent(task))

Run it:

export MINIMAX_API_KEY="..."
export OPENAI_BASE_URL="https://api.minimax.io/v1"
export AGENT_REPO="/path/to/your/repo"

python mini_m3_agent.py "Investigate why the auth tests fail after the session middleware refactor."
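When you are ready to add write_file, one possible guardrail is a dry-run-by-default writer that always returns a diff and only touches disk when explicitly told to. This is a sketch, not part of the harness above: guarded_write, its apply flag, and the size cap are hypothetical names and values of my own.

```python
from __future__ import annotations

import difflib
from pathlib import Path

MAX_WRITE_CHARS = 50_000  # hypothetical cap on a single write


def guarded_write(root: Path, rel_path: str, new_text: str, apply: bool = False) -> str:
    """Return a unified diff for the proposed change; write only if apply=True."""
    root = root.resolve()
    target = (root / rel_path).resolve()
    # Compare path components so "../" tricks and sibling directories are rejected.
    if root not in target.parents:
        raise ValueError(f"Path escapes repo root: {rel_path}")
    if len(new_text) > MAX_WRITE_CHARS:
        raise ValueError("Refusing to write: content exceeds size cap")

    old_text = ""
    if target.is_file():
        old_text = target.read_text(encoding="utf-8", errors="replace")

    diff = "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{rel_path}",
            tofile=f"b/{rel_path}",
        )
    )
    if apply:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(new_text, encoding="utf-8")
    return diff
```

A reasonable rollout is to expose this to the model with apply fixed to False, review the diffs in your traces for a while, and only then let a human-approved step flip it to True.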

I'm thinking of covering the latest open-source models in June. Let me know what you would like to see, and happy building!
