June 2, 2026
MiniMax M3 vs GPT-5.5 and Gemini 3.1 Pro: 1M Context and Native Multimodality
M3 combines three capabilities that agentic software products have been waiting to get in one stack: native multimodality (vision), a 1M-token context window, and coding-agent capability.
Agent Native
M3 is highly competitive with GPT-5.5 and Gemini 3.1 Pro, and it uses MiniMax Sparse Attention (MSA) to make long context practical.
The numbers are interesting, but the architecture is more interesting.
Personally, I think there is finally a cheap model with vision, long context, and coding-agent capability, but I also can't help asking: "Is it benchmark-optimized but weaker in real use? Will it run locally, or is it too large for local hardware?"
M3 is a strong signal that the agent stack is moving from closed frontier APIs toward hybrid, cheaper, multimodal, long-context systems that engineering teams can actually route, evaluate, and eventually self-host.
Here is an independent DeepSWE run:
Let's walk through what shipped, what did not ship yet, how to start using it, and what an M3-based agent workflow should look like if you care about reliability.
If you have started replacing GPT or Gemini models with MiniMax, Kimi, or GLM variants, I'd love to hear about your experience in the comments too.
The 30-second brief
MiniMax M3 was released on June 1, 2026, and the API is available now.
You can call it through:
- OpenAI-compatible Chat Completions at https://api.minimax.io/v1
- Anthropic-compatible Messages at https://api.minimax.io/anthropic
- Claude Code-style tools
- OpenCode
- Cursor, Cline, Roo Code, TRAE, and other tools that accept custom OpenAI-compatible or Anthropic-compatible endpoints
The model ID is:
MiniMax-M3
The OpenAI-compatible Chat Completions endpoint supports text, image, video, and tool-call content for M3.
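To make the multimodal part concrete, here is a minimal sketch of a request payload that pairs a text prompt with an image. It assumes M3 accepts OpenAI-style `image_url` content parts on this endpoint; the launch says images are supported but this post does not show the wire format, so check MiniMax's API docs (video is left out for the same reason). The helper names and the example URL are hypothetical.

```python
# Sketch: an OpenAI-style multimodal user message for MiniMax-M3.
# Assumption: content-part shapes ("text", "image_url") follow the OpenAI
# Chat Completions format. Helper names and the URL are hypothetical.
from typing import Any, Dict, List


def build_image_message(prompt: str, image_url: str) -> Dict[str, Any]:
    """Return one user message that pairs a text prompt with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def build_request(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Keyword arguments for client.chat.completions.create(...)."""
    return {"model": "MiniMax-M3", "messages": messages}


if __name__ == "__main__":
    msg = build_image_message(
        "Which component in this screenshot renders the login form?",
        "https://example.com/screenshot.png",  # hypothetical URL
    )
    print(build_request([msg]))
```

You would pass the result as `client.chat.completions.create(**build_request([msg]))` with an `OpenAI` client whose `base_url` points at https://api.minimax.io/v1.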
Pricing has a 512K input-token boundary: calls at or below 512K input tokens are billed at the standard rate, and calls above 512K are billed at a higher long-context rate.
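The boundary rule is easy to encode so a cost estimator or router can see which tier a call lands in. This is a sketch under two assumptions: "512K" is read as 512,000 tokens (MiniMax may mean 524,288), and the per-million-token rates are parameters you fill in, not MiniMax's actual prices.

```python
# Sketch: which price tier a call falls into, per the 512K input-token rule.
# Assumption: "512K" = 512,000 tokens. Rates are caller-supplied placeholders.

LONG_CONTEXT_BOUNDARY = 512_000  # assumed reading of "512K"


def pricing_tier(input_tokens: int, boundary: int = LONG_CONTEXT_BOUNDARY) -> str:
    """At or below the boundary -> standard; above it -> long-context."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return "standard" if input_tokens <= boundary else "long_context"


def input_cost(input_tokens: int, standard_rate: float, long_rate: float) -> float:
    """Input cost in dollars from $/1M-token rates.

    The whole call is billed at one tier's rate, since the rule is stated
    per call rather than per token.
    """
    rate = standard_rate if pricing_tier(input_tokens) == "standard" else long_rate
    return input_tokens / 1_000_000 * rate
```

Because the tier is decided per call, trimming a prompt from just over the boundary to just under it changes the rate for every input token in that call.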
MiniMax's launch lists Token Plan tiers of Plus at $20/month for about 1.7B tokens, Max at $50/month for about 5.1B tokens, and Ultra at $120/month for about 9.8B tokens.
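A quick way to compare the tiers is the effective price per million tokens. This sketch takes the quoted approximate token counts at face value, so the results are rough comparisons, not billing math.

```python
# Effective $ per 1M tokens for the Token Plan tiers quoted in the launch.
# Token counts are approximate ("about 1.7B", etc.).

TOKEN_PLANS = {
    "Plus": (20, 1.7e9),    # ($/month, approx tokens/month)
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def dollars_per_million(price: float, tokens: float) -> float:
    return price / (tokens / 1_000_000)


if __name__ == "__main__":
    for name, (price, tokens) in TOKEN_PLANS.items():
        print(f"{name:>5}: ${dollars_per_million(price, tokens):.4f} per 1M tokens")
```

With these figures, Max works out cheapest per token (about $0.0098 per million), Plus is about $0.0118, and Ultra about $0.0122, so Ultra buys more volume rather than a lower unit price.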
Here are the available pieces today:
The most important technical detail is MSA
The big architectural idea in M3 is MiniMax Sparse Attention.
Full attention has the familiar long-context problem: compute grows quadratically with sequence length, because every token attends to every other token. For simple product usage, this is annoying. For agents, it is structural.
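The quadratic growth is easy to underestimate, so here is the arithmetic. This sketch only counts query-key score entries per head per layer; it ignores KV-cache memory and causal masking (which halves the count but keeps it quadratic), and 128K is just a hypothetical "shorter" context for comparison.

```python
# Rough arithmetic for full attention: every query scores every key,
# so score entries per head per layer grow as n * n.


def dense_score_entries(n_tokens: int) -> int:
    return n_tokens * n_tokens


def growth_factor(short: int, long: int) -> float:
    """How many times more score entries the longer context needs."""
    return dense_score_entries(long) / dense_score_entries(short)


if __name__ == "__main__":
    # 128K -> 1M tokens is ~7.8x more context but ~61x more attention scores.
    print(growth_factor(128_000, 1_000_000))
```

Going from 128K to 1M tokens is about 7.8x more context but about 61x more attention work, which is the gap sparse attention is trying to close.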
Agent traces are long by default.
- With a normal short-context model, you summarize aggressively and lose detail
- With a naive long-context model, you keep everything and pay for it
- With a weak retrieval layer, you select the wrong files and the model confidently edits the wrong abstraction
MSA addresses context scaling at the attention layer. It partitions the KV cache into blocks more precisely than approaches like DSA and MoBA, uses a "KV outer gather Q" operator strategy, reads each block once with contiguous memory access, and is more than 4x faster than the open-source Flash-Sparse-Attention and flash-moba kernels under M3's head configuration.
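MSA's kernel is not described in enough detail here to reproduce, so the sketch below shows the generic block-sparse idea it competes with instead: a MoBA-style gate that scores each KV block by the query's dot product with the block's mean key, keeps the top-k blocks, and runs ordinary softmax attention only inside them. It is a single-query, non-causal, pure-Python illustration, not MSA's partitioning or its "KV outer gather Q" operator.

```python
# Generic block-sparse attention for one query (MoBA-style top-k block gating).
# This is NOT MSA's kernel; it only illustrates "attend to selected KV blocks".
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """Plain scaled dot-product attention for a single query."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def block_sparse_attend(q: Vector, keys: List[Vector], values: List[Vector],
                        block_size: int, top_k: int) -> List[float]:
    """Attend only within the top_k key blocks whose mean key best matches q."""
    n, dim = len(keys), len(keys[0])
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]

    def block_affinity(block: List[int]) -> float:
        # Cheap block summary: q . mean(keys in block). Recomputed per call here
        # for brevity; a real kernel would cache block summaries.
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        return dot(q, mean_key)

    chosen = sorted(blocks, key=block_affinity, reverse=True)[:top_k]
    # Restore token order so the result equals dense attention when every block is chosen.
    selected = sorted(i for block in chosen for i in block)
    return attend(q, [keys[i] for i in selected], [values[i] for i in selected])
```

Per query, dense attention scores all n keys, while this scheme scores roughly n / block_size block summaries plus top_k * block_size keys. The quality risk is that the gate picks the wrong blocks, which is why block partitioning and selection are where approaches like DSA, MoBA, and MSA differ.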
Before efficient long context, you had to build small-context agents around aggressive retrieval. The agent would search, select snippets, summarize, and operate inside a small working set.
That can work, but it creates failure modes: bad retrieval, missing invariants, stale summaries, and hallucinated file relationships.
With a usable 1M context, you can give the model more raw evidence, but if you simply dump everything into the prompt, you create a different failure mode: scope drift.
The model sees too much, reasons over irrelevant details, and burns tokens.
The right architecture is hybrid:
- Use retrieval to build a high-signal context bundle.
- Use long context for files, logs, specs, and traces that truly need to remain lossless.
- Preserve tool-call history only when it affects the next decision.
- Summarize low-value history, not high-value evidence.
- Keep a hard token budget per task stage.
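The hybrid rules above can be sketched as a small context-bundle builder. Everything here is an assumption for illustration: the `ContextItem` kinds, the ~4-characters-per-token estimate, the placeholder summarizer, and the per-stage budget numbers are hypothetical, not part of any MiniMax API.

```python
# Sketch of a hybrid context bundle with a hard token budget per stage.
# Item kinds, token heuristic, summarizer, and budgets are all hypothetical.
from dataclasses import dataclass, field
from typing import Callable, List

STAGE_BUDGETS = {"explore": 60_000, "plan": 30_000, "edit": 120_000}  # made-up numbers


@dataclass
class ContextItem:
    text: str
    kind: str                            # "evidence", "tool_call", or "history"
    priority: int = 0                    # higher = more important evidence
    affects_next_decision: bool = False  # only meaningful for "tool_call"


def estimate_tokens(text: str) -> int:
    # Crude ~4 chars/token heuristic; swap in a real tokenizer if you have one.
    return max(1, len(text) // 4)


def naive_summary(text: str) -> str:
    # Placeholder: a real system might ask a cheap model to summarize.
    lines = text.strip().splitlines()
    return "[summary] " + (lines[0][:120] if lines else "")


@dataclass
class Bundle:
    parts: List[str] = field(default_factory=list)
    used: int = 0
    dropped: List[str] = field(default_factory=list)


def build_bundle(items: List[ContextItem], budget: int,
                 summarize: Callable[[str], str] = naive_summary) -> Bundle:
    bundle = Bundle()

    def try_add(text: str) -> bool:
        cost = estimate_tokens(text)
        if bundle.used + cost > budget:
            return False
        bundle.parts.append(text)
        bundle.used += cost
        return True

    # 1. High-value evidence goes in verbatim (lossless) or not at all.
    for item in sorted((i for i in items if i.kind == "evidence"), key=lambda i: -i.priority):
        if not try_add(item.text):
            bundle.dropped.append(item.text[:60])
    # 2. Tool-call history only when it affects the next decision.
    for item in items:
        if item.kind == "tool_call" and item.affects_next_decision:
            if not try_add(item.text):
                bundle.dropped.append(item.text[:60])
    # 3. Low-value history is summarized, never kept verbatim.
    for item in items:
        if item.kind == "history" and not try_add(summarize(item.text)):
            bundle.dropped.append(item.text[:60])
    return bundle
```

Evidence is never truncated mid-file: it either fits whole or is listed in `dropped`, which gives the agent an explicit signal to retrieve more narrowly instead of silently reasoning over half a file. A call such as `build_bundle(items, STAGE_BUDGETS["plan"])` enforces the stage budget.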
The slogan should be:
1M context does not replace context engineering. It raises the ceiling for context engineering.
Benchmarks are useful, but scaffolding is the hidden variable
M3 scores 59.0% on SWE-Bench Pro, above GPT-5.5 and Gemini 3.1 Pro and close to Opus 4.7 in MiniMax's published comparison.
M3 also surpasses Opus 4.7 on SVG-Bench and scores above Gemini 3.1 Pro on OmniDocBench.
Those numbers are worth paying attention to, but the methodology section is even more important.
- SWE-Bench Verified and SWE-Bench Pro were tested on internal infrastructure using Claude Code as scaffolding, with the default system prompt overridden
- Terminal-Bench 2.1 used Terminus 2 as scaffolding
- SWE Atlas-Codebase QNA used Mini-SWE-Agent for some models
- NL2Repo used Claude Code scaffolding for Claude Opus, MiniMax-M2.7, M3, and Gemini 3.1 Pro
- GPT-5.5 used Codex scaffolding
- OfficeQA Pro used Claude Code scaffolding
So you should not interpret benchmark scores as pure model intelligence because they measure a combination of:
- base model capability
- system prompt quality
- tool schema design
- file access strategy
- sandbox reliability
- retry logic
- benchmark harness assumptions
- evaluator behavior
- timeout limits
- max output token settings
- context truncation policy
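One practical response is to record the harness variables alongside every score, so you can see what differs before comparing two numbers. The `EvalRun` schema below is a hypothetical sketch covering a subset of the factors listed above; adapt the fields to whatever your harness actually controls.

```python
# Hypothetical schema: record harness settings next to each benchmark score.
from dataclasses import asdict, dataclass, fields
from typing import Dict, Tuple


@dataclass(frozen=True)
class EvalRun:
    model: str
    benchmark: str
    scaffold: str              # e.g. "Claude Code", "Codex", "Terminus 2"
    system_prompt_id: str      # name or hash of the system prompt actually used
    tool_schema_version: str
    max_retries: int
    timeout_s: int
    max_output_tokens: int
    context_truncation: str    # e.g. "none", "drop-oldest", "summarize"
    score: float


COMPARABLE_FIELDS = [f.name for f in fields(EvalRun) if f.name not in ("model", "score")]


def confounders(a: EvalRun, b: EvalRun) -> Dict[str, Tuple[object, object]]:
    """Fields, other than model and score, on which two runs differ."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in COMPARABLE_FIELDS if da[k] != db[k]}
```

For example, the NL2Repo setup described above would show `scaffold` as a confounder between GPT-5.5 (Codex) and the models run under Claude Code, which is exactly the caveat to carry into any head-to-head reading.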
That is why two teams can use the same model and get very different production behavior.
A weak harness can make a strong model look sloppy, and a strong harness can make a cheaper model feel much closer to frontier quality.
This is exactly why Claude Code feels better than "Claude in a text box".
Where M3 fits in a serious agent stack
M3 is best viewed as a high-capability worker model for agentic tasks that need a combination of code understanding, long context, and visual grounding.
Good fits:
Bad fits without extra controls:
The pragmatic stack is a router:
You can think of agents as distributed systems, and of model routing as load balancing for cognition.
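A router does not need to be clever to be useful. The sketch below is a hypothetical rule-based policy: only "MiniMax-M3" comes from this article; the other model names, the 8,000-token threshold, and the risk labels are placeholders to replace with your own tiers and evals.

```python
# Hypothetical rule-based model router. Only "MiniMax-M3" is from the article;
# other model names and thresholds are placeholders.
from dataclasses import dataclass
from typing import List

CHEAP_MODEL = "cheap-small-model"            # hypothetical
WORKER_MODEL = "MiniMax-M3"
FRONTIER_MODEL = "frontier-reviewer-model"   # hypothetical
SMALL_TASK_TOKENS = 8_000                    # hypothetical threshold


@dataclass
class Task:
    needs_vision: bool = False
    input_tokens: int = 0
    risk: str = "low"  # "low" or "high" (e.g. touches auth, payments, migrations)


def route(task: Task) -> List[str]:
    """Return the model chain: a worker first, optionally a reviewer after."""
    if task.input_tokens < SMALL_TASK_TOKENS and not task.needs_vision and task.risk == "low":
        return [CHEAP_MODEL]
    chain = [WORKER_MODEL]  # long context, code understanding, visual grounding
    if task.risk == "high":
        chain.append(FRONTIER_MODEL)  # second opinion before anything merges
    return chain
```

Returning a chain rather than a single model keeps review as an explicit routing decision, which makes it easy to log and to evaluate separately.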
A minimal M3 agent harness
The most useful developer exercise is to build a small harness that makes the model operate through tools.
Do not start with a huge framework.
Start with four tools:
- list_files
- read_file
- search_repo
- run_tests
Then add write_file only after you have guardrails.
The goal is to make the model's decisions observable.
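What "guardrails" means is up to you; here is one minimal sketch of a `write_file` tool that defaults to a dry run, returns a unified diff, and records every attempt in a trace. The path rules, size limit, protected names, and the `guarded_write_file` name are hypothetical choices, not part of the harness below.

```python
# Hypothetical guarded write tool: dry-run by default, diff output, full trace.
import difflib
import time
from pathlib import Path
from typing import List, Optional

ROOT = Path(".").resolve()       # assumption: same repo root the harness uses
MAX_WRITE_CHARS = 50_000         # hypothetical size limit per write
PROTECTED = {".git", ".env"}     # hypothetical path parts the agent may never touch
TRACE: List[dict] = []           # in-memory trace; persist as JSONL in practice


def guarded_write_file(path: str, new_text: str, apply: bool = False,
                       root: Optional[Path] = None) -> str:
    """Return a diff of the proposed write; touch disk only if apply=True and checks pass."""
    base = (root or ROOT).resolve()
    target = (base / path).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        verdict = "rejected: path escapes repo root"
    elif PROTECTED & set(target.relative_to(base).parts):
        verdict = "rejected: protected path"
    elif len(new_text) > MAX_WRITE_CHARS:
        verdict = "rejected: content too large"
    else:
        verdict = "ok"

    diff = ""
    if verdict == "ok":
        old_text = target.read_text(errors="replace") if target.is_file() else ""
        diff = "".join(difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        ))
        if apply:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(new_text)

    # Every attempt, accepted or not, lands in the trace so decisions stay observable.
    TRACE.append({"ts": time.time(), "tool": "write_file", "path": path,
                  "verdict": verdict, "applied": verdict == "ok" and apply})
    return diff if verdict == "ok" else verdict
```

Exposing only the dry-run form to the model at first lets you review proposed diffs in the trace before flipping `apply=True` behind a human or CI check.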
Here is a compact Python harness using the OpenAI-compatible API. It is intentionally narrow. It only reads files, searches the repo, and runs tests. It does not write code automatically. You can add patch application later after you trust the traces.
```python
# mini_m3_agent.py
# Minimal read/search/test harness for MiniMax-M3.
# Requires: Python 3.9+, pip install openai, plus git and ripgrep (rg) on PATH.
from __future__ import annotations

import json
import os
import shlex
import subprocess
from pathlib import Path
from typing import Any

from openai import OpenAI

ROOT = Path(os.environ.get("AGENT_REPO", ".")).resolve()
MAX_FILE_CHARS = 20_000
MAX_TOOL_OUTPUT_CHARS = 12_000
MAX_STEPS = 8

client = OpenAI(
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.minimax.io/v1"),
    api_key=os.environ["MINIMAX_API_KEY"],
)


def safe_path(path: str) -> Path:
    candidate = (ROOT / path).resolve()
    # is_relative_to() compares path components, so a sibling such as
    # "/repo-evil" no longer passes a check against "/repo" (str.startswith did).
    if not candidate.is_relative_to(ROOT):
        raise ValueError(f"Path escapes repo root: {path}")
    return candidate


def truncate(text: str, limit: int = MAX_TOOL_OUTPUT_CHARS) -> str:
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n\n[truncated: {len(text) - limit} chars omitted]"


def list_files(pattern: str = "") -> str:
    result = subprocess.run(
        ["git", "ls-files"], cwd=ROOT, capture_output=True, text=True, timeout=10
    )
    if result.returncode != 0:
        return truncate(result.stderr)
    files = result.stdout.splitlines()
    if pattern:
        files = [f for f in files if pattern.lower() in f.lower()]
    return truncate("\n".join(files), 30_000)


def read_file(path: str) -> str:
    p = safe_path(path)
    if not p.is_file():
        return f"File not found: {path}"
    return truncate(p.read_text(errors="replace"), MAX_FILE_CHARS)


def search_repo(query: str) -> str:
    # "--" ends option parsing, so a query like "-foo" is searched, not parsed as a flag.
    result = subprocess.run(
        ["rg", "-n", "--hidden", "--glob", "!node_modules", "--glob", "!.git", "--", query],
        cwd=ROOT,
        capture_output=True,
        text=True,
        timeout=20,
    )
    if result.returncode == 1:  # ripgrep exits with 1 when nothing matched
        return "No matches."
    return truncate(result.stdout or result.stderr)


# Allowed test commands as argv prefixes, compared token by token.
ALLOWED_TEST_COMMANDS = [["pytest"], ["npm", "test"], ["pnpm", "test"], ["go", "test"], ["cargo", "test"]]


def run_tests(command: str = "pytest -q") -> str:
    # Split into argv and run without a shell: operators like ";" or "&&" reach
    # the test runner as plain arguments instead of starting a second command.
    argv = shlex.split(command)
    if not any(argv[: len(prefix)] == prefix for prefix in ALLOWED_TEST_COMMANDS):
        allowed = [" ".join(p) for p in ALLOWED_TEST_COMMANDS]
        return f"Rejected command: {command}. Allowed prefixes: {allowed}"
    result = subprocess.run(argv, cwd=ROOT, capture_output=True, text=True, timeout=120)
    return truncate(
        f"exit_code={result.returncode}\n\nSTDOUT:\n{result.stdout}\n\nSTDERR:\n{result.stderr}"
    )


TOOL_IMPL = {
    "list_files": list_files,
    "read_file": read_file,
    "search_repo": search_repo,
    "run_tests": run_tests,
}

TOOLS: list[dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "list_files",
            "description": "List tracked repository files. Optionally filter by substring.",
            "parameters": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the repository root.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_repo",
            "description": "Search the repository using ripgrep.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run a constrained test command and return stdout/stderr.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string", "default": "pytest -q"}},
                "required": [],
            },
        },
    },
]


def run_agent(user_task: str) -> str:
    messages: list[dict[str, Any]] = [
        {
            "role": "system",
            "content": (
                "You are a senior software engineer operating through tools. "
                "Do not guess file contents. Inspect before concluding. "
                "Prefer small, reversible changes. "
                "When you have enough evidence, return a concise plan and the exact files likely needing edits."
            ),
        },
        {"role": "user", "content": user_task},
    ]

    for _ in range(MAX_STEPS):
        response = client.chat.completions.create(
            model="MiniMax-M3",
            messages=messages,
            tools=TOOLS,
            max_completion_tokens=4096,
            temperature=0.3,
            extra_body={"thinking": {"type": "adaptive"}, "reasoning_split": True},
        )
        msg = response.choices[0].message
        # MiniMax docs note that full assistant messages should be preserved in multi-turn tool conversations.
        messages.append(msg.model_dump(exclude_none=True))

        if not msg.tool_calls:
            return msg.content or ""

        for call in msg.tool_calls:
            name = call.function.name
            try:
                # Malformed JSON arguments become a tool error the model can see and correct.
                args = json.loads(call.function.arguments or "{}")
                if name not in TOOL_IMPL:
                    tool_result = f"Unknown tool: {name}"
                else:
                    tool_result = TOOL_IMPL[name](**args)
            except Exception as exc:
                tool_result = f"Tool error: {type(exc).__name__}: {exc}"
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call.id,
                    "name": name,
                    "content": tool_result,
                }
            )

    return "Stopped: max agent steps reached. Review the trace and narrow the task."


if __name__ == "__main__":
    import sys

    task = " ".join(sys.argv[1:]) or "Find the most likely cause of the failing tests."
    print(run_agent(task))
```
Run it:
```bash
export MINIMAX_API_KEY="..."
export OPENAI_BASE_URL="https://api.minimax.io/v1"
export AGENT_REPO="/path/to/your/repo"
python mini_m3_agent.py "Investigate why the auth tests fail after the session middleware refactor."
```
I'm thinking of covering the latest open-source models in June. Let me know what you would like to see, and happy building!