MiniMax-M2.5

Self-hosted AI inference API — 128K context, function calling, reasoning

SWE-Bench: 80.2% · Multi-SWE-Bench: 51.3%

Quick Start

Endpoint: https://gpu-workspace.taile8dc37.ts.net/minimax/v1
Auth:     Authorization: Bearer YOUR_API_KEY
Model:    minimax-m2.5
curl https://gpu-workspace.taile8dc37.ts.net/minimax/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

API Reference

POST /v1/chat/completions

Standard OpenAI-compatible chat completions endpoint. Supports streaming (stream: true), function calling (tools), reasoning output (reasoning_content), and up to 128K context.
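Since the endpoint advertises function calling, a request with a tools array follows the standard OpenAI shape. The sketch below builds such a payload; the get_weather tool is a hypothetical example, not part of this API.

```python
# Sketch of an OpenAI-style function-calling payload.
# get_weather is a made-up tool used only for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "minimax-m2.5",
    "messages": [{"role": "user", "content": "Weather in Berlin?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
```

With the OpenAI SDK this maps to client.chat.completions.create(**payload); any tool call the model makes comes back in response.choices[0].message.tool_calls.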

GET /v1/models

List available models.

GET /health/liveliness

Health check. Returns 200 when the server is up.

Models

Model ID                Context  Description
minimax-m2.5            128K     Recommended
MiniMaxAI/MiniMax-M2.5  128K     Full-name alias

Pricing

Input:  $0.30 / 1M tokens
Output: $1.20 / 1M tokens

Limits

Max concurrent: 16 requests
Max context:    131,072 tokens (128K)
Timeout:        600 seconds

Reasoning

MiniMax-M2.5 emits chain-of-thought reasoning in <think> blocks. When available, the API separates the reasoning into the reasoning_content field of the message; the final answer stays in content.

Integrations

Claude Code

export ANTHROPIC_BASE_URL="https://gpu-workspace.taile8dc37.ts.net/minimax/v1"
export ANTHROPIC_API_KEY="YOUR_API_KEY"
claude --model minimax-m2.5

Codex (OpenAI CLI)

export OPENAI_BASE_URL="https://gpu-workspace.taile8dc37.ts.net/minimax/v1"
export OPENAI_API_KEY="YOUR_API_KEY"
codex --model minimax-m2.5 "your prompt"

Aider

aider --openai-api-base https://gpu-workspace.taile8dc37.ts.net/minimax/v1 \
      --openai-api-key YOUR_API_KEY \
      --model openai/minimax-m2.5

Continue (VS Code / JetBrains)

Add to ~/.continue/config.json:

{
  "models": [{
    "title": "MiniMax-M2.5",
    "provider": "openai",
    "model": "minimax-m2.5",
    "apiBase": "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
    "apiKey": "YOUR_API_KEY"
  }]
}

Cline (VS Code)

In Cline settings: API Provider → OpenAI Compatible, Base URL → https://gpu-workspace.taile8dc37.ts.net/minimax/v1, Model ID → minimax-m2.5

Any OpenAI-compatible client

Base URL: https://gpu-workspace.taile8dc37.ts.net/minimax/v1
API Key:  your API key
Model:    minimax-m2.5
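For clients without an OpenAI SDK, a request is plain JSON over HTTPS. The sketch below builds the request with only the Python standard library; actually sending it is commented out so the snippet stays offline.

```python
import json
import urllib.request

BASE_URL = "https://gpu-workspace.taile8dc37.ts.net/minimax/v1"
API_KEY = "YOUR_API_KEY"

def build_chat_request(prompt: str) -> urllib.request.Request:
    # Assemble the POST body and headers for /chat/completions.
    body = json.dumps({
        "model": "minimax-m2.5",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("Hello!")
# To actually send it (600 s matches the server timeout above):
# with urllib.request.urlopen(req, timeout=600) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```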

Code Examples

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="minimax-m2.5",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)

Python (streaming)

stream = client.chat.completions.create(
    model="minimax-m2.5",
    messages=[{"role": "user", "content": "Write a Redis cache decorator."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gpu-workspace.taile8dc37.ts.net/minimax/v1",
  apiKey: "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model: "minimax-m2.5",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

CLI Tool

Ollama-style CLI for managing the server:

pip install -e .    # from the MiniMax-M2.5 repo

minimax run          # interactive chat with streaming
minimax ps           # server status + GPU usage
minimax serve        # start vLLM + LiteLLM
minimax stop         # stop all servers
minimax tui          # admin TUI (key management)
minimax auth login   # store API key
minimax setup claude # configure Claude Code

Capabilities

Runs on 8x NVIDIA H100 80GB GPUs under vLLM, using tensor parallelism and expert parallelism.