Illustrative estimates. The ~90% is Claude-side token reduction on analysis-heavy tasks — not total cost: Kimi's own subscription still applies. The monorepo row mirrors the example below; the PDF/commits rows are rough sketches.
Delegate codebase analysis from Claude to Kimi Code (kimi-for-coding, 256K) — cut Claude-side token cost ~90%.
| Task | Claude only | Claude + kimi-code-mcp | Claude-side savings |
|---|---|---|---|
| Analyze 200-file monorepo | ~250K tok | ~25K tok | ~90% |
| Summarize 50-page RFC PDF | ~60K tok | ~6K tok | ~90% (sketch) |
| Cross-reference 100 commits | ~80K tok | ~8K tok | ~90% (sketch) |
Illustrative estimates: savings are on Claude tokens only and depend on the task; Kimi's subscription cost is separate. See Token Economics.
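The savings column in the table above is plain arithmetic on the two token counts. A minimal sketch, assuming the table's rough estimates as inputs; `claudeSideSavings` is a hypothetical helper, not part of the package:

```typescript
// Illustrative arithmetic only: inputs are the rough estimates from the table,
// and Kimi's own subscription cost is deliberately not modeled.
function claudeSideSavings(claudeOnlyTokens: number, withKimiTokens: number): number {
  if (claudeOnlyTokens <= 0) {
    throw new RangeError("claudeOnlyTokens must be positive");
  }
  // Fraction of Claude tokens avoided by delegating the reading.
  return 1 - withKimiTokens / claudeOnlyTokens;
}

// Monorepo row: ~250K tokens without delegation, ~25K with it.
const monorepoSavings = claudeSideSavings(250000, 25000); // about 0.9
```

The same function applied to the PDF and commits rows gives the same ratio, since each row assumes a tenfold reduction.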
# 1. Install Kimi CLI and log in
curl -L code.kimi.com/install.sh | bash
kimi login
# 2. Install via npm
npm install -g kimi-mcp-server

Add to .mcp.json (project-level or ~/.claude/mcp.json for global):

{
  "mcpServers": {
    "kimi-code": {
      "command": "npx",
      "args": ["-y", "kimi-mcp-server"]
    }
  }
}

Run /mcp in Claude Code to verify: you should see kimi-code with 8 tools.
Tip
You don't need the CLI for the common case. kimi_query and kimi_verify call the Kimi Code API directly — no Python CLI install or kimi login required. Just provide an API key via $KIMICODE_API_KEY or ~/.kimi/config.toml (see Kimi Code API Setup). Only the codebase-reading tools (kimi_analyze, kimi_resume) need the CLI. See Two backends: API vs CLI for the full split.
- Claude calls the kimi_analyze tool when a task needs bulk codebase reading.
- MCP routes the request to Kimi Code (kimi-for-coding, 256K context), and Kimi reads the entire codebase in one pass.
- The result is piped back as a structured response, and Claude acts on it with precise, targeted edits.
┌──────────────┐ stdio/MCP ┌──────────────┐ subprocess ┌──────────────┐
│ Claude Code │ ◄──────────► │ kimi-code-mcp│ ────────────► │ Kimi CLI │
│ (conductor) │ │ (MCP server) │ │ (256K ctx) │
└──────────────┘ └──────────────┘ └──────────────┘
The server reaches Kimi two different ways, and each tool uses the one that fits its job. Knowing which is which tells you what you need to set up.
| Backend | How it talks to Kimi | What it needs | Sees your codebase? |
|---|---|---|---|
| Direct API | HTTPS to api.kimi.com/coding/v1 | An API key only ($KIMICODE_API_KEY or ~/.kimi/config.toml) | ❌ No, you paste in the context |
| Local CLI | Spawns the kimi binary as a subprocess | CLI installed and kimi login done | ✅ Yes, reads files from disk |
| Tool | Backend | Why |
|---|---|---|
| kimi_query | API (CLI only if no key configured) | Contextless Q&A: no codebase needed, so the API is simpler and has no login dependency |
| kimi_verify | API | You pass the code/diff/claim inline; Kimi judges it as an independent third party |
| kimi_analyze | CLI | Must read your whole codebase (256K ctx) from disk |
| kimi_resume | CLI | Continues a stateful CLI session that holds prior codebase context |
| kimi_list_sessions, kimi_cache_*, kimi_status | local | Read local session/cache metadata |
Important
Most users only need the API key. If you just want a second opinion / verification (kimi_query, kimi_verify), set the API key and you're done — skip the CLI entirely. Install + kimi login only when you want Kimi to read your codebase via kimi_analyze / kimi_resume.
Run kimi_status any time to see which backends are live — it reports the API-configured state and the CLI install/auth state separately.
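The split between the two backends can be summarized as a small lookup. This is only a sketch of the routing described in the tables above, not the server's actual code; `pickBackend`, `Backend`, and `ToolName` are hypothetical names:

```typescript
type Backend = "api" | "cli" | "local";

type ToolName =
  | "kimi_query"
  | "kimi_verify"
  | "kimi_analyze"
  | "kimi_resume"
  | "kimi_list_sessions"
  | "kimi_status"
  | "kimi_cache_status"
  | "kimi_cache_invalidate";

// Hypothetical helper mirroring the Tool/Backend table.
function pickBackend(tool: ToolName, apiKeyConfigured: boolean): Backend {
  switch (tool) {
    case "kimi_query":
      // API when a key is configured, CLI only as a fallback.
      return apiKeyConfigured ? "api" : "cli";
    case "kimi_verify":
      return "api";
    case "kimi_analyze":
    case "kimi_resume":
      // These need to read the codebase from disk.
      return "cli";
    default:
      // Session, cache, and status tools only read local metadata.
      return "local";
  }
}
```

Read this way, an API key alone covers `kimi_query` and `kimi_verify`, while the CLI is needed only for the two codebase-reading tools.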
If you are an AI agent (Claude Code, a subagent, etc.) deciding when to call these tools:
- Cross-check your own work before committing → kimi_verify. Paste the actual diff/code/claim plus the surrounding context (goal, constraints, signatures). Kimi sees only the context string: no repo, no session history. Vague context → useless review.
- Quick model-agnostic programming question → kimi_query. No codebase needed. Returns a different model's opinion.
- Need to understand a large/unfamiliar codebase → kimi_analyze with work_dir. Prefer this over reading 50 files yourself; it's ~10× cheaper in Claude tokens. Requires the CLI to be installed and logged in.
- Drill deeper after an analyze → kimi_resume with the returned session_id (retains up to 256K tokens of prior context).
- Don't know why a Kimi call failed → kimi_status first. "Not authenticated" on the CLI does not affect kimi_query / kimi_verify (those use the API).
- Keep outputs lean. Default to detail_level: summary for orientation; raise to normal / detailed only when you need code snippets. Bigger output = more Claude tokens, defeating the purpose.
- Skip Kimi for small/single-file work: Claude reading directly is faster under ~10 files.
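The guidance above can be condensed into a decision helper. A rough sketch only: `suggestTool`, the `Task` shape, and the `read_directly` value are hypothetical, and the 10-file threshold is the approximate figure from the last bullet rather than a hard rule:

```typescript
type TaskKind = "verify" | "question" | "understand_codebase" | "follow_up";

interface Task {
  kind: TaskKind;
  fileCount: number;   // rough number of files the task touches
  sessionId?: string;  // session_id returned by an earlier kimi_analyze, if any
}

type Suggestion =
  | "kimi_verify"
  | "kimi_query"
  | "kimi_analyze"
  | "kimi_resume"
  | "read_directly";

// Approximate cut-off below which Claude reading the files itself is faster.
const SMALL_CODEBASE_FILES = 10;

function suggestTool(task: Task): Suggestion {
  switch (task.kind) {
    case "verify":
      return "kimi_verify";
    case "question":
      return "kimi_query";
    case "follow_up":
      // Resuming needs a session to resume; otherwise start a fresh analysis.
      return task.sessionId ? "kimi_resume" : "kimi_analyze";
    case "understand_codebase":
      return task.fileCount < SMALL_CODEBASE_FILES ? "read_directly" : "kimi_analyze";
  }
}
```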
MCP server that connects Kimi Code (model kimi-for-coding, 256K context, auto-upgraded) with Claude Code — letting Claude orchestrate while Kimi handles the heavy reading.
Kimi Code sits on the efficiency frontier — near-Claude intelligence at 10x lower cost. kimi.com/code
Tip
Stop paying Claude to read files. Kimi Code delivers frontier-class code intelligence at a fraction of the cost (see chart above). Delegate bulk codebase scanning to Kimi (256K context, near-zero cost) and let Claude focus on what it does best — reasoning, decisions, and precise code edits. One kimi_analyze call can replace 50+ file reads.
Kimi Code is an AI code agent by Moonshot AI. The model ID kimi-for-coding (1T MoE, 256K context) automatically receives backend upgrades — no version pinning required. It works across Terminal, IDE, and CLI — writing, debugging, refactoring, and analyzing code autonomously.
Key specs:
- 256K token context — reads entire codebases in one pass
- Parallel agent spawning — handles concurrent tasks
- Shell, file, and web access — full developer toolchain
- Install: curl -L code.kimi.com/install.sh | bash
Warning
Kimi Code membership required. All tools ultimately hit Kimi Code, which needs an active Kimi Code plan. The API tools (kimi_query, kimi_verify) authenticate with an API key; the codebase tools (kimi_analyze, kimi_resume) additionally need the CLI installed + kimi login. See kimi.com/code for pricing tiers and quotas.
If you prefer to build locally instead of using the npm package:
git clone https://github.com/howardpen9/kimi-code-mcp.git
cd kimi-code-mcp && npm install && npm run build

{
  "mcpServers": {
    "kimi-code": {
      "command": "node",
      "args": ["/absolute/path/to/kimi-code-mcp/dist/index.js"]
    }
  }
}

Note
Kimi Code API and Moonshot API are separate providers — their API keys are not interchangeable.
There are two ways to configure the Kimi Code API for the CLI:
In the Kimi Code CLI shell, run:
kimi

Then use the /login (or /setup) command:
/login
- Select Kimi Code as the platform
- Your browser opens for OAuth authorization
- Config is saved automatically to ~/.kimi/config.toml
Note
zsh: command not found: kimi after install? The installer puts the binary at ~/.local/bin/kimi, which may not be on your PATH. Add it (then restart your shell or open a new tab):
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc

The MCP server calls the binary by absolute path, so this only affects running kimi yourself in a terminal (e.g. for kimi login).
- Visit code.kimi.com
- Sign in → Settings → API Keys
- Create a new key (starts with sk-, shown only once)
nano ~/.kimi/config.toml

Add:
[providers.kimi-code]
type = "kimi"
base_url = "https://api.kimi.com/coding/v1"
api_key = "sk-your-api-key"
[models.kimi-for-coding]
provider = "kimi-code"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking"]
[defaults]
model = "kimi-for-coding"

# Add to ~/.zshrc (macOS) or ~/.bashrc (Linux)
export KIMICODE_API_KEY="sk-your-api-key"

Then reference it in config.toml:
[providers.kimi-code]
type = "kimi"
base_url = "https://api.kimi.com/coding/v1"
api_key = "${KIMICODE_API_KEY}"

You can configure both Kimi Code and Moonshot side by side:
[providers.kimi-code]
type = "kimi"
base_url = "https://api.kimi.com/coding/v1"
api_key = "${KIMICODE_API_KEY}"
[providers.moonshot-cn]
type = "kimi"
base_url = "https://api.moonshot.cn/v1"
api_key = "${MOONSHOT_API_KEY}"
[models.kimi-for-coding]
provider = "kimi-code"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking"]
[models.kimi-k2]
provider = "moonshot-cn"
model = "kimi-k2-0905-preview"
max_context_size = 256000
capabilities = ["thinking"]
[defaults]
model = "kimi-for-coding"

Switch models at any time with /model or /model kimi-k2 in the CLI.
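The API key can come from `$KIMICODE_API_KEY` directly or from an `api_key` entry in config.toml that may itself contain a `${VAR}` reference. The sketch below shows one way such a lookup could work. It is an assumption for illustration, not the server's or the CLI's actual logic: `resolveApiKey` and `expandEnvRefs` are hypothetical names, and the environment-first precedence is a guess.

```typescript
type Env = Record<string, string | undefined>;

// Replace ${NAME} references with values from env; unset names expand to "".
function expandEnvRefs(value: string, env: Env): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => env[name] ?? ""
  );
}

// Hypothetical lookup order (an assumption): environment variable first,
// then the api_key value read from config.toml.
function resolveApiKey(env: Env, configApiKey?: string): string | undefined {
  const fromEnv = env["KIMICODE_API_KEY"];
  if (fromEnv) return fromEnv;
  if (!configApiKey) return undefined;
  const expanded = expandEnvRefs(configApiKey, env);
  return expanded.length > 0 ? expanded : undefined;
}
```

Passing the environment in as a parameter, rather than reading a global, keeps the sketch easy to test with different configurations.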
| Feature | Kimi Code | Moonshot |
|---|---|---|
| Focus | Optimized for coding | General-purpose chat |
| Endpoint | api.kimi.com/coding/v1 | api.moonshot.cn/v1 |
| API Key | Separate — apply at code.kimi.com | Separate |
| SearchWeb / FetchURL | Built-in | Not available |
| Context | 262,144 tokens (256K) | 256,000 tokens |
Just tell Claude what you need. It will delegate to Kimi automatically:
| Prompt | What happens |
|---|---|
| "Analyze this codebase's architecture" | Kimi reads all files (256K ctx), Claude acts on the report |
| "Scan for security vulnerabilities, then review Kimi's findings" | Kimi audits, Claude cross-examines — AI pair review |
| "Map all dependencies of the auth module, then plan the refactoring" | Kimi builds the dependency graph, Claude plans the changes |
| "Review the recent changes for regressions and edge cases" | Kimi reviews full context (not just the diff), Claude synthesizes |
| "Resume the last Kimi session and ask about the API design" | Kimi retains 256K tokens of context across sessions |
Claude Code is powerful but expensive. Every file it reads costs tokens. Meanwhile, many tasks — pre-reviewing large codebases, scanning for patterns, generating audit reports — are high-certainty work that doesn't need Claude's full reasoning power.
Important
The cost equation: Claude reads 50 files to understand your architecture = expensive. Kimi reads the same 50 files via kimi_analyze = near-zero Claude tokens (the reading is covered by your Kimi plan). Claude then acts on Kimi's structured report = minimal tokens. Total savings: 60-80% fewer Claude tokens on analysis-heavy tasks.
        ┌─────────────────────────────┐
        │     You (the developer)     │
        └──────────────┬──────────────┘
                       │ prompt
                       ▼
        ┌─────────────────────────────┐
        │   Claude Code (conductor)   │
        │   - orchestrates workflow   │
        │   - makes decisions         │
        │   - writes & edits code     │
        └──────┬───────────────┬──────┘
       precise │               │ delegate
         edits │               │ bulk reading
      (tokens) │               │ (FREE)
               ▼               ▼
        ┌──────────┐   ┌──────────────┐
        │  your    │   │  Kimi Code   │
        │ codebase │   │  - 256K ctx  │
        └──────────┘   │  - reads all │
                       │  - reports   │
                       └──────────────┘
- Claude receives your task → decides it needs codebase understanding
- Claude calls kimi_analyze via MCP → Kimi reads the entire codebase (256K context, near-zero cost)
- Kimi returns a structured analysis
- Claude acts on the analysis with precise, targeted edits
Result: Claude only spends tokens on decision-making and code writing, not on reading files.
kimi-for-coding is a 1T MoE model designed for deep code comprehension. This enables AI pair review:
- Kimi pre-reviews — 256K context means it sees the entire codebase at once: security issues, anti-patterns, dead code, architectural problems
- Claude cross-examines — reviews Kimi's findings, challenges questionable items, adds its own insights
- Two perspectives — different models catch different things. What one misses, the other finds
Beyond ad-hoc analysis, you can use Kimi as a dedicated reviewer in your workflow:
┌──────────────┐ diff ┌──────────────┐ structured ┌──────────────┐
│ Your PR │ ────────► │ Kimi Code │ findings │ Claude Code │
│ (changes) │ │ (reviewer) │ ────────────►│ (decision) │
└──────────────┘ └──────────────┘ └──────────────┘
| When | What | Why |
|---|---|---|
| Before merging | Kimi scans diff + affected modules | Catch regressions early |
| Weekly | Full codebase sweep | Accumulated tech debt |
| Pre-release | Security-focused audit | Ship with confidence |
Each review session can be resumed (kimi_resume) — Kimi retains up to 256K tokens of context from previous sessions, building understanding over time.
| Review Type | Why Kimi Excels |
|---|---|
| Security audit | 256K context sees full attack surface, not just isolated files |
| Dead code detection | Can trace imports/exports across entire codebase |
| API consistency | Compares patterns across all endpoints simultaneously |
| Dependency analysis | Maps full dependency graph in one pass |
| Architecture review | Sees the forest and the trees at the same time |
| Tool | Description | Timeout |
|---|---|---|
| kimi_analyze | CLI: deep codebase analysis (architecture, audit, refactoring) | 10 min |
| kimi_query | API: quick programming questions, no codebase context (CLI only if no key configured) | 2 min |
| kimi_verify | API: independent third-party verification of code/diffs/claims; no CLI required, context-driven | 5 min |
| kimi_list_sessions | List existing Kimi sessions with metadata | instant |
| kimi_resume | CLI: resume a previous session (up to 256K token context) | 10 min |
| kimi_status | Report API-configured state + CLI install/version/auth status | instant |
| kimi_cache_status | View session cache statistics and performance metrics | instant |
| kimi_cache_invalidate | Manually invalidate cached sessions (by dir or all) | instant |
kimi_analyze and kimi_resume support these parameters to control output size:
| Parameter | Values | Default | Effect |
|---|---|---|---|
| detail_level | summary / normal / detailed | normal | Controls prompt-side verbosity instructions |
| max_output_tokens | number | 15000 | Hard ceiling; output is truncated at a clean boundary if exceeded |
| include_thinking | boolean | false | Include Kimi's internal reasoning chain (10-30K extra tokens) |
kimi_query also supports max_output_tokens and include_thinking.
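To make the max_output_tokens ceiling concrete, here is a minimal sketch of truncating at a clean boundary. It is not the server's implementation: `truncateAtCleanBoundary` is a hypothetical name, the 4-characters-per-token ratio is a rough assumption, and "clean boundary" is interpreted here as the last paragraph or line break before the limit.

```typescript
// Rough assumption: about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function truncateAtCleanBoundary(text: string, maxOutputTokens: number): string {
  const maxChars = maxOutputTokens * CHARS_PER_TOKEN;
  if (text.length <= maxChars) return text;

  const head = text.slice(0, maxChars);
  // Prefer the last paragraph break, then the last line break, then a hard cut.
  const paragraphBreak = head.lastIndexOf("\n\n");
  const lineBreak = head.lastIndexOf("\n");
  const cut = paragraphBreak > 0 ? paragraphBreak : lineBreak > 0 ? lineBreak : maxChars;

  return head.slice(0, cut).trimEnd() + "\n\n[output truncated]";
}
```

Cutting at a paragraph break keeps a partial report readable, at the cost of returning slightly less than the ceiling allows.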
Note
The savings come from compression ratio, not from free reading. Kimi's subscription cost still applies, but the key benefit is reducing expensive Claude Code token consumption.
| | Without kimi-code-mcp | With kimi-code-mcp (normal) |
|---|---|---|
| Raw source | 50 files × ~4K = 200K tokens | Kimi reads it (subscription cost) |
| Claude reads | 200K tokens | 5-15K token report |
| Claude token cost | $$$ | $ |
Compression ratio by detail_level:
| Level | Compression | Output Size | Equivalent Source | Best For |
|---|---|---|---|---|
| summary | 40-100x | ~2-5K tokens | ~8-20K chars / ~200-500 lines of code | Quick orientation, file inventory |
| normal | 15-40x | ~5-15K tokens | ~20-60K chars / ~500-1500 lines of code | Architecture review, dependency mapping |
| detailed | 5-15x | ~15-40K tokens | ~60-160K chars / ~1500-4000 lines of code | Security audit with code snippets |
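The compression figures follow from dividing the source size by the report size. A small sketch, assuming the output ranges from the table and the 200K-token source from the earlier comparison; `compressionRange` and `OUTPUT_TOKENS` are hypothetical names:

```typescript
type DetailLevel = "summary" | "normal" | "detailed";

// Approximate report sizes in tokens, taken from the table above.
const OUTPUT_TOKENS: Record<DetailLevel, [number, number]> = {
  summary: [2000, 5000],
  normal: [5000, 15000],
  detailed: [15000, 40000],
};

// Returns [lowest, highest] compression ratio for a given source size.
function compressionRange(sourceTokens: number, level: DetailLevel): [number, number] {
  const [minOut, maxOut] = OUTPUT_TOKENS[level];
  // The largest report gives the lowest ratio, the smallest report the highest.
  return [sourceTokens / maxOut, sourceTokens / minOut];
}

const summaryRange = compressionRange(200000, "summary"); // [40, 100]
```

For a 200K-token source this gives 40-100x for summary, and roughly 13-40x and 5-13x for normal and detailed, which is in line with the rounded ranges in the table. Smaller codebases compress less, which is why direct reading wins below about 10 files.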
When savings happen:
- Large codebases (50+ files) — architecture understanding, cross-file scanning
- Security audits, dead code detection, API consistency checks
- Pre-review before targeted edits (scan first → edit specific files)
When to skip and let Claude read directly:
- Small codebases (<10 files) — direct reading is faster
- Single-file modifications — Claude's built-in file reading is sufficient
- When you need every line of code: detailed output approaches raw reading cost
Under the hood:
- Claude Code calls an MCP tool (e.g., kimi_analyze)
- This server spawns the kimi CLI with the prompt and codebase path
- Kimi autonomously reads files, analyzes the code (up to 256K tokens)
- The result is parsed from Kimi's JSON output and returned to Claude Code
- Claude acts on the structured results — edits, plans, or further analysis
The MCP server calls the Kimi CLI in non-interactive (print) mode:
kimi --work-dir <path> --print -p "<prompt>"

| Flag | Purpose |
|---|---|
| --print | Non-interactive mode: outputs result and exits (required for subprocess use) |
| -p / --prompt | Pass prompt directly (bypasses interactive shell) |
| --work-dir / -w | Set codebase root directory |
| -S <id> | Resume a specific session by ID |
| --no-thinking | Disable thinking mode |
Note
There is no kimi analyze subcommand. The MCP tool is named kimi_analyze, but the underlying CLI uses the flags above. Use this syntax to call Kimi directly for debugging or scripting.
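For scripting, the flags above can be assembled into an argument array. This is a sketch under assumptions, not the code in kimi-runner.ts: `buildKimiArgs` and `KimiInvocation` are hypothetical names, and only flags listed in the table are used.

```typescript
interface KimiInvocation {
  workDir: string;      // codebase root, passed to --work-dir
  prompt: string;       // passed to -p
  sessionId?: string;   // when set, resumes that session via -S
  noThinking?: boolean; // when true, adds --no-thinking
}

// Builds the argument list for a non-interactive kimi call.
function buildKimiArgs(inv: KimiInvocation): string[] {
  const args = ["--work-dir", inv.workDir, "--print", "-p", inv.prompt];
  if (inv.sessionId) args.push("-S", inv.sessionId);
  if (inv.noThinking) args.push("--no-thinking");
  return args;
}
```

Keeping the arguments as an array lets them be handed to Node's `child_process.spawn("kimi", args)` without going through a shell, so prompts containing quotes or spaces need no escaping.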
For development (auto-recompile on changes):
{
  "mcpServers": {
    "kimi-code": {
      "command": "npx",
      "args": ["tsx", "/absolute/path/to/kimi-code-mcp/src/index.ts"]
    }
  }
}

Published as kimi-mcp-server on npm.
npx kimi-mcp-server # run directly
npm install -g kimi-mcp-server # install globally

src/
├── index.ts # MCP server setup, tool definitions, API-vs-CLI routing
├── kimi-api.ts # Direct Kimi Code API client (kimi_query / kimi_verify)
├── kimi-runner.ts # Spawns kimi CLI, parses output, handles timeouts
├── cache-manager.ts # Session cache (warmup, reuse, invalidation)
└── session-reader.ts # Reads Kimi session metadata from ~/.kimi/
See CONTRIBUTING.md for guidelines.
See CHANGELOG.md for version history.
MIT