BENCHMARK REPORT
BRAIN MCP vs BASELINE (GREP+READ+GLOB) :: REAL CLAUDE CODE SESSION :: 2026-04-21T10:40:43
Executive Summary
A head-to-head benchmark of Claude Code in two configurations, solving the same 50 real-world code-analysis questions against a production repo (~/dev/example/Acme).
The Real Win: Fresh Input Reduction
Claude has three token types. Only one counts fully against Max plan rate limits:
| Token type | Baseline | Brain | Δ | Rate limit impact |
|---|---|---|---|---|
| Fresh input (tool outputs, new context) | 4,145 | 2,003 | -51.7% | 100% weight |
| Output (thinking, answers) | 52,229 | 55,398 | +6.1% | 100% weight |
| Cache read (repeated system prompt) | 7,521,819 | 7,685,722 | +2.2% | ~10% weight (discounted) |
Headline Findings
- Fresh input tokens cut 51.7% (2.1× less) - the real metric for rate limits
- Wall-clock time cut 29.1% (641s saved across 50 questions)
- On architectural questions: up to 15× faster (Q45: 633s → 41s)
- Total billed cost tied (-0.7%) - cache dominates both modes
- Cross-repo analytics (god-nodes, Louvain, DRY violations): grep literally cannot do these
add_dirs. Tokens measured via Anthropic Agent SDK - no estimates.Methodology
Test Setup
- Target repo:
~/dev/example/Acme(production Nuxt 4 + Vue 3 codebase) - Model: Claude Opus
- Max turns per question: 15
- Runner: Python +
claude-agent-sdk
Two Configurations
BASELINE
Tools: Bash, Grep, Read, Glob
BRAIN
Tools: brain_query, brain_graph, brain_path, brain_explain, brain_ffcss
50 Questions - 5 Categories
- Code discovery Q01-Q10 - "Where is X?"
- Usage tracing Q11-Q20 - "Who uses X?"
- Cross-repo Q21-Q30 - "Does Checkout override X?"
- Dependency paths Q31-Q40 - "Hops A → B?"
- Architecture Q41-Q50 - "God-nodes, DRY"
Raw Results - 50 Questions
| Q | Category | Baseline tok | Brain tok | Ratio | Baseline $ | Brain $ | Baseline s | Brain s | Tools base | Tools brain |
|---|
Per-Category Analysis
Category Breakdown
| Category | N | Baseline tok | Brain tok | Baseline time | Brain time | Token savings |
|---|
Token Economics
Monthly Cost Scaling
Team Cost Projection
| Team size | Baseline $/mo | Brain $/mo | Annual savings |
|---|---|---|---|
| 1 dev | $107.47 | $108.24 | $-9.24 |
| 5 devs | $537.35 | $541.20 | $-46.20 |
| 10 devs | $1074.70 | $1082.40 | $-92.40 |
| 20 devs | $2149.40 | $2164.80 | $-184.80 |
Context Window Impact (Max users)
With 1M context window, before compression kicks in:
- Baseline: ~887 questions/session
- Brain: ~871 questions/session
Answer Quality - Side by Side
CC Workflow Demo
Baseline (Grep+Read)
> Where is the cart hook in acme-core? → Bash: find layers/acme-core -name "*Cart*" [200ms, ~5kB] → Grep: "useCart" layers/acme-core [300ms, ~8kB] → Read: layers/acme-core/composables/index.ts [~3kB] → Read: layers/acme-logic/composables/useCart.ts [~1kB] → Answer: "layers/acme-logic/composables/useCart.ts" Total: ~12s, 4 tool calls, ~17kB into context
Brain
> Where is the cart hook in acme-core? → brain_query(q="Cart", scope="acme/Acme") [150ms, ~2kB] → Answer: "layers/acme-logic/composables/useCart.ts" Total: ~3s, 1 tool call, ~2kB into context
Implementation Guide
For Developers (15 minutes)
- Step 1: Get a dev token from admin
- Step 2:
claude mcp add brain --transport http --scope user https://brain.sdet.it/mcp --header "Authorization: Bearer $TOKEN" - Step 3: Restart CC →
/mcp→ brain connected - Step 4: See Quick Start
For Architects
- Add repos to
groups.yml - Run
/brain-extractin CC, or wait for webhook extract - Brain auto-federates within 3min of push
- Dashboard at
/admin/dashboard
For Ops
- Single Docker-compose stack (API + worker + Postgres + Redis + nginx)
- Runs on 2-core 4GB VPS
- Alerts out of the box (Discord, Slack)
- Daily archive + rotation + observability
FAQ
Q: Does brain replace Grep/Read/Glob?
No. Brain adds semantic graph queries. CC still uses Grep for regex searches and Read for full file content. Brain's tools are for "what / where / who / connected".
Q: What if brain is down?
CC falls back to Grep/Read/Glob. Worst case: slow (baseline mode).
Q: How fresh is the graph?
5min polling + webhook re-extraction. Typical freshness < 5min.
Q: Is my code sent to third party?
Only graph metadata (node IDs, paths). Host on own VPS for zero external flow.
READY