
BENCHMARK REPORT

BRAIN MCP vs BASELINE (GREP+READ+GLOB) :: REAL CLAUDE CODE SESSION :: 2026-04-21T10:40:43

FRESH INPUT SAVINGS
51.7%
2.1× fewer fresh input tokens - a metric that counts at full weight against Max plan rate limits
WALL-CLOCK TIME
29.1%
2204s → 1563s (641s saved across 50 questions)
TOTAL TOKENS (in+out)
-1.8%
baseline 56,374 vs brain 57,401 (brain used 1.8% more) - roughly flat; output tokens dominate both totals
BIGGEST CASE SPEEDUP
15×
Q45 'largest cluster' - 633s baseline → 41s brain
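The headline figures above can be re-derived from the raw numbers in this report. A minimal Python sketch; the helper names are illustrative only, not part of brain or the benchmark harness:

```python
def pct_saved(baseline: float, brain: float) -> float:
    """Percentage reduction from baseline to brain (negative means brain used more)."""
    return (baseline - brain) / baseline * 100


def ratio(baseline: float, brain: float) -> float:
    """How many times larger the baseline figure is than the brain figure."""
    return baseline / brain


# Fresh input tokens, baseline vs brain
print(f"fresh input saved: {pct_saved(4_145, 2_003):.1f}% ({ratio(4_145, 2_003):.1f}x fewer)")
# Wall-clock seconds summed over all 50 questions
print(f"wall-clock saved:  {pct_saved(2_204, 1_563):.1f}% ({2_204 - 1_563}s)")
# Total in+out tokens (negative: brain used more)
print(f"total in+out:      {pct_saved(56_374, 57_401):.1f}%")
# Q45, the biggest single-question speedup
print(f"Q45 speedup:       {ratio(633, 41):.0f}x")
```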

Executive Summary

This is a head-to-head benchmark of Claude Code in two configurations, each solving the same 50 real-world code-analysis questions against a production repo (~/dev/example/Acme).

The Real Win: Fresh Input Reduction

The benchmark tracks three token types. Two count at full weight against Max plan rate limits; cache reads are discounted:

Token type | Baseline | Brain | Δ | Rate-limit impact
Fresh input (tool outputs, new context) | 4,145 | 2,003 | -51.7% | 100% weight
Output (thinking, answers) | 52,229 | 55,398 | +6.1% | 100% weight
Cache read (repeated system prompt) | 7,521,819 | 7,685,722 | +2.2% | ~10% weight (discounted)
What this means for Max plan users: brain cuts fresh input tokens, which count at full weight against your 5-hour and weekly rate limits, by 51.7%. Output tokens also count at full weight and rose 6.1%, so fresh input plus output is roughly flat (56,374 vs 57,401, +1.8%).
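To see how the three rows combine, here is a minimal Python sketch that applies the table's rate-limit weights to the benchmark totals. Treating "~10%" as exactly 0.10 is an assumption. It shows that the 51.7% fresh-input cut is offset by the larger output count, so the weighted totals come out roughly even:

```python
# Token counts from the table: (baseline, brain)
TOKENS = {
    "fresh_input": (4_145, 2_003),
    "output": (52_229, 55_398),
    "cache_read": (7_521_819, 7_685_722),
}

# Rate-limit weights as listed in the table; 0.10 for cache reads is an approximation.
WEIGHTS = {"fresh_input": 1.0, "output": 1.0, "cache_read": 0.10}


def pct_change(baseline: float, brain: float) -> float:
    """Percentage change from baseline to brain (positive means brain used more)."""
    return (brain - baseline) / baseline * 100


def weighted_total(config: int, kinds=tuple(WEIGHTS)) -> float:
    """Weighted token sum for one configuration (0 = baseline, 1 = brain)."""
    return sum(WEIGHTS[k] * TOKENS[k][config] for k in kinds)


for kind, (base, brain) in TOKENS.items():
    print(f"{kind:<12} {pct_change(base, brain):+.1f}%")

full = ("fresh_input", "output")
print(f"full-weight only  {pct_change(weighted_total(0, full), weighted_total(1, full)):+.1f}%")
print(f"incl. cache @10%  {pct_change(weighted_total(0), weighted_total(1)):+.1f}%")
```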

Headline Findings

This benchmark ran real Claude Code with Claude Opus on a production codebase; the baseline was given access to all 5 acme repos via add_dirs. Token counts were measured via the Anthropic Agent SDK, not estimated.
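A minimal Python sketch of the aggregation step behind the token tables, assuming each question's result exposes a usage dict with the Anthropic Messages API field names (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens). Mapping input_tokens to "fresh input" and keeping cache writes in a separate bucket are our assumptions; the report does not spell out its mapping. The numbers in the example are toy values, not benchmark data:

```python
from collections import Counter

# ASSUMPTION: how API usage fields map onto the report's token buckets.
BUCKETS = {
    "input_tokens": "fresh_input",
    "output_tokens": "output",
    "cache_read_input_tokens": "cache_read",
    "cache_creation_input_tokens": "cache_write",  # not broken out in the report
}


def aggregate(usages: list[dict]) -> Counter:
    """Sum per-question usage dicts into report-level token buckets."""
    totals = Counter()
    for usage in usages:
        for field, bucket in BUCKETS.items():
            # Missing or null fields count as zero.
            totals[bucket] += usage.get(field) or 0
    return totals


# Toy per-question records (hypothetical values)
per_question = [
    {"input_tokens": 90, "output_tokens": 1_000, "cache_read_input_tokens": 150_000},
    {"input_tokens": 40, "output_tokens": 1_200, "cache_read_input_tokens": 160_000,
     "cache_creation_input_tokens": 5_000},
]
print(aggregate(per_question))
```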

Methodology

Test Setup

Two Configurations

BASELINE

Tools: Bash, Grep, Read, Glob

Simulates standard Claude Code (CC) without brain.

BRAIN

Tools: brain_query, brain_graph, brain_path, brain_explain, brain_ffcss

Connects to the brain MCP server over HTTPS at brain.sdet.it.
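A minimal Python sketch of how the two configurations could be expressed for Claude Code: a tool allow-list per configuration, plus a project .mcp.json entry registering brain as a remote HTTP MCP server. The server name "brain" and the endpoint URL path are assumptions; the report only names the host brain.sdet.it:

```python
import json

BASELINE_TOOLS = ["Bash", "Grep", "Read", "Glob"]
BRAIN_TOOLS = ["brain_query", "brain_graph", "brain_path", "brain_explain", "brain_ffcss"]

# ASSUMPTIONS: server registered as "brain"; MCP endpoint at the host root.
SERVER_NAME = "brain"
SERVER_URL = "https://brain.sdet.it"


def mcp_config() -> dict:
    """Project-scoped .mcp.json content registering brain as an HTTP MCP server."""
    return {"mcpServers": {SERVER_NAME: {"type": "http", "url": SERVER_URL}}}


def allowed_tools(config: str) -> list[str]:
    """Tool allow-list for one benchmark configuration."""
    if config == "baseline":
        return list(BASELINE_TOOLS)
    if config == "brain":
        # Claude Code names MCP tools mcp__<server>__<tool>.
        return [f"mcp__{SERVER_NAME}__{t}" for t in BRAIN_TOOLS]
    raise ValueError(f"unknown config: {config!r}")


print(json.dumps(mcp_config(), indent=2))
print(allowed_tools("brain"))
```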

50 Questions - 5 Categories

Raw Results - 50 Questions

Q | Category | Baseline tok | Brain tok | Ratio | Baseline $ | Brain $ | Baseline s | Brain s | Tools (baseline) | Tools (brain)

Per-Category Analysis

Category Breakdown

Category | N | Baseline tok | Brain tok | Baseline time | Brain time | Token savings

Token Economics

Max plan users pay a flat subscription, so their marginal cost per token is $0. The figures below project what the same usage would cost at pay-per-use Opus API pricing.

Monthly Cost Scaling

Team Cost Projection

Team size | Baseline $/mo | Brain $/mo | Annual savings (negative = brain costs more)
1 dev | $107.47 | $108.24 | -$9.24
5 devs | $537.35 | $541.20 | -$46.20
10 devs | $1,074.70 | $1,082.40 | -$92.40
20 devs | $2,149.40 | $2,164.80 | -$184.80
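The team table scales linearly: each row is the per-developer monthly figure times team size, and annual savings is the monthly difference times 12. A minimal Python sketch reproducing the savings column, working in cents to avoid floating-point rounding:

```python
# Per-developer monthly cost from the team table, in cents.
BASELINE_CENTS_PER_DEV = 10_747  # $107.47
BRAIN_CENTS_PER_DEV = 10_824     # $108.24


def annual_savings_cents(devs: int) -> int:
    """Annual savings for a team in cents (negative means brain costs more)."""
    return (BASELINE_CENTS_PER_DEV - BRAIN_CENTS_PER_DEV) * 12 * devs


for devs in (1, 5, 10, 20):
    cents = annual_savings_cents(devs)
    sign = "-" if cents < 0 else ""
    print(f"{devs:>2} dev(s): {sign}${abs(cents) / 100:,.2f}/yr")
```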

Context Window Impact (Max users)

With a 1M-token context window, before compression kicks in:

Answer Quality - Side by Side

Both methods often produce answers that look correct, so we show verbatim outputs side by side for human review.

CC Workflow Demo

Baseline (Grep+Read)

> Where is the cart hook in acme-core?
→ Bash: find layers/acme-core -name "*Cart*"          [200ms, ~5kB]
→ Grep: "useCart" layers/acme-core                    [300ms, ~8kB]
→ Read: layers/acme-core/composables/index.ts         [~3kB]
→ Read: layers/acme-logic/composables/useCart.ts      [~1kB]
→ Answer: "layers/acme-logic/composables/useCart.ts"
Total: ~12s, 4 tool calls, ~17kB into context

Brain

> Where is the cart hook in acme-core?
→ brain_query(q="Cart", scope="acme/Acme")  [150ms, ~2kB]
→ Answer: "layers/acme-logic/composables/useCart.ts"
Total: ~3s, 1 tool call, ~2kB into context
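The single brain step is one MCP tool invocation. A minimal Python sketch of the JSON-RPC 2.0 `tools/call` message the MCP specification defines for this; it only builds the payload, and a real client must first complete the MCP `initialize` handshake over the HTTPS transport. The argument names q and scope come from the demo; the full brain_query schema is not documented here:

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# The brain_query call from the demo transcript
print(tools_call("brain_query", {"q": "Cart", "scope": "acme/Acme"}))
```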

Implementation Guide

For Developers (15 minutes)

For Architects

For Ops

FAQ

Q: Does brain replace Grep/Read/Glob?

No. Brain adds semantic graph queries. CC still uses Grep for regex searches and Read for full file content. Brain's tools are for "what / where / who / connected".

Q: What if brain is down?

CC falls back to Grep/Read/Glob. Worst case: slower answers and more fresh input tokens, as in baseline mode.
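Inside Claude Code, the model itself switches to Grep/Read/Glob when the MCP tools fail. For scripts that query brain directly, the same degrade-gracefully pattern looks like this minimal Python sketch. `brain_query` here is a hypothetical stand-in that always simulates an outage, and `grep_fallback` is a plain regex walk over files, not Claude Code's Grep tool:

```python
import os
import re


class BrainUnavailable(Exception):
    """Raised when the brain server cannot be reached (hypothetical)."""


def brain_query(q: str, scope: str) -> list[str]:
    """Hypothetical stand-in for the brain_query MCP tool; simulates an outage."""
    raise BrainUnavailable("brain.sdet.it unreachable")


def grep_fallback(pattern: str, root: str) -> list[str]:
    """Baseline-style search: files under root whose text matches pattern."""
    rx = re.compile(pattern)
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if rx.search(f.read()):
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it, like a best-effort grep
    return sorted(hits)


def find(q: str, scope: str, root: str) -> list[str]:
    """Try brain first; on failure, degrade to the slower grep-style search."""
    try:
        return brain_query(q, scope)
    except BrainUnavailable:
        return grep_fallback(re.escape(q), root)
```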

Q: How fresh is the graph?

5-minute polling plus webhook-triggered re-extraction. Typical freshness: under 5 minutes.
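The report does not describe brain's scheduler, so this is only a hypothetical Python sketch of how the two triggers could combine: a webhook triggers re-extraction right away, and polling acts as a backstop, so a missed webhook delays an update by at most one poll interval plus extraction time. `RepoState` and `should_reextract` are illustrative names:

```python
from dataclasses import dataclass

POLL_INTERVAL_S = 300  # 5-minute polling, per the FAQ answer


@dataclass
class RepoState:
    last_poll_s: float             # when the repo was last polled (epoch seconds)
    webhook_pending: bool = False  # a push webhook arrived since the last extraction


def should_reextract(state: RepoState, now_s: float, head_changed: bool) -> bool:
    """Webhooks trigger re-extraction at once; otherwise a due poll re-extracts if the repo head moved."""
    if state.webhook_pending:
        return True
    poll_due = now_s - state.last_poll_s >= POLL_INTERVAL_S
    return poll_due and head_changed
```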

Q: Is my code sent to a third party?

Only graph metadata (node IDs, paths) is sent. Host brain on your own VPS for zero external data flow.
