BENCHMARK REPORT

Name: JARVIS-BRAIN :: BENCHMARK REPORT
Creator: Dariusz Kowalski
Published: 2026-04-21
License: https://opensource.org/licenses/MIT

BRAIN MCP vs BASELINE (GREP+READ+GLOB) :: REAL CLAUDE CODE SESSION :: 2026-04-21T10:40:43

FRESH INPUT SAVINGS

51.7%

2.1× fewer fresh tokens - the metric that counts against Max plan rate limits

WALL-CLOCK TIME

29.1%

2204s → 1563s (641s saved across 50 Q)

TOTAL TOKENS (in+out)

-1.8%

baseline 56,374 vs brain 57,401 (brain uses 1.8% more) - effectively tied, cache dominates

BIGGEST CASE SPEEDUP

15×

Q45 'largest cluster' - 633s baseline → 41s brain

Executive Summary

A head-to-head benchmark of Claude Code in two configurations, solving the same 50 real-world code-analysis questions against a production repo (~/dev/example/Acme).

The Real Win: Fresh Input Reduction

Claude has three token types, and they do not weigh equally against Max plan rate limits:

Token type	Baseline	Brain	Δ	Rate limit impact
Fresh input (tool outputs, new context)	4,145	2,003	-51.7%	100% weight
Output (thinking, answers)	52,229	55,398	+6.1%	100% weight
Cache read (repeated system prompt)	7,521,819	7,685,722	+2.2%	~10% weight (discounted)

What this means for Max plan users: brain cuts fresh input tokens, which count at full weight against your 5-hour and weekly rate limits, by 51.7%. Same subscription, more queries before hitting limits.
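The percentages in the table follow directly from the raw counts. A minimal Python sketch that recomputes them (the name pct_change is illustrative and not part of the benchmark runner):

```python
# Token counts copied from the table above: (baseline, brain).
TOKENS = {
    "fresh_input": (4_145, 2_003),
    "output": (52_229, 55_398),
    "cache_read": (7_521_819, 7_685_722),
}


def pct_change(baseline: int, brain: int) -> float:
    """Relative change from baseline to brain, in percent (negative = fewer tokens)."""
    return (brain - baseline) / baseline * 100


# Per-type delta, rounded to one decimal like the table's Δ column.
deltas = {name: round(pct_change(base, brain), 1) for name, (base, brain) in TOKENS.items()}

# "2.1x fewer fresh tokens" is the baseline/brain ratio for fresh input.
fresh_ratio = round(TOKENS["fresh_input"][0] / TOKENS["fresh_input"][1], 1)

print(deltas)
print(fresh_ratio)
```

Running it reproduces the Δ column (-51.7%, +6.1%, +2.2%) and the 2.1× headline ratio.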

Headline Findings

Fresh input tokens cut 51.7% (2.1× less) - the real metric for rate limits
Wall-clock time cut 29.1% (641s saved across 50 questions)
On architectural questions: up to 15× faster (Q45: 633s → 41s)
Total billed cost effectively tied (brain 0.7% higher) - cache dominates both modes
Cross-repo analytics (god-nodes, Louvain, DRY violations): grep literally cannot do these

This benchmark ran with real Claude Code + Claude Opus on a production codebase. The baseline was given access to all 5 acme repos via add_dirs. Tokens were measured via the Claude Agent SDK, not estimated.

Methodology

Test Setup

Target repo: ~/dev/example/Acme (production Nuxt 4 + Vue 3 codebase)
Model: Claude Opus
Max turns per question: 15
Runner: Python + claude-agent-sdk

Two Configurations

BASELINE

Tools: Bash, Grep, Read, Glob

Simulates standard Claude Code (CC) without brain.

BRAIN

Tools: brain_query, brain_graph, brain_path, brain_explain, brain_ffcss

MCP over HTTPS to brain.sdet.it.
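The two configurations can be written down as plain data before being handed to a runner. The sketch below is hypothetical: RunConfig and mcp_tool are illustrative names, and it assumes the MCP server is registered under the name "brain", so that Claude Code exposes its tools as mcp__brain__<tool>. It does not call the SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """One benchmark configuration (hypothetical structure, not the real runner's)."""
    name: str
    allowed_tools: tuple[str, ...]
    max_turns: int = 15  # "Max turns per question: 15" from Test Setup


def mcp_tool(server: str, tool: str) -> str:
    # Claude Code names MCP tools mcp__<server>__<tool>.
    return f"mcp__{server}__{tool}"


BASELINE = RunConfig("baseline", ("Bash", "Grep", "Read", "Glob"))

BRAIN = RunConfig(
    "brain",
    tuple(
        mcp_tool("brain", tool)  # assumes the server is registered as "brain"
        for tool in ("brain_query", "brain_graph", "brain_path", "brain_explain", "brain_ffcss")
    ),
)

for config in (BASELINE, BRAIN):
    print(config.name, config.max_turns, config.allowed_tools)
```

Keeping the tool lists disjoint, as here, is what makes the comparison a clean A/B: each run can only use the tools of its own mode.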

50 Questions - 5 Categories

Code discovery Q01-Q10 - "Where is X?"
Usage tracing Q11-Q20 - "Who uses X?"
Cross-repo Q21-Q30 - "Does Checkout override X?"
Dependency paths Q31-Q40 - "Hops A → B?"
Architecture Q41-Q50 - "God-nodes, DRY"
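Because the categories are contiguous ID ranges, a question ID maps to its category with a simple lookup. A small sketch (category_for is an illustrative helper, not taken from the benchmark code):

```python
# Question ID ranges and labels as listed above.
CATEGORIES = [
    (range(1, 11), "Code discovery"),
    (range(11, 21), "Usage tracing"),
    (range(21, 31), "Cross-repo"),
    (range(31, 41), "Dependency paths"),
    (range(41, 51), "Architecture"),
]


def category_for(question_id: str) -> str:
    """Map an ID such as 'Q45' to its category label."""
    number = int(question_id.lstrip("Qq"))
    for ids, label in CATEGORIES:
        if number in ids:
            return label
    raise ValueError(f"unknown question id: {question_id!r}")


print(category_for("Q45"))
```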

Raw Results - 50 Questions


Q	Category	Baseline tok	Brain tok	Ratio	Baseline $	Brain $	Baseline s	Brain s	Tools base	Tools brain

Per-Category Analysis

Category Breakdown

Category	N	Baseline tok	Brain tok	Baseline time	Brain time	Token savings

Token Economics

Max plan users pay $0 per token. The figures below project pay-per-use Opus API pricing.

Monthly Cost Scaling

Team Cost Projection

Team size	Baseline $/mo	Brain $/mo	Annual savings
1 dev	$107.47	$108.24	$-9.24
5 devs	$537.35	$541.20	$-46.20
10 devs	$1074.70	$1082.40	$-92.40
20 devs	$2149.40	$2164.80	$-184.80
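The projection is linear in team size: per-developer monthly cost times headcount, with annual savings as twelve times the monthly difference. A sketch that reproduces the table from its first row (project is an illustrative name; Decimal avoids floating-point drift on currency):

```python
from decimal import Decimal

# Per-developer monthly cost, taken from the "1 dev" row.
BASELINE_PER_DEV = Decimal("107.47")
BRAIN_PER_DEV = Decimal("108.24")


def project(team_size: int) -> tuple[Decimal, Decimal, Decimal]:
    """Return (baseline $/mo, brain $/mo, annual savings) for a team."""
    baseline = BASELINE_PER_DEV * team_size
    brain = BRAIN_PER_DEV * team_size
    annual_savings = (baseline - brain) * 12  # negative: brain costs slightly more
    return baseline, brain, annual_savings


for size in (1, 5, 10, 20):
    print(size, *project(size))
```

The negative savings reflect the 0.7% higher billed cost in brain mode under pay-per-use pricing.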

Context Window Impact (Max users)

With 1M context window, before compression kicks in:

Baseline: ~887 questions/session
Brain: ~871 questions/session
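These figures are consistent with dividing the 1M-token window by the average in+out tokens per question (56,374 and 57,401 total over 50 questions). That derivation is inferred from the numbers, not stated in the report. A sketch under that assumption:

```python
CONTEXT_WINDOW = 1_000_000
QUESTIONS = 50

# Total in+out tokens across the 50 questions, from the headline figures.
TOTAL_TOKENS = {"baseline": 56_374, "brain": 57_401}


def questions_per_session(total_tokens: int) -> int:
    """Approximate questions that fit in the window (assumed derivation)."""
    avg_per_question = total_tokens / QUESTIONS
    return round(CONTEXT_WINDOW / avg_per_question)


for mode, total in TOTAL_TOKENS.items():
    print(mode, questions_per_session(total))
```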

Answer Quality - Side by Side

Both methods are often "correct", so we show verbatim outputs for human review.

CC Workflow Demo

Baseline (Grep+Read)

> Where is the cart hook in acme-core?
→ Bash: find layers/acme-core -name "*Cart*"          [200ms, ~5kB]
→ Grep: "useCart" layers/acme-core                    [300ms, ~8kB]
→ Read: layers/acme-core/composables/index.ts         [~3kB]
→ Read: layers/acme-logic/composables/useCart.ts  [~1kB]
→ Answer: "layers/acme-logic/composables/useCart.ts"
Total: ~12s, 4 tool calls, ~17kB into context

Brain

> Where is the cart hook in acme-core?
→ brain_query(q="Cart", scope="acme/Acme")  [150ms, ~2kB]
→ Answer: "layers/acme-logic/composables/useCart.ts"
Total: ~3s, 1 tool call, ~2kB into context

Implementation Guide

For Developers (15 minutes)

Step 1: Get a dev token from admin
Step 2: claude mcp add brain --transport http --scope user https://brain.sdet.it/mcp --header "Authorization: Bearer $TOKEN"
Step 3: Restart CC → /mcp → brain connected
Step 4: See Quick Start

For Architects

Add repos to groups.yml
Run /brain-extract in CC, or wait for webhook extract
Brain auto-federates within 3min of push
Dashboard at /admin/dashboard

For Ops

Single Docker Compose stack (API + worker + Postgres + Redis + nginx)
Runs on a 2-core, 4 GB VPS
Alerts out of the box (Discord, Slack)
Daily archive + rotation + observability

FAQ

Q: Does brain replace Grep/Read/Glob?

No. Brain adds semantic graph queries. CC still uses Grep for regex searches and Read for full file content. Brain's tools are for "what / where / who / connected".

Q: What if brain is down?

CC falls back to Grep/Read/Glob. Worst case: answers arrive at baseline-mode speed.

Q: How fresh is the graph?

5min polling + webhook re-extraction. Typical freshness < 5min.

Q: Is my code sent to a third party?

Only graph metadata (node IDs, paths) is sent. Host brain on your own VPS for zero external flow.
