How WOZCODE works

Smarter tools.
Patented token‑reduction tech.
Runs 100% on your machine.

A Claude Code plugin that collapses context growth on every API call. 25–55% cheaper, 30–40% faster, higher TerminalBench 2.0 score.

Works with your existing Anthropic subscription. Local plugin: no indexing, no code upload, no middleman. Anthropic sees what vanilla Claude Code sends; we see none of it.
The core insight

Context compounds. Every call re‑pays for the last one.

Every tool call feeds its output back as input tokens on the next turn. "Find and edit 3 files" isn't 3 calls — it's Find, Read, Edit, then Verify read to confirm the edit landed, repeated per file. And the last call pays input cost on everything before it.

Vanilla Claude Code
12+
Find → Read → Edit → Verify read
Repeated for each of 3 files. Every call re‑ingests the context of the calls before it.
With WOZCODE
2
1× Search (glob+regex+read)
1× batched Edit across N files
Post‑edit validation — no verify read needed.

Fewer calls → smaller context → the savings compound across the session. Not a one‑time trim — thousands of turns that never balloon in the first place.

And when a call goes sideways, we fix it before it costs a turn. Vanilla's Edit tool rejects any edit that isn't a 100% byte‑for‑byte match — tabs vs. spaces, an invisible trailing space, a curly “quote” where the file has a straight "one", and you eat a retry. WOZCODE's Edit fuzzy‑matches: it tolerates whitespace differences and treats visually‑identical characters (curly vs. straight quotes, em‑dashes, ellipsis) as equivalent during matching, so near‑misses still land. Every retry we prevent is one fewer turn on the compound curve.
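A minimal sketch of the normalization idea, assuming a simple approach that may differ from WOZCODE's: fold look-alike characters to ASCII, collapse whitespace runs, search in the normalized text, then map the hit back to offsets in the original file so the edit replaces the real bytes. `find_normalized`, `normalize_with_map`, and the `FOLD` table are hypothetical names.

```python
from __future__ import annotations

# Hypothetical folding table: visually similar characters mapped to ASCII.
FOLD = {
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u2014": "-", "\u2013": "-",  # em dash, en dash
    "\u2026": "...",               # ellipsis
}


def normalize_with_map(text: str) -> tuple[str, list[int]]:
    """Fold look-alike characters and collapse whitespace runs to one space.

    Also returns, for each normalized character, the index in `text` it came
    from, so a match in normalized space can be mapped back to the original.
    """
    out: list[str] = []
    index_map: list[int] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            # Any run of tabs, spaces, or newlines becomes a single space.
            start = i
            while i < len(text) and text[i].isspace():
                i += 1
            out.append(" ")
            index_map.append(start)
            continue
        for ch in FOLD.get(text[i], text[i]):
            out.append(ch)
            index_map.append(i)
        i += 1
    return "".join(out), index_map


def find_normalized(haystack: str, needle: str) -> tuple[int, int] | None:
    """Locate `needle` in `haystack`, ignoring whitespace and look-alike chars.

    Returns (start, end) offsets into the ORIGINAL haystack, or None.
    """
    norm_hay, hay_map = normalize_with_map(haystack)
    norm_needle = normalize_with_map(needle)[0].strip()
    if not norm_needle:
        return None
    pos = norm_hay.find(norm_needle)
    if pos == -1:
        return None
    last = hay_map[pos + len(norm_needle) - 1]
    return hay_map[pos], last + 1
```

Because the returned offsets point into the original text, the replacement overwrites the file's actual characters (curly quotes, tabs and all) rather than the normalized copy.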

The three levers

Cheaper, faster, and better — with a mechanism for each.

Cheaper
25–55%
cost reduction vs Claude Code
Fewer turns → smaller context → compounding savings per call. Measured from live Anthropic API usage fields — real dollars.
Faster
30–40%
faster on most tasks · 5–10× on database work
Smaller context → every API round‑trip completes faster. And because smarter tools return the right information in a single call, there are far fewer round‑trips to run in the first place.
Better
80.2%
TerminalBench 2.0 · vs 69% Claude Code
Post‑edit syntax validation and fuzzy matching mean fewer failed edits and retry loops. Less irrelevant context means less attention loss — the model stays focused on the task instead of drowning in bloat.
Capability | Vanilla Claude Code | WOZCODE | Impact
---|---|---|---
File Search | Glob + Grep + Read (3+ calls) | WOZCODE Search: one call combines glob, regex, and file reading with ranked snippets | 3 → 1 calls
File Reading | Read dumps full file contents | AST truncation: replaces function bodies with stubs, keeps types and exports intact | 40–60% fewer tokens
Editing | Edit tool: 1 file, 1 edit per call | Edit edits[]: batch N edits across multiple files in a single call | N → 1 calls
Match Accuracy | Exact string match (fails on whitespace drift) | Fuzzy matching: Levenshtein distance tolerates indentation diffs | Fewer retries
SQL Schema | Read .sql files manually or paste migrations | WOZCODE Sql: AST‑based introspection, live queries, FK graph | ~5 → 1 calls
Session Memory | None (context window only) | WOZCODE Recall: semantic search across all past sessions, local‑only | New capability
Bash File Ops | Raw cat/grep/find output dumped into context | Auto‑intercepted and redirected to structured WOZCODE Search/Edit | Compounding savings
In practice

Same refactor. Two call graphs.

“Rename handleAuth across src/auth/.” Same task, vanilla vs. WOZCODE.

Vanilla Claude Code
Glob "src/**/*.ts" → 127 files
Grep "handleAuth" → 4 matches
Read src/auth/login.ts → full file
Read src/auth/session.ts → full file
Read src/auth/token.ts → full file
Edit login.ts → ok
Read login.ts → verify
Edit session.ts → fail (whitespace drift)
Edit session.ts (retry) → ok
Read session.ts → verify
Edit token.ts → ok
Read token.ts → verify
With WOZCODE
woz.Search "handleAuth" in src/auth/** → 3 files, ranked snippets
woz.Edit edits[] = [login, session, token] → 3 files patched, syntax validated
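One way to picture a batched `edits[]` call, assuming a hypothetical entry shape of path, old text, and new text (the real schema isn't documented here): stage every edit against an in-memory copy of the files and return the new contents only if all of them match. `Edit`, `EditError`, and `apply_edits` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One replacement in one file (hypothetical shape of an edits[] entry)."""
    path: str
    old: str
    new: str


class EditError(Exception):
    pass


def apply_edits(files: dict[str, str], edits: list[Edit]) -> dict[str, str]:
    """Apply every edit in one pass and return the new file contents.

    All-or-nothing: if any edit doesn't match exactly once, raise and leave
    `files` untouched, so a bad edit can't leave a half-applied refactor.
    """
    staged = dict(files)  # work on a copy; the input dict is never mutated
    for e in edits:
        if e.path not in staged:
            raise EditError(f"{e.path}: no such file")
        count = staged[e.path].count(e.old)
        if count != 1:
            raise EditError(f"{e.path}: expected 1 match for {e.old!r}, found {count}")
        staged[e.path] = staged[e.path].replace(e.old, e.new, 1)
    return staged
```

Staging before committing is the design choice that makes one call safe for a multi-file rename. This sketch matches exactly for brevity; a fuzzy matcher would replace the `count` check, and a syntax validator would run on `staged` before it is returned.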
Transparency

How savings are calculated.

Three numbers on your savings dashboard — here's where each one comes from.

Calls saved — counted
Each WOZCODE tool replaces a known set of vanilla Claude Code calls. We tally the difference, per call.
Cost & tokens saved — measured
Per‑call token counts are pulled straight from Anthropic's API usage fields for your live session, multiplied by calls saved, then priced at your model's posted rates: input, output, cache reads, and cache writes each at their own tier.
Time saved — estimated
Saved calls × a per‑call round‑trip time we calibrate against our benchmark dataset. This is the only metric we estimate, and we say so plainly instead of dressing it up.

Run /woz-savings to see your current savings.
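The three dashboard numbers can be sketched as follows, assuming cost saved is the session's average per-call cost times calls saved. The `Usage` field names mirror the token fields of an Anthropic Messages API `usage` object; the rates and the per-call round-trip time are placeholders to replace with your model's posted prices and a calibrated value, and `savings` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Usage:
    """Token fields from an Anthropic Messages API `usage` object."""
    input_tokens: int
    output_tokens: int
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0


@dataclass
class Rates:
    """USD per million tokens. PLACEHOLDERS: use your model's posted prices."""
    input: float
    output: float
    cache_write: float
    cache_read: float


def call_cost(u: Usage, r: Rates) -> float:
    """Price one call, charging each token class at its own tier."""
    return (u.input_tokens * r.input
            + u.output_tokens * r.output
            + u.cache_creation_input_tokens * r.cache_write
            + u.cache_read_input_tokens * r.cache_read) / 1_000_000


def savings(session: list[Usage], calls_saved: int, rates: Rates,
            seconds_per_call: float) -> dict[str, float]:
    """Counted calls, measured cost, and estimated time saved.

    Assumption: cost saved = average measured cost per call x calls saved.
    `seconds_per_call` stands in for the calibrated round-trip time.
    """
    total = sum(call_cost(u, rates) for u in session)
    avg_cost = total / len(session) if session else 0.0
    return {
        "calls_saved": calls_saved,                      # counted
        "cost_saved_usd": avg_cost * calls_saved,        # measured, then priced
        "time_saved_s": seconds_per_call * calls_saved,  # estimated
    }
```

Only the last entry depends on a calibrated constant; the other two come straight from counts and the API's own usage numbers.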

Quality loop

Fewer errors = fewer turns = less spend.

Post‑edit syntax validation

TS compiler, JSON/YAML/HTML parsers, SQL linter run after every edit. Errors caught before the next turn.
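A minimal post-edit check might look like this. As an assumption for a self-contained example, Python's standard `json`, `ast`, and `xml.etree` parsers stand in for the JSON parser, TS compiler, and HTML parser named above; YAML and SQL need third-party parsers and are left out. `validate_after_edit` and the `VALIDATORS` registry are hypothetical.

```python
import ast
import json
import xml.etree.ElementTree as ET
from pathlib import PurePath
from typing import Optional

# Hypothetical registry: file suffix -> parser that raises on a syntax error.
VALIDATORS = {
    ".json": json.loads,    # raises json.JSONDecodeError (a ValueError)
    ".py": ast.parse,       # raises SyntaxError; stand-in for the TS compiler
    ".xml": ET.fromstring,  # raises ET.ParseError (a SyntaxError subclass)
}


def validate_after_edit(path: str, new_text: str) -> Optional[str]:
    """Return None if the edited file parses, else a short error for the model."""
    parse = VALIDATORS.get(PurePath(path).suffix)
    if parse is None:
        return None  # no validator registered for this file type
    try:
        parse(new_text)
    except (SyntaxError, ValueError) as exc:
        return f"{path}: {type(exc).__name__}: {exc}"
    return None
```

Returning the error string alongside the edit result lets the model fix the mistake in the same turn, instead of discovering it on a later build or test run.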

Fuzzy edit matching

Edits tolerate whitespace drift, indentation changes, and visually‑identical characters (curly vs. straight quotes, em‑dashes). Near‑misses still land — no retry round‑trip.
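The capability comparison credits Levenshtein distance for this tolerance. A hedged sketch of how that could work: slide a window with the same number of lines as the edit's target over the file, score each window by edit distance, and accept the closest one only if the distance stays under a small fraction of the target's length. The function names and the 10% threshold are assumptions.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1 each."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def fuzzy_find_lines(text: str, needle: str,
                     max_ratio: float = 0.1) -> tuple[int, int, int] | None:
    """Find the run of lines in `text` closest to `needle` by edit distance.

    Returns (first_line, end_line_exclusive, distance) for the best window if
    its distance is at most `max_ratio * len(needle)`, else None.
    """
    lines = text.splitlines()
    n = len(needle.splitlines())
    best = None
    for start in range(len(lines) - n + 1):
        window = "\n".join(lines[start:start + n])
        d = levenshtein(window, needle)
        if best is None or d < best[2]:
            best = (start, start + n, d)
    if best is not None and best[2] <= max_ratio * len(needle):
        return best
    return None
```

The threshold is the key design knob: too loose and an edit can land on the wrong block, too tight and indentation drift still forces a retry.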

SQL dialect auto‑fix

Common mistakes get rewritten before the error reaches the model: backtick identifiers, unquoted reserved aliases, COUNT(DISTINCT a, b), date_trunc("month", col). The query runs; the model never sees the error.
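Three of those four rewrites can be sketched with regular expressions, assuming PostgreSQL as the target dialect. This is a naive illustration, not a SQL parser: it will also rewrite text inside string literals and comments, and the `RESERVED` set is a tiny hypothetical subset of PostgreSQL's reserved words. `autofix_postgres` is a hypothetical name.

```python
from __future__ import annotations

import re

# Hypothetical, tiny subset of PostgreSQL reserved words (illustration only).
RESERVED = {"order", "user", "group", "select", "table"}


def autofix_postgres(sql: str) -> str:
    """Rewrite a few common mistakes into valid PostgreSQL (regex sketch)."""
    # `col` -> "col": MySQL backtick identifiers become standard double quotes.
    sql = re.sub(r"`([^`]*)`", r'"\1"', sql)
    # date_trunc("month", col) -> date_trunc('month', col): the unit must be a
    # string literal; double quotes would make it a column identifier.
    sql = re.sub(r'(date_trunc\s*\(\s*)"([A-Za-z]+)"', r"\1'\2'", sql,
                 flags=re.IGNORECASE)

    # AS order -> AS "order": reserved words must be quoted to work as aliases.
    def quote_alias(m: re.Match) -> str:
        word = m.group(2)
        return f'{m.group(1)}"{word}"' if word.lower() in RESERVED else m.group(0)

    return re.sub(r"(\bAS\s+)([A-Za-z_]+)\b", quote_alias, sql,
                  flags=re.IGNORECASE)
```

A production version would tokenize the SQL first so literals and comments are left alone; the point here is only that each fix runs before the query executes, so no error round-trip reaches the model.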

Better error context

When an error does reach the model, we enrich it with dialect‑specific hints. Failed edits expand stubs with actual file content — real diff instead of "string not found".
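A sketch of the "real diff instead of string not found" idea using Python's standard `difflib`: find the window of file lines most similar to the text the edit expected, then report it as a unified diff. The function name and message format are hypothetical.

```python
import difflib


def explain_failed_edit(file_text: str, old: str) -> str:
    """Describe a failed edit by diffing `old` against the closest real lines.

    Slides a window of len(old) lines over the file and keeps the one with
    the highest difflib similarity ratio.
    """
    lines = file_text.splitlines()
    want = old.splitlines()
    n = max(1, len(want))
    best_start, best_ratio = 0, -1.0
    for start in range(max(1, len(lines) - n + 1)):
        window = "\n".join(lines[start:start + n])
        ratio = difflib.SequenceMatcher(None, window, old).ratio()
        if ratio > best_ratio:
            best_start, best_ratio = start, ratio
    actual = lines[best_start:best_start + n]
    diff = difflib.unified_diff(want, actual, "expected (edit)", "actual (file)",
                                lineterm="")
    header = (f"No exact match; closest lines start at line {best_start + 1} "
              f"(similarity {best_ratio:.2f}):\n")
    return header + "\n".join(diff)
```

With the diff in hand, the model can usually correct the edit on its next attempt instead of re-reading the whole file to find out what went wrong.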

Dependency graph on first search

Import index surfaces "imported by" relationships. The model lands on the right entry point without a scavenger hunt.
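An "imported by" index is a reverse map from each module to the files that import it. The sketch below builds one for TypeScript-style relative imports with a regular expression. It is a simplification under stated assumptions (no `tsconfig` path aliases, no `index.ts` resolution, `.ts` assumed when the extension is missing), and `imported_by` is a hypothetical name.

```python
from __future__ import annotations

import posixpath
import re
from collections import defaultdict

# Matches `import ... from './x'` and bare `import './x'` (either quote style).
IMPORT_RE = re.compile(
    r"""^\s*import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]""", re.MULTILINE)


def imported_by(files: dict[str, str]) -> dict[str, set[str]]:
    """Build a reverse import index: module path -> set of importing files.

    Only relative specifiers are resolved; package imports like 'react'
    are skipped because they are outside the repo's graph.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in files.items():
        for spec in IMPORT_RE.findall(text):
            if not spec.startswith("."):
                continue  # external package
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), spec))
            if not target.endswith(".ts"):
                target += ".ts"
            index[target].add(path)
    return dict(index)
```

With this map attached to a search hit, the model can see at a glance which callers depend on a file, which is often enough to pick the right entry point.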

Summarized subagent output

Subagents return compressed summaries. The main thread gets the conclusion, not the transcript.

vs. graph‑based explorers

Why symbol graphs don't cover the full session.

Tools like SDL‑MCP give the model a pre‑indexed view of your codebase so it can traverse code without dumping full files. Useful for the first hop — but that's one leg of the trip.

Covers exploration and editing
Most of a real session is editing, validating, re‑editing — a graph doesn't save tokens there. WOZCODE cuts both input tokens (smarter search, AST truncation) and output tokens (batched edits, fewer retries).
No indexing step, no server round‑trip
No pre‑indexed graph to build, no source shipped to a remote service. Runs in‑process as a Claude Code plugin. Your code never leaves your machine.
Graphs need an entry point. Prompts rarely give you one.
A symbol graph is powerful once the agent has a function name. Most prompts are high‑level ("checkout is broken") — by the time the agent finds the entry point, most of the cost has happened. WOZCODE Search is built for that first‑hop problem.
vs. output‑style tricks

Why “talk like a caveman” doesn't move the session needle.

Tools like Caveman add a system‑prompt instruction telling the model to answer tersely — no articles, no pleasantries, no unsolicited explanations. Output tokens on the model's prose drop. But prose is a small slice of a real session, and nothing else changes.

Real numbers, not cherry‑picked slices
The headline “75% savings” is measured on the discursive text the model would otherwise produce — roughly a quarter of a typical session. Independent benchmarks put the total‑session saving closer to 4–10%. WOZCODE's numbers come from live Anthropic API usage fields across the whole session, not one slice.
Tool I/O is most of the cost — terse prose doesn't touch it
Most of a coding session's cost isn't the model's narrative; it's tool I/O — search results, file reads, grep dumps, repeated edits, each round‑trip re‑ingested on the next turn. A caveman‑style system prompt tightens the model's voice but leaves every byte of that tool traffic untouched.
You still get readable prose
You don't need the model to talk like a caveman to save money. WOZCODE's savings come from not putting redundant content into the context in the first place — explanations, diff summaries, and reasoning come through as normal English when you want them.
Agent architecture

Smart delegation: the cheapest capable model for each job.

~40% of coding work is exploration — WOZCODE routes those calls to Haiku automatically.

woz:code
User's model · Opus / Sonnet

Main thread. Writes and edits code with full tool access. Stays on your chosen frontier model.

woz:explore
Haiku · ~15× cheaper than Opus

Read‑only exploration. Returns summaries to the parent — main context stays lean.

Routing off the main thread saves 70%+ on exploration calls.
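The routing rule can be sketched as a check on which tools a subtask needs: if every tool is read-only, hand it to the explore agent on Haiku; otherwise keep it on the main thread with the user's model. The tool names in `READ_ONLY_TOOLS`, the `Subtask` type, and the model labels are hypothetical placeholders; only the agent names `woz:code` and `woz:explore` come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical classification of read-only tools; the real list may differ.
READ_ONLY_TOOLS = {"Search", "Read", "Recall", "Sql.describe"}


@dataclass
class Subtask:
    description: str
    tools: set[str]


def route(task: Subtask, user_model: str) -> tuple[str, str]:
    """Pick (agent, model) for a subtask.

    Read-only exploration goes to the cheaper model and returns a summary;
    anything that may write stays on the main thread with the user's model.
    """
    if task.tools and task.tools <= READ_ONLY_TOOLS:  # subset check
        return ("woz:explore", "haiku")
    return ("woz:code", user_model)
```

Defaulting to the main thread when the tool set is empty or includes any write keeps the cheap path strictly for work that can't change files.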

WOZCODE vs. vanilla Claude Code — same session, same model, same task. Numbers come from real Anthropic API usage fields on a live session, not simulations. Run /woz-benchmark to reproduce on your codebase.

The savings come from two reinforcing sources:

1. Less content per call. Smarter search returns ranked snippets, not full‑file grep dumps. AST truncation stubs function bodies but keeps types and exports intact. SQL introspection returns just the columns the model asked about, not \d+ dumps and migration‑file pastes.

2. Fewer round‑trips. Every round‑trip re‑ingests the last one's output as input tokens, so cutting the number of calls compounds across the session. Batching a ten‑file refactor into one Edit call instead of ten saves nine entire re‑ingestions. Collapsing glob + grep + read into one Search call saves two more per query. Those savings stack — a 30‑prompt session ends up with a transcript a fraction the size of vanilla's.
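The compounding effect of round-trips can be checked with a simplified model, assuming no prompt-caching discounts: each call re-reads a fixed prompt plus every earlier call's output. The token sizes are hypothetical (a 2,000-token prompt, 500 tokens per single-file edit), each sequence ends with one follow-up turn so both runs carry the full refactor forward, and `billed_input` is a hypothetical helper.

```python
from __future__ import annotations

from itertools import accumulate


def billed_input(base: int, outputs: list[int]) -> int:
    """Total input tokens billed for a sequence of calls.

    Simplified model: call i re-reads the fixed prompt `base` plus the
    outputs of every earlier call (caching discounts ignored).
    """
    prior = [0, *accumulate(outputs)][:-1]  # context already present before each call
    return sum(base + p for p in prior)


# Ten single-file edit calls, then one follow-up turn.
ten_calls = billed_input(2_000, [500] * 10 + [0])
# One batched edit carrying the same 5,000 tokens, then one follow-up turn.
one_batched = billed_input(2_000, [5_000, 0])
```

Under these assumptions the ten-call run bills 49,500 input tokens and the batched run bills 9,000. Prompt caching bills cached re-reads at a lower rate, which shrinks the gap in dollars but does not remove the re-reads themselves.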

WOZCODE mostly doesn't compress context after the fact, and that's intentional. Trim too aggressively and you drop load‑bearing context; preserve too much and you defeat the savings. We avoid putting redundant content in, rather than compressing it afterward. Claude Code's built‑in compaction handles the long‑tail case.

Usage stats and auth checks. Nothing else.

What we send to our servers:

Aggregated session stats — tool call counts, tokens (from Anthropic's own usage fields), estimated cost, turn counts, elapsed time. This is what powers the /woz-savings dashboard.
Auth checks — when you log in, we verify your subscription status.

What we never send:

• Your source code, file contents, file paths, grep output, or any tool inputs/outputs
• Your prompts or the model's responses
• Your Anthropic API key

Every request to Anthropic goes straight from your machine through the same route vanilla Claude Code uses. WOZCODE is in the loop for tool execution — not for API transport. Our servers are a stats dashboard and an auth endpoint. That's it.

Full comparison in vs. graph‑based explorers and vs. output‑style tricks above. Short version: WOZCODE covers the whole session — not just exploration or output prose — and every savings number comes from live API usage, not a theoretical baseline.

WOZCODE works anywhere Claude Code runs: the terminal, Claude Desktop, VS Code, Cursor, and Conductor. Your tools, your model. WOZCODE changes what happens in between.

Verify it on your own codebase.

Install, then run the benchmark on any prompt sequence you choose. The cost delta comes from live API usage fields, not estimates.

/woz-benchmark