How WOZCODE works

Smarter tools.
Patented token‑reduction tech.
Runs 100% on your machine.

A Claude Code plugin that collapses context growth on every API call. 25–55% cheaper, 30–40% faster, higher TerminalBench 2.0 score.


The core insight

Context compounds. Every call re‑pays for the last one.

Every tool call feeds its output back as input tokens on the next turn. "Find and edit 3 files" isn't 3 calls — it's Find, Read, Edit, then Verify read to confirm the edit landed, repeated per file. And the last call pays input cost on everything before it.

Vanilla Claude Code

12+ calls

Find → Read → Edit → Verify read
Repeated for each of 3 files. Every call re‑ingests the last's context.

With WOZCODE

1× Search (glob+regex+read)
1× batched Edit across N files
Post‑edit validation — no verify read needed.

Fewer calls → smaller context → the savings compound across the session. Not a one‑time trim — thousands of turns that never balloon in the first place.
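To make the compounding concrete, here is a minimal sketch of the arithmetic. The token sizes are hypothetical and prompt caching is ignored; the only point is that each call's input includes everything that came before it.

```python
def cumulative_input_tokens(output_sizes, base_context=2_000):
    """Total input tokens billed across a sequence of tool calls.

    Each call's input is the base context plus the output of every
    earlier call, so the total grows roughly quadratically with call count.
    """
    total = 0
    context = base_context
    for size in output_sizes:
        total += context   # this call re-reads everything so far
        context += size    # its output joins the context for the next call
    return total

# Hypothetical sizes: 12 vanilla calls returning 1,500 tokens each,
# versus 2 batched calls returning 3,000 tokens each.
vanilla = cumulative_input_tokens([1_500] * 12)
batched = cumulative_input_tokens([3_000] * 2)
print(vanilla, batched)
```

With these made-up sizes the 12-call path bills 123,000 input tokens and the 2-call path bills 7,000, even though the batched calls return more per call.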

And when a call goes sideways, we fix it before it costs a turn. Vanilla's Edit tool rejects any edit that isn't a byte‑for‑byte match: tabs vs. spaces, an invisible trailing space, or a curly “quote” where the file has a straight "one", and you eat a retry. WOZCODE's Edit fuzzy‑matches. It tolerates whitespace differences and treats visually identical characters (curly vs. straight quotes, em‑dashes, ellipses) as equivalent during matching, so near‑misses still land. Every retry we prevent is one fewer turn on the compound curve.
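One way to picture this kind of tolerant matching is to normalize both sides before comparing. The sketch below is an illustration only, not WOZCODE's implementation; the character map and the function names `normalize` and `fuzzy_find_line` are assumptions.

```python
import re

# Assumed set of lookalike characters, mapped to one canonical form.
LOOKALIKES = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2014": "-", "\u2013": "-",   # em and en dashes
    "\u2026": "...",                # single-character ellipsis
})

def normalize(text):
    """Canonicalize lookalike characters and collapse whitespace runs."""
    text = text.translate(LOOKALIKES)
    return re.sub(r"\s+", " ", text).strip()

def fuzzy_find_line(file_text, target):
    """Index of the first line equal to target after normalization, else None."""
    wanted = normalize(target)
    for i, line in enumerate(file_text.splitlines()):
        if normalize(line) == wanted:
            return i
    return None

file_text = 'def greet():\n\treturn "hi"  \n'
# Spaces instead of a tab, curly quotes instead of straight ones.
print(fuzzy_find_line(file_text, "    return \u201chi\u201d"))
```

An exact string search would miss this target twice over (indentation and quote style); the normalized comparison finds it on line index 1.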

The three levers

Cheaper, faster, and better — with a mechanism for each.

Cheaper

25–55%

cost reduction vs Claude Code

Fewer turns → smaller context → compounding savings per call. Measured from live Anthropic API usage fields — real dollars.

Faster

30–40%

faster on most tasks · 5–10× on database work

Smaller context → every API round‑trip completes faster. And because smarter tools return the right information in a single call, there are far fewer round‑trips to run in the first place.

Better

80.2%

TerminalBench 2.0 · vs 69% Claude Code

Post‑edit syntax validation and fuzzy matching mean fewer failed edits and retry loops. Less irrelevant context means less attention loss — the model stays focused on the task instead of drowning in bloat.

Under the hood

Fewer calls, richer results.

Capability	Vanilla Claude Code	WOZCODE	Impact
File Search	Glob + Grep + Read (3+ calls)	WOZCODE Search — one call combines glob, regex, and file reading with ranked snippets	3 → 1 calls
File Reading	Read dumps full file contents	AST truncation — replaces function bodies with stubs, keeps types and exports intact	40–60% fewer tokens
Editing	Edit tool — 1 file, 1 edit per call	Edit edits[] — batch N edits across multiple files in a single call	N → 1 calls
Match Accuracy	Exact string match (fails on whitespace drift)	Fuzzy matching — Levenshtein distance tolerates indentation diffs	Fewer retries
SQL Schema	Read `.sql` files manually or paste migrations	WOZCODE Sql — AST‑based introspection, live queries, FK graph	~5 → 1 calls
Session Memory	None (context window only)	WOZCODE Recall — semantic search across all past sessions, local‑only	New capability
Bash File Ops	Raw `cat`/`grep`/`find` output dumped into context	Auto‑intercepted and redirected to structured WOZCODE Search/Edit	Compounding savings
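To illustrate the AST truncation row, here is a minimal sketch using Python's standard `ast` module on Python source. It stands in for the idea only: which languages and parsers WOZCODE actually uses is not stated here, and `stub_function_bodies` is a hypothetical name.

```python
import ast

def stub_function_bodies(source):
    """Replace every function body with `...`, keeping signatures and annotations."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.body = [ast.Expr(value=ast.Constant(value=...))]
    return ast.unparse(tree)  # requires Python 3.9+

source = (
    "def add(a: int, b: int) -> int:\n"
    "    total = a + b\n"
    "    return total\n"
)
stubbed = stub_function_bodies(source)
print(stubbed)
```

The model still sees the function name, parameters, and return type, which is usually enough to decide whether the file is worth reading in full.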

In practice

Same refactor. Two call graphs.

“Rename handleAuth across src/auth/.” Same task, vanilla vs. WOZCODE.

Vanilla Claude Code

Glob "src/**/*.ts" → 127 files

Grep "handleAuth" → 4 matches

Read src/auth/login.ts → full file

Read src/auth/session.ts → full file

Read src/auth/token.ts → full file

Edit login.ts → ok

Read login.ts → verify

Edit session.ts → fail (whitespace drift)

Edit session.ts (retry) → ok

Read session.ts → verify

Edit token.ts → ok

Read token.ts → verify

With WOZCODE

woz.Search "handleAuth" in src/auth/** → 3 files, ranked snippets

woz.Edit edits[] = [login, session, token] → 3 files patched, syntax validated
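The batched call above can be pictured as a list of edits applied in one pass. This is a sketch under assumptions: the `Edit` fields, the `apply_edits` helper, and the new name `authenticate` are all hypothetical, and real edits would be written to disk and syntax-checked.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    path: str   # file to patch
    old: str    # text to find
    new: str    # replacement text

def apply_edits(files, edits):
    """Apply all edits to a copy of `files`; raise before returning if any target is missing."""
    patched = dict(files)
    for e in edits:
        content = patched[e.path]
        if e.old not in content:
            raise ValueError(f"{e.path}: target text not found")
        patched[e.path] = content.replace(e.old, e.new)
    return patched

files = {
    "src/auth/login.ts": "export function handleAuth() {}",
    "src/auth/session.ts": "handleAuth();",
    "src/auth/token.ts": "import { handleAuth } from './login';",
}
edits = [Edit(path, "handleAuth", "authenticate") for path in files]
patched = apply_edits(files, edits)
print(patched["src/auth/session.ts"])
```

Because the edits run against a copy, a missing target in the third file leaves the first two untouched, so one failure does not produce a half-applied rename.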

Transparency

How savings are calculated.

Three numbers on your savings dashboard — here's where each one comes from.

Calls saved — counted

Each WOZCODE tool replaces a known set of vanilla Claude Code calls. We tally the difference, per call.

Cost & tokens saved — measured

Pulled straight from Anthropic's API usage fields for your live session, multiplied by calls saved, then priced at your model's posted rates — input, output, cache reads, and cache writes each at their own tier.

Time saved — estimated

Saved calls × a per‑call round‑trip time that we calibrate against our benchmark dataset. This is the only metric we estimate, and we say so plainly instead of dressing it up.
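As a rough sketch of how the three numbers could combine: the field names below are the ones Anthropic's API reports in its usage object, but the prices are placeholders rather than real rates, and the helper names are hypothetical.

```python
# Placeholder prices in dollars per million tokens. NOT real rates.
PLACEHOLDER_RATES = {
    "input_tokens": 2.0,
    "output_tokens": 10.0,
    "cache_creation_input_tokens": 2.5,   # cache writes
    "cache_read_input_tokens": 0.2,       # cache reads
}

def cost_usd(usage, rates=PLACEHOLDER_RATES):
    """Price one usage record, each token type at its own rate."""
    return sum(usage.get(field, 0) * rate / 1_000_000
               for field, rate in rates.items())

def cost_saved(avg_usage_per_call, calls_saved, rates=PLACEHOLDER_RATES):
    """Measured per-call cost multiplied by the number of calls avoided."""
    return cost_usd(avg_usage_per_call, rates) * calls_saved

def time_saved_seconds(calls_saved, seconds_per_call):
    """The one estimated metric: saved calls times a calibrated round-trip time."""
    return calls_saved * seconds_per_call

usage = {"input_tokens": 1_000_000, "output_tokens": 100_000,
         "cache_read_input_tokens": 500_000}
print(round(cost_usd(usage), 2), time_saved_seconds(10, 2.5))
```

Keeping a separate rate per token type matters because cache reads are typically priced far below fresh input, so lumping them together would overstate the saving.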

Run /woz-savings to see your current savings.

Quality loop

Fewer errors = fewer turns = less spend.

✓ Post‑edit syntax validation

TS compiler, JSON/YAML/HTML parsers, SQL linter run after every edit. Errors caught before the next turn.
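A minimal sketch of the idea, using only Python's standard library: parse the file right after the edit and report any error in the same turn. JSON and Python parsing stand in for the TS, YAML, HTML, and SQL checks listed above, and `validate_after_edit` is a hypothetical name.

```python
import ast
import json

def validate_after_edit(path, content):
    """Return an error message if the edited content no longer parses, else None."""
    try:
        if path.endswith(".json"):
            json.loads(content)
        elif path.endswith(".py"):
            ast.parse(content)
        # Other file types are left unchecked in this sketch.
    except (json.JSONDecodeError, SyntaxError) as exc:
        return f"{path}: {exc}"
    return None

print(validate_after_edit("config.json", '{"retries": 3,}'))
```

The trailing comma is reported immediately, so the model can fix it in the same turn instead of discovering it later through a failing build.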

✓ Fuzzy edit matching

Edits tolerate whitespace drift, indentation changes, and visually‑identical characters (curly vs. straight quotes, em‑dashes). Near‑misses still land — no retry round‑trip.

✓ SQL dialect auto‑fix

Common mistakes get rewritten before the error reaches the model: backtick identifiers, unquoted reserved aliases, COUNT(DISTINCT a, b), date_trunc("month", col). The query runs; the model never sees the error.
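Two of these rewrites can be sketched with regular expressions, assuming a PostgreSQL target. This is an illustration only: a regex pass like this would also touch backticks inside string literals, which a real fixer working on a parsed query would avoid. The function name `autofix_postgres` is hypothetical.

```python
import re

def autofix_postgres(sql):
    """Rewrite two common dialect slips into PostgreSQL-friendly SQL."""
    # MySQL-style `identifier` becomes a standard double-quoted identifier.
    sql = re.sub(r"`([^`]+)`", r'"\1"', sql)
    # date_trunc("month", col): the unit must be a string literal,
    # and double quotes would make it an identifier.
    sql = re.sub(r'(?i)\b(date_trunc\s*\(\s*)"(\w+)"', r"\1'\2'", sql)
    return sql

query = 'SELECT date_trunc("month", `created_at`) FROM `orders`'
print(autofix_postgres(query))
```

The output is `SELECT date_trunc('month', "created_at") FROM "orders"`, which runs as intended without the model ever seeing a syntax error.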

✓ Better error context

When an error does reach the model, we enrich it with dialect‑specific hints. When an edit fails, stubs are expanded into the actual file content, so the model sees a real diff instead of "string not found".

✓ Dependency graph on first search

Import index surfaces "imported by" relationships. The model lands on the right entry point without a scavenger hunt.
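An "imported by" index is a reverse map from each module to the files that import it. The sketch below builds one for Python sources with the standard `ast` module; it is an analogy for the idea, and the file names and the `imported_by` helper are hypothetical.

```python
import ast
from collections import defaultdict

def imported_by(sources):
    """Map each imported module name to the sorted list of files that import it."""
    index = defaultdict(set)
    for path, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    index[alias.name].add(path)
            elif isinstance(node, ast.ImportFrom) and node.module:
                index[node.module].add(path)
    return {module: sorted(paths) for module, paths in index.items()}

sources = {
    "login.py": "from auth import token\nimport session\n",
    "api.py": "import session\n",
}
index = imported_by(sources)
print(index["session"])
```

Looking up `session` shows both callers at once, which is the information an agent needs to pick an entry point without opening every file.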

✓ Summarized subagent output

Subagents return compressed summaries. The main thread gets the conclusion, not the transcript.

vs. graph‑based explorers

Why symbol graphs don't cover the full session.

Tools like SDL‑MCP give the model a pre‑indexed view of your codebase so it can traverse code without dumping full files. Useful for the first hop — but that's one leg of the trip.

Covers exploration and editing

Most of a real session is editing, validating, re‑editing — a graph doesn't save tokens there. WOZCODE cuts both input tokens (smarter search, AST truncation) and output tokens (batched edits, fewer retries).

No indexing step, no server round‑trip

No pre‑indexed graph to build, no source shipped to a remote service. Runs in‑process as a Claude Code plugin. Your code never leaves your machine.

Graphs need an entry point. Prompts rarely give you one.

A symbol graph is powerful once the agent has a function name. Most prompts are high‑level ("checkout is broken") — by the time the agent finds the entry point, most of the cost has happened. WOZCODE Search is built for that first‑hop problem.

vs. output‑style tricks

Why “talk like a caveman” doesn't move the session needle.

Tools like Caveman add a system‑prompt instruction telling the model to answer tersely — no articles, no pleasantries, no unsolicited explanations. Output tokens on the model's prose drop. But prose is a small slice of a real session, and nothing else changes.

Real numbers, not cherry‑picked slices

The headline “75% savings” is measured on the discursive text the model would otherwise produce — roughly a quarter of a typical session. Independent benchmarks put the total‑session saving closer to 4–10%. WOZCODE's numbers come from live Anthropic API usage fields across the whole session, not one slice.

Tool I/O is most of the cost — terse prose doesn't touch it

Most of a coding session's cost isn't the model's narrative; it's tool I/O — search results, file reads, grep dumps, repeated edits, each round‑trip re‑ingested on the next turn. A caveman‑style system prompt tightens the model's voice but leaves every byte of that tool traffic untouched.

You still get readable prose

You don't need the model to talk like a caveman to save money. WOZCODE's savings come from not putting redundant content into the context in the first place — explanations, diff summaries, and reasoning come through as normal English when you want them.

Agent architecture

Smart delegation: the cheapest capable model for each job.

~40% of coding work is exploration — WOZCODE routes those calls to Haiku automatically.

woz:code

User's model · Opus / Sonnet

Main thread. Writes and edits code with full tool access. Stays on your chosen frontier model.

woz:explore

Haiku · ~15× cheaper than Opus

Read‑only exploration. Returns summaries to the parent — main context stays lean.

Routing off the main thread saves 70%+ on exploration calls.
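For a rough sense of what that means at the session level, here is a back-of-envelope sketch. It assumes, as a simplification, that exploration's share of the work equals its share of the spend; both inputs are the approximate figures quoted above, and `session_saving` is a hypothetical helper.

```python
def session_saving(exploration_share, saving_on_exploration):
    """Fraction of total session cost saved when only exploration gets cheaper."""
    return exploration_share * saving_on_exploration

# ~40% of work is exploration, ~70% saved on those calls.
estimate = session_saving(0.40, 0.70)
print(f"{estimate:.0%}")
```

Under that assumption routing alone would account for roughly 28% of session cost, before counting any savings from fewer or smaller calls.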

FAQ

The questions we get most often.

What is the baseline for the "cheaper," "faster," and "better" numbers?

WOZCODE vs. vanilla Claude Code — same session, same model, same task. Numbers come from real Anthropic API usage fields on a live session, not simulations. Run /woz benchmark to reproduce on your codebase.

Where do the token savings actually come from?

Two reinforcing sources:

1. Less content per call. Smarter search returns ranked snippets, not full‑file grep dumps. AST truncation stubs function bodies but keeps types and exports intact. SQL introspection returns just the columns the model asked about, not \d+ dumps and migration‑file pastes.

2. Fewer round‑trips. Every round‑trip re‑ingests the last one's output as input tokens, so cutting the number of calls compounds across the session. Batching a ten‑file refactor into one Edit call instead of ten saves nine entire re‑ingestions. Collapsing glob + grep + read into one Search call saves two more per query. Those savings stack — a 30‑prompt session ends up with a transcript a fraction the size of vanilla's.

How does WOZCODE handle context compression mid‑task?

Mostly it doesn't, and that's intentional. Trim too aggressively and you drop load‑bearing context; preserve too much and you defeat the savings. We avoid putting redundant content in, rather than compressing after. Claude Code's built‑in compaction handles the long‑tail case.

What telemetry do you actually collect?

Usage stats and auth checks. Nothing else.

What we send to our servers:

• Aggregated session stats — tool call counts, tokens (from Anthropic's own usage fields), estimated cost, turn counts, elapsed time. This is what powers the /woz-savings dashboard.
• Auth checks — when you log in, we verify your subscription status.

What we never send:

• Your source code, file contents, file paths, grep output, or any tool inputs/outputs
• Your prompts or the model's responses
• Your Anthropic API key

Every request to Anthropic goes straight from your machine through the same route vanilla Claude Code uses. WOZCODE is in the loop for tool execution — not for API transport. Our servers are a stats dashboard and an auth endpoint. That's it.

How does this compare to other token‑saving tools (SDL‑MCP, Caveman, etc.)?

See the "vs. graph‑based explorers" and "vs. output‑style tricks" sections above for the full comparison. Short version: WOZCODE covers the whole session, not just exploration or output prose, and every savings number comes from live API usage, not a theoretical baseline.

Does WOZCODE work with my IDE / my existing workflow?

Yes. Anywhere Claude Code runs — terminal, Claude Desktop, VS Code, Cursor, Conductor. Your tools, your model. WOZCODE changes what happens in between.

Verify it on your own codebase.

Install, then run the benchmark on any prompt sequence you choose. You get the real cost delta from live API usage fields, with no estimates.

/woz benchmark

