How WOZCODE works

Smarter tools.
Patented token‑reduction tech.
Runs 100% on your machine.

A Claude Code plugin that collapses context growth on every API call. 25–55% cheaper, 5–10× faster on DB tasks, higher TerminalBench 2.0 score. Runs 100% locally.

Install WOZCODE See the mechanics
Local plugin. No indexing, no code upload. Anthropic sees what vanilla Claude Code sends; we see none of it.
The core insight

Context compounds. Every call re‑pays for the last one.

Every tool call feeds its output back as input tokens on the next turn. "Find and edit 3 files" isn't 3 calls — it's Find, Read, Edit, then Verify read to confirm the edit landed, repeated per file. And the last call pays input cost on everything before it.

Vanilla Claude Code
12+
Find → Read → Edit → Verify read
Repeated for each of 3 files. Every call re‑ingests the full context of the calls before it.
With WOZCODE
2
1× Search (glob+regex+read)
1× batched Edit across N files

Fewer calls → smaller context → the savings compound across the session. Not a one‑time trim — thousands of turns that never balloon in the first place.
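The batching idea can be sketched in a few lines. The `edits` payload shape and `apply_edits` helper below are illustrative, not WOZCODE's actual wire format:

```python
# Sketch of batched editing (assumed payload shape, not WOZCODE's real API):
# one tool call carries a list of edits spanning several files, so the model
# pays one round-trip of context instead of one per file.

def apply_edits(files, edits):
    """Apply every edit in a single pass; files is a {path: text} dict."""
    for e in edits:
        src = files[e["file"]]
        if e["old"] not in src:
            raise ValueError(f"match failed in {e['file']}")
        files[e["file"]] = src.replace(e["old"], e["new"], 1)
    return files

files = {
    "a.py": "def f():\n    return legacy()\n",
    "b.py": "def g():\n    return legacy()\n",
}
edits = [
    {"file": "a.py", "old": "legacy()", "new": "modern()"},
    {"file": "b.py", "old": "legacy()", "new": "modern()"},
]
apply_edits(files, edits)  # 1 call instead of 2 x (Read + Edit + Verify read)
```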

The three levers

Cheaper, faster, and better — with a mechanism for each.

Cheaper
25–55%
cost reduction vs Claude Code
Fewer turns → smaller context → compounding savings per call. Measured from live Anthropic API usage fields — real dollars.
Faster
5–10×
on database tasks · 30–40% faster on most tasks
Smaller context → every API round‑trip completes faster. And because smarter tools return the right information in a single call, there are far fewer round‑trips to run in the first place.
Better
80.2%
TerminalBench 2.0 · vs 69% for vanilla Claude Code
Post‑edit syntax validation and fuzzy matching mean fewer failed edits and retry loops. Less irrelevant context means less attention loss — the model stays focused on the task instead of drowning in bloat.
Under the hood

Fewer calls. Richer results. Per‑tool.

| Capability | Vanilla Claude Code | WOZCODE | Impact |
| --- | --- | --- | --- |
| File Search | Glob + Grep + Read (3+ calls) | WOZCODE Search — one call combines glob, regex, and file reading with ranked snippets | 3 → 1 calls |
| File Reading | Read dumps full file contents | AST truncation — replaces function bodies with stubs, keeps types and exports intact | 40–60% fewer tokens |
| Editing | Edit tool — 1 file, 1 edit per call | Edit edits[] — batch N edits across multiple files in a single call | N → 1 calls |
| Match Accuracy | Exact string match (fails on whitespace drift) | Fuzzy matching — Levenshtein distance tolerates indentation diffs | Fewer retries |
| SQL Schema | Read .sql files manually or paste migrations | WOZCODE Sql — AST‑based introspection, live queries, FK graph | ~5 → 1 calls |
| Session Memory | None (context window only) | WOZCODE Recall — semantic search across all past sessions, local‑only | New capability |
| Bash File Ops | Raw cat/grep/find output dumped into context | Auto‑intercepted — redirected to structured WOZCODE Search/Edit | Compounding savings |
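The AST-truncation row is the easiest to picture. Here is a minimal Python sketch of the general idea — keep signatures, types, and module-level names, drop function bodies. It is a stand-in only: WOZCODE's own implementation uses the TS compiler and other language parsers.

```python
import ast

SOURCE = '''
def checkout(cart: list, user: str) -> float:
    total = sum(item["price"] for item in cart)
    charge(user, total)
    return total

API_VERSION = "2"
'''

class BodyStubber(ast.NodeTransformer):
    """Replace each function body with '...', preserving the signature."""
    def visit_FunctionDef(self, node):
        node.body = [ast.Expr(value=ast.Constant(value=...))]
        return node

tree = ast.fix_missing_locations(BodyStubber().visit(ast.parse(SOURCE)))
print(ast.unparse(tree))
```

The model still sees `def checkout(cart: list, user: str) -> float: ...` and `API_VERSION = '2'`, which is what it needs to navigate, at a fraction of the tokens of the full body.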
Quality loop

Fewer errors = fewer turns = less spend.

Post‑edit syntax validation

TS compiler, JSON/YAML/HTML parsers, SQL linter run after every edit. Errors caught before the next turn.
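A minimal sketch of that loop, using Python's `ast` and `json` parsers as stand-ins for the validators listed above; the `validate` helper is illustrative:

```python
# Post-edit validation sketch: parse the edited file before the result is
# handed back, so a syntax error is caught now instead of costing a turn later.
import ast, json

def validate(path, text):
    try:
        if path.endswith(".py"):
            ast.parse(text)
        elif path.endswith(".json"):
            json.loads(text)
        return None                      # clean: edit reported as applied
    except (SyntaxError, json.JSONDecodeError) as err:
        return f"{path}: {err}"          # surfaced immediately, not next turn

assert validate("ok.json", '{"a": 1}') is None
error = validate("bad.py", "def f(:\n    pass")  # broken signature is flagged
```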

Dependency graph on first search

Import index surfaces "imported by" relationships. The model lands on the right entry point without a scavenger hunt.
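An "imported by" index can be sketched with a standard AST walk over the codebase; the shape below is illustrative, not WOZCODE's implementation:

```python
import ast

def imported_by(files):
    """Invert imports: {module: [files that import it]}; files is {path: source}."""
    index = {}
    for path, src in files.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module] if node.module else []
            else:
                continue
            for name in names:
                index.setdefault(name, []).append(path)
    return index

index = imported_by({
    "api.py":  "from billing import charge\n",
    "jobs.py": "import billing\n",
})
# index["billing"] lists both importers, so "where is billing used?" is one lookup
```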

Better error context

Failed edits expand stubs with actual file content — real diff instead of "string not found".

Summarized subagent output

Subagents return compressed summaries. The main thread gets the conclusion, not the transcript.

vs. other token‑saving tools

Why graph‑based explorers don't cover the full session.

Tools like SDL‑MCP and Caveman help the model traverse code without dumping full files. That's one leg of the trip.

Real numbers, not theoretical baselines
The 91% figure some graph tools quote is against a worst‑case "read every file" baseline, not what Claude Code actually does. Our numbers come from live Anthropic API usage fields. Run /woz-benchmark to verify on your own codebase.
Covers exploration and editing
Most of a real session is editing, validating, re‑editing — a graph doesn't save tokens there. WOZCODE cuts both input tokens (smarter search, AST truncation) and output tokens (batched edits, fewer retries).
No indexing step, no server round‑trip
No pre‑indexed graph to build, no source shipped to a remote service. Runs in‑process as a Claude Code plugin. Your code never leaves your machine.
Graphs need an entry point. Prompts rarely give you one.
A symbol graph is powerful once the agent has a function name. Most prompts are high‑level ("checkout is broken") — by the time the agent finds the entry point, most of the cost has happened. WOZCODE Search is built for that first‑hop problem.
We ride on built‑in compaction, we don't replace it
WOZCODE keeps redundant content out of the context in the first place; Claude Code's built‑in compaction still handles the long‑tail case.
Agent architecture

Smart delegation: the cheapest capable model for each job.

~40% of coding work is exploration and planning — WOZCODE routes those calls to smaller models automatically.

woz:code
User's model · Opus / Sonnet

Main thread. Writes and edits code with full tool access. Stays on your chosen frontier model.

woz:explore
Haiku · ~15× cheaper than Opus

Read‑only exploration. Returns summaries to the parent — main context stays lean.

woz:plan
Sonnet · auto‑escalates to Opus

Architecture & planning. Runs on Sonnet by default and upgrades to Opus when the problem calls for it — so you only pay for the bigger model when it actually matters.

Routing off the main thread saves 70%+ on exploration and planning calls.
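The delegation above reduces to a small dispatch table. The tier names mirror the woz:* agents; the model IDs and escalation flag below are placeholders for illustration, not WOZCODE's actual routing logic:

```python
# Hypothetical cost-aware router: read-only exploration goes to the cheapest
# capable model; code-writing stays on the user's chosen frontier model.
TIERS = {
    "explore": "claude-haiku",   # read-only traversal and summarization
    "plan":    "claude-sonnet",  # architecture and planning by default
    "code":    "claude-opus",    # main thread: writes and edits code
}

def route(task_kind, needs_escalation=False):
    if task_kind == "plan" and needs_escalation:
        return TIERS["code"]     # woz:plan upgrades only when the problem demands it
    return TIERS.get(task_kind, TIERS["code"])

assert route("explore") == "claude-haiku"
```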

FAQ

The questions we get most often.

How are the savings measured?

WOZCODE vs. vanilla Claude Code — same session, same model, same task. Numbers come from real Anthropic API usage fields on a live session, not simulations. Run /woz-benchmark to reproduce on your codebase.
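The mechanics of that measurement are simple to sketch: each Anthropic Messages API response carries a `usage` object with `input_tokens` and `output_tokens`, and summing them at your model's per-token rates gives dollars. The token counts and rates below are illustrative, not real prices or benchmark data:

```python
def session_cost(usages, usd_per_mtok_in, usd_per_mtok_out):
    """Dollar cost of a session from per-call usage records."""
    tokens_in = sum(u["input_tokens"] for u in usages)
    tokens_out = sum(u["output_tokens"] for u in usages)
    return (tokens_in * usd_per_mtok_in + tokens_out * usd_per_mtok_out) / 1_000_000

# Illustrative session: vanilla re-ingests a growing context on each of 12
# calls; the batched session does the same work in 2 calls with lean contexts.
vanilla = [{"input_tokens": 10_000 * n, "output_tokens": 2_000} for n in range(1, 13)]
wozcode = [{"input_tokens": 15_000, "output_tokens": 6_000},
           {"input_tokens": 25_000, "output_tokens": 8_000}]

cost_vanilla = session_cost(vanilla, 3, 15)  # placeholder $/Mtok rates
cost_woz = session_cost(wozcode, 3, 15)
```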

Where do the savings come from?

Mostly from what never enters the context in the first place:

Smarter search — ranked snippets instead of full‑file grep dumps.
Batched editing — ten edits across ten files in one call instead of ten round‑trips.
SQL introspection — targeted schema queries instead of \d+ dumps and migration‑file pastes.
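The SQL item can be sketched with SQLite's pragma metadata as a stand-in for WOZCODE Sql's introspection — one targeted query answers "what are the foreign keys?" instead of pasting migration files into context (the `fk_graph` helper is illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id)
    );
""")

def fk_graph(conn):
    """{table: [(column, referenced_table)]} from schema metadata, not file dumps."""
    graph = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for t in tables:
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f"PRAGMA foreign_key_list({t})"):
            graph.setdefault(t, []).append((fk[3], fk[2]))
    return graph

# fk_graph(db) -> {"orders": [("user_id", "users")]}
```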

Does WOZCODE compress or summarize my context?

Mostly it doesn't, and that's intentional. Trim too aggressively and you drop load‑bearing context; preserve too much and you defeat the savings. We avoid putting redundant content in, rather than compressing after. Claude Code's built‑in compaction handles the long‑tail case.

Does my code leave my machine?

No. The plugin runs locally, in‑process with your Claude Code session — no indexing, no upload, no WOZCODE server in the request path. The only data reaching our servers is anonymized usage counts that power the savings dashboard, and auth checks.

How is this different from graph‑based token‑saving tools?

Graphs only help once the agent has an entry point, and they only cover exploration. Editing, validation, and retries are where most session tokens go. WOZCODE covers both halves — no indexing, no source leaves your machine, savings measured from real API usage, not theoretical baselines.

Does it work with my existing setup?

Yes. Anywhere Claude Code runs — terminal, Claude Desktop, VS Code, Cursor, Conductor. Your tools, your model. WOZCODE changes what happens in between.

Verify it on your own codebase.

Install, then run the benchmark on any prompt sequence you choose. Real cost delta from live API usage fields — no estimates.

/woz-benchmark
Install WOZCODE See pricing