Your AI assistant doesn't need to read the whole git diff.

thlibo sits between Claude Code / Codex and the shell. It rewrites git diff, npm install, cargo test and other verbose commands so the model sees the summary, not the noise. Deterministic Python filters handle the usual suspects; a local Gemma 4 model compresses everything else. Nothing leaves your machine.

macos / linux

$ curl -fsSL https://raw.githubusercontent.com/3rg0n/thlibo/main/scripts/install.sh | bash

windows · powershell

PS> irm https://raw.githubusercontent.com/3rg0n/thlibo/main/scripts/install.ps1 | iex

v 0.1.0 · binary ~4 MB · model 5.1 GB (opt-in) · hf tokens none

§ i

What comes back is smaller.

measured 2026-05-13

git diff HEAD~5 via git-filter · python

68,821 B → 929 B −98.7%

npm list (200 deps) via npm-filter · python

9,228 B → 52 B −99.4%

cargo test (20 failures) via cargo-filter · python

13,227 B → 1,434 B −89.2%

Same inputs, reproducible via go test ./internal/middleware/... -run TokenSavings. Raw bytes measured before the thlibo exec wrapper touches them; compressed bytes after the Python filter returns.
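
The percentages are plain arithmetic on the byte counts above. A quick check, assuming nothing beyond the quoted figures (the helper name reduction_pct is ours, not part of thlibo):

```python
def reduction_pct(raw_bytes: int, compressed_bytes: int) -> float:
    """Percentage of bytes removed, relative to the raw size."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return 100.0 * (raw_bytes - compressed_bytes) / raw_bytes


# Byte counts copied from the measurements above.
measurements = {
    "git diff HEAD~5": (68_821, 929),
    "npm list (200 deps)": (9_228, 52),
    "cargo test (20 failures)": (13_227, 1_434),
}

for name, (raw, small) in measurements.items():
    print(f"{name}: {raw} B -> {small} B, {reduction_pct(raw, small):.2f}% removed")
```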

§ ii

How it works, in three moves.

architecture

i.

A hook, not a proxy.

thlibo installs a Claude Code PreToolUse hook. Before the Bash tool runs, the hook asks thlibo rewrite whether to wrap the command. If so, it swaps the command for a wrapped one: git status becomes thlibo exec -- git status.

Codex uses the symmetric PostToolUse path: the hook returns decision:block, with the compressed result as the reason. Same engine underneath.
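
A rough sketch of the wrap-or-not decision, under loud assumptions: the allow-list is hypothetical, and how the hook actually exchanges data with Claude Code or Codex is not shown. Only the thlibo exec -- prefix comes from the text above.

```python
import shlex
from typing import Optional

# Hypothetical allow-list; the real set of wrapped commands is not specified here.
VERBOSE_COMMANDS = {
    ("git", "diff"),
    ("git", "status"),
    ("npm", "install"),
    ("npm", "list"),
    ("cargo", "test"),
}

WRAPPER_PREFIX = "thlibo exec -- "
SHELL_METACHARS = "|;&<>$`"


def rewrite(command: str) -> Optional[str]:
    """Return the wrapped command, or None to leave the command untouched."""
    if command.startswith(WRAPPER_PREFIX):
        return None  # already wrapped; avoid double wrapping
    if any(ch in command for ch in SHELL_METACHARS):
        return None  # this sketch leaves pipelines and compound commands alone
    try:
        argv = shlex.split(command)
    except ValueError:
        return None  # unbalanced quotes: let the shell report it
    if len(argv) >= 2 and (argv[0], argv[1]) in VERBOSE_COMMANDS:
        return WRAPPER_PREFIX + command
    return None


print(rewrite("git status"))
print(rewrite("ls -la"))
```

Returning None means "run it as written", so a command the sketch does not recognise reaches the shell unchanged.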

ii.

Python filters, where a regex will do.

git-filter, npm-filter, cargo-filter. Short Python scripts with tight regex rules. They collapse diff hunks, strip dep-tree chatter, keep the failing tests. Nothing a model could do better, faster, or more reliably.

Add your own in ~/.thlibo/processors/<name>/: a processor.yaml descriptor and a script. thlibo picks it up on the next invocation.
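
To give a feel for what such a script might contain, here is a minimal sketch in the spirit of cargo-filter. It is not thlibo's actual filter, and the processor.yaml schema and the script's input/output contract are not described here, so only the text-to-text step is shown. It keeps failed tests and the summary line, and drops everything else (including panic details a real filter would likely keep).

```python
import re

# Lines worth keeping from `cargo test` output: failed tests and the summary.
FAILED_TEST = re.compile(r"^test .+ \.\.\. FAILED$")
SUMMARY = re.compile(r"^test result: ")


def compress(text: str) -> str:
    """Keep failed-test lines and summary lines, in their original order."""
    kept = [
        line
        for line in text.splitlines()
        if FAILED_TEST.match(line) or SUMMARY.match(line)
    ]
    return "\n".join(kept)


sample = """running 3 tests
test parser::reads_header ... ok
test parser::rejects_empty ... FAILED
test lexer::skips_comments ... ok

test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
"""

print(compress(sample))
```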

iii.

A local model, for the rest.

When no deterministic filter matches, thlibo asks a local Gemma 4 E4B — quantised, ~5 GB on disk, CPU-runnable via llamafile — which processor (if any) to route to. Prompt processors summarise stack traces, error logs, and arbitrary output.

One warm model per daemon. Single-instance lock. Unix socket or named pipe, ACL'd to your user.

If any step fails — daemon down, model missing, script crashes — the original bytes pass through unchanged. The model sees what it would have seen without thlibo. The compressor never breaks the client.
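
The fail-open rule can be sketched as a small wrapper. The function name and the choice to treat an empty result as a failure are assumptions of this sketch, not documented thlibo behaviour.

```python
from typing import Callable


def compress_or_passthrough(raw: bytes, compressor: Callable[[bytes], bytes]) -> bytes:
    """Return the compressed output, or the original bytes if anything goes wrong."""
    try:
        result = compressor(raw)
    except Exception:
        # Daemon down, model missing, script crashed: hand back the original.
        return raw
    # Assumption of this sketch: a non-bytes or empty result also counts as failure.
    if not isinstance(result, bytes) or not result:
        return raw
    return result


def broken_filter(data: bytes) -> bytes:
    raise RuntimeError("filter crashed")


print(compress_or_passthrough(b"full output", broken_filter))
print(compress_or_passthrough(b"full output", lambda data: b"summary"))
```

The caller never sees the exception, so a broken compressor degrades to the uncompressed behaviour rather than to an error.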

§ iii

It's not sending your diffs anywhere.

local-first

thlibo runs as your user. The daemon binds a Unix domain socket (or Windows named pipe) with an ACL that grants only your SID access. Nothing listens on the network.
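
On Unix, an owner-only socket can be sketched with the standard library alone. This is an illustration under assumptions, not the daemon's code: it binds in a throwaway temp directory rather than a real thlibo path, and the Windows named-pipe ACL is not covered.

```python
import os
import socket
import stat
import tempfile


def bind_private_socket(path: str) -> socket.socket:
    """Bind a Unix domain socket that only the owning user can open."""
    if os.path.exists(path):
        os.unlink(path)  # clear a stale socket file from a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old_umask = os.umask(0o177)  # the socket file gets at most mode 0600
    try:
        server.bind(path)
    finally:
        os.umask(old_umask)  # restore the process umask either way
    server.listen(1)
    return server


# Demo in a temporary directory (hypothetical location, for illustration only).
sock_dir = tempfile.mkdtemp(prefix="thlibo-demo-")
sock_path = os.path.join(sock_dir, "infer.sock")
server = bind_private_socket(sock_path)
mode = stat.S_IMODE(os.stat(sock_path).st_mode)
print(oct(mode))

server.close()
os.unlink(sock_path)
os.rmdir(sock_dir)
```

Setting the umask before bind, rather than calling chmod afterwards, avoids a window in which the socket exists with looser permissions.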

No telemetry. No opt-in-to-ship-anonymised-usage-data. No HuggingFace token required — the model pull is a plain anonymous HTTPS GET against a public Apache-licensed repack, SHA-256 pinned at build time.
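
Checking a download against a pinned SHA-256 takes a few lines. A minimal sketch, assuming the pin is a hex string; the function names are ours, and the demo hashes a three-byte temp file rather than the real model.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte model never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: str, pinned_hex: str) -> bool:
    """True only if the file's SHA-256 equals the pinned digest."""
    return hmac.compare_digest(sha256_of_file(path), pinned_hex.lower())


# Demo: the SHA-256 of the bytes b"abc" is a well-known test vector.
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name

ok = matches_pin(demo_path, ABC_SHA256)
bad = matches_pin(demo_path, "0" * 64)
os.unlink(demo_path)
print(ok, bad)
```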

If your repo is covered by a clause against uploading source, thlibo does not violate it. If your machine is air-gapped, it works offline.

transport  unix socket / named pipe · never tcp*
bind addr  127.0.0.1:47320 · loopback-only fallback
acl        owner SID · mode 0660 group thlibo-users
telemetry  none
model      unsloth/gemma-4-E4B-it · Apache-2.0
install    per-user · no sudo, no admin

# IPC endpoints, by platform
linux   /run/thlibo/infer.sock
darwin  /var/run/thlibo/infer.sock
windows \\.\pipe\thlibo-infer

# permissions
infer 0660 group thlibo-users
admin 0600 owner only

# single-instance lock
/run/thlibo/thlibod.lock

# model, on disk
~/.thlibo/models/gemma-4-e4b-ud-q4-k-xl.gguf
5,126,304,928 bytes · sha256 pinned
