Communication-theoretic reliability for LLM agents.
AgentCodec wraps every LLM call in a technique drawn from classical communication theory — HARQ, diversity combining, FEC, turbo decoding, and the cost-aware SemKNN router. 28 reliability techniques — 21 communication-theoretic methods across 6 families plus 7 prior-method baselines — plus an uncoded baseline (29 dispatchable in all) and 3 adaptive routers on top, behind a one-line drop-in for the OpenAI, Anthropic, and Ollama SDKs.
Reliability you can budget for.
On our paper benchmark — with one specific lineup (Nemotron and Devstral as generators, GLM-5.1 as judge) — routing the technique adaptively per prompt reached ~56% cost reduction at matched quality, or ~7% quality improvement at matched cost, versus the best fixed method we compared against at that lineup. A single knob, λ, slides the operating point along the cost/quality frontier. The qualitative pattern (adaptive beats fixed) should generalize; absolute numbers are lineup-specific.

Reliability without a rewrite.
If you already have working code against the OpenAI, Anthropic, or Ollama SDK, the fastest path to reliability is one import swap. messages, tools, stream=True, async/sync, and the full native response shape all keep working.
- from openai import OpenAI
+ from agentcodec.openai import OpenAI
client = OpenAI(api_key=KEY, reliability="harq_ir") # one kwarg
resp = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What is QUIC?"}],
)
print(resp.choices[0].message.content) # native OpenAI shape
print(resp.reliability.technique_used) # reliability escape hatch
print(resp.reliability.cost_usd)6 families. 21 techniques.
21 communication-theoretic methods span these 6 families, each a coding scheme over the noisy channel that is an LLM. They generalize self-consistency, self-refine, and chain-of-verification — and outperform them on benchmark.
Diversity Combining
N parallel branches across different models (spatial), prompt variants (frequency), or temperatures (time), combined by maximal-ratio, selection, or equal-gain. Generalizes self-consistency, best-of-N, mixture-of-agents.
Hybrid ARQ
Retry until quality clears the threshold. HARQ-IR adds new critic hints each round; HARQ-CC soft-combines every attempt into one quality-weighted answer. Generalizes self-refine and Reflexion.
Turbo Decoding
Generator drafts, critic returns structured extrinsic information, generator re-drafts on that feedback until convergence. Generalizes generator–critic and self-critique loops.
Fountain
Keep generating samples until the judge is satisfied. Easy prompts finish fast; hard prompts get more samples automatically. Generalizes universal self-consistency and dynamic-N voting.
Forward Error Correction
A systematic block code: a primary answer plus parity sections (reasoning, verification, alternatives). The decoder cross-checks for consistency and reconstructs the corrected answer. Generalizes chain-of-verification.
ACM + SemKNN Routing
Route per-prompt by estimated difficulty. The SemKNN router learned which technique wins for which kind of prompt; a single knob λ trades quality for cost along the frontier.
7 prior-method baselines bring the reliability set to 28; with an uncoded single-pass baseline, that's 29 dispatchable techniques in all — benchmark them head-to-head:
SemKNN — cost-aware, per-prompt.
SemKNN learned from benchmark data which technique works best for which kind of prompt. At inference it encodes your prompt locally with a small BGE model and sends only the resulting unit-norm embedding to the backend — your prompt never leaves the client — and gets back a technique recommendation.
One knob, the whole frontier
- λ = 0Pure quality — ignore cost entirely.
- λ = 1Balanced operating point.
- λ = 5~10% cheaper picks at near-matched quality.
- λ = 10~30% cheaper.
- λ = 20~45% cheaper.
Privacy & data flow
- Prompt text is encoded locally; only a 384-d unit-norm embedding leaves the client.
- Sent alongside a scalar λ and a small fingerprint of your model lineup — nothing else.
- All other routers (fixed, acm_table, acm_linear) run fully locally, no backend at all.
- Hosted SemKNN access for research, or a self-hosted backend under commercial license.
Every dollar is labeled.
AgentCodec never hands you a cost number without telling you how it got there. Each estimate carries a tier from a fixed enum — from exact per-token rates you configured down to a parameter-count guess from the model name — so you always know how loose the accounting is.
Tiered estimates
exact_user_rate → openrouter_rate → fuzzy → table → inferred. Lower tier = tighter accounting. The worst tier across all calls is surfaced on the result.
Full per-run trace
return_trace=True gives input/output/thinking tokens, per-call cost, judge cost, number of LLM calls, and the cost-source breakdown.
Live rate catalog
Unknown models are priced against the OpenRouter catalog (disk-cached) with exact and fuzzy matching, with explicit caveats when caching or volume discounts aren't modeled.
A library, not a framework.
No new agent abstraction to adopt. Drop it into the SDK you already use, keep your code, and get reliability, evaluation, and cost accounting as a thin layer underneath.
One-line drop-in
Swap `from openai import OpenAI` for `from agentcodec.openai import OpenAI`. Same messages, tools, streaming, and native response shape — add reliability with a single kwarg. Anthropic and Ollama too.
Native async streaming
27 of 29 techniques stream natively end-to-end. Per-token deltas carry role tags so you can demux drafts, critiques, and the final answer live.
Evaluation + CI gating
Compare configs with paired statistics, score against references, and wire a quality/cost gate into CI to block regressions before they ship.
Library, CLI, and FastAPI
Use it as a Python module, the `agentcodec` console script for batch JSONL runs, or behind a FastAPI / WebSocket endpoint. Construct from YAML or a dict.
Privacy by design
SemKNN encodes your prompt locally with a small BGE model and sends only the unit-norm embedding. Telemetry is anonymous and opt-out; fully on-prem deployments are available.
Provider-agnostic
Frontier APIs, OpenRouter, or local models via Ollama / vLLM / SGLang. Mix model families in one lineup — uncorrelated errors are exactly what diversity combining wants.
Install and run in minutes.
Python ≥ 3.10. Core install pulls a small ONNX BGE encoder — no torch, no 1.5 GB download. Pick the provider extras you actually use.
# from source (not on PyPI yet)
pip install 'git+https://github.com/intellerce/agentcodec.git'
# with the provider extras you use
pip install 'agentcodec[openai] @ \
git+https://github.com/intellerce/agentcodec.git'from agentcodec import ReliabilityModule
mod = ReliabilityModule.from_yaml(
"configs/lib/fixed_harq.yaml")
result = mod.run("How does QUIC compare to TCP?")
print(result.text) # the answer
print(result.technique_used) # "harq_ir"
print(result.cost_usd) # estimated cost
print(result.latency_s) # wall-clock secondsSource-available. Free for research.
AgentCodec is released under the PolyForm Noncommercial License 1.0.0 — free for research, teaching, personal experimentation, and internal noncommercial evaluation, no payment or notification required. Commercial use (shipping it in a product, a paid service, or a for-profit pipeline) needs a separate license.
- Research, teaching, and personal experimentation — free.
- Internal noncommercial evaluation — free, no notification.
- Shipping in a product or paid service — commercial license.
- Self-hosted SemKNN backend — commercial license.
Drop reliability into your agent today.
Change one import, add one kwarg, and your OpenAI / Anthropic / Ollama calls gain communication-theoretic reliability. Need a commercial license or a hosted SemKNN backend? Talk to us.
