AgentCodec · Open Source · Reliability for LLM agents

Communication-theoretic reliability for LLM agents.

Animated visualization: independent prismatic channels recombining into a single reliable signal.

AgentCodec wraps every LLM call in a technique drawn from classical communication theory — HARQ, diversity combining, FEC, turbo decoding, and the cost-aware SemKNN router. 28 reliability techniques — 21 communication-theoretic methods across 6 families plus 7 prior-method baselines — plus an uncoded baseline (29 dispatchable in all) and 3 adaptive routers on top, behind a one-line drop-in for the OpenAI, Anthropic, and Ollama SDKs.

Why it matters

Reliability you can budget for.

On our paper benchmark — with one specific lineup (Nemotron and Devstral as generators, GLM-5.1 as judge) — routing the technique adaptively per prompt reached ~56% cost reduction at matched quality, or ~7% quality improvement at matched cost, versus the best fixed method we compared against at that lineup. A single knob, λ, slides the operating point along the cost/quality frontier. The qualitative pattern (adaptive beats fixed) should generalize; absolute numbers are lineup-specific.

AgentCodec cost–quality Pareto frontier benchmark
Cost–quality Pareto frontier with 95% confidence rectangles. The SemKNN router (arrows) moves the operating point toward the upper-left frontier — higher quality at lower cost than fixed prior methods.
~56%
Cost reduction at matched quality
~7%
Quality gain at matched cost
28
Reliability techniques
29
Dispatchable (with baseline)
3
Adaptive routers
Change one import

Reliability without a rewrite.

If you already have working code against the OpenAI, Anthropic, or Ollama SDK, the fastest path to reliability is one import swap. messages, tools, stream=True, async/sync, and the full native response shape all keep working.

Swap the import, then opt in with one kwarg
- from openai import OpenAI
+ from agentcodec.openai import OpenAI

client = OpenAI(api_key=KEY, reliability="harq_ir")   # one kwarg

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is QUIC?"}],
)
print(resp.choices[0].message.content)        # native OpenAI shape
print(resp.reliability.technique_used)        # reliability escape hatch
print(resp.reliability.cost_usd)
The codec

6 families. 21 techniques.

21 communication-theoretic methods span these 6 families, each a coding scheme over the noisy channel that is an LLM. They generalize self-consistency, self-refine, and chain-of-verification — and outperform them on benchmark.

MRC · SC · EGC

Diversity Combining

N parallel branches across different models (spatial), prompt variants (frequency), or temperatures (time), combined by maximal-ratio, selection, or equal-gain. Generalizes self-consistency, best-of-N, mixture-of-agents.

Chase Combining · Incremental Redundancy

Hybrid ARQ

Retry until quality clears the threshold. HARQ-IR adds new critic hints each round; HARQ-CC soft-combines every attempt into one quality-weighted answer. Generalizes self-refine and Reflexion.

Iterative SISO

Turbo Decoding

Generator drafts, critic returns structured extrinsic information, generator re-drafts on that feedback until convergence. Generalizes generator–critic and self-critique loops.

Rateless sampling

Fountain

Keep generating samples until the judge is satisfied. Easy prompts finish fast; hard prompts get more samples automatically. Generalizes universal self-consistency and dynamic-N voting.

Code rates 0.75 / 0.50 / 0.33

Forward Error Correction

A systematic block code: a primary answer plus parity sections (reasoning, verification, alternatives). The decoder cross-checks for consistency and reconstructs the corrected answer. Generalizes chain-of-verification.

Adaptive coding & modulation

ACM + SemKNN Routing

Route per-prompt by estimated difficulty. The SemKNN router learned which technique wins for which kind of prompt; a single knob λ trades quality for cost along the frontier.

Also implemented as baselines

7 prior-method baselines bring the reliability set to 28; with an uncoded single-pass baseline, that's 29 dispatchable techniques in all — benchmark them head-to-head:

Self-Consistency Self-Refine Chain-of-Verification Best-of-N Weighted Best-of-N CISC Mixture-of-Agents
The recommended router

SemKNN — cost-aware, per-prompt.

SemKNN learned from benchmark data which technique works best for which kind of prompt. At inference it encodes your prompt locally with a small BGE model and sends only the resulting unit-norm embedding to the backend — your prompt never leaves the client — and gets back a technique recommendation.

One knob, the whole frontier

  • λ = 0Pure quality — ignore cost entirely.
  • λ = 1Balanced operating point.
  • λ = 5~10% cheaper picks at near-matched quality.
  • λ = 10~30% cheaper.
  • λ = 20~45% cheaper.

Privacy & data flow

  • Prompt text is encoded locally; only a 384-d unit-norm embedding leaves the client.
  • Sent alongside a scalar λ and a small fingerprint of your model lineup — nothing else.
  • All other routers (fixed, acm_table, acm_linear) run fully locally, no backend at all.
  • Hosted SemKNN access for research, or a self-hosted backend under commercial license.
Cost transparency

Every dollar is labeled.

AgentCodec never hands you a cost number without telling you how it got there. Each estimate carries a tier from a fixed enum — from exact per-token rates you configured down to a parameter-count guess from the model name — so you always know how loose the accounting is.

Tiered estimates

exact_user_rate → openrouter_rate → fuzzy → table → inferred. Lower tier = tighter accounting. The worst tier across all calls is surfaced on the result.

Full per-run trace

return_trace=True gives input/output/thinking tokens, per-call cost, judge cost, number of LLM calls, and the cost-source breakdown.

Live rate catalog

Unknown models are priced against the OpenRouter catalog (disk-cached) with exact and fuzzy matching, with explicit caveats when caching or volume discounts aren't modeled.

Built for developers

A library, not a framework.

No new agent abstraction to adopt. Drop it into the SDK you already use, keep your code, and get reliability, evaluation, and cost accounting as a thin layer underneath.

One-line drop-in

Swap `from openai import OpenAI` for `from agentcodec.openai import OpenAI`. Same messages, tools, streaming, and native response shape — add reliability with a single kwarg. Anthropic and Ollama too.

Native async streaming

27 of 29 techniques stream natively end-to-end. Per-token deltas carry role tags so you can demux drafts, critiques, and the final answer live.

Evaluation + CI gating

Compare configs with paired statistics, score against references, and wire a quality/cost gate into CI to block regressions before they ship.

Library, CLI, and FastAPI

Use it as a Python module, the `agentcodec` console script for batch JSONL runs, or behind a FastAPI / WebSocket endpoint. Construct from YAML or a dict.

Privacy by design

SemKNN encodes your prompt locally with a small BGE model and sends only the unit-norm embedding. Telemetry is anonymous and opt-out; fully on-prem deployments are available.

Provider-agnostic

Frontier APIs, OpenRouter, or local models via Ollama / vLLM / SGLang. Mix model families in one lineup — uncorrelated errors are exactly what diversity combining wants.

Get started

Install and run in minutes.

Python ≥ 3.10. Core install pulls a small ONNX BGE encoder — no torch, no 1.5 GB download. Pick the provider extras you actually use.

Install
# from source (not on PyPI yet)
pip install 'git+https://github.com/intellerce/agentcodec.git'

# with the provider extras you use
pip install 'agentcodec[openai] @ \
  git+https://github.com/intellerce/agentcodec.git'
Quickstart
from agentcodec import ReliabilityModule

mod = ReliabilityModule.from_yaml(
    "configs/lib/fixed_harq.yaml")

result = mod.run("How does QUIC compare to TCP?")
print(result.text)             # the answer
print(result.technique_used)   # "harq_ir"
print(result.cost_usd)         # estimated cost
print(result.latency_s)        # wall-clock seconds
Licensing

Source-available. Free for research.

AgentCodec is released under the PolyForm Noncommercial License 1.0.0 — free for research, teaching, personal experimentation, and internal noncommercial evaluation, no payment or notification required. Commercial use (shipping it in a product, a paid service, or a for-profit pipeline) needs a separate license.

PolyForm Noncommercial 1.0.0
  • Research, teaching, and personal experimentation — free.
  • Internal noncommercial evaluation — free, no notification.
  • Shipping in a product or paid service — commercial license.
  • Self-hosted SemKNN backend — commercial license.

Drop reliability into your agent today.

Change one import, add one kwarg, and your OpenAI / Anthropic / Ollama calls gain communication-theoretic reliability. Need a commercial license or a hosted SemKNN backend? Talk to us.