Local Context Engine • Structural Skill Synthesis • MCP-Native
Entroly is a context engine with a local self-improvement loop: it detects coverage gaps, synthesizes new skills from your codebase's structure, benchmarks them, and promotes the winners — using structural analysis rather than model calls when possible. Budget-gated. Deterministic. Local-first. On large-repo workloads, release checks observed 70–95% input token reduction.
pip install entroly && entroly go | npm install -g entroly && entroly
Problem • Solution • Install • Demo • Integrations • Architecture • Self-Improving • Federation • Distillation • Community
Example trace from this repo’s local development vault:
[detect] gap observed → entity="auth", miss_count=3
[synthesize] StructuralSynthesizer ($0, deterministic, no LLM)
[benchmark] skill=ddb2e2969bb0 → fitness 1.0 (1 pass / 0 fail, 338 ms)
[promote] status: draft → promoted
[registry] .entroly/vault/evolution/registry.md updated
[spend] $0.0000 — invariant C_spent ≤ τ·S(t) holds
The structural synthesizer reads your code graph rather than calling an LLM. When structural synthesis can’t solve a gap, the LLM fallback is budget-gated by cumulative token savings — intended to keep learning cost below lifetime savings.
→ See The 3 Pillars of Zero-Token Autonomy for how.
AI coding tools that send raw file dumps often face the same limitation:
The model may only receive a handful of files at a time. The rest of your codebase is not represented.
This causes:
auth.py without knowing about auth_config.pyEntroly addresses this by selecting compact, variable-resolution context from the full repository.
Entroly selects context from your entire codebase at variable resolution.
| What changes | Without Entroly | With Entroly |
|---|---|---|
| Files visible to AI | 5-10 files | Supported files selected at variable resolution |
| Tokens per request | ~186,000 raw example | 9,300 – 55,000 in listed release examples |
| Cost per 1K requests | depends on provider/model | lower when input tokens drop |
| AI answer grounding | depends on supplied context | auditable against selected evidence |
| Setup time | manual prompt engineering | 30 seconds |
| Overhead | N/A | < 10ms local core paths |
Critical files appear in full. Supporting files appear as signatures. Everything else appears as references. The AI receives broader structural context within a smaller token budget.
| RAG (vector search) | Entroly (context engineering) | |
|---|---|---|
| What it sends | Top-K similar chunks | Selected codebase context at variable resolution |
| Handles duplicates | No — sends same code 3x | SimHash dedup in O(1) |
| Dependency-aware | No | Yes — auto-includes related files |
| Learns from usage | No | Yes — RL optimizes from AI response quality |
| Needs embeddings API | Yes (extra cost + latency) | No — runs locally |
| Budgeted selection | Approximate | Knapsack optimizer over Entroly’s scoring objective |
pip install entroly && entroly demo # see savings on YOUR codebase
Open the interactive demo for the animated experience.
Python:
pip install entroly[full]
entroly go
Node.js / TypeScript:
npm install entroly-wasm
npx entroly-wasm serve # MCP server
npx entroly-wasm optimize # CLI optimizer
npx entroly-wasm demo # see savings on YOUR codebase
Or use the short compatibility package:
npm install -g entroly
entroly serve
entroly optimize 8000 "fix the auth bug"
entroly demo
Both npm packages run the full Rust engine natively in Node.js — no Python required.
That’s it. entroly go (Python) or entroly serve / npx entroly-wasm serve (Node.js) auto-detects your IDE, starts the engine, and begins optimizing. Point your AI tool to http://localhost:9377/v1.
# Python
pip install entroly # core engine
entroly init # detect IDE + generate config
entroly proxy --quality balanced # start proxy
# Node.js
npm install -g entroly # short alias for the WASM runtime
entroly serve # start MCP server
# Or install the WASM package directly
npm install entroly-wasm # WASM engine, zero dependencies
npx entroly-wasm serve # start MCP server
| Package | What you get |
|---|---|
npm install -g entroly |
Short CLI alias that installs and delegates to entroly-wasm |
npm install entroly-wasm |
Full Rust engine via WebAssembly — MCP server, CLI, autotune, health |
| Package | What you get |
|---|---|
pip install entroly |
Core — MCP server + Python engine |
pip install entroly[proxy] |
+ HTTP proxy mode |
pip install entroly[native] |
+ Rust engine (50-100x faster) |
pip install entroly[full] |
Everything |
docker pull ghcr.io/juyterman1000/entroly:latest
docker run --rm -p 9377:9377 -p 9378:9378 -v .:/workspace:ro ghcr.io/juyterman1000/entroly:latest
| AI Tool | Setup | Method |
|---|---|---|
| Cursor | entroly init |
MCP server |
| Claude Code | claude mcp add entroly -- entroly |
MCP server |
| VS Code MCP clients | entroly init |
MCP server |
| Windsurf | entroly init |
MCP server |
| Cline | entroly init |
MCP server |
| Compatible LLM APIs | entroly proxy |
HTTP proxy |
“Entroly handled the context selection so I stopped manually pasting code.”
entroly go handles the common local setup path. No embeddings API is required.When developers search for “token saving proxy” or “context compression”, Entroly offers distinct advantages over standard alternatives:
| Feature | Entroly | Basic Proxies |
|---|---|---|
| Setup | Zero-config (entroly go) |
Requires YAML/embedding setup |
| Codebase Intelligence | Deep (dead code, god files) | Proxy transport only |
| Security | 55 SAST rules (catches hardcoded secrets) | None builtin |
| Savings Strategy | Information-theoretic Knapsack (retains 100% visibility) | Standard reduction techniques |
| Primary Use Case | Context compression for AI agents | Basic token reduction |
OpenClaw users get the deepest integration — Entroly plugs in as a Context Engine:
| Agent Type | What Entroly Does | Token Savings |
|---|---|---|
| Main agent | Full codebase at variable resolution | ~95% |
| Heartbeat | Only loads changes since last check | ~90% |
| Subagents | Inherited context + Nash bargaining budget split | ~92% |
| Cron jobs | Minimal context — relevant memories + schedule | ~93% |
| Group chat | Entropy-filtered messages — only high-signal kept | ~90% |
from entroly.context_bridge import MultiAgentContext
ctx = MultiAgentContext(workspace_path="~/.openclaw/workspace")
ctx.ingest_workspace()
sub = ctx.spawn_subagent("main", "researcher", "find auth bugs")
Does compression hurt accuracy? In these release checks, compressed context stayed statistically close to baseline.
Entroly selects context at variable resolution. We measure accuracy retention across industry-standard benchmarks:
| Benchmark | What it tests | Baseline | Entroly | Retention |
|---|---|---|---|---|
| NeedleInAHaystack | Info retrieval from long context | 100% | 100% | 100% |
| HumanEval | Code generation | 13.3% | 13.3% | 100% |
| GSM8K | Math reasoning | 86.7% | 80.0% | 92% |
| SQuAD 2.0 | Reading comprehension | 93.3% | 86.7% | 92% |
Results from release checks via
bench/accuracy.py. Performance depends on model, dataset, prompt shape, and token budget.
| Benchmark | Status inside bench/accuracy.py |
Validated Results (gpt-4o-mini) |
|---|---|---|
| NeedleInAHaystack | Implemented | 100% retention |
| HumanEval | Implemented | 100% retention |
| GSM8K | Implemented | 92% retention |
| SQuAD 2.0 | Implemented | 92% retention |
pip install entroly[full] matplotlib
# Export your API key
export OPENAI_API_KEY="sk-..."
# Run the full validation suite
python -m bench.accuracy --benchmark all --model gpt-4o-mini --samples 15
# Generate the NeedleInAHaystack Heatmap
python -m bench.needle_heatmap --model gpt-4o-mini
| Stage | What | Result |
|---|---|---|
| 1. Ingest | Index codebase, build dependency graph, fingerprint fragments | Complete map in <2s |
| 2. Score | Rank by information density — high-value code up, boilerplate down | Every fragment scored |
| 3. Select | Mathematically optimal subset fitting your token budget | Proven optimal (knapsack) |
| 4. Deliver | 3 resolution levels: full → signatures → references | 100% coverage |
| 5. Learn | Track which context produced good AI responses | Gets smarter over time |
Most agent frameworks that learn do so by calling LLMs. Entroly’s structural path tries to learn from code-graph analysis first.
Many self-improving agent frameworks spend API tokens to synthesize skills, reflect on failures, and update policies. The bill grows with experience.
Entroly’s self-evolution loop is designed around three principles intended to keep the runtime budget-negative — learning cost should stay below savings.
A ValueTracker measures cumulative token savings S(t) across every optimized request. The evolution budget is a strict fraction of savings:
C_spent(t) ≤ τ · S(t) (τ = 5%)
Any LLM-based synthesis is gated by this invariant. The intent is that the system spends less on learning than it saves you.
Before the budget is ever touched, the StructuralSynthesizer tries first. It reads the entropy gradient of your code graph — AST patterns, dependency edges, type signatures — and can emit candidate Python tools from structural analysis. No LLM. No embeddings API. No cloud call. Zero tokens.
The auth skill in the trace above was synthesized this way. Fitness 1.0, cost $0.0000.
When no user activity is detected for >60 s, the DreamingLoop generates synthetic queries from FeedbackJournal history, perturbs the PRISM scoring weights, and runs counterfactual experiments against itself. Improvements are kept when they beat the local acceptance gate; regressions are discarded. This is designed to improve local ranking while idle — with no API calls.
User query → miss → EvolutionLogger registers gap
↓
[Pillar 2] StructuralSynthesizer ($0)
↓ (if fails)
[Pillar 1] LLM fallback — only if C_spent ≤ τ·S(t)
↓
Benchmark → Promote (fitness ≥ threshold) or Prune
↓
Skill registry live in .entroly/vault/evolution/
↓
[Pillar 3] Idle? Dream: perturb weights, self-play, keep wins
↓
Next session starts strictly smarter
No manual tuning. No config files. No tokens spent on learning. The daemon ships with the runtime and starts the moment you run entroly go.
Optional: share anonymous optimization weights across installations.
When federation is enabled, Entroly installations can exchange anonymized optimization weights. The intent is that participants benefit from each other’s local learning — without sharing code.
Your daemon learns locally → shares anonymous weights → absorbs others' improvements
Design principles:
| Without federation | With federation | |
|---|---|---|
| Who improves your AI? | Your local data only | Your data + anonymous weights from other installations |
| Network effect | None | More participants = broader weight diversity |
| Infrastructure cost | None | $0 — uses GitHub for transport |
| Privacy | Local only | Differential privacy + anonymous IDs; code never shared |
Federation is experimental. Shared payloads are optimization statistics/weights, not code.
Privacy safeguards:
# Opt-in (default: off — your choice, always)
export ENTROLY_FEDERATION=1
Python and Node.js use the same protocol shape. Feature parity can vary by package version; privacy controls remain opt-in.
LLM responses often include filler — greetings, hedging, meta-commentary. Entroly can strip common filler while leaving code blocks untouched.
LLM responses often include tokens that don’t carry information: “Sure, I’d be happy to help!”, “Let me think about that…”, “Hope this helps!”. Response Distillation strips the prose filler. Code blocks are never touched.
Before: "Sure! I'd be happy to help you with that. Let me take a careful look
at your code. The issue is in the auth module — specifically the
token validation logic. Hope this helps! Let me know if you need
anything else."
After: "The issue is in the auth module — specifically the token validation logic."
→ 75% fewer output tokens. Same information. Zero filler.
Three levels — you choose:
| Mode | What goes | What stays | Typical savings |
|---|---|---|---|
lite |
Greetings, sign-offs | Everything else | 15–25% |
full |
+ hedging, meta-commentary, transitions | Code + technical content | 30–50% |
ultra |
+ articles, function words | Pure signal | 50–70% |
Safety design: Code blocks, JSON, YAML, XML are protected from prose distillation. The distiller is designed to touch prose only.
export ENTROLY_DISTILL=1 # Turn it on
export ENTROLY_DISTILL_MODE=full # lite | full | ultra
Works in real-time on streaming responses. <1ms overhead per chunk.
The daemon is useful silently — but silent autonomy doesn’t build trust. Two first-class integrations let you see and share every evolution event:
Chat gateways — live-stream gap detections, structural syntheses, promotions, and dream-cycle wins to Telegram, Discord, or Slack. Zero extra dependencies — stdlib only.
# Telegram (interactive: /status /skills /gaps /dream)
export ENTROLY_TG_TOKEN=... # from @BotFather
export ENTROLY_TG_CHAT_ID=...
python -m entroly.integrations.telegram_gateway
# Discord (incoming webhook)
export ENTROLY_DISCORD_WEBHOOK=https://discord.com/api/webhooks/...
python -m entroly.integrations.discord_gateway
# Slack (incoming webhook)
export ENTROLY_SLACK_WEBHOOK=https://hooks.slack.com/services/...
python -m entroly.integrations.slack_gateway
agentskills.io export — promoted skills aren’t vault-locked. Export to the portable agentskills.io v0.1 spec so any compatible runtime can consume them:
python -m entroly.integrations.agentskills ./dist/agentskills
# → dist/agentskills/<skill_id>/{skill.json,procedure.md,tool.py,tests.json}
Every exported skill.json carries origin.synthesis: "structural" and origin.token_cost: 0.0 — the zero-token provenance is portable too.
| Typical self-improving agent | Entroly | |
|---|---|---|
| Skill synthesis | LLM generates code (pays tokens) | Structural induction first — $0 |
| Learning budget | Unbounded (you pay the bill) | Gated: C_spent ≤ 5% of savings |
| Gap detection | Implicit (re-encounters failure) | Explicit: EvolutionLogger miss counter |
| Idle time | Process sleeps | DreamingLoop runs self-play |
| Persistence | Session memory + FTS | Epistemic vault + belief graph + registry |
| Net cost of learning | Positive (always) | Designed to be ≤ 0 |
| Capability | What It Does | Cost |
|---|---|---|
| PRISM Reinforcement Learning | Learns which context produces good AI responses. Updates 4D scoring weights (recency, frequency, semantic, entropy) via policy gradients with counterfactual credit assignment. | Zero — runs on CPU |
| Dreaming Loop | During idle time (>60s inactivity), generates synthetic queries and runs self-play experiments to find better weight configurations. Monotonic improvement guarantee. | Zero — no API calls |
| Task-Conditioned Profiles | Automatically detects task type (debugging, feature, refactor, performance, testing, docs) and loads task-specific learned weights. Debugging prioritizes recency; documentation prioritizes semantic similarity. | Zero |
| Skill Synthesis | Identifies gaps in coverage, synthesizes new tools from AST analysis, benchmarks them, promotes winners, prunes losers. Full lifecycle — no human intervention. | Zero — structural analysis only |
| Adaptive Exploration (RAVEN-UCB) | Thompson sampling + Upper Confidence Bound automatically balances exploring new strategies vs exploiting known-good ones. Exploration rate anneals as confidence grows. | Zero |
User Query → Optimize Context → AI Response → Feedback Signal
↓
PRISM RL Weight Update
Task Profile Update
Feedback Journal Entry
↓
[Idle > 60s detected]
↓
Dreaming Loop activates:
→ Synthetic query generation
→ Self-play weight experiments
→ Skill gap detection
→ Structural tool synthesis
↓
Better weights saved to disk
→ Next session starts smarter
The default self-improvement loop runs locally on your CPU. No embeddings API or fine-tuning job is required. The dreaming loop, RL updates, and structural skill synthesis operate on local signals; optional federation or LLM fallback must be enabled separately.
Day 1: Entroly selects context with default weights. Day 30: PRISM weights have shifted based on local feedback signals. Savings and ranking quality may improve as the engine learns your codebase patterns.
entroly dashboard # Watch the PRISM weights evolve in real-time
entroly autotune # Manually trigger optimization (usually not needed)
“If you compress my codebase by 80%, how do I know you didn’t strip the code my AI actually needs?”
Fair question. Here’s the honest answer:
Entroly never “strips” code from files the LLM needs. It uses three resolution levels:
| Resolution | What the LLM sees | When used |
|---|---|---|
| Full (100%) | Complete source code — every line, every comment | Files that directly match your query |
| Signatures | Function/class signatures with types + docstrings | Tangential imports your query doesn’t target |
| Reference | File path + 1-line summary | Files the LLM should know exist, but doesn’t need to read |
Selection policy: If a file directly matches the query, Entroly tries to include it at full resolution before compressing lower-priority files to signatures or references. Use /explain to inspect the actual selection for a request.
By default, optimized requests include a visible report inside the LLM context:
[Entroly: worker.ts (Full), schema.prisma (Full), types.ts (Full),
8 files (Signatures only), 12 files (Reference only). 8,777 tokens. GET /explain for details.]
Your AI sees this. You can see this. No hidden truncation.
/explain EndpointAfter any request, call GET localhost:9377/explain to see:
| Claim | What it actually means |
|---|---|
| 70–95% token savings | Observed in release checks on large-repo workloads. Varies by query specificity, repo size, and token budget. |
| Variable-resolution visibility | Every supported file in your codebase is represented at some resolution. |
| < 10ms latency | Some Rust core paths are sub-10ms. End-to-end optimization depends on repo size, engine mode, filesystem, and cache warmth. Network to the LLM API is unchanged. |
The range reflects real variability: a narrow bug-fix query against a 1000-file repo may hit 95%. A broad “explain the architecture” query against a 50-file repo lands closer to 70%. We publish the range, not the peak.
If the ~40 token overhead bothers you:
export ENTROLY_CONTEXT_REPORT=0
“The LLM is the CPU, the context window is RAM.”
| Layer | What it solves |
|---|---|
| Documentation tools | Give your agent up-to-date API docs |
| Memory systems | Remember things across conversations |
| RAG / retrieval | Find relevant code chunks |
| Entroly (optimization) | Makes selected context fit — compresses codebase + docs + memory under the configured token budget |
These layers are complementary. Entroly is the optimization layer that helps fit high-value context under a budget.
While Entroly was built for codebases, its core relies on Shannon Entropy and Knapsack Mathematics, meaning it is completely agnostic to the text it compresses. Entroly is widely used as a universal context compressor for:
| Text Type | The Problem | How Entroly Compresses It |
|---|---|---|
| Massive Server Logs | 100K lines of identical INFO logs bury the one ERROR stack trace. |
Drops repetitive logs (low entropy), strictly retains exceptions and novel timestamps. |
| Agent Memory | Multi-agent swarms fill up the context window with conversational fluff. | Extracts only the high-signal, decision-making paragraphs to pass to the next agent. |
| Legal/Financial Docs | RAG systems retrieve 50 pages of PDFs, blowing the token budget. | Scans the retrieved paragraphs, isolates the exact clauses answering the query, drops the boilerplate. |
In our NeedleInAHaystack benchmark, Entroly perfectly compressed 128,000 tokens of Paul Graham essays (pure English text) to 2,000 tokens while maintaining a 100% retrieval success rate.
| Command | What it does |
|---|---|
entroly go |
One command — auto-detect, init, proxy, dashboard |
entroly wrap claude |
Start proxy + launch Claude Code in one command |
entroly wrap codex |
Start proxy + launch Codex CLI when its provider settings permit a custom endpoint |
entroly wrap aider |
Start proxy + launch Aider |
entroly wrap cursor |
Start proxy + print Cursor config |
entroly demo |
Before/after comparison with dollar savings on YOUR project |
entroly dashboard |
Live metrics: savings trends, health grade, PRISM weights |
entroly doctor |
7 diagnostic checks — finds problems before you do |
entroly health |
Codebase health grade (A-F): clones, dead code, god files |
entroly benchmark |
Competitive benchmark: Entroly vs raw context vs top-K |
entroly role |
Weight presets: frontend, backend, sre, data, fullstack |
entroly autotune |
Auto-optimize engine parameters |
entroly learn |
Analyze session for failure patterns, write to CLAUDE.md |
entroly digest |
Weekly summary: tokens saved, cost reduction |
entroly status |
Check running services |
entroly wrap claude # Starts proxy + launches Claude Code
entroly wrap codex # Starts proxy + launches Codex CLI when custom endpoints are supported
entroly wrap aider # Starts proxy + launches Aider
entroly wrap cursor # Starts proxy + prints Cursor config
Entroly starts the proxy, sets the documented base URL environment variable where the tool supports one, and launches your tool. If a vendor CLI requires provider configuration instead, use that tool’s documented settings and review its terms before proxying.
from entroly import compress
result = compress(messages, budget=50_000)
response = client.messages.create(model="claude-sonnet-4-5-20250929", messages=result)
Or compress any content directly:
from entroly.universal_compress import universal_compress
compressed = universal_compress(huge_json_blob) # auto-detects JSON
compressed = universal_compress(log_output) # auto-detects logs
compressed = universal_compress(csv_data) # auto-detects CSV
Content-type auto-detection routes each input to the best compressor — JSON, logs, code, CSV, XML, stacktraces, tables.
| Your setup | Add Entroly | One-liner |
|---|---|---|
| Any Python app | compress() |
result = compress(messages, budget=50_000) |
| Any app (proxy) | entroly proxy |
Point base URL at localhost:9377 |
| LangChain | EntrolyCompressor |
chain = compressor \| llm |
| Multi-agent | MultiAgentContext |
ctx = MultiAgentContext(...) |
| Claude Code | entroly wrap claude |
One command |
| Codex / Aider | entroly wrap codex / entroly wrap aider |
Custom endpoint where supported |
| MCP tools | entroly init |
Auto-config |
from langchain_openai import ChatOpenAI
from entroly.integrations.langchain import EntrolyCompressor
llm = ChatOpenAI(model="gpt-4o")
compressor = EntrolyCompressor(budget=30000)
chain = compressor | llm
result = chain.invoke("Explain the auth module")
from entroly.context_bridge import MultiAgentContext
ctx = MultiAgentContext(workspace_path="~/.agent/workspace", token_budget=128_000)
ctx.ingest_workspace()
# NKBE allocates budget optimally across agents
budgets = ctx.allocate_budgets(["researcher", "coder", "reviewer"])
# Spawn subagent with inherited context
sub = ctx.spawn_subagent("main", "researcher", "find auth bugs")
# Schedule cron jobs with minimal context
ctx.schedule_cron("monitor", "check error rates", interval_seconds=900)
Entroly never permanently discards data. When a fragment is compressed to a skeleton, the original is stored in the Compressed Context Store. The LLM can retrieve the full original on demand:
# List all retrievable fragments
curl localhost:9377/retrieve
# Get full original of a compressed file
curl localhost:9377/retrieve?source=file:src/auth.py
This is the architectural answer to “silent truncation”: nothing is permanently lost. If the LLM needs the full body of a skeletonized function, it asks for it.
Entroly stabilizes context prefixes across turns to improve provider KV-cache reuse where the configured provider supports prompt caching. Cache discounts and behavior are provider-specific and can change.
entroly learn # Analyze session for failure patterns
entroly learn --apply # Write learnings to CLAUDE.md / AGENTS.md
Reads the proxy’s passive feedback data, identifies patterns where the LLM was confused or gave low-quality responses, and writes actionable corrections to your agent config files.
entroly proxy --quality speed # minimal optimization, lowest latency
entroly proxy --quality balanced # recommended (default)
entroly proxy --quality max # full pipeline, best results
entroly proxy --quality 0.7 # any float 0.0-1.0
| Linux | macOS | Windows | |
|---|---|---|---|
| Python 3.10+ | Yes | Yes | Yes |
| Rust wheel | Yes | Yes (Intel + Apple Silicon) | Yes |
| Docker | Optional | Optional | Optional |
| Admin/WSL required | No | No | No |
~/.entroly/value_tracker.json, trend charts in dashboard/confidence endpoint for real-time VS Code widgetsX-Entroly-Confidence, X-Entroly-Coverage-Pct, X-Entroly-Cost-Saved-TodayPOST /feedback lets your AI rate context qualityGET /explain shows why each fragment was included/excluded, with resolution labels and drop reasonsentroly doctor # runs 7 diagnostic checks
entroly --help # all commands
Email: autobotbugfix@gmail.com — we aim to respond within 24 hours.
| Variable | Default | What it does |
|---|---|---|
ENTROLY_QUALITY |
0.5 |
Quality dial (0.0-1.0 or preset) |
ENTROLY_PROXY_PORT |
9377 |
Proxy port |
ENTROLY_MAX_FILES |
5000 |
Max files to index |
ENTROLY_RATE_LIMIT |
0 |
Requests/min (0 = unlimited) |
ENTROLY_MCP_TRANSPORT |
stdio |
MCP transport (stdio/sse) |
ENTROLY_CONTEXT_REPORT |
1 |
Inline context report in LLM prompts (0 to disable) |
ENTROLY_CACHE_ALIGN |
1 |
Provider KV cache prefix stabilization (0 to disable) |
ENTROLY_FEDERATION |
0 |
Enable federated swarm learning (1 to enable) |
ENTROLY_FEDERATION_BOT |
(none) | Shared GitHub bot token for anonymous federation writes |
ENTROLY_DISTILL |
0 |
Enable response distillation / output compression (1 to enable) |
ENTROLY_DISTILL_MODE |
full |
Distillation intensity: lite, full, or ultra |
Apache-2.0
Measure and reduce wasted context tokens with local, evidence-aware tooling.
pip install entroly[full] && entroly go