engram

A Recursive Language Model engine for Claude Code. The codebase isn't loaded into context. Claude examines it through five logged primitives.

Manav Arya Singh · May 2026

tl;dr

Long context windows lose roughly 25–60 percentage points of retrieval accuracy past 200K tokens, and /compact erases most of what remains. engram skips loading entirely. Claude calls five primitives (grep, read, ast, git, recurse) and every call is journaled.

I built engram one weekend after losing a four-hour debug session to /compact for the second time that week. Claude had figured out the bug, ruled out three wrong theories, walked me through the auth flow, and then the conversation compacted and most of what we'd worked out got summarized away into five lines.

The premise is simple. Claude doesn't need to load your code to reason about it; it needs to access it. Loading is what context windows do. It's expensive, it dilutes attention, and it eventually crashes into /compact. Access is what a REPL does: call a tool, get a small answer back, decide what to ask next.

engram is that REPL.

1. Why context windows aren't the answer

Claude's window is one million tokens. That's the marketing. In practice, you don't get the whole window.

Independent benchmarks through 2026 show frontier 1M-token models dropping roughly 25–60 percentage points of retrieval accuracy past 200K tokens. Even Gemini's 10M window doesn't escape it; attention dilution shows up later, but it still shows up. Multi-turn coding sessions fare worse: each turn appends tool outputs, intermediate reasoning, and re-injected chunks until the model loses the thread of the larger plan. Then /compact happens, and most of what Claude learned in the session disappears.

The agent-memory ecosystem has grown up next to this problem. Mem0, Letta, Zep. They target chat memory: facts about the user, prior topics, state changes. None of them expose primitives appropriate to source code. None of them are verifiable in a way you'd defend in a postmortem.

2. The idea

In December 2025, three researchers at MIT (Alex Zhang, Tim Kraska, and Omar Khattab) published a paper called Recursive Language Models. The thesis: stop feeding long context to the model. Make the long context an environment the model examines through a REPL, with the option to recursively call itself over snippets. Their experiments handled inputs two orders of magnitude beyond the underlying model's context window on long-context tasks, while improving answer quality.

The Zhang et al. paper is general-purpose: a REPL of exec, find, summarize. engram specializes it for coding agents, with five primitives chosen to match the questions a coding agent actually asks of a codebase.

3. The primitives

grep: Bounded regex search across the repo. Returns file:line:col with N lines of context. Skips node_modules, .git, dist, and build directories. Caps at 50 hits. If you hit the cap, you narrow.
read: A bounded line range from a file. 1-indexed, inclusive, hard-capped at 400 lines per call. There is no flag to fetch the whole file. That's the discipline.
ast: Structural queries via the TypeScript compiler API: functions, classes, exports, imports, or the tightest node containing a given line:col. Works on .ts, .tsx, .mts, .cts, .js, .jsx, .mjs, .cjs.
git: Bounded views over log, blame, and diff. Runs via spawn with a fixed argv. No shell, no injection vector.
recurse: Emits an ENGRAM-RECURSE-REQUEST. The parent dispatches via Claude's Task subagent. The child returns one to three sentences, never a transcript, so the parent's context stays clean.
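
To make the read contract concrete, here is a minimal sketch of a bounded, 1-indexed, inclusive line read with the 400-line cap described above. The function name, the result shape, and the choice to clamp an oversized request (rather than reject it) are assumptions for illustration, not engram's actual source.

```typescript
// Hypothetical sketch: names and clamping policy are assumptions, not engram's code.
const MAX_LINES_PER_READ = 400; // hard cap stated in the text

interface ReadResult {
  fromLine: number;   // 1-indexed, inclusive
  toLine: number;     // 1-indexed, inclusive; last line actually returned
  lines: string[];
  truncated: boolean; // true when the cap shortened the request
}

function readRange(source: string, fromLine: number, toLine: number): ReadResult {
  if (
    !Number.isInteger(fromLine) ||
    !Number.isInteger(toLine) ||
    fromLine < 1 ||
    toLine < fromLine
  ) {
    throw new RangeError(`invalid range ${fromLine}-${toLine}`);
  }
  const all = source.split("\n");
  // Apply the per-call cap first, then stop at the end of the file.
  const cappedTo = Math.min(toLine, fromLine + MAX_LINES_PER_READ - 1);
  const lastLine = Math.min(cappedTo, all.length);
  return {
    fromLine,
    toLine: lastLine,
    lines: all.slice(fromLine - 1, lastLine),
    truncated: cappedTo < toLine,
  };
}
```

A request for lines 1 to 1000 comes back as lines 1 to 400 with truncated set, so the caller has to ask again for the next window instead of pulling the whole file.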

An example call, end to end:

$ engram ast src/auth.ts functions --human
{
  "primitive": "ast",
  "ok": true,
  "data": {
    "file": "src/auth.ts",
    "symbols": [
      { "name": "login",   "kind": "function", "line": 12, "exported": true  },
      { "name": "refresh", "kind": "function", "line": 38, "exported": true  },
      { "name": "verify",  "kind": "function", "line": 64, "exported": false }
    ]
  },
  "truncated": false,
  "bytes": 218,
  "durationMs": 5,
  "journalId": "8c4a3f1e-7b22-4d91-aaff-1c0e9d2c63f4"
}
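
For anyone consuming this output from a script, the envelope can be described with a small type. The field names below are copied from the sample output; the TypeScript types, the generic PrimitiveResult name, and the exportedNames helper are assumptions for illustration.

```typescript
// Field names come from the sample output above; the types are assumed.
interface AstSymbol {
  name: string;
  kind: string;
  line: number;
  exported: boolean;
}

interface PrimitiveResult<T> {
  primitive: string;
  ok: boolean;
  data: T;
  truncated: boolean;
  bytes: number;
  durationMs: number;
  journalId: string;
}

type AstFunctionsResult = PrimitiveResult<{ file: string; symbols: AstSymbol[] }>;

// Hypothetical helper: list only the exported symbols from an ast result.
function exportedNames(result: AstFunctionsResult): string[] {
  if (!result.ok) return [];
  return result.data.symbols.filter((s) => s.exported).map((s) => s.name);
}

// The sample output from the text, parsed as-is.
const sample: AstFunctionsResult = JSON.parse(`{
  "primitive": "ast",
  "ok": true,
  "data": {
    "file": "src/auth.ts",
    "symbols": [
      { "name": "login",   "kind": "function", "line": 12, "exported": true  },
      { "name": "refresh", "kind": "function", "line": 38, "exported": true  },
      { "name": "verify",  "kind": "function", "line": 64, "exported": false }
    ]
  },
  "truncated": false,
  "bytes": 218,
  "durationMs": 5,
  "journalId": "8c4a3f1e-7b22-4d91-aaff-1c0e9d2c63f4"
}`);

console.log(exportedNames(sample)); // [ 'login', 'refresh' ]
```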

Every primitive funnels through a single byte-budgeted runner. There is no path that skips the audit chain. Errors are journaled too, so silent failure isn't possible.
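
One way to picture a single byte-budgeted runner is a wrapper with exactly one exit, where success and failure both pass through the journal call. This is a sketch under assumptions: the names runPrimitive, RunResult, and JournalSink are hypothetical, and the truncation policy (keep a byte-bounded text preview) is a guess at one reasonable design, not a description of engram's internals.

```typescript
// Hypothetical sketch of a byte-budgeted runner; not engram's actual code.
interface RunResult {
  primitive: string;
  ok: boolean;
  data: unknown;
  truncated: boolean;
  bytes: number;
}

type JournalSink = (entry: RunResult) => void;

const encoder = new TextEncoder();
const decoder = new TextDecoder();

function runPrimitive(
  primitive: string,
  fn: () => unknown,
  budgetBytes: number,
  journal: JournalSink,
): RunResult {
  let result: RunResult;
  try {
    const value = fn();
    const serialized = value === undefined ? "null" : JSON.stringify(value);
    const encoded = encoder.encode(serialized);
    const truncated = encoded.length > budgetBytes;
    result = {
      primitive,
      ok: true,
      // Over budget: keep a byte-bounded text preview instead of the full value.
      // (A cut inside a multi-byte character decodes to U+FFFD; fine for a sketch.)
      data: truncated ? decoder.decode(encoded.subarray(0, budgetBytes)) : value,
      truncated,
      bytes: Math.min(encoded.length, budgetBytes),
    };
  } catch (err) {
    result = {
      primitive,
      ok: false,
      data: { error: err instanceof Error ? err.message : String(err) },
      truncated: false,
      bytes: 0,
    };
  }
  journal(result); // single exit: errors are journaled exactly like successes
  return result;
}
```

Because the journal call sits after the try/catch rather than inside either branch, a primitive cannot return without leaving a record.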

4. The journal

Every call appends one line to .engram/journal.jsonl: timestamp, primitive, args, a sha256 hash over key-sorted JSON, a bounded preview, duration, session id. Append-only. Local. Replayable. You can answer “what did Claude see at 02:14?” with a cryptographic receipt.
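
The reason to hash key-sorted JSON is that two objects with the same content but different key order then produce the same digest. A minimal sketch follows, assuming Node's built-in crypto module; the helper names sortKeys and canonicalHash are hypothetical, and exactly which payload engram hashes is not specified here.

```typescript
import { createHash } from "node:crypto";

// Rebuild the value with object keys in sorted order, recursively, so that
// equal content always serializes to the same string.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    const input = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(input).sort()) {
      sorted[key] = sortKeys(input[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical helper: sha256 over the key-sorted JSON form of a payload.
function canonicalHash(payload: unknown): string {
  const canonical = JSON.stringify(sortKeys(payload));
  return "sha256:" + createHash("sha256").update(canonical).digest("hex");
}

const a = canonicalHash({ file: "src/auth.ts", query: "functions" });
const b = canonicalHash({ query: "functions", file: "src/auth.ts" });
console.log(a === b); // true: key order does not change the digest
```

Array order is left alone, since reordering a list of hits or lines would change its meaning.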

07:50:24.987  engram.ast      {"file":"src/auth.ts","query":...}    sha256:3a7b8c…
07:50:25.012  engram.grep     {"pattern":"login","glob":"src/..."}  sha256:38d807…
07:50:25.044  engram.read     {"file":"src/auth.ts","fromLine":40}  sha256:c9e1f0…
07:50:25.063  engram.git      {"mode":"blame","file":"src/..."}     sha256:71b3d2…
07:50:25.118  engram.recurse  {"prompt":"summarize refresh path"}   sha256:9adf4e…

It's a file. No daemon, no cloud, no vendor. You can cat .engram/journal.jsonl | jq from any terminal that has those two tools. Audit, replay, gitignore, delete, compress, ship to a partner during incident review — whatever a file allows.
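
The same kind of query can be written in a few lines of TypeScript. This sketch answers "what ran in this time window?" over JSONL text. The field names ts and primitive are assumptions about the journal schema, which this post does not spell out; adjust them to match the real file.

```typescript
// Assumed entry shape: `ts` and `primitive` are hypothetical field names.
interface JournalEntry {
  ts: string;        // ISO-8601 timestamp
  primitive: string; // e.g. "grep", "read"
  [key: string]: unknown;
}

// One JSON object per line; blank lines are ignored.
function parseJournal(jsonl: string): JournalEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as JournalEntry);
}

// Entries whose timestamp falls inside [fromIso, toIso], inclusive.
function entriesBetween(
  entries: JournalEntry[],
  fromIso: string,
  toIso: string,
): JournalEntry[] {
  const from = Date.parse(fromIso);
  const to = Date.parse(toIso);
  return entries.filter((entry) => {
    const t = Date.parse(entry.ts);
    return t >= from && t <= to;
  });
}

// Made-up sample lines for illustration only.
const journalText = [
  '{"ts":"2026-05-01T02:13:58.000Z","primitive":"grep"}',
  '{"ts":"2026-05-01T02:14:05.000Z","primitive":"read"}',
  '{"ts":"2026-05-01T02:15:30.000Z","primitive":"git"}',
].join("\n");

const seen = entriesBetween(
  parseJournal(journalText),
  "2026-05-01T02:14:00.000Z",
  "2026-05-01T02:14:59.999Z",
);
console.log(seen.map((entry) => entry.primitive)); // [ 'read' ]
```

Against a real journal, the text would come from reading .engram/journal.jsonl with Node's fs module.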

5. Where it fits

Codebase retrieval is solved. Cursor's index, Cody, Sourcegraph — they get you a fast retrieval surface over code. engram isn't that. engram is what the model uses after retrieval has handed it an entry point.

Agent memory is a category. Mem0, Letta, Zep get you a queryable layer over your agent's chat history. They aren't specialized for code; engram is. They also aren't verifiable in the sense that you can prove, after the fact, exactly what the model saw at a given moment. engram is.

The 2025 Recursive Language Model paradigm is, as far as I can tell, the first time anyone has drawn a clean line between access and ingestion in agent design. engram is the first specialized implementation of that paradigm for coding agents that I know of. If I've missed someone, please open an issue — I'd genuinely like to know.

6. Install

Two commands.

$ git clone https://github.com/Manavarya09/engram ~/.claude/plugins/engram

$ npm install -g engram

The first installs engram as a Claude Code plugin and gives you /engram <question>. The second installs it as a standalone CLI so you can call primitives directly from any shell.

engram needs Node 22.6 or later. It ships TypeScript source directly via --experimental-strip-types; there is no build step. (The one runtime dependency is the TypeScript compiler, which the ast primitive uses. Everything else is Node built-ins.)

7. What's next

This release ships two of the eight memory tiers I think a coding agent actually needs: the journal (L2) and the code-as-environment view (L4). The other six are on the roadmap. Four of them, in roughly this order:

L7: decision lineage; why we chose X over Y, what we tried
L6: tool log; build outputs, test results, errors
L3: project state; pending TODOs, branch intent
L5: org memory; patterns across all your repos

The full roadmap and the design contracts that govern each tier are in ARCHITECTURE.md in the repo. The full positioning, with citations, evaluation methodology, and threat model, is in the paper.

If any of this is wrong, or if you have a use case the primitives don't cover, the GitHub issues are open.
