Karpathy's LLM Wiki, But for Your D&D Campaign (And It's Already Live)
Three weeks ago Andrej Karpathy posted a gist that hit thousands of stars: stop using LLMs as search engines over your documents, and start using them as knowledge engineers that compile a living wiki. Then Garry Tan shipped GBrain — a brain that doesn't just remember, it acts. The AI world decided memory was the next big fight.
We've been building the messiest possible version of that problem for a year, and it isn't an enterprise knowledge base. It's a game. Because an AI Dungeon Master has to remember your story, write it down without contradicting itself, and eventually act on it — the three layers everyone is theorizing about, running against a story that punishes every mistake.
Why does AI forget your campaign?
The context window is not memory. It's a whiteboard that gets wiped after every session. A model can hold a million tokens, but quality starts degrading long before that, and when the session ends, everything disappears. The next conversation begins from zero.
In a chatbot that's annoying. In a role-playing campaign it's fatal. You spend three sessions building a rivalry with a smuggler named Elara. You spare her life in a crypt. Two weeks later the Dungeon Master introduces her again — as a man, from a different city, who has never met you. The illusion breaks. The story you were invested in stops being yours. If you've played with a raw LLM, you know exactly the moment we mean: the dead NPC who keeps talking.
RAG was the first serious fix: embed your documents into vectors, store them, and retrieve the relevant chunks at query time. It works — millions of production systems run on it. But it has a structural limit that a 2024 research paper mapped into seven failure points. Three of them matter for a Dungeon Master: chunking splits a character across fragments so the model finds one and misses the rest; re-derivation means every turn starts from scratch, learning nothing; and passivity means the system only ever answers, never noticing that two sources disagree.
Three ways an AI can remember: retrieve, compile, act
The current debate lines up three architectures. They get framed as rivals. They're not — they are three layers of the same stack.
- Retrieve (RAG): find relevant content at scale. Great at breadth, blind to depth; it re-reads the same books for every exam and never learns the material.
- Compile (the LLM wiki): Karpathy's idea of compiling sources once into a persistent, cross-linked wiki. The knowledge compounds; it just doesn't scale to millions of documents.
- Act (autonomous skills): Garry Tan's GBrain, where knowledge doesn't just sit there but triggers actions, runs on a schedule, and works while you sleep. Powerful, and a serious engineering commitment.

The right question isn't “which one wins.” It's: what is your agent's job? A Dungeon Master's job is all three at once, which is why it makes such a brutal test case. Here is where we actually are.
Retrieve: RAG over your campaign
The base layer is real RAG. Every article in your campaign's Codex is embedded as a vector and stored with pgvector in Postgres. When the Dungeon Master needs something the structured lookups don't cover, it runs a semantic search over those embeddings and pulls the closest passages.
This is the unglamorous, load-bearing part. It scales, it's fast enough for a live turn, and it's the fallback when the graph doesn't already know the answer. But on its own it has exactly the weakness Karpathy points at: it retrieves, it never compounds. So we don't stop here.
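To make the retrieval step in this section concrete, here is a minimal in-memory sketch of nearest-neighbor search over article embeddings. It is an illustration, not the production query: the function names, the toy three-dimensional vectors, and the choice of cosine distance are all assumptions. In pgvector the same ranking would typically be an `ORDER BY embedding <=> query LIMIT k` clause, where `<=>` is pgvector's cosine-distance operator.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search_codex(query_embedding, articles, k=2):
    """Return the titles of the k articles closest to the query embedding."""
    ranked = sorted(
        articles,
        key=lambda art: cosine_distance(query_embedding, art["embedding"]),
    )
    return [art["title"] for art in ranked[:k]]


# Hypothetical toy embeddings; real ones would come from an embedding model.
ARTICLES = [
    {"title": "Elara the Smuggler", "embedding": [0.9, 0.1, 0.0]},
    {"title": "The Sunken Crypt", "embedding": [0.6, 0.7, 0.1]},
    {"title": "Harbor Tax Laws", "embedding": [0.0, 0.2, 0.9]},
]

results = search_codex([1.0, 0.2, 0.0], ARTICLES, k=2)
print(results)  # ['Elara the Smuggler', 'The Sunken Crypt']
```

Note what the sketch does not do: nothing is written back after the search, which is exactly the "retrieves, never compounds" limitation described above.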
Compile: the Codex is Karpathy's LLM wiki
The Codex is Karpathy's LLM wiki. We shipped it for tabletop RPGs instead of research papers. Same three layers he describes:
- Immutable sources — your actual play. The transcript of what happened at the table. The model reads it but never rewrites it.
- The wiki — the Codex: LLM-written entity pages for every NPC, place, quest and thread, with summaries, status, and cross-links. The model owns this layer.
- The schema — the rules for how the wiki is maintained: what counts as an NPC, how backlinks form, what a contradiction is.
And it compounds the way Karpathy describes. When a scene ends, the Dungeon Master doesn't just write one summary — it reads the new events against the existing Codex and updates every page they touch: new cross-references, updated status, a fresh timeline entry. One entry literally reads “The DM keeps this entry.” The wiki grows richer the longer you play.
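The end-of-scene update described above can be sketched roughly as follows. In the real system a language model reads the transcript and decides what changed; this sketch swaps that step for an already-structured `scene` dictionary so the bookkeeping is visible. The `Page` fields, the `compile_scene` function, and the scene format are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One Codex entity page (hypothetical shape)."""
    name: str
    status: str = "unknown"
    links: set = field(default_factory=set)
    timeline: list = field(default_factory=list)


def compile_scene(codex, scene):
    """Fold one finished scene into every Codex page it touches."""
    touched = scene["entities"]
    for name in touched:
        # Create the page on first mention, otherwise update it in place.
        page = codex.setdefault(name, Page(name))
        # Cross-link every entity that appeared together in the scene.
        page.links.update(other for other in touched if other != name)
        # Change status only where the scene states a new one.
        if name in scene.get("status_changes", {}):
            page.status = scene["status_changes"][name]
        # The source transcript stays untouched; only the wiki grows.
        page.timeline.append(scene["summary"])
    return codex


codex = {"Elara": Page("Elara", status="hostile")}
scene = {
    "summary": "Session 3: the party spared Elara in the crypt.",
    "entities": ["Elara", "Sunken Crypt"],
    "status_changes": {"Elara": "spared, owes the party"},
}
compile_scene(codex, scene)
print(codex["Elara"].status)  # spared, owes the party
```

One scene touched two pages here: an existing page gained a status, a link, and a timeline entry, and a new page was created with a backlink already in place.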

The payoff shipped this week: the Dungeon Master now reads its own Codex every turn. It doesn't dump the whole wiki into context — it routes to the entities that matter in the current scene, pulls their pages plus one hop of backlinks, and injects a tight memory block. That's the retrieve + compile intersection the whole industry is drawing on a whiteboard, running in a live game.
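A rough sketch of that routing step, under stated assumptions: the Codex is modeled as a dictionary of pages with a `summary` and outgoing `links`, a backlink is any page that links to an entity in the scene, and `build_memory_block` and `max_pages` are hypothetical names. How the live system decides which entities "matter" is not shown; here the scene entities are simply passed in.

```python
def build_memory_block(codex, scene_entities, max_pages=6):
    """Collect pages for the scene's entities plus one hop of backlinks."""
    # Direct hits: entities in the scene that have a Codex page (deduplicated).
    direct = list(dict.fromkeys(n for n in scene_entities if n in codex))
    selected = list(direct)
    # One hop only: pages that link TO a direct hit, not links of links.
    for name, page in codex.items():
        if name not in selected and any(t in direct for t in page["links"]):
            selected.append(name)
    # Cap the block so the prompt stays tight.
    selected = selected[:max_pages]
    return "\n".join(f"[{n}] {codex[n]['summary']}" for n in selected)


CODEX = {
    "Elara": {
        "summary": "Smuggler; spared by the party in the crypt.",
        "links": ["Sunken Crypt"],
    },
    "Sunken Crypt": {"summary": "Flooded tomb under the harbor.", "links": []},
    "Captain Voss": {
        "summary": "Harbor watch officer hunting Elara.",
        "links": ["Elara"],
    },
    "Harbor Tax Laws": {"summary": "Tariffs on imported goods.", "links": []},
}

block = build_memory_block(CODEX, ["Elara"])
print(block)
```

With only Elara in the scene, the block contains her page and Captain Voss (who links to her), while the unrelated tax-law page stays out of context.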
The problem nobody mentions: coherence, not memory
The “RAG is dead” essays skip the hard half. For a story, remembering is the easy half. The hard half is not contradicting yourself. A wiki that compiles itself will happily compound its own hallucinations: one wrong detail gets cross-referenced into three pages and becomes canon. Karpathy's gist is 200 elegant lines; in production, linting and verification are 80% of the real work.
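As one illustration of what a lint pass could look like, the sketch below flags any entity attribute that carries more than one distinct value across pages. It assumes facts have already been extracted into `(entity, attribute, value, page)` tuples, which is itself a hard step not shown here; the tuple format and the `lint_contradictions` name are hypothetical.

```python
from collections import defaultdict


def lint_contradictions(facts):
    """Return attributes that have conflicting values across Codex pages."""
    values = defaultdict(set)
    for entity, attribute, value, _page in facts:
        values[(entity, attribute)].add(value)
    # An attribute is suspect as soon as two pages disagree about it.
    return {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}


# Hypothetical extracted facts: the hometown disagrees between two pages.
FACTS = [
    ("Elara", "hometown", "Port Vesh", "Elara"),
    ("Elara", "hometown", "Karrow", "Captain Voss"),
    ("Elara", "status", "alive", "Elara"),
    ("Elara", "status", "alive", "Sunken Crypt"),
]

conflicts = lint_contradictions(FACTS)
print(conflicts)  # {('Elara', 'hometown'): ['Karrow', 'Port Vesh']}
```

Running a check like this before a page is saved is one way to stop a single wrong detail from being cross-referenced into canon.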
So the interesting engineering isn't the compile step, it's the guardrails. The Codex separates what a player may see from DM-only secrets, so the memory the Dungeon Master reads can include the twist without ever leaking it to the table. And we don't trust any of it on vibes: we run evals. With the memory block on, the Dungeon Master used roughly three times more canonical facts per scene and never leaked a secret across repeated runs; with it off, it invented three mutually contradictory backstories for the same character. That gap is the whole product.
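The visibility split and a leak check can be sketched like this. Everything here is an assumption made for illustration: the `visibility` field, the `leak_markers` phrases, and the substring test, which is far cruder than a real eval would presumably need to be.

```python
def dm_memory(entries):
    """Everything the Dungeon Master may read, secrets included."""
    return [e["text"] for e in entries]


def player_view(entries):
    """Only the entries a player is allowed to see."""
    return [e["text"] for e in entries if e["visibility"] == "player"]


def leaked_secrets(narration, entries):
    """Return DM-only entries whose marker phrases appear in the narration."""
    lowered = narration.lower()
    return [
        e["text"]
        for e in entries
        if e["visibility"] == "dm_only"
        and any(marker.lower() in lowered for marker in e["leak_markers"])
    ]


# Hypothetical Codex entries for one NPC.
ENTRIES = [
    {
        "text": "Elara is a smuggler the party spared.",
        "visibility": "player",
        "leak_markers": [],
    },
    {
        "text": "Elara secretly works for the Duke.",
        "visibility": "dm_only",
        "leak_markers": ["works for the Duke"],
    },
]

safe = leaked_secrets("Elara nods and slips back into the fog.", ENTRIES)
bad = leaked_secrets("Elara admits she works for the Duke.", ENTRIES)
print(len(safe), len(bad))  # 0 1
```

Repeating a check like this over many generated scenes is the general shape of an eval that can report "never leaked a secret across repeated runs".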
Act: what's coming
The third layer is the honest part of the roadmap. Today the Dungeon Master already acts through tools: it rolls dice, resolves mechanical combat, registers NPCs, tracks clocks. But those fire reactively, while it narrates your turn. What it doesn't do yet is what Garry Tan's GBrain does: run on its own.
The version we want is a Dungeon Master that keeps the world turning between sessions — advancing a rival's scheme, spending the favor an NPC owed you, letting a threat you ignored grow teeth — so you come back to a world that lived without you. That's the action layer, and it's the honest “coming soon” on the diagram above. The point is that the memory and the wiki are the hard prerequisites, and those already exist.
So which architecture wins?
None of them, alone. The same way databases stopped being “SQL or NoSQL” and became hybrid systems, agent memory is converging into a single stack: retrieve at scale, compile into persistent knowledge, act on it autonomously. The teams still asking which one to pick are asking the wrong question.
We didn't set out to prove a thesis about AI memory. We set out to build a Dungeon Master that doesn't forget your story. They're the same project. And the least forgiving place to test it isn't an enterprise wiki, it's a table of players who will notice the second their dead rival starts talking again.
Frequently asked questions
What is an LLM wiki?
An LLM wiki is a memory pattern popularized by Andrej Karpathy: instead of retrieving raw document chunks at query time (RAG), you use the language model to pre-compile your sources into a persistent, interlinked wiki of summaries, entity pages, and cross-references. The synthesis happens once; every future question benefits from it. The knowledge compounds instead of being re-derived on every query.
How does an AI Dungeon Master remember your story?
A capable AI Dungeon Master does not rely on the context window, which is erased between sessions. LoreKeeper writes every NPC, place, and pact your party discovers into a living wiki (the Codex), then has the Dungeon Master read that wiki each turn. So it recalls who owes you a favor and what you swore three sessions ago, instead of inventing a new version.
Is RAG dead?
No. RAG is not a competitor to an LLM wiki or to autonomous skills — it is a layer. RAG is the retrieval layer that finds relevant content at scale. A wiki is the synthesis layer that compiles it into persistent knowledge. Skills are the action layer. Production systems combine all three; the debate about which one "wins" misframes them as rivals.
What's the difference between RAG and an LLM wiki?
RAG retrieves and forgets: every query re-reads the same chunks and starts from scratch. An LLM wiki compiles once and reuses: it reads new sources against the existing wiki, updates every affected page, and cross-links them. RAG scales to millions of documents but never learns; a wiki compounds in quality but does not scale to huge corpora without a retrieval layer on top.
Can an AI game master act on its own between sessions?
Not yet in most products, including LoreKeeper today — the Dungeon Master's tools are reactive, firing while it narrates your turn. The next frontier (what Garry Tan's GBrain calls autonomous skills) is a game master that advances NPC plots, generates consequences, and updates the world while you are offline. That is the action layer, and it is on the roadmap.
Play a Dungeon Master that remembers
The Codex writes itself while you play, and the DM reads it every turn. Start a campaign free — 100 rounds, no card, solo or with up to five friends.
