Exocortex
A personal knowledge OS that thinks while you sleep, speaks on its own, and refuses to let you forget. Notion is just one of its faces.
Exocortex is not a notes app. It is an autonomic reasoning engine — a typed graph plus an LLM synthesiser plus a hybrid retriever — that ingests what you read, hear, and decide, and compiles knowledge from it on its own schedule. Obsidian, Notion, Telegram, and the shell are interchangeable surfaces. The brain lives in Postgres.
This document is the long-form companion to README.md.
Read it if you want to know why the architecture looks the way it
does, what problems it actually solves, and what is explicitly not
solved yet.
1. Why Exocortex
Every popular "second brain" template — PARA, Notion Ultimate Brain, Tiago Forte's Building a Second Brain, the Roam/Logseq family — is, underneath, a tidy relational database with backlinks. The user is the only active processor. The system waits.
That passivity is the root cause of the five PKM pain points that recur in every honest review of these systems:
- Notes never come back. You wrote it; you forgot you wrote it. The graph keeps the link, but nothing surfaces the note when you need it. Spaced repetition tools exist, but they operate on flashcards, not contextually-relevant atoms.
- Collector's fallacy. Saving an article feels productive; it is not. Without something that distils the captured material into a position you can act on, the inbox grows and your thinking about any given topic does not.
- System decay. Templates start tidy and entropy wins. After 12 months, a typical PKM looks like a yard sale of half-formed pages and dead links. The system has no mechanism to repair itself.
- Fragmentation across tools. Notion for tasks, Obsidian for prose, Apple Notes for the thought you had on the bus, Telegram for the link a friend sent, Gmail for the newsletter. No single place knows what you know.
- Output gap. You read a lot. You decide a lot. You ship very little of it. The knowledge stays in your head and the system has no idea you're sitting on a draft.
These five problems share a single shape: the system is passive and the user is the only processor. Templates can dress this up with better UX, but they cannot fix it. You need a different architecture.
2. Architecture (L1 / L2 / L3)
Exocortex separates storage from rendering by a layer most notes apps collapse:
L1 — sources of truth
─────────────────────
Files you (or systems on your behalf) write:
· vault/*.md (Obsidian, edited by hand)
· Gmail "Read Later" label
· RSS feeds you subscribed to
· Fireflies meeting transcripts
· Notion (when integrated as a source surface)
│
▼ write-time fork
(Karpathy LLM-Wiki pattern)
L2 — the graph (canonical)
──────────────────────────
Postgres 16 + pgvector + Apache AGE:
· thoughts (atomic, embedded, addressable)
· edges (35 typed kinds: contradicts, supersedes, decided_in, …)
· syntheses (LLM-generated, perspective-typed, versioned)
│
▼ query-time fork
(Open Brain pattern)
L3 — surfaces
─────────────
Projections of L2 the user actually reads:
· wiki/*.md (compiled nightly, by domain)
· MCP tools (search, ask, expand_node, …)
· Telegram pushes
· Notion mirror
· HTTP API
The write-time fork (Karpathy, LLM Wiki for Notion, April 2026)
runs at ingestion: parse a source, embed it, extract typed entities
and relations, store atoms and edges. Then a nightly compiler renders
domain-shaped Markdown into the vault — meetings under
wiki/work/meetings/, people under wiki/work/people/, newsletters
under wiki/news/, and so on. This output is dereferenceable: every
synthesised page links back to the thoughts and edges that produced
it.
The query-time fork (Nate Jones, The hybrid I'd actually build next, April 2026 — the "Open Brain" pattern) does not pre-compile. When you ask a question, GraphRAG walks the graph, fuses vector similarity with typed-edge traversal, and asks an LLM to compose an answer with citations. Nothing is rendered eagerly; everything is rendered when asked.
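The fusion step in the query-time fork can be illustrated with reciprocal rank fusion (RRF), which the comparison table in Section 6 names as part of GraphRAG. This is a minimal in-memory sketch, not the engine's retriever: the function name `rrf_fuse`, the example IDs, and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each list is ordered best-first. An item's fused score is the sum of
    1 / (k + rank) over every list it appears in (rank starts at 1).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, thought_id in enumerate(ranking, start=1):
            scores[thought_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical inputs: thought IDs ranked by vector similarity and by
# typed-edge traversal from the query's seed nodes.
vector_hits = ["t1", "t2", "t3"]
graph_hits = ["t3", "t1", "t4"]

fused = rrf_fuse([vector_hits, graph_hits])
print([thought_id for thought_id, _ in fused])
```

Items that appear in both lists (here `t1` and `t3`) rise above items that only one retriever found, which is the property that makes rank fusion useful when the two scores are not on a comparable scale.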
Most systems pick one fork. Notion templates are write-time only (you render once; the rendering is the truth). Pure RAG stacks are query-time only (no compiled surface; every interaction re-queries). Exocortex runs both forks against the same graph, because the forks answer different questions: "what is the current state of topic X?" (write-time → compiled wiki page) versus "given everything I know, how should I think about Y?" (query-time → live synthesis).
Typed edges, not just links
The 35 edge types in schema/edges_kind_enum are the load-bearing
piece. A backlink in Obsidian says "these two pages mention each
other". An Exocortex edge says one of:
- decided_in: this thought records a decision made at this meeting
- contradicts: these two thoughts make incompatible claims
- supersedes: this thought replaces an earlier one
- addresses_problem: this synthesis answers this open question
- cites: this newsletter quotes this source
- mentions_person / mentions_client / mentions_project
- derived_from: this synthesis was generated from these thoughts
- … and 27 more
Typed edges let GraphRAG return reasoning paths, not just similar fragments: "A contradicts B, which was superseded by C, sourced from meeting X (human-authored)." That is the difference between retrieval and reasoning.
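To make "reasoning paths" concrete, here is a small sketch of a traversal that returns the typed hops between two nodes rather than just the nodes themselves. It uses an in-memory edge list as a stand-in for the graph in Postgres; the `EdgeKind` subset, the node names, and the `~` prefix for hops walked against their stored direction are all illustrative assumptions.

```python
from collections import deque
from enum import Enum
from typing import Optional


class EdgeKind(str, Enum):
    # Hypothetical subset of the edge kinds; names taken from the list above.
    CONTRADICTS = "contradicts"
    SUPERSEDES = "supersedes"
    DECIDED_IN = "decided_in"


# (source, kind, target) triples standing in for rows of the edges table.
EDGES = [
    ("A", EdgeKind.CONTRADICTS, "B"),
    ("C", EdgeKind.SUPERSEDES, "B"),
    ("C", EdgeKind.DECIDED_IN, "meeting-X"),
]


def typed_path(
    start: str, goal: str, edges: list[tuple[str, EdgeKind, str]]
) -> Optional[list[tuple[str, str, str]]]:
    """Breadth-first search returning the typed hops from start to goal.

    Edges are walked in both directions; a hop walked against its stored
    direction is labelled with a leading "~". Returns None if unreachable.
    """
    neighbours: dict[str, list[tuple[str, str]]] = {}
    for src, kind, dst in edges:
        neighbours.setdefault(src, []).append((kind.value, dst))
        neighbours.setdefault(dst, []).append(("~" + kind.value, src))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None


for hop in typed_path("A", "meeting-X", EDGES) or []:
    print(hop)
```

Each hop carries its edge kind, so the caller (or an LLM prompt built from the path) can say why two nodes are connected, not only that they are.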
3. The plugin system
The core ships zero domain knowledge. Everything specific — what counts as a meeting, how a newsletter gets parsed, which MCP tools the server exposes — comes from plugins.
A plugin is a Python package with a setup(registry) function that
calls the appropriate registration method for each thing it provides.
There are six extension points:
| Extension point | Base class | What it does | Example |
|---|---|---|---|
| Source adapters | Source | Pulls items from somewhere into the capture API | Gmail label, RSS feed, Fireflies webhook |
| Capture processors | Processor | Turns a raw captured item into typed thoughts + edges | notion-task-sync, newsletter, youtube |
| Perspective types | PerspectiveType | Generates a synthesis for a given key (client, project, topic, …) | client, frp, news_cluster |
| MCP tools | McpTool | Exposes a callable tool over the MCP protocol | search_thoughts, find_contradictions |
| Wiki domains | DomainCompiler | Renders a part of the compiled wiki | work/, news/, frp/ |
| Live sections | SectionGenerator | Contributes a section to the home dashboard | open_action_items, coming_back_to_you |
A plugin's setup() looks like:
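from exocortex.core.registry import Registry

# MyClientPerspective, MyTool, and MyDomainCompiler are the plugin's own
# classes, defined or imported elsewhere in the package.
def setup(registry: Registry) -> None:
    registry.register_perspective(MyClientPerspective())
    registry.register_mcp_tool(MyTool())
    registry.register_compile_domain(MyDomainCompiler())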
Plugins are discovered two ways. In development, drop a package
under plugins/ and it loads via Registry.discover_dev(). In
production, declare an entry point in your plugin's pyproject.toml,
then pip install the plugin; the core picks it up via
importlib.metadata.entry_points. Plugins do not import each other;
they only talk to the registry and to the core engine. This is what
keeps the engine generic: there is no place in exocortex/ that
knows the name of any domain.
The reference example is examples/acme-corp/:
a fictional company plugin in ~150 lines that adds a custom
perspective, an MCP tool, and a domain compiler.
4. The source-of-truth rule
There is one architectural law in Exocortex, paraphrasing Nate Jones:
The DB stays the source of truth. The wiki is never edited directly. Errors are fixed in source, then regenerated. The wiki never drifts from reality because it's always rebuilt from reality.
Practically, this means:
- Every write goes to L1. Edits to a meeting note happen in the vault .md (or via the Capture API). The wiki page in wiki/work/meetings/2026-05-24-foo.md is compiler output; it will be overwritten on the next run.
- Edges and syntheses are never hand-edited. They are derived artefacts. To "correct" a synthesis, you correct the underlying thought (or mark the synthesis as superseded); the next compile picks it up.
- The Notion surface is read-mostly. When Two-Way Cockpit (Feature 3, v0.2) lands, checking off an action item in Notion will fire a POST /capture with source_type=notion-cockpit-action, i.e. it writes to L1, not directly to the graph node. Even the fanciest surface is a thin shim over the canonical write path.
This rule is what keeps the system from decaying. Notes apps decay because the user is editing the same artefact over and over and the edits accumulate cruft. In Exocortex, the cruft has nowhere to live — every compile starts from scratch.
The cost of this rule is that the compiler must be cheap to run. A full vault rebuild takes ~2 minutes on a $24/mo droplet, with LLM synthesis costs around $0.10–0.20/day at typical volumes. That's the budget the rule has to fit inside.
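The "rebuilt from reality" property amounts to the compiler being a pure function of source state. The toy sketch below shows the consequence; `compile_wiki`, the dictionary fields, and the page layout are hypothetical and far simpler than the real compiler.

```python
def compile_wiki(thoughts: list[dict[str, str]]) -> dict[str, str]:
    """Render every wiki page from source state alone.

    The result depends only on `thoughts`, so previously compiled output
    (including any hand edits made to it) cannot leak into a new compile.
    """
    pages: dict[str, str] = {}
    for thought in sorted(thoughts, key=lambda t: t["id"]):
        path = f"wiki/{thought['domain']}/{thought['id']}.md"
        pages[path] = f"{thought['title']}\n\n{thought['body']}\n"
    return pages


source = [{"id": "2026-05-24-foo", "domain": "work/meetings",
           "title": "Foo sync", "body": "Decided to ship on Friday."}]
page = "wiki/work/meetings/2026-05-24-foo.md"

wiki = compile_wiki(source)

# A hand edit to compiler output does not survive the next compile.
wiki[page] += "edited by hand\n"
wiki = compile_wiki(source)
assert "edited by hand" not in wiki[page]

# A fix made in the source does.
source[0]["body"] = "Decided to ship on Monday."
wiki = compile_wiki(source)
assert "Monday" in wiki[page]
```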
5. The six features
v0.1 ships the engine: ingestion, the graph, GraphRAG, the compiler, the MCP server, and a plugin system. The six features below are what gets layered on top in v0.2+. They are not bells and whistles — each one targets a specific pain point from Section 1.
Feature 1: Nightly Shift (briefing while you sleep)
A 05:00 UTC cron runs a perspective called night_shift_briefing. It
delta-queries the last 24 hours of thoughts, asks find_contradictions
for new conflicts in the graph neighbourhood, runs a pattern detector
(term-frequency spike + embedding drift versus a 30-day baseline) over
the day's atoms, and surfaces overdue action items. The LLM (Qwen,
cheap) writes 2–5 sentences of Polish narrative — not a wall of
numbers. Telegram gets a push.
Guardrail: contradictions are surfaced as a list, never narrated. "3 new contradictions: A vs B, C vs D, E vs F — decide" — not "there's some tension between A and B." The LLM describes patterns; it does not adjudicate.
Solves pain points 1 (notes don't come back) and 5 (output gap — the system tells you what you have, ready to ship).
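The term-frequency half of the pattern detector can be sketched as a ratio test against the baseline window. This is an illustration only: the function name `term_spikes`, the thresholds `ratio` and `min_count`, and the pre-tokenised input are assumptions, and the embedding-drift half is not shown.

```python
from collections import Counter


def term_spikes(
    today: list[str],
    baseline_days: list[list[str]],
    ratio: float = 3.0,
    min_count: int = 3,
) -> list[str]:
    """Return terms whose count today is at least `ratio` times their
    average daily count over the baseline window.

    Terms seen fewer than `min_count` times today are ignored, so a single
    stray mention of a brand-new word does not count as a spike.
    """
    today_counts = Counter(today)
    baseline_total: Counter = Counter()
    for day in baseline_days:
        baseline_total.update(day)
    n_days = max(len(baseline_days), 1)

    spikes = []
    for term, count in today_counts.items():
        if count < min_count:
            continue
        daily_avg = baseline_total[term] / n_days
        if count >= ratio * daily_avg:
            spikes.append(term)
    return sorted(spikes)


# Hypothetical data: a 30-day baseline where two terms appear once a day.
baseline = [["pricing", "roadmap"] for _ in range(30)]
today = ["governance"] * 4 + ["pricing"] * 3 + ["roadmap"]
print(term_spikes(today, baseline))
```

The detector only returns a list of terms; in keeping with the guardrail above, turning that list into narrative is left to the LLM and adjudicating it is left to the human.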
Feature 2: Resurfacing Engine (notes return on their own)
A daily worker maintains resurfacing_state (SM-2 algorithm:
ease_factor, next_due, review_count) for every thought. The
scoring combines:
- the SM-2 forgetting curve (when should this come back?)
- graph proximity to today's active nodes (is it relevant now?)
- provenance weight (human authored > validated AI > unvalidated)
- an orphan boost (a thought with no recent edges gets a small bump)
3–5 notes resurface per day in a "Wraca do ciebie" / "Coming back to you" home dashboard section. This is graph-aware spaced repetition of whole notes, not flashcards. The novelty is that surfacing is driven by current relevance, not just temporal decay.
Solves pain point 1 (notes don't come back).
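A sketch of how the pieces might fit together: a standard SM-2 update for the scheduling state, plus a composite score over the four signals listed above. The SM-2 arithmetic follows the published algorithm; everything in `resurfacing_score` (the weights, the one-week saturation, the orphan bonus, the 1.0 weight for human-authored thoughts) is a hypothetical choice. Only the 0.70 and 0.85 provenance values come from this document.

```python
from dataclasses import dataclass


@dataclass
class ResurfacingState:
    ease_factor: float = 2.5   # SM-2 starting ease
    review_count: int = 0
    interval_days: int = 0     # next_due = last review + interval_days


def sm2_update(state: ResurfacingState, quality: int) -> ResurfacingState:
    """Apply one SM-2 review. `quality` is 0-5; below 3 counts as a lapse."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality < 3:
        # Lapse: restart the repetition sequence, keep the ease factor.
        return ResurfacingState(state.ease_factor, 0, 1)
    if state.review_count == 0:
        interval = 1
    elif state.review_count == 1:
        interval = 6
    else:
        interval = round(state.interval_days * state.ease_factor)
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(state.ease_factor + delta, 1.3)  # SM-2 floor on ease
    return ResurfacingState(ease, state.review_count + 1, interval)


# 0.70 and 0.85 appear in this document; 1.0 for human-authored is assumed.
PROVENANCE_WEIGHT = {"human": 1.0, "validated_ai": 0.85, "unvalidated_ai": 0.70}


def resurfacing_score(
    days_overdue: float, graph_proximity: float, provenance: str, is_orphan: bool
) -> float:
    """Combine the four signals into one number (all weights hypothetical).

    `graph_proximity` is expected in [0, 1], higher meaning closer to
    today's active nodes.
    """
    due = min(max(days_overdue, 0.0) / 7.0, 1.0)  # saturate after a week
    base = 0.4 * due + 0.4 * graph_proximity
    return base * PROVENANCE_WEIGHT[provenance] + (0.1 if is_orphan else 0.0)
```

A daily worker would then take the top 3 to 5 thoughts by `resurfacing_score` among those whose `next_due` has passed or is close.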
Feature 3: Two-Way Notion Cockpit
Today Notion is a read-only mirror of the compiled wiki; the publish pipeline overwrites it nightly. v0.2 makes a property-whitelisted set of Notion fields bidirectional:
- Check a task → POST /capture with source_type=notion-cockpit-action → processor updates the action item's edge to status=done → next compile shows it ticked everywhere
- Type a question into an "Ask the brain" field → MCP ask runs → the answer is written into a reply field within ~5 seconds
- Toggle human_validated: true on an ai_authored synthesis → provenance ranking bumps it from 0.70 to 0.85
The hardest piece is conflict resolution: vault .md is the
canonical source, Notion is a surface, but humans edit both. The rule
is property-level last-write for booleans, manual merge for free
text, and a Notion-side "this is stale, repull?" banner when the
two diverge by more than a threshold.
Solves pain point 4 (fragmentation) — the same checkbox affects every surface.
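The per-property part of that rule is small enough to sketch. The names `PropertyEdit` and `resolve` are hypothetical, and the tie-break (equal timestamps go to the vault, since it is canonical) is an assumption the text above does not specify. The staleness banner is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union

Value = Union[bool, str]


@dataclass(frozen=True)
class PropertyEdit:
    value: Value
    edited_at: datetime


def resolve(vault: PropertyEdit, notion: PropertyEdit) -> tuple[str, Optional[Value]]:
    """Decide one property. Returns (decision, value).

    decision is "same", "vault", "notion", or "manual_merge".
    """
    if vault.value == notion.value:
        return ("same", vault.value)
    if isinstance(vault.value, bool) and isinstance(notion.value, bool):
        # Booleans: last write wins; ties go to the vault (assumed).
        if notion.edited_at > vault.edited_at:
            return ("notion", notion.value)
        return ("vault", vault.value)
    # Free text (or mismatched types): never auto-merge.
    return ("manual_merge", None)


earlier = datetime(2026, 5, 24, 9, 0)
later = datetime(2026, 5, 24, 9, 5)
print(resolve(PropertyEdit(False, earlier), PropertyEdit(True, later)))
print(resolve(PropertyEdit("draft A", earlier), PropertyEdit("draft B", later)))
```

Whichever side wins, the result would still travel through POST /capture as described in Section 4, so the resolver decides what to write, not where.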
Feature 4: Telegram (mobile brain access)
Native Obsidian and Notion apps already solve reading on mobile. The actual gap is GraphRAG access (the reasoning layer is MCP-stdio-only, effectively dead outside a Claude Desktop session) and proactive push (the brain finds something; you have no idea).
A thin Telegram bot on the host, using an allowFrom-style ID allowlist, polling (no webhook needed behind NAT), and optional LLM intent routing for ambiguous queries, exposes:
- Ask the brain from your phone: GraphRAG query, cited answer
- Push notifications: morning briefing, contradiction alerts, overdue action items
- Capture-via-Telegram: anything you send becomes a POST /capture
- (v0.2 stretch) Document drop → graph-backed fact-check: send a Markdown file, get back a list of claims with supports / contradicts / decided_in edges from your own history
This is qualitatively different from a Perplexity-style fact-check: an internet-backed bot asks the world; Exocortex asks you, six months ago.
Solves pain point 4 (fragmentation — your phone is now a surface) and reinforces 1, 2, 5.
Feature 5: Gap Radar (what you don't know)
A new MCP tool, gap_analysis, and a gap_radar synthesis
perspective. Pure graph queries — no new schema:
- Dense clusters of thoughts with no synthesis node → "40 thoughts about X, zero decision-grade synthesis. Want a draft?"
- Topics with mentions_* edges but no addresses_problem → dead knowledge: you keep reading about AI governance but never connected it to a project
- contradicts edges with no supersedes resolution → open conflicts older than 30 days
- Orphans older than 30 days → thoughts that never grew an edge
The output feeds into the newsletter pipeline as a nudge sink — "you've been quiet on X and you have material; here's a 200-word seed."
Solves pain points 2 (collector's fallacy) and 5 (output gap) directly.
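Two of the four queries above, sketched over in-memory data rather than the real graph. The `Edge` tuple and function names are hypothetical. The sketch also assumes that a supersedes edge points from the newer thought to the one it replaces, and that a contradiction counts as resolved once either side has been superseded; the document does not pin down either detail.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Edge(NamedTuple):
    src: str
    kind: str
    dst: str
    created_at: datetime


def orphans(
    thoughts: dict[str, datetime], edges: list[Edge], now: datetime, days: int = 30
) -> list[str]:
    """IDs of thoughts older than `days` that appear in no edge at all."""
    linked = {e.src for e in edges} | {e.dst for e in edges}
    cutoff = now - timedelta(days=days)
    return sorted(
        tid for tid, created in thoughts.items()
        if created < cutoff and tid not in linked
    )


def open_conflicts(
    edges: list[Edge], now: datetime, days: int = 30
) -> list[tuple[str, str]]:
    """contradicts edges older than `days` where neither side was superseded."""
    superseded = {e.dst for e in edges if e.kind == "supersedes"}
    cutoff = now - timedelta(days=days)
    return [
        (e.src, e.dst) for e in edges
        if e.kind == "contradicts"
        and e.created_at < cutoff
        and e.src not in superseded
        and e.dst not in superseded
    ]


now = datetime(2026, 5, 24)
thoughts = {
    "a": datetime(2026, 3, 1), "b": datetime(2026, 3, 1),
    "c": datetime(2026, 3, 1), "d": datetime(2026, 3, 5),
    "e": datetime(2026, 5, 20),  # recent, so not yet an orphan
    "f": datetime(2026, 1, 10),  # old and unlinked
    "g": datetime(2026, 3, 10),
}
edges = [
    Edge("a", "contradicts", "b", datetime(2026, 3, 2)),
    Edge("c", "contradicts", "d", datetime(2026, 3, 6)),
    Edge("g", "supersedes", "d", datetime(2026, 3, 10)),
]
print(orphans(thoughts, edges, now))
print(open_conflicts(edges, now))
```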
Feature 6: Wiki Compiler Refactor
This one is not user-facing — it's structural. workers/wiki_compiler.py
is currently a ~10k-line monolith with 218 top-level definitions and 5
module-level globals. Each new feature above adds a section to a home
dashboard rendered from a 1100-line function. Two LLM agents touching
the file at once produce merge hell.
v0.2 splits it into a wiki_compiler/ package with ~30 modules
(core/, util/, domains/), a RunContext dataclass replacing
the globals, and an explicit public API for the invariants that
matter — most importantly core/user_state.py, which preserves the
hand-toggled [x] / ✅ markers in meeting notes byte-identically
across compiles.
Without this refactor, Features 1, 2, and 5 each add 200–400 lines to
already-1100-line functions. With it, each feature is one new file
under domains/home/sections.py.
Acceptance is byte-identical output versus a pinned baseline (1172 files in the reference vault). The refactor must be invisible.
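One way to check byte-identical output is to hash every file in both trees and compare. This is a sketch of that idea with hypothetical function names, not the project's acceptance harness.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_trees(baseline: Path, candidate: Path) -> list[str]:
    """Return relative paths that are missing, extra, or differ in content."""
    old, new = tree_digest(baseline), tree_digest(candidate)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

An empty list from `diff_trees(baseline_dir, refactored_output_dir)` means the refactor was invisible; any entry names a file to inspect, including pages where a hand-toggled [x] marker was lost.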
6. Compared to the alternatives
vs. Notion templates (Ultimate Brain, PARA, PPV)
| | Notion templates | Exocortex |
|---|---|---|
| Storage | Relational DB + manual relations | Append-only graph, 35 typed edges, vector embeddings |
| Active processor | The user | The user + LLM synthesisers on schedule |
| Retrieval | Keyword search + Notion AI (RAG against the page text) | GraphRAG (pgvector HNSW + AGE traversal + RRF fusion) |
| Quality signal | None | Provenance-aware ranking (human / validated / unvalidated) |
| Conflict surfacing | None | Active find_contradictions over the graph |
| Self-rebuild | Manual | Nightly compile from a canonical graph |
Notion's API fundamentally lacks vector search, graph traversal, and embeddings. No template can fix that — the limit is the platform.
vs. RAG frameworks (LangChain, LlamaIndex, Haystack)
RAG frameworks are libraries; Exocortex is a system. The frameworks give you primitives for one fork (query-time). They have no opinion on:
- the typed edge schema (Exocortex has one — 35 kinds)
- the write-time compile (Exocortex has a nightly wiki compiler)
- the source-of-truth rule (Exocortex enforces L1-only writes)
- the surfaces (Exocortex ships an MCP server, capture API, Telegram bot, Notion mirror)
- the operational shape (Exocortex runs on a $24/mo droplet with pg_cron, systemd timers, and a known cost envelope)
You could build Exocortex on top of LangChain. We didn't, because the abstractions point the wrong way for a system whose canonical form is a typed graph in Postgres.
vs. its private upstream
Exocortex was extracted from a private "second brain" repo in Q2 2026. The upstream contained domain plugins specific to its author (client engagements, audience-specific publishing pipelines, regional language configurations) and operational tooling that wasn't generically useful. Exocortex is the engine — the parts that survive removing one person's notes from the picture.
The extraction kept:
- the graph schema, the synthesizer, the compiler, the GraphRAG layer
- the plugin registries (sources, processors, perspectives, MCP
tools, wiki domains)
- the configuration loader, the migration CLI, the init command,
the Docker stack
- two example plugins: hello-world and acme-corp (fictional)
It removed:
- the author's vault content
- author-specific domain plugins (which live in the private repo)
- proprietary audience definitions for downstream Notion publishing
- vault paths, tenant IDs, and other personal configuration
The two repos share llm-router (currently bundled, will be split
out when published to PyPI) and the schema/migration story. Engine
bug fixes flow from public Exocortex into the private upstream, not
the other way around.
7. Design principles
Five rules that the engine commits to and that anything you build on top of it should respect:
- Append-only event log. Thoughts are never updated in place, only superseded. The history of what was once true is part of the data; you can ask the graph what you used to believe and when you changed your mind. Compaction happens through edges (supersedes, contradicts + decided_in), not deletion.
- Source-of-truth rule. L1 sources beat L2 graph state beat L3 syntheses beat L3 compiled wiki. Errors are always fixed in the highest-priority layer that contains them, then everything below is rebuilt. The wiki never drifts from reality because it is always rebuilt from reality.
- Write-time extraction beats query-time retrieval at scale. Tagging, embedding, edge extraction, and synthesis all happen on ingest or on a schedule, not when you ask a question. Query-time does graph traversal and LLM composition over already-extracted structure. This keeps query latency bounded and lets the cost of "understanding" your data amortise over time.
- Plugins are the right boundary. Domain knowledge (what counts as a meeting, which fields a client page has, how a newsletter category is named) lives in plugins. The core does not know your domains by name. New domains are new plugins, not patches to the engine.
- LLM as compiler, not oracle. The LLM compiles structured inputs (a set of thoughts, a perspective prompt, a graph neighbourhood) into structured outputs (a synthesis, a typed edge, a wiki page). It does not get to decide what is true. The typed schema, the provenance ranking, and the find_contradictions tool are what keep it honest.
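The append-only principle is easiest to see in miniature. In the sketch below a correction never overwrites a thought; it appends a new one and records a supersedes link, so both the current belief and its history stay queryable. `Thought`, `ThoughtLog`, and their methods are hypothetical names, not the engine's schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Thought:
    id: str
    text: str
    recorded_on: date


@dataclass
class ThoughtLog:
    thoughts: dict[str, Thought] = field(default_factory=dict)
    superseded_by: dict[str, str] = field(default_factory=dict)  # old id -> new id

    def append(self, thought: Thought, replaces: Optional[str] = None) -> None:
        if thought.id in self.thoughts:
            raise ValueError("thoughts are never updated in place")
        if replaces is not None and replaces not in self.thoughts:
            raise KeyError(replaces)
        self.thoughts[thought.id] = thought
        if replaces is not None:
            self.superseded_by[replaces] = thought.id

    def history(self, thought_id: str) -> list[Thought]:
        """Every version from thought_id forward, oldest first."""
        chain = [self.thoughts[thought_id]]
        while thought_id in self.superseded_by:
            thought_id = self.superseded_by[thought_id]
            chain.append(self.thoughts[thought_id])
        return chain

    def current(self, thought_id: str) -> Thought:
        """Follow supersedes links to the newest version."""
        return self.history(thought_id)[-1]


log = ThoughtLog()
log.append(Thought("t1", "Ship the newsletter weekly", date(2026, 1, 5)))
log.append(Thought("t2", "Ship the newsletter biweekly", date(2026, 3, 2)), replaces="t1")

print(log.current("t1").text)
print([(t.recorded_on.isoformat(), t.text) for t in log.history("t1")])
```

`history` answers "what did I used to believe, and when did I change my mind"; deleting or editing `t1` would have made that question unanswerable.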
8. Roadmap
| Phase | What | Effort | Unlocks |
|---|---|---|---|
| v0.1 | First public release — engine, plugins, examples, docs | shipped | The whole roadmap below |
| v0.2 Phase 0 | MCP HTTP/SSE transport, Telegram digest timer (~1h) | 1–2 weeks | Features 3, 4 |
| v0.2 Phase 1 | Nightly Shift + pg_notify push event bus | 2–3 weeks | Proactive system |
| v0.2 Phase 1.5 | Wiki compiler refactor (Feature 6) — kill the monolith | 2 weeks | Removes concurrent-edit blocker, unblocks Phase 2 |
| v0.2 Phase 2 | Resurfacing Engine + Gap Radar | 2–3 weeks | Pain points 1, 2, 5 closed |
| v0.2 Phase 3 | Two-Way Notion Cockpit | 3–4 weeks | Bidirectional Notion, pain point 4 |
| v0.3+ | Scoped MCP tokens, multi-tenant team deployment, calendar source | Q4 2026+ | Team / client deployment |
Each phase ships independent value. The roadmap is the maintainer's plan; community contributions can re-order it.
9. Agentic framing: what this is really for
Exocortex is a UX for one person now. It is also a compiled substrate for future agents.
The graph (L2) with its typed edges, provenance-aware ranking, and
versioned syntheses is exactly the artefact a planning agent wants
to read. An agent preparing a meeting can query GraphRAG for context,
list open action items, and check for contradictions before
proposing an agenda. An agent drafting a newsletter can seed itself
from Gap Radar output. An agent validating a decision can check
provenance — "this synthesis is ai_authored and human_validated:
false; ask the human before citing it".
The difference from traditional RAG, again, is that graph traversal returns typed paths, not just similar fragments. An agent can reason about the state of your knowledge — "this claim is contested; the most recent decision was X; the contradicting source was superseded but the contradiction edge wasn't resolved" — not just retrieve text that mentions the claim.
The design rule that follows: every new feature exposes its output as an MCP tool, read-only, alongside any UI it ships. The Notion mirror is for you; the MCP tool is for whatever you (or you-as-an-agent) build next.
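The provenance check described above is simple enough to sketch as a gate an agent could run before citing anything. The field names ai_authored and human_validated come from this document; `SynthesisMeta`, `citation_decision`, and `partition` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisMeta:
    id: str
    ai_authored: bool
    human_validated: bool


def citation_decision(meta: SynthesisMeta) -> str:
    """Return "cite" or "ask_human" for a synthesis an agent wants to quote."""
    if meta.ai_authored and not meta.human_validated:
        return "ask_human"
    return "cite"


def partition(candidates: list[SynthesisMeta]) -> tuple[list[str], list[str]]:
    """Split candidate IDs into (citable, needs_human_review)."""
    citable = [m.id for m in candidates if citation_decision(m) == "cite"]
    review = [m.id for m in candidates if citation_decision(m) == "ask_human"]
    return citable, review


candidates = [
    SynthesisMeta("s1", ai_authored=False, human_validated=False),
    SynthesisMeta("s2", ai_authored=True, human_validated=True),
    SynthesisMeta("s3", ai_authored=True, human_validated=False),
]
print(partition(candidates))
```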
10. Open questions
Things v0.1 explicitly does not solve, listed honestly:
- Multi-tenant security. Today everything in the database is one tenant. Scoped MCP tokens (scope=['client_a'] cannot see client_b data) are sketched in schema/24 but not enforced in retrievers. Don't deploy Exocortex as a shared service yet.
- Conflict resolution for Two-Way Cockpit. Property-level last-write is the plan; whether that survives contact with real Notion editing patterns is unproven. Feature 3 is the highest-risk item on the roadmap.
- Trust calibration on proactive output. A system that pushes Telegram briefings every morning must be terse and have a high confidence threshold. The router-monthly-report incident in the upstream project taught us that walls-of-numbers-with-no-narrative get muted within a week.
- Calendar as a source. Meetings exist in your calendar before Fireflies records them. Pulling tomorrow's calendar in as pending_meeting thoughts would let the system pre-brief you. Not yet built.
- Backups. The reference deployment uses a single DigitalOcean droplet. Postgres backups are off by default in the docker-compose. Turn them on before the system becomes critical to you.
- The compiler's appetite for new domains. Adding a new domain today means editing the monolith (Feature 6 above). Until Feature 6 ships, adding a domain plugin is a contract on wiki_compiler.compile_all that still passes through a 10k-line file.
- LLM cost ceiling under heavy load. The reference workload (one user, ~500 thoughts/week) sits well inside the cost envelope. A team workload (10 users, 5000 thoughts/week) has not been measured end-to-end. The router has cost stops; whether the cost stops fire at the right place under that load is unknown.
These are the honest gaps, not the marketing pitch. If any of them matters to your use case, open an issue and we can talk about whether v0.2 should reprioritise.
Last updated 2026-05-24. Sources: this document is adapted from the
internal vision file at source/backlog/_second-brain/F31-EXOCORTEX.md
in the private upstream, with PL → EN translation, scope adjustment
for the public engine, and the conceptual prior art credited per
ATTRIBUTION.md. Karpathy's "LLM Wiki for Notion"
and Nate Jones's "Open Brain" are the two essays the architecture
sits on top of; everything else is implementation.