Mnemosyne 2.0: A Memory System That Actually Remembers

24 days ago, Mnemosyne was a single Python file with a SQLite database and a dream. Today it ships with 292 passing tests, 8 architectural phases, and enough features that I had to write them down before I forgot half of them. Which, honestly, is exactly the problem Mnemosyne solves.

Entity extraction that just works

When you save a memory, Mnemosyne now pulls out the important names automatically: @mentions, #hashtags, quoted phrases, and capitalized sequences. It uses fuzzy matching too, so "Alice" and "Alice M." resolve to the same entity. Just pass extract_entities=True to remember() and it handles the rest. No extra config.
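Here is a minimal sketch of that kind of extraction. It assumes plain regexes for the four entity shapes and difflib's SequenceMatcher for the fuzzy step. The patterns, the 0.75 similarity threshold, and the names find_entities and resolve_entity are all illustrative assumptions, not Mnemosyne's actual implementation.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

# Illustrative patterns for the four entity shapes (assumptions, not Mnemosyne's regexes).
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
QUOTED = re.compile(r'"([^"]+)"')
# Runs of capitalized words, allowing initials such as "M."; note this also
# catches capitalized sentence-initial words, a known weakness of the heuristic.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]*\.?)*")


def _norm(name: str) -> str:
    return name.lower().strip(" .")


def resolve_entity(name: str, known: List[str], threshold: float = 0.75) -> str:
    """Return the first known entity similar enough to `name`, else register `name`."""
    for canonical in known:
        if SequenceMatcher(None, _norm(name), _norm(canonical)).ratio() >= threshold:
            return canonical
    known.append(name)  # new entity: remember it for later lookups
    return name


def find_entities(text: str, known: Optional[List[str]] = None) -> List[str]:
    known = [] if known is None else known
    raw = (MENTION.findall(text) + HASHTAG.findall(text)
           + QUOTED.findall(text) + CAPITALIZED.findall(text))
    entities: List[str] = []
    for name in raw:
        canonical = resolve_entity(name, known)
        if canonical not in entities:  # dedupe, keeping first-seen order
            entities.append(canonical)
    return entities
```

With known=["Alice"], calling find_entities('talked to Alice M. about "vector search" in #mnemosyne, cc @bob', known) returns ['bob', 'mnemosyne', 'vector search', 'Alice']. "Alice M." scores about 0.83 against "Alice" and folds into the existing entity, while the other names are registered as new ones.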

Structured facts from raw text

Instead of dumping entire conversations into a vector DB and hoping similarity search finds the right part, Mnemosyne now extracts structured facts. Things like "user prefers neovim" or "project uses pytest with xdist." It does this with an LLM, using a fallback chain: your remote API first, then a local GGUF model, and if neither is available it skips extraction gracefully. No crash, no hang, just a quiet degradation.
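The fallback chain can be sketched as a list of extractor callables tried in order. This is a sketch of the pattern only. remote_llm and local_gguf below are hypothetical placeholders for the API call and the GGUF model. Real backends would also need their own timeouts, so that a stalled call cannot hang the save.

```python
from typing import Callable, List

Extractor = Callable[[str], List[str]]


def extract_facts(text: str, backends: List[Extractor]) -> List[str]:
    """Try each backend in order; if every one fails, return no facts instead of raising."""
    for backend in backends:
        try:
            return backend(text)
        except Exception:
            # Network errors, missing model files, malformed LLM output: try the next one.
            continue
    return []  # graceful skip: the caller gets no facts, not an exception


def remote_llm(text: str) -> List[str]:
    # Hypothetical placeholder: simulates the remote API being unreachable.
    raise ConnectionError("remote API unreachable")


def local_gguf(text: str) -> List[str]:
    # Hypothetical placeholder: a real version would prompt a local GGUF model.
    return ["user prefers neovim"] if "neovim" in text else []
```

extract_facts("I do all my editing in neovim", [remote_llm, local_gguf]) falls through the failing remote backend and returns the local model's facts. With only [remote_llm] it returns an empty list. Catching a broad Exception is deliberate here, because the whole point is that no backend failure escapes.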

Temporal recall

Memories decay over time. They aren't deleted; they just fade, so recent stuff surfaces first. The scoring uses exponential decay with a configurable half-life, so you can tune how fast memories age. Ask for something from an hour ago and it's right there. Ask for something from two weeks ago and it still shows up, just ranked lower. This applies across all memory tiers: working, episodic, entity, and fact.
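The post doesn't spell out the exact formula, but a standard half-life form of exponential decay looks like the sketch below. The function name and the 7-day default half-life are assumptions chosen for illustration.

```python
HOUR = 3600.0
DAY = 24 * HOUR


def recency_weight(age_seconds: float, half_life_seconds: float = 7 * DAY) -> float:
    """Exponential decay: 1.0 right now, 0.5 after one half-life, 0.25 after two, never zero."""
    if half_life_seconds <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (max(age_seconds, 0.0) / half_life_seconds)
```

With a 7-day half-life, an hour-old memory keeps about 99.6% of its weight and a two-week-old memory keeps 25%. That is why the older one still shows up, just lower in the ranking. Shortening the half-life makes everything age faster.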

Hybrid scoring you can tune

Every recall query blends three signals: vector similarity (semantic meaning), FTS5 full-text search (keyword relevance), and importance (how critical the memory is). The default weights are 50/30/20 (vector/FTS/importance), but you can override them per query. Need exact keyword matches? Crank up the FTS weight. Need conceptual similarity? Lean on vectors. No global state is mutated; the tuning applies only to that query.
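Here is a sketch of that blend. It assumes each signal has already been normalized to [0, 1], and that assumption matters: SQLite's FTS5 bm25() returns scores where more negative means more relevant, so raw values have to be rescaled before mixing. The Candidate type and the weight keys are hypothetical names.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, List, Optional

# Default 50/30/20 split, read-only so no caller can mutate it globally.
DEFAULT_WEIGHTS = MappingProxyType({"vector": 0.5, "fts": 0.3, "importance": 0.2})


@dataclass
class Candidate:
    text: str
    vector_sim: float  # similarity rescaled to [0, 1] (assumption)
    fts_score: float   # FTS5 relevance rescaled to [0, 1] (assumption)
    importance: float  # stored importance in [0, 1] (assumption)


def hybrid_score(c: Candidate, weights: Optional[Dict[str, float]] = None) -> float:
    # Per-query overrides are merged into a fresh dict; DEFAULT_WEIGHTS never changes.
    w = {**DEFAULT_WEIGHTS, **(weights or {})}
    return w["vector"] * c.vector_sim + w["fts"] * c.fts_score + w["importance"] * c.importance


def rank(cands: List[Candidate], weights: Optional[Dict[str, float]] = None) -> List[Candidate]:
    return sorted(cands, key=lambda c: hybrid_score(c, weights), reverse=True)
```

Consider a strong semantic match (0.9, 0.2, 0.5) and an exact keyword hit (0.4, 1.0, 0.5). The defaults score them 0.61 and 0.60, so the semantic match wins. Passing {"vector": 0.2, "fts": 0.6} for one query flips the order (0.40 vs 0.78), and the next query is back on the defaults.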

Memory banks

Spin up isolated memory namespaces with one line: Mnemosyne(bank="work"). Each bank gets its own SQLite file. Work memories don't leak into personal memories. Side project memories don't pollute your main project. Create, delete, rename, list, check stats. Bank names are validated: alphanumeric plus hyphens and underscores, 64 chars max. Simple, predictable, no surprises.
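The naming rule maps directly onto a regular expression, and one-file-per-bank maps onto a path helper. This sketch assumes only ASCII letters count as alphanumeric and that a bank named work lives at root/work.db. The helper names and the .db layout are hypothetical.

```python
import re
from pathlib import Path

# ASCII letters, digits, hyphens, underscores; 1 to 64 characters.
BANK_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_bank_name(name: str) -> str:
    # fullmatch: the whole string must match, so trailing newlines are rejected too.
    if not BANK_NAME.fullmatch(name):
        raise ValueError(f"invalid bank name: {name!r}")
    return name


def bank_db_path(root: Path, name: str) -> Path:
    # Hypothetical layout: one SQLite file per bank, named after the bank.
    return root / f"{validate_bank_name(name)}.db"
```

A side effect of a strict rule: the pattern excludes "/" and ".", so a name like "../personal" is rejected before it can ever reach the filesystem.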

MCP server

Mnemosyne now speaks the Model Context Protocol. That means Claude Desktop, Cursor, and any MCP-compatible client can use your local memory directly. It supports both stdio transport (for desktop apps) and SSE transport (for web clients). Just run mnemosyne mcp and point your client at it. Six tools exposed, per-bank instance caching so you're not spinning up databases on every call.
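Per-bank instance caching comes down to a lock-protected dictionary keyed by bank name, so repeated tool calls reuse an already-open instance. In the sketch below, BankCache and the opener callable are hypothetical. In a real server the opener would construct something like Mnemosyne(bank=name).

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class BankCache(Generic[T]):
    """Open each bank once and hand back the same instance on later calls."""

    def __init__(self, opener: Callable[[str], T]) -> None:
        self._opener = opener
        self._instances: Dict[str, T] = {}
        self._lock = threading.Lock()  # a server may handle tool calls concurrently

    def get(self, bank: str) -> T:
        with self._lock:
            if bank not in self._instances:
                self._instances[bank] = self._opener(bank)
            return self._instances[bank]
```

Holding the lock while opening serializes first-time opens. That keeps the sketch simple and guarantees a bank is never opened twice by racing calls.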

15 Hermes tools

If you're running Hermes Agent, Mnemosyne plugs in with 15 tools: remember, recall, stats, triple_add, triple_query, sleep, scratchpad_write, scratchpad_read, scratchpad_clear, invalidate, export, update, forget, import, and diagnose. Plus three lifecycle hooks: pre_llm_call injects compressed context into every prompt, on_session_start restores previous session state, and post_tool_call catches anything that should be saved. AAAK compression keeps the injected context small.

Streaming, sync, compression, patterns

The Phase 8 stuff. MemoryStream gives you thread-safe push (callback) and pull (iterator) event streams, so your agent can react to memory events in real time. DeltaSync does checkpoint-based incremental sync between instances. MemoryCompressor handles dictionary-based, run-length (RLE), and semantic compression. PatternDetector spots temporal patterns (what hour, what day of week), content patterns (keyword co-occurrence), and sequence patterns across your memory history.
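A push/pull event stream like the one described can be built from a callback list plus a thread-safe queue. This sketches the pattern under assumed names (EventStream, subscribe, emit, pull). It is not MemoryStream's actual API.

```python
import queue
import threading
from typing import Any, Callable, Iterator, List

Callback = Callable[[Any], None]


class EventStream:
    def __init__(self) -> None:
        self._callbacks: List[Callback] = []
        self._lock = threading.Lock()
        self._queue: "queue.Queue[Any]" = queue.Queue()  # thread-safe buffer for pull

    def subscribe(self, callback: Callback) -> None:
        """Push side: the callback runs synchronously for every emitted event."""
        with self._lock:
            self._callbacks.append(callback)

    def emit(self, event: Any) -> None:
        with self._lock:
            callbacks = list(self._callbacks)  # snapshot, so callbacks run outside the lock
        for callback in callbacks:
            callback(event)
        self._queue.put(event)

    def pull(self, timeout: float = 0.1) -> Iterator[Any]:
        """Pull side: yield buffered events until none arrive within `timeout` seconds."""
        while True:
            try:
                yield self._queue.get(timeout=timeout)
            except queue.Empty:
                return
```

There are two simplifications here. A single shared queue means each event reaches only one pull consumer, and the buffer grows if nobody pulls. A fuller version would give each subscriber its own bounded queue.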

Plugin system

Three built-in plugins ship with 2.0: LoggingPlugin, MetricsPlugin, and FilterPlugin. Want to build your own? Extend MnemosynePlugin, implement up to four lifecycle hooks, and drop it in ~/.hermes/mnemosyne/plugins/. The PluginManager auto-discovers everything in that directory. No registry, no config file, just drop a Python file and go.
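Directory-based auto-discovery takes only a little importlib. The sketch below loads every .py file in a directory and instantiates any subclasses of a base class it finds. PluginBase is a hypothetical stand-in for MnemosynePlugin, since its four hook names aren't listed here. Injecting the base class into each plugin's namespace is a shortcut of this sketch, so plugin files don't need an import.

```python
import importlib.util
import inspect
from pathlib import Path
from typing import List, Optional, Type

DEFAULT_PLUGIN_DIR = Path.home() / ".hermes" / "mnemosyne" / "plugins"


class PluginBase:
    """Hypothetical stand-in for MnemosynePlugin."""
    name = "unnamed"


def discover_plugins(directory: Optional[Path] = None,
                     base: Type[PluginBase] = PluginBase) -> List[PluginBase]:
    directory = directory or DEFAULT_PLUGIN_DIR
    plugins: List[PluginBase] = []
    for path in sorted(directory.glob("*.py")):  # a missing directory yields nothing
        spec = importlib.util.spec_from_file_location(f"plugin_{path.stem}", path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        module.PluginBase = base  # sketch shortcut: expose the base class to plugin code
        spec.loader.exec_module(module)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Only classes defined in this file, and only subclasses of the base.
            if obj.__module__ == module.__name__ and issubclass(obj, base):
                plugins.append(obj())
    return plugins
```

A plugin file then needs nothing more than a class that subclasses PluginBase. Helper classes in the same file, and non-Python files in the directory, are ignored.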

CLI rewritten from scratch

The old CLI used stale v1 internals. The new one runs entirely on the v2 Mnemosyne/BeamMemory stack. Search your memory, inspect what's stored, run diagnostics, export everything. Commands make sense now. mnemosyne stats, mnemosyne recall "query", mnemosyne mcp. No PhD required.

SQLite WAL mode

Both the core memory engine and the BEAM layer now run in WAL (Write-Ahead Logging) mode with a 5-second busy timeout. What that means in practice: readers don't block the writer, and the writer doesn't block readers. Writers still take turns, but a second writer waits up to five seconds for the lock instead of failing immediately with "database is locked". If you've got an agent hammering the database from multiple threads, it just works. The test suite now validates this explicitly.
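With Python's standard sqlite3 module, turning this on takes one pragma and a connect timeout. This sketch is not Mnemosyne's code. open_db is a hypothetical helper, and its 5.0-second timeout mirrors the busy timeout described above.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    # timeout=5.0 makes a blocked writer wait up to 5 s for the lock
    # instead of failing immediately with "database is locked".
    conn = sqlite3.connect(path, timeout=5.0)
    # WAL is persistent per database file; the pragma returns the resulting mode.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":  # e.g. ":memory:" databases report "memory"
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode={mode}")
    return conn
```

Open two connections to the same file, start an uncommitted INSERT on one, and a SELECT on the other still returns immediately with the last committed state. Once the writer commits, the next read sees the new row.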

292 tests

Not a vanity metric. These exist because I broke things enough times that I stopped trusting myself without proof. New test files cover entities, entity integration, banks, MCP tools, streaming, and temporal recall. Every tearDown handles WAL cleanup properly. When you pip install mnemosyne-memory, it works.
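In WAL mode SQLite keeps two sidecar files next to the database, name-wal and name-shm. A test that deletes only the .db file can leave them behind, for example when a connection is still open or a test dies mid-write. Here is a sketch of a unittest fixture that cleans up all three. The class and test names are hypothetical.

```python
import contextlib
import os
import sqlite3
import tempfile
import unittest


class WalTestCase(unittest.TestCase):
    def setUp(self) -> None:
        fd, self.db_path = tempfile.mkstemp(suffix=".db")
        os.close(fd)  # sqlite3 opens the path itself
        self.conn = sqlite3.connect(self.db_path, timeout=5.0)
        self.conn.execute("PRAGMA journal_mode=WAL")

    def tearDown(self) -> None:
        # Close first: open handles block deletion on Windows, and closing the last
        # connection normally checkpoints the WAL back into the main file.
        self.conn.close()
        for suffix in ("", "-wal", "-shm"):
            with contextlib.suppress(FileNotFoundError):
                os.remove(self.db_path + suffix)

    def test_write_visible_to_second_connection(self) -> None:
        self.conn.execute("CREATE TABLE t (x INTEGER)")
        self.conn.execute("INSERT INTO t VALUES (1)")
        self.conn.commit()
        other = sqlite3.connect(self.db_path)
        try:
            self.assertEqual(other.execute("SELECT x FROM t").fetchall(), [(1,)])
        finally:
            other.close()
```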

What's next

Docs are getting a full rewrite. The website you're reading is being rebuilt. And I'm already sketching what's next: better import/export, more embedding options, and maybe some surprises.

If you've been waiting for a memory system that treats your privacy as a feature, not an obstacle, this is it. Install it, try it, break it, and tell me what you find. The GitHub issues page is always open.