The BEAM Architecture

For the first few days, Mnemosyne was basically a notepad: you wrote something down, and you could find it later. Simple. But as I started using it with real conversations, I noticed a problem: not all memories are equal, and treating them as if they were makes your AI dumber than it needs to be.

Here's what I mean. When you tell an agent "I'm a Python developer," that's a fact. It should stick around forever. When you say "I'm debugging a database issue right now," that's temporary context. It matters for the next hour, not the next year. When you describe how you like your code formatted, that's a preference. It should influence behavior but not clutter every search result. And the ongoing conversation itself? That's a story that needs to be recalled chronologically.

Four memories, one system

I sketched out what would become BEAM on a piece of scrap paper. Four memory types, each with its own storage strategy and retrieval logic. Working memory for temporary context. Episodic memory for the narrative of your relationship. Semantic memory for facts and preferences. Procedural memory for how you like things done.
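One way to make the four types concrete is to give each one an explicit policy for how long it lives and how it is recalled. The sketch below is a minimal illustration, not Mnemosyne's actual code: the `MemoryType` and `Policy` names, the one-hour lifetime for working memory, the search flags, and the keyword-based `classify` helper are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    WORKING = "working"        # temporary context for the task at hand
    EPISODIC = "episodic"      # the conversation narrative, recalled in order
    SEMANTIC = "semantic"      # durable facts and preferences
    PROCEDURAL = "procedural"  # how the user likes things done


@dataclass(frozen=True)
class Policy:
    ttl: Optional[timedelta]  # None: keep until consolidation decides otherwise
    in_default_search: bool   # whether this type competes in general recall
    chronological: bool       # whether results come back in time order


# Hypothetical defaults, chosen only to illustrate the idea.
POLICIES = {
    MemoryType.WORKING: Policy(timedelta(hours=1), True, False),
    MemoryType.EPISODIC: Policy(None, False, True),
    MemoryType.SEMANTIC: Policy(None, True, False),
    MemoryType.PROCEDURAL: Policy(None, False, False),
}


def classify(text: str) -> MemoryType:
    """Toy keyword classifier; a real system would let the agent or a model decide."""
    lowered = text.lower()
    if "right now" in lowered or "currently" in lowered:
        return MemoryType.WORKING
    if "i like" in lowered:
        return MemoryType.PROCEDURAL
    return MemoryType.SEMANTIC
```

Keeping the policy in data rather than scattering it through query code makes it cheap to change, for example, how long working memory survives.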

The hard part wasn't the concept. The hard part was making them work together without turning every query into a slow, expensive mess. I needed fast vector search for semantic similarity. I needed full-text search for exact keyword matches. I needed temporal queries for "what happened last Tuesday." And I needed all of it in SQLite.
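The temporal requirement needs no extension at all: a timestamp column plus an index covers questions like "what happened last Tuesday." The following is a sketch under assumptions; the `memories` table, its columns, and the helper names are hypothetical, and timestamps are stored as `YYYY-MM-DD HH:MM:SS` text so that string comparison matches chronological order.

```python
import sqlite3
from datetime import date, datetime, timedelta

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,        -- 'working', 'episodic', 'semantic', 'procedural'
    content TEXT NOT NULL,
    created_at TEXT NOT NULL   -- 'YYYY-MM-DD HH:MM:SS', so text order = time order
);
CREATE INDEX IF NOT EXISTS idx_memories_kind_time ON memories(kind, created_at);
"""


def most_recent_weekday(today: date, weekday: int) -> date:
    """Most recent past date with the given weekday (Monday=0), excluding today."""
    delta = (today.weekday() - weekday) % 7 or 7
    return today - timedelta(days=delta)


def episodes_on(conn: sqlite3.Connection, day: date) -> list[str]:
    """Episodic memories created on `day`, oldest first."""
    start = datetime.combine(day, datetime.min.time())
    end = start + timedelta(days=1)
    rows = conn.execute(
        "SELECT content FROM memories "
        "WHERE kind = 'episodic' AND created_at >= ? AND created_at < ? "
        "ORDER BY created_at",
        (start.isoformat(sep=" "), end.isoformat(sep=" ")),
    )
    return [content for (content,) in rows]
```

"What happened last Tuesday" then becomes `episodes_on(conn, most_recent_weekday(date.today(), 1))`, and the `(kind, created_at)` index lets SQLite answer it with a range scan instead of reading every row.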

sqlite-vec and FTS5

sqlite-vec is a loadable SQLite extension that gives you vector search without a separate database. FTS5 is SQLite's built-in full-text search engine. Together, they let you do hybrid scoring: combine semantic similarity with keyword relevance in a single query. It's not as fancy as a dedicated vector database, but it lives in the same file as everything else. No network calls. No separate process. No additional dependency beyond a shared library.
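Here is a minimal sketch of the keyword half using Python's built-in `sqlite3` module, assuming its bundled SQLite was compiled with FTS5 (most builds are). The vector half is left out so the example runs without loading sqlite-vec. The table name `memories_fts` and the helper functions are hypothetical; `bm25()` is FTS5's built-in ranking function, and the quoting helper keeps raw user text from being parsed as FTS5 query syntax.

```python
import sqlite3


def create_fts(conn: sqlite3.Connection) -> None:
    # FTS5 virtual table; its rowid doubles as the memory id (hypothetical layout).
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(content)")


def to_fts5_query(text: str) -> str:
    """Quote every whitespace-separated token so FTS5 treats it as a literal string.

    Unquoted punctuation such as the apostrophe in "don't" is a syntax error in
    FTS5, and uppercase AND/OR/NOT are parsed as operators. Inside a quoted
    string, a double quote is escaped by doubling it.
    """
    return " ".join('"' + tok.replace('"', '""') + '"' for tok in text.split())


def keyword_search(conn: sqlite3.Connection, text: str, limit: int = 10) -> list[tuple[int, float]]:
    """Return (rowid, score) pairs, best first.

    FTS5's bm25() is lower-is-better (more negative means more relevant),
    so the score is negated to make higher mean better.
    """
    query = to_fts5_query(text)
    if not query:
        return []
    rows = conn.execute(
        "SELECT rowid, bm25(memories_fts) FROM memories_fts "
        "WHERE memories_fts MATCH ? ORDER BY bm25(memories_fts) LIMIT ?",
        (query, limit),
    )
    return [(rowid, -score) for rowid, score in rows]
```

Because the FTS5 table sits in the same database file as everything else, the keyword scores can be joined or merged with vector results without leaving SQLite.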

Getting them to play nicely together took some doing. The sqlite-vec extension needs to be compiled for your platform. FTS5 has its own query syntax, so raw user text can surprise you: punctuation such as an apostrophe triggers a syntax error, and an uppercase AND, OR, or NOT is read as an operator. And hybrid scoring requires careful weighting: lean too hard on vector similarity and you miss exact matches; lean too hard on keyword relevance and you miss conceptual similarity.
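The weighting trade-off can be sketched as a linear blend of normalized scores. This is one common approach (reciprocal rank fusion is another), not necessarily what Mnemosyne does; `hybrid_rank`, `min_max`, and the `alpha` parameter are hypothetical names. Each side is rescaled to [0, 1] first, because cosine similarity and BM25 scores live on different scales.

```python
def min_max(scores: dict[int, float]) -> dict[int, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(
    vector_scores: dict[int, float],
    keyword_scores: dict[int, float],
    alpha: float = 0.5,
) -> list[int]:
    """Rank ids by alpha * vector + (1 - alpha) * keyword, best first.

    A document missing from one result list contributes 0 on that side.
    Ties are broken by id so the ordering is deterministic.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    combined = {
        i: alpha * v.get(i, 0.0) + (1 - alpha) * k.get(i, 0.0)
        for i in set(v) | set(k)
    }
    return sorted(combined, key=lambda i: (-combined[i], i))
```

With a conceptually similar entry that shares no keywords (id 1) and an exact keyword match (id 2), `alpha=0.9` puts id 1 first while `alpha=0.2` puts id 2 first. That is the tuning problem in miniature.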

The first consolidation

Once I had four memory types working, I hit another problem: memories accumulate. Fast. A busy agent might create hundreds of working memory entries per day. Search gets slower. Storage grows. Old temporary context starts drowning out important facts.

I built the first consolidation system on day five. It was simple: summarize old working memories, merge duplicate semantic facts, and archive episodic memories that hadn't been accessed in a while. The agent could call a `sleep()` function to trigger it, like a nightly cleanup routine. It wasn't elegant, but it worked. And it taught me something important: memory systems need to forget strategically, not just remember everything.
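A consolidation pass along those lines might look like the sketch below. Everything here is an assumption for illustration: the schema, the thresholds (one hour for working memory, 30 days before archiving), the `summarize` placeholder that just joins text where a real system would call a model, and the choice to file the summary as an episodic memory. Duplicate detection is deliberately naive: case- and whitespace-insensitive exact matches only.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,            -- 'working', 'episodic', 'semantic', 'procedural'
    content TEXT NOT NULL,
    created_at REAL NOT NULL,      -- unix seconds
    last_accessed REAL NOT NULL,   -- unix seconds
    archived INTEGER NOT NULL DEFAULT 0
)
"""

HOUR = 3600.0
DAY = 24 * HOUR


def summarize(texts: list[str]) -> str:
    """Placeholder summarizer; a real system would ask a language model."""
    return "Summary: " + "; ".join(texts)


def sleep(
    conn: sqlite3.Connection,
    now: Optional[float] = None,
    working_ttl: float = HOUR,
    archive_after: float = 30 * DAY,
) -> dict[str, int]:
    """Run one consolidation pass and report how many rows each step touched."""
    now = time.time() if now is None else now
    stats = {"summarized": 0, "merged": 0, "archived": 0}
    with conn:  # one transaction: commit on success, roll back on error
        # 1. Fold expired working memories into a single episodic summary.
        old = conn.execute(
            "SELECT id, content FROM memories "
            "WHERE kind = 'working' AND created_at < ? ORDER BY created_at",
            (now - working_ttl,),
        ).fetchall()
        if old:
            conn.execute(
                "INSERT INTO memories (kind, content, created_at, last_accessed) "
                "VALUES ('episodic', ?, ?, ?)",
                (summarize([content for _, content in old]), now, now),
            )
            conn.executemany("DELETE FROM memories WHERE id = ?", [(i,) for i, _ in old])
            stats["summarized"] = len(old)

        # 2. Merge duplicate semantic facts, keeping the first-inserted row.
        cur = conn.execute(
            "DELETE FROM memories WHERE kind = 'semantic' AND id NOT IN ("
            " SELECT MIN(id) FROM memories WHERE kind = 'semantic'"
            " GROUP BY lower(trim(content)))"
        )
        stats["merged"] = cur.rowcount

        # 3. Archive episodic memories nobody has looked at in a while.
        cur = conn.execute(
            "UPDATE memories SET archived = 1 "
            "WHERE kind = 'episodic' AND archived = 0 AND last_accessed < ?",
            (now - archive_after,),
        )
        stats["archived"] = cur.rowcount
    return stats
```

Running all three steps in one transaction means an interrupted pass leaves the store exactly as it was, rather than half-summarized.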

What I learned

The BEAM architecture wasn't planned. It emerged from watching the system struggle with real conversations and fixing the specific problems that came up. Working memory was too noisy. Semantic memory was too slow to retrieve. Episodic memory lacked chronology. Each fix added a new type, and eventually the four-type system just... made sense.

The benchmarks I ran that week showed the payoff. LongMemEval scores jumped from "embarrassing" to "competitive with cloud providers." Response times stayed under a millisecond for typical queries. And the whole thing still ran on a single SQLite file. Not bad for a week's work. But the real test was still ahead: making it work with something other than my own hacked-together scripts.