Story · 2026-05-05 · 6 min read

How I Taught Mnemosyne to Remember (Almost) Forever

A hallucinated dashboard, a decision matrix, and the contamination discovery that changed everything. The real story of how Mnemosyne v2.3 got tiered memory degradation, smart compression, and a confidence signal for every memory.

architecture · memory · degradation · engineering · behind-the-scenes

It started with a roadmap. I was using PlanForge to lay out the v2.3 feature plan when I noticed something weird: the generated document claimed I was going to build a dashboard with degradation stats. One problem: I hadn't built a dashboard. Wysie, a community member, had (github.com/wysie/mnemosyne-dashboard), and I'd had nothing to do with it. My own AI tooling had hallucinated someone else's feature into my roadmap.

That moment was a wake-up call. My entire memory system had no concept of which facts were real and which were contamination. Everything went into the same bucket with the same trust level. If my own tooling couldn't tell the difference, how could anyone else's?

The marketing test

Around the same time, I was thinking about what a memory system should actually promise its users. I wanted to be able to say "Mnemosyne remembers what you told it a year ago" and have it be literally true. Not marketing fluff. Actual database behavior.

But how do you keep a year of memories searchable without the database growing forever and without the agent's context window filling up with irrelevant ancient history?

The decision matrix

I built a weighted decision matrix with six factors: long-term recall quality, automated maintainability, database growth rate, CPU/storage cost, recall speed, and ease of testing. The simple "raise the limit" approach scored zero on maintainability. The tiered approach got top marks across the board.
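The scoring itself is just weighted arithmetic. Here's a minimal sketch of how such a matrix works; the weights and per-option scores below are illustrative placeholders, not the actual values from my v2.3 planning doc:

```python
# Hypothetical factor weights (must sum to 1.0) -- illustrative only.
FACTORS = {
    "long_term_recall": 0.25,
    "maintainability": 0.20,
    "db_growth": 0.15,
    "cpu_storage_cost": 0.15,
    "recall_speed": 0.15,
    "testability": 0.10,
}

def score(option: dict[str, float]) -> float:
    """Weighted sum of per-factor scores (0-5 scale)."""
    return sum(FACTORS[f] * option[f] for f in FACTORS)

# Made-up scores: "raise the limit" earns a zero on maintainability,
# while the tiered design rates well across the board.
raise_limit = {"long_term_recall": 2, "maintainability": 0, "db_growth": 1,
               "cpu_storage_cost": 4, "recall_speed": 3, "testability": 4}
tiered = {"long_term_recall": 5, "maintainability": 5, "db_growth": 4,
          "cpu_storage_cost": 3, "recall_speed": 4, "testability": 4}
```

With any plausible weighting, the tiered option wins comfortably, which is exactly what the real matrix showed.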

The design was simple: three tiers. Hot memories (0-30 days) stay at full detail. Warm memories (30-180 days) get compressed by an LLM into a summary. Cold memories (180+ days) get reduced to their key entities and facts. Every time the system runs its sleep cycle, old memories quietly move down a tier. No manual pruning. No admin dashboard. It just works.

The contamination discovery

While building Phase 2 (smart compression), that roadmap hallucination kept nagging at me. A generated document had confidently described a dashboard that Wysie, not I, had built (github.com/wysie/mnemosyne-dashboard), and nothing in my pipeline had flagged it.

That was the moment it clicked. The memory system has no idea which facts are real and which are contamination. It stores everything with the same confidence. When an agent infers something, when a cron job injects data, when you explicitly state a preference... all of it goes into the same bucket with the same trust level.

I opened an issue on the dashboard repo to suggest tier visualization. Then I cleaned up the hallucinated roadmap. And I realized the tiered degradation system is only half the story. The other half is knowing which memories to trust.

The veracity signal

Every memory in Mnemosyne v2.3 now carries a veracity tag. Five levels: stated (you said it directly), inferred (the agent guessed it), tool (automation injected it), imported (came from another platform), and unknown (legacy or uncategorized).
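As a data model, the tag is just an enum attached to each memory. A hypothetical sketch; the five level names come from the release, but the class itself is illustrative:

```python
from enum import Enum

class Veracity(Enum):
    """Trust level attached to every memory (illustrative model)."""
    STATED = "stated"      # you said it directly
    INFERRED = "inferred"  # the agent guessed it
    TOOL = "tool"          # automation (e.g. a cron job) injected it
    IMPORTED = "imported"  # came from another platform
    UNKNOWN = "unknown"    # legacy or uncategorized
```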

This affects recall in a natural way. Stated facts get full weight. Inferred ones get 70%. Tool-injected data gets 50%. And there's a new method called get_contaminated() that surfaces everything the system stored without your explicit say-so. You can review it, promote it, or let it fade.
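The discount is a simple multiplier at query time. A minimal sketch using the weights above; the weights for imported and unknown memories, and the shape of `get_contaminated()`, are my guesses at the behavior described, not Mnemosyne's actual signatures:

```python
# Recall weights from the post; imported/unknown values are assumptions.
RECALL_WEIGHT = {
    "stated": 1.0,    # full weight
    "inferred": 0.7,
    "tool": 0.5,
    "imported": 0.5,  # assumption: treated like tool-injected data
    "unknown": 0.5,   # assumption
}

def weighted_score(similarity: float, veracity: str) -> float:
    """Discount a raw retrieval score by the memory's trust level."""
    return similarity * RECALL_WEIGHT.get(veracity, 0.5)

def get_contaminated(memories: list[dict]) -> list[dict]:
    """Everything the system stored without the user's explicit say-so."""
    return [m for m in memories if m.get("veracity") != "stated"]
```

So a perfect-match inferred memory scores below a decent-match stated one, which is exactly the bias you want when the agent might have guessed wrong.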

Three phases, one day

Phase 1 (tiered degradation), Phase 2 (smart compression), and Phase 3 (veracity signal) all shipped in a single day. 51 tests, all green. The core engine is about 350 lines. The marketing claim is now actually true: a 200-day-old memory, degraded to its barest signal, is still findable with the right query.

I built this with AI assistance. I'm not hiding that. The vision, the architecture decisions, the scoring weights, the voice guidelines, the marketing strategy: those are mine. The SQL, the Python, and the test harness came from an agent that knows more about code than I do. That doesn't make it less real. If anything, it makes it more honest. The whole point of Mnemosyne is that your AI should remember what matters. This release makes that promise true.

Full changelog at github.com/AxDSan/mnemosyne.

Abdias J

Building Mnemosyne in public. No VC, no cloud lock-in, just code that works.