Memory for Hermes
The native memory system for Hermes Agent. SQLite-backed, sub-millisecond, no external services. No cloud. No API keys. Just pure speed.
Migrate from Zep, Mem0, Honcho, or Hindsight in one command — see migration docs
Three lines. Infinite memory.
No configuration files. No environment variables. No cloud accounts. Import, remember, recall. That is all.
- pip install mnemosyne-memory
- Zero external services required
- Works offline, always
- Hermes Agent integration built-in
```python
from mnemosyne import remember, recall

# Store a memory
remember(
    "User prefers dark mode",
    importance=0.9,
    scope="global"
)

# Retrieve relevant context
results = recall("user preferences")
# => [{"content": "User prefers dark mode", ...}]
```

Everything you need. Nothing you do not.
Built from the ground up for AI agents that need fast, reliable, persistent memory.
Sub-Millisecond Latency
Direct SQLite access delivers <1ms queries. No network overhead. No HTTP roundtrips.
100% Private
All data stays on your machine. No cloud services. No data leaves your device, ever.
Native Vector Search
sqlite-vec integration for semantic search. Hybrid ranking: 50% vector + 30% FTS + 20% importance.
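The 50/30/20 blend above reduces to a weighted sum. The sketch below is an illustrative reimplementation, not Mnemosyne's actual scoring code: the weights come from the description, while the function name and the assumption that all three signals are normalized to [0, 1] are ours.

```python
def hybrid_score(vector_sim: float, fts_score: float, importance: float) -> float:
    """Weighted blend: 50% vector similarity, 30% full-text match, 20% importance.
    All inputs are assumed normalized to [0, 1]."""
    return 0.5 * vector_sim + 0.3 * fts_score + 0.2 * importance

# A memory with a strong semantic match but little literal text overlap still ranks well
print(round(hybrid_score(vector_sim=0.9, fts_score=0.2, importance=0.8), 2))  # 0.67
```

One consequence of this weighting: importance alone can never outrank a good semantic match, but it reliably breaks ties between similar candidates.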
Beam Architecture
Three-tier memory: working_memory for hot context, episodic_memory for long-term, scratchpad for reasoning.
Auto Consolidation
Old working memories are automatically summarized and moved to episodic storage via sleep cycles. Configurable auto_sleep intervals.
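A sleep cycle can be pictured as an age-based sweep from the working tier into the episodic tier. This is a toy model under our own assumptions (the function, field names, and age threshold are hypothetical); real consolidation in Mnemosyne also summarizes before archiving.

```python
from datetime import datetime, timedelta

def consolidate(working: list[dict], max_age: timedelta) -> tuple[list[dict], list[dict]]:
    """Split working memories: keep the fresh ones, archive the stale ones.
    A toy model of a sleep cycle; a real one would compress/summarize too."""
    now = datetime.now()
    keep, archive = [], []
    for mem in working:
        (archive if now - mem["created"] > max_age else keep).append(mem)
    return keep, archive

working = [
    {"content": "old context", "created": datetime.now() - timedelta(hours=5)},
    {"content": "fresh context", "created": datetime.now()},
]
working, episodic = consolidate(working, max_age=timedelta(hours=1))
print([m["content"] for m in episodic])  # ['old context']
```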
Hybrid Search
Combines vector similarity, full-text search, and importance scoring for the best recall accuracy.
Streaming & DeltaSync
Real-time incremental memory updates via DeltaSync. Stream results as they arrive — no more waiting for full batches.
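The consumer-side shape of streaming recall is a generator: each hit is yielded as soon as its batch lands instead of after the full result set. DeltaSync's actual wire protocol isn't shown here; this sketch only models the "no waiting for full batches" behavior, and the function name is ours.

```python
from typing import Iterator

def stream_recall(query: str, batches: list[list[str]]) -> Iterator[str]:
    """Toy model of streaming recall: yield hits batch by batch as they arrive."""
    for batch in batches:
        yield from batch

# The first result is usable before later batches exist
hits = stream_recall("user preferences", [["dark mode"], ["keyboard-first UI"]])
print(next(hits))  # dark mode
```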
Smart Filtering
ignore_patterns blocks noisy or irrelevant content from entering memory. Keep your context window clean and focused.
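Filtering with ignore_patterns amounts to a write-time gate. The sketch below assumes regex semantics and a helper of our own naming; Mnemosyne's actual matching rules may differ.

```python
import re

def should_store(content: str, ignore_patterns: list[str]) -> bool:
    """Skip content matching any ignore pattern (regex semantics assumed)."""
    return not any(re.search(p, content) for p in ignore_patterns)

patterns = [r"^DEBUG:", r"heartbeat"]
print(should_store("DEBUG: cache miss", patterns))       # False
print(should_store("User prefers dark mode", patterns))  # True
```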
Numbers that speak
Measured on CPU with sqlite-vec + FTS5. No GPU required.
| Operation | Honcho | Zep | Mem0 | Mnemosyne |
|---|---|---|---|---|
| Write | 45ms | 85ms | 50ms | 0.81ms |
| Read | 38ms | 62ms | 45ms | 0.076ms |
| Search | 52ms | 78ms | 60ms | 1.2ms |
| Cold Start | 500ms | 800ms | 300ms | 0ms |
BEAM Benchmark (ICLR 2026)
End-to-end memory retrieval at scale. LLM-as-judge against published baselines.
Mnemosyne vs. cloud memory providers
See exactly what you gain — and what you trade — when you switch.
| Feature | Mnemosyne | Honcho | Zep | Mem0 |
|---|---|---|---|---|
| Cost | Free forever | $$$ Paid (credits) | $$$ Paid (Flex+) | Freemium ($0–$249/mo) |
| Hosting | Local — your machine | Cloud only | Cloud / BYOC | Cloud only |
| Privacy | 100% local, zero exfil | External API calls | External API calls | External API calls |
| Offline mode | Yes — airplane mode | No | No | No |
| Setup | pip install | Docker + API keys | Docker + Postgres | API key + signup |
| Vector store | sqlite-vec (built-in) | pgvector (external) | pgvector (external) | pgvector (external) |
| Full-text search | FTS5 (built-in) | Separate service | Separate service | Separate service |
| Auth required | None | Supabase auth | OAuth / API key | API key |
| Rate limits | Unlimited | Plan-dependent | Credit-based | Plan-dependent |
| Data ownership | You own the SQLite file | Vendor-hosted | Vendor-hosted | Vendor-hosted |
| Export / import | One JSON file | Limited | Limited | Limited |
| Dependencies | Python stdlib + ONNX | Docker, Postgres | Docker, Postgres | pip + API key |
| Memory architecture | BEAM (3-tier) | Session + facts | Graph RAG + facts | Session + facts |
| Auto-consolidation | Sleep cycles built-in | Manual / paid | Manual | Manual |
| Temporal triples | Native with validity | No | No | No |
| LongMemEval | 98.9% Recall@All@5 | Not published | Not published | Not published |
| BEAM-100K | 35.4% / 19.3% / 19.2% | Not published | Not published | Not published |
Switching from Honcho
Gain: 500x faster reads, zero monthly bill, 100% offline, no Docker, no credit system
Trade: cloud dashboard, managed scaling, team sharing
Switching from Zep
Gain: 43x faster search, no PostgreSQL to maintain, no deployment overhead, instant cold start
Trade: Graph RAG viz, SOC 2 certs, managed BYOC
Switching from Mem0
Gain: sub-millisecond everything, no rate limits, no vendor lock-in, full data portability
Trade: managed platform, 90K+ community, YC ecosystem
Switching from Hindsight
Gain: zero dependencies, no network calls, SQLite-native, BEAM architecture
Trade: cloud sync, managed inference, web dashboard
The bottom line
- ✓ Speed: 43–500x faster than cloud alternatives — zero HTTP roundtrips.
- ✓ Privacy: Data never leaves your machine. No API calls. No telemetry.
- ✓ Cost: Zero ongoing cost. No credits. No tiers. No "contact sales."
- ✓ Simplicity: One pip install. No Docker. No config. No signup.
Trade-off: You manage your own backup (one SQLite file). No web dashboard or team collaboration — Mnemosyne is built for individual developers and local agents.
Bilevel Episodic-Associative Memory (BEAM)
Three SQLite tables working in harmony. Working memory for hot context auto-injected into prompts. Episodic memory for long-term storage with native vector + FTS5 search. Scratchpad for temporary agent reasoning.
Working Memory
Hot, recent context — auto-injected into prompts. Session-scoped by default, global scope available.
Episodic Memory
Long-term storage with sqlite-vec + FTS5. Hybrid ranking for semantic + text search.
Scratchpad
Temporary agent reasoning workspace. Cleared per session.
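The three tiers above map onto three SQLite tables. The schema below is purely illustrative, assembled from the descriptions (the column names and defaults are our assumptions, not Mnemosyne's actual DDL), but it shows how little infrastructure the design needs: one file, three tables.

```python
import sqlite3

# Illustrative schema only; Mnemosyne's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE working_memory (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    scope TEXT DEFAULT 'session',           -- session-scoped by default
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE episodic_memory (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    importance REAL DEFAULT 0.5,
    embedding BLOB                          -- vector payload for sqlite-vec
);
CREATE TABLE scratchpad (
    id INTEGER PRIMARY KEY,                 -- cleared per session
    content TEXT NOT NULL
);
""")
conn.execute("INSERT INTO working_memory (content) VALUES (?)",
             ("User prefers dark mode",))
row = conn.execute("SELECT content, scope FROM working_memory").fetchone()
print(row)  # ('User prefers dark mode', 'session')
```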
```python
# Working memory — auto-injected
beam.remember("User prefers dark mode")

# Episodic — long-term with embedding
beam.remember(
    content="Detailed project context...",
    source="conversation",
    importance=0.8
)

# Hybrid recall across both tiers
results = beam.recall("user preferences")

# Consolidation — move old to episodic
beam.sleep()  # Compress & summarize
```

One command. Zero configuration.
Get started in seconds. No setup required.
```shell
# Basic install
pip install mnemosyne-memory

# With all features (dense retrieval + local LLM)
pip install mnemosyne-memory[all]

# As Hermes MemoryProvider
python -m mnemosyne.install
```
Built for production
"Been running it today (replaced mem0) and so far I am really impressed. Well done on building this!"
"Mnemosyne replaced our entire memory infrastructure. From 50ms average latency to sub-millisecond. Unreal."
"The Beam architecture just makes sense. Working memory for context, episodic for long-term, automatic consolidation."
Give your agent a memory
Join the growing number of developers who have replaced cloud memory services with something faster, simpler, and completely private.