Mnemonic — AI Memory Infrastructure

// THREE-TIER MEMORY

How memory is structured.

Inspired by human memory consolidation — fast writes at runtime, intelligent promotion offline. Three tiers, one coherent brain.

01 ● ACTIVE

Episodic

Short-term · Append-only

Every message, upload, and thought is instantly written as an immutable event. Zero blocking — the agent never waits for acknowledgement.

write latency: <10ms

write path: Real-time, no LLM call

read path: Recent N events

storage: Append-only event log
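
A minimal in-memory sketch of the episodic tier described above, assuming a plain Python list as the log. The names `Event`, `EpisodicLog`, `append`, and `recent` are illustrative and not taken from the Mnemonic codebase; a real store would persist events durably.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One immutable episodic event (field names are hypothetical)."""
    kind: str      # e.g. "message", "upload", "thought"
    payload: str
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)


class EpisodicLog:
    """Append-only log: events are added, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, kind: str, payload: str) -> Event:
        # Write path: no LLM call and no extraction, just record and return.
        event = Event(kind=kind, payload=payload)
        self._events.append(event)
        return event

    def recent(self, n: int) -> list[Event]:
        # Read path: the last n events, oldest first.
        return self._events[-n:] if n > 0 else []

    def __len__(self) -> int:
        return len(self._events)


log = EpisodicLog()
log.append("message", "I moved to Berlin last month.")
log.append("upload", "lease.pdf")
log.append("thought", "User may need local recommendations.")
last_two = [e.kind for e in log.recent(2)]
```

Freezing the dataclass makes each event immutable once written, which is the property the append-only design relies on.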

02 ○ PLANNED

Semantic

Long-term · Graph + Vectors

Facts, entities, and relationships extracted from episodic events. Graph edges carry valid_until timestamps, so a contradicted fact is marked as superseded rather than silently deleted.

write latency: async

write path: Consolidation pipeline

read path: Semantic search

storage: Knowledge graph + vectors
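
One way to picture edges with valid_until timestamps is the sketch below. It assumes every predicate is single-valued (a subject holds one current value per predicate), which is a simplification made here and not something the source states. `Edge`, `TemporalGraph`, and `assert_fact` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    valid_from: float
    valid_until: Optional[float] = None  # None means "still believed true"


class TemporalGraph:
    """Toy edge store; assumes single-valued predicates (an assumption)."""

    def __init__(self) -> None:
        self.edges: list[Edge] = []

    def assert_fact(self, subject: str, predicate: str, obj: str, now: float) -> Edge:
        for edge in self.edges:
            if edge.subject != subject or edge.predicate != predicate:
                continue
            if edge.valid_until is not None:
                continue  # already superseded
            if edge.obj == obj:
                return edge  # same fact is already active, nothing to do
            # Contradiction: close the old edge instead of deleting it.
            edge.valid_until = now
        new_edge = Edge(subject, predicate, obj, valid_from=now)
        self.edges.append(new_edge)
        return new_edge

    def active(self) -> list[Edge]:
        return [e for e in self.edges if e.valid_until is None]

    def history(self) -> list[Edge]:
        return [e for e in self.edges if e.valid_until is not None]


graph = TemporalGraph()
graph.assert_fact("user", "lives_in", "Paris", now=100.0)
graph.assert_fact("user", "lives_in", "Berlin", now=200.0)
```

After the second call the Paris edge is still in the graph, closed at `200.0`, while Berlin is the only active value.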

03 ○ PLANNED

Identity

Core self · Compiled profile

Persistent user profile compiled from semantic memory. Injected into every system prompt as ground truth, so reading it needs no retrieval step and adds no lookup latency.

write latency: async

write path: Consolidation → merge

read path: Direct lookup

storage: Compiled profile JSON
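
As a rough illustration of "compile, then inject", the sketch below folds active facts into a JSON profile and prepends it to a prompt. The profile shape, the function names, and the prompt wording are all assumptions made for this example.

```python
from __future__ import annotations

import json


def compile_profile(active_facts: list[tuple[str, str, str]]) -> dict:
    """Group (subject, predicate, object) facts into a nested dict."""
    profile: dict = {}
    for subject, predicate, obj in active_facts:
        profile.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return profile


def build_system_prompt(base_prompt: str, profile: dict) -> str:
    """Append the compiled profile to the prompt; no retrieval at read time."""
    profile_json = json.dumps(profile, indent=2, sort_keys=True)
    return f"{base_prompt}\n\nCompiled user profile:\n{profile_json}"


profile = compile_profile([
    ("user", "lives_in", "Berlin"),
    ("user", "speaks", "German"),
    ("user", "speaks", "English"),
])
prompt = build_system_prompt("You are a helpful assistant.", profile)
```

Because the profile is built ahead of time, the read path is a string concatenation, which is what makes a direct lookup possible.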

// FIVE LAYERS

The full stack.

Each layer has a single responsibility. Applications talk to the API gateway. The gateway talks to the memory engine. Storage is pluggable.

Applications

Web Dashboard · Claude Desktop MCP · SDK / API Clients

partial

API Gateway

FastAPI · Auth · Rate Limiting

partial

Memory Engine

Episodic Memory · Semantic Memory · Identity Memory

partial

Processing

Document Parser · Embedding Pipeline · Consolidation Engine · Contradiction Resolver

planned

Storage

Pinecone / Qdrant · PostgreSQL · Neo4j · Redis

partial
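
"Storage is pluggable" can be sketched with a structural interface: the engine depends on a small protocol, and any backend that satisfies it can be swapped in. `EventStore`, `InMemoryEventStore`, and `MemoryEngine` are hypothetical names used only for this illustration; a PostgreSQL or Redis backend would implement the same two methods.

```python
from __future__ import annotations

from typing import Protocol


class EventStore(Protocol):
    """What the engine needs from any storage backend (assumed interface)."""

    def append(self, event: dict) -> int: ...

    def recent(self, n: int) -> list[dict]: ...


class InMemoryEventStore:
    """Stand-in backend that keeps rows in a list."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def append(self, event: dict) -> int:
        self._rows.append(dict(event))  # copy so callers cannot mutate rows
        return len(self._rows) - 1

    def recent(self, n: int) -> list[dict]:
        return self._rows[-n:] if n > 0 else []


class MemoryEngine:
    """Talks only to the EventStore protocol, never to a concrete database."""

    def __init__(self, store: EventStore) -> None:
        self._store = store

    def remember(self, text: str) -> int:
        return self._store.append({"type": "message", "text": text})

    def context(self, n: int = 5) -> list[str]:
        return [row["text"] for row in self._store.recent(n)]


engine = MemoryEngine(InMemoryEventStore())
first_index = engine.remember("prefers dark mode")
engine.remember("works in UTC+1")
```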

// MCP PROTOCOL

7 core operations.

A minimal, well-defined interface for agents to interact with memory. Most operations target a single memory tier; memory.search spans all tiers, and memory.status reports on the service as a whole.

Operation | Description | Latency | Status
memory.append | Write event to episodic memory | <10ms | live
memory.search | Semantic search across all tiers | ~50ms | live
memory.consolidate | Trigger offline consolidation | async | live
memory.recall | Retrieve specific memory by ID | <5ms | planned
memory.forget | Mark memory entry for removal | <10ms | planned
memory.compile | Rebuild identity summary from graph | ~200ms | planned
memory.status | Health check and usage stats | <5ms | live

* "live" = implemented in current MCP server · "planned" = Phase 2
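
To show how a small set of named operations might be routed, here is a plain-Python dispatch sketch. It does not use the MCP SDK and is not the Mnemonic server: the registry, the handler signatures, and the response shapes are assumptions, and only two of the live operations are stubbed.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[dict], dict]

HANDLERS: dict[str, Handler] = {}
# Operations the table above marks as planned.
PLANNED = {"memory.recall", "memory.forget", "memory.compile"}

_events: list[dict] = []  # toy episodic store for the example


def operation(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler under an operation name."""
    def register(fn: Handler) -> Handler:
        HANDLERS[name] = fn
        return fn
    return register


@operation("memory.append")
def handle_append(params: dict) -> dict:
    _events.append({"text": params["text"]})
    return {"ok": True, "index": len(_events) - 1}


@operation("memory.status")
def handle_status(params: dict) -> dict:
    return {"ok": True, "events": len(_events)}


def dispatch(name: str, params: dict) -> dict:
    """Route a request to its handler, or explain why it cannot run."""
    if name in PLANNED:
        return {"ok": False, "error": f"{name} is planned, not yet implemented"}
    handler = HANDLERS.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown operation: {name}"}
    return handler(params)


append_result = dispatch("memory.append", {"text": "hello"})
status_result = dispatch("memory.status", {})
recall_result = dispatch("memory.recall", {"id": "abc"})
```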

// PIPELINE

The Full Pipeline

From chat bubble to updated ground truth — step by step.

Chat Input

User message arrives

A thought, question, or file upload enters the system.

System 1 — Fast Path

<10ms, no LLM call

Event is appended to the immutable stream instantly. No blocking, no extraction.

Events Accumulate

Append-only log

Events pile up in the stream, each with a timestamp and type marker.

System 2 Awakens

Background, async

The sleep consolidator activates — running offline, never blocking the agent.

Entity Extraction

LLM-powered parsing

Entities and relationships are extracted from unconsolidated events into graph edges.

Contradiction Detection

Temporal resolution

New facts are compared against existing edges. Contradictions are resolved with valid_until timestamps.

Memory Compilation

compiled_memory.md

The graph compiles into a concise markdown summary — active facts and superseded history.

Loop Complete

System prompt updated

Compiled memory is injected into the agent's system prompt. Ground truth is always current.
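
The split between the fast path and the background consolidator can be sketched with `asyncio`. The append is synchronous and hands the event to a queue; a separate task drains the queue later. Upper-casing the text is a placeholder for LLM-powered extraction, and all names here are hypothetical.

```python
from __future__ import annotations

import asyncio


async def main() -> tuple[list[str], list[str]]:
    events: list[str] = []        # append-only stream (System 1)
    consolidated: list[str] = []  # output of the background pass (System 2)
    queue: asyncio.Queue = asyncio.Queue()

    def fast_append(text: str) -> None:
        # Fast path: record the event and hand it off without awaiting anything.
        events.append(text)
        queue.put_nowait(text)

    async def consolidator() -> None:
        # Background path: runs as its own task, so appends never wait on it.
        while True:
            text = await queue.get()
            await asyncio.sleep(0)             # stands in for slow extraction
            consolidated.append(text.upper())  # placeholder for entity extraction
            queue.task_done()

    worker = asyncio.create_task(consolidator())
    fast_append("likes tea")
    fast_append("moved to berlin")
    await queue.join()  # wait here only so the example can inspect the result
    worker.cancel()
    return events, consolidated


events, consolidated = asyncio.run(main())
```

In a real deployment the agent would not call `queue.join()`; it is used here only so the example finishes deterministically.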

// BENCHMARKS

Performance comparison.

Measured across ingestion latency, contradiction resolution, and retrieval precision.

* Simulated benchmark data — real evaluation in progress

[Chart: Ingestion Latency, in ms (lower is better)]

[Chart: Contradiction Resolution, out of 3 contradictions]

[Chart: Retrieval Precision, over 50 sessions]

// ROADMAP

What's being built.

Four phases from working prototype to production memory platform. Phase 0 is nearly complete (6 of 7 items); phases 1-3 are sequenced for minimum viable wiring first, intelligence second.

Phase 0 ● IN PROGRESS

Foundation

Current sprint

6/7

Clean repo structure (archive old files)
Technical architecture document
Next.js frontend with component library
3-panel brain dashboard
Sharp Editorial design system
mnemonic-autoimprove RAG optimizer
Unit test framework setup

Phase 1 ○ NEXT

Wire + Auth

Next sprint

0/7

PostgreSQL schema + Alembic migrations
JWT authentication middleware
Connect frontend to backend (/chat/enhanced, /upsert)
CORS lockdown + rate limiting via Redis
Error handling + retry logic in frontend
CI/CD pipeline (GitHub Actions)
Docker Compose for local dev

Phase 2 ○ FUTURE

Memory Engine

Three-tier implementation

0/6

Three-tier memory model (Episodic → Semantic → Identity)
Async consolidation pipeline
Contradiction detection & temporal resolution
Neo4j knowledge graph integration
MCP protocol v2 (7 operations)
WebSocket support for real-time updates

Phase 3 ○ FUTURE

Platform

Multi-user & analytics

0/6

Multi-user + team memory
Analytics dashboard (ClickHouse)
Python + TypeScript SDK
Plugin system
OAuth / SSO
Billing integration (open core model)

// GET INVOLVED

Open source memory infrastructure.

Mnemonic is built in the open. Star the repo, open an issue, or contribute a consolidation strategy.

pip install personal-brain-mcp