Quick Introduction
Memory Router is a transparent proxy that sits between your app and the model. Point your existing SDK at MemoryLake and every conversation gains long-term memory and an optimized context window — no new SDK, no retrieval pipeline to build.- One-line integration — change the base URL, keep your SDK and code exactly as they are
- BYOK or hosted — bring your own provider key (encrypted, never stored), or use MemoryLake-hosted models with a single key
- Shared memory pool — the Router and the MemoryLake API read and write the same memories, so there is one source of truth
Memory Router is OpenAI-protocol compatible and speaks the same API as your provider. Your prompts, streaming, and tool calls stay identical.
The Problem It Solves
Every LLM call is stateless. To fake continuity you re-send the entire history on every turn — which is slow, expensive, and eventually overflows the context window. Bolting on a vector DB and retrieval pipeline solves it, but it is weeks of plumbing you have to build and maintain.Without a memory layer
- Full chat history re-sent on every call — token cost climbs with conversation length.
- Long sessions hit the context-window ceiling and start truncating mid-task.
- Memory lives in one app — switch models or sessions and the context is gone.
Building it yourself
- Stand up a vector DB, embeddings pipeline, chunking, and retrieval logic.
- Write extraction, dedup, and relevance ranking — then keep it tuned.
- Maintain it across every provider and every model you support.
What You Get
| Capability | What it means |
|---|---|
| One-line integration | Change the base URL. Keep your SDK and your code exactly as they are. |
| BYOK or hosted | Bring your own provider key (encrypted, never stored) or use MemoryLake-hosted models with a single key. |
| Automatic context optimization | Redundant history is removed and only relevant memory is injected, shrinking tokens per call. |
| Shared memory pool | The Router and the MemoryLake API read and write the same memories — one source of truth. |
| Graceful fallback | If MemoryLake is ever unavailable, the request passes straight through to your provider. Zero downtime. |
| Full observability | Response headers report conversation IDs, context changes, token counts, and memories created or retrieved. |
Direct API Call vs. Memory Router
| Direct provider call | With Memory Router | |
|---|---|---|
| Long-term memory | You build and host it | Built in, automatic |
| Context window | Re-send everything, then truncate | Optimized — only what matters |
| Keys & accounts | A provider account is required | BYOK or use just a MemoryLake key |
| Code changes | New SDK + retrieval pipeline | One base-URL change |
| Across sessions & models | Memory is siloed per app | Shared memory pool |
| Provider outage of memory layer | Your problem to handle | Graceful passthrough |
| Visibility | None by default | Diagnostic response headers |
Quick Start
- Get a MemoryLake key: Sign up and create an API key in the console.
- Pick a mode and swap the base URL: Choose BYOK or MemoryLake-hosted and point your SDK at the Router.
- Call as normal: Send requests exactly as you do today — memory is recalled and stored automatically.
Documentation
How It Works
Understand the transparent proxy and what happens on each request.
Quickstart
Go live in three steps with copy-paste code for BYOK and hosted modes.
Deployment Modes
Compare BYOK and MemoryLake-hosted, base URLs, supported providers, and key safety.
Observability
Read the diagnostic response headers and understand graceful fallback.
FAQ
Common questions about code changes, providers, security, and pricing.