Skip to main content

Quick Introduction

Memory Router is a transparent proxy that sits between your app and the model. Point your existing SDK at MemoryLake and every conversation gains long-term memory and an optimized context window — no new SDK, no retrieval pipeline to build.
  • One-line integration — change the base URL, keep your SDK and code exactly as they are
  • BYOK or hosted — bring your own provider key (encrypted, never stored), or use MemoryLake-hosted models with a single key
  • Shared memory pool — the Router and the MemoryLake API read and write the same memories, so there is one source of truth
Memory Router is OpenAI-protocol compatible and speaks the same API as your provider. Your prompts, streaming, and tool calls stay identical.

The Problem It Solves

Every LLM call is stateless. To fake continuity you re-send the entire history on every turn — which is slow, expensive, and eventually overflows the context window. Bolting on a vector DB and retrieval pipeline solves it, but it is weeks of plumbing you have to build and maintain.

Without a memory layer

  • Full chat history re-sent on every call — token cost climbs with conversation length.
  • Long sessions hit the context-window ceiling and start truncating mid-task.
  • Memory lives in one app — switch models or sessions and the context is gone.

Building it yourself

  • Stand up a vector DB, embeddings pipeline, chunking, and retrieval logic.
  • Write extraction, dedup, and relevance ranking — then keep it tuned.
  • Maintain it across every provider and every model you support.
Memory Router collapses all of that into one base-URL change. The memory layer is the proxy.

What You Get

CapabilityWhat it means
One-line integrationChange the base URL. Keep your SDK and your code exactly as they are.
BYOK or hostedBring your own provider key (encrypted, never stored) or use MemoryLake-hosted models with a single key.
Automatic context optimizationRedundant history is removed and only relevant memory is injected, shrinking tokens per call.
Shared memory poolThe Router and the MemoryLake API read and write the same memories — one source of truth.
Graceful fallbackIf MemoryLake is ever unavailable, the request passes straight through to your provider. Zero downtime.
Full observabilityResponse headers report conversation IDs, context changes, token counts, and memories created or retrieved.

Direct API Call vs. Memory Router

Direct provider callWith Memory Router
Long-term memoryYou build and host itBuilt in, automatic
Context windowRe-send everything, then truncateOptimized — only what matters
Keys & accountsA provider account is requiredBYOK or use just a MemoryLake key
Code changesNew SDK + retrieval pipelineOne base-URL change
Across sessions & modelsMemory is siloed per appShared memory pool
Provider outage of memory layerYour problem to handleGraceful passthrough
VisibilityNone by defaultDiagnostic response headers

Quick Start

  1. Get a MemoryLake key: Sign up and create an API key in the console.
  2. Pick a mode and swap the base URL: Choose BYOK or MemoryLake-hosted and point your SDK at the Router.
  3. Call as normal: Send requests exactly as you do today — memory is recalled and stored automatically.

Documentation

How It Works

Understand the transparent proxy and what happens on each request.

Quickstart

Go live in three steps with copy-paste code for BYOK and hosted modes.

Deployment Modes

Compare BYOK and MemoryLake-hosted, base URLs, supported providers, and key safety.

Observability

Read the diagnostic response headers and understand graceful fallback.

FAQ

Common questions about code changes, providers, security, and pricing.