How It Works

A Transparent Proxy

Memory Router sits between your application and the model provider. Your app talks to the Router exactly as it would talk to the provider — same payload, same SDK, same response shape — and the Router adds a memory layer transparently on the way through.

Because the Router speaks your provider’s protocol, integrating it is a base-URL change. Nothing else in your code needs to move.

The Four Steps

Intercept

Your app sends the request to Memory Router instead of the provider — same payload, same SDK, same response shape.

Optimize context

The Router trims redundant history, searches prior memories, and injects only the relevant context into the prompt.

Forward

The enhanced request goes to the model — your own provider (BYOK) or a MemoryLake-hosted model. Tokens in are lower than a raw replay.

Remember

New memories are extracted and stored asynchronously in the background — the response is never delayed.

Why It Saves Tokens

Instead of replaying the entire conversation each turn, the Router removes redundant context and injects only the relevant memories. As conversations grow, fewer tokens are sent per call — and you never hit the context-window ceiling that forces mid-task truncation.

Memory extraction and storage happen asynchronously. The model’s response is returned as soon as it is ready; writing new memories never adds latency to the call.

A Transparent Proxy

The Four Steps

Why It Saves Tokens

Shared Memory Pool

Next Steps

Quickstart

Deployment Modes

​A Transparent Proxy

​The Four Steps

​Why It Saves Tokens

​Shared Memory Pool

​Next Steps

Quickstart

Deployment Modes

A Transparent Proxy

The Four Steps

Why It Saves Tokens

Shared Memory Pool

Next Steps