Architecture

How Synap works

Memory is not a storage problem alone. It is an active context-management problem. This page walks through how a message becomes structured, scoped memory, and how each layer was designed to prevent a specific failure we watched happen in production.

92% LongMemEval accuracy · 93.2% LoCoMo accuracy · <15 ms P50 recall latency

01 The trilemma

Context management has three jobs

Your agent does not know what it does not know. The agent that needs context is the same agent that is missing it. So storing and searching is not enough. Managing context well means doing three things at once.

Job: Capture
What it means: Learn what matters from the conversation, with high enough recall.
The failure when it is missing: A 32-day audit of 10,134 stored entries found 38 usable. A 97.8% junk rate. The system stored first and extracted never.

Job: Compact
What it means: Shrink history without losing signal, with high enough precision.
The failure when it is missing: An 18,282-token context compressed to 122 tokens, and accuracy dropped below having no memory at all.

Job: Recall
What it means: Surface the right information when the agent needs it, not when it guesses.
The failure when it is missing: A Pro customer was told twice they upgraded, then sent the upgrade link fifteen turns later.

Same paradigm, three failure surfaces. Synap was designed around all three.


02 The lifecycle

What happens to a message after you send it

Every message flows through the same arc. It is ingested, its meaning is extracted into structured memories, those memories are stored across a vector, graph, and file store, and they are retrieved to enrich the next turn. The first three stages run asynchronously and never block your app.

Writes and reads are asynchronous and never block your agent. Consolidation and conscious forgetting run as background cycles on the stores.

Stage 1

Ingest

You record a turn or submit a document through the SDK. The call returns immediately with an ingestion ID. The scope identifiers you pass decide where the memory lives.

Stage 2

Extract

The content runs through a multi-stage pipeline: categorize, extract structured memory, chunk, resolve entities, organize.

Stage 3

Store

Processed memories are persisted in a vector store for semantic similarity and a graph store for entity relationships, with source documents kept in a file store, all scoped to the right level.

Stage 4

Retrieve

On the next turn, Synap searches the applicable scopes, ranks results, and returns the most relevant memories within your token budget, anticipating the need before the agent asks.
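The four stages can be sketched in a few lines. This is a toy model, not the Synap SDK: `MemoryPipeline`, `ingest`, `retrieve`, and `wait_until_processed` are hypothetical names, extraction is reduced to whitespace cleanup, ranking is reduced to "newest first", and tokens are estimated by counting words. What it does show is the shape of the arc: the ingest call hands back an ID right away, a background worker does the extract and store steps, and retrieval fills a fixed token budget.

```python
import queue
import threading
import uuid


class MemoryPipeline:
    """Toy lifecycle: ingest returns at once, a worker extracts and stores."""

    def __init__(self):
        self._pending = queue.Queue()
        self._stored = []        # (scope, memory_text) in processing order
        self.status = {}         # ingestion_id -> "queued" | "stored"
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def ingest(self, text, scope):
        # Stage 1: enqueue and return an ID without waiting for processing.
        ingestion_id = uuid.uuid4().hex
        with self._lock:
            self.status[ingestion_id] = "queued"
        self._pending.put((ingestion_id, text, scope))
        return ingestion_id

    def _worker(self):
        while True:
            ingestion_id, text, scope = self._pending.get()
            # Stage 2 (placeholder): real extraction would build structured memories.
            memory = " ".join(text.split())
            # Stage 3 (placeholder): a single list stands in for the three stores.
            with self._lock:
                self._stored.append((scope, memory))
                self.status[ingestion_id] = "stored"
            self._pending.task_done()

    def wait_until_processed(self):
        # Only needed in tests and demos; the caller normally never waits.
        self._pending.join()

    def retrieve(self, scope, token_budget):
        # Stage 4: walk candidates in rank order and keep what fits the budget.
        with self._lock:
            candidates = [m for s, m in self._stored if s == scope]
        picked, used = [], 0
        for memory in reversed(candidates):   # newest first (assumed ranking)
            cost = len(memory.split())        # crude token estimate
            if used + cost <= token_budget:
                picked.append(memory)
                used += cost
        return picked
```

A caller would invoke `ingest` on every turn and `retrieve` just before building the next prompt. The budget check skips a memory that does not fit instead of stopping, so a smaller, lower-ranked memory can still be included.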

03 Extraction

Extraction is the first step, not the last

Most memory systems store raw text and hope retrieval sorts it out later. That is how a system ends up storing “user mentioned a plan” instead of “user upgraded from Starter to Pro on April 3.” No retrieval algorithm recovers that lost precision.

Synap inverts the order. It identifies structured knowledge from the raw conversation first: facts, preferences, episodes, emotions, and temporal events, each extracted with its surrounding context. Retrieval quality is bounded by extraction quality, so the right information enters the system with the right structure from the start.
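To make the contrast concrete, here is a minimal sketch of what a structured memory could look like next to the raw string. The five kinds come from the list above; the `StructuredMemory` class and its field names are assumptions for illustration, not Synap's actual schema. The date is kept as text because the example only says "April 3" without a year.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class MemoryKind(Enum):
    FACT = "fact"
    PREFERENCE = "preference"
    EPISODE = "episode"
    EMOTION = "emotion"
    TEMPORAL_EVENT = "temporal_event"


@dataclass(frozen=True)
class StructuredMemory:
    kind: MemoryKind
    subject: str
    statement: str
    context: str                        # surrounding context kept with the memory
    occurred_on: Optional[str] = None   # free text; hypothetical field


def by_kind(memories: List[StructuredMemory], kind: MemoryKind) -> List[StructuredMemory]:
    """Return only the memories of one kind, preserving order."""
    return [m for m in memories if m.kind is kind]


# What a store-first system keeps: nothing to filter or reason over.
vague = "user mentioned a plan"

# What an extract-first system keeps: the plan names and the date survive.
precise = StructuredMemory(
    kind=MemoryKind.TEMPORAL_EVENT,
    subject="user",
    statement="upgraded from Starter to Pro",
    context="billing conversation",     # illustrative value
    occurred_on="April 3",
)
```

With the structured form, a question like "when did they upgrade?" becomes a lookup on `kind` and `occurred_on` instead of a similarity search over a sentence that never contained the answer.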

04 Entity resolution

“Sarah,” “Sarah Chen,” and “SC” are the same person

Without entity resolution, a system stores three unrelated strings and retrieves whichever one happens to match the query. Synap resolves entities automatically during ingestion and during later consolidation cycles, with no extra SDK calls.

It uses four strategies in descending order of confidence, so “Alex from billing” resolves to a different person than “Alex from engineering.” New entities auto-register at the customer scope, and the registry builds itself into an organizational knowledge graph as conversations happen. When a match is ambiguous, the entity goes to a review queue instead of a silent wrong guess.

Exact match · Alias match · Semantic match · Contextual match
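A resolution cascade of this kind can be sketched as follows. Everything here is an assumption made for illustration: the `Entity` and `resolve` names, the 0.85 threshold, and the use of `difflib.SequenceMatcher` string similarity as a stand-in for a real semantic (embedding) match. The point is the control flow: try the most confident strategy first, use context only to break ties, and send unresolved ambiguity to a review queue instead of guessing.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Entity:
    entity_id: str
    name: str
    aliases: set = field(default_factory=set)
    context: set = field(default_factory=set)   # e.g. {"billing"}


def resolve(mention, context_words, registry, review_queue, threshold=0.85):
    """Return (entity_id or None, strategy) for one mention."""
    key = mention.strip().lower()

    # 1. Exact match on the canonical name.
    exact = [e for e in registry if e.name.lower() == key]
    if len(exact) == 1:
        return exact[0].entity_id, "exact"

    # 2. Alias match.
    alias = [e for e in registry if key in {a.lower() for a in e.aliases}]
    if len(alias) == 1:
        return alias[0].entity_id, "alias"

    # 3. "Semantic" match, approximated here by string similarity.
    similar = [
        e for e in registry
        if SequenceMatcher(None, key, e.name.lower()).ratio() >= threshold
    ]
    if len(similar) == 1:
        return similar[0].entity_id, "semantic"

    # 4. Contextual match: break ties among the candidates found so far.
    candidates = exact or alias or similar
    if not candidates:
        return None, "new"   # caller would auto-register a new entity
    wanted = {w.lower() for w in context_words}
    by_context = [e for e in candidates if e.context & wanted]
    if len(by_context) == 1:
        return by_context[0].entity_id, "contextual"

    # Still ambiguous: queue for review instead of a silent wrong guess.
    review_queue.append(mention)
    return None, "review"
```

With two registered entities named "Alex", one tagged `billing` and one tagged `engineering`, the mention "Alex" plus the context word "billing" resolves to the first; the same mention with no context lands in the review queue.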

05 Compaction

Compaction is not summarization

A naive summarizer throws away the details that mattered because it has no way to know what will be needed downstream. Synap uses adaptive compaction strategies chosen for the conversation, and every result includes a validation score, a preserved-facts count, and a compression ratio.

If the score drops below a threshold, you know critical information was lost and can retry with a less aggressive strategy. Compaction reduces the current conversation’s footprint; retrieval brings in knowledge from past conversations. A production turn does both.

Conservative · Balanced · Aggressive · Adaptive (recommended)
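The retry loop described above might look like the sketch below. It is a deliberately naive model under stated assumptions: compaction just keeps the most recent fraction of turns, the validation score is the share of a known fact list still present in the output, tokens are counted as words, and the keep fractions and 0.9 threshold are made up. The Adaptive strategy is not modelled. For scale, the 18,282-to-122-token failure in the trilemma section is a compression ratio of roughly 150:1.

```python
import math
from dataclasses import dataclass


@dataclass
class CompactionResult:
    text: str
    strategy: str
    validation_score: float    # share of known facts still present (assumed metric)
    preserved_facts: int
    compression_ratio: float   # original tokens / compacted tokens


# Fraction of turns kept per strategy (illustrative values).
KEEP_FRACTION = {"aggressive": 0.25, "balanced": 0.5, "conservative": 0.75}


def compact(turns, facts, strategy):
    """Keep the most recent turns and report what survived."""
    keep = max(1, math.ceil(len(turns) * KEEP_FRACTION[strategy]))
    text = " ".join(turns[-keep:])
    preserved = sum(1 for fact in facts if fact.lower() in text.lower())
    score = preserved / len(facts) if facts else 1.0
    original_tokens = sum(len(t.split()) for t in turns)
    ratio = original_tokens / max(1, len(text.split()))
    return CompactionResult(text, strategy, score, preserved, ratio)


def compact_with_retry(turns, facts, threshold=0.9):
    """Try the most aggressive strategy first; back off while facts are lost."""
    result = None
    for strategy in ("aggressive", "balanced", "conservative"):
        result = compact(turns, facts, strategy)
        if result.validation_score >= threshold:
            break
    return result
```

If even the conservative pass falls below the threshold, this sketch returns that last result with its low score, leaving the caller to decide whether to skip compaction for the turn.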

06 Scoping (IACS)

One user’s memory never leaks into another’s session

Most systems scope everything to the user, which breaks the moment you have more than one organizational boundary. Synap supports a hierarchical scope chain: User, Customer, and Client. Memories are stored at the right scope at ingestion, and retrieval respects those boundaries automatically, preferring the narrowest applicable scope. We call this Intelligent and Automated Context Scoping (IACS).

Client · your whole app: shared product knowledge, visible to everyone
  Customer · Acme: policies, team, and shared projects for this tenant
    User · Alice: facts, preferences, and episodes about this person (private to Alice)
  Customer · Globex: policies, team, and shared projects for this tenant
    User · Bob: facts, preferences, and episodes about this person (private to Bob)

A user's memory never leaks into another user's or another tenant's session, while organizational knowledge stays shared where it should be. Retrieval prefers the narrowest applicable scope.
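The isolation rule can be expressed as a small visibility check plus a sort. This is a sketch under assumptions: the `Memory` fields, `visible`, and `retrieve` are hypothetical names, and a missing `customer_id` or `user_id` is used to mean "shared at the wider level". It models only the User, Customer, and Client chain described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Memory:
    text: str
    customer_id: Optional[str] = None   # None => client-wide
    user_id: Optional[str] = None       # None => shared within the customer


def scope_rank(memory: Memory) -> int:
    """Lower is narrower: user (0), customer (1), client (2)."""
    if memory.user_id is not None:
        return 0
    if memory.customer_id is not None:
        return 1
    return 2


def visible(memory: Memory, customer_id: str, user_id: str) -> bool:
    """A memory is visible only along the caller's own scope chain."""
    if memory.user_id is not None:
        return memory.user_id == user_id and memory.customer_id == customer_id
    if memory.customer_id is not None:
        return memory.customer_id == customer_id
    return True


def retrieve(memories: List[Memory], customer_id: str, user_id: str) -> List[Memory]:
    """Filter by scope chain, then order narrowest scope first."""
    allowed = [m for m in memories if visible(m, customer_id, user_id)]
    return sorted(allowed, key=scope_rank)
```

For Alice at Acme, this returns her own memories first, then Acme's shared memories, then client-wide knowledge. Nothing tagged for Bob or Globex passes the filter, whatever the query is.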

07 Storage and retrieval

Three stores, two retrieval speeds

Synap stores memory in a vector store for semantic similarity, a graph store for entity relationships, and a file store for raw documents, and queries them together. Retrieval has two modes, so the agent spends latency only where it pays off.

Fast

Vector and graph search, tuned for low latency

The default in the agent hot path, built for interactive turns where the response cannot wait.

Accurate

Adds subquery decomposition and reranking

For high-value or complex queries, where deeper recall is worth the extra work.
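The difference between the two modes shows up on compound questions. The sketch below is a toy: word overlap stands in for vector and graph search, splitting on " and " stands in for real subquery decomposition, and the rerank step simply orders candidates by their best subquery score. The function names are hypothetical.

```python
import re


def words(text):
    """Lowercased word set; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, memory):
    """Share of query words that appear in the memory (stand-in for similarity)."""
    q = words(query)
    return len(q & words(memory)) / len(q) if q else 0.0


def fast_search(query, memories, k=1):
    """One pass over the whole query, top-k results."""
    ranked = sorted(memories, key=lambda m: score(query, m), reverse=True)
    return [m for m in ranked[:k] if score(query, m) > 0]


def accurate_search(query, memories, k=1):
    """Decompose the query, search each part, then rerank the merged results."""
    subqueries = [s.strip() for s in query.split(" and ") if s.strip()]
    candidates = []
    for sub in subqueries:
        for memory in fast_search(sub, memories, k):
            if memory not in candidates:
                candidates.append(memory)
    return sorted(
        candidates,
        key=lambda m: max(score(s, m) for s in subqueries),
        reverse=True,
    )


memories = [
    "Alice is on the Pro plan",
    "Alice prefers email over phone",
    "The office closes at 6pm",
]
question = "What plan is Alice on and does she prefer email or phone"
```

With `k=1`, `fast_search(question, memories)` returns only the plan memory, because a single pass surfaces the one best match. `accurate_search(question, memories)` returns both the plan and the contact preference, at the cost of one search per subquery plus the rerank.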

08 MACA

A memory architecture designed for your agent

A customer support agent cares about ticket history and plan details. A voice concierge cares about guest preferences and booking constraints. A universal memory model serves both badly.

Synap generates a Memory Architecture for each agent from a use-case description you provide. It governs what is extracted, how it is scoped and stored, how it is retrieved, and how long it is retained. You describe the agent; Synap derives the architecture. There is no schema to hand-author.

See the configuration surface in the docs →

09 Multi-agent

Context that survives the handoff

Production agents are rarely one agent. They are a router, two or three specialists, and a human escalation path. If those agents cannot share context, the customer repeats themselves at every handoff.

Synap handles multi-agent natively. Agents share a central context layer while keeping their own agent-specific memories, and the scope chain handles isolation.

10 The numbers

92% on LongMemEval. Under 15 ms at P50.

These are a consequence of the architecture described above, not of prompt tricks or model selection. Every layer addresses a specific failure mode observed in production and in the community. The methodology is published and the eval harness is open source, so you can verify every number.

92% LongMemEval accuracy · 93.2% LoCoMo accuracy · <15 ms P50 recall latency

View the open-source eval harness on GitHub →
