We launched Maximem Synap today. We are builders who love open source. We made a conscious decision not to open-source the whole repo on day one, for commercial and focus reasons (we are just three passionate folks building aggressively). Our developer SDK and eval harnesses, however, are open source. We also intend to build, collaboratively with our peers and the open-source community, a new benchmark that accurately measures what agents at enterprises need. And at some point we do intend to open-core some of the independent parts and libraries we built to make Synap.
But we want you to have a peek. Hence, this blog post.
Now, most technical overviews of memory systems start with the storage layer. What database, what embedding model, what chunk size and so on.
We are going to start with what breaks in production, and then walk through how each layer of the architecture was designed to prevent that specific failure. If you have not read why we built Synap, the short version: existing memory tools treat context as a storage problem alone. Synap treats it as something that needs to be actively managed, per agent, per domain, per customer. That distinction touches every design decision described here.
Extraction is the first step, not the last
We believe that a context management system needs to be contextual, not universal. When you connect an agent to Synap, we craft a custom context architecture based on the details you share about the agent. This is the bedrock of everything else we do at Synap for context management. Context architectures are not configs; they govern every part of how we manage context for an agent. (Unfortunately, we are not at liberty to share much more about them right now.)
When a conversation enters Synap through the SDK (which is async-first and never blocks your application), it does not land directly in a database. It enters a multi-stage pipeline that categorizes, extracts, chunks, organizes, resolves entities, and resolves temporal data before anything reaches storage.
This ordering exists because of a specific production failure. A developer audited 10,134 entries stored by a popular memory library over 32 days. Thirty-eight were usable. The rest were boot-file restatements, cron noise, config dumps, and hallucinated user profiles: a 99.6% junk rate. The entries were faithfully stored, which was precisely the problem. The system ingested first and extracted later, or not at all.
Synap inverts this. The pipeline identifies categories of structured knowledge in raw conversation text: facts (what has been stated or confirmed), preferences (how the user likes things done), episodes (what happened and when), emotions (sentiment and frustration signals), temporal events (deadlines, dates, scheduling context), and more. Each category is extracted with its surrounding context, not as isolated fragments.
The reason this matters downstream is straightforward. If the system stores "user mentioned a plan" instead of "user upgraded from Starter to Pro on April 3," no retrieval algorithm will recover that lost precision. Retrieval quality is bounded by extraction quality. We designed the pipeline so that the right information enters the system with the right structure from the start.
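To make the distinction concrete, here is a toy sketch of what category-aware extraction preserves. The record shape, field names, and the date are our illustration; Synap's actual schemas are not public.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative record shape only; the real Synap pipeline and its
# schemas are not shown in this post.
@dataclass
class ExtractedMemory:
    category: str               # "fact" | "preference" | "episode" | ...
    content: str                # the structured statement itself
    context: str                # surrounding conversation context
    occurred_on: Optional[date] = None

# A naive store keeps the vague paraphrase; the precision is gone forever.
naive_entry = "user mentioned a plan"

# Category-aware extraction keeps the structure retrieval will need later.
extracted = ExtractedMemory(
    category="fact",
    content="User upgraded from Starter to Pro",
    context="Billing discussion after hitting the Starter seat limit",
    occurred_on=date(2025, 4, 3),  # illustrative year
)

assert "Pro" in extracted.content  # the detail the naive entry lost
```

No retrieval algorithm can re-derive "Starter to Pro on April 3" from the naive string, which is the whole argument for extracting structure up front.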
"Sarah," "Sarah Chen," and "SC" are the same person
Here is a problem that surfaces quietly and then ruins everything.
A customer mentions "my manager, Sarah" in a conversation. Two days later, the same customer refers to "Sarah Chen from the partner team." A week after that, someone else in the same organization submits a ticket referencing "SC." Three mentions, one person. Without entity resolution, the system stores three unrelated strings and retrieves whichever one happens to match the query embedding.
We watched this break real support workflows. The agent would retrieve context about "Sarah" but miss everything filed under "Sarah Chen" or "SC," producing responses that were incomplete and occasionally contradictory.
Synap runs entity resolution automatically during ingestion and again during consolidation cycles (which mimic the timing and frequency of the human brain's offline consolidation). No extra SDK calls. The resolution process uses four matching strategies in descending order of confidence: exact match against canonical names, alias matching, semantic matching using vector embeddings, and contextual matching, where surrounding text disambiguates between candidates. "Alex from billing" resolves to a different person than "Alex from engineering" even though the name is identical.
When the system encounters an entity it has not seen before, it auto-registers at the customer scope. The entity registry grows organically as conversations happen. You do not pre-populate it. By the fiftieth conversation, the registry has built itself into a reasonably complete organizational knowledge graph, and every retrieval benefits from that accumulated structure.
For ambiguous matches (multiple candidates with similar confidence scores), the entity gets placed in a review queue rather than silently picking the wrong one. You resolve these through the Dashboard or the API. The system never guesses when it is not confident. It flags uncertainty and waits.
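The cascade described above can be sketched in a few lines. Everything here, the registry layout, the skipped embedding step, and returning None to signal the review queue, is our illustration, not Synap's implementation:

```python
# Hypothetical entity registry; Synap builds its real one automatically.
REGISTRY = [
    {"canonical": "Sarah Chen",  "aliases": {"sarah", "sc"}, "context": "partner team"},
    {"canonical": "Alex Kim",    "aliases": {"alex"},        "context": "billing"},
    {"canonical": "Alex Rivera", "aliases": {"alex"},        "context": "engineering"},
]

def resolve(mention: str, surrounding_text: str):
    m = mention.lower()
    # 1. Exact match against canonical names (highest confidence).
    exact = [e for e in REGISTRY if e["canonical"].lower() == m]
    if len(exact) == 1:
        return exact[0]["canonical"]
    # 2. Alias matching.
    candidates = [e for e in REGISTRY if m in e["aliases"]]
    if len(candidates) == 1:
        return candidates[0]["canonical"]
    # 3. Semantic matching (vector similarity) would narrow candidates
    #    here; omitted to keep the sketch self-contained.
    # 4. Contextual matching: surrounding text disambiguates candidates.
    contextual = [e for e in candidates if e["context"] in surrounding_text.lower()]
    if len(contextual) == 1:
        return contextual[0]["canonical"]
    # Still ambiguous: flag for the review queue instead of guessing.
    return None

assert resolve("Sarah Chen", "") == "Sarah Chen"
assert resolve("SC", "") == "Sarah Chen"
assert resolve("Alex", "ticket from Alex from billing") == "Alex Kim"
assert resolve("Alex", "no team mentioned") is None  # review queue
```

The last case is the important one: with two plausible Alexes and no disambiguating context, the right behavior is to defer to a human, not to pick one.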
Compaction is not summarization
There is a distinction here that we learned by watching summarization fail.
A well-cited study tracked an agent whose context grew to 18,282 tokens with 66.7% accuracy. A single summarization step compressed that to 122 tokens. Accuracy dropped to 57.1%, which was worse than having no context at all. The summarizer threw away the details that mattered because it had no way to know what would be needed downstream.
One approach that we studied compounds this by capping its summarizer at the same token limit as the agent's context window. The summarizer, which was specifically chosen for its larger context capacity, cannot see the overflow it was hired to compress. The tool you bring in to fix the problem inherits the exact constraint that caused it.
Synap uses a combination of compaction strategies chosen based on the context of the conversation, session, agent, and so on (now you see how deep the custom context architecture goes).
What makes this different from every other compression approach we have evaluated is quality validation. Every compaction result includes a validation score, a preserved facts count, and a compression ratio. If the validation score drops below a threshold, you know critical information was lost and you can retry with a less aggressive strategy. Most systems compress and hope for the best. Synap tells you whether the compression worked.
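Here is a sketch of what validation-gated compaction looks like from the caller's side. The field names, threshold, and stubbed strategies are ours; consult the docs for the real API:

```python
# Stub compactor: an aggressive pass drops detail, a conservative one
# keeps it. Scores and fields are illustrative, not Synap's.
def compact(turns, strategy):
    if strategy == "aggressive":
        return {"text": "user has a plan", "validation_score": 0.41,
                "preserved_facts": 1, "compression_ratio": 0.02}
    return {"text": "user upgraded Starter->Pro on April 3; asked about seats",
            "validation_score": 0.93, "preserved_facts": 4,
            "compression_ratio": 0.15}

def compact_with_validation(turns, threshold=0.8):
    # Try the most aggressive strategy first; fall back when the
    # validation score says critical information was lost.
    for strategy in ("aggressive", "conservative"):
        result = compact(turns, strategy)
        if result["validation_score"] >= threshold:
            return result
    raise RuntimeError("no strategy met the validation threshold")

result = compact_with_validation(["...conversation turns..."])
assert result["validation_score"] >= 0.8
```

The point of the loop is the feedback signal: compression becomes a checked operation you can retry, rather than a lossy step you hope went well.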
Compaction and retrieval serve complementary roles. Compaction reduces the current conversation's token footprint. Retrieval brings in relevant knowledge from past conversations and other sessions. A typical production flow uses both: retrieve past memories, compact the current conversation if it has grown long, combine the results with recent turns into the prompt. The SDK makes this a three-call pattern.
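The three-call pattern can be shown with the SDK calls replaced by stubs so the sketch stays self-contained. Function names here are stand-ins, not the SDK's actual surface:

```python
def retrieve_memories(query):                     # stub for SDK retrieval
    return ["Acme Corp is on the Pro plan", "User prefers terse replies"]

def compact_conversation(turns, max_tokens=800):  # stub for SDK compaction
    return "Summary: user has been debugging webhook retries since Tuesday."

def build_prompt(query, turns, recent_window=4):
    memories = retrieve_memories(query)                       # 1. past knowledge
    compacted = compact_conversation(turns[:-recent_window])  # 2. shrink older history
    recent = turns[-recent_window:]                           # 3. keep recent turns verbatim
    return "\n".join(["## Memories", *memories,
                      "## Earlier in this conversation", compacted,
                      "## Recent turns", *recent])

prompt = build_prompt("webhook retries", [f"turn {i}" for i in range(10)])
```

Older turns survive only through the compacted summary, while the most recent turns reach the model verbatim alongside retrieved cross-session memories.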
Your user's preferences should never leak into another user's session
Most memory systems scope everything to the user. All memories go into one bucket, and retrieval searches that bucket.
This breaks the moment you have more than one organizational boundary. A SaaS company building a support agent needs user-level memory (this person's preferences and history), customer-level memory (shared knowledge about the organization, their plan, their team structure, their past issues), and possibly client-level memory (patterns across all organizations, like common feature requests or known bugs). Flattening all of this into one scope means either the agent misses organizational context or one user's personal preferences bleed into another user's session.
Synap supports a hierarchical scope chain: User, Customer, Client, and World. Memories are stored at the appropriate scope during ingestion. Retrieval respects scope boundaries automatically. When your agent handles a ticket from someone at Acme Corp, it retrieves memories from three levels: what we know about this user, what we know about Acme Corp, and what we know globally. The context the agent receives is layered and precise instead of a flat search across everything.
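A minimal model of scope-chain retrieval, assuming a simple (scope, key) storage layout we invented for illustration:

```python
# Toy scoped store; the scope names come from the post, the layout is ours.
MEMORIES = {
    ("user", "u_42"):      ["Prefers CSV exports"],
    ("customer", "acme"):  ["On Pro plan", "12-seat team"],
    ("client", "saas_co"): ["Common request: SSO"],
    ("world", "*"):        ["Known bug: webhook retries drop headers"],
}

def retrieve_scoped(user_id, customer_id, client_id):
    # Walk the chain from most specific to most general. Each layer is
    # keyed separately, so one user's memories never surface in another
    # user's session.
    chain = [("user", user_id), ("customer", customer_id),
             ("client", client_id), ("world", "*")]
    return {scope: MEMORIES.get((scope, key), []) for scope, key in chain}

ctx = retrieve_scoped("u_42", "acme", "saas_co")
assert ctx["user"] == ["Prefers CSV exports"]
assert retrieve_scoped("u_99", "acme", "saas_co")["user"] == []  # no leakage
```

A second user at the same customer inherits the Acme-level and global memories but none of the first user's personal layer, which is exactly the isolation property described above.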
We call this Intelligent & Automated Context Scoping (IASC). Our design partners consistently tell us this solved problems they did not realize they had until the scope-bleeding stopped.
The agents that share your pipeline need to share context
Most memory architectures were designed for one agent talking to one user. Multi-agent systems, where a coordinator routes to specialized sub-agents or where multiple agents collaborate on a task, are typically added after the core architecture is settled. The result is predictable: Agent A writes a memory, Agent B cannot see it, or sees it in the wrong format, or retrieves it in a context that does not match where it was stored.
The real-world deployment pattern for production agents is not one agent doing everything. It is a routing agent, two or three specialist agents, and a human escalation path. If those agents cannot share context coherently, the customer repeats themselves at every handoff. That is the same failure we described in our launch post, except it happens between agents instead of between turns.
Synap handles multi-agent architectures natively. Multiple agents share a central context layer while maintaining their own agent-specific memories. The scope chain handles isolation: memories scoped to a specific agent stay with that agent, while memories scoped to the customer or client level are accessible to all agents within that boundary.
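The isolation rule can be modeled in a few lines. The scope names follow the post; the data layout and visibility function are our sketch:

```python
# Hypothetical memory records tagged with an owning scope.
memories = [
    {"scope": "agent",    "owner": "billing_agent", "text": "Draft refund macro v2"},
    {"scope": "customer", "owner": "acme",          "text": "Acme escalation contact: Sarah Chen"},
]

def visible_to(agent_id, customer_id):
    # Agent-scoped memories stay private to that agent; customer-scoped
    # memories are shared across every agent serving that customer.
    return [m["text"] for m in memories
            if (m["scope"] == "agent" and m["owner"] == agent_id)
            or (m["scope"] == "customer" and m["owner"] == customer_id)]

# The router sees the shared customer memory but not the billing agent's
# private notes; the billing agent sees both.
assert visible_to("router", "acme") == ["Acme escalation contact: Sarah Chen"]
assert len(visible_to("billing_agent", "acme")) == 2
```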
Putting it together
The SDK is Python (with JavaScript and more coming soon), async-first, and handles ingestion, retrieval, context compaction, and authentication. It never blocks your application: when you call the SDK, it returns immediately with an ingestion ID, and processing happens asynchronously.
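A toy model of that async-first pattern: the call hands back an ingestion ID immediately and the pipeline work runs as a background task. This illustrates the pattern, not the SDK's internals:

```python
import asyncio
import uuid

class ToyIngestor:
    """Illustrative stand-in for an async-first ingestion client."""

    def __init__(self):
        self.status = {}

    async def _process(self, ingestion_id, conversation):
        await asyncio.sleep(0.01)            # stand-in for pipeline work
        self.status[ingestion_id] = "done"

    def ingest(self, conversation):
        ingestion_id = str(uuid.uuid4())
        self.status[ingestion_id] = "processing"
        asyncio.get_running_loop().create_task(
            self._process(ingestion_id, conversation))
        return ingestion_id                  # caller is never blocked

async def main():
    ingestor = ToyIngestor()
    iid = ingestor.ingest(["user: hi", "agent: hello"])
    immediate = ingestor.status[iid]         # still "processing": no blocking
    await asyncio.sleep(0.05)                # give the background task time
    return immediate, ingestor.status[iid]

immediate, final = asyncio.run(main())
```

The caller observes "processing" the instant `ingest` returns, and only a later status check sees "done", which is the contract the real SDK promises.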
Synap Cloud is the managed backend. It runs the multi-stage ingestion pipeline, stores memories across vector, graph, file systems (based on custom architecture choices for that agent), handles entity resolution, and serves retrieval queries. You do not deploy or manage this infrastructure.
The Dashboard at synap.maximem.ai is where you create and manage instances (agents), configure context architecture, monitor ingestion pipelines, and review the entity resolution queue.
What this means for your agent
Synap is not a better vector database. It is not a fancier RAG pipeline. It is an agentic context management system: a pipeline that actively captures, compacts, and recalls context for your agent, customized to your agent's domain and your customers' organizational boundaries.
The benchmark numbers (90.2% on LongMemEval, 15ms P50) are a consequence of this architecture, not of prompt tricks or model selection. Every layer described here addresses a specific failure mode that we observed in production and in the community. The architecture was designed around those failures.
The SDK is open source. The docs are at docs.maximem.ai. And if you want to start building, synap.maximem.ai has a free tier that takes under a minute to set up.
Get started: synap.maximem.ai
Read the docs: docs.maximem.ai



