Why We Built Synap

We built a customer support agent for a SaaS product. The kind of agent every second team is building right now: handle tickets, pull up account details, route the hard ones to humans. A customer opens a ticket and tells the agent, twice in the same conversation, that they upgraded from the Starter plan to Pro. The agent acknowledges both times.

Fifteen turns later, the customer asks about a feature. The agent tells them it is Pro-only and sends them the upgrade link to the plan they are already paying for.

Not a model failure. The model did what the memory told it to do. The memory was wrong.

We threw the standard arsenal at it: RAG over the account data, prompt caching to keep recent context alive, one of the popular open-source memory libraries. You know how this goes; each fix worked briefly, then broke in some new way that took longer to diagnose than the original problem.

What 40 GitHub issues and 1,500 Reddit posts told us

We started talking to other agent-builders and realized this was not an isolated bug. It was a pattern. So we went looking for it systematically.

We pulled 40 GitHub issues across the most widely used memory and agent frameworks. We read roughly 1,500 Reddit posts across r/AI_Agents, r/Rag, r/LocalLLaMA, r/LangChain, and a handful of other communities, all from December 2025 through April 2026. Four months of developers describing, debugging, and occasionally solving the same set of problems.

The failures were not random. They clustered around the same underlying architecture.

A developer on r/LocalLLaMA tracked 847 agent runs and measured instruction adherence dropping from 94% to 41% as context filled up. Their observation: the degradation is not linear, there is a cliff, and developers feel it before they can instrument it. We saw the same shape in our support agent. Worked beautifully for ten turns. Then started making mistakes that felt insulting.

The capture problem turned out to be worse than the retrieval problem, which we did not expect. A 32-day production audit of 10,134 entries in a popular memory library found 38 usable memories out of the entire set. The rest were boot-file restatements, cron noise, config dumps, and hallucinated user profiles. That is a 97.8% junk rate. The developer who ran the audit framed it in one line that changed how we thought about the whole space: the extraction prompt is the bottleneck, not the model. A smarter model follows a broken extraction prompt more faithfully, which means it captures more junk, not less.

We also found teams whose write path worked fine but whose retrieval returned nothing. One team had over 3,000 documents stored and 91 memories visible in the vendor's dashboard; every API search call came back empty. The storage was intact. The read path was silently broken. Nothing in the system told them. That kind of failure destroys trust faster than any accuracy issue, because you cannot fix what you cannot see.

And then there was an architectural paradox we genuinely did not expect to find. A popular framework caps all model calls at the same token limit, including the summarizer that is specifically designed to handle context overflow. The compactor cannot see the overflow it was hired to compact. The tool you bring in to fix the problem inherits the exact constraint that caused it.

Your agent does not know what it does not know

The pattern underneath all of these failures is the same.

Every current memory tool treats context as a storage problem: store data, retrieve data, compress data. Your agent has to explicitly ask to store something, or explicitly search to retrieve it. The entire paradigm assumes that if you get data in and out correctly, your agent will figure out the rest.

But your agent does not know what it does not know. The agent that needs context is the same agent that is missing it. It cannot ask for what it does not realize it needs. That is the fundamental mismatch with how memory tools are built today.

On top of that, these tools treat the memory needs of every agent the same way. A customer support agent cares about ticket history, resolution patterns, and plan details. A research analyst cares about source provenance, citation chains, and version history. A voice concierge cares about guest preferences, room details, and real-time booking constraints. A universal memory architecture serves all three with the same pipeline. Context management has to be, well, contextual.

An agent problem, not a storage problem

We spent the last few months working with our design partners on this as an agent problem. The result is Maximem Synap, an agentic context management system.

The way we see it, context management has three jobs: capturing what matters from the conversation, with high enough recall that your agent learns what it needs to learn; compacting without losing signal, with high enough precision that you are not causing context rot and inflating token costs; and recalling the right information when the agent actually needs it, not when it guesses it might.

The 97.8% junk rate I mentioned earlier is a capture failure. The support agent sending a Pro customer an upgrade link to Pro is a recall failure. The summarizer inheriting its own context limit is a compaction failure. Same underlying paradigm, three different failure surfaces.
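The recall failure in our support agent is easy to state in code. The sketch below assumes the capture step has already turned "I upgraded from Starter to Pro" into structured, timestamped facts; `Fact`, `FactStore`, and the `customer:1042` id format are hypothetical names for illustration, not Synap's SDK. Once the plan is a fact rather than a line buried in a transcript, recall can return the latest value no matter how many turns have passed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """One structured fact extracted from a conversation (hypothetical schema)."""
    entity: str          # canonical entity id, e.g. "customer:1042" (illustrative format)
    attribute: str       # what the fact is about, e.g. "plan"
    value: str           # e.g. "Pro"
    observed_at: datetime


class FactStore:
    """Toy in-memory store: recall returns the most recently observed value."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def capture(self, fact: Fact) -> None:
        # Keep history instead of overwriting, so older values stay auditable.
        self._facts.append(fact)

    def recall(self, entity: str, attribute: str) -> Optional[str]:
        matches = [f for f in self._facts
                   if f.entity == entity and f.attribute == attribute]
        if not matches:
            return None
        # Latest observation wins: "upgraded to Pro" supersedes "on Starter".
        return max(matches, key=lambda f: f.observed_at).value


store = FactStore()
store.capture(Fact("customer:1042", "plan", "Starter", datetime(2026, 1, 5)))
store.capture(Fact("customer:1042", "plan", "Pro", datetime(2026, 3, 2)))
print(store.recall("customer:1042", "plan"))  # prints: Pro
```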

Synap builds a custom context architecture for every agent and actively manages it as the session grows. The ingestion pipeline extracts structured knowledge (facts, preferences, episodes, temporal events, emotions) rather than storing raw text and hoping retrieval sorts it out later. Entity resolution runs automatically, so "John," "John Smith," and "my manager" map to the same canonical record over time. This directly addresses the kind of context bleeding that made our support agent fail in the first place.

Context compaction uses adaptive strategies with quality validation scores, so you can tell when compression preserved critical information and when it did not. Most systems compress and hope for the best, and that is how you end up with an 18,282-token context collapsed to 122 tokens, a loss severe enough to drop accuracy below having no memory at all.
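One way to make "quality validation" concrete: score each compressed candidate by how many must-keep facts survive, and reject candidates that fall below a threshold. This is a minimal sketch under stated assumptions, not Synap's compaction algorithm. `retention_score`, `compact_with_validation`, the 0.9 threshold, and both toy strategies are hypothetical, and the substring check is a crude stand-in for whatever fact-level comparison a real system would use.

```python
from typing import Callable, List, Sequence, Tuple

Strategy = Callable[[str], str]


def retention_score(critical_facts: Sequence[str], text: str) -> float:
    """Fraction of critical facts still present in `text` (crude substring check)."""
    if not critical_facts:
        return 1.0
    lowered = text.lower()
    kept = sum(1 for fact in critical_facts if fact.lower() in lowered)
    return kept / len(critical_facts)


def compact_with_validation(
    context: str,
    critical_facts: Sequence[str],
    strategies: List[Strategy],
    threshold: float = 0.9,
) -> Tuple[str, float]:
    """Try strategies from most to least aggressive; keep the first that passes."""
    for strategy in strategies:
        candidate = strategy(context)
        score = retention_score(critical_facts, candidate)
        if score >= threshold:
            return candidate, score
    # No strategy preserved enough signal: fall back to the uncompressed context.
    return context, retention_score(critical_facts, context)


def truncate(text: str) -> str:
    # Aggressive toy strategy: keep only the first 20 characters.
    return text[:20]


def drop_greetings(text: str) -> str:
    # Gentle toy strategy: remove one known low-value sentence.
    return text.replace("Customer said hello. ", "")


context = ("Customer said hello. Customer upgraded from Starter to Pro. "
           "Customer asked about SSO.")
critical = ["upgraded from Starter to Pro", "SSO"]
summary, score = compact_with_validation(context, critical, [truncate, drop_greetings])
# truncate scores 0.0 and is rejected; drop_greetings keeps both facts (1.0).
print(summary, score)
```

Trying strategies from most to least aggressive means the accepted candidate is the most aggressive one that cleared the threshold, and falling back to the uncompressed context is what keeps a bad compression from silently replacing good memory.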

We also handle multi-agent architectures natively (not bolted on as an afterthought) and support memory isolation across a hierarchical scope chain: user, customer, client, and world. User A's preferences never bleed into User B's session, but organizational knowledge stays accessible to everyone who needs it. We call this Intelligent & Automated Context Scoping (IACS).
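A hierarchical scope chain can be pictured as a lookup that merges memory from broad scopes down to narrow ones. This sketch assumes the four scopes are ordered narrowest to broadest as listed (user, customer, client, world) and that each session knows its owner id at every level; `ScopedMemory`, `put`, `visible`, and all ids are hypothetical and do not describe how IACS is actually implemented.

```python
from typing import Dict, Tuple

# Assumed ordering, narrowest to broadest, taken from the scope names above.
SCOPE_ORDER = ("user", "customer", "client", "world")


class ScopedMemory:
    """Toy memory keyed by (scope, owner_id); hypothetical, not Synap's API."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Dict[str, str]] = {}

    def put(self, scope: str, owner_id: str, key: str, value: str) -> None:
        if scope not in SCOPE_ORDER:
            raise ValueError(f"unknown scope: {scope}")
        self._data.setdefault((scope, owner_id), {})[key] = value

    def visible(self, chain: Dict[str, str]) -> Dict[str, str]:
        """Merge everything a session may see, given its owner id at each scope.

        Broader scopes are applied first so narrower scopes override them.
        A session only reads the owner ids in its own chain, which is what
        keeps one user's entries out of another user's view.
        """
        merged: Dict[str, str] = {}
        for scope in reversed(SCOPE_ORDER):
            owner_id = chain.get(scope)
            if owner_id is not None:
                merged.update(self._data.get((scope, owner_id), {}))
        return merged


memory = ScopedMemory()
memory.put("world", "global", "support_hours", "24/7")
memory.put("customer", "acme", "plan", "Pro")
memory.put("user", "alice", "language", "fr")
memory.put("user", "bob", "language", "de")

alice = {"user": "alice", "customer": "acme", "client": "app-1", "world": "global"}
print(memory.visible(alice))
# {'support_hours': '24/7', 'plan': 'Pro', 'language': 'fr'}
```

A session for Bob under the same customer would see `language` as `de` plus the same shared `plan` and `support_hours`, while Alice's `fr` never appears in it.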

90.2% on LongMemEval. 15 milliseconds at P50.

Synap scores 90.2% on LongMemEval, a benchmark that tests whether a memory system retrieves the right fact from a long conversation and maintains that accuracy as the conversation grows. The next closest system we tested scored 71.3%.

P50 latency is 15 milliseconds, and we are working hard to bring P95 latency down to the same number. For context: a voice agent typically runs on a 300-millisecond total budget, and the LLM itself typically takes 150 to 200 milliseconds of it. A memory system that takes 180 milliseconds (which is what we observed from some alternatives) blows that budget entirely. At 15 milliseconds, the memory system occupies a fraction of the budget, and the 300-millisecond voice threshold becomes feasible.
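The budget arithmetic, using only the figures quoted in this section. `headroom_ms` is a hypothetical helper, and treating the LLM call and the memory lookup as strictly sequential is a simplifying assumption.

```python
VOICE_BUDGET_MS = 300          # total per-turn budget quoted above
LLM_MS_RANGE = (150, 200)      # typical LLM share quoted above


def headroom_ms(memory_ms: int, llm_ms: int) -> int:
    """Milliseconds left for the rest of the turn after the LLM and memory calls."""
    return VOICE_BUDGET_MS - llm_ms - memory_ms


for memory_ms in (15, 180):
    fast_llm = headroom_ms(memory_ms, LLM_MS_RANGE[0])
    slow_llm = headroom_ms(memory_ms, LLM_MS_RANGE[1])
    # Negative headroom means the turn is already over budget.
    print(f"memory {memory_ms} ms -> headroom {slow_llm} to {fast_llm} ms")
```

At 15 ms the turn keeps 85 to 135 ms of headroom; at 180 ms it is 30 to 80 ms over budget before anything else runs.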

We support 10 agentic frameworks on day one: LangChain (including LangGraph), LlamaIndex, CrewAI, OpenAI Agents SDK, Google ADK, AutoGen, Haystack, Pydantic AI, and Semantic Kernel, with Vercel AI SDK support close behind.

The benchmark methodology is published. The eval harness is open source. You can download it, run it against whatever systems you are evaluating, and check every number here. We would rather someone find a flaw in our methodology than publish results nobody can verify.

Synap is live today

Free tier is open. The developer SDK is open source. Our memory eval harness is open source. The first one hundred customers get three months of our Pro tier (a five hundred dollar per month plan) for free.

We are not shipping a product and walking away. We want to build what comes next with the people who are living this problem every day.

If you are building an agent and you are tired of it forgetting, come build with us at synap.maximem.ai. If you try it and something breaks, tell me where. I will fix it.

Get started: synap.maximem.ai

See the benchmarks: Synap Benchmark Results

Read the technical deep-dive: How Synap Works Under the Hood
