Back to Blogs & Resources
Product Updates

Maximem Synap's Agent Memory Now Available for Pipecat

Maximem Team
June 3, 2026
Maximem Synap's Agent Memory Now Available for Pipecat

Maximem Synap's Agent Memory Now Available for Pipecat

Pipecat is the open-source framework for building real-time voice and multimodal agents. Frame-based pipeline architecture. STT, LLM, and TTS services as composable processors. Transports for WebRTC, telephony, and WebSocket. If you are building agents that need to handle live audio, video, or multimodal interactions, Pipecat is where you start.

Today, Maximem Synap extends Pipecat with persistent per-caller memory. Native LLMContext accumulates messages within a single pipeline run. Synap makes that context survive across calls, resolve entities across runs, and retrieve relevant context the moment a caller reconnects.


Where Pipecat Excels

Pipecat gives pipeline developers a framework tuned for real-time multimodal composition. Frames flow between processors: TranscriptionFrame, LLMMessagesAppendFrame, TTSTextFrame, and dozens more. Services are pluggable. You can swap OpenAI Realtime for a local Whisper + Llama + Piper stack without rewriting the pipeline. If your agent needs to mix speech, vision, and text in one orchestrated flow, Pipecat handles the frame choreography well.

The problem emerges when the same caller rings back next week. The pipeline context clears when the run ends. LLMContext messages are process-local. The caller has to re-explain who they are, what they want, and where they left off. Native frame state does not help if the agent forgets the caller's history.


How Pipecat Memory Works Today

Pipecat ships context patterns and third-party memory service integrations. The built-in options keep data within a single process or single call. For a deeper look at how each pattern works, see the Pipecat documentation.

LLMContext and LLMContextAggregatorPair manage in-session conversation history. Frames update the context as user and assistant messages flow through the pipeline. The list grows linearly with each turn. After twenty exchanges, your prompt fills with accumulated noise. When the pipeline ends, the context is discarded.

Auto context summarization (enable_auto_context_summarization=True) compresses older turns into a summary message to keep the prompt manageable. Summarization is lossy. It does not survive across runs, scope by caller, or resolve entities.

FlowManager.state keeps a cross-node dictionary inside a Pipecat Flows graph. Useful for orchestrating state across conversation nodes. It is flow-scoped, not caller-scoped. The state evaporates when the flow ends.

Mem0 Memory Service (Mem0MemoryService) is Pipecat's built-in long-term memory integration. It runs API calls in background threads to avoid blocking audio, supports user_id, agent_id, and run_id scoping, and filters non-user/assistant messages. It is one option among several. Entity resolution, accuracy-preserving compaction, and graph traversal are not part of its surface.

All four options solve in-pipeline context. Cross-run persistence with per-caller scoping, entity resolution, semantic retrieval, and automatic compaction fall outside their scope. These are memory-layer problems, not pipeline-level concerns.


What Synap Adds

Synap is agentic context management. It does not replace LLMContext or the Mem0MemoryService. It extends Pipecat with persistent memory the pipeline can inject and record through frame processors.

We ship two frame processors:

SynapMemoryInjection sits upstream of the LLM service. It fetches caller-scoped memories from Synap and appends a system frame with relevant context. The LLM sees prior history, prior commitments, and prior preferences before it generates the next turn.

SynapTurnRecording sits downstream of the user transcript. After each user turn, it ingests the message into Synap. Recording happens as part of the frame flow. The pipeline does not have to decide to save.

The integration is a native package. Add two processors to the pipeline. Your voice agent starts remembering without a rewrite.

We built this because voice agents kept resetting in production. The problem was not bad audio. It was the caller having to repeat themselves every call. Production testing hit 90.2% on LongMemEval. Typical recall returns in under 100ms.

For why context management is infrastructure and not a feature, read What Is Agentic Context Management?. For build-versus-buy numbers, see The Real Cost of DIY Agent Memory.


Technical Deep Dive

LongMemEval Benchmark

Pipecat agents have a multimodal twist on LongMemEval. The benchmark tests whether an agent can recall facts across runs that span days, modalities, and transports. A user calls a voice agent about a hotel booking, then opens a web chat about the same trip the next morning, then drops a photo of their confirmation into a vision-enabled WhatsApp flow. The same canonical user across all three. The same durable memory. The score we report — 90.2% — is the multimodal variant. Pure text-only recall benchmarks overstate what works in the wild. Vector-only systems drop to 60-70% on this test because the modality and identifier shifts break naive similarity search.

Entity Resolution Mechanism

Frame-based pipelines accumulate identity from many sources. A voice frame carries a SIP From header. A WebRTC frame carries a participant identity. A chat frame carries a user_id from your auth layer. A vision frame might carry no identity at all until the user speaks. Synap's resolver accepts all of these. The SynapTurnRecording processor extracts the identifier from the frame's metadata and feeds it into the resolution engine. By the time the LLM processes the third turn of a call, the entity is fully linked. "Maria from billing" and user_8821 and +1-555-0987 are one person. Subsequent runs do not re-resolve from scratch — they hit a cache, then fall through to the engine only on misses.

Accuracy-Preserving Compaction

Pipecat runs produce a lot of frames. STT fires transcription frames. The context aggregator merges them into LLMContext messages. After 15 minutes of voice, the context object is enormous — every filler, every acknowledgement, every back-channel "mm-hm." The compaction classifier runs as a downstream processor and trims the message list to what the LLM actually needs: user intent, agent commitments, resolved entities, and the most recent N turns of substantive exchange. In our production testing, multimodal pipelines see 60 to 70% token reduction after the classifier engages. The accuracy preservation comes from the classifier not touching the critical-fact class, even when token budgets get aggressive.

Graph Traversal in Accurate Mode

Multimodal queries are where the graph layer earns its keep. A user asks "show me the photos from the hotel I stayed at last month" — the answer is a graph: user → trip → hotel → media items. Vector search over a flat embedding space might surface photos that look like a hotel but are not from the right trip. Graph traversal follows the actual relational path and reranks. The frame processor pipeline can request accurate mode per-turn based on query complexity, so the 200ms graph hit only happens for the 10-15% of turns that need it. Fast mode handles the rest in under 50ms.

Multi-Tenant Scoping

Pipecat deployments often run as fleet-managed services where one worker pool serves many customers' agents. The room name and the SIP From header are not enough to determine tenant — you need metadata you control. Synap's frame processor accepts a customer_id binding at construction, and the processor injects it into every memory operation. Tenants cannot leak into each other even when the same worker handles both their runs. This is enforced in the storage query, not in a wrapper around it. Code-level scoping is too easy to get wrong.


What Synap Adds to Pipecat

Persistence

Pipecat Native. LLMContext is process-local. Cleared when the run ends. With Synap. Per-caller memory survives across calls and process restarts.


Entity Resolution

Pipecat Native. Raw identifiers. No linking across runs. With Synap. "John" and "+1-555-0142" resolve to one canonical caller across every call.


Compaction

Pipecat Native. Auto summarization is lossy. No accuracy guarantees. With Synap. Automatic and configurable. Accuracy-preserving compaction that does not drop critical facts.


Retrieval Latency

Pipecat Native. No cross-run retrieval. With Synap. Typical recall via SynapMemoryInjection returns in under 100ms.


Long-Term Recall

Pipecat Native. Not benchmarked for cross-run recall. With Synap. 90.2% on LongMemEval.


Failure Handling

Pipecat Native. Unhandled errors abort the pipeline. With Synap. Read failures return empty results and a logged error. Write failures raise SynapIntegrationError so you know persistence missed. Your pipeline keeps running.


User Scoping

Pipecat Native. Run-scoped only. With Synap. user_id and optional customer_id set per pipeline. Fresh processor pair per run for multi-tenant isolation.


What Production Teams Gain

Frame-processor ergonomics. Synap's processors drop into the pipeline definition the same way STT, TTS, or the LLM service do. You compose them with transport.input() | stt | memory.recording | memory.injection | llm | tts | transport.output(). No middleware. No wrappers. No callback registration. The pipeline orchestrator handles the lifecycle. This is the kind of integration a Pipecat developer can adopt in a single PR.

Multimodal context preservation. A user who calls about a flight, then messages about it later, then sends a photo of their boarding pass — that is one user, one trip, one thread of context. Pure-pipeline approaches without memory lose the thread the moment the run ends. Synap carries it across runs and across transports. The user experience is the part that changes. They never have to repeat themselves, even when the channel does.

Compaction as a pipeline stage. Pipecat's frame model means compaction can run as a downstream processor that intercepts the LLM context before it reaches the model. You control exactly when it engages (after N turns, or when token count crosses a threshold, or both). This is more flexible than a hidden service that compacts whenever it wants. You can tune it to your latency budget.

No main-loop blocking. Recording runs on background frames. The LLM service does not wait for the write to complete before generating the next response. The pipeline's frame throughput stays at production cadence. A 50ms stall at a memory write would be visible in audio; this architecture is built so that never happens. We tested it against a benchmark that exercises 100 turns of voice without dropping a single frame.

Entity resolution that handles real input. Voice frames carry SIP identities. WebRTC frames carry LiveKit participant identities. Chat frames carry your auth user_id. Vision frames often carry nothing. The resolver accepts all of them and links them to a canonical entity. This is the kind of plumbing you only appreciate after your third production incident where a user called three times as three different "people" because nothing linked their phone numbers.

Graceful degradation for telephony at 2 AM. Read failures do not crash the pipeline. Write failures raise SynapIntegrationError so your logs see the gap, but the call completes. The processors are designed for the failure modes of real production: networks drop, APIs time out, workers recycle. A memory layer that takes down the call when it has a bad day is worse than no memory layer at all.


How to Get Started

Three steps. No rearchitecture.

Step 1: Install

pip install maximem-synap-pipecat pipecat-ai

Step 2: Wire the memory processors into the pipeline

import os from pipecat.pipeline.pipeline import Pipeline from pipecat.processors.aggregators.llm_context import LLMContext from maximem_synap_pipecat import MaximemSynapSDK, SynapMemoryInjection, SynapTurnRecording

sdk = MaximemSynapSDK(api_key=os.getenv("SYNAP_API_KEY")) await sdk.initialize()

Construct caller-scoped processors for this run

memory_injection = SynapMemoryInjection( sdk=sdk, user_id="caller_123", customer_id="acme_corp" # optional, for multi-tenant ) turn_recording = SynapTurnRecording( sdk=sdk, user_id="caller_123" )

Wire into the pipeline

pipeline = Pipeline([ transport.input(), stt_service, turn_recording, # record after each user turn memory_injection, # inject prior context before the LLM context_aggregator.user(), llm_service, tts_service, transport.output(), context_aggregator.assistant(), ])

Step 3: Deploy

For multi-tenant production, construct a fresh processor pair per run using the caller's identity. This ensures tenant isolation at the pipeline level. Synap handles persistence, compaction, and retrieval. Your pipeline handles the audio flow.

Full config, scoping rules, and error handling: see the integration docs (https://docs.maximem.ai/integrations/pipecat)


Memory Is Infrastructure

Pipecat gave pipeline developers a framework for real-time voice and multimodal composition. The context layer it ships handles in-run conversation well. Making that context persist across calls, resolve entities, and retrieve intelligently is a different layer of the stack.

The teams that ship production voice agents discover this around month three. They either build memory infrastructure themselves, or they plug in a system built for the problem.

Memory is infrastructure, not a feature.

Start building Pipecat voice agents that remember across calls (https://synap.maximem.ai)

Synap pricing is usage-based. You pay for memory operations: storage, retrieval, compaction. No per-seat or per-framework surcharge. The $49/month starter plan includes a base allocation; usage beyond that is metered by operation. Every new account gets $25 in free credits to test before committing. See the full pricing page (https://synap.maximem.ai/pricing).


Related Posts

Related posts