Maximem Synap's Agent Memory Connected To Semantic Kernel

Semantic Kernel is Microsoft's agent orchestration framework for .NET and Python. First-class function calling. Planners that compose skills into goals. Chat history wired into the agent runtime. If you are building agents inside the Microsoft ecosystem, Semantic Kernel is where you start.

Today, Maximem Synap extends Semantic Kernel with persistent per-user memory. Native ChatHistory accumulates messages within a single process. Synap makes that history survive restarts, resolve entities across sessions, and retrieve relevant context without manual wiring.

Where Semantic Kernel Excels

Semantic Kernel gives .NET and Python developers a framework that feels native to the Microsoft stack. Type-safe kernel functions. Pluggable AI connectors for Azure OpenAI, OpenAI, and Hugging Face. A planner that decomposes goals into kernel function calls. If your agent needs to call Azure Functions, invoke Microsoft Graph, or compose skills into workflows, the framework handles the orchestration well.

The problem emerges when your user returns after a day or a week. ChatHistory accumulates messages in memory. It serializes cleanly to JSON. It does not survive a process restart on its own, and it has no concept of entity resolution, semantic retrieval, or compaction. Native chat history does not help if the agent forgets what the user told it yesterday.

How Semantic Kernel Memory Works Today

Semantic Kernel ships ChatHistory as the default conversation store. The list grows as the agent and user exchange turns. For a deeper look at how each pattern works, see the Semantic Kernel agent documentation.

In-Process ChatHistory appends every message to a list attached to the kernel. The list grows linearly with each turn. After twenty exchanges, your prompt fills with accumulated noise. The model struggles to find your actual instructions. When the process restarts, the list clears. Every conversation begins at zero.

Volatile VariableStore keeps a flat key-value bag scoped to a kernel instance. Useful for short-lived scratch state. It does not understand message roles, semantic search, or temporal ordering. It is a dictionary, not a memory.

Vector Store Connectors let you index chat messages into a vector database (Azure AI Search, Pinecone, Qdrant, Postgres pgvector). Retrieval is possible. Entity resolution, temporal awareness, and accuracy-preserving compaction are not. You get cosine similarity. You do not get memory.

All three options solve in-session state. Cross-session persistence with per-user scoping, entity resolution, semantic retrieval, and automatic compaction fall outside their scope. These are memory-layer problems, not framework-level concerns.

What Synap Adds

Synap is agentic context management. It does not replace Semantic Kernel's ChatHistory. It extends the framework with persistent memory the kernel can invoke through a plugin.

We ship a kernel plugin that exposes two functions:

search_memory performs semantic search over user-scoped memories. Accepts a query and optional result limit. Returns ranked results the planner can chain into function calls.

store_memory ingests new facts into Synap. Takes content and optional memory type. Returns a status identifier confirming persistence.

Register the plugin with any kernel. The agent decides when to recall prior context and when to save new facts. Scope is set at plugin initialization via user_id and optional customer_id. The model only sees query and content arguments.

The integration is a native package. Add one plugin. Your kernel starts remembering without a rewrite.

We built this because .NET and Python agents kept stalling in production. The problem was not bad orchestration. It was missing context that lived in a different session last Tuesday. Production testing hit 92% LongMemEval, 93.2% on LoCoMo. Typical recall returns in under 100ms.

For why context management is infrastructure and not a feature, read What Is Agentic Context Management?. For build-versus-buy numbers, see The Real Cost of DIY Agent Memory.

Technical Deep Dive

LongMemEval Benchmark

LongMemEval tests whether agents recall facts across long, multi-turn conversations spanning multiple sessions. The benchmark simulates production conditions where users return days apart and expect the agent to remember prior context. Synap scores 92% on this benchmark. Baseline vector-only approaches typically score 60-70%. The gap comes from entity resolution and temporal awareness that pure vector search lacks.

Entity Resolution Mechanism

Synap tracks identity across 15 reference patterns: names, emails, phone numbers, account IDs, session IDs, device IDs, and more. When an agent encounters "Max" in one session and "[email protected]" in another, the resolution engine runs deterministic matching on structured fields, then probabilistic matching on unstructured references. Conflicts are resolved using temporal recency and source confidence scores. The result is a single canonical entity that accumulates context across all identifiers.

Accuracy-Preserving Compaction

Compaction identifies which facts are critical versus redundant. Critical facts include user preferences, constraints, commitments, and entity relationships. Redundant facts include repeated greetings, acknowledged statements, and intermediate reasoning steps. The compaction engine uses a classifier trained on conversation data to distinguish these categories. In Maximem's internal production testing, most integrations see 60 to 70% fewer tokens shipped to the LLM per turn after compaction kicks in. The accuracy preservation comes from never dropping classified-critical facts, even under aggressive token budgets.

Graph Traversal in Accurate Mode

Fast mode retrieves by vector similarity alone. Accurate mode adds a graph layer that traverses relationships between entities. If you ask about "the project John mentioned," the graph finds John, traverses to projects linked to John, and returns the relevant context. This adds latency but catches connections that vector similarity misses. Reranking then scores results by recency, confidence, and query relevance.

Multi-Tenant Scoping

Memory is scoped by three keys: user_id identifies the person, conversation_id isolates individual sessions, and customer_id enables multi-tenant deployments. A SaaS deploying agents for multiple customers uses customer_id to ensure tenant A never sees tenant B's memory. This scoping is enforced at the storage layer, not just in application logic.

For the Semantic Kernel integration, scope is set at plugin initialization. Register a fresh plugin instance per request so concurrent users cannot view one another's memories. This pairs Synap's storage-level scoping with Semantic Kernel's kernel-level isolation.

What Synap Adds to Semantic Kernel

Persistence

Semantic Kernel Native. In-process ChatHistory only. State clears on restart. With Synap. Per-user memory survives across sessions and restarts.

Entity Resolution

Semantic Kernel Native. Raw identifiers. No linking across sessions. With Synap. "John" and "[email protected]" resolve to one canonical entity across every session.

Compaction

Semantic Kernel Native. Manual summarization or no compaction. Lossy. With Synap. Automatic and configurable. Accuracy-preserving compaction that does not drop critical facts.

Retrieval Latency

Semantic Kernel Native. Depends on vector store setup. With Synap. Typical recall via search_memory returns in under 100ms.

Long-Term Recall

Semantic Kernel Native. Not benchmarked for cross-session recall. With Synap. 92% on LongMemEval.

Failure Handling

Semantic Kernel Native. Unhandled exceptions abort the kernel invocation. With Synap. Read failures return empty results and a logged error. Write failures raise SynapIntegrationError so you know persistence missed. Your kernel keeps running.

User Scoping

Semantic Kernel Native. Process-scoped only. With Synap. user_id and optional customer_id set at plugin init. Fresh plugin instance per request for multi-tenant isolation.

What Production Teams Gain

Cross-session continuity. Your user chats on Monday, returns on Wednesday. search_memory recalls prior context. store_memory persists new facts the agent learns. The kernel treats every session as one continuous conversation. Native memory treats every session as a fresh start.

Accuracy that ships. 92% LongMemEval, 93.2% on LoCoMo measures whether agents recall facts across long, multi-turn conversations spanning multiple sessions. This benchmark tests the specific failure mode that breaks production agents: accurate recall over distance and time.

Token efficiency. Synap's compaction trims conversation history without dropping critical context. Most teams see 60 to 70% fewer tokens shipped to the LLM per turn. At scale, that is the difference between profit and burn.

Latency that does not block. Typical search_memory recall returns in under 100ms. store_memory ingestion runs asynchronously and does not block kernel execution. A failure returns empty results and a log line, not a broken kernel.

Entity resolution. "John from Acme," "[email protected]," and "user_4829" resolve to one person across every session. Synap handles this at the memory layer so your kernel does not have to.

Production resilience. Read failures return empty results and a logged error instead of crashing the kernel. Write failures raise SynapIntegrationError so you know if persistence missed. The plugin implements standard Semantic Kernel function patterns.

How to Get Started

Three steps. No rearchitecture.

Step 1: Install

pip install maximem-synap-semantic-kernel semantic-kernel

Step 2: Initialize and register the plugin


import os
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from maximem_synap_semantic_kernel import MaximemSynapSDK, create_synap_plugin
sdk = MaximemSynapSDK(api_key=os.getenv("SYNAP_API_KEY"))
await sdk.initialize()
Create a user-scoped memory plugin
memory_plugin = create_synap_plugin(
sdk=sdk,
user_id="user_123",
customer_id="acme_corp"  # optional, for multi-tenant
)
Wire into the kernel
kernel = sk.Kernel()
kernel.add_plugin(memory_plugin, plugin_name="synap")
kernel.add_service(AzureChatCompletion())
The planner can now call search_memory and store_memory as kernel functions

Step 3: Deploy

For multi-tenant production, register a fresh plugin instance per request. This ensures tenant isolation at the kernel level. Synap handles persistence, compaction, and retrieval. Your kernel handles orchestration.

Full config, scoping rules, and error handling: see the integration docs (https://docs.maximem.ai/integrations/semantic-kernel)

Memory Is Infrastructure

Semantic Kernel gave .NET and Python developers a framework for orchestrating agents. The ChatHistory it ships handles in-session conversation well. Making that history persist across sessions, resolve entities, and retrieve intelligently is a different layer of the stack.

The teams that ship production agents discover this around month three. They either build memory infrastructure themselves, or they plug in a system built for the problem.

Memory is infrastructure, not a feature.

Start building Semantic Kernel agents that remember across sessions (https://synap.maximem.ai)

Synap pricing is usage-based. You pay for memory operations: storage, retrieval, compaction. No per-seat or per-framework surcharge. The $49/month starter plan includes a base allocation; usage beyond that is metered by operation. Every new account gets $25 in free credits to test before committing. See the full pricing page (https://synap.maximem.ai/pricing).

What Is Agentic Context Management? (/blog/what-is-agentic-context-management)
The Real Cost of DIY Agent Memory (/blog/real-cost-diy-agent-memory)
Skills Are the New Microservices (/blog/skills-new-microservices)

Frequently Asked Questions

Does Semantic Kernel have built-in memory?

Yes. Three patterns: in-process ChatHistory, Volatile VariableStore, and vector store connectors. ChatHistory accumulates messages but does not survive restarts. VariableStore is a flat key-value bag. Vector connectors give you similarity search without entity resolution or compaction. See the Semantic Kernel agent documentation (https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-chat) for details.

How is Synap different from a vector store connector?

Vector store connectors give you cosine similarity. Synap adds entity linking across 15 reference patterns, temporal awareness, automatic accuracy-preserving compaction, and a graph layer for relationship queries. Synap hosts the records so memory survives restarts. Different depth.

What is the latency overhead?

Typical search_memory recall returns in under 100ms. store_memory ingestion runs asynchronously and does not block kernel execution.

Can I use this in production today?

Yes. The plugin implements standard Semantic Kernel function patterns. Read failures return empty results and a logged error. Write failures raise SynapIntegrationError so your kernel knows persistence missed. For multi-tenant deployments, register a fresh plugin instance per request.

How does pricing work?

Usage-based. Storage, retrieval, and compaction are metered separately. No per-seat or per-framework fee. The $49/month starter plan includes a base allocation; usage beyond that is metered by operation. Every new account gets $25 in free credits. See the full pricing page (https://synap.maximem.ai/pricing).