LiveKit × MemoAir Voice Memory
Add persistent voice memory to a LiveKit Agents pipeline. MemoAir ships Python and TypeScript SDKs — the local memory runtime is bundled and auto-launched on first use, so there is no Docker step. Search results are injected before each LLM reply, and completed turns are saved back.
Choose your path
Install
Pick the package family for your worker language. Both use the same local memory runtime contract — no Docker required.
# TypeScript / Node LiveKit Agentsnpm install memoair-voice memoair-livekit @livekit/agents # Python LiveKit Agentspip install "memoair-voice>=0.3.2" "memoair-livekit>=0.3.2" \ "livekit-agents>=1.0" \ "livekit-plugins-openai" "livekit-plugins-deepgram" \ "livekit-plugins-silero" "livekit-plugins-turn-detector" \ python-dotenvmemoair-livekit depends on memoair-voice. Install both when your code also uses the raw client for index seeding, smoke tests, or custom framework logic.
Get your API key, project ID, and agent ID
Sign in at dashboard.memoair.space and create (or pick) a project + agent. From the dashboard copy:
- API key — looks like
memoair_pk_…. Account-scoped: one per account, shared across every project and agent in your org. Treat as a secret. - Project ID — looks like
proj_…. Identifies the workspace. Safe to log. - Agent ID — looks like
agent_…. Identifies the voice bot inside the project. Safe to log.
For per-user isolation MemoAir uses the user_id you pin at agent construction time (LiveKit dispatch is 1:1, so the LiveKit drop-in keeps single-user construction). Use the LiveKit participant identity so each caller keeps their own memory.
Environment setup
Create a .env next to your agent script:
# MemoAirMEMOAIR_API_KEY=memoair_pk_...MEMOAIR_PROJECT_ID=proj_...MEMOAIR_AGENT_ID=agent_... # LiveKit (devkey works for local livekit-server --dev)LIVEKIT_URL=ws://localhost:7880LIVEKIT_API_KEY=devkeyLIVEKIT_API_SECRET=secret # AI provider keysOPENAI_API_KEY=sk-...DEEPGRAM_API_KEY=...Seed the knowledge base
Static knowledge (FAQs, docs, user profile facts) lives in MemoAir's org index — project-scoped, shared across all users. Conversation turns (per-user) get saved automatically once the agent runs; you only need to seed the org index once. Pick either path below — both land in the same index.
Option A — Dashboard upload (no code)
At dashboard.memoair.space open your project, go to Knowledge → New index, name it agent-memory, and upload PDFs / text / paste raw notes. The dashboard handles chunking and embedding. Best for non-trivial corpora (PDFs, long docs, bulk ingest beyond 100 docs per call).
Option B — Code-driven seed
Good for structured facts you want to keep next to your agent code (user profile, FAQ entries, sample memories). The cloud endpoint /v1/voice-sdk/index/create is build-or-append: the first call creates the index, later calls add documents to the same index name. Limit per call: 100 docs and 1 MB total.
import asyncioimport osfrom dotenv import load_dotenv # MemoAirVoiceClient is the cloud + local-runtime wrapper. We only need its# create_index() here — the same client also exposes search_memory() and# save_response() used by the agent at runtime.from memoair_voice import MemoAirVoiceClient load_dotenv() # Pick a stable name. The agent at runtime searches "all indexes" for this# project, so the exact name doesn't have to match anything — but reusing# the same name across runs makes "re-seed" trivial (it's build-or-append).INDEX_NAME = "agent-memory" # Each memory is a dict with three fields:# - id: stable per-document key; re-using an id REPLACES that doc on# the next call (idempotent update).# - text: the searchable content. Plain text; the cloud handles chunking# + embedding.# - metadata: optional free-form tags. Surfaces in search results so your# prompt can filter / cite. Keep values as strings for now.MEMORIES = [ { "id": "return-policy", "text": "Returns accepted within 30 days of purchase with a receipt.", "metadata": {"kind": "faq", "topic": "returns"}, }, { "id": "shipping", "text": "Standard shipping is 3-5 business days. Express is 1-2 days.", "metadata": {"kind": "faq", "topic": "shipping"}, }, { "id": "support", "text": "Support is available 24/7 at support@example.com.", "metadata": {"kind": "faq", "topic": "support"}, },] async def main() -> None: # create_index is a cloud-only call — it doesn't need a runtime, so # the build script never opens the local pool. aclose() releases the # cloud HTTP client cleanly. client = MemoAirVoiceClient( api_key=os.environ["MEMOAIR_API_KEY"], project_id=os.environ["MEMOAIR_PROJECT_ID"], agent_id=os.environ["MEMOAIR_AGENT_ID"], ) try: # First call CREATES the index. Subsequent calls with the same # INDEX_NAME APPEND new documents (or replace ones with the same id). # Cap per call: 100 docs and 1 MB total — batch larger corpora. result = await client.create_index(INDEX_NAME, MEMORIES) # chunk_count = how many embedding chunks the cloud produced from # your docs (long text gets split). version bumps every time the # index changes; the runtime uses it to detect when to re-pull. print(f"Indexed {result.chunk_count} chunks (version={result.version})") finally: await client.aclose() if __name__ == "__main__": asyncio.run(main())python build_index.pyWire MemoAir into your agent
Pick the option that matches how much control you want. Both end up doing the same thing — search before each reply, inject context, save the completed turn.
Option A — memoair-livekit (drop-in)
MemoAirLiveKitAgent subclasses LiveKit's Agent base class and wires every lifecycle hook for you:
on_user_turn_completed→ callssearch_memoryand splices a SYSTEM message before the user's message (LiveKit #5053 ordering workaround).on_agent_response_completed→ callssave_response(user_text, assistant_text).on_enter/on_exit→ manages the local runtime session (auto-spawns the bundled runtime on 127.0.0.1:7878).
import asyncioimport osfrom dotenv import load_dotenvfrom livekit.agents import AgentSession, JobContext, WorkerOptions, clifrom livekit.plugins import deepgram, openai, silerofrom livekit.plugins.turn_detector.english import EnglishModel # MemoAirLiveKitAgent is a livekit.agents.Agent subclass — drop-in. It owns:# - on_enter: opens the per-(project, user) runtime via the SDK's pool# - on_user_turn_completed: calls search_memory and splices recall as a# SYSTEM message BEFORE the user message (LiveKit #5053 workaround)# - on_exit: tears down the runtime session cleanlyfrom memoair_livekit import MemoAirLiveKitAgent load_dotenv() async def entrypoint(ctx: JobContext) -> None: await ctx.connect() # user_id pins the per-user memory lane for this LiveKit job (LiveKit # dispatch is 1:1, so the drop-in keeps single-user construction). # Same identity across calls = same per-user memory. In real dispatch # this is a real string; in CLI "console" mode the LiveKit harness # mocks it (a MagicMock would silently corrupt the runtime identity), # so we fall back to a stable default that lets repeated runs reuse # the same memory. raw = ctx.room.local_participant.identity user_id = raw if isinstance(raw, str) and raw.strip() else "console-user" agent = MemoAirLiveKitAgent( api_key=os.environ["MEMOAIR_API_KEY"], project_id=os.environ["MEMOAIR_PROJECT_ID"], agent_id=os.environ["MEMOAIR_AGENT_ID"], user_id=user_id, # System prompt the LLM sees on every turn. Tell it that recall will # arrive as a system message — otherwise some models ignore the # context. Keep replies short because they're being spoken aloud. instructions=( "You are a helpful voice assistant with access to long-term memory. " "Relevant memories are injected as a system message before each reply — " "ground your answers in that context whenever it applies. Keep replies " "concise; the user is hearing them spoken aloud." ), ) session = AgentSession( stt=deepgram.STT(), llm=openai.LLM(model="gpt-4o"), tts=openai.TTS(), turn_detection=EnglishModel(), vad=silero.VAD.load(), ) # SAVE wiring (required). LiveKit's base Agent never calls # on_agent_response_completed itself, so MemoAirLiveKitAgent's save path # wouldn't fire on its own. We forward LiveKit's conversation_item_added # event for assistant turns, which is what triggers save_response. # # Drop this whole block to make the agent read-only (search but never # save). Useful if you only want to surface the pre-seeded org index. @session.on("conversation_item_added") def _on_item_added(event) -> None: # event.item can be a ChatMessage (has .role) or an AgentHandoff # (no .role, raises AttributeError). Use getattr to skip handoffs # silently — without this, the first handoff event at session boot # would crash this listener and silently disable saves for the # rest of the session. if getattr(event.item, "role", None) == "assistant": asyncio.create_task( agent.on_agent_response_completed(None, event.item) ) await session.start(agent=agent, room=ctx.room) if __name__ == "__main__": cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))Drop the conversation_item_added block to make the agent read-only (search but never write). Reference example in the repo: examples/livekit/voice_agents/memoair_agent.py.
Option B — memoair-voice (full control)
Same shape as a standard retrieval integration: you subclass Agent, override on_user_turn_completed, call search_memory on the MemoAir client, and add the recalled context to turn_ctx as a system message. Use this when you want to compose your own prompt, gate which lanes get searched, or skip saving turns altogether.
import asyncioimport loggingimport osfrom dotenv import load_dotenvfrom livekit.agents import ( Agent, AgentSession, ChatContext, ChatMessage, JobContext, WorkerOptions, cli,)from livekit.plugins import deepgram, openai, silerofrom livekit.plugins.turn_detector.english import EnglishModel # MemoAirVoiceClient is the raw client — no LiveKit assumptions. We drive# search_memory and save_response ourselves so we can decide exactly when# and how memory participates in each turn.from memoair_voice import MemoAirVoiceClient load_dotenv()logging.basicConfig(level=logging.INFO)logger = logging.getLogger("memoair-agent") class MemoAirRetrievalAgent(Agent): """LiveKit Agent that runs search_memory before every reply. Standard retrieval shape: subclass Agent, override on_user_turn_completed, search + inject, let LiveKit's pipeline carry on. The save path is wired separately from AgentSession events (see entrypoint() below) because LiveKit's base Agent doesn't expose a post-reply hook. """ def __init__(self, client: MemoAirVoiceClient, *, user_id: str): super().__init__( instructions=( "You are a helpful voice assistant. Use the memory context " "provided in system messages to answer the user. If memory " "does not cover the question, answer from general knowledge." ) ) self._client = client self._user_id = user_id # Cache the user turn text so the save handler in entrypoint() can # pair it with the eventual assistant text. None means "no user # turn pending" (interrupted turn, replay, race). self._pending_user_text: str | None = None async def on_user_turn_completed( self, turn_ctx: ChatContext, new_message: ChatMessage ) -> None: # LiveKit calls this AFTER the user message is finalised but BEFORE # the LLM is invoked — the perfect window to splice in recall. user_text = new_message.text_content or "" self._pending_user_text = user_text try: # 250 ms keeps the hot path snappy; the runtime falls back to # whatever lanes responded in time and returns a composed # contextText string covering profile + working + permanent + org. result = await self._client.search_memory( user_text, user={"id": self._user_id}, timeout_ms=250, ) context = (result.contextText or "").strip() if context: # LiveKit issue #5053: turn_ctx.add_message(role="system",...) # orders by created_at, which lands AFTER new_message because # the user spoke earlier. The LLM needs recall FIRST, so we # splice directly into turn_ctx.items at the index immediately # before the user's message. idx = turn_ctx.items.index(new_message) turn_ctx.items.insert( idx, ChatMessage(role="system", content=[context]), ) except Exception as exc: # Never block the LLM on a memory failure — log and continue # with no recall. Timeouts, network errors, runtime restarts # all land here. logger.warning("search_memory failed: %s", exc) async def entrypoint(ctx: JobContext) -> None: await ctx.connect() # Per-user memory pin. Same identity across calls = same memory. user_id = ctx.room.local_participant.identity or "console-user" # Construct the MemoAir client for the lifetime of this LiveKit job. # MemoAirVoiceClient owns an internal runtime pool keyed by # (project_id, user.id) — pass user={...} on every search/save call # below so the pool routes correctly. aclose() flushes every pooled # runtime's outbox before tearing them down. client = MemoAirVoiceClient( api_key=os.environ["MEMOAIR_API_KEY"], project_id=os.environ["MEMOAIR_PROJECT_ID"], agent_id=os.environ["MEMOAIR_AGENT_ID"], ) try: agent = MemoAirRetrievalAgent(client, user_id=user_id) session = AgentSession( stt=deepgram.STT(), llm=openai.LLM(model="gpt-4o"), tts=openai.TTS(), turn_detection=EnglishModel(), vad=silero.VAD.load(), ) # SAVE wiring. LiveKit's Agent base class has no on_assistant_turn # hook, so we listen on AgentSession's conversation_item_added event # — the only place we reliably see a finalised assistant message. # Drop this whole block for a search-only agent. @session.on("conversation_item_added") def _on_item_added(event) -> None: # Skip non-assistant items (user turns, agent handoffs). Also # skip if we have no paired user text (interrupted turn etc.) — # otherwise save_response would land a "" user side. if ( getattr(event.item, "role", None) == "assistant" and agent._pending_user_text is not None ): user_text = agent._pending_user_text assistant_text = event.item.text_content or "" # Clear immediately so a duplicate event can't double-save. agent._pending_user_text = None # Fire-and-forget; save_response itself swallows transient # errors and never blocks the next turn. asyncio.create_task( client.save_response( user_text=user_text, assistant_text=assistant_text, user={"id": user_id}, ) ) await session.start(agent=agent, room=ctx.room) finally: await client.aclose() if __name__ == "__main__": cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))The same custom-agent shape works outside LiveKit too — the full reference is examples/livekit/voice_agents/memoair_voice_custom_agent.py in the repo. Strip the LiveKit imports and the only MemoAir calls left are client.search_memory(query, user=…) and client.save_response(…, user=…). Skip the save_response wiring for a search-only agent; client.search_memory also accepts a lanes kwarg (subset of ["profile", "working", "permanent", "org"]) if you want to gate which lanes participate per turn.
Run the agent
Start a local LiveKit server in one terminal:
livekit-server --devThen in a second terminal, download the LiveKit VAD / turn-detector models (one-time) and talk to the agent from the console:
python agent.py download-filespython agent.py consoleFor a browser UI, run python agent.py dev instead and point LiveKit Agents Playground at your local server.
How memory works
- •Search before reply. On every user turn the SDK queries the local memory runtime across four lanes (profile, working, permanent, org) and returns a composed
contextTextin well under 250 ms. - •Inject as system message. Recalled context is added to
turn_ctxas a system message right before the user's message — no tool-call round trip, no extra LLM latency. - •Save after reply. The completed (user, assistant) pair is forwarded to
save_response, which lands the turn in the working and permanent lanes. The agent's memory grows with every call. - •Lanes split by scope. Org index = project-scoped, shared across all users (the data seeded in step 4). Working + permanent = per-
user_id, isolated.
Troubleshooting
pool.exhausted / pool.port_exhausted. The client pool hit its max_concurrent_users cap (with every handle in-flight), or every port in runtime_port_range is bound. Bump the cap, widen the port range, or wait for an in-flight call to release. See the pool reference for tuning.
Save runs but nothing appears in MemoAir. Check that you actually wired conversation_item_added. LiveKit's base Agent does not call on_agent_response_completed on its own.
Conversation items don't fire on barge-in. LiveKit emits conversation_item_added only when a turn is finalised — interrupted turns may never reach this event. Track barge-in separately if you want them saved.
Search returns empty context. Expected on the first turn before any data is seeded. Run step 4 (dashboard upload or build_index.py) to populate the org lane, and let the agent run a few turns to populate the per-user lanes.