Home/Documentation

⌘K

LiveKit Agents

LiveKit × MemoAir Voice Memory

Add persistent voice memory to a LiveKit Agents pipeline. MemoAir ships Python and TypeScript SDKs — the local memory runtime is bundled and auto-launched on first use, so there is no Docker step. Search results are injected before each LLM reply, and completed turns are saved back.

Choose your path

TypeScript worker

Use memoair-livekit with the Node @livekit/agents SDK.

Python worker

Use memoair-livekit with Python LiveKit Agents.

Custom framework

Use memoair-voice directly in your own pipeline.

Install

Pick the package family for your worker language. Both use the same local memory runtime contract — no Docker required.

terminal

BASH

# TypeScript / Node LiveKit Agents
npm install memoair-voice memoair-livekit @livekit/agents
 
# Python LiveKit Agents
pip install "memoair-voice>=0.3.2" "memoair-livekit>=0.3.2" \
  "livekit-agents>=1.0" \
  "livekit-plugins-openai" "livekit-plugins-deepgram" \
  "livekit-plugins-silero" "livekit-plugins-turn-detector" \
  python-dotenv

memoair-livekit depends on memoair-voice. Install both when your code also uses the raw client for index seeding, smoke tests, or custom framework logic.

Get your API key, project ID, and agent ID

API key — looks like memoair_pk_…. Account-scoped: one per account, shared across every project and agent in your org. Treat as a secret.
Project ID — looks like proj_…. Identifies the workspace. Safe to log.
Agent ID — looks like agent_…. Identifies the voice bot inside the project. Safe to log.

For per-user isolation MemoAir uses the user_id you pin at agent construction time (LiveKit dispatch is 1:1, so the LiveKit drop-in keeps single-user construction). Use the LiveKit participant identity so each caller keeps their own memory.

Environment setup

Create a .env next to your agent script:

.env

BASH

# MemoAir
MEMOAIR_API_KEY=memoair_pk_...
MEMOAIR_PROJECT_ID=proj_...
MEMOAIR_AGENT_ID=agent_...
 
# LiveKit (devkey works for local livekit-server --dev)
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
 
# AI provider keys
OPENAI_API_KEY=sk-...
DEEPGRAM_API_KEY=...

Seed the knowledge base

Static knowledge (FAQs, docs, user profile facts) lives in MemoAir's org index — project-scoped, shared across all users. Conversation turns (per-user) get saved automatically once the agent runs; you only need to seed the org index once. Pick either path below — both land in the same index.

Option A — Dashboard upload (no code)

At dashboard.memoair.space open your project, go to Knowledge → New index, name it agent-memory, and upload PDFs / text / paste raw notes. The dashboard handles chunking and embedding. Best for non-trivial corpora (PDFs, long docs, bulk ingest beyond 100 docs per call).

Option B — Code-driven seed

Good for structured facts you want to keep next to your agent code (user profile, FAQ entries, sample memories). The cloud endpoint /v1/voice-sdk/index/create is build-or-append: the first call creates the index, later calls add documents to the same index name. Limit per call: 100 docs and 1 MB total.

build_index.py

PYTHON

import asyncio
import os
from dotenv import load_dotenv
 
# MemoAirVoiceClient is the cloud + local-runtime wrapper. We only need its
# create_index() here — the same client also exposes search_memory() and
# save_response() used by the agent at runtime.
from memoair_voice import MemoAirVoiceClient
 
load_dotenv()
 
# Pick a stable name. The agent at runtime searches "all indexes" for this
# project, so the exact name doesn't have to match anything — but reusing
# the same name across runs makes "re-seed" trivial (it's build-or-append).
INDEX_NAME = "agent-memory"
 
# Each memory is a dict with three fields:
#   - id:       stable per-document key; re-using an id REPLACES that doc on
#               the next call (idempotent update).
#   - text:     the searchable content. Plain text; the cloud handles chunking
#               + embedding.
#   - metadata: optional free-form tags. Surfaces in search results so your
#               prompt can filter / cite. Keep values as strings for now.
MEMORIES = [
    {
        "id": "return-policy",
        "text": "Returns accepted within 30 days of purchase with a receipt.",
        "metadata": {"kind": "faq", "topic": "returns"},
    },
    {
        "id": "shipping",
        "text": "Standard shipping is 3-5 business days. Express is 1-2 days.",
        "metadata": {"kind": "faq", "topic": "shipping"},
    },
    {
        "id": "support",
        "text": "Support is available 24/7 at support@example.com.",
        "metadata": {"kind": "faq", "topic": "support"},
    },
]
 
 
async def main() -> None:
    # create_index is a cloud-only call — it doesn't need a runtime, so
    # the build script never opens the local pool. aclose() releases the
    # cloud HTTP client cleanly.
    client = MemoAirVoiceClient(
        api_key=os.environ["MEMOAIR_API_KEY"],
        project_id=os.environ["MEMOAIR_PROJECT_ID"],
        agent_id=os.environ["MEMOAIR_AGENT_ID"],
    )
    try:
        # First call CREATES the index. Subsequent calls with the same
        # INDEX_NAME APPEND new documents (or replace ones with the same id).
        # Cap per call: 100 docs and 1 MB total — batch larger corpora.
        result = await client.create_index(INDEX_NAME, MEMORIES)
 
        # chunk_count = how many embedding chunks the cloud produced from
        # your docs (long text gets split). version bumps every time the
        # index changes; the runtime uses it to detect when to re-pull.
        print(f"Indexed {result.chunk_count} chunks (version={result.version})")
    finally:
        await client.aclose()
 
 
if __name__ == "__main__":
    asyncio.run(main())

terminal

BASH

python build_index.py

Wire MemoAir into your agent

Pick the option that matches how much control you want. Both end up doing the same thing — search before each reply, inject context, save the completed turn.

Option A — `memoair-livekit` (drop-in)

MemoAirLiveKitAgent subclasses LiveKit's Agent base class and wires every lifecycle hook for you:

on_user_turn_completed → calls search_memory and splices a SYSTEM message before the user's message (LiveKit #5053 ordering workaround).
on_agent_response_completed → calls save_response(user_text, assistant_text).
on_enter / on_exit → manages the local runtime session (auto-spawns the bundled runtime on 127.0.0.1:7878).

agent.py

PYTHON

import asyncio
import os
from dotenv import load_dotenv
from livekit.agents import AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero
from livekit.plugins.turn_detector.english import EnglishModel
 
# MemoAirLiveKitAgent is a livekit.agents.Agent subclass — drop-in. It owns:
#   - on_enter: opens the per-(project, user) runtime via the SDK's pool
#   - on_user_turn_completed: calls search_memory and splices recall as a
#     SYSTEM message BEFORE the user message (LiveKit #5053 workaround)
#   - on_exit: tears down the runtime session cleanly
from memoair_livekit import MemoAirLiveKitAgent
 
load_dotenv()
 
 
async def entrypoint(ctx: JobContext) -> None:
    await ctx.connect()
 
    # user_id pins the per-user memory lane for this LiveKit job (LiveKit
    # dispatch is 1:1, so the drop-in keeps single-user construction).
    # Same identity across calls = same per-user memory. In real dispatch
    # this is a real string; in CLI "console" mode the LiveKit harness
    # mocks it (a MagicMock would silently corrupt the runtime identity),
    # so we fall back to a stable default that lets repeated runs reuse
    # the same memory.
    raw = ctx.room.local_participant.identity
    user_id = raw if isinstance(raw, str) and raw.strip() else "console-user"
 
    agent = MemoAirLiveKitAgent(
        api_key=os.environ["MEMOAIR_API_KEY"],
        project_id=os.environ["MEMOAIR_PROJECT_ID"],
        agent_id=os.environ["MEMOAIR_AGENT_ID"],
        user_id=user_id,
        # System prompt the LLM sees on every turn. Tell it that recall will
        # arrive as a system message — otherwise some models ignore the
        # context. Keep replies short because they're being spoken aloud.
        instructions=(
            "You are a helpful voice assistant with access to long-term memory. "
            "Relevant memories are injected as a system message before each reply — "
            "ground your answers in that context whenever it applies. Keep replies "
            "concise; the user is hearing them spoken aloud."
        ),
    )
 
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=openai.TTS(),
        turn_detection=EnglishModel(),
        vad=silero.VAD.load(),
    )
 
    # SAVE wiring (required). LiveKit's base Agent never calls
    # on_agent_response_completed itself, so MemoAirLiveKitAgent's save path
    # wouldn't fire on its own. We forward LiveKit's conversation_item_added
    # event for assistant turns, which is what triggers save_response.
    #
    # Drop this whole block to make the agent read-only (search but never
    # save). Useful if you only want to surface the pre-seeded org index.
    @session.on("conversation_item_added")
    def _on_item_added(event) -> None:
        # event.item can be a ChatMessage (has .role) or an AgentHandoff
        # (no .role, raises AttributeError). Use getattr to skip handoffs
        # silently — without this, the first handoff event at session boot
        # would crash this listener and silently disable saves for the
        # rest of the session.
        if getattr(event.item, "role", None) == "assistant":
            asyncio.create_task(
                agent.on_agent_response_completed(None, event.item)
            )
 
    await session.start(agent=agent, room=ctx.room)
 
 
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Drop the conversation_item_added block to make the agent read-only (search but never write). Reference example in the repo: examples/livekit/voice_agents/memoair_agent.py.

Option B — `memoair-voice` (full control)

Same shape as a standard retrieval integration: you subclass Agent, override on_user_turn_completed, call search_memory on the MemoAir client, and add the recalled context to turn_ctx as a system message. Use this when you want to compose your own prompt, gate which lanes get searched, or skip saving turns altogether.

agent.py

PYTHON

import asyncio
import logging
import os
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    ChatContext,
    ChatMessage,
    JobContext,
    WorkerOptions,
    cli,
)
from livekit.plugins import deepgram, openai, silero
from livekit.plugins.turn_detector.english import EnglishModel
 
# MemoAirVoiceClient is the raw client — no LiveKit assumptions. We drive
# search_memory and save_response ourselves so we can decide exactly when
# and how memory participates in each turn.
from memoair_voice import MemoAirVoiceClient
 
load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("memoair-agent")
 
 
class MemoAirRetrievalAgent(Agent):
    """LiveKit Agent that runs search_memory before every reply.
 
    Standard retrieval shape: subclass Agent, override
    on_user_turn_completed, search + inject, let LiveKit's pipeline carry
    on. The save path is wired separately from AgentSession events (see
    entrypoint() below) because LiveKit's base Agent doesn't expose a
    post-reply hook.
    """
 
    def __init__(self, client: MemoAirVoiceClient, *, user_id: str):
        super().__init__(
            instructions=(
                "You are a helpful voice assistant. Use the memory context "
                "provided in system messages to answer the user. If memory "
                "does not cover the question, answer from general knowledge."
            )
        )
        self._client = client
        self._user_id = user_id
        # Cache the user turn text so the save handler in entrypoint() can
        # pair it with the eventual assistant text. None means "no user
        # turn pending" (interrupted turn, replay, race).
        self._pending_user_text: str | None = None
 
    async def on_user_turn_completed(
        self, turn_ctx: ChatContext, new_message: ChatMessage
    ) -> None:
        # LiveKit calls this AFTER the user message is finalised but BEFORE
        # the LLM is invoked — the perfect window to splice in recall.
        user_text = new_message.text_content or ""
        self._pending_user_text = user_text
        try:
            # 250 ms keeps the hot path snappy; the runtime falls back to
            # whatever lanes responded in time and returns a composed
            # contextText string covering profile + working + permanent + org.
            result = await self._client.search_memory(
                user_text,
                user={"id": self._user_id},
                timeout_ms=250,
            )
            context = (result.contextText or "").strip()
            if context:
                # LiveKit issue #5053: turn_ctx.add_message(role="system",...)
                # orders by created_at, which lands AFTER new_message because
                # the user spoke earlier. The LLM needs recall FIRST, so we
                # splice directly into turn_ctx.items at the index immediately
                # before the user's message.
                idx = turn_ctx.items.index(new_message)
                turn_ctx.items.insert(
                    idx,
                    ChatMessage(role="system", content=[context]),
                )
        except Exception as exc:
            # Never block the LLM on a memory failure — log and continue
            # with no recall. Timeouts, network errors, runtime restarts
            # all land here.
            logger.warning("search_memory failed: %s", exc)
 
 
async def entrypoint(ctx: JobContext) -> None:
    await ctx.connect()
    # Per-user memory pin. Same identity across calls = same memory.
    user_id = ctx.room.local_participant.identity or "console-user"
 
    # Construct the MemoAir client for the lifetime of this LiveKit job.
    # MemoAirVoiceClient owns an internal runtime pool keyed by
    # (project_id, user.id) — pass user={...} on every search/save call
    # below so the pool routes correctly. aclose() flushes every pooled
    # runtime's outbox before tearing them down.
    client = MemoAirVoiceClient(
        api_key=os.environ["MEMOAIR_API_KEY"],
        project_id=os.environ["MEMOAIR_PROJECT_ID"],
        agent_id=os.environ["MEMOAIR_AGENT_ID"],
    )
    try:
        agent = MemoAirRetrievalAgent(client, user_id=user_id)
 
        session = AgentSession(
            stt=deepgram.STT(),
            llm=openai.LLM(model="gpt-4o"),
            tts=openai.TTS(),
            turn_detection=EnglishModel(),
            vad=silero.VAD.load(),
        )
 
        # SAVE wiring. LiveKit's Agent base class has no on_assistant_turn
        # hook, so we listen on AgentSession's conversation_item_added event
        # — the only place we reliably see a finalised assistant message.
        # Drop this whole block for a search-only agent.
        @session.on("conversation_item_added")
        def _on_item_added(event) -> None:
            # Skip non-assistant items (user turns, agent handoffs). Also
            # skip if we have no paired user text (interrupted turn etc.) —
            # otherwise save_response would land a "" user side.
            if (
                getattr(event.item, "role", None) == "assistant"
                and agent._pending_user_text is not None
            ):
                user_text = agent._pending_user_text
                assistant_text = event.item.text_content or ""
                # Clear immediately so a duplicate event can't double-save.
                agent._pending_user_text = None
                # Fire-and-forget; save_response itself swallows transient
                # errors and never blocks the next turn.
                asyncio.create_task(
                    client.save_response(
                        user_text=user_text,
                        assistant_text=assistant_text,
                        user={"id": user_id},
                    )
                )
 
        await session.start(agent=agent, room=ctx.room)
    finally:
        await client.aclose()
 
 
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

The same custom-agent shape works outside LiveKit too — the full reference is examples/livekit/voice_agents/memoair_voice_custom_agent.py in the repo. Strip the LiveKit imports and the only MemoAir calls left are client.search_memory(query, user=…) and client.save_response(…, user=…). Skip the save_response wiring for a search-only agent; client.search_memory also accepts a lanes kwarg (subset of ["profile", "working", "permanent", "org"]) if you want to gate which lanes participate per turn.

Run the agent

Start a local LiveKit server in one terminal:

terminal-1

BASH

livekit-server --dev

Then in a second terminal, download the LiveKit VAD / turn-detector models (one-time) and talk to the agent from the console:

terminal-2

BASH

python agent.py download-files
python agent.py console

For a browser UI, run python agent.py dev instead and point LiveKit Agents Playground at your local server.

How memory works

•Search before reply. On every user turn the SDK queries the local memory runtime across four lanes (profile, working, permanent, org) and returns a composed contextText in well under 250 ms.
•Inject as system message. Recalled context is added to turn_ctx as a system message right before the user's message — no tool-call round trip, no extra LLM latency.
•Save after reply. The completed (user, assistant) pair is forwarded to save_response, which lands the turn in the working and permanent lanes. The agent's memory grows with every call.
•Lanes split by scope. Org index = project-scoped, shared across all users (the data seeded in step 4). Working + permanent = per-user_id, isolated.

Troubleshooting

pool.exhausted / pool.port_exhausted. The client pool hit its max_concurrent_users cap (with every handle in-flight), or every port in runtime_port_range is bound. Bump the cap, widen the port range, or wait for an in-flight call to release. See the pool reference for tuning.

Save runs but nothing appears in MemoAir. Check that you actually wired conversation_item_added. LiveKit's base Agent does not call on_agent_response_completed on its own.

Conversation items don't fire on barge-in. LiveKit emits conversation_item_added only when a turn is finalised — interrupted turns may never reach this event. Track barge-in separately if you want them saved.

Search returns empty context. Expected on the first turn before any data is seeded. Run step 4 (dashboard upload or build_index.py) to populate the org lane, and let the agent run a few turns to populate the per-user lanes.

LiveKit × MemoAir Voice Memory

Choose your path

Install

Get your API key, project ID, and agent ID

Environment setup

Seed the knowledge base

Option A — Dashboard upload (no code)

Option B — Code-driven seed

Wire MemoAir into your agent

Option A — memoair-livekit (drop-in)

Option B — memoair-voice (full control)

Run the agent

How memory works

Troubleshooting

Option A — `memoair-livekit` (drop-in)

Option B — `memoair-voice` (full control)