Pipecat × MemoAir Voice Memory

Add persistent voice memory to a Pipecat pipeline. The pattern is the same as for any non-LiveKit framework: construct one MemoAirVoiceClient at boot, call search_memory before the LLM stage, and save_response after the assistant stage — all with a per-call user={id,name,metadata}.

v0.3 status — use the custom-agent pattern

A first-class Pipecat adapter (memoair-pipecat) is on the roadmap for v0.4. For Pipecat today, integrate via the public MemoAirVoiceClient surface — the same pattern documented as Option B on the LiveKit page and demoed end-to-end at examples/livekit/voice_agents/memoair_voice_custom_agent.py. Strip the LiveKit imports from that example and the only MemoAir surface left is the three calls below.

Install

terminal

BASH

pip install "memoair-voice>=0.3.1" python-dotenv \
  "pipecat-ai[deepgram,cartesia,openai,silero,daily]>=0.0.55"

Environment

.env

BASH

MEMOAIR_API_KEY=memoair_pk_...
MEMOAIR_PROJECT_ID=proj_...
MEMOAIR_AGENT_ID=agent_...
OPENAI_API_KEY=sk_...
DEEPGRAM_API_KEY=...
CARTESIA_API_KEY=...

Resolve user.id from your room, websocket, or app auth at runtime and pass it on every search_memory / save_response call. Do not put a shared user ID in your env.

The MemoAir bits — three calls

MemoAir does not need a Pipecat-specific frame processor today. You wrap the LLM stage with three calls — construct, search_memory, save_response — and inject the recalled context into the LLM context aggregator the same way Pipecat memory services do (user context → memory → LLM → TTS).

pipecat_memoair.py

PYTHON

import os
from dotenv import load_dotenv
from memoair_voice import MemoAirVoiceClient
 
load_dotenv()
 
# 1. ONE client per bot process, constructed at boot.
client = MemoAirVoiceClient(
    api_key=os.environ["MEMOAIR_API_KEY"],
    project_id=os.environ["MEMOAIR_PROJECT_ID"],
    agent_id=os.environ["MEMOAIR_AGENT_ID"],
)
 
 
async def on_user_text(*, user_text: str, caller_id: str) -> str:
    # 2. Recall before the LLM stage. Returns a system-message-ready
    #    contextText covering profile + working + permanent + org.
    ctx = await client.search_memory(
        user_text,
        user={"id": caller_id},
        timeout_ms=250,
    )
    return ctx.contextText  # splice into your OpenAILLMContext system slot
 
 
async def on_assistant_text(*, user_text: str, assistant_text: str, caller_id: str) -> None:
    # 3. Persist the completed turn so memory grows over time.
    await client.save_response(
        user_text=user_text,
        assistant_text=assistant_text,
        user={"id": caller_id},
    )

Wire on_user_text into your OpenAILLMContext system message right before the LLM processor consumes the frame; wire on_assistant_text into the post-assistant aggregator. Replace OpenAILLMContext with the equivalent for your provider — the surface is provider-agnostic.

Production shape — many concurrent callers

The single client process can serve many concurrent callers. MemoAirVoiceClient owns an internal RuntimePool that spawns one voice-runtime per (project_id, user.id), capped by max_concurrent_users. No sidecar to provision; no per-call cold-start.

bridge_at_scale.py

PYTHON

client = MemoAirVoiceClient(
    api_key=os.environ["MEMOAIR_API_KEY"],
    project_id=os.environ["MEMOAIR_PROJECT_ID"],
    agent_id=os.environ["MEMOAIR_AGENT_ID"],
    max_concurrent_users=64,
    runtime_idle_ttl_s=300,
)

See the multi-workspace SaaS guide for the partner / project-per-customer pattern when one Pipecat bridge serves multiple customer workspaces.

Advanced

Need lane-level control or an LLM-decided search_memory tool? See the advanced tool surface.