Retell × MemoAir Voice Memory
Drive memory from a Retell custom-function webhook. Your bridge server exposes search_memory / record_turn endpoints; each handler calls the public MemoAirVoiceClient surface with a per-call user={id,name,metadata}.
v0.3 status — use the custom-agent pattern
A first-class Retell adapter (memoair-retell) is on the roadmap for v0.4. For Retell today, expose your own FastAPI webhook routes and call MemoAirVoiceClient.search_memory + MemoAirVoiceClient.save_response from inside the handlers — see the LiveKit Option B walkthrough and the canonical custom-agent reference at examples/livekit/voice_agents/memoair_voice_custom_agent.py for the same shape.
Install
    pip install "memoair-voice>=0.3.1" "fastapi>=0.110" "uvicorn>=0.27" python-dotenv
Environment
    MEMOAIR_API_KEY=memoair_pk_...
    MEMOAIR_PROJECT_ID=proj_...
    MEMOAIR_AGENT_ID=agent_...
    RETELL_SIGNING_SECRET=...
The MemoAir bits — three calls
Construct one client at boot, then call search_memory from your POST /retell/search_memory handler and save_response from your POST /retell/record_turn handler. Resolve user.id from the call payload (a customer ID in metadata, a phone-number hash, etc.) per request.
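The "phone-number hash" option mentioned above could look roughly like the sketch below. It is a minimal illustration, not part of memoair-voice: the helper name hashed_caller_id, the salt parameter, and the caller_ prefix are all hypothetical. It normalizes the number to digits so that formatting differences map to the same user.id, then hashes it so the raw number is not stored as the identifier.

```python
import hashlib
import re


def hashed_caller_id(phone_number: str, salt: str = "") -> str:
    """Derive a stable user ID from a phone number (hypothetical helper)."""
    # Keep digits only, so "+1 (415) 555-0100" and "+14155550100" match.
    digits = re.sub(r"\D", "", phone_number)
    if not digits:
        # Nothing usable in the input: fall back to a shared placeholder ID.
        return "anonymous"
    # Salted SHA-256 of the normalized number; truncated for readability.
    digest = hashlib.sha256((salt + digits).encode("utf-8")).hexdigest()
    return f"caller_{digest[:32]}"
```

Phone numbers have little entropy, so an unsalted hash can be reversed by enumeration. If the hash is meant to protect the number, keep the salt secret and stable across deployments, since changing it changes every caller's user.id.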
    import os
    from typing import Any

    from dotenv import load_dotenv
    from fastapi import FastAPI
    from memoair_voice import MemoAirVoiceClient

    load_dotenv()

    app = FastAPI(title="MemoAir Retell bridge")

    # ONE client per bridge process. Pool fans out across concurrent callers.
    client = MemoAirVoiceClient(
        api_key=os.environ["MEMOAIR_API_KEY"],
        project_id=os.environ["MEMOAIR_PROJECT_ID"],
        agent_id=os.environ["MEMOAIR_AGENT_ID"],
    )


    def user_id_from_call(call: dict[str, Any]) -> str:
        metadata = call.get("metadata") or {}
        return (
            metadata.get("memoair_user_id")
            or call.get("from_number")
            or call.get("to_number")
            or "anonymous"
        )


    @app.post("/retell/search_memory")
    async def search_memory(payload: dict[str, Any]) -> dict[str, str]:
        caller_id = user_id_from_call(payload.get("call") or {})
        query = payload.get("args", {}).get("query", "")
        ctx = await client.search_memory(
            query,
            user={"id": caller_id},
            timeout_ms=250,
        )
        # Retell expects a string in the function-tool result envelope.
        return {"contextText": ctx.contextText}


    @app.post("/retell/record_turn")
    async def record_turn(payload: dict[str, Any]) -> dict[str, str]:
        caller_id = user_id_from_call(payload.get("call") or {})
        args = payload.get("args", {})
        await client.save_response(
            user_text=args.get("user_text", ""),
            assistant_text=args.get("assistant_text", ""),
            user={"id": caller_id},
        )
        return {"status": "ok"}


    @app.get("/health")
    def health() -> dict[str, str]:
        return {"status": "ok"}
Configure Retell
Add these custom functions / webhooks on your Retell agent:
- search_memory → POST https://your-server.com/retell/search_memory
- record_turn → POST https://your-server.com/retell/record_turn
The search_memory function should accept a required query string. Return contextText for Retell to feed back to the model as a system-message slot.
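As a rough illustration of the "required query string" contract, the parameters for search_memory can be written as a JSON Schema object. The sketch below builds it as a Python dict and adds a small bridge-side check. The names SEARCH_MEMORY_PARAMETERS and missing_required are hypothetical, and the exact shape Retell expects for a function's parameter definition is an assumption that should be checked against Retell's own documentation.

```python
import json

# Assumed JSON Schema for the search_memory function's arguments.
SEARCH_MEMORY_PARAMETERS = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "description": "What to look up in the caller's memory.",
        },
    },
    "required": ["query"],
}

# Serialized form, e.g. for pasting into a function definition.
SCHEMA_JSON = json.dumps(SEARCH_MEMORY_PARAMETERS, indent=2)


def missing_required(args: dict) -> list:
    """Return required argument names that are absent or blank strings."""
    return [
        name
        for name in SEARCH_MEMORY_PARAMETERS["required"]
        if not isinstance(args.get(name), str) or not args[name].strip()
    ]
```

A handler could call missing_required(payload.get("args") or {}) and return an empty contextText when the list is non-empty, rather than sending a blank query to search_memory.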
Production shape — many concurrent callers
A Retell bridge usually serves many concurrent callers from one process. The shared MemoAirVoiceClient uses an internal RuntimePool that spawns one voice-runtime process per (project_id, user.id), capped by max_concurrent_users. Bump the cap to match the concurrency target of your Retell deployment.
    client = MemoAirVoiceClient(
        api_key=os.environ["MEMOAIR_API_KEY"],
        project_id=os.environ["MEMOAIR_PROJECT_ID"],
        agent_id=os.environ["MEMOAIR_AGENT_ID"],
        max_concurrent_users=64,
        runtime_idle_ttl_s=300,
    )
See the multi-workspace SaaS guide for the partner / project-per-customer pattern when one Retell bridge fronts multiple customer workspaces.
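To make the roles of max_concurrent_users and runtime_idle_ttl_s concrete, here is a toy model of a pool keyed by (project_id, user.id). It is not MemoAir's RuntimePool. The class name, the eviction of idle slots on acquire, and raising an error at the cap are all assumptions made for illustration, and the real pool may queue or evict differently.

```python
import time
from collections import OrderedDict
from typing import Optional


class ToyRuntimePool:
    """Illustrative keyed pool; NOT the library's actual RuntimePool."""

    def __init__(self, max_concurrent_users: int, runtime_idle_ttl_s: float) -> None:
        self.max_concurrent_users = max_concurrent_users
        self.runtime_idle_ttl_s = runtime_idle_ttl_s
        # Maps (project_id, user_id) to the time the slot was last used.
        self._last_used = OrderedDict()

    def acquire(self, project_id: str, user_id: str, now: Optional[float] = None) -> bool:
        """Touch the slot for this key; return True if a new slot was created."""
        now = time.monotonic() if now is None else now
        # Assumed policy: drop slots that have been idle longer than the TTL.
        for key, last in list(self._last_used.items()):
            if now - last > self.runtime_idle_ttl_s:
                del self._last_used[key]
        key = (project_id, user_id)
        if key in self._last_used:
            # Existing caller: refresh the timestamp and reuse the slot.
            self._last_used[key] = now
            self._last_used.move_to_end(key)
            return False
        if len(self._last_used) >= self.max_concurrent_users:
            # Assumed behavior at the cap; a real pool might wait instead.
            raise RuntimeError("max_concurrent_users reached")
        self._last_used[key] = now
        return True

    def __len__(self) -> int:
        return len(self._last_used)
```

In this model, a cap below the number of callers active within one TTL window causes new callers to be rejected, which is why the cap should track the deployment's peak concurrency rather than its average.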