Retell × MemoAir Voice Memory

Drive memory from a Retell custom-function webhook. Your bridge server exposes search_memory / record_turn endpoints; each handler calls the public MemoAirVoiceClient surface with a per-call user={id,name,metadata}.

v0.3 status — use the custom-agent pattern

A first-class Retell adapter (memoair-retell) is on the roadmap for v0.4. For Retell today, expose your own FastAPI webhook routes and call MemoAirVoiceClient.search_memory and MemoAirVoiceClient.save_response from inside the handlers. The LiveKit Option B walkthrough and the canonical custom-agent reference at examples/livekit/voice_agents/memoair_voice_custom_agent.py follow the same shape.

Install

terminal

BASH

pip install "memoair-voice>=0.3.1" "fastapi>=0.110" "uvicorn>=0.27" python-dotenv

Environment

.env

BASH

MEMOAIR_API_KEY=memoair_pk_...
MEMOAIR_PROJECT_ID=proj_...
MEMOAIR_AGENT_ID=agent_...
RETELL_SIGNING_SECRET=...

The MemoAir bits — three calls

Construct one client at boot, then call search_memory from your POST /retell/search_memory handler and save_response from your POST /retell/record_turn handler. Resolve user.id from the call payload (a customer ID in metadata, a phone-number hash, etc.) per request.

retell_bridge.py

PYTHON

import os
from typing import Any
from dotenv import load_dotenv
from fastapi import FastAPI
from memoair_voice import MemoAirVoiceClient
 
load_dotenv()
 
app = FastAPI(title="MemoAir Retell bridge")
 
# ONE client per bridge process. Pool fans out across concurrent callers.
client = MemoAirVoiceClient(
    api_key=os.environ["MEMOAIR_API_KEY"],
    project_id=os.environ["MEMOAIR_PROJECT_ID"],
    agent_id=os.environ["MEMOAIR_AGENT_ID"],
)
 
 
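def user_id_from_call(call: dict[str, Any]) -> str:
    """Resolve a stable MemoAir user id from the Retell call object."""
    metadata = call.get("metadata") or {}
    # Prefer an explicit customer id; otherwise fall back to the raw phone
    # numbers. Callers that resolve to "anonymous" all share one user id,
    # so set memoair_user_id in the call metadata wherever you can.
    return (
        metadata.get("memoair_user_id")
        or call.get("from_number")
        or call.get("to_number")
        or "anonymous"
    )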
 
 
@app.post("/retell/search_memory")
async def search_memory(payload: dict[str, Any]) -> dict[str, str]:
    caller_id = user_id_from_call(payload.get("call") or {})
    # "args" may be missing or null; guard it the same way as "call".
    query = (payload.get("args") or {}).get("query", "")
    ctx = await client.search_memory(
        query, user={"id": caller_id}, timeout_ms=250,
    )
    # Retell expects a string in the function-tool result envelope.
    return {"contextText": ctx.contextText}
 
 
@app.post("/retell/record_turn")
async def record_turn(payload: dict[str, Any]) -> dict[str, str]:
    caller_id = user_id_from_call(payload.get("call") or {})
    args = payload.get("args") or {}
    await client.save_response(
        user_text=args.get("user_text", ""),
        assistant_text=args.get("assistant_text", ""),
        user={"id": caller_id},
    )
    return {"status": "ok"}
 
 
@app.get("/health")
def health() -> dict[str, str]:
    return {"status": "ok"}
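
The user_id_from_call helper above falls back to the raw phone number. If you would rather key memory on a phone-number hash, a minimal sketch could look like the following. The helper name hashed_user_id, the tel_ prefix, and the MEMOAIR_USER_ID_SALT variable are hypothetical; none of them is part of memoair-voice or Retell.

```python
import hashlib
import hmac
import os
import re

# Hypothetical salt. Keep it secret and stable so a caller always maps to the same id.
SALT = os.environ.get("MEMOAIR_USER_ID_SALT", "change-me").encode()


def hashed_user_id(phone_number: str) -> str:
    """Derive a stable user id from a phone number without storing the number itself."""
    # Keep digits only so "+1 (415) 555-0100" and "+14155550100" hash identically.
    digits = re.sub(r"\D", "", phone_number)
    digest = hmac.new(SALT, digits.encode(), hashlib.sha256).hexdigest()
    # 16 hex characters keeps the id short; the "tel_" prefix is illustrative only.
    return f"tel_{digest[:16]}"
```

To use it, wrap the phone-number fallbacks in user_id_from_call with hashed_user_id(...). Changing the salt later would map returning callers to new ids, so treat it as a long-lived secret.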

Configure Retell

Add these custom functions / webhooks on your Retell agent:

search_memory → POST https://your-server.com/retell/search_memory
record_turn → POST https://your-server.com/retell/record_turn

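The search_memory function should accept a required query string. Return contextText for Retell to feed back to the model as a system-message slot. The record_turn function should accept user_text and assistant_text strings, which are the argument names the record_turn handler reads.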
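
As a sketch of the parameter definitions, assuming Retell takes a JSON Schema object for each custom function's parameters (check the Retell dashboard or API reference for the exact shape), the two schemas could be written as Python dicts and serialized to JSON. The constant names, the description strings, and the choice to mark both record_turn fields as required are assumptions of this sketch.

```python
import json

# Parameter schema for search_memory: a single required string, "query".
SEARCH_MEMORY_PARAMS = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "What to look up in memory."},
    },
    "required": ["query"],
}

# Parameter schema for record_turn, mirroring the args the bridge handler reads.
RECORD_TURN_PARAMS = {
    "type": "object",
    "properties": {
        "user_text": {"type": "string", "description": "What the caller said."},
        "assistant_text": {"type": "string", "description": "What the agent replied."},
    },
    # Marking both fields required is a choice made for this sketch.
    "required": ["user_text", "assistant_text"],
}


def as_json(schema: dict) -> str:
    """Serialize a schema for pasting into a function definition."""
    return json.dumps(schema, indent=2)
```

Calling as_json(SEARCH_MEMORY_PARAMS) produces text you can paste into the function's parameter field.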

Production shape — many concurrent callers

A Retell bridge usually serves many concurrent callers from one process. The shared MemoAirVoiceClient uses an internal RuntimePool that spawns one voice-runtime process per (project_id, user.id), capped by max_concurrent_users. Bump the cap to match the concurrency target of your Retell deployment.

bridge_at_scale.py

PYTHON

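import os

from memoair_voice import MemoAirVoiceClient

client = MemoAirVoiceClient(
    api_key=os.environ["MEMOAIR_API_KEY"],
    project_id=os.environ["MEMOAIR_PROJECT_ID"],
    agent_id=os.environ["MEMOAIR_AGENT_ID"],
    # Cap on per-user runtimes; match it to your Retell concurrency target.
    max_concurrent_users=64,
    # Idle time-to-live for a runtime, in seconds (per the parameter name).
    runtime_idle_ttl_s=300,
)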
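
To build intuition for how a cap and an idle TTL interact, here is a toy model of a pool keyed by (project_id, user.id). It is not MemoAir's RuntimePool: the class name ToyRuntimePool is hypothetical, it tracks timestamps instead of spawning processes, and evicting the least recently used entry when the cap is reached is an assumption, since this page does not say what the real pool does when full.

```python
import time
from collections import OrderedDict
from typing import Optional


class ToyRuntimePool:
    """Toy keyed pool with a size cap and an idle TTL (illustration only)."""

    def __init__(self, max_concurrent_users: int, runtime_idle_ttl_s: float) -> None:
        self.max_concurrent_users = max_concurrent_users
        self.runtime_idle_ttl_s = runtime_idle_ttl_s
        # Maps (project_id, user_id) to a last-used timestamp, least recent first.
        self._last_used: "OrderedDict[tuple, float]" = OrderedDict()

    def acquire(self, project_id: str, user_id: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        # Drop entries that have been idle for longer than the TTL.
        for key, ts in list(self._last_used.items()):
            if now - ts > self.runtime_idle_ttl_s:
                del self._last_used[key]
        key = (project_id, user_id)
        if key not in self._last_used and len(self._last_used) >= self.max_concurrent_users:
            # Assumption of this sketch: evict the least recently used entry when full.
            self._last_used.popitem(last=False)
        self._last_used[key] = now
        self._last_used.move_to_end(key)

    def __len__(self) -> int:
        return len(self._last_used)
```

In this model, a cap below your peak number of simultaneous callers forces evictions while calls are still live, which is why the cap should track your Retell concurrency target.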

See the multi-workspace SaaS guide for the partner / project-per-customer pattern when one Retell bridge fronts multiple customer workspaces.

Advanced

Need custom turn pairing, lane gating, or LLM-decided memory tools? See the advanced tool surface.