Running MemoAir for many tenants — the Fundamento pattern

When you build voice agents on behalf of your customers, you need MemoAir to scale along three axes at once: many customer workspaces, many bot personas per workspace, and many concurrent callers per bot. This guide walks through how Fundamento (a voice-agent builder) lays that out — projects per customer, agents per use-case, end-users per call — and shows the partner-API and SDK calls that make it work.

The example: Fundamento

Fundamento sells a voice-agent platform to enterprises. Three of their customers — Acme, Beta, Gamma — each get their own dedicated voice bot, their own knowledge base, and isolated per-caller memory. Mapping that onto MemoAir's identity model:

Fundamento (account)            — one MemoAir org, one memoair_pk_… key
├── Acme (project)              — proj_acme
│   └── "Returns concierge"     — agent_acme_returns
│       ├── caller hash 0xab12  — end-user (per call)
│       └── caller hash 0xcd34  — end-user (per call)
├── Beta (project)              — proj_beta
│   └── "Triage IVR"            — agent_beta_triage
│       └── caller hash 0xef56  — end-user (per call)
└── Gamma (project)             — proj_gamma
    └── "Onboarding tutor"      — agent_gamma_onboard

Account = Fundamento. One MemoAir org. One memoair_pk_… key. All partner-API calls and SDK calls authenticate with that one key.
Project = customer workspace. Acme, Beta, Gamma each get a project. Knowledge bases, profiles, permanent memory — all scoped per project, no cross-tenant leak.
Agent = voice bot. Each customer has at least one voice bot identity per use-case. Agents own their prompt, eval suite, and dashboard analytics.
End-user = caller. Per call, never per process. Owns the per-user profile and permanent lanes; routed via per-call user={id,name,metadata} through the SDK runtime pool.

Onboard a new customer (partner API)

When Fundamento signs Acme as a customer, their backend calls the MemoAir partner API to provision a project and agent. The same account API key authorises both calls; tenancy isolation comes from the project/agent IDs, not from a per-tenant key.

onboard.sh

BASH

# 1. Create the project for the customer.
curl -X POST https://backend.memoair.space/v1/projects \
  -H "Authorization: Bearer $MEMOAIR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme",
    "slug": "acme",
    "description": "Acme returns + support automation"
  }'
# → { "id": "proj_acme", ... }
 
# 2. Create the voice agent inside the project. The org is inferred from
#    the API key; you don't need to pass an org ID.
curl -X POST https://backend.memoair.space/v1/projects/proj_acme/agents \
  -H "Authorization: Bearer $MEMOAIR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Returns concierge",
    "agentType": "voice"
  }'
# → { "id": "agent_acme_returns", ... }

Persist (customer, project_id, agent_id) in your own tenant table. Your call router uses that mapping at dispatch time to pick the right MemoAir identity.
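
One way to keep that mapping is a small table keyed by your own customer slug. The sketch below uses Python's built-in sqlite3 module. The table name, column names, and helper functions are illustrative assumptions and not part of MemoAir, and it assumes one agent per customer; add a use-case column to the key if a customer runs several agents.

```python
import sqlite3
from typing import Optional, Tuple


def open_tenant_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the tenant table. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tenants (
            customer   TEXT PRIMARY KEY,
            project_id TEXT NOT NULL,
            agent_id   TEXT NOT NULL
        )
        """
    )
    return conn


def save_binding(
    conn: sqlite3.Connection, customer: str, project_id: str, agent_id: str
) -> None:
    """Store the IDs returned by the two onboarding calls."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO tenants (customer, project_id, agent_id) "
            "VALUES (?, ?, ?)",
            (customer, project_id, agent_id),
        )


def lookup_binding(
    conn: sqlite3.Connection, customer: str
) -> Optional[Tuple[str, str]]:
    """Return (project_id, agent_id) for a customer, or None if unknown."""
    row = conn.execute(
        "SELECT project_id, agent_id FROM tenants WHERE customer = ?",
        (customer,),
    ).fetchone()
    return (row[0], row[1]) if row else None
```

At dispatch time the call router would call lookup_binding with the customer slug and refuse the call if it returns None.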

Boot one client per agent

Each agent process (LiveKit worker, Pipecat bot, custom bridge) constructs one MemoAirVoiceClient at startup, bound to the customer's project and agent, and serves every concurrent caller from the SDK's internal RuntimePool. Each call passes its own user={id,name,metadata}; the pool spawns or reuses a dedicated voice-runtime per (project_id, user.id), capped by max_concurrent_users.

client_factory.py

PYTHON

from dataclasses import dataclass
from memoair_voice import MemoAirVoiceClient
 
 
@dataclass(frozen=True)
class TenantBinding:
    project_id: str
    agent_id: str
 
 
# Filled at boot from your tenant table. The same account API key works
# for every project — it's account-scoped, not per-project.
TENANTS: dict[str, TenantBinding] = {
    "acme":  TenantBinding(project_id="proj_acme",  agent_id="agent_acme_returns"),
    "beta":  TenantBinding(project_id="proj_beta",  agent_id="agent_beta_triage"),
    "gamma": TenantBinding(project_id="proj_gamma", agent_id="agent_gamma_onboard"),
}
 
 
def make_client(tenant: str, *, api_key: str) -> MemoAirVoiceClient:
    """One client per agent process. Pool fans out across end-users."""
    binding = TENANTS[tenant]
    return MemoAirVoiceClient(
        api_key=api_key,                # the single account key
        project_id=binding.project_id,  # this customer's workspace
        agent_id=binding.agent_id,      # this customer's voice bot
        max_concurrent_users=64,        # raise if your bridge fans out hard
        runtime_idle_ttl_s=300,         # default; reaper drops idle handles
    )

Route each call (per-call user_id)

Every search_memory and save_response call takes a required user={id,name,metadata} object. The SDK pools runtimes by (project_id, user.id) for you, so there is no sidecar allocation and no per-call cold start. The name and metadata are persisted as a durable session_start event so the dashboard can later render caller lists.

turn_handler.py

PYTHON

from memoair_voice import MemoAirVoiceClient


async def handle_turn(
    client: MemoAirVoiceClient,
    *,
    caller_id: str,        # phone hash, signed cookie, etc.
    caller_name: str,
    user_text: str,
) -> str:
    user = {
        "id": caller_id,
        "name": caller_name,
        "metadata": {"tenant": "acme", "source": "voice"},  # example values
    }
    # Pool routes (project_id, user.id) → dedicated voice-runtime; spawns if absent.
    ctx = await client.search_memory(
        user_text,
        user=user,
        timeout_ms=250,
    )
    # your_llm_call is a placeholder for your own LLM invocation.
    assistant_text = await your_llm_call(ctx.contextText, user_text)
 
    # Persist the turn so this caller's permanent lane grows over time.
    await client.save_response(
        user_text=user_text,
        assistant_text=assistant_text,
        user=user,
    )
    return assistant_text

The full custom-agent reference lives at examples/livekit/voice_agents/memoair_voice_custom_agent.py — it is the canonical pattern for any non-LiveKit framework (Pipecat, Vapi, Retell, custom WebRTC bridge). Strip the LiveKit plumbing and the only MemoAir surface left is MemoAirVoiceClient + search_memory(query, user=…) + save_response(…, user=…).
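
The caller_id passed to handle_turn should be stable for a returning caller without exposing the raw phone number. A minimal sketch, assuming you derive it with HMAC-SHA256 under a secret you manage yourself: the function name, the digit-only normalisation, and the 16-character truncation are illustrative choices, not MemoAir requirements.

```python
import hashlib
import hmac


def caller_id_from_phone(phone: str, *, secret: bytes, tenant: str) -> str:
    """Derive a stable, non-reversible caller ID (hypothetical helper)."""
    # Keep digits only so differently formatted copies of a number match.
    digits = "".join(ch for ch in phone if ch.isdigit())
    if not digits:
        raise ValueError("phone number contains no digits")
    # Mix in the tenant so the same phone yields different IDs per customer.
    message = f"{tenant}:{digits}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]
```

Keying the hash with a secret means the IDs cannot be recomputed from a list of phone numbers without that secret.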

Isolation guarantees

Project isolation. Acme, Beta, Gamma each have their own project ID. Knowledge, profiles, permanent memory — never crossed. Enforced cloud-side by the project ID on every request.
Agent isolation. Each project can run multiple agents (returns concierge, triage IVR, onboarding tutor). Agents share the project's knowledge but keep their own prompt, evals, and analytics — partitioned by X-Agent-Id.
User isolation. The runtime pool spawns one voice-runtime process per (project_id, user.id). Mismatched calls hit runtime.identity_mismatch — the runtime layer enforces single-tenancy even if the SDK were misused.
Disk isolation. Local projections live under {storage_root}/{project_id}/{user_id}/ and are never shared across users.
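
If your ops tooling needs to locate or purge one caller's local projection, the disk layout above maps to a path as sketched here. The function name and the ID validation are assumptions; the SDK may sanitise IDs differently, so treat this as a way to reason about the layout and not as the SDK's own logic.

```python
from pathlib import Path


def projection_dir(storage_root: str, project_id: str, user_id: str) -> Path:
    """Build {storage_root}/{project_id}/{user_id} (hypothetical helper)."""
    for label, value in (("project_id", project_id), ("user_id", user_id)):
        # Reject anything that could escape the per-user directory.
        if not value or value in {".", ".."} or "/" in value or "\\" in value:
            raise ValueError(f"unsafe {label}: {value!r}")
    return Path(storage_root) / project_id / user_id
```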

Capacity planning

max_concurrent_users: cap of live runtimes per client process. Default 20. Plan for peak concurrent callers per agent worker; LRU eviction handles the long tail (warm bootstrap on the next call).
runtime_idle_ttl_s: idle reaper threshold. Default 300 s. Drop on memory-pressured hosts; raise where most of your callers come back within a few minutes.
runtime_port_range: inclusive loopback range. Default (7878, 7977) = 100 ports. Widen on hosts with hundreds of concurrent users.
Hitting the pool cap raises pool.exhausted; running out of free ports raises pool.port_exhausted. Surface both back to your call router so it can fail fast and shed load.
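
A quick sanity check on these settings can catch a mis-sized pool before deploy. The sketch below assumes each live runtime occupies one loopback port from runtime_port_range, which the pool.port_exhausted error suggests but this guide does not state outright. The helper names are hypothetical.

```python
from typing import Tuple


def ports_in_range(port_range: Tuple[int, int]) -> int:
    """Count ports in an inclusive (low, high) range."""
    low, high = port_range
    if low > high:
        raise ValueError(f"invalid port range: {port_range}")
    return high - low + 1


def check_pool_sizing(
    max_concurrent_users: int, port_range: Tuple[int, int]
) -> None:
    """Raise if the pool cap could outgrow the port range (assumed 1 port per runtime)."""
    available = ports_in_range(port_range)
    if max_concurrent_users > available:
        raise ValueError(
            f"max_concurrent_users={max_concurrent_users} exceeds "
            f"{available} ports in {port_range}"
        )
```

With the default range (7878, 7977), ports_in_range returns 100, so a cap of 64 passes the check while a cap of 200 would need a wider range.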

Related guides

Use the framework docs for the in-call adapter shape: LiveKit, Pipecat, Vapi, and Retell. For the identity headers and four-level model, see Voice Memory concepts.

Where this lands in code

Custom-agent reference (canonical for non-LiveKit frameworks): examples/livekit/voice_agents/memoair_voice_custom_agent.py
LiveKit drop-in: examples/livekit/voice_agents/memoair_agent.py