
Platform overview

How Ajeris is put together: gateway, per-user agent containers, MCP tools, and the data flows that connect SMS, voice, and the web.

Ajeris Platform Architecture, Complete System Overview

Date: 2026-04-12
Status: Current production state (end of Plan 9 + omnichannel + work tools sessions)


1. What Ajeris Is

Ajeris is a personal AI agent platform where each user gets their own isolated agent instance. The agent manages email, calendar, ride-booking, smart home, music, shopping, web search, Slack, and any custom tool the user connects, all from a text message or voice command.

Core design principle: One agent, many surfaces. The user talks to the same agent whether they're speaking to Alexa, texting via iMessage, or (in the future) using a web dashboard. The conversation is continuous across surfaces: a voice turn can reference an SMS from 2 minutes ago, and the agent can push links to the phone while speaking aloud.


2. System Architecture

2.1 Three-layer stack

USER DEVICES
  ├── iPhone (iMessage/SMS via Twilio)
  ├── Alexa (Echo, Fire TV, Alexa app)
  └── (future: web dashboard, iOS app)
        │
        ▼
┌─────────────────────────────────────┐
│           GATEWAY SERVICE           │
│     (packages/gateway, Express)     │
│                                     │
│  /webhook/sms    ← Twilio webhook   │
│  /alexa/skill    ← Amazon Alexa     │
│  /alexa/oauth/*  ← Account linking  │
│                                     │
│  Responsibilities:                  │
│  • Validate inbound requests        │
│  • Resolve user identity            │
│  • Forward to agent                 │
│  • Format voice output              │
└──────────────┬──────────────────────┘
               │ HTTP POST /agent/voice
               │ HTTP POST /agent/sms
               ▼
┌─────────────────────────────────────┐
│           AGENT SERVICE             │
│  (packages/agent, per-user on       │
│   Railway, one container per user)  │
│                                     │
│  /agent/voice  → channel='voice'    │
│  /agent/sms    → channel='sms'      │
│                                     │
│  On each request:                   │
│  1. Load core memories (top 150)    │
│  2. Load conversation history       │
│     (last 10 turns / 30 min,        │
│      cross-surface)                 │
│  3. Load custom MCP servers         │
│  4. Assemble system prompt          │
│  5. Call Claude Agent SDK           │
│  6. Log turn to DB (channel-tagged) │
│  7. Return reply                    │
└──────────────┬──────────────────────┘
               │ stdio (child process)
               ▼
┌─────────────────────────────────────┐
│        MCP SERVER (ajeris-tools)    │
│    (packages/agent/src/mcp)         │
│                                     │
│  40+ built-in tools across:         │
│  • Memory (save/recall/search/      │
│    list/forget)                     │
│  • Gmail (inbox/read/send/reply/    │
│    search/summarize)                │
│  • Calendar (today/next/create/     │
│    cancel/free-time)                │
│  • Uber (rides + Eats deep links)   │
│  • Hue (status/rooms/lights/        │
│    scenes/activate)                 │
│  • Spotify (play/pause/skip/queue/  │
│    devices/transfer/resume)         │
│  • YouTube (search/info/channel/    │
│    subscriptions/summarize/play)    │
│  • Apple Music (search/playlists/   │
│    add-to-library/create/play)      │
│  • DoorDash (restaurants/food/      │
│    save-order/reorder)              │
│  • Slack (channels/read/unread/     │
│    send/search/react)               │
│  • push_to_phone (cross-surface     │
│    content delivery via SMS)        │
│  + WebSearch + WebFetch (built-in)  │
└─────────────────────────────────────┘
               +
┌─────────────────────────────────────┐
│     CUSTOM MCP SERVERS              │
│     (user's own, bring-your-own)    │
│                                     │
│  Example: arvexi-ops (70+ tools)    │
│  • Salesforce (pipeline, leads)     │
│  • Sentry (errors, issues)          │
│  • PostHog (active users, funnels)  │
│  • BetterStack (uptime monitors)    │
│  • Google Drive, Sheets, Calendar   │
│  • Apollo.io (prospect research)    │
│  • SEC EDGAR (financial filings)    │
│  • DocuSign, Stripe, LinkedIn, X    │
│                                     │
│  Any MCP server: stdio or HTTP      │
│  Config stored in user_mcp_servers  │
│  Credentials encrypted (pgcrypto)   │
└─────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│         POSTGRESQL DATABASE         │
│   (Prisma ORM, RLS-enforced,        │
│    pgcrypto encryption)             │
│                                     │
│  Tables:                            │
│  • users (profile, home address,    │
│    timezone, agent config)          │
│  • oauth_tokens (Google, Spotify,   │
│    Apple Music, Hue, Slack,         │
│    encrypted)                       │
│  • core_memories (facts,            │
│    preferences, routines)           │
│  • conversations (channel-tagged,   │
│    cross-surface history)           │
│  • alexa_user_links (identity       │
│    mapping for tokenless requests)  │
│  • user_mcp_servers (custom MCP     │
│    configs, encrypted env/headers)  │
│  • pending_actions, alexa_auth_     │
│    codes, alexa_verification_codes  │
└─────────────────────────────────────┘

2.2 Data flow: SMS turn

User sends iMessage "what's on my calendar"
  → Twilio receives SMS
  → POST to gateway /webhook/sms
  → Gateway validates Twilio signature
  → Gateway looks up user by phone number
  → Gateway forwards to agent POST /agent/sms
  → Agent loads: memories + history + custom MCP servers
  → Agent assembles system prompt (channel='sms')
  → Agent calls Claude SDK query() with:
      - System prompt (personality + surface inventory + memories)
      - User message (history preamble + current message)
      - MCP servers (ajeris-tools + custom servers)
  → Claude selects calendar_today tool
  → MCP server calls Google Calendar API
  → Tool returns event list with IDs
  → Claude composes SMS-formatted reply (emoji OK, markdown OK)
  → Agent logs turn to conversations (channel='sms')
  → Agent sends reply via Twilio SMS
  → User receives iMessage reply
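
The "Gateway validates Twilio signature" step can be sketched as below. This follows Twilio's documented scheme for form-encoded webhooks (HMAC-SHA1 over the full URL plus the sorted POST parameters, base64-encoded, compared against the X-Twilio-Signature header). The function names are hypothetical, and the production gateway may well rely on Twilio's own helper library instead; treat this as an illustration of the check, not the actual implementation.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Computes the value Twilio is documented to send in X-Twilio-Signature:
// HMAC-SHA1 over the webhook URL followed by each POST parameter's name
// and value (sorted by name), keyed by the account auth token, base64.
export function computeTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
): string {
  const payload = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac('sha1', authToken).update(payload, 'utf8').digest('base64');
}

// Compares in constant time so the check does not leak how many bytes matched.
export function isValidTwilioRequest(
  authToken: string,
  signatureHeader: string | undefined,
  url: string,
  params: Record<string, string>,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signatureHeader);
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The URL passed in has to be the exact public URL Twilio called, which matters when the gateway sits behind a proxy or an ngrok tunnel.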

2.3 Data flow: Voice turn

User says "hey my agent, what's on my calendar"
  → Alexa ASR transcribes speech
  → Alexa NLU matches AjerisCatchAllIntent
  → POST to gateway /alexa/skill (via ngrok/production URL)
  → Gateway validates request (signature or dev bypass)
  → Gateway resolves user:
      1A. JWT in person/user/session accessToken → verify → upsert AlexaUserLink
      1B. If no token → lookup AlexaUserLink by Alexa account ID
      1C. If no mapping → return "please link your account"
  → Gateway forwards query to agent POST /agent/voice
  → Agent loads: memories + history + custom MCP servers
  → Agent assembles system prompt (channel='voice')
  → Agent calls Claude SDK query() with unified tools
  → Claude selects calendar_today tool → Google Calendar API
  → Claude composes voice-formatted reply (NO markdown, NO emoji, NO URLs)
  → Agent logs turn to conversations (channel='voice')
  → Agent returns reply JSON
  → Gateway applies formatForVoice():
      - stripMarkdownForVoice() (bold, headers, lists → spoken text)
      - stripUrlsForVoice() (URLs → "(link sent to your phone)")
      - stripEmojiForVoice() (emoji → removed, prevents TTS reading them)
  → Gateway wraps in Alexa response envelope
  → Alexa TTS speaks the reply
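
The formatForVoice() step above can be approximated as follows. Only the four function names come from this document; the regular expressions are assumptions chosen to illustrate the behaviour described (markdown markers dropped, URLs replaced with a spoken placeholder, emoji removed), and the real filters may cover more cases.

```typescript
// Hypothetical stand-ins for the three filters named in the voice flow.

// Removes markdown syntax but keeps the words it wrapped.
export function stripMarkdownForVoice(text: string): string {
  return text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, '$1')           // [label](url) -> label
    .replace(/^#{1,6}[ \t]+/gm, '')                    // header markers
    .replace(/^[ \t]*(?:[-*•]|\d+\.)[ \t]+/gm, '')     // bullet / numbered list markers
    .replace(/\*\*(.+?)\*\*/g, '$1')                   // **bold**
    .replace(/\*([^*\s](?:[^*\n]*[^*\s])?)\*/g, '$1')  // *italic*
    .replace(/`([^`]+)`/g, '$1');                      // `code`
}

// Replaces bare URLs with a phrase that makes sense when spoken.
export function stripUrlsForVoice(text: string): string {
  return text.replace(/https?:\/\/[^\s)]+/g, '(link sent to your phone)');
}

// Removes pictographic emoji plus the joiner / variation-selector code points
// that accompany them, so TTS does not read their names aloud.
export function stripEmojiForVoice(text: string): string {
  const emoji = new RegExp('[\\p{Extended_Pictographic}\\uFE0F\\u200D]', 'gu');
  return text.replace(emoji, '');
}

// Applies the filters in order, then tidies the whitespace they leave behind.
export function formatForVoice(text: string): string {
  const spoken = stripEmojiForVoice(stripUrlsForVoice(stripMarkdownForVoice(text)));
  return spoken.replace(/[ \t]{2,}/g, ' ').trim();
}
```

Running the filter in the gateway as well as instructing the model in the system prompt gives two independent layers: the prompt usually prevents markdown and URLs, and the filter catches whatever slips through.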

2.4 Data flow: Cross-surface (voice + SMS in one conversation)

Turn 1 (SMS): "remember that my flight is at 3pm tomorrow"
  → Agent saves to core_memories
  → Agent replies on SMS: "Got it, flight at 3pm tomorrow"
  → Logged to conversations with channel='sms'

Turn 2 (Voice, 10 minutes later): "when is my flight?"
  → Agent loads conversation history from DB
      → Sees Turn 1 (SMS): "remember that my flight is at 3pm"
  → Agent also loads core_memories
      → Sees: [fact] flight at 3pm tomorrow
  → Claude answers from BOTH sources
  → Voice: "Your flight is at 3pm tomorrow"
  → Logged to conversations with channel='voice'

2.5 Data flow: Bring-your-own MCP (arvexi example)

Voice: "give me a quick briefing on arvexi"
  → Gateway → Agent
  → Agent loads custom MCP servers from user_mcp_servers table
  → Finds "Arvexi Ops" (stdio, bash -c "cd /arvexi && npx tsx mcp/src/index.ts")
  → Decrypts env vars (81 keys: Salesforce, Sentry, PostHog, etc.)
  → Claude SDK spawns TWO MCP servers:
      1. ajeris-tools (40+ built-in tools)
      2. arvexi-ops (70+ custom tools)
  → Claude sees 110+ tools total
  → For "briefing" query, Claude calls:
      - daily_briefing (arvexi-ops) → "11 orgs, 5006 leases"
      - uptime_status (arvexi-ops) → BetterStack → "all green"
      - posthog_active_users (arvexi-ops) → PostHog → "4 users today"
      - sentry_errors (arvexi-ops) → Sentry → "2 slow DB queries"
  → Claude composes spoken briefing combining all 4 tool results
  → Voice: "Platform, 11 active orgs, 5006 leases. Uptime, all green.
     Users, 4 today. Errors, slow DB query, 252 events since March..."
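
One way to turn rows from user_mcp_servers into a server map is sketched below. The row fields, the config shape, and the buildMcpServers name are all assumptions made for illustration; decryption is assumed to have happened already, and the exact option shape the Claude Agent SDK expects should be checked against its reference. The point shown is the isolation rule: each custom server receives only its own env, never a copy of the parent process environment.

```typescript
// Row shape loosely modelled on the user_mcp_servers table (field names assumed).
interface UserMcpServerRow {
  name: string;
  transport: 'stdio' | 'http';
  command?: string;
  args?: string[];
  url?: string;
  env?: Record<string, string>;      // already decrypted
  headers?: Record<string, string>;  // already decrypted
}

// Hypothetical config shape; not taken from any SDK's type definitions.
type McpServerConfig =
  | { type: 'stdio'; command: string; args: string[]; env: Record<string, string> }
  | { type: 'http'; url: string; headers: Record<string, string> };

export function buildMcpServers(
  builtIn: McpServerConfig,
  rows: UserMcpServerRow[],
): Record<string, McpServerConfig> {
  const servers: Record<string, McpServerConfig> = { 'ajeris-tools': builtIn };
  for (const row of rows) {
    // "Arvexi Ops" -> "arvexi-ops"
    const key = row.name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
    if (row.transport === 'stdio' && row.command) {
      // Only the row's own env is passed; nothing is copied from process.env.
      servers[key] = {
        type: 'stdio',
        command: row.command,
        args: row.args ?? [],
        env: { ...(row.env ?? {}) },
      };
    } else if (row.transport === 'http' && row.url) {
      servers[key] = { type: 'http', url: row.url, headers: { ...(row.headers ?? {}) } };
    }
    // Rows missing a command or url are skipped rather than failing the turn.
  }
  return servers;
}
```

In practice a stdio server started with a fully empty environment may not find its interpreter, so a minimal PATH would probably need to be part of the stored env.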

2.6 Data flow: push_to_phone (cross-surface delivery)

Voice: "book me an uber to the airport"
  → Agent calls uber_request_ride → gets deep link URL
  → System prompt rule: "NEVER speak a URL on voice"
  → Agent calls push_to_phone tool with the URL
  → push_to_phone sends SMS to user's phone via Twilio
  → Agent replies on voice: "Texted you the link, tap to confirm"
  → User receives SMS with tappable Uber deep link
  → User taps → Uber app opens with destination prefilled
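
A minimal sketch of the push_to_phone handler, assuming an injected SMS sender so the logic can be exercised without Twilio. The input and result shapes, the SmsSender type, and the spoken hints are hypothetical; only the tool's purpose (deliver content by SMS, then tell the voice reply what happened) comes from the flow above.

```typescript
// Hypothetical transport: in production this would wrap the Twilio client.
type SmsSender = (to: string, body: string) => Promise<void>;

interface PushToPhoneInput {
  content: string;  // URL or long text that should not be spoken
  label?: string;   // optional prefix, e.g. "Uber"
}

interface PushToPhoneResult {
  delivered: boolean;
  spokenHint: string; // what the agent can say aloud about the delivery
}

export async function pushToPhone(
  input: PushToPhoneInput,
  userPhone: string,
  sendSms: SmsSender,
): Promise<PushToPhoneResult> {
  const body = input.label ? `${input.label}: ${input.content}` : input.content;
  try {
    await sendSms(userPhone, body);
    return { delivered: true, spokenHint: 'Texted you the link, tap to confirm.' };
  } catch {
    // Report failure to the model instead of throwing, so the voice reply
    // can say the text did not go through.
    return { delivered: false, spokenHint: "I couldn't text your phone just now." };
  }
}
```

Returning a result object instead of throwing keeps the voice turn alive even when SMS delivery fails.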

3. Connected Services

3.1 Built-in integrations (OAuth during onboarding)

| Service | Auth Method | Tools | What the user says |
|---|---|---|---|
| Gmail | Google OAuth 2.0 | inbox, read, send, reply, search, summarize | "check my email", "reply to Alice" |
| Calendar | Google OAuth 2.0 | today, next, create, cancel, free-time | "what's on my calendar", "cancel the run" |
| Spotify | OAuth 2.0 + refresh | play, pause, skip, queue, devices, transfer, resume | "play John Legend", "pause" |
| Apple Music | MusicKit JWT + user token | search, playlists, add, create, play | "play Taylor Swift" |
| Hue | Cloud2Cloud OAuth | status, rooms, lights, scenes, activate | "turn off the lights", "dim to 50%" |
| Uber | Deep link (no OAuth) | request-ride, eats-search | "uber to the airport", "order chipotle" |
| YouTube | OAuth 2.0 | search, info, channel, subscriptions, summarize, play, open | "find a recipe video" |
| DoorDash | Deep link + memory | restaurants, food, save-order, reorder | "reorder my usual" |
| Slack | User token (xoxp-) | channels, read, unread, send, search, react | "check my Slack", "message Dave" |

3.2 Cross-surface tools

| Tool | Purpose |
|---|---|
| push_to_phone | Send content (URLs, long text) to the user's phone from any surface |
| WebSearch | Real-time web search via Anthropic's built-in tool |
| WebFetch | Fetch and read web pages |
| Memory (save/recall/search/list/forget) | Persistent user knowledge across all turns and surfaces |

3.3 Custom MCP servers (bring-your-own)

Users can connect any MCP server of their own. The arvexi-ops example exposes these tool namespaces:

| Tool namespace | Source API | Example tools |
|---|---|---|
| research | SEC EDGAR, Apollo.io | sec_xbrl_frames, apollo_search_people |
| pipeline | Salesforce | sf_get_pipeline, sf_create_opportunity |
| billing | Stripe, DocuSign | create_invoice, create_envelope |
| gmail | Google Gmail | gmail_list_inbox (arvexi mailboxes) |
| ops | Sentry, BetterStack, PostHog | sentry_errors, uptime_status, posthog_active_users |
| content | YouTube, LinkedIn, X | youtube_upload, linkedin_post, x_post_tweet |

4. Omnichannel Delivery System

4.1 Delivery surfaces

type DeliverySurface =
  | { kind: 'sms'; phoneNumber: string; platformNumber: string }
  | { kind: 'voice'; device: 'alexa' | 'phone-call' }
  | { kind: 'push'; pushToken: string; platform: 'ios' | 'android' }  // future
  | { kind: 'email'; address: string }                                 // future
  | { kind: 'imessage-rich'; handle: string };                         // future

4.2 How the agent picks the right surface

The system prompt includes a surface inventory:

This user is reachable through these delivery surfaces:
- sms: text message to the phone number ending in 9966 (tappable links, emoji, async ok)
- voice: Alexa skill (spoken TTS, no visuals, no taps)

The CURRENT turn arrived via: voice.

Rules:

  • Voice turn → URL in reply: Agent calls push_to_phone, says "sent you a link"
  • Voice turn → long list: Agent summarizes aloud, push_to_phone sends full list
  • SMS turn → link: Agent embeds link directly in SMS reply (user taps on phone)
  • Either turn → follow-up references other surface: History preamble carries context
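
The surface inventory shown above could be generated from the user's DeliverySurface list along these lines. The sketch narrows the type to the two surfaces that are live today; describeSurface and buildSurfaceInventory are hypothetical names, and the wording for the phone-call device is an assumption, since the document only shows the Alexa line.

```typescript
// Narrowed copy of DeliverySurface covering only the surfaces in production.
type DeliverySurface =
  | { kind: 'sms'; phoneNumber: string; platformNumber: string }
  | { kind: 'voice'; device: 'alexa' | 'phone-call' };

function describeSurface(s: DeliverySurface): string {
  switch (s.kind) {
    case 'sms':
      // Only the last four digits go into the prompt.
      return `- sms: text message to the phone number ending in ${s.phoneNumber.slice(-4)} (tappable links, emoji, async ok)`;
    case 'voice':
      return s.device === 'alexa'
        ? '- voice: Alexa skill (spoken TTS, no visuals, no taps)'
        : '- voice: phone call (spoken TTS, no visuals, no taps)'; // assumed wording
  }
}

export function buildSurfaceInventory(
  surfaces: DeliverySurface[],
  current: DeliverySurface['kind'],
): string {
  return [
    'This user is reachable through these delivery surfaces:',
    ...surfaces.map(describeSurface),
    '',
    `The CURRENT turn arrived via: ${current}.`,
  ].join('\n');
}
```

Because the switch is over a discriminated union, adding one of the future surface kinds to the type forces a matching description to be written before the code compiles under strict settings.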

4.3 Cross-surface conversation history

(recent conversation, oldest first)
[voice, ~8m ago] user: what's on my calendar today
[voice, ~8m ago] agent: You've got one thing, a run from 6 to 7.
[sms, ~2m ago] user: cancel the run
[sms, ~2m ago] agent: Done, cancelled.

(current turn, via voice)
when is my next meeting?

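This preamble is loaded from the conversations table, where every turn is tagged with its channel. The window is the last 10 turns or the last 30 minutes, whichever yields fewer turns. The preamble lives in the user message rather than the system prompt, so the system prompt stays stable across turns and prompt caching is preserved.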
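
The windowing and preamble format can be sketched as follows. The Turn shape and both function names are hypothetical, and the sketch assumes each logged row (one user or agent message) counts as one turn; if a turn means a user/agent pair in production, the cap would apply to pairs instead.

```typescript
interface Turn {
  channel: 'sms' | 'voice';
  role: 'user' | 'agent';
  content: string;
  at: Date;
}

const MAX_TURNS = 10;
const MAX_AGE_MS = 30 * 60 * 1000;

// Keeps turns from the last 30 minutes, oldest first, capped at the 10 newest.
export function windowHistory(turns: Turn[], now: Date): Turn[] {
  const recent = turns
    .filter((t) => now.getTime() - t.at.getTime() <= MAX_AGE_MS)
    .sort((a, b) => a.at.getTime() - b.at.getTime());
  return recent.slice(-MAX_TURNS);
}

// Builds the user message: channel-tagged history preamble, then the current turn.
export function buildUserMessage(
  turns: Turn[],
  current: { channel: Turn['channel']; text: string },
  now: Date,
): string {
  const lines = windowHistory(turns, now).map((t) => {
    const minutes = Math.round((now.getTime() - t.at.getTime()) / 60000);
    return `[${t.channel}, ~${minutes}m ago] ${t.role}: ${t.content}`;
  });
  return [
    '(recent conversation, oldest first)',
    ...lines,
    '',
    `(current turn, via ${current.channel})`,
    current.text,
  ].join('\n');
}
```

Applying the age filter first and the count cap second gives the "whichever is smaller" behaviour: a quiet half hour yields fewer than 10 turns, and a busy one is cut to the 10 most recent.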


5. Security Model

| Layer | Mechanism |
|---|---|
| SMS authentication | Twilio HMAC signature validation |
| Alexa authentication | JWT verification + AlexaUserLink mapping |
| Database isolation | Row-Level Security (RLS) per user |
| Token storage | pgcrypto encryption for all OAuth tokens |
| Custom MCP credentials | Encrypted env vars + headers in user_mcp_servers |
| MCP subprocess isolation | Each server gets ONLY its own env, not the parent's |
| Voice safety net | formatForVoice strips markdown, emoji, URLs before TTS |
| Financial safety | System prompt: CONFIRM required for actions >$100 |
| Process resilience | unhandledRejection handler prevents single-request crashes |
| Gateway hardening | Defensive body validation, top-level try/catch wrapper |

6. Database Schema

| Table | Purpose | Key Fields |
|---|---|---|
| users | User profile | phone, agent_name, agent_tone, home_address/lat/lng/timezone |
| oauth_tokens | Encrypted service tokens | service, access_token, refresh_token, expires_at |
| core_memories | Persistent knowledge | category, content, source, times_recalled |
| conversations | Turn log (cross-surface) | role, content, channel, model_used, tokens |
| alexa_user_links | Alexa account → internal user | alexa_user_id (PK), internal_user_id, person_id |
| user_mcp_servers | Custom MCP configs | transport, command/url, env_encrypted, tool_schema_hash |
| alexa_auth_codes | OAuth code grant | code, user_id, redirect_uri, expires_at |
| alexa_verification_codes | Phone verification | phone_number, code, attempts, expires_at |
| pending_actions | Confirmation-gated actions | action_type, payload, status, expires_at |

7. Model Routing

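function pickModelTier(message: string): 'haiku' | 'sonnet' {
  const text = message.trim().toLowerCase();

  // Sonnet (powerful, reasoning) for long messages (>160 chars).
  if (text.length > 160) return 'sonnet';

  // Sonnet for booking, composition, and planning. The patterns below are
  // illustrative stand-ins for the listed phrases; checking them before the
  // Haiku patterns is an assumption, and production rules may differ.
  const sonnetPatterns = [
    /\b(book|schedule)\b/,                 // "book an uber", "schedule a meeting"
    /\b(draft|reply to)\b/,                // "draft an email", "reply to X"
    /\b(help me plan|what should i do)\b/, // "help me plan", "what should I do"
  ];
  if (sonnetPatterns.some((p) => p.test(text))) return 'sonnet';

  // Haiku (fast, cheap) for short, single-step requests.
  const haikuPatterns = [
    /\b(turn (on|off)|dim|scene)\b/,        // smart home: "turn off lights", "dim"
    /\b(play|pause|next|volume)\b/,         // music: "play X", "pause", "next"
    /what's on my calendar|\bany emails\b/, // simple lookups
    /^(yes|no|ok)$/,                        // confirmations
    /^remember\b|^what's my\b/,             // memory: "remember X", "what's my Y"
  ];
  if (haikuPatterns.some((p) => p.test(text))) return 'haiku';

  // Everything else (default).
  return 'sonnet';
}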

8. Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Node.js + TypeScript |
| AI | Claude Agent SDK, Haiku 4.5 + Sonnet 4.6 |
| Tools | MCP protocol (stdio + Streamable HTTP) |
| Database | PostgreSQL + Prisma + pgcrypto + RLS |
| SMS | Twilio Messaging Service |
| Voice | Alexa Skills Kit (custom HTTPS endpoint) |
| Gateway | Express.js |
| Agent | Express.js (per-user container) |
| Hosting | Railway (containers) + ngrok (dev) |
| Testing | Vitest (371 tests, 0 failures) |