Ajeris Platform Architecture: Complete System Overview
Date: 2026-04-12
Status: Current production state (end of Plan 9 + omnichannel + work tools sessions)
1. What Ajeris Is
Ajeris is a personal AI agent platform where each user gets their own isolated agent instance. The agent manages email, calendar, ride-booking, smart home, music, shopping, web search, Slack, and any custom tool the user connects, all from a text message or voice command.
Core design principle: One agent, many surfaces. The user talks to the same agent whether they're speaking to Alexa, texting via iMessage, or (in the future) using a web dashboard. The conversation is continuous across surfaces: a voice turn can reference an SMS from 2 minutes ago, and the agent can push links to the phone while speaking aloud.
2. System Architecture
2.1 Three-layer stack
USER DEVICES
├── iPhone (iMessage/SMS via Twilio)
├── Alexa (Echo, Fire TV, Alexa app)
└── (future: web dashboard, iOS app)
│
▼
┌─────────────────────────────────────┐
│ GATEWAY SERVICE │
│ (packages/gateway, Express) │
│ │
│ /webhook/sms ← Twilio webhook │
│ /alexa/skill ← Amazon Alexa │
│ /alexa/oauth/* ← Account linking │
│ │
│ Responsibilities: │
│ • Validate inbound requests │
│ • Resolve user identity │
│ • Forward to agent │
│ • Format voice output │
└──────────────┬──────────────────────┘
│ HTTP POST /agent/voice
│ HTTP POST /agent/sms
▼
┌─────────────────────────────────────┐
│ AGENT SERVICE │
│ (packages/agent, per-user on │
│ Railway, one container per user) │
│ │
│ /agent/voice → channel='voice' │
│ /agent/sms → channel='sms' │
│ │
│ On each request: │
│ 1. Load core memories (top 150) │
│ 2. Load conversation history │
│ (last 10 turns / 30 min, │
│ cross-surface) │
│ 3. Load custom MCP servers │
│ 4. Assemble system prompt │
│ 5. Call Claude Agent SDK │
│ 6. Log turn to DB (channel-tagged) │
│ 7. Return reply │
└──────────────┬──────────────────────┘
│ stdio (child process)
▼
┌─────────────────────────────────────┐
│ MCP SERVER (ajeris-tools) │
│ (packages/agent/src/mcp) │
│ │
│ 40+ built-in tools across: │
│ • Memory (save/recall/search/ │
│ list/forget) │
│ • Gmail (inbox/read/send/reply/ │
│ search/summarize) │
│ • Calendar (today/next/create/ │
│ cancel/free-time) │
│ • Uber (rides + Eats deep links) │
│ • Hue (status/rooms/lights/ │
│ scenes/activate) │
│ • Spotify (play/pause/skip/queue/ │
│ devices/transfer/resume) │
│ • YouTube (search/info/channel/ │
│ subscriptions/summarize/play) │
│ • Apple Music (search/playlists/ │
│ add-to-library/create/play) │
│ • DoorDash (restaurants/food/ │
│ save-order/reorder) │
│ • Slack (channels/read/unread/ │
│ send/search/react) │
│ • push_to_phone (cross-surface │
│ content delivery via SMS) │
│ + WebSearch + WebFetch (built-in) │
└─────────────────────────────────────┘
+
┌─────────────────────────────────────┐
│ CUSTOM MCP SERVERS │
│ (user's own, bring-your-own) │
│ │
│ Example: arvexi-ops (70+ tools) │
│ • Salesforce (pipeline, leads) │
│ • Sentry (errors, issues) │
│ • PostHog (active users, funnels) │
│ • BetterStack (uptime monitors) │
│ • Google Drive, Sheets, Calendar │
│ • Apollo.io (prospect research) │
│ • SEC EDGAR (financial filings) │
│ • DocuSign, Stripe, LinkedIn, X │
│ │
│ Any MCP server: stdio or HTTP │
│ Config stored in user_mcp_servers │
│ Credentials encrypted (pgcrypto) │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ POSTGRESQL DATABASE │
│ (Prisma ORM, RLS-enforced, │
│ pgcrypto encryption) │
│ │
│ Tables: │
│ • users (profile, home address, │
│ timezone, agent config) │
│ • oauth_tokens (Google, Spotify, │
│ Apple Music, Hue, Slack, │
│ encrypted) │
│ • core_memories (facts, │
│ preferences, routines) │
│ • conversations (channel-tagged, │
│ cross-surface history) │
│ • alexa_user_links (identity │
│ mapping for tokenless requests) │
│ • user_mcp_servers (custom MCP │
│ configs, encrypted env/headers) │
│ • pending_actions, alexa_auth_codes, │
│ alexa_verification_codes │
└─────────────────────────────────────┘
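The seven numbered steps in the Agent Service box amount to a small request pipeline. The TypeScript below is a minimal sketch of that ordering, not the actual `packages/agent` code: `AgentDeps`, `handleAgentRequest`, and `runModel` (a stand-in for the Claude Agent SDK `query()` call) are hypothetical names, and dependencies are injected so the flow can run without Postgres or the SDK.

```typescript
type Channel = "voice" | "sms";

interface Turn {
  channel: Channel;
  role: "user" | "agent";
  content: string;
  at: Date;
}

// Hypothetical dependency surface; production code talks to Prisma and the SDK.
interface AgentDeps {
  loadCoreMemories(userId: string, limit: number): Promise<string[]>;
  loadHistory(userId: string): Promise<Turn[]>;
  loadCustomMcpServers(userId: string): Promise<Record<string, unknown>>;
  runModel(input: {
    systemPrompt: string;
    userMessage: string;
    mcpServers: Record<string, unknown>;
  }): Promise<string>;
  logTurn(userId: string, turn: Turn): Promise<void>;
}

async function handleAgentRequest(
  deps: AgentDeps,
  userId: string,
  channel: Channel,
  message: string,
): Promise<string> {
  // Steps 1-3: independent loads, run concurrently in this sketch.
  const [memories, history, customServers] = await Promise.all([
    deps.loadCoreMemories(userId, 150),
    deps.loadHistory(userId),
    deps.loadCustomMcpServers(userId),
  ]);

  // Step 4: memories and current channel go in the system prompt
  // (personality and surface inventory omitted here); history goes in the user message.
  const systemPrompt = [...memories, `The CURRENT turn arrived via: ${channel}.`].join("\n");
  const userMessage = [
    ...history.map((t) => `[${t.channel}] ${t.role}: ${t.content}`),
    `(current turn, via ${channel})`,
    message,
  ].join("\n");

  // Step 5: model call with the built-in server plus any custom servers.
  const reply = await deps.runModel({
    systemPrompt,
    userMessage,
    mcpServers: { "ajeris-tools": {}, ...customServers }, // {} is a placeholder config
  });

  // Step 6: log both sides of the turn, tagged with the channel.
  const now = new Date();
  await deps.logTurn(userId, { channel, role: "user", content: message, at: now });
  await deps.logTurn(userId, { channel, role: "agent", content: reply, at: now });

  // Step 7: hand the reply back to the gateway.
  return reply;
}
```

Running the three loads concurrently is a choice made for this sketch; the diagram only fixes their order relative to prompt assembly.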
2.2 Data flow: SMS turn
User sends iMessage "what's on my calendar"
→ Twilio receives SMS
→ POST to gateway /webhook/sms
→ Gateway validates Twilio signature
→ Gateway looks up user by phone number
→ Gateway forwards to agent POST /agent/sms
→ Agent loads: memories + history + custom MCP servers
→ Agent assembles system prompt (channel='sms')
→ Agent calls Claude SDK query() with:
- System prompt (personality + surface inventory + memories)
- User message (history preamble + current message)
- MCP servers (ajeris-tools + custom servers)
→ Claude selects calendar_today tool
→ MCP server calls Google Calendar API
→ Tool returns event list with IDs
→ Claude composes SMS-formatted reply (emoji OK, markdown OK)
→ Agent logs turn to conversations (channel='sms')
→ Agent sends reply via Twilio SMS
→ User receives iMessage reply
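The signature check in this flow follows Twilio's documented scheme: append each POST parameter name and value, sorted by name, to the full webhook URL, HMAC-SHA1 the result with the account auth token, Base64-encode it, and compare against the `X-Twilio-Signature` header. The sketch below spells that out with `node:crypto`; the function names are hypothetical, and the official `twilio` Node library offers `validateRequest` for the same check.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Expected X-Twilio-Signature for a form-encoded POST webhook.
function computeTwilioSignature(
  authToken: string,
  url: string, // full public URL Twilio called, including any query string
  params: Record<string, string>,
): string {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

function isValidTwilioRequest(
  authToken: string,
  signatureHeader: string | undefined,
  url: string,
  params: Record<string, string>,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Behind a proxy (Railway, ngrok), the URL must be the public one Twilio called rather than the internal address the gateway sees, or valid requests will be rejected.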
2.3 Data flow: Voice turn
User says "hey my agent, what's on my calendar"
→ Alexa ASR transcribes speech
→ Alexa NLU matches AjerisCatchAllIntent
→ POST to gateway /alexa/skill (via ngrok/production URL)
→ Gateway validates request (signature or dev bypass)
→ Gateway resolves user:
1A. JWT in person/user/session accessToken → verify → upsert AlexaUserLink
1B. If no token → lookup AlexaUserLink by Alexa account ID
1C. If no mapping → return "please link your account"
→ Gateway forwards query to agent POST /agent/voice
→ Agent loads: memories + history + custom MCP servers
→ Agent assembles system prompt (channel='voice')
→ Agent calls Claude SDK query() with unified tools
→ Claude selects calendar_today tool → Google Calendar API
→ Claude composes voice-formatted reply (NO markdown, NO emoji, NO URLs)
→ Agent logs turn to conversations (channel='voice')
→ Agent returns reply JSON
→ Gateway applies formatForVoice():
- stripMarkdownForVoice() (bold, headers, lists → spoken text)
- stripUrlsForVoice() (URLs → "(link sent to your phone)")
- stripEmojiForVoice() (emoji → removed, prevents TTS reading them)
→ Gateway wraps in Alexa response envelope
→ Alexa TTS speaks the reply
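The three `formatForVoice()` passes could look roughly like the following. Only the function names come from this document; the regexes are an illustrative assumption and do not cover every Markdown construct. Markdown links are rewritten to `text url` so the URL pass still replaces them with the phone-delivery phrase.

```typescript
function stripMarkdownForVoice(text: string): string {
  return text
    .replace(/^#{1,6}[ \t]+/gm, "")                  // "## Today" -> "Today"
    .replace(/^[ \t]*[-*+][ \t]+/gm, "")             // bullet markers
    .replace(/^[ \t]*\d+\.[ \t]+/gm, "")             // numbered-list markers
    .replace(/\*\*(.+?)\*\*|__(.+?)__/g, "$1$2")     // bold
    .replace(/\*(.+?)\*/g, "$1")                     // *italics*
    .replace(/(?<!\w)_([^_\n]+)_(?!\w)/g, "$1")      // _italics_, leaves snake_case alone
    .replace(/`([^`]+)`/g, "$1")                     // inline code
    .replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, "$1 $2"); // [text](url) -> "text url"
}

function stripUrlsForVoice(text: string): string {
  // Only http(s) URLs are caught in this sketch.
  return text.replace(/https?:\/\/\S+/g, "(link sent to your phone)");
}

function stripEmojiForVoice(text: string): string {
  // Pictographs plus the ZWJ, variation selector, and skin-tone modifiers that join them.
  return text.replace(/[\p{Extended_Pictographic}\p{Emoji_Modifier}\u{200D}\u{FE0F}]/gu, "");
}

function formatForVoice(reply: string): string {
  const text = stripEmojiForVoice(stripUrlsForVoice(stripMarkdownForVoice(reply)));
  // Collapse gaps left by removed characters.
  return text.replace(/[ \t]{2,}/g, " ").replace(/[ \t]+$/gm, "").trim();
}
```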
2.4 Data flow: Cross-surface (voice + SMS in one conversation)
Turn 1 (SMS): "remember that my flight is at 3pm tomorrow"
→ Agent saves to core_memories
→ Agent replies on SMS: "Got it, flight at 3pm tomorrow"
→ Logged to conversations with channel='sms'
Turn 2 (Voice, 10 minutes later): "when is my flight?"
→ Agent loads conversation history from DB
→ Sees Turn 1 (SMS): "remember that my flight is at 3pm"
→ Agent also loads core_memories
→ Sees: [fact] flight at 3pm tomorrow
→ Claude answers from BOTH sources
→ Voice: "Your flight is at 3pm tomorrow"
→ Logged to conversations with channel='voice'
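Turn 2 is answered from two inputs: the SMS turn in the history preamble and the memory line `[fact] flight at 3pm tomorrow` in the system prompt. A minimal sketch of the memory side follows, assuming "top 150" means the 150 most-recalled rows by `times_recalled` (the document does not say how memories are ranked); `CoreMemory` and `renderCoreMemories` are hypothetical names.

```typescript
interface CoreMemory {
  category: string;      // e.g. "fact", "preference", "routine"
  content: string;
  timesRecalled: number; // core_memories.times_recalled
}

// ASSUMPTION: ranking by recall count; the real ordering may differ.
function renderCoreMemories(memories: CoreMemory[], limit = 150): string {
  return [...memories]
    .sort((a, b) => b.timesRecalled - a.timesRecalled)
    .slice(0, limit)
    .map((m) => `[${m.category}] ${m.content}`)
    .join("\n");
}
```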
2.5 Data flow: Bring-your-own MCP (arvexi example)
Voice: "give me a quick briefing on arvexi"
→ Gateway → Agent
→ Agent loads custom MCP servers from user_mcp_servers table
→ Finds "Arvexi Ops" (stdio, bash -c "cd /arvexi && npx tsx mcp/src/index.ts")
→ Decrypts env vars (81 keys: Salesforce, Sentry, PostHog, etc.)
→ Claude SDK spawns TWO MCP servers:
1. ajeris-tools (40+ built-in tools)
2. arvexi-ops (70+ custom tools)
→ Claude sees over 100 tools total
→ For "briefing" query, Claude calls:
- daily_briefing (arvexi-ops) → "11 orgs, 5006 leases"
- uptime_status (arvexi-ops) → BetterStack → "all green"
- posthog_active_users (arvexi-ops) → PostHog → "4 users today"
- sentry_errors (arvexi-ops) → Sentry → "2 slow DB queries"
→ Claude composes spoken briefing combining all 4 tool results
→ Voice: "Platform, 11 active orgs, 5006 leases. Uptime, all green.
Users, 4 today. Errors, slow DB query, 252 events since March..."
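Loading a bring-your-own server means turning a `user_mcp_servers` row into a server entry for the SDK call. The sketch assumes a decoded row shape (`UserMcpServerRow`) and a `decrypt` callback standing in for the pgcrypto step, both hypothetical. The output objects follow the stdio (`command`/`args`/`env`) and HTTP (`url`/`headers`) server configs that the Claude Agent SDK's `mcpServers` option accepts; check field names against the SDK reference. Each stdio server receives only its own decrypted env, matching the isolation rule in section 5.

```typescript
type Transport = "stdio" | "http";

// Hypothetical decoded shape of a user_mcp_servers row.
interface UserMcpServerRow {
  name: string;
  transport: Transport;
  command?: string;          // stdio only, e.g. "bash"
  args?: string[];           // stdio only
  url?: string;              // http only
  envEncrypted?: string;     // ciphertext of a JSON object of env vars
  headersEncrypted?: string; // ciphertext of a JSON object of headers
}

type StdioConfig = { type: "stdio"; command: string; args: string[]; env: Record<string, string> };
type HttpConfig = { type: "http"; url: string; headers: Record<string, string> };

function buildCustomMcpServers(
  rows: UserMcpServerRow[],
  decrypt: (ciphertext: string) => string, // stand-in for pgcrypto decryption
): Record<string, StdioConfig | HttpConfig> {
  const decode = (c?: string): Record<string, string> => (c ? JSON.parse(decrypt(c)) : {});
  const servers: Record<string, StdioConfig | HttpConfig> = {};
  for (const row of rows) {
    if (row.transport === "stdio") {
      if (!row.command) throw new Error(`stdio server ${row.name} has no command`);
      // Only this server's decrypted vars; nothing from process.env is merged in.
      servers[row.name] = {
        type: "stdio",
        command: row.command,
        args: row.args ?? [],
        env: decode(row.envEncrypted),
      };
    } else {
      if (!row.url) throw new Error(`http server ${row.name} has no url`);
      servers[row.name] = { type: "http", url: row.url, headers: decode(row.headersEncrypted) };
    }
  }
  return servers;
}
```

The resulting map would sit alongside the built-in `ajeris-tools` entry in the `query()` call.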
2.6 Data flow: push_to_phone (cross-surface delivery)
Voice: "book me an uber to the airport"
→ Agent calls uber_request_ride → gets deep link URL
→ System prompt rule: "NEVER speak a URL on voice"
→ Agent calls push_to_phone tool with the URL
→ push_to_phone sends SMS to user's phone via Twilio
→ Agent replies on voice: "Texted you the link, tap to confirm"
→ User receives SMS with tappable Uber deep link
→ User taps → Uber app opens with destination prefilled
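A `push_to_phone` handler mostly needs the user's number, the platform's Twilio number, and an SMS sender. In this sketch `PushContext`, `sendSms`, and the optional `label` field are assumptions; the return value uses the MCP tool-result shape (a `content` array of text parts).

```typescript
interface PushToPhoneInput {
  content: string; // URL or long text to deliver
  label?: string;  // hypothetical short caption, e.g. "Uber to the airport"
}

// Hypothetical context: numbers from the users table / config, sender wraps Twilio.
interface PushContext {
  userPhone: string;
  platformNumber: string;
  sendSms: (to: string, from: string, body: string) => Promise<void>;
}

async function pushToPhone(ctx: PushContext, input: PushToPhoneInput) {
  const body = input.label ? `${input.label}\n${input.content}` : input.content;
  await ctx.sendSms(ctx.userPhone, ctx.platformNumber, body);
  // The result tells the model delivery happened without repeating the content.
  return { content: [{ type: "text" as const, text: "Sent to the user's phone by SMS." }] };
}
```

The tool result deliberately does not echo the URL back, so the model has nothing to read aloud on a voice turn.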
3. Connected Services
3.1 Built-in integrations (OAuth during onboarding)
| Service | Auth Method | Tools | What the user says |
|---|---|---|---|
| Gmail | Google OAuth 2.0 | inbox, read, send, reply, search, summarize | "check my email", "reply to Alice" |
| Calendar | Google OAuth 2.0 | today, next, create, cancel, free-time | "what's on my calendar", "cancel the run" |
| Spotify | OAuth 2.0 + refresh | play, pause, skip, queue, devices, transfer, resume | "play John Legend", "pause" |
| Apple Music | MusicKit JWT + user token | search, playlists, add, create, play | "play Taylor Swift" |
| Hue | Cloud2Cloud OAuth | status, rooms, lights, scenes, activate | "turn off the lights", "dim to 50%" |
| Uber | Deep link (no OAuth) | request-ride, eats-search | "uber to the airport", "order chipotle" |
| YouTube | OAuth 2.0 | search, info, channel, subscriptions, summarize, play, open | "find a recipe video" |
| DoorDash | Deep link + memory | restaurants, food, save-order, reorder | "reorder my usual" |
| Slack | User token (xoxp-) | channels, read, unread, send, search, react | "check my Slack", "message Dave" |
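Several rows above rely on stored OAuth tokens, and Spotify is marked "+ refresh". A minimal refresh-before-use helper over an `oauth_tokens` row might look like this; `getValidAccessToken`, the `refresh`/`save` callbacks, and the 60-second early-refresh margin are assumptions, and a `null` expiry stands for a token that does not expire.

```typescript
interface OAuthTokenRow {
  service: string;
  accessToken: string;
  refreshToken: string | null;
  expiresAt: Date | null; // oauth_tokens.expires_at; null = no expiry
}

type Refresher = (
  refreshToken: string,
) => Promise<{ accessToken: string; expiresInSec: number; refreshToken?: string }>;

const EXPIRY_SKEW_MS = 60_000; // ASSUMPTION: refresh a minute early to avoid mid-call expiry

async function getValidAccessToken(
  row: OAuthTokenRow,
  refresh: Refresher,
  save: (row: OAuthTokenRow) => Promise<void>,
  now: Date = new Date(),
): Promise<string> {
  const fresh = row.expiresAt === null || row.expiresAt.getTime() - EXPIRY_SKEW_MS > now.getTime();
  if (fresh) return row.accessToken;
  if (!row.refreshToken) throw new Error(`${row.service} token expired and no refresh token is stored`);
  const r = await refresh(row.refreshToken);
  const updated: OAuthTokenRow = {
    ...row,
    accessToken: r.accessToken,
    refreshToken: r.refreshToken ?? row.refreshToken, // some providers rotate refresh tokens
    expiresAt: new Date(now.getTime() + r.expiresInSec * 1000),
  };
  await save(updated); // would be written back encrypted
  return updated.accessToken;
}
```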
3.2 Cross-surface tools
| Tool | Purpose |
|---|---|
| push_to_phone | Send content (URLs, long text) to the user's phone from any surface |
| WebSearch | Real-time web search via Anthropic's built-in tool |
| WebFetch | Fetch and read web pages |
| Memory (save/recall/search/list/forget) | Persistent user knowledge across all turns and surfaces |
3.3 Custom MCP servers (bring-your-own)
Any MCP server the user connects. Example with arvexi-ops:
| Tool namespace | Source API | Example tools |
|---|---|---|
| research | SEC EDGAR, Apollo.io | sec_xbrl_frames, apollo_search_people |
| pipeline | Salesforce | sf_get_pipeline, sf_create_opportunity |
| billing | Stripe, DocuSign | create_invoice, create_envelope |
| gmail | Google Gmail | gmail_list_inbox (arvexi mailboxes) |
| ops | Sentry, BetterStack, PostHog | sentry_errors, uptime_status, posthog_active_users |
| content | YouTube, LinkedIn, X | youtube_upload, linkedin_post, x_post_tweet |
4. Omnichannel Delivery System
4.1 Delivery surfaces
type DeliverySurface =
| { kind: 'sms'; phoneNumber: string; platformNumber: string }
| { kind: 'voice'; device: 'alexa' | 'phone-call' }
| { kind: 'push'; pushToken: string; platform: 'ios' | 'android' } // future
| { kind: 'email'; address: string } // future
| { kind: 'imessage-rich'; handle: string }; // future
4.2 How the agent picks the right surface
The system prompt includes a surface inventory:
This user is reachable through these delivery surfaces:
- sms: text message to the phone number ending in 9966 (tappable links, emoji, async ok)
- voice: Alexa skill (spoken TTS, no visuals, no taps)
The CURRENT turn arrived via: voice.
Rules:
- Voice turn → URL in reply: Agent calls push_to_phone, says "sent you a link"
- Voice turn → long list: Agent summarizes aloud, push_to_phone sends full list
- SMS turn → link: Agent embeds link directly in SMS reply (user taps on phone)
- Either turn → follow-up references other surface: History preamble carries context
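The inventory text above can be generated from the `DeliverySurface` union. This sketch covers only the two live variants (`sms`, `voice`); `describeSurface` and `buildSurfaceInventory` are hypothetical names, and the wording for the `phone-call` device is invented for illustration.

```typescript
type DeliverySurface =
  | { kind: "sms"; phoneNumber: string; platformNumber: string }
  | { kind: "voice"; device: "alexa" | "phone-call" };

function describeSurface(s: DeliverySurface): string {
  switch (s.kind) {
    case "sms":
      return `- sms: text message to the phone number ending in ${s.phoneNumber.slice(-4)} (tappable links, emoji, async ok)`;
    case "voice":
      return s.device === "alexa"
        ? "- voice: Alexa skill (spoken TTS, no visuals, no taps)"
        : "- voice: phone call (spoken audio, no visuals, no taps)"; // hypothetical wording
  }
}

function buildSurfaceInventory(surfaces: DeliverySurface[], current: DeliverySurface["kind"]): string {
  return [
    "This user is reachable through these delivery surfaces:",
    ...surfaces.map(describeSurface),
    `The CURRENT turn arrived via: ${current}.`,
  ].join("\n");
}
```

Rendering only the last four digits keeps the full phone number out of the prompt.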
4.3 Cross-surface conversation history
(recent conversation, oldest first)
[voice, ~8m ago] user: what's on my calendar today
[voice, ~8m ago] agent: You've got one thing, a run from 6 to 7.
[sms, ~2m ago] user: cancel the run
[sms, ~2m ago] agent: Done, cancelled.
(current turn, via voice)
when is my next meeting?
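Loaded from the conversations table (channel-tagged) and windowed to the
last 10 turns or the last 30 minutes, whichever yields fewer turns. The
preamble lives in the user message rather than the system prompt, so the
system prompt is not rewritten on every turn and prompt caching keeps working.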
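The window and the preamble format can be sketched as below, assuming one `conversations` row counts as one turn (the document does not say whether a turn is a single message or a user/agent pair). `windowHistory` and `formatHistoryPreamble` are hypothetical names; applying the 30-minute filter first and then keeping the last 10 rows is what makes the smaller window win.

```typescript
interface ConversationTurn {
  channel: "voice" | "sms";
  role: "user" | "agent";
  content: string;
  createdAt: Date;
}

const MAX_TURNS = 10;
const MAX_AGE_MS = 30 * 60 * 1000;

// Keeps only turns satisfying BOTH limits, oldest first.
function windowHistory(turns: ConversationTurn[], now: Date): ConversationTurn[] {
  return turns
    .filter((t) => now.getTime() - t.createdAt.getTime() <= MAX_AGE_MS)
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime())
    .slice(-MAX_TURNS);
}

function formatHistoryPreamble(
  turns: ConversationTurn[],
  current: string,
  channel: "voice" | "sms",
  now: Date,
): string {
  const lines = windowHistory(turns, now).map((t) => {
    const mins = Math.round((now.getTime() - t.createdAt.getTime()) / 60_000);
    return `[${t.channel}, ~${mins}m ago] ${t.role}: ${t.content}`;
  });
  const header = lines.length ? ["(recent conversation, oldest first)", ...lines] : [];
  return [...header, `(current turn, via ${channel})`, current].join("\n");
}
```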
5. Security Model
| Layer | Mechanism |
|---|---|
| SMS authentication | Twilio HMAC signature validation |
| Alexa authentication | JWT verification + AlexaUserLink mapping |
| Database isolation | Row-Level Security (RLS) per user |
| Token storage | pgcrypto encryption for all OAuth tokens |
| Custom MCP credentials | Encrypted env vars + headers in user_mcp_servers |
| MCP subprocess isolation | Each server gets ONLY its own env, not the parent's |
| Voice safety net | formatForVoice strips markdown, emoji, URLs before TTS |
| Financial safety | System prompt: CONFIRM required for actions >$100 |
| Process resilience | unhandledRejection handler keeps one failed request from crashing the whole process |
| Gateway hardening | Defensive body validation, top-level try/catch wrapper |
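The Alexa authentication row corresponds to steps 1A to 1C in section 2.3. A sketch of that resolution order follows, where `verifyJwt` and `AlexaUserLinkStore` are hypothetical stand-ins for the JWT check and the `alexa_user_links` table; the envelope fields are the `context.System.person`, `context.System.user`, and `session.user` objects of an Alexa request. As written in 2.3, a token that fails verification does not fall back to the stored mapping.

```typescript
// Subset of the Alexa request envelope used here.
interface AlexaRequestEnvelope {
  session?: { user?: { userId: string; accessToken?: string } };
  context: {
    System: {
      user: { userId: string; accessToken?: string };
      person?: { personId: string; accessToken?: string };
    };
  };
}

// Hypothetical wrapper over the alexa_user_links table.
interface AlexaUserLinkStore {
  upsert(alexaUserId: string, internalUserId: string, personId?: string): Promise<void>;
  findInternalUserId(alexaUserId: string): Promise<string | null>;
}

type Resolution = { kind: "resolved"; userId: string } | { kind: "link-account" };

async function resolveAlexaUser(
  req: AlexaRequestEnvelope,
  verifyJwt: (token: string) => Promise<string | null>, // internal user id, or null if invalid
  links: AlexaUserLinkStore,
): Promise<Resolution> {
  const sys = req.context.System;
  const token = sys.person?.accessToken ?? sys.user.accessToken ?? req.session?.user?.accessToken;
  if (token) {
    // 1A: verify, then refresh the mapping for later tokenless requests.
    const userId = await verifyJwt(token);
    if (!userId) return { kind: "link-account" };
    await links.upsert(sys.user.userId, userId, sys.person?.personId);
    return { kind: "resolved", userId };
  }
  // 1B: no token, so look up the stored mapping by Alexa account ID.
  const mapped = await links.findInternalUserId(sys.user.userId);
  if (mapped) return { kind: "resolved", userId: mapped };
  // 1C: unknown account, ask the user to link.
  return { kind: "link-account" };
}
```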
6. Database Schema
| Table | Purpose | Key Fields |
|---|---|---|
| users | User profile | phone, agent_name, agent_tone, home_address/lat/lng/timezone |
| oauth_tokens | Encrypted service tokens | service, access_token, refresh_token, expires_at |
| core_memories | Persistent knowledge | category, content, source, times_recalled |
| conversations | Turn log (cross-surface) | role, content, channel, model_used, tokens |
| alexa_user_links | Alexa account → internal user | alexa_user_id (PK), internal_user_id, person_id |
| user_mcp_servers | Custom MCP configs | transport, command/url, env_encrypted, tool_schema_hash |
| alexa_auth_codes | OAuth code grant | code, user_id, redirect_uri, expires_at |
| alexa_verification_codes | Phone verification | phone_number, code, attempts, expires_at |
| pending_actions | Confirmation-gated actions | action_type, payload, status, expires_at |
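`pending_actions` backs confirmation-gated actions, and section 5 puts the gate at $100. One way to model that lifecycle in code is sketched below; the status values, the five-minute expiry, the `reorder` action type, and the `CONFIRM`/`CANCEL` reply matching are assumptions rather than documented behavior.

```typescript
type PendingStatus = "pending" | "confirmed" | "cancelled" | "expired";

interface PendingAction {
  id: string;
  actionType: string; // pending_actions.action_type
  payload: Record<string, unknown>;
  status: PendingStatus;
  expiresAt: Date;
}

const CONFIRM_THRESHOLD_USD = 100;    // ">$100 needs CONFIRM" rule from section 5
const PENDING_TTL_MS = 5 * 60 * 1000; // ASSUMPTION: TTL not given in this document

function needsConfirmation(amountUsd: number): boolean {
  return amountUsd > CONFIRM_THRESHOLD_USD;
}

function createPendingAction(
  id: string,
  actionType: string,
  payload: Record<string, unknown>,
  now: Date,
): PendingAction {
  return { id, actionType, payload, status: "pending", expiresAt: new Date(now.getTime() + PENDING_TTL_MS) };
}

// Applies the user's reply; only a still-pending, unexpired action can be confirmed.
function resolvePendingAction(action: PendingAction, reply: string, now: Date): PendingAction {
  if (action.status !== "pending") return action;
  if (now.getTime() > action.expiresAt.getTime()) return { ...action, status: "expired" };
  const answer = reply.trim().toUpperCase();
  if (answer === "CONFIRM") return { ...action, status: "confirmed" };
  if (answer === "CANCEL" || answer === "NO") return { ...action, status: "cancelled" };
  return action; // any other reply leaves it pending
}
```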
7. Model Routing
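function pickModelTier(message: string): 'haiku' | 'sonnet' {
  const text = message.trim().toLowerCase();
  // Long messages (>160 chars) always go to Sonnet.
  if (text.length > 160) return 'sonnet';
  // Keyword lists are illustrative, taken from the examples in this section.
  // Sonnet (powerful, reasoning): booking, composition, planning.
  const sonnetPatterns = [
    /\bbook\b/, /\bschedule\b/,                 // "book an uber", "schedule a meeting"
    /\bdraft\b/, /\breply to\b/,                // "draft an email", "reply to X"
    /\bhelp me plan\b/, /\bwhat should i do\b/, // planning
  ];
  if (sonnetPatterns.some((p) => p.test(text))) return 'sonnet';
  // Haiku (fast, cheap): smart home, music, simple lookups, confirmations, memory.
  const haikuPatterns = [
    /\bturn (on|off)\b|\blights?\b|\bdim\b|\bscenes?\b/, // smart home
    /^(play|pause|next|skip)\b|\bvolume\b/,              // music
    /what['’]?s on my calendar|\bany emails\b/,          // simple lookups
    /^(yes|no|ok|okay)[.!]?$/,                           // confirmations
    /^remember\b|^what['’]?s my\b/,                      // memory
  ];
  if (haikuPatterns.some((p) => p.test(text))) return 'haiku';
  // Everything else defaults to Sonnet.
  return 'sonnet';
}
8. Tech Stack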
| Layer | Technology |
|---|---|
| Runtime | Node.js + TypeScript |
| AI | Claude Agent SDK, Haiku 4.5 + Sonnet 4.6 |
| Tools | MCP protocol (stdio + Streamable HTTP) |
| Database | PostgreSQL + Prisma + pgcrypto + RLS |
| SMS | Twilio Messaging Service |
| Voice | Alexa Skills Kit (custom HTTPS endpoint) |
| Gateway | Express.js |
| Agent | Express.js (per-user container) |
| Hosting | Railway (containers) + ngrok (dev) |
| Testing | Vitest (371 tests, 0 failures) |