Ajeris Docs

Alexa+ integration

API surface Ajeris exposes to Amazon's Multi-Agent SDK: capabilities, request format, and authentication.

Ajeris: API Documentation

Product: Ajeris, a specialized personal productivity agent.
Category: Personal assistant / productivity / multi-agent.
Status: in production for invite-only users; a classic Alexa Skills Kit skill is live and serving a test cohort.
Contact: hi@ajeris.com

This document is the public API surface of the Ajeris agent platform. It describes the capabilities, integration model, authentication, and response format that would be exposed to Alexa+ via the Multi-Agent SDK when granted access.


What Ajeris is

Ajeris is a specialized productivity agent. Unlike a general chatbot, it is purpose-built to manage a user's day-to-day operational life across a small number of high-value verticals:

  • Email (Gmail): summarize unread, search, send, reply, and flag important messages.
  • Calendar (Google Calendar): check schedule, create events, find free time, cancel events.
  • Smart home (Philips Hue via Cloud2Cloud): control lights, scenes, and rooms.
  • Music (Spotify and Apple Music): play, pause, queue, search by artist or mood.
  • Ride-hailing (Uber): generate deep links with pre-filled origin and destination.
  • Food delivery (DoorDash): reorder "usual" items and save recurring orders.
  • Memory (internal): remember preferences, allergies, routines, and recall them later.
  • Financial awareness (Plaid + credit bureau): monitor credit scores, detect utilization spikes, flag payment issues, propose payment plans.
  • Communication (Slack): send messages and check channels.
  • Drive and documents (Google Drive): create Docs and Sheets, list files.
  • Reminders and scheduled tasks: user-defined recurring checks, notifications, and daily reflections.

Each capability maps to a named tool (technically: an MCP tool registered in that user's container). New capabilities can be added by installing additional MCP servers, with no change to the core agent.
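
The capability-to-tool relationship can be sketched as a simple registry that the agent loop dispatches against. The tool names and handler signatures below are illustrative stand-ins, not the actual MCP server contract:

```typescript
// Illustrative sketch of a capability -> tool registry. Tool names and
// handler signatures here are hypothetical, not the real MCP contract.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

class ToolRegistry {
  private tools = new Map<string, ToolHandler>();

  // Installing a new MCP server amounts to registering more tools;
  // the core agent loop never changes.
  register(name: string, handler: ToolHandler): void {
    this.tools.set(name, handler);
  }

  list(): string[] {
    return Array.from(this.tools.keys());
  }

  async call(name: string, args: Record<string, unknown>): Promise<string> {
    const handler = this.tools.get(name);
    if (!handler) throw new Error(`unknown tool: ${name}`);
    return handler(args);
  }
}

// Two thin adapters over external APIs (stubbed here for the sketch).
const registry = new ToolRegistry();
registry.register("email.summarize_unread", async () => "3 unread, 1 urgent");
registry.register("calendar.check", async () => "Next: standup at 10:00");
```

Adding a vertical is then a one-line `register` call from a newly installed MCP server, which is why the core agent stays untouched.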

The agent has a named personality ("Jarvis" by default, user-configurable) and a channel-aware response style. The same agent serves SMS (primary), Alexa voice (classic ASK today), and will serve Alexa+ (when Multi-Agent SDK access is granted) and future push/web surfaces.


Architecture overview

                  +---------------+
                  |   Alexa+      |
                  | (Multi-Agent  |
                  |   router)     |
                  +-------+-------+
                          |
                          v
+--------+    +-----------+-----------+    +--------+
|  SMS   |--->|    Ajeris Gateway     |<---| Twilio |
+--------+    |   (signature verify,  |    +--------+
              |    account linking,   |
              |    routing, fallback) |
              +-----------+-----------+
                          |
                          v
               +----------+-----------+
               |  Per-user agent      |
               |  container (Railway) |
               +----------+-----------+
                          |
              +-----------+----------+
              |  MCP child process   |
              |   (tools: email,     |
              |    calendar, music,  |
              |    smart home, ...)  |
              +----------+-----------+
                         |
                         v
                 +-------+-------+
                 | External APIs |
                 | Gmail, Cal,   |
                 | Hue, Spotify, |
                 | Plaid, etc.   |
                 +---------------+
  • Gateway: a single HTTPS endpoint receives all external traffic. Responsible for request authentication (signature verification, JWT account linking), rate limiting, and routing to the correct user's agent container.
  • Agent container: per-user isolated process running the Claude Agent SDK with an attached MCP child. Each user has their own container, their own memory store, their own OAuth tokens. Multi-tenancy isolation is enforced at the OS level, not just the database.
  • MCP child: exposes tools. Each tool is a thin adapter over an external API. New tools can be added per-user without changing the core agent.
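
The gateway's routing step reduces to a lookup from an authenticated user ID to that user's container host. A minimal sketch, where the map and internal hostname scheme are placeholders for the real provisioning records:

```typescript
// Sketch of gateway routing: resolve an authenticated user ID to that
// user's container. The hostname scheme is hypothetical.
const containers = new Map<string, string>([
  ["user_123", "http://user-123.internal:8080"],
]);

function containerHostFor(userId: string): string {
  const host = containers.get(userId);
  // Unknown users never reach a container; the gateway rejects them.
  if (!host) throw new Error(`no container provisioned for ${userId}`);
  return host;
}
```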

Integration surface for Alexa+ Multi-Agent SDK

When granted access to the Multi-Agent SDK, Ajeris will register a single agent that Alexa+'s LLM can route to for productivity-category queries. The registration will expose a capability catalog so Alexa+ knows what Ajeris can do:

Capabilities (proposed registration)

  • ajeris.email: Read, search, summarize, send, and reply to emails. Example utterances: "summarize my unread email", "did I get anything from Sarah".
  • ajeris.calendar: Check schedule, create events, find free time. Example utterances: "what's on my calendar", "schedule a call with John tomorrow at 2".
  • ajeris.smart_home: Control Philips Hue lights, scenes, and rooms. Example utterances: "dim the bedroom", "turn off the kitchen".
  • ajeris.music: Play music across Spotify and Apple Music. Example utterances: "play John Legend", "something upbeat".
  • ajeris.rides: Generate Uber deep links with context. Example utterance: "get me an Uber home".
  • ajeris.food: DoorDash reorders and saved favorites. Example utterance: "reorder my usual from Chipotle".
  • ajeris.memory: Recall and save user preferences, facts, and routines. Example utterances: "remember my wife's birthday is March 3", "what's my home address".
  • ajeris.finance: Credit monitoring, utilization alerts, payment triage. Example utterances: "how's my credit", "any bills coming up".
  • ajeris.slack: Read and send Slack messages. Example utterance: "send a message to #team in Slack saying I'll be late".
  • ajeris.drive: Create Google Docs and Sheets, list files. Example utterance: "create a doc titled Q2 plan".
  • ajeris.conversation: Free-form dialog with full context. Handles any utterance that doesn't cleanly map to the above.

Alexa+'s LLM router should invoke ajeris.conversation as the default for anything ambiguous. Ajeris handles routing within its own conversation: on every turn we load per-user MCP tools, cross-surface conversation history, memories, skills, and daily logs, and let Claude choose the appropriate tool.
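
Since Amazon has not yet published the Multi-Agent SDK registration schema, the shape below is our assumption of what the capability catalog could look like as a typed structure, with ajeris.conversation as the fallback route:

```typescript
// Proposed capability catalog as a typed structure. The real Multi-Agent SDK
// registration schema is not yet public; this shape is an assumption.
interface Capability {
  id: string;
  description: string;
  exampleUtterances: string[];
}

const catalog: Capability[] = [
  { id: "ajeris.email", description: "Read, search, summarize, send, and reply to emails", exampleUtterances: ["summarize my unread email"] },
  { id: "ajeris.calendar", description: "Check schedule, create events, find free time", exampleUtterances: ["what's on my calendar"] },
  // ...remaining capabilities omitted here; see the full list above...
  { id: "ajeris.conversation", description: "Free-form dialog with full context", exampleUtterances: [] },
];

// Ambiguous utterances fall through to free-form conversation.
function routeCapability(matchedId: string | null): string {
  return matchedId ?? "ajeris.conversation";
}
```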

Request format (proposed)

POST /agent/voice
Content-Type: application/json
Authorization: Bearer <Alexa+ per-user access token>
 
{
  "text": "summarize my unread email",
  "context": {
    "deviceType": "ECHO_SHOW",
    "locale": "en-US",
    "surface": "voice"
  }
}

Our existing /agent/voice route already accepts this shape; Bearer-token handling will be added. In dev, we validate the shared-secret header from the gateway; Multi-Agent SDK authentication will replace that with standard Bearer tokens.
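
Validation of the request body shown above can be sketched as follows; only `text` is required, and the `context` fields are optional hints:

```typescript
// Minimal validation for the /agent/voice request body shown above.
interface VoiceRequest {
  text: string;
  context?: { deviceType?: string; locale?: string; surface?: string };
}

function parseVoiceRequest(body: unknown): VoiceRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be a JSON object");
  }
  const { text, context } = body as Record<string, unknown>;
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("text is required");
  }
  return { text, context: context as VoiceRequest["context"] };
}
```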

Response format (proposed)

200 OK
Content-Type: application/json
 
{
  "replyText": "You have 100 unread. The top three need attention: Chase about a fraud alert, Resend about a failed $20 payment, and GitHub with a PR review request. Want me to read the Chase one?",
  "modelUsed": "haiku",
  "tokensIn": 16,
  "tokensOut": 115,
  "totalCostUsd": 0.003
}

replyText is ready for direct TTS. No markdown, no bullets, no URLs (URLs are side-channeled via push-to-phone to the user's linked phone number). Usage and cost fields are informational and may be elided from the Alexa+ API depending on their preferences.
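The TTS-safety guarantee can be enforced with a final guard before the response leaves the container. A sketch, where the regexes are illustrative (the real formatting rules live in the channel-aware system prompt):

```typescript
// Guard that keeps replyText TTS-safe: strip URLs and markdown markers.
// URLs are side-channeled to the user's phone rather than spoken aloud.
function toTtsSafe(text: string): string {
  return text
    .replace(/https?:\/\/\S+/g, "")   // URLs go to push-to-phone instead
    .replace(/[*_`#>]/g, "")           // markdown emphasis/heading markers
    .replace(/^\s*[-•]\s+/gm, "")     // bullet prefixes
    .replace(/\s{2,}/g, " ")           // collapse leftover whitespace
    .trim();
}
```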

Session semantics

Ajeris maintains its own cross-surface conversation history in the database, keyed by user ID. Within the 30-minute frozen-session window, subsequent turns reuse a cached system prompt (prefix-cache optimized for Anthropic's LLM), so repeat turns cost significantly less than first turns. This means the Multi-Agent SDK does not need to pass prior-turn context to Ajeris: we maintain it ourselves.
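
The 30-minute window check itself is trivial; the point is that while it holds, the cached system prompt is reused and only the new turn is appended:

```typescript
// Sketch of the frozen-session check: within the window, reuse the cached
// (prefix-cache optimized) system prompt; outside it, rebuild from scratch.
const SESSION_WINDOW_MS = 30 * 60 * 1000;

function isSessionFrozen(lastTurnAtMs: number, nowMs: number): boolean {
  return nowMs - lastTurnAtMs < SESSION_WINDOW_MS;
}
```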


Authentication

Account linking (end user)

Classic ASK skill today uses OAuth 2.0 authorization-code flow:

  1. User enables "Ajeris" in the Alexa app.
  2. Alexa redirects to our /alexa/oauth/authorize endpoint.
  3. User enters their phone number. We verify ownership via SMS.
  4. We mint a 1-year JWT containing their internal user ID.
  5. Alexa stores the JWT as the access token.
  6. All subsequent skill invocations arrive with the JWT in context.System.user.accessToken.
  7. Our gateway verifies the JWT on every request and resolves to the user's container.

We expect this same flow to work with the Multi-Agent SDK unchanged, since it follows the standard OAuth 2.0 authorization-code pattern.
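
The mint-and-verify cycle in steps 4 and 7 can be sketched with Node's built-in crypto as an HS256 JWT carrying the internal user ID. The claim names and secret handling here are illustrative, not our production key management:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of the 1-year account-linking JWT (HS256). Claim names and
// secret handling are illustrative.
const b64url = (buf: Buffer): string => buf.toString("base64url");

function mintToken(userId: string, secret: string, nowSec: number): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const payload = b64url(
    Buffer.from(JSON.stringify({ sub: userId, exp: nowSec + 365 * 24 * 3600 })),
  );
  const sig = b64url(createHmac("sha256", secret).update(`${header}.${payload}`).digest());
  return `${header}.${payload}.${sig}`;
}

// Returns the internal user ID, or null if the signature or expiry fails.
function verifyToken(token: string, secret: string, nowSec: number): string | null {
  const [header, payload, sig] = token.split(".");
  if (!header || !payload || !sig) return null;
  const expected = b64url(createHmac("sha256", secret).update(`${header}.${payload}`).digest());
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString());
  return claims.exp > nowSec ? claims.sub : null;
}
```

Step 7's per-request check is then a single `verifyToken` call at the gateway before routing to the user's container.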

Gateway-to-agent (internal)

Per-user agent containers accept HTTP requests only from the gateway, authenticated via a shared secret (GATEWAY_INTERNAL_SECRET) set per-container at provisioning time. End users and external services never reach the agent directly.
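
The container-side check can be sketched as a constant-time comparison against GATEWAY_INTERNAL_SECRET; the header name below is a hypothetical choice:

```typescript
import { timingSafeEqual } from "node:crypto";

// Container-side check of the gateway's shared secret. Only
// GATEWAY_INTERNAL_SECRET is from the doc; the header name is hypothetical.
function isFromGateway(headerValue: string | undefined, secret: string): boolean {
  if (!headerValue) return false;
  const a = Buffer.from(headerValue);
  const b = Buffer.from(secret);
  // timingSafeEqual avoids leaking the secret via comparison timing.
  return a.length === b.length && timingSafeEqual(a, b);
}
```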


Operational characteristics

  • Latency: a typical LLM turn completes in 4-9 seconds. Simple queries (weather, time, a short calendar check) finish in under 5 seconds; complex queries (research, email-heavy summaries) may exceed 9 seconds. Turns that miss the 9-second voice deadline are delivered via SMS as a fallback.
  • Availability: per-user containers run on Railway with health checks. Gateway runs on Railway or a similar managed platform.
  • Multi-tenancy: each user has their own container, their own OAuth tokens, their own memory store, and their own MCP child. Users cannot see each other's data.
  • Cost to Amazon: zero direct integration cost. All LLM and tool-call costs are on us.
  • Rate limits: per-user limits enforced at the gateway. No batch or scraping endpoints exposed.
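
The voice-deadline fallback described above can be sketched as a race between the agent turn and a timer; `sendSms` is a stub standing in for the Twilio delivery path:

```typescript
// Sketch of the 9-second voice deadline: race the agent turn against a
// timer; on timeout, return a holding reply and deliver the full answer
// over SMS. sendSms is a stub for the Twilio path.
async function answerWithDeadline(
  turn: Promise<string>,
  deadlineMs: number,
  sendSms: (text: string) => void,
): Promise<string> {
  const timeout = new Promise<"TIMEOUT">((resolve) =>
    setTimeout(() => resolve("TIMEOUT"), deadlineMs),
  );
  const winner = await Promise.race([turn, timeout]);
  if (winner === "TIMEOUT") {
    turn.then(sendSms); // the full answer still arrives, by text
    return "Still working on that. I'll text you the answer.";
  }
  return winner;
}
```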

Privacy and data handling

  • Data we collect: user's phone number, OAuth tokens for linked services (Gmail, Calendar, Hue, Spotify, Apple Music, Plaid, Slack, etc.), conversation history, and saved memories.
  • Data we do NOT collect: passwords (we use OAuth only), payment details (we use Plaid for read-only financial data), biometric data, contact lists beyond what the user's email/calendar contains.
  • Retention: conversation history stored per-user, encrypted at rest, user can delete at any time via voice or SMS command ("forget everything", "delete my data"). See docs/data-retention-policy.md.
  • Third-party sharing: none. Per-user data never leaves the user's container except via calls to the services they authorized.
  • Privacy policy: https://ajeris.com/privacy
  • Terms of use: https://ajeris.com/terms

Roadmap and alignment with Alexa+ direction

  • Current: classic ASK skill, single-user test cohort, SMS as primary.
  • Near-term (when Multi-Agent SDK access granted): migrate Alexa voice path to Multi-Agent, keep classic skill for non-Alexa+ devices.
  • Multi-user scale: per-user Railway containers provisioned on sign-up. Target: 1000 users, then iterate.
  • Future surfaces: native mobile push surface, potentially an OpenAI Realtime / Claude voice-mode continuous-audio experience for deep conversational sessions. Alexa+ will remain the "quick hands-free" surface.

Why this fits the Alexa+ Multi-Agent SDK

Quoting Amazon's Feb 2025 announcement:

"We see a future of specialized agents that will be extremely smart in very specific areas, like advanced tutoring, specialized productivity, and research. Alexa+ will interact with these agents on behalf of customers."

Ajeris is precisely a specialized productivity agent. It has a named personality (Jarvis), a scoped capability set (the 11 capabilities above), and a coordination layer (MCP) that Amazon is independently converging toward in its own architecture. The Multi-Agent SDK description in the blog post reads like a product spec for Ajeris.

We are ready to integrate. The existing HTTP endpoints already accept and return the shapes Amazon has hinted at. The existing account-linking flow already follows OAuth 2.0 conventions. The existing conversation-history and memory systems mean Alexa+ does not need to re-send context on every turn. All that remains is the registration schema and certification, which will be provided upon access approval.


Contact for Amazon

Primary: Sarah Mitchell, hi@ajeris.com
Existing classic ASK skill ID: amzn1.ask.skill.2b8710ed-af0e-4b31-bc57-9af1cc7ab878
Skill dashboard: https://developer.amazon.com/alexa/console/ask/build/custom/amzn1.ask.skill.2b8710ed-af0e-4b31-bc57-9af1cc7ab878

This documentation is published at: (TBD public URL, to be added when we host it)