Ajeris: API Documentation
- Product: Ajeris, a specialized personal productivity agent.
- Category: Personal assistant / productivity / multi-agent.
- Status: in production for invite-only users. Classic Alexa Skills Kit skill live and serving a test cohort.
- Contact: hi@ajeris.com
This document is the public API surface of the Ajeris agent platform. It describes the capabilities, integration model, authentication, and response format that would be exposed to Alexa+ via the Multi-Agent SDK when granted access.
What Ajeris is
Ajeris is a specialized productivity agent. Unlike a general chatbot, it is purpose-built to manage a user's day-to-day operational life across a small number of high-value verticals:
- Email (Gmail): summarize unread, search, send, reply, and flag important messages.
- Calendar (Google Calendar): check schedule, create events, find free time, cancel events.
- Smart home (Philips Hue via Cloud2Cloud): control lights, scenes, and rooms.
- Music (Spotify and Apple Music): play, pause, queue, search by artist or mood.
- Ride-hailing (Uber): generate deep links with pre-filled origin and destination.
- Food delivery (DoorDash): reorder "usual" items and save recurring orders.
- Memory (internal): remember preferences, allergies, routines, and recall them later.
- Financial awareness (Plaid + credit bureau): monitor credit scores, detect utilization spikes, flag payment issues, propose payment plans.
- Communication (Slack): send messages and check channels.
- Drive and documents (Google Drive): create Docs and Sheets, list files.
- Reminders and scheduled tasks: user-defined recurring checks, notifications, and daily reflections.
Each capability maps to a named tool (technically: an MCP tool registered to the user's per-user container). New capabilities can be added by installing additional MCP servers without any change to the core agent.
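As a rough illustration of the "named tool" model, the sketch below uses a plain Python registry rather than a real MCP SDK. The `ToolRegistry` class, the tool names, and the handler signatures are all hypothetical; the only point is that installing another server adds entries without touching the dispatch code.

```python
from typing import Callable, Dict, List

# Hypothetical handler type: takes keyword arguments, returns text for the agent.
ToolHandler = Callable[..., str]


class ToolRegistry:
    """Illustrative per-user registry of named tools (not the MCP SDK)."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolHandler] = {}

    def install_server(self, prefix: str, tools: Dict[str, ToolHandler]) -> None:
        # Namespacing by server prefix keeps tool names unique across servers.
        for name, handler in tools.items():
            self._tools[f"{prefix}.{name}"] = handler

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()
registry.install_server("calendar", {"find_free_time": lambda day: f"free slots on {day}"})
# Adding a capability later is just another install; call() is unchanged.
registry.install_server("slack", {"send": lambda channel, text: f"sent to {channel}"})
```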
The agent has a named personality ("Jarvis" by default, user-configurable) and a channel-aware response style. The same agent serves SMS (primary) and Alexa voice (classic ASK today), and will serve Alexa+ (when Multi-Agent SDK access is granted) and future push/web surfaces.
Architecture overview
                  +---------------+
                  |    Alexa+     |
                  | (Multi-Agent  |
                  |    router)    |
                  +-------+-------+
                          |
                          v
+--------+    +-----------+-----------+    +--------+
|  SMS   |--->|    Ajeris Gateway     |<---| Twilio |
+--------+    |  (signature verify,   |    +--------+
              |   account linking,    |
              |  routing, fallback)   |
              +-----------+-----------+
                          |
                          v
              +-----------+-----------+
              |    Per-user agent     |
              |  container (Railway)  |
              +-----------+-----------+
                          |
              +-----------+-----------+
              |   MCP child process   |
              |    (tools: email,     |
              |   calendar, music,    |
              |   smart home, ...)    |
              +-----------+-----------+
                          |
                          v
                  +-------+-------+
                  | External APIs |
                  | Gmail, Cal,   |
                  | Hue, Spotify, |
                  | Plaid, etc.   |
                  +---------------+
- Gateway: a single HTTPS endpoint receives all external traffic. Responsible for request authentication (signature verification, JWT account linking), rate limiting, and routing to the correct user's agent container.
- Agent container: per-user isolated process running the Claude Agent SDK with an attached MCP child. Each user has their own container, their own memory store, their own OAuth tokens. Multi-tenancy isolation is enforced at the OS level, not just the database.
- MCP child: exposes tools. Each tool is a thin adapter over an external API. New tools can be added per-user without changing the core agent.
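The gateway's rate-limiting and routing duties can be sketched roughly as follows. This is a minimal illustration, not the production gateway: the `Gateway` class, the sliding-window limit values, and the user-to-container mapping are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class Gateway:
    """Illustrative only: sliding-window rate limit, then per-user routing."""

    def __init__(self, containers: Dict[str, str], max_requests: int = 5,
                 window_s: float = 60.0) -> None:
        self.containers = containers  # user ID -> container base URL (hypothetical)
        self.max_requests = max_requests
        self.window_s = window_s
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _allow(self, user_id: str, now: float) -> bool:
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

    def route(self, user_id: str, now: Optional[float] = None) -> str:
        """Return the base URL of the caller's own container, or raise."""
        now = time.monotonic() if now is None else now
        if user_id not in self.containers:
            raise LookupError("no container provisioned for this user")
        if not self._allow(user_id, now):
            raise PermissionError("rate limit exceeded")
        return self.containers[user_id]
```

Authentication is assumed to have already produced `user_id` before `route` is called, so a request can only ever resolve to its own user's container.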
Integration surface for Alexa+ Multi-Agent SDK
When granted access to the Multi-Agent SDK, Ajeris will register a single agent that Alexa+'s LLM can route to for productivity-category queries. The registration will expose a capability catalog so Alexa+ knows what Ajeris can do:
Capabilities (proposed registration)
| Capability ID | Description | Example utterances |
|---|---|---|
| ajeris.email | Read, search, summarize, send, and reply to emails | "summarize my unread email", "did I get anything from Sarah" |
| ajeris.calendar | Check schedule, create events, find free time | "what's on my calendar", "schedule a call with John tomorrow at 2" |
| ajeris.smart_home | Control Philips Hue lights, scenes, rooms | "dim the bedroom", "turn off the kitchen" |
| ajeris.music | Play music across Spotify and Apple Music | "play John Legend", "something upbeat" |
| ajeris.rides | Generate Uber deep links with context | "get me an Uber home" |
| ajeris.food | DoorDash reorders and saved favorites | "reorder my usual from Chipotle" |
| ajeris.memory | Recall and save user preferences, facts, routines | "remember my wife's birthday is March 3", "what's my home address" |
| ajeris.finance | Credit monitoring, utilization alerts, payment triage | "how's my credit", "any bills coming up" |
| ajeris.slack | Read and send Slack messages | "send a message to #team in Slack saying I'll be late" |
| ajeris.drive | Create Google Docs and Sheets, list files | "create a doc titled Q2 plan" |
| ajeris.conversation | Free-form dialog with full context | any utterance that doesn't cleanly map to the above |
Alexa+'s LLM router should invoke ajeris.conversation as the default for anything ambiguous. Our agent is self-sufficient for routing inside its own conversation: we load per-user MCP tools, cross-surface conversation history, memories, skills, and daily logs on every turn and let Claude choose the appropriate tool.
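The "default to ajeris.conversation" rule could look something like the sketch below on the caller's side. The capability IDs come from the proposed table; the `resolve_capability` helper is hypothetical and does not reflect any actual Multi-Agent SDK interface.

```python
from typing import Optional

# Capability IDs copied from the proposed registration table.
CAPABILITIES = frozenset({
    "ajeris.email",
    "ajeris.calendar",
    "ajeris.smart_home",
    "ajeris.music",
    "ajeris.rides",
    "ajeris.food",
    "ajeris.memory",
    "ajeris.finance",
    "ajeris.slack",
    "ajeris.drive",
    "ajeris.conversation",
})

DEFAULT_CAPABILITY = "ajeris.conversation"


def resolve_capability(requested: Optional[str]) -> str:
    """Return a known capability ID, falling back to free-form conversation."""
    if requested in CAPABILITIES:
        return requested
    # Ambiguous or unrecognized intents go to the catch-all capability,
    # where the agent picks the appropriate tool itself.
    return DEFAULT_CAPABILITY
```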
Request format (proposed)
    POST /agent/voice
    Content-Type: application/json
    Authorization: Bearer <Alexa+ per-user access token>

    {
      "text": "summarize my unread email",
      "context": {
        "deviceType": "ECHO_SHOW",
        "locale": "en-US",
        "surface": "voice"
      }
    }

Our existing /agent/voice route already accepts this shape (minus the Bearer header, which will be added). We already validate the shared-secret header from the gateway in dev; the Multi-Agent SDK authentication will replace that with standard Bearer tokens.
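For reference, a caller could assemble the proposed request with nothing but the Python standard library. This is a sketch under the proposed shape only: the `build_voice_request` helper and the base URL used with it are placeholders, and the request is built but not sent.

```python
import json
import urllib.request


def build_voice_request(base_url: str, access_token: str, text: str,
                        device_type: str = "ECHO_SHOW",
                        locale: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a POST /agent/voice request in the proposed shape."""
    body = {
        "text": text,
        "context": {
            "deviceType": device_type,
            "locale": locale,
            "surface": "voice",
        },
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agent/voice",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=...)` would send it; the timeout value is left to the caller.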
Response format (proposed)
    200 OK
    Content-Type: application/json

    {
      "replyText": "You have 100 unread. The top three need attention: Chase about a fraud alert, Resend about a failed $20 payment, and GitHub with a PR review request. Want me to read the Chase one?",
      "modelUsed": "haiku",
      "tokensIn": 16,
      "tokensOut": 115,
      "totalCostUsd": 0.003
    }

replyText is ready for direct TTS. No markdown, no bullets, no URLs (URLs are side-channeled via push-to-phone to the user's linked phone number). Usage and cost fields are informational and may be elided from the Alexa+ API depending on Amazon's preferences.
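The "ready for direct TTS" constraint can be approximated with a small post-processing step. The sketch below is one possible approach, not necessarily how Ajeris does it: `prepare_for_tts` is a hypothetical helper that strips common markdown markers and pulls URLs out so they can be delivered through a side channel instead.

```python
import re
from typing import List, Tuple

URL_RE = re.compile(r"https?://\S+")


def prepare_for_tts(text: str) -> Tuple[str, List[str]]:
    """Return (speakable text, URLs to deliver via a side channel)."""
    # Collect URLs, trimming punctuation that commonly trails a link.
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    spoken = URL_RE.sub("", text)
    # Remove list markers ("- ", "* ", "1. ") at the start of a line.
    spoken = re.sub(r"^\s*(?:[-*+]|\d+\.)\s+", "", spoken, flags=re.MULTILINE)
    # Remove heading marks at the start of a line.
    spoken = re.sub(r"^\s*#+\s+", "", spoken, flags=re.MULTILINE)
    # Remove emphasis and inline-code markers.
    spoken = re.sub(r"[*`]+", "", spoken)
    # Collapse newlines and runs of spaces into single spaces.
    spoken = re.sub(r"\s+", " ", spoken).strip()
    return spoken, urls
```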
Session semantics
Ajeris maintains its own cross-surface conversation history in the database, keyed by user ID, so the Multi-Agent SDK does not need to pass prior-turn context to Ajeris. Within the 30-minute frozen-session window, subsequent turns reuse a cached system prompt (prefix-cache optimized for Anthropic's LLM), so repeat turns cost significantly less than first turns.
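The frozen-session behavior might be modeled as in the sketch below. It assumes the 30-minute window is measured from the start of the session (the text does not say whether it is measured from the first or the most recent turn), and the `Session` and `prompt_for_turn` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SESSION_WINDOW_S = 30 * 60  # the 30-minute window described above


@dataclass
class Session:
    started_at: float
    system_prompt: str  # frozen for the lifetime of the session


def prompt_for_turn(session: Optional[Session], now: float,
                    build_prompt: Callable[[], str]) -> Session:
    """Reuse the frozen prompt inside the window; otherwise start a new session."""
    if session is not None and now - session.started_at < SESSION_WINDOW_S:
        # Identical prompt bytes are what make a prefix-cache hit possible.
        return session
    return Session(started_at=now, system_prompt=build_prompt())
```

Keeping the prompt byte-for-byte identical within the window is the design choice that matters here: any change to the prefix would forfeit the cached portion.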
Authentication
Account linking (end user)
The classic ASK skill today uses the OAuth 2.0 authorization-code flow:
- User enables "Ajeris" in the Alexa app.
- Alexa redirects to our /alexa/oauth/authorize endpoint.
- User enters their phone number. We verify ownership via SMS.
- We mint a 1-year JWT containing their internal user ID.
- Alexa stores the JWT as the access token.
- All subsequent skill invocations arrive with the JWT in context.System.user.accessToken.
- Our gateway verifies the JWT on every request and resolves to the user's container.
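The mint and verify steps above can be sketched with the standard library alone. Everything specific here is an assumption for illustration: HS256 signing, the `sub` and `exp` claim names, and the helper names. A production service would more likely use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ONE_YEAR_S = 365 * 24 * 60 * 60


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    digest = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64url(digest)


def mint_token(user_id: str, secret: bytes, now: Optional[float] = None) -> str:
    """Mint an HS256 JWT that expires one year from `now`."""
    now = time.time() if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    claims = _b64url(json.dumps({"sub": user_id, "exp": int(now) + ONE_YEAR_S}).encode("utf-8"))
    return f"{header}.{claims}.{_sign(secret, header + '.' + claims)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> str:
    """Return the user ID, or raise ValueError if the token is invalid or expired."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, claims, sig = parts
    expected = _sign(secret, header + "." + claims)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(claims))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload["sub"]
```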
This same flow is expected to work with the Multi-Agent SDK: standard OAuth 2.0 is the universal pattern.
Gateway-to-agent (internal)
Per-user agent containers accept HTTP requests only from the gateway, authenticated via a shared secret (GATEWAY_INTERNAL_SECRET) set per-container at provisioning time. End users and external services never reach the agent directly.
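A shared-secret check of this kind is usually a constant-time comparison. In the sketch below the header name `X-Gateway-Secret` is hypothetical (the text does not name the header), and the expected value is assumed to come from the container's GATEWAY_INTERNAL_SECRET setting.

```python
import hmac
from typing import Mapping

INTERNAL_HEADER = "X-Gateway-Secret"  # hypothetical header name


def is_from_gateway(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Accept a request only if it carries the container's shared secret."""
    if not expected_secret:
        # An unset secret must never match a missing or empty header.
        return False
    presented = headers.get(INTERNAL_HEADER, "")
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_secret.encode("utf-8"))
```

A container would typically call this with `os.environ.get("GATEWAY_INTERNAL_SECRET", "")` as the expected value and reject the request when it returns False.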
Operational characteristics
- Latency: a typical LLM turn completes in 4-9 seconds. Simple queries (weather, time, a short calendar check) finish in under 5 seconds. Complex queries (research, email-heavy summaries) may exceed 9 seconds. Turns that miss the 9-second voice deadline are delivered via SMS as a fallback.
- Availability: per-user containers run on Railway with health checks. Gateway runs on Railway or a similar managed platform.
- Multi-tenancy: each user has their own container, their own OAuth tokens, their own memory store, and their own MCP child. Users cannot see each other's data.
- Cost to Amazon integration: zero direct cost. All LLM and tool-call costs are on us.
- Rate limits: per-user limits enforced at the gateway. No batch or scraping endpoints exposed.
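The voice-deadline fallback described under Latency could be structured along these lines with asyncio. This is a sketch under assumptions: `run_turn` and `send_sms` stand in for the real agent call and SMS sender, and the holding message is invented for the example.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

VOICE_DEADLINE_S = 9.0  # the voice deadline stated above
HOLDING_REPLY = "I'm still working on that. I'll text you the answer."  # hypothetical wording


async def answer_with_fallback(
    run_turn: Callable[[], Awaitable[str]],
    send_sms: Callable[[str], Awaitable[None]],
    deadline_s: float = VOICE_DEADLINE_S,
) -> Tuple[str, Optional["asyncio.Future"]]:
    """Return (spoken reply, pending SMS delivery or None)."""
    turn = asyncio.ensure_future(run_turn())
    try:
        # shield() keeps the turn running even if the deadline passes.
        reply = await asyncio.wait_for(asyncio.shield(turn), timeout=deadline_s)
        return reply, None
    except asyncio.TimeoutError:
        async def deliver_by_sms() -> None:
            await send_sms(await turn)

        # The caller should keep a reference to the returned future until it finishes.
        return HOLDING_REPLY, asyncio.ensure_future(deliver_by_sms())
```

Shielding the turn means the work already done is not discarded when the deadline passes; the same result is simply delivered over a different channel.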
Privacy and data handling
- Data we collect: user's phone number, OAuth tokens for linked services (Gmail, Calendar, Hue, Spotify, Apple Music, Plaid, Slack, etc.), conversation history, and saved memories.
- Data we do NOT collect: passwords (we use OAuth only), payment details (we use Plaid for read-only financial data), biometric data, contact lists beyond what the user's email/calendar contains.
- Retention: conversation history stored per-user, encrypted at rest, user can delete at any time via voice or SMS command ("forget everything", "delete my data"). See docs/data-retention-policy.md.
- Third-party sharing: none. Per-user data never leaves the user's container except via calls to the services they authorized.
- Privacy policy: https://ajeris.com/privacy
- Terms of use: https://ajeris.com/terms
Roadmap and alignment with Alexa+ direction
- Current: classic ASK skill, single-user test cohort, SMS as primary.
- Near-term (when Multi-Agent SDK access granted): migrate Alexa voice path to Multi-Agent, keep classic skill for non-Alexa+ devices.
- Multi-user scale: per-user Railway containers provisioned on sign-up. Target: 1000 users, then iterate.
- Future surfaces: native mobile push surface, potentially an OpenAI Realtime / Claude voice-mode continuous-audio experience for deep conversational sessions. Alexa+ will remain the "quick hands-free" surface.
Why this fits the Alexa+ Multi-Agent SDK
Quoting Amazon's Feb 2025 announcement:
"We see a future of specialized agents that will be extremely smart in very specific areas, like advanced tutoring, specialized productivity, and research. Alexa+ will interact with these agents on behalf of customers."
Ajeris is precisely a specialized productivity agent. It has a named personality (Jarvis), a scoped capability set (the 11 capabilities above), and a coordination layer (MCP) that Amazon is independently converging toward in its own architecture. The Multi-Agent SDK description in the blog post reads like a product spec for Ajeris.
We are ready to integrate. The existing HTTP endpoints already accept and return the shapes Amazon has hinted at. The existing account-linking flow already follows OAuth 2.0 conventions. The existing conversation-history and memory systems mean Alexa+ does not need to re-send context on every turn. All that remains is the registration schema and certification, which will be provided upon access approval.
Contact for Amazon
Primary: Sarah Mitchell, hi@ajeris.com
Existing classic ASK skill ID: amzn1.ask.skill.2b8710ed-af0e-4b31-bc57-9af1cc7ab878
Skill dashboard: https://developer.amazon.com/alexa/console/ask/build/custom/amzn1.ask.skill.2b8710ed-af0e-4b31-bc57-9af1cc7ab878
This documentation is published at: (TBD public URL, to be added when we host it)