Spotter Agent — Behavior
Surface
| Endpoint | Purpose |
|---|---|
| `GET /organizations/:orgId/agent/usage` | Org daily AI spend snapshot (tier, cap, spent, percent, reset). |
| `GET /organizations/:orgId/agent/conversations` | List caller’s conversations in this org. |
| `GET /organizations/:orgId/agent/conversations/:conversationId` | Full conversation detail with reconstructed messages + tool calls. |
| `POST /organizations/:orgId/agent/messages` | Send a user turn. Returns an SSE stream. |
| `POST /organizations/:orgId/agent/conversations/:conversationId/confirm/:toolUseId` | Approve / reject a pending destructive tool. Returns an SSE stream. |
| `POST /organizations/:orgId/agent/conversations/:conversationId/pick/:toolUseId` | Resolve a `disambiguation_pending` candidate pick. Returns an SSE stream. |
| `POST /organizations/:orgId/agent/conversations/:conversationId/undo/:toolUseId` | Invoke the declared inverse of a previously-executed write. Bypasses confirmation; logs both events. |
All endpoints require a Clerk session and active org membership. The send/confirm/pick endpoints additionally reject the member role (staff-only in v1).
State machine
new user message
│
▼
┌─────────────────────────────────┐
│ conversation_started (if new) │
└─────────────────────────────────┘
│
preCheck org daily $ cap
│
┌──────┴──────┐
│ │
over cap ok
│ │
▼ ▼
error+done persist user turn
│
▼
ambient RAG (800ms budget)
│
▼
compactIfNeeded (3s budget)
│
▼
┌─────── driveModelLoop (≤ 6 turns) ───────┐
│ │
│ validateReplay → repair if needed │
│ withCachedTail (4th cache breakpoint) │
│ anthropic.messages.stream │
│ emit text_delta │
│ persist assistant message │
│ │
│ stop_reason == end_turn? │
│ yes → exit loop │
│ no → processToolUses │
│ ↓ │
│ ┌────────┼────────┐ │
│ │ │ │ │
│ normal user_picker destructive │
│ (run (suspend) (suspend) │
│ inline) │
│ │ │ │ │
│ └────────┴────────┘ │
│ ↓ all normal: tool_result blocks → next turn
│ ↓ any picker/destructive: suspend → done
└──────────────────────────────────────────┘
│
▼
finalize: usage + cost +
recordTurn + observability + done
Suspended runs
When a user_picker tool is invoked, or when a destructive write is pending, the orchestrator stops the loop and emits:
- `disambiguation_pending` with `kind`, `prompt`, `candidates[]`.
- `confirmation_pending` with `router`, `action`, `input`, `confirm: 'destructive' | 'always'`.
The frontend renders a card; user click triggers POST .../pick/:toolUseId or POST .../confirm/:toolUseId. The new SSE stream resumes the loop by merging all completed tool_results into a single tool_result message and calling driveModelLoop again with the updated history.
Any tools batched after the suspending tool (within the same assistant turn) are also persisted as pending so they can be confirmed sequentially.
MAX_TURNS
MAX_TURNS = 6 per controller call. After six round trips the loop exits; the assistant’s last message stands.
Routers and actions
The tool surface lives in apps/api/src/ai/agent/tools/leaves/*.tools.ts. Each @AgentTool decorator declares:
- `router`, `action`, `summary` (model-visible).
- `inputSchema` (Zod): re-validated server-side via `meta.inputSchema.parse(rawInput)`.
- `scopes: { roles: AgentRole[] }`: RBAC filter at registry-build time + defence-in-depth check in tool-runner.
- `sideEffects: 'read' | 'write'`: drives audit and inverse semantics.
- `confirm: 'never' | 'destructive' | 'always'`: gates suspension.
- `audit?`: `{ resource, actionLabel }`, written to `audit_logs` post-success.
- `inverse?`: `{ router, action, buildInput(output) }`, enables undo.
- `kind?`: `'normal' | 'user_picker'`. `user_picker` handlers are never invoked; the orchestrator suspends and waits for a pick.
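The declared fields can be pictured as a plain metadata type. This is a minimal sketch, not the real decorator signature: `AgentToolMeta`, `InputSchema`, `needsSuspension`, the role names and the `audit` values are all hypothetical, and `InputSchema` only stands in for a Zod schema.

```typescript
// Hypothetical metadata shape; field names follow the list above.
type AgentRole = 'admin' | 'coach' | 'member'; // assumed role names

// Stand-in for a Zod schema: anything with a throwing parse().
interface InputSchema<T> {
  parse(raw: unknown): T;
}

interface AgentToolMeta<I, O> {
  router: string;
  action: string;
  summary: string;
  inputSchema: InputSchema<I>;
  scopes: { roles: AgentRole[] };
  sideEffects: 'read' | 'write';
  confirm: 'never' | 'destructive' | 'always';
  audit?: { resource: string; actionLabel: string };
  inverse?: { router: string; action: string; buildInput(output: O): unknown };
  kind?: 'normal' | 'user_picker';
}

// Picker tools and any confirm level other than 'never' end in a suspension.
function needsSuspension(
  meta: Pick<AgentToolMeta<unknown, unknown>, 'kind' | 'confirm'>,
): boolean {
  return meta.kind === 'user_picker' || meta.confirm !== 'never';
}

const workoutsDelete: AgentToolMeta<{ id: string }, { id: string }> = {
  router: 'workouts',
  action: 'delete',
  summary: 'Soft-delete a workout.',
  inputSchema: {
    parse(raw) {
      const id = (raw as { id?: unknown } | null)?.id;
      if (typeof id !== 'string') throw new Error('id must be a string');
      return { id };
    },
  },
  scopes: { roles: ['admin', 'coach'] },
  sideEffects: 'write',
  confirm: 'destructive',
  audit: { resource: 'workout', actionLabel: 'workout.delete' }, // illustrative values
};
```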
Routers (apps/api/src/ai/agent/tools/leaves/<file>.tools.ts):
read — 11 actions, all read, never confirm
| Action | Effect | Example prompt |
|---|---|---|
| `get_current_context` | Returns org + caller identity + locale + now. | “who am I?” |
| `programs_list` | List active programs. | “what programs do we run?” |
| `members_search` | Search members by name/email/role/status. | “find Saar” |
| `members_get` | Full member detail by membershipId. | “show me Dani’s plan + payment status” |
| `exercises_search` | Hybrid (lexical + semantic) exercise library search. | “find a squat variation” |
| `exercises_resolve_batch` | Batch-resolve movement names → exercise ids in one round-trip; flags ambiguous. | (used internally by workout build) |
| `ask_user_to_pick` | `user_picker` kind. Suspends; user picks one candidate. | “which Saar?” |
| `search_anything` | Cross-entity (program/workout/exercise/member). | “find Murph” |
| `lookup_by_id` | Detail by type+id. | (post-search) |
| `workout_comments_in_program` | List recent comments on workouts in a program (unreadOnly filter). | “any new comments in coaching?” |
| `workout_comments_for_assignment` | Full comment thread on a single assignment. | “what did Dani say on Monday’s WOD?” |
workouts — 4 actions
| Action | Effect | Confirm | Inverse |
|---|---|---|---|
| `create` | Insert a workout (freeform or structured). | never | `workouts.delete` |
| `update` | Patch a workout. | never | none |
| `set_sections` | Replace all sections + movements wholesale. | never | none |
| `delete` | Soft-delete. | destructive | none |
Example prompt: “build a workout: 3 rounds — 200m run, 30 double-unders”.
programs — 4 actions
| Action | Effect | Confirm | Inverse |
|---|---|---|---|
| `create` | New training program. | never | `programs.delete` |
| `update` | Patch. | never | none |
| `delete` | Soft-delete. | destructive | none |
| `enroll_member` | Enroll a member in a coaching/feed program. | never | none |
assignments — 8 actions
| Action | Effect | Confirm |
|---|---|---|
| `assign_personal` | 1-on-1 PT assignment for a date. | never |
| `update` | Patch assignment. | never |
| `set_published` | Toggle published flag. | never |
| `delete` | Single assignment. | destructive |
| `mark_comments_read` | Mark comments as read for a coach. | never |
| `bulk_preview` | Dry-run for bulk ops. | never (read) |
| `bulk_delete` | Bulk delete N assignments. | destructive |
| `bulk_publish` | Bulk publish N assignments. | always |
bookings — 3 actions, all read
| Action | Effect |
|---|---|
| `list_mine` | Caller’s bookings (members); admin-readable. |
| `attendance_summary` | Summary stats. |
| `attendance_trend` | Time-series. |
class_sessions — 9 actions
| Action | Effect | Confirm | Inverse |
|---|---|---|---|
| `list_for_date` | Sessions on a date. | never | none |
| `create` | New class session. | never | `class_sessions.cancel` |
| `update` | Patch session. | never | none |
| `publish` | Publish. | never | `class_sessions.unpublish` |
| `unpublish` | Unpublish. | never | none |
| `cancel` | Cancel. | destructive | none |
| `bulk_preview` | Dry-run for bulk ops. | never (read) | none |
| `bulk_delete` | Bulk delete. | destructive | none |
| `bulk_publish` | Bulk publish. | always | none |
Example prompt: “schedule open gym tomorrow 13:00–14:00 published”.
class_types — 1 action
| Action | Effect |
|---|---|
| `list_for_program` | Class types under a program. |
analytics — 12 actions, all read
| Action | Wraps |
|---|---|
| `revenue_summary` | `AnalyticsService.getRevenueSummary` |
| `revenue_trend` | `getRevenueTrend` |
| `members_summary` | `getMembersSummary` |
| `members_growth` | `getMembersGrowth` |
| `at_risk_members` | `getAtRiskMembers` |
| `popular_classes` | `getPopularClasses` |
| `class_utilization` | `getClassUtilization` |
| `coach_overview` | `getCoachOverview` |
| `org_insights` | `InsightsService.getOrgInsights` |
| `plan_distribution` | `getPlanDistribution` |
| `members_activation` | `getMembersActivation` |
| `workouts_summary` | `getWorkoutsSummary` |
tasks — 5 actions
| Action | Effect | Confirm | Inverse |
|---|---|---|---|
| `list` | List tasks. | never (read) | none |
| `create` | New task. | never | `tasks.delete` |
| `update` | Patch. | never | none |
| `complete` | Quick-complete. | never | none |
| `delete` | Soft delete. | destructive | none |
program_templates — 4 actions
| Action | Effect | Confirm |
|---|---|---|
| `list` | List templates. | never (read) |
| `create` | New template. | never |
| `apply` | Materialize a template into the org. | always |
| `from_history` | Derive a template from historical workouts. | always |
forms — 2 actions, both read (FIT-176 / FIT-184)
| Action | Effect |
|---|---|
| `list_pending_for_org` | Pending form assignments. |
| `compliance_status_for_member` | Per-typeKey compliance status (FIT-158). |
Confirmation flow
- Model emits a `tool_use` block for a tool with `confirm: 'destructive'` or `'always'`.
- Orchestrator inserts an `ai_tool_executions` row with `status: 'pending'`.
- Orchestrator emits SSE `confirmation_pending` with `{ toolUseId, router, action, input, confirm }`.
- Any subsequent `tool_use` blocks in the same assistant message are also persisted pending (they can’t be partially fed back).
- Loop suspends; controller emits `done` with usage 0.
- User clicks Approve or Reject in the frontend → `POST /confirm/:toolUseId { approved: boolean }`.
- If approved, runner executes the tool; status transitions `pending → succeeded | failed`. A `tool_completed` SSE event is emitted.
- If rejected, status → `rejected_by_user` with synthetic error `{ code: 'rejected_by_user' }`.
- If other pending tools remain in the same assistant message, the resume returns `done` early (waiting for more confirmations).
- Once all pending tools for the message are resolved, the orchestrator appends a unified `tool_result` user message and resumes the model loop.
The same merge-and-resume path handles disambiguation_pending picks.
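The resolve-then-merge behaviour can be sketched as two pure functions over the pending rows of one assistant message. This is a sketch under assumptions: `applyDecision`, `buildResume`, the in-memory row shape and the `tool_failed` code are hypothetical names chosen for illustration, and persistence and SSE emission are left out.

```typescript
type ExecStatus = 'pending' | 'succeeded' | 'failed' | 'rejected_by_user';

interface ToolExecution {
  toolUseId: string;
  status: ExecStatus;
  output?: unknown;
  error?: { code: string };
}

interface ToolResultBlock {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

// Apply one Approve/Reject decision to the matching pending row.
function applyDecision(
  rows: ToolExecution[],
  toolUseId: string,
  approved: boolean,
  run: (row: ToolExecution) => unknown,
): ToolExecution[] {
  if (!rows.some((r) => r.toolUseId === toolUseId)) throw new Error('tool_execution_not_found');
  return rows.map((row): ToolExecution => {
    if (row.toolUseId !== toolUseId) return row;
    if (row.status !== 'pending') throw new Error('tool_already_resolved');
    if (!approved) {
      return { ...row, status: 'rejected_by_user', error: { code: 'rejected_by_user' } };
    }
    try {
      return { ...row, status: 'succeeded', output: run(row) };
    } catch {
      return { ...row, status: 'failed', error: { code: 'tool_failed' } }; // hypothetical code
    }
  });
}

// Returns null while confirmations are outstanding, else one unified user message.
function buildResume(rows: ToolExecution[]): { role: 'user'; content: ToolResultBlock[] } | null {
  if (rows.some((r) => r.status === 'pending')) return null;
  return {
    role: 'user',
    content: rows.map((r): ToolResultBlock => ({
      type: 'tool_result',
      tool_use_id: r.toolUseId,
      content: JSON.stringify((r.status === 'succeeded' ? r.output : r.error) ?? null),
      is_error: r.status !== 'succeeded',
    })),
  };
}
```

A `null` from `buildResume` corresponds to the early `done` case; a non-null value is the single message that would be appended before the loop resumes.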
Undo flow
POST .../undo/:toolUseId resolves the inverse tool of a previously-succeeded write:
- Load `ai_tool_executions` by `(conversationId, toolUseId)`.
- Reject if not `status: 'succeeded'` (422).
- Look up the registry entry; require that `meta.inverse` is defined.
- Build the inverse input via `inverse.buildInput(original.output)`, typically `{ id: output.id }`.
- Execute via `ToolRunnerService.execute`, which bypasses confirmation entirely.
- Audit log captures both events: the original and the inverse, linked by their audit rows.
Example: workouts.create returns { id, ... }; the declared inverse is workouts.delete with buildInput: (out) => ({ id: out.id }). A user toast “Workout created — Undo (15s)” calls the undo endpoint; the workout is soft-deleted.
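A minimal sketch of the lookup-and-build part of the undo path, assuming an in-memory registry keyed by `router.action`. `planUndo`, `registry`, `UndoPlan` and the 422 used for a missing inverse are hypothetical; the source only specifies 422 for a non-succeeded execution.

```typescript
interface Execution {
  router: string;
  action: string;
  status: 'pending' | 'succeeded' | 'failed' | 'rejected_by_user';
  output?: { id: string };
}

interface InverseSpec {
  router: string;
  action: string;
  buildInput(output: { id: string }): unknown;
}

// Tiny stand-in for the tool registry, keyed by "router.action".
const registry: Record<string, { inverse?: InverseSpec }> = {
  'workouts.create': {
    inverse: { router: 'workouts', action: 'delete', buildInput: (out) => ({ id: out.id }) },
  },
  'workouts.update': {},
};

type UndoPlan =
  | { ok: true; router: string; action: string; input: unknown }
  | { ok: false; httpStatus: number; reason: string };

function planUndo(exec: Execution): UndoPlan {
  if (exec.status !== 'succeeded' || !exec.output) {
    return { ok: false, httpStatus: 422, reason: 'not_succeeded' };
  }
  const inverse = registry[`${exec.router}.${exec.action}`]?.inverse;
  if (!inverse) {
    return { ok: false, httpStatus: 422, reason: 'no_inverse' }; // status code is an assumption
  }
  return {
    ok: true,
    router: inverse.router,
    action: inverse.action,
    input: inverse.buildInput(exec.output),
  };
}
```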
Disambiguation flow (read.ask_user_to_pick)
Triggered by the system prompt when ambient context lists multiple plausible candidates for the same entity:
- Model emits `read.ask_user_to_pick { kind, prompt, candidates: [{ id, label, sublabel?, detail? }, ...] }`.
- Orchestrator sees `meta.kind === 'user_picker'`, persists pending, emits `disambiguation_pending`.
- Frontend renders a “Which Saar?” card with 2-8 buttons.
- User clicks → `POST .../pick/:toolUseId { id: <pickedId> }`.
- Orchestrator validates the id is in `candidates`, persists `{ pickedId, pickedLabel, kind }` as the tool output, emits `tool_completed`, and resumes the model loop.
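The validation step might look like the following sketch. `resolvePick` and `PendingPick` are hypothetical names; the thrown `invalid_pick` reuses the error code listed under error classification.

```typescript
interface Candidate {
  id: string;
  label: string;
  sublabel?: string;
  detail?: string;
}

interface PendingPick {
  kind: string;
  prompt: string;
  candidates: Candidate[];
}

// Accept only ids that were actually offered, then build the tool output.
function resolvePick(
  pending: PendingPick,
  pickedId: string,
): { pickedId: string; pickedLabel: string; kind: string } {
  const match = pending.candidates.find((c) => c.id === pickedId);
  if (!match) throw new Error('invalid_pick');
  return { pickedId: match.id, pickedLabel: match.label, kind: pending.kind };
}
```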
Audit logging
ToolRunnerService.execute writes an audit_logs row when:
- `meta.audit` is defined, AND
- `meta.sideEffects === 'write'`, AND
- The tool succeeded.
Row carries:
- `actor_clerk_id` = caller.
- `action` = `meta.audit.actionLabel`.
- `resource` = `meta.audit.resource`.
- `resource_id` = extracted from the output `id` field if present.
- `metadata.agent: true` so admin views can filter agent vs human.
ai_tool_executions.audit_log_id links the tool run back to its audit row.
Per-org daily $ cap
Lives in apps/api/src/ai/agent/rate-limit.service.ts and is sourced from PLATFORM_TIER_MAP (libs/shared/src/lib/constants/platform-tiers.ts):
| Tier | Daily budget (USD) |
|---|---|
| Lite | $1.00 (1_000_000 micros) |
| Pro | $5.00 |
| Elite | $25.00 |
AgentRateLimitService.preCheck(orgId):
- Read org → tier → `aiDailyBudgetUsdMicros`.
- Budget `-1` → unmetered, allow.
- Else compare to `costTracker.getOrgSpendTodayMicros(orgId)` (UTC day bucket).
- If `spent >= budget`, return `{ ok: false, code: 'agent_budget_exceeded', message: 'Your org has reached its AI daily spending limit ($X.XX). It resets at 00:00 UTC. Upgrade your plan for a higher limit.', remainingUsdMicros: 0 }`.
AgentController.sendMessage invokes preCheck after SSE headers flush so the client can render the upgrade prompt inline.
AgentRateLimitService.getUsageSnapshot(orgId) returns the same payload the dashboard composer footer reads; it is used to display inline counters such as “$0.40 / $5.00”.
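The pre-check decision can be sketched as a pure function over the two numbers it compares. The failure payload mirrors the one quoted above; the success shape (`remainingUsdMicros`, with `null` for unmetered) and the standalone `preCheck(budget, spent)` signature are assumptions for illustration.

```typescript
type PreCheckResult =
  | { ok: true; remainingUsdMicros: number | null } // null = unmetered (assumed shape)
  | { ok: false; code: 'agent_budget_exceeded'; message: string; remainingUsdMicros: 0 };

const UNMETERED = -1;

function preCheck(budgetMicros: number, spentMicros: number): PreCheckResult {
  if (budgetMicros === UNMETERED) return { ok: true, remainingUsdMicros: null };
  if (spentMicros >= budgetMicros) {
    const cap = (budgetMicros / 1_000_000).toFixed(2);
    return {
      ok: false,
      code: 'agent_budget_exceeded',
      message:
        `Your org has reached its AI daily spending limit ($${cap}). ` +
        'It resets at 00:00 UTC. Upgrade your plan for a higher limit.',
      remainingUsdMicros: 0,
    };
  }
  return { ok: true, remainingUsdMicros: budgetMicros - spentMicros };
}
```

Because the comparison is `spent >= budget`, an org that lands exactly on its cap is blocked on the next turn rather than the one after.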
Per-org/user/day cost tracking
AgentCostTracker.recordTurn upserts into ai_usage_daily keyed on (organization_id, user_id, day) with SQL increments so concurrent turns don’t lose updates. Day key is UTC YYYY-MM-DD.
AgentOrchestrator.finalize computes costUsdMicros per turn:
COST_INPUT_PER_M = 3.0
COST_OUTPUT_PER_M = 15.0
COST_CACHE_READ_PER_M = 0.3 // 10% of input
COST_CACHE_CREATE_PER_M = 3.75 // 125% of input
Sum of (tokens / 1M) * rate for each bucket, rounded to micros.
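As a worked sketch of that sum, assuming the rates are USD per million tokens (which the `(tokens / 1M) * rate` formula implies): a rate in USD per 1M tokens is numerically the same as micro-USD per token, so the per-bucket products are already in micros. `TurnUsage` and `costUsdMicros` as a free function are hypothetical names.

```typescript
const COST_INPUT_PER_M = 3.0;
const COST_OUTPUT_PER_M = 15.0;
const COST_CACHE_READ_PER_M = 0.3; // 10% of input
const COST_CACHE_CREATE_PER_M = 3.75; // 125% of input

interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}

function costUsdMicros(u: TurnUsage): number {
  // (tokens / 1e6) * rate USD, times 1e6 micros per USD, simplifies to tokens * rate.
  const micros =
    u.inputTokens * COST_INPUT_PER_M +
    u.outputTokens * COST_OUTPUT_PER_M +
    u.cacheReadTokens * COST_CACHE_READ_PER_M +
    u.cacheCreationTokens * COST_CACHE_CREATE_PER_M;
  return Math.round(micros);
}
```

For a turn with 2,000 uncached input tokens, 500 output tokens and 40,000 cache-read tokens, the buckets contribute 6,000 + 7,500 + 12,000 = 25,500 micros, i.e. $0.0255.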
Role-aware upsell
The composer footer (apps/web/src/components/agent/usage-footer.tsx) reads GET /agent/usage and shows:
- `spent / cap` for capped tiers.
- A subtle upgrade CTA when `percentUsed >= 0.8`.
- A blocking modal with localized copy when `agent_budget_exceeded` flows over SSE.
Member role never sees the composer; staff-only gate at the controller.
Prompt caching architecture
AgentContextBuilder.buildSystem returns three system blocks; the first two carry cache_control:
[
{ type: 'text', text: STATIC_SYSTEM_PROMPT, cache_control: { type: 'ephemeral', ttl: '1h' } },
{ type: 'text', text: orgContextBlock, cache_control: { type: 'ephemeral', ttl: '1h' } },
{ type: 'text', text: pageContextBlock /* no cache */ },
]
AgentOrchestrator.driveModelLoop calls withCachedTail(messages) before every request. It adds a cache_control: { type: 'ephemeral' } marker to the trailing message’s last content block. This is the 4th breakpoint (Anthropic allows up to 4):
[ static system ][ org ][ page+ambient ][ tools ][ ...conv history... ][ tail*cache ]
↑1h ↑1h no cache (implicit cache via tools array)
↑ marked
Result: every subsequent turn re-reads the entire prefix from cache at 10% of input cost. Cache-creation pays 125% the first time; payback after one re-use.
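A minimal sketch of what a tail-marking helper could look like, assuming messages are held as arrays of content blocks. The `Message` and `ContentBlock` types here are simplified stand-ins, and returning a copy rather than mutating the history is a design assumption, not something the source states.

```typescript
interface ContentBlock {
  type: string;
  text?: string;
  cache_control?: { type: 'ephemeral' };
}

interface Message {
  role: 'user' | 'assistant';
  content: ContentBlock[];
}

// Returns a copy whose last block of the last message carries the cache marker.
function withCachedTail(messages: Message[]): Message[] {
  const last = messages[messages.length - 1];
  if (!last || last.content.length === 0) return messages;
  const blocks = last.content.slice();
  const tail = blocks[blocks.length - 1];
  blocks[blocks.length - 1] = { ...tail, cache_control: { type: 'ephemeral' } };
  return [...messages.slice(0, -1), { ...last, content: blocks }];
}
```

Marking a copy keeps the stored history free of markers, so each request carries exactly one tail breakpoint on top of the system-block ones.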
Why ttl: '1h' rather than 5-min default:
- 5-min cache evicts during quiet afternoon windows.
- Re-creating after eviction costs 125% — worse than not caching at all on a single turn.
- 1h matches the natural cadence of static/org context changes.
Steady-state cache-hit ratio is computed in finalize:
cacheHitRatio = cacheReadTokens / (cacheReadTokens + inputTokens)
Healthy target: 0.85–0.95.
Observability
AgentObservabilityService emits typed events to Pino + Sentry breadcrumbs + PostHog (production only):
| Event | Stage |
|---|---|
| `agent.turn.started` | First emit at the controller. Carries `endpoint`, `promptLength`. |
| `agent.turn.completed` | `finalize`. Carries `inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheCreationTokens`, `costUsdMicros`, `cacheHitRatio`. |
| `agent.turn.failed` | `logTurnFailure`. Carries the classified `errorCode`. |
| `agent.tool.executed` | Per-tool. Carries `router`, `action`, `durationMs`, `outcome`, optional `errorCode`. |
| `agent.tool.confirmation_pending` | Per pending destructive tool. |
| `agent.tool.disambiguation_pending` | Per pending picker tool. |
| `agent.replay.violation` | On replay-integrity violation. Carries `violations` count, `reasons[]`, `repaired`. |
| `agent.storage.snapshot` | Daily cron at 03:00. Per-table row counts + bytes. |
scrub() drops content, text, prompt, message, output, input, token, apiKey, secret, password before fan-out. Tokens and durations stay; user text never leaves the server.
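A sketch of a key-based scrubber consistent with that description. Exact key matching is what lets `inputTokens` survive while `token` and `input` are dropped; the recursion into nested objects and arrays is an assumption about the real `scrub()`.

```typescript
const SCRUBBED_KEYS = new Set([
  'content', 'text', 'prompt', 'message', 'output',
  'input', 'token', 'apiKey', 'secret', 'password',
]);

// Recursively drop sensitive keys; primitives pass through unchanged.
function scrub(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => scrub(item));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      if (SCRUBBED_KEYS.has(key)) continue; // exact match only
      out[key] = scrub((value as Record<string, unknown>)[key]);
    }
    return out;
  }
  return value;
}
```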
Sentry tags: agent.trace_id, agent.org_id, agent.code, agent.endpoint. The trace id is a server-generated UUID and is surfaced back to the client on errors so QA can paste it into bug reports.
Error classification (error-classifier.ts)
Coarse codes for stable SSE / i18n surface:
- `agent_disabled`, `agent_budget_exceeded` (from rate-limit), `rate_limited`
- `replay_integrity_violation`, `tool_execution_not_found`, `tool_already_resolved`, `invalid_pick`
- `aborted` (AbortError)
- `provider_invalid_request` (4xx), `provider_unauthorized` (401/403), `provider_overloaded` (529 / type), `provider_rate_limited` (429), `provider_unavailable` (5xx)
- `internal` (catch-all)
Each maps to a localized client string under agent.errors.<code>.
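The provider branch of the classification might be ordered as in the sketch below, with the specific statuses checked before the generic 4xx/5xx ranges. The normalized `{ name, status, type }` input is a hypothetical shape, not the SDK's error class, and only the provider-related and fallback codes are covered.

```typescript
type AgentErrorCode =
  | 'aborted'
  | 'provider_invalid_request'
  | 'provider_unauthorized'
  | 'provider_overloaded'
  | 'provider_rate_limited'
  | 'provider_unavailable'
  | 'internal';

// Hypothetical normalized error shape extracted from whatever was thrown.
interface NormalizedError {
  name?: string;
  status?: number;
  type?: string;
}

function classifyError(err: NormalizedError): AgentErrorCode {
  if (err.name === 'AbortError') return 'aborted';
  const s = err.status;
  // Specific statuses first, so 529 and 429 are not swallowed by the ranges.
  if (s === 529 || err.type === 'overloaded_error') return 'provider_overloaded';
  if (s === 429) return 'provider_rate_limited';
  if (s === 401 || s === 403) return 'provider_unauthorized';
  if (s !== undefined && s >= 500) return 'provider_unavailable';
  if (s !== undefined && s >= 400) return 'provider_invalid_request';
  return 'internal';
}
```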
Ambient RAG (rag.service.ts)
On every user turn (skipped on resume), findAmbient({ orgId, query }):
- 800ms timeout, never throws.
- 30-character minimum query length (short prompts skip RAG).
- Up to 3 hits per scope, 5 total across scopes.
- Scopes: `program`, `workout`, `exercise`, `member`.
- Hybrid: Postgres `ilike` for lexical + Voyage `voyage-multilingual-2` embedding for semantic; results blended with an RRF-ish score.
- Token extraction strips stopwords (articles, verbs, calendar tokens, entity nouns) before lexical search.
Hits are injected as <ambient_context> in the page-context system block.
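To illustrate the blending and the hit caps, here is a sketch that uses textbook reciprocal rank fusion followed by the 3-per-scope / 5-total limits. The source only says the score is "RRF-ish", so the exact formula, the constant `RRF_K = 60`, and the names `fuse` and `capHits` are assumptions.

```typescript
type Scope = 'program' | 'workout' | 'exercise' | 'member';

interface Hit {
  scope: Scope;
  id: string;
}

const RRF_K = 60; // conventional RRF constant, assumed here

// Standard reciprocal rank fusion over the lexical and semantic rankings.
function fuse(lexical: Hit[], semantic: Hit[]): Hit[] {
  const scores = new Map<string, { hit: Hit; score: number }>();
  for (const list of [lexical, semantic]) {
    list.forEach((hit, rank) => {
      const key = `${hit.scope}:${hit.id}`;
      const entry = scores.get(key) ?? { hit, score: 0 };
      entry.score += 1 / (RRF_K + rank + 1);
      scores.set(key, entry);
    });
  }
  return Array.from(scores.values())
    .sort((a, b) => b.score - a.score)
    .map((e) => e.hit);
}

// Walk the ranked list, enforcing the per-scope and overall caps.
function capHits(ranked: Hit[], perScope = 3, total = 5): Hit[] {
  const counts = new Map<Scope, number>();
  const out: Hit[] = [];
  for (const hit of ranked) {
    if (out.length >= total) break;
    const used = counts.get(hit.scope) ?? 0;
    if (used >= perScope) continue;
    counts.set(hit.scope, used + 1);
    out.push(hit);
  }
  return out;
}
```

A hit that appears in both rankings accumulates two reciprocal-rank terms, which is why agreement between lexical and semantic search floats a candidate to the top.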
Compaction (compaction.service.ts)
- Trigger threshold: `>= 24` messages past the latest `system_note` anchor (or from start).
- Keep last 6 messages verbatim.
- Run `claude-haiku-4-5` summarizer (3s budget).
- Persist the summary as a `system_note` row with `pageContext.summarizedThroughMessageId`.
- `listMessagesForReplay` reads from the latest `system_note` forward.
- Title generation also uses Haiku, 1.5s budget, best-effort.
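The trigger and split can be sketched as a pure planning step, leaving the summarizer call and persistence out. `planCompaction` and `StoredMessage` are hypothetical names, and whether the previous summary is fed into the next one is not covered here.

```typescript
interface StoredMessage {
  id: string;
  role: 'user' | 'assistant' | 'system_note';
}

const COMPACT_THRESHOLD = 24;
const KEEP_VERBATIM = 6;

// Returns null when no compaction is due, else the slice to summarize and the tail to keep.
function planCompaction(
  messages: StoredMessage[],
): { toSummarize: StoredMessage[]; keep: StoredMessage[]; summarizedThroughMessageId: string } | null {
  let anchor = -1;
  messages.forEach((m, i) => {
    if (m.role === 'system_note') anchor = i; // latest anchor wins
  });
  const sinceAnchor = messages.slice(anchor + 1);
  if (sinceAnchor.length < COMPACT_THRESHOLD) return null;
  const toSummarize = sinceAnchor.slice(0, sinceAnchor.length - KEEP_VERBATIM);
  return {
    toSummarize,
    keep: sinceAnchor.slice(-KEEP_VERBATIM),
    summarizedThroughMessageId: toSummarize[toSummarize.length - 1].id,
  };
}
```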
Replay-integrity guard (replay-validator.ts)
Catches the dominant failure mode: a pending picker / confirmation tool_use whose tool_result was never paired (user typed a free-text reply instead of clicking).
validateReplay(messages) returns { ok, violations[] }. On violation:
- `validateAndRepairReplay(messages)` synthesizes error `tool_result` blocks for the orphaned `tool_use` ids so the model can reason about the redirect.
- `AgentObservabilityService.logReplayViolation` records the violation (PII-scrubbed snapshot of role + block types only).
- Loop continues with repaired messages.
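A sketch of the repair idea: walk the history, find `tool_use` ids that the following user message does not answer, and prepend synthetic error results to that message (or append a new user message if none follows). The block and message types are simplified stand-ins, the return shape and error text are made up for illustration, and the real validator may track more violation kinds.

```typescript
type Block =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string; content: string; is_error?: boolean };

interface Msg {
  role: 'user' | 'assistant';
  content: Block[];
}

function validateAndRepairReplay(messages: Msg[]): { messages: Msg[]; violations: string[] } {
  const out: Msg[] = [];
  const violations: string[] = [];
  let carry: Block[] = []; // synthetic results owed to the next user message

  messages.forEach((original, i) => {
    let msg = original;
    if (carry.length > 0) {
      if (msg.role === 'user') msg = { role: 'user', content: [...carry, ...msg.content] };
      else out.push({ role: 'user', content: carry });
      carry = [];
    }
    out.push(msg);
    if (msg.role !== 'assistant') return;

    const next = messages[i + 1];
    const answered = new Set<string>();
    if (next && next.role === 'user') {
      for (const b of next.content) if (b.type === 'tool_result') answered.add(b.tool_use_id);
    }
    for (const b of msg.content) {
      if (b.type !== 'tool_use' || answered.has(b.id)) continue;
      violations.push(b.id);
      carry.push({
        type: 'tool_result',
        tool_use_id: b.id,
        is_error: true,
        content: 'Unresolved: the user replied with free text instead.', // illustrative text
      });
    }
  });

  if (carry.length > 0) out.push({ role: 'user', content: carry });
  return { messages: out, violations };
}
```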
SSE protocol (agent-schemas/sse-events.ts)
event: conversation_started\ndata: {"type":"conversation_started","conversationId":"<uuid>"}\n\n
event: text_delta\ndata: {"type":"text_delta","delta":"..."}\n\n
event: tool_started\ndata: {"type":"tool_started","toolUseId":"...","router":"...","action":"...","input":{...}}\n\n
event: tool_completed\ndata: {"type":"tool_completed",..."ok":true,"output":{...},"inverseAvailable":true}\n\n
event: confirmation_pending\ndata: {"type":"confirmation_pending","toolUseId":"...","router":"...","action":"...","input":{...},"confirm":"destructive"}\n\n
event: disambiguation_pending\ndata: {"type":"disambiguation_pending","toolUseId":"...","kind":"member","prompt":"Which Saar?","candidates":[...]}\n\n
event: message_done\ndata: {"type":"message_done","messageId":"<uuid>","stopReason":"end_turn"}\n\n
event: done\ndata: {"type":"done","conversationId":"<uuid>","usage":{...}}\n\n
event: error\ndata: {"type":"error","code":"...","message":"...","traceId":"..."}\n\n
The web reducer in apps/web/src/providers/agent-provider.tsx (and apps/web/src/components/agent/message-list.tsx) reconstructs the conversation from these events.
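On the client side, frames of this shape can be split out of a streamed text buffer with a small parser. This is a sketch, not the reducer's actual code: `parseSseFrames` and `SseFrame` are hypothetical names, and it handles only the `event:` and `data:` fields used above.

```typescript
interface SseFrame {
  event: string;
  data: unknown;
}

// Splits complete frames off the buffer; the unterminated remainder is returned as `rest`.
function parseSseFrames(buffer: string): { frames: SseFrame[]; rest: string } {
  const parts = buffer.split('\n\n');
  const rest = parts.pop() ?? '';
  const frames: SseFrame[] = [];
  for (const part of parts) {
    let event = 'message';
    const dataLines: string[] = [];
    for (const line of part.split('\n')) {
      if (line.startsWith('event: ')) event = line.slice('event: '.length);
      else if (line.startsWith('data: ')) dataLines.push(line.slice('data: '.length));
    }
    if (dataLines.length > 0) {
      frames.push({ event, data: JSON.parse(dataLines.join('\n')) });
    }
  }
  return { frames, rest };
}
```

The caller would append each network chunk to `rest` and parse again, so a frame split across two chunks is only decoded once its blank-line terminator arrives.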
Frontend integration
- `agent-launcher.tsx`: floating button + global `Cmd-K` hotkey.
- `agent-sheet.tsx`: slide-in drawer (`<Sheet>`).
- `composer.tsx`: input box + page-context capture + usage footer.
- `message-list.tsx`: streaming text + tool-call cards + suspension cards.
- `tool-call-card.tsx`: per-tool result card with undo affordance when `inverseAvailable`.
- `suggested-prompts.tsx`: page-aware starter prompts.
- `conversation-list.tsx`: sidebar with past conversations.
- `streaming-text.tsx`: handles SSE delta buffering.
- `usage-footer.tsx`: daily $ counter + upsell.
- `agent-sheet-mount.tsx`: keeps the sheet mounted across route changes so streams survive navigation.
Locale
- `locale` is read from the `x-locale` header (set by the web client) or `accept-language`, defaulting to `en`.
- The agent’s static prompt instructs “Reply in the caller’s preferred locale unless they switch.”
- All i18n keys for agent UI live under `agent.*` in `apps/web/src/i18n/{en,he,ru}.json`.
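The header precedence can be sketched as follows. Restricting the result to `en`, `he` and `ru` (taken from the i18n file names), falling through to `accept-language` when `x-locale` is unsupported, and ignoring q-values are all assumptions; `resolveLocale` is a hypothetical name.

```typescript
const SUPPORTED = ['en', 'he', 'ru'] as const; // assumed from the i18n file names
type Locale = typeof SUPPORTED[number];

// x-locale wins, then accept-language entries in listed order, then 'en'.
function resolveLocale(headers: Record<string, string | undefined>): Locale {
  const candidates = [
    headers['x-locale'],
    ...(headers['accept-language'] ?? '').split(','),
  ];
  for (const raw of candidates) {
    // "ru-RU;q=0.9" -> "ru": drop the q-value, then the region subtag.
    const tag = (raw ?? '').split(';')[0].trim().toLowerCase().split('-')[0];
    const match = SUPPORTED.find((l) => l === tag);
    if (match) return match;
  }
  return 'en';
}
```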
Localization of system prompt instructions
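- Intent signals embedded for Hebrew: `סשן` (session), `מפגש` (meeting), `שיעור` (class), `לו״ז` (schedule), `כל החברים` (all the members), so prompts like “כל החברים בתכנית” (“all the members in the program”) route correctly to the schedule path.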
Failure modes — agent-specific
| Failure | Surface | Recovery |
|---|---|---|
| `ANTHROPIC_API_KEY` unset | `agent_disabled` SSE error | Set env. |
| Org over daily $ cap | agent_budget_exceeded SSE error | Wait for UTC midnight or upgrade tier. |
| Picker / confirm sent twice for the same `toolUseId` | `tool_already_resolved` | UI disables the action card after first submit. |
| Free-text reply instead of clicking | replay-integrity guard repairs synthetically | User retries the prompt; agent re-asks. |
| Tool throws | tool-runner converts to typed { code, message, hint } tool_result | Model retries or apologizes; no run abort. |
| Anthropic 529 overloaded | provider_overloaded SSE error | Retry in a moment. |
| Anthropic 429 | provider_rate_limited | Retry. |
| User closes the dashboard mid-stream | AbortController cancels the Anthropic stream | Nothing to clean up: the assistant row carries whatever streamed before the abort. |