Living documentation — last reviewed 2026-05-28

Spotter Agent — Behavior

Surface

| Endpoint | Purpose |
| --- | --- |
| GET /organizations/:orgId/agent/usage | Org daily AI spend snapshot (tier, cap, spent, percent, reset). |
| GET /organizations/:orgId/agent/conversations | List caller’s conversations in this org. |
| GET /organizations/:orgId/agent/conversations/:conversationId | Full conversation detail with reconstructed messages + tool calls. |
| POST /organizations/:orgId/agent/messages | Send a user turn. Returns an SSE stream. |
| POST /organizations/:orgId/agent/conversations/:conversationId/confirm/:toolUseId | Approve / reject a pending destructive tool. Returns an SSE stream. |
| POST /organizations/:orgId/agent/conversations/:conversationId/pick/:toolUseId | Resolve a disambiguation_pending candidate pick. Returns an SSE stream. |
| POST /organizations/:orgId/agent/conversations/:conversationId/undo/:toolUseId | Invoke the declared inverse of a previously-executed write. Bypasses confirmation; logs both events. |

All endpoints require a Clerk session and active org membership. The send, confirm, and pick endpoints additionally reject the member role (staff-only in v1).

State machine

    new user message
    ┌─────────────────────────────────┐
    │  conversation_started (if new)  │
    └─────────────────────────────────┘
    preCheck org daily $ cap
         ┌──────┴──────┐
         │             │
     over cap          ok
         │             │
         ▼             ▼
    error+done  persist user turn
                ambient RAG (800ms budget)
                compactIfNeeded (3s budget)
    ┌─────────── driveModelLoop (≤ 6 turns) ───────────┐
    │                                                  │
    │  validateReplay → repair if needed               │
    │  withCachedTail (4th cache breakpoint)           │
    │  anthropic.messages.stream                       │
    │  emit text_delta                                 │
    │  persist assistant message                       │
    │                                                  │
    │  stop_reason == end_turn?                        │
    │    yes → exit loop                               │
    │    no → processToolUses                          │
    │               ↓                                  │
    │      ┌────────┼────────┐                         │
    │      │        │        │                         │
    │   normal user_picker destructive                 │
    │   (run    (suspend)   (suspend)                  │
    │   inline)                                        │
    │      │        │        │                         │
    │      └────────┴────────┘                         │
    │               ↓                                  │
    │  ↓ all normal: tool_result blocks → next turn    │
    │  ↓ any picker/destructive: suspend → done        │
    └──────────────────────────────────────────────────┘
    finalize: usage + cost + recordTurn + observability + done

Suspended runs

When a user_picker tool is invoked, or when a destructive write is pending, the orchestrator stops the loop and emits:

  • disambiguation_pending with kind, prompt, candidates[].
  • confirmation_pending with router, action, input, confirm: 'destructive' | 'always'.

The frontend renders a card; a user click triggers POST .../pick/:toolUseId or POST .../confirm/:toolUseId. The new SSE stream resumes the loop by merging all completed tool_results into a single tool_result message and calling driveModelLoop again with the updated history.

Any tools batched after the suspending tool (within the same assistant turn) are also persisted as pending so they can be confirmed sequentially.

MAX_TURNS

MAX_TURNS = 6 per controller call. After six round trips the loop exits; the assistant’s last message stands.

Routers and actions

The tool surface lives in apps/api/src/ai/agent/tools/leaves/*.tools.ts. Each @AgentTool decorator declares:

  • router, action, summary (model-visible).
  • inputSchema (Zod) — re-validated server-side via meta.inputSchema.parse(rawInput).
  • scopes: { roles: AgentRole[] } — RBAC filter at registry-build time + defence-in-depth check in tool-runner.
  • sideEffects: 'read' | 'write' — drives audit and inverse semantics.
  • confirm: 'never' | 'destructive' | 'always' — gates suspension.
  • audit?: { resource, actionLabel }. Writes to audit_logs after the tool succeeds.
  • inverse?: { router, action, buildInput(output) }. Enables undo.
  • kind?: 'normal' | 'user_picker'. user_picker handlers are never invoked; the orchestrator suspends and waits for a pick.
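The field list above can be read as a TypeScript shape. The following is a minimal sketch, not the project's actual decorator types: the role names, the `AgentToolMeta` interface, and the helpers `suspendsLoop` and `roleAllowed` are all hypothetical, and `inputSchema` is modeled structurally (anything with a `parse` method) so the example runs without Zod.

```typescript
// Hypothetical shapes inferred from the field list above; the real decorator types may differ.
type AgentRole = "owner" | "admin" | "coach" | "member"; // assumed role names
type ConfirmMode = "never" | "destructive" | "always";

interface AgentToolMeta<TInput, TOutput> {
  router: string;
  action: string;
  summary: string;
  // Structural stand-in for a Zod schema: anything exposing parse() fits.
  inputSchema: { parse(raw: unknown): TInput };
  scopes: { roles: AgentRole[] };
  sideEffects: "read" | "write";
  confirm: ConfirmMode;
  audit?: { resource: string; actionLabel: string };
  inverse?: { router: string; action: string; buildInput(output: TOutput): unknown };
  kind?: "normal" | "user_picker";
}

// The loop suspends for pickers and for any confirm mode other than 'never'.
function suspendsLoop(meta: { confirm: ConfirmMode; kind?: "normal" | "user_picker" }): boolean {
  return meta.kind === "user_picker" || meta.confirm !== "never";
}

// Defence-in-depth role check, as a tool runner might repeat it at execution time.
function roleAllowed(meta: { scopes: { roles: AgentRole[] } }, role: AgentRole): boolean {
  return meta.scopes.roles.includes(role);
}

// Example metadata for a destructive write (values are illustrative).
const workoutsDelete: AgentToolMeta<{ id: string }, { id: string }> = {
  router: "workouts",
  action: "delete",
  summary: "Soft-delete a workout.",
  inputSchema: {
    parse(raw) {
      const id = (raw as { id?: unknown } | null)?.id;
      if (typeof id !== "string") throw new Error("id must be a string");
      return { id };
    },
  },
  scopes: { roles: ["owner", "admin", "coach"] },
  sideEffects: "write",
  confirm: "destructive",
  audit: { resource: "workout", actionLabel: "workout.delete" },
};
```

Keeping the suspension decision a pure function of the metadata means the registry filter and the runner can share one rule.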

Routers (apps/api/src/ai/agent/tools/leaves/<file>.tools.ts):

read — 11 actions, all read, never confirm

| Action | Effect | Example prompt |
| --- | --- | --- |
| get_current_context | Returns org + caller identity + locale + now. | “who am I?” |
| programs_list | List active programs. | “what programs do we run?” |
| members_search | Search members by name/email/role/status. | “find Saar” |
| members_get | Full member detail by membershipId. | “show me Dani’s plan + payment status” |
| exercises_search | Hybrid (lexical + semantic) exercise library search. | “find a squat variation” |
| exercises_resolve_batch | Batch-resolve movement names → exercise ids in one round-trip; flags ambiguous. | (used internally by workout build) |
| ask_user_to_pick | user_picker kind. Suspends; user picks one candidate. | “which Saar?” |
| search_anything | Cross-entity (program/workout/exercise/member). | “find Murph” |
| lookup_by_id | Detail by type+id. | (post-search) |
| workout_comments_in_program | List recent comments on workouts in a program (unreadOnly filter). | “any new comments in coaching?” |
| workout_comments_for_assignment | Full comment thread on a single assignment. | “what did Dani say on Monday’s WOD?” |

workouts — 4 actions

| Action | Effect | Confirm | Inverse |
| --- | --- | --- | --- |
| create | Insert a workout (freeform or structured). | never | workouts.delete |
| update | Patch a workout. | never | |
| set_sections | Replace all sections + movements wholesale. | never | |
| delete | Soft-delete. | destructive | |

Example prompt: “build a workout: 3 rounds — 200m run, 30 double-unders”.

programs — 4 actions

| Action | Effect | Confirm | Inverse |
| --- | --- | --- | --- |
| create | New training program. | never | programs.delete |
| update | Patch. | never | |
| delete | Soft-delete. | destructive | |
| enroll_member | Enroll a member in a coaching/feed program. | never | |

assignments — 8 actions

| Action | Effect | Confirm |
| --- | --- | --- |
| assign_personal | 1-on-1 PT assignment for a date. | never |
| update | Patch assignment. | never |
| set_published | Toggle published flag. | never |
| delete | Single assignment. | destructive |
| mark_comments_read | Mark comments as read for a coach. | never |
| bulk_preview | Dry-run for bulk ops. | never (read) |
| bulk_delete | Bulk delete N assignments. | destructive |
| bulk_publish | Bulk publish N assignments. | always |

bookings — 3 actions, all read

| Action | Effect |
| --- | --- |
| list_mine | Caller’s bookings (members); admin-readable. |
| attendance_summary | Summary stats. |
| attendance_trend | Time-series. |

class_sessions — 9 actions

| Action | Effect | Confirm | Inverse |
| --- | --- | --- | --- |
| list_for_date | Sessions on a date. | never | |
| create | New class session. | never | class_sessions.cancel |
| update | Patch session. | never | |
| publish | Publish. | never | class_sessions.unpublish |
| unpublish | Unpublish. | never | |
| cancel | Cancel. | destructive | |
| bulk_preview | Dry-run for bulk ops. | never (read) | |
| bulk_delete | Bulk delete. | destructive | |
| bulk_publish | Bulk publish. | always | |

Example prompt: “schedule open gym tomorrow 13:00–14:00 published”.

class_types — 1 action

| Action | Effect |
| --- | --- |
| list_for_program | Class types under a program. |

analytics — 12 actions, all read

| Action | Wraps |
| --- | --- |
| revenue_summary | AnalyticsService.getRevenueSummary |
| revenue_trend | getRevenueTrend |
| members_summary | getMembersSummary |
| members_growth | getMembersGrowth |
| at_risk_members | getAtRiskMembers |
| popular_classes | getPopularClasses |
| class_utilization | getClassUtilization |
| coach_overview | getCoachOverview |
| org_insights | InsightsService.getOrgInsights |
| plan_distribution | getPlanDistribution |
| members_activation | getMembersActivation |
| workouts_summary | getWorkoutsSummary |

tasks — 5 actions

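| Action | Effect | Confirm | Inverse |
| --- | --- | --- | --- |
| list | List tasks. | never (read) | |
| create | New task. | never | tasks.delete |
| update | Patch. | never | |
| complete | Quick-complete. | never | |
| delete | Soft delete. | destructive | |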

program_templates — 4 actions

| Action | Effect | Confirm |
| --- | --- | --- |
| list | List templates. | never (read) |
| create | New template. | never |
| apply | Materialize a template into the org. | always |
| from_history | Derive a template from historical workouts. | always |

forms — 2 actions, both read (FIT-176 / FIT-184)

| Action | Effect |
| --- | --- |
| list_pending_for_org | Pending form assignments. |
| compliance_status_for_member | Per-typeKey compliance status (FIT-158). |

Confirmation flow

  1. Model emits tool_use block for a tool with confirm: 'destructive' or 'always'.
  2. Orchestrator inserts ai_tool_executions row with status: 'pending'.
  3. Orchestrator emits SSE confirmation_pending with { toolUseId, router, action, input, confirm }.
  4. Any subsequent tool_use blocks in the same assistant message are also persisted pending (they can’t be partially fed back).
  5. Loop suspends; controller emits done with usage 0.
  6. User clicks Approve or Reject in the frontend → POST /confirm/:toolUseId { approved: boolean }.
  7. If approved, runner executes the tool; status transitions pending → succeeded | failed. tool_completed SSE event emitted.
  8. If rejected, status → rejected_by_user with synthetic error { code: 'rejected_by_user' }.
  9. If other pending tools remain in the same assistant message, the resume returns done early (waiting for more confirmations).
  10. Once all pending tools for the message are resolved, the orchestrator appends a unified tool_result user message and resumes the model loop.

The same merge-and-resume path handles disambiguation_pending picks.
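Steps 9 and 10 amount to a small decision over the batch of tools that belong to one assistant message. The sketch below illustrates that decision under assumptions: `PendingTool`, `ResumeDecision`, and `decideResume` are hypothetical names, and the shape of the merged tool_result entries is simplified.

```typescript
type ToolStatus = "pending" | "succeeded" | "failed" | "rejected_by_user";

interface PendingTool {
  toolUseId: string;
  status: ToolStatus;
  output?: unknown;
}

type ResumeDecision =
  // Some tools still await a click: emit done and wait for more confirmations.
  | { kind: "wait"; remaining: string[] }
  // Everything is resolved: feed one merged tool_result message back to the model.
  | { kind: "resume"; toolResults: { tool_use_id: string; is_error: boolean; content: string }[] };

function decideResume(batch: PendingTool[]): ResumeDecision {
  const remaining = batch.filter((t) => t.status === "pending").map((t) => t.toolUseId);
  if (remaining.length > 0) return { kind: "wait", remaining };
  return {
    kind: "resume",
    toolResults: batch.map((t) => ({
      tool_use_id: t.toolUseId,
      is_error: t.status !== "succeeded",
      content:
        t.status === "rejected_by_user"
          ? JSON.stringify({ code: "rejected_by_user" }) // synthetic error from step 8
          : JSON.stringify(t.output ?? null),
    })),
  };
}
```

Deciding over the whole batch at once reflects the constraint in step 4: results for one assistant message cannot be fed back partially.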

Undo flow

POST .../undo/:toolUseId resolves the inverse tool of a previously-succeeded write:

  1. Load ai_tool_executions by (conversationId, toolUseId).
  2. Reject if not status: 'succeeded' (422).
  3. Look up the registry entry; require that meta.inverse is defined.
  4. Build the inverse input via inverse.buildInput(original.output) — typically { id: output.id }.
  5. Execute via ToolRunnerService.execute — bypasses confirmation entirely.
  6. Audit log captures both events: the original and the inverse, linked by their audit rows.

Example: workouts.create returns { id, ... }; the declared inverse is workouts.delete with buildInput: (out) => ({ id: out.id }). A user toast “Workout created — Undo (15s)” calls the undo endpoint; the workout is soft-deleted.
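The undo steps can be sketched as a pure planning function. This is an illustration under assumptions, not the controller's code: `inverseRegistry`, `UndoPlan`, and `planUndo` are hypothetical, and the real flow executes the planned call through ToolRunnerService.execute.

```typescript
interface Execution {
  toolUseId: string;
  router: string;
  action: string;
  status: string;
  output: { id: string };
}

interface InverseSpec {
  router: string;
  action: string;
  buildInput(output: { id: string }): unknown;
}

// Hypothetical registry keyed by "router.action"; only one entry shown.
const inverseRegistry: Record<string, InverseSpec | undefined> = {
  "workouts.create": { router: "workouts", action: "delete", buildInput: (out) => ({ id: out.id }) },
};

type UndoPlan =
  | { ok: true; router: string; action: string; input: unknown }
  // The source maps "not_succeeded" to HTTP 422; the status for "no_inverse" is not stated.
  | { ok: false; reason: "not_succeeded" | "no_inverse" };

function planUndo(exec: Execution): UndoPlan {
  if (exec.status !== "succeeded") return { ok: false, reason: "not_succeeded" };
  const inverse = inverseRegistry[`${exec.router}.${exec.action}`];
  if (!inverse) return { ok: false, reason: "no_inverse" };
  // The inverse input is derived from the original tool's output.
  return { ok: true, router: inverse.router, action: inverse.action, input: inverse.buildInput(exec.output) };
}
```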

Disambiguation flow (read.ask_user_to_pick)

Triggered by the system prompt when ambient context lists multiple plausible candidates for the same entity:

  1. Model emits read.ask_user_to_pick { kind, prompt, candidates: [{ id, label, sublabel?, detail? }, ...] }.
  2. Orchestrator sees meta.kind === 'user_picker', persists pending, emits disambiguation_pending.
  3. Frontend renders a “Which Saar?” card with 2-8 buttons.
  4. User clicks → POST .../pick/:toolUseId { id: <pickedId> }.
  5. Orchestrator validates the id is in candidates, persists { pickedId, pickedLabel, kind } as the tool output, emits tool_completed, and resumes the model loop.
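Step 5 can be illustrated with a short validation sketch. The names `PickCandidate`, `PendingPick`, and `resolvePick` are hypothetical, and mapping an unknown id to the `invalid_pick` code is an assumption based on the error-code list later in this page.

```typescript
interface PickCandidate {
  id: string;
  label: string;
  sublabel?: string;
  detail?: string;
}

interface PendingPick {
  kind: string;
  prompt: string;
  candidates: PickCandidate[];
}

type PickResult =
  | { ok: true; output: { pickedId: string; pickedLabel: string; kind: string } }
  | { ok: false; code: "invalid_pick" };

function resolvePick(pending: PendingPick, pickedId: string): PickResult {
  // Only ids that were actually offered to the user are accepted.
  const hit = pending.candidates.find((c) => c.id === pickedId);
  if (!hit) return { ok: false, code: "invalid_pick" };
  return { ok: true, output: { pickedId: hit.id, pickedLabel: hit.label, kind: pending.kind } };
}
```

Validating against the persisted candidate list, rather than trusting the posted id, keeps a stale or tampered client from steering the model toward an entity it never offered.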

Audit logging

ToolRunnerService.execute writes an audit_logs row when:

  • meta.audit is defined, AND
  • meta.sideEffects === 'write', AND
  • The tool succeeded.

Row carries:

  • actor_clerk_id = caller.
  • action = meta.audit.actionLabel.
  • resource = meta.audit.resource.
  • resource_id = extracted from output id field if present.
  • metadata.agent: true so admin views filter agent vs human.

ai_tool_executions.audit_log_id links the tool run back to its audit row.
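The three write conditions and the row fields above can be sketched together. This is a hedged illustration: `AuditMeta`, `AuditRow`, and `buildAuditRow` are hypothetical names, and the way the id is extracted from the output is an assumption.

```typescript
interface AuditMeta {
  sideEffects: "read" | "write";
  audit?: { resource: string; actionLabel: string };
}

interface AuditRow {
  actor_clerk_id: string;
  action: string;
  resource: string;
  resource_id: string | null;
  metadata: { agent: true };
}

// Returns the row to insert, or null when any of the three conditions fails.
function buildAuditRow(meta: AuditMeta, callerClerkId: string, succeeded: boolean, output: unknown): AuditRow | null {
  if (!meta.audit || meta.sideEffects !== "write" || !succeeded) return null;
  // Assumed extraction rule: use output.id when the output is an object that has one.
  const resourceId =
    typeof output === "object" && output !== null && "id" in output
      ? String((output as { id: unknown }).id)
      : null;
  return {
    actor_clerk_id: callerClerkId,
    action: meta.audit.actionLabel,
    resource: meta.audit.resource,
    resource_id: resourceId,
    metadata: { agent: true },
  };
}
```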

Per-org daily $ cap

Lives in apps/api/src/ai/agent/rate-limit.service.ts and is sourced from PLATFORM_TIER_MAP (libs/shared/src/lib/constants/platform-tiers.ts):

| Tier | Daily budget (USD) |
| --- | --- |
| Lite | $1.00 (1_000_000 micros) |
| Pro | $5.00 |
| Elite | $25.00 |

AgentRateLimitService.preCheck(orgId):

  • Read org → tier → aiDailyBudgetUsdMicros.
  • Budget -1 → unmetered, allow.
  • Else compare to costTracker.getOrgSpendTodayMicros(orgId) (UTC day bucket).
  • If spent >= budget, return { ok: false, code: 'agent_budget_exceeded', message: 'Your org has reached its AI daily spending limit ($X.XX). It resets at 00:00 UTC. Upgrade your plan for a higher limit.', remainingUsdMicros: 0 }.
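The pre-check logic above reduces to a comparison of two micro-dollar amounts. The sketch below assumes the budget and today's spend have already been loaded; `preCheckBudget` and the shape of the success result are hypothetical, while the failure payload follows the fields listed above.

```typescript
type PreCheckResult =
  // Assumed success shape; null means the tier is unmetered.
  | { ok: true; remainingUsdMicros: number | null }
  | { ok: false; code: "agent_budget_exceeded"; message: string; remainingUsdMicros: 0 };

function preCheckBudget(budgetUsdMicros: number, spentTodayMicros: number): PreCheckResult {
  // A budget of -1 marks an unmetered tier.
  if (budgetUsdMicros === -1) return { ok: true, remainingUsdMicros: null };
  if (spentTodayMicros >= budgetUsdMicros) {
    const cap = (budgetUsdMicros / 1_000_000).toFixed(2);
    return {
      ok: false,
      code: "agent_budget_exceeded",
      message: `Your org has reached its AI daily spending limit ($${cap}). It resets at 00:00 UTC. Upgrade your plan for a higher limit.`,
      remainingUsdMicros: 0,
    };
  }
  return { ok: true, remainingUsdMicros: budgetUsdMicros - spentTodayMicros };
}
```

For example, a Pro org (5_000_000 micros) that has spent 400_000 micros today passes with 4_600_000 micros remaining.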

AgentController.sendMessage invokes preCheck after SSE headers flush so the client can render the upgrade prompt inline.

AgentRateLimitService.getUsageSnapshot(orgId) returns the payload that the dashboard composer footer reads to display inline counters such as “$0.40 / $5.00”.

Per-org/user/day cost tracking

AgentCostTracker.recordTurn upserts into ai_usage_daily keyed on (organization_id, user_id, day) with SQL increments so concurrent turns don’t lose updates. Day key is UTC YYYY-MM-DD.

AgentOrchestrator.finalize computes costUsdMicros per turn:

    COST_INPUT_PER_M = 3.0
    COST_OUTPUT_PER_M = 15.0
    COST_CACHE_READ_PER_M = 0.3      // 10% of input
    COST_CACHE_CREATE_PER_M = 3.75   // 125% of input

Sum of (tokens / 1M) * rate for each bucket, rounded to micros.
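As a worked example of that sum, the sketch below uses the four rates quoted above. The `TurnUsage` shape and the `costUsdMicros` function are hypothetical, and rounding once at the end (rather than per bucket) is an assumption.

```typescript
const COST_INPUT_PER_M = 3.0;
const COST_OUTPUT_PER_M = 15.0;
const COST_CACHE_READ_PER_M = 0.3; // 10% of input
const COST_CACHE_CREATE_PER_M = 3.75; // 125% of input

interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}

function costUsdMicros(u: TurnUsage): number {
  // (tokens / 1M) * rate is in USD; multiplying by 1M for micros cancels the division,
  // so USD per million tokens is numerically micro-USD per token.
  const micros =
    u.inputTokens * COST_INPUT_PER_M +
    u.outputTokens * COST_OUTPUT_PER_M +
    u.cacheReadTokens * COST_CACHE_READ_PER_M +
    u.cacheCreationTokens * COST_CACHE_CREATE_PER_M;
  return Math.round(micros);
}
```

A turn with 2,000 uncached input tokens, 500 output tokens, and 40,000 cache-read tokens costs 6,000 + 7,500 + 12,000 = 25,500 micros, or about $0.0255.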

Role-aware upsell

The composer footer (apps/web/src/components/agent/usage-footer.tsx) reads GET /agent/usage and shows:

  • spent / cap for capped tiers.
  • A subtle upgrade CTA when percentUsed >= 0.8.
  • A blocking modal with localized copy when agent_budget_exceeded flows over SSE.

Member role never sees the composer; staff-only gate at the controller.

Prompt caching architecture

AgentContextBuilder.buildSystem returns three system blocks with cache_control:

    [
      { type: 'text', text: STATIC_SYSTEM_PROMPT, cache_control: { type: 'ephemeral', ttl: '1h' } },
      { type: 'text', text: orgContextBlock, cache_control: { type: 'ephemeral', ttl: '1h' } },
      { type: 'text', text: pageContextBlock /* no cache */ },
    ]

AgentOrchestrator.driveModelLoop calls withCachedTail(messages) before every request — adds a cache_control: { type: 'ephemeral' } marker to the trailing message’s last content block. This is the 4th breakpoint (Anthropic allows up to 4):

    [ static system ][ org ][ page+ambient ][ tools ][ ...conv history... ][ tail*cache ]
          ↑1h         ↑1h      no cache  (implicit cache via tools array)   ↑ marked

Result: every subsequent turn re-reads the entire prefix from cache at 10% of input cost. Cache-creation pays 125% the first time; payback after one re-use.
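The tail marker described above can be sketched as a non-mutating helper. This is an illustration, not the project's withCachedTail: the `Block` and `ChatMessage` shapes are simplified assumptions.

```typescript
interface Block {
  type: string;
  text?: string;
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
}

interface ChatMessage {
  role: "user" | "assistant";
  content: Block[];
}

// Returns a copy of the history whose trailing message's last block carries a cache marker.
function withCachedTail(messages: ChatMessage[]): ChatMessage[] {
  if (messages.length === 0) return messages;
  const last = messages[messages.length - 1];
  if (last.content.length === 0) return messages;
  const blocks = last.content.map((block, i) =>
    i === last.content.length - 1 ? { ...block, cache_control: { type: "ephemeral" as const } } : block,
  );
  return [...messages.slice(0, -1), { ...last, content: blocks }];
}
```

Because the input is never mutated, markers do not accumulate on older messages from turn to turn, which helps keep the request within the four-breakpoint limit.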

Why ttl: '1h' rather than 5-min default:

  • 5-min cache evicts during quiet afternoon windows.
  • Re-creating after eviction costs 125% — worse than not caching at all on a single turn.
  • 1h matches the natural cadence of static/org context changes.

Steady-state cache-hit ratio is computed in finalize:

cacheHitRatio = cacheReadTokens / (cacheReadTokens + inputTokens)

Healthy target: 0.85–0.95.
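The ratio is a one-line computation. The sketch below adds a zero-denominator guard, which is an assumption; the source formula does not say how an empty turn is handled.

```typescript
function cacheHitRatio(cacheReadTokens: number, inputTokens: number): number {
  const denominator = cacheReadTokens + inputTokens;
  // Assumed guard: report 0 rather than NaN when no input tokens were counted.
  return denominator === 0 ? 0 : cacheReadTokens / denominator;
}
```

With 9,000 tokens read from cache and 1,000 uncached input tokens the ratio is 0.9, inside the healthy band.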

Observability

AgentObservabilityService emits typed events to Pino + Sentry breadcrumbs + PostHog (production only):

| Event | Stage |
| --- | --- |
| agent.turn.started | First emit at the controller. Carries endpoint, promptLength. |
| agent.turn.completed | finalize. Carries inputTokens, outputTokens, cacheReadTokens, cacheCreationTokens, costUsdMicros, cacheHitRatio. |
| agent.turn.failed | logTurnFailure. Carries the classified errorCode. |
| agent.tool.executed | Per-tool. Carries router, action, durationMs, outcome, optional errorCode. |
| agent.tool.confirmation_pending | Per pending destructive tool. |
| agent.tool.disambiguation_pending | Per pending picker tool. |
| agent.replay.violation | On replay-integrity violation. Carries violations count, reasons[], repaired. |
| agent.storage.snapshot | Daily cron at 03:00. Per-table row counts + bytes. |

scrub() drops content, text, prompt, message, output, input, token, apiKey, secret, password before fan-out. Tokens and durations stay; user text never leaves the server.
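A scrubber of this kind can be sketched as a recursive key filter. Whether the real scrub() recurses into nested objects, and whether it matches keys exactly, are assumptions here; exact matching is what lets counters such as inputTokens survive while a key named token is dropped.

```typescript
const SCRUBBED_KEYS = new Set([
  "content", "text", "prompt", "message", "output",
  "input", "token", "apiKey", "secret", "password",
]);

// Returns a deep copy of the value with every scrubbed key removed.
function scrub(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => scrub(item));
  if (value !== null && typeof value === "object") {
    const cleaned: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (!SCRUBBED_KEYS.has(key)) cleaned[key] = scrub(inner);
    }
    return cleaned;
  }
  return value; // primitives pass through unchanged
}
```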

Sentry tags: agent.trace_id, agent.org_id, agent.code, agent.endpoint. The trace id is a server-generated UUID and surfaces back to the client on errors so QA can paste it into bug reports.

Error classification (error-classifier.ts)

Coarse codes for stable SSE / i18n surface:

  • agent_disabled, agent_budget_exceeded (from rate-limit), rate_limited
  • replay_integrity_violation, tool_execution_not_found, tool_already_resolved, invalid_pick
  • aborted (AbortError)
  • provider_invalid_request (4xx), provider_unauthorized (401/403), provider_overloaded (529 / type), provider_rate_limited (429), provider_unavailable (5xx)
  • internal (catch-all)

Each maps to a localized client string under agent.errors.<code>.
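The provider-facing part of that mapping depends on check order, since 401, 403, and 429 are also 4xx and 529 is also 5xx. The sketch below is a hypothetical classifier covering only the status-based codes plus aborted and internal; the `{ name, status }` error shape is an assumption.

```typescript
type ClassifiedCode =
  | "aborted"
  | "provider_unauthorized"
  | "provider_rate_limited"
  | "provider_overloaded"
  | "provider_unavailable"
  | "provider_invalid_request"
  | "internal";

function classifyError(err: { name?: string; status?: number }): ClassifiedCode {
  if (err.name === "AbortError") return "aborted";
  const status = err.status;
  if (status === undefined) return "internal";
  // Specific statuses first, then the broad 5xx and 4xx ranges.
  if (status === 401 || status === 403) return "provider_unauthorized";
  if (status === 429) return "provider_rate_limited";
  if (status === 529) return "provider_overloaded";
  if (status >= 500) return "provider_unavailable";
  if (status >= 400) return "provider_invalid_request";
  return "internal";
}
```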

Ambient RAG (rag.service.ts)

On every user turn (skipped on resume), findAmbient({ orgId, query }):

  • 800ms timeout, never throws.
  • 30-character minimum query length (short prompts skip RAG).
  • Up to 3 hits per scope, 5 total across scopes.
  • Scopes: program, workout, exercise, member.
  • Hybrid: Postgres ilike for lexical + Voyage voyage-multilingual-2 embedding for semantic; results are blended with a score modeled loosely on reciprocal rank fusion (RRF).
  • Token extraction strips stopwords (articles, verbs, calendar tokens, entity nouns) before lexical search.

Hits are injected as <ambient_context> in the page-context system block.
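The blending and the two caps (3 per scope, 5 total) can be illustrated with textbook reciprocal rank fusion. The source only says the score is RRF-like, so the exact formula, the constant `RRF_K = 60`, and the names `Hit` and `blendAmbient` are assumptions.

```typescript
type Scope = "program" | "workout" | "exercise" | "member";

interface Hit {
  scope: Scope;
  id: string;
}

const RRF_K = 60; // conventional RRF constant; assumed, not taken from the source

function blendAmbient(lexical: Hit[], semantic: Hit[], perScope = 3, total = 5): Hit[] {
  const scores = new Map<string, { hit: Hit; score: number }>();
  for (const list of [lexical, semantic]) {
    list.forEach((hit, rank) => {
      const key = `${hit.scope}:${hit.id}`;
      const entry = scores.get(key) ?? { hit, score: 0 };
      entry.score += 1 / (RRF_K + rank + 1); // rank is 0-based, RRF ranks start at 1
      scores.set(key, entry);
    });
  }
  const ranked = Array.from(scores.values()).sort((a, b) => b.score - a.score);

  const usedPerScope: Partial<Record<Scope, number>> = {};
  const selected: Hit[] = [];
  for (const { hit } of ranked) {
    const used = usedPerScope[hit.scope] ?? 0;
    if (used >= perScope) continue; // per-scope cap
    usedPerScope[hit.scope] = used + 1;
    selected.push(hit);
    if (selected.length === total) break; // overall cap
  }
  return selected;
}
```

A hit that appears in both lists accumulates two reciprocal-rank terms, so agreement between lexical and semantic search outranks a single strong match.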

Compaction (compaction.service.ts)

  • Trigger threshold: >= 24 messages past the latest system_note anchor (or from start).
  • Keep last 6 messages verbatim.
  • Run claude-haiku-4-5 summarizer (3s budget).
  • Persist the summary as a system_note row with pageContext.summarizedThroughMessageId.
  • listMessagesForReplay reads from the latest system_note forward.
  • Title generation also uses Haiku, 1.5s budget, best-effort.
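The trigger and the keep-last-6 rule can be sketched as a planning step that runs before the summarizer. `StoredMessage` and `planCompaction` are hypothetical names, and whether the previous summary is also fed to the summarizer is not covered here.

```typescript
const COMPACT_THRESHOLD = 24;
const KEEP_VERBATIM = 6;

interface StoredMessage {
  id: string;
  role: "user" | "assistant" | "system_note";
}

// Returns what to summarize and what to keep, or null when no compaction is due.
function planCompaction(
  messages: StoredMessage[],
): { toSummarize: StoredMessage[]; keep: StoredMessage[] } | null {
  // Only messages after the latest system_note anchor count toward the threshold.
  let anchor = -1;
  messages.forEach((m, i) => {
    if (m.role === "system_note") anchor = i;
  });
  const live = messages.slice(anchor + 1);
  if (live.length < COMPACT_THRESHOLD) return null;
  return {
    toSummarize: live.slice(0, live.length - KEEP_VERBATIM),
    keep: live.slice(-KEEP_VERBATIM),
  };
}
```

Under this sketch, the id of the last message in `toSummarize` is the natural value for summarizedThroughMessageId.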

Replay-integrity guard (replay-validator.ts)

Catches the dominant failure mode: a pending picker / confirmation tool_use whose tool_result was never paired (user typed a free-text reply instead of clicking).

validateReplay(messages) returns { ok, violations[] }. On violation:

  • validateAndRepairReplay(messages) synthesizes error tool_result blocks for the orphaned tool_use ids so the model can reason about the redirect.
  • AgentObservabilityService.logReplayViolation records the violation (PII-scrubbed snapshot of role + block types only).
  • Loop continues with repaired messages.
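A repair of this kind can be sketched as a single pass over the history. This is an illustration under assumptions: `ReplayBlock`, `ReplayMessage`, and `repairReplay` are hypothetical, the synthetic error text is invented for the example, and a trailing assistant message with no user reply after it is left untouched.

```typescript
type ReplayBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string; is_error?: boolean; content?: string };

interface ReplayMessage {
  role: "user" | "assistant";
  content: ReplayBlock[];
}

function repairReplay(messages: ReplayMessage[]): { messages: ReplayMessage[]; repaired: string[] } {
  const out: ReplayMessage[] = [];
  const repaired: string[] = [];
  let owed: string[] = []; // tool_use ids from the preceding assistant message
  for (const m of messages) {
    if (m.role === "assistant") {
      owed = [];
      for (const b of m.content) if (b.type === "tool_use") owed.push(b.id);
      out.push(m);
      continue;
    }
    const answered = new Set<string>();
    for (const b of m.content) if (b.type === "tool_result") answered.add(b.tool_use_id);
    const orphans = owed.filter((id) => !answered.has(id));
    const synthetic: ReplayBlock[] = orphans.map((id) => ({
      type: "tool_result" as const,
      tool_use_id: id,
      is_error: true,
      content: "superseded_by_user_message", // illustrative error text
    }));
    repaired.push(...orphans);
    // Synthetic results go first so each tool_use is paired before the free text.
    out.push(synthetic.length > 0 ? { ...m, content: [...synthetic, ...m.content] } : m);
    owed = [];
  }
  return { messages: out, repaired };
}
```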

SSE protocol (agent-schemas/sse-events.ts)

    event: conversation_started\ndata: {"type":"conversation_started","conversationId":"<uuid>"}\n\n
    event: text_delta\ndata: {"type":"text_delta","delta":"..."}\n\n
    event: tool_started\ndata: {"type":"tool_started","toolUseId":"...","router":"...","action":"...","input":{...}}\n\n
    event: tool_completed\ndata: {"type":"tool_completed",..."ok":true,"output":{...},"inverseAvailable":true}\n\n
    event: confirmation_pending\ndata: {"type":"confirmation_pending","toolUseId":"...","router":"...","action":"...","input":{...},"confirm":"destructive"}\n\n
    event: disambiguation_pending\ndata: {"type":"disambiguation_pending","toolUseId":"...","kind":"member","prompt":"Which Saar?","candidates":[...]}\n\n
    event: message_done\ndata: {"type":"message_done","messageId":"<uuid>","stopReason":"end_turn"}\n\n
    event: done\ndata: {"type":"done","conversationId":"<uuid>","usage":{...}}\n\n
    event: error\ndata: {"type":"error","code":"...","message":"...","traceId":"..."}\n\n

The web reducer in apps/web/src/providers/agent-provider.tsx (and apps/web/src/components/agent/message-list.tsx) reconstructs the conversation from these events.
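The wire format above maps naturally onto a discriminated union. The sketch below covers only three of the event types, and `AgentSseEvent`, `encodeSse`, and `decodeSse` are hypothetical names rather than exports of agent-schemas/sse-events.ts.

```typescript
// Subset of the event types listed above; the remaining ones follow the same pattern.
type AgentSseEvent =
  | { type: "conversation_started"; conversationId: string }
  | { type: "text_delta"; delta: string }
  | { type: "error"; code: string; message: string; traceId: string };

// Serializes one event as an SSE frame: an event line, a data line, then a blank line.
function encodeSse(ev: AgentSseEvent): string {
  // JSON.stringify escapes newlines, so the payload always fits on one data line.
  return `event: ${ev.type}\ndata: ${JSON.stringify(ev)}\n\n`;
}

// Parses a string that contains only complete frames.
function decodeSse(text: string): AgentSseEvent[] {
  return text
    .split("\n\n")
    .filter((frame) => frame.trim() !== "")
    .map((frame) => {
      const dataLine = frame.split("\n").find((line) => line.startsWith("data: "));
      if (!dataLine) throw new Error("frame without a data line");
      return JSON.parse(dataLine.slice("data: ".length)) as AgentSseEvent;
    });
}
```

A real client would also need to buffer partial frames, since a network chunk can end in the middle of an event.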

Frontend integration

  • agent-launcher.tsx — floating button + global Cmd-K hotkey.
  • agent-sheet.tsx — slide-in drawer (<Sheet>).
  • composer.tsx — input box + page-context capture + usage footer.
  • message-list.tsx — streaming text + tool-call cards + suspension cards.
  • tool-call-card.tsx — per-tool result card with undo affordance when inverseAvailable.
  • suggested-prompts.tsx — page-aware starter prompts.
  • conversation-list.tsx — sidebar with past conversations.
  • streaming-text.tsx — handles SSE delta buffering.
  • usage-footer.tsx — daily $ counter + upsell.
  • agent-sheet-mount.tsx — keeps the sheet mounted across route changes so streams survive navigation.

Locale

  • locale is read from x-locale header (set by the web client) or accept-language, defaulting to en.
  • The agent’s static prompt instructs “Reply in the caller’s preferred locale unless they switch.”
  • All i18n keys for agent UI live under agent.* in apps/web/src/i18n/{en,he,ru}.json.
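The header fallback order can be sketched as follows. Restricting the result to en, he, and ru (taken from the i18n file names), ignoring q-weights in accept-language, and the names `pickSupported` and `resolveLocale` are all assumptions.

```typescript
const SUPPORTED_LOCALES = ["en", "he", "ru"] as const;
type AgentLocale = (typeof SUPPORTED_LOCALES)[number];

// Maps a language tag such as "he-IL" to a supported locale, if any.
function pickSupported(tag: string | undefined): AgentLocale | undefined {
  if (!tag) return undefined;
  const primary = tag.trim().toLowerCase().split("-")[0];
  return SUPPORTED_LOCALES.find((locale) => locale === primary);
}

function resolveLocale(headers: Record<string, string | undefined>): AgentLocale {
  const explicit = pickSupported(headers["x-locale"]);
  if (explicit) return explicit;
  // accept-language looks like "he-IL,he;q=0.9,en;q=0.8"; take the first supported tag as listed.
  const tags = (headers["accept-language"] ?? "").split(",").map((part) => part.split(";")[0]);
  for (const tag of tags) {
    const hit = pickSupported(tag);
    if (hit) return hit;
  }
  return "en";
}
```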

Localization of system prompt instructions

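  • Intent signals are embedded for Hebrew: סשן (session), מפגש (meeting), שיעור (class), לו״ז (schedule), כל החברים (all the members), so that prompts like “כל החברים בתכנית” (“all the members in the program”) route correctly to the schedule path.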

Failure modes — agent-specific

| Failure | Surface | Recovery |
| --- | --- | --- |
| ANTHROPIC_API_KEY unset | agent_disabled SSE error | Set env. |
| Org over daily $ cap | agent_budget_exceeded SSE error | Wait for UTC midnight or upgrade tier. |
| Picker / confirm sent twice for the same toolUseId | tool_already_resolved | UI disables the action card after first submit. |
| Free-text reply instead of clicking | replay-integrity guard repairs synthetically | User retries the prompt; agent re-asks. |
| Tool throws | tool-runner converts to typed { code, message, hint } tool_result | Model retries or apologizes; no run abort. |
| Anthropic 529 overloaded | provider_overloaded SSE error | Retry in a moment. |
| Anthropic 429 | provider_rate_limited | Retry. |
| User closes the dashboard mid-stream | AbortController cancels the Anthropic stream | No partial state; the assistant row carries whatever streamed before the abort. |