Spotter Agent — Behavior

Surface

Endpoint	Purpose
`GET /organizations/:orgId/agent/usage`	Org daily AI spend snapshot (tier, cap, spent, percent, reset).
`GET /organizations/:orgId/agent/conversations`	List caller’s conversations in this org.
`GET /organizations/:orgId/agent/conversations/:conversationId`	Full conversation detail with reconstructed messages + tool calls.
`POST /organizations/:orgId/agent/messages`	Send a user turn. Returns an SSE stream.
`POST /organizations/:orgId/agent/conversations/:conversationId/confirm/:toolUseId`	Approve / reject a pending destructive tool. Returns an SSE stream.
`POST /organizations/:orgId/agent/conversations/:conversationId/pick/:toolUseId`	Resolve a `disambiguation_pending` candidate pick. Returns an SSE stream.
`POST /organizations/:orgId/agent/conversations/:conversationId/undo/:toolUseId`	Invoke the declared inverse of a previously-executed write. Bypasses confirmation; logs both events.

All endpoints require a Clerk session and active org membership. The send/confirm/pick endpoints additionally reject callers with the member role (v1 is staff-only).

State machine


                       new user message
                              │
                              ▼
              ┌─────────────────────────────────┐
              │  conversation_started (if new)  │
              └─────────────────────────────────┘
                              │
                  preCheck org daily $ cap
                              │
                       ┌──────┴──────┐
                       │             │
                  over cap        ok
                       │             │
                       ▼             ▼
                error+done    persist user turn
                              │
                              ▼
                  ambient RAG (800ms budget)
                              │
                              ▼
                  compactIfNeeded (3s budget)
                              │
                              ▼
                  ┌─────── driveModelLoop (≤ 6 turns) ───────┐
                  │                                          │
                  │   validateReplay → repair if needed      │
                  │   withCachedTail (4th cache breakpoint)  │
                  │   anthropic.messages.stream              │
                  │   emit text_delta                        │
                  │   persist assistant message              │
                  │                                          │
                  │   stop_reason == end_turn?               │
                  │      yes → exit loop                     │
                  │      no  → processToolUses               │
                  │            ↓                             │
                  │   ┌────────┼────────┐                    │
                  │   │        │        │                    │
                  │ normal user_picker destructive           │
                  │ (run    (suspend)  (suspend)             │
                  │  inline)                                 │
                  │   │        │        │                    │
                  │   └────────┴────────┘                    │
                  │      ↓ all normal: tool_result blocks → next turn
                  │      ↓ any picker/destructive: suspend → done
                  └──────────────────────────────────────────┘
                              │
                              ▼
                   finalize: usage + cost +
                   recordTurn + observability + done

Suspended runs

When a user_picker tool is invoked, or when a destructive write is pending, the orchestrator stops the loop and emits:

disambiguation_pending with kind, prompt, candidates[].
confirmation_pending with router, action, input, confirm: 'destructive' | 'always'.

The frontend renders a card; a user click triggers POST .../pick/:toolUseId or POST .../confirm/:toolUseId. The new SSE stream resumes the loop by merging all completed tool_results into a single tool_result message and calling driveModelLoop again with the updated history.

Any tools batched after the suspending tool (within the same assistant turn) are also persisted as pending so they can be confirmed sequentially.
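The suspension rule can be sketched as a partition over one assistant turn's tool_use blocks. This is a minimal illustration, not the orchestrator's code: the `ToolUseSketch` shape and `partitionToolUses` name are hypothetical, and running the tools that precede the suspending tool inline is an assumption (the text above only states what happens to tools batched after it).

```typescript
// Hypothetical shapes for illustration; the real orchestrator types are not shown here.
type ToolKind = "normal" | "user_picker";
type ConfirmMode = "never" | "destructive" | "always";
interface ToolUseSketch {
  toolUseId: string;
  kind: ToolKind;
  confirm: ConfirmMode;
}

// Splits one assistant turn's tool_use blocks into those run inline
// and those persisted as pending once a suspending tool is seen.
function partitionToolUses(uses: ToolUseSketch[]): { runInline: string[]; pending: string[] } {
  const runInline: string[] = [];
  const pending: string[] = [];
  let suspended = false;
  for (const use of uses) {
    const suspends = use.kind === "user_picker" || use.confirm !== "never";
    if (suspended || suspends) {
      suspended = true; // everything from here on waits for the user
      pending.push(use.toolUseId);
    } else {
      runInline.push(use.toolUseId); // assumption: earlier normal tools still run
    }
  }
  return { runInline, pending };
}
```

With a batch of `[search, delete, search]`, the first search runs inline and both the delete and the trailing search are stored as pending.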

MAX_TURNS

MAX_TURNS = 6 per controller call. After six round trips the loop exits; the assistant’s last message stands.
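A minimal sketch of the turn cap, assuming a hypothetical `runTurn` callback that stands in for one model round trip (stream, persist, process tool uses). The `TurnOutcome` shape and the `exit` labels are illustrative names, not the orchestrator's API.

```typescript
// Hypothetical outcome of one model round trip.
type TurnOutcome = { stopReason: "end_turn" | "tool_use"; suspended: boolean };

const MAX_TURNS = 6;

// Drives up to MAX_TURNS round trips and reports why the loop ended.
function driveLoopSketch(
  runTurn: (turnIndex: number) => TurnOutcome,
): { turnsUsed: number; exit: "end_turn" | "suspended" | "max_turns" } {
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const outcome = runTurn(turn);
    if (outcome.suspended) return { turnsUsed: turn + 1, exit: "suspended" };
    if (outcome.stopReason === "end_turn") return { turnsUsed: turn + 1, exit: "end_turn" };
    // Otherwise tool results were fed back to the model; go around again.
  }
  return { turnsUsed: MAX_TURNS, exit: "max_turns" };
}
```

A model that keeps requesting tools is cut off after the sixth round trip, and whatever the assistant said last is what the user sees.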

Routers and actions

The tool surface lives in apps/api/src/ai/agent/tools/leaves/*.tools.ts. Each @AgentTool decorator declares:

router, action, summary (model-visible).
inputSchema (Zod) — re-validated server-side via meta.inputSchema.parse(rawInput).
scopes: { roles: AgentRole[] } — RBAC filter at registry-build time + defence-in-depth check in tool-runner.
sideEffects: 'read' | 'write' — drives audit and inverse semantics.
confirm: 'never' | 'destructive' | 'always' — gates suspension.
audit? — { resource, actionLabel } writes to audit_logs post-success.
inverse? — { router, action, buildInput(output) } enables undo.
kind? — 'normal' | 'user_picker'. user_picker handlers are never invoked; the orchestrator suspends and waits for a pick.
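The declared fields map naturally onto a metadata type. The sketch below is an approximation under stated assumptions: the `AgentRole` values, the `title` input field, and the audit labels are invented for illustration, and `inputSchema` is modelled as anything with a `parse` method (which a Zod schema satisfies) so the example runs without dependencies.

```typescript
type AgentRole = "owner" | "admin" | "coach" | "member"; // hypothetical role names

interface AgentToolMetaSketch<I, O> {
  router: string;
  action: string;
  summary: string; // model-visible
  inputSchema: { parse(raw: unknown): I }; // a Zod schema exposes a compatible parse()
  scopes: { roles: AgentRole[] };
  sideEffects: "read" | "write";
  confirm: "never" | "destructive" | "always";
  audit?: { resource: string; actionLabel: string };
  inverse?: { router: string; action: string; buildInput(output: O): unknown };
  kind?: "normal" | "user_picker";
}

// Example declaration for workouts.create with its declared inverse.
const workoutsCreateMeta: AgentToolMetaSketch<{ title: string }, { id: string }> = {
  router: "workouts",
  action: "create",
  summary: "Insert a workout (freeform or structured).",
  inputSchema: {
    parse(raw: unknown) {
      const title = (raw as { title?: unknown } | null)?.title;
      if (typeof title !== "string") throw new Error("title must be a string");
      return { title };
    },
  },
  scopes: { roles: ["owner", "admin", "coach"] },
  sideEffects: "write",
  confirm: "never",
  audit: { resource: "workout", actionLabel: "workout.created" }, // labels are illustrative
  inverse: { router: "workouts", action: "delete", buildInput: (out) => ({ id: out.id }) },
};
```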

Routers (apps/api/src/ai/agent/tools/leaves/<file>.tools.ts):

`read` — 11 actions, all read, never confirm

Action	Effect	Example prompt
`get_current_context`	Returns org + caller identity + locale + `now`.	“who am I?”
`programs_list`	List active programs.	“what programs do we run?”
`members_search`	Search members by name/email/role/status.	“find Saar”
`members_get`	Full member detail by membershipId.	“show me Dani’s plan + payment status”
`exercises_search`	Hybrid (lexical + semantic) exercise library search.	“find a squat variation”
`exercises_resolve_batch`	Batch-resolve movement names → exercise ids in one round-trip; flags ambiguous.	(used internally by workout build)
`ask_user_to_pick`	`user_picker` kind. Suspends; user picks one candidate.	“which Saar?”
`search_anything`	Cross-entity (program/workout/exercise/member).	“find Murph”
`lookup_by_id`	Detail by type+id.	(post-search)
`workout_comments_in_program`	List recent comments on workouts in a program (unreadOnly filter).	“any new comments in coaching?”
`workout_comments_for_assignment`	Full comment thread on a single assignment.	“what did Dani say on Monday’s WOD?”

`workouts` — 4 actions

Action	Effect	Confirm	Inverse
`create`	Insert a workout (freeform or structured).	never	`workouts.delete`
`update`	Patch a workout.	never	—
`set_sections`	Replace all sections + movements wholesale.	never	—
`delete`	Soft-delete.	destructive	—

Example prompt: “build a workout: 3 rounds — 200m run, 30 double-unders”.

`programs` — 4 actions

Action	Effect	Confirm	Inverse
`create`	New training program.	never	`programs.delete`
`update`	Patch.	never	—
`delete`	Soft-delete.	destructive	—
`enroll_member`	Enroll a member in a coaching/feed program.	never	—

`assignments` — 8 actions

Action	Effect	Confirm
`assign_personal`	1-on-1 PT assignment for a date.	never
`update`	Patch assignment.	never
`set_published`	Toggle published flag.	never
`delete`	Single assignment.	destructive
`mark_comments_read`	Mark comments as read for a coach.	never
`bulk_preview`	Dry-run for bulk ops.	never (read)
`bulk_delete`	Bulk delete N assignments.	destructive
`bulk_publish`	Bulk publish N assignments.	always

`bookings` — 3 actions, all read

Action	Effect
`list_mine`	Caller’s bookings (members) — admin-readable.
`attendance_summary`	Summary stats.
`attendance_trend`	Time-series.

`class_sessions` — 9 actions

Action	Effect	Confirm	Inverse
`list_for_date`	Sessions on a date.	never	—
`create`	New class session.	never	`class_sessions.cancel`
`update`	Patch session.	never	—
`publish`	Publish.	never	`class_sessions.unpublish`
`unpublish`	Unpublish.	never	—
`cancel`	Cancel.	destructive	—
`bulk_preview`	Dry-run for bulk ops.	never (read)	—
`bulk_delete`	Bulk delete.	destructive	—
`bulk_publish`	Bulk publish.	always	—

Example prompt: “schedule open gym tomorrow 13:00–14:00 published”.

`class_types` — 1 action

Action	Effect
`list_for_program`	Class types under a program.

`analytics` — 12 actions, all read

Action	Wraps
`revenue_summary`	`AnalyticsService.getRevenueSummary`
`revenue_trend`	`getRevenueTrend`
`members_summary`	`getMembersSummary`
`members_growth`	`getMembersGrowth`
`at_risk_members`	`getAtRiskMembers`
`popular_classes`	`getPopularClasses`
`class_utilization`	`getClassUtilization`
`coach_overview`	`getCoachOverview`
`org_insights`	`InsightsService.getOrgInsights`
`plan_distribution`	`getPlanDistribution`
`members_activation`	`getMembersActivation`
`workouts_summary`	`getWorkoutsSummary`

`tasks` — 5 actions

Action	Effect	Confirm	Inverse
`list`	List tasks.	never (read)	—
`create`	New task.	never	`tasks.delete`
`update`	Patch.	never	—
`complete`	Quick-complete.	never	—
`delete`	Soft delete.	destructive	—

`program_templates` — 4 actions

Action	Effect	Confirm
`list`	List templates.	never (read)
`create`	New template.	never
`apply`	Materialize a template into the org.	always
`from_history`	Derive a template from historical workouts.	always

`forms` — 2 actions, both read (FIT-176 / FIT-184)

Action	Effect
`list_pending_for_org`	Pending form assignments.
`compliance_status_for_member`	Per-typeKey compliance status (FIT-158).

Confirmation flow

Model emits tool_use block for a tool with confirm: 'destructive' or 'always'.
Orchestrator inserts ai_tool_executions row with status: 'pending'.
Orchestrator emits SSE confirmation_pending with { toolUseId, router, action, input, confirm }.
Any subsequent tool_use blocks in the same assistant message are also persisted pending (they can’t be partially fed back).
Loop suspends; controller emits done with usage 0.
User clicks Approve or Reject in the frontend → POST /confirm/:toolUseId { approved: boolean }.
If approved, runner executes the tool; status transitions pending → succeeded | failed. tool_completed SSE event emitted.
If rejected, status → rejected_by_user with synthetic error { code: 'rejected_by_user' }.
If other pending tools remain in the same assistant message, the resume returns done early (waiting for more confirmations).
Once all pending tools for the message are resolved, the orchestrator appends a unified tool_result user message and resumes the model loop.

The same merge-and-resume path handles disambiguation_pending picks.
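The approve/reject branch and the "wait for remaining confirmations" rule can be sketched as one function over the pending rows of a single assistant message. The row shape, the `runTool` callback, and the return labels are hypothetical; the thrown codes reuse names from the error classification list later in this document.

```typescript
type ExecStatus = "pending" | "succeeded" | "failed" | "rejected_by_user";
interface PendingExecSketch {
  toolUseId: string;
  status: ExecStatus;
}

// Resolves one pending tool, then decides whether the model loop can resume.
function resolveConfirmationSketch(
  execs: PendingExecSketch[],
  toolUseId: string,
  approved: boolean,
  runTool: (id: string) => boolean, // hypothetical runner: true on success
): "resume_model_loop" | "await_more_confirmations" {
  const target = execs.find((e) => e.toolUseId === toolUseId);
  if (!target) throw new Error("tool_execution_not_found");
  if (target.status !== "pending") throw new Error("tool_already_resolved");
  target.status = approved ? (runTool(toolUseId) ? "succeeded" : "failed") : "rejected_by_user";
  // Only when no sibling is still pending can the unified tool_result message be built.
  return execs.some((e) => e.status === "pending") ? "await_more_confirmations" : "resume_model_loop";
}
```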

Undo flow

POST .../undo/:toolUseId resolves the inverse tool of a previously-succeeded write:

Load ai_tool_executions by (conversationId, toolUseId).
Reject with 422 unless status is 'succeeded'.
Lookup the registry entry; require meta.inverse is defined.
Build the inverse input via inverse.buildInput(original.output) — typically { id: output.id }.
Execute via ToolRunnerService.execute — bypasses confirmation entirely.
Audit log captures both events: the original and the inverse, linked by their audit rows.

Example: workouts.create returns { id, ... }; the declared inverse is workouts.delete with buildInput: (out) => ({ id: out.id }). A user toast “Workout created — Undo (15s)” calls the undo endpoint; the workout is soft-deleted.
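The lookup steps can be sketched as a pure planning function that either yields the inverse call or a refusal reason. The row shape, the `router.action` registry key, and the `reason` labels are assumptions for illustration; only the 422 for a non-succeeded row comes from the steps above.

```typescript
interface ExecRowSketch {
  toolUseId: string;
  router: string;
  action: string;
  status: string;
  output: Record<string, unknown>;
}
interface InverseSketch {
  router: string;
  action: string;
  buildInput(output: Record<string, unknown>): unknown;
}
type UndoPlan =
  | { ok: true; router: string; action: string; input: unknown }
  | { ok: false; reason: "not_found" | "not_succeeded" | "no_inverse" };

// Decides which inverse tool to execute for a previously recorded run.
function planUndoSketch(
  row: ExecRowSketch | undefined,
  registry: Map<string, { inverse?: InverseSketch }>,
): UndoPlan {
  if (!row) return { ok: false, reason: "not_found" };
  if (row.status !== "succeeded") return { ok: false, reason: "not_succeeded" }; // surfaced as 422
  const inverse = registry.get(`${row.router}.${row.action}`)?.inverse;
  if (!inverse) return { ok: false, reason: "no_inverse" };
  return { ok: true, router: inverse.router, action: inverse.action, input: inverse.buildInput(row.output) };
}
```

The resulting plan would then be handed to the tool runner, which executes it without a confirmation step.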

Disambiguation flow (`read.ask_user_to_pick`)

Triggered by the system prompt when ambient context lists multiple plausible candidates for the same entity:

Model emits read.ask_user_to_pick { kind, prompt, candidates: [{ id, label, sublabel?, detail? }, ...] }.
Orchestrator sees meta.kind === 'user_picker', persists pending, emits disambiguation_pending.
Frontend renders a “Which Saar?” card with 2-8 buttons.
User clicks → POST .../pick/:toolUseId { id: <pickedId> }.
Orchestrator validates the id is in candidates, persists { pickedId, pickedLabel, kind } as the tool output, emits tool_completed, and resumes the model loop.
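The validation in the last step is small enough to sketch directly. The function name is hypothetical; the candidate fields and the `{ pickedId, pickedLabel, kind }` output follow the shapes listed above, and the thrown code matches `invalid_pick` from the error classification list.

```typescript
interface PickCandidate {
  id: string;
  label: string;
  sublabel?: string;
  detail?: string;
}

// Accepts a pick only if the id was among the candidates the model offered.
function resolvePickSketch(
  kind: string,
  candidates: PickCandidate[],
  pickedId: string,
): { pickedId: string; pickedLabel: string; kind: string } {
  const hit = candidates.find((c) => c.id === pickedId);
  if (!hit) throw new Error("invalid_pick");
  return { pickedId: hit.id, pickedLabel: hit.label, kind };
}
```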

Audit logging

ToolRunnerService.execute writes an audit_logs row when:

meta.audit is defined, AND
meta.sideEffects === 'write', AND
The tool succeeded.

Row carries:

actor_clerk_id = caller.
action = meta.audit.actionLabel.
resource = meta.audit.resource.
resource_id = extracted from output id field if present.
metadata.agent: true so admin views filter agent vs human.

ai_tool_executions.audit_log_id links the tool run back to its audit row.
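The three conditions and the row fields can be sketched as a single builder that returns null when no audit row should be written. The function and interface names are hypothetical, and treating a non-string `id` as "not present" is an assumption.

```typescript
interface AuditMetaSketch {
  sideEffects: "read" | "write";
  audit?: { resource: string; actionLabel: string };
}
interface AuditRowSketch {
  actor_clerk_id: string;
  action: string;
  resource: string;
  resource_id: string | null;
  metadata: { agent: true };
}

// Returns the audit row to insert, or null when any of the three conditions fails.
function buildAuditRowSketch(
  meta: AuditMetaSketch,
  callerClerkId: string,
  succeeded: boolean,
  output: unknown,
): AuditRowSketch | null {
  if (!meta.audit || meta.sideEffects !== "write" || !succeeded) return null;
  const id = (output as { id?: unknown } | null)?.id;
  return {
    actor_clerk_id: callerClerkId,
    action: meta.audit.actionLabel,
    resource: meta.audit.resource,
    resource_id: typeof id === "string" ? id : null, // assumption: ids are strings
    metadata: { agent: true },
  };
}
```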

Per-org daily $ cap

Lives in apps/api/src/ai/agent/rate-limit.service.ts and is sourced from PLATFORM_TIER_MAP (libs/shared/src/lib/constants/platform-tiers.ts):

Tier	Daily budget (USD)
Lite	$1.00 (`1_000_000` micros)
Pro	$5.00
Elite	$25.00

AgentRateLimitService.preCheck(orgId):

Read org → tier → aiDailyBudgetUsdMicros.
Budget -1 → unmetered, allow.
Else compare to costTracker.getOrgSpendTodayMicros(orgId) (UTC day bucket).
If spent >= budget, return { ok: false, code: 'agent_budget_exceeded', message: 'Your org has reached its AI daily spending limit ($X.XX). It resets at 00:00 UTC. Upgrade your plan for a higher limit.', remainingUsdMicros: 0 }.

AgentController.sendMessage invokes preCheck after SSE headers flush so the client can render the upgrade prompt inline.

AgentRateLimitService.getUsageSnapshot(orgId) returns the same payload the dashboard composer footer reads; it is used to display inline counters such as “$0.40 / $5.00”.
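The preCheck decision can be sketched as a pure function of the tier budget and today's spend. This is an illustration rather than the service code: the real method reads both values from the org and the cost tracker, and the shape of the success result (`{ ok: true }`) is an assumption since only the failure payload is specified above.

```typescript
type PreCheckResult =
  | { ok: true }
  | { ok: false; code: "agent_budget_exceeded"; message: string; remainingUsdMicros: 0 };

// budgetUsdMicros === -1 means the tier is unmetered.
function preCheckSketch(budgetUsdMicros: number, spentTodayUsdMicros: number): PreCheckResult {
  if (budgetUsdMicros === -1) return { ok: true };
  if (spentTodayUsdMicros < budgetUsdMicros) return { ok: true };
  const cap = (budgetUsdMicros / 1_000_000).toFixed(2); // micros to dollars, two decimals
  return {
    ok: false,
    code: "agent_budget_exceeded",
    message:
      `Your org has reached its AI daily spending limit ($${cap}). ` +
      "It resets at 00:00 UTC. Upgrade your plan for a higher limit.",
    remainingUsdMicros: 0,
  };
}
```

For a Lite org at exactly 1_000_000 micros spent, the check fails because the comparison is `spent >= budget`.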

Per-org/user/day cost tracking

AgentCostTracker.recordTurn upserts into ai_usage_daily keyed on (organization_id, user_id, day) with SQL increments so concurrent turns don’t lose updates. Day key is UTC YYYY-MM-DD.

AgentOrchestrator.finalize computes costUsdMicros per turn:


COST_INPUT_PER_M       = 3.0
COST_OUTPUT_PER_M      = 15.0
COST_CACHE_READ_PER_M  = 0.3   // 10% of input
COST_CACHE_CREATE_PER_M = 3.75 // 125% of input

Sum of (tokens / 1M) * rate for each bucket, rounded to micros.
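As a worked example of that sum, the sketch below applies the four rates to one turn's token counts. The `TurnUsageSketch` field names mirror the observability event fields in this document, but the function itself is an illustration, not the body of `finalize`.

```typescript
// USD per million tokens, as listed above.
const COST_INPUT_PER_M = 3.0;
const COST_OUTPUT_PER_M = 15.0;
const COST_CACHE_READ_PER_M = 0.3;
const COST_CACHE_CREATE_PER_M = 3.75;

interface TurnUsageSketch {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}

// Sum of (tokens / 1M) * rate per bucket, converted to micro-dollars and rounded.
function costUsdMicrosSketch(u: TurnUsageSketch): number {
  const usd =
    (u.inputTokens / 1_000_000) * COST_INPUT_PER_M +
    (u.outputTokens / 1_000_000) * COST_OUTPUT_PER_M +
    (u.cacheReadTokens / 1_000_000) * COST_CACHE_READ_PER_M +
    (u.cacheCreationTokens / 1_000_000) * COST_CACHE_CREATE_PER_M;
  return Math.round(usd * 1_000_000);
}

// 2,000 input + 500 output + 40,000 cache-read + 1,000 cache-create tokens:
// 6,000 + 7,500 + 12,000 + 3,750 = 29,250 micros ($0.02925).
const exampleTurnCost = costUsdMicrosSketch({
  inputTokens: 2000,
  outputTokens: 500,
  cacheReadTokens: 40000,
  cacheCreationTokens: 1000,
});
```

Because one micro-dollar per token corresponds to $1 per million tokens, each bucket's contribution in micros is simply `tokens * rate`.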

Role-aware upsell

The composer footer (apps/web/src/components/agent/usage-footer.tsx) reads GET /agent/usage and shows:

spent / cap for capped tiers.
A subtle upgrade CTA when percentUsed >= 0.8.
A blocking modal with localized copy when agent_budget_exceeded flows over SSE.

The member role never sees the composer; the controller enforces the staff-only gate.

Prompt caching architecture

AgentContextBuilder.buildSystem returns three system blocks with cache_control:


[
  { type: 'text', text: STATIC_SYSTEM_PROMPT, cache_control: { type: 'ephemeral', ttl: '1h' } },
  { type: 'text', text: orgContextBlock,      cache_control: { type: 'ephemeral', ttl: '1h' } },
  { type: 'text', text: pageContextBlock /* no cache */ },
]

AgentOrchestrator.driveModelLoop calls withCachedTail(messages) before every request — adds a cache_control: { type: 'ephemeral' } marker to the trailing message’s last content block. This is the 4th breakpoint (Anthropic allows up to 4):


[ static system ][ org ][ page+ambient ][ tools ][ ...conv history... ][ tail*cache ]
   ↑1h            ↑1h     no cache         (implicit cache via tools array)
                                                                              ↑ marked

Result: every subsequent turn re-reads the entire prefix from cache at 10% of input cost. Cache creation costs 125% of the input rate on the first write, so the premium is recovered after a single reuse.
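A minimal sketch of the tail-marking step, assuming message content is always an array of blocks. The type names are simplified stand-ins rather than the Anthropic SDK types, and returning copies instead of mutating is a design assumption: it keeps the marker out of stored history, so replayed conversations never accumulate extra breakpoints.

```typescript
type CacheControlSketch = { type: "ephemeral"; ttl?: "5m" | "1h" };
interface ContentBlockSketch {
  type: string;
  text?: string;
  cache_control?: CacheControlSketch;
}
interface CachedTailMsg {
  role: "user" | "assistant";
  content: ContentBlockSketch[];
}

// Returns a copy of the history with a cache breakpoint on the last block of the last message.
function withCachedTailSketch(messages: CachedTailMsg[]): CachedTailMsg[] {
  const lastIndex = messages.length - 1;
  return messages.map((m, i) => {
    if (i !== lastIndex || m.content.length === 0) return m;
    const blocks = m.content.map((b, j) =>
      j === m.content.length - 1 ? { ...b, cache_control: { type: "ephemeral" as const } } : b,
    );
    return { ...m, content: blocks };
  });
}
```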

Why ttl: '1h' rather than 5-min default:

5-min cache evicts during quiet afternoon windows.
Re-creating after eviction costs 125% — worse than not caching at all on a single turn.
1h matches the natural cadence of static/org context changes.

Steady-state cache-hit ratio is computed in finalize:


cacheHitRatio = cacheReadTokens / (cacheReadTokens + inputTokens)

Healthy target: 0.85–0.95.

Observability

AgentObservabilityService emits typed events to Pino + Sentry breadcrumbs + PostHog (production only):

Event	Stage
`agent.turn.started`	First emit at the controller. Carries `endpoint`, `promptLength`.
`agent.turn.completed`	`finalize`. Carries `inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheCreationTokens`, `costUsdMicros`, `cacheHitRatio`.
`agent.turn.failed`	`logTurnFailure`. Carries the classified `errorCode`.
`agent.tool.executed`	Per-tool. Carries `router`, `action`, `durationMs`, `outcome`, optional `errorCode`.
`agent.tool.confirmation_pending`	Per pending destructive tool.
`agent.tool.disambiguation_pending`	Per pending picker tool.
`agent.replay.violation`	On replay-integrity violation. Carries `violations` count, `reasons[]`, `repaired`.
`agent.storage.snapshot`	Daily cron at 03:00. Per-table row counts + bytes.

scrub() drops the keys content, text, prompt, message, output, input, token, apiKey, secret, and password before fan-out. Token counts and durations stay; user text never leaves the server.
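One way to implement such a scrubber is a recursive copy that omits the listed keys. The sketch below assumes exact key matching (so a counter like `inputTokens` survives while `input` is dropped) and recursion into nested objects and arrays; the real scrub() may differ on both points.

```typescript
const SCRUBBED_KEYS = new Set([
  "content", "text", "prompt", "message", "output",
  "input", "token", "apiKey", "secret", "password",
]);

// Returns a deep copy of the payload without any scrubbed key, at any depth.
function scrubSketch(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => scrubSketch(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (!SCRUBBED_KEYS.has(key)) out[key] = scrubSketch(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```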

Sentry tags: agent.trace_id, agent.org_id, agent.code, agent.endpoint. The trace id is a server-generated UUID and surfaces back to the client on errors so QA can paste it into bug reports.

Error classification (`error-classifier.ts`)

Coarse codes for stable SSE / i18n surface:

agent_disabled, agent_budget_exceeded (from rate-limit), rate_limited
replay_integrity_violation, tool_execution_not_found, tool_already_resolved, invalid_pick
aborted (AbortError)
provider_invalid_request (4xx), provider_unauthorized (401/403), provider_overloaded (529 / type), provider_rate_limited (429), provider_unavailable (5xx)
internal (catch-all)

Each maps to a localized client string under agent.errors.<code>.
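The provider-facing part of the classification depends on check order, which the sketch below makes explicit: specific statuses (529, 429, 401/403) must be tested before the generic 5xx and 4xx ranges. The error shape (`name`, `status`, `type`) is an assumed normalisation of whatever the provider client throws, and `overloaded_error` is the Anthropic error type assumed to back the "type" check.

```typescript
type ProviderErrorCode =
  | "aborted"
  | "provider_overloaded"
  | "provider_rate_limited"
  | "provider_unauthorized"
  | "provider_unavailable"
  | "provider_invalid_request"
  | "internal";

// Maps a normalised error to a coarse code; most specific checks first.
function classifyProviderErrorSketch(err: { name?: string; status?: number; type?: string }): ProviderErrorCode {
  if (err.name === "AbortError") return "aborted";
  if (err.status === 529 || err.type === "overloaded_error") return "provider_overloaded";
  if (err.status === 429) return "provider_rate_limited";
  if (err.status === 401 || err.status === 403) return "provider_unauthorized";
  if (err.status !== undefined && err.status >= 500) return "provider_unavailable";
  if (err.status !== undefined && err.status >= 400) return "provider_invalid_request";
  return "internal";
}
```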

Ambient RAG (`rag.service.ts`)

On every user turn (skipped on resume), findAmbient({ orgId, query }):

800ms timeout, never throws.
30-character minimum query length (short prompts skip RAG).
Up to 3 hits per scope, 5 total across scopes.
Scopes: program, workout, exercise, member.
Hybrid: Postgres ilike for lexical + Voyage voyage-multilingual-2 embedding for semantic; results blended with an RRF-ish score.
Token extraction strips stopwords (articles, verbs, calendar tokens, entity nouns) before lexical search.

Hits are injected as <ambient_context> in the page-context system block.
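The blending and capping rules can be illustrated with textbook reciprocal rank fusion. This is a stand-in, not the service's scoring: the document only says "RRF-ish", so the constant `k = 60`, the function names, and the tie handling are assumptions. The 30-character gate and the 3-per-scope / 5-total caps follow the list above.

```typescript
type AmbientScope = "program" | "workout" | "exercise" | "member";
interface AmbientHit {
  scope: AmbientScope;
  id: string;
}

// Short prompts skip ambient retrieval entirely.
function shouldRunAmbientSketch(query: string): boolean {
  return query.trim().length >= 30;
}

// Fuses two ranked lists with reciprocal rank fusion, then applies the caps.
function blendAmbientSketch(lexical: AmbientHit[], semantic: AmbientHit[], k = 60): AmbientHit[] {
  const scores = new Map<string, { hit: AmbientHit; score: number }>();
  for (const ranked of [lexical, semantic]) {
    ranked.forEach((hit, rank) => {
      const key = `${hit.scope}:${hit.id}`;
      const entry = scores.get(key) ?? { hit, score: 0 };
      entry.score += 1 / (k + rank + 1); // rank is zero-based
      scores.set(key, entry);
    });
  }
  const sorted = Array.from(scores.values()).sort((a, b) => b.score - a.score);
  const perScope = new Map<AmbientScope, number>();
  const out: AmbientHit[] = [];
  for (const { hit } of sorted) {
    const used = perScope.get(hit.scope) ?? 0;
    if (used >= 3 || out.length >= 5) continue; // 3 per scope, 5 total
    perScope.set(hit.scope, used + 1);
    out.push(hit);
  }
  return out;
}
```

A hit that appears in both lists accumulates two reciprocal-rank terms, which is what lets agreement between lexical and semantic search outrank a single strong match.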

Compaction (`compaction.service.ts`)

Trigger threshold: >= 24 messages past the latest system_note anchor (or from start).
Keep last 6 messages verbatim.
Run claude-haiku-4-5 summarizer (3s budget).
Persist the summary as a system_note row with pageContext.summarizedThroughMessageId.
listMessagesForReplay reads from the latest system_note forward.
Title generation also uses Haiku, 1.5s budget, best-effort.
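The trigger and the keep-last-6 split can be sketched as a planning step that runs before the summarizer is called. The message shape and function name are hypothetical, and excluding the previous system_note from the span to summarize is an assumption.

```typescript
interface StoredMsgSketch {
  id: string;
  role: "user" | "assistant" | "system_note";
}

const COMPACTION_THRESHOLD = 24;
const KEEP_VERBATIM = 6;

// Returns what to summarize and what to keep, or null when below the threshold.
function planCompactionSketch(
  messages: StoredMsgSketch[],
): { summarize: StoredMsgSketch[]; keep: StoredMsgSketch[]; summarizedThroughMessageId: string | null } | null {
  let anchor = -1;
  messages.forEach((m, i) => {
    if (m.role === "system_note") anchor = i; // latest summary wins
  });
  const sinceAnchor = messages.slice(anchor + 1);
  if (sinceAnchor.length < COMPACTION_THRESHOLD) return null;
  const summarize = sinceAnchor.slice(0, -KEEP_VERBATIM);
  const keep = sinceAnchor.slice(-KEEP_VERBATIM);
  const last = summarize[summarize.length - 1];
  return { summarize, keep, summarizedThroughMessageId: last ? last.id : null };
}
```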

Replay-integrity guard (`replay-validator.ts`)

Catches the dominant failure mode: a pending picker / confirmation tool_use whose tool_result was never paired (user typed a free-text reply instead of clicking).

validateReplay(messages) returns { ok, violations[] }. On violation:

validateAndRepairReplay(messages) synthesizes error tool_result blocks for the orphaned tool_use ids so the model can reason about the redirect.
AgentObservabilityService.logReplayViolation records the violation (PII-scrubbed snapshot of role + block types only).
Loop continues with repaired messages.
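A minimal sketch of detection and repair in one pass, assuming the pairing rule that every assistant tool_use must be answered by a tool_result in the immediately following user message. The block types are simplified, the single combined function is an illustration (the document describes separate validateReplay and validateAndRepairReplay helpers), and the synthetic error text is invented.

```typescript
type ReplayBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string; is_error?: boolean; content?: string };
interface ReplayMsg {
  role: "user" | "assistant";
  content: ReplayBlock[];
}

function validateAndRepairSketch(messages: ReplayMsg[]): { ok: boolean; violations: string[]; repaired: ReplayMsg[] } {
  const violations: string[] = [];
  const repaired: ReplayMsg[] = [];
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    if (!msg) continue;
    repaired.push(msg);
    if (msg.role !== "assistant") continue;
    const next = messages[i + 1];
    const nextIsUser = next !== undefined && next.role === "user";
    const answered = new Set<string>();
    if (next && nextIsUser) {
      for (const b of next.content) if (b.type === "tool_result") answered.add(b.tool_use_id);
    }
    const orphaned: string[] = [];
    for (const b of msg.content) if (b.type === "tool_use" && !answered.has(b.id)) orphaned.push(b.id);
    if (orphaned.length === 0) continue;
    violations.push(...orphaned);
    const synthetic = orphaned.map((id): ReplayBlock => ({
      type: "tool_result",
      tool_use_id: id,
      is_error: true,
      content: "User replied with free text instead of resolving this tool.", // illustrative wording
    }));
    if (next && nextIsUser) {
      // tool_result blocks go first, ahead of the user's free text.
      repaired.push({ role: "user", content: [...synthetic, ...next.content] });
      i++; // the original next message has been replaced
    } else {
      repaired.push({ role: "user", content: synthetic });
    }
  }
  return { ok: violations.length === 0, violations, repaired };
}
```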

SSE protocol (`agent-schemas/sse-events.ts`)


event: conversation_started\ndata: {"type":"conversation_started","conversationId":"<uuid>"}\n\n
event: text_delta\ndata: {"type":"text_delta","delta":"..."}\n\n
event: tool_started\ndata: {"type":"tool_started","toolUseId":"...","router":"...","action":"...","input":{...}}\n\n
event: tool_completed\ndata: {"type":"tool_completed",..."ok":true,"output":{...},"inverseAvailable":true}\n\n
event: confirmation_pending\ndata: {"type":"confirmation_pending","toolUseId":"...","router":"...","action":"...","input":{...},"confirm":"destructive"}\n\n
event: disambiguation_pending\ndata: {"type":"disambiguation_pending","toolUseId":"...","kind":"member","prompt":"Which Saar?","candidates":[...]}\n\n
event: message_done\ndata: {"type":"message_done","messageId":"<uuid>","stopReason":"end_turn"}\n\n
event: done\ndata: {"type":"done","conversationId":"<uuid>","usage":{...}}\n\n
event: error\ndata: {"type":"error","code":"...","message":"...","traceId":"..."}\n\n

The web reducer in apps/web/src/providers/agent-provider.tsx (and apps/web/src/components/agent/message-list.tsx) reconstructs the conversation from these events.
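On the client side, the frames above can be split with a small incremental parser. This is a sketch under assumptions, not the code in agent-provider.tsx: it handles only the `event:` and `data:` fields used by this protocol, assumes `\n` line endings, and returns any incomplete trailing frame as `rest` so the caller can prepend it to the next network chunk.

```typescript
interface SseFrameSketch {
  event: string;
  data: unknown;
}

// Parses every complete frame in the buffer; the unterminated tail is returned as `rest`.
function parseSseFramesSketch(buffer: string): { frames: SseFrameSketch[]; rest: string } {
  const frames: SseFrameSketch[] = [];
  const chunks = buffer.split("\n\n");
  const rest = chunks.pop() ?? ""; // text after the last blank line is not a full frame yet
  for (const chunk of chunks) {
    let eventName = "message"; // SSE default when no event field is present
    const dataLines: string[] = [];
    for (const line of chunk.split("\n")) {
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) frames.push({ event: eventName, data: JSON.parse(dataLines.join("\n")) });
  }
  return { frames, rest };
}
```

A reducer would then switch on `frame.event` (or the `type` field inside `data`, which carries the same value) to append text deltas, open tool cards, or render suspension cards.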

Frontend integration

agent-launcher.tsx — floating button + global Cmd-K hotkey.
agent-sheet.tsx — slide-in drawer (<Sheet>).
composer.tsx — input box + page-context capture + usage footer.
message-list.tsx — streaming text + tool-call cards + suspension cards.
tool-call-card.tsx — per-tool result card with undo affordance when inverseAvailable.
suggested-prompts.tsx — page-aware starter prompts.
conversation-list.tsx — sidebar with past conversations.
streaming-text.tsx — handles SSE delta buffering.
usage-footer.tsx — daily $ counter + upsell.
agent-sheet-mount.tsx — keeps the sheet mounted across route changes so streams survive navigation.

Locale

locale is read from x-locale header (set by the web client) or accept-language, defaulting to en.
The agent’s static prompt instructs “Reply in the caller’s preferred locale unless they switch.”
All i18n keys for agent UI live under agent.* in apps/web/src/i18n/{en,he,ru}.json.
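The header precedence can be sketched as below. Several details are assumptions: the supported set is inferred from the three i18n files, an unsupported `x-locale` value falls through to `accept-language`, and `accept-language` entries are tried in listed order without sorting by q-value.

```typescript
const SUPPORTED_LOCALES = ["en", "he", "ru"] as const;
type LocaleSketch = (typeof SUPPORTED_LOCALES)[number];

// Reduces a tag such as "ru-RU;q=0.9" to its primary subtag and matches it.
function matchLocaleSketch(raw: string): LocaleSketch | undefined {
  const withoutQ = raw.split(";")[0] ?? "";
  const primary = withoutQ.trim().toLowerCase().split("-")[0] ?? "";
  return SUPPORTED_LOCALES.find((l) => l === primary);
}

// x-locale wins, then accept-language, then the "en" default.
function resolveLocaleSketch(headers: Record<string, string | undefined>): LocaleSketch {
  const explicit = headers["x-locale"];
  if (explicit) {
    const hit = matchLocaleSketch(explicit);
    if (hit) return hit;
  }
  for (const part of (headers["accept-language"] ?? "").split(",")) {
    const hit = matchLocaleSketch(part);
    if (hit) return hit;
  }
  return "en";
}
```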

Localization of system prompt instructions

Intent signals embedded for Hebrew: סשן (session), מפגש (meeting), שיעור (class), לו״ז (schedule), כל החברים (all the members), so that prompts like “כל החברים בתכנית” (“all the members in the program”) route correctly to the schedule path.

Failure modes — agent-specific

Failure	Surface	Recovery
`ANTHROPIC_API_KEY` unset	`agent_disabled` SSE error	Set env.
Org over daily $ cap	`agent_budget_exceeded` SSE error	Wait for UTC midnight or upgrade tier.
Picker / confirm sent twice for the same `toolUseId`	`tool_already_resolved`	UI disables the action card after first submit.
Free-text reply instead of clicking	replay-integrity guard repairs synthetically	User retries the prompt; agent re-asks.
Tool throws	tool-runner converts to typed `{ code, message, hint }` tool_result	Model retries or apologizes; no run abort.
Anthropic 529 overloaded	`provider_overloaded` SSE error	Retry in a moment.
Anthropic 429	`provider_rate_limited`	Retry.
User closes the dashboard mid-stream	`AbortController` cancels the Anthropic stream	No cleanup needed: the assistant row keeps whatever streamed before the abort.
