Spotter Agent

FitKit’s embedded AI assistant — a conversational, tool-calling Claude agent integrated into the dashboard. Just like a spotter in the gym, it doesn’t lift the weight for the user; it watches, supports, and steps in when needed.

What

A streaming, tool-using agent that reads and writes FitKit data via 63 typed tools across 11 routers. It runs inside the API (apps/api/src/ai/agent/), surfaces in the dashboard via a slide-in sheet (apps/web/src/components/agent/), and powers prompts like:

“build a workout: 3 rounds — 200m run, 30 double-unders”
“who hasn’t paid this month?”
“schedule open gym tomorrow 13:00–14:00”
“what’s our MRR?”
“delete the workout I just made” (with one-click undo)

Why

The dashboard has 50+ pages. Most owner asks span 3-4 of them. Spotter collapses multi-page workflows into one prompt.
Coaches’ real-world ask is rarely “open the schedule, scroll, click new, fill in 11 fields” — it’s “put open gym on tomorrow at 1pm”. Spotter is that interface.
Programming workouts is the single most-time-consuming task in a gym day. Structured-workout creation via Spotter takes <10s versus 2-3 minutes hand-clicking the builder.
Owners and coaches share a mental model of the gym; one prompt-driven agent is far cheaper to learn than 50 page-specific affordances.

Who

Owner / admin / coach — full access in v1 (FIT-161 phase 1). Members are 403’d at the controller.
Spotter agent itself — has its own audit log entries with metadata.agent: true so the admin app can pivot agent actions next to human actions.

Persona impact

Persona	Impact
Owner	Run-the-business prompts: revenue, debt, at-risk members, refunds (via dashboard), staff onboarding.
Coach	Build-the-day prompts: workouts, assignments, class sessions, daily programming. Single biggest time-saver in the product.
Admin	Mirror of owner with no payment ceiling.
Member	Out of scope in v1; planned for v2.

Capabilities

63 typed tools across 11 routers (read, workouts, programs, assignments, bookings, class_sessions, class_types, analytics, tasks, program_templates, forms). Each registered via the @AgentTool decorator and auto-discovered at module init.
Anthropic prompt caching with three stable cache_control breakpoints (1h TTL) + a 4th breakpoint on the message tail so steady-state cache-hit ratio stabilizes around 0.85-0.95.
Disambiguation flow (read.ask_user_to_pick) — when ambient context surfaces multiple candidates, the agent suspends the run and asks the user to click one in a card.
Confirmation flow — destructive tools (confirm: 'destructive') and always-confirm tools (confirm: 'always') suspend the run and emit a confirmation_pending SSE event. The user clicks Approve/Reject; the run resumes.
Undo — non-destructive write tools declare an inverse. A 15-second toast-driven undo invokes the inverse without re-prompting the model.
Per-org daily $ cap — tier-driven AI budget. Lite $1, Pro $5, Elite $25 (USD micros). Pre-check gates every turn; over-cap → immediate agent_budget_exceeded SSE error with a localized upgrade prompt.
Per-conversation cost tracking in ai_conversations.cost_usd_micros; per-org per-user per-day in ai_usage_daily.
Conversation persistence — every turn is a row in ai_messages with the Anthropic content blocks stored verbatim for lossless replay.
Replay-integrity guard — catches pending tool_use blocks that never got a tool_result and auto-repairs them so a stuck conversation never blocks the user.
Compaction — opportunistic Haiku-based summarization once a conversation exceeds 24 messages past the latest anchor.
Title generation — first message → Haiku-generated 4-6 word title (3s budget, non-blocking).
Ambient RAG — hybrid lexical+semantic search across programs/workouts/exercises/members; results injected as <ambient_context> so the model can reference ids without an extra search round-trip.
Page context — the web sends pathname + selection so prompts are page-aware (“delete this workout” works because the workout id is in context).
Observability — Pino + Sentry + PostHog event fan-out via AgentObservabilityService. PII-scrubbed event props.
Daily storage snapshot — agent_storage_snapshot cron tracks table sizes for cost forecasting.

analytics / insights — read-only domain endpoints the agent’s analytics router wraps.
webhooks — Clerk webhooks create users; agent uses those user rows for the runtime.user context.
event-tracking — PostHog facade reused by agent observability.
admin-app — surfaces agent traces, costs, and conversations in the platform observability dashboard.
payments / platform-billing — tier source-of-truth that drives the daily $ cap.
exercises (search service) — exercise resolve + ambient RAG.
embeddings (apps/api/src/ai/embeddings/) — vector embeddings for workouts, programs, exercises, member profiles. Powers RAG.

Status

FIT-161 Phase 1 — shipped. 63 tools, prompt caching, undo, confirmation, disambiguation, replay guard, cost cap, observability, conversation persistence, compaction, ambient RAG. Live in production.
FIT-162 Phase 2 — backlog. Planned: member-role read tools (member self-service), WhatsApp bot wrapper, multilingual TTS reply, agent → email composer, agent → CSV export for ad-hoc queries.

Gaps

Member-role tools are 403’d at the controller in v1 (Agent chat is staff-only in this release.). Member surface needs a curated subset (no writes, scoped reads).
No long-running tools — every tool runs inline. A “rebuild embeddings for 10k workouts” prompt would block the turn beyond the SSE timeout. Long-running work should land as a queued task with progress polling.
No model routing — DEFAULT_MODEL is claude-sonnet-4-5 for everything. Title and compaction use claude-haiku-4-5, but the main loop never demotes. A planner+executor split (Haiku for read-only, Sonnet for writes) would cut cost ~3x.
No structured “what happened” recap at end-of-turn beyond the done event. Multi-tool turns rely on the assistant’s text summary.
Agent has no opinion about programming quality — it builds what you ask; doesn’t suggest improvements. Future: a coach-style critic loop.
No background agent — the agent only acts when prompted. No “ping the owner at 9am with their day’s triage” yet.
Conversation privacy is per-user — staff in the same org cannot see each other’s conversations. Acceptable for v1; cross-staff sharing is FIT-162.