Living documentation — last reviewed 2026-05-28

ADR-0010: Spotter agent — tool surface, confirm-on-write, per-org cost cap

Status: Accepted (Phase 1 shipped per FIT-161)
Date: ~2026-04 (FIT-161 milestone)
Context owner: Owner

Context

Coaches and studio owners do a lot of repetitive admin: creating workouts, booking clients, adjusting plans, checking compliance status, drafting messages. An LLM agent that can drive the API on their behalf is high-leverage — IF the safety, cost, and trust model is right.

We picked Claude (via the Anthropic SDK) as the model and committed to building the agent in-app rather than handing it off to a third-party automation platform.

Decision

The Spotter agent lives in apps/api/src/ai/agent/. Key architectural choices:

Tools as a 1:1 mirror of services

Each tool calls the same service code path as the equivalent HTTP route. Tools are grouped into routers under apps/api/src/ai/agent/tools/ (e.g., workouts/, members/, bookings/, forms/, messages/, class_sessions/). The FIT-161 brief scoped Phase 1 at 56 tools across 8 routers; Phase 2 (FIT-162) extends this.

Schemas live in libs/shared/src/lib/agent-schemas/. The agent sees the same Zod schemas the HTTP layer uses, plus a per-tool category field (read / write / destructive) that drives UX behavior.
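A minimal sketch of the mirroring idea, assuming a hypothetical `createWorkout` service and a hand-rolled parser standing in for the shared Zod schema (the real code imports its schemas from libs/shared). The HTTP handler and the tool call the same parser and the same service, so validation and business rules cannot drift; the tool only adds a category.

```typescript
// Sketch only: CreateWorkoutInput, parseCreateWorkout, createWorkout and AgentTool are hypothetical.
type ToolCategory = "read" | "write" | "destructive";

interface CreateWorkoutInput {
  orgId: string;
  title: string;
}

interface Workout extends CreateWorkoutInput {
  id: string;
}

// Stand-in for the shared Zod schema: one parser used by both the HTTP route and the tool.
function parseCreateWorkout(raw: unknown): CreateWorkoutInput {
  const r = (raw ?? {}) as Partial<CreateWorkoutInput>;
  if (typeof r.orgId !== "string" || typeof r.title !== "string" || r.title.length === 0) {
    throw new Error("invalid CreateWorkoutInput");
  }
  return { orgId: r.orgId, title: r.title };
}

// The single service code path (in-memory for the sketch).
let nextWorkoutId = 1;
function createWorkout(input: CreateWorkoutInput): Workout {
  return { id: `w${nextWorkoutId++}`, ...input };
}

// HTTP route handler: validate, then call the service.
function postWorkoutRoute(body: unknown): Workout {
  return createWorkout(parseCreateWorkout(body));
}

interface AgentTool<I, O> {
  name: string;
  category: ToolCategory; // drives UX: run now, undo card, or confirm card
  parse: (raw: unknown) => I;
  run: (input: I) => O;
}

// Agent tool: same parser and same service as the route; only the category is added.
const createWorkoutTool: AgentTool<CreateWorkoutInput, Workout> = {
  name: "workouts.create",
  category: "write",
  parse: parseCreateWorkout,
  run: createWorkout,
};
```

Because both entry points share `parseCreateWorkout`, an input the HTTP route rejects is rejected by the tool too.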

As of 2026-05-28 the surface is 63 tools across 11 routers (the original FIT-161 brief cited 56/8 — Phase 1 expanded during shipping). features/spotter-agent/README.md carries the live count.

Undo-on-write, confirm-on-destructive

  • Read tools execute immediately.
  • Write tools execute immediately but emit a “what I did” card with a 15s undo button (where the tool has an inverse).
  • Destructive tools (delete, cancel, suspend) show a confirmation card BEFORE execution. The user clicks confirm; only then does the tool actually run.
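The three behaviors above can be sketched as a dispatcher keyed on the tool's category. All names here (`Tool`, `TurnResult`, `dispatch`, `undo`) are hypothetical, and the real flow is asynchronous and card-driven; this only shows the ordering rules: reads and writes run immediately, writes get a 15s undo window when an inverse exists, and destructive tools do nothing until confirmed.

```typescript
// Sketch: category-driven dispatch. Tool, TurnResult, dispatch and undo are hypothetical names.
type ToolCategory = "read" | "write" | "destructive";

const UNDO_WINDOW_MS = 15_000; // the 15s undo window

interface Tool {
  name: string;
  category: ToolCategory;
  run: () => string; // performs the action and returns a short summary
  inverse?: () => void; // only tools that can be undone provide one
}

type TurnResult =
  | { kind: "result"; summary: string }
  | { kind: "did"; summary: string; undoExpiresAt: number | null } // "what I did" card
  | { kind: "confirm"; toolName: string }; // confirmation card; nothing has run yet

function dispatch(tool: Tool, now: number, confirmed = false): TurnResult {
  if (tool.category === "read") {
    return { kind: "result", summary: tool.run() };
  }
  if (tool.category === "write") {
    const summary = tool.run();
    // Offer undo only when an inverse exists.
    return { kind: "did", summary, undoExpiresAt: tool.inverse ? now + UNDO_WINDOW_MS : null };
  }
  // Destructive: run only after the user has clicked confirm.
  if (!confirmed) {
    return { kind: "confirm", toolName: tool.name };
  }
  return { kind: "result", summary: tool.run() };
}

function undo(tool: Tool, result: TurnResult, now: number): boolean {
  if (result.kind !== "did" || result.undoExpiresAt === null || now > result.undoExpiresAt) {
    return false;
  }
  if (!tool.inverse) return false;
  tool.inverse();
  return true;
}
```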

A bulk-confirmation card for multi-destructive turns is on the Phase 2 backlog (FIT-162).

Per-org daily $ cost cap

Each org has a daily cap on token spend, expressed in dollars. Hitting it returns a structured response (not an error) with an upsell, and PostHog tracks cap-hit events. The cap exists because LLM tokens cost real money and an automated workflow could otherwise drain the budget in seconds.
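A minimal sketch of the cap check, assuming spend is stored per org per UTC day in integer cents and checked before each turn starts. `CapState`, `checkDailyCap` and `recordSpend` are hypothetical. Checking before the turn means a single turn can overshoot the cap slightly; this ADR does not say whether the real check uses a cost estimate instead.

```typescript
// Sketch of a per-org daily cap. CapState, checkDailyCap and recordSpend are hypothetical;
// amounts are integer cents to avoid floating-point drift.
interface CapState {
  day: string; // UTC date the spend belongs to, e.g. "2026-05-28"
  spentCents: number;
}

type CapCheck =
  | { ok: true }
  | { ok: false; reason: "daily_cap_reached"; capCents: number; upsell: true };

function spentOn(state: CapState, today: string): number {
  // A new day starts from zero.
  return state.day === today ? state.spentCents : 0;
}

// Checked before a turn starts. Returns a value instead of throwing so the UI can render
// an upsell card; this is also where agent.daily_cap.hit would be captured.
function checkDailyCap(state: CapState, capCents: number, today: string): CapCheck {
  if (spentOn(state, today) >= capCents) {
    return { ok: false, reason: "daily_cap_reached", capCents, upsell: true };
  }
  return { ok: true };
}

// Called after a turn with the turn's actual cost.
function recordSpend(state: CapState, turnCents: number, today: string): CapState {
  return { day: today, spentCents: spentOn(state, today) + turnCents };
}
```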

Prompt caching architecture

Anthropic prompt caching is configured with:

  • 1-hour TTL on stable blocks (system prompt cookbook, tool definitions). These rarely change.
  • A fourth breakpoint on the message tail, caching conversation context within a session (Anthropic allows at most four cache breakpoints per request).
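A sketch of where the four breakpoints could sit, using local types that mirror the Messages API request shape (`cache_control: { type: "ephemeral", ttl }`) rather than importing the SDK. Splitting the stable system prompt into two blocks (cookbook plus a hypothetical org-context block) is an assumption made to reach three stable breakpoints. The stable 1-hour entries precede the default-TTL tail in the tools, system, messages prefix order, which is consistent with Anthropic's requirement that longer-TTL entries come before shorter ones. Depending on API and SDK version, `ttl: "1h"` may need a beta header; check the current prompt-caching docs.

```typescript
// Local stand-ins for the Messages API request shape; not imported from @anthropic-ai/sdk.
type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };

interface TextBlock { type: "text"; text: string; cache_control?: CacheControl }
interface ToolDef { name: string; description: string; input_schema: object; cache_control?: CacheControl }
interface Message { role: "user" | "assistant"; content: TextBlock[] }
interface CachedRequest { tools: ToolDef[]; system: TextBlock[]; messages: Message[] }

const STABLE: CacheControl = { type: "ephemeral", ttl: "1h" }; // cookbook, tool definitions
const TAIL: CacheControl = { type: "ephemeral" }; // default TTL for the conversation tail

function buildCachedRequest(tools: ToolDef[], stableSystem: string[], messages: Message[]): CachedRequest {
  // A breakpoint on the last tool caches the whole tool list.
  const cachedTools = tools.map((t, i) => (i === tools.length - 1 ? { ...t, cache_control: STABLE } : t));
  // One breakpoint per stable system block.
  const system = stableSystem.map((text): TextBlock => ({ type: "text", text, cache_control: STABLE }));
  // Final breakpoint on the last block of the last message.
  const cachedMessages = messages.map((m, mi): Message => {
    if (mi !== messages.length - 1) return m;
    const last = m.content.length - 1;
    return { ...m, content: m.content.map((b, bi) => (bi === last ? { ...b, cache_control: TAIL } : b)) };
  });
  return { tools: cachedTools, system, messages: cachedMessages };
}

function countBreakpoints(req: CachedRequest): number {
  let n = 0;
  for (const t of req.tools) if (t.cache_control) n++;
  for (const b of req.system) if (b.cache_control) n++;
  for (const m of req.messages) for (const b of m.content) if (b.cache_control) n++;
  return n;
}
```

With two stable system blocks this yields exactly four breakpoints; adding a third stable block would exceed the limit.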

PostHog tracks per-turn cost, cache-hit ratio, and storage growth so we can tune.
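Per-turn cost and cache-hit ratio can be derived from the `usage` fields the Messages API returns (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). In this sketch the ratio is defined as cache-read tokens over all prompt tokens, which is our assumption, and prices are hypothetical caller-supplied config. A single `cacheWrite` price is a simplification: 1-hour cache writes are billed higher than 5-minute ones.

```typescript
// Subset of the `usage` object on a Messages API response.
interface Usage {
  input_tokens: number; // input tokens neither read from nor written to the cache
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// USD per million tokens for each bucket; hypothetical config supplied by the caller.
interface Prices { input: number; output: number; cacheWrite: number; cacheRead: number }

// Share of prompt tokens served from cache on this turn.
function cacheHitRatio(u: Usage): number {
  const prompt = u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  return prompt === 0 ? 0 : u.cache_read_input_tokens / prompt;
}

function turnCostUsd(u: Usage, p: Prices): number {
  return (
    (u.input_tokens * p.input +
      u.output_tokens * p.output +
      u.cache_creation_input_tokens * p.cacheWrite +
      u.cache_read_input_tokens * p.cacheRead) /
    1_000_000
  );
}
```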

Audit + inverse plumbing

Every write tool records an agent_tool_call row. Tools with snapshot-restore inverses (Phase 2 work, FIT-162) snapshot the prior state before the write, which makes the undo durable.
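A minimal in-memory sketch of the audit-plus-snapshot pattern, assuming a hypothetical single-table `members.update_phone` tool. In a real database the snapshot read, the write, and the agent_tool_call insert would presumably share one transaction; cross-table writes would need a snapshot per touched table.

```typescript
// In-memory sketch; AgentToolCall, updateMemberPhone and undoToolCall are hypothetical.
interface Member { name: string; phone: string }
interface MemberSnapshot { memberId: string; record: Member }

interface AgentToolCall {
  id: number;
  orgId: string;
  tool: string;
  input: unknown;
  snapshot: MemberSnapshot | null; // prior state; null for tools without snapshot-restore
  at: number;
}

const agentToolCalls: AgentToolCall[] = []; // stands in for the agent_tool_call table
const members = new Map<string, Member>();

function updateMemberPhone(orgId: string, memberId: string, phone: string, now: number): number {
  const before = members.get(memberId);
  if (!before) throw new Error(`member ${memberId} not found`);
  // Snapshot BEFORE the write so undo restores exactly what was there.
  const snapshot: MemberSnapshot = { memberId, record: { ...before } };
  members.set(memberId, { ...before, phone });
  const id = agentToolCalls.length + 1;
  agentToolCalls.push({ id, orgId, tool: "members.update_phone", input: { memberId, phone }, snapshot, at: now });
  return id;
}

// Undo restores from the audit row's snapshot rather than from UI state.
function undoToolCall(callId: number): void {
  const call = agentToolCalls.find((c) => c.id === callId);
  if (!call || !call.snapshot) throw new Error(`no snapshot for call ${callId}`);
  members.set(call.snapshot.memberId, call.snapshot.record);
}
```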

Observability

PostHog events per turn:

  • agent.turn.completed — model, tokens, cost, cache hit, duration.
  • agent.tool.executed — router, action, outcome.
  • agent.daily_cap.hit — org, cap value.
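One way to keep these three events consistent is a discriminated union of payloads, so a misspelled event name or missing property fails to compile. The property names and `outcome` values are assumptions; the `CaptureSink` interface mirrors the `{ distinctId, event, properties }` shape of posthog-node's `capture()` call so the sketch runs without the library.

```typescript
// Sketch: typed payloads for the three agent events. Property names and outcome values
// are assumptions; CaptureSink mirrors the shape of posthog-node's capture() call.
type AgentEvent =
  | {
      event: "agent.turn.completed";
      properties: { model: string; inputTokens: number; outputTokens: number; costUsd: number; cacheHitRatio: number; durationMs: number };
    }
  | { event: "agent.tool.executed"; properties: { router: string; action: string; outcome: "ok" | "error" | "cancelled" | "undone" } }
  | { event: "agent.daily_cap.hit"; properties: { orgId: string; capUsd: number } };

interface CaptureSink {
  capture(msg: { distinctId: string; event: string; properties: Record<string, unknown> }): void;
}

function track(sink: CaptureSink, distinctId: string, e: AgentEvent): void {
  sink.capture({ distinctId, event: e.event, properties: e.properties });
}
```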

Plus storage growth metrics for the agent_* tables.

Consequences

Positive

  • The agent uses the same auth, the same Zod schemas, the same services as the HTTP layer. Bug in the API → bug in the agent (and vice versa) — they fail together. No drift.
  • Confirmation flow keeps trust high for destructive ops without making read flows annoying.
  • Cost cap is visible to the operator and predictable per org.
  • New tools are cheap to add (one schema + one service call + an enum entry + i18n labels).

Negative

  • Tool surface grows with the API surface. Phase 2 adds bookings-for-others, member writes, workouts.get, member history, etc.
  • The agent can do things the user could do “by hand” — including things they shouldn’t, if permissions on the underlying service are misconfigured. Tools inherit the user’s permissions, but we still need careful destructive-action UX.
  • Snapshot-restore for undo is harder than it looks for cross-table writes (e.g., a workout edit that touches sections + movements). Phase 1 doesn’t have it on every write tool.

Discipline

For every new tool:

  1. Tool registered in apps/api/src/ai/agent/tools/leaves/ with proper audit + confirm + inverse (where applicable).
  2. Schema in libs/shared/src/lib/agent-schemas/, re-exported from the index.
  3. Action enum in tool-categories.ts.
  4. EN/HE/RU labels under agent.tools.* in the dictionaries.
  5. System-prompt cookbook updated if the tool has non-obvious usage.
  6. agent.tool.executed event fires with router + action + outcome.
  7. One real prompt exercises the tool end-to-end (Spotter has its own playground harness).
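Steps 2 through 4 lend themselves to a CI check that fails when an action lacks a schema or a label. The `Registry` shape below is hypothetical, and the real dictionaries are probably nested JSON; here they are flattened to dotted `agent.tools.<action>` keys.

```typescript
// Sketch of a CI check for steps 2 through 4. Registry and its fields are hypothetical.
const LOCALES = ["en", "he", "ru"] as const;
type Locale = (typeof LOCALES)[number];

interface Registry {
  actions: string[]; // the action enum from tool-categories.ts
  schemas: Record<string, unknown>; // schemas re-exported from agent-schemas, keyed by action
  dictionaries: Record<Locale, Record<string, string>>; // flattened agent.tools.* labels
}

function auditRegistry(reg: Registry): string[] {
  const problems: string[] = [];
  for (const action of reg.actions) {
    if (!(action in reg.schemas)) problems.push(`${action}: no schema`);
    const key = `agent.tools.${action}`;
    for (const locale of LOCALES) {
      if (!reg.dictionaries[locale][key]) problems.push(`${action}: missing ${locale} label`);
    }
  }
  return problems;
}
```

An empty result means every registered action has a schema and labels in all three locales.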

Open questions / future

  • Member-facing agent. Today only coaches/owners can use Spotter. A member-facing chat (book classes, log workout, check schedule) is v3.
  • Cross-staff visibility. Owners can’t see what coaches asked the agent. Privacy vs ownership question to resolve.
  • WhatsApp / mobile invocation. Once the WhatsApp ISV integration (FIT-140) ships, the agent becomes invocable from chat. Different latency budget, same tools.
  • Payments writes. Refunds and chargebacks are explicitly out of Phase 1/2 scope. Treat as a separate ticket with elevated review.