Observability
Four tools, plus one known gap.
| Tool | Role | Project |
|---|---|---|
| Sentry | Errors, traces, slow-query tagging | fitkit-backend (API) + fitkit-frontend (web). Marketing and minisites share configs. |
| PostHog | Frontend events, funnel analytics, Spotter cost telemetry, session replay | EU instance (https://eu.i.posthog.com) |
| Pino | Structured API logs | Streamed wherever Railway captures stdout |
| Bull-Board | BullMQ job inspection (internal) | Mounted in the API |
Gap: the audit log surface (FIT-20) is not complete. The schema exists, but the cross-cutting interceptor and the admin UI are not built yet. Treat this as a known shortfall.
Sentry
API
Initialised in apps/api/src/instrument.ts, which is imported at the top of main.ts so Sentry loads before NestFactory creates the app.
Key behaviour:
| Setting | Value | Why |
|---|---|---|
| dsn | SENTRY_DSN env | Empty disables Sentry. |
| environment | NODE_ENV | development, test, production. |
| release | SENTRY_RELEASE | Set to the git SHA in CI/Railway. |
| enabled | isProd | Off in dev: Sentry.init still runs, but nothing is transmitted. |
| tracesSampleRate | prod 0.3, dev 1.0 | Catch real perf issues without burning quota. |
| profilesSampleRate | prod 0.1, dev 0 | Profile a slice of the sampled transactions (relative to tracesSampleRate, so about 3% of prod transactions). |
| enableLogs | isProd | Logs forwarded to Sentry only in prod. |
| pinoIntegration | prod only, on error and fatal | Pino → Sentry forwarding. |
| beforeSend | drops 401 / 403 | Expected auth failures aren’t reported. |
| beforeSendTransaction | tags slow_db: true when any db span > 1s | Lets you filter slow-query traces. |
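The hook and sampling rules in the table are easiest to review as plain functions. This is a hedged, dependency-free sketch rather than the contents of instrument.ts: SpanLike, TransactionLike and HintLike are simplified stand-ins for the SDK's event and hint types, matching db spans by an op prefix of "db" is an assumption, and reading the status through getStatus() (which NestJS's HttpException provides) is an assumption about how the real filter recognises 401/403.

```typescript
// Simplified stand-ins for Sentry's event types; only the fields used below are modelled.
interface SpanLike {
  op?: string;
  start_timestamp: number; // seconds since epoch
  timestamp?: number; // end time, seconds since epoch
}

interface TransactionLike {
  spans?: SpanLike[];
  tags?: Record<string, string | number | boolean>;
}

interface HintLike {
  originalException?: unknown;
}

const SLOW_DB_SECONDS = 1;
const IGNORED_STATUSES = new Set([401, 403]);

// Mirrors the table: transmission, sampling and log forwarding depend on NODE_ENV.
function sentryRuntimeOptions(nodeEnv: string | undefined) {
  const isProd = nodeEnv === "production";
  return {
    enabled: isProd,
    tracesSampleRate: isProd ? 0.3 : 1.0,
    profilesSampleRate: isProd ? 0.1 : 0,
    enableLogs: isProd,
  };
}

// Reads a status from errors exposing getStatus(), as NestJS HttpException does (assumption).
function statusOf(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const getStatus = (err as { getStatus?: unknown }).getStatus;
  if (typeof getStatus !== "function") return undefined;
  const status: unknown = getStatus.call(err);
  return typeof status === "number" ? status : undefined;
}

// beforeSend: returning null tells the SDK to drop the event.
function beforeSend<E>(event: E, hint: HintLike): E | null {
  const status = statusOf(hint.originalException);
  return status !== undefined && IGNORED_STATUSES.has(status) ? null : event;
}

// beforeSendTransaction: tag traces that contain any db span longer than 1s.
function beforeSendTransaction(event: TransactionLike): TransactionLike {
  const hasSlowDbSpan = (event.spans ?? []).some(
    (span) =>
      (span.op ?? "").startsWith("db") &&
      span.timestamp !== undefined &&
      span.timestamp - span.start_timestamp > SLOW_DB_SECONDS,
  );
  if (hasSlowDbSpan) {
    event.tags = { ...event.tags, slow_db: true };
  }
  return event;
}
```

In instrument.ts, hooks like these would be handed to Sentry.init next to dsn, environment and release.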
Per-request scope is set in LoggingInterceptor (apps/api/src/app/logging.interceptor.ts):
```typescript
Sentry.getCurrentScope().setTag('requestId', requestId);
Sentry.getCurrentScope().setUser({ id: userId }); // when authenticated
Sentry.getCurrentScope().setTag('orgId', orgId); // when present
```
The AllExceptionsFilter extends SentryGlobalFilter so unhandled exceptions are reported automatically.
Web
@sentry/nextjs configured via the standard sentry.client.config.ts / sentry.server.config.ts / sentry.edge.config.ts pattern (Next.js Sentry SDK). Source maps uploaded at build time when SENTRY_AUTH_TOKEN is set; build still succeeds without it.
AnalyticsUserSync in apps/web/src/providers.tsx ties Clerk user identity into Sentry:
```typescript
Sentry.setUser({ id: user.id, email: user.primaryEmailAddress?.emailAddress });
Sentry.setTag('posthog_session_id', sessionId);
```
The posthog_session_id tag lets you cross-link a Sentry issue to the PostHog session replay.
Marketing + minisites
Each has its own Sentry project key in env. Same SDK, smaller footprint.
Env vars
| Variable | Scope / notes |
|---|---|
| SENTRY_DSN | API |
| SENTRY_RELEASE | Optional; used for release tracking |
| NEXT_PUBLIC_SENTRY_DSN | Web (client + server) |
| EXPO_PUBLIC_SENTRY_DSN | Mobile (fitkit-mobile); distinct project |
| SENTRY_URL | Default https://de.sentry.io |
| SENTRY_ORG | fitkit (fitkit1 for mobile) |
| SENTRY_API_PROJECT | fitkit-backend |
| SENTRY_WEB_PROJECT | fitkit-frontend |
| SENTRY_MOBILE_PROJECT | fitkit-mobile |
| SENTRY_AUTH_TOKEN | CI/Railway/EAS; enables source map upload |
Mobile (fitkit-mobile)
Separate Sentry project (fitkit1/fitkit-mobile). Sentry is initialised at module scope in app/_layout.tsx, before any React Native screen mounts. The @sentry/react-native/expo config plugin uploads source maps + debug IDs during EAS Build (needs SENTRY_AUTH_TOKEN as an EAS secret).
Sample rates:
| Setting | Value | Why |
|---|---|---|
| tracesSampleRate | 0 | No perf traces until we have a specific bottleneck; they balloon event volume. |
| replaysOnErrorSampleRate | 1.0 | Capture every error session. |
| replaysSessionSampleRate | 0.1 | 10% of normal sessions for baseline behavior. |
| sendDefaultPii | false | Matches the shipping PrivacyInfo.xcprivacy (NSPrivacyTracking=false). |
The mobile replay integration captures UI frames; with replaysOnErrorSampleRate: 1.0 you get a visual recording of every crash. iOS dSYMs and Android ProGuard mappings are uploaded automatically during EAS Build.
PostHog
Web client
Initialised in apps/web/src/providers/posthog-provider.tsx. Env: NEXT_PUBLIC_POSTHOG_KEY, NEXT_PUBLIC_POSTHOG_HOST (default https://eu.i.posthog.com).
PostHogPageView (apps/web/src/components/posthog-pageview.tsx) sends a $pageview per navigation. AnalyticsUserSync calls posthog.identify(user.id, { email, name }) on Clerk-user changes and posthog.reset() on sign-out.
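The PostHog half of AnalyticsUserSync boils down to a small state transition. This sketch assumes a minimal client interface matching posthog-js's identify(distinctId, properties) and reset(); syncAnalyticsUser and the previous-id bookkeeping are hypothetical, not the component's actual code.

```typescript
// Minimal client surface (assumption): mirrors posthog-js identify() and reset().
interface AnalyticsClient {
  identify(distinctId: string, props?: Record<string, unknown>): void;
  reset(): void;
}

// Hypothetical projection of the Clerk user fields that get sent.
interface ClerkUserLike {
  id: string;
  email?: string;
  name?: string;
}

// Returns the id that is now identified so the caller can store it for the next change.
function syncAnalyticsUser(
  client: AnalyticsClient,
  previousId: string | null,
  user: ClerkUserLike | null,
): string | null {
  if (user === null) {
    // Signed out: forget the identified person so the next visitor starts anonymous.
    if (previousId !== null) client.reset();
    return null;
  }
  if (user.id !== previousId) {
    client.identify(user.id, { email: user.email, name: user.name });
  }
  return user.id;
}
```

Keeping the last identified id means re-renders with the same Clerk user don't re-send identify, and reset() only fires on an actual sign-out.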
Mobile client
posthog-react-native. Env: EXPO_PUBLIC_POSTHOG_KEY, EXPO_PUBLIC_POSTHOG_HOST (default EU). Initialised inside useAnalyticsIdentify() (src/hooks/use-analytics-identify.ts) once Clerk has loaded. The distinct id is the Clerk user id, mirroring the web client, so the same identified user shows up on both surfaces. Screen views are tracked via Expo Router’s navigation events.
API server-side
apps/api/src/event-tracking/event-tracking.service.ts. Env: POSTHOG_API_KEY, POSTHOG_PROJECT_ID, POSTHOG_HOST (default EU). Used to fire server-authoritative events:
- user_signed_up (from the Clerk webhook handler)
- Payment funnel: checkout_started, form_created, webhook_received, transaction-stage events
- Spotter telemetry: per-turn cost, cache hit ratio, tool counts (apps/api/src/ai/agent/observability/)
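To make "server-authoritative" concrete, here is a hedged sketch of the event payload. The service may well use the posthog-node SDK (client.capture({ distinctId, event, properties })) rather than raw HTTP; this version targets PostHog's public capture endpoint so it stays dependency-free. The /i/v0/e/ path and body fields are assumptions to verify against PostHog's API docs, POSTHOG_API_KEY is assumed to be the project API key, and only the event names listed above are typed (the transaction-stage names aren't given here).

```typescript
// Event names taken from the list above; transaction-stage events are omitted because their names aren't documented here.
type ServerEvent = "user_signed_up" | "checkout_started" | "form_created" | "webhook_received";

interface CapturePayload {
  api_key: string;
  event: ServerEvent;
  distinct_id: string; // Clerk user id, matching the web and mobile clients
  properties: Record<string, unknown>;
  timestamp: string; // ISO 8601
}

function buildCapturePayload(
  apiKey: string,
  event: ServerEvent,
  distinctId: string,
  properties: Record<string, unknown> = {},
  now: Date = new Date(),
): CapturePayload {
  return { api_key: apiKey, event, distinct_id: distinctId, properties, timestamp: now.toISOString() };
}

// Assumption: single-event capture endpoint on the configured host (e.g. https://eu.i.posthog.com).
async function sendCapture(host: string, payload: CapturePayload): Promise<boolean> {
  const res = await fetch(new URL("/i/v0/e/", host), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  return res.ok;
}
```

Because the API fires these itself, a checkout_started event is recorded even when the browser that started it never reconnects.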
Why server-side
The web client can’t be trusted to fire a “payment succeeded” event — it might be a different browser, an offline reconnect, or a payment-provider callback hitting only the API. Critical funnel events fire from the API.
Pino structured logs
Set up in apps/api/src/app/app.module.ts via LoggerModule.forRoot:
- pino-pretty transport in non-prod (colorized, human-readable).
- JSON logs in prod (Railway captures and forwards them).
- autoLogging: true; requests and responses are logged automatically.
- genReqId: passes through the x-request-id header, otherwise generates req_<ts36>_<rand>.
- customLogLevel:
  - err || statusCode >= 500 → error
  - statusCode >= 400 → warn
  - else → info
- Serializers strip the request to { id, method, url, remoteAddress } and the response to { statusCode }.
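The genReqId, customLogLevel and serializer rules can be written as plain functions. This is a minimal sketch, not the code in app.module.ts: the request/response shapes are simplified assumptions, and using 8 hex characters for <rand> is a guess at a format the doc leaves open.

```typescript
import { randomBytes } from "node:crypto";

// Minimal shapes (assumption): pino-http gives serializers an already-wrapped request
// that exposes remoteAddress directly; the hooks below read only these fields.
interface RawReq {
  headers: Record<string, string | string[] | undefined>;
}
interface SerializedReq {
  id?: unknown;
  method?: string;
  url?: string;
  remoteAddress?: string;
  headers?: Record<string, unknown>;
}
interface ResLike {
  statusCode: number;
}

type Level = "info" | "warn" | "error";

// Honour an incoming x-request-id, otherwise mint req_<ts36>_<rand> (rand format assumed).
function genReqId(req: RawReq): string {
  const incoming = req.headers["x-request-id"];
  const value = Array.isArray(incoming) ? incoming[0] : incoming;
  if (value) return value;
  return `req_${Date.now().toString(36)}_${randomBytes(4).toString("hex")}`;
}

// Same precedence as the customLogLevel rules: any error or 5xx, then 4xx, then info.
function customLogLevel(_req: RawReq, res: ResLike, err?: Error): Level {
  if (err || res.statusCode >= 500) return "error";
  if (res.statusCode >= 400) return "warn";
  return "info";
}

// Keep request/response logs small: everything else, including headers, is dropped.
const serializers = {
  req: (req: SerializedReq) => ({
    id: req.id,
    method: req.method,
    url: req.url,
    remoteAddress: req.remoteAddress,
  }),
  res: (res: ResLike) => ({ statusCode: res.statusCode }),
};
```

With nestjs-pino, functions like these would typically go through LoggerModule.forRoot({ pinoHttp: { genReqId, customLogLevel, serializers, autoLogging: true } }), with the pino-pretty transport added only outside production.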
Slow-request warnings
LoggingInterceptor emits a SLOW warning whenever a request takes longer than 800ms. The threshold was lowered from 3s so that requests in the 800ms to 3s range, which previously went unflagged, now surface.
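The threshold check itself is small. In the interceptor it would run once the handler completes (for example inside an RxJS tap), with the duration measured from a timestamp taken at the start of intercept(); that wiring is assumed and omitted so the sketch has no dependencies. The function name and payload fields are hypothetical.

```typescript
const SLOW_REQUEST_MS = 800; // lowered from 3000

// Hypothetical payload; the real interceptor's field names may differ.
interface SlowRequestWarning {
  msg: "SLOW";
  requestId: string;
  method: string;
  url: string;
  duration: number; // milliseconds
}

// Returns a warning when the request took longer than the threshold, else null.
function slowRequestWarning(
  requestId: string,
  method: string,
  url: string,
  durationMs: number,
): SlowRequestWarning | null {
  if (durationMs <= SLOW_REQUEST_MS) return null;
  return { msg: "SLOW", requestId, method, url, duration: durationMs };
}
```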
Error log shape
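Error entries follow the JSON example that comes next. This is a hedged sketch of how such an entry could be assembled: buildErrorLog and isPgError are hypothetical names, and recognising a pg error by its five-character SQLSTATE code property (which node-postgres errors carry) is an assumption about how the interceptor detects them.

```typescript
interface RequestContext {
  requestId: string;
  userId?: string;
  orgId?: string;
}

interface ErrorLogEntry {
  level: "error"; // an error is always present here, so customLogLevel maps it to error
  requestId: string;
  userId?: string;
  orgId?: string;
  duration: number;
  statusCode: number;
  error: string;
  pgError?: string;
  stack?: string;
}

// Heuristic (assumption): node-postgres errors carry a 5-character SQLSTATE string in `code`.
function isPgError(value: unknown): value is Error & { code: string } {
  if (!(value instanceof Error)) return false;
  const code = (value as { code?: unknown }).code;
  return typeof code === "string" && /^[0-9A-Z]{5}$/.test(code);
}

function buildErrorLog(ctx: RequestContext, err: Error, statusCode: number, duration: number): ErrorLogEntry {
  const entry: ErrorLogEntry = {
    level: "error",
    requestId: ctx.requestId,
    userId: ctx.userId,
    orgId: ctx.orgId,
    duration,
    statusCode,
    error: err.message,
  };
  const cause: unknown = err.cause;
  if (isPgError(cause)) entry.pgError = cause.message; // surface the database message
  if (statusCode >= 500) entry.stack = err.stack; // stacks only for server errors
  return entry;
}
```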
```jsonc
{
  "level": "error",
  "requestId": "req_l2x...",
  "userId": "user_2gT...",
  "orgId": "0193...",
  "duration": 1241,
  "statusCode": 500,
  "error": "...",
  "pgError": "duplicate key value violates unique constraint", // when err.cause is a pg error
  "stack": "..." // only for 5xx
}
```
Health endpoint
GET /health (apps/api/src/app/health.controller.ts). Public (no auth) and exempt from rate limiting (skip-throttle). Returns:
```
{
  "status": "ok" | "degraded",
  "timestamp": "2026-05-28T...",
  "checks": { "database": "up" | "down", "redis": "up" | "down" }
}
```
Used by Railway’s deploy health check and by external uptime monitors.
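The response shape implies a simple rule: "ok" only when every check is up. A hedged sketch of that logic, with the probes (for example a SELECT 1 and a Redis PING) injected; the 2-second timeout and the function names are assumptions, not what health.controller.ts does, and the sketch does not choose an HTTP status code for a degraded response.

```typescript
type ProbeState = "up" | "down";

interface HealthResponse {
  status: "ok" | "degraded";
  timestamp: string;
  checks: { database: ProbeState; redis: ProbeState };
}

// Runs one probe with a timeout; any rejection or timeout counts as "down".
async function probe(check: () => Promise<unknown>, timeoutMs = 2000): Promise<ProbeState> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
  });
  try {
    await Promise.race([check(), timeout]);
    return "up";
  } catch {
    return "down";
  } finally {
    clearTimeout(timer);
  }
}

// Both probes run concurrently so a slow Redis doesn't delay the database result.
async function health(
  checkDb: () => Promise<unknown>,
  checkRedis: () => Promise<unknown>,
): Promise<HealthResponse> {
  const [database, redis] = await Promise.all([probe(checkDb), probe(checkRedis)]);
  return {
    status: database === "up" && redis === "up" ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    checks: { database, redis },
  };
}
```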
Bull-Board
apps/api/src/bull-board/bull-board.module.ts mounts the BullMQ admin UI (route name lives in that module). Access is gated to FitKit-admin users.
Use it to:
- Inspect queue depth and failed jobs.
- Retry failures.
- Browse job payloads.
In a production incident, Bull-Board is the fastest way to see whether the push-notification, embedding, or import queues are backed up.
Spotter observability
Spotter ships a dedicated observability service: apps/api/src/ai/agent/observability/agent-observability.service.ts. Tracks:
- Per-turn cost (USD) — derived from token counts × model price.
- Cache hit ratio (RAG + tool memoization).
- Tool counts per conversation.
- Storage growth (snapshot service: storage-snapshot.service.ts).
These flow into PostHog so the admin cost dashboards can show per-org Spotter spend and watch the cache-hit ratio drift.
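Per-turn cost is described as token counts × model price. A minimal sketch of that arithmetic and of the cache-hit ratio; the per-million-token price shape is an assumption, and it ignores any discounted pricing for cached input tokens that a real price table may need.

```typescript
// Hypothetical price shape: USD per million tokens.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
}

// Cost of one turn in USD: tokens times price, scaled down from per-million pricing.
function turnCostUsd(usage: TurnUsage, price: ModelPrice): number {
  return (usage.inputTokens * price.inputPerMTok + usage.outputTokens * price.outputPerMTok) / 1_000_000;
}

// Share of lookups served from cache; defined as 0 when nothing was looked up.
function cacheHitRatio(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```

For example, 1,000 input tokens and 500 output tokens at hypothetical prices of $3 and $15 per million tokens cost (1,000 × 3 + 500 × 15) / 1,000,000 = $0.0105. Per-org spend is then a sum of these per-turn figures.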
Audit logging (gap)
Schema: audit_logs table in libs/db/src/lib/schema/admin.ts:
```
{
  id, actorClerkId, action, resource, resourceId, metadata, ipAddress, createdAt
}
```
State today:
- The table is created.
- Specific high-trust paths write rows directly (e.g. admin tier changes).
- No global interceptor: most write paths do not log to audit_logs.
- No admin UI for browsing audit entries.
This is FIT-20 and is on the near-term roadmap. Until it lands:
- Don’t claim the platform has a complete audit trail to customers.
- For incident forensics, lean on Pino logs (Railway log retention) + Sentry breadcrumbs.
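For reference, a direct write on one of the high-trust paths amounts to building a row with the audit_logs fields. This is a hypothetical sketch: the TypeScript types, the randomUUID id strategy and the helper name are assumptions, not the column definitions in admin.ts, and it is not the missing global interceptor.

```typescript
import { randomUUID } from "node:crypto";

// Field names come from audit_logs; the TypeScript types are assumptions.
interface AuditLogRow {
  id: string;
  actorClerkId: string;
  action: string;
  resource: string;
  resourceId: string | null;
  metadata: Record<string, unknown>;
  ipAddress: string | null;
  createdAt: Date;
}

type AuditInput = Omit<AuditLogRow, "id" | "createdAt">;

// Hypothetical helper a direct-write path (e.g. an admin tier change) could call before inserting.
function buildAuditRow(input: AuditInput, now: Date = new Date()): AuditLogRow {
  return { ...input, id: randomUUID(), createdAt: now };
}
```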
Per-request id propagation
Every request that arrives at the API:
- Reads x-request-id if present, else generates one (Pino).
- Sets requestId on Pino’s per-request log context.
- Tags the Sentry scope with the same id.
- Sets the X-Request-Id response header (LoggingInterceptor).
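These steps reduce to resolving the id once and writing the same value everywhere. A dependency-free sketch, assuming minimal interfaces that match Sentry's Scope#setTag and Node's ServerResponse#setHeader; resolveRequestId and applyRequestId are hypothetical names, and the generator is injected so the id format stays whatever genReqId produces.

```typescript
// Minimal surfaces (assumption): Sentry's Scope#setTag and Node's ServerResponse#setHeader.
interface ScopeLike {
  setTag(key: string, value: string): void;
}
interface ResponseLike {
  setHeader(name: string, value: string): void;
}

type IncomingHeaders = Record<string, string | string[] | undefined>;

// Step 1: reuse the caller's x-request-id when present, otherwise generate one.
function resolveRequestId(headers: IncomingHeaders, generate: () => string): string {
  const raw = headers["x-request-id"];
  const value = Array.isArray(raw) ? raw[0] : raw;
  return value ? value : generate();
}

// Remaining steps: the same id goes to the Sentry scope, the response header and the log bindings.
function applyRequestId(requestId: string, scope: ScopeLike, res: ResponseLike): { requestId: string } {
  scope.setTag("requestId", requestId);
  res.setHeader("X-Request-Id", requestId);
  return { requestId }; // bindings for the per-request Pino child logger
}
```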
serverFetch on the web side surfaces it back into Sentry context on failed API calls:
```typescript
scope.setContext('api_request', { path, status, requestId: res.headers.get('x-request-id') });
```
When a customer reports a bug, ask for the X-Request-Id from the browser’s network tab; it lets you pivot straight into both Sentry and Pino.