Living documentation — last reviewed 2026-05-28

Observability

Four tools, plus one known gap.

| Tool | Role | Project |
| --- | --- | --- |
| Sentry | Errors, traces, slow-query tagging | fitkit-backend (API) + fitkit-frontend (web). Marketing and minisites share configs. |
| PostHog | Frontend events, funnel analytics, Spotter cost telemetry, session replay | EU instance (https://eu.i.posthog.com) |
| Pino | Structured API logs | Streamed wherever Railway captures stdout |
| Bull-Board | BullMQ job inspection (internal) | Mounted in the API |

Gap: the audit log surface (FIT-20) is not complete. The schema exists, but the cross-cutting interceptor and the admin UI are not built yet. Treat this as a known shortfall.

Sentry

API

Init in apps/api/src/instrument.ts, which is imported at the top of main.ts so Sentry loads before NestFactory creates the app.

Key behaviour:

| Setting | Value | Why |
| --- | --- | --- |
| dsn | SENTRY_DSN env | Empty disables Sentry. |
| environment | NODE_ENV | development, test, production. |
| release | SENTRY_RELEASE | Set to the git SHA in CI/Railway. |
| enabled | isProd | Sentry off in dev (Sentry.init still runs, but nothing is transmitted). |
| tracesSampleRate | prod 0.3, dev 1.0 | Catch real perf issues without burning quota. |
| profilesSampleRate | prod 0.1, dev 0 | Profile a slice of the traced transactions (the rate applies to sampled traces). |
| enableLogs | isProd | Logs forwarded to Sentry only in prod. |
| pinoIntegration | prod only, on error and fatal | Pino → Sentry forwarding. |
| beforeSend | drops 401 / 403 | Expected auth failures aren’t reported. |
| beforeSendTransaction | tags slow_db: true when any db span > 1s | Lets you filter slow-query traces. |
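A minimal sketch of the option logic in the table above, written as plain functions so it reads without the SDK. The names buildApiSentryOptions, shouldDropError and tagSlowDb, and the SpanLike/TransactionLike shapes, are hypothetical. The real instrument.ts passes equivalent values to Sentry.init. How its beforeSend recovers the HTTP status from the event or hint isn't documented here. Treating a span as a db span when its op starts with "db" is also an assumption.

```typescript
// Hypothetical, simplified shapes; the real Sentry event types are much richer.
interface SpanLike {
  op?: string;
  start_timestamp: number; // seconds
  timestamp?: number; // seconds; missing while a span is unfinished
}

interface TransactionLike {
  spans?: SpanLike[];
  tags?: Record<string, string | number | boolean>;
}

interface ApiSentryOptions {
  dsn: string | undefined;
  environment: string;
  release: string | undefined;
  enabled: boolean;
  tracesSampleRate: number;
  profilesSampleRate: number;
  enableLogs: boolean;
}

// Mirrors the table: transmit only in production, with lower sample rates there.
function buildApiSentryOptions(env: Record<string, string | undefined>): ApiSentryOptions {
  const isProd = env.NODE_ENV === "production";
  return {
    dsn: env.SENTRY_DSN || undefined, // empty string -> no DSN -> nothing is sent
    environment: env.NODE_ENV ?? "development",
    release: env.SENTRY_RELEASE,
    enabled: isProd,
    tracesSampleRate: isProd ? 0.3 : 1.0,
    profilesSampleRate: isProd ? 0.1 : 0,
    enableLogs: isProd,
  };
}

// beforeSend rule: expected auth failures are dropped, everything else is kept.
function shouldDropError(statusCode: number | undefined): boolean {
  return statusCode === 401 || statusCode === 403;
}

// beforeSendTransaction rule: tag the transaction when any db span ran longer than 1s.
function tagSlowDb(event: TransactionLike): TransactionLike {
  const slow = (event.spans ?? []).some(
    (span) =>
      (span.op ?? "").startsWith("db") &&
      span.timestamp !== undefined &&
      span.timestamp - span.start_timestamp > 1,
  );
  if (slow) {
    event.tags = { ...event.tags, slow_db: true };
  }
  return event;
}
```

Keeping the rules as pure functions makes them easy to unit-test apart from Sentry.init.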

Per-request scope is set in LoggingInterceptor (apps/api/src/app/logging.interceptor.ts):

  • Sentry.getCurrentScope().setTag('requestId', requestId)
  • Sentry.getCurrentScope().setUser({ id: userId }) when authenticated
  • Sentry.getCurrentScope().setTag('orgId', orgId) when present
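Below is a sketch of the per-request scope logic. It assumes a ScopeLike interface that covers only the two Scope methods the bullets use, setTag and setUser. RequestContext and applyRequestScope are hypothetical names. In the interceptor, scope would be Sentry.getCurrentScope().

```typescript
// Minimal stand-in for the subset of the Sentry Scope API used by the interceptor.
interface ScopeLike {
  setTag(key: string, value: string): void;
  setUser(user: { id: string } | null): void;
}

// Hypothetical per-request values; the real interceptor reads them from the request.
interface RequestContext {
  requestId: string;
  userId?: string;
  orgId?: string;
}

function applyRequestScope(scope: ScopeLike, ctx: RequestContext): void {
  scope.setTag("requestId", ctx.requestId); // always present
  if (ctx.userId) {
    scope.setUser({ id: ctx.userId }); // only when authenticated
  }
  if (ctx.orgId) {
    scope.setTag("orgId", ctx.orgId); // only when an org is present
  }
}
```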

The AllExceptionsFilter extends SentryGlobalFilter so unhandled exceptions are reported automatically.

Web

@sentry/nextjs configured via the standard sentry.client.config.ts / sentry.server.config.ts / sentry.edge.config.ts pattern (Next.js Sentry SDK). Source maps uploaded at build time when SENTRY_AUTH_TOKEN is set; build still succeeds without it.

AnalyticsUserSync in apps/web/src/providers.tsx ties Clerk user identity into Sentry:

Sentry.setUser({ id: user.id, email: user.primaryEmailAddress?.emailAddress });
Sentry.setTag('posthog_session_id', sessionId);

posthog_session_id lets you cross-link a Sentry issue to the PostHog session replay.

Marketing + minisites

Each has its own Sentry project key in env. Same SDK, smaller footprint.

Env vars

| Variable | Used for |
| --- | --- |
| SENTRY_DSN | API |
| SENTRY_RELEASE | Optional; release tracking |
| NEXT_PUBLIC_SENTRY_DSN | Web (client + server) |
| EXPO_PUBLIC_SENTRY_DSN | Mobile (fitkit-mobile); distinct project |
| SENTRY_URL | Default https://de.sentry.io |
| SENTRY_ORG | fitkit (fitkit1 for mobile) |
| SENTRY_API_PROJECT | fitkit-backend |
| SENTRY_WEB_PROJECT | fitkit-frontend |
| SENTRY_MOBILE_PROJECT | fitkit-mobile |
| SENTRY_AUTH_TOKEN | CI/Railway/EAS; enables source map upload |

Mobile (fitkit-mobile)

Separate Sentry project (fitkit1/fitkit-mobile). Init happens at the top level of app/_layout.tsx, before any RN screen mounts. The @sentry/react-native/expo config plugin uploads source maps + debug IDs during EAS Build (needs SENTRY_AUTH_TOKEN as an EAS secret).

Sample rates:

| Setting | Value | Why |
| --- | --- | --- |
| tracesSampleRate | 0 | No perf traces until we have a specific bottleneck; they balloon event volume. |
| replaysOnErrorSampleRate | 1.0 | Capture every error session. |
| replaysSessionSampleRate | 0.1 | 10% of normal sessions for baseline behavior. |
| sendDefaultPii | false | Matches the shipping PrivacyInfo.xcprivacy (NSPrivacyTracking=false). |

The mobile-replay integration ships UI frames. Combined with replaysOnErrorSampleRate: 1.0, that gives you a visual recording of every crash. iOS dSYMs and Android ProGuard mapping files are uploaded automatically during EAS Build.

PostHog

Web client

Initialised in apps/web/src/providers/posthog-provider.tsx. Env: NEXT_PUBLIC_POSTHOG_KEY, NEXT_PUBLIC_POSTHOG_HOST (default https://eu.i.posthog.com).

PostHogPageView (apps/web/src/components/posthog-pageview.tsx) sends a $pageview per navigation. AnalyticsUserSync calls posthog.identify(user.id, { email, name }) on Clerk-user changes and posthog.reset() on sign-out.
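AnalyticsUserSync keeps the two SDKs in step. On the Sentry side it calls Sentry.setUser and sets the posthog_session_id tag, as shown in the Web section. On the PostHog side it calls posthog.identify and posthog.reset. That sync can be sketched as a pure transition function. AnalyticsSinks, SyncUser and syncAnalyticsUser are hypothetical stand-ins for the SDK calls and the Clerk user, and the real component runs inside a React effect. This page doesn't say whether it also clears the Sentry user on sign-out, so the sketch doesn't.

```typescript
// Hypothetical stand-ins for the Sentry and PostHog browser calls AnalyticsUserSync makes.
interface AnalyticsSinks {
  sentrySetUser(user: { id: string; email?: string }): void;
  sentrySetTag(key: string, value: string): void;
  posthogIdentify(distinctId: string, props: { email?: string; name?: string }): void;
  posthogReset(): void;
}

// Simplified view of the Clerk user fields the sync reads.
interface SyncUser {
  id: string;
  email?: string;
  name?: string;
}

// Returns the id that is now identified; the caller feeds it back in on the next change.
function syncAnalyticsUser(
  previousId: string | null,
  user: SyncUser | null,
  posthogSessionId: string | undefined,
  sinks: AnalyticsSinks,
): string | null {
  if (user === null) {
    // Signed out: forget the PostHog identity, but only if someone was identified.
    if (previousId !== null) sinks.posthogReset();
    return null;
  }
  if (user.id !== previousId) {
    sinks.sentrySetUser({ id: user.id, email: user.email });
    if (posthogSessionId) sinks.sentrySetTag("posthog_session_id", posthogSessionId);
    sinks.posthogIdentify(user.id, { email: user.email, name: user.name });
  }
  return user.id;
}
```

Re-identifying only when the Clerk user id changes avoids repeated identify calls on every re-render.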

Mobile client

posthog-react-native. Env: EXPO_PUBLIC_POSTHOG_KEY, EXPO_PUBLIC_POSTHOG_HOST (default EU). Initialised inside useAnalyticsIdentify() (src/hooks/use-analytics-identify.ts) once Clerk loads. The distinct id is the Clerk user id, mirroring web, so the same identified user shows up across both surfaces. Screen views are tracked via Expo Router’s navigation events.

API server-side

apps/api/src/event-tracking/event-tracking.service.ts. Env: POSTHOG_API_KEY, POSTHOG_PROJECT_ID, POSTHOG_HOST (default EU). Used to fire server-authoritative events:

  • user_signed_up (from the Clerk webhook handler)
  • Payment funnel: checkout_started, form_created, webhook_received, transaction-stage events
  • Spotter telemetry: per-turn cost, cache hit ratio, tool counts (apps/api/src/ai/agent/observability/)
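This page doesn't show the service's internals. Here is a sketch that makes two assumptions. First, it assumes the service sends events to PostHog's public single-event capture endpoint (POST /i/v0/e/ on the configured host). Second, it assumes POSTHOG_API_KEY is the project key that endpoint expects. Under those assumptions, a server-authoritative event is a small JSON body. ServerEvent and buildCaptureRequest are hypothetical names, and the real service may use posthog-node instead.

```typescript
// Hypothetical event shape for server-fired events such as user_signed_up.
interface ServerEvent {
  event: string;
  distinctId: string; // Clerk user id, matching the web and mobile clients
  properties?: Record<string, unknown>;
  timestamp?: Date;
}

interface CaptureRequest {
  url: string;
  body: {
    api_key: string;
    event: string;
    distinct_id: string;
    properties: Record<string, unknown>;
    timestamp?: string;
  };
}

const DEFAULT_POSTHOG_HOST = "https://eu.i.posthog.com"; // the documented EU default

function buildCaptureRequest(
  apiKey: string,
  ev: ServerEvent,
  host: string = DEFAULT_POSTHOG_HOST,
): CaptureRequest {
  return {
    url: `${host.replace(/\/+$/, "")}/i/v0/e/`, // strip trailing slashes before appending
    body: {
      api_key: apiKey,
      event: ev.event,
      distinct_id: ev.distinctId,
      properties: ev.properties ?? {},
      timestamp: ev.timestamp?.toISOString(),
    },
  };
}
```

Sending it is then a plain fetch(req.url, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(req.body) }). Using the Clerk user id as distinct_id keeps server events attached to the same person the clients identify.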

Why server-side

The web client can’t be trusted to fire a “payment succeeded” event. The payment may complete in a different browser, or the client may only learn of it after an offline reconnect. In other cases the payment-provider callback hits only the API and the client never sees it. Critical funnel events therefore fire from the API.

Pino structured logs

Set up in apps/api/src/app/app.module.ts via LoggerModule.forRoot:

  • pino-pretty transport in non-prod (colorized, human-readable).
  • JSON logs in prod (Railway captures and forwards).
  • autoLogging: true — request/response auto-logged.
  • genReqId → x-request-id header passthrough, else generates req_<ts36>_<rand>.
  • customLogLevel:
    • err || statusCode >= 500 → error
    • statusCode >= 400 → warn
  • Serializers strip request to { id, method, url, remoteAddress } and response to { statusCode }.
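The request-id, log-level and serializer rules above can be sketched as plain functions. The names resolveRequestId, levelFor, serializeReq and serializeRes are hypothetical, and SerializedReq is a simplified assumption about what pino hands a req serializer. In pino-http, customLogLevel receives (req, res, err) and genReqId receives (req, res), so each would wrap these helpers rather than be these helpers.

```typescript
type Level = "error" | "warn" | "info";

// Reuse an incoming x-request-id; otherwise mint req_<timestamp in base 36>_<random>.
function resolveRequestId(
  header: string | string[] | undefined,
  now: () => number = Date.now,
): string {
  const incoming = Array.isArray(header) ? header[0] : header;
  if (incoming) return incoming;
  const rand = Math.random().toString(36).slice(2, 8);
  return `req_${now().toString(36)}_${rand}`;
}

// Same precedence as the customLogLevel bullets: any error or 5xx, then 4xx, else info.
function levelFor(statusCode: number, err?: Error): Level {
  if (err || statusCode >= 500) return "error";
  if (statusCode >= 400) return "warn";
  return "info";
}

// Simplified view of the already-serialized request a pino req serializer receives.
interface SerializedReq {
  id?: string | number;
  method?: string;
  url?: string;
  remoteAddress?: string;
  headers?: Record<string, unknown>;
}

// Keep only the listed fields; everything else, including headers, is dropped.
function serializeReq(req: SerializedReq) {
  return { id: req.id, method: req.method, url: req.url, remoteAddress: req.remoteAddress };
}

function serializeRes(res: { statusCode: number }) {
  return { statusCode: res.statusCode };
}
```

Wired up, that would look like genReqId: (req) => resolveRequestId(req.headers["x-request-id"]) and customLogLevel: (_req, res, err) => levelFor(res.statusCode, err), inside the pinoHttp options passed to LoggerModule.forRoot.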

Slow-request warnings

LoggingInterceptor emits a SLOW warning whenever a request takes longer than 800ms. The threshold was lowered from 3s so that most of the 500–2000ms tail now surfaces.
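NestJS interceptors work on RxJS observables, so the real LoggingInterceptor presumably times the work around next.handle(). The sketch below shows the same threshold check around a plain async handler instead. withSlowWarning, the injectable clock and the warn callback are hypothetical, and the exact message format of the real SLOW log isn't documented here.

```typescript
const SLOW_THRESHOLD_MS = 800;

type WarnFn = (msg: string, meta: Record<string, unknown>) => void;

// Times an async handler and warns when it runs past the threshold, even if it throws.
async function withSlowWarning<T>(
  label: string,
  handler: () => Promise<T>,
  warn: WarnFn,
  now: () => number = Date.now,
  thresholdMs: number = SLOW_THRESHOLD_MS,
): Promise<T> {
  const start = now();
  try {
    return await handler();
  } finally {
    const duration = now() - start;
    if (duration > thresholdMs) {
      warn(`SLOW ${label}`, { duration });
    }
  }
}
```

Injecting the clock keeps the threshold logic testable without real delays.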

Error log shape

{
  "level": "error",
  "requestId": "req_l2x...",
  "userId": "user_2gT...",
  "orgId": "0193...",
  "duration": 1241,
  "statusCode": 500,
  "error": "...",
  "pgError": "duplicate key value violates unique constraint", // when err.cause is a pg error
  "stack": "..." // only for 5xx
}
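Here is a sketch of how an entry with this shape could be assembled. buildErrorLog and looksLikePgError are hypothetical. Detecting a Postgres error by a string SQLSTATE code on err.cause is an assumption, based on how node-postgres exposes DatabaseError. The level field is left to the logger call (for example logger.error) rather than set here.

```typescript
interface ErrorLogInput {
  requestId: string;
  userId?: string;
  orgId?: string;
  durationMs: number;
  statusCode: number;
  error: Error;
}

// Assumption: a pg error carries a string SQLSTATE `code` plus a message.
function looksLikePgError(cause: unknown): cause is { code: string; message: string } {
  return (
    typeof cause === "object" &&
    cause !== null &&
    typeof (cause as { code?: unknown }).code === "string" &&
    typeof (cause as { message?: unknown }).message === "string"
  );
}

function buildErrorLog(input: ErrorLogInput): Record<string, unknown> {
  const { error } = input;
  const entry: Record<string, unknown> = {
    requestId: input.requestId,
    userId: input.userId,
    orgId: input.orgId,
    duration: input.durationMs,
    statusCode: input.statusCode,
    error: error.message,
  };
  const cause = (error as { cause?: unknown }).cause;
  if (looksLikePgError(cause)) entry.pgError = cause.message; // only when err.cause is a pg error
  if (input.statusCode >= 500) entry.stack = error.stack; // stacks only for 5xx
  return entry;
}
```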

Health endpoint

GET /health → apps/api/src/app/health.controller.ts. Public (no auth) and exempt from throttling. Returns:

{
  "status": "ok" | "degraded",
  "timestamp": "2026-05-28T...",
  "checks": {
    "database": "up" | "down",
    "redis": "up" | "down"
  }
}

Used by Railway for liveness probing and by uptime monitors.
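Below is a sketch of how the controller could derive this body. It assumes status is "degraded" whenever either dependency is down. The HTTP status code returned when degraded isn't documented here. buildHealth, probe and the ping callbacks are hypothetical; a ping could be, for example, a SELECT 1 against Postgres or a Redis PING.

```typescript
type Probe = "up" | "down";

interface HealthBody {
  status: "ok" | "degraded";
  timestamp: string;
  checks: { database: Probe; redis: Probe };
}

// A check is "up" if its ping resolves and "down" if it rejects.
async function probe(ping: () => Promise<unknown>): Promise<Probe> {
  try {
    await ping();
    return "up";
  } catch {
    return "down";
  }
}

async function buildHealth(
  pingDb: () => Promise<unknown>,
  pingRedis: () => Promise<unknown>,
  now: () => Date = () => new Date(),
): Promise<HealthBody> {
  // Run both checks concurrently so latency tracks the slower one, not the sum.
  const [database, redis] = await Promise.all([probe(pingDb), probe(pingRedis)]);
  return {
    status: database === "up" && redis === "up" ? "ok" : "degraded",
    timestamp: now().toISOString(),
    checks: { database, redis },
  };
}
```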

Bull-Board

apps/api/src/bull-board/bull-board.module.ts mounts the BullMQ admin UI (route name lives in that module). Access is gated to FitKit-admin users.

Use it to:

  • Inspect queue depth and failed jobs.
  • Retry failures.
  • Browse job payloads.

In a production incident, Bull-Board is the fastest way to see whether the push-notification, embedding, or import queues are backed up.

Spotter observability

Spotter ships a dedicated observability service: apps/api/src/ai/agent/observability/agent-observability.service.ts. Tracks:

  • Per-turn cost (USD) — derived from token counts × model price.
  • Cache hit ratio (RAG + tool memoization).
  • Tool counts per conversation.
  • Storage growth (snapshot service: storage-snapshot.service.ts).

These flow into PostHog so the admin cost dashboards can show per-org Spotter spend and watch the cache-hit ratio drift.
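The cost and cache figures are simple arithmetic. This sketch assumes separate input and output prices quoted in USD per million tokens, a common pricing shape and an assumption about how the service stores prices. turnCostUsd, cacheHitRatio and the example prices are hypothetical, and cached-token discounts are ignored.

```typescript
// Hypothetical per-model prices in USD per million tokens.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
}

// Per-turn cost = token counts × model price, scaled from per-million pricing.
function turnCostUsd(usage: TurnUsage, price: ModelPrice): number {
  return (usage.inputTokens * price.inputPerMTok + usage.outputTokens * price.outputPerMTok) / 1e6;
}

// Share of lookups served from cache; 0 when nothing has been looked up yet.
function cacheHitRatio(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```

For example, a turn with 2,000 input tokens at $3/MTok and 500 output tokens at $15/MTok costs (2,000 × 3 + 500 × 15) / 1,000,000 = $0.0135.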

Audit logging (gap)

Schema: audit_logs table in libs/db/src/lib/schema/admin.ts:

{ id, actorClerkId, action, resource, resourceId, metadata, ipAddress, createdAt }

State today:

  • The table is created.
  • Specific high-trust paths write rows directly (e.g. admin tier changes).
  • No global interceptor. Most write paths do not log to audit_logs.
  • No admin UI for browsing audit entries.

This is FIT-20 and is on the near-term roadmap. Until it lands:

  • Don’t claim the platform has a complete audit trail to customers.
  • For incident forensics, lean on Pino logs (Railway log retention) + Sentry breadcrumbs.

Per-request id propagation

Every request that arrives at the API:

  1. Reads x-request-id if present, else generates one (Pino).
  2. Sets requestId on Pino’s per-request log context.
  3. Tags Sentry scope with the same id.
  4. Sets the response header X-Request-Id (LoggingInterceptor).

serverFetch on the web side surfaces it back into Sentry context on failed API calls:

scope.setContext('api_request', { path, status, requestId: res.headers.get('x-request-id') });
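Here is a sketch of the failed-call path, assuming serverFetch wraps fetch. ApiRequestContext, apiRequestContext and fetchWithRequestContext are hypothetical names. The report callback stands in for the scope.setContext('api_request', ...) call shown above. Headers.get is case-insensitive, so reading x-request-id matches the X-Request-Id header the API sets.

```typescript
interface ApiRequestContext {
  path: string;
  status: number;
  requestId: string | null;
}

// Builds the 'api_request' context attached to Sentry for a failed call.
function apiRequestContext(path: string, res: Response): ApiRequestContext {
  return { path, status: res.status, requestId: res.headers.get("x-request-id") };
}

// Hypothetical wrapper: fetch the path and report context for any non-2xx response.
async function fetchWithRequestContext(
  baseUrl: string,
  path: string,
  report: (ctx: ApiRequestContext) => void,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  const res = await fetchImpl(new URL(path, baseUrl));
  if (!res.ok) report(apiRequestContext(path, res));
  return res;
}
```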

When a customer reports a bug, ask for the X-Request-Id response header from the browser's Network tab. It leads straight to the matching Sentry event and Pino log lines.