Living documentation — last reviewed 2026-05-28
Infrastructure

How FitKit gets built, where it runs, and the env that holds it together.

Hosting summary

| App | Host | Config | Runtime |
|---|---|---|---|
| apps/api | Railway | apps/api/railway.toml | Node 22, webpack-built NestJS |
| apps/web | Vercel | vercel.json (repo root) | Next.js 16 |
| apps/marketing | Railway | apps/marketing/railway.toml | Astro server (node apps/marketing/dist/server/entry.mjs) |
| apps/admin | Vercel | apps/admin/vercel.json | Vite static |
| apps/minisites | Vercel | apps/minisites/vercel.json | Astro |

PostgreSQL in production: managed (Neon or similar). The pgvector/pgvector:pg16 image powers local dev — prod must have pgvector available (Neon ships it).

Redis in production: managed (Railway). Used for BullMQ, Socket.IO adapter, Spotter rate limiting + caching.

Object storage: Cloudflare R2 — two logical buckets (general + compliance).

Environments

| Environment | Purpose | DB | Redis | Notes |
|---|---|---|---|---|
| local dev | Engineer machines | docker compose up postgres → 5432 | docker compose up redis → 6379 | apps/api/.env, apps/web/.env.local. make dev-up |
| local test | Vitest + Playwright | docker compose -f docker-compose.test.yml up → 55432 | (same compose file) → 56379 | .env.test if present, defaults in Makefile |
| CI | GitHub Actions | Service container pgvector/pgvector:pg16 on 55432 | redis:7-alpine on 6379 | See .github/workflows/ |
| PR preview | Vercel preview (web) + Railway preview env (API) | Shared preview DB | Shared preview Redis | .github/workflows/deploy-pr-preview.yml rewrites FRONTEND_URL + ALLOWED_ORIGINS per-PR |
| production | Live customers | Managed Postgres | Managed Redis | Single instance per app today (Spotter + WS multi-instance-ready via Redis adapter, just not currently scaled out) |

There’s no explicit staging environment — PR previews fill that role. Adding a long-lived staging is on the wishlist; not blocking.

Local dev setup

See runbooks/env-setup.md for the recipe. Short form:

    corepack enable && corepack prepare pnpm@10.29.1 --activate
    pnpm install
    cp .env.example apps/api/.env        # fill REQUIREDs (Clerk keys etc.)
    cp .env.example apps/web/.env.local  # fill NEXT_PUBLIC_*
    make dev-up                          # postgres + redis
    make dev-db-migrate
    pnpm dev                             # api on 3001, web on 3000

Required env (backend)

The full list lives in .env.example. Bare minimum to boot the API:

    DATABASE_URL
    REDIS_URL
    CLERK_SECRET_KEY
    CLERK_PUBLISHABLE_KEY
    FRONTEND_URL
    ALLOWED_ORIGINS
    PAYMENT_CREDENTIALS_ENCRYPTION_KEY
    NATIONAL_ID_ENCRYPTION_KEY

If any of these is missing, validateEnv() in apps/api/src/config/env.schema.ts aborts startup.
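As a rough illustration of that fail-fast behavior, here is a minimal sketch of a presence check over the variables listed above. It is not the actual validateEnv() implementation (which may well use a schema library); the names findMissingKeys and validateEnvSketch are hypothetical.

```typescript
// Hypothetical sketch only. The real validateEnv() lives in
// apps/api/src/config/env.schema.ts and may be implemented differently.
const REQUIRED_KEYS = [
  "DATABASE_URL",
  "REDIS_URL",
  "CLERK_SECRET_KEY",
  "CLERK_PUBLISHABLE_KEY",
  "FRONTEND_URL",
  "ALLOWED_ORIGINS",
  "PAYMENT_CREDENTIALS_ENCRYPTION_KEY",
  "NATIONAL_ID_ENCRYPTION_KEY",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type ValidatedEnv = Record<RequiredKey, string>;
type RawEnv = Record<string, string | undefined>;

// Return every required key that is unset or blank.
function findMissingKeys(env: RawEnv): RequiredKey[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Throw on the first call if anything is missing, so the process never
// starts serving traffic with a partial configuration.
function validateEnvSketch(env: RawEnv): ValidatedEnv {
  const missing = findMissingKeys(env);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const result = {} as ValidatedEnv;
  for (const key of REQUIRED_KEYS) {
    result[key] = env[key] as string;
  }
  return result;
}
```

Calling validateEnvSketch(process.env) at the top of the bootstrap file would surface all missing names in one error rather than one per restart.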

Encryption keys

Two AES-256-GCM keys, each 32 bytes, supplied as 64 hex characters:

| Key | Purpose | Rotation note |
|---|---|---|
| PAYMENT_CREDENTIALS_ENCRYPTION_KEY | Payment provider config secrets + member payment tokens | Rotating after writes makes existing values unreadable. Plan a re-key migration before rotating in prod. |
| NATIONAL_ID_ENCRYPTION_KEY | Israeli national IDs | Same caveat. |

Generate with node -e "console.log(require('crypto').randomBytes(32).toString('hex'))". Dev defaults to a 64-zero string in the Makefile and .env.example; prod must override.
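To make the rotation caveat concrete, the following sketch encrypts and decrypts with AES-256-GCM using Node's built-in crypto module. The storage layout (IV, then auth tag, then ciphertext, base64-encoded) and the function names are assumptions for illustration; FitKit's actual encryption helpers may store values differently.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Parse a 64-character hex string into a 32-byte AES-256 key.
function parseKey(hex: string): Buffer {
  if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("Encryption key must be 64 hex characters (32 bytes)");
  }
  return Buffer.from(hex, "hex");
}

// Assumed layout: iv (12 bytes) | auth tag (16 bytes) | ciphertext, base64.
function encrypt(plaintext: string, keyHex: string): string {
  const iv = randomBytes(12); // fresh IV per value; never reuse with one key
  const cipher = createCipheriv("aes-256-gcm", parseKey(keyHex), iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, "utf8"),
    cipher.final(),
  ]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Throws if the key differs from the one used to encrypt, because the
// GCM auth tag no longer verifies.
function decrypt(payload: string, keyHex: string): string {
  const raw = Buffer.from(payload, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", parseKey(keyHex), iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([
    decipher.update(ciphertext),
    decipher.final(),
  ]).toString("utf8");
}
```

A value sealed under one key fails authentication under any other key, which is why a key rotation needs a migration that decrypts with the old key and re-encrypts with the new one before the old key is retired.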

Required env (web)

    NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY
    NEXT_PUBLIC_API_URL
    CLERK_SECRET_KEY   # server-side use in middleware

Optional but commonly set: NEXT_PUBLIC_POSTHOG_KEY, NEXT_PUBLIC_SENTRY_DSN, NEXT_PUBLIC_GOOGLE_MAPS_API_KEY, NEXT_PUBLIC_APP_URL (for payment-provider redirects).

Secrets management

Vars live in:

  • Local: .env files (gitignored).
  • CI: GitHub Actions secrets (see workflow files).
  • Railway: project-level env on each service.
  • Vercel: project env vars per environment (preview / production).

There is no Vault / Doppler / 1Password Secrets Automation today. Rotation is a manual edit-and-redeploy. Encryption keys (payment + national ID) are particularly load-bearing — see the rotation note above.

Redis usage

| Subsystem | Use |
|---|---|
| BullMQ | Queue storage + job state (push notifications, embeddings, imports, exports, etc.) |
| Socket.IO adapter | Cross-instance event pub/sub for WS messages |
| Spotter rate limiting | Per-org daily caps |
| Spotter caching | Tool call + RAG result memoization |
| WS presence | “Online now” state |

A single Redis covers all of them in dev and prod. Failure mode: BullMQ jobs stall, WS messages don’t propagate across instances. The health check (/health) reports redis down.
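One way a per-org daily cap like the Spotter limit could be built on a counter is sketched below. Everything here is an assumption for illustration: the key layout, the function names, and the CounterStore interface. An in-memory Map stands in for Redis, where the equivalent would be an INCR on the key plus an expiry.

```typescript
// Minimal counter interface. In production this would be backed by Redis;
// the in-memory version below exists only to keep the sketch runnable.
interface CounterStore {
  increment(key: string): Promise<number>;
}

class InMemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  // Unlike Redis, this stand-in never expires old keys.
  async increment(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Hypothetical key layout: one counter per org per UTC calendar day.
function dailyCapKey(orgId: string, now: Date): string {
  return `spotter:cap:${orgId}:${now.toISOString().slice(0, 10)}`;
}

// Count the request first, then compare against the cap. Returns false
// once the org has exceeded its allowance for the current UTC day.
async function tryConsume(
  store: CounterStore,
  orgId: string,
  dailyCap: number,
  now: Date = new Date(),
): Promise<boolean> {
  const used = await store.increment(dailyCapKey(orgId, now));
  return used <= dailyCap;
}
```

Because the date is part of the key, the cap resets naturally at UTC midnight without a cleanup job, and a shared store keeps the count consistent across API instances.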

R2 (Cloudflare)

apps/api/src/r2/r2.module.ts builds an S3-compatible client. Two buckets:

| Env | Use |
|---|---|
| R2_BUCKET_NAME | Default: exports, attachments, uploads. |
| R2_COMPLIANCE_BUCKET_NAME | FIT-158 signed compliance PDFs. 7+ year retention; lifecycle + object-lock policies must be set at the bucket level in Cloudflare. Falls back to R2_BUCKET_NAME when unset (dev convenience only). |

Reads always presign — there’s no public bucket URL.
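The compliance-bucket fallback described in the table can be sketched as a small pure function. The function names are hypothetical, and the production guard is a suggestion rather than something r2.module.ts is known to do; it also assumes NODE_ENV is how the API distinguishes production.

```typescript
type Env = Record<string, string | undefined>;

// Compliance PDFs go to the dedicated bucket when configured, otherwise
// to the general bucket (the dev-convenience fallback).
function resolveComplianceBucket(env: Env): string | undefined {
  return env.R2_COMPLIANCE_BUCKET_NAME || env.R2_BUCKET_NAME;
}

// True when compliance documents would land in the general bucket.
function usesComplianceFallback(env: Env): boolean {
  return !env.R2_COMPLIANCE_BUCKET_NAME;
}

// Hypothetical guard: refuse the fallback in production, since the general
// bucket would not carry the object-lock and lifecycle policies.
function assertDedicatedComplianceBucket(env: Env): void {
  if (env.NODE_ENV === "production" && usesComplianceFallback(env)) {
    throw new Error(
      "R2_COMPLIANCE_BUCKET_NAME must be set in production; " +
        "the fallback to R2_BUCKET_NAME is for local development only",
    );
  }
}
```

A guard of this kind turns a silent retention-policy gap into a startup failure, which is usually the safer default for compliance data.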

CI workflows

.github/workflows/:

| Workflow | Triggers | Purpose |
|---|---|---|
| ci-smoke-tests.yml | Every PR touching apps/api, apps/web, libs/**, root config | Boots Postgres + Redis service containers, runs unit + integration tests across API + web. Fast gate. |
| cd-full-test-gate.yml | Pushes to main, workflow_dispatch | Same infra as smoke + full Playwright E2E. Slower, post-merge. |
| deploy-pr-preview.yml | issue_comment from vercel[bot] with a “Visit Preview” link, or manual dispatch | Reads the Vercel preview URL, rewrites FRONTEND_URL + ALLOWED_ORIGINS on the Railway “preview” service, redeploys the PR head. ALLOWED_ORIGINS is per-PR exact-match, because Vercel preview URLs don’t match a wildcard pattern. |
| publish-shared.yml | Manual dispatch | Publishes @fitkit/shared (if/when needed as an installable package). |
| smegrep.yml | (Internal utility) | Code search tooling. |
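The exact-match behavior of ALLOWED_ORIGINS explains why the preview workflow has to rewrite it for each PR. A minimal sketch of such a check follows; the comma separator, the function names, and the example hostnames in the usage note are assumptions, not taken from the API code.

```typescript
// Parse ALLOWED_ORIGINS into a set of origins. A comma-separated value is
// assumed; trailing slashes are dropped so "https://a.example/" still
// matches the browser-sent origin "https://a.example".
function parseAllowedOrigins(raw: string): Set<string> {
  return new Set(
    raw
      .split(",")
      .map((origin) => origin.trim().replace(/\/+$/, ""))
      .filter((origin) => origin.length > 0),
  );
}

// Exact string comparison only: no wildcards, no suffix matching.
function isOriginAllowed(
  origin: string | undefined,
  allowed: Set<string>,
): boolean {
  return origin !== undefined && allowed.has(origin);
}
```

With a value such as "https://app.example.test, https://web-pr-123.example.test" (hypothetical hosts), only those two origins pass; a preview for a different PR is rejected until the workflow rewrites the variable and redeploys.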

Both ci-smoke-tests.yml and cd-full-test-gate.yml set:

  • TEST_AUTH_BYPASS=true, TEST_HOOKS_ENABLED=true, NEXT_PUBLIC_E2E_TEST_MODE=true
  • CRONS_ENABLED=true (so billing-retry handlers actually do work)
  • Dummy encryption keys
  • Sandbox Morning provider URLs
  • Test R2 credentials (non-functional — uploads return errors, tests don’t exercise them)

Deployment flow

API: Railway watches the repo’s main branch (and a “preview” environment for the PR-preview workflow). Pushes auto-deploy. apps/api/railway.toml carries the build + start command.

Web: Vercel watches the same repo. Build command: pnpm exec nx build @fitkit/web. Output: apps/web/.next.

Marketing: Railway, separate service. pnpm exec nx build @fitkit/marketing then node apps/marketing/dist/server/entry.mjs.

Minisites: Vercel. Per-org custom domains attached via the Vercel Domain API using VERCEL_TOKEN + VERCEL_MINISITES_PROJECT_ID — see the minisite onboarding flow in apps/api/src/.

Domains

| Surface | Domain (prod) |
|---|---|
| Marketing | fitkit.fit (or similar; confirm in the Vercel/Railway project) |
| Operator web | app.fitkit.* |
| API | api.fitkit.* |
| Minisites | per-org custom domains + <slug>.fitkit.* fallback |

(Exact production hostnames depend on the active deployment — confirm via Railway/Vercel settings rather than this doc.)

Backups

The managed Postgres host owns the backup policy (Neon has point-in-time recovery in higher tiers). No explicit FitKit-owned backup script exists today; we rely on the managed provider.

Cloudflare R2 does not version objects by default; the compliance bucket relies on object-lock + lifecycle policies for retention rather than on backups.

This is a documented gap. For SOC2-style controls we’ll need explicit backup verification.

Scaling notes

  • API: single instance today. Multi-instance-ready (Socket.IO Redis adapter, BullMQ workers can pick up jobs from any instance). Scale horizontally on Railway when traffic warrants.
  • Web: Vercel autoscales serverless functions; no specific config needed.
  • DB: vertical scaling on Neon for now. Read replicas not used.
  • Redis: single instance. Failure is degraded-mode (jobs stall, WS messages don’t fan out cross-instance).

Reference: full env var list

.env.example is the source of truth. Key categories:

  • Backend core: DATABASE_URL, REDIS_URL, CLERK_*, FRONTEND_URL, ALLOWED_ORIGINS, PORT
  • DB pool: DB_POOL_*, DIRECT_DATABASE_URL, DB_LOGGING
  • Encryption: PAYMENT_CREDENTIALS_ENCRYPTION_KEY, NATIONAL_ID_ENCRYPTION_KEY
  • Payments: CARDCOM_*, RIVHIT_*, ICREDIT_*, MORNING_*, PLATFORM_BILLING_*
  • Email: RESEND_API_KEY, RESEND_FROM_ADDRESS
  • Webhooks: CLERK_WEBHOOK_SECRET, INVITE_SECRET, QR_CHECKIN_SECRET
  • R2: R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET_NAME, R2_COMPLIANCE_BUCKET_NAME
  • AI: VOYAGE_API_KEY, ANTHROPIC_API_KEY
  • Cron switch: CRONS_ENABLED
  • Sentry: SENTRY_DSN, SENTRY_RELEASE, build-time SENTRY_*
  • PostHog: POSTHOG_API_KEY, POSTHOG_PROJECT_ID, POSTHOG_HOST + NEXT_PUBLIC_POSTHOG_*
  • Admin observability: RAILWAY_API_TOKEN, NEON_API_KEY
  • Testing: TEST_AUTH_BYPASS, TEST_HOOKS_ENABLED, TEST_SEED_SECRET, NEXT_PUBLIC_E2E_TEST_MODE, E2E_*

Comments inside .env.example explain when to override each variable.