Infrastructure
How FitKit gets built, where it runs, and the env that holds it together.
Hosting summary
| App | Host | Config | Runtime |
|---|---|---|---|
| apps/api | Railway | apps/api/railway.toml | Node 22, webpack-built NestJS |
| apps/web | Vercel | vercel.json (repo root) | Next.js 16 |
| apps/marketing | Railway | apps/marketing/railway.toml | Astro server (node apps/marketing/dist/server/entry.mjs) |
| apps/admin | Vercel | apps/admin/vercel.json | Vite static |
| apps/minisites | Vercel | apps/minisites/vercel.json | Astro |
PostgreSQL in production: managed (Neon or similar). Local dev runs the pgvector/pgvector:pg16 image, so production Postgres must also have the pgvector extension available (Neon ships it).
Redis in production: managed (Railway). Used for BullMQ, Socket.IO adapter, Spotter rate limiting + caching.
Object storage: Cloudflare R2 — two logical buckets (general + compliance).
Environments
| Environment | Purpose | DB | Redis | Notes |
|---|---|---|---|---|
| local dev | Engineer machines | docker compose up postgres → 5432 | docker compose up redis → 6379 | apps/api/.env, apps/web/.env.local. make dev-up |
| local test | Vitest + Playwright | docker compose -f docker-compose.test.yml up → 55432 | → 56379 | .env.test if present, defaults in Makefile |
| CI | GitHub Actions | Service container pgvector/pgvector:pg16 on 55432 | redis:7-alpine on 6379 | See .github/workflows/ |
| PR preview | Vercel preview (web) + Railway preview env (API) | Shared preview DB | Shared preview Redis | .github/workflows/deploy-pr-preview.yml rewrites FRONTEND_URL + ALLOWED_ORIGINS per-PR |
| production | Live customers | Managed Postgres | Managed Redis | Single instance per app today (Spotter + WS multi-instance-ready via Redis adapter, just not currently scaled out) |
There’s no explicit staging environment; PR previews fill that role. Adding a long-lived staging environment is on the wishlist but not blocking.
Local dev setup
See runbooks/env-setup.md for the recipe. Short form:
```bash
corepack enable && corepack prepare pnpm@10.29.1 --activate
pnpm install
cp .env.example apps/api/.env        # fill REQUIREDs (Clerk keys etc.)
cp .env.example apps/web/.env.local  # fill NEXT_PUBLIC_*
make dev-up                          # postgres + redis
make dev-db-migrate
pnpm dev                             # api on 3001, web on 3000
```
Required env (backend)
The full list lives in .env.example. Bare minimum to boot the API:
```
DATABASE_URL
REDIS_URL
CLERK_SECRET_KEY
CLERK_PUBLISHABLE_KEY
FRONTEND_URL
ALLOWED_ORIGINS
PAYMENT_CREDENTIALS_ENCRYPTION_KEY
NATIONAL_ID_ENCRYPTION_KEY
```
If any of these is missing, validateEnv() in apps/api/src/config/env.schema.ts aborts startup.
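Below is a minimal sketch of the presence check described above. It assumes validateEnv() at least rejects unset or blank values. The real schema in env.schema.ts may use a validation library and also check formats. REQUIRED_API_ENV, findMissingEnv and assertRequiredEnv are hypothetical names.

```typescript
// Hypothetical list mirroring the "bare minimum" variables above.
const REQUIRED_API_ENV = [
  "DATABASE_URL",
  "REDIS_URL",
  "CLERK_SECRET_KEY",
  "CLERK_PUBLISHABLE_KEY",
  "FRONTEND_URL",
  "ALLOWED_ORIGINS",
  "PAYMENT_CREDENTIALS_ENCRYPTION_KEY",
  "NATIONAL_ID_ENCRYPTION_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Returns the required names that are unset or blank, in declaration order.
function findMissingEnv(env: Env): string[] {
  return REQUIRED_API_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws (aborting startup) when anything required is missing.
function assertRequiredEnv(env: Env): void {
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing required env: ${missing.join(", ")}`);
  }
}
```

At boot this would be called as assertRequiredEnv(process.env) before the Nest app is created. That way a misconfigured deploy fails fast instead of erroring on first use.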
Encryption keys
Two AES-256-GCM keys, each 32 bytes encoded as 64 hex characters:
| Key | Purpose | Rotation note |
|---|---|---|
| PAYMENT_CREDENTIALS_ENCRYPTION_KEY | Payment provider config secrets + member payment tokens | Rotating after writes makes existing values unreadable. Plan a re-key migration before rotating in prod. |
| NATIONAL_ID_ENCRYPTION_KEY | Israeli national IDs | Same caveat. |
Generate with node -e "console.log(require('crypto').randomBytes(32).toString('hex'))". Dev defaults to a 64-zero string in the Makefile and .env.example; prod must override.
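The rotation caveat follows from how AES-256-GCM works. A value can only be decrypted, and its auth tag only verified, with the exact key that encrypted it. Below is a minimal sketch using Node's crypto module. Several parts are assumptions for illustration, not FitKit's actual encryption service:

- the iv:tag:ciphertext storage format
- the helper names
- the dev-default guard

Note that the 64-zero dev default is well-formed hex, so format validation alone will not catch it.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Parse a 64-hex-character env value into a 32-byte AES-256 key.
function parseKey(hex: string): Buffer {
  if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("Key must be 32 bytes encoded as 64 hex characters");
  }
  return Buffer.from(hex, "hex");
}

// Hypothetical guard: the dev default (64 zeros) is valid hex but must not reach prod.
function isDevDefaultKey(hex: string): boolean {
  return /^0{64}$/.test(hex);
}

// Encrypt to "iv:tag:ciphertext" (hex), using a fresh 12-byte IV per value.
function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("hex")).join(":");
}

// Decrypt; final() throws if the key is wrong or the data was tampered with.
function decrypt(stored: string, key: Buffer): string {
  const parts = stored.split(":");
  if (parts.length !== 3) throw new Error("Malformed ciphertext");
  const [ivHex, tagHex, dataHex] = parts;
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(ivHex, "hex"));
  decipher.setAuthTag(Buffer.from(tagHex, "hex"));
  return Buffer.concat([
    decipher.update(Buffer.from(dataHex, "hex")),
    decipher.final(),
  ]).toString("utf8");
}

// Re-key one value: the core step of a re-key migration before rotation.
function reencrypt(stored: string, oldKey: Buffer, newKey: Buffer): string {
  return encrypt(decrypt(stored, oldKey), newKey);
}
```

A re-key migration would run something like reencrypt over every stored value while both keys are available. Only after that would the env var be swapped to the new key.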
Required env (web)
```
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY
NEXT_PUBLIC_API_URL
CLERK_SECRET_KEY  # server-side use in middleware
```
Optional but commonly set: NEXT_PUBLIC_POSTHOG_KEY, NEXT_PUBLIC_SENTRY_DSN, NEXT_PUBLIC_GOOGLE_MAPS_API_KEY, NEXT_PUBLIC_APP_URL (for payment-provider redirects).
Secrets management
Vars live in:
- Local: .env files (gitignored).
- CI: GitHub Actions secrets (see workflow files).
- Railway: project-level env on each service.
- Vercel: project env vars per environment (preview / production).
There is no Vault / Doppler / 1Password Secrets Automation today. Rotation is a manual edit-and-redeploy. Encryption keys (payment + national ID) are particularly load-bearing — see the rotation note above.
Redis usage
| Subsystem | Use |
|---|---|
| BullMQ | Queue storage + job state (push notifications, embeddings, imports, exports, etc.) |
| Socket.IO adapter | Cross-instance event pub/sub for WS messages |
| Spotter rate limiting | Per-org daily caps |
| Spotter caching | Tool call + RAG result memoization |
| WS presence | “Online now” state |
A single Redis instance serves all of them in both dev and prod. If it goes down, BullMQ jobs stall and WS messages stop propagating across instances; the health check (/health) reports Redis as down.
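Per-org daily caps map naturally onto a Redis INCR plus a TTL on a key scoped to the org and the date. Below is a minimal sketch under two assumptions:

- a hypothetical key scheme (spotter:quota:<orgId>:<YYYY-MM-DD>, UTC day);
- a tiny counter interface, so the sketch runs without Redis.

In the API this would be backed by the shared Redis client; ioredis exposes incr and expire with compatible shapes. The real Spotter limiter may be structured differently.

```typescript
// Minimal subset of a Redis client used by the limiter (ioredis-compatible shape).
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

// Hypothetical key scheme: one counter per org per UTC day.
function quotaKey(orgId: string, now: Date): string {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `spotter:quota:${orgId}:${day}`;
}

// Counts one request and reports whether it is within the daily cap.
async function consumeDailyQuota(
  store: CounterStore,
  orgId: string,
  dailyCap: number,
  now: Date = new Date(),
): Promise<{ allowed: boolean; used: number }> {
  const key = quotaKey(orgId, now);
  const used = await store.incr(key);
  if (used === 1) {
    // First hit of the day: a 48h TTL guarantees the key outlives its day.
    await store.expire(key, 2 * 24 * 60 * 60);
  }
  return { allowed: used <= dailyCap, used };
}

// In-memory stand-in for local experiments and tests.
class MemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async expire(_key: string, _seconds: number): Promise<unknown> {
    return 1; // TTLs are ignored in memory
  }
}
```

In this sketch INCR and EXPIRE are two separate round trips. A MULTI transaction or a Lua script would make them atomic, so a crash between the two calls cannot leave a counter without a TTL.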
R2 (Cloudflare)
apps/api/src/r2/r2.module.ts builds an S3-compatible client. Two buckets:
| Env | Use |
|---|---|
| R2_BUCKET_NAME | Default: exports, attachments, uploads. |
| R2_COMPLIANCE_BUCKET_NAME | FIT-158 signed compliance PDFs. 7+ year retention; lifecycle + object-lock policies must be set at the bucket level in Cloudflare. Falls back to R2_BUCKET_NAME when unset (dev convenience only). |
Reads always go through presigned URLs; there’s no public bucket URL.
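The bucket settings above can resolve into client config roughly as sketched below. Cloudflare's S3-compatible endpoint has the form https://<R2_ACCOUNT_ID>.r2.cloudflarestorage.com. The following are assumptions for illustration, not necessarily what r2.module.ts does:

- the resolveR2Config name;
- the returned shape;
- refusing the compliance fallback when isProduction is true.

```typescript
interface R2Config {
  endpoint: string;
  bucket: string;
  complianceBucket: string;
}

// Resolves R2 settings from env; throws when a required value is missing.
function resolveR2Config(
  env: Record<string, string | undefined>,
  isProduction: boolean,
): R2Config {
  const accountId = env.R2_ACCOUNT_ID;
  const bucket = env.R2_BUCKET_NAME;
  if (!accountId || !bucket) {
    throw new Error("R2_ACCOUNT_ID and R2_BUCKET_NAME are required");
  }
  const compliance = env.R2_COMPLIANCE_BUCKET_NAME;
  if (!compliance && isProduction) {
    // Assumed policy: the fallback is a dev convenience, so prod must set it explicitly.
    throw new Error("R2_COMPLIANCE_BUCKET_NAME must be set in production");
  }
  return {
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    bucket,
    complianceBucket: compliance || bucket,
  };
}
```

With region "auto", this endpoint is what an S3 client such as @aws-sdk/client-s3 typically takes for R2. Presigned reads can then come from getSignedUrl in @aws-sdk/s3-request-presigner.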
CI workflows
.github/workflows/:
| Workflow | Triggers | Purpose |
|---|---|---|
| ci-smoke-tests.yml | Every PR touching apps/api, apps/web, libs/**, root config | Boots Postgres + Redis service containers, runs unit + integration tests across API + web. Fast gate. |
| cd-full-test-gate.yml | Pushes to main, workflow_dispatch | Same infra as smoke + full Playwright E2E. Slower, post-merge. |
| deploy-pr-preview.yml | issue_comment from vercel[bot] with a “Visit Preview” link, or manual dispatch | Reads the Vercel preview URL, rewrites FRONTEND_URL + ALLOWED_ORIGINS on the Railway “preview” service, redeploys the PR head. ALLOWED_ORIGINS is per-PR exact-match; Vercel preview URLs don’t match a wildcard pattern. |
| publish-shared.yml | Manual dispatch | Publishes @fitkit/shared (if/when needed as an installable package). |
| smegrep.yml | (Internal utility) | Code search tooling. |
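Exact-match origin checking is why the PR-preview workflow has to write each preview origin into ALLOWED_ORIGINS. Below is a minimal sketch built on three assumptions:

- ALLOWED_ORIGINS is a comma-separated list;
- the workflow takes the first *.vercel.app URL from the bot comment;
- any fixed base origins are kept alongside the preview origin.

The comment format, the list format and all helper names are hypothetical.

```typescript
// Parse a comma-separated ALLOWED_ORIGINS value into a set of origins (trailing slashes dropped).
function parseAllowedOrigins(raw: string): Set<string> {
  return new Set(
    raw
      .split(",")
      .map((o) => o.trim().replace(/\/+$/, ""))
      .filter((o) => o.length > 0),
  );
}

// Exact string match only: no wildcards, no suffix matching.
function isAllowedOrigin(origin: string | undefined, allowed: Set<string>): boolean {
  return origin !== undefined && allowed.has(origin);
}

// Hypothetical: pull the first *.vercel.app URL out of a vercel[bot] comment body.
function extractPreviewOrigin(commentBody: string): string | undefined {
  const match = commentBody.match(/https:\/\/[a-z0-9-]+(?:\.[a-z0-9-]+)*\.vercel\.app/i);
  return match?.[0];
}

// Hypothetical: values written to the Railway preview service (base origins kept, deduplicated).
function previewEnv(
  previewOrigin: string,
  baseOrigins: string[],
): { FRONTEND_URL: string; ALLOWED_ORIGINS: string } {
  const origins = Array.from(new Set([...baseOrigins, previewOrigin]));
  return { FRONTEND_URL: previewOrigin, ALLOWED_ORIGINS: origins.join(",") };
}
```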
Both ci-smoke-tests.yml and cd-full-test-gate.yml set:
- TEST_AUTH_BYPASS=true, TEST_HOOKS_ENABLED=true, NEXT_PUBLIC_E2E_TEST_MODE=true
- CRONS_ENABLED=true (so billing-retry handlers actually do work)
- Dummy encryption keys
- Sandbox Morning provider URLs
- Test R2 credentials (non-functional — uploads return errors, tests don’t exercise them)
Deployment flow
API: Railway watches the repo’s main branch, plus a “preview” environment used by the PR-preview workflow. Pushes to main auto-deploy. apps/api/railway.toml carries the build and start commands.
Web: Vercel watches the same repo. Build command: pnpm exec nx build @fitkit/web. Output: apps/web/.next.
Marketing: Railway, separate service. Build with pnpm exec nx build @fitkit/marketing, then start with node apps/marketing/dist/server/entry.mjs.
Minisites: Vercel. Per-org custom domains attached via the Vercel Domain API using VERCEL_TOKEN + VERCEL_MINISITES_PROJECT_ID — see the minisite onboarding flow in apps/api/src/.
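Attaching a custom domain comes down to one authenticated REST call per domain. The sketch below only builds the request. It assumes Vercel's "add a domain to a project" endpoint: POST /v10/projects/{idOrName}/domains with a JSON { name } body, plus an optional teamId query parameter. Check the endpoint version against Vercel's REST API docs. buildAddDomainRequest is a hypothetical helper, not the code in apps/api/src/.

```typescript
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request that attaches a custom domain to the minisites project.
function buildAddDomainRequest(opts: {
  token: string; // VERCEL_TOKEN
  projectId: string; // VERCEL_MINISITES_PROJECT_ID
  domain: string; // the org's custom domain
  teamId?: string; // assumption: only needed for team-owned projects
}): HttpRequest {
  const url = new URL(
    `https://api.vercel.com/v10/projects/${encodeURIComponent(opts.projectId)}/domains`,
  );
  if (opts.teamId) url.searchParams.set("teamId", opts.teamId);
  return {
    url: url.toString(),
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ name: opts.domain.trim().toLowerCase() }),
  };
}
```

Sending it is then fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). Keeping request construction separate makes it testable without network access.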
Domains
| Surface | Domain (prod) |
|---|---|
| Marketing | fitkit.fit (or similar — confirm in the Vercel/Railway project) |
| Operator web | app.fitkit.* |
| API | api.fitkit.* |
| Minisites | per-org custom domains + <slug>.fitkit.* fallback |
(Exact production hostnames depend on the active deployment — confirm via Railway/Vercel settings rather than this doc.)
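Serving both per-org custom domains and <slug>.fitkit.* fallbacks means resolving each request host to an org. Below is a minimal sketch under these assumptions:

- a lookup map from custom domain to slug;
- the base domain is passed in rather than hard-coded, since the production TLD isn't pinned down above;
- app and api are reserved labels, per the table.

resolveMinisiteSlug is hypothetical.

```typescript
// Labels already used by other surfaces (app.fitkit.*, api.fitkit.*), never slugs.
const RESERVED_LABELS = new Set(["app", "api"]);

// Resolve which org's minisite a request host refers to, or undefined if none.
function resolveMinisiteSlug(
  host: string,
  baseDomain: string, // the fitkit.* apex in use
  customDomains: ReadonlyMap<string, string>, // custom domain -> org slug
): string | undefined {
  const hostname = host.toLowerCase().replace(/:\d+$/, ""); // drop any port
  const custom = customDomains.get(hostname);
  if (custom) return custom;
  const suffix = `.${baseDomain.toLowerCase()}`;
  if (hostname.endsWith(suffix)) {
    const label = hostname.slice(0, -suffix.length);
    // Accept a single label only: "<slug>.<base>", not deeper subdomains.
    if (label.length > 0 && !label.includes(".") && !RESERVED_LABELS.has(label)) {
      return label;
    }
  }
  return undefined;
}
```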
Backups
Managed Postgres host owns backup policy (Neon has point-in-time recovery in higher tiers). No explicit FitKit-owned backup script exists today — relying on the managed provider.
Cloudflare R2 does not version objects by default; the compliance bucket relies on object-lock and lifecycle rules for retention rather than on backups.
This is a documented gap: SOC 2-style controls will need explicit backup verification.
Scaling notes
- API: single instance today. Multi-instance-ready (Socket.IO Redis adapter, BullMQ workers can pick up jobs from any instance). Scale horizontally on Railway when traffic warrants.
- Web: Vercel autoscales serverless functions; no specific config needed.
- DB: vertical scaling on Neon for now. Read replicas not used.
- Redis: single instance. Failure is degraded-mode (jobs stall, WS messages don’t fan out cross-instance).
Reference: full env var list
.env.example is the source of truth. Key categories:
- Backend core: DATABASE_URL, REDIS_URL, CLERK_*, FRONTEND_URL, ALLOWED_ORIGINS, PORT
- DB pool: DB_POOL_*, DIRECT_DATABASE_URL, DB_LOGGING
- Encryption: PAYMENT_CREDENTIALS_ENCRYPTION_KEY, NATIONAL_ID_ENCRYPTION_KEY
- Payments: CARDCOM_*, RIVHIT_*, ICREDIT_*, MORNING_*, PLATFORM_BILLING_*
- Email: RESEND_API_KEY, RESEND_FROM_ADDRESS
- Webhooks: CLERK_WEBHOOK_SECRET, INVITE_SECRET, QR_CHECKIN_SECRET
- R2: R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET_NAME, R2_COMPLIANCE_BUCKET_NAME
- AI: VOYAGE_API_KEY, ANTHROPIC_API_KEY
- Cron switch: CRONS_ENABLED
- Sentry: SENTRY_DSN, SENTRY_RELEASE, build-time SENTRY_*
- PostHog: POSTHOG_API_KEY, POSTHOG_PROJECT_ID, POSTHOG_HOST + NEXT_PUBLIC_POSTHOG_*
- Admin observability: RAILWAY_API_TOKEN, NEON_API_KEY
- Testing: TEST_AUTH_BYPASS, TEST_HOOKS_ENABLED, TEST_SEED_SECRET, NEXT_PUBLIC_E2E_TEST_MODE, E2E_*
Comments inside .env.example explain when to override each variable.