ADR-0009: Background jobs via BullMQ on Redis
Status: Accepted
Date: ~2026-01 (estimate)
Context owner: Owner
Context
Several flows can’t happen in the request/response cycle:
- Sending invitation emails after a member signs up.
- Generating PDFs for signed compliance forms (Puppeteer renders take seconds).
- Reconciling payment provider statuses on webhook receipt.
- Cron-driven workflows: pre-charge reminders, no-show sweeps, AI storage snapshots, billing retry, embeddings enrichment.
- Periodic maintenance for the Spotter agent (cache eviction, daily usage rollups).
We need durable queues with retries, scheduled jobs (cron-equivalent), and observability.
Decision
Use BullMQ on Redis.
- BullModule.forRootAsync registers a global connection in apps/api/src/app/app.module.ts.
- Per-module queues are registered via BullModule.registerQueue({ name: '...' }).
- The bull-board UI is mounted at /admin/bull-board for operational visibility (apps/api/src/bull-board/). Access gated to platform admins.
- CRONS_ENABLED env flag gates @Cron decorators globally; kept off in dev to spare the Neon free-tier compute.
Redis also serves as the Socket.IO adapter (multi-instance WebSocket presence) and as the Spotter agent’s caching/rate-limit store. One Redis, multiple use cases.
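The CRONS_ENABLED gate from the list above could be sketched roughly as follows. This only illustrates the intended semantics (flag off means the handler does nothing) and is not the actual wiring around @Cron. The helper names `cronsEnabled` and `gated`, and the rule that only the literal string "true" enables crons, are assumptions.

```typescript
// Hypothetical helper: the ADR only states that CRONS_ENABLED gates cron handlers.
type Env = Record<string, string | undefined>;

// Assumption: only an explicit "true" (trimmed, case-insensitive) enables crons.
// Anything else, including an unset variable, leaves them off.
function cronsEnabled(env: Env): boolean {
  return (env["CRONS_ENABLED"] ?? "").trim().toLowerCase() === "true";
}

// Wraps a cron handler so that it becomes a no-op while the flag is off.
function gated<T>(env: Env, handler: () => T): () => T | undefined {
  return () => (cronsEnabled(env) ? handler() : undefined);
}

// Usage: in a dev-like environment the sweep body never runs.
let sweeps = 0;
const noShowSweep = gated({ CRONS_ENABLED: "false" }, () => ++sweeps);
noShowSweep(); // skipped, sweeps is still 0
```

Passing the environment in as a plain record, rather than reading it globally inside the handler, keeps the gate easy to exercise in tests.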
Consequences
Positive
- Durable: a crashed worker resumes from Redis.
- Retries with exponential backoff are built in.
- Scheduling (delayed, repeated, cron) is one library, one mental model.
- bull-board gives operators a view of stuck queues and failed jobs.
- Reusing Redis for sockets + cache reduces infrastructure count.
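To make the retry point above concrete, here is a minimal sketch of what an exponential backoff schedule looks like. The options object follows the shape BullMQ accepts for job options (`attempts`, `backoff`); the numbers are illustrative, and the doubling formula in `backoffDelayMs` is an assumption about the built-in strategy, not a copy of library code. The function names are hypothetical.

```typescript
// Job options in the shape BullMQ accepts for retries; values are illustrative.
const retryOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 1000 },
};

// Assumed doubling formula: baseDelay * 2^(attemptsMade - 1).
function backoffDelayMs(attemptsMade: number, baseDelayMs: number): number {
  return Math.round(baseDelayMs * Math.pow(2, attemptsMade - 1));
}

// Delays applied before each retry. With `attempts: 3` there are at most
// two retries, because the first attempt runs without a backoff delay.
function retrySchedule(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let failed = 1; failed < attempts; failed++) {
    delays.push(backoffDelayMs(failed, baseDelayMs));
  }
  return delays;
}

const schedule = retrySchedule(retryOptions.attempts, retryOptions.backoff.delay);
// schedule is [1000, 2000]
```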
Negative
- Single Redis = single point of failure. Mitigation: use a managed Redis with a sensible SLA.
- Coupling: a Redis outage takes out jobs, sockets, and the Spotter rate-limiter at once.
- Job code must be idempotent: BullMQ retries on transient failures, and a job that is marked succeeded although its DB write never committed is the wrong default.
- Cron jobs default to off (CRONS_ENABLED=false) so dev environments don’t burn DB compute. Operational gotcha when troubleshooting: make dev-up doesn’t enable crons.
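The idempotency requirement above can be illustrated with a small sketch. Everything here is hypothetical: the `ChargeJob` payload, the `idempotencyKey` field, and the in-memory `ChargeLedger`, which stands in for a durable store (in practice, likely a DB table with a unique constraint). It shows one common approach, keying each side effect so that a retried job becomes a no-op.

```typescript
// Hypothetical job payload; field names are illustrative only.
interface ChargeJob {
  idempotencyKey: string; // e.g. derived from a booking id plus billing period
  amountCents: number;
}

// In-memory stand-in for a durable store keyed by idempotency key.
class ChargeLedger {
  private readonly applied = new Map<string, number>();

  // Applies the charge once per key; returns false if the key was seen before.
  applyOnce(job: ChargeJob): boolean {
    if (this.applied.has(job.idempotencyKey)) return false;
    this.applied.set(job.idempotencyKey, job.amountCents);
    return true;
  }

  // Sum of all charges actually applied.
  totalCents(): number {
    let sum = 0;
    this.applied.forEach((cents) => {
      sum += cents;
    });
    return sum;
  }
}

// Handler body that is safe to run again after a retry.
function handleCharge(ledger: ChargeLedger, job: ChargeJob): "charged" | "skipped" {
  return ledger.applyOnce(job) ? "charged" : "skipped";
}

// Usage: the second run with the same input does not double-charge.
const ledger = new ChargeLedger();
const job: ChargeJob = { idempotencyKey: "booking-42:2026-01", amountCents: 2500 };
const firstRun = handleCharge(ledger, job); // "charged"
const retryRun = handleCharge(ledger, job); // "skipped"
```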
Discipline
- Every job must be idempotent. A second run with the same input must not double-charge, double-send, or double-allocate.
- Don’t queue ad-hoc. New queues are registered in the relevant module, named per the domain (forms.pdf-render, notifications.send, etc.).
- Long-running jobs need keepalive heartbeats if they exceed the visibility timeout.
- Failures are observable. A job that fails three times moves to failed; bull-board surfaces it.
- Don’t let CI run scheduled cron jobs by surprise. CI sets CRONS_ENABLED=true only because some tests exercise the cron handler directly via /testing/* endpoints.
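The naming rule above could be enforced with a small check like the one below. The pattern is an assumption inferred from the two example names (a lowercase domain, a dot, then a lowercase action with optional hyphens); the ADR does not spell out a formal grammar, and `isValidQueueName` and `queueDomain` are hypothetical helpers.

```typescript
// Assumed convention: <domain>.<action>, lowercase, digits and hyphens allowed.
const QUEUE_NAME_PATTERN = /^[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*$/;

// True when the name follows the assumed <domain>.<action> convention.
function isValidQueueName(queueName: string): boolean {
  return QUEUE_NAME_PATTERN.test(queueName);
}

// Returns the domain part of a valid queue name, or undefined otherwise.
function queueDomain(queueName: string): string | undefined {
  if (!isValidQueueName(queueName)) return undefined;
  return queueName.slice(0, queueName.indexOf("."));
}

// Usage with the two names mentioned in this ADR.
const pdfDomain = queueDomain("forms.pdf-render"); // "forms"
const sendDomain = queueDomain("notifications.send"); // "notifications"
```

A check like this could run in a unit test over the registered queue names, so an ad-hoc name is caught in review rather than in production.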
Related
- architecture/observability.md — bull-board access
- features/notifications/README.md
- features/payments/behavior.md — webhook idempotency