improve health checks across all nomad job specs
All checks were successful
CI / Terraform fmt + validate (pull_request) Successful in 27s
CI / Nomad job spec validate (pull_request) Successful in 22s
CI / Docker image pull validation (pull_request) Successful in 16s
CI / Terraform fmt + validate (push) Successful in 23s
CI / Nomad job spec validate (push) Successful in 22s
CI / Docker image pull validation (push) Has been skipped

- traefik: TCP → HTTP check on /ping (enable ping entrypoint)
- gitea: check path → /api/healthz
- jellyfin: TCP → HTTP check on /health
- glance: TCP → HTTP check on /
- sonarr/prowlarr: check path / → /ping (×2 checks each)
- ntfy/transfer/deluge/openreader/authelia/pgadmin: add name and port to existing checks
- postgres: remove invalid TCP check (Connect-enabled service)
- unifi: TCP → script check via curl (macvlan host isolation workaround)
This commit was merged in pull request #15.
This commit is contained in:
2026-05-26 20:03:19 +10:00
parent 8e32d00d90
commit e695485353
15 changed files with 66 additions and 14 deletions

View File

@@ -188,8 +188,8 @@ Most jobs already have Consul health checks — these can use `health_check = "c
| frigate | ✅ | ✅ `single-node-writer` | ⚠️ same — rolling |
| glance | ✅ | no | ✅ yes |
| transfer | ✅ | ✅ `single-node-writer` | ⚠️ rolling |
| openreader | | ✅ `single-node-writer` | ⚠️ add check first, then rolling |
| unifi | | ✅ `single-node-writer` | ⚠️ add check first, then rolling |
| openreader | `/` | ✅ `single-node-writer` | ⚠️ rolling |
| unifi | ✅ script | ✅ `single-node-writer` | ⚠️ rolling |
| traefik | (ingress) | ✅ | ⚠️ rolling — downtime risk, promote quickly |
| authelia | (ingress) | ✅ | ✅ stateless config, canary fine |
| renovate | batch job | n/a | n/a — no deployment model |
@@ -298,8 +298,8 @@ exit 1
- [x] **Phase 1c**: Add Nomad validate step — add `NOMAD_ADDR` + read-only `NOMAD_TOKEN` to Gitea secrets
- [x] **Phase 2**: Add image pull validation step to the workflow
- [ ] **Phase 3a**: Add `update` stanzas to ntfy and glance (simplest, no volume conflict)
- [ ] **Phase 3b**: Add rolling `update` stanzas to remaining service jobs (jellyfin, sonarr, etc.)
- [ ] **Phase 3c**: Add health checks to openreader and unifi before adding update stanzas
- [ ] **Phase 3b**: Add rolling `update` stanzas to remaining service jobs (jellyfin, sonarr, prowlarr, deluge, gitea, immich, transfer, frigate, openreader, unifi, authelia, traefik)
- [x] **Phase 3c**: Add health checks to openreader and unifi before adding update stanzas
- [ ] **Phase 4a**: Add on-push workflow that runs `terraform apply -auto-approve` using full credential set
- [ ] **Phase 4b**: Add deployment promotion/revert polling script
- [ ] **Phase 4c**: Wire ntfy notifications for promote/revert outcomes