Apply Codex review findings across all 17 components

Template Engine: add composed member addressing (path-qualified canonical names),
override granularity per entity type, semantic validation (call targets, arg types),
graph acyclicity enforcement, revision hashes for flattened configs.

Deployment Manager: add deployment ID + idempotency, per-instance operation lock
covering all mutating commands, state transition matrix, site-side apply atomicity
(all-or-nothing), artifact version compatibility policy.

Site Runtime: add script trust model (forbidden APIs, execution timeout, constrained
compilation), concurrency/serialization rules (Instance Actor serializes mutations),
site-wide stream backpressure (per-subscriber buffering, fire-and-forget publish).

Communication: add application-level correlation IDs for protocol safety beyond
Akka.NET transport guarantees.
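A minimal sketch of request/response correlation, in Python for brevity (the project itself is Akka.NET). `CorrelatingClient`, `Request`, `Response` and the UUID-based IDs are hypothetical. They are not the Communication component's actual message contracts. A reply is accepted only if its correlation ID matches a request that is still pending. Duplicates, unknown IDs and replies to abandoned (timed-out) requests are ignored.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    correlation_id: str
    payload: str


@dataclass(frozen=True)
class Response:
    correlation_id: str
    result: str


class CorrelatingClient:
    """Hypothetical sender that matches replies to requests by correlation ID."""

    def __init__(self):
        self._pending = {}  # correlation_id -> Request still awaiting a reply

    def send(self, payload):
        request = Request(str(uuid.uuid4()), payload)
        self._pending[request.correlation_id] = request
        return request  # a real client would hand this to the transport here

    def abandon(self, correlation_id):
        # Called on timeout: any reply that arrives later is ignored.
        self._pending.pop(correlation_id, None)

    def on_response(self, response):
        request = self._pending.pop(response.correlation_id, None)
        if request is None:
            return None  # duplicate, late (abandoned), or unknown reply
        return request, response.result
```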

External System Gateway: add 408/429 as transient errors, CachedCall idempotency
note, dedicated dispatcher for blocking I/O isolation.

Health Monitoring: add monotonic sequence numbers to prevent stale report overwrites.
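A minimal sketch of the stale-report check. It assumes each site stamps its reports with a sequence number that only ever increases for that site. That includes across a site-node failover, which the sketch does not model. `HealthReport` and `HealthStore` are hypothetical names. A report is stored only if its sequence number is strictly greater than the last one accepted. A delayed older report therefore never overwrites a newer one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthReport:
    site_id: str
    sequence: int  # assumed monotonically increasing per site
    status: str


class HealthStore:
    """Hypothetical central store holding the latest report per site."""

    def __init__(self):
        self._latest = {}  # site_id -> HealthReport

    def accept(self, report):
        current = self._latest.get(report.site_id)
        if current is not None and report.sequence <= current.sequence:
            return False  # stale or duplicate: keep the newer report
        self._latest[report.site_id] = report
        return True

    def status_of(self, site_id):
        report = self._latest.get(site_id)
        return None if report is None else report.status
```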

Security: require LDAPS/StartTLS for LDAP connections.

Central UI: add failover behavior (SignalR reconnect, JWTs remain valid across failover,
shared Data Protection keys, load balancer readiness).

Cluster Infrastructure: add down-if-alone=on for safe singleton ownership.

Site Event Logging: clarify active-node-only logging (no replication), add 1GB
storage cap with oldest-first purge.
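A minimal in-memory sketch of the size cap with oldest-first purge. The real log is persisted on the active node. `EventLog`, the byte accounting, and reading "1 GB" as 2**30 bytes are all assumptions. After each append, the oldest events are dropped until the total fits under the cap again.

```python
from collections import deque

CAP_BYTES = 1024 ** 3  # "1 GB" read as 2**30 bytes (assumption)


class EventLog:
    """Hypothetical size-capped event log; oldest events are purged first."""

    def __init__(self, cap_bytes=CAP_BYTES):
        self.cap_bytes = cap_bytes
        self._events = deque()  # (timestamp, payload), oldest on the left
        self.size_bytes = 0

    def append(self, timestamp, payload: bytes):
        if len(payload) > self.cap_bytes:
            raise ValueError("single event exceeds the storage cap")
        self._events.append((timestamp, payload))
        self.size_bytes += len(payload)
        # Purge oldest-first until the log fits under the cap again.
        while self.size_bytes > self.cap_bytes:
            _, oldest = self._events.popleft()
            self.size_bytes -= len(oldest)

    def timestamps(self):
        return [ts for ts, _ in self._events]
```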

Host: add readiness gating (health check endpoint, no traffic until operational).

Commons: add message contract versioning policy (additive-only evolution).

Configuration Database: add optimistic concurrency on deployment status records.
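A minimal sketch of optimistic concurrency on a status record. An in-memory table stands in for the Configuration Database, and the `row_version` column and class names are hypothetical. An update succeeds only if the caller's version still matches the stored one. In SQL this is typically `UPDATE ... WHERE deployment_id = ? AND row_version = ?`, where zero affected rows signals a conflict.

```python
from dataclasses import dataclass, replace


class ConcurrencyConflict(Exception):
    """Raised when the record changed since the caller read it."""


@dataclass(frozen=True)
class DeploymentStatus:
    deployment_id: str
    status: str
    row_version: int


class DeploymentStatusTable:
    """Hypothetical in-memory stand-in for the deployment status table."""

    def __init__(self):
        self._rows = {}

    def insert(self, deployment_id, status):
        self._rows[deployment_id] = DeploymentStatus(deployment_id, status, 1)

    def read(self, deployment_id):
        return self._rows[deployment_id]  # immutable snapshot

    def update_status(self, deployment_id, new_status, expected_version):
        row = self._rows[deployment_id]
        if row.row_version != expected_version:
            raise ConcurrencyConflict(
                f"{deployment_id}: expected v{expected_version}, found v{row.row_version}"
            )
        # Write the new status and bump the version in one step.
        self._rows[deployment_id] = replace(
            row, status=new_status, row_version=row.row_version + 1
        )
```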
Joseph Doherty
2026-03-16 09:06:12 -04:00
parent 70e5ae33d5
commit 34694adba2
13 changed files with 152 additions and 10 deletions


@@ -41,10 +41,27 @@ Engineer (UI) → Deployment Manager (Central)
└── 8. Update deployment status in config DB
```
## Deployment Concurrency
## Deployment Identity & Idempotency
- **Same instance**: A deployment to an instance is **blocked** if a previous deployment to that instance is still in progress (waiting for site response). The UI shows the deployment is in progress and rejects the second request. This prevents conflicting state at the site.
- **Different instances**: Deployments to different instances can proceed **in parallel**, even at the same site. Each deployment tracks status independently. This supports the bulk "deploy all out-of-date instances" operation efficiently.
- Every deployment is assigned a unique **deployment ID** and includes the flattened configuration's **revision hash** (from the Template Engine).
- Site-side apply is **idempotent on deployment ID** — if the same deployment is received twice (e.g., after a timeout where the site actually applied it), the site responds with "already applied" rather than re-applying.
- Sites **reject stale configurations** — if a deployment carries an older revision hash than what is already applied, the site rejects it and reports the current version.
- After a central failover or timeout, the Deployment Manager **queries the site for current deployment state** before allowing a re-deploy. This prevents duplicate application and out-of-order config changes.
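A minimal sketch of these site-side checks, in Python for brevity. One assumption matters here: a hash cannot be ordered, so judging a configuration "older" needs a comparable value. The sketch therefore assumes a hypothetical, monotonically increasing `revision` number travels with the revision hash. `SiteApplier` and its return values are also illustrative, and the actual apply step is elided.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Deployment:
    deployment_id: str
    instance_id: str
    revision: int        # hypothetical ordered revision number
    revision_hash: str   # flattened-config hash from the Template Engine


@dataclass
class InstanceState:
    revision: int = 0
    revision_hash: str = ""
    last_deployment_id: str = ""
    applied_ids: set = field(default_factory=set)


class SiteApplier:
    """Hypothetical site-side handler for incoming deployments."""

    def __init__(self):
        self._instances = {}  # instance_id -> InstanceState

    def handle(self, d):
        state = self._instances.setdefault(d.instance_id, InstanceState())
        if d.deployment_id in state.applied_ids:
            # Duplicate delivery (e.g. central retried after a timeout).
            return "already_applied", state.revision_hash
        if d.revision < state.revision:
            # Older than what is running: reject and report the current version.
            return "rejected_stale", state.revision_hash
        # ... the real all-or-nothing apply would happen here ...
        state.revision = d.revision
        state.revision_hash = d.revision_hash
        state.last_deployment_id = d.deployment_id
        state.applied_ids.add(d.deployment_id)
        return "applied", d.revision_hash

    def query(self, instance_id):
        # What central asks for after a failover or timeout, before re-deploying.
        state = self._instances.get(instance_id, InstanceState())
        return state.last_deployment_id, state.revision_hash
```

A deployment with the same revision as the running one is accepted, because the text only rejects configurations that are strictly older.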
## Operation Concurrency
All mutating operations on a single instance (deploy, disable, enable, delete) share a **per-instance operation lock**:
- Only one mutating operation per instance can be in-flight at a time. A second operation is rejected with an "operation in progress" error.
- **Different instances**: Operations on different instances can proceed **in parallel**, even at the same site. Each tracks status independently. This supports the bulk "deploy all out-of-date instances" operation efficiently.
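A minimal sketch of the per-instance operation lock as an in-memory registry, in Python for brevity. `InstanceOperationLocks` and `OperationInProgress` are hypothetical names. Because the registry lives in memory, it does not survive a central failover on its own. In practice the lock would be released when the site responds or the operation times out, not when a local function returns.

```python
import threading
from contextlib import contextmanager


class OperationInProgress(Exception):
    """Raised when another mutating operation is already in flight."""


class InstanceOperationLocks:
    """Hypothetical registry: at most one mutating operation per instance."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dict itself
        self._in_flight = {}            # instance_id -> operation name

    def try_begin(self, instance_id, operation):
        with self._guard:
            current = self._in_flight.get(instance_id)
            if current is not None:
                raise OperationInProgress(
                    f"{current} already in progress for {instance_id}"
                )
            self._in_flight[instance_id] = operation

    def end(self, instance_id):
        with self._guard:
            self._in_flight.pop(instance_id, None)

    @contextmanager
    def hold(self, instance_id, operation):
        # Convenience wrapper: acquire, run, always release.
        self.try_begin(instance_id, operation)
        try:
            yield
        finally:
            self.end(instance_id)
```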
### Allowed State Transitions
| Current State | Deploy | Disable | Enable | Delete |
|---------------|--------|---------|--------|--------|
| Enabled | Yes | Yes | No (already enabled) | Yes |
| Disabled | Yes (enables on apply) | No (already disabled) | Yes | Yes |
| Not deployed | Yes (initial deploy) | No | No | No |
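The matrix above can be encoded as a lookup table, so every mutating command is checked in one place. This is a sketch in Python with hypothetical enum names. Two resulting states are assumptions that the table does not spell out: an initial deploy leaves the instance enabled, and a delete returns it to "Not deployed".

```python
from enum import Enum


class State(Enum):
    ENABLED = "Enabled"
    DISABLED = "Disabled"
    NOT_DEPLOYED = "Not deployed"


class Command(Enum):
    DEPLOY = "Deploy"
    DISABLE = "Disable"
    ENABLE = "Enable"
    DELETE = "Delete"


# Rows of the table: which commands each state accepts.
ALLOWED = {
    State.ENABLED: {Command.DEPLOY, Command.DISABLE, Command.DELETE},
    State.DISABLED: {Command.DEPLOY, Command.ENABLE, Command.DELETE},
    State.NOT_DEPLOYED: {Command.DEPLOY},
}


def next_state(state: State, command: Command) -> State:
    if command not in ALLOWED[state]:
        raise ValueError(f"{command.value} not allowed when {state.value}")
    if command is Command.DISABLE:
        return State.DISABLED
    if command is Command.DELETE:
        return State.NOT_DEPLOYED  # assumption: delete removes the deployment
    return State.ENABLED  # Deploy (incl. from Disabled) and Enable both enable
```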
## System-Wide Artifact Deployment Failure Handling
@@ -92,6 +109,20 @@ A deployment to a site includes the flattened instance configuration plus any sy
System-wide artifact deployment is a **separate action** from instance deployment, triggered explicitly by a user with the Deployment role.
## Site-Side Apply Atomicity
Applying a deployment at the site is **all-or-nothing per instance**:
- The site stores the new config, compiles all scripts, and creates/updates the Instance Actor as a single operation.
- If any step fails (e.g., script compilation), the entire deployment for that instance is rejected. The previous configuration remains active and unchanged.
- The site reports the specific failure reason (e.g., compilation error details) back to central.
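A minimal sketch of compile-then-swap, which is one way to get this all-or-nothing behavior. Python's built-in `compile()` stands in for the site's real script compiler. Persistence and Instance Actor creation are left out, and `SiteInstance`/`CompilationError` are hypothetical names. Every script is compiled into a staging area first. The live configuration is replaced only if all of them succeed.

```python
from dataclasses import dataclass


class CompilationError(Exception):
    pass


@dataclass(frozen=True)
class InstanceConfig:
    revision_hash: str
    scripts: dict  # script name -> source text


def compile_script(name, source):
    # Stand-in compiler (assumption): Python's compile() instead of the real one.
    try:
        return compile(source, name, "exec")
    except SyntaxError as exc:
        raise CompilationError(f"{name}: {exc.msg} (line {exc.lineno})") from exc


class SiteInstance:
    """Hypothetical holder of one instance's live config and compiled scripts."""

    def __init__(self, config, compiled):
        self.config = config
        self.compiled = compiled

    def apply(self, new_config):
        staged = {}
        try:
            for name, source in new_config.scripts.items():
                staged[name] = compile_script(name, source)
        except CompilationError as exc:
            # Live state untouched; the reason goes back to central.
            return False, str(exc)
        # Every script compiled: swap config and compiled scripts together.
        self.config, self.compiled = new_config, staged
        return True, new_config.revision_hash
```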
## System-Wide Artifact Version Compatibility
- Cross-site version skew for artifacts (shared scripts, external system definitions, etc.) is **supported** — sites can temporarily run different artifact versions after a partial deployment.
- Artifacts are self-contained and site-independent. A site running an older version of shared scripts continues to operate correctly with its current instance configurations.
- The Central UI clearly indicates which sites have pending artifact updates so engineers can remediate.
## Instance Lifecycle Commands
The Deployment Manager sends the following commands to sites via the Communication Layer: