Apply Codex review findings across all 17 components

Template Engine: add composed member addressing (path-qualified canonical names),
override granularity per entity type, semantic validation (call targets, arg types),
graph acyclicity enforcement, revision hashes for flattened configs.
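The acyclicity check can be sketched as a standard three-color DFS over the composition graph; this is a minimal illustration, not the Template Engine's actual implementation, and the dict-of-lists graph shape is an assumption.

```python
# Hypothetical sketch: reject a composition graph containing a cycle.
# Graph shape {entity: [referenced entities]} is assumed for illustration.
def find_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited / in progress / done
    color = {node: WHITE for node in graph}

    def visit(node, path):
        color[node] = GRAY
        for dep in graph.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                return path + [node, dep]          # back edge: cycle found
            if color.get(dep, WHITE) == WHITE:
                found = visit(dep, path + [node])
                if found:
                    return found
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None
```

Validation would run this on every flattened config and reject the revision if a cycle path comes back.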

Deployment Manager: add deployment ID + idempotency, per-instance operation lock
covering all mutating commands, state transition matrix, site-side apply atomicity
(all-or-nothing), artifact version compatibility policy.
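A state transition matrix can be expressed as a simple allowed-successors table; the state names below are illustrative guesses, not the Deployment Manager's actual states.

```python
# Hypothetical sketch of a deployment state transition matrix.
# State names are illustrative, not taken from the real component.
ALLOWED = {
    "Pending":   {"Deploying", "Cancelled"},
    "Deploying": {"Deployed", "Failed"},
    "Deployed":  {"Retiring"},
    "Failed":    {"Pending"},       # allow retry from failure
    "Retiring":  set(),
    "Cancelled": set(),
}

def transition(current, target):
    """Return the new state, or raise if the matrix forbids the move."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Any mutating command would consult the matrix before touching the deployment record, making illegal jumps (e.g. Deployed back to Pending) fail fast.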

Site Runtime: add script trust model (forbidden APIs, execution timeout, constrained
compilation), concurrency/serialization rules (Instance Actor serializes mutations),
site-wide stream backpressure (per-subscriber buffering, fire-and-forget publish).
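Per-subscriber buffering with fire-and-forget publish can be sketched with a bounded deque per subscriber: a slow consumer loses its oldest events instead of stalling the publisher. Class and buffer size are illustrative assumptions.

```python
from collections import deque

# Hypothetical sketch: fire-and-forget publish with a bounded
# per-subscriber buffer (buffer size is an assumed parameter).
class StreamTopic:
    def __init__(self, buffer_size=100):
        self.buffer_size = buffer_size
        self.buffers = {}                # subscriber_id -> deque

    def subscribe(self, subscriber_id):
        self.buffers[subscriber_id] = deque(maxlen=self.buffer_size)

    def publish(self, event):
        # Never blocks: a full deque silently drops its oldest entry.
        for buf in self.buffers.values():
            buf.append(event)

    def drain(self, subscriber_id):
        buf = self.buffers[subscriber_id]
        events = list(buf)
        buf.clear()
        return events
```

The key property is that `publish` cannot block on any subscriber, so one stalled consumer cannot back up the site-wide stream.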

Communication: add application-level correlation IDs for protocol safety beyond
Akka.NET transport guarantees.
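The correlation-ID idea can be sketched as: tag each request with a fresh ID, and drop any reply whose ID does not match an outstanding request, so late or duplicated replies from an earlier exchange cannot be mistaken for the current one. The message shape below is an illustrative assumption.

```python
import uuid

# Hypothetical sketch of application-level request/reply correlation,
# layered on top of whatever the transport guarantees.
class Correlator:
    def __init__(self):
        self.pending = {}                # correlation_id -> payload

    def send(self, payload):
        cid = str(uuid.uuid4())
        self.pending[cid] = payload
        return {"correlation_id": cid, "payload": payload}

    def accept_reply(self, reply):
        cid = reply.get("correlation_id")
        if cid not in self.pending:
            return None                  # stale or unknown reply: drop it
        self.pending.pop(cid)
        return reply["payload"]
```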

External System Gateway: add 408/429 as transient errors, CachedCall idempotency
note, dedicated dispatcher for blocking I/O isolation.
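The transient-error classification amounts to adding 408 (Request Timeout) and 429 (Too Many Requests) to the retryable set alongside 5xx; a minimal sketch:

```python
# Hypothetical sketch: classify HTTP status codes as transient
# (retryable) vs. permanent. 408 and 429 are now treated as transient
# alongside server errors.
TRANSIENT_STATUSES = {408, 429}

def is_transient(status_code):
    return status_code in TRANSIENT_STATUSES or 500 <= status_code <= 599
```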

Health Monitoring: add monotonic sequence numbers to prevent stale report overwrites.
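The stale-overwrite protection can be sketched as: each report carries a monotonically increasing sequence number, and the aggregator ignores any report not newer than the last one accepted for that node, so reordered or duplicated reports cannot clobber fresher state. The store below is illustrative.

```python
# Hypothetical sketch: reject health reports whose sequence number is
# not strictly greater than the last accepted one for that node.
class HealthStore:
    def __init__(self):
        self.latest = {}                 # node -> (seq, report)

    def accept(self, node, seq, report):
        last_seq, _ = self.latest.get(node, (-1, None))
        if seq <= last_seq:
            return False                 # stale or duplicate: drop
        self.latest[node] = (seq, report)
        return True
```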

Security: require LDAPS/StartTLS for LDAP connections.

Central UI: add failover behavior (SignalR reconnect, JWT survives, shared Data
Protection keys, load balancer readiness).

Cluster Infrastructure: add down-if-alone=on for safe singleton ownership.
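In Akka.NET HOCON this corresponds to enabling `down-if-alone` under the split brain resolver's `keep-oldest` strategy; a sketch of the relevant fragment, assuming the built-in SBR provider and the keep-oldest strategy are in use:

```hocon
akka.cluster {
  downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"
  split-brain-resolver {
    active-strategy = keep-oldest
    keep-oldest {
      # If the oldest node is partitioned off alone, it downs itself
      # instead of keeping singleton ownership on a one-node island.
      down-if-alone = on
    }
  }
}
```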

Site Event Logging: clarify active-node-only logging (no replication), add 1GB
storage cap with oldest-first purge.
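Oldest-first purging against the cap can be sketched against SQLite directly; the `events` table and column names below are illustrative, and a real implementation would compare the database file size to the configured cap (default 1 GB) to decide how much to delete.

```python
import sqlite3

# Hypothetical sketch of oldest-first purging, assuming an "events"
# table with an autoincrementing id (insertion order == event order).
def purge_oldest(conn, rows_to_delete):
    conn.execute(
        "DELETE FROM events WHERE id IN "
        "(SELECT id FROM events ORDER BY id ASC LIMIT ?)",
        (rows_to_delete,),
    )
    conn.commit()
```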

Host: add readiness gating (health check endpoint, no traffic until operational).
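Readiness gating can be sketched as a gate that reports ready only after all startup tasks complete; the endpoint path, status codes, and task names below are illustrative assumptions, not the Host's actual contract.

```python
# Hypothetical sketch: the readiness endpoint returns non-200 until all
# startup tasks have completed, so the load balancer sends no traffic
# to a node that is still coming up.
class ReadinessGate:
    def __init__(self, required_tasks):
        self.required = set(required_tasks)
        self.completed = set()

    def mark_complete(self, task):
        self.completed.add(task)

    def is_ready(self):
        return self.required <= self.completed

    def health_status(self):
        # What a GET on the health check endpoint would report.
        return (200, "ready") if self.is_ready() else (503, "starting")
```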

Commons: add message contract versioning policy (additive-only evolution).
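Additive-only evolution means new fields get defaults so payloads from older senders still deserialize, and existing fields are never removed or renamed. A sketch with illustrative field names:

```python
from dataclasses import dataclass

# Hypothetical sketch of additive-only contract evolution.
# Field names are illustrative, not the real Commons contracts.
@dataclass
class DeployCommand:
    deployment_id: str
    site: str
    # Added in a later contract revision; the default keeps payloads
    # from older senders deserializable.
    dry_run: bool = False

def from_wire(payload: dict) -> DeployCommand:
    return DeployCommand(**payload)
```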

Configuration Database: add optimistic concurrency on deployment status records.
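The optimistic-concurrency pattern can be sketched in-memory: an update carries the version it read, and a mismatch means another writer won, so the update is rejected rather than silently overwriting. The store below is illustrative, not the actual database schema.

```python
# Hypothetical sketch of optimistic concurrency on a status record:
# a version mismatch rejects the write and forces the caller to re-read.
class StatusStore:
    def __init__(self):
        self.rows = {}                   # deployment_id -> (version, status)

    def read(self, dep_id):
        return self.rows.get(dep_id, (0, None))

    def update(self, dep_id, expected_version, new_status):
        version, _ = self.rows.get(dep_id, (0, None))
        if version != expected_version:
            return False                 # conflict: another writer won
        self.rows[dep_id] = (version + 1, new_status)
        return True
```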
Joseph Doherty
2026-03-16 09:06:12 -04:00
parent 70e5ae33d5
commit 34694adba2
13 changed files with 152 additions and 10 deletions


@@ -40,8 +40,9 @@ Each event entry contains:
 ## Storage
 - Events are stored in **local SQLite** on each site node.
-- Each node maintains its own event log (the active node generates events; the standby node generates minimal events related to replication).
+- Each node maintains its own event log. Only the **active node** generates and stores events. Event logs are **not replicated** to the standby node. On failover, the new active node starts logging to its own SQLite database; historical events from the previous active node are no longer queryable via central until that node comes back online. This is acceptable because event logs are diagnostic, not transactional.
 - **Retention**: 30 days. A **daily background job** runs on the active node and deletes all events older than 30 days. Hard delete — no archival.
+- **Storage cap**: A configurable maximum database size (default: 1 GB) is enforced. If the storage cap is reached before the 30-day retention window, the oldest events are purged first. This prevents disk exhaustion from alarm storms, script failure loops, or connection flapping.
 ## Central Access