docs(components): accuracy fixes from deep review (batch 3)

- NotificationService: Notify.Send returns string, not NotificationId; MaxConcurrentConnections unenforced; AddHttpClient.
- NotificationOutbox: one Attempted row always; terminal row only on terminal status.
- SiteCallAudit: direct dual-write, no Tell; KPI tiles consumed by CentralUI.
- HealthMonitoring: CentralOfflineTimeout 180s = 6x ReportInterval; HealthReportSender gates on IsActiveNode.
- SiteEventLogging: active-node purge seam not wired; purge runs on both nodes.
- InboundAPI: whole System.Diagnostics namespace forbidden.
2026-06-03 16:37:15 -04:00
parent 25bae4e43b
commit 9175b0c013
6 changed files with 27 additions and 28 deletions
@@ -20,7 +20,7 @@ The DI entry point is `ServiceCollectionExtensions.AddSiteEventLogging`, registe

 ### Active-node-only writes

-Only the active site node generates and stores events. The standby's local SQLite receives no writes, so purging there is unnecessary. `EventLogPurgeService` consults an optional `SiteEventLogActiveNodeCheck` delegate on each tick; the Host registers the real check on site nodes, and the purge early-exits on the standby. When no delegate is registered (tests, non-clustered hosts), the purge runs on every tick, preserving pre-cluster behaviour.
+Only the active site node generates and stores events. The standby's local SQLite receives no writes, so purging there is unnecessary. `EventLogPurgeService` consults a `SiteEventLogActiveNodeCheck` delegate on each tick and early-exits when it returns `false`. The delegate is an optional seam: `AddSiteEventLogging` resolves it via `sp.GetService<SiteEventLogActiveNodeCheck>()`, so the service compiles and runs without it. The Host does **not** currently register the delegate, so `GetService` returns `null` and the constructor defaults to `() => true`. As a result, the purge currently runs on every tick on both nodes, preserving pre-cluster behaviour.

 On failover, the newly active node starts logging to its own SQLite database. Historical events from the previous active node are not queryable until that node comes back online. This is acceptable because event logs are diagnostic, not transactional — a missing log tail after failover is not a data-integrity concern.

@@ -199,7 +199,7 @@ The docker cluster appsettings (`ScadaBridge:SiteEventLog`) sets `RetentionDays:
 - [Site Runtime (#3)](./SiteRuntime.md) — `ScriptActor` and `ScriptExecutionActor` log `script`-type events: trigger expression failures, script execution errors, and timeouts. `ISiteEventLogger` is resolved from DI inside execution actors.
 - [Data Connection Layer (#4)](./DataConnectionLayer.md) — `DataConnectionActor` logs `connection`-type events: connection loss, reconnection, and endpoint failover. `DataConnectionManagerActor` may also log connection-category events.
 - [Store-and-Forward Engine (#6)](./StoreAndForward.md) — logs `store_and_forward`-type events on the site→central notification forward path (forward failures, long-buffered notifications). Routine enqueue and forward-success events are not logged; central's `Notifications` table is the authoritative record.
- [Host (#15)](./Host.md) — `SiteServiceRegistration` calls `AddSiteEventLogging` and binds `SiteEventLogOptions`. `AkkaHostedService` wires `EventLogHandlerActor` as a cluster singleton scoped to `"site-{SiteId}"` and registers the `SiteEventLogActiveNodeCheck` delegate so the purge runs only on the active node.
+- [Host (#15)](./Host.md) — `SiteServiceRegistration` calls `AddSiteEventLogging` and binds `SiteEventLogOptions`. `AkkaHostedService` wires `EventLogHandlerActor` as a cluster singleton scoped to `"site-{SiteId}"`. The `SiteEventLogActiveNodeCheck` delegate is an optional seam, defined in `SiteEventLogging`, that the Host can register to restrict the purge to the active node. The Host does not currently register it, so the check defaults to always-active and the purge runs on every node.
 - [Audit Log (#23)](./AuditLog.md) — a distinct component. The Audit Log captures every trust-boundary action (outbound API calls, DB writes, notifications, inbound API) and flows to a central append-only table with monthly partitioning and 365-day retention. The site event log captures internal runtime diagnostics (failures, state transitions) locally with 30-day retention. The two stores are complementary, not overlapping.
 - [Site Call Audit (#22)](./SiteCallAudit.md) — a distinct component. Site Call Audit mirrors cached-call operational status in the central `SiteCalls` table via gRPC telemetry. Site Event Logging has no role in that flow.