docs(components): accuracy fixes from deep review (batch 3)

NotificationService (Notify.Send returns string not NotificationId;
MaxConcurrentConnections unenforced; AddHttpClient), NotificationOutbox
(one Attempted row always, terminal row only on terminal status), SiteCallAudit
(direct dual-write, no Tell; KPI tiles consumed by CentralUI), HealthMonitoring
(CentralOfflineTimeout 180s = 6x ReportInterval; HealthReportSender gates on
IsActiveNode), SiteEventLogging (active-node purge seam not wired; runs on both
nodes), InboundAPI (whole System.Diagnostics namespace forbidden).
Joseph Doherty
2026-06-03 16:37:15 -04:00
parent 25bae4e43b
commit 9175b0c013
6 changed files with 27 additions and 28 deletions
+6 -7
@@ -6,7 +6,7 @@ Site Call Audit (#22) is a central-only observability component that maintains a
The component lives in `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/` and runs only on the central cluster. Its single class, `SiteCallAuditActor`, is an Akka.NET `ReceiveActor` deployed as a `ClusterSingletonManager`-managed singleton on the active central node.
Telemetry reaches central through the shared `CachedCallTelemetry` packet (see [Audit Log](./AuditLog.md)), which carries both an `AuditEvent` for the `AuditLog` table and a `SiteCallOperational` snapshot for the `SiteCalls` table. The `AuditLogIngestActor` (Audit Log #23) writes both in a single MS SQL transaction when it receives an `IngestCachedTelemetryCommand`; it then tells `SiteCallAuditActor` an `UpsertSiteCallCommand` so the `SiteCalls` row is always consistent with the paired audit row. The `SiteCallAuditActor` is therefore an ingest target, not a transport; it never constructs telemetry packets and never decides what gets delivered.
Telemetry reaches central through the shared `CachedCallTelemetry` packet (see [Audit Log](./AuditLog.md)), which carries both an `AuditEvent` for the `AuditLog` table and a `SiteCallOperational` snapshot for the `SiteCalls` table. The `AuditLogIngestActor` (Audit Log #23) writes both rows directly inside a single MS SQL transaction when it receives an `IngestCachedTelemetryCommand`: `auditRepo.InsertIfNotExistsAsync(...)` followed by `siteCallRepo.UpsertAsync(...)`, committed together or rolled back together. There is no Tell to `SiteCallAuditActor` on this path; the `UpsertSiteCallCommand` / `OnUpsertAsync` handler exists for other callers, not the cached-telemetry hot path. The `SiteCallAuditActor` is therefore an ingest target, not a transport; it never constructs telemetry packets and never decides what gets delivered.
Sites remain the source of truth. `Tracking.Status()` is answered site-locally from the site SQLite tracking store; the central `SiteCalls` row is what the Central UI Site Calls page reads — it may lag by one telemetry cycle.
@@ -193,13 +193,13 @@ private void HandleKpi(SiteCallKpiRequest request)
}
```
The `SiteCallKpiSnapshot` shape mirrors `NotificationKpiSnapshot` so the Central UI dashboard can reuse the same tile layout for both components.
`SiteCallKpiSnapshot` is structurally similar to `NotificationKpiSnapshot` so the Central UI dashboard can reuse the same tile layout for both components. The shapes differ: `SiteCallKpiSnapshot` carries 6 fields (`BufferedCount`, `ParkedCount`, `FailedLastInterval`, `DeliveredLastInterval`, `OldestPendingAge`, `StuckCount`), while `NotificationKpiSnapshot` carries 5 (`QueueDepth`, `StuckCount`, `ParkedCount`, `DeliveredLastInterval`, `OldestPendingAge`) — `BufferedCount` replaces `QueueDepth` and `FailedLastInterval` is an addition with no counterpart in the notification shape.
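As a rough illustration of that difference, the two shapes could be sketched as C# records. The field names are taken from the list above; the field types (`int` counts, a nullable `TimeSpan` age) and the positional-record form are assumptions made for this sketch, not the actual Commons definitions.

```csharp
using System;

// Illustrative sketch only: field names come from the paragraph above;
// the types and the record form are assumed, not copied from Commons.
public sealed record SiteCallKpiSnapshot(
    int BufferedCount,           // plays the role QueueDepth has in the notification shape
    int ParkedCount,
    int FailedLastInterval,      // no counterpart in NotificationKpiSnapshot
    int DeliveredLastInterval,
    TimeSpan? OldestPendingAge,  // assumed nullable: there may be no pending row
    int StuckCount);

public sealed record NotificationKpiSnapshot(
    int QueueDepth,
    int StuckCount,
    int ParkedCount,
    int DeliveredLastInterval,
    TimeSpan? OldestPendingAge);
```

Under these assumptions, a shared tile layout would map `BufferedCount` and `QueueDepth` to the same tile and render the `FailedLastInterval` tile only for Site Calls.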
## Usage
The actor accepts only Akka messages — there is no public API beyond the message protocol defined in Commons. The Central UI's Site Calls page sends `SiteCallQueryRequest` / `SiteCallKpiRequest` / `PerSiteSiteCallKpiRequest` / `SiteCallDetailRequest` through `CommunicationService`, which Asks the singleton and awaits `SiteCallQueryResponse` / `SiteCallKpiResponse` / `PerSiteSiteCallKpiResponse` / `SiteCallDetailResponse`.
The ingest path is driven by `AuditLogIngestActor.OnCachedTelemetryAsync`, which tells an `UpsertSiteCallCommand` after committing the dual-write transaction. The `SiteCallAuditActor` does not need to coordinate with `AuditLogIngestActor` — the transaction guarantees the `AuditLog` row always precedes the upsert command.
The ingest path is driven by `AuditLogIngestActor.OnCachedTelemetryAsync`, which writes both the `AuditLog` row and the `SiteCalls` upsert directly inside a single EF transaction — no message is sent to `SiteCallAuditActor` on this path. Both writes succeed or both roll back; neither component needs to coordinate with the other after the transaction commits.
Registration is via `ServiceCollectionExtensions.AddSiteCallAudit`, which binds `SiteCallAuditOptions` from the `ScadaBridge:SiteCallAudit` configuration section. The actor `Props` and the `ClusterSingletonManager` registration are wired in the Host's central-role composition.
@@ -217,11 +217,10 @@ Registration is via `ServiceCollectionExtensions.AddSiteCallAudit`, which binds
- [Commons (#16)](./Commons.md) — owns `SiteCall`, `SiteCallOperational`, `TrackedOperationId`, `SiteCallAuditOptions`-adjacent types (`SiteCallKpiSnapshot`, `SiteCallSiteKpiSnapshot`, `SiteCallQueryFilter`, `SiteCallPaging`), all message contracts (`UpsertSiteCallCommand`, `UpsertSiteCallReply`, `SiteCallQueryRequest/Response`, `SiteCallDetailRequest/Response`, `SiteCallKpiRequest/Response`, `PerSiteSiteCallKpiRequest/Response`, `RetrySiteCallRequest/Response`, `DiscardSiteCallRequest/Response`, `SiteCallRelayOutcome`), and the `ISiteCallAuditRepository` interface.
- [Configuration Database (#17)](./ConfigurationDatabase.md) — implements `ISiteCallAuditRepository` against the central `dbo.SiteCalls` table. Central hosts must call `AddConfigurationDatabase` for the actor to resolve its scoped repository.
- [Audit Log (#23)](./AuditLog.md) — shares the `CachedCallTelemetry` packet. `AuditLogIngestActor.OnCachedTelemetryAsync` writes the `AuditLog` row and the `SiteCalls` upsert in a single MS SQL transaction, then tells `UpsertSiteCallCommand` to this actor. The two components coordinate via message-passing, not a shared service.
- [Audit Log (#23)](./AuditLog.md) — shares the `CachedCallTelemetry` packet. `AuditLogIngestActor.OnCachedTelemetryAsync` writes the `AuditLog` row and the `SiteCalls` upsert directly in a single MS SQL transaction; it does not send a message to this actor on that path. The two components share a database transaction, not a message exchange.
- [CentralSite Communication (#5)](./Communication.md) — the `CentralCommunicationActor` is the transport the relay handlers use. It is registered via `RegisterCentralCommunication` by the Host after both actors are running. `CommunicationService` also provides the async wrappers (`RetrySiteCallAsync`, `DiscardSiteCallAsync`) that the Central UI calls; those methods Ask the `SiteCallAuditActor` with the outer `CommunicationOptions.QueryTimeout`.
- [Store-and-Forward Engine (#6)](./StoreAndForward.md) — site-side executor of `RetryParkedOperation` and `DiscardParkedOperation`. The site's S&F buffer is the source of truth for parked cached calls; it emits updated telemetry after applying an operator action.
- [Health Monitoring (#11)](./HealthMonitoring.md) — consumes `SiteCallKpiResponse` / `PerSiteSiteCallKpiResponse` to surface buffered count, parked count, stuck count, and throughput KPI tiles on the health dashboard alongside the Notification Outbox tiles.
- [Central UI (#9)](./CentralUI.md) — the Site Calls page queries this actor for the paginated list, detail modal, and KPIs; it issues Retry/Discard actions that flow through `CommunicationService` to the relay handlers here.
- [Central UI (#9)](./CentralUI.md) — the `Health.razor` page (`SiteCallKpiTiles` component) consumes `SiteCallKpiResponse` to surface buffered count, parked count, stuck count, and throughput KPI tiles on the health dashboard alongside the Notification Outbox tiles; the Site Calls page queries this actor for the paginated list, detail modal, and KPIs and issues Retry/Discard actions that flow through `CommunicationService` to the relay handlers here.
- [Cluster Infrastructure (#13)](./ClusterInfrastructure.md) — hosts the `SiteCallAuditActor` singleton with active/standby failover via `ClusterSingletonManager`.
## Troubleshooting
@@ -240,7 +239,7 @@ The actor caught a repository exception and replied false to the caller without
### `SiteCalls` rows not appearing
Ingest flows through `AuditLogIngestActor.OnCachedTelemetryAsync`, which writes the `AuditLog` row and `SiteCalls` upsert in one transaction before telling `UpsertSiteCallCommand`. If that transaction fails, neither row is written. Check `AuditLog` ingest health first — a missing `AuditLog` row for the same `TrackedOperationId` confirms the telemetry never reached central, not that the `SiteCalls` upsert failed in isolation.
Ingest flows through `AuditLogIngestActor.OnCachedTelemetryAsync`, which writes the `AuditLog` row and `SiteCalls` upsert directly in one EF transaction. If that transaction fails, neither row is written. Check `AuditLog` ingest health first — a missing `AuditLog` row for the same `TrackedOperationId` confirms the telemetry never reached central, not that the `SiteCalls` upsert failed in isolation.
## Related Documentation