fix(high-severity): close 9 of 10 open High findings across 8 modules

Comm-016: delete dead HandleConnectionStateChanged + _debugSubscriptions /
_inProgressDeployments tracking + ConnectionStateChanged message record.
Disconnect detection is owned by the transport layers (gRPC keepalive PING
~25s; Ask-timeout at CommunicationService). Updates the
Component-Communication.md design doc to make that explicit.

SnF-018: NotificationForwarder.DeliverAsync now discards a corrupt buffered
payload (Warning log + return true) instead of returning false and parking
the row — honoring the design's "notifications do not park" invariant.

DM-018: reconciliation no longer force-sets Enabled, preserving an
intentional Disabled state after central failover.

ESG-018: DeliverBufferedAsync (both ExternalSystemClient + DatabaseGateway)
catches JsonException and returns false, turning a corrupt buffered row
into a parked operation instead of a retry-forever poison message.

InboundAPI-022: register ActiveNodeGate as IActiveNodeGate in the Central
DI branch so standby-node gating is actually wired up in production.

NS-019: remove orphaned NotificationDeliveryService /
INotificationDeliveryService / NotificationResult; central notification
delivery now lives entirely in NotificationOutbox.

SEL-016: normalise From/To filters to UTC before ISO-string compare so
non-UTC DateTimeOffset clients no longer get spuriously excluded events.

TE-017: include Description on attributes/alarms and a HashableConnections
projection (protocol, endpoint JSON, failover count) in the revision hash
and DiffService; staleness detection now catches description-only and
connection-endpoint edits.

Transport-001 and Transport-002 (also High) remain Open — they're being
handled in a follow-up batch because both touch BundleImporter.cs and
must serialise.
This commit is contained in:
Joseph Doherty
2026-05-28 05:40:15 -04:00
parent f936f55f51
commit ac96b83b08
38 changed files with 852 additions and 1729 deletions
+17 -1
View File
@@ -774,9 +774,25 @@ than being masked by an endpoint-agnostic mock.
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:169`, `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:338-375` |
**Resolution** — deleted the dead code path in favour of the keepalive-based
detection that is the actual production behaviour: removed the
`Receive<ConnectionStateChanged>` handler, the `HandleConnectionStateChanged`
method, the `_debugSubscriptions` / `_inProgressDeployments` tracking dicts
+ the `TrackMessageForCleanup` helper that fed them, and the dead message
record `src/ScadaLink.Commons/Messages/Communication/ConnectionStateChanged.cs`.
The two dead tests (`ConnectionLost_DebugStreamsKilled` in
CentralCommunicationActorTests, `RoundTrip_ConnectionStateChanged_Succeeds`
in CompatibilityTests) were removed alongside. The design doc
`docs/requirements/Component-Communication.md` "Connection Failure Behavior"
section was updated to state explicitly that disconnect is detected at the
transport layer (gRPC keepalive PING ~25 s for debug streams; Ask-timeout
at the CommunicationService layer for command/control), with no
application-level signal. `DebugStreamTerminated` survives because
`DebugStreamBridgeActor` uses it for an unrelated intra-actor stop signal.
**Description**
`CentralCommunicationActor.HandleConnectionStateChanged` is wired to
+3 -1
View File
@@ -912,9 +912,11 @@ would be meaningless).
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:675-682,721-748` |
**Resolution** — Added a `forceEnabledState` parameter to `ApplyPostSuccessSideEffectsAsync`. The normal deploy path passes `true` (fresh apply legitimately ends in `Enabled`); the reconciliation path passes `false`, so the helper only promotes `NotDeployed → Enabled` and leaves an existing `Disabled` (or `Enabled`) untouched. Regression test `DeployInstanceAsync_Reconciled_DisabledInstance_PreservesDisabledState` exercises the failover scenario and asserts the prior record still flips to `Success` while `Instance.State` stays `Disabled`.
**Description**
`TryReconcileWithSiteAsync` calls `ApplyPostSuccessSideEffectsAsync` whenever
@@ -1003,9 +1003,11 @@ captured request URI has no trailing `?`; it was verified to fail before the fix
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:176`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:151` |
**Resolution** — Wrapped the `JsonSerializer.Deserialize<...>(message.PayloadJson)` call in both `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync` in a `try`/`catch (JsonException)` block. A `JsonException` is by definition permanent (the same payload bytes always deserialize identically), so the catch branch logs at `LogError` and returns `false`, parking the message via the S&F engine instead of letting it throw and be retried as a transient failure. Regression tests `DeliverBuffered_MalformedJsonPayload_ReturnsFalseSoMessageParks` were added to both `ExternalSystemClientTests` and `DatabaseGatewayTests` — each feeds a truncated `PayloadJson` to the handler and asserts `delivered == false` and that no exception escapes.
**Description**
Both `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync`
+3 -1
View File
@@ -1061,9 +1061,11 @@ that an attribute read/write carries the inherited `ParentExecutionId`.
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.InboundAPI/IActiveNodeGate.cs`, `src/ScadaLink.InboundAPI/InboundApiEndpointFilter.cs:52-60`; absent from `src/ScadaLink.Host/Program.cs` |
**Resolution** — Added `src/ScadaLink.Host/Health/ActiveNodeGate.cs`, a production `IActiveNodeGate` implementation backed by `AkkaHostedService` that mirrors `ActiveNodeHealthCheck`'s leadership probe (member status `Up` AND `Cluster.State.Leader == SelfAddress`), and registered it as a singleton in the central-role branch of `Program.cs`. A structural regression test (`CentralCompositionRootTests.Central_IActiveNodeGate_IsRegisteredAsActiveNodeGate`) reflects over the built `IServiceProvider` to assert the registration's existence and concrete type — failing on `main` and passing after the fix. The `InboundApiEndpointFilter`'s fall-through-to-allow behaviour is retained as the documented safe default for non-clustered hosts and tests.
**Description**
InboundAPI-008's resolution adds `IActiveNodeGate` (lines 1724 of
+3 -1
View File
@@ -647,9 +647,11 @@ Resolved 2026-05-17. All three issues confirmed against source. The hand-rolled
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:18-442`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:20-21`, `src/ScadaLink.Commons/Interfaces/Services/INotificationDeliveryService.cs:1-33`, `src/ScadaLink.Host/Program.cs:77` |
**Resolution** — Executed option 1. Deleted `src/ScadaLink.NotificationService/NotificationDeliveryService.cs`, `src/ScadaLink.Commons/Interfaces/Services/INotificationDeliveryService.cs` (also retires `NotificationResult` + `BufferedNotification`), and the orphaned `tests/ScadaLink.NotificationService.Tests/NotificationDeliveryServiceTests.cs` suite; reduced `AddNotificationService` to the shared SMTP primitives (`OAuth2TokenService`, `Func<ISmtpClientWrapper>`, `NotificationOptions`), updated `CompositionRootTests` (assert the primitives instead of the dead types), and removed the `Notification_Send_MockSmtp_Delivers` assertion in `IntegrationSurfaceTests` (central delivery is covered by `EmailNotificationDeliveryAdapterTests`). Grep-verified `grep -rn "INotificationDeliveryService\|NotificationDeliveryService\|NotificationResult\|BufferedNotification\|DeliverBufferedAsync" --include="*.cs" src/ tests/` before delete: zero production callers (only XML-doc cross-references in NS, MailKit wrapper, NotificationOptions and `EmailNotificationDeliveryAdapter`, plus the dead test files); cross-reference comments updated to remove the stale class references. `dotnet build ScadaLink.slnx` succeeds (0 warnings, 0 errors); affected test projects all pass (`NotificationService.Tests` 52/52, `NotificationOutbox.Tests` 86/86 on rerun — one flaky timing-sensitive Akka.TestKit test unrelated to NS-019, `Host.Tests` 205/205); `IntegrationTests` 64/66 with two pre-existing failures in `NotificationOutboxFlowTests` (SQLite "near IF: syntax error", reproducible on pristine `main`, unrelated to NS-019).
**Description**
The updated `Component-NotificationService.md` (re-read in full at this commit) makes the new design unambiguous: "The Notification Service is the central component that manages notification-list and SMTP definitions and provides the per-type delivery adapters used to send notifications. … Notification delivery has been inverted: a site script's notification is store-and-forwarded to the central cluster, and the central **Notification Outbox** owns dispatch and delivery, calling an `INotificationDeliveryAdapter` supplied by this component." The doc explicitly states the service is "central cluster only", "no longer present at site clusters", and "no longer delivers notifications from sites".
+3 -1
View File
@@ -793,9 +793,11 @@ chosen policy on `ISiteEventLogger.LogEventAsync`.
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:67-77`, `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:159`, `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:72-78` |
**Resolution** — `EventLogQueryService.ExecuteQuery` now calls `.ToUniversalTime()` on `request.From`/`request.To` before `ToString("o")`, so the produced ISO 8601 string always ends in `+00:00` and lexicographically matches the UTC timestamps written by `SiteEventLogger`. `EventLogPurgeService.PurgeByRetention` was also made defensive with an explicit `.ToUniversalTime()` on the cutoff. A regression test (`Query_FiltersByTimeRange_HandlesNonUtcOffset`) constructs a `+05:00` `DateTimeOffset` and asserts the matching UTC-stored events are returned and out-of-range ones are excluded.
**Description**
Event rows are persisted with `timestamp` = `DateTimeOffset.UtcNow.ToString("o")`,
+11 -1
View File
@@ -991,9 +991,19 @@ the StoreAndForward-016 replication) — and pass it to `RaiseActivity` (falling
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/NotificationForwarder.cs:62``:69`, `:105``:122`; `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:369``:397` |
**Resolution** — `NotificationForwarder.DeliverAsync` now discards a corrupt
buffered payload instead of returning false. The corrupt path logs a Warning
with the buffered row id + length-capped payload preview via an injected
`ILogger<NotificationForwarder>` (NullLogger by default for back-compat),
then returns true so the S&F engine clears the row via its standard
success-path cleanup — honoring the "notifications do not park" design
invariant. Two regression tests in `NotificationForwarderTests` cover the
two corrupt shapes (invalid JSON, `null` deserialisation) and pin that
nothing is forwarded to central in either case.
**Description**
The Component design doc explicitly carves out notifications from the parking lifecycle:
+16 -2
View File
@@ -843,9 +843,11 @@ resolves against the real parent module. Regression test:
|--|--|
| Severity | High |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.TemplateEngine/Flattening/RevisionHashService.cs:128`, `src/ScadaLink.TemplateEngine/Flattening/RevisionHashService.cs:156`, `src/ScadaLink.TemplateEngine/Flattening/RevisionHashService.cs:42`, `src/ScadaLink.TemplateEngine/Flattening/DiffService.cs:110`, `src/ScadaLink.TemplateEngine/Flattening/DiffService.cs:118` |
**Resolution** — Added `Description` to `HashableAttribute` and `HashableAlarm` (placed alphabetically per the determinism contract) and introduced a `HashableConnection` projection plus a `SortedDictionary<string, HashableConnection> Connections` field on `HashableConfiguration` that captures protocol, primary/backup JSON, and failover retry count for every deployed connection. `DiffService.AttributesEqual` and `AlarmsEqual` now compare `Description`, and a new public `ConnectionsEqual` helper covers connection-endpoint drift so callers can detect the change in the same shape used by the other entity comparators. Regression tests `ComputeHash_AttributeDescriptionEdit_ChangesHash`, `ComputeHash_AlarmDescriptionEdit_ChangesHash`, `ComputeHash_ConnectionEndpointEdit_ChangesHash`, and `ConnectionsEqual_EndpointEdit_ReturnsFalse` lock the behaviour in.
**Description**
The design states the revision hash is "computed from the resolved content" and
@@ -891,7 +893,19 @@ it in `DiffService`. Add tests:
**Resolution**
_Unresolved._
Resolved (commit `pending`): `RevisionHashService` now folds `Description` into
the `HashableAttribute` / `HashableAlarm` projections (alphabetical placement
preserved) and adds a sorted `Connections` map of `HashableConnection`
(Protocol, ConfigurationJson, BackupConfigurationJson, FailoverRetryCount) on
`HashableConfiguration`. `DiffService.AttributesEqual` / `AlarmsEqual` compare
`Description`, and a public `ConnectionsEqual` helper covers connection drift
in the same shape as the other entity comparators. Regression tests:
`ComputeHash_AttributeDescriptionEdit_ChangesHash`,
`ComputeHash_AlarmDescriptionEdit_ChangesHash`,
`ComputeHash_ConnectionEndpointEdit_ChangesHash`,
`ConnectionsEqual_EndpointEdit_ReturnsFalse`. The diff-shape extension that
surfaces added/removed/changed connections in the UI remains tracked under
TemplateEngine-018.
### TemplateEngine-018 — `DiffService` reports no entries for added/removed/changed connections