fix(high-severity): close 8 of 10 open High findings across 8 modules

Comm-016: delete dead HandleConnectionStateChanged + _debugSubscriptions /
_inProgressDeployments tracking + ConnectionStateChanged message record.
Disconnect detection is owned by the transport layers (gRPC keepalive PING
~25s; Ask-timeout at CommunicationService). Updates the
Component-Communication.md design doc to make that explicit.

SnF-018: NotificationForwarder.DeliverAsync now discards a corrupt buffered
payload (Warning log + return true) instead of returning false and parking
the row — honoring the design's "notifications do not park" invariant.

DM-018: reconciliation no longer force-sets Enabled, preserving an
intentional Disabled state after central failover.

ESG-018: DeliverBufferedAsync (both ExternalSystemClient + DatabaseGateway)
catches JsonException and returns false, turning a corrupt buffered row
into a parked operation instead of a retry-forever poison message.
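
A minimal sketch of the catch-and-park pattern described above, not the
actual DeliverBufferedAsync code. BufferedCall and TryDeliver are
hypothetical names, and the payload shape is an assumption made only for
illustration:

```csharp
using System;
using System.Text.Json;

static class BufferedDeliverySketch
{
    // Hypothetical payload shape; the real buffered row format is not shown in this commit.
    record BufferedCall(string Target, string Body);

    // Returns true when the row was handled, false when it should be parked.
    static bool TryDeliver(string payloadJson)
    {
        try
        {
            var call = JsonSerializer.Deserialize<BufferedCall>(payloadJson);
            if (call is null)
            {
                return false; // a literal JSON "null" carries nothing to deliver
            }
            // A real implementation would invoke the external system here.
            return true;
        }
        catch (JsonException)
        {
            // Corrupt payload: retrying can never succeed, so report failure
            // and let the caller park the row instead of looping forever.
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryDeliver("{\"Target\":\"plc-1\",\"Body\":\"start\"}")); // True
        Console.WriteLine(TryDeliver("{not json"));                                 // False
    }
}
```

Note the contrast with SnF-018, where the same corruption yields true so
that the notification is dropped rather than parked.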

InboundAPI-022: register ActiveNodeGate as IActiveNodeGate in the Central
DI branch so standby-node gating is actually wired up in production.

NS-019: remove orphaned NotificationDeliveryService /
INotificationDeliveryService / NotificationResult; central notification
delivery now lives entirely in NotificationOutbox.

SEL-016: normalise From/To filters to UTC before ISO-string compare so
non-UTC DateTimeOffset clients no longer get spuriously excluded events.
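
A hedged illustration of why the normalisation matters, assuming events
are stored as UTC round-trip ("o") strings and compared ordinally. The
ToUtcIso helper and the sample timestamps are hypothetical:

```csharp
using System;

static class UtcFilterSketch
{
    // Hypothetical helper: render a timestamp as a UTC round-trip ISO-8601 string.
    static string ToUtcIso(DateTimeOffset value) => value.UtcDateTime.ToString("o");

    static void Main()
    {
        // Assumed storage format: event occurred at 09:30 UTC.
        string storedEvent = "2026-05-28T09:30:00.0000000Z";

        // Client filter: 11:00 at +02:00, which is 09:00 UTC (before the event).
        var from = new DateTimeOffset(2026, 5, 28, 11, 0, 0, TimeSpan.FromHours(2));

        string naiveFrom = from.ToString("o"); // keeps the local clock time and +02:00 suffix
        string utcFrom = ToUtcIso(from);       // converted to UTC before formatting

        // Ordinal string comparison: is the event at or after the From bound?
        bool naiveIncluded = string.CompareOrdinal(storedEvent, naiveFrom) >= 0;
        bool utcIncluded = string.CompareOrdinal(storedEvent, utcFrom) >= 0;

        Console.WriteLine(naiveIncluded); // False: event spuriously excluded
        Console.WriteLine(utcIncluded);   // True: event correctly included
    }
}
```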

TE-017: include Description on attributes/alarms and a HashableConnections
projection (protocol, endpoint JSON, failover count) in the revision hash
and DiffService; staleness detection now catches description-only and
connection-endpoint edits.

Transport-001 and Transport-002 (also High) remain Open; they are deferred
to a follow-up batch because both touch BundleImporter.cs, so the two
changes must be applied one after the other.
Joseph Doherty
2026-05-28 05:40:15 -04:00
parent f936f55f51
commit ac96b83b08
38 changed files with 852 additions and 1729 deletions
@@ -116,30 +116,13 @@ public class CentralCommunicationActorTests : TestKit
         ExpectNoMsg(TimeSpan.FromMilliseconds(200));
     }
-    [Fact]
-    public void ConnectionLost_DebugStreamsKilled()
-    {
-        var site = CreateSite("site1", "akka.tcp://scadalink@host:8082");
-        var (actor, _, siteProbes) = CreateActorWithMockRepo(new[] { site });
-        // Wait for auto-refresh
-        Thread.Sleep(1000);
-        // Subscribe to debug view (tracks the subscription)
-        var subscriberProbe = CreateTestProbe();
-        var subRequest = new SubscribeDebugViewRequest("inst1", "corr-123");
-        actor.Tell(new SiteEnvelope("site1", subRequest), subscriberProbe.Ref);
-        // The ClusterClient probe receives the routed message
-        siteProbes["site1"].ExpectMsg<ClusterClient.Send>();
-        // Simulate site disconnection
-        actor.Tell(new ConnectionStateChanged("site1", false, DateTimeOffset.UtcNow));
-        // The subscriber should receive a DebugStreamTerminated notification
-        subscriberProbe.ExpectMsg<DebugStreamTerminated>(
-            msg => msg.SiteId == "site1" && msg.CorrelationId == "corr-123");
-    }
+    // Communication-016: the prior `ConnectionLost_DebugStreamsKilled` test was
+    // removed alongside the dead HandleConnectionStateChanged handler. No
+    // production code ever emitted ConnectionStateChanged, so the test was
+    // exercising a workflow that never ran. Disconnect detection is owned by
+    // the gRPC keepalive (DebugStreamBridgeActor self-terminates) and by the
+    // Ask-timeout path at the CommunicationService layer (deploy callers see
+    // a failure).
     [Fact]
     public void Heartbeat_BumpsAggregatorTimestamp()