fix(high-severity): close 8 of 10 open High findings across 8 modules

Comm-016: delete dead HandleConnectionStateChanged + _debugSubscriptions /
_inProgressDeployments tracking + ConnectionStateChanged message record.
Disconnect detection is owned by the transport layers (gRPC keepalive PING
~25s; Ask-timeout at CommunicationService). Updates the
Component-Communication.md design doc to make that explicit.

SnF-018: NotificationForwarder.DeliverAsync now discards a corrupt buffered
payload (Warning log + return true) instead of returning false and parking
the row — honoring the design's "notifications do not park" invariant.

DM-018: reconciliation no longer force-sets Enabled, preserving an
intentional Disabled state after central failover.

ESG-018: DeliverBufferedAsync (both ExternalSystemClient + DatabaseGateway)
catches JsonException and returns false, turning a corrupt buffered row
into a parked operation instead of a retry-forever poison message.

InboundAPI-022: register ActiveNodeGate as IActiveNodeGate in the Central
DI branch so standby-node gating is actually wired up in production.

NS-019: remove orphaned NotificationDeliveryService /
INotificationDeliveryService / NotificationResult; central notification
delivery now lives entirely in NotificationOutbox.

SEL-016: normalise From/To filters to UTC before ISO-string compare so
non-UTC DateTimeOffset clients no longer get spuriously excluded events.

TE-017: include Description on attributes/alarms and a HashableConnections
projection (protocol, endpoint JSON, failover count) in the revision hash
and DiffService; staleness detection now catches description-only and
connection-endpoint edits.

Transport-001 and Transport-002 (also High) remain Open — they're being
handled in a follow-up batch because both touch BundleImporter.cs and
must serialise.
Joseph Doherty
2026-05-28 05:40:15 -04:00
parent f936f55f51
commit ac96b83b08
38 changed files with 852 additions and 1729 deletions
+17 -1
@@ -774,9 +774,25 @@ than being masked by an endpoint-agnostic mock.
|--|--|
| Severity | High |
| Category | Design-document adherence |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:169`, `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:338-375` |
**Resolution** — deleted the dead code path in favour of the keepalive-based
detection that is the actual production behaviour: removed the
`Receive<ConnectionStateChanged>` handler, the `HandleConnectionStateChanged`
method, the `_debugSubscriptions` / `_inProgressDeployments` tracking dicts
+ the `TrackMessageForCleanup` helper that fed them, and the dead message
record `src/ScadaLink.Commons/Messages/Communication/ConnectionStateChanged.cs`.
The two dead tests (`ConnectionLost_DebugStreamsKilled` in
CentralCommunicationActorTests, `RoundTrip_ConnectionStateChanged_Succeeds`
in CompatibilityTests) were removed alongside. The design doc
`docs/requirements/Component-Communication.md` "Connection Failure Behavior"
section was updated to state explicitly that disconnect is detected at the
transport layer (gRPC keepalive PING ~25 s for debug streams; Ask-timeout
at the CommunicationService layer for command/control), with no
application-level signal. `DebugStreamTerminated` survives because
`DebugStreamBridgeActor` uses it for an unrelated intra-actor stop signal.
**Description**
`CentralCommunicationActor.HandleConnectionStateChanged` is wired to
+3 -1
@@ -912,9 +912,11 @@ would be meaningless).
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:675-682,721-748` |
**Resolution** — Added a `forceEnabledState` parameter to `ApplyPostSuccessSideEffectsAsync`. The normal deploy path passes `true` (fresh apply legitimately ends in `Enabled`); the reconciliation path passes `false`, so the helper only promotes `NotDeployed → Enabled` and leaves an existing `Disabled` (or `Enabled`) untouched. Regression test `DeployInstanceAsync_Reconciled_DisabledInstance_PreservesDisabledState` exercises the failover scenario and asserts the prior record still flips to `Success` while `Instance.State` stays `Disabled`.
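The state rule described in this resolution can be sketched as a pure function. This is an illustrative reduction rather than the project's code: the `PostSuccessState` class and its `Resolve` method are hypothetical names, and the enum lists only the three states named in this finding.

```csharp
// Illustrative only: the three instance states mentioned in this finding.
public enum InstanceState { NotDeployed, Enabled, Disabled }

public static class PostSuccessState
{
    // Hypothetical helper mirroring the rule in the resolution:
    // a normal deploy (forceEnabledState = true) always ends in Enabled;
    // reconciliation (forceEnabledState = false) only promotes NotDeployed.
    public static InstanceState Resolve(InstanceState current, bool forceEnabledState) =>
        forceEnabledState || current == InstanceState.NotDeployed
            ? InstanceState.Enabled
            : current;
}
```

Under this sketch, `Resolve(InstanceState.Disabled, false)` stays `Disabled`, while `Resolve(InstanceState.Disabled, true)` and `Resolve(InstanceState.NotDeployed, false)` both yield `Enabled`.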
**Description**
`TryReconcileWithSiteAsync` calls `ApplyPostSuccessSideEffectsAsync` whenever
@@ -1003,9 +1003,11 @@ captured request URI has no trailing `?`; it was verified to fail before the fix
|--|--|
| Severity | High |
| Category | Error handling & resilience |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:176`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:151` |
**Resolution** — Wrapped the `JsonSerializer.Deserialize<...>(message.PayloadJson)` call in both `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync` in a `try`/`catch (JsonException)` block. A `JsonException` is by definition permanent (the same payload bytes always deserialize identically), so the catch branch logs at `LogError` and returns `false`, parking the message via the S&F engine instead of letting it throw and be retried as a transient failure. Regression tests `DeliverBuffered_MalformedJsonPayload_ReturnsFalseSoMessageParks` were added to both `ExternalSystemClientTests` and `DatabaseGatewayTests` — each feeds a truncated `PayloadJson` to the handler and asserts `delivered == false` and that no exception escapes.
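A minimal sketch of the pattern this resolution describes, assuming a handler whose `false` return means "park the message". `CachedWrite`, `BufferedDelivery`, and `TryDeliver` are hypothetical names, and treating a `null` deserialization result as a permanent failure is an assumption made here, not something the resolution states.

```csharp
using System;
using System.Text.Json;

// Hypothetical payload shape, used only for this illustration.
public sealed record CachedWrite(string Target, string Value);

public static class BufferedDelivery
{
    // Returns true when the payload was handed to 'send',
    // false when the message should be parked instead of retried.
    public static bool TryDeliver(string payloadJson, Action<CachedWrite> send)
    {
        CachedWrite? payload;
        try
        {
            payload = JsonSerializer.Deserialize<CachedWrite>(payloadJson);
        }
        catch (JsonException)
        {
            // The same bytes fail the same way on every attempt,
            // so a retry cannot succeed: report a permanent failure.
            return false;
        }

        // Assumption: a literal JSON null is also treated as undeliverable.
        if (payload is null) return false;

        send(payload);
        return true;
    }
}
```

Feeding a truncated document such as `{"Target":"a"` to `TryDeliver` returns `false` without invoking `send` and without letting the exception escape.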
**Description**
Both `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync`
+3 -1
@@ -1061,9 +1061,11 @@ that an attribute read/write carries the inherited `ParentExecutionId`.
|--|--|
| Severity | High |
| Category | Security |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.InboundAPI/IActiveNodeGate.cs`, `src/ScadaLink.InboundAPI/InboundApiEndpointFilter.cs:52-60`; absent from `src/ScadaLink.Host/Program.cs` |
**Resolution** — Added `src/ScadaLink.Host/Health/ActiveNodeGate.cs`, a production `IActiveNodeGate` implementation backed by `AkkaHostedService` that mirrors `ActiveNodeHealthCheck`'s leadership probe (member status `Up` AND `Cluster.State.Leader == SelfAddress`), and registered it as a singleton in the central-role branch of `Program.cs`. A structural regression test (`CentralCompositionRootTests.Central_IActiveNodeGate_IsRegisteredAsActiveNodeGate`) reflects over the built `IServiceProvider` to assert the registration's existence and concrete type — failing on `main` and passing after the fix. The `InboundApiEndpointFilter`'s fall-through-to-allow behaviour is retained as the documented safe default for non-clustered hosts and tests.
**Description**
InboundAPI-008's resolution adds `IActiveNodeGate` (lines 1724 of
+3 -1
@@ -647,9 +647,11 @@ Resolved 2026-05-17. All three issues confirmed against source. The hand-rolled
|--|--|
| Severity | High |
| Category | Design-document adherence |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:18-442`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:20-21`, `src/ScadaLink.Commons/Interfaces/Services/INotificationDeliveryService.cs:1-33`, `src/ScadaLink.Host/Program.cs:77` |
**Resolution** — Executed option 1. Deleted `src/ScadaLink.NotificationService/NotificationDeliveryService.cs`, `src/ScadaLink.Commons/Interfaces/Services/INotificationDeliveryService.cs` (also retires `NotificationResult` + `BufferedNotification`), and the orphaned `tests/ScadaLink.NotificationService.Tests/NotificationDeliveryServiceTests.cs` suite; reduced `AddNotificationService` to the shared SMTP primitives (`OAuth2TokenService`, `Func<ISmtpClientWrapper>`, `NotificationOptions`), updated `CompositionRootTests` (assert the primitives instead of the dead types), and removed the `Notification_Send_MockSmtp_Delivers` assertion in `IntegrationSurfaceTests` (central delivery is covered by `EmailNotificationDeliveryAdapterTests`). Grep-verified `grep -rn "INotificationDeliveryService\|NotificationDeliveryService\|NotificationResult\|BufferedNotification\|DeliverBufferedAsync" --include="*.cs" src/ tests/` before delete: zero production callers (only XML-doc cross-references in NS, MailKit wrapper, NotificationOptions and `EmailNotificationDeliveryAdapter`, plus the dead test files); cross-reference comments updated to remove the stale class references. `dotnet build ScadaLink.slnx` succeeds (0 warnings, 0 errors); affected test projects all pass (`NotificationService.Tests` 52/52, `NotificationOutbox.Tests` 86/86 on rerun — one flaky timing-sensitive Akka.TestKit test unrelated to NS-019, `Host.Tests` 205/205); `IntegrationTests` 64/66 with two pre-existing failures in `NotificationOutboxFlowTests` (SQLite "near IF: syntax error", reproducible on pristine `main`, unrelated to NS-019).
**Description**
The updated `Component-NotificationService.md` (re-read in full at this commit) makes the new design unambiguous: "The Notification Service is the central component that manages notification-list and SMTP definitions and provides the per-type delivery adapters used to send notifications. … Notification delivery has been inverted: a site script's notification is store-and-forwarded to the central cluster, and the central **Notification Outbox** owns dispatch and delivery, calling an `INotificationDeliveryAdapter` supplied by this component." The doc explicitly states the service is "central cluster only", "no longer present at site clusters", and "no longer delivers notifications from sites".
+3 -1
@@ -793,9 +793,11 @@ chosen policy on `ISiteEventLogger.LogEventAsync`.
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.SiteEventLogging/EventLogQueryService.cs:67-77`, `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:159`, `src/ScadaLink.SiteEventLogging/EventLogPurgeService.cs:72-78` |
**Resolution** — `EventLogQueryService.ExecuteQuery` now calls `.ToUniversalTime()` on `request.From`/`request.To` before `ToString("o")`, so the produced ISO 8601 string always ends in `+00:00` and lexicographically matches the UTC timestamps written by `SiteEventLogger`. `EventLogPurgeService.PurgeByRetention` was also made defensive with an explicit `.ToUniversalTime()` on the cutoff. A regression test (`Query_FiltersByTimeRange_HandlesNonUtcOffset`) constructs a `+05:00` `DateTimeOffset` and asserts the matching UTC-stored events are returned and out-of-range ones are excluded.
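The effect of normalising before the string compare can be reproduced in isolation. The sketch below is an assumption-laden illustration: `TimeFilter`, `ToBound`, and `IsAtOrAfter` are hypothetical names, and the ordinal string comparison stands in for whatever comparison the query actually performs on the stored ISO 8601 text.

```csharp
using System;

public static class TimeFilter
{
    // Formats a filter bound as a round-trip ("o") ISO 8601 string,
    // optionally converting it to UTC first.
    public static string ToBound(DateTimeOffset value, bool normalise) =>
        (normalise ? value.ToUniversalTime() : value).ToString("o");

    // Lexicographic "stored >= bound" check on ISO strings. This is only
    // chronologically correct when both strings carry the same offset.
    public static bool IsAtOrAfter(string storedIso, string boundIso) =>
        string.CompareOrdinal(storedIso, boundIso) >= 0;
}
```

Take an event stored at 06:00 UTC and a `From` bound of 10:00 at `+05:00` (which is 05:00 UTC). The event is after the bound, yet comparing against the un-normalised string `2026-05-28T10:00:00.0000000+05:00` rejects it because `"06"` sorts before `"10"`. Comparing against the normalised `2026-05-28T05:00:00.0000000+00:00` accepts it.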
**Description**
Event rows are persisted with `timestamp` = `DateTimeOffset.UtcNow.ToString("o")`,
+11 -1
@@ -991,9 +991,19 @@ the StoreAndForward-016 replication) — and pass it to `RaiseActivity` (falling
|--|--|
| Severity | High |
| Category | Design-document adherence |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/NotificationForwarder.cs:62`–`:69`, `:105`–`:122`; `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:369`–`:397` |
**Resolution** — `NotificationForwarder.DeliverAsync` now discards a corrupt
buffered payload instead of returning false. The corrupt path logs a Warning
with the buffered row id + length-capped payload preview via an injected
`ILogger<NotificationForwarder>` (NullLogger by default for back-compat),
then returns true so the S&F engine clears the row via its standard
success-path cleanup — honoring the "notifications do not park" design
invariant. Two regression tests in `NotificationForwarderTests` cover the
two corrupt shapes (invalid JSON, `null` deserialisation) and pin that
nothing is forwarded to central in either case.
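For contrast with handlers that park on corruption, the discard behaviour described here can be sketched as follows. Everything named in the block (`NotificationPayload`, `NotificationDelivery`, `Deliver`, the 64-character preview cap, the `warn` callback standing in for a logger) is a hypothetical stand-in chosen for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical payload shape, used only for this illustration.
public sealed record NotificationPayload(string ListName, string Subject);

public static class NotificationDelivery
{
    private const int PreviewLength = 64; // assumed cap for the logged preview

    // Always returns true: a valid payload is forwarded, a corrupt one is
    // logged and reported as handled so the buffered row is cleared.
    public static bool Deliver(
        string payloadJson,
        Action<NotificationPayload> forward,
        Action<string> warn)
    {
        NotificationPayload? payload = null;
        try
        {
            payload = JsonSerializer.Deserialize<NotificationPayload>(payloadJson);
        }
        catch (JsonException)
        {
            // Invalid JSON: leave payload null and take the discard path.
        }

        if (payload is null)
        {
            var preview = payloadJson.Length <= PreviewLength
                ? payloadJson
                : payloadJson[..PreviewLength];
            warn($"Discarding corrupt notification payload: {preview}");
            return true; // discard, never park
        }

        forward(payload);
        return true;
    }
}
```

Both corrupt shapes named in the resolution (invalid JSON and a `null` deserialization result) take the same branch in this sketch: one warning, nothing forwarded, and a `true` return.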
**Description**
The Component design doc explicitly carves out notifications from the parking lifecycle:
+16 -2
@@ -843,9 +843,11 @@ resolves against the real parent module. Regression test:
|--|--|
| Severity | High |
| Category | Design-document adherence |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.TemplateEngine/Flattening/RevisionHashService.cs:128`, `src/ScadaLink.TemplateEngine/Flattening/RevisionHashService.cs:156`, `src/ScadaLink.TemplateEngine/Flattening/RevisionHashService.cs:42`, `src/ScadaLink.TemplateEngine/Flattening/DiffService.cs:110`, `src/ScadaLink.TemplateEngine/Flattening/DiffService.cs:118` |
**Resolution** — Added `Description` to `HashableAttribute` and `HashableAlarm` (placed alphabetically per the determinism contract) and introduced a `HashableConnection` projection plus a `SortedDictionary<string, HashableConnection> Connections` field on `HashableConfiguration` that captures protocol, primary/backup JSON, and failover retry count for every deployed connection. `DiffService.AttributesEqual` and `AlarmsEqual` now compare `Description`, and a new public `ConnectionsEqual` helper covers connection-endpoint drift so callers can detect the change in the same shape used by the other entity comparators. Regression tests `ComputeHash_AttributeDescriptionEdit_ChangesHash`, `ComputeHash_AlarmDescriptionEdit_ChangesHash`, `ComputeHash_ConnectionEndpointEdit_ChangesHash`, and `ConnectionsEqual_EndpointEdit_ReturnsFalse` lock the behaviour in.
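Why a field must be part of the hashed projection to influence staleness detection can be shown with a small sketch. This is not the `RevisionHashService` implementation: `HashableAttr`, `RevisionHash`, and `Compute` are hypothetical names, and JSON serialization followed by SHA-256 is an assumed hashing scheme chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical projection; properties are declared alphabetically so the
// serialized form has a stable member order.
public sealed record HashableAttr(string DataType, string? Description, string Name);

public static class RevisionHash
{
    // Serializes a key-sorted map and hashes the UTF-8 bytes.
    // Any property on the projection, including Description,
    // contributes to the resulting digest.
    public static string Compute(SortedDictionary<string, HashableAttr> attributes)
    {
        var json = JsonSerializer.Serialize(attributes);
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(json));
        return Convert.ToHexString(digest);
    }
}
```

Constructing the dictionary with `StringComparer.Ordinal` keeps the key order independent of culture. With `Description` on the projection, two otherwise identical attribute sets that differ only in description text produce different hashes. Had the property been omitted from the record, the edit would be invisible to the hash.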
**Description**
The design states the revision hash is "computed from the resolved content" and
@@ -891,7 +893,19 @@ it in `DiffService`. Add tests:
**Resolution**
-_Unresolved._
+Resolved (commit `pending`): `RevisionHashService` now folds `Description` into
the `HashableAttribute` / `HashableAlarm` projections (alphabetical placement
preserved) and adds a sorted `Connections` map of `HashableConnection`
(Protocol, ConfigurationJson, BackupConfigurationJson, FailoverRetryCount) on
`HashableConfiguration`. `DiffService.AttributesEqual` / `AlarmsEqual` compare
`Description`, and a public `ConnectionsEqual` helper covers connection drift
in the same shape as the other entity comparators. Regression tests:
`ComputeHash_AttributeDescriptionEdit_ChangesHash`,
`ComputeHash_AlarmDescriptionEdit_ChangesHash`,
`ComputeHash_ConnectionEndpointEdit_ChangesHash`,
`ConnectionsEqual_EndpointEdit_ReturnsFalse`. The diff-shape extension that
surfaces added/removed/changed connections in the UI remains tracked under
TemplateEngine-018.
### TemplateEngine-018 — `DiffService` reports no entries for added/removed/changed connections
+4 -2
@@ -224,8 +224,10 @@ The ManagementActor is registered at the well-known path `/user/management` on c
 ## Connection Failure Behavior

-- **In-flight messages**: When a connection drops while a request is in flight (e.g., deployment sent but no response received), the Akka ask pattern times out and the caller receives a failure. There is **no automatic retry or buffering at central** — the engineer sees the failure in the UI and re-initiates the action. This is consistent with the design principle that central does not buffer messages.
-- **Debug streams**: Any gRPC stream interruption triggers reconnection logic in the `DebugStreamBridgeActor`. The bridge actor attempts to reconnect to the other site node endpoint (NodeB if NodeA failed, or vice versa), with up to 3 retries and 5-second backoff. If all retries fail, the consumer is notified via `OnStreamTerminated` and the bridge actor is stopped. Events during the reconnection gap are lost (acceptable for real-time debug view). On successful reconnection, the consumer can request a fresh snapshot to re-sync state.
+Disconnect is detected at the **transport layer**, never via an application-level signal from central. There is no `ConnectionStateChanged`-style synchronous notification: the central coordinator does not maintain a model of "this site is up / down" because the two transports already report unavailability at their natural cadence.
+
+- **In-flight command/control messages (ClusterClient + Ask)**: When a connection drops while a request is in flight (e.g., a deployment sent but no response received), the Akka ask pattern times out and the caller receives a failure. There is **no automatic retry or buffering at central** — the engineer sees the failure in the UI and re-initiates the action. This is consistent with the design principle that central does not buffer messages. An in-progress deployment whose round-trip exceeds the Ask timeout (default 120 s at `CommunicationService.DeployInstanceAsync`) surfaces as `DeploymentStatus.Failed` to the caller.
+- **Debug streams (gRPC)**: Any gRPC stream interruption is detected by the HTTP/2 keepalive PING (~25 s) and triggers reconnection logic in the `DebugStreamBridgeActor`. The bridge actor attempts to reconnect to the other site node endpoint (NodeB if NodeA failed, or vice versa), with up to 3 retries and 5-second backoff. If all retries fail, the consumer is notified via `OnStreamTerminated` and the bridge actor is stopped. Events during the reconnection gap are lost (acceptable for real-time debug view). On successful reconnection, the consumer can request a fresh snapshot to re-sync state.

 ## Failover Behavior
@@ -1,32 +0,0 @@
-namespace ScadaLink.Commons.Interfaces.Services;
-
-/// <summary>
-/// Interface for sending notifications.
-/// Implemented by NotificationService, consumed by ScriptRuntimeContext.
-/// </summary>
-public interface INotificationDeliveryService
-{
-    /// <summary>
-    /// Sends a notification to a named list. Transient failures go to S&amp;F.
-    /// Permanent failures returned to caller.
-    /// </summary>
-    /// <param name="listName">Name of the notification list to deliver to.</param>
-    /// <param name="subject">Subject line of the notification.</param>
-    /// <param name="message">Plain-text body of the notification.</param>
-    /// <param name="originInstanceName">Optional name of the instance that triggered the send.</param>
-    /// <param name="cancellationToken">Cancellation token for the async operation.</param>
-    Task<NotificationResult> SendAsync(
-        string listName,
-        string subject,
-        string message,
-        string? originInstanceName = null,
-        CancellationToken cancellationToken = default);
-}
-
-/// <summary>
-/// Result of a notification send attempt.
-/// </summary>
-public record NotificationResult(
-    bool Success,
-    string? ErrorMessage,
-    bool WasBuffered = false);
@@ -1,6 +0,0 @@
-namespace ScadaLink.Commons.Messages.Communication;
-
-public record ConnectionStateChanged(
-    string SiteId,
-    bool IsConnected,
-    DateTimeOffset Timestamp);
@@ -60,17 +60,18 @@ public class CentralCommunicationActor : ReceiveActor
     /// </summary>
     private readonly Dictionary<string, (IActorRef Client, ImmutableHashSet<string> ContactAddresses)> _siteClients = new();

-    /// <summary>
-    /// Tracks active debug view subscriptions: correlationId → (siteId, subscriber).
-    /// Used to kill debug streams on site disconnection (WP-5).
-    /// </summary>
-    private readonly Dictionary<string, (string SiteId, IActorRef Subscriber)> _debugSubscriptions = new();
-
-    /// <summary>
-    /// Tracks in-progress deployments: deploymentId → siteId.
-    /// On central failover, in-progress deployments are treated as failed (WP-5).
-    /// </summary>
-    private readonly Dictionary<string, string> _inProgressDeployments = new();
+    // Communication-016: the previous _debugSubscriptions / _inProgressDeployments
+    // dictionaries existed solely to support a documented "synchronous kill streams +
+    // mark deployments failed on site disconnect" workflow triggered by
+    // ConnectionStateChanged. No production code ever emitted that message — only
+    // the unit test did — so the workflow was dead from end to end. Disconnect
+    // detection is owned by the underlying transports: the gRPC keepalive PING
+    // signals stream interruption in ~25s (handled by DebugStreamBridgeActor's own
+    // reconnection logic), and an Ask round-trip for a deploy times out at the
+    // CommunicationService layer (caller sees failure). The tracking dicts +
+    // ConnectionStateChanged record + HandleConnectionStateChanged handler are
+    // removed; see docs/requirements/Component-Communication.md "Connection
+    // Failure Behavior" for the keepalive-based contract that survives.

     private ICancelable? _refreshSchedule;
@@ -165,9 +166,6 @@ public class CentralCommunicationActor : ReceiveActor
         Receive<SiteHealthReportReplica>(r => ProcessLocally(r.Report));
         Receive<SubscribeAck>(_ => { /* DistributedPubSub subscribe confirmation */ });

-        // Connection state changes
-        Receive<ConnectionStateChanged>(HandleConnectionStateChanged);
-
         // Route enveloped messages to sites
         Receive<SiteEnvelope>(HandleSiteEnvelope);
@@ -335,44 +333,10 @@ public class CentralCommunicationActor : ReceiveActor
         }
     }

-    private void HandleConnectionStateChanged(ConnectionStateChanged msg)
-    {
-        if (!msg.IsConnected)
-        {
-            _log.Warning("Site {0} disconnected at {1}", msg.SiteId, msg.Timestamp);
-
-            // WP-5: Kill active debug streams for the disconnected site
-            var toRemove = _debugSubscriptions
-                .Where(kvp => kvp.Value.SiteId == msg.SiteId)
-                .ToList();
-            foreach (var kvp in toRemove)
-            {
-                _log.Info("Killing debug stream {0} for disconnected site {1}", kvp.Key, msg.SiteId);
-                kvp.Value.Subscriber.Tell(new DebugStreamTerminated(msg.SiteId, kvp.Key));
-                _debugSubscriptions.Remove(kvp.Key);
-            }
-
-            // WP-5: Mark in-progress deployments as failed
-            var failedDeployments = _inProgressDeployments
-                .Where(kvp => kvp.Value == msg.SiteId)
-                .Select(kvp => kvp.Key)
-                .ToList();
-            foreach (var deploymentId in failedDeployments)
-            {
-                _log.Warning("Deployment {0} to site {1} treated as failed due to disconnection",
-                    deploymentId, msg.SiteId);
-                _inProgressDeployments.Remove(deploymentId);
-            }
-
-            // Note: Do NOT stop the ClusterClient — it handles reconnection internally
-        }
-        else
-        {
-            _log.Info("Site {0} connected at {1}", msg.SiteId, msg.Timestamp);
-        }
-    }
+    // Communication-016: HandleConnectionStateChanged removed — no production
+    // caller emitted ConnectionStateChanged, so the workflow ran only in tests.
+    // Disconnect detection is owned by the transport layers (gRPC keepalive +
+    // ClusterClient/Ask timeout).

     private void HandleSiteEnvelope(SiteEnvelope envelope)
     {
@@ -385,9 +349,6 @@ public class CentralCommunicationActor : ReceiveActor
             return;
         }

-        // Track debug subscriptions for cleanup on disconnect
-        TrackMessageForCleanup(envelope);
-
         // Route via ClusterClient — Sender is preserved for Ask response routing
         entry.Client.Tell(
             new ClusterClient.Send("/user/site-communication", envelope.Message),
@@ -485,23 +446,8 @@ public class CentralCommunicationActor : ReceiveActor
         _log.Info("Site ClusterClient cache refreshed with {0} site(s)", _siteClients.Count);
     }

-    private void TrackMessageForCleanup(SiteEnvelope envelope)
-    {
-        switch (envelope.Message)
-        {
-            case Commons.Messages.DebugView.SubscribeDebugViewRequest sub:
-                _debugSubscriptions[sub.CorrelationId] = (envelope.SiteId, Sender);
-                break;
-            case Commons.Messages.DebugView.UnsubscribeDebugViewRequest unsub:
-                _debugSubscriptions.Remove(unsub.CorrelationId);
-                break;
-            case Commons.Messages.Deployment.DeployInstanceCommand deploy:
-                _inProgressDeployments[deploy.DeploymentId] = envelope.SiteId;
-                break;
-        }
-    }
+    // Communication-016: TrackMessageForCleanup removed — the dicts it fed
+    // existed solely to support the dead ConnectionStateChanged workflow.

     /// <inheritdoc />
     protected override SupervisorStrategy SupervisorStrategy()
@@ -547,11 +493,8 @@ public class CentralCommunicationActor : ReceiveActor
     /// <inheritdoc />
     protected override void PostStop()
     {
-        _log.Info("CentralCommunicationActor stopped. In-progress deployments treated as failed (WP-5).");
+        _log.Info("CentralCommunicationActor stopped");
         _refreshSchedule?.Cancel();
-        // On central failover, all in-progress deployments are failed
-        _inProgressDeployments.Clear();
-        _debugSubscriptions.Clear();
     }
 }
@@ -228,7 +228,8 @@ public class DeploymentService
                 // logged loudly for operator reconciliation but must not flip
                 // the already-committed Success record back to Failed.
                 await ApplyPostSuccessSideEffectsAsync(
-                    instance, deploymentId, revisionHash, configJson, cancellationToken);
+                    instance, deploymentId, revisionHash, configJson,
+                    forceEnabledState: true, cancellationToken);
             }

             // Audit log
@@ -677,8 +678,22 @@ public class DeploymentService
         // the instance State to Enabled and store/refresh the deployed
         // config snapshot — otherwise the central state machine and the
         // deployed-snapshot invariant diverge from what the site is running.
+        //
+        // DeploymentManager-018: the reconciliation path runs only when the
+        // prior record is InProgress or timeout-Failed — exactly the cases
+        // that survive a central failover. The in-memory operation lock is
+        // lost on failover, so an operator may have legitimately invoked
+        // Disable on the instance between the original timed-out deploy and
+        // this redeploy. Disable does not change the deployed config, so the
+        // site still reports the target revision hash. Reconciliation must
+        // therefore PRESERVE an intentional Disabled state instead of
+        // silently flipping it back to Enabled — pass forceEnabledState:
+        // false so the helper only promotes NotDeployed → Enabled (the
+        // first-deploy-timed-out case) and leaves an explicit Disabled
+        // alone.
         await ApplyPostSuccessSideEffectsAsync(
-            instance, prior.DeploymentId, targetRevisionHash, configJson, cancellationToken);
+            instance, prior.DeploymentId, targetRevisionHash, configJson,
+            forceEnabledState: false, cancellationToken);

         await _auditService.LogAsync(prior.DeployedBy, "DeployReconciled", "Instance",
             instance.Id.ToString(), instance.UniqueName,
@@ -713,6 +728,19 @@ public class DeploymentService
     /// deployed config snapshot (WP-8). Factored into one helper so the two
     /// paths cannot drift (DeploymentManager-015).
     ///
+    /// DeploymentManager-018: <paramref name="forceEnabledState"/> distinguishes
+    /// the two callers. The normal deploy path passes <c>true</c> — a fresh
+    /// successful apply legitimately puts the instance into <see cref="InstanceState.Enabled"/>
+    /// (the documented "Deploy on a Disabled instance also enables it" semantics
+    /// of <see cref="StateTransitionValidator"/>). The reconciliation path
+    /// passes <c>false</c>: it is reconciling a *prior* deployment that may
+    /// have completed before the current operator session (central failover
+    /// loses the in-memory operation lock, so an operator may have legitimately
+    /// Disabled the instance in between). On that path we only promote
+    /// <see cref="InstanceState.NotDeployed"/> → <see cref="InstanceState.Enabled"/>
+    /// (the first-deploy-timed-out case) and leave an explicit Disabled alone,
+    /// so reconciliation never silently undoes a Disable.
+    ///
     /// Best-effort: the deployment record's terminal <see cref="DeploymentStatus.Success"/>
     /// status is already committed by the caller before this runs. A failure
     /// here is logged loudly for operator reconciliation but is NOT propagated —
@@ -723,12 +751,20 @@ public class DeploymentService
 string deploymentId,
 string revisionHash,
 string configJson,
+bool forceEnabledState,
 CancellationToken cancellationToken)
 {
 try
 {
-// WP-4: Update instance state to Enabled on successful deployment
+// WP-4: Update instance state to Enabled on successful deployment.
+// DeploymentManager-018: on the reconciliation path
+// (forceEnabledState=false) only promote NotDeployed → Enabled,
+// preserving an intentional Disabled state set between the original
+// timed-out deploy and the redeploy.
+if (forceEnabledState || instance.State == InstanceState.NotDeployed)
+{
-instance.State = InstanceState.Enabled;
+instance.State = InstanceState.Enabled;
+}
 await _repository.UpdateInstanceAsync(instance, cancellationToken);
 // WP-8: Store deployed config snapshot
@@ -148,7 +148,26 @@ public class DatabaseGateway : IDatabaseGateway
 public async Task<bool> DeliverBufferedAsync(
 StoreAndForwardMessage message, CancellationToken cancellationToken = default)
 {
-var payload = JsonSerializer.Deserialize<CachedWritePayload>(message.PayloadJson);
+// ExternalSystemGateway-018: a malformed (not just empty/null-fielded)
+// PayloadJson would otherwise throw `JsonException` here, which the S&F
+// engine treats as a transient failure and retries forever (poison
+// message). Re-running the same deserialization against the same payload
+// will throw deterministically, so JsonException is permanent — log,
+// and return false so the S&F engine parks the message instead.
+CachedWritePayload? payload;
+try
+{
+payload = JsonSerializer.Deserialize<CachedWritePayload>(message.PayloadJson);
+}
+catch (JsonException ex)
+{
+_logger.LogError(
+ex,
+"Buffered CachedDbWrite message {Id} has malformed JSON payload; parking.",
+message.Id);
+return false;
+}
 if (payload == null || string.IsNullOrEmpty(payload.ConnectionName) || string.IsNullOrEmpty(payload.Sql))
 {
 _logger.LogError("Buffered CachedDbWrite message {Id} has an unreadable payload; parking.", message.Id);
@@ -173,7 +173,26 @@ public class ExternalSystemClient : IExternalSystemClient
 public async Task<bool> DeliverBufferedAsync(
 StoreAndForwardMessage message, CancellationToken cancellationToken = default)
 {
-var payload = JsonSerializer.Deserialize<CachedCallPayload>(message.PayloadJson);
+// ExternalSystemGateway-018: a malformed (not just empty/null-fielded)
+// PayloadJson would otherwise throw `JsonException` here, which the S&F
+// engine treats as a transient failure and retries forever (poison
+// message). Re-running the same deserialization against the same payload
+// will throw deterministically, so JsonException is permanent — log,
+// and return false so the S&F engine parks the message instead.
+CachedCallPayload? payload;
+try
+{
+payload = JsonSerializer.Deserialize<CachedCallPayload>(message.PayloadJson);
+}
+catch (JsonException ex)
+{
+_logger.LogError(
+ex,
+"Buffered ExternalSystem message {Id} has malformed JSON payload; parking.",
+message.Id);
+return false;
+}
 if (payload == null || string.IsNullOrEmpty(payload.SystemName) || string.IsNullOrEmpty(payload.MethodName))
 {
 _logger.LogError("Buffered ExternalSystem message {Id} has an unreadable payload; parking.", message.Id);
@@ -0,0 +1,57 @@
using Akka.Cluster;
using ScadaLink.Host.Actors;
using ScadaLink.InboundAPI;
namespace ScadaLink.Host.Health;
/// <summary>
/// InboundAPI-008 / InboundAPI-022: production implementation of
/// <see cref="IActiveNodeGate"/> backed by the running Akka.NET cluster.
///
/// The inbound API is "Central cluster only (active node)" — a standby central
/// node must not execute method scripts or <c>Route.To()</c> calls. This gate
/// mirrors the leadership check in <see cref="ActiveNodeHealthCheck"/> (the
/// node is the cluster leader, <see cref="MemberStatus.Up"/>), so
/// <see cref="InboundApiEndpointFilter"/> can return HTTP 503 on a standby.
///
/// Registered only in the Central-role branch of <c>Program.cs</c>. The gate
/// is resolved per request from <c>HttpContext.RequestServices</c>; while the
/// <c>AkkaHostedService</c> is still warming up (<c>ActorSystem == null</c>)
/// or the node has not yet reached <see cref="MemberStatus.Up"/>, this
/// implementation reports <c>IsActiveNode == false</c> — the safe-by-default
/// answer matching the standby case.
/// </summary>
public sealed class ActiveNodeGate : IActiveNodeGate
{
private readonly AkkaHostedService _akkaService;
/// <summary>Initializes a new <see cref="ActiveNodeGate"/> bound to the given Akka hosted service.</summary>
/// <param name="akkaService">The Akka hosted service exposing the cluster's <see cref="Akka.Actor.ActorSystem"/>.</param>
public ActiveNodeGate(AkkaHostedService akkaService)
{
_akkaService = akkaService;
}
/// <summary>
/// <c>true</c> only when this node has joined the cluster (<see cref="MemberStatus.Up"/>)
/// AND is the current cluster leader; <c>false</c> in every other state
/// (actor system not yet started, node still joining, node is a standby).
/// </summary>
public bool IsActiveNode
{
get
{
var system = _akkaService.ActorSystem;
if (system == null)
return false;
var cluster = Cluster.Get(system);
var self = cluster.SelfMember;
if (self.Status != MemberStatus.Up)
return false;
var leader = cluster.State.Leader;
return leader != null && leader == self.Address;
}
}
}
@@ -120,6 +120,16 @@ try
 builder.Services.AddSingleton<AkkaHostedService>();
 builder.Services.AddHostedService(sp => sp.GetRequiredService<AkkaHostedService>());
+// InboundAPI-022: register the production IActiveNodeGate implementation so
+// standby-node gating is actually enforced (the InboundApiEndpointFilter
+// consults IActiveNodeGate and defaults to "allow" when none is registered,
+// which leaves the design's "central cluster only (active node)" guarantee
+// unenforced in deployed binaries). The gate is backed by the same Akka
+// cluster-leadership check as ActiveNodeHealthCheck above, so the inbound
+// API and the /health/active endpoint Traefik routes against agree on
+// which node is active.
+builder.Services.AddSingleton<ScadaLink.InboundAPI.IActiveNodeGate, ActiveNodeGate>();
 // Cluster node status provider scoped to the Central role — feeds the
 // CentralHealthReportLoop so the central cluster appears on /monitoring/health.
 builder.Services.AddSingleton<IClusterNodeProvider>(sp =>
@@ -34,8 +34,7 @@ public static class SiteServiceRegistration
 // Sites no longer deliver notifications over SMTP — a buffered notification is
 // forwarded to the central cluster (via NotificationForwarder / SiteCommunicationActor),
 // and central owns SMTP delivery through the Notification Outbox. The SMTP machinery
-// (OAuth2TokenService, ISmtpClientWrapper, INotificationDeliveryService) has no
-// consumer on a site node.
+// (OAuth2TokenService, ISmtpClientWrapper) has no consumer on a site node.
 // Health report transport: sends SiteHealthReport to SiteCommunicationActor via Akka
 services.AddSingleton<ISiteIdentityProvider, SiteIdentityProvider>();
@@ -10,14 +10,14 @@ namespace ScadaLink.NotificationOutbox.Delivery;
 /// <summary>
 /// Task 12: Email channel delivery adapter for the central notification outbox.
 ///
-/// Reuses the <see cref="ScadaLink.NotificationService"/> SMTP machinery
+/// Reuses the <see cref="ScadaLink.NotificationService"/> SMTP primitives
 /// <see cref="ISmtpClientWrapper"/>, <see cref="SmtpTlsModeParser"/>,
 /// <see cref="OAuth2TokenService"/> and the typed <see cref="SmtpPermanentException"/>.
-/// The connect/auth/send/disconnect sequence and error classification mirror
-/// <c>NotificationDeliveryService.DeliverAsync</c>; this adapter, however, maps the
-/// result to the outbox's three-way <see cref="DeliveryOutcome"/> (Success / Permanent
-/// / Transient) rather than the S&amp;F-coupled <c>NotificationResult</c>, which cannot
-/// distinguish a permanent failure from a buffered transient one.
+/// This adapter owns the full connect/auth/send/disconnect sequence and maps the
+/// outcome to the outbox's three-way <see cref="DeliveryOutcome"/> (Success / Permanent /
+/// Transient) — the canonical central-side email delivery path. NS-019: the prior
+/// site-shaped <c>NotificationDeliveryService</c> was deleted with sites no longer
+/// delivering notifications.
 /// </summary>
 public sealed class EmailNotificationDeliveryAdapter : INotificationDeliveryAdapter
 {
@@ -44,9 +44,8 @@ public sealed class EmailNotificationDeliveryAdapter : INotificationDeliveryAdap
 _smtpClientFactory = smtpClientFactory;
 _logger = logger;
 _tokenService = tokenService;
-// Mirrors NotificationDeliveryService: NotificationOptions supplies the
-// documented fallback values used when a deployed SmtpConfiguration row
-// leaves a field unset (non-positive).
+// NotificationOptions supplies the documented fallback values used when a
+// deployed SmtpConfiguration row leaves a field unset (non-positive).
 _options = options?.Value ?? new NotificationOptions();
 }
@@ -81,7 +80,7 @@ public sealed class EmailNotificationDeliveryAdapter : INotificationDeliveryAdap
 }
 // An unknown TLS mode is a configuration error that retrying cannot fix —
-// surface it as a permanent failure (mirrors NS-005 in NotificationDeliveryService).
+// surface it as a permanent failure (NS-005 SMTP TLS validation policy).
 SmtpTlsMode tlsMode;
 try
 {
@@ -154,11 +153,9 @@ public sealed class EmailNotificationDeliveryAdapter : INotificationDeliveryAdap
 }
 /// <summary>
-/// Delivers the plain-text BCC email via SMTP. Mirrors the connect/auth/send/
-/// disconnect sequence of <c>NotificationDeliveryService.DeliverAsync</c>: a
-/// permanent failure surfaces as <see cref="SmtpPermanentException"/>; transient
-/// failures propagate for the caller's classifier; the connection is always torn
-/// down in the finally block.
+/// Delivers the plain-text BCC email via SMTP. A permanent failure surfaces as
+/// <see cref="SmtpPermanentException"/>; transient failures propagate for the
+/// caller's classifier; the connection is always torn down in the finally block.
 /// </summary>
 private async Task SendAsync(
 SmtpConfiguration config,
@@ -57,7 +57,7 @@ public class MailKitSmtpClientWrapper : ISmtpClientWrapper, IDisposable
 // worst, sending where authentication was required. Authentication being
 // skipped must never be silent: each of these is a permanent configuration
 // fault, surfaced as SmtpPermanentException so SendAsync returns a clean
-// failure and DeliverBufferedAsync parks the buffered message.
+// failure that the central Notification Outbox dispatcher classifies as permanent.
 if (string.IsNullOrEmpty(credentials))
 {
 throw new SmtpPermanentException(
@@ -1,448 +0,0 @@
using System.Text.Json;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ScadaLink.Commons.Entities.Notifications;
using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Types.Enums;
using ScadaLink.StoreAndForward;
namespace ScadaLink.NotificationService;
/// <summary>
/// WP-11: Notification delivery via SMTP.
/// WP-12: Error classification and S&amp;F integration.
/// Transient: connection refused, timeout, SMTP 4xx → hand to S&amp;F.
/// Permanent: SMTP 5xx → returned to script.
/// </summary>
public class NotificationDeliveryService : INotificationDeliveryService, IDisposable
{
private readonly INotificationRepository _repository;
private readonly Func<ISmtpClientWrapper> _smtpClientFactory;
private readonly OAuth2TokenService? _tokenService;
private readonly StoreAndForwardService? _storeAndForward;
private readonly ILogger<NotificationDeliveryService> _logger;
private readonly NotificationOptions _options;
/// <summary>
/// Initializes a new instance of the NotificationDeliveryService with the specified dependencies.
/// </summary>
/// <param name="repository">The notification repository for data access.</param>
/// <param name="smtpClientFactory">Factory for creating SMTP client instances.</param>
/// <param name="logger">Logger for diagnostic messages.</param>
/// <param name="tokenService">Optional OAuth2 token service for authentication.</param>
/// <param name="storeAndForward">Optional store-and-forward service for handling transient failures.</param>
/// <param name="options">Optional notification options with fallback values.</param>
public NotificationDeliveryService(
INotificationRepository repository,
Func<ISmtpClientWrapper> smtpClientFactory,
ILogger<NotificationDeliveryService> logger,
OAuth2TokenService? tokenService = null,
StoreAndForwardService? storeAndForward = null,
IOptions<NotificationOptions>? options = null)
{
_repository = repository;
_smtpClientFactory = smtpClientFactory;
_logger = logger;
_tokenService = tokenService;
_storeAndForward = storeAndForward;
// NS-017: NotificationOptions supplies the documented fallback values used
// when a deployed SmtpConfiguration row leaves a field unset (non-positive).
_options = options?.Value ?? new NotificationOptions();
}
/// <inheritdoc />
public async Task<NotificationResult> SendAsync(
string listName,
string subject,
string message,
string? originInstanceName = null,
CancellationToken cancellationToken = default)
{
ObjectDisposedException.ThrowIf(_disposed, this);
var list = await _repository.GetListByNameAsync(listName, cancellationToken);
if (list == null)
{
return new NotificationResult(false, $"Notification list '{listName}' not found");
}
var recipients = await _repository.GetRecipientsByListIdAsync(list.Id, cancellationToken);
if (recipients.Count == 0)
{
return new NotificationResult(false, $"Notification list '{listName}' has no recipients");
}
var smtpConfigs = await _repository.GetAllSmtpConfigurationsAsync(cancellationToken);
var smtpConfig = smtpConfigs.FirstOrDefault();
if (smtpConfig == null)
{
return new NotificationResult(false, "No SMTP configuration available");
}
// NS-005: validate the configured TLS mode up front — an unknown value is a
// configuration error and must surface as a clean result, not a silent
// fallback to opportunistic TLS negotiation.
try
{
SmtpTlsModeParser.Parse(smtpConfig.TlsMode);
}
catch (ArgumentException ex)
{
_logger.LogError("Invalid SMTP TLS mode for list {List}: {Reason}", listName, ex.Message);
return new NotificationResult(false, ex.Message);
}
// NS-008: validate every email address before attempting delivery. A single
// malformed address previously caused MailboxAddress.Parse to throw a
// ParseException that escaped SendAsync unhandled; it must instead produce a
// clean NotificationResult the calling script can handle.
var addressError = EmailAddressValidator.ValidateAddresses(smtpConfig.FromAddress, recipients);
if (addressError != null)
{
_logger.LogWarning("Notification to list {List} has invalid addresses: {Reason}", listName, addressError);
return new NotificationResult(false, addressError);
}
try
{
await DeliverAsync(smtpConfig, recipients, subject, message, cancellationToken);
return new NotificationResult(true, null);
}
catch (SmtpPermanentException ex)
{
// WP-12: Permanent SMTP failure — returned to script.
// NS-009: scrub credential fragments out of the server-supplied message
// before logging or returning it.
var detail = CredentialRedactor.Scrub(ex.Message, smtpConfig.Credentials);
_logger.LogError(
"Permanent SMTP failure sending to list {List}: {Detail}", listName, detail);
return new NotificationResult(false, $"Permanent SMTP error: {detail}");
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
// NS-002: a caller-requested cancellation propagates; it is not buffered.
throw;
}
catch (Exception ex) when (SmtpErrorClassifier.IsTransient(ex, cancellationToken))
{
// WP-12: Transient SMTP failure — hand to S&F.
// NS-009: scrub credential fragments before logging.
_logger.LogWarning(
"Transient SMTP failure sending to list {List} ({ExceptionType}): {Detail}; buffering for retry",
listName, ex.GetType().Name, CredentialRedactor.Scrub(ex.Message, smtpConfig.Credentials));
if (_storeAndForward == null)
{
return new NotificationResult(false, "Transient SMTP error and store-and-forward not available");
}
var payload = JsonSerializer.Serialize(new
{
ListName = listName,
Subject = subject,
Message = message
});
// attemptImmediateDelivery: false — DeliverAsync was already attempted
// above; letting EnqueueAsync re-invoke the handler would send twice.
await _storeAndForward.EnqueueAsync(
StoreAndForwardCategory.Notification,
listName,
payload,
originInstanceName,
smtpConfig.MaxRetries > 0 ? smtpConfig.MaxRetries : null,
smtpConfig.RetryDelay > TimeSpan.Zero ? smtpConfig.RetryDelay : null,
attemptImmediateDelivery: false);
return new NotificationResult(true, null, WasBuffered: true);
}
catch (Exception ex)
{
// NS-015: a failure that SmtpErrorClassifier does not recognise (Unknown) —
// most importantly an OAuth2 token-fetch failure (HttpRequestException
// from EnsureSuccessStatusCode, or InvalidOperationException from a
// malformed credential triple) — used to fall through all the catch
// clauses above and escape SendAsync as a raw exception to the calling
// script, which the INotificationDeliveryService contract never
// advertises. Convert any otherwise-unhandled exception into a clean,
// credential-scrubbed permanent NotificationResult: returning control to
// the script is the safe default. (A caller-requested cancellation is
// already re-thrown by the filter above and never reaches here.)
var detail = CredentialRedactor.Scrub(ex.Message, smtpConfig.Credentials);
_logger.LogError(
"Unclassified failure sending to list {List} ({ExceptionType}): {Detail}",
listName, ex.GetType().Name, detail);
return new NotificationResult(false, $"Notification delivery failed: {detail}");
}
}
/// <summary>
/// WP-11/12: Delivers a buffered notification during a store-and-forward retry
/// sweep — re-resolves the list, recipients and SMTP config and re-attempts
/// delivery. Returns true on success, false on permanent failure (the message
/// is parked); throws on a transient failure so the engine retries.
/// </summary>
/// <param name="message">The buffered store-and-forward message to deliver.</param>
/// <param name="cancellationToken">Cancellation token for the delivery attempt.</param>
public async Task<bool> DeliverBufferedAsync(
StoreAndForwardMessage message, CancellationToken cancellationToken = default)
{
var payload = JsonSerializer.Deserialize<BufferedNotification>(message.PayloadJson);
if (payload == null || string.IsNullOrEmpty(payload.ListName))
{
_logger.LogError("Buffered notification message {Id} has an unreadable payload; parking.", message.Id);
return false;
}
var list = await _repository.GetListByNameAsync(payload.ListName, cancellationToken);
if (list == null)
{
_logger.LogError(
"Buffered notification to list '{List}' cannot be delivered — the list no longer exists; parking.",
payload.ListName);
return false;
}
var recipients = await _repository.GetRecipientsByListIdAsync(list.Id, cancellationToken);
if (recipients.Count == 0)
{
_logger.LogError("Buffered notification to list '{List}' has no recipients; parking.", payload.ListName);
return false;
}
var smtpConfig = (await _repository.GetAllSmtpConfigurationsAsync(cancellationToken)).FirstOrDefault();
if (smtpConfig == null)
{
_logger.LogError("Buffered notification cannot be delivered — no SMTP configuration available; parking.");
return false;
}
// NS-005: an unknown TLS mode is a configuration error that retrying cannot
// fix — park the buffered message rather than throwing on every sweep.
try
{
SmtpTlsModeParser.Parse(smtpConfig.TlsMode);
}
catch (ArgumentException ex)
{
_logger.LogError(
"Buffered notification to list '{List}' cannot be delivered — {Reason}; parking.",
payload.ListName, ex.Message);
return false;
}
// NS-008: a malformed address cannot be fixed by retrying — park it.
var addressError = EmailAddressValidator.ValidateAddresses(smtpConfig.FromAddress, recipients);
if (addressError != null)
{
_logger.LogError(
"Buffered notification to list '{List}' has invalid addresses ({Reason}); parking.",
payload.ListName, addressError);
return false;
}
try
{
await DeliverAsync(smtpConfig, recipients, payload.Subject, payload.Message, cancellationToken);
return true;
}
catch (SmtpPermanentException ex)
{
// NS-009: scrub credential fragments out of the message before logging.
_logger.LogError(
"Buffered notification to list '{List}' failed permanently ({Detail}); parking.",
payload.ListName, CredentialRedactor.Scrub(ex.Message, smtpConfig.Credentials));
return false;
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
// A handler shutdown cancellation is neither a delivery success nor a
// permanent failure — let it propagate so the engine does not park.
throw;
}
catch (Exception ex) when (SmtpErrorClassifier.IsTransient(ex, cancellationToken))
{
// A typed transient SMTP error: re-throw so the S&F engine retries.
throw;
}
catch (Exception ex)
{
// NS-014: an exception SmtpErrorClassifier does not recognise (Unknown) —
// chiefly an OAuth2 token-fetch failure — used to escape this handler.
// The S&F engine treats ANY thrown exception as transient, so a
// permanently-broken config (bad client secret, malformed credential
// triple) was retried on every sweep until MaxRetries, burning token
// endpoint calls. Decide deliberately rather than letting it leak:
// - an HttpRequestException with a 5xx token-endpoint status is a
// transient outage → re-throw so the engine retries;
// - everything else (a 4xx/401 token rejection, a malformed credential
// InvalidOperationException, any other unclassified fault) is not
// fixable by retrying → return false so the message is parked.
if (ex is HttpRequestException { StatusCode: { } status } && (int)status is >= 500 and < 600)
{
_logger.LogWarning(
"Buffered notification to list '{List}' hit a transient OAuth2 token-endpoint error ({Status}); will retry.",
payload.ListName, (int)status);
throw;
}
_logger.LogError(
"Buffered notification to list '{List}' failed with a non-retryable error ({ExceptionType}: {Detail}); parking.",
payload.ListName, ex.GetType().Name,
CredentialRedactor.Scrub(ex.Message, smtpConfig.Credentials));
return false;
}
}
private sealed record BufferedNotification(string ListName, string Subject, string Message);
/// <summary>
/// NS-007: throttles concurrent SMTP deliveries to the configured
/// <c>MaxConcurrentConnections</c>. One SMTP config is deployed per site, so the
/// limit is a stable per-site invariant; it is captured lazily on first use.
/// NS-018: a <see cref="Lazy{T}"/> replaces the hand-rolled double-checked
/// init — its publication is correctly synchronised (no lock-free read of a
/// non-volatile field) and it is disposed in <see cref="Dispose"/>.
/// </summary>
private Lazy<SemaphoreSlim>? _concurrencyLimiter;
private readonly object _limiterLock = new();
private bool _disposed;
private SemaphoreSlim GetConcurrencyLimiter(SmtpConfiguration config)
{
// NS-018: the limiter is sized once; capture the size now so the Lazy
// factory does not close over a value that could change between calls.
var configured = config.MaxConcurrentConnections > 0
? config.MaxConcurrentConnections
// NS-017: fall back to the NotificationOptions value, then the
// design-doc default of 5, when the deployed row leaves it unset.
: _options.MaxConcurrentConnections > 0 ? _options.MaxConcurrentConnections : 5;
lock (_limiterLock)
{
ObjectDisposedException.ThrowIf(_disposed, this);
_concurrencyLimiter ??= new Lazy<SemaphoreSlim>(
() => new SemaphoreSlim(configured, configured));
return _concurrencyLimiter.Value;
}
}
/// <summary>
/// NS-018: disposes the lazily-created concurrency limiter. The service is a
/// scoped DI service; without this the <see cref="SemaphoreSlim"/> leaked a
/// handle per scope.
/// </summary>
public void Dispose()
{
lock (_limiterLock)
{
if (_disposed)
{
return;
}
_disposed = true;
if (_concurrencyLimiter is { IsValueCreated: true } limiter)
{
limiter.Value.Dispose();
}
}
GC.SuppressFinalize(this);
}
/// <summary>
/// Delivers an email via SMTP. Throws on failure (transient errors and
/// <see cref="SmtpPermanentException"/> propagate; the caller classifies them).
/// </summary>
/// <param name="config">The SMTP configuration to use for the connection.</param>
/// <param name="recipients">The list of recipients to deliver to.</param>
/// <param name="subject">The email subject line.</param>
/// <param name="body">The plain-text email body.</param>
/// <param name="cancellationToken">Cancellation token for the delivery.</param>
internal async Task DeliverAsync(
SmtpConfiguration config,
IReadOnlyList<NotificationRecipient> recipients,
string subject,
string body,
CancellationToken cancellationToken)
{
var tlsMode = SmtpTlsModeParser.Parse(config.TlsMode);
// NS-007: bound the number of concurrent SMTP connections per site.
var limiter = GetConcurrencyLimiter(config);
await limiter.WaitAsync(cancellationToken);
// NS-004: create exactly one client and dispose the one actually used.
var smtp = _smtpClientFactory();
using var disposable = smtp as IDisposable;
try
{
// NS-005/NS-007: explicit TLS mode and the configured connection timeout.
// NS-017: when the deployed SmtpConfiguration row leaves the timeout
// unset (non-positive), fall back to the NotificationOptions value.
var timeoutSeconds = config.ConnectionTimeoutSeconds > 0
? config.ConnectionTimeoutSeconds
: _options.ConnectionTimeoutSeconds;
await smtp.ConnectAsync(
config.Host, config.Port, tlsMode, timeoutSeconds, cancellationToken);
// Resolve credentials (OAuth2 token fetched/cached by the token service).
var credentials = config.Credentials;
if (config.AuthType.Equals("oauth2", StringComparison.OrdinalIgnoreCase) && _tokenService != null && credentials != null)
{
var token = await _tokenService.GetTokenAsync(credentials, cancellationToken);
credentials = token;
}
// NS-021: OAuth2 XOAUTH2 requires the user identity (FromAddress) to be
// sent alongside the access token; an empty user is rejected by M365.
await smtp.AuthenticateAsync(
config.AuthType,
credentials,
oauth2UserName: config.FromAddress,
cancellationToken: cancellationToken);
var bccAddresses = recipients.Select(r => r.EmailAddress).ToList();
await smtp.SendAsync(config.FromAddress, bccAddresses, subject, body, cancellationToken);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
// NS-002: A deliberately cancelled token must propagate as a cancellation,
// not be misclassified as a transient SMTP failure and buffered for retry.
throw;
}
catch (Exception ex) when (SmtpErrorClassifier.Classify(ex, cancellationToken) == SmtpErrorClass.Permanent
&& ex is not SmtpPermanentException)
{
// NS-003: Permanent SMTP failure (5xx) — surface a typed permanent exception.
throw new SmtpPermanentException(ex.Message, ex);
}
// Transient and SmtpPermanentException both propagate unchanged: SendAsync's
// catch filters (SmtpPermanentException / SmtpErrorClassifier.IsTransient) handle them.
finally
{
// NS-010: always tear the connection down, regardless of outcome. The
// SMTP QUIT used to run only on the success path inside the try block,
// so a failed Connect/Authenticate/Send left an open, authenticated
// connection until finalization reclaimed the socket — exhausting the
// server's connection slots under sustained transient failures.
// Disconnect is best-effort: a disconnect failure (e.g. the connection
// is already dead) must not mask the original delivery exception.
try
{
await smtp.DisconnectAsync(cancellationToken);
}
catch (Exception disconnectEx)
{
_logger.LogDebug(
"Ignoring SMTP disconnect failure during cleanup: {Reason}", disconnectEx.Message);
}
// NS-007: always release the concurrency slot, even on failure.
limiter.Release();
}
}
}
@@ -5,10 +5,10 @@ namespace ScadaLink.NotificationService;
 /// <c>ScadaLink:Notification</c> configuration section.
 ///
 /// SMTP settings are primarily carried by the deployed <c>SmtpConfiguration</c>
-/// entity. NS-017: these values are the fallback used by
-/// <see cref="NotificationDeliveryService"/> when the corresponding
-/// <c>SmtpConfiguration</c> field is left unset (non-positive) on a partially
-/// deployed row — a value present on the row always takes precedence.
+/// entity. NS-017: these values are the fallback used by the central
+/// Notification Outbox's <c>EmailNotificationDeliveryAdapter</c> when the
+/// corresponding <c>SmtpConfiguration</c> field is left unset (non-positive) on a
+/// partially deployed row — a value present on the row always takes precedence.
 /// </summary>
 public class NotificationOptions
 {
@@ -1,12 +1,17 @@
 using Microsoft.Extensions.DependencyInjection;
 using ScadaLink.Commons.Interfaces.Services;
 namespace ScadaLink.NotificationService;
 public static class ServiceCollectionExtensions
 {
 /// <summary>
-/// Registers the notification delivery services (SMTP, OAuth2 token, delivery adapter).
+/// Registers the shared SMTP delivery primitives consumed by the central Notification
+/// Outbox's <c>EmailNotificationDeliveryAdapter</c>: <see cref="NotificationOptions"/>,
+/// <see cref="OAuth2TokenService"/>, and the <see cref="ISmtpClientWrapper"/> factory.
+/// Central-only — sites no longer deliver notifications (see
+/// <c>Component-NotificationService.md</c>), and the orphaned site-shaped
+/// <c>NotificationDeliveryService</c> + <c>INotificationDeliveryService</c> contract
+/// was removed (NS-019). Notification dispatch lives in <c>ScadaLink.NotificationOutbox</c>.
 /// </summary>
 /// <param name="services">The service collection to register into.</param>
 public static IServiceCollection AddNotificationService(this IServiceCollection services)
@@ -17,8 +22,6 @@ public static class ServiceCollectionExtensions
services.AddHttpClient(); services.AddHttpClient();
services.AddSingleton<OAuth2TokenService>(); services.AddSingleton<OAuth2TokenService>();
services.AddSingleton<Func<ISmtpClientWrapper>>(_ => () => new MailKitSmtpClientWrapper()); services.AddSingleton<Func<ISmtpClientWrapper>>(_ => () => new MailKitSmtpClientWrapper());
services.AddScoped<NotificationDeliveryService>();
services.AddScoped<INotificationDeliveryService>(sp => sp.GetRequiredService<NotificationDeliveryService>());
return services; return services;
} }
@@ -26,10 +26,10 @@ public enum SmtpErrorClass
/// the numeric <see cref="SmtpStatusCode"/> rather than locale-dependent substring /// the numeric <see cref="SmtpStatusCode"/> rather than locale-dependent substring
/// matching on the exception message. /// matching on the exception message.
/// <para> /// <para>
/// Public and shared: both <see cref="NotificationDeliveryService"/> (store-and-forward /// Public and shared: the central Notification Outbox's <c>EmailNotificationDeliveryAdapter</c>
/// delivery) and the central Notification Outbox's <c>EmailNotificationDeliveryAdapter</c> /// routes every SMTP failure through this single policy. (NS-019: the orphaned site-side
/// route every SMTP failure through this single policy, so a transient/permanent /// <c>NotificationDeliveryService</c> that previously co-used this classifier was removed
/// boundary change cannot diverge between the two delivery paths. /// when sites stopped delivering notifications.)
/// </para> /// </para>
/// </summary> /// </summary>
public static class SmtpErrorClassifier public static class SmtpErrorClassifier
@@ -1,5 +1,7 @@
using System.Text.Json; using System.Text.Json;
using Akka.Actor; using Akka.Actor;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using ScadaLink.Commons.Messages.Notification; using ScadaLink.Commons.Messages.Notification;
namespace ScadaLink.StoreAndForward; namespace ScadaLink.StoreAndForward;
@@ -31,6 +33,7 @@ public sealed class NotificationForwarder
private readonly IActorRef _siteCommunicationActor; private readonly IActorRef _siteCommunicationActor;
private readonly string _sourceSiteId; private readonly string _sourceSiteId;
private readonly TimeSpan _forwardTimeout; private readonly TimeSpan _forwardTimeout;
private readonly ILogger<NotificationForwarder> _logger;
/// <param name="siteCommunicationActor"> /// <param name="siteCommunicationActor">
/// The site communication actor. It forwards a <see cref="NotificationSubmit"/> to /// The site communication actor. It forwards a <see cref="NotificationSubmit"/> to
@@ -42,14 +45,21 @@ public sealed class NotificationForwarder
/// How long to wait for central's ack before treating the forward as a transient /// How long to wait for central's ack before treating the forward as a transient
/// failure. Sourced from host configuration. /// failure. Sourced from host configuration.
/// </param> /// </param>
/// <param name="logger">
/// Optional logger. StoreAndForward-018: a corrupt buffered payload is logged at
/// Warning before being discarded so an operator has a forensic trail of the row
/// that vanished from the buffer.
/// </param>
public NotificationForwarder( public NotificationForwarder(
IActorRef siteCommunicationActor, IActorRef siteCommunicationActor,
string sourceSiteId, string sourceSiteId,
TimeSpan forwardTimeout) TimeSpan forwardTimeout,
ILogger<NotificationForwarder>? logger = null)
{ {
_siteCommunicationActor = siteCommunicationActor; _siteCommunicationActor = siteCommunicationActor;
_sourceSiteId = sourceSiteId; _sourceSiteId = sourceSiteId;
_forwardTimeout = forwardTimeout; _forwardTimeout = forwardTimeout;
_logger = logger ?? NullLogger<NotificationForwarder>.Instance;
} }
/// <summary> /// <summary>
@@ -61,11 +71,26 @@ public sealed class NotificationForwarder
/// <param name="message">The buffered store-and-forward message to deliver to central.</param> /// <param name="message">The buffered store-and-forward message to deliver to central.</param>
public async Task<bool> DeliverAsync(StoreAndForwardMessage message) public async Task<bool> DeliverAsync(StoreAndForwardMessage message)
{ {
// An unreadable payload cannot be fixed by retrying — park it (return false), // StoreAndForward-018: an unreadable payload cannot be fixed by retrying.
// mirroring how the former SMTP handler treated a corrupt buffered payload. // The design doc explicitly forbids parking notifications ("notifications do
// not park — they are retried at the fixed forward interval until central
// acks"; Component-StoreAndForward.md). The earlier behaviour returned false
// here, which the S&F engine interprets as a permanent failure and parks
// the row — contradicting the invariant and surfacing the row in the
// central UI's parked-message list. The correct outcome for a corrupt-payload
// notification is to DISCARD: log a Warning with the buffered row id +
// payload preview for forensics, then return true so the engine clears the
// buffer via its standard success-path cleanup. The buffered row is
// unrecoverable; retrying or parking would both make the queue worse, not
// better.
if (!TryBuildSubmit(message, out var submit)) if (!TryBuildSubmit(message, out var submit))
{ {
return false; _logger.LogWarning(
"Discarding corrupt buffered notification {NotificationId} (payload is not deserialisable as NotificationSubmit). " +
"Payload preview: {PayloadPreview}",
message.Id,
PreviewPayload(message.PayloadJson));
return true;
} }
// The reply may legitimately be a non-accepted ack, so it is not requested as // The reply may legitimately be a non-accepted ack, so it is not requested as
@@ -140,6 +165,25 @@ public sealed class NotificationForwarder
}; };
return true; return true;
} }
private const int CorruptPayloadPreviewMaxLength = 200;
/// <summary>
/// Returns a length-capped preview of a corrupt buffered payload for the Warning
/// log line emitted on discard. The full payload may be megabytes and is not
/// suitable for the structured log; the preview retains the leading characters,
/// which is what an operator typically uses to identify the producing script.
/// </summary>
private static string PreviewPayload(string? payloadJson)
{
if (string.IsNullOrEmpty(payloadJson))
{
return "<empty>";
}
return payloadJson.Length <= CorruptPayloadPreviewMaxLength
? payloadJson
: payloadJson.Substring(0, CorruptPayloadPreviewMaxLength) + "…";
}
} }
/// <summary> /// <summary>
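Reviewer note on the StoreAndForward-018 hunk above: the change relies on the delivery handler's return value meaning "clear the row" when true and "park the row" when false. A minimal sketch of the discard path follows, assuming hypothetical stand-in types (`SubmitSketch`, `CorruptPayloadSketch`, and the `forward` / `warn` delegates are illustrative names, not ScadaLink APIs).

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for the real NotificationSubmit payload shape.
public sealed record SubmitSketch(string ListName, string Subject);

public static class CorruptPayloadSketch
{
    private const int PreviewMaxLength = 200;

    // Returns true when the engine should clear the buffered row, which covers
    // both a successful forward and a discarded corrupt payload.
    public static bool Deliver(
        string? payloadJson,
        Func<SubmitSketch, bool> forward,   // assumption: stands in for the Ask to central
        Action<string> warn)                // assumption: stands in for ILogger.LogWarning
    {
        if (!TryParse(payloadJson, out var submit))
        {
            // Retrying or parking cannot repair an unreadable payload, so log and discard.
            warn($"Discarding corrupt buffered notification. Payload preview: {Preview(payloadJson)}");
            return true;
        }

        return forward(submit!);
    }

    private static bool TryParse(string? json, out SubmitSketch? submit)
    {
        submit = null;
        if (string.IsNullOrEmpty(json))
        {
            return false;
        }

        try
        {
            // The literal "null" deserialises to null without throwing.
            submit = JsonSerializer.Deserialize<SubmitSketch>(json);
            return submit is not null;
        }
        catch (JsonException)
        {
            return false;
        }
    }

    // Length-capped preview so a multi-megabyte payload never reaches the log.
    public static string Preview(string? json) =>
        string.IsNullOrEmpty(json) ? "<empty>"
        : json.Length <= PreviewMaxLength ? json
        : json[..PreviewMaxLength] + "…";
}
```

In this sketch both malformed JSON and a payload that deserialises to null take the same discard branch, which mirrors the two regression tests added for `NotificationForwarder` later in this commit.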
@@ -111,12 +111,14 @@ public class DiffService
a.CanonicalName == b.CanonicalName && a.CanonicalName == b.CanonicalName &&
a.Value == b.Value && a.Value == b.Value &&
a.DataType == b.DataType && a.DataType == b.DataType &&
a.Description == b.Description &&
a.IsLocked == b.IsLocked && a.IsLocked == b.IsLocked &&
a.DataSourceReference == b.DataSourceReference && a.DataSourceReference == b.DataSourceReference &&
a.BoundDataConnectionId == b.BoundDataConnectionId; a.BoundDataConnectionId == b.BoundDataConnectionId;
private static bool AlarmsEqual(ResolvedAlarm a, ResolvedAlarm b) => private static bool AlarmsEqual(ResolvedAlarm a, ResolvedAlarm b) =>
a.CanonicalName == b.CanonicalName && a.CanonicalName == b.CanonicalName &&
a.Description == b.Description &&
a.PriorityLevel == b.PriorityLevel && a.PriorityLevel == b.PriorityLevel &&
a.IsLocked == b.IsLocked && a.IsLocked == b.IsLocked &&
a.TriggerType == b.TriggerType && a.TriggerType == b.TriggerType &&
@@ -132,4 +134,27 @@ public class DiffService
a.ParameterDefinitions == b.ParameterDefinitions && a.ParameterDefinitions == b.ParameterDefinitions &&
a.ReturnDefinition == b.ReturnDefinition && a.ReturnDefinition == b.ReturnDefinition &&
a.MinTimeBetweenRuns == b.MinTimeBetweenRuns; a.MinTimeBetweenRuns == b.MinTimeBetweenRuns;
/// <summary>
/// Compares two <see cref="ConnectionConfig"/> instances for equality across
/// the fields that travel in the deployment package: protocol, primary and
/// backup configuration JSON, and failover retry count. Used by callers that
/// need to detect connection-endpoint drift; the public diff shape only
/// exposes attribute / alarm / script changes today (see TemplateEngine-018
/// for the diff-shape extension that surfaces added / removed / changed
/// connections in the UI).
/// </summary>
/// <param name="a">First connection configuration.</param>
/// <param name="b">Second connection configuration.</param>
/// <returns>True when both configurations are equal.</returns>
public static bool ConnectionsEqual(ConnectionConfig a, ConnectionConfig b)
{
ArgumentNullException.ThrowIfNull(a);
ArgumentNullException.ThrowIfNull(b);
return a.Protocol == b.Protocol &&
a.ConfigurationJson == b.ConfigurationJson &&
a.BackupConfigurationJson == b.BackupConfigurationJson &&
a.FailoverRetryCount == b.FailoverRetryCount;
}
} }
@@ -52,6 +52,7 @@ public class RevisionHashService
CanonicalName = a.CanonicalName, CanonicalName = a.CanonicalName,
Value = a.Value, Value = a.Value,
DataType = a.DataType, DataType = a.DataType,
Description = a.Description,
IsLocked = a.IsLocked, IsLocked = a.IsLocked,
DataSourceReference = a.DataSourceReference, DataSourceReference = a.DataSourceReference,
BoundDataConnectionId = a.BoundDataConnectionId BoundDataConnectionId = a.BoundDataConnectionId
@@ -62,6 +63,7 @@ public class RevisionHashService
.Select(a => new HashableAlarm .Select(a => new HashableAlarm
{ {
CanonicalName = a.CanonicalName, CanonicalName = a.CanonicalName,
Description = a.Description,
PriorityLevel = a.PriorityLevel, PriorityLevel = a.PriorityLevel,
IsLocked = a.IsLocked, IsLocked = a.IsLocked,
TriggerType = a.TriggerType, TriggerType = a.TriggerType,
@@ -82,7 +84,20 @@ public class RevisionHashService
ReturnDefinition = s.ReturnDefinition, ReturnDefinition = s.ReturnDefinition,
MinTimeBetweenRunsTicks = s.MinTimeBetweenRuns?.Ticks MinTimeBetweenRunsTicks = s.MinTimeBetweenRuns?.Ticks
}) })
.ToList() .ToList(),
Connections = configuration.Connections is { Count: > 0 }
? new SortedDictionary<string, HashableConnection>(
configuration.Connections.ToDictionary(
kvp => kvp.Key,
kvp => new HashableConnection
{
BackupConfigurationJson = kvp.Value.BackupConfigurationJson,
ConfigurationJson = kvp.Value.ConfigurationJson,
FailoverRetryCount = kvp.Value.FailoverRetryCount,
Protocol = kvp.Value.Protocol
}),
StringComparer.Ordinal)
: null
}; };
var json = JsonSerializer.Serialize(hashInput, CanonicalJsonOptions); var json = JsonSerializer.Serialize(hashInput, CanonicalJsonOptions);
@@ -108,6 +123,12 @@ public class RevisionHashService
/// </summary> /// </summary>
public List<HashableAttribute> Attributes { get; init; } = []; public List<HashableAttribute> Attributes { get; init; } = [];
/// <summary> /// <summary>
/// Data connection configurations keyed by connection name. Sorted by key
/// (ordinal) to keep serialization deterministic. Null when the deployment
/// package carries no connections.
/// </summary>
public SortedDictionary<string, HashableConnection>? Connections { get; init; }
/// <summary>
/// The unique instance name. /// The unique instance name.
/// </summary> /// </summary>
public string InstanceUniqueName { get; init; } = string.Empty; public string InstanceUniqueName { get; init; } = string.Empty;
@@ -144,6 +165,11 @@ public class RevisionHashService
/// </summary> /// </summary>
public string DataType { get; init; } = string.Empty; public string DataType { get; init; } = string.Empty;
/// <summary> /// <summary>
/// The attribute description (authoring-time documentation that still
/// travels with the deployed payload).
/// </summary>
public string? Description { get; init; }
/// <summary>
/// Whether the attribute is locked. /// Whether the attribute is locked.
/// </summary> /// </summary>
public bool IsLocked { get; init; } public bool IsLocked { get; init; }
@@ -160,6 +186,11 @@ public class RevisionHashService
/// </summary> /// </summary>
public string CanonicalName { get; init; } = string.Empty; public string CanonicalName { get; init; } = string.Empty;
/// <summary> /// <summary>
/// The alarm description (authoring-time documentation that still
/// travels with the deployed payload).
/// </summary>
public string? Description { get; init; }
/// <summary>
/// Whether the alarm is locked. /// Whether the alarm is locked.
/// </summary> /// </summary>
public bool IsLocked { get; init; } public bool IsLocked { get; init; }
@@ -181,6 +212,26 @@ public class RevisionHashService
public string TriggerType { get; init; } = string.Empty; public string TriggerType { get; init; } = string.Empty;
} }
private sealed record HashableConnection
{
/// <summary>
/// Backup connection configuration JSON, if any.
/// </summary>
public string? BackupConfigurationJson { get; init; }
/// <summary>
/// Primary connection configuration JSON.
/// </summary>
public string? ConfigurationJson { get; init; }
/// <summary>
/// Number of failover retries before giving up.
/// </summary>
public int FailoverRetryCount { get; init; }
/// <summary>
/// Protocol name (e.g. "OpcUa").
/// </summary>
public string Protocol { get; init; } = string.Empty;
}
private sealed record HashableScript private sealed record HashableScript
{ {
/// <summary> /// <summary>
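Reviewer note on the TemplateEngine-017 hunk above: the `Connections` projection is copied into an ordinal-sorted dictionary so that the serialized hash input does not depend on insertion order. A minimal sketch of that idea follows, assuming a hypothetical `ConnectionSketch` record and a `sha256:` prefixed lowercase hex digest; the real `RevisionHashService` hash input and output format may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for the HashableConnection projection.
public sealed record ConnectionSketch(string Protocol, string? ConfigurationJson, int FailoverRetryCount);

public static class HashSketch
{
    public static string Compute(IReadOnlyDictionary<string, ConnectionSketch> connections)
    {
        // Copy into an ordinal-sorted dictionary so that the order in which the
        // caller populated the source dictionary cannot change the JSON text.
        var sorted = new SortedDictionary<string, ConnectionSketch>(StringComparer.Ordinal);
        foreach (var (name, connection) in connections)
        {
            sorted[name] = connection;
        }

        var json = JsonSerializer.Serialize(sorted);
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(json));

        // Assumption: the "sha256:" prefix and lowercase hex are illustrative only.
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

With this shape, two packages that carry the same connections in a different order hash identically, while an edit to any endpoint JSON, protocol, or failover count produces a different hash.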
@@ -230,17 +230,10 @@ public class CompatibilityTests
// ── Round-trip serialization for all key message types ── // ── Round-trip serialization for all key message types ──
[Fact] // Communication-016: RoundTrip_ConnectionStateChanged_Succeeds removed
public void RoundTrip_ConnectionStateChanged_Succeeds() // alongside the dead ConnectionStateChanged message record. No production
{ // code emits or receives this message — disconnect detection is owned by
var msg = new ConnectionStateChanged("site-01", true, DateTimeOffset.UtcNow); // the gRPC keepalive and the Ask-timeout path.
var json = JsonSerializer.Serialize(msg);
var deserialized = JsonSerializer.Deserialize<ConnectionStateChanged>(json, Options);
Assert.NotNull(deserialized);
Assert.Equal("site-01", deserialized!.SiteId);
Assert.True(deserialized.IsConnected);
}
[Fact] [Fact]
public void RoundTrip_AlarmStateChanged_Succeeds() public void RoundTrip_AlarmStateChanged_Succeeds()
@@ -116,30 +116,13 @@ public class CentralCommunicationActorTests : TestKit
ExpectNoMsg(TimeSpan.FromMilliseconds(200)); ExpectNoMsg(TimeSpan.FromMilliseconds(200));
} }
[Fact] // Communication-016: the prior `ConnectionLost_DebugStreamsKilled` test was
public void ConnectionLost_DebugStreamsKilled() // removed alongside the dead HandleConnectionStateChanged handler. No
{ // production code ever emitted ConnectionStateChanged, so the test was
var site = CreateSite("site1", "akka.tcp://scadalink@host:8082"); // exercising a workflow that never ran. Disconnect detection is owned by
var (actor, _, siteProbes) = CreateActorWithMockRepo(new[] { site }); // the gRPC keepalive (DebugStreamBridgeActor self-terminates) and by the
// Ask-timeout path at the CommunicationService layer (deploy callers see
// Wait for auto-refresh // a failure).
Thread.Sleep(1000);
// Subscribe to debug view (tracks the subscription)
var subscriberProbe = CreateTestProbe();
var subRequest = new SubscribeDebugViewRequest("inst1", "corr-123");
actor.Tell(new SiteEnvelope("site1", subRequest), subscriberProbe.Ref);
// The ClusterClient probe receives the routed message
siteProbes["site1"].ExpectMsg<ClusterClient.Send>();
// Simulate site disconnection
actor.Tell(new ConnectionStateChanged("site1", false, DateTimeOffset.UtcNow));
// The subscriber should receive a DebugStreamTerminated notification
subscriberProbe.ExpectMsg<DebugStreamTerminated>(
msg => msg.SiteId == "site1" && msg.CorrelationId == "corr-123");
}
[Fact] [Fact]
public void Heartbeat_BumpsAggregatorTimestamp() public void Heartbeat_BumpsAggregatorTimestamp()
@@ -820,6 +820,56 @@ public class DeploymentServiceTests : TestKit
Assert.Equal("sha256:target", storedSnapshot.RevisionHash); Assert.Equal("sha256:target", storedSnapshot.RevisionHash);
} }
// ── DeploymentManager-018: reconciliation must preserve an intentional Disabled state ──
[Fact]
public async Task DeployInstanceAsync_Reconciled_DisabledInstance_PreservesDisabledState()
{
// DeploymentManager-018: after a central failover, the in-memory
// OperationLockManager is lost (by design — in-progress treated as
// failed). The prior deployment record remains InProgress in the DB.
// The operator can legitimately invoke Disable on the instance between
// the timed-out deploy and the redeploy. Disable does not change the
// deployed config, so the site still reports the target revision hash.
// When the operator retries the deploy, the reconciliation branch must
// NOT silently overwrite Instance.State back to Enabled — that would
// undo the explicit operator action with no audit trail.
var instance = new Instance("ReconcileDisabled")
{
Id = 72, SiteId = 1, State = InstanceState.Disabled
};
_repo.GetInstanceByIdAsync(72, Arg.Any<CancellationToken>()).Returns(instance);
SetupValidPipeline(72, "ReconcileDisabled", "sha256:target");
var prior = new DeploymentRecord("dep-prior-72", "admin")
{
InstanceId = 72,
Status = DeploymentStatus.InProgress,
RevisionHash = "sha256:target"
};
_repo.GetCurrentDeploymentStatusAsync(72, Arg.Any<CancellationToken>()).Returns(prior);
_repo.GetDeployedSnapshotByInstanceIdAsync(72, Arg.Any<CancellationToken>())
.Returns((DeployedConfigSnapshot?)null);
var commActor = Sys.ActorOf(Props.Create(() =>
new ReconcileProbeActor(siteHash: "sha256:target", failQuery: false)));
var service = CreateServiceWithCommActor(commActor);
var result = await service.DeployInstanceAsync(72, "admin");
// The reconciliation still succeeds and the prior record is marked
// Success — central and site agree on the applied config.
Assert.True(result.IsSuccess);
Assert.Equal(DeploymentStatus.Success, prior.Status);
Assert.Equal(1, ReconcileProbeActor.QueryCount);
Assert.Equal(0, ReconcileProbeActor.DeployCount);
// DeploymentManager-018: the operator's explicit Disable must survive
// the reconciliation — Instance.State stays Disabled, not silently
// flipped to Enabled.
Assert.Equal(InstanceState.Disabled, instance.State);
}
// ── DeploymentManager-016: reconciled record must carry the target revision hash ── // ── DeploymentManager-016: reconciled record must carry the target revision hash ──
[Fact] [Fact]
@@ -207,6 +207,31 @@ public class DatabaseGatewayTests
Assert.False(delivered); // permanent — the S&F engine parks the message Assert.False(delivered); // permanent — the S&F engine parks the message
} }
// ── ExternalSystemGateway-018: malformed JSON payload must park, not retry-forever ──
[Fact]
public async Task DeliverBuffered_MalformedJsonPayload_ReturnsFalseSoMessageParks()
{
// No connection stub needed — deserialization fails before any
// resolution or SQL execution. If the JsonException were to escape (the
// pre-018 behaviour) the S&F engine would treat it as transient and
// retry the same poison row forever.
var gateway = new DatabaseGateway(_repository, NullLogger<DatabaseGateway>.Instance);
var poisonMessage = new ScadaLink.StoreAndForward.StoreAndForwardMessage
{
Id = Guid.NewGuid().ToString("N"),
Category = ScadaLink.Commons.Types.Enums.StoreAndForwardCategory.CachedDbWrite,
Target = "someDb",
// Truncated mid-write — `{` opens an object that never closes.
PayloadJson = "{\"ConnectionName\":\"someDb\",\"Sql\":\"INSERT",
};
var delivered = await gateway.DeliverBufferedAsync(poisonMessage);
Assert.False(delivered); // permanent — the S&F engine parks the message
}
// ── ExternalSystemGateway-010: SqlConnection must not leak when OpenAsync fails ── // ── ExternalSystemGateway-010: SqlConnection must not leak when OpenAsync fails ──
[Fact] [Fact]
@@ -234,6 +234,32 @@ public class ExternalSystemClientTests
() => client.DeliverBufferedAsync(BufferedCall("TestAPI", "failMethod"))); () => client.DeliverBufferedAsync(BufferedCall("TestAPI", "failMethod")));
} }
// ── ExternalSystemGateway-018: malformed JSON payload must park, not retry-forever ──
[Fact]
public async Task DeliverBuffered_MalformedJsonPayload_ReturnsFalseSoMessageParks()
{
// No repository / HTTP stubs needed — deserialization fails before any
// resolution or HTTP call. If the JsonException were to escape (the
// pre-018 behaviour) the S&F engine would treat it as transient and
// retry the same poison row forever.
var client = new ExternalSystemClient(
_httpClientFactory, _repository, NullLogger<ExternalSystemClient>.Instance);
var poisonMessage = new StoreAndForwardMessage
{
Id = Guid.NewGuid().ToString("N"),
Category = ScadaLink.Commons.Types.Enums.StoreAndForwardCategory.ExternalSystem,
Target = "TestAPI",
// Truncated mid-write — `{` opens an object that never closes.
PayloadJson = "{\"SystemName\":\"TestAPI\",\"MethodName\":\"get",
};
var delivered = await client.DeliverBufferedAsync(poisonMessage);
Assert.False(delivered); // permanent — the S&F engine parks the message
}
// ── ExternalSystemGateway-003: CachedCall must not double-dispatch ── // ── ExternalSystemGateway-003: CachedCall must not double-dispatch ──
[Fact] [Fact]
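Reviewer note on the two ExternalSystemGateway-018 tests above: they pin the opposite outcome to the notification path, where a corrupt buffered row is parked (return false) rather than discarded. A minimal sketch of that handler shape follows, assuming a hypothetical `BufferedCallSketch` payload and `send` delegate; treating a null deserialisation result as permanent is also an assumption of this sketch, not something the diff shows.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical stand-in for the buffered external-call payload.
public sealed record BufferedCallSketch(string SystemName, string MethodName);

public static class ParkOnCorruptSketch
{
    public static async Task<bool> DeliverBufferedAsync(
        string payloadJson,
        Func<BufferedCallSketch, Task> send) // assumption: stands in for the HTTP or SQL call
    {
        BufferedCallSketch? call;
        try
        {
            call = JsonSerializer.Deserialize<BufferedCallSketch>(payloadJson);
        }
        catch (JsonException)
        {
            // Permanent failure: returning false lets the engine park the row
            // instead of retrying a payload that can never be read.
            return false;
        }

        if (call is null)
        {
            return false; // assumption: an empty payload is also unrecoverable
        }

        // Exceptions thrown by the send are left to propagate, so the caller
        // can still treat them as transient and retry.
        await send(call);
        return true;
    }
}
```

The key design point is that only the deserialisation step is wrapped: catching around the whole method would also swallow transient transport failures and park rows that a retry could have delivered.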
@@ -17,6 +17,7 @@ using ScadaLink.ExternalSystemGateway;
using ScadaLink.HealthMonitoring; using ScadaLink.HealthMonitoring;
using ScadaLink.Host; using ScadaLink.Host;
using ScadaLink.Host.Actors; using ScadaLink.Host.Actors;
using ScadaLink.Host.Health;
using ScadaLink.InboundAPI; using ScadaLink.InboundAPI;
using ScadaLink.ManagementService; using ScadaLink.ManagementService;
using ScadaLink.NotificationService; using ScadaLink.NotificationService;
@@ -204,9 +205,13 @@ public class CentralCompositionRootTests : IDisposable
new object[] { typeof(IExternalSystemClient) }, new object[] { typeof(IExternalSystemClient) },
new object[] { typeof(DatabaseGateway) }, new object[] { typeof(DatabaseGateway) },
new object[] { typeof(IDatabaseGateway) }, new object[] { typeof(IDatabaseGateway) },
// NotificationService // NotificationService — central-only SMTP primitives. The orphaned
new object[] { typeof(NotificationDeliveryService) }, // NotificationDeliveryService + INotificationDeliveryService were removed
new object[] { typeof(INotificationDeliveryService) }, // (NS-019) when sites stopped delivering notifications; the central
// EmailNotificationDeliveryAdapter is now the only resolver of these
// primitives.
new object[] { typeof(Func<ISmtpClientWrapper>) },
new object[] { typeof(OAuth2TokenService) },
// ConfigurationDatabase repositories // ConfigurationDatabase repositories
new object[] { typeof(ScadaLinkDbContext) }, new object[] { typeof(ScadaLinkDbContext) },
new object[] { typeof(ISecurityRepository) }, new object[] { typeof(ISecurityRepository) },
@@ -277,6 +282,30 @@ public class CentralCompositionRootTests : IDisposable
var hostedServices = _factory.Services.GetServices<IHostedService>(); var hostedServices = _factory.Services.GetServices<IHostedService>();
Assert.Contains(hostedServices, s => s.GetType() == typeof(CentralHealthAggregator)); Assert.Contains(hostedServices, s => s.GetType() == typeof(CentralHealthAggregator));
} }
// --- InboundAPI-022 regression ---
/// <summary>
/// InboundAPI-022 regression: the Central composition root MUST register a
/// concrete <see cref="IActiveNodeGate"/> implementation. Without it,
/// <see cref="InboundApiEndpointFilter"/> falls through to "allow" and a
/// standby central node continues to serve the inbound API, racing the
/// active node and executing scripts against stale singleton state. The
/// design's "central cluster only (active node)" guarantee is enforced only
/// when the production gate is wired here.
///
/// Structural check on the built provider (not just <see cref="IServiceCollection"/>)
/// — a registration the framework cannot resolve to a concrete instance is
/// indistinguishable from "missing" at runtime, which is the failure mode
/// the finding describes.
/// </summary>
[Fact]
public void Central_IActiveNodeGate_IsRegisteredAsActiveNodeGate()
{
var gate = _factory.Services.GetService<IActiveNodeGate>();
Assert.NotNull(gate);
Assert.IsType<ActiveNodeGate>(gate);
}
} }
/// <summary> /// <summary>
@@ -5,11 +5,9 @@ using System.Text.Json;
using Microsoft.Extensions.DependencyInjection; using Microsoft.Extensions.DependencyInjection;
using NSubstitute; using NSubstitute;
using ScadaLink.Commons.Entities.InboundApi; using ScadaLink.Commons.Entities.InboundApi;
using ScadaLink.Commons.Entities.Notifications;
using ScadaLink.Commons.Interfaces.Repositories; using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Interfaces.Services; using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.InboundAPI; using ScadaLink.InboundAPI;
using ScadaLink.NotificationService;
namespace ScadaLink.IntegrationTests; namespace ScadaLink.IntegrationTests;
@@ -98,42 +96,11 @@ public class IntegrationSurfaceTests
} }
// ── Notification: mock SMTP delivery ── // ── Notification: mock SMTP delivery ──
// NS-019: the site-shaped NotificationDeliveryService that this case exercised
[Fact] // was removed when sites stopped delivering notifications. The central SMTP
public async Task Notification_Send_MockSmtp_Delivers() // delivery path is now covered end-to-end by
{ // ScadaLink.NotificationOutbox.Tests.Delivery.EmailNotificationDeliveryAdapterTests;
var repository = Substitute.For<INotificationRepository>(); // no equivalent integration-surface assertion is needed here.
var smtpClient = Substitute.For<ISmtpClientWrapper>();
var list = new NotificationList("alerts") { Id = 1 };
var recipients = new List<NotificationRecipient>
{
new("Admin", "admin@example.com") { Id = 1, NotificationListId = 1 }
};
var smtpConfig = new SmtpConfiguration("smtp.example.com", "basic", "noreply@example.com")
{
Id = 1, Port = 587, Credentials = "user:pass"
};
repository.GetListByNameAsync("alerts").Returns(list);
repository.GetRecipientsByListIdAsync(1).Returns(recipients);
repository.GetAllSmtpConfigurationsAsync().Returns(new List<SmtpConfiguration> { smtpConfig });
var service = new NotificationDeliveryService(
repository,
() => smtpClient,
Microsoft.Extensions.Logging.Abstractions.NullLogger<NotificationDeliveryService>.Instance);
var result = await service.SendAsync("alerts", "Test Alert", "Something happened");
Assert.True(result.Success);
await smtpClient.Received(1).SendAsync(
"noreply@example.com",
Arg.Is<IEnumerable<string>>(r => r.Contains("admin@example.com")),
"Test Alert",
"Something happened",
Arg.Any<CancellationToken>());
}
// ── Script Context: integration API wiring ── // ── Script Context: integration API wiring ──
File diff suppressed because it is too large
@@ -197,4 +197,58 @@ public class NotificationForwarderTests : TestKit
Assert.Equal(submit1.NotificationId, submit2.NotificationId); Assert.Equal(submit1.NotificationId, submit2.NotificationId);
Assert.Equal("stable-msg-id", submit1.NotificationId); Assert.Equal("stable-msg-id", submit1.NotificationId);
} }
[Fact]
public async Task Deliver_CorruptJsonPayload_ReturnsTrue_AndDoesNotForwardAnything()
{
// Regression test for StoreAndForward-018. The design doc forbids parking
// notifications ("notifications do not park — they are retried at the fixed
// forward interval until central acks"; Component-StoreAndForward.md). The
// previous implementation returned false on a corrupt payload, which the S&F
// engine interprets as a permanent failure and parks the row — contradicting
// the invariant. The fix: discard a corrupt buffered notification by
// returning true (engine clears the buffer via its normal success path),
// with a Warning log line carrying the row id and a payload preview.
var centralProbe = CreateTestProbe();
var forwarder = new NotificationForwarder(
centralProbe.Ref, "site-7", ForwardTimeout);
var corrupt = new StoreAndForwardMessage
{
Id = "msg-corrupt",
Category = StoreAndForwardCategory.Notification,
Target = "Operators",
PayloadJson = "{not-valid-json",
OriginInstanceName = "Plant.Pump3",
};
Assert.True(await forwarder.DeliverAsync(corrupt));
// The corrupt-payload path must NOT round-trip to central — no
// NotificationSubmit / no Ask. ExpectNoMsg confirms nothing was forwarded.
centralProbe.ExpectNoMsg(TimeSpan.FromMilliseconds(200));
}
[Fact]
public async Task Deliver_NullDeserializedPayload_ReturnsTrue_AndDoesNotForwardAnything()
{
// The companion case to corrupt JSON: the payload is valid JSON but
// deserialises to null (e.g. "null"). Same treatment per StoreAndForward-018
// — discard rather than park.
var centralProbe = CreateTestProbe();
var forwarder = new NotificationForwarder(
centralProbe.Ref, "site-7", ForwardTimeout);
var nullPayload = new StoreAndForwardMessage
{
Id = "msg-null",
Category = StoreAndForwardCategory.Notification,
Target = "Operators",
PayloadJson = "null",
OriginInstanceName = "Plant.Pump3",
};
Assert.True(await forwarder.DeliverAsync(nullPayload));
centralProbe.ExpectNoMsg(TimeSpan.FromMilliseconds(200));
}
} }
@@ -133,4 +133,122 @@ public class DiffServiceTests
Assert.Single(diff.ScriptChanges); Assert.Single(diff.ScriptChanges);
Assert.Equal(DiffChangeType.Changed, diff.ScriptChanges[0].ChangeType); Assert.Equal(DiffChangeType.Changed, diff.ScriptChanges[0].ChangeType);
} }
[Fact]
public void ComputeDiff_AttributeDescriptionChange_DetectedAsChanged()
{
// TemplateEngine-017: AttributesEqual must compare Description.
var oldConfig = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
Attributes =
[
new ResolvedAttribute { CanonicalName = "Temp", Value = "25", DataType = "Double", Description = "Original" }
]
};
var newConfig = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
Attributes =
[
new ResolvedAttribute { CanonicalName = "Temp", Value = "25", DataType = "Double", Description = "Updated" }
]
};
var diff = _sut.ComputeDiff(oldConfig, newConfig);
Assert.True(diff.HasChanges);
Assert.Single(diff.AttributeChanges);
Assert.Equal(DiffChangeType.Changed, diff.AttributeChanges[0].ChangeType);
}
[Fact]
public void ComputeDiff_AlarmDescriptionChange_DetectedAsChanged()
{
// TemplateEngine-017: AlarmsEqual must compare Description.
var oldConfig = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
Alarms =
[
new ResolvedAlarm { CanonicalName = "HighTemp", TriggerType = "RangeViolation", Description = "Original" }
]
};
var newConfig = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
Alarms =
[
new ResolvedAlarm { CanonicalName = "HighTemp", TriggerType = "RangeViolation", Description = "Updated" }
]
};
var diff = _sut.ComputeDiff(oldConfig, newConfig);
Assert.True(diff.HasChanges);
Assert.Single(diff.AlarmChanges);
Assert.Equal(DiffChangeType.Changed, diff.AlarmChanges[0].ChangeType);
}
[Fact]
public void ConnectionsEqual_IdenticalConfigs_ReturnsTrue()
{
// TemplateEngine-017: ConnectionsEqual is the comparator callers use
// to detect connection-endpoint drift (the diff-view extension that
// surfaces this in the UI is tracked under TemplateEngine-018).
var a = new ConnectionConfig
{
Protocol = "OpcUa",
ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-a\"}",
BackupConfigurationJson = "{\"endpoint\":\"opc.tcp://host-b\"}",
FailoverRetryCount = 3
};
var b = a with { };
Assert.True(DiffService.ConnectionsEqual(a, b));
}
[Fact]
public void ConnectionsEqual_EndpointEdit_ReturnsFalse()
{
// TemplateEngine-017: primary endpoint JSON edit must surface as a
// change. Without this, a redeploy ships a different
// ConnectionConfig with no visible drift signal.
var a = new ConnectionConfig
{
Protocol = "OpcUa",
ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-a:4840\"}",
FailoverRetryCount = 3
};
var b = a with { ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-b:4840\"}" };
Assert.False(DiffService.ConnectionsEqual(a, b));
}
[Fact]
public void ConnectionsEqual_BackupConfigurationEdit_ReturnsFalse()
{
var a = new ConnectionConfig { Protocol = "OpcUa", ConfigurationJson = "{}", BackupConfigurationJson = null, FailoverRetryCount = 3 };
var b = a with { BackupConfigurationJson = "{\"endpoint\":\"opc.tcp://backup\"}" };
Assert.False(DiffService.ConnectionsEqual(a, b));
}
[Fact]
public void ConnectionsEqual_FailoverRetryCountEdit_ReturnsFalse()
{
var a = new ConnectionConfig { Protocol = "OpcUa", ConfigurationJson = "{}", FailoverRetryCount = 3 };
var b = a with { FailoverRetryCount = 5 };
Assert.False(DiffService.ConnectionsEqual(a, b));
}
[Fact]
public void ConnectionsEqual_ProtocolEdit_ReturnsFalse()
{
var a = new ConnectionConfig { Protocol = "OpcUa", ConfigurationJson = "{}", FailoverRetryCount = 3 };
var b = a with { Protocol = "Modbus" };
Assert.False(DiffService.ConnectionsEqual(a, b));
}
}
@@ -131,6 +131,154 @@ public class RevisionHashServiceTests
}
}
[Fact]
public void ComputeHash_AttributeDescriptionEdit_ChangesHash()
{
// TemplateEngine-017: Description must be folded into the hash so that
// edits to authoring-time documentation (which still travels in the
// deployed payload) flow through the staleness indicator.
var baseAttr = new ResolvedAttribute
{
CanonicalName = "Temperature",
Value = "25",
DataType = "Double",
Description = "Original description"
};
var editedAttr = baseAttr with { Description = "Updated description" };
var configBefore = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
TemplateId = 1,
SiteId = 1,
Attributes = [baseAttr]
};
var configAfter = configBefore with { Attributes = [editedAttr] };
Assert.NotEqual(_sut.ComputeHash(configBefore), _sut.ComputeHash(configAfter));
}
[Fact]
public void ComputeHash_AlarmDescriptionEdit_ChangesHash()
{
// TemplateEngine-017: same Description contract applies to alarms.
var baseAlarm = new ResolvedAlarm
{
CanonicalName = "HighTemp",
TriggerType = "RangeViolation",
Description = "Original"
};
var editedAlarm = baseAlarm with { Description = "Updated" };
var configBefore = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
TemplateId = 1,
SiteId = 1,
Alarms = [baseAlarm]
};
var configAfter = configBefore with { Alarms = [editedAlarm] };
Assert.NotEqual(_sut.ComputeHash(configBefore), _sut.ComputeHash(configAfter));
}
[Fact]
public void ComputeHash_ConnectionEndpointEdit_ChangesHash()
{
// TemplateEngine-017: a Deployment user editing the primary endpoint
// JSON of a data connection bound to an instance must produce a
// different revision hash. The connection's protocol, primary/backup
// configuration JSON, and failover retry count are all part of the
// deployment package and therefore part of the hash input.
var connectionsBefore = new Dictionary<string, ConnectionConfig>
{
["plc1"] = new ConnectionConfig
{
Protocol = "OpcUa",
ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-a:4840\"}",
BackupConfigurationJson = null,
FailoverRetryCount = 3
}
};
var connectionsAfter = new Dictionary<string, ConnectionConfig>
{
["plc1"] = connectionsBefore["plc1"] with
{
ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-b:4840\"}"
}
};
var configBefore = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
TemplateId = 1,
SiteId = 1,
Connections = connectionsBefore
};
var configAfter = configBefore with { Connections = connectionsAfter };
Assert.NotEqual(_sut.ComputeHash(configBefore), _sut.ComputeHash(configAfter));
}
[Fact]
public void ComputeHash_ConnectionProtocolEdit_ChangesHash()
{
// TemplateEngine-017: changing protocol must change the hash.
var connectionsBefore = new Dictionary<string, ConnectionConfig>
{
["plc1"] = new ConnectionConfig
{
Protocol = "OpcUa",
ConfigurationJson = "{}",
FailoverRetryCount = 3
}
};
var connectionsAfter = new Dictionary<string, ConnectionConfig>
{
["plc1"] = connectionsBefore["plc1"] with { Protocol = "Modbus" }
};
var configBefore = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
TemplateId = 1,
SiteId = 1,
Connections = connectionsBefore
};
var configAfter = configBefore with { Connections = connectionsAfter };
Assert.NotEqual(_sut.ComputeHash(configBefore), _sut.ComputeHash(configAfter));
}
[Fact]
public void ComputeHash_ConnectionsSameContent_SameHash()
{
// TemplateEngine-017: equal Connections maps must yield the same hash,
// regardless of dictionary iteration order (the SortedDictionary
// projection guards this).
var connections1 = new Dictionary<string, ConnectionConfig>
{
["b"] = new ConnectionConfig { Protocol = "OpcUa", ConfigurationJson = "{\"k\":2}", FailoverRetryCount = 3 },
["a"] = new ConnectionConfig { Protocol = "OpcUa", ConfigurationJson = "{\"k\":1}", FailoverRetryCount = 3 }
};
var connections2 = new Dictionary<string, ConnectionConfig>
{
["a"] = new ConnectionConfig { Protocol = "OpcUa", ConfigurationJson = "{\"k\":1}", FailoverRetryCount = 3 },
["b"] = new ConnectionConfig { Protocol = "OpcUa", ConfigurationJson = "{\"k\":2}", FailoverRetryCount = 3 }
};
var config1 = new FlattenedConfiguration
{
InstanceUniqueName = "Instance1",
TemplateId = 1,
SiteId = 1,
Connections = connections1
};
var config2 = config1 with { Connections = connections2 };
Assert.Equal(_sut.ComputeHash(config1), _sut.ComputeHash(config2));
}
private static FlattenedConfiguration CreateConfig(string instanceName, string tempValue)
{
return new FlattenedConfiguration