All five suppressed advisories are now resolved at baseline/resolved versions,
so every NuGetAuditSuppress is removed repo-wide:
- System.Security.Cryptography.Xml (GHSA-37gx-xxp4-5rgx / GHSA-w3x6-4m5h-cxqf)
-> fixed by the .NET 10 baseline (10.0.6)
- OPCFoundation Opc.Ua.Core (GHSA-h958-fxgg-g7w3) -> fixed at resolved 1.5.378.106
Two were still live and are now patched via direct security pins:
- OpenTelemetry.Api 1.9.0 -> 1.15.3 (GHSA-g94r-2vxg-569j) pinned in Cluster;
Runtime/ControlPlane/AdminUI + tests inherit via project reference
- Tmds.DBus.Protocol 0.20.0 -> 0.21.3 (GHSA-xrw6-gwf8-vvr9) pinned in Client.UI
Also correct the Historian sidecar runtime comments (x86 -> x64, matching the
csproj PlatformTarget). Solution audit: 0 vulnerable packages; full build clean.
T21: add an AdminUI path for acknowledging/shelving alarms that routes
through the admin-pinned AdminOperationsActor cluster singleton, which
republishes onto the same 'alarm-commands' DPS topic the OPC UA method
path (T18) and the engine subscriber (T19) use. The broadcast + the
ScriptedAlarmHostActor ownership filter handle cross-node routing, so
the singleton needs no knowledge of which node owns the alarm.
- Commons: AcknowledgeAlarmCommand/ShelveAlarmCommand (+ result records)
and a shared AlarmCommandsTopic const; ScriptedAlarmHostActor now
re-exports that const (mirrors the DriverControlTopic pattern).
- AdminOperationsActor: two handlers map the control-plane messages to
AlarmCommand (Acknowledge / OneShotShelve / TimedShelve / Unshelve,
threading User/Comment/UnshelveAtUtc) and publish via the DPS mediator.
- IAdminOperationsClient + AdminOperationsClient: typed Acknowledge/Shelve
ask wrappers mirroring StartDeploymentAsync.
- Alerts.razor: per-row DriverOperator-gated Ack/Shelve/Unshelve controls;
operator name from AuthenticationState. Timed-shelve datetime UI deferred.
- 5 TestKit tests (mediator-probe subscribed to alarm-commands) verifying
each kind's mapping + reply; 56/56 ControlPlane tests green.
Wrap the script compile-cost guardrail block in its own inner try/catch so a
transient SQL failure on ToListAsync cannot fall through to the outer catch and
produce a Rejected reply for an otherwise-valid deploy. advisory is declared in
the outer scope so the Accepted StartDeploymentResult Message is unaffected on
the happy path; the inner catch logs a Warning and leaves advisory null.
Aligns ConfigPublishCoordinator's _acks/_expectedAcks with the case-insensitive
ClusterId/NodeId scoping in DeploymentArtifact.ResolveClusterScope, so an ack
from a node whose host:port differs only in case still matches its expected-ack
entry (SQL collation + DNS are case-insensitive).
Persist the canonical AuditOutcome and make structured audit rows visible.
- ConfigAuditLog gains a nullable Outcome column, stored as the AuditOutcome
enum member name (nvarchar(16), mirroring how AdminRole is persisted). The
AuditWriterActor flush now writes Outcome = evt.Outcome.ToString(). Nullable so
legacy rows and the bespoke stored-procedure path (no derived outcome) write
NULL.
- Migration 20260602135350_AddConfigAuditLogOutcome: additive nullable column,
no backfill. Up adds the column, Down drops it. Chains after
20260602112419_CanonicalizeAdminRoles; `dotnet ef migrations
has-pending-model-changes` is clean.
- ClusterAudit visibility fix: the page filtered solely on ClusterId, but the
structured AuditWriterActor path stamps NodeId (ClusterId null), so those rows
were invisible. Extracted ClusterAuditQuery.ForClusterAsync (shared by the page
and tests) which ORs in rows whose NodeId belongs to a node in the cluster —
membership resolved from ClusterNode (NodeId -> ClusterId). SP-path
ClusterId-stamped rows still match.
Tests: ControlPlane 45/45 (adds Outcome persistence + Denied-outcome asserts);
new Configuration ClusterAuditQueryTests 3/3 (both-paths visible, other-cluster
excluded, page-size cap); AdminUI 121/121. Configuration Unit suite is green on a
clean run (a pre-existing timing flake in ResilientConfigReaderTests, untouched
here, occasionally fails under parallel load and passes in isolation).
Deep-adopt the shared audit record. Deletes the bespoke 8-field positional
Commons AuditEvent and repoints the writer path at ZB.MOM.WW.Audit.AuditEvent
(0.1.0, feed-mapped via dohertj2-gitea). Adds the package reference to both
Commons and ControlPlane.
- AuditWriterActor now implements IAuditWriter: WriteAsync(evt, ct) is a
best-effort, never-throwing entry point that Self.Tell()s the event onto the
same batching/dedup/flush pipeline and returns Task.CompletedTask. Existing
Receive<AuditEvent> + 500/5s batching + two-layer dedup unchanged.
- Flush mapping updated for the canonical field types: OccurredAtUtc is now
DateTimeOffset (.UtcDateTime into the datetime2 column), SourceNode is string?
(was NodeId.Value), CorrelationId is Guid? (stored null when null). Outcome is
NOT yet persisted (column lands in Task 2.2).
- New AuditOutcomeMapper.FromAction maps the OtOpcUa action vocabulary to the
required canonical Outcome: OpcUaAccessDenied / CrossClusterNamespaceAttempt ->
Denied; config verbs (DraftCreated/Edited, Published, RolledBack, NodeApplied,
ClusterCreated, NodeAdded, CredentialAdded/Disabled, ExternalIdReleased) ->
Success. OtOpcUa emits no Failure events.
The Akka message shape changed, but the structured audit path is dormant (zero
production emit/Tell sites; all live audit flows through the bespoke SP path),
so there is no rolling-deploy wire-compat concern. Tested-not-exercised by
design.
ControlPlane.Tests: 44/44 green (AuditWriterActor suite rewritten to construct
the canonical record + assert the Outcome derivation table + the WriteAsync
best-effort/mailbox-routing contract + null SourceNode/CorrelationId handling).
- Topic-name drift fix: DriverHealthChanged.TopicName and
DriverControlTopic.Name now live on the message contracts in
Commons. AkkaDriverHealthPublisher, DriverStatusSignalRBridge,
DriverHostActor, and AdminOperationsActor all delegate to the
single constant so a rename can't silently desynchronise
publisher and subscriber.
- DriverStatusPanel._opResultClearTimer switched from
System.Timers.Timer to System.Threading.Timer + awaited
DisposeAsync. Prevents an in-flight 8s clear-callback from
invoking StateHasChanged on a component whose hub has already
been released.
- PublishHealthSnapshot deduplicates against the last published
(state, lastSuccess, lastError, errorCount) fingerprint. The
30s heartbeat no longer floods the SignalR layer with identical
Healthy snapshots — newly-joined clients still warm up via the
snapshot store on JoinDriver.
- RestartDriver / ReconnectDriver messages + AdminOperationsActor
handlers (broadcast via driver-control DPS topic; audited via
ConfigEdits).
- DriverHostActor subscribes to driver-control; locates the
matching child DriverInstanceActor and stops+respawns it
(Restart) or sends it a ForceReconnect internal message
(Reconnect — re-enters Reconnecting state without full stop).
DriverInstanceSpec constructor call uses named args to handle
the full 6-parameter signature.
- New DriverOperator authorization policy mapped to DriverOperator
or FleetAdmin role; documented in docs/security.md. Map LDAP
group via GroupToRole (e.g. "ot-driver-operator": "DriverOperator").
- DriverStatusPanel renders Reconnect + Restart buttons when the
user holds the DriverOperator policy (hidden otherwise). Restart
requires an in-page Razor confirm block (no JS confirm, keeps
SignalR event loop unblocked). Both buttons show a spinner and
are disabled during in-flight; result chip auto-clears after 8s.
Username sourced from AuthenticationStateProvider.
Reconnect resolves to "ForceReconnect" (re-enter Reconnecting,
not full stop+respawn) — transport drops and retries while actor
and in-memory state are preserved. All DriverInstanceActor states
handle ForceReconnect safely (no-op when already in transition).
- IDriverProbe abstraction in Core.Abstractions; one impl per driver
type, resolved by DriverType string. Phase 7.3 + 7.4 add concrete
probes for the 9 supported driver types.
- TestDriverConnect / TestDriverConnectResult messages.
- AdminOperationsActor.HandleTestDriverConnectAsync looks up the probe
by DriverType, runs it with a [1,60]s clamped timeout, and returns
success/latency or failure/message. Probes that throw or time out
surface as soft failures.
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
ConfigAuditLog gains two nullable columns (EventId, CorrelationId) + a filtered
unique index UX_ConfigAuditLog_EventId. EF migration
20260526105027_AddConfigAuditLogEventIdColumns is additive (nullable + filtered
index = legacy rows backfill cleanly).
AuditWriterActor now writes EventId + CorrelationId into the dedicated columns
instead of synthesising a JSON wrapper into DetailsJson. Cross-restart dedup
is now real: a retry of an already-flushed batch hits the unique index and
SaveChanges throws; the existing catch drops the duplicate without losing the
rest of the batch.
WrapDetails helper deleted — F4 (its JSON hardening) becomes moot.
AuditWriterActorTests.Details_wrapper_embeds_eventId_and_correlationId renamed
+ rewritten to assert against the columns. All 29 ControlPlane tests pass,
all 95 v2 tests green.
DeployHappyPathTests exercises the full deploy pipeline on the 2-node harness:
AdminOperationsActor → ConfigPublishCoordinator → DistributedPubSub →
DriverHostActor on both nodes → ApplyAck → coordinator seals. Verifies both
NodeDeploymentState rows reach Applied and Deployment.Status reaches Sealed.
Exposed + fixed two production bugs along the way:
1. Coordinator was publishing DispatchDeployment on the "deployments" topic but
never subscribed to anything — DriverHostActor ACKs published on the same
topic could not reach it. Added dedicated "deployment-acks" topic with
coordinator subscription in PreStart, and DriverHostActor publishes ACKs
there.
2. NodeId derivation used member.Address.Host only — two cluster members on a
shared loopback host (test harness, dev VMs) collided to one identity. The
coordinator's expected-ack set became {1} and the system sealed after only
half the nodes acked. Switched to host:port everywhere (ClusterRoleInfo +
coordinator) so loopback nodes stay distinct and production identities are
harmlessly more specific.
Tests: 95 v2 tests pass (was 93 + 2 deploy tests), 0 skipped.
Failover scenarios (design §8 cases 3-7: node-kill-mid-apply, split-brain,
restart-during-deploy) deferred — they need controlled node-down primitives
on the harness. Tracked as F22 (failover scenario test cases).
Mirrors the publisher-injection pattern from FleetStatusBroadcaster and
PeerOpcUaProbeActor: Props accepts an optional Action<object> override so
tests can use a TestProbe sink instead of bootstrapping DistributedPubSub
(unreliable single-node in TestKit).
Un-skips the two RedundancyStateActor tests deferred under F6.
Adds the empty project skeletons that subsequent v2 tasks fill in:
src/Core/ZB.MOM.WW.OtOpcUa.Commons (types, interfaces, message contracts)
src/Core/ZB.MOM.WW.OtOpcUa.Cluster (Akka.Hosting + cluster wiring)
src/Server/ZB.MOM.WW.OtOpcUa.Security (cookie+JWT auth, LDAP)
src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane (admin-role cluster singletons)
src/Server/ZB.MOM.WW.OtOpcUa.Runtime (per-node driver actors)
src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer (OPC UA SDK application host)
src/Server/ZB.MOM.WW.OtOpcUa.AdminUI (Razor class library)
src/Server/ZB.MOM.WW.OtOpcUa.Host (single fused web binary)
Each project sets TreatWarningsAsErrors=true in its own csproj (per the
Directory.Build.props deviation note in the previous commit). NuGetAuditSuppress
entries cover transitive vulnerability advisories the new strictness surfaces:
- GHSA-g94r-2vxg-569j (OpenTelemetry.Api 1.9.0 via Akka.Cluster.Hosting/Tools)
- GHSA-h958-fxgg-g7w3 (Opc.Ua.Core 1.5.374.126 via OpcUaServer)
- GHSA-37gx-xxp4-5rgx + GHSA-w3x6-4m5h-cxqf (legacy advisories already accepted)
OpcUaServer pins OPCFoundation.NetStandard.Opc.Ua.Configuration to 1.5.374.126
via VersionOverride to match Opc.Ua.Server's transitive Opc.Ua.Core (same
constraint as the legacy Server project).
Runtime does NOT project-reference any concrete Driver.* assemblies; drivers
load reflectively at runtime (Phase 6). Runtime gets the IDriver contract
through Core.Abstractions instead.
Host's Microsoft.Extensions.Hosting.WindowsServices is conditional on the
Windows OS so the project builds on macOS dev machines.
Build verification: dotnet build -> 438 warnings (all pre-existing xUnit1051
in legacy Server.Tests/Admin.Tests), 0 errors. Closes Task 9 (build green
smoke check, no separate commit).