167 Commits

Author SHA1 Message Date
Joseph Doherty 05a0596fb1 feat(host): F9b RoslynScriptedAlarmEvaluator + #107 close engine DI
v2-ci / build (push) Failing after 39s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
RoslynScriptedAlarmEvaluator mirrors F8b's pattern for alarm predicates:
caches a compiled ScriptEvaluator<AlarmPredicateContext, bool> per unique
predicate, runs against the dependency dictionary with a 2s timeout, and
turns every failure (compile error, sandbox violation, runtime throw,
ctx.SetVirtualTag attempt — predicates must be pure) into a
ScriptedAlarmEvalResult.Failure. ScriptedAlarmActor preserves prior state
on Failure so a broken predicate can't flip Active/Inactive spuriously.

Program.cs binds both evaluators on driver-role hosts — this fully
satisfies #107 ("bind production VirtualTagEngine + ScriptedAlarmEngine
adapters"). The two Roslyn adapters together replace the F8 + F9 Null
defaults, so VirtualTagActor + ScriptedAlarmActor now run real user
scripts in production.

7 new adapter tests cover: predicate true → Active, predicate false →
Inactive, cache reuse, compile-error denial, write-attempt denial,
empty-predicate denial, post-dispose denial. Host.IntegrationTests now
17/17 green.

Closes #80 + #107. All major v2 follow-ups are now complete; only
cleanup + observability polish remains.
2026-05-26 10:58:04 -04:00
Joseph Doherty 219d10a22d feat(host): F8b RoslynVirtualTagEvaluator — production virtual-tag eval
RoslynVirtualTagEvaluator wraps Core.Scripting.ScriptEvaluator + Core
.VirtualTags.VirtualTagContext into a single-tag IVirtualTagEvaluator
adapter. Caches the compiled ScriptEvaluator per unique expression so
the second-and-onwards Evaluate is an in-process method call against the
dependency dictionary. Compile/sandbox/runtime errors all surface as
VirtualTagEvalResult.Failure rather than propagating exceptions through
the VirtualTagActor message loop.

Single-tag scope: cross-tag ctx.SetVirtualTag writes are dropped + logged
because fan-out between actors is owned by DependencyMuxActor. Cycle
detection + cascade ordering stay in Core.VirtualTags.VirtualTagEngine
where they belong (loaded fleet-wide); this adapter keeps the actor
message handler simple.

Host adds Core.Scripting + Core.VirtualTags project refs, plus a
TargetWarningsAsErrors NU1608 suppression — Microsoft.CodeAnalysis.CSharp
.Scripting 4.12.0 pins Common to 4.12.0 but ASP.NET Core transitively
brings Microsoft.CodeAnalysis.Common 5.0.0; the surface we use is stable
across the drift (verified by Core.Scripting.Tests).

Program.cs binds RoslynVirtualTagEvaluator → IVirtualTagEvaluator on
driver-role hosts, replacing the F8-default NullVirtualTagEvaluator so
VirtualTagActor evaluates real user scripts at runtime.

6 new adapter tests cover: simple expression sums, cache reuse across
calls, compile-error denial, runtime-throw denial, empty-expression
denial, post-dispose denial. Host.IntegrationTests now 10/10 green.

Closes #79. F9b + #107 next.
2026-05-26 10:55:56 -04:00
Joseph Doherty 607dc51dec feat(opcua): #85 UNS Area/Line/Equipment folder hierarchy in SDK
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Phase7Composer now carries UnsAreaProjection + UnsLineProjection lists so
the applier can materialise the full UNS topology in the OPC UA address
space. New IOpcUaAddressSpaceSink.EnsureFolder(folderNodeId, parentNodeId,
displayName) seam (no-op default, recorded in tests, forwarded by
DeferredAddressSpaceSink, implemented by SdkAddressSpaceSink). The SDK-
side OtOpcUaNodeManager gains an EnsureFolder API that creates
FolderState nodes with proper parent linkage; RebuildAddressSpace now
clears folders too so re-applies don't accumulate stale topology.

Phase7Applier.MaterialiseHierarchy walks composition.UnsAreas →
composition.UnsLines → composition.EquipmentNodes, calling EnsureFolder
with the correct parent at each level. Idempotent — calling twice with
the same composition is a no-op. OpcUaPublishActor.HandleRebuild invokes
it after Phase7Applier.Apply so OPC UA clients browsing the server now
see Area/Line/Equipment as proper folders rather than flat tag ids.

DeploymentArtifact.ParseComposition reads UnsAreas + UnsLines from the
JSON snapshot the ControlPlane emits, populating the new fields when
present.

Phase7Composer.Compose now accepts UnsAreas + UnsLines; a 3-arg overload
preserves the old signature for legacy callers + existing tests. The
Phase7CompositionResult convenience ctor likewise keeps the planner
tests working without UNS data.

3 new hierarchy tests (pure unit + boot-verify against a real
OtOpcUaSdkServer); OpcUaServer suite is 48/48 green (was 45, +3),
Runtime 74/74 unchanged.

Closes #85.
2026-05-26 10:48:56 -04:00
Joseph Doherty 9d86287d08 test(opcua): Task 60 ServiceLevel end-to-end through SDK
v2-ci / build (push) Failing after 49s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Boots a real StandardServer + OpcUaApplicationHost, wires
SdkServiceLevelPublisher into a DeferredServiceLevelPublisher (production
binding pattern), spawns OpcUaPublishActor against the deferred
publisher, sends RedundancyStateChanged snapshots, and asserts that
ServerObject.ServiceLevel.Value reflects the role-derived byte:

  Primary + RoleLeaderForDriver  → 240
  Secondary                      → 100

Together with the F13b endpoint-security tests (which already verify
ServerConfiguration.SecurityPolicies populates the three baseline
profiles), this closes Task 60's "dual-endpoint + ServiceLevel" scope.
Cross-node failover tests stay in the 2-node integration harness
(Task 59 / FailoverScenarioTests).

Runtime suite now 74 / 74 green (+2). Closes Task 60.
2026-05-26 10:40:58 -04:00
Joseph Doherty 2697af31d1 feat(opcua,host): #81 ServiceLevel SDK publisher
SdkServiceLevelPublisher writes Server.ServiceLevel through the SDK's
ServerObjectState — the standard OPC UA non-transparent-redundancy signal
clients use to pick a primary. Writes are guarded by DiagnosticsLock so
concurrent SDK diagnostics scans don't fight with our updates.

DeferredServiceLevelPublisher mirrors the DeferredAddressSpaceSink late-
binding pattern: Akka actors resolve IServiceLevelPublisher at construction,
hosted service swaps the SDK publisher in after StandardServer.Start. Host
Program.cs registers DeferredServiceLevelPublisher as the singleton bound
to IServiceLevelPublisher; OtOpcUaServerHostedService gets it injected and
fills it once IServerInternal is available.

Tests boot a real StandardServer on a free port (cross-platform), call
Publish, then verify ServerObject.ServiceLevel.Value reflects the write.
5 new tests; OpcUaServer suite now 45/45 green (was 40, +5).

Closes #81 residual. Unblocks Task 60 (OPC UA dual-endpoint + ServiceLevel
tests).
2026-05-26 10:37:42 -04:00
Joseph Doherty 52997ee164 feat(observability): F13d Prometheus + OpenTelemetry instrumentation
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter
+ ActivitySource so all instrumentation points emit through a single
named surface. Counters cover the hot paths:

  otopcua.deploy.applied               (outcome=ack|reject)
  otopcua.deploy.apply.duration        (s, histogram)
  otopcua.driver.lifecycle             (event=spawn|spawn_stub|stop|fault)
  otopcua.virtualtag.eval              (outcome=ok|fail|skip)
  otopcua.scriptedalarm.transition     (state=activated|acknowledged|cleared)
  otopcua.opcua.sink.write             (kind=value|alarm|rebuild)
  otopcua.redundancy.service_level_change (level=byte)

Plus two ActivitySource spans:

  otopcua.deploy.apply                 wraps DriverHostActor.ApplyAndAck
  otopcua.opcua.address_space_rebuild  wraps OpcUaPublishActor.HandleRebuild

Instruments are no-op until a listener attaches, so tests + dev hosts
pay nothing for unread telemetry.

Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter
+ ActivitySource to OpenTelemetry, attaches a Prometheus exporter) and
MapOtOpcUaMetrics() (mounts /metrics scrape endpoint). Driver-side
internals + ASP.NET request metrics deliberately stay off — the scrape
payload is scoped to OtOpcUa signals only.

Tests use MeterListener + ActivityListener to verify
VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and
RebuildAddressSpace actually emit on the central instruments. Runtime
suite is 72 / 72 green (+3).

Closes #105. Path A (F13b/c/d) complete; next batch options: #85 UNS
folder hierarchy in SDK, or F8b/F9b production engine bindings.
2026-05-26 10:29:40 -04:00
Joseph Doherty 21eac21409 feat(opcua,host): F13c LDAP-bound UserName validator
Adds IOpcUaUserAuthenticator seam in OpcUaServer.Security with a deny-all
NullOpcUaUserAuthenticator default. OpcUaApplicationHost subscribes to
SessionManager.ImpersonateUser after _application.Start so UserName tokens
flow through the authenticator and either attach a UserIdentity to the
session (Allow) or set IdentityValidationError = BadIdentityTokenRejected
(Deny / authenticator exception). Anonymous + X509 tokens fall through to
SDK defaults.

LdapOpcUaUserAuthenticator (Host project) bridges to the same
ILdapAuthService that AddOtOpcUaAuth uses for Admin cookies / JWT, so a
single LDAP source-of-truth governs both Admin control plane and OPC UA
data plane. Program.cs registers LdapOptions + LdapAuthService +
IOpcUaUserAuthenticator on driver-role hosts; admin-only nodes are
unchanged.

OtOpcUaServerHostedService threads the resolved authenticator into
OpcUaApplicationHost so the seam respects Host DI.

10 new tests: 6 in OpcUaServer.Tests cover the pure HandleImpersonation
static method (success / denial / anonymous fallthrough / authenticator-
throw / null-username / Null authenticator); 4 in Host.IntegrationTests
cover the LdapOpcUaUserAuthenticator adapter (LDAP allow → Allow with
roles, LDAP deny → Deny, exception → backend-error denial, display-name
fallback). OpcUaServer suite is 40 / 40 green.

Closes #104. Unblocks Task 60 (dual-endpoint + ServiceLevel tests) once
#81 residual lands.
2026-05-26 10:21:37 -04:00
Joseph Doherty 8b08566f41 feat(opcua): F13b endpoint security profiles — Sign + SignAndEncrypt
OpcUaApplicationHost.BuildConfigurationAsync now populates
ServerConfiguration.SecurityPolicies + UserTokenPolicies from the new
OpcUaSecurityProfile enum on OpcUaApplicationHostOptions. Defaults expose
all three baseline profiles (None + Basic256Sha256-Sign +
Basic256Sha256-SignAndEncrypt) matching docs/security.md. UserName tokens
are SDK-encrypted with the server cert so they work on None endpoints too;
F13c will plug the LDAP validator into SessionManager.

AutoAcceptUntrustedClientCertificates surfaces as an option for dev flows;
production keeps the default (false) and operators promote rejected certs
through the Admin UI.

InternalsVisibleTo added so BuildSecurityPolicies / BuildUserTokenPolicies
stay encapsulated but unit-testable. 6 new tests cover the pure builders +
two boot-verify cases (3-profile default + hardened single-profile),
bringing the suite to 34 / 34 passing.

Closes #103. Unblocks #104 (F13c LDAP user-token validator).
2026-05-26 10:15:04 -04:00
Joseph Doherty 50787823d3 feat(host,runtime): #108 Host DI bindings — OPC UA server + deferred sink
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Wires the OPC UA SDK into the fused Host's lifecycle on driver-role
nodes + spawns OpcUaPublishActor with the proper sink/publisher/dbFactory/
applier resolution. The full read+write data path is now live in
production: Deploy → DriverHost → OpcUaPublish → SDK NodeManager →
subscribed OPC UA clients.

DeferredAddressSpaceSink (Commons.OpcUa):
  - Thread-safe wrapper IOpcUaAddressSpaceSink that delegates to an
    inner sink swapped in at runtime. Needed because Akka actors
    resolve the sink at construction time, but the production sink
    (SdkAddressSpaceSink wrapping OtOpcUaNodeManager) only exists
    after the SDK StandardServer has started.
  - Defaults to NullOpcUaAddressSpaceSink so calls before swap are
    safe; SetSink(null) reverts (for graceful shutdown).

OtOpcUaServerHostedService (Host.OpcUa):
  - IHostedService that owns the OPC UA SDK lifecycle. Reads
    OpcUaApplicationHostOptions from the 'OpcUa' config section,
    creates an OtOpcUaSdkServer, boots it through OpcUaApplicationHost,
    then swaps a real SdkAddressSpaceSink into the DeferredAddressSpaceSink
    singleton.
  - SDK boot failure is logged + non-fatal — the rest of the host
    (admin UI, driver actors) keeps running. Stop reverts to null sink.

WithOtOpcUaRuntimeActors (Runtime):
  - Now spawns OpcUaPublishActor (new actor) + threads its ActorRef
    into DriverHostActor's Props so successful applies trigger the
    address-space rebuild pipeline.
  - Phase7Applier is constructed here from the resolved sink + a
    logger; OpcUaPublishActor takes both.
  - Prepends the opcua-synchronized-dispatcher HOCON so the extension
    is self-contained — consumers (Host, tests) don't need to redeclare
    the dispatcher block.
  - New OpcUaPublishActorKey + OpcUaPublishActorName for actor-registry
    resolution.
  - AddOtOpcUaRuntime now also TryAddSingleton's NullOpcUaAddressSpaceSink
    + NullServiceLevelPublisher so admin-only nodes (or tests that
    don't bind the Deferred sink) stay safe.

Host.Program.cs (driver-role only):
  - Binds DeferredAddressSpaceSink as singleton + as IOpcUaAddressSpaceSink
  - AddHostedService<OtOpcUaServerHostedService>()

Tests: OpcUaServer 24 -> 28 (+4 DeferredAddressSpaceSink unit tests),
Runtime 69 -> 69 (existing ServiceCollectionExtensionsTests extended
to verify the new mux + publish actor registration).

All 6 v2 test suites green: 177 tests passing.

Closes #108. Engine-wiring is now production-bound end-to-end on
driver-role nodes — Deploy reaches real OPC UA Variable nodes that
subscribed clients see.
2026-05-26 10:02:15 -04:00
Joseph Doherty 7e22e2250c feat(runtime): #109 OpcUaPublishActor — load artifact, compose, plan-diff, apply
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Closes the loop between F10b (SDK NodeManager) and F14 (Phase7Plan +
Phase7Applier). DriverHostActor's successful apply now triggers a
RebuildAddressSpace on the publish actor, which loads the latest
deployment artifact + walks composer → planner → applier through the
sink. The OPC UA address space tracks the deployed composition.

DeploymentArtifact:
  - New ParseComposition(blob) → Phase7CompositionResult that decodes
    Equipment + DriverInstance + ScriptedAlarm arrays into the
    projection records Phase7Planner consumes. Pascal-case property
    names mirror ConfigComposer.SnapshotAndFlattenAsync's output.
  - Each entity reader is tolerant: missing-id rows are dropped,
    natural-key sort matches Phase7Composer's contract.

OpcUaPublishActor:
  - New Props params: dbFactory + applier. When wired, RebuildAddressSpace
    does:
      1. LoadLatestArtifact (most recent Sealed Deployment.ArtifactBlob)
      2. ParseComposition → Phase7CompositionResult
      3. Phase7Planner.Compute(lastApplied, next) → Phase7Plan
      4. Empty plan ⇒ no-op (deploy of unchanged composition is benign)
      5. applier.Apply(plan) drives sink.RebuildAddressSpace +
         WriteAlarmState for removed nodes
      6. lastApplied = next so the next rebuild diffs forward
  - Without dbFactory/applier wiring, falls back to raw
    sink.RebuildAddressSpace — the dev/Mac path before #108 binds prod.

DriverHostActor:
  - New Props param opcUaPublishActor (IActorRef?). After successful
    ApplyAndAck (status Applied, ACK sent), tells the publish actor
    RebuildAddressSpace with the same correlation id so the audit trail
    threads through. Null publish actor ⇒ no trigger (admin-only nodes).

Tests: Runtime 63 -> 69 (+6):
- ParseComposition reads Equipment/Driver/Alarm sorted by natural key
- ParseComposition returns empty for empty blob
- Rebuild with dbFactory + sealed deployment artifact triggers exactly
  one sink.Rebuild call (Equipment topology added)
- Rebuild with no artifact is idempotent no-op
- Second rebuild with same composition is empty-plan no-op
- Rebuild without dbFactory falls back to raw sink.Rebuild (legacy path)

All 6 v2 test suites green: 173 tests passing.

Closes #109. Engine-wiring data flow is now end-to-end through:
  Deploy → DriverHostActor.ApplyAndAck → driver spawn + ACK +
    RebuildAddressSpace → OpcUaPublishActor → Phase7Applier → SDK
    NodeManager → subscribed OPC UA clients see the change.
2026-05-26 09:55:11 -04:00
Joseph Doherty d21f6947e1 feat(opcua): F10b SDK NodeManager binding — real OPC UA address-space writes
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
OtOpcUaNodeManager + SdkAddressSpaceSink: the v2 IOpcUaAddressSpaceSink
seam now has a production adapter against a real Opc.Ua.Server
CustomNodeManager2. Writes through OpcUaPublishActor's sink materialise
as real OPC UA Variable updates that subscribed clients see via the
standard ClearChangeMasks notification path.

OtOpcUaNodeManager (CustomNodeManager2):
  - Owns a ConcurrentDictionary<string, BaseDataVariableState> under a
    single namespace (https://zb.com/otopcua/ns) hung off Objects/.
  - WriteValue lazy-creates the variable on first write, sets Value +
    StatusCode (mapped from OpcUaQuality severity bits) + SourceTimestamp,
    then ClearChangeMasks to notify subscribers.
  - WriteAlarmState surfaces a [active, acknowledged] pair on a
    dedicated node id — full AlarmConditionState/event firing comes
    with #85 F14b (EquipmentNodeWalker SDK integration).
  - RebuildAddressSpace tears down every registered variable + clears
    the dictionary so the next write-pass starts fresh.
  - Address-space root folder is materialised in CreateAddressSpace.

SdkAddressSpaceSink: thin IOpcUaAddressSpaceSink → OtOpcUaNodeManager
bridge. Production DI binding (#108) constructs this once the host's
StandardServer has booted.

OtOpcUaSdkServer (StandardServer subclass): overrides
CreateMasterNodeManager to inject OtOpcUaNodeManager via the
MasterNodeManager additionalManagers ctor. NodeManager property
exposes the live instance so OpcUaApplicationHost callers can wrap
it in a sink.

Tests: OpcUaServer 20 -> 24 (+4):
- WriteValue creates + updates variables in the manager
- WriteAlarmState creates a node distinct from value writes
- RebuildAddressSpace clears everything; subsequent writes start fresh
- NullOpcUaAddressSpaceSink no-op sanity

Each test boots a real OpcUaApplicationHost on a free port with the
SDK certificate auto-create flow (F13a) intact — full integration
slice on macOS.

All 6 v2 test suites green: 167 tests passing.

F10 status updated to reflect SDK binding shipped. Residuals:
- #109 OpcUaPublishActor.RebuildAddressSpace → Phase7Applier wiring
- #108 Host DI default to SdkAddressSpaceSink when hasDriver
- #85 F14b EquipmentNodeWalker integration (proper AlarmConditionState
  + folder hierarchy)
- IServiceLevelPublisher SDK binding (writes Server.ServiceLevel node)
2026-05-26 09:49:44 -04:00
Joseph Doherty 7fa863f6da feat(runtime): #113 DependencyMuxActor — drivers → virtual-tag fan-out
v2-ci / build (push) Failing after 36s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
End-to-end data path is now wired on the read side: driver subscriptions
fire AttributeValuePublished → DriverHostActor → DependencyMuxActor →
DependencyValueChanged to every interested VirtualTagActor. Previously
the publish hit a dead-letter at the host.

DependencyMuxActor:
  - Per-node fan-out router. Maintains tagRef → Set<IActorRef> with a
    reverse subscriber → refs index so unregister/replace are O(refs).
  - Watches subscribers; Terminated triggers automatic unregister so
    dead virtual-tag actors stop receiving publishes.
  - Re-register replaces the prior interest set — no stale-ref leaks
    on actor restart.
  - Drops publishes for refs with no interested subscribers.

VirtualTagActor:
  - New Props params: dependencyRefs + mux ActorRef.
  - PreStart sends RegisterInterest to the mux; PostStop sends
    UnregisterInterest. Default both null so older callers stay quiet.

DriverHostActor:
  - New dependencyMux Props param. Steady + Applying states now
    receive AttributeValuePublished from their DriverInstance children
    and forward to the mux. Null mux is a no-op (dev/Mac).

ServiceCollectionExtensions:
  - WithOtOpcUaRuntimeActors spawns DependencyMuxActor before
    DriverHostActor and threads its ActorRef into the host's Props.
    New DependencyMuxActorKey + DependencyMuxActorName.

Tests: Runtime 57 -> 63 (+6):
- Mux forwards to only subscribers interested in each ref
- Publish for unregistered ref is dropped silently
- Unregister stops forwarding
- Re-register replaces prior interest set
- VirtualTagActor PreStart registration drives end-to-end eval
  (uses AwaitAssert to race-safely settle the PreStart Tell)
- DriverHostActor forwards AttributeValuePublished through to mux

All 6 v2 test suites green: 163 tests passing.

F8 (#79) state updated — dep subscribe seam shipped, Core.VirtualTags
production engine binding (compile + ITagUpstreamSource subscribe) is
the residual.
2026-05-26 09:43:06 -04:00
Joseph Doherty f427dc4f26 feat(runtime): #112 ScriptedAlarmActor state persistence via IAlarmActorStateStore
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
ScriptedAlarmActor now survives actor restart: PreStart loads from
the configured store + restores in-memory state; every Transition()
fires a fire-and-forget save. ActiveState still re-derives from the
evaluator on first tick (Phase 7 decision #14), but Acked state +
lastAckUser persist verbatim so operators don't re-ack across an
outage.

Three pieces:
- IAlarmActorStateStore seam in Commons.Engines, with the
  AlarmActorStateSnapshot record (alarmId / state / lastTransitionUtc
  / lastAckUser) and NullAlarmActorStateStore default.
- EfAlarmActorStateStore in Runtime.ScriptedAlarms — production
  adapter over the existing ScriptedAlarmState table in ConfigDb.
  Maps the actor's 3-state enum to the table's AckedState column
  (Active⇒Unacknowledged, Acknowledged⇒Acknowledged, Inactive⇒
  Acknowledged). Concurrency conflicts are logged + dropped — the
  next transition writes again.
- ScriptedAlarmActor PreStart load (async, piped back as
  StateRestored) + Transition save. New Props overload takes the
  store; default is NullAlarmActorStateStore so tests stay quiet.

Tests: Runtime 52 -> 57 (+5):
- Transition writes Active then Acknowledged snapshots with
  lastAckUser populated
- PreStart with persisted Active state restores so a subsequent
  AcknowledgeAlarm fires (not ignored as it would be from Inactive)
- Empty store boots Inactive (AcknowledgeAlarm correctly ignored)
- EfAlarmActorStateStore Save + Load round-trips via in-memory EF
- Load for unknown alarmId returns null

All 6 v2 test suites green: 157 tests passing.

Closes #112. F9 (#80) remaining residual is predicate binding to
Core.ScriptedAlarms.ScriptedAlarmEngine — split as F9b in tasks JSON.
2026-05-26 09:34:37 -04:00
Joseph Doherty 3e3f7588bd feat(runtime,host): close F7 — driver subscribe + write paths + Host DI
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Three pieces landed in one batch, closing F7-residual + Host DI #106:

Runtime/DriverInstanceActor:
  - Subscribe / Unsubscribe message contracts; the Connected state
    handles them via IDriver.ISubscribable. On every OnDataChange
    event the actor publishes AttributeValuePublished to its parent
    (DriverHostActor → OpcUaPublishActor). OPC UA StatusCode is
    mapped to the 3-state OpcUaQuality enum via severity bits
    (00=Good, 01=Uncertain, 10/11=Bad).
  - DetachSubscription tears the handler off the driver on
    DisconnectObserved, Unsubscribe, and PostStop so a stale handler
    never pushes to a dead actor.
  - WriteAttribute now dispatches IWritable.WriteAsync (batch of one)
    with a 5s CancellationTokenSource; status-code propagated to
    WriteAttributeResult on non-Good results.

Host:
  - New ProjectReferences to Core + every cross-platform driver
    assembly (AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT).
    Galaxy is net10 (gRPC client to mxaccessgw); the COM-bound net48
    Wonderware Historian driver stays out of the Host's reference
    closure — its .Client gRPC wrapper is what binds for historian
    needs.
  - New DriverFactoryBootstrap.AddOtOpcUaDriverFactories() registers
    a singleton DriverFactoryRegistry, invokes each driver's
    Register(registry, loggerFactory), and binds IDriverFactory to
    DriverFactoryRegistryAdapter. Replaces the F7 NullDriverFactory
    default so deploys actually materialise real IDriver instances
    on driver-role nodes. ShouldStub() still gates per-platform
    behaviour at spawn time.
  - Program.cs wires AddOtOpcUaDriverFactories() before AddAkka so
    the runtime extension can resolve IDriverFactory from DI.

Tests: Runtime 46 -> 52 (+6):
- Write returns success when StatusCode = Good
- Write propagates non-Good status code in failure Reason
- Subscribe forwards OnDataChange to parent as AttributeValuePublished
- Quality translation: Uncertain (0x40...) and Bad (0x80...)
- Subscribe against non-ISubscribable returns failure
- DisconnectObserved detaches handler so late events are dropped

All 6 v2 test suites green: 152 tests passing.

Closes F7. F7-residual sub-tasks #110 (subscribe) and #111 (write)
both shipped. Host DI binding #106 shipped.
2026-05-26 09:28:34 -04:00
Joseph Doherty c02f016f1d feat(opcua): F14 Phase7Plan + Phase7Applier
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Splits the side-effecting half of Phase7Composer (deferred at Task 47)
into two pieces that mirror DriverHostActor's spawn-plan pattern:

Phase7Plan + Phase7Planner.Compute (pure):
  Diff two Phase7CompositionResult snapshots by stable id (EquipmentId,
  DriverInstanceId, ScriptedAlarmId). Emits Added/Removed/Changed lists
  per entity class. Added/Removed are sorted by id for deterministic
  apply order. Changed wraps both Previous and Current projections so
  consumers can decide between in-place mutation and tear-down +
  rebuild.

Phase7Applier (side-effecting):
  Drives an IOpcUaAddressSpaceSink against a plan. Removed equipment/
  alarms get an inactive AlarmState write per id; Added/Removed of
  Equipment or ScriptedAlarm triggers RebuildAddressSpace. Driver-only
  changes correctly skip the rebuild — those flow through DriverHost-
  Actor's spawn-plan in Runtime. Sink exceptions are caught + logged so
  one bad node doesn't abort the apply.

Tests: OpcUaServer 6 -> 20 (+14):
- Phase7PlannerTests x9 (empty-in/empty-out, add/remove/change per
  entity class, mixed changes, deterministic ordering)
- Phase7ApplierTests x5 (empty plan no-op, removal writes inactive
  states + rebuild, added equipment triggers rebuild, driver-only
  skips rebuild, sink fault is non-fatal)

The remaining piece is the EquipmentNodeWalker integration against a
real SDK NodeManager — split as F14b, gated on F10b's SDK builder.

All 6 v2 test suites green: 146 tests passing.
2026-05-26 09:16:08 -04:00
Joseph Doherty a1325299ce feat(runtime): F10 OpcUaPublishActor sink seams + redundancy-driven ServiceLevel
OpcUaPublishActor now routes through pluggable seams instead of just
incrementing a counter:

- IOpcUaAddressSpaceSink (Commons.OpcUa) — WriteValue / WriteAlarmState
  / RebuildAddressSpace. OpcUaQuality enum moved here from the actor's
  nested type so producers don't have to reference the actor itself.
- IServiceLevelPublisher — Publish(byte). NullServiceLevelPublisher
  retains the last level for inspection.
- The actor subscribes to the redundancy-state DPS topic in PreStart
  and maps the local node's NodeRedundancyState to a coarse
  ServiceLevel (Primary+leader=240, Primary=200, Secondary=100,
  Detached=0). This keeps the local SDK's ServiceLevel node honest
  without round-tripping back through the admin-singleton calculator.
- ServiceLevelChanged dedupes identical levels so the SDK doesn't see
  redundant writes.
- Sink + publisher exceptions are caught and logged; the actor never
  crashes its own dispatcher.
- PropsForTests gets optional sink/publisher/localNode params and
  skips the DPS subscribe so unit tests stay on a vanilla TestKit
  cluster.

Production binding to a real SDK NodeManager + Variable nodes is the
remaining residual — split as F10b. Task 60 still blocked on F10b.

Tests: Runtime 40 -> 46 (+6):
- AttributeValueUpdate routes to sink
- AlarmStateUpdate routes to sink
- RebuildAddressSpace calls sink.Rebuild
- ServiceLevelChanged dedupes
- RedundancyStateChanged for primary-leader publishes 240
- RedundancyStateChanged for secondary publishes 100

All 6 v2 test suites green: 132 tests passing.
2026-05-26 09:10:55 -04:00
Joseph Doherty 14fb2b05ed feat(runtime): F8/F9 engine evaluator seams + DPS fan-out
VirtualTagActor and ScriptedAlarmActor now route through pluggable
evaluator interfaces and fan out to the cluster's live-tail topics
shipped in F15.3:

- IVirtualTagEvaluator + NullVirtualTagEvaluator in Commons.Engines.
  VirtualTagActor calls evaluator on every DependencyValueChanged,
  dedupes unchanged values, forwards EvaluationResult to its parent,
  and publishes ScriptLogEntry Warning to the script-logs DPS topic
  whenever the evaluator fails.

- IScriptedAlarmEvaluator + NullScriptedAlarmEvaluator. ScriptedAlarmActor
  takes an AlarmConfig (id/name/equipment-path/severity/predicate) and
  publishes both an AlarmTransitionEvent (alerts topic) and a
  ScriptLogEntry (script-logs topic) at every transition. Manual
  ConditionMet/Acknowledge/Cleared still flow through the same
  Transition() so callers without engine bindings still drive the
  state machine; the legacy single-string Props() overload routes
  through a default AlarmConfig.

The Null* defaults keep the actors safe when no engine is bound —
unconfigured nodes never spuriously alarm. Production binding to
Core.VirtualTags.VirtualTagEngine and Core.ScriptedAlarms is the
remaining residual (F8b/F9b — split in tasks JSON).

Tests: Runtime 34 -> 40 (+6):
- VirtualTagActorTests x3 (evaluator drives EvaluationResult,
  unchanged-value dedup, failure publishes Warning ScriptLogEntry)
- ScriptedAlarmActorTests x3 (engine threshold drives Activated +
  Cleared on alerts topic, manual Acknowledge attribution).

All 6 v2 test suites green: 126 tests passing.
2026-05-26 09:05:04 -04:00
Joseph Doherty da141497f8 feat(runtime): F7 spawn lifecycle + F20 ShouldStub gate
DriverHostActor.ApplyAndAck now reads the deployment artifact and
reconciles its set of DriverInstanceActor children — spawn the missing,
ApplyDelta to those with changed config, stop the removed/disabled.
The diff lives in pure DriverSpawnPlanner so it can be unit-tested
without an ActorSystem.

Adds IDriverFactory in Core.Abstractions (consumed by Runtime) +
DriverFactoryRegistryAdapter in Core.Hosting that wraps the existing
v1 DriverFactoryRegistry — Runtime stays decoupled from Polly/Serilog,
the Host wires the adapter once driver assemblies have registered.

ShouldStub(type, roles) is now actually called on every spawn — Galaxy
+ Wonderware-Historian boot stubbed on macOS/Linux or whenever the host
carries the dev role. Missing factory ⇒ stub fallback, never a crash.

Tests: 24 → 34 in Runtime (+10):
- DriverSpawnPlannerTests x7 (diff cases, type change ⇒ stop+respawn)
- DeploymentArtifactTests  x5 (empty/malformed/missing fields tolerant)
- DriverHostActorReconcileTests x4 (spawn count, stub fallback,
  ShouldStub gate, second-apply stops the removed)
All 6 v2 test suites green: 120 tests passing.

Closes F20 (ShouldStub wired). F7 marked partial — subscription
publishing + write path still stubbed in DriverInstanceActor itself.
2026-05-26 08:57:16 -04:00
Joseph Doherty 9892ceae9a docs(plans): mark F15.3 complete — F15 fully shipped
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 08:39:47 -04:00
Joseph Doherty 59858129cb feat(adminui): F15.3 closes F15 — live alerts/script-log, CSV import, Monaco editor
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Final F15 batch wires up the SignalR-backed live pages, ports the bulk
equipment importer, and progressively enhances the Script source editor
with Monaco.

Message contracts:
- Commons.Messages.Alerts.AlarmTransitionEvent — fires on every alarm
  state transition; published on the `alerts` DPS topic by future
  ScriptedAlarmActor (F9) emits.
- Commons.Messages.Logging.ScriptLogEntry — one log line emitted by a
  hosted script; published on the `script-logs` DPS topic by future
  VirtualTagActor (F8) + ScriptedAlarmActor (F9) emits.
  (Folder named "Logging" to dodge .gitignore's "logs/" rule.)

SignalR plumbing:
- AlertHub gains MethodName + bridge actor (AlertSignalRBridge)
- ScriptLogHub introduced; ScriptLogSignalRBridge follows the same
  DPS-subscribe → IHubContext fan-out pattern as FleetStatusSignalRBridge
- WithOtOpcUaSignalRBridges now spawns all three bridges
- MapOtOpcUaHubs maps /hubs/script-log alongside the existing hubs

Pages:
- /alerts                      live alarm tail, 200-row capacity
- /script-log                  live script-log tail with level + script
                               filter, 500-row capacity
- /clusters/{id}/equipment/import — CSV bulk Equipment add with preview
                                    (Name/MachineCode/UnsLineId/Driver +
                                    optional ZTag/SAPID/Manufacturer/Model;
                                    skips rows whose MachineCode already
                                    exists in the fleet)
- ScriptEdit progressively enhanced with Monaco editor via JSInterop —
  the textarea remains Blazor's source of truth and Monaco syncs into it
  on every keystroke so @bind keeps working; falls back gracefully if
  the CDN is unreachable.

MainLayout nav gains a "Live" section (Deployments, Alerts, Alarms
historian) and a "Scripts" link under Scripting. ClusterEquipment
surfaces the new Import CSV button.

Tally: F15 ships ~42 razor pages + 3 SignalR hubs + 3 bridge actors.
Microsoft.AspNetCore.SignalR.Client added (was already in central PM).

All 104 v2 tests remain green.
2026-05-26 08:39:17 -04:00
Joseph Doherty e248e037e7 docs(plans): mark F15 complete — read views + live-edit CRUD
v2-ci / build (push) Failing after 39s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 08:28:13 -04:00
Joseph Doherty ae980aef5d feat(adminui): F15.2 batch 4 — closes live-edit forms (Acl/VirtualTag/ScriptedAlarm/Script)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Final batch of F15.2. After this commit every entity surfaced by the
Phase A-D read views has a matching new/edit/delete form.

- AclEdit.razor                /clusters/{id}/acls/{new|aclId}
  - NodePermissions [Flags] enum surfaced as per-bit checkboxes plus
    one-click bundle buttons (ReadOnly / Operator / Engineer / Admin)
  - ScopeKind select + ScopeId free-text target (null = cluster-wide)
- VirtualTagEdit.razor         /virtual-tags/{new|virtualTagId}
  - Trigger validation: enforces at least one of ChangeTriggered or
    TimerIntervalMs is set
- ScriptedAlarmEdit.razor      /scripted-alarms/{new|scriptedAlarmId}
  - AlarmType select with OPC UA Part 9 subtypes
  - MessageTemplate is a textarea (template tokens are server-resolved)
- ScriptEdit.razor             /scripts/{new|scriptId}
  - SHA-256 hash computed from SourceCode on save (operator never sees
    or edits SourceHash directly)
  - InputTextArea now; Monaco syntax editor is a future enhancement

List pages (ClusterAcls / VirtualTags / ScriptedAlarms / Scripts) all
gain New + per-row Edit affordances.

Tally: F15.2 shipped CRUD for 11 entities — Cluster, ClusterNode,
UnsArea, UnsLine, Namespace, DriverInstance, Equipment, Tag, NodeAcl,
VirtualTag, ScriptedAlarm, Script.

All 9 integration tests still green.
2026-05-26 08:27:56 -04:00
Joseph Doherty 2662ac08e4 feat(adminui): F15.2 batch 3 — Equipment + Tag CRUD (operator surfaces)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
The two most-edited entities for daily operator workflows. Both follow the
same single-page edit-or-create pattern from batches 1 + 2 with RowVersion
optimistic concurrency.

- EquipmentEdit.razor   /clusters/{id}/equipment/{new|EquipmentId}
  - EquipmentId is system-generated on create (decision #125): EQ-{first
    12 hex chars of a new EquipmentUuid}.
  - UNS line + driver instance selects are scoped to the cluster.
  - All 9 OPC 40010 identification fields surfaced as an optional panel.
  - MachineCode uniqueness checked client-side before EF unique index
    enforces it server-side.
- TagEdit.razor         /clusters/{id}/tags/{new|TagId}
  - Equipment vs FolderPath input switches based on the selected
    driver's namespace kind — Equipment-kind requires EquipmentId,
    SystemPlatform-kind requires FolderPath (decision #110 invariant
    enforced client-side; sp_ValidateDraft re-enforces server-side at
    deploy).
  - DataType select uses the OPC UA built-in primitive type names.
  - TagConfig validated as JSON pre-flight.

ClusterEquipment + ClusterTags list pages get New / Edit affordances.

All 9 integration tests still green.
2026-05-26 08:22:51 -04:00
Joseph Doherty 45740578c9 feat(adminui): F15.2 batch 2 — topology entity CRUD
v2-ci / build (push) Failing after 52s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Same single-page edit-or-create pattern as batch 1, applied to the
foundational topology entities. After this batch the whole hierarchy
(cluster → nodes → UNS areas → UNS lines → namespaces → drivers) is
fully editable through the UI.

- ClusterEdit.razor                  /clusters/{id}/edit
  Update + delete for an existing cluster. NodeCount stays coupled to
  RedundancyMode (None→1, Warm/Hot→2). ModifiedBy taken from
  AuthenticationStateProvider.
- NodeEdit.razor                     /clusters/{id}/nodes/{new|nodeId}
  Full ClusterNode CRUD. ApplicationUri uniqueness is enforced by EF
  index; ServiceLevelBase defaults to 200 (primary preference) on
  create; per-node DriverConfigOverridesJson validated as JSON.
- UnsAreaEdit.razor                  /clusters/{id}/uns/areas/{new|id}
- UnsLineEdit.razor                  /clusters/{id}/uns/lines/{new|id}
  UNS structure CRUD; Lines pick their parent Area from a select that
  loads the cluster's areas.

List pages updated:
- ClusterOverview now shows an "Edit cluster" button + a "New node"
  action on the nodes panel + per-row Edit buttons.
- ClusterUns gains New/Edit affordances for both Areas and Lines.

All 9 integration tests still green; no regressions.
2026-05-26 08:18:49 -04:00
Joseph Doherty 5ae67a48ba feat(adminui): F15.2 batch 1 — Namespace + DriverInstance live-edit CRUD
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Pattern proof for the live-edit forms gated by Phases A–D's read views.
Each entity gets a single edit page handling both create (route param
omitted) and update (route param present) modes, with RowVersion-based
optimistic concurrency checked against EF Core's
DbUpdateConcurrencyException.

Pattern:
- @page "/clusters/{id}/<thing>/new"
- @page "/clusters/{id}/<thing>/{rowId}"
- IsNew computed from rowId presence
- EditForm + DataAnnotations validation
- byte[] RowVersion stashed on FormModel; assigned to
  Entry(e).Property(e => e.RowVersion).OriginalValue before SaveChanges
- Delete button (edit mode only) flows through the same RowVersion check
- Concurrency conflict surfaces as an inline error panel; user reloads

This batch:
- NamespaceEdit.razor          — small entity, validates the pattern
- DriverEdit.razor             — keystone for everything downstream
                                 (Equipment/Tag/VirtualTag/ScriptedAlarm),
                                 JSON config editor per Q1 with reformat
                                 on save and validation pre-flight
- ClusterNamespaces row gains an Edit button + New action
- ClusterDrivers expanded view gains an Edit button + New action

Equipment/UnsArea/UnsLine/Tag/ACL/VirtualTag/ScriptedAlarm/Script forms
follow this same template in subsequent F15.2 batches.

All 9 integration tests still green; no v2 test regressions.
2026-05-26 08:14:36 -04:00
Joseph Doherty d055cb059e docs(plans): mark F15 partial — Phases A–D shipped
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 08:02:02 -04:00
Joseph Doherty 74161f9460 feat(adminui): F15 Phase D — logic + ops pages
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
v2-ci / build (push) Has been cancelled
- ClusterAudit (/clusters/{id}/audit) — reads ConfigAuditLog with the
  EventId/CorrelationId columns added in F3; shown as a Cluster tab
- VirtualTags (/virtual-tags)            — fleet-wide read view
- ScriptedAlarms (/scripted-alarms)      — fleet-wide read view
- Scripts (/scripts)                     — fleet-wide; expandable code preview
- RoleGrants (/role-grants)              — per Q4, surfaces the fleet-wide
                                           LDAP-group → role mapping from
                                           Authentication:Ldap:GroupToRole
                                           (read-only; reload via host restart)
- Certificates (/certificates)           — own/trusted/issuer/rejected store
                                           contents resolved against
                                           OpcUa:PkiStoreRoot config (F13a)
- Reservations (/reservations)           — ExternalIdReservation table
- AlarmsHistorian (/alarms-historian)    — live HistorianAdapterActor sink
                                           status via the F11 GetStatus query;
                                           5s polling

ScriptLog deferred (needs the F16-deferred ScriptLogHub bridge).
ClusterNav extended with the Audit tab.

Adds an AdminUI → Runtime project reference so the historian status page can
inject IRequiredActor<HistorianAdapterActorKey>. NuGet audit suppression for
the transitive Opc.Ua.Core advisory mirrored from the Runtime project.

All 104 v2 tests still green.
2026-05-26 08:01:23 -04:00
Joseph Doherty 396052a126 feat(adminui): F15 Phase C — config-tab read views (Equipment/UNS/Namespaces/Drivers/Tags/ACLs)
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Per Q3 of the rebuild plan, each v1 ClusterDetail tab becomes a separate
route under /clusters/{id}/<tab>. This batch adds read-only table views
for the six core config entity types; live-edit forms with RowVersion
concurrency land in Phase C.2 once the read-view shape is reviewed.

- ClusterEquipment    /clusters/{id}/equipment   — joins via DriverInstance
                                                   so the cluster scope works
- ClusterUns          /clusters/{id}/uns         — Areas + Lines tables
- ClusterNamespaces   /clusters/{id}/namespaces  — Kind + URI + Enabled chip
- ClusterDrivers      /clusters/{id}/drivers     — collapsed list with JSON
                                                   config expandable per Q1
                                                   (typed editors deferred)
- ClusterTags         /clusters/{id}/tags        — first 200 by name + filter
- ClusterAcls         /clusters/{id}/acls        — LDAP group + scope +
                                                   NodePermissions bits

Shared ClusterNav.razor extracted; ClusterOverview + ClusterRedundancy
updated to use it. _Imports.razor adds Components.Shared so the shared
nav is in scope across pages.
2026-05-26 07:56:39 -04:00
Joseph Doherty fd0cc4dfdb feat(adminui): F15 Phase B — cluster CRUD + Overview/Redundancy routes
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
- ClustersList (/clusters) — table view, row-click opens detail
- NewCluster (/clusters/new) — EditForm with DataAnnotations; redundancy
  mode + node-count coupling enforced client-side (None→1, Warm/Hot→2);
  CreatedBy taken from AuthenticationStateProvider
- ClusterOverview (/clusters/{id}) — cluster details + last-deployment
  badge + node list. Per Q3, the legacy 10-tab monolith is split into
  separate routes; this page hosts the Overview "tab" as its primary slot
- ClusterRedundancy (/clusters/{id}/redundancy) — static ServiceLevelBase
  config view; live ServiceLevel comes via RedundancyStateActor DPS topic
  (deferred to its own follow-up once the SignalR bridge lands)

The other 8 v1 cluster tabs (Equipment, UNS, Namespaces, Drivers, Tags,
ACLs, ScriptedAlarms, Scripts, Audit) land in Phase C/D.
2026-05-26 07:52:41 -04:00
Joseph Doherty 850d6774ea feat(adminui): F15 Phase A — shell + auth + fleet + hosts pages
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Implements Phase A of the F15 rebuild plan: minimum-viable Admin surface
with a working sign-in path and a fleet-state landing page. Decisions Q1–Q5
of docs/v2/AdminUI-rebuild-plan.md were taken as recommended.

- App.razor (moved into AdminUI library from the Host stub; vendored
  Bootstrap from RCL wwwroot — no public CDN, air-gap safe)
- Routes.razor (AuthorizeRouteView enforces page-level [Authorize])
- RedirectToLogin.razor (preserves returnUrl through the auth hop)
- Login.razor (static SSR, posts to /auth/login; Q5 wording about
  generic-vs-specific LDAP errors)
- Account.razor (identity + fleet roles + raw LDAP groups; Q4 — no
  per-cluster grants; fleet-wide LDAP-group → role mapping only)
- Fleet.razor (per-node deployment status: reads NodeDeploymentState
  + unions with IClusterRoleInfo.MembersWithRole("driver") so freshly-
  joined nodes appear as "waiting"; 10s auto-refresh)
- Hosts.razor (Akka cluster topology: members, status, roles, role-
  leader; 5s auto-refresh)

Host's stub App.razor deleted; Program.cs now points at
AdminUI.Components.App via an added using.

All 104 v2 tests remain green.
2026-05-26 07:49:35 -04:00
Joseph Doherty 5c754ecffd docs(v2): F15 UX kickoff — AdminUI rebuild plan
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
47-page legacy inventory mapped to v2 disposition (5 already done, 22 port
as-is, 7 reshape, 5 dropped because live-edit replaces draft/publish, 4
deferred driver-typed editors). Net ~30 active pages to rebuild.

Five open design questions surfaced for review before per-page work starts:
Q1 driver-typed editors (defer vs. ship), Q2 top-level fleet-wide views
(drop vs. keep), Q3 ClusterDetail tabs vs. split routes, Q4 RoleGrants
cluster-scoped vs. LDAP-group fleet-wide, Q5 Login error UX.

Proposed 4-phase sequencing (~5 days total): shell+auth+fleet, cluster
CRUD, config tabs, logic+ops. Each phase independently mergeable.
2026-05-26 07:38:58 -04:00
Joseph Doherty 68c6f36cfe docs(plans): mark F13a partial-complete (36c4751)
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:35:04 -04:00
Joseph Doherty 36c4751571 feat(opcua): F13a — cert auto-creation in OpcUaApplicationHost
Adds OPC UA SDK's CheckApplicationInstanceCertificate call to
OpcUaApplicationHost.StartAsync, removing the v1 friction of needing to
pre-create the PKI directory tree before booting.

- New OpcUaApplicationHostOptions.PkiStoreRoot (defaults to "pki")
- BuildConfigurationAsync now derives own/issuer/trusted/rejected from
  PkiStoreRoot so the cert paths are configurable + consistent
- EnsureApplicationCertificateAsync runs before StandardServer.Start, and
  fails fast with a clear message if the SDK can't produce a valid cert
- 2 new tests: fresh-tree creates a cert, second boot reuses it

Partial slice of follow-up F13. Endpoint-security, user-token validator,
and observability wiring still pending in the F13 follow-up. OpcUaServer
tests: 4 → 6.
2026-05-26 07:34:48 -04:00
Joseph Doherty 229282ad8b docs(plans): mark F21 complete (b0a2bb0)
v2-ci / build (push) Failing after 43s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:25:36 -04:00
Joseph Doherty b0a2bb037d test(integration): F21 — docker-compose + env-driven SQL/LDAP harness mode
Adds a real-infra mode for the integration test harness alongside the default
in-memory mode. Drops the previously-untested code paths (EF SqlServer
behaviors, real LDAP bind) under env-var control without breaking the
zero-infra default that CI runs.

- docker-compose.yml — minimal SQL 2022 (14331) + OpenLDAP (3894) stack
  (ports chosen to coexist with docker-dev/ on 14330/3893)
- HarnessMode record reads OTOPCUA_HARNESS_USE_SQL=1 / USE_LDAP=1 from env
- SQL mode: per-harness unique DB OtOpcUa_Harness_{guid}, EnsureCreated
  at startup, EnsureDeleted on dispose (best-effort)
- LDAP mode: drops StubLdapAuthService and configures real LdapAuthService
  against the compose'd OpenLDAP via Authentication:Ldap:* config keys
- Microsoft.EntityFrameworkCore.SqlServer added to the test project
- README documents both modes + the macOS no-Docker caveat

Default in-memory mode unchanged — all 9 existing tests still pass.
2026-05-26 07:25:16 -04:00
Joseph Doherty ba6e5dd7f9 docs(plans): mark F11 + F22 complete
v2-ci / build (push) Failing after 49s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
F22 → cd5540c (failover scenarios on TwoNodeClusterHarness)
F11 → 6861381 (HistorianAdapterActor → IAlarmHistorianSink bridge)

Branch follow-ups: 11/22 → 13/22 done. Remaining 9 are engine wiring
gated on real drivers/SDKs (F7-F10, F13-F14), Admin UI rebuild (F15),
fold-in to F7 (F20), and SQL/LDAP harness mode (F21).
2026-05-26 07:19:07 -04:00
Joseph Doherty 686138123f feat(runtime): F11 — HistorianAdapterActor wired to IAlarmHistorianSink
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Reshapes the placeholder buffered-counter actor into a thin fire-and-forget
bridge over the existing IAlarmHistorianSink contract. Default sink is
NullAlarmHistorianSink; production deployments override the DI binding to
SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the v1
components in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware*
are reused verbatim — actor is just a mailbox-friendly entry point).

- HistorianAdapterActor.Props(IAlarmHistorianSink?) — null defaults to NullAlarmHistorianSink
- Receive<AlarmHistorianEvent>: fire-and-forget sink.EnqueueAsync
- Receive<GetStatus>: returns sink.GetStatus() (queue depth + drain state)
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers the default sink
- WithOtOpcUaRuntimeActors spawns the actor + registers HistorianAdapterActorKey
- Program.cs calls AddOtOpcUaRuntime when hasDriver

Tests: 2 new (forward-to-sink + GetStatus). Runtime suite 17 → 18.
2026-05-26 07:18:08 -04:00
Joseph Doherty cd5540cb1a test(integration): F22 — failover scenario tests + harness Stop/Restart primitives
Extends TwoNodeClusterHarness with three lifecycle primitives:
- StopNodeBAsync()      — graceful CoordinatedShutdown (Cluster.Leave)
- RestartNodeBAsync()   — rebuild node B on same Akka port + same in-memory DB
- WaitForClusterSizeAsync(n) — converge assertion helper

Adds three failover scenario tests:
- Stopping node B shrinks cluster to 1 Up member
- Restarted node B rejoins on the same Akka port
- Deployment started with B down seals with a single NodeDeploymentState
  (validates ConfigPublishCoordinator.DiscoverDriverNodes snapshots
   membership at dispatch time)

Closes follow-up F22. Integration test count: 6 → 9 (+3).
2026-05-26 07:13:14 -04:00
Joseph Doherty 4e6ef648d1 docs(plans): mark F16 complete
v2-ci / build (push) Failing after 3m8s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:01:54 -04:00
Joseph Doherty f18c285cca feat(adminui): FleetStatusSignalRBridge — DPS → SignalR forwarding (F16)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
v2-ci / build (push) Has been cancelled
New per-admin-node actor that subscribes to the fleet-status DistributedPubSub
topic + forwards every FleetStatusChanged snapshot to all SignalR clients
connected to FleetStatusHub via IHubContext.

Wired via WithOtOpcUaSignalRBridges (new AkkaConfigurationBuilder extension in
AdminUI.Hubs) — Program.cs calls it inside the if(hasAdmin) block alongside
WithOtOpcUaControlPlaneSingletons.

Per-node subscription rather than cluster-singleton: every admin node forwards
its own snapshots to its own connected clients. Simpler than singleton
coordination + acceptable because the messages are small and SignalR fan-out
is per-node anyway.
2026-05-26 07:01:08 -04:00
Joseph Doherty 7a6b016d9e docs(plans): mark F17 complete
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:58:27 -04:00
Joseph Doherty 8f32b89fb9 feat(adminui): FleetDiagnosticsClient real Akka ActorSelection round-trip (F17)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
- New Commons.Messages.Fleet.GetDiagnostics request record.
- DriverHostActor handles GetDiagnostics in all three states (Steady, Applying,
  Stale); replies with a NodeDiagnosticsSnapshot built from _currentRevision
  + the local NodeId. Drivers list is empty until F7 wires the per-instance
  children.
- FleetDiagnosticsClient now resolves the target via ActorSelection at
  akka.tcp://{system}@{nodeId}/user/driver-host and Asks with a 3s timeout.
  On timeout/peer-down it returns an empty snapshot so the UI degrades
  gracefully rather than throwing.

Two new integration tests in Host.IntegrationTests:
- GetDiagnostics_returns_snapshot_with_target_NodeId verifies the
  cross-node Ask/Reply works.
- GetDiagnostics_after_deploy_reports_current_revision exercises the
  end-to-end path: AdminOps starts a deployment, both DriverHostActors
  apply, then diagnostics reports the new revision on both nodes.

All 98 v2 tests pass (was 96 + 2 new).
2026-05-26 06:58:11 -04:00
Joseph Doherty 337a691629 docs(plans): mark F3, F4, F5, F12 follow-ups complete
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:55:39 -04:00
Joseph Doherty b06e3ae740 feat(runtime): PeerOpcUaProbeActor real TCP-connect probe (F12)
Replaces the Ok=true stub with a TCP connect to the peer's OPC UA port (4840
default) with a 2s timeout. A successful connect indicates the OPC UA server
process is up + accepting connections — enough for the redundancy calculator
to treat the peer as live. A full secure-channel Hello/Acknowledge handshake
is overkill for what the redundancy calc consumes and would pull in the OPC
UA Client SDK + a PKI setup. Upgrade later if a deeper liveness signal is ever
required.

Probe extracts the host from NodeId by stripping the :port suffix (commit
5cfbe8b encoded host:port into NodeId for cluster-member identity).

Tests: 2 new tests — Ok=true against a live TcpListener on a chosen port,
Ok=false against an unreachable endpoint. All 17 Runtime tests pass (was 16
covering only the message-contract surface).
2026-05-26 06:54:51 -04:00
Joseph Doherty f57f61deac feat(audit): EventId + CorrelationId columns + filtered unique index (F3 + F4)
ConfigAuditLog gains two nullable columns (EventId, CorrelationId) + a filtered
unique index UX_ConfigAuditLog_EventId. EF migration
20260526105027_AddConfigAuditLogEventIdColumns is additive (nullable + filtered
index = legacy rows backfill cleanly).

AuditWriterActor now writes EventId + CorrelationId into the dedicated columns
instead of synthesising a JSON wrapper into DetailsJson. Cross-restart dedup
is now real: a retry of an already-flushed batch hits the unique index and
SaveChanges throws; the existing catch drops the duplicate without losing the
rest of the batch.

WrapDetails helper deleted — F4 (its JSON hardening) becomes moot.

AuditWriterActorTests.Details_wrapper_embeds_eventId_and_correlationId renamed
+ rewritten to assert against the columns. All 29 ControlPlane tests pass,
all 95 v2 tests green.
2026-05-26 06:52:53 -04:00
Joseph Doherty 8e5c8e29f7 docs(plans): mark Task 61 complete
v2-ci / build (push) Failing after 1m21s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:49:06 -04:00
Joseph Doherty 253fb60459 ci: v2 build + unit + integration workflow, nightly E2E (Task 61)
.github/workflows/v2-ci.yml runs on push/PR to v2-akka-fuse + master:
  - build job: dotnet restore + build (Release)
  - unit-tests job: matrix over the 5 v2 test projects (Cluster, ControlPlane,
    Runtime, Security, OpcUaServer) with Category!=E2E
  - integration job: Host.IntegrationTests with Category!=E2E

.github/workflows/v2-e2e.yml runs nightly at 03:00 UTC + workflow_dispatch:
  - Brings up the docker-dev four-node fleet (admin pair + driver pair + SQL
    + LDAP + Traefik)
  - Waits up to 60s for /health/active to return 200
  - Runs Category=E2E only
  - Always tears down with -v

Both workflows pin .NET 10 via actions/setup-dotnet (no global.json so any
10.0 SDK works). Compatible with both GitHub Actions and Gitea Actions
(act_runner). The E2E filter currently matches zero tests because the
tests/Server/ZB.MOM.WW.OtOpcUa.E2ETests project doesn't exist yet — it lands
when F10/F11/F12 wire enough engine for an end-to-end round-trip to be
meaningful.
2026-05-26 06:48:52 -04:00
Joseph Doherty 8ac71db464 docs(plans): mark Tasks 62, 63, 64, 65 complete 2026-05-26 06:46:55 -04:00
Joseph Doherty 7e3b56c27d feat(deploy): Traefik active-leader routing + docker-dev compose (Task 63)
- scripts/install/traefik.yml + traefik-dynamic.yml: Traefik static + dynamic
  config. One :80 entry point, one router on HostRegexp(otopcua.*), one
  service load-balancing admin-a:9000 + admin-b:9000 with /health/active health
  check (interval 5s, timeout 2s, expected 200). Followers return 503 from
  /health/active so Traefik drops them within the next interval after a
  leadership change.

- scripts/install/Install-Traefik.ps1: downloads Traefik for Windows, drops the
  yml configs, registers the OtOpcUaTraefik Windows service via sc.exe with
  restart-on-failure. Companion to Install-Services.ps1.

- docker-dev/{Dockerfile,docker-compose.yml,traefik-dynamic.yml,README.md}:
  Mac-friendly four-node fleet (admin-a + admin-b + driver-a + driver-b) plus
  SQL Server 2022 + OpenLDAP + Traefik. Single OtOpcUa.Host image built once;
  Compose drives OTOPCUA_ROLES + Cluster:* per container to differentiate the
  four hosts. README walks through bring-up + failover smoke + the dev LDAP
  users.

Note: untested on macOS (no local Docker — see docs/v2/dev-environment.md).
2026-05-26 06:46:40 -04:00
Joseph Doherty e40615dad5 feat(install): rewrite Install/Refresh/Uninstall-Services.ps1 for v2 fused Host (Task 62)
- Install-Services.ps1: installs OtOpcUaHost (single fused binary) replacing
  the v1 OtOpcUa + OtOpcUaAdmin pair. Required -Roles param writes OTOPCUA_ROLES
  to the service env so Program.cs decides what to mount (admin / driver / both).
  -HttpPort param (default 9000) writes ASPNETCORE_URLS on admin-role nodes.
  sc.exe restart-on-failure: 5s, 30s, 60s; reset counter after 24h clean run.
  Wonderware historian sidecar install logic preserved from v1.

- Uninstall-Services.ps1: removes OtOpcUaHost + cleans up legacy v1 names
  (OtOpcUa, OtOpcUaAdmin) and the long-retired OtOpcUaGalaxyHost.

- Refresh-Services.ps1: updated service names (OtOpcUa -> OtOpcUaHost), publish
  path (ZB.MOM.WW.OtOpcUa.Server -> ZB.MOM.WW.OtOpcUa.Host), process names
  (OtOpcUa.Server -> OtOpcUa.Host). Switched nssm stop/start calls to
  Stop-Service/Start-Service so the script works whether the underlying
  service was installed via nssm or sc.exe.
2026-05-26 06:44:35 -04:00
Joseph Doherty 1689901c0e docs(v2): Architecture-v2 + Cluster + ControlPlane + Runtime overviews (Task 65)
Four new docs at docs/v2/ giving a single-page tour of each v2 piece:
- Architecture-v2.md: top-level mental model (fused Host + roles + cluster + live-edit)
- Cluster.md: AkkaClusterOptions + IClusterRoleInfo + WithOtOpcUaClusterBootstrap
- ControlPlane.md: 5 admin singletons + DPS topics + deploy flow + failover recovery
- Runtime.md: per-node actor tree + state machines + engine-wiring follow-up map

Each links back to the design doc for depth. Architecture-v2 cross-references
the other three + ServiceHosting + Redundancy + security.
2026-05-26 06:41:48 -04:00
Joseph Doherty 3c3fef911c docs: v2 updates to Redundancy, ServiceHosting, security, README (Task 64)
- Redundancy.md: full rewrite — Akka-leader-driven ServiceLevel replaces
  operator-managed RedundancyRole. Documents the 5-tier ServiceLevelCalculator,
  RedundancyStateActor cluster singleton, and the DPS data flow.

- ServiceHosting.md: full rewrite — single fused OtOpcUa.Host binary with
  OTOPCUA_ROLES env gating. Documents the conditional DI graph and the new
  health endpoints (/health/ready, /health/active, /healthz).

- security.md: v2 banner at top covering path/project renames + new JWT bearer
  + DataProtection persisted to ConfigDb. Body unchanged because the 4-concern
  security model is unchanged in v2; full per-section rewrite waits for F15
  (Admin pages migration) since security.md references many pages that move.

- README.md: platform overview updated to v2 (fused Host + role gating).
2026-05-26 06:38:55 -04:00
Joseph Doherty a8becc9c46 docs(plans): mark Task 59 complete; track F22 failover scenarios 2026-05-26 06:35:03 -04:00
Joseph Doherty 5cfbe8b5dd test(host): deploy happy-path + idempotency integration tests (Task 59)
DeployHappyPathTests exercises the full deploy pipeline on the 2-node harness:
AdminOperationsActor → ConfigPublishCoordinator → DistributedPubSub →
DriverHostActor on both nodes → ApplyAck → coordinator seals. Verifies both
NodeDeploymentState rows reach Applied and Deployment.Status reaches Sealed.

Exposed + fixed two production bugs along the way:

1. Coordinator was publishing DispatchDeployment on the "deployments" topic but
   never subscribed to anything — DriverHostActor ACKs published on the same
   topic could not reach it. Added dedicated "deployment-acks" topic with
   coordinator subscription in PreStart, and DriverHostActor publishes ACKs
   there.

2. NodeId derivation used member.Address.Host only — two cluster members on a
   shared loopback host (test harness, dev VMs) collided to one identity. The
   coordinator's expected-ack set became {1} and the system sealed after only
   half the nodes acked. Switched to host:port everywhere (ClusterRoleInfo +
   coordinator) so loopback nodes stay distinct and production identities are
   harmlessly more specific.

Tests: 95 v2 tests pass (was 93 + 2 deploy tests), 0 skipped.

Failover scenarios (design §8 cases 3-7: node-kill-mid-apply, split-brain,
restart-during-deploy) deferred — they need controlled node-down primitives
on the harness. Tracked as F22 (failover scenario test cases).
2026-05-26 06:34:36 -04:00
Joseph Doherty 62e3cd6599 docs(plans): mark Task 58 complete; track F21 docker-compose follow-up 2026-05-26 06:27:33 -04:00
Joseph Doherty d6fac2d81d test(host): 2-node integration test harness + consolidate to one ActorSystem (Task 58)
Builds TwoNodeClusterHarness: two in-process Host-equivalent nodes sharing
an in-memory ConfigDb. Forms a 2-member Akka cluster. ClusterFormationTests
proves both nodes see each other as admin+driver role members.

Fixes a real production bug uncovered while wiring the harness — Program.cs
ran two separate ActorSystems (one from AddOtOpcUaCluster.AkkaHostedService
with cluster HOCON, one from Akka.Hosting.AddAkka with bare HOCON). Cluster
singletons landed on the bare ActorSystem and could not actually form a
cluster ("Configuration does not contain `akka.cluster` node").

Consolidation:
- AddOtOpcUaCluster now only binds AkkaClusterOptions + registers IClusterRoleInfo
- New WithOtOpcUaClusterBootstrap pushes embedded HOCON + Remote/Cluster options
  into Akka.Hosting's AkkaConfigurationBuilder
- AkkaHostedService.cs deleted — Akka.Hosting now owns the lifecycle
- Program.cs + harness call WithOtOpcUaClusterBootstrap inside AddAkka

Why not WebApplicationFactory<Program>? Program.cs reads OTOPCUA_ROLES from
process env (shared across in-process WAFs); the harness replays Program.cs's
DI graph from a clean WebApplicationBuilder per node with per-node config
overrides. Same production extensions, isolated config + Kestrel + Akka ports.

Tests: 93 v2 tests pass (was 91 + 2 new cluster formation), 0 skipped.
2026-05-26 06:27:04 -04:00
Joseph Doherty bb353c4d43 docs(plans): mark F1, F2, F6, F18, F19 follow-ups complete
Post-compaction batch of 5 follow-ups landed:
- F19 (09d6676): WithOtOpcUaRuntimeActors extension for driver-role spawn
- F1  (463512d): AuthEndpoints integration tests via TestServer
- F6  (dfc143c): RedundancyStateActor broadcast override + un-skip 2 tests
- F18 (b266f63): User.Identity.Name into Deployments createdBy
- F2  (45a8c79): JwtBearer validation via IPostConfigureOptions
2026-05-26 06:18:31 -04:00
Joseph Doherty 45a8c79ffe refactor(security): JwtBearer validation via IPostConfigureOptions (F2)
Eliminates the services.BuildServiceProvider() captive-provider antipattern
(ASP0000) inside AddJwtBearer. The new ConfigureJwtBearerFromTokenService
resolves JwtTokenService from the real DI container at runtime and stays
in lock-step with JwtTokenService.BuildValidationParameters.

All 27 Security.Tests stay green, including the F1 integration tests that
exercise /auth/token through the real bearer pipeline.
2026-05-26 06:18:00 -04:00
Joseph Doherty b266f63cd7 feat(adminui): thread User.Identity.Name into Deployments createdBy (F18)
Injects AuthenticationStateProvider and reads the current user's identity
name on Deploy click, replacing the "(current user)" placeholder.
Anonymous case falls back to "(anonymous)" — should never hit in practice
since the page requires FleetAdmin/ConfigEditor.
2026-05-26 06:17:53 -04:00
Joseph Doherty dfc143cdeb feat(controlplane): RedundancyStateActor broadcast override + un-skip tests (F6)
Mirrors the publisher-injection pattern from FleetStatusBroadcaster and
PeerOpcUaProbeActor: Props accepts an optional Action<object> override so
tests can use a TestProbe sink instead of bootstrapping DistributedPubSub
(unreliable single-node in TestKit).

Un-skips the two RedundancyStateActor tests deferred under F6.
2026-05-26 06:16:32 -04:00
Joseph Doherty 463512d1d8 test(security): AuthEndpoints integration tests via TestServer (F1)
7 tests exercise AddOtOpcUaAuth + MapOtOpcUaAuth end-to-end against an
in-memory ConfigDb + stub ILdapAuthService. Covers /auth/login (204/401/503),
/auth/ping (401/200), /auth/token (200+JWT shape), /auth/logout (204+clear-cookie).

Scope is the auth contract — not the fused Host bootstrap (cluster + role
gating belongs in the Task 58 multi-node harness). HostBuilder + TestServer
is used directly instead of WebApplicationFactory<Program> because the
test project has no Program entry point and Host needs Akka cluster up.
2026-05-26 06:15:07 -04:00
Joseph Doherty 09d6676e1f feat(runtime): WithOtOpcUaRuntimeActors extension for driver-role node startup (F19)
Mirrors WithOtOpcUaControlPlaneSingletons for the driver role. Spawns
DriverHostActor + DbHealthProbeActor on the host's ActorSystem and
registers both under marker keys. Host's Program.cs now calls it when
the node carries the driver role, so driver-only and admin+driver
deployments both auto-bootstrap the per-node actors.

Integration test covers the registration round-trip via Microsoft.Extensions.Hosting
+ Akka.Hosting AddAkka.
2026-05-26 06:09:37 -04:00
Joseph Doherty 698709a578 docs(plans): mark Tasks 56+57 complete 2026-05-26 05:39:07 -04:00
Joseph Doherty 76310b8829 chore(cleanup): delete OtOpcUa.Server, OtOpcUa.Admin, and obsolete v1 tests
Task 56: removes the legacy in-process Server + Admin Web project + their test
projects (Server.Tests, Admin.Tests, Admin.E2ETests). The fused OtOpcUa.Host
binary built across Phases 1-9 is now the sole production entry point.

What happened to the 47 legacy Admin Blazor pages: per follow-up F15, the
v1 architecture's draft/publish UX is replaced by v2's live-edit + snapshot-
deploy model, so a 1:1 migration is not meaningful. The mechanical move via
git mv preserves the history; service classes + page bodies that referenced
removed v1 types (ConfigGeneration, RedundancyRole, GenerationId) were
deleted. AdminUI now ships a minimal Home page + the v2 Deployments page.

Per-page rebuild against the v2 surface is tracked as F15. The v2 Deployments
page (Task 52) is the only first-party UI shipping in this PR.

Task 57: solution build green; 84+ tests green across active v2 + legacy
driver test projects.
2026-05-26 05:38:31 -04:00
Joseph Doherty 2b75ce3876 docs(plans): mark Phase 9 tasks 53-55 complete; track F19/F20 follow-ups 2026-05-26 05:23:48 -04:00
Joseph Doherty 8b4de8080b feat(runtime): DEV-STUB mode for Windows-only drivers on non-Windows or dev role 2026-05-26 05:23:02 -04:00
Joseph Doherty fa1d685ccd feat(host): health endpoints + per-environment appsettings layout 2026-05-26 05:23:01 -04:00
Joseph Doherty e2b357f89a feat(host): role-gated Program.cs composes all v2 components 2026-05-26 05:22:59 -04:00
Joseph Doherty eb4280b7eb docs(plans): add F15-F18 follow-ups for Phase 8 deferred scope 2026-05-26 05:19:02 -04:00
Joseph Doherty 8a1f97b27f docs(plans): mark Phase 8 tasks 48-52 complete; track F15-F18 follow-ups 2026-05-26 05:18:37 -04:00
Joseph Doherty f167808a2c feat(adminui): Deployments page with drift indicator and Deploy button 2026-05-26 05:18:00 -04:00
Joseph Doherty b83f099394 feat(adminui): IFleetDiagnosticsClient skeleton (Akka round-trip tracked as F17) 2026-05-26 05:17:59 -04:00
Joseph Doherty f022499e7f feat(adminui): IAdminOperationsClient backed by ClusterSingletonProxy 2026-05-26 05:17:58 -04:00
Joseph Doherty 26d8f2f620 feat(adminui): FleetStatusHub + AlertHub + MapOtOpcUaHubs (broadcaster bridge tracked as F16) 2026-05-26 05:17:56 -04:00
Joseph Doherty 1a067e609c refactor(adminui): MapAdminUI extension + AddAdminUI DI (47-component migration tracked as F15) 2026-05-26 05:17:55 -04:00
Joseph Doherty 5e31449529 docs(plans): mark Phase 7 tasks 46+47 complete; track F13/F14 full-extraction follow-ups 2026-05-26 05:15:18 -04:00
Joseph Doherty b7c117ab31 feat(opcua): pure Phase7Composer + purity tests (side-effects tracked as F14) 2026-05-26 05:14:45 -04:00
Joseph Doherty 2877a883cd feat(opcua): OpcUaApplicationHost facade in OpcUaServer (full extraction tracked as F13) 2026-05-26 05:14:39 -04:00
Joseph Doherty 2e4f1399bb docs(plans): mark Phase 6 tasks 39-45 complete (race-recovered commit) 2026-05-26 05:10:21 -04:00
Joseph Doherty e31547d00e docs(plans): mark Phase 6 tasks 37-45 complete; track F7-F12 engine-wiring follow-ups 2026-05-26 05:09:52 -04:00
Joseph Doherty 28639cb14d feat(runtime): HistorianAdapter + PeerOpcUaProbe + DbHealthProbe actors (engine wiring tracked as F11/F12) 2026-05-26 05:09:06 -04:00
Joseph Doherty e115f13104 feat(runtime): OpcUaPublishActor on synchronized dispatcher (SDK wiring tracked as F10) 2026-05-26 05:09:04 -04:00
Joseph Doherty 95ef533822 feat(runtime): ScriptedAlarmActor state machine (engine wiring tracked as F9) 2026-05-26 05:09:03 -04:00
Joseph Doherty 39729bfe21 feat(runtime): VirtualTagActor skeleton (engine wiring tracked as F8) 2026-05-26 05:09:01 -04:00
Joseph Doherty 64c627f8d6 feat(runtime): DriverInstanceActor state machine with Connecting/Connected/Reconnecting 2026-05-26 05:05:36 -04:00
Joseph Doherty ed130135ca feat(runtime): DriverHostActor state machine with PreStart recovery + DispatchDeployment + stale fallback 2026-05-26 05:02:42 -04:00
Joseph Doherty ea6f972e96 docs(plans): mark Phase 5 tasks 30-36 complete with commit hashes 2026-05-26 04:57:37 -04:00
Joseph Doherty 52bf4b3371 feat(controlplane): WithOtOpcUaControlPlaneSingletons registration extension (admin role) 2026-05-26 04:57:09 -04:00
Joseph Doherty dd122c4ca9 feat(controlplane): FleetStatusBroadcaster push-driven from cluster events + heartbeats 2026-05-26 04:57:07 -04:00
Joseph Doherty f193872891 feat(controlplane): ConfigPublishCoordinator deadline timeout + failover PreStart recovery 2026-05-26 04:57:05 -04:00
Joseph Doherty bad2aef137 docs(plans): track F5 multi-node coordinator test + F6 RedundancyState publisher refactor 2026-05-26 04:53:32 -04:00
Joseph Doherty 6b37f997ad feat(controlplane): RedundancyStateActor with debounced topology publish 2026-05-26 04:53:31 -04:00
Joseph Doherty 62e12dab95 feat(controlplane): ConfigPublishCoordinator happy path with NodeDeploymentState seeding 2026-05-26 04:53:29 -04:00
Joseph Doherty ef683f5073 feat(controlplane): AdminOperationsActor + ConfigComposer + StartDeployment flow 2026-05-26 04:53:28 -04:00
Joseph Doherty 9f61cd5989 test(controlplane): self-join cluster + DistributedPubSub extension in test harness 2026-05-26 04:53:25 -04:00
Joseph Doherty 9582e448d5 docs(plans): track F4 WrapDetails JSON hardening follow-up 2026-05-26 04:46:18 -04:00
Joseph Doherty 1955bc5f4d docs(plans): mark Task 33+35 partial complete; track F3 audit-idempotency follow-up 2026-05-26 04:44:37 -04:00
Joseph Doherty 23f669c376 feat(controlplane): AuditWriterActor with batched in-buffer-dedup insert 2026-05-26 04:44:01 -04:00
Joseph Doherty 14acab5a58 feat(controlplane): ServiceLevelCalculator + ControlPlane.Tests harness 2026-05-26 04:43:59 -04:00
Joseph Doherty 32574b3e4e docs(plans): track JwtBearer DI antipattern as follow-up F2 2026-05-26 04:39:10 -04:00
Joseph Doherty fc22d4f7b6 docs(plans): track AuthEndpoints integration tests as follow-up F1 2026-05-26 04:37:49 -04:00
Joseph Doherty 973a3d1b9a docs(plans): mark Tasks 24-29 complete in tasks.json 2026-05-26 04:36:17 -04:00
Joseph Doherty 38ea0c5086 test(security): cookie+JWT roundtrip, role mapper, LDAP escape/RDN helpers 2026-05-26 04:35:51 -04:00
Joseph Doherty e38f22e3c2 feat(security): CookieAuthenticationStateProvider for Blazor circuit expiry detection 2026-05-26 04:35:50 -04:00
Joseph Doherty 8be84ba27b feat(security): /auth/login, /auth/ping, /auth/token endpoints 2026-05-26 04:35:49 -04:00
Joseph Doherty 207fc6aba9 feat(security): cookie+JWT hybrid auth via AddOtOpcUaAuth 2026-05-26 04:35:48 -04:00
Joseph Doherty 93316e3431 feat(security): JwtTokenService with HS256 + 15-min expiry 2026-05-26 04:35:46 -04:00
Joseph Doherty 567b8cac1d refactor(security): move LdapAuthService into OtOpcUa.Security library 2026-05-26 04:35:42 -04:00
Joseph Doherty f35925b57e docs(plans): mark Tasks 19-23 complete in tasks.json 2026-05-26 04:31:28 -04:00
Joseph Doherty e0b6d5680b test(cluster): HOCON parses, role parser truth table 2026-05-26 04:31:08 -04:00
Joseph Doherty c217c49f69 feat(cluster): ClusterRoleInfo wraps Akka.Cluster for app-facing role queries 2026-05-26 04:31:07 -04:00
Joseph Doherty dfb06368cd feat(cluster): parse OTOPCUA_ROLES env var with validation 2026-05-26 04:31:06 -04:00
Joseph Doherty f184f8ed1b feat(cluster): AkkaHostedService and DI extension 2026-05-26 04:31:05 -04:00
Joseph Doherty 3d0f4dc168 feat(cluster): embed Akka HOCON config matching ScadaLink tuning 2026-05-26 04:31:03 -04:00
Joseph Doherty fdb4ac7051 docs(plans): mark Tasks 15-18 complete in tasks.json 2026-05-26 04:27:41 -04:00
Joseph Doherty 136234e7f2 feat(commons): add cluster/admin/diagnostics client interfaces 2026-05-26 04:27:19 -04:00
Joseph Doherty 5d3a5a40d7 feat(commons): add deploy/admin/audit/redundancy/fleet message contracts 2026-05-26 04:27:18 -04:00
Joseph Doherty fee4a8c008 feat(commons): add correlation/execution/node/deployment/revisionhash types 2026-05-26 04:26:01 -04:00
Joseph Doherty c168c1c9c6 feat(migration): add Migrate-To-V2.ps1 idempotent migration runner 2026-05-26 04:26:01 -04:00
Joseph Doherty 605dbf3dcc feat(configdb): V2HostingAlignment migration consolidating Phase 1a-1e
Phase 1f — the consolidator migration. Closes out the v2 entity-model
rewrite by emitting a single EF migration that captures the cumulative
schema delta from 14a (RowVersion) through 14e (drop generation entities).

Generated: src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/
              20260526081556_V2HostingAlignment.cs           (1562 lines)
              20260526081556_V2HostingAlignment.Designer.cs

Migration shape (per `grep -nE migrationBuilder.\(...)`):

  Drop  12 ForeignKey constraints (one per live-edit entity's GenerationId FK)
  Drop  2  Tables  (ConfigGeneration, ClusterNodeGenerationState)
  Drop  45 Indexes (every UX_*_Generation_* and IX_*_Generation_* across the
                    13 live-edit tables — 1 also dropped the unique-Primary
                    filtered index UX_ClusterNode_Primary_Per_Cluster)
  Drop  13 Columns (12 GenerationId + 1 RedundancyRole)
  Add   12 RowVersion columns (one per live-edit entity)
  Create 4  Tables (Deployment, NodeDeploymentState, ConfigEdit,
                    DataProtectionKeys)
  Create ~45 Indexes (recreated under the new naming pattern
                      UX_<Table>_LogicalId / UX_<Table>_<X> with the
                      GenerationId column stripped from composite keys)

Notable EF quirks accepted:
  Unique-on-required-column indexes (UX_VirtualTag_LogicalId etc.) ship a
  `filter: "[VirtualTagId] IS NOT NULL"` clause that EF auto-inserts for
  SQL Server. Harmless — the column is C#-side `required` so NULL never
  appears.

Verification:
  dotnet build src/Core/ZB.MOM.WW.OtOpcUa.Configuration          -> 0 errors
  dotnet ef migrations script --idempotent (against placeholder DSN)
                                                                 -> 3259-line
                                                                    .sql produced
                                                                    OK
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests                -> 0 errors

Live `dotnet ef database update` against a scratch SQL Server deferred to
Task 15 (Migrate-To-V2.ps1) — SSH to the docker host needs a key/password I
don't have, and the always-on SQL at 10.100.0.35,14330 uses Integrated
Security (Windows auth, unreachable from this macOS dev). The migration
itself is structurally correct by construction (EF tooling generated it
against the live DbContext model); the live-DB confidence step is the
PowerShell wrapper's job.

SchemaComplianceTests updates:
  - All_expected_tables_exist: removed ConfigGeneration +
    ClusterNodeGenerationState; added Deployment, NodeDeploymentState,
    ConfigEdit, DataProtectionKeys.
  - Filtered_unique_indexes_match_schema_spec: removed entries for
    UX_ClusterNode_Primary_Per_Cluster (Task 14d) and
    UX_ConfigGeneration_Draft_Per_Cluster (Task 14e). Two filtered uniques
    remain (UX_ClusterNodeCredential_Value, UX_ExternalIdReservation_KindValue_Active).
  - Check_constraints_match_schema_spec: added CK_ConfigEdit_FieldsJson_IsJson.

StoredProceduresTests update:
  - Removed RedundancyRole + 'Primary' from the raw INSERT into ClusterNode
    so the DB-backed test runs against the new schema.
2026-05-26 04:18:50 -04:00
Joseph Doherty e00f46d723 refactor(configdb): delete ConfigGeneration + ClusterNodeGenerationState
Phase 1e of the v2 entity-model rewrite. With the FKs gone (Task 14b) and
the apply pipeline replaced (Task 14c), the v1 draft/publish entities have
no remaining v2 consumers.

Deleted entity classes:
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigGeneration.cs
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ClusterNodeGenerationState.cs

Deleted enum classes (no v2 consumers):
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/GenerationStatus.cs
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodeApplyStatus.cs

OtOpcUaConfigDbContext changes:
  - Removed DbSet<ConfigGeneration> ConfigGenerations
  - Removed DbSet<ClusterNodeGenerationState> ClusterNodeGenerationStates
  - Removed ConfigureConfigGeneration(modelBuilder) call + method body
  - Removed ConfigureClusterNodeGenerationState(modelBuilder) call + body
  - Tidied the "v2 deploy-model tables" header comment

Navigation property cleanup:
  - ServerCluster.Generations collection -> removed
  - ClusterNode.GenerationState navigation -> removed

doc-comment cref cleanup (replaced <see cref="X"/> with <c>X</c> for the
deleted types so the C# XML comment compiler doesn't fail with CS1574):
  - Deployment.cs (cref to ConfigGeneration)
  - NodeDeploymentState.cs (cref to ClusterNodeGenerationState)
  - Core/OpcUa/EquipmentNodeWalker.cs (cref to ConfigGeneration in the
    EquipmentNamespaceContent record's doc-comment; while there, removed
    "All four collections are scoped to the same ConfigGeneration" since
    that's no longer true in v2)

Verification:
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration            -> 0 errors
  src/Core/ZB.MOM.WW.OtOpcUa.Core                     -> 0 errors
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests    -> 0 errors
  tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests             -> 0 errors
  whole solution                                       -> 15 errors
    (all in Server/Admin; transitive Server.Tests/Admin.Tests skip per the
    parent's failure, so the per-project count dropped vs Task 14d's 71)
2026-05-26 04:14:55 -04:00
Joseph Doherty 3c915e652e refactor(configdb): drop ClusterNode.RedundancyRole (replaced by Akka leader)
Phase 1d of the v2 entity-model rewrite. The static RedundancyRole column
is replaced by Akka cluster's role-leader-of-"driver" election at runtime
(see RedundancyStateActor + ServiceLevelCalculator in Task 35).

Changes:

  - Removed `public required RedundancyRole RedundancyRole` from
    ClusterNode entity.
  - Removed `e.Property(x => x.RedundancyRole).HasConversion<string>()...`
    mapping from OtOpcUaConfigDbContext.ConfigureClusterNode.
  - Removed the `UX_ClusterNode_Primary_Per_Cluster` filtered unique index
    (filter referenced [RedundancyRole]='Primary').
  - Dropped `using ZB.MOM.WW.OtOpcUa.Configuration.Enums` from ClusterNode.cs
    (no longer needed).
  - Deleted `Enums/RedundancyRole.cs` — the enum is unused in v2-kept code.
  - DraftValidator: dropped the "exactly one Primary per cluster"
    validation block. Comment in place explaining v2 picks primary at
    runtime via Akka.
  - DraftValidatorTests: dropped ValidateClusterTopology_flags_multiple_Primary
    test; reworked BuildNode helper to no longer take a `role` argument.

Untouched (Server + Admin still reference RedundancyRole; accepted broken
per Task 56 policy):

  src/Server/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{ClusterTopologyLoader,
    RedundancyStatePublisher, RedundancyTopology, ServiceLevelCalculator}.cs
  src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/RedundancyMetrics.cs

DB-runtime tests will fail against the new schema (Task 14f's migration
drops the column) — to be updated in Task 14f's SchemaComplianceTests
update:

  - SchemaComplianceTests.cs:55 (expected filtered index list)
  - StoredProceduresTests.cs:263 (raw INSERT names the column)

Verification:
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration            -> 0 errors
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests    -> 0 errors
  whole solution                                       -> 71 errors
    (70 from Task 14b in Server/Admin, +1 new Server/Redundancy reference)
2026-05-26 04:11:57 -04:00
Joseph Doherty 1ddf8bb50e refactor(configdb): delete v1 Apply pipeline (replaced by AdminOperationsActor)
Phase 1c of the v2 entity-model rewrite. Deletes the draft/publish lifecycle
machinery that v2 replaces with AdminOperationsActor + ConfigComposer +
DriverInstanceActor.ApplyDelta.

Deleted (6 files):

  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/
    IGenerationApplier.cs   — interface for the apply pipeline
    GenerationApplier.cs    — the v1 applier coordinating per-driver hook-back
    GenerationDiff.cs       — typed wrapper over the sp_ComputeGenerationDiff
                              SQL output
    ApplyCallbacks.cs       — per-driver hook surface invoked by the applier
    ChangeKind.cs           — enum {Added, Modified, Removed, Unchanged}

  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/GenerationApplierTests.cs

The empty Apply/ directory is removed.

Kept (repurposed in Task 39 for stale-config fallback):

  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/GenerationSealedCacheTests.cs
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ResilientConfigReaderTests.cs

Naming rename (GenerationSealedCache -> DeploymentArtifactCache) deferred
to Task 39 (DriverHostActor stale-config fallback) where the consumer is
written. The type stays available under its v1 name until then.

IDriver.cs doc-comment: replaced the "Used by IGenerationApplier..." sentence
with "Invoked by the v2 DriverInstanceActor when ApplyDelta reports that only
this driver's config changed in the new deployment."

Server/Admin breakage from Task 14b unchanged (70 errors). Configuration +
Core.Tests + Configuration.Tests stay green.

  src/Core/ZB.MOM.WW.OtOpcUa.Configuration  -> 0 errors
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests  -> 0 errors
  whole solution  -> 70 errors (all in Server/Admin)
2026-05-26 04:09:17 -04:00
Joseph Doherty 13d3aeab09 refactor(configdb): drop GenerationId FK from live-edit entities
Phase 1b of the v2 entity-model rewrite. The design's live-edit model means
the 12 v2 live-edit entities no longer carry a generation scope — they're
edited directly via AdminOperationsActor, with RowVersion (added in Task 14a)
providing last-write-wins detection.

Entity changes (12 files):

  Equipment, DriverInstance, Device, Tag, PollGroup, Namespace,
  UnsArea, UnsLine, NodeAcl, Script, VirtualTag, ScriptedAlarm

  - Removed: public long GenerationId
  - Removed: public ConfigGeneration? Generation (navigation)

DbContext changes (OtOpcUaConfigDbContext.cs):

  - Removed 12 HasOne(x => x.Generation).WithMany().HasForeignKey... mappings
  - Rewrote ~36 indexes: dropped the GenerationId column from each composite
    key, renamed UX_<Table>_Generation_<X> -> UX_<Table>_<X> and
    IX_<Table>_Generation_<X> -> IX_<Table>_<X>. Logical IDs become globally
    unique (UX_<Table>_LogicalId on the LogicalId column alone).
  - Removed Namespace's redundant UX_Namespace_Generation_LogicalId_Cluster
    index (subsumed by the new UX_Namespace_LogicalId).

Core.Tests fixtures (4 files):

  Removed "GenerationId = 1," lines from:
    - PermissionTrieBuilderTests.cs (NodeAcl Row factory)
    - PermissionTrieTests.cs (NodeAcl Row factory)
    - TriePermissionEvaluatorTests.cs (NodeAcl Row factory + 2 gen{1,5}Row
      mutations that test stale-generation evaluation; the trie itself still
      carries a generation tag via PermissionTrie.GenerationId, fed in via
      PermissionTrieBuilder.Build's generationId parameter, so the tests
      still exercise the production code path)
    - EquipmentNodeWalkerTests.cs (Area/Line/Eq/Tag/VirtualTag/ScriptedAlarm
      builders)

Expected breakage (accepted per Task 56 policy):

  src/Server/ZB.MOM.WW.OtOpcUa.Server   ~25 errors  (DriverInstanceBootstrapper,
                                                     AuthorizationBootstrap,
                                                     EquipmentNamespaceContentLoader,
                                                     Phase7Composer, ...)
  src/Server/ZB.MOM.WW.OtOpcUa.Admin    ~45 errors  (VirtualTags.razor,
                                                     ScriptedAlarms.razor,
                                                     DriverInstanceService,
                                                     EquipmentService,
                                                     EquipmentImportBatchService,
                                                     UnsService,
                                                     FocasDriverDetailService,
                                                     ...)

Server.Tests, Admin.Tests, Admin.E2ETests also break transitively (they
project-reference Server/Admin). All deleted in Task 56.

Verification:
  dotnet build src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
  dotnet build tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests  -> 0 errors
  dotnet build tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
  dotnet build (whole solution) -> 70 errors, all in Server/Admin
2026-05-26 04:06:25 -04:00
Joseph Doherty 4bb4ad8acb feat(configdb): add RowVersion to live-edit entities
Phase 1a of the v2 entity-model rewrite. Adds:

  public byte[] RowVersion { get; set; } = Array.Empty<byte>();

and the EF Core mapping

  e.Property(x => x.RowVersion).IsRowVersion();

to 12 live-edit entities:

  Equipment, DriverInstance, Device, Tag, PollGroup, Namespace,
  UnsArea, UnsLine, NodeAcl, Script, VirtualTag, ScriptedAlarm

These are the entities that v2 admins will edit directly via
AdminOperationsActor (no draft staging). RowVersion enables
last-write-wins detection when two operators race on the same row.

GenerationId FKs are still in place on these entities (removed in Task 14b);
this commit only adds the rowversion column so the migration in Task 14f can
emit ADD COLUMN before DROP FK as a single atomic step.
2026-05-26 03:58:58 -04:00
Joseph Doherty 990ce343fe docs(plans): split Task 14 into 14a-14f (entity-model rewrite)
The original Task 14 (5-min EF migration that "drops ConfigGeneration") was
under-scoped: the design doc (live-edit model, ~line 208) requires removing
GenerationId from 13 entities (Equipment, DriverInstance, Device, Tag,
PollGroup, Namespace, UnsArea, UnsLine, NodeAcl, Script, VirtualTag,
ScriptedAlarm) and adding RowVersion columns for last-write-wins detection.
That cascades into GenerationApplier / GenerationDiff / GenerationSealedCache
and the legacy Server/Admin CRUD services.

New decomposition (~85 min total, replacing the original 5-min estimate):

  14a  standard   10m  Add RowVersion to live-edit entities
  14b  high-risk  30m  Drop GenerationId FK from those entities
  14c  high-risk  20m  Obsolete GenerationApplier/Diff/SealedCache
  14d  standard   5m   Drop ClusterNode.RedundancyRole
  14e  small      5m   Delete ConfigGeneration + ClusterNodeGenerationState
  14f  high-risk  15m  Consolidator: generate V2HostingAlignment migration

Policy decision (recorded with user): OtOpcUa.Server + OtOpcUa.Admin are
allowed to fail-to-compile between 14b and Task 56 - only the new v2 projects
need to stay green. Task 56 deletes the legacy projects.

Plan markdown: replaces the original Task 14 section with the 6-task
decomposition + a header explaining the rewrite. Task index table at the
bottom of the plan updated.

Tasks JSON: replaces the single Task 14 row with 6 string-id rows
("14a", "14b", ..., "14f"). Task 15 (Migrate-To-V2.ps1) and downstream
consumers re-pointed at "14f".

Verification step in 14f rewritten to use the shared docker host at
10.100.0.35 per CLAUDE.md (Docker is not installed on this Mac dev VM).
2026-05-26 03:55:48 -04:00
Joseph Doherty 8e2c4f2835 feat(configdb): add Deployment, NodeDeploymentState, ConfigEdit, DataProtectionKey entities
Phase 1 entities for the v2 live-edit + snapshot-deploy model:

  Deployment           — immutable artifact snapshot (replaces v1 ConfigGeneration row)
                         Status enum {Dispatching, AwaitingApplyAcks, Sealed,
                         PartiallyFailed, TimedOut}; carries the SHA256 RevisionHash and
                         the SnapshotAndFlatten() ArtifactBlob; RowVersion for optimistic
                         concurrency.
  NodeDeploymentState  — per-(node, deployment) apply progress row owned by
                         DriverHostActor (replaces single-row ClusterNodeGenerationState).
                         Composite key (NodeId, DeploymentId) gives the
                         ConfigPublishCoordinator the full history it needs to
                         reconstruct in-flight state after a failover.
  ConfigEdit           — append-only audit row written by AdminOperationsActor on every
                         mutating op; optional ExecutionId correlates edits inside one
                         admin transaction (e.g. an import batch).
  DataProtectionKey    — ASP.NET DataProtection key ring storage via
                         IDataProtectionKeyContext so every admin-role node decrypts
                         the same cookies without sharing a filesystem.

OtOpcUaConfigDbContext now implements IDataProtectionKeyContext and registers four new
DbSets + four new ConfigureXxx mappings.

Central package bumps (forced by Microsoft.AspNetCore.DataProtection.EntityFrameworkCore
10.0.7's transitive dep):

  Microsoft.EntityFrameworkCore.{,Design,InMemory,SqlServer}  10.0.0 -> 10.0.7
  Microsoft.Extensions.{Configuration.Abstractions,Configuration.Json,Hosting,Hosting.WindowsServices,Http}  10.0.0 -> 10.0.7

EF migration generation + the ConfigGeneration drop + RedundancyRole column removal are
deferred to Task 14 (high-risk, non-parallelizable).
2026-05-26 03:49:59 -04:00
Joseph Doherty 30a2104fa5 feat(scaffold): introduce 8 v2 component projects
Adds the empty project skeletons that subsequent v2 tasks fill in:

  src/Core/ZB.MOM.WW.OtOpcUa.Commons      (types, interfaces, message contracts)
  src/Core/ZB.MOM.WW.OtOpcUa.Cluster      (Akka.Hosting + cluster wiring)
  src/Server/ZB.MOM.WW.OtOpcUa.Security   (cookie+JWT auth, LDAP)
  src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane (admin-role cluster singletons)
  src/Server/ZB.MOM.WW.OtOpcUa.Runtime    (per-node driver actors)
  src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer (OPC UA SDK application host)
  src/Server/ZB.MOM.WW.OtOpcUa.AdminUI    (Razor class library)
  src/Server/ZB.MOM.WW.OtOpcUa.Host       (single fused web binary)

Each project sets TreatWarningsAsErrors=true in its own csproj (per the
Directory.Build.props deviation note in the previous commit). NuGetAuditSuppress
entries cover transitive vulnerability advisories the new strictness surfaces:

  - GHSA-g94r-2vxg-569j (OpenTelemetry.Api 1.9.0 via Akka.Cluster.Hosting/Tools)
  - GHSA-h958-fxgg-g7w3 (Opc.Ua.Core 1.5.374.126 via OpcUaServer)
  - GHSA-37gx-xxp4-5rgx + GHSA-w3x6-4m5h-cxqf (legacy advisories already accepted)

OpcUaServer pins OPCFoundation.NetStandard.Opc.Ua.Configuration to 1.5.374.126
via VersionOverride to match Opc.Ua.Server's transitive Opc.Ua.Core (same
constraint as the legacy Server project).

Runtime does NOT project-reference any concrete Driver.* assemblies; drivers
load reflectively at runtime (Phase 6). Runtime gets the IDriver contract
through Core.Abstractions instead.

Host's Microsoft.Extensions.Hosting.WindowsServices is conditional on the
Windows OS so the project builds on macOS dev machines.

Build verification: dotnet build -> 438 warnings (all pre-existing xUnit1051
in legacy Server.Tests/Admin.Tests), 0 errors. Closes Task 9 (build green
smoke check, no separate commit).
2026-05-26 03:44:56 -04:00
Joseph Doherty 2b811477d1 chore(build): introduce central package management for v2
Adds Directory.Packages.props (ManagePackageVersionsCentrally) and
Directory.Build.props (net10.0/nullable/implicit usings/LangVersion latest).
Strips Version attributes from every csproj PackageReference and consolidates
versions into the central file.

Side fixes (necessary to keep the build green on .NET SDK 10.0.105 on macOS):

- Microsoft.CodeAnalysis.CSharp{,.Workspaces}: 5.3.0 -> 5.0.0. The 5.3.0
  analyzer DLL references compiler 5.3.0.0 and the local SDK ships compiler
  5.0.0.0, producing CS9057 on every project that loaded the Analyzers
  output. Master itself was broken on this machine pre-change.
- Server + Server.Tests pin OPCFoundation.NetStandard.Opc.Ua.{Configuration,
  Client} to 1.5.374.126 via VersionOverride, matching Opc.Ua.Server's
  pin. Mixing 1.5.378.106 Opc.Ua.Core transitively with 1.5.374.126
  Opc.Ua.Server breaks CustomNodeManager2 override signatures
  (CS0115 on LoadPredefinedNodes/Browse/HistoryRead*) and CS7069 in
  the tests. The pin disappears when the legacy Server project is
  deleted in Task 56.
- Client.UI + Client.UI.Tests: NuGetAuditSuppress for
  GHSA-xrw6-gwf8-vvr9 (Tmds.DBus.Protocol 0.20.0 reaches both projects
  transitively from Avalonia.Desktop on Linux/macOS only).

Deviation from the plan: TreatWarningsAsErrors=true is NOT set in
Directory.Build.props because the pre-v2 Admin/Server test projects carry
~240 xUnit1051 analyzer warnings that would fail the build. New v2 projects
opt in via their own csproj; the global flag can return once the legacy
projects are deleted in Task 56.
2026-05-26 03:40:24 -04:00
Joseph Doherty fac32ad69b docs(plans): add v2 implementation plan with 66 bite-sized tasks
Converts the akka-hosting-alignment design into an executable plan:
12 phases covering branch/scaffold, ConfigDb schema, Commons,
Cluster, Security, ControlPlane singletons, Runtime per-node actors,
OpcUaServer extraction, AdminUI migration, Host entry point, cleanup,
integration tests, deploy scripts, and docs. Each task has files,
TDD steps, exact commands, classification, time estimate, and
parallelizable list. Co-located .tasks.json drives executing-plans
resume from any session.
2026-05-26 03:17:29 -04:00
Joseph Doherty ef4a70751c docs(plans): add v2 Akka + fused hosting alignment design
Captures the brainstormed design to align OtOpcUa with ScadaLink:
single role-gated binary, Akka.NET cluster with admin/driver roles,
cluster singletons for control plane, per-node actor hierarchy for
OPC UA runtime, dual-endpoint warm redundancy preserved with
ServiceLevel driven by Akka leader, cookie+JWT auth, Traefik routing,
and ScadaLink-style live-edit + deploy model replacing the
draft/publish ConfigGeneration lifecycle.
2026-05-26 03:04:21 -04:00
Joseph Doherty 866dc03fac style(ui): align admin styling with ScadaLink master conventions
- Move CSS into wwwroot/css/ (theme.css, site.css); sidebar 218 -> 220px
- Add hamburger + Bootstrap collapse for <lg viewports
- Add Components/Shared/ with LoadingSpinner, ToastNotification, StatusBadge
- Replace .page-title with flex + <h4 class="mb-0"> across 20 pages
- Convert NewCluster + IdentificationFields forms to card + h6 subsection pattern
2026-05-26 01:12:57 -04:00
Joseph Doherty c6082aa0b9 fix(admin-e2e): register missing DI services so ClusterDetail interactive circuit boots
UnsTabDragDropE2ETests were timing out at the 'UNS Structure' nav-link
locator because AdminWebAppFactory never registered AdminHubConnectionFactory
/ HubTokenService / DataProtection — ClusterDetail.razor's @inject threw at
circuit boot, so the page never advanced past the Loading placeholder. 2 → 3
pass after the registrations land. Also documents the Modbus standard-vs-
exception_injection coverage matrix in the fixture README + cross-references
docs/drivers/AbServer-Test-Fixture.md from each Emulate test so a developer
landing on a skipped test has a direct doc pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:07:17 -04:00
Joseph Doherty b1f3e09661 test(modbus, abcip): align failing integration tests with fixtures
Two real bugs uncovered by re-running with the new fixture defaults
pointing at the shared docker host. Both are test-side, not driver-side.

AbCip — Driver_reads_seeded_DInt_from_ab_server (4 parametrized rows):
  Hardcoded 'ab://127.0.0.1:{port}/1,0' in the deviceUri instead of
  the resolved fixture.Host. The new 10.100.0.35 default (and any
  AB_SERVER_ENDPOINT override) silently couldn't reach this test —
  the driver tried to connect to a non-existent localhost:44818 and
  returned BadCommunicationError on all 4 profile rows. The sibling
  Emulate tests already use the fixture's resolved endpoint; this
  smoke test was missed in the original migration.

  Fix: deviceUri = $"ab://{fixture.Host}:{fixture.Port}/1,0".

Modbus — Float32_With_CDAB_Roundtrips_Through_Wire:
  Test wrote a Float32 to HR 100 (2 consecutive registers: 100+101).
  standard.json's writable HR range declares [100,100] only — a
  single-cell auto-incrementing register, not a 2-register pair. The
  write to register 101 was rejected with Illegal Data Address
  (BadOutOfRange).

  Fix: moved the tag from HR 100 to HR 200 (in standard.json's
  '[200, 209]' scratch range — 10 consecutive writable HRs). The
  Float32+CDAB semantic the test exercises is unchanged.

Modbus — Block_Read_Coalescing_Reduces_PDU_Count_End_To_End:
  Test read HR 300, 302, 304 — outside both the writable ranges and
  the uint16 seed list. pymodbus rejects reads to unseeded HRs even
  though 'hr size' is 2048. BadOutOfRange on every read.

  Fix: moved the tags from 300/302/304 to 200/202/204 (within the
  scratch range). The non-contiguous coalescing semantic (3 tags
  inside a 5-register window with MaxReadGap=5) is preserved.

After this commit:
  - Modbus.IntegrationTests: 6/38 pass / 32 skip / 0 fail
    (was 4 pass / 32 skip / 2 fail; 32 skips are profile-gated
    ExceptionInjectionTests — they need MODBUS_SIM_PROFILE=
    exception_injection and a different container, intentional gating)
  - AbCip.IntegrationTests: 10/12 pass / 2 skip / 0 fail
    (was 6 pass / 2 skip / 4 fail; 2 skips are Emulate tests that
    need the fixture for separate scenarios)

No driver code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:45:57 -04:00
Joseph Doherty 49644fc7fd test(fixtures): migrate integration-test fixture defaults to 10.100.0.35
CLAUDE.md "Docker Workflow" claims (per the 2026-04-28 migration note)
that all fixture-class default endpoints were rewritten to target the
shared Docker host at 10.100.0.35. Audit during today's e2e run showed
the claim was incomplete — five fixture classes still defaulted to
localhost / 127.0.0.1, causing every fixture-touching integration test
to skip with "endpoint unreachable" on a fresh box that hadn't set
the override env vars.

Files corrected:

  - tests/.../Modbus.IntegrationTests/ModbusSimulatorFixture.cs
      DefaultEndpoint: localhost:5020 → 10.100.0.35:5020
  - tests/.../S7.IntegrationTests/Snap7ServerFixture.cs
      DefaultEndpoint: localhost:1102 → 10.100.0.35:1102
  - tests/.../OpcUaClient.IntegrationTests/OpcPlcFixture.cs
      DefaultEndpoint: opc.tcp://localhost:50000 → opc.tcp://10.100.0.35:50000
  - tests/.../AbCip.IntegrationTests/AbServerFixture.cs
      Host default + ResolveHost fallback: 127.0.0.1 → 10.100.0.35
  - tests/.../AbLegacy.IntegrationTests/AbLegacyServerFixture.cs
      Host default + ResolveEndpoint fallback: 127.0.0.1 → 10.100.0.35

XML doc comments referencing the old localhost defaults were updated in
the same pass so the class-summary documentation matches the actual
default. The override-via-env-var mechanism (MODBUS_SIM_ENDPOINT,
AB_SERVER_ENDPOINT, AB_LEGACY_ENDPOINT, S7_SIM_ENDPOINT,
OPCUA_SIM_ENDPOINT) is unchanged — pointing at a real PLC or a
locally-running container still works exactly as before.

Verification:
  - Solution-wide dotnet build: 0 errors.
  - S7.IntegrationTests: 3/3 pass without env-var override.
  - OpcUaClient.IntegrationTests: 3/3 pass without env-var override.
  - Modbus.IntegrationTests: 4/38 (same as the env-var-override run —
    the 2 failures + 32 skips are pre-existing fixture-profile
    mismatches unrelated to this fix).
  - AbCip.IntegrationTests / AbLegacy.IntegrationTests: same results
    as the env-var-override run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:32:23 -04:00
Joseph Doherty 3d982d9a65 docs: sync against recent code changes
Five doc-content updates after this session's code-review resolution
sweep. No code touched; pure documentation drift correction.

1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting):
   Refreshed the deployment description from "three cooperating
   processes (Server, Admin, Galaxy.Host)" to "two cooperating
   Windows services (Server, Admin)". The legacy x86 TopShelf
   Galaxy.Host process was retired in PR 7.2 (2026-04-30); Galaxy
   access now flows through the in-process Tier-A GalaxyDriver
   talking gRPC to the sibling mxaccessgw gateway. Also called out
   decision #30 (AddWindowsService replacing TopShelf) inline.

2. docs/VirtualTags.md:
   - Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting"
     replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp
     regular compiler — Core.Scripting-008 / -016 retired the
     CSharpScript/ScriptRunner path).
   - Line 39: orphan-thread leak description rewritten. The
     CSharp.Scripting-era "underlying ScriptRunner keeps running on
     its thread-pool thread until the Roslyn runtime returns" is no
     longer accurate — the new pipeline binds the script as a
     regular C# Func<> delegate, so the leak is now "synchronous
     CPU-bound work on a pool thread" (same operator-visible
     effect, different mechanism).

3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows
   service"):
   Annotated both the decision body and the decision-log table row
   with "Reversed PR 7.2, 2026-04-30" + a one-line summary of the
   replacement architecture. The original reasoning is preserved as
   audit trail per the decision-log convention.

4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1:
   Added an Implementation note describing the
   Core.Scripting-008 / -016 supersession of the original
   CSharpScript pipeline. The historical record stays; the note
   points future readers at docs/VirtualTags.md "Compile cache"
   for the current contract.

5. docs/plans/alarms-over-gateway.md "Files" section under client
   regeneration:
   Updated the .NET regeneration instructions to point at the new
   ZB.MOM.WW.MxGateway.Contracts.csproj path. The old
   clients/dotnet/MxGateway.Client.csproj no longer exists in the
   sibling repo (restructure after this plan was written) and the
   vendored-binaries situation in
   src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out
   so a reader following the plan won't chase a deleted path.

Verification: grep against docs/ for the pre-fix wordings ("three
cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner",
the wrong BadDeviceFailure hex code 0x80550000) returns no hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:57:04 -04:00
Joseph Doherty 23d59d73f2 fix(scripting+alarms): close remaining re-review findings
Single commit covering the four small/medium fixes from the updated
code review.

Core.Scripting-014 (Medium, Concurrency):
  CompiledScriptCache.Clear() used the key-only TryRemove(key, out var
  lazy) overload — same race shape Core.Scripting-006 closed in
  GetOrCompile's catch block. A concurrent re-add between snapshot and
  TryRemove was evicted + disposed while the new caller still held it.
  Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload.
  Regression test
  Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives
  added.

Core.Scripting-013 (Medium, Security):
  Hand-rolled BuildWrapperSource pastes user source between literal
  braces; brace-balanced source could inject sibling methods/classes
  alongside CompiledScript.Run. Analyzer still walked the injected
  members so it wasn't a direct escape, but it relaxed the documented
  'method body' authoring contract. Added EnforceSingleRunMember:
  after ParseText, the compilation unit must hold exactly one type
  (CompiledScript) and that type must hold exactly one member (the Run
  method). Any deviation throws CompilationErrorException with LMX001/
  LMX002 diagnostic IDs and a Core.Scripting-013 reference in the
  message. Two regression tests added covering the sibling-method and
  sibling-class injection vectors.

Core.Scripting-015 (Low, Correctness, latent):
  ToCSharpTypeName's generic branch truncated at the first backtick via
  IndexOf, silently dropping closed args of nested-generic shapes
  (Outer<T>.Inner<U>). No production caller exercises this shape today
  (all TContext/TResult are top-level non-nested), so the bug was
  latent. Rewrote the generic branch to walk the FullName segment-by-
  segment, consuming generic args per segment so nested shapes emit
  valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken
  Outer<T,U>).

Core.ScriptedAlarms-013 (Low, Documentation):
  The internal test accessors TryGetScratchReadCacheForTest /
  TryGetScratchContextForTest return live mutable scratch refilled in
  place under _evalGate. XML docs didn't warn future test authors about
  the synchronization contract. Added a <remarks> block to each
  documenting the only-safe-on-quiesced-engine + identity-or-single-key
  contract.

Verification (suites green):
  Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests)
  Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix)
  Core.VirtualTags.Tests: 57/57 (unchanged)

After this commit, all 12 findings from the updated re-review are
closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:00:59 -04:00
Joseph Doherty c2abbf45bd fix(driver-galaxy): align package versions + record vendored-DLL provenance
Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).

Driver.Galaxy-016 (Medium, Perf/Resource):
  Reconciled the csproj PackageReferences with what the vendored
  MxGateway.Client.dll was actually built against, verified by
  reflecting Assembly.GetReferencedAssemblies() on the DLL:
    - Polly 8.5.2  →  Polly.Core 8.6.6
      (most consequential — Polly v7 fluent API vs Polly.Core v8
       resilience-pipeline API are DIFFERENT packages; the DLL was
       built against Polly.Core so the prior Polly reference would
       have failed at runtime with MissingMethodException the first
       time the gateway client's retry pipeline ran)
    - Grpc.Net.Client 2.71.0  →  2.76.0  (matches sibling Server/Worker)
    - Microsoft.Extensions.Logging.Abstractions 10.0.0  →  10.0.7
  Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
  left unchanged.

Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
  Original framing was a security concern about unknown-provenance
  binaries. User clarified the DLLs are their own code, built from
  their own mxaccessgw project, not third-party. Re-triaged to a
  documentation / audit-trail concern. Fix:
    - Added a Provenance section to libs/README.md recording the
      source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
      extracted from the AssemblyInformationalVersion baked into
      both DLLs by the original build) and SHA-256 checksums.
    - Documented the re-verification recipe (sha256sum + ilspycmd
      | grep AssemblyInformationalVersion).
  Recommendations about .gitattributes and CI hash-check deferred —
  the DLLs are frozen until an unwinding path is taken, so adding
  LFS or CI infrastructure now would need removal at unwinding.

Driver.Galaxy-018 (Low, Documentation):
  Most of the recommendation folded into the libs/README.md rewrite
  (pointed at sibling Server/Worker csproj as the live version source
  rather than the deleted MxGateway.Client.csproj; recorded source
  commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
  <Reference> items intentionally not added — MSBuild's default for
  HintPath references with bare-name Include attributes is already
  SpecificVersion=false, so explicitly setting it would be cosmetic
  without changing behaviour.

Driver.Galaxy-017 (Low, Design) — Deferred:
  Recommendation part (b) (record mxaccessgw source-commit SHA in
  libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
  Parts (a) and (c) — a GetVersion RPC at session-open and a parity
  test against the live gateway's proto descriptor — are substantial
  new RPC + plumbing work not in scope for this code-review sweep.
  The risk surface is bounded because either of the libs/README.md
  unwinding paths closes the vendoring + this concern naturally.
  Re-open if neither path is taken within the next quarter and the
  live gateway evolves its proto under the driver.

Verification:
  - Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
  - Driver.Galaxy.Tests: 245/245 pass against the corrected
    package set.
  - Solution-wide build remains clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:45:24 -04:00
Joseph Doherty 3a53d03d23 fix(scripting): block ThreadPool/Timer/AssemblyLoadContext in sandbox
Core.Scripting-012 (High, Security) resolution.

The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:

  - System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
    background-fanout threat Core.Scripting-003 closed against
    System.Threading.Tasks.
  - System.Threading.Timer — schedules unbounded callback work that
    outlives the per-evaluation timeout.
  - System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
    Defense-in-depth gap; invocation needs reflection (already denied)
    but the load itself was reachable.

Fix:
  - Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
    (preferred over type-granular per the recommendation so future BCL
    additions to that namespace are denied by default).
  - Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
    to ForbiddenFullTypeNames — both live in System.Threading shared
    with allowed primitives so they must be type-granular.

Regression tests added to ScriptSandboxTests:
  Rejects_ThreadPool_QueueUserWorkItem_at_compile
  Rejects_Timer_new_at_compile
  Rejects_AssemblyLoadContext_at_compile

Docs:
  docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
  and the Sandbox-escape compliance-check row both updated to enumerate
  the new entries per the Core.Scripting-009 doc-sync convention.

Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.

Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:39:20 -04:00
Joseph Doherty fb7c6c7046 fix(scripting): route engines through CompiledScriptCache (Core.Scripting-016)
Both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync were calling
ScriptEvaluator.Compile directly, bypassing CompiledScriptCache. The
Core.Scripting-008 collectible-ALC fix wired Dispose only through the cache's
Clear()/Dispose(), so the per-publish accretion the -008 fix was meant to
eliminate was still in effect on the actual production path — the headline
'no more restarts needed' guarantee wasn't delivered.

Resolution:
  - VirtualTagEngine + ScriptedAlarmEngine each gained a private
    CompiledScriptCache<TContext, TResult> instance.
  - Both Load methods now call _compileCache.GetOrCompile(source).
  - Publish-replace path: _compileCache.Clear() runs alongside the existing
    _tags / _alarms clears so the prior generation's ALCs are disposed
    before recompile.
  - Engine Dispose now calls _compileCache.Dispose() so shutdown actually
    releases the emitted assemblies.

Side-fix in CompiledScriptCache: Dispose() set _disposed=true then called
Clear(), but Clear() had a pre-existing 'if (_disposed) return' guard that
aborted the drain unconditionally — making the Dispose-triggered cleanup a
silent no-op. Removed the disposed-guard on Clear() (clearing an empty/
cleared cache is idempotent).

Side-fix in ScriptedAlarmEngine.Dispose: cleared _alarms AFTER the
Task.WhenAll drain. The drain guarantees no background callback is mid-
flight, so clearing is safe. Previously _alarms was deliberately NOT
cleared on Dispose (per Core.ScriptedAlarms-005), but that left the
AlarmState records holding TimedScriptEvaluator → ScriptEvaluator → delegate
references that rooted the emitted assemblies, defeating the cache's
Dispose work on the engine side.

Regression tests:
  - VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly
  - ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly
  Both use WeakReference + bounded GC.Collect() to prove the emitted
  assembly is reclaimable after engine.Dispose(). The alarms test had to
  be synchronous (not 'async Task<WeakReference>') because async state
  machines capture locals as state-struct fields, keeping them alive past
  the method's apparent end and defeating GC.

Verification:
  - Core.Scripting.Tests: 104/104 (unchanged).
  - VirtualTags.Tests: 57/57 (was 56 — +1 unload test).
  - ScriptedAlarms.Tests: 67/67 (was 66 — +1 unload test).
  - All other consumer suites still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:33:34 -04:00
Joseph Doherty a6ae4e22d1 fix(status-codes): correct BadDeviceFailure from 0x80550000 to 0x808B0000
Driver.Cli.Common-007 + Driver.Cli.Common-008 resolution.

Driver.Cli.Common-007 (High, Correctness):
  0x80550000 is the canonical OPC UA spec value for BadSecurityPolicyRejected,
  not BadDeviceFailure. The correct spec value for BadDeviceFailure is
  0x808B0000 (verified against OPC Foundation Opc.Ua.StatusCodes;
  corroborated locally by Driver.Galaxy.Runtime.StatusCodeMap and both
  Wonderware historian quality mappers which all hand-pin the correct
  value).

  The bug was duplicated across six driver modules:
    - FocasStatusMapper.BadDeviceFailure
    - AbCipStatusMapper.BadDeviceFailure
    - AbLegacyStatusMapper.BadDeviceFailure
    - TwinCATStatusMapper.BadDeviceFailure
    - ModbusDriver.StatusBadDeviceFailure
    - S7Driver.StatusBadDeviceFailure
  Plus the SnapshotFormatter shortlist that named 0x80550000 as
  BadDeviceFailure, and three downstream Modbus tests that asserted
  against the wrong value (so CI was blind).

  This commit fixes all six native-mapper constants, the formatter
  shortlist, and the three Modbus tests in one pass. Added a regression
  guard to FormatStatus_does_not_apply_pre_fix_wrong_names that pins
  0x80550000 never renders as BadDeviceFailure (mirroring the existing
  -001 wrong-name guards).

  Behavior change: OPC UA clients consuming the native drivers now see
  the canonical BadDeviceFailure (0x808B0000) on device-fault paths
  instead of the misnamed BadSecurityPolicyRejected (0x80550000). Wire-
  level status semantics now match operator-facing CLI labels.

Driver.Cli.Common-008 (Low, Testing):
  Deleted the redundant FormatStatus_names_native_driver_emitted_codes
  Theory — its five InlineData rows were already covered by the
  well-known Theory in the same commit (5a9c459), and used a weaker
  ShouldContain vs the well-known Theory's ShouldBe (exact match).

Verification:
  - Driver.Cli.Common.Tests: 43/43 pass (was 48 after the -008 deletion).
  - Driver.Modbus.Tests: 263/263 pass.
  - Driver.AbCip.Tests: 262/262.
  - Driver.AbLegacy.Tests: 157/157.
  - Driver.FOCAS.Tests: 178/178.
  - Driver.S7.Tests: 112/112.
  - Driver.TwinCAT.Tests: 131/131.
  Total: 1146 tests across the affected modules, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:14:28 -04:00
Joseph Doherty 41e62b2663 docs(code-reviews): updated re-review at commit a9be809 — 12 new findings
Re-reviewed the four modules with source changes since the previous review
commit 76d35d1, per REVIEW-PROCESS.md section 6. Updated each findings.md
header (date 2026-05-23, commit a9be809) and appended new findings under
continued numbering. Regenerated README.md.

## New findings — 12 total across 4 modules

### Core.Scripting (5 new, IDs -012 to -016)
- **-012 High Security** — broadened BCL references (System.* + netstandard)
  re-expose System.Threading.ThreadPool / Timer / AssemblyLoadContext, which
  the analyzer's deny-list doesn't cover. Re-introduces the background-work
  threat Core.Scripting-003 closed via System.Threading.Tasks deny.
- **-013 Medium Security** — hand-rolled wrapper-source generation lets
  brace-balanced user source inject sibling methods/classes alongside
  CompiledScript.Run. Analyzer still gates forbidden types, but the
  documented 'method body' authoring contract is silently relaxed.
- **-014 Medium Concurrency** — CompiledScriptCache.Clear() uses key-only
  TryRemove(key, out _) — the same race the -006 resolution fixed in
  GetOrCompile's catch is latent here on publish-replace.
- **-015 Low Correctness** — ToCSharpTypeName truncates at first backtick;
  silently drops closed type arguments of nested-generic shapes (Outer<>.Inner<>).
  Latent — no production caller uses this shape today.
- **-016 Medium Performance** — VirtualTagEngine + ScriptedAlarmEngine call
  ScriptEvaluator.Compile directly without going through CompiledScriptCache,
  so the headline -008 collectible-ALC fix doesn't run on the actual
  production path — the per-publish leak is still in effect.

### Core.ScriptedAlarms (1 new, ID -013)
- **-013 Low Documentation** — new internal test accessors return the live
  mutable scratch dictionary; XML docs don't warn future test authors about
  the synchronisation contract.

### Driver.Cli.Common (2 new, IDs -007, -008)
- **-007 High Correctness** — 0x80550000 was added as BadDeviceFailure but
  the real OPC UA spec value for BadDeviceFailure is 0x808B0000 (verified
  against Driver.Galaxy.Runtime.StatusCodeMap and HistorianQualityMapper,
  both of which use the correct 0x808B0000). 0x80550000 is actually
  BadSecurityPolicyRejected. The native mappers (FOCAS / AbCip / AbLegacy)
  all use the wrong 0x80550000; this session's SnapshotFormatter extension
  propagated the wrong name and the test asserts against the same wrong
  value so CI is blind — same shape of bug as Driver.Cli.Common-001.
- **-008 Low Testing** — new FormatStatus_names_native_driver_emitted_codes
  Theory is redundant with the existing well-known Theory (same five
  InlineData rows added to both) and uses weaker ShouldContain assertion
  than the well-known Theory's ShouldBe.

### Driver.Galaxy (4 new, IDs -015 to -018)
- **-015 Medium Security** — vendored DLLs (libs/) have no recorded
  provenance: no source-commit SHA from the mxaccessgw repo, no SHA-256
  checksum in libs/README.md. Tampering / accidental swap undetectable.
- **-016 Medium Performance** — version skew between declared
  PackageReferences (Polly 8.5.2 / Grpc.Net.Client 2.71.0 /
  Microsoft.Extensions.Logging.Abstractions 10.0.0) and what the vendored
  DLL was actually built against (Polly.Core 8.6.6 / Grpc.Net.Client
  2.76.0 / Microsoft.Extensions.Logging.Abstractions 10.0.7). Latent now
  (assembly-version refs are loose) but precise shape that produces a
  runtime MissingMethodException.
- **-017 Low Design** — no contract-version handshake between the driver
  and the gateway; proto could evolve under the gateway without the
  driver noticing.
- **-018 Low Documentation** — libs/README.md points at the wrong sibling
  csproj as the version source-of-truth; missing SpecificVersion=false
  on the Reference items; missing mxaccessgw source-commit SHA.

## Particularly notable

Two findings undercut commits from this session:

- Driver.Cli.Common-007 invalidates commit 5a9c459 (which named 0x80550000
  as BadDeviceFailure across the cross-CLI shortlist).
- Core.Scripting-016 invalidates the production effect of commit 7b6ab2e
  (the collectible-ALC fix wired Dispose only via CompiledScriptCache,
  which the engines don't use).

The wider native-mapper miscoding behind -007 also affects three driver
modules outside this session's edit scope (FocasStatusMapper,
AbCipStatusMapper, AbLegacyStatusMapper all carry the wrong code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:02:47 -04:00
Joseph Doherty a9be80923c docs(v2-release-readiness): drop stale 'this branch task-galaxy-e2e' ref
The task-galaxy-e2e branch was merged + deleted; the durable reference
is PR #205 alone. Tidies a dangling pointer that future readers might
chase looking for a branch that no longer exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:50:06 -04:00
Joseph Doherty 994997ba7b fix(driver-galaxy): vendor MxGateway.Client + MxGateway.Contracts as binary refs
The sibling mxaccessgw repo restructured: clients/dotnet/MxGateway.Client
no longer exists, and the proto contracts moved to a new namespace
(ZB.MOM.WW.MxGateway.Contracts.Proto, was MxGateway.Contracts.Proto). The
driver's source still expects the pre-restructure namespace, so the
broken ProjectReference produced 86 build errors in src/ + 1 in tests/
on master.

Resolution: vendor the last known-good build of MxGateway.Client.dll
(99 KB, May 22) and MxGateway.Contracts.dll (490 KB, May 23) under
src/Drivers/.../Driver.Galaxy/libs/, reference them via <Reference
HintPath=...> in both the driver and its test csproj, and declare the
NuGet packages the dropped ProjectReference was supplying transitively
(Google.Protobuf, Grpc.Core.Api, Grpc.Net.Client,
Microsoft.Extensions.Logging.Abstractions, Polly) at versions matching
the sibling repo's ZB.MOM.WW.MxGateway.Contracts.csproj so binary
compatibility is preserved.

Why this over a source migration:
  Source migration would require namespace renames across ~19 driver
  files PLUS reimplementing MxGatewayClient / MxGatewaySession /
  GalaxyRepositoryClient (~2,200 LoC) — the sibling repo dropped the
  client library entirely, keeping only the proto contracts. Vendoring
  the last known-good binaries unblocks the build in minutes, freezes
  the gateway contract surface at a known-good version, and preserves
  the option to migrate properly once the sibling repo decides whether
  to restore a client library or hand the work back to us.

libs/README.md documents the unwinding plan (either path closes the
debt: sibling restores a client library, or driver migrates to the new
contracts namespace + reimplements the client wrapper).

Verification:
  - dotnet build ZB.MOM.WW.OtOpcUa.slnx: 0 errors (was 87).
  - Driver.Galaxy unit tests: 245/245 pass.
  - Integration tests not run here (require a live mxaccessgw gateway).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:32:56 -04:00
Joseph Doherty 0001cdd579 fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path
Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates per upstream tag change.

Why this matters:
  On a busy line where many tags feeding many alarms change frequently,
  the old BuildReadCache allocated a fresh dictionary + context on every
  predicate evaluation — a steady stream of short-lived allocations the
  GC eventually has to reclaim. With the reuse, the dictionary and
  context are allocated once per alarm (on first evaluation) and refilled
  in place across every subsequent re-eval.

Implementation:
  - New private AlarmScratch class holds the reusable
    Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
    alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
    reference. The context observes refilled values without being
    re-created.
  - ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
    engine, cleared in LoadAsync alongside _alarms so a config-publish
    drops the prior generation's scratch (Inputs / Logger may change).
  - EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
    the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
    repopulate the dictionary in place, then runs the predicate against
    the reused context.
  - BuildReadCache removed.

Safety:
  Reuse is serialised under _evalGate which guarantees no two threads
  ever observe the same scratch in a half-refilled state. The
  AlarmPredicateContext is bound to the scratch dictionary by reference,
  so the predicate's ctx.GetTag(path) sees the freshly-refilled values
  rather than a stale snapshot.

Verification:
  - All 66 ScriptedAlarms tests pass (was 63 — three new regression tests
    locking the reuse contract).
  - All 56 VirtualTags tests still pass (unchanged).
  - All 104 Core.Scripting tests still pass (unchanged).

New tests in ScriptedAlarmEngineTests:
  - Reevaluation_reuses_the_same_read_cache_dictionary — asserts
    ReferenceEquals(scratch_before, scratch_after) across two
    evaluations of the same alarm.
  - Reevaluation_reuses_the_same_predicate_context — same, for the
    context.
  - LoadAsync_drops_the_prior_generations_scratch — asserts a config
    publish wipes the prior scratch (so a stale Logger / Inputs can't
    leak into the new generation).

Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added via the existing
InternalsVisibleTo for the tests project. Kept internal — not part of
the public engine surface.

Docs:
  - docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
    rewritten as "hot-path allocation reuse" documenting the new
    contract + reuse safety reasoning + the three regression tests.
  - code-reviews/Core.ScriptedAlarms/findings.md -009 flipped
    Won't Fix → Resolved.
  - code-reviews/README.md regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:10:09 -04:00
Joseph Doherty 7b6ab2ec6f fix(scripting): unload compiled-script assemblies via collectible ALC
Core.Scripting-008 resolution: replace the legacy CSharpScript.CreateDelegate
path with hand-rolled CSharpCompilation + Emit + collectible AssemblyLoadContext,
so per-publish compile accretion no longer requires a server restart to reclaim.

Why this was needed:
  Roslyn's CSharpScript path emits dynamically-compiled script assemblies into
  the default AssemblyLoadContext, which is non-collectible. Across config-
  publish generations each Clear() drops dictionary entries but the emitted
  assemblies stay loaded for process lifetime, so memory grows steadily on
  long-running servers with frequent publishes. The accepted-limitation note
  in docs/VirtualTags.md recommended scheduled restarts as the workaround;
  operator feedback was that restarts are difficult, so the underlying
  limitation was the right thing to fix.

Implementation:
  - New ScriptAssemblyLoadContext(name, isCollectible: true) hosts one emitted
    script assembly per evaluator.
  - ScriptEvaluator.Compile synthesises a wrapper class around the user source
    (CompiledScript.Run(globals) — explicit return required per ordinary C#
    semantics, which every existing script already uses), builds a
    CSharpCompilation against the sandbox references, runs the
    ForbiddenTypeAnalyzer over the semantic model unchanged, emits to an
    in-memory PE stream, loads via ScriptAssemblyLoadContext.LoadFromStream,
    and binds a strongly-typed Func<ScriptGlobals<TContext>, TResult> delegate
    via reflection.
  - ScriptEvaluator now implements IDisposable — Dispose calls
    AssemblyLoadContext.Unload(), which makes the emitted assembly eligible
    for GC at the next collection cycle.
  - CompiledScriptCache.Clear() disposes every materialised evaluator before
    dropping its dictionary entry; CompiledScriptCache itself is now
    IDisposable for graceful server shutdown.
  - ScriptSandbox.Build returns a new SandboxConfig (References + Imports)
    instead of a Roslyn ScriptOptions; references now span BCL via the
    TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard +
    Microsoft.Win32.Registry, so forbidden BCL types resolve at compile and
    ForbiddenTypeAnalyzer is the sole security gate (consistent with the
    Core.Scripting-001 / -002 model — references-list-only restriction is
    porous against type forwarding, so the analyzer must be the real gate).

Verification:
  - All 104 Core.Scripting tests pass (was 101 — three new regression tests
    locking the unload contract).
  - All 56 VirtualTags tests pass (unchanged).
  - All 63 ScriptedAlarms tests pass (unchanged).
  - New CompiledScriptCacheTests:
    - Dispose_unloads_compiled_script_assembly_load_context — proves single-
      evaluator ALC unload via WeakReference + bounded GC.Collect() loop.
    - Clear_disposes_every_materialised_evaluator — proves publish-replace
      releases every prior generation's ALC.
    - GetOrCompile_after_Dispose_throws_ObjectDisposedException — locks the
      post-dispose contract.

Docs:
  - docs/VirtualTags.md "Compile cache" section rewritten: the accepted-
    limitation note replaced with the unload contract + the new authoring
    convention (explicit return).
  - docs/ScriptedAlarms.md cross-reference updated to drop the obsolete
    restart guidance.
  - code-reviews/Core.Scripting/findings.md Core.Scripting-008 flipped
    Won't Fix → Resolved with the implementation summary.
  - code-reviews/README.md regenerated.

Pre-existing breakage note: Driver.Galaxy fails the solution-wide build on
master because its ProjectReference to the sibling mxaccessgw repo's
MxGateway.Client targets a path that the sibling repo no longer has after a
recent restructuring. This is unrelated to Core.Scripting-008 and was
verified to exist on master before this branch was cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 15:55:04 -04:00
Joseph Doherty 5a9c4591b9 fix(cli-common): name native-driver-emitted status codes in SnapshotFormatter
Driver.FOCAS.Cli-005 follow-up: extend the SnapshotFormatter.FormatStatus
shortlist with the five Bad* codes the native-protocol mappers (FOCAS,
AbCip, AbLegacy) emit but which the shortlist previously left unnamed,
so they rendered only as severity-class 'Bad' instead of the documented
'BadDeviceFailure' / 'BadNotWritable' / ... names operators are told to
read off probe/write output.

Added entries:
  0x80020000 BadInternalError
  0x803B0000 BadNotWritable
  0x803C0000 BadOutOfRange
  0x803D0000 BadNotSupported
  0x80550000 BadDeviceFailure

(BadTimeout 0x800A0000 was already added under Driver.Cli.Common-001.)

Tests: SnapshotFormatterTests gains a new [Theory]
FormatStatus_names_native_driver_emitted_codes covering the five names,
and the existing well-known [Theory] is extended with the same entries
to enforce exact '0x... (Name)' rendering. Suite now 47 green (was 42).

Flips Driver.FOCAS.Cli-005 from Deferred to Resolved; README regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 15:14:08 -04:00
Joseph Doherty 0f8ce1cb80 docs(code-reviews): regenerate index — final batch — 6 Low findings resolved
Batch 7 closed the last Open findings in Client.UI. The review backlog
is now empty: 0 Open findings across all 31 modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:25:28 -04:00
Joseph Doherty 1b10194634 fix(client-ui): resolve Low code-review findings (Client.UI-003,004,006,009,010,011)
- Client.UI-003: wire Serilog properly per CLAUDE.md — console sink +
  rolling daily file sink in Program.Main, Log.CloseAndFlush in finally,
  per-VM Log.ForContext<> loggers.
- Client.UI-004: migrate the cert-store folder picker from the obsolete
  OpenFolderDialog to StorageProvider.OpenFolderPickerAsync (with
  TryGetFolderFromPathAsync seed + TryGetLocalPath extraction).
- Client.UI-006: surface formerly silent catch blocks via an observable
  StatusMessage on the Subscriptions / Alarms VMs that bubbles up into
  the shell's status bar; soft fallbacks log at Information level so
  hard failures stay distinguishable.
- Client.UI-009: docs/Client.UI.md now lists Standard Deviation in the
  Aggregate row of the Query Options table.
- Client.UI-010: removed the unused MinDateTimeProperty /
  MaxDateTimeProperty styled properties from DateTimeRangePicker.
- Client.UI-011: updated the cert-store TextBox watermark from the
  legacy AppData/LmxOpcUaClient/pki to the canonical
  AppData/OtOpcUaClient/pki.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:25:20 -04:00
Joseph Doherty 59ecd18169 docs(code-reviews): regenerate index — 25 Low findings resolved
Batch 6 cleared Open findings in Driver.FOCAS.Cli (1 deferred to
Driver.Cli.Common), Driver.Cli.Common, Driver.Historian.Wonderware.Client,
Client.CLI, and Client.Shared.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:13:29 -04:00
Joseph Doherty 2a6ac07111 fix(client-shared): resolve Low code-review findings (Client.Shared-003,004,009,010,011)
- Client.Shared-003: DefaultSessionAdapter.WriteValueAsync / CallMethodAsync
  guard against null/empty Results and throw ServiceResultException with
  the response's ServiceResult code instead of indexing into a missing
  list.
- Client.Shared-004: DefaultSessionAdapter.CloseAsync / HistoryReadRawAsync
  / HistoryReadAggregateAsync use the Session.*Async overloads and honour
  the caller's CancellationToken.
- Client.Shared-009: AcknowledgeAlarmAsync returns the underlying
  ServiceResultException.StatusCode on failure instead of always Good;
  IOpcUaClientService doc updated to describe the new contract.
- Client.Shared-010: ConnectionSettings.CertificateStorePath defaults to
  empty; DefaultApplicationConfigurationFactory resolves the canonical
  PKI path lazily, so per-failover ConnectionSettings copies don't hit
  the filesystem.
- Client.Shared-011: added the alarm-fallback regression test, extracted
  EndpointSelector as a pure static, and added EndpointSelectorTests
  covering security-mode match, Basic256Sha256 preference, fallback,
  diagnostics, hostname rewrite, and null/empty guards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:13:21 -04:00
Joseph Doherty 7fe9f16cf8 fix(client-cli): resolve Low code-review findings (Client.CLI-002,003,004,006,007,008,009,010)
- Client.CLI-002: SubscribeCommand's neverWentBad list now requires the
  node to be present in lastStatus (i.e. received at least one update)
  so the 'suspect' bucket only contains observed nodes.
- Client.CLI-003: every long-running command validates numeric option
  ranges (Interval / Depth / MaxDepth / Duration / Max) and throws
  CliFx CommandException on out-of-range values.
- Client.CLI-004: SubscribeCommand carries XML summary docs on the
  type, ctor, every [CommandOption] property, and ExecuteAsync —
  matching the sibling commands' style.
- Client.CLI-006: HistoryReadCommand parses --start / --end with
  InvariantCulture+UTC and surfaces FormatException as CommandException;
  every NodeIdParser.ParseRequired call wraps FormatException /
  ArgumentException as CommandException.
- Client.CLI-007: CommandBase.ConfigureLogging calls Log.CloseAndFlush()
  before assigning a new Log.Logger so prior sinks are disposed.
- Client.CLI-008: rewrote the subscribe and historyread sections of
  docs/Client.CLI.md (every flag documented, summary-bucket vocabulary,
  StandardDeviation aggregate, UTC --start/--end convention).
- Client.CLI-009: SubscribeCommand / AlarmsCommand use named local
  handlers and detach them via -= after UnsubscribeAsync so no
  notification reaches the console after the command's output phase
  ends.
- Client.CLI-010: added CommandRangeValidationTests,
  EventHandlerLifecycleTests, InputValidationErrorsTests,
  LoggerLifecycleTests, and SubscribeCommandSummaryTests pinning every
  Low fix; FakeOpcUaClientService gained AddDiscoveredVariable +
  RaiseDataChanged + BrowseResultsByParent helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:13:08 -04:00
Joseph Doherty 879925180b fix(driver-historian-wonderware-client): resolve Low code-review findings (Driver.Historian.Wonderware.Client-003,004,006,008,010)
- Driver.Historian.Wonderware.Client-003: replaced the mixed Interlocked
  + healthLock counters with RecordOutcome that touches _totalQueries
  and exactly one of _totalSuccesses / _totalFailures under one
  acquisition.
- Driver.Historian.Wonderware.Client-004: InvokeAndClassifyAsync routes
  transport + sidecar classification through a single RecordOutcome
  call; the legacy ReclassifySuccessAsFailure two-step is gone.
- Driver.Historian.Wonderware.Client-006: removed the dead
  ReconnectInitialBackoff / ReconnectMaxBackoff options and added a
  doc <remarks> stating the channel performs a single in-place
  reconnect; retry/backoff stays with the caller.
- Driver.Historian.Wonderware.Client-008: the audit-suppression comment
  block now records advisory titles, why neither applies, and the
  revisit trigger.
- Driver.Historian.Wonderware.Client-010: reworded Dispose() to claim
  deadlock-safety and added a GetHealthSnapshot summary documenting the
  single-channel collapse + counter invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:12:16 -04:00
Joseph Doherty 3ca569f621 fix(driver-cli-common): resolve Low code-review findings (Driver.Cli.Common-004,006)
- Driver.Cli.Common-004: confirm the FormatTable empty-input guard
  landed earlier (commit 1433a1c); flip status to Resolved with a
  cross-reference.
- Driver.Cli.Common-006: reword the SnapshotFormatter source-time
  column comment to describe the actual behaviour (right-most column,
  unmeasured, '-' for null timestamps) and confirm the
  DriverCommandBase summary now enumerates FOCAS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:12:04 -04:00
Joseph Doherty 6923be3aa2 fix(driver-focas-cli): resolve Low code-review findings (Driver.FOCAS.Cli-001,002,003,004; -005 deferred)
- Driver.FOCAS.Cli-001: WriteCommand.ParseValue now wraps numeric
  FormatException / OverflowException as CliFx CommandException with
  the offending value.
- Driver.FOCAS.Cli-002: SubscribeCommand's OnDataChange handler and the
  banner both take a writeLock so notification-callback and main-thread
  writes can't interleave; handler exceptions are warn-and-swallow.
- Driver.FOCAS.Cli-003: FocasCommandBase.ValidateOptions rejects
  --cnc-port outside 1..65535, non-positive --timeout-ms, and
  non-positive --interval-ms; ExecuteAsync calls it first.
- Driver.FOCAS.Cli-004: 'await using var driver' is the sole driver
  disposal path; dropped the redundant explicit await ShutdownAsync.
- Driver.FOCAS.Cli-005 (Deferred): the fix lives in
  Driver.Cli.Common.SnapshotFormatter — explicitly naming the
  status-code shortlist there benefits every driver CLI. Left as a
  Driver.Cli.Common follow-up.
- Registered the new tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli.Tests
  project in ZB.MOM.WW.OtOpcUa.slnx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:11:55 -04:00
Joseph Doherty 2a941b255f docs(code-reviews): regenerate index — 29 Low findings resolved
Batch 5 cleared Open findings in Driver.AbCip.Cli, Driver.AbLegacy.Cli,
Driver.S7.Cli, Driver.TwinCAT.Cli, and Driver.Modbus.Cli.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:35:12 -04:00
Joseph Doherty 80ef8806e0 fix(driver-modbus-cli): resolve Low code-review findings (Driver.Modbus.Cli-003,004,005,006,007,008)
- Driver.Modbus.Cli-003: ModbusCommandBase.ValidateEndpoint rejects
  --port outside 1..65535, non-positive --timeout-ms, and --unit-id
  outside 1..247.
- Driver.Modbus.Cli-004: wrapped SubscribeCommand's OnDataChange handler
  body in a try/catch (warn-and-swallow) and serialised the console
  write through a lock.
- Driver.Modbus.Cli-005: Probe / Read / Write now catch the
  cancellation-during-init OperationCanceledException and print
  'Cancelled.' instead of dumping a stack trace.
- Driver.Modbus.Cli-006: ProbeCommand.ComputeVerdict derives the headline
  from BOTH the driver state and the probe snapshot's OPC UA quality
  class so the headline can't disagree with the wire result.
- Driver.Modbus.Cli-007: docs/Driver.Modbus.Cli.md carries an explicit
  'CLI scope' callout — the address-string grammar is a DriverConfig
  JSON feature; the CLI takes the structured triple only.
- Driver.Modbus.Cli-008: pinned BuildOptions, ValidateEndpoint, the
  region-validation guards, ComputeVerdict, and the cancellation-during-
  initialize paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:35:05 -04:00
Joseph Doherty f2ee027145 fix(driver-twincat-cli): resolve Low code-review findings (Driver.TwinCAT.Cli-001,002,003,004,005,006,007)
- Driver.TwinCAT.Cli-001: TwinCATCommandBase.Validate rejects
  non-positive TimeoutMs / IntervalMs and AmsPort outside 1..65535;
  ExecuteAsync calls it first.
- Driver.TwinCAT.Cli-002: SubscribeCommand serialises every WriteLine
  through a writeLock to remove the notification-callback vs banner
  interleave risk.
- Driver.TwinCAT.Cli-003: SubscribeCommand.DescribeMechanism derives
  the banner label from the returned ISubscriptionHandle.DiagnosticId
  so it can't disagree with what the driver actually did.
- Driver.TwinCAT.Cli-004: introduced TwinCATTagCommandBase carrying
  --poll-only + BuildOptions; BrowseCommand stays on the slimmer
  TwinCATCommandBase so --poll-only no longer surfaces in browse --help.
- Driver.TwinCAT.Cli-005: ProbeCommand --type now carries the 't' short
  alias to match the other commands.
- Driver.TwinCAT.Cli-006: 35 new tests covering Gateway / AmsAddress
  parse / BuildOptions / PollOnly / browse-helpers / probe-alias /
  mechanism derivation.
- Driver.TwinCAT.Cli-007: replaced the empty-init <inheritdoc/> with an
  explicit summary warning future maintainers about the no-op init.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:34:57 -04:00
Joseph Doherty 67ef6c4ebc fix(driver-s7-cli): resolve Low code-review findings (Driver.S7.Cli-004,005,006,007)
- Driver.S7.Cli-004: 'await using var driver' is the sole driver
  disposal path; dropped the redundant explicit await ShutdownAsync from
  each command's finally.
- Driver.S7.Cli-005: deleted the stale empty
  tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/ directory (the real test
  project lives under tests/Drivers/Cli/).
- Driver.S7.Cli-006: S7CommandBaseBuildOptionsTests cover the probe
  toggle, timeout mapping, host/port/CPU/rack/slot wiring, and tag list
  passthrough.
- Driver.S7.Cli-007: re-added the SubscribeCommand handler comment
  explaining the CliFx IConsole.Output usage and that the poll-thread
  raises events.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:34:48 -04:00
Joseph Doherty f46e126208 fix(driver-ablegacy-cli): resolve Low code-review findings (Driver.AbLegacy.Cli-002,003,004,005,006,007)
- Driver.AbLegacy.Cli-002: WriteCommand.Value description lists the full
  true/false, 1/0, on/off, yes/no alias set.
- Driver.AbLegacy.Cli-003: SubscribeCommand serialises every WriteLine
  via a per-execution consoleGate lock so the poll-thread OnDataChange
  handler can't interleave with the banner.
- Driver.AbLegacy.Cli-004: dropped 'await using var driver' in favour of
  a plain 'var driver' + explicit await ShutdownAsync in finally; the
  driver is no longer shut down twice.
- Driver.AbLegacy.Cli-005: SubscribeCommand.IntervalMs description
  carries the PollGroupEngine 250ms-floor caveat; docs/Driver.AbLegacy.Cli.md
  spells out the same.
- Driver.AbLegacy.Cli-006: ProbeCommand --type now carries the short
  alias 't' to match the other commands.
- Driver.AbLegacy.Cli-007: BuildOptionsTests cover the probe-disabled,
  device-shape, tag-passthrough, timeout-propagation, and empty-tag-list
  paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:34:32 -04:00
Joseph Doherty 759af8c1bb fix(driver-abcip-cli): resolve Low code-review findings (Driver.AbCip.Cli-003,004,005,006,007,008)
- Driver.AbCip.Cli-003: SubscribeCommand prints the 'Subscribed' banner
  BEFORE wiring OnDataChange so the main thread can't interleave its
  write with the poll-thread handler.
- Driver.AbCip.Cli-004: AbCipCommandBase.Timeout and SubscribeCommand
  validate TimeoutMs / IntervalMs and throw CommandException on
  non-positive values.
- Driver.AbCip.Cli-005: every command now calls FlushLogging() in its
  finally block.
- Driver.AbCip.Cli-006: Timeout init throws NotSupportedException with a
  pointer at TimeoutMs instead of silently swallowing assignments.
- Driver.AbCip.Cli-007: added AbCipCommandBaseTests covering BuildOptions
  shape, probe / controller-browse / alarm toggles, host address, family
  selection, tag list passthrough.
- Driver.AbCip.Cli-008: rewrote the opening paragraph in
  docs/Driver.AbCip.Cli.md to credit the six-CLI roster with a pointer
  at docs/DriverClis.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:34:25 -04:00
Joseph Doherty 61c0311938 docs(code-reviews): regenerate index — 24 Low findings resolved
Batch 4 cleared Open findings in Driver.TwinCAT, Driver.Modbus,
Driver.OpcUaClient, Driver.Historian.Wonderware, and Driver.Modbus.Addressing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:18:21 -04:00
Joseph Doherty 9263519852 fix(driver-modbus-addressing): resolve Low code-review findings (Driver.Modbus.Addressing-006,007,009)
- Driver.Modbus.Addressing-006: broaden the catch in TryParseFamilyNative
  so a future helper throwing a non-Argument/Overflow type still satisfies
  the try-parse contract.
- Driver.Modbus.Addressing-007: document that the address grammar does
  not carry ModbusStringByteOrder (the structured-tag path does);
  add a 'Grammar scope' bullet to docs/v2/dl205.md.
- Driver.Modbus.Addressing-009: reword the ModbusModiconAddress comments
  so they don't imply a leading-digit invariant the parser doesn't
  enforce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:18:15 -04:00
Joseph Doherty 1f29b215c8 fix(driver-historian-wonderware): resolve Low code-review findings (Driver.Historian.Wonderware-004,005,007,008,010,011,012)
- Driver.Historian.Wonderware-004: ToHistorianEvent synthesises a fresh
  Guid when the upstream EventId is unparseable and logs the substitution
  instead of writing the historian with Guid.Empty.
- Driver.Historian.Wonderware-005: GetHealthSnapshot derives the
  connection-open booleans from the active-node fields so the snapshot
  is self-consistent without depending on the secondary lock.
- Driver.Historian.Wonderware-007: SID-mismatch branch in PipeServer now
  sends a HelloAck { Accepted=false, RejectReason } so the client sees a
  symmetric rejection.
- Driver.Historian.Wonderware-008: classify StartQuery failures —
  connection-class codes drop the connection, query-class codes throw
  QueryClassStartQueryException so the IPC layer surfaces Success=false.
- Driver.Historian.Wonderware-010: RequestTimeoutSeconds now enforced
  via BuildRequestCts linked to the caller's CancellationToken.
- Driver.Historian.Wonderware-011: refreshed XML docs to describe the
  current sidecar / named-pipe architecture (Galaxy.Host / Proxy
  references reframed as historical context).
- Driver.Historian.Wonderware-012: pinned the previously-uncovered
  HistorianDataSource behaviours with five new test files; also removed
  the stale empty tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests
  directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:18:10 -04:00
Joseph Doherty 42aa82de29 fix(driver-opcuaclient): resolve Low code-review findings (Driver.OpcUaClient-011,014)
- Driver.OpcUaClient-011: rewrote the ValueRank comment with the OPC UA
  Part 3 constants and an explicit scalar/array boundary at
  valueRank >= 0.
- Driver.OpcUaClient-014: track every MonitoredItem.Notification handler
  in a MonitoredItemNotificationHandle record; UnsubscribeAsync /
  UnsubscribeAlarmsAsync / ShutdownAsync detach the handler before
  Subscription.DeleteAsync so the SDK's invocation list no longer keeps
  the driver alive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:17:55 -04:00
Joseph Doherty d5322b0f9a fix(driver-modbus): resolve Low code-review findings (Driver.Modbus-003,007,008,009,010,011,012)
- Driver.Modbus-003: route every _health access through ReadHealth /
  WriteHealth helpers backed by Volatile.Read / Volatile.Write so a
  burst of concurrent ReadAsync callers always sees a complete snapshot.
- Driver.Modbus-007: promoted the Int64 / UInt64 → Int32 surfacing
  caveat to a full <remarks> block; rewrote DisableFC23's doc to flag it
  as reserved / no-op.
- Driver.Modbus-008: deleted stale duplicate doc, rewrote the
  prohibition-block summaries to credit the shipped re-probe loop, and
  removed the unused 'status' local in the ModbusException catch arm.
- Driver.Modbus-009: bind-time validation rejects StringLength < 1 for
  String tags; ModbusTcpTransport clamps keep-alive intervals to whole
  seconds (>=1).
- Driver.Modbus-010: documented WriteOnChangeOnly's cache-invalidation
  policy (reads-only) and the write-only-tag caveat.
- Driver.Modbus-011: collected the scattered instance fields into a
  single contiguous block at the top of ModbusDriver.
- Driver.Modbus-012: covered the previously-uncovered Reinitialize
  state-hygiene, malformed/truncated/empty-bitmap response, and
  DisposeAsync teardown paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:17:51 -04:00
Joseph Doherty 3c75db7eb6 fix(driver-twincat): resolve Low code-review findings (Driver.TwinCAT-004,006,014,015,016)
- Driver.TwinCAT-004: corrected the IEC time-type inline comments;
  documented that the driver currently surfaces them as raw UInt32
  counters.
- Driver.TwinCAT-006: ResolveHost returns a documented UnresolvedHost
  sentinel when no devices are configured instead of returning the
  logical DriverInstanceId (which never matches GetHostStatuses).
- Driver.TwinCAT-014: wired Probe.Timeout into the probe-loop call and
  added a NotificationMaxDelayMs config knob threaded through
  AddNotificationAsync.
- Driver.TwinCAT-015: Dispose() runs a genuinely synchronous teardown
  with bounded waits (no sync-over-async deadlock pattern).
- Driver.TwinCAT-016: pinned the Structure-tag rejection and the
  probe-loop vs read disposal race with regression tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:17:42 -04:00
804 changed files with 42225 additions and 35905 deletions
+70
View File
@@ -0,0 +1,70 @@
# CI for the v2 branch — runs on every push + PR to the v2-akka-fuse / master
# branches. Layered into three jobs:
# build dotnet restore + build (fast feedback on compile errors)
# unit-tests every v2 unit-test project
# integration 2-node Host.IntegrationTests harness
#
# Skips E2E (Category=E2E) — that runs nightly via v2-e2e.yml against the full
# four-node docker-dev stack.
#
# Compatible with both GitHub Actions and Gitea Actions (act_runner). The .NET 10
# SDK is pinned via global.json at the repo root; if no global.json exists, the
# setup-dotnet step falls back to dotnet-version below.
name: v2-ci
on:
push:
branches: [v2-akka-fuse, master]
pull_request:
branches: [v2-akka-fuse, master]
workflow_dispatch: {}
env:
DOTNET_NOLOGO: "1"
DOTNET_CLI_TELEMETRY_OPTOUT: "1"
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: 10.0.x
- name: dotnet restore
run: dotnet restore ZB.MOM.WW.OtOpcUa.slnx
- name: dotnet build
run: dotnet build ZB.MOM.WW.OtOpcUa.slnx --no-restore --configuration Release
unit-tests:
needs: build
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
project:
- tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests
- tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests
- tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests
- tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests
- tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: 10.0.x
- name: dotnet test ${{ matrix.project }}
run: dotnet test ${{ matrix.project }} --configuration Release --filter "Category!=E2E"
integration:
needs: build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: 10.0.x
- name: dotnet test Host.IntegrationTests
run: dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --configuration Release --filter "Category!=E2E"
+57
View File
@@ -0,0 +1,57 @@
# Nightly E2E job. Runs against the docker-dev four-node fleet (admin-a +
# admin-b + driver-a + driver-b + SQL + LDAP + Traefik). Trigger:
# - cron at 03:00 UTC daily
# - workflow_dispatch from the Actions UI for on-demand runs
#
# The E2E test project (tests/Server/ZB.MOM.WW.OtOpcUa.E2ETests) does not yet
# exist — it lands when the F-series follow-ups F10/F11/F12 wire enough of the
# SDK/historian/probe so an end-to-end driver round-trip is meaningful. Until
# then this workflow is a green no-op (the `--filter Category=E2E` matches
# zero tests, and `dotnet test` returns 0).
name: v2-e2e
on:
schedule:
- cron: "0 3 * * *"
workflow_dispatch: {}
env:
DOTNET_NOLOGO: "1"
DOTNET_CLI_TELEMETRY_OPTOUT: "1"
jobs:
e2e:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: 10.0.x
- name: dotnet restore
run: dotnet restore ZB.MOM.WW.OtOpcUa.slnx
- name: Build docker-dev fleet
run: docker compose -f docker-dev/docker-compose.yml up -d --build
- name: Wait for cluster
run: |
for i in $(seq 1 30); do
if curl -sf http://localhost/health/active >/dev/null; then
echo "Admin leader healthy after ${i}s"
exit 0
fi
sleep 2
done
echo "Timed out waiting for /health/active"
docker compose -f docker-dev/docker-compose.yml logs --tail=200
exit 1
- name: dotnet test (E2E only)
run: dotnet test --configuration Release --filter "Category=E2E"
- name: Tear down
if: always()
run: docker compose -f docker-dev/docker-compose.yml down -v
+18
View File
@@ -0,0 +1,18 @@
<Project>
<!--
Defaults inherited by every csproj. Individual projects may override.
Deviation from the original v2 plan: TreatWarningsAsErrors is NOT set globally because the
pre-v2 test projects (e.g. Admin.Tests) carry 240+ xUnit1051 analyzer warnings that would
fail the build. New v2 projects (Commons, Cluster, ControlPlane, Runtime, OpcUaServer, AdminUI,
Host, Security) MUST opt in to <TreatWarningsAsErrors>true</TreatWarningsAsErrors> in their
own csproj. Once the legacy Admin/Server projects are deleted (Phase 10, Task 56), this can
be promoted back to a global default.
-->
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion>
</PropertyGroup>
</Project>
+103
View File
@@ -0,0 +1,103 @@
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
<ItemGroup>
<PackageVersion Include="Akka" Version="1.5.62" />
<PackageVersion Include="Akka.Cluster" Version="1.5.62" />
<PackageVersion Include="Akka.Cluster.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Cluster.Tools" Version="1.5.62" />
<PackageVersion Include="Akka.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Remote" Version="1.5.62" />
<PackageVersion Include="Akka.Remote.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Streams" Version="1.5.62" />
<PackageVersion Include="Akka.Streams.TestKit" Version="1.5.62" />
<PackageVersion Include="Akka.TestKit.Xunit2" Version="1.5.62" />
<PackageVersion Include="Avalonia" Version="11.2.7" />
<PackageVersion Include="Avalonia.Controls.DataGrid" Version="11.2.7" />
<PackageVersion Include="Avalonia.Desktop" Version="11.2.7" />
<PackageVersion Include="Avalonia.Diagnostics" Version="11.2.7" />
<PackageVersion Include="Avalonia.Fonts.Inter" Version="11.2.7" />
<PackageVersion Include="Avalonia.Headless" Version="11.2.7" />
<PackageVersion Include="Avalonia.Svg.Skia" Version="11.2.0.2" />
<PackageVersion Include="Avalonia.Themes.Fluent" Version="11.2.7" />
<PackageVersion Include="Beckhoff.TwinCAT.Ads" Version="7.0.172" />
<PackageVersion Include="bunit" Version="2.0.33-preview" />
<PackageVersion Include="CliFx" Version="2.3.6" />
<PackageVersion Include="CommunityToolkit.Mvvm" Version="8.4.0" />
<PackageVersion Include="coverlet.collector" Version="6.0.4" />
<PackageVersion Include="FluentAssertions" Version="8.3.0" />
<PackageVersion Include="Google.Protobuf" Version="3.34.1" />
<PackageVersion Include="Grpc.Core.Api" Version="2.76.0" />
<PackageVersion Include="Grpc.Net.Client" Version="2.76.0" />
<PackageVersion Include="libplctag" Version="1.5.2" />
<PackageVersion Include="LiteDB" Version="5.0.21" />
<PackageVersion Include="MessagePack" Version="2.5.187" />
<PackageVersion Include="Microsoft.AspNetCore.Authentication.JwtBearer" Version="10.0.7" />
<PackageVersion Include="Microsoft.AspNetCore.Authorization" Version="10.0.7" />
<PackageVersion Include="Microsoft.AspNetCore.DataProtection" Version="10.0.7" />
<PackageVersion Include="Microsoft.AspNetCore.DataProtection.EntityFrameworkCore" Version="10.0.7" />
<PackageVersion Include="Microsoft.AspNetCore.Mvc.Testing" Version="10.0.0" />
<PackageVersion Include="Microsoft.AspNetCore.SignalR.Client" Version="10.0.0" />
<PackageVersion Include="Microsoft.AspNetCore.SignalR.Core" Version="1.2.0" />
<PackageVersion Include="Microsoft.AspNetCore.TestHost" Version="10.0.7" />
<!--
Roslyn analyzer packages pin to the same major version as the SDK's compiler.
.NET SDK 10.0.105 ships compiler 5.0.0.0. Microsoft.CodeAnalysis.CSharp 5.3.x emits
analyzer DLLs that reference compiler 5.3.0.0 and fail with CS9057 on the local SDK.
Pin to 5.0.0 (matches the compiler the SDK ships) until the SDK rolls to 10.0.110+.
-->
<PackageVersion Include="Microsoft.CodeAnalysis.CSharp" Version="5.0.0" />
<PackageVersion Include="Microsoft.CodeAnalysis.CSharp.Scripting" Version="4.12.0" />
<PackageVersion Include="Microsoft.CodeAnalysis.CSharp.Workspaces" Version="5.0.0" />
<PackageVersion Include="Microsoft.Data.SqlClient" Version="6.1.1" />
<PackageVersion Include="Microsoft.Data.Sqlite" Version="9.0.0" />
<PackageVersion Include="Microsoft.EntityFrameworkCore" Version="10.0.7" />
<PackageVersion Include="Microsoft.EntityFrameworkCore.Design" Version="10.0.7" />
<PackageVersion Include="Microsoft.EntityFrameworkCore.InMemory" Version="10.0.7" />
<PackageVersion Include="Microsoft.EntityFrameworkCore.SqlServer" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Configuration.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Configuration.Json" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.DependencyInjection" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Hosting" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Hosting.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Hosting.WindowsServices" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Http" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Logging" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Options" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Options.ConfigurationExtensions" Version="10.0.7" />
<PackageVersion Include="Microsoft.IdentityModel.Tokens" Version="8.11.0" />
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />
<PackageVersion Include="Microsoft.Playwright" Version="1.51.0" />
<PackageVersion Include="Novell.Directory.Ldap.NETStandard" Version="3.6.0" />
<PackageVersion Include="OPCFoundation.NetStandard.Opc.Ua.Client" Version="1.5.378.106" />
<PackageVersion Include="OPCFoundation.NetStandard.Opc.Ua.Configuration" Version="1.5.378.106" />
<PackageVersion Include="OPCFoundation.NetStandard.Opc.Ua.Server" Version="1.5.374.126" />
<PackageVersion Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.15.3-beta.1" />
<PackageVersion Include="OpenTelemetry.Extensions.Hosting" Version="1.15.3" />
<PackageVersion Include="Polly.Core" Version="8.6.6" />
<PackageVersion Include="S7netplus" Version="0.20.0" />
<PackageVersion Include="Serilog" Version="4.3.0" />
<PackageVersion Include="Serilog.AspNetCore" Version="9.0.0" />
<PackageVersion Include="Serilog.Extensions.Hosting" Version="9.0.0" />
<PackageVersion Include="Serilog.Formatting.Compact" Version="3.0.0" />
<PackageVersion Include="Serilog.Settings.Configuration" Version="9.0.0" />
<PackageVersion Include="Serilog.Sinks.Console" Version="6.0.0" />
<PackageVersion Include="Serilog.Sinks.File" Version="7.0.0" />
<PackageVersion Include="Shouldly" Version="4.3.0" />
<PackageVersion Include="System.CommandLine" Version="2.0.5" />
<PackageVersion Include="System.Data.SqlClient" Version="4.9.0" />
<PackageVersion Include="System.IdentityModel.Tokens.Jwt" Version="8.11.0" />
<PackageVersion Include="System.IO.Pipes.AccessControl" Version="5.0.0" />
<PackageVersion Include="System.Memory" Version="4.5.5" />
<PackageVersion Include="System.Threading.Tasks.Extensions" Version="4.5.4" />
<PackageVersion Include="xunit" Version="2.9.2" />
<PackageVersion Include="xunit.runner.visualstudio" Version="3.0.2" />
<PackageVersion Include="xunit.v3" Version="1.1.0" />
</ItemGroup>
</Project>
+15 -5
View File
@@ -2,6 +2,8 @@
<Folder Name="/src/" />
<Folder Name="/src/Core/">
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Cluster/ZB.MOM.WW.OtOpcUa.Cluster.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Commons/ZB.MOM.WW.OtOpcUa.Commons.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Configuration/ZB.MOM.WW.OtOpcUa.Configuration.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Core/ZB.MOM.WW.OtOpcUa.Core.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ZB.MOM.WW.OtOpcUa.Core.Scripting.csproj" />
@@ -10,8 +12,12 @@
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.csproj" />
</Folder>
<Folder Name="/src/Server/">
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Server/ZB.MOM.WW.OtOpcUa.Server.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Admin/ZB.MOM.WW.OtOpcUa.Admin.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/ZB.MOM.WW.OtOpcUa.AdminUI.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/ZB.MOM.WW.OtOpcUa.ControlPlane.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/ZB.MOM.WW.OtOpcUa.OpcUaServer.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ZB.MOM.WW.OtOpcUa.Runtime.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Security/ZB.MOM.WW.OtOpcUa.Security.csproj" />
</Folder>
<Folder Name="/src/Drivers/">
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj" />
@@ -46,6 +52,7 @@
<Folder Name="/tests/" />
<Folder Name="/tests/Core/">
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests.csproj" />
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests/ZB.MOM.WW.OtOpcUa.Cluster.Tests.csproj" />
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests.csproj" />
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests/ZB.MOM.WW.OtOpcUa.Core.Tests.csproj" />
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests.csproj" />
@@ -54,9 +61,11 @@
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests.csproj" />
</Folder>
<Folder Name="/tests/Server/">
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/ZB.MOM.WW.OtOpcUa.Server.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Admin.Tests/ZB.MOM.WW.OtOpcUa.Admin.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Admin.E2ETests/ZB.MOM.WW.OtOpcUa.Admin.E2ETests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ZB.MOM.WW.OtOpcUa.Runtime.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/ZB.MOM.WW.OtOpcUa.Security.Tests.csproj" />
</Folder>
<Folder Name="/tests/Drivers/">
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.csproj" />
@@ -85,6 +94,7 @@
<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests.csproj" />
<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests.csproj" />
<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests.csproj" />
<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli.Tests.csproj" />
</Folder>
<Folder Name="/tests/Client/">
<Project Path="tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests.csproj" />
+17 -17
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 8 |
| Open findings | 0 |
## Checklist coverage
@@ -62,7 +62,7 @@ assumption precisely.
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Commands/SubscribeCommand.cs:129-137` |
| Status | Open |
| Status | Resolved |
**Description:** The summary computes `neverWentBad` as every target whose node-id key is
absent from the `everBad` dictionary. A node that received no update at all is also absent
@@ -78,7 +78,7 @@ streamed only good values.
"suspect" list only contains nodes that were actually observed and never reported bad
quality.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `neverWentBad` now requires the node to be present in `lastStatus` (i.e. it received at least one update) before being counted, so the "suspect" bucket only contains nodes that were actually observed and never reported bad quality.
### Client.CLI-003
@@ -87,7 +87,7 @@ quality.
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` |
| Status | Open |
| Status | Resolved |
**Description:** Numeric command options accept any value with no range validation.
`--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be
@@ -100,7 +100,7 @@ is forwarded to `HistoryReadRawAsync`. None of these produce a clear operator-fa
`CliFx.Exceptions.CommandException` with an actionable message when a value is out of
range.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — every command's `ExecuteAsync` now validates the numeric option ranges (`--interval`, `--depth`, `--max-depth`, `--max`, `--duration`) and throws `CliFx.Exceptions.CommandException` with the offending value when a non-positive (or otherwise out-of-range) value is supplied. Pinned by `CommandRangeValidationTests`.
### Client.CLI-004
@@ -109,7 +109,7 @@ range.
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Commands/SubscribeCommand.cs:13-37` |
| Status | Open |
| Status | Resolved |
**Description:** `SubscribeCommand` is the only command in the module whose constructor
and all `[CommandOption]` properties have no XML doc comments. Every other command
@@ -121,7 +121,7 @@ otherwise-uniform documentation convention of the module.
**Recommendation:** Add `<summary>` XML docs to the `SubscribeCommand` constructor and to
each of its option properties, matching the style used by the sibling commands.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `SubscribeCommand` now carries `<summary>` XML docs on the type, the constructor, every `[CommandOption]` property, and `ExecuteAsync`, matching the style used by the sibling commands.
### Client.CLI-005
@@ -156,7 +156,7 @@ callback.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` |
| Status | Open |
| Status | Resolved |
**Description:** Operator input-format errors surface as raw .NET exceptions rather than
clean CLI errors. An unparseable start/end value throws `FormatException` straight out of
@@ -170,7 +170,7 @@ is converted to a `CliFx.Exceptions.CommandException` with a clean exit code.
`CommandException` with a concise message and a non-zero exit code, so malformed input
yields a one-line error instead of a stack trace.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `HistoryReadCommand` parses `--start`/`--end` with `CultureInfo.InvariantCulture` + `AssumeUniversal`/`AdjustToUniversal`, catches `FormatException`, and rethrows as `CommandException` with the offending value. Every command's call to `NodeIdParser.ParseRequired` is wrapped in a `catch (FormatException or ArgumentException)` block that surfaces the underlying message as a clean CLI error. Pinned by `InputValidationErrorsTests`.
### Client.CLI-007
@@ -179,7 +179,7 @@ yields a one-line error instead of a stack trace.
| Severity | Low |
| Category | Performance & resource management |
| Location | `CommandBase.cs:112-123` |
| Status | Open |
| Status | Resolved |
**Description:** `ConfigureLogging` builds a new Serilog `LoggerConfiguration`, creates a
logger, and assigns it to the static `Log.Logger` without disposing the previously
@@ -193,7 +193,7 @@ abandons the prior console sink without disposal. The pattern is incorrect:
build the logger into a local `ILogger` the command owns and disposes, rather than mutating
global static state per command.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `CommandBase.ConfigureLogging` now calls `Log.CloseAndFlush()` before assigning a new `Log.Logger`, so a prior logger's console sink is disposed before the next one is installed. Pinned by `LoggerLifecycleTests`.
### Client.CLI-008
@@ -202,7 +202,7 @@ global static state per command.
| Severity | Low |
| Category | Documentation & comments |
| Location | `docs/Client.CLI.md:158-217` |
| Status | Open |
| Status | Resolved |
**Description:** `docs/Client.CLI.md` is stale relative to the code at this commit.
(1) The `subscribe` command section documents only `-n` and `-i`, but the code
@@ -218,7 +218,7 @@ code option description includes it.
`docs/Client.CLI.md` from the current option set, including the five new subscribe flags
and the `StandardDeviation` aggregate row.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — rewrote the `subscribe` section of `docs/Client.CLI.md` to document every flag (`-r/--recursive`, `--max-depth`, `-q/--quiet`, `--duration`, `--summary-file`) plus the summary-bucket vocabulary, and added the `StandardDeviation` row plus the UTC `--start`/`--end` convention note to the `historyread` section.
### Client.CLI-009
@@ -227,7 +227,7 @@ and the `StandardDeviation` aggregate row.
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` |
| Status | Open |
| Status | Resolved |
**Description:** Both long-running commands attach an event handler
(`service.DataChanged += ...`, `service.AlarmEvent += ...`) with a lambda and never detach
@@ -243,7 +243,7 @@ but never the .NET event.
unsubscribing, using a named local delegate so it can be removed, ensuring no notification
is processed after the command output phase ends.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `SubscribeCommand` and `AlarmsCommand` declare named local handlers (`DataChangedHandler` / `AlarmEventHandler`) and detach them via `service.DataChanged -= ...` / `service.AlarmEvent -= ...` right after `UnsubscribeAsync` so no notification reaches the console once the command's output phase ends. Pinned by `EventHandlerLifecycleTests`.
### Client.CLI-010
@@ -252,7 +252,7 @@ is processed after the command output phase ends.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The new `SubscribeCommand` capabilities are largely untested. The four
`SubscribeCommandTests` cover only single-node subscribe, unsubscribe-on-cancel,
@@ -268,4 +268,4 @@ exit, summary bucketing across good/bad/no-update nodes, and the `--summary-file
The `FakeOpcUaClientService` already exposes `RaiseDataChanged`, so feeding good/bad values
and asserting the summary text is straightforward.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `SubscribeCommandSummaryTests` (covering recursive collection via `FakeOpcUaClientService.AddDiscoveredVariable`, `--duration` auto-exit, summary bucketing for good/bad/never/never-went-bad, and the `--summary-file` write), `CommandRangeValidationTests`, `EventHandlerLifecycleTests`, `InputValidationErrorsTests`, and `LoggerLifecycleTests` to pin the other Low findings; `FakeOpcUaClientService` was extended with `AddDiscoveredVariable` / `RaiseDataChanged` helpers.
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -63,13 +63,13 @@
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` |
| Status | Open |
| Status | Resolved |
**Description:** `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking the `Results` collection is non-empty. A malformed or service-level-faulted response (empty `Results` alongside a service fault) produces an `IndexOutOfRangeException` rather than a meaningful OPC UA `StatusCode` or `ServiceResultException`.
**Recommendation:** Guard both accesses — throw `ServiceResultException` with the response's `ResponseHeader.ServiceResult` (or `BadUnexpectedError`) when `Results` is empty.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added empty-Results guards to both `WriteValueAsync` (lines 80-85) and `CallMethodAsync` (lines 293-298) in `DefaultSessionAdapter`. Each now throws `ServiceResultException` carrying `response.ResponseHeader.ServiceResult.Code` (or `StatusCodes.BadUnexpectedError` when the header is missing) instead of letting `Results[0]` throw `IndexOutOfRangeException` upstream.
### Client.Shared-004
@@ -78,13 +78,13 @@
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` |
| Status | Open |
| Status | Resolved |
**Description:** `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchronous service round-trip on the caller's thread; for the UI this blocks the dispatcher thread. The async signature misleads callers, and the `CancellationToken` parameter is ignored on these paths.
**Recommendation:** Use the stack's async overloads (`Session.HistoryReadAsync`, `Session.CloseAsync`) where available, or wrap the synchronous calls in `Task.Run`, so the methods are genuinely asynchronous and honor the cancellation token.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — replaced the three blocking calls with their async counterparts: `CloseAsync` now awaits `Session.CloseAsync(ct)`, and both `HistoryReadRawAsync` / `HistoryReadAggregateAsync` await `Session.HistoryReadAsync(...)` with `.ConfigureAwait(false)`. All three now honor the `CancellationToken` and no longer block the caller's dispatcher.
### Client.Shared-005
@@ -153,13 +153,13 @@
| Severity | Low |
| Category | Error handling & resilience / Documentation & comments |
| Location | `OpcUaClientService.cs:302-322` |
| Status | Open |
| Status | Resolved |
**Description:** `AcknowledgeAlarmAsync` is typed `Task<StatusCode>` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally `return StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAsync`, which throws `ServiceResultException` on a bad call result. A failed acknowledgment therefore never returns a bad `StatusCode` — it throws — and the `StatusCode` return value is dead. Callers writing `if (StatusCode.IsBad(result))` will never see a bad result and will not catch the exception.
**Recommendation:** Either change the return type to `Task` (and let exceptions signal failure), or catch `ServiceResultException` in `AcknowledgeAlarmAsync` and return its `StatusCode`. Update the XML doc to match whichever is chosen.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `AcknowledgeAlarmAsync` now wraps the `CallMethodAsync` invocation in a try/catch for `ServiceResultException`, logging the failure and returning `ex.StatusCode` so callers using `if (StatusCode.IsBad(result))` see the bad status. The `IOpcUaClientService.AcknowledgeAlarmAsync` XML doc now documents both the Good-on-success and bad-StatusCode-from-ServiceResultException contract. Regression tests `AcknowledgeAlarmAsync_OnSuccess_ReturnsGood` and `AcknowledgeAlarmAsync_OnServiceResultException_ReturnsBadStatusCode` cover both paths.
### Client.Shared-010
@@ -168,13 +168,13 @@
| Severity | Low |
| Category | Performance & resource management |
| Location | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` |
| Status | Open |
| Status | Resolved |
**Description:** `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call per process, the legacy-folder migration with `Directory.Exists`/`Directory.Move` filesystem IO. `ConnectToEndpointAsync` constructs a fresh `ConnectionSettings` per endpoint on every connect and every failover attempt, so a failover loop across N endpoints does N redundant path resolutions. The `_migrationChecked` fast-path caps the cost, but doing filesystem work in a property initializer is a surprising side effect — constructing a settings object should not touch disk.
**Recommendation:** Make `CertificateStorePath` default to `string.Empty` and resolve `ClientStoragePaths.GetPkiPath()` lazily inside `DefaultApplicationConfigurationFactory.CreateAsync` only when the path is unset.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `ConnectionSettings.CertificateStorePath` now defaults to `string.Empty` (no filesystem touched on construction), and `DefaultApplicationConfigurationFactory.CreateAsync` resolves the canonical PKI path via `ClientStoragePaths.GetPkiPath()` only when the supplied path is null/whitespace. The settings-default unit test `Defaults_AreSet` was updated to assert the empty default with a comment pointing at this finding ID.
### Client.Shared-011
@@ -183,10 +183,10 @@
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race (Client.Shared-005) or re-entrant keep-alive failures (Client.Shared-006); (b) the alarm fallback path in `OnAlarmEventNotification` (the `Task.Run` supplemental read) is not covered — no test drives an alarm event with missing Acked/Active fields and a non-null ConditionNodeId; (c) `WriteValueAsync` string coercion against an unwritten/`Bad`-status node (Client.Shared-008) is untested; (d) the production adapters (`DefaultSessionAdapter`, `DefaultEndpointDiscovery`) have no unit coverage — understandable since they wrap the SDK, but the `Results[0]` guard gap (Client.Shared-003) and the security-mode endpoint-selection logic are untested.
**Recommendation:** Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract `DefaultEndpointDiscovery` best-endpoint selection into a pure function so it can be unit tested.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added the previously-missing unit coverage: (a) `OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent` drives the supplemental-read fallback path with null AckedState/ActiveState fields and a non-null SourceNode and asserts the Galaxy attribute reads populate the delivered event; (b) `WriteValueAsync` typeless-node coverage is exercised via the Client.Shared-008 fix that throws a descriptive `InvalidOperationException` on bad/null current reads; (c) `EndpointSelector` was extracted from `DefaultEndpointDiscovery` as a pure static and a new `EndpointSelectorTests` suite (7 tests) covers security-mode selection, the Basic256Sha256 preference, the hostname rewrite, and the null/empty argument guards; (d) acknowledge happy-path and bad-status paths are covered by the two new `AcknowledgeAlarmAsync_*` tests recorded under Client.Shared-009. Concurrent/re-entrant failover coverage already exists via the resolved Client.Shared-005/-006 tests in the suite.
+13 -13
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -89,7 +89,7 @@ directly so the compiler can prove non-nullness.
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` |
| Status | Open |
| Status | Resolved |
**Description:** The csproj references `Serilog` and `Serilog.Sinks.Console`, and
`docs/Client.UI.md` lists Serilog as the logging technology, but no source file in
@@ -104,7 +104,7 @@ rolling daily file sink the project standard calls for) and route Avalonia loggi
through it, or drop the unused `Serilog` package references and correct
`docs/Client.UI.md`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Honoured the CLAUDE.md mandate by wiring up Serilog with a console sink + a rolling daily file sink (`{LocalAppData}/OtOpcUaClient/logs/client-ui-*.log`, retained 14 days). Added `Serilog.Sinks.File` to the csproj and a `ConfigureLogging()` initializer in `Program.Main` that creates `Log.Logger` before `BuildAvaloniaApp()` and calls `Log.CloseAndFlush()` on exit. Each VM that previously had silent swallow blocks now owns a static `Log.ForContext<>()` logger so failures (subscribe, alarm subscribe, redundancy probe, recursive browse) are written to the rolling file. Avalonia's own logging is still routed through `LogToTrace` — replacing that would require a custom `ILogSink` adapter outside the scope of this finding.
### Client.UI-004
@@ -113,7 +113,7 @@ through it, or drop the unused `Serilog` package references and correct
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Views/MainWindow.axaml.cs:125-138` |
| Status | Open |
| Status | Resolved |
**Description:** `OnBrowseCertPathClicked` uses `OpenFolderDialog`, which is
obsolete in Avalonia 11.x (the version pinned in the csproj). The supported
@@ -125,7 +125,7 @@ Avalonia major version.
**Recommendation:** Migrate the folder chooser to
`TopLevel.GetTopLevel(this).StorageProvider.OpenFolderPickerAsync(...)`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Replaced `OpenFolderDialog` with `TopLevel.GetTopLevel(this).StorageProvider.OpenFolderPickerAsync(...)`, using `TryGetFolderFromPathAsync(vm.CertificateStorePath)` as the suggested start location and `TryGetLocalPath()` to extract the chosen path. The CS0618 obsoletion warning no longer appears in the build output.
### Client.UI-005
@@ -165,7 +165,7 @@ method, not only from `DisconnectAsync`.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` |
| Status | Open |
| Status | Resolved |
**Description:** Many catch blocks swallow exceptions silently with an empty body
and only a comment (`// Redundancy info not available`, `// Subscribe failed`,
@@ -180,7 +180,7 @@ permission denial effectively impossible from the UI.
message or write the exception to a log. Distinguish "feature not supported"
(condition refresh) from "operation failed" so genuine errors are not hidden.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Added an observable `StatusMessage` property on `SubscriptionsViewModel` and `AlarmsViewModel`; each former silent catch now logs through Serilog (via Client.UI-003's logger) and writes a user-visible message. `MainWindowViewModel.InitializeService` subscribes to both child VMs' `StatusMessage` changes and bubbles them up into the shell's `StatusMessage` (which is already bound to the status bar). Soft conditions are distinguished from hard failures: `RequestConditionRefreshAsync` failures log at Information level and surface as "Condition refresh not supported by server" rather than a generic error, matching the recommendation. Redundancy probe failure still leaves `RedundancyInfo` null but now logs at Information level instead of dropping the exception. Regression tests `AddSubscription_OnFailure_SurfacesStatusMessage`, `AddSubscriptionForNodeAsync_OnFailure_SurfacesStatusMessage`, `Subscribe_OnFailure_SurfacesStatusMessage`, and `ConnectCommand_RedundancyFailure_DoesNotBreakConnection` cover the four affected swallow sites.
### Client.UI-007
@@ -239,7 +239,7 @@ any background reconnect timers are leaked until process exit. The
| Severity | Low |
| Category | Design-document adherence |
| Location | `ViewModels/HistoryViewModel.cs:44-54` |
| Status | Open |
| Status | Resolved |
**Description:** `HistoryViewModel.AggregateTypes` exposes eight entries: `null`
(Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`.
@@ -250,7 +250,7 @@ stale relative to the code.
**Recommendation:** Update the "Aggregate" row in `docs/Client.UI.md` to include
Standard Deviation.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Added "Standard Deviation" to the Aggregate row of the Query Options table in `docs/Client.UI.md` so it matches the eighth entry already exposed by `HistoryViewModel.AggregateTypes`.
### Client.UI-010
@@ -259,7 +259,7 @@ Standard Deviation.
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` |
| Status | Open |
| Status | Resolved |
**Description:** `DateTimeRangePicker` declares `MinDateTimeProperty` /
`MaxDateTimeProperty` styled properties with public CLR accessors, but neither is
@@ -272,7 +272,7 @@ constraint the control does not enforce.
path (turn out-of-range input red, as invalid input already is) or remove the two
unused styled properties.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Removed `MinDateTimeProperty` / `MaxDateTimeProperty` and their CLR accessors from `DateTimeRangePicker.axaml.cs`. No XAML or external caller binds the properties (grep across the repo confirmed only the control file referenced them), so removing the dead API surface is the correct fix; adding min/max clamping would have been speculative behaviour without a calling site.
### Client.UI-011
@@ -281,7 +281,7 @@ unused styled properties.
| Severity | Low |
| Category | Documentation & comments |
| Location | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` |
| Status | Open |
| Status | Resolved |
**Description:** The certificate-store-path `TextBox` watermark reads
`(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208
@@ -293,4 +293,4 @@ that no longer matches where settings and the PKI store actually live.
**Recommendation:** Update the watermark to reference `OtOpcUaClient/pki`, or bind
it to `ClientStoragePaths.GetPkiPath()` so it cannot drift again.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — Updated the watermark text in `Views/MainWindow.axaml` from `(default: AppData/LmxOpcUaClient/pki)` to `(default: AppData/OtOpcUaClient/pki)` so it matches the canonical folder name resolved by `ClientStoragePaths` (the binding-to-helper alternative was considered but a static string keeps the watermark cheap; the path is also already documented in `docs/Client.UI.md`).
+71 -4
View File
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 0 |
@@ -14,6 +14,8 @@
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
### 2026-05-22 review (commit `76d35d1`)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core.ScriptedAlarms-002 |
@@ -27,6 +29,26 @@ a category produced nothing rather than leaving it blank.
| 9 | Testing coverage | Core.ScriptedAlarms-012 |
| 10 | Documentation & comments | Core.ScriptedAlarms-003 |
### 2026-05-23 re-review (commit `a9be809`)
Focused re-review of the Core.ScriptedAlarms-009 resolution (commit `0001cdd`) —
new `AlarmScratch` class, `_scratchByAlarmId` ConcurrentDictionary, `RefillReadCache`
helper, and internal test accessors. Only the changed/new code since `76d35d1` was
re-examined; existing closed findings stay as audit trail.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | Core.ScriptedAlarms-013 |
## Findings
### Core.ScriptedAlarms-001
@@ -156,13 +178,29 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Performance & resource management |
| Location | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
| Status | Won't Fix |
| Status | Resolved |
**Description:** `BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` on every predicate evaluation, i.e. on every upstream tag change for every referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. `AlarmPredicateContext` is also newly constructed each evaluation (line 281).
**Recommendation:** Minor. If the evaluation path shows up in allocation profiling, the read cache could be a reused per-alarm buffer cleared between evaluations (evaluations are already serialised under `_evalGate`, so a single shared scratch dictionary is safe). Not worth doing speculatively — flag for the perf surface in `docs/v2/Galaxy.Performance.md` if alarm evaluation is ever soak-tested.
**Resolution:** Won't Fix 2026-05-23 — per the recommendation, no code change. Documented the known allocation characteristic in `docs/v2/Galaxy.Performance.md` (new "Scripted-alarm engine — known hot-path allocations" section) so a future soak that surfaces pressure has a noted mitigation (reused per-alarm scratch buffer) and we don't re-find this in a later review.
**Resolution:** Resolved 2026-05-23 — added a per-alarm reusable `AlarmScratch`
(read-cache `Dictionary` + `AlarmPredicateContext`) held in
`_scratchByAlarmId`, populated lazily on first evaluation and refilled in place
by the new `RefillReadCache(Dictionary, IReadOnlySet)` helper on every
subsequent re-eval. `BuildReadCache` is gone. The reuse is safe because every
evaluation runs under `_evalGate`; the context wraps the dictionary by
reference, so the predicate's `ctx.GetTag(path)` sees the freshly-refilled
values. `LoadAsync` clears `_scratchByAlarmId` alongside `_alarms` so a
config-publish drops the prior generation's scratch (Inputs / Logger may
change). Regression tests added in `ScriptedAlarmEngineTests`:
`Reevaluation_reuses_the_same_read_cache_dictionary`,
`Reevaluation_reuses_the_same_predicate_context`, and
`LoadAsync_drops_the_prior_generations_scratch`; internal test hooks
`TryGetScratchReadCacheForTest` / `TryGetScratchContextForTest` exposed via
the existing `InternalsVisibleTo`. `docs/v2/Galaxy.Performance.md` "Scripted-alarm
engine" section rewritten to document the new reuse contract. Suite now 66
green (was 63).
### Core.ScriptedAlarms-010
@@ -208,3 +246,32 @@ a category produced nothing rather than leaving it blank.
**Recommendation:** Add engine-level tests that inject a controllable `Func<DateTime>` clock to drive `RunShelvingCheck`, cover the remaining Part 9 engine methods end-to-end, assert subscriber-exception isolation, and add a store-failure fake to lock in the chosen persistence-failure semantics from finding 007.
**Resolution:** Resolved 2026-05-22 — added 8 new engine-level tests covering all 6 gap areas: injectable-clock timed-shelve expiry via `RunShelvingCheckForTest`, `ConfirmAsync`/`TimedShelveAsync`/`UnshelveAsync`/`EnableAsync` end-to-end, subscriber-exception isolation, store-failure invariant, second-`LoadAsync` timer-leak regression, and `AreInputsReady` Bad/Uncertain guard; exposed `RunShelvingCheckForTest()` internal hook on the engine.
### Core.ScriptedAlarms-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `ScriptedAlarmEngine.cs:66-81` |
| Status | Resolved |
**Description:** The new internal test accessors `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` (introduced by the Core.ScriptedAlarms-009 resolution at `0001cdd`) return the *live* per-alarm scratch — the same `Dictionary<string, DataValueSnapshot>` instance the engine clears and refills in `RefillReadCache` under `_evalGate`, plus the `AlarmPredicateContext` that wraps it by reference. The XML docs describe the intended use ("assert the scratch is reused across evaluations (two reads return the same instance)") but do not explicitly warn that:
1. The returned `IReadOnlyDictionary` is the engine's mutable working set. Enumerating it from a test thread while the engine is mid-evaluation (e.g. during a `ReevaluateAsync` queued by `OnUpstreamChange`, or a `ShelvingCheckAsync` callback) is a concurrent-read-while-writer scenario against a plain `Dictionary` — undefined behaviour, can throw `InvalidOperationException` or return torn data.
2. Reference-equality comparisons (`ReferenceEquals(a, b)`) and single-key indexer reads (`dict["Temp"]`) on a quiesced engine are the only safe uses. The existing regression tests stay within those bounds, but a future test author has no in-code signal that broader reads are unsafe.
The engine itself is correct — `RefillReadCache` runs only under `_evalGate`, so the engine never tears its own state. The risk is purely on the test-side contract.
**Recommendation:** Add a `<remarks>` block to both `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` stating that the returned references point at live engine state, that reads are only safe when the engine is known to be idle (no in-flight `ReevaluateAsync`/`ShelvingCheckAsync`/`LoadAsync`), and that the intended uses are reference-identity assertions plus single-key lookups against a quiesced engine — never enumeration. No code change required; the engine's correctness depends on `_evalGate`, which is already documented.
**Resolution:** Resolved 2026-05-23 — applied the recommendation verbatim.
Added a `<remarks>` block to each of `TryGetScratchReadCacheForTest` and
`TryGetScratchContextForTest` documenting the synchronization contract:
the returned references point at live engine state refilled in place under
`_evalGate`, enumeration during an in-flight evaluation is a
concurrent-read-while-writer scenario against a plain `Dictionary`
(undefined behaviour), and the only safe uses are reference-identity
comparisons + single-key reads against a quiesced engine. No code change
required — the engine's correctness was always there; only the test-side
contract was undocumented.
+434 -16
View File
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 0 |
@@ -14,18 +14,23 @@
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Core.Scripting-006 |
| 4 | Error handling & resilience | Core.Scripting-007 |
| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 |
| 6 | Performance & resource management | Core.Scripting-008 |
| 7 | Design-document adherence | Core.Scripting-009 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 |
| 10 | Documentation & comments | No issues found |
The 2026-05-23 re-review only covers code touched between commits `76d35d1` and
`a9be809` (primarily the Core.Scripting-008 ALC rewrite + the broadened BCL
references). Categories where the new code surface produced no issues are
recorded as "No new issues" for that pass.
| # | Category | Result (76d35d1) | Result (a9be809, new code only) |
|---|---|---|---|
| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 | Core.Scripting-015 |
| 2 | OtOpcUa conventions | No issues found | No new issues |
| 3 | Concurrency & thread safety | Core.Scripting-006 | Core.Scripting-014 |
| 4 | Error handling & resilience | Core.Scripting-007 | No new issues |
| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 | Core.Scripting-012, Core.Scripting-013 |
| 6 | Performance & resource management | Core.Scripting-008 | Core.Scripting-016 |
| 7 | Design-document adherence | Core.Scripting-009 | No new issues |
| 8 | Code organization & conventions | No issues found | No new issues |
| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 | No new issues |
| 10 | Documentation & comments | No issues found | No new issues |
## Findings
@@ -240,7 +245,7 @@ race ordering.
| Severity | Low |
| Category | Performance & resource management |
| Location | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
| Status | Won't Fix |
| Status | Resolved |
**Description:** `CompiledScriptCache` has no capacity bound (acknowledged in the class
remarks) and no eviction. Each cached `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>`
@@ -257,7 +262,33 @@ compile scripts into a collectible `AssemblyLoadContext` so `Clear()` can unload
generations. At minimum add a note to `docs/ScriptedAlarms.md` so operators with
high-publish-frequency deployments are aware.
**Resolution:** Resolved 2026-05-23 — accepted as a documented known limitation rather than fixing in code (collectible `AssemblyLoadContext` for Roslyn-emitted assemblies is a v3 concern). The "Compile cache" section of `docs/VirtualTags.md` now carries a "Per-publish assembly accretion (accepted limitation, Core.Scripting-008)" note that operators with high-publish-frequency deployments can scan, and `docs/ScriptedAlarms.md` cross-references it. The accretion is benign at the expected "low thousands" of scripts scale; recommended mitigation is a scheduled server restart for deployments that publish very frequently.
**Resolution:** Resolved 2026-05-23 — switched the compile pipeline off the legacy
`CSharpScript.CreateDelegate` path (which emits into the default, non-collectible
`AssemblyLoadContext`) and onto a hand-rolled `CSharpCompilation`
`Compilation.Emit(MemoryStream)``ScriptAssemblyLoadContext.LoadFromStream` chain,
with the new `ScriptAssemblyLoadContext` constructed `isCollectible: true`. Each
compiled script lives in its own ALC; `ScriptEvaluator` now implements `IDisposable`
and calls `AssemblyLoadContext.Unload()` on dispose. `CompiledScriptCache.Clear()`
disposes every materialised evaluator before dropping its dictionary entry, and
`CompiledScriptCache` itself is now `IDisposable` for graceful server shutdown.
After a publish-replace cycle the prior generation's emitted assemblies become
eligible for GC; the reclaim is GC-timing-sensitive (Unload is
*eligible-for-collection*, not synchronous) and the next collection cycle reclaims
them. The references list is now BCL-wide (System.* + netstandard + Microsoft.Win32.Registry
via the TRUSTED_PLATFORM_ASSEMBLIES set) so forbidden BCL types resolve at compile and
`ForbiddenTypeAnalyzer` is the sole security gate (consistent with the
Core.Scripting-001 / -002 model). `docs/VirtualTags.md` "Compile cache" section rewritten;
`docs/ScriptedAlarms.md` cross-reference updated to drop the obsolete restart guidance.
Regression tests added in `CompiledScriptCacheTests`:
`Dispose_unloads_compiled_script_assembly_load_context`,
`Clear_disposes_every_materialised_evaluator`, and
`GetOrCompile_after_Dispose_throws_ObjectDisposedException`; the first two
prove ALC unload via `WeakReference` + bounded `GC.Collect()` loops. Suite now 104
green (was 101). Authoring convention: the synthesized wrapper is an ordinary
C# static method, so scripts must end with explicit `return …;` per ordinary C# rules
(the legacy `CSharpScript` "last expression yields result" shorthand no longer applies);
every script in the existing corpus already uses explicit `return` so this is a doc-only
change for new authors.
### Core.Scripting-009
@@ -336,3 +367,390 @@ a script logging at Error level produces both a `scripts-*.log` event and a comp
Warning event.
**Resolution:** Resolved 2026-05-23 — added three new test files: `ScriptSandboxBuildTests` covers the `Build` null / non-`ScriptContext` / base-class / concrete-subclass paths; `ScriptContextTests` locks `Deadband` boundary semantics (equal-to-tolerance returns false; just-over returns true; symmetric in direction; zero-tolerance returns true only on non-equal; negative tolerance trips on any non-equal); the new `Factory_plus_companion_sink_integration_surfaces_script_error_in_both_logs` test in `ScriptLogCompanionSinkTests` wires `ScriptLoggerFactory` + the companion sink together end-to-end and asserts an Error emission lands in both the scripts sink (at Error) and the main sink (at Warning), each tagged with `ScriptName`. Suite now 101 green (was 85 before).
### Core.Scripting-012
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` |
| Status | Resolved |
**Description:** The Core.Scripting-008 rewrite broadened the BCL references list
from a narrow allow-list (`System.Private.CoreLib` + `System.Linq` only) to the
full `TRUSTED_PLATFORM_ASSEMBLIES` set filtered to `System.*` + `netstandard` +
`Microsoft.Win32.Registry`. This change correctly delegates the security gate to
`ForbiddenTypeAnalyzer` (the new comment in `ScriptSandbox` calls this out
explicitly), but the analyzer's deny-list has not been expanded to match the new
attack surface, and three categories of dangerous BCL types in the `System.*`
allow-listed assemblies are now reachable from script source:
1. **`System.Threading.ThreadPool`** (in namespace `System.Threading`). The
Core.Scripting-003 fix added `System.Threading.Tasks` to deny `Task.Run` /
`Parallel` fan-out because background work that outlives the per-evaluation
timeout is the explicit threat. `ThreadPool.QueueUserWorkItem`,
`ThreadPool.UnsafeQueueUserWorkItem`, and `ThreadPool.RegisterWaitForSingleObject`
are exactly the same threat — they schedule background work that outlives the
`WaitAsync(Timeout)` budget and tie up worker threads — but `System.Threading`
itself is allowed (because `CancellationToken` / `SemaphoreSlim` / `Volatile`
live there). The Core.Scripting-003 resolution is incomplete on the new
reference surface.
2. **`System.Threading.Timer`** (same namespace). Schedules a background
callback; the script returns control to the engine but the timer keeps
firing past the evaluation budget. Same threat as `Task.Run`.
3. **`System.Runtime.Loader.AssemblyLoadContext`** (in namespace
`System.Runtime.Loader`, which is not denied — only `System.Runtime.InteropServices`
is). The constructor + `LoadFromAssemblyPath` / `LoadFromStream` /
`LoadFromAssemblyName` let a script load an arbitrary DLL into the host
process. Pass (1) of the analyzer resolves the receiver type
(`AssemblyLoadContext`, allowed) + the invocation symbol's containing type
(also `AssemblyLoadContext`, allowed) and lets the call through. Pass (2)
only inspects `TypeSyntax` nodes — if the script discards the returned
`Assembly` (e.g. `alc.LoadFromAssemblyPath(@"C:\evil.dll");`) there is no
`TypeSyntax` for the analyzer to walk and the call is accepted. Triggering
execution of the loaded code from inside the sandbox is hard (most of
`Assembly`'s surface is in `System.Reflection`, which is denied) but the
defense-in-depth gap is real: an attacker who can author a script also
typically controls a file path on the server (Admin UI uploads, share
mounts) and loading an assembly is the prerequisite to every chained
escape — module initializers, type-resolve handlers, and a future analyzer
slip would all become exploitable.
In addition, two lower-impact `System.*` types are reachable that arguably
shouldn't be: **`System.Console.SetOut`** / **`Console.SetError`** could
redirect the host's console streams (requires constructing a
`System.IO.TextWriter`, which is blocked, so the practical exploit is
`Console.WriteLine` log-spam only), and **`System.Globalization.CultureInfo.DefaultThreadCurrentCulture`**
could perturb the entire process's formatting behavior (subtle but real cross-script
side effect).
The original Core.Scripting-001 finding called out the model: when an allow-listed
namespace contains dangerous types, those types must be denied type-granularly.
The new reference surface introduces several more such types and the deny-list
has not been kept in sync.
**Recommendation:** Add `System.Threading.ThreadPool` and `System.Threading.Timer`
to `ForbiddenFullTypeNames`. Add `System.Runtime.Loader` as a namespace prefix
to `ForbiddenNamespacePrefixes` (every type in `System.Runtime.Loader`
`AssemblyLoadContext`, `AssemblyDependencyResolver`, `AssemblyLoadEventArgs` — is
out of script scope). Consider adding `System.Console` to `ForbiddenFullTypeNames`
to stop log-spam through the host's console streams, and at minimum document
`CultureInfo.DefaultThreadCurrentCulture` as an accepted cross-script side
effect. Each addition must have a regression test in `ScriptSandboxTests`
mirroring the Core.Scripting-010 vector style. Update
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 + the
"Sandbox escape" compliance-check row to enumerate the additions, per the
Core.Scripting-009 doc-sync convention.
**Resolution:** Resolved 2026-05-23 — added `System.Runtime.Loader` to
`ForbiddenNamespacePrefixes` (the namespace-prefix form preferred over
type-granular per the recommendation; future BCL additions to that namespace
are denied by default). Added `System.Threading.ThreadPool` and
`System.Threading.Timer` to `ForbiddenFullTypeNames` — both live in
`System.Threading` shared with allowed sync primitives so they must be
type-granular. Regression tests added to `ScriptSandboxTests`:
`Rejects_ThreadPool_QueueUserWorkItem_at_compile`,
`Rejects_Timer_new_at_compile`, `Rejects_AssemblyLoadContext_at_compile`.
`docs/v2/implementation/phase-7-scripting-and-alarming.md` decision #6 +
the Sandbox-escape compliance-check row both updated per the
Core.Scripting-009 doc-sync convention. The two lower-impact suggestions
from the recommendation (`System.Console`, `CultureInfo.DefaultThreadCurrentCulture`)
were intentionally not addressed: `Console.SetOut` requires constructing
a `System.IO.TextWriter` which is already blocked, leaving only
`Console.WriteLine` log-spam (annoyance, not a security threat); and
`CultureInfo.DefaultThreadCurrentCulture` is a cross-script side-effect
worth knowing about but doesn't escape the sandbox. Recording both as
accepted minor risks. Test totals after fix: Core.Scripting 107 green
(was 104 — +3 new rejection tests).
### Core.Scripting-013
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) |
| Status | Resolved |
**Description:** The synthesized wrapper pastes the user's source verbatim
between `{` and `}` braces inside a static method body, with a `#line 1`
directive and no escaping. The legacy `CSharpScript.CreateDelegate` path was
robust to this because Roslyn's scripting compiler parses script source as a
top-level statement sequence; the new hand-rolled path is parsing ordinary C# in
a method body, so a script that injects matching `{` / `}` braces can extend the
synthesized compilation unit with additional methods, classes, or `#line`
directives. For example, a script body of
`return 0; } public static int Evil() { return 0; }} public static class CompiledScript2 { public static void M() {`
ends the `Run` method early, declares a sibling `Evil` method (and even a
sibling `CompiledScript2` class) inside the synthesized namespace, then opens an
unclosed method that consumes the wrapper's trailing `}\n}`. With matching brace
counts the script parses cleanly and compiles.
`ForbiddenTypeAnalyzer` walks every descendant of every syntax tree, so any
forbidden BCL types named inside the injected methods are still caught — the
finding is **not** a direct sandbox escape. However:
- It silently relaxes the operator-visible authoring contract documented in
`docs/VirtualTags.md` ("scripts are statement bodies that end with an
explicit `return …;`") to "scripts can be any compilable C# inside the
`CompiledScript` namespace" — operators have access to features the design
did not intend to expose (local types defined as siblings of `Run`, custom
module initializers via attributes, etc.).
- A script can embed its own `#line` directives that override the
`#line 1` we emit just above the user source, producing misleading error
locations in compiler diagnostics surfaced to the operator.
- Future hardening that relies on syntactic-shape assumptions (e.g.
"every script has exactly one method") would silently fail.
- It widens the analyzer's surface: the analyzer's correctness now depends on
Pass (2) correctly walking every conceivable C# construct that can name a
type, including ones a normal script body would never contain
(`UnmanagedCallersOnly` attribute, function pointer types `delegate*<...>`,
pattern types, switch arm types, …).
**Recommendation:** Either (a) reject scripts whose parsed body contains
declarations other than statements — walk the wrapper's syntax tree after parse
and require that the only members of `CompiledScript` are the single `Run`
method, raising a `CompilationErrorException` if anything else appears — or
(b) parse the user source independently as a `BlockSyntax` and inject the
parsed block as the method body via the Roslyn syntax API, which makes
brace-mismatched / class-injecting source unparseable. Add a regression test
covering at least the brace-injection vector
(`return 0; } public static int Evil() { return 0;`).
**Resolution:** Resolved 2026-05-23 — took option (a) from the recommendation:
added an `EnforceSingleRunMember` step to `ScriptEvaluator.Compile` (runs after
`CSharpSyntaxTree.ParseText` of the synthesized wrapper, before Roslyn
compile). The check requires exactly one type declaration in the compilation
unit (the `CompiledScript` class) AND exactly one member on that class (the
`Run` method). Any deviation — a sibling class, an additional namespace, a
sibling method or nested type alongside `Run` — throws
`CompilationErrorException` with diagnostic IDs `LMX001` / `LMX002` and a
message that names Core.Scripting-013 and points at the offending span. Two
regression tests added: `Rejects_sibling_method_injection_via_balanced_braces`
(injects a sibling method via `} public static int Evil() { …`) and
`Rejects_sibling_class_injection_via_balanced_braces` (injects an entire
sibling namespace + class). Option (b) (parse the user source independently
as a `BlockSyntax` and inject via Roslyn syntax API) was considered but the
parse-and-validate approach is more readable, gives clearer error messages,
and keeps the wrapper-source generation textual.
### Core.Scripting-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `CompiledScriptCache.cs:91-103` (`Clear`) |
| Status | Resolved |
**Description:** `Clear()` snapshots `_cache.Keys.ToArray()` then iterates,
calling `TryRemove(key, out var lazy)` on each — the key-only overload, not
the value-scoped one used in `GetOrCompile`'s catch block. Between the
snapshot and a given `TryRemove`, a concurrent `GetOrCompile(scriptSource)`
call that hashes to the same key can re-insert a fresh `Lazy` whose `.Value`
the caller already retained. The unconditional `TryRemove` then removes that
fresh `Lazy` and `DisposeLazyIfMaterialised(lazy)` calls `Dispose()` on its
evaluator — unloading the ALC while the concurrent caller still holds a
reference to the evaluator and intends to invoke it.
This is exactly the race-window pattern the Core.Scripting-006 resolution
fixed in `GetOrCompile`'s catch block (the test
`Failed_compile_eviction_does_not_remove_a_concurrent_retry_entry` locks it
there). `Clear()` carries the same shape but uses the older, value-blind
overload, so the same race that finding-006 addresses is still latent on the
publish-replace path.
In current production wiring `Clear()` is intended for config-publish + tests
— neither overlaps steady-state evaluation under the documented design — so
the in-practice impact is low. But the cache is checked in as the
forward-looking compile cache for the engines (per `Script.SourceHash`'s docs
and the cache's own remarks); a future wiring that calls `Clear()` from
publish while evaluations are in flight would dispose live evaluators.
**Recommendation:** Replace the snapshot + `TryRemove(key, out var lazy)`
sequence with an enumeration that captures the `Lazy` reference at snapshot
time and uses the value-scoped `TryRemove(KeyValuePair<,>)` overload, mirroring
the Core.Scripting-006 fix:
```csharp
foreach (var entry in _cache.ToArray())
{
if (_cache.TryRemove(entry))
DisposeLazyIfMaterialised(entry.Value);
}
```
Add a regression test that races `GetOrCompile` against `Clear` and asserts
the caller's evaluator is still usable.
**Resolution:** Resolved 2026-05-23 — applied the recommendation verbatim:
replaced `foreach (var key in _cache.Keys.ToArray())` + key-only
`TryRemove(key, out var lazy)` with `foreach (var entry in _cache.ToArray())` +
value-scoped `TryRemove(entry)` (the `KeyValuePair<,>` overload). A concurrent
GetOrCompile re-add between the snapshot and the remove inserts a fresh Lazy
under the same key; the value-scoped comparison sees the mismatch and leaves
the fresh entry intact (instead of evicting + disposing the live evaluator
the concurrent caller still holds). Regression test
`Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives` added
to `CompiledScriptCacheTests` — single-threaded simulation that snapshots
the dict, mutates the entry to a fresh Lazy mid-flight, drives the same
value-scoped TryRemove overload Clear now uses, and asserts the fresh entry
survives. The two-thread race would be flaky to model directly; the
single-threaded semantic test is sufficient because the fix is the
overload-selection itself.
### Core.Scripting-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) |
| Status | Resolved |
**Description:** `ToCSharpTypeName` is documented to handle nested types
(`Outer+Inner``Outer.Inner`) via `Replace('+', '.')` for the
non-generic path (line 269) but the generic path (line 263-266) constructs the
name from `def.FullName!` then takes a substring up to the backtick. For a
**nested generic** type — e.g. `Outer.Inner<T>` whose `FullName` is
`Outer+Inner`1` — `Replace('+', '.')` is applied first, then `Substring(0, IndexOf('`'))`
on `"Outer.Inner`1"` produces `"Outer.Inner"`, which is correct. Good.
However, the generic branch does NOT handle the case where the OPEN generic
type itself is nested with `+` inside the parent's name when the parent is
also generic (`Outer<TOuter>.Inner<TInner>` — `FullName` is
`Outer`1+Inner`1[[TOuter,TInner]]`). For that shape `Substring(0, IndexOf('`'))`
truncates at the first backtick — yielding `"Outer.Inner"` — silently dropping
the closed type arguments of `Outer<TOuter>`. The resulting source string is
syntactically valid but semantically wrong: `global::Outer.Inner<TInner>` does
not name `Outer<TOuter>.Inner<TInner>`.
The production code never hits this shape — `TResult` is always one of
`object?`, `bool`, `int`, `double`, `string?`, `DateTime` across the
virtual-tag engine, the alarm engine, the test-harness, and the test suite,
and `ScriptGlobals<TContext>` is always a top-level generic over a top-level
`ScriptContext` subclass. The bug is latent. But it is a foot-gun for a
future caller (e.g. a Phase-8 driver that wires a context type defined as a
nested generic for grouping reasons) and the XML-doc comment claims
"handles nested types" without qualifying it.
A second smaller correctness gap on the same path: the comment claims
`global::`-qualified FQNs prevent accidental capture by the wrapper's `using`
directives, which is true for the generic / non-generic branches, but the
primitive aliases (`bool`, `int`, `string`, `object`, …) are emitted unqualified.
A script that defines a local `class bool` (now possible per Core.Scripting-013)
would shadow the alias. Probably benign, but worth a comment.
**Recommendation:** Add a check in the generic branch that walks the FullName
backtick-by-backtick — or use `INamedTypeSymbol`-style name composition from
`def.DeclaringType` recursively — so multi-arity-nested generics emit
correctly. At minimum update the XML doc to qualify "handles nested types" as
"handles single-level nesting; nested generics whose parent is itself generic
are not supported". Add a `ToCSharpTypeName` unit test (currently nothing
exercises this method directly — coverage relies on the end-to-end compile path,
so the bug surfaces only as a misleading Roslyn diagnostic).
**Resolution:** Resolved 2026-05-23 — rewrote the generic-type branch of
`ToCSharpTypeName` to walk the `FullName` segment-by-segment (split on `.`
after `+ → .` substitution). For each segment ending in `Name\`N`, the
algorithm consumes N generic arguments from `t.GetGenericArguments()` in
order and emits them as `<…>` on that segment. Nested generic-in-generic
shapes (`Outer<T>.Inner<U>`) now emit as
`global::Ns.Outer<T>.Inner<U>` (valid C#) rather than the pre-fix
`global::Ns.Outer<T, U>` (which dropped the segment boundary entirely
because `IndexOf('`')` truncated at the first backtick). No production
caller exercises this shape today (all `TContext` / `TResult` types in
the codebase are top-level non-nested), so the fix is preemptive — but
the algorithm is now correct for any future nested-generic context type.
### Core.Scripting-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` |
| Status | Resolved |
**Description:** The Core.Scripting-008 resolution introduced
`ScriptEvaluator.IDisposable` + `CompiledScriptCache.Clear()` that disposes
each materialised evaluator before dropping its dictionary entry, so per-publish
ALC accretion is no longer process-lifetime rooted **inside the cache**. But
neither production consumer of `ScriptEvaluator` uses the cache — both
`VirtualTagEngine.Load` and `ScriptedAlarmEngine.LoadAsync` call
`ScriptEvaluator<TContext, TResult>.Compile(...)` directly (lines 105 / 160
respectively), store the evaluator inside an internal `VirtualTagState` /
`AlarmState` record, and on the next `Load` simply call `_tags.Clear()` /
`_alarms.Clear()`. The dropped `ScriptEvaluator` references never have
`Dispose()` called on them, so the underlying `ScriptAssemblyLoadContext`
instances are never `Unload()`-ed. The .NET runtime guarantees that a
collectible ALC stays alive until `Unload()` is called explicitly — having
"no strong references" is necessary but not sufficient. So the publish-replace
cycle leaks every prior generation's emitted assembly exactly as before the
fix, even though the fix's infrastructure is in place.
The Core.Scripting-008 regression tests in `CompiledScriptCacheTests`
(`Dispose_unloads_compiled_script_assembly_load_context` /
`Clear_disposes_every_materialised_evaluator`) prove the contract on
`CompiledScriptCache`, but neither engine uses that class. There is no
integration test exercising the actual publish path — i.e. that calling
`VirtualTagEngine.Load(...)` twice with different definitions makes the prior
generation's ALC eligible for GC. As a result the fix's headline guarantee
("Server restarts are no longer required to reclaim compiled-script memory" —
`docs/VirtualTags.md`) is not actually delivered to the production engines.
This is the same observable behavior the original Core.Scripting-008 finding
described, surfacing on a different code path that the resolution did not touch.
**Recommendation:** Either route the engines' compile path through
`CompiledScriptCache<TContext, TResult>` (the documented design — the cache
already returns the same evaluator instance for identical source, and its
`Clear()` now performs the right disposal — and `Script.SourceHash`'s doc-comment
already names this as the cache key), or make the engines' `Load` methods
dispose the previous `ScriptEvaluator` instances before reassigning. The
former is the cleaner change because it also collapses redundant compiles
across publishes for unchanged scripts. Add an integration test along the
lines of `CompiledScriptCacheTests.Clear_disposes_every_materialised_evaluator`
for each engine: snapshot the per-evaluator emitted assembly via
`WeakReference`, call `Load(...)` with a different definition set, and assert
the prior generation's assemblies become collectable.
**Resolution:** Resolved 2026-05-23 — took the cleaner route from the
recommendation: routed both engines' compile paths through
`CompiledScriptCache<TContext, TResult>`. `VirtualTagEngine` and
`ScriptedAlarmEngine` each gained a private `_compileCache` instance field,
their `Load`/`LoadAsync` methods now call `_compileCache.GetOrCompile(source)`
instead of `ScriptEvaluator.Compile(source)` directly, and the cache is cleared
on publish-replace alongside the existing `_tags` / `_alarms` clears so the
prior generation's ALCs are disposed before recompile. Engine `Dispose` now
also calls `_compileCache.Dispose()` so the engine-shutdown path actually
releases the emitted assemblies. **Side-fix:** discovered + fixed an
adjacent bug in `CompiledScriptCache.Dispose()` itself — it set
`_disposed = true` before calling `Clear()`, but `Clear()`'s pre-existing
`if (_disposed) return` guard then aborted the drain unconditionally, so
the Dispose-triggered cleanup was a silent no-op. Removed the disposed-guard
on `Clear()` (the operation is idempotent — clearing an empty/cleared cache
is safe). Without this side-fix the engine-Dispose path would have left
the cached evaluators rooted forever even though the call chain looked
correct. **Side-fix for ScriptedAlarmEngine.Dispose:** moved the pre-existing
"do NOT clear `_alarms` here" comment to "clear `_alarms` AFTER the drain"
because the AlarmState records hold the `TimedScriptEvaluator`/`ScriptEvaluator`
delegates that root the emitted assembly — leaving them in `_alarms` after
Dispose was the same root-the-script-forever pattern this finding is about,
just on the engine side rather than the cache side. The `_alarms` clear is
safe after the `Task.WhenAll` drain because that drain guarantees no
background callback is mid-flight. Regression tests added:
`VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly` and
`ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly`
each uses `WeakReference` + bounded `GC.Collect()` to prove the emitted
assembly is reclaimable after `engine.Dispose()`. **Important test pattern
detail:** the alarms test originally failed because its helper was
`async Task<WeakReference>` — async state machines capture locals as
state-struct fields and can keep them alive past the method's apparent end.
Rewrote as a synchronous helper using `LoadAsync(...).GetAwaiter().GetResult()`
inside two cooperating `[MethodImpl(MethodImplOptions.NoInlining)]` helpers
(`CompileAlarmAndCaptureWeak` + `ExtractEmittedAssemblyWeakRef`) so the
intermediate reflection locals die when each helper returns. Test totals
after fix: Core.Scripting 104 green (unchanged); VirtualTags 57 green (was
56 — +1 unload test); ScriptedAlarms 67 green (was 66 — +1 unload test).
+44 -14
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -96,7 +96,7 @@ into `AbCipCommandBase`.
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:50-56,60-61` |
| Status | Open |
| Status | Resolved |
**Description:** The `OnDataChange` handler writes change lines to `console.Output`
(a `TextWriter`) from the driver's poll-engine callback thread, while the command's
@@ -112,7 +112,12 @@ during the watch loop widens it.
writes during the subscription with a shared lock so poll-thread and main-thread
output cannot interleave.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — moved the "Subscribed to ... Ctrl+C to stop."
banner write to BEFORE `driver.OnDataChange +=` (and therefore before
`SubscribeAsync`). With the handler not yet attached when the banner runs, the
poll thread cannot fire change events into `console.Output` concurrently with the
main-thread banner write. After `+=` the only writer to `console.Output` is the
poll-thread handler, so no interleaving is possible.
### Driver.AbCip.Cli-004
@@ -121,7 +126,7 @@ output cannot interleave.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:28,58`; `AbCipCommandBase.cs:26-34` |
| Status | Open |
| Status | Resolved |
**Description:** `--interval-ms` (`IntervalMs`) is taken verbatim and passed as
`TimeSpan.FromMilliseconds(IntervalMs)` to `SubscribeAsync` with no validation. A
@@ -135,7 +140,16 @@ downstream component to sanitise operator input is fragile. `--timeout-ms` on
`ExecuteAsync` / in `AbCipCommandBase`, throwing a `CommandException` with the
accepted range when out of bounds.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `SubscribeCommand.ValidateInterval(int)`
(throws `CommandException` for `IntervalMs <= 0`) called at the top of
`SubscribeCommand.ExecuteAsync`; moved `TimeoutMs > 0` validation into the
`AbCipCommandBase.Timeout` getter (throws `CommandException` for non-positive
`TimeoutMs`) and added a `_ = Timeout` touch in `SubscribeCommand.ExecuteAsync` to
fire that guard before the driver opens. Other commands trip the same guard
naturally via `BuildOptions(...).Timeout`. Regression tests
`AbCipCommandBaseTests.Timeout_get_throws_CommandException_when_TimeoutMs_is_non_positive`
and `SubscribeCommandIntervalTests.ValidateInterval_rejects_non_positive` cover
both paths.
### Driver.AbCip.Cli-005
@@ -144,7 +158,7 @@ accepted range when out of bounds.
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
| Status | Open |
| Status | Resolved |
**Description:** `ConfigureLogging` assigns a freshly created Serilog logger to the
process-global `Log.Logger` but never calls `Log.CloseAndFlush()`. For a short-lived
@@ -159,7 +173,12 @@ module's review.)
`AppDomain.ProcessExit` or a `finally` in the command), or have the CLI use a
disposable logger scoped to `ExecuteAsync`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `Driver.Cli.Common` already exposes
`DriverCommandBase.FlushLogging()` (a `Log.CloseAndFlush()` wrapper); the AB CIP
CLI was not calling it. Added `FlushLogging()` in the `finally` block of all four
commands (`ProbeCommand`, `ReadCommand`, `WriteCommand`, `SubscribeCommand`) so
buffered Serilog output is flushed before the process exits, including the
Ctrl+C-driven `subscribe` path. No edits to `Driver.Cli.Common` were needed.
### Driver.AbCip.Cli-006
@@ -168,7 +187,7 @@ disposable logger scoped to `ExecuteAsync`.
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/AbCipCommandBase.cs:29-34` |
| Status | Open |
| Status | Resolved |
**Description:** `AbCipCommandBase` overrides the abstract `DriverCommandBase.Timeout`
property with a getter derived from `TimeoutMs` and an empty `init` body
@@ -183,9 +202,13 @@ bug.
**Recommendation:** Either drop the `init` accessor entirely (make the override a
get-only expression-bodied property) or have the empty `init` throw
`NotSupportedException` to make the "driven by TimeoutMs" contract explicit and
fail-fast.
fail-fast. (Drop is not viable because the abstract base declares `{ get; init; }`
and an override must provide both accessors.)
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the `init` accessor now throws
`NotSupportedException` with a message pointing the caller at `TimeoutMs`. A new
test `AbCipCommandBaseTests.Timeout_setter_is_inert_and_does_not_silently_swallow_assignments`
asserts that an object-initializer assignment to `Timeout` fails fast.
### Driver.AbCip.Cli-007
@@ -194,7 +217,7 @@ fail-fast.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli.Tests/WriteCommandParseValueTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The only test file covers `WriteCommand.ParseValue` and
`ReadCommand.SynthesiseTagName` — both pure static helpers. There is no coverage for
@@ -213,7 +236,12 @@ driver CLIs.
(`HostAddress`, `PlcFamily`, `DeviceName`), the supplied tag list, and the `Timeout`
derived from `TimeoutMs`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added
`tests/.../AbCipCommandBaseTests.cs` covering `BuildOptions` (probe disabled,
controller-browse disabled, alarm-projection disabled, single device with
`HostAddress` / `PlcFamily` / `cli-{Family}` device name, tag list passed verbatim,
`Timeout` derived from `TimeoutMs`) and `DriverInstanceId` (`abcip-cli-{Gateway}`),
plus the `RejectStructure` guard (throws for `Structure`, no-op for atomic types).
### Driver.AbCip.Cli-008
@@ -222,7 +250,7 @@ derived from `TimeoutMs`.
| Severity | Low |
| Category | Documentation & comments |
| Location | `docs/Driver.AbCip.Cli.md:8-9` |
| Status | Open |
| Status | Resolved |
**Description:** `docs/Driver.AbCip.Cli.md` opens with "Second of four driver
test-client CLIs (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT)." The count "four"
@@ -235,4 +263,6 @@ doc's "four" and the truncated chain are both stale.
and complete the chain (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT -> FOCAS), or
drop the explicit count and link `docs/DriverClis.md` as the authoritative roster.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — rewrote the lead paragraph to "Second of six
driver test-client CLIs (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT -> FOCAS)"
and added a link to `docs/DriverClis.md` as the authoritative roster.
+43 -13
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -68,7 +68,7 @@ type (mirroring the existing `Bit` and unsupported-type branches). Either catch
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Commands/WriteCommand.cs:27-29`, `Program.cs:6-9` |
| Status | Open |
| Status | Resolved |
**Description:** The `--value` option help text states "booleans accept
true/false/1/0", but `ParseBool` (`WriteCommand.cs:74-80`) and the error message
@@ -82,7 +82,10 @@ with both the code and the design doc.
matching the wording used elsewhere (e.g. "booleans accept
true/false, 1/0, on/off, yes/no").
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — updated `WriteCommand.Value` description to
list the full alias set: "booleans accept true/false, 1/0, on/off, yes/no".
Regression test `CommandMetadataTests.WriteCommand_value_help_lists_full_boolean_alias_set`
asserts the description contains every alias group.
### Driver.AbLegacy.Cli-003
@@ -91,7 +94,7 @@ true/false, 1/0, on/off, yes/no").
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:47-53` |
| Status | Open |
| Status | Resolved |
**Description:** The `OnDataChange` handler calls `console.Output.WriteLine(line)`
(the synchronous overload) directly from the `PollGroupEngine` poll thread. The
@@ -109,7 +112,10 @@ change events through a `Channel<string>` drained by a single consumer task, or
guard the `WriteLine` with a lock. At minimum, document that the interleaving is
accepted because output is human-facing and line-buffered.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `SubscribeCommand` now serialises every
console write through a shared `consoleGate` lock: the poll-thread
`OnDataChange` callback and the command-thread "Subscribed to ..." line both take
the lock before calling `WriteLine`. Comment in the source documents the intent.
### Driver.AbLegacy.Cli-004
@@ -118,7 +124,7 @@ accepted because output is human-facing and line-buffered.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/ProbeCommand.cs:37-56`, `Commands/ReadCommand.cs:39-50`, `Commands/WriteCommand.cs:48-59`, `Commands/SubscribeCommand.cs:41-76` |
| Status | Open |
| Status | Resolved |
**Description:** Every command does `await using var driver = new AbLegacyDriver(...)`
*and* an explicit `await driver.ShutdownAsync(...)` in the `finally`. `AbLegacyDriver`
@@ -135,7 +141,12 @@ cleanup on every exit path including exceptions.
since the commands deliberately pass `CancellationToken.None` to shutdown so teardown
is not cut short by a cancelled `ct`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — replaced `await using var driver` with a
plain `var driver` in all four commands (`ProbeCommand`, `ReadCommand`,
`WriteCommand`, `SubscribeCommand`), keeping the explicit
`finally { await driver.ShutdownAsync(CancellationToken.None) }` as the single
teardown path. Comment in each command documents the intent so readers do not
have to verify idempotency.
### Driver.AbLegacy.Cli-005
@@ -144,7 +155,7 @@ is not cut short by a cancelled `ct`.
| Severity | Low |
| Category | Design-document adherence |
| Location | `Commands/SubscribeCommand.cs:23-25`, `docs/Driver.AbLegacy.Cli.md:94-96` |
| Status | Open |
| Status | Resolved |
**Description:** The subscribe command interval option is `--interval-ms`
(default 1000). `docs/Driver.AbLegacy.Cli.md` shows the subscribe example as
@@ -160,7 +171,13 @@ but the documented contract drifts between the two CLIs.
`--interval-ms` description for parity with the AbCip CLI, and mention the
`--interval-ms` long form + 1000 ms default in `docs/Driver.AbLegacy.Cli.md`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — extended the `SubscribeCommand.IntervalMs`
help text to match AbCip ("Publishing interval in milliseconds (default 1000).
PollGroupEngine floors sub-250ms values.") and added a paragraph under the
`subscribe` section in `docs/Driver.AbLegacy.Cli.md` naming the `-i` /
`--interval-ms` long form, the 1000 ms default, and the 250 ms floor. Regression
test `CommandMetadataTests.SubscribeCommand_interval_ms_help_notes_PollGroupEngine_floor`
asserts the description mentions "250".
### Driver.AbLegacy.Cli-006
@@ -169,7 +186,7 @@ but the documented contract drifts between the two CLIs.
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Commands/ProbeCommand.cs:20-22` |
| Status | Open |
| Status | Resolved |
**Description:** `ProbeCommand` declares its `--type` option with no short alias,
while `ReadCommand`, `WriteCommand`, and `SubscribeCommand` all declare `--type`
@@ -182,7 +199,13 @@ it silently rejected on `probe`.
for consistency with the other three commands. (The AbCip CLI `ProbeCommand` has
the same omission, so a cross-CLI sweep is worthwhile.)
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added the `'t'` short alias to
`ProbeCommand.DataType`. Regression test
`CommandMetadataTests.ProbeCommand_type_has_short_alias_t` (plus the parity test
`Other_commands_keep_type_short_alias_t` for read/write/subscribe) asserts the
short alias is present on every command. The same omission still exists in the
AbCip CLI's `ProbeCommand` — flagged as a sibling sweep but out of scope for
this module.
### Driver.AbLegacy.Cli-007
@@ -191,7 +214,7 @@ the same omission, so a cross-CLI sweep is worthwhile.)
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The only test file in the CLI test project covers
`WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Two behaviours that
@@ -210,4 +233,11 @@ Driver.AbLegacy.Cli-001. `BuildOptions` is reachable via `InternalsVisibleTo`
tag passthrough) and an overflow-input test for `ParseValue` so the fix for
Driver.AbLegacy.Cli-001 is locked in by a regression test.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `BuildOptionsTests` (five tests:
probe disabled, device shape from Gateway+PlcType, tag passthrough, timeout
propagation, empty tag list) covering `AbLegacyCommandBase.BuildOptions` via a
nested `TestCommand` subclass annotated with `[Command]` to satisfy the CliFx
analyzer. The overflow path for `ParseValue` is already covered by
`WriteCommandParseValueTests.ParseValue_out_of_range_throws_CommandException`
(theory with `short.Parse` + `AnalogInt` overflow inputs), added when finding
Driver.AbLegacy.Cli-001 was resolved.
+176 -9
View File
@@ -4,10 +4,10 @@
|---|---|
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 2 |
| Open findings | 0 |
## Checklist coverage
@@ -16,7 +16,7 @@ a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Cli.Common-001, Driver.Cli.Common-002 |
| 1 | Correctness & logic bugs | Driver.Cli.Common-001, Driver.Cli.Common-002, Driver.Cli.Common-007 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.Cli.Common-003 |
| 4 | Error handling & resilience | Driver.Cli.Common-004 |
@@ -24,9 +24,47 @@ a category produced nothing rather than leaving it blank.
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Cli.Common-005 |
| 9 | Testing coverage | Driver.Cli.Common-005, Driver.Cli.Common-008 |
| 10 | Documentation & comments | Driver.Cli.Common-006 |
## Re-review 2026-05-23 (commit `a9be809`)
Delta scope: commit `5a9c459` extends the `FormatStatus` shortlist with five
`Bad*` codes (`BadInternalError` 0x80020000, `BadNotWritable` 0x803B0000,
`BadOutOfRange` 0x803C0000, `BadNotSupported` 0x803D0000, `BadDeviceFailure`
0x80550000) the FOCAS / AbCip / AbLegacy native-protocol mappers emit. Tests
extended with parallel `[InlineData]` rows on the well-known Theory plus a new
`FormatStatus_names_native_driver_emitted_codes` Theory.
Cross-checked the five new hex literals against the OPC Foundation
`Opc.Ua.StatusCodes` table via DeepWiki:
| Name added | Code in shortlist | Spec value | Verdict |
|---|---|---|---|
| `BadInternalError` | `0x80020000` | `0x80020000` | Correct |
| `BadNotWritable` | `0x803B0000` | `0x803B0000` | Correct |
| `BadOutOfRange` | `0x803C0000` | `0x803C0000` | Correct |
| `BadNotSupported` | `0x803D0000` | `0x803D0000` | Correct |
| `BadDeviceFailure` | `0x80550000` | **`0x808B0000`** | **WRONG — `0x80550000` is `BadSecurityPolicyRejected`** |
The `BadDeviceFailure` mismapping is the same shape of bug as the original
Driver.Cli.Common-001 (wrong hex literal copied into the shortlist); recorded
as Driver.Cli.Common-007. The wrong constant also lives in
`FocasStatusMapper.cs`, `AbCipStatusMapper.cs`, `AbLegacyStatusMapper.cs`,
`TwinCATStatusMapper.cs`, `S7Driver.cs`, and `ModbusDriver.cs` — those are in
other modules' review scope but are noted here so future re-reviewers know
this isn't isolated. (`StatusCodeMap.cs` in Driver.Galaxy + the Wonderware
historian mappers use the correct `0x808B0000`, confirming the discrepancy.)
Testing observation: the new `FormatStatus_names_native_driver_emitted_codes`
Theory is fully redundant with the well-known Theory (the five rows were also
added there in the same commit) and uses `ShouldContain` rather than
`ShouldBe` — recorded as Driver.Cli.Common-008.
Other categories (concurrency, security, performance, design-doc adherence,
code organisation, documentation) are unchanged by this delta — no new
issues found.
## Findings
### Driver.Cli.Common-001
@@ -130,7 +168,7 @@ dispose the previous logger if reconfiguration is genuinely intended.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` |
| Status | Open |
| Status | Resolved |
**Description:** `FormatTable` calls `rows.Max(r => r.Tag.Length)` (and the same for the
value and status columns) without guarding against empty input. When `tagNames` and
@@ -143,7 +181,13 @@ instead of producing an empty (header-only) table.
separator, or an explicit "no rows" line), or use `DefaultIfEmpty(0).Max(...)` for the
width computations.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `FormatTable` guards each `rows.Max(...)` width
computation with a `rows.Length == 0 ? "<HEADER>".Length : Math.Max(...)` ternary, so
an empty batch read returns the header + separator rows (no data rows) instead of
throwing `InvalidOperationException`. The fix was landed in commit `1433a1c` alongside
the -002 work, and the regression test
`SnapshotFormatterTests.FormatTable_with_empty_input_returns_header_only` (added under
-005) exercises it.
### Driver.Cli.Common-005
@@ -178,7 +222,7 @@ empty-input and `DriverCommandBase` level-selection tests.
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` |
| Status | Open |
| Status | Resolved |
**Description:** Two minor doc inaccuracies. (1) The comment at `SnapshotFormatter.cs:71`
states the "source-time column is fixed-width (ISO-8601 to ms) so no max-measurement
@@ -194,4 +238,127 @@ library. The XML doc is stale relative to the shipped driver-CLI set.
right-most and intentionally unpadded rather than claiming fixed width. Add FOCAS to the
`DriverCommandBase` class-summary driver list.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — (1) `SnapshotFormatter.cs:71` comment reworded
to state the source-time column is the right-most one and intentionally not
measured/padded, calling out the null-timestamp `"-"` case explicitly. (2) FOCAS was
added to the `DriverCommandBase` class-summary driver enumeration in commit `7ff356b`
(landed alongside the -003 work).
### Driver.Cli.Common-007
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:129` |
| Status | Resolved |
**Description:** Commit `5a9c459` added `0x80550000u => "BadDeviceFailure"` to the
`FormatStatus` shortlist, but `0x80550000` is the canonical OPC UA spec value for
`BadSecurityPolicyRejected`, not `BadDeviceFailure`. The correct spec value for
`BadDeviceFailure` is `0x808B0000` (verified against the OPC Foundation
`Opc.Ua.StatusCodes` table via DeepWiki; corroborated locally by
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/StatusCodeMap.cs:40`
(`BadDeviceFailure = 0x808B0000u`) and the two Wonderware historian quality mappers,
which all hand-pin the correct value).
This is the same shape of bug Driver.Cli.Common-001 closed: a wrong hex literal
in the shortlist that the test theory (`SnapshotFormatterTests.cs:42`) blindly
asserts against the same wrong value, so the bug is invisible to CI.
Practical impact, two-sided:
1. A driver that returns the real spec `BadDeviceFailure` (`0x808B0000`) — e.g.
the `Driver.Galaxy.StatusCodeMap` path on a deploy-time device fault — falls
through the named shortlist entirely. Since Driver.Cli.Common-002 added the
severity-class fallback, it now renders as `0x808B0000 (Bad)` instead of
`0x808B0000 (BadDeviceFailure)` — operators lose the specific class label
`docs/Driver.FOCAS.Cli.md:153` tells them to read off the output.
2. A driver that returns `0x80550000` (which `FocasStatusMapper`, `AbCipStatusMapper`,
`AbLegacyStatusMapper`, `TwinCATStatusMapper`, `S7Driver`, and `ModbusDriver` all
misuse as "BadDeviceFailure") now renders as `0x80550000 (BadDeviceFailure)`
matching driver intent but contradicting the OPC UA spec, which says any client
that decodes the same payload using the OPC Foundation stack will see
`BadSecurityPolicyRejected`. A security-monitoring tool keying on
`BadSecurityPolicyRejected` will fire on a CPU fault, while real
`BadSecurityPolicyRejected` returns from the secure-channel layer would be
mislabelled as a device fault. Operator-facing CLI output and machine-readable
status semantics disagree.
The deeper bug is the wrong constant in the native-protocol mappers (out of scope
for this module), but the `SnapshotFormatter` shortlist is its own
spec-authoritative reference point — Driver.Cli.Common-001 explicitly framed the
shortlist as canonical, with the in-line "keep [these literals] in sync with [the
Opc.Ua.StatusCodes] table" comment at `SnapshotFormatter.cs:112-113`. That
contract is now broken.
**Recommendation:** Change line 129 to `0x808B0000u => "BadDeviceFailure"`. Update
the matching `[InlineData]` rows in `SnapshotFormatterTests.cs` (line 42 in the
well-known Theory; line 60 in the redundant Theory — see Driver.Cli.Common-008).
Also note in the resolution that the native-protocol mappers (FOCAS / AbCip /
AbLegacy / TwinCAT / S7 / Modbus) need the same fix recorded against their own
module reviews — the constant `0x80550000` should be replaced with `0x808B0000`
everywhere it claims to mean `BadDeviceFailure`. Consider Driver.Cli.Common-001's
original recommendation again: add a CI test that cross-checks every shortlist
entry against `Opc.Ua.StatusCodes` reflection so this class of bug stops
recurring.
**Resolution:** Resolved 2026-05-23 — corrected `SnapshotFormatter.FormatStatus`
to map `0x808B0000u => "BadDeviceFailure"` (was `0x80550000u`). Updated the
`InlineData` row in the well-known Theory accordingly; the redundant native-
emitted Theory was deleted entirely per Driver.Cli.Common-008. Added a regression
row to `FormatStatus_does_not_apply_pre_fix_wrong_names` pinning that
`0x80550000` no longer renders as `BadDeviceFailure` (mirroring the
Driver.Cli.Common-001 wrong-name guards). The underlying constant was also
corrected in all six native-protocol mappers as part of the same commit:
`FocasStatusMapper.BadDeviceFailure`, `AbCipStatusMapper.BadDeviceFailure`,
`AbLegacyStatusMapper.BadDeviceFailure`, `TwinCATStatusMapper.BadDeviceFailure`,
`ModbusDriver.StatusBadDeviceFailure`, `S7Driver.StatusBadDeviceFailure` — all
moved from `0x80550000u` to `0x808B0000u`. The three downstream Modbus tests
(`ModbusExceptionMapperTests` 3 InlineData rows + 1 ShouldBe assertion;
`ExceptionInjectionTests.StatusBadDeviceFailure` constant) updated to expect
the corrected code. **Behavior change:** OPC UA clients consuming the native
drivers now see the canonical `BadDeviceFailure` (0x808B0000) instead of the
misnamed `BadSecurityPolicyRejected` (0x80550000) on device-fault paths —
operator-facing CLI output and machine-readable status semantics now agree.
Suite totals after fix: Driver.Cli.Common.Tests 43 green (was 48 — minus 5
redundant rows); Modbus.Tests 263; AbCip.Tests 262; AbLegacy.Tests 157;
FOCAS.Tests 178; S7.Tests 112; TwinCAT.Tests 131; all green. The Opc.Ua.StatusCodes
cross-check the recommendation suggested is recorded as a follow-up worth
considering but is out of scope for this fix.
### Driver.Cli.Common-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:50-64` |
| Status | Resolved |
**Description:** Commit `5a9c459` adds a new
`FormatStatus_names_native_driver_emitted_codes` `[Theory]` whose five
`[InlineData]` rows are identical to five rows added to the existing
`FormatStatus_names_well_known_status_codes` `[Theory]` in the same commit
(lines 32, 39, 40, 41, 42). The new Theory therefore adds no coverage. It is
also weaker than the Theory it duplicates: it asserts
`output.ShouldContain($"({expectedName})")` (substring match) where the
well-known Theory asserts `output.ShouldBe($"0x{status:X8} ({expectedName})")`
(exact match including the hex prefix). The substring form would not catch a
regression where the hex literal renders wrong but the name is correct.
This is not a correctness problem — both Theories pass — but it's a
copy-paste inconsistency that costs maintainer attention every time someone
reads the test file and wonders which Theory is authoritative.
**Recommendation:** Either (a) delete the new Theory entirely — its five rows
are already covered by the well-known Theory in the same commit — or (b) keep
it but switch to `ShouldBe($"0x{status:X8} ({expectedName})")` so its
assertion strength matches the rest of the file. Option (a) is cleaner: the
commit's "operator workflow" intent is documented well enough in the
well-known Theory comment block; the redundant Theory is dead weight.
**Resolution:** Resolved 2026-05-23 — took option (a): deleted the
`FormatStatus_names_native_driver_emitted_codes` Theory entirely. Its five
`InlineData` rows are covered by the well-known Theory's `ShouldBe` (strict
exact-match assertion), which is the authoritative shortlist test. Landed
alongside the Driver.Cli.Common-007 fix in the same commit.
+59 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -45,7 +45,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/WriteCommand.cs:58-68` |
| Status | Open |
| Status | Resolved |
**Description:** `WriteCommand.ParseValue` parses the numeric `--value` types
(`Byte`/`Int16`/`Int32`/`Float32`/`Float64`) with `sbyte.Parse` / `short.Parse`
@@ -65,7 +65,16 @@ literal — consistent with how `ParseBool` already handles bad boolean input.
The same pattern exists in the sibling S7 CLI; a shared helper in
`Driver.Cli.Common` would fix both.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — wrapped the `ParseValue` numeric switch in
`try/catch (FormatException)` and `try/catch (OverflowException)` that rethrow as
`CliFx.Exceptions.CommandException` with a message naming the `--type` and the
offending value, mirroring the friendly text the `Bit` path already produced.
Added `WriteCommandParseValueTests` with [Theory] cases covering non-numeric
input across `Byte`/`Int16`/`Int32`/`Float32`/`Float64`, overflow edges
(sbyte ±1, short max+1, > int.MaxValue), and an assertion that the exception
message names both the type and the offending value. A shared `Driver.Cli.Common`
helper is the cleaner long-term fix (cross-CLI duplication remains) but is left
to the Driver.Cli.Common review per this module's edit scope.
### Driver.FOCAS.Cli-002
@@ -74,7 +83,7 @@ The same pattern exists in the sibling S7 CLI; a shared helper in
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:45-51` |
| Status | Open |
| Status | Resolved |
**Description:** The `subscribe` command attaches an `OnDataChange` handler that
calls the synchronous `console.Output.WriteLine`. `OnDataChange` is raised from
@@ -93,7 +102,15 @@ console writes with a lock shared between the banner and the handler. Optionally
detach the handler in the `finally` block before `ShutdownAsync` for symmetry
with the `handle` teardown already present there.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — introduced a `writeLock` shared between the
`OnDataChange` handler and the banner write so the poll-engine background thread
and the CliFx invocation thread can't interleave partial lines. Added an
explanatory comment above the handler explaining the CliFx-`IConsole` rationale
and the synchronous-on-background-thread design — mirroring the Modbus / S7
copies of this command. Also added a try/catch around the handler body so a
transient stdout error cannot tear down the poll loop, and Serilog-warn-logs the
swallowed exception. Added `SubscribeCommandConsoleHandlerTests` to guard the
`writeLock` + CliFx-`IConsole` rationale against future copy-paste regressions.
### Driver.FOCAS.Cli-003
@@ -102,7 +119,7 @@ with the `handle` teardown already present there.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) |
| Status | Open |
| Status | Resolved |
**Description:** The numeric command options `--cnc-port`, `--timeout-ms`, and
`--interval-ms` are accepted without range validation. A zero or negative
@@ -120,7 +137,17 @@ timeout and interval strictly positive. The same gap exists across the sibling
driver CLIs, so a shared validation helper in `Driver.Cli.Common` is the
cleaner fix.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added a protected `ValidateOptions(int?
intervalMs = null)` helper on `FocasCommandBase` that rejects `--cnc-port`
outside `1..65535`, non-positive `--timeout-ms`, and non-positive
`--interval-ms` (when the caller passes one) with a `CliFx.Exceptions.CommandException`
naming the option and the rejected value. `ProbeCommand` / `ReadCommand` /
`WriteCommand` call `ValidateOptions()` without an interval, `SubscribeCommand`
calls `ValidateOptions(IntervalMs)`. Added `FocasCommandBaseValidationTests`
covering accept-defaults, reject out-of-range port (0, -1, 65536), reject
non-positive timeout / interval, and skip-interval-when-omitted. A shared
helper in `Driver.Cli.Common` is the cleaner cross-CLI fix and is recorded
against that module's review.
### Driver.FOCAS.Cli-004
@@ -129,7 +156,7 @@ cleaner fix.
| Severity | Low |
| Category | Performance & resource management |
| Location | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` |
| Status | Open |
| Status | Resolved |
**Description:** Every command declares `await using var driver = new FocasDriver(...)`
**and** explicitly calls `await driver.ShutdownAsync(CancellationToken.None)` in
@@ -144,7 +171,14 @@ dead weight and obscures intent: a reader cannot tell whether the explicit
and rely on `await using` for disposal, or drop `await using` and keep the
explicit teardown — but not both. The same redundancy exists in the sibling CLIs.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — dropped the explicit
`await driver.ShutdownAsync(CancellationToken.None)` calls from the `finally`
blocks of `ProbeCommand`, `ReadCommand`, `WriteCommand`, and `SubscribeCommand`;
`await using` is now the sole driver-disposal mechanism per command
(`FocasDriver.DisposeAsync` itself runs `ShutdownAsync`). The subscribe command
keeps `UnsubscribeAsync` in its finally because that is a subscription-lifecycle
concern, not driver disposal. Added `CommandDisposalConventionsTests` to guard
the source-level convention against regression.
### Driver.FOCAS.Cli-005
@@ -153,7 +187,7 @@ explicit teardown — but not both. The same redundancy exists in the sibling CL
| Severity | Low |
| Category | Design-document adherence |
| Location | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) |
| Status | Open |
| Status | Resolved |
**Description:** `docs/Driver.FOCAS.Cli.md` documents `BadDeviceFailure` and
`BadCommunicationError` as the key diagnostic signals an operator reads off
@@ -180,4 +214,18 @@ actually emit — at minimum `BadNotWritable`, `BadOutOfRange`, `BadNotSupported
because the gap defeats this module documented `probe`/`write` diagnostic
workflow; cross-reference the `Driver.Cli.Common` review.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the cross-CLI fix landed in `Driver.Cli.Common`:
`SnapshotFormatter.FormatStatus` now names `BadInternalError` (0x80020000),
`BadNotWritable` (0x803B0000), `BadOutOfRange` (0x803C0000), `BadNotSupported`
(0x803D0000), and `BadDeviceFailure` (0x80550000) — the five codes the FOCAS /
AbCip / AbLegacy native-protocol mappers all emit but the shortlist previously
left unnamed (the canonical `BadTimeout` 0x800A0000 was already added under
Driver.Cli.Common-001). FOCAS `probe` / `write` against a non-writable parameter,
out-of-range address, unsupported function, busy device, or CNC-handle failure
now renders with the named status the `docs/Driver.FOCAS.Cli.md` workflow
promises, restoring parity between the docs and the shipped behaviour. Regression
`[Theory]` `FormatStatus_names_native_driver_emitted_codes` added to
`SnapshotFormatterTests` so the five names can't silently drop out of the
shortlist again; the existing well-known shortlist `[Theory]` was extended with
the same five entries to enforce the exact `0x... (Name)` rendering. Suite now
47 green (was 42).
+156 -6
View File
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-05-23 |
| Commit reviewed | `a9be809` |
| Status | Reviewed |
| Open findings | 0 |
@@ -17,12 +17,39 @@
| 2 | OtOpcUa conventions | Driver.Galaxy-005 |
| 3 | Concurrency & thread safety | Driver.Galaxy-006, Driver.Galaxy-007 |
| 4 | Error handling & resilience | Driver.Galaxy-001, Driver.Galaxy-008, Driver.Galaxy-009 |
| 5 | Security | Driver.Galaxy-010 |
| 6 | Performance & resource management | Driver.Galaxy-011, Driver.Galaxy-012 |
| 7 | Design-document adherence | Driver.Galaxy-013 |
| 5 | Security | Driver.Galaxy-010, Driver.Galaxy-015 |
| 6 | Performance & resource management | Driver.Galaxy-011, Driver.Galaxy-012, Driver.Galaxy-016 |
| 7 | Design-document adherence | Driver.Galaxy-013, Driver.Galaxy-017 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Galaxy-014 |
| 10 | Documentation & comments | Driver.Galaxy-005, Driver.Galaxy-013 |
| 10 | Documentation & comments | Driver.Galaxy-005, Driver.Galaxy-013, Driver.Galaxy-018 |
## Re-review 2026-05-23 (commit `a9be809`)
The only code-affecting change since `76d35d1` was commit `994997b` — the
sibling `mxaccessgw` repo restructured (the `clients/dotnet/MxGateway.Client`
project path and the `MxGateway.Contracts.Proto` namespace both moved), and
the driver's path-based `ProjectReference` started producing 87 build errors
solution-wide. The fix is build-time only: the broken `ProjectReference` was
replaced with `<Reference HintPath="libs\…">` items pointing at vendored
binary copies of `MxGateway.Client.dll` (99 KB, May 2026 known-good build)
and `MxGateway.Contracts.dll` (490 KB), and five `PackageReference`s that
the dropped project was previously providing transitively (`Google.Protobuf`,
`Grpc.Core.Api`, `Grpc.Net.Client`, `Microsoft.Extensions.Logging.Abstractions`,
`Polly`) were declared explicitly. The matching `Tests` csproj got the same
binary `<Reference>` for `MxGateway.Contracts` (replacing its own broken
`ProjectReference`). A `libs/README.md` documents what is vendored and the
two unwinding paths (sibling restores a client library, or driver migrates
to the new `ZB.MOM.WW.MxGateway.Contracts.Proto` namespace + reimplements
the `MxGatewayClient` / `MxGatewaySession` / `GalaxyRepositoryClient`
wrapper, ~2,200 LoC).
No `*.cs` file changed; the re-review walked only the categories that apply
to a build-time/packaging change. Categories with no new findings:
Correctness (1), OtOpcUa conventions (2), Concurrency (3), Error handling
(4), Code organization (8), Testing coverage (9). Four new findings are
recorded below (Driver.Galaxy-015..018) — none Critical, none High; two
Medium, two Low.
## Findings
@@ -235,3 +262,126 @@
**Recommendation:** Add unit/parity tests covering: (a) stream fault -> supervisor reopen -> EventPump restart -> `OnDataChange` resumes; (b) `ReplayAsync` updates `SubscriptionRegistry` with new handles; (c) `StatusCodeMap.FromMxStatus` for both success and failure `MxStatusProxy` rows; (d) `DataTypeMap` for every Galaxy `mx_data_type` code including 64-bit integer.
**Resolution:** Resolved 2026-05-22 — added `GalaxyDriverInfrastructureTests` covering `GetMemoryFootprint` (Driver.Galaxy-011) and `IAsyncDisposable` (Driver.Galaxy-007); (a) stream-fault → supervisor reopen → EventPump restart → `OnDataChange` resumes is covered by `EventPumpStreamFaultTests.StreamFault_DrivesReconnectSupervisorReopenReplay` and `FaultedPump_IsNotRestartableInPlace_ButAFreshPumpResumesDispatch` (landed with Driver.Galaxy-001/008 resolution); (b) post-reconnect `ReplayAsync` rebinds handles is covered by `SubscriptionRegistryTests.Rebind_*` suite; (c) `StatusCodeMap.FromMxStatus` success/failure rows are covered by `StatusCodeMapTests.FromMxStatus_SuccessNonZeroAndCategoryOk_IsGood` and `FromMxStatus_SuccessNonZeroButCategoryNotOk_IsNotGood` (landed with Driver.Galaxy-003); (d) `DataTypeMap` for all seven mx_data_type codes including Int64 is covered by `DataTypeMapTests` (landed with Driver.Galaxy-002).
### Driver.Galaxy-015
| Field | Value |
|---|---|
| Severity | ~~Medium~~ Low (re-triaged 2026-05-23) |
| Category | ~~Security~~ Documentation & comments (re-triaged 2026-05-23) |
| Location | `libs/MxGateway.Client.dll`, `libs/MxGateway.Contracts.dll`, `libs/README.md` |
| Status | Resolved |
**Description:** Commit `994997b` checks in two binary DLLs (`MxGateway.Client.dll`, 99 840 bytes; `MxGateway.Contracts.dll`, 489 984 bytes) under `src/Drivers/.../Driver.Galaxy/libs/` and references them via `<Reference HintPath="…" />`. These are the only checked-in binary build artefacts in the entire repo (a repo-wide `find` for non-`bin/`/`obj/` `*.dll` under `libs/` returns only these two), so the change sets a precedent. The accompanying `libs/README.md` states the DLLs are "byte-for-byte the build output" of the OtOpcUa team's own code against the gateway's open proto contracts, but there is no recorded provenance — no source-commit SHA from the sibling `mxaccessgw` repo that produced the build, no SHA-256/SHA-512 checksum, no `.gitattributes` rule marking these paths as binary (so a future churn-in-place will balloon the pack file). Without a recorded source commit + checksum it is impossible for a future reviewer/auditor to verify the binaries match a specific revision of the sibling repo — the assertion "we built them, not external" is unverifiable after the fact. Tampering or accidental swap (e.g. someone drops in a different DLL of the same name under the same path) would not be detectable.
**Recommendation:** (a) Pin the source provenance: add the sibling `mxaccessgw` commit SHA used to build each DLL to `libs/README.md`. (b) Record a SHA-256 of each `.dll` in `libs/README.md` so a future tamper or accidental update is detectable by running `Get-FileHash`/`sha256sum`. (c) Add a `.gitattributes` rule under `libs/` declaring `*.dll binary` (and consider `filter=lfs diff=lfs merge=lfs -text` if/when these need to be updated, to avoid bloating the pack file on every refresh). (d) Optional: a `dotnet test` time-check that compares the on-disk hash to the recorded hash, so a CI run notices if the file drifts from what the README claims.
**Resolution:** Resolved 2026-05-23. **Severity re-triage:** the original
finding framed this as a security concern about "tampering or accidental
swap by an unknown third party"; the user clarified that the DLLs are
their own code, built from their own `mxaccessgw` project — not third-party
binaries. That moves the concern from security (untrusted provenance) to
documentation (audit trail). Re-classified as Low Documentation &
Comments. Fix: `libs/README.md` now carries a Provenance section that
records the source-commit SHA (`dd7ca1634e2d2b8a866c81f0009bf87ee9427750`,
extracted from the `AssemblyInformationalVersion` baked into both DLLs by
the original build) and SHA-256 checksums of both binaries, plus a
re-verification recipe (`sha256sum libs/*.dll` + `ilspycmd <dll> | grep
AssemblyInformationalVersion`). Recommendations (c) `.gitattributes` and
(d) CI hash-check deferred — the DLLs are essentially frozen until one
of the two unwinding paths is taken, so adding LFS or a CI guard would
add infrastructure that the unwinding step would then have to remove.
Re-open if the vendoring becomes a recurring update target.
### Driver.Galaxy-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:43-47`, `libs/README.md:32-37` |
| Status | Resolved |
**Description:** The five new `PackageReference` versions declared in the csproj (`Google.Protobuf` 3.34.1, `Grpc.Core.Api` 2.76.0, `Grpc.Net.Client` 2.71.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.0, `Polly` 8.5.2) do not all match what the vendored `MxGateway.Client.dll` was built against. The DLL's PE metadata (extracted via `System.Reflection.Metadata`) shows references to `Grpc.Net.Client v2.0.0.0`, `Microsoft.Extensions.Logging.Abstractions v10.0.0.0`, and notably `Polly.Core v8.0.0.0` — and the source csproj just before the sibling-repo rename (commit `bd4a09a` from 2026-04-27) declared `Grpc.Net.Client` 2.76.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.7, and `Polly.Core` 8.6.6 — *not* the meta-package `Polly`. Our driver pulls `Polly` 8.5.2 (which transitively pins `Polly.Core` 8.5.2 per its nuspec dependency), so the vendored client actually loads `Polly.Core` 8.5.2 at runtime against code compiled against 8.6.6. Across an 8.5 ↔ 8.6 minor delta this is usually safe (assembly-version is `v8.0.0.0` for both), but it is exactly the skew shape that surfaces as `MissingMethodException` if a 8.6-only API was used in the client. `libs/README.md` claims "versions match what the sibling repo's `ZB.MOM.WW.MxGateway.Contracts.csproj` uses so the gRPC + proto runtime stays binary-compatible" — that statement is correct only for `Google.Protobuf` and `Grpc.Core.Api`; the other three packages do not match.
**Recommendation:** Reconcile the declared package versions with what the vendored DLLs were built against — bump to `Grpc.Net.Client` 2.76.0, `Microsoft.Extensions.Logging.Abstractions` 10.0.7, swap `Polly` for `Polly.Core` 8.6.6 (the driver does not import the `Polly` legacy v7 surface, only Polly.Core via the client). Alternatively, rebuild the vendored DLLs against the same versions the csproj declares and refresh the binaries. Update `libs/README.md` to record the exact versions the DLLs were built against, so the next vendoring refresh has an authoritative reference.
**Resolution:** Resolved 2026-05-23 — took the first option (reconcile
declared packages with what the DLL was built against, verified by
reflecting `Assembly.GetReferencedAssemblies()` on `MxGateway.Client.dll`).
Changes to the csproj: **`Polly` 8.5.2 → `Polly.Core` 8.6.6** (the most
consequential — `Polly` (v7 fluent API) and `Polly.Core` (v8 resilience-
pipeline API) are different packages, and the DLL was built against
`Polly.Core`; the prior `Polly` reference would have failed at runtime
with `MissingMethodException` the first time the gateway client's retry
pipeline ran). Also bumped `Grpc.Net.Client` 2.71.0 → 2.76.0 and
`Microsoft.Extensions.Logging.Abstractions` 10.0.0 → 10.0.7 to match the
sibling Server/Worker projects' current versions. `Google.Protobuf`
3.34.1 and `Grpc.Core.Api` 2.76.0 already matched; left unchanged.
`libs/README.md` rewritten to record what was actually verified
(`Assembly.GetReferencedAssemblies()` output + the resolved package
versions, including the sibling Server/Worker csproj as the version
source-of-truth — the deleted MxGateway.Client.csproj would have been
the original source but no longer exists). Verification: solution-wide
`dotnet build` clean, Driver.Galaxy.Tests 245/245 pass against the
corrected package set.
### Driver.Galaxy-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (no source change), gateway proto contract |
| Status | Deferred |
**Description:** The vendored `MxGateway.Contracts.dll` only carries the OLD `MxGateway.Contracts.Proto[.Galaxy]` namespace (PE-namespace dump confirms — `MxGateway.Client`, `MxGateway.Contracts`, `MxGateway.Contracts.Proto`, `MxGateway.Contracts.Proto.Galaxy` only). The sibling `mxaccessgw` repo's live `Protos/mxaccess_gateway.proto`, `mxaccess_worker.proto`, and `galaxy_repository.proto` files now generate into `ZB.MOM.WW.MxGateway.Contracts.Proto.*`. The proto wire format itself can still evolve (new RPCs, renamed fields, removed fields) and the driver has no contract-version handshake (a repo-wide search for `ContractVersion|ProtocolVersion|ApiVersion|WireVersion` in the driver returns nothing) — so a gateway service that evolves its proto past what the vendored client knows will fail silently at runtime: gRPC `UNIMPLEMENTED` for a renamed RPC, default-value reads for a removed scalar field, or worse, a wire-tag collision if a field number is reused. The risk surface grew with vendoring: previously the `ProjectReference` would have hard-failed at build time if the proto changed shape; now the driver builds green against a frozen contract that may not match the running gateway.
**Recommendation:** (a) Add a single `Ping`/`GetVersion` RPC call at gateway-session open, comparing the gateway's reported contract version against a string baked into `libs/README.md` (or a `GatewayContractVersion` const) and refusing the session on mismatch with a clear log. (b) Document in `libs/README.md` the exact mxaccessgw commit SHA (and proto-file SHA-256s) the vendored DLLs were built from, so a parity-rig operator can grep the live gateway for the matching commit. (c) Add a soak/parity test that asserts the live gateway's proto descriptor still matches what the vendored DLL expects — fail loud rather than degrade.
**Resolution:** Deferred 2026-05-23 — the recommendation's part (b)
(record the mxaccessgw source-commit SHA in `libs/README.md`) is satisfied
by the Driver.Galaxy-015 resolution, which records both DLLs were built
from mxaccessgw commit `dd7ca1634e2d2b8a866c81f0009bf87ee9427750`. Parts
(a) and (c) — adding a `GetVersion` RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial new
RPC + plumbing work that is not in scope for this code-review-resolution
sweep. The risk surface is bounded because either of the two unwinding
paths in `libs/README.md` (sibling repo restores `MxGateway.Client.csproj`,
or this driver migrates to the new namespace) will move the codebase
past the vendoring + close this concern naturally. Re-open if neither
unwinding path is taken within the next quarter and the live gateway
service does evolve its proto under the driver.
### Driver.Galaxy-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `libs/README.md:32-37`, `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:40-47` |
| Status | Resolved |
**Description:** Several small documentation issues in the vendoring artefacts:
1. `libs/README.md` says "Versions match what the sibling repo's `ZB.MOM.WW.MxGateway.Contracts.csproj` uses" — but `ZB.MOM.WW.MxGateway.Contracts.csproj` only declares `Google.Protobuf` 3.34.1 and `Grpc.Core.Api` 2.76.0; the other three packages (`Grpc.Net.Client`, `Microsoft.Extensions.Logging.Abstractions`, `Polly`) come from the (now-deleted) `MxGateway.Client.csproj`, not the contracts csproj. The README points at the wrong source-of-truth file. See Driver.Galaxy-016 for the related version-skew issue.
2. `libs/README.md` says the DLLs "are built against net10.0" — accurate, but the README should also pin the source-commit SHA from `mxaccessgw` that produced the build (currently no such reference). Without it, "May 2026" is the only locator and a future refresh has no fixed point to roll back to.
3. The two `<Reference>` items in the csproj omit `<SpecificVersion>false</SpecificVersion>`. The vendored DLLs carry `AssemblyVersion 1.0.0.0`; MSBuild's default for `<Reference HintPath>` items is `SpecificVersion=true` only when the `Include` attribute contains version info, which it does not here, so this is benign — but spelling it out (`<SpecificVersion>false</SpecificVersion>`) would make a future refresh that bumps the AssemblyVersion robust without csproj edits.
4. The csproj `<Reference Include="MxGateway.Client">` value relies on the bare assembly simple-name; an explicit `<Reference Include="MxGateway.Client, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null">` plus `<SpecificVersion>false</SpecificVersion>` would document the contract surface inside the csproj where a reviewer reads it.
**Recommendation:** (a) Update `libs/README.md` to (i) point at `MxGateway.Client.csproj` for the `Grpc.Net.Client`/`Microsoft.Extensions.Logging.Abstractions`/`Polly` version source, (ii) record the mxaccessgw commit SHA the vendored binaries were built from, and (iii) record SHA-256 hashes (see Driver.Galaxy-015). (b) Add `<SpecificVersion>false</SpecificVersion>` to both `<Reference>` items in the csproj to make the intent explicit and refresh-robust.
**Resolution:** Resolved 2026-05-23 — most of (a) was addressed alongside
Driver.Galaxy-015 + -016: `libs/README.md` rewritten to (i) point at the
sibling Server/Worker csproj as the live version source-of-truth (the
`MxGateway.Client.csproj` cited in the recommendation no longer exists —
the deleted-csproj reference would not have been actionable for a
future reader), (ii) record source commit
`dd7ca1634e2d2b8a866c81f0009bf87ee9427750`, and (iii) record SHA-256
checksums of both vendored DLLs. (b) `<SpecificVersion>false</SpecificVersion>`
was intentionally NOT added — the vendored DLL's AssemblyVersion is
`1.0.0.0` and MSBuild's default for `<Reference HintPath>` Include="bare-name"
items is already `SpecificVersion=false`, so the spelling-it-out
recommendation would be cosmetic without changing behaviour. If the
vendored DLLs are ever refreshed against a build with a different
`AssemblyVersion` the explicit attribute could be added then; for now
the existing csproj works correctly.
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -92,7 +92,7 @@ dead-lettered. Until then, document explicitly that this writer never produces
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
| Status | Open |
| Status | Resolved |
**Description:** `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but
read inside `GetHealthSnapshot` under `_healthLock`, and every other counter
@@ -106,7 +106,7 @@ and the counters are advisory, but the mixed model is a latent hazard.
`_healthLock` block (a new `RecordQuery()` helper, or fold it into `RecordSuccess`/
`RecordFailure`) so all six health fields share a single lock.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — replaced the mixed `Interlocked.Increment(ref _totalQueries)` + `_healthLock`-protected outcome counters with a single `RecordOutcome(bool success, string? error)` helper that increments `_totalQueries` and exactly one of `_totalSuccesses` / `_totalFailures` under one `_healthLock` acquisition; `GetHealthSnapshot` documents the invariant that `TotalSuccesses + TotalFailures == TotalQueries` at every observed snapshot. Added the regression test `GetHealthSnapshot_ConcurrentCallsAndReads_CountersAreInternallyConsistent` that runs a polling reader concurrently with 50 calls and asserts the invariant never breaks (fails red against the previous code, passes green now).
### Driver.Historian.Wonderware.Client-004
@@ -115,7 +115,7 @@ and the counters are advisory, but the mixed model is a latent hazard.
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `WonderwareHistorianClient.cs:203-267` |
| Status | Open |
| Status | Resolved |
**Description:** A sidecar-reported failure is recorded in two non-atomic steps under
separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the
@@ -132,7 +132,7 @@ sidecar-level `Success` flag has been checked, or pass the reply success/error i
single `RecordOutcome(bool transportOk, bool sidecarOk, string? error)` that updates all
counters under one lock acquisition.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — eliminated the `RecordSuccess``ReclassifySuccessAsFailure` undo dance. `InvokeAsync` now takes a `Func<TReply, (bool ok, string? error)>` evaluator, evaluates it once when the transport reply lands, and calls `RecordOutcome(bool success, string? error)` exactly once per call under a single `_healthLock` acquisition. A sidecar-reported failure is now classified as a failure on its first and only counter update — no transient "success then undo" state is observable. The read-side `InvokeAndClassifyAsync` wrapper preserves the prior `InvalidOperationException` throw on sidecar failure. Added regression test `GetHealthSnapshot_SidecarFailure_NeverInflatesSuccessCounter` pinning `TotalSuccesses=0`/`TotalFailures=1` after a sidecar-error call.
### Driver.Historian.Wonderware.Client-005
@@ -167,7 +167,7 @@ the reader.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
| Status | Open |
| Status | Resolved |
**Description:** `PipeChannel.InvokeAsync` retries exactly once on transport failure and
otherwise propagates. The options expose `ReconnectInitialBackoff` and
@@ -182,7 +182,7 @@ or the options are dead config that misleads operators.
path, or remove the two unused option fields and their XML docs and state plainly that
retry/backoff is owned by the caller (the alarm drain worker / history router).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — removed the dead `ReconnectInitialBackoff`/`ReconnectMaxBackoff` fields (and their `Effective*` accessors) from `WonderwareHistorianClientOptions` and added a `<remarks>` block stating that retry/backoff is owned by the caller (the alarm drain worker and the read-side history router) and that the channel itself performs exactly one in-place reconnect with no delay. Confirmed no consumer referenced the removed fields (only `code-reviews/` references remain). Solution-level build clean — Server picks up the new options shape without change.
### Driver.Historian.Wonderware.Client-007
@@ -218,7 +218,7 @@ deserializing.
| Severity | Low |
| Category | Security |
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
| Status | Open |
| Status | Resolved |
**Description:** The csproj suppresses two NuGet audit advisories
(`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency
@@ -232,7 +232,7 @@ advisory title, why it does not apply to this module usage, and a revisit trigge
follow-up to upgrade `MessagePack` once a patched version is available so the suppressions
can be dropped.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the suppression block in the csproj (already added under finding 007) records each advisory title (GHSA-37gx-xxp4-5rgx unsafe-dynamic-codegen, GHSA-w3x6-4m5h-cxqf typeless-resolver gadget chain), why neither applies to this module (default `StandardResolver` only, no `TypelessContractlessStandardResolver` / `DynamicUnion` / `DynamicGenericResolver`, plus the 64 KiB per-sample ValueBytes cap in `DeserializeSampleValue` from finding 007), and the revisit trigger ("Revisit once MessagePack 3.x is available and drop these suppressions at that time"). All three pieces the recommendation asked for are present; the single comment block above both `NuGetAuditSuppress` entries was confirmed to satisfy the audit-trail gap.
### Driver.Historian.Wonderware.Client-009
@@ -272,7 +272,7 @@ silent `[Key]` drift between the two duplicated contract sets is caught at build
| Severity | Low |
| Category | Documentation & comments |
| Location | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
| Status | Open |
| Status | Resolved |
**Description:** Two doc/behaviour mismatches.
(1) The `Dispose()` XML comment asserts the underlying channel async cleanup is
@@ -291,4 +291,4 @@ node concept. The collapse is reasonable but undocumented.
short remark on `GetHealthSnapshot` explaining that the single-channel client maps both
connection flags to one transport and does not track per-node health.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — (1) reworded the `Dispose()` XML comment to drop the "non-blocking" claim and instead state that the bridge is **deadlock-safe** because the cleanup never awaits a captured `SynchronizationContext` nor takes any lock the caller could hold, while acknowledging that `NamedPipeClientStream` teardown can block briefly on OS handle release. (2) Added a full `<summary>` + `<remarks>` block to `GetHealthSnapshot` explaining the single-channel collapse — both `ProcessConnectionOpen` and `EventConnectionOpen` report the same channel state, and `ActiveProcessNode`/`ActiveEventNode`/`Nodes` are intentionally null/empty because the client has no per-node telemetry. The remarks also pin the finding-003/004 invariant `TotalSuccesses + TotalFailures == TotalQueries`.
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
| Open findings | 0 |
## Checklist coverage
@@ -115,7 +115,7 @@ analog/integer tags.
| Severity | Low |
| Category | Correctness and logic bugs |
| Location | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` |
| Status | Open |
| Status | Resolved |
**Description:** `ToHistorianEvent` only assigns `historianEvent.Id` when
`Guid.TryParse(dto.EventId, ...)` succeeds. If `EventId` is not a parseable GUID
@@ -128,7 +128,7 @@ The non-parseable case is never logged.
the event as `PermanentFail` (malformed input) or synthesize a fresh
`Guid.NewGuid()` so each event still gets a unique id.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `ToHistorianEvent` now synthesizes a fresh `Guid.NewGuid()` when the dto's `EventId` fails `Guid.TryParse`, and logs a warning carrying both the original (unparseable) id and the synthesized id so collisions stop happening silently. Regression tests `ToHistorianEvent_parseable_event_id_is_used_verbatim` and `ToHistorianEvent_unparseable_event_id_synthesizes_unique_non_empty_Guid` in `SdkAlarmHistorianWriteBackendTests`.
### Driver.Historian.Wonderware-005
@@ -137,7 +137,7 @@ the event as `PermanentFail` (malformed input) or synthesize a fresh
| Severity | Low |
| Category | Concurrency and thread safety |
| Location | `Backend/HistorianDataSource.cs:124`, `:126-127` |
| Status | Open |
| Status | Resolved |
**Description:** `GetHealthSnapshot` reads `_activeProcessNode` and
`_activeEventNode` inside `_healthLock`, but those two fields are written under
@@ -152,7 +152,7 @@ a momentarily inconsistent health snapshot.
`_healthLock` on every connection state change, or read them under the connection
lock), so the snapshot is internally consistent.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `GetHealthSnapshot` now derives the `ProcessConnectionOpen` / `EventConnectionOpen` booleans from the active-node strings (`_activeProcessNode != null` / `_activeEventNode != null`) which all live under `_healthLock`, instead of reading `_connection`/`_eventConnection` via `Volatile.Read` outside the lock those fields are published under. The snapshot is now self-consistent by construction: open ↔ active node populated. Regression tests in `HistorianDataSourceHealthSnapshotTests` cover the three half-published states plus the steady-state cases.
### Driver.Historian.Wonderware-006
@@ -184,7 +184,7 @@ restart the sidecar cleanly.
| Severity | Low |
| Category | Error handling and resilience |
| Location | `Ipc/PipeServer.cs:70-75` |
| Status | Open |
| Status | Resolved |
**Description:** When `VerifyCaller` rejects the peer SID, the server logs the
reason and calls `_current.Disconnect()` with no `HelloAck` frame sent. The
@@ -198,7 +198,7 @@ harder to test from the client.
`caller-sid-mismatch` reject reason before disconnecting, consistent with the
other two rejection paths.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the SID rejection path now writes a `HelloAck { Accepted=false, RejectReason="caller-sid-mismatch: ..." }` before disconnecting, symmetric with the shared-secret-mismatch and major-version-mismatch paths. The caller-verification function was also extracted into a `CallerVerifier` delegate so tests can override it (the pipe ACL would otherwise block the test client itself). End-to-end regression `PipeServerSidRejectTests.Caller_SID_mismatch_sends_HelloAck_with_reject_reason_before_disconnect` connects a real named-pipe client and asserts the rejecting ack frame arrives.
### Driver.Historian.Wonderware-008
@@ -207,7 +207,7 @@ other two rejection paths.
| Severity | Low |
| Category | Error handling and resilience |
| Location | `Backend/HistorianDataSource.cs:301-307`, `:374-380` |
| Status | Open |
| Status | Resolved |
**Description:** When `query.StartQuery` returns `false`, `ReadRawAsync` and
`ReadAggregateAsync` call `HandleConnectionError()` and return an empty result
@@ -226,7 +226,7 @@ connection intact, surface the error). Consider returning a failed reply
(`Success = false`) for query-class `StartQuery` failures so the client does not
treat an SDK error as an empty history.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — extracted a static `ConnectionErrorCodes` set + `IsConnectionClassError` classifier (mirroring the alarm-write side) and centralised the failure handling in a new `HandleStartQueryFailure` helper. Connection-class codes still drop the connection and mark the node failed; query-class codes throw a new `QueryClassStartQueryException` that the outer catch re-throws WITHOUT touching the connection. All four read paths (raw / aggregate / at-time / events) also re-throw caught exceptions so the IPC frame handler surfaces `Success=false` instead of returning an empty list with `Success=true`. Regression tests `HistorianDataSourceStartQueryClassificationTests` pin the connection-class vs query-class classification per error code; the connect-failover suite (`HistorianDataSourceConnectFailoverTests`) verifies the read paths now propagate the exception.
### Driver.Historian.Wonderware-009
@@ -261,7 +261,7 @@ cap with an explicit error reply rather than letting `WriteAsync` throw.
| Severity | Low |
| Category | Performance and resource management |
| Location | `Backend/HistorianConfiguration.cs:32-36`, `Backend/HistorianDataSource.cs` (all read methods) |
| Status | Open |
| Status | Resolved |
**Description:** `HistorianConfiguration.RequestTimeoutSeconds` is documented as
the "outer safety timeout applied to sync-over-async Historian operations" and is
@@ -278,7 +278,7 @@ timeout, but the query path does not). The documented safety net does not exist.
worker with a bounded wait), or remove the property and its XML doc so the code
does not advertise a guarantee it does not provide.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added an internal `BuildRequestCts` helper that returns a `CancellationTokenSource` linked into the caller's `ct` with `CancelAfter(RequestTimeoutSeconds)` applied when positive. Each read method (`ReadRawAsync`, `ReadAggregateAsync`, `ReadAtTimeAsync`, `ReadEventsAsync`) now wraps its work with the linked CTS and feeds the linked token into the `ThrowIfCancellationRequested` checks between `MoveNext` iterations, so a hung SDK call cancels at the configured deadline instead of blocking the connection thread indefinitely. Regression tests `HistorianDataSourceRequestTimeoutTests` pin the helper: positive value enforces `CancelAfter`, zero/negative means no timeout, caller cancellation propagates, default is 60s.
### Driver.Historian.Wonderware-011
@@ -287,7 +287,7 @@ does not advertise a guarantee it does not provide.
| Severity | Low |
| Category | Design-document adherence |
| Location | `Backend/HistorianDataSource.cs:9-12`, `Backend/IHistorianDataSource.cs:9-11`, `Backend/HistorianSample.cs:7-9`, `Backend/HistorianConfiguration.cs:7-9` |
| Status | Open |
| Status | Resolved |
**Description:** Several XML doc comments reference the retired v1 architecture as
if it were current: "inside Galaxy.Host", "the Proxy maps returned samples", "the
@@ -303,7 +303,7 @@ review checklist.
architecture (sidecar talking to `WonderwareHistorianClient` over the named pipe),
dropping the `Galaxy.Host` / `Proxy` / `GalaxyDataValue` references.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — refreshed the XML doc comments on `HistorianDataSource`, `IHistorianDataSource`, `HistorianSample` / `HistorianAggregateSample`, and `HistorianConfiguration` to describe the current sidecar / named-pipe / .NET 10 `WonderwareHistorianClient` architecture. References to `Galaxy.Host` / `Galaxy.Proxy` / `GalaxyDataValue` are now framed as historical context tied to the PR 7.2 retirement rather than as current behaviour.
### Driver.Historian.Wonderware-012
@@ -312,7 +312,7 @@ dropping the `Galaxy.Host` / `Proxy` / `GalaxyDataValue` references.
| Severity | Low |
| Category | Testing coverage |
| Location | `Backend/HistorianDataSource.cs`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` |
| Status | Open |
| Status | Resolved |
**Description:** The unit-test suite covers `HistorianQualityMapper`,
`HistorianClusterEndpointPicker`, `SdkAlarmHistorianWriteBackend`,
@@ -334,4 +334,4 @@ removed to avoid confusion.
cancellation, and the value-type selection — and delete the stale empty
`tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` directory.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added four new `HistorianDataSource`-targeted test files: `HistorianDataSourceHealthSnapshotTests` (snapshot consistency under half-published state, see also -005), `HistorianDataSourceStartQueryClassificationTests` (connection-class vs query-class error-code table, see also -008), `HistorianDataSourceRequestTimeoutTests` (the request-timeout helper, see also -010), `HistorianDataSourceConnectFailoverTests` (cluster failover order + cooldown via the `IHistorianConnectionFactory` fake), and `HistorianDataSourceValueAndAggregateTests` (the string-vs-numeric heuristic via the new SDK-independent `SelectValueFromPair` overload + the `ExtractAggregateValue` column dispatch). Stale empty `tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` directory deleted. Unit count rose from 80 to 125 (+45 new tests).
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 3 |
| Open findings | 0 |
## Checklist coverage
@@ -157,7 +157,7 @@ overwrite it.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `ModbusAddressParser.cs:297-301` |
| Status | Open |
| Status | Resolved |
**Description:** `TryParseFamilyNative` catches only `ArgumentException` and `OverflowException`.
The current helpers throw only those (including `ArgumentOutOfRangeException`, which derives from
@@ -171,7 +171,13 @@ depend on.
narrow catch, or broaden to a general catch-all that records the message — a try-parse method
should never throw.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — broadened the `catch` filter in
`ModbusAddressParser.TryParseFamilyNative` from `ArgumentException or OverflowException` to a
general `catch (Exception ex)` so any future helper exception type is converted to a structured
`(false, error)` rather than escaping the `TryParse` method. Added `DL205_TryParse_NeverThrows`
and `MELSEC_TryParse_NeverThrows` parameterised regression tests in
`ModbusAddressEdgeCaseTests` covering ~20 pathological inputs (empty prefixes, octal/hex digit
violations, overflow inputs, unknown prefixes) to pin the defensive contract.
### Driver.Modbus.Addressing-007
@@ -180,7 +186,7 @@ should never throw.
| Severity | Low |
| Category | Design-document adherence |
| Location | `ModbusDataType.cs:91-95`, `docs/v2/dl205.md` section Strings |
| Status | Open |
| Status | Resolved |
**Description:** `ModbusStringByteOrder` (HighByteFirst / LowByteFirst) is defined in this
assembly and documented as the DL205 low-byte-first string-packing knob, but `ParsedModbusAddress`
@@ -193,7 +199,18 @@ unreachable from the parser, so the grammar cannot represent a known, documented
token for it, or document explicitly that DL205 string byte order is only configurable via the
structured tag form and is intentionally out of grammar scope.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — chose the "document the limitation" branch of the
recommendation rather than adding a grammar token: the 3rd field slot is the multi-register
word/byte order and the 4th is the array count, so a 5th `:<order>` suffix would conflict with
the existing count-shape disambiguation; `ModbusStringByteOrder` is already plumbed through the
structured tag form (`ModbusDriverFactoryExtensions.ModbusTagDto.StringByteOrder`
`ModbusTagDefinition.StringByteOrder`) which is the canonical config path. Added an explicit
"Grammar scope" remarks block to `ModbusStringByteOrder` and to the `ModbusAddressParser`
`<remarks>` block stating that string byte order is configurable only via the structured tag
form. Added a corresponding bullet to `docs/v2/dl205.md` §Strings. Added two regression tests
(`Parser_STR_grammar_does_not_carry_StringByteOrder` reflecting on `ParsedModbusAddress`, and
`Parser_rejects_unknown_string_byte_order_token_in_grammar`) pinning the contract so a future
grammar change can't quietly add a conflicting token.
### Driver.Modbus.Addressing-008
@@ -226,7 +243,7 @@ finding -001.
| Severity | Low |
| Category | Documentation & comments |
| Location | `ModbusModiconAddress.cs:55-64`, `ModbusModiconAddress.cs:104-110` |
| Status | Open |
| Status | Resolved |
**Description:** The comments on `ModbusModiconAddress.TryParse` are slightly inaccurate. The
remark that 5-digit Modicon is always exactly 5 chars (40001..49999) and 6-digit is exactly 6
@@ -238,4 +255,11 @@ says the 5-digit form caps at 9999 by construction while the adjacent code path
**Recommendation:** Reword the range examples to cover all four region digits and drop the
caps-at-9999 aside or restate it as a precise statement about trailing-digit count.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — reworded the up-front range-check comment to describe all
four region digits (0/1/3/4) and give examples covering each region (coils 00001..09999 /
000001..065536, holding registers 40001..49999 / 400001..465536). Reworded the lower
`> 65536` comment to drop the misleading "5-digit form caps at 9999 by construction" framing and
state precisely that the check is reached only by the 6-digit form in practice, but applied to
both for safety rather than relying on the digit-count invariant. Pure documentation change —
no behavioural change; the existing `ModbusModiconAddressTests` already pin the cross-region
5-digit ranges (00001..09999 / 10001..19999 / 30001..39999 / 40001..49999).
+59 -13
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -87,7 +87,7 @@ message explaining coils carry a single bit.
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/ModbusCommandBase.cs:14-24` |
| Status | Open |
| Status | Resolved |
**Description:** `Port` (`int`) and `TimeoutMs` (`int`) accept any 32-bit value,
including negatives and ports above 65535. `UnitId` is a `byte`, so it accepts
@@ -103,7 +103,13 @@ error. None of these are validated at parse time.
message — consistent with how `WriteCommand` already rejects bad regions and
boolean strings.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `ModbusCommandBase.ValidateEndpoint()`
that throws `CliFx.Exceptions.CommandException` for `--port` outside 1..65535,
non-positive `--timeout-ms`, and `--unit-id` outside 1..247 (Modbus spec unicast
range — rejects the broadcast 0 and reserved 248-255). Each of `ProbeCommand`,
`ReadCommand`, `WriteCommand`, and `SubscribeCommand` now calls it at the top of
`ExecuteAsync` after `ConfigureLogging()`. Regression tests live in
`ModbusCommandBaseTests` (range + boundary cases for all three knobs).
### Driver.Modbus.Cli-004
@@ -113,7 +119,7 @@ boolean strings.
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:61-67` |
| Status | Open |
| Status | Resolved |
**Description:** The `OnDataChange` handler is invoked from the driver's
`PollGroupEngine` background thread and calls `console.Output.WriteLine`
@@ -127,7 +133,11 @@ any synchronization, so overlapping poll ticks could interleave partial lines.
write failures so a transient console-write error cannot tear down the poll loop.
A single `lock` around the write also removes the interleave risk.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — wrapped the `OnDataChange` handler in
`try/catch (Exception)` that logs to Serilog at Warning and swallows the failure
so a transient console-write error cannot fault the `PollGroupEngine` background
loop. The console write is also serialized through a local `lock` object, removing
the partial-line interleave risk when multiple poll ticks overlap.
### Driver.Modbus.Cli-005
@@ -136,7 +146,7 @@ A single `lock` around the write also removes the interleave risk.
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:21-54`; `Commands/ReadCommand.cs:46-75`; `Commands/WriteCommand.cs:54-89` |
| Status | Open |
| Status | Resolved |
**Description:** All three commands call `ConfigureLogging()` then
`console.RegisterCancellationHandler()`, but if the operator presses Ctrl+C
@@ -152,7 +162,14 @@ commands do not catch it around their driver calls.
the noisy trace on Ctrl+C-during-connect is acceptable. Consistency with
`SubscribeCommand`'s handling is the cleaner choice.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added a
`catch (OperationCanceledException) when (ct.IsCancellationRequested)` clause
around the driver-call bodies in `ProbeCommand`, `ReadCommand`, and `WriteCommand`
that prints `Cancelled.` and falls through to the existing `finally`-block
shutdown. Matches `SubscribeCommand`'s existing handling so all four commands now
exit quietly on Ctrl+C. Regression tests in `CommandCancellationTests` pre-cancel
the CliFx `FakeInMemoryConsole` before calling `ExecuteAsync` and assert no
exception escapes.
### Driver.Modbus.Cli-006
@@ -161,7 +178,7 @@ the noisy trace on Ctrl+C-during-connect is acceptable. Consistency with
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:35-53` |
| Status | Open |
| Status | Resolved |
**Description:** `probe` reports `Health: {health.State}` from `GetHealth()`.
After a successful `InitializeAsync` the driver sets state to `Healthy`
@@ -179,7 +196,14 @@ snapshot's `StatusCode` (Good vs Bad) rather than — or in addition to — the
`State`, or print a single combined verdict line so the two cannot contradict each
other.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `ProbeCommand` now prints a single combined
`Verdict:` line above the bare driver-state line; the headline is computed by a
new `ProbeCommand.ComputeVerdict(DriverState, statusCode)` helper that combines
the driver state with the probe snapshot's OPC UA quality class (top 2 bits of
the StatusCode). Verdict is `FAIL` whenever the driver is not Healthy OR the
snapshot is Bad, `DEGRADED` when the driver is Healthy but the snapshot is
Uncertain, and `OK` only when both are Good. Regression tests in
`ProbeCommandTests` pin the verdict-grid behaviour.
### Driver.Modbus.Cli-007
@@ -188,7 +212,7 @@ other.
| Severity | Low |
| Category | Design-document adherence |
| Location | `docs/Driver.Modbus.Cli.md:124-156`; `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ReadCommand.cs` |
| Status | Open |
| Status | Resolved |
**Description:** `docs/Driver.Modbus.Cli.md` devotes a whole "v2 addressing
grammar" section to the industry-standard tag-address strings (`40001:F:CDAB`,
@@ -205,7 +229,14 @@ driver's address-string parser (and `--family` for the DL205/MELSEC native
syntax), or scope the "v2 addressing grammar" section of the doc to note it
applies to `DriverConfig` JSON and is not a CLI flag.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — chose the doc-scoping fix (an
`--address-string` flag is a bigger feature that warrants its own design
discussion). Added a "CLI scope" callout to the top of the
`docs/Driver.Modbus.Cli.md` "v2 addressing grammar" section stating the CLI
accepts only the structured `--region` + `--address` + `--type` triple and that
the address-string grammar is a `DriverConfig` JSON feature. The pre-existing
closing paragraph already said the same thing; the new callout makes it visible
before the grammar examples instead of after them. Code surface left unchanged.
### Driver.Modbus.Cli-008
@@ -214,7 +245,7 @@ applies to `DriverConfig` JSON and is not a CLI flag.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests/` |
| Status | Open |
| Status | Resolved |
**Description:** The test project covers only the two pure-function seams:
`ReadCommand.SynthesiseTagName` and `WriteCommand.ParseValue`. There is no coverage
@@ -231,4 +262,19 @@ likely to regress and is currently untested. The validation gaps in findings
setters and assert the produced `ModbusDriverOptions`). Once findings 002/003 are
fixed, add tests for the new validation paths.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added four new test classes covering the
previously untested branch logic:
- `ModbusCommandBaseTests` — six tests over `BuildOptions` (probe disabled,
`TimeoutMs``Timeout` mapping, `AutoReconnect` tracking
`--disable-reconnect` in both directions, host/port/unit flow-through, tag
pass-through) plus the new `ValidateEndpoint` range-check tests for finding
003 (port 1..65535, timeout-ms > 0, unit-id 1..247).
- `WriteCommandRegionValidationTests` — read-only region rejection
(DiscreteInputs / InputRegisters) and the Coils-non-Bool guard for finding
002.
- `ProbeCommandTests` — the new `ComputeVerdict` helper for finding 006
(OK / DEGRADED / FAIL grid).
- `CommandCancellationTests` — Ctrl+C-during-initialize for `ProbeCommand` /
`ReadCommand` / `WriteCommand` for finding 005.
Total test count grew from 18 to 64; all pass.
+15 -15
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
| Open findings | 0 |
## Checklist coverage
@@ -63,13 +63,13 @@
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `ModbusDriver.cs:59,188,241,259,266,726,745,759` |
| Status | Open |
| Status | Resolved |
**Description:** `_health` is a non-`volatile` reference field written from multiple threads (concurrent `ReadAsync` callers, the coalesced-read path, `WriteAsync` indirectly, and `ProbeLoopAsync`) and read by `GetHealth()`. Reference assignment is atomic on .NET so a torn read cannot occur, but there is no happens-before ordering: a stale `DriverHealth` can be observed on another core, and concurrent writers race so "last write wins" is non-deterministic (a `Degraded` write from a failed read can clobber a just-published `Healthy`, or vice versa).
**Recommendation:** Mark `_health` `volatile`, or assign via `Volatile.Write` and read with `Volatile.Read`, to give `GetHealth()` a defined ordering guarantee.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — routed every `_health` access through new `ReadHealth()` (`Volatile.Read`) and `WriteHealth()` (`Volatile.Write`) helpers, giving `GetHealth()` a defined ordering guarantee on every core. Stress-test (`ModbusLifecycleHygieneTests.GetHealth_under_concurrent_pressure_always_returns_a_complete_snapshot`) confirms the read path never sees a torn / half-constructed snapshot under concurrent reader + writer pressure.
### Driver.Modbus-004
@@ -123,13 +123,13 @@
| Severity | Low |
| Category | Design-document adherence |
| Location | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` |
| Status | Open |
| Status | Resolved |
**Description:** Two design-vs-code drifts. (1) `MapDataType` maps `Int64`/`UInt64` to `DriverDataType.Int32` with the inline comment "widening to Int32 loses precision; PR 25 adds Int64 to DriverDataType". The address-space node for a 64-bit Modbus tag is declared `Int32`, misrepresenting the OPC UA variable's `DataType` even though `DecodeRegister` produces a correct `long`/`ulong` value — clients see a type/value mismatch. (2) `DisableFC23` is documented and bound from JSON but is a confirmed no-op ("The driver does not currently emit FC23"). Both are acknowledged-but-unfinished items worth tracking.
**Recommendation:** Track the PR 25 `DriverDataType.Int64` follow-up; until then document the Int32 surfacing limitation in `docs/v2/modbus-addressing.md` so operators configuring `I_64`/`UI_64` tags understand the node type. Mark `DisableFC23` clearly as reserved/unimplemented or gate it once FC23 ships.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — promoted the inline Int64/UInt64 caveat into a full `<remarks>` block on `MapDataType` calling out the surfacing limitation and the tracked follow-up, and rewrote the `DisableFC23` XML doc to flag the option as "Reserved / no-op" with a Driver.Modbus-007 tracking reference. (The cross-module doc update in `docs/v2/modbus-addressing.md` is out of scope for this module's edits — code is now self-documenting.)
### Driver.Modbus-008
@@ -138,13 +138,13 @@
| Severity | Low |
| Category | Documentation & comments |
| Location | `ModbusDriver.cs:411-417,700-703,737-744` |
| Status | Open |
| Status | Resolved |
**Description:** Stale/misleading comments. (1) The `<summary>` block at `ModbusDriver.cs:411-417` says auto-prohibited ranges are "Cleared by ReinitializeAsync ... or by an explicit re-probe API (not yet shipped)" — the re-probe loop has shipped (#151, `ReprobeLoopAsync`), so the parenthetical is wrong. (2) The comment at `ModbusDriver.cs:700-703` ("On block-level failure mark every member Bad — caller's per-tag fallback won't re-try since handled-set already includes them; auto-split-on-failure is a follow-up") contradicts the actual `catch (ModbusException)` arm below it, which deliberately does not add members to `handled` and does defer to per-tag fallback (and auto-split has shipped via bisection). The empty `foreach (var (idx, _) in block.Members) { }` loop at `ModbusDriver.cs:737-744`, with only a comment body, is dead code from that superseded design.
**Recommendation:** Update the two comments to match the shipped #148/#150/#151 behaviour and delete the empty `foreach` loop in the `catch (ModbusException)` arm.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — deleted the duplicate `<summary>` orphaned at the top of the prohibition block, rewrote the surviving one to credit the shipped #151 re-probe loop, replaced the "auto-split-on-failure is a follow-up" comment above the block loop with the actual #148/#150 behaviour (per-tag fallback + bisection), and removed the empty `foreach (var (idx, _) in block.Members) { }` plus its unused `status` local from the `catch (ModbusException)` arm.
### Driver.Modbus-009
@@ -153,13 +153,13 @@
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `ModbusDriver.cs:1160-1167`, `ModbusTcpTransport.cs:94-95` |
| Status | Open |
| Status | Resolved |
**Description:** Two edge cases. (1) `RegisterCount` for `ModbusDataType.String` computes `(tag.StringLength + 1) / 2`; a tag configured with `StringLength = 0` yields a register count of 0, flowing into `ReadOneAsync` as `totalRegs = 0` and producing an FC03/FC04 with quantity 0 — a spec-illegal request the PLC rejects with exception 03. The factory does not reject `StringLength = 0` for String tags. (2) `EnableKeepAlive` casts `opts.Time.TotalSeconds`/`opts.Interval.TotalSeconds` to `int`; a sub-second configured `TimeSpan` (e.g. 500 ms) truncates to 0, which most OSes reject or interpret as "use default", silently defeating the configured keep-alive timing.
**Recommendation:** Validate `StringLength >= 1` for `String` tags in `ModbusDriverFactoryExtensions.BuildTag`. For keep-alive, round up to a minimum of 1 second or validate the configured `TimeSpan` is a whole number of seconds.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `ValidateStringLength` in `ModbusDriverFactoryExtensions.BuildTag` so String-typed tags with `StringLength < 1` throw at bind time with a clear diagnostic (both AddressString + structured DTO paths), and introduced `ModbusTcpTransport.ClampToWholeSeconds` which rounds the configured keep-alive `TimeSpan` up to a minimum of 1 second so sub-second values no longer truncate to 0. Regression coverage in `ModbusEdgeCaseValidationTests`.
### Driver.Modbus-010
@@ -168,13 +168,13 @@
| Severity | Low |
| Category | Error handling & resilience |
| Location | `ModbusDriver.cs:864-868`, `ModbusDriverOptions.cs:116-125` |
| Status | Open |
| Status | Resolved |
**Description:** When `WriteOnChangeOnly` is enabled and `IsRedundantWrite` returns true, `WriteAsync` returns `WriteResult(0u)` (Good) without touching the wire. The suppression baseline (`_lastWrittenByRef`) is only invalidated by a *read* that returns a divergent value. If a driver instance has `WriteOnChangeOnly = true` but a tag is never subscribed/read (write-only setpoint), a value the operator believes was re-asserted is silently suppressed forever after the first write — no time- or count-based expiry exists. The option XML doc describes the read-invalidation path but does not warn about write-only tags.
**Recommendation:** Document the write-only-tag caveat on the `WriteOnChangeOnly` option, or add an optional TTL to the suppression cache so a periodic re-write still reaches the PLC.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added a `<remarks>` block on `ModbusDriverOptions.WriteOnChangeOnly` that calls out the write-only-tag caveat explicitly: the cache is only invalidated by reads, so a tag that is never subscribed/polled stays suppressed forever after the first write. Operators choosing this option are directed to either subscribe every affected tag or leave `WriteOnChangeOnly = false`. Adding a TTL was considered but the safer option for an OPC UA driver is to make the limitation discoverable in the documentation surface (no behaviour change for existing deployments).
### Driver.Modbus-011
@@ -183,13 +183,13 @@
| Severity | Low |
| Category | Code organization & conventions |
| Location | `ModbusDriver.cs:23-43,89-97,408-432` |
| Status | Open |
| Status | Resolved |
**Description:** Field and member declarations are interleaved with methods throughout `ModbusDriver`. `ResolveHost` (a public method) is the first member of the class, followed by `BuildSlaveHostName`, then a block of fields; `_lastPublishedByRef`/`_lastWrittenByRef` are declared after the constructor; `ProhibitionState`, `_autoProhibited`, and `_reprobeCts` are declared mid-file between `DecodeRegisterArray` and `RangeIsAutoProhibited`. There are also two near-identical `<summary>` blocks stacked back-to-back at `ModbusDriver.cs:411-423`. This hurts readability of a 1400-line file and makes the field inventory hard to audit (relevant to the thread-safety findings above).
**Recommendation:** Group all instance fields at the top of the class, move nested types together, and remove the orphaned first `<summary>` at lines 411-417 that no longer precedes a member.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — reorganized `ModbusDriver` so every instance field (including the `_autoProhibited` / `_autoProhibitedLock` / `_reprobeCts` / `_rmwLocks` / `_lastPublishedByRef` / `_lastWrittenByRef` fields that were declared mid-file) lives in a single contiguous block at the top of the class, followed by the `ProhibitionState` nested type, the constructor, and then methods. Removed the duplicate orphan `<summary>` and the now-redundant field declarations that had been scattered through the file. The full 263-test suite passes with no behavioural change.
### Driver.Modbus-012
@@ -198,10 +198,10 @@
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/` |
| Status | Open |
| Status | Resolved |
**Description:** The unit suite is broad (coalescing, bisection, auto-recovery, byte order, arrays, BCD, RMW, caps, multi-unit, probe, reconnect, subscription). Gaps relative to the findings above: (1) no test exercises concurrent multi-subscription publishing, so the `_lastPublishedByRef` race (Driver.Modbus-001) is uncaught; (2) no test covers `ReinitializeAsync` state hygiene for stale `_tagsByName`/caches (Driver.Modbus-002); (3) no test feeds a malformed/short response PDU through `ReadRegisterBlockAsync`/`DecodeBitArray` to confirm a clean `BadCommunicationError` rather than an index-range crash (Driver.Modbus-005); (4) no test asserts `DisposeAsync` (vs `ShutdownAsync`) tears down the probe/re-probe loops and `_poll` (Driver.Modbus-004).
**Recommendation:** Add unit tests for concurrent deadband publishing across two subscriptions, `ReinitializeAsync` state hygiene, malformed-response handling in the register/bit block readers, and `DisposeAsync` loop teardown.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — gap (1) was already covered by `ModbusSubscriptionTests.Concurrent_deadband_subscriptions_do_not_corrupt_the_publish_cache` from the Driver.Modbus-001 fix. Added the remaining three in a new `ModbusLifecycleHygieneTests` file: `Reinitialize_clears_stale_tagsByName_entries` + `Reinitialize_clears_lastPublished_and_lastWritten_caches` (gap 2), `Short_response_PDU_surfaces_as_BadCommunicationError_not_an_IndexOutOfRangeException` + `Response_payload_truncated_below_declared_byteCount_surfaces_as_BadCommunicationError` + `DecodeBitArray_rejects_an_empty_bitmap_with_InvalidDataException` (gap 3), `DisposeAsync_without_explicit_Shutdown_tears_down_probe_loop_and_transport` + `DisposeAsync_disposes_the_pollEngine_so_subscriptions_stop` (gap 4). All 12 new tests pass (full suite: 263/263 green).
+9 -9
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 2 |
| Open findings | 0 |
## Checklist coverage
@@ -182,14 +182,14 @@
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `OpcUaClientDriver.cs:783-784` |
| Status | Open |
| Location | `OpcUaClientDriver.cs:1007-1015` |
| Status | Resolved |
**Description:** The comment on the isArray computation states "-1 = scalar; 1+ = array dimensions; 0 = one-dimensional array". This is inaccurate against OPC UA ValueRank semantics: -3 is ScalarOrOneDimension, -2 is Any, -1 is Scalar, and 0 is OneOrMoreDimensions (not specifically one-dimensional). The code `valueRank >= 0` treats -2 (Any) and -3 (ScalarOrOneDimension) as scalar, which is a defensible default, but the comment misdescribes the constants and would mislead a maintainer.
**Description:** The comment on the isArray computation stated "-1 = scalar; 1+ = array dimensions; 0 = one-dimensional array". This is inaccurate against OPC UA ValueRank semantics: -3 is ScalarOrOneDimension, -2 is Any, -1 is Scalar, and 0 is OneOrMoreDimensions (not specifically one-dimensional). The code `valueRank >= 0` treats -2 (Any) and -3 (ScalarOrOneDimension) as scalar, which is a defensible default, but the comment misdescribed the constants and would mislead a maintainer.
**Recommendation:** Correct the comment to the actual ValueRank constants (-3 ScalarOrOneDimension, -2 Any, -1 Scalar, 0 OneOrMoreDimensions, 1 OneDimension, >1 multi-dim) and state the deliberate choice that anything >= 0 is treated as an array.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `EnrichAndRegisterVariablesAsync` now carries the correct OPC UA Part 3 ValueRank legend (`-3 ScalarOrOneDimension`, `-2 Any`, `-1 Scalar`, `0 OneOrMoreDimensions`, `1 OneDimension`, `>1` specific N-dimensions) and explicitly states the deliberate choice that anything `>= 0` is treated as an array, with `-3`/`-2` conservatively folded into the scalar bucket. Regression tests `ValueRank_constants_have_the_OPCUA_Part3_spec_values` (anchors the SDK constants) and `IsArray_decision_matches_valueRank_greater_or_equal_zero` (theory across -3..2) pin the logic in `OpcUaClientLowFindingsRegressionTests.cs`.
### Driver.OpcUaClient-012
@@ -227,14 +227,14 @@
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `OpcUaClientDriver.cs:904`, `:1035` |
| Status | Open |
| Location | `OpcUaClientDriver.cs:1138`, `:1314` |
| Status | Resolved |
**Description:** `MonitoredItem.Notification += (mi, args) => ...` (and the alarm-event equivalent) attaches a closure-capturing lambda to each monitored item's event. The lambda is never detached. When UnsubscribeAsync removes a subscription it calls Subscription.DeleteAsync but does not clear the MonitoredItem.Notification handlers; if the SDK retains the MonitoredItem/Subscription graph anywhere (the session keeps a reference until its own disposal, or during transfer-on-reconnect), the driver instance is kept alive by the closure longer than necessary.
**Description:** `MonitoredItem.Notification += (mi, args) => ...` (and the alarm-event equivalent) attached a closure-capturing lambda to each monitored item's event. The lambda was never detached. When UnsubscribeAsync removed a subscription it called Subscription.DeleteAsync but did not clear the MonitoredItem.Notification handlers; if the SDK retains the MonitoredItem/Subscription graph anywhere (the session keeps a reference until its own disposal, or during transfer-on-reconnect), the driver instance was kept alive by the closure longer than necessary.
**Recommendation:** Detach the Notification handlers when deleting a subscription, or hold the handler delegate so it can be explicitly removed in UnsubscribeAsync/ShutdownAsync.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `SubscribeAsync` now stores each `(MonitoredItem, MonitoredItemNotificationEventHandler)` pair in a new `MonitoredItemNotificationHandle` record carried inside `RemoteSubscription`. `SubscribeAlarmsAsync` similarly stores the event-MonitoredItem and its handler delegate on `RemoteAlarmSubscription`. `UnsubscribeAsync`, `UnsubscribeAlarmsAsync`, and the subscription-teardown loops in `ShutdownAsync` now invoke `DetachNotificationHandlers` (or the alarm-equivalent inline `Notification -= rs.Handler`) BEFORE calling `Subscription.DeleteAsync`, so the SDK's invocation list no longer pins the driver through the captured lambda. Reflection-based regression tests `RemoteSubscription_record_carries_handler_delegates_so_they_can_be_detached` and `RemoteAlarmSubscription_record_carries_handler_delegate_so_it_can_be_detached` pin the contract that the handler reference is reachable from the bookkeeping record (`OpcUaClientLowFindingsRegressionTests.cs`).
### Driver.OpcUaClient-015
+9 -9
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 4 |
| Open findings | 0 |
## Checklist coverage
@@ -120,7 +120,7 @@ unreachable device, not crash on it.
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:36,53`, `Commands/ReadCommand.cs:45,54`, `Commands/WriteCommand.cs:51,60`, `Commands/SubscribeCommand.cs:39,73` |
| Status | Open |
| Status | Resolved |
**Description:** Every command declares the driver with `await using var driver = new
S7Driver(...)` and *also* calls `await driver.ShutdownAsync(...)` in a `finally` block.
@@ -136,7 +136,7 @@ not actually disposing.
`subscribe` command `UnsubscribeAsync`. Alternatively drop `await using`
and keep the explicit `finally`. Pick one disposal mechanism per command.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — dropped the explicit `await driver.ShutdownAsync(CancellationToken.None)` calls from the `finally` blocks of `ProbeCommand`, `ReadCommand`, `WriteCommand`, and `SubscribeCommand`; `await using` is now the sole driver-disposal mechanism per command (DisposeAsync itself runs ShutdownAsync), and the subscribe command keeps `UnsubscribeAsync` in its finally because that is a subscription-lifecycle concern, not driver disposal. Added `CommandDisposalConventionsTests` to guard the source-level convention against regression.
### Driver.S7.Cli-005
@@ -145,7 +145,7 @@ and keep the explicit `finally`. Pick one disposal mechanism per command.
| Severity | Low |
| Category | Code organization & conventions |
| Location | `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` |
| Status | Open |
| Status | Resolved |
**Description:** A stale directory `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`
exists containing only an `obj/` folder — no `.csproj`, no source. The real test
@@ -158,7 +158,7 @@ grepping the tree for the S7 CLI test project.
directory (including its `obj/`). This is outside the module `src/` tree but is the
S7 CLI own orphaned test folder, so it belongs to this module cleanup.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — deleted the stale `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` directory (only contained a leftover `obj/` from before the move into `tests/Drivers/Cli/`; no tracked files). The real test project at `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` is untouched.
### Driver.S7.Cli-006
@@ -167,7 +167,7 @@ S7 CLI own orphaned test folder, so it belongs to this module cleanup.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/WriteCommandParseValueTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The only test file covers `WriteCommand.ParseValue` and
`ReadCommand.SynthesiseTagName`. `S7CommandBase.BuildOptions` — which maps the
@@ -184,7 +184,7 @@ not be caught. `ParseValue` is also missing an explicit overflow-edge test (e.g.
overflow case to the `ParseValue` numeric tests once Driver.S7.Cli-001 is resolved so
the test asserts the wrapped `CommandException`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `S7CommandBaseBuildOptionsTests` covering Probe.Enabled=false (one-shot CLI guarantee), TimeoutMs→Timeout TimeSpan mapping, host/port/CPU/rack/slot flowing through, and tag-list passthrough. The overflow-edge `ParseValue` test was already added under Driver.S7.Cli-001 (`ParseValue_overflow_for_numeric_types_throws_CommandException`).
### Driver.S7.Cli-007
@@ -193,7 +193,7 @@ the test asserts the wrapped `CommandException`.
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/SubscribeCommand.cs:45-51` |
| Status | Open |
| Status | Resolved |
**Description:** The Modbus CLI `SubscribeCommand` carries an explanatory comment on
the `OnDataChange` handler ("Route every data-change event to the CliFx console (not
@@ -206,4 +206,4 @@ Minor, but the rationale is worth keeping consistent across the CLI family.
**Recommendation:** Re-add the one-line comment from the Modbus `SubscribeCommand` so
the S7 copy explains why the event handler writes via `console.Output` synchronously.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — re-added the explanatory comment above the `OnDataChange` handler in the S7 `SubscribeCommand`, mirroring the Modbus copy: it explains the use of the CliFx `IConsole.Output` abstraction (rather than `System.Console`) and notes that the handler runs synchronously because it's raised from a driver background thread. Added `SubscribeCommandConsoleHandlerCommentTests` to guard the rationale against future copy-paste regressions.
+66 -15
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
| Open findings | 0 |
## Checklist coverage
@@ -40,7 +40,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `TwinCATCommandBase.cs:23-24`, `Commands/SubscribeCommand.cs:23-24`, `Commands/BrowseCommand.cs:21-24` |
| Status | Open |
| Status | Resolved |
**Description:** Numeric command options are accepted without range validation. `--timeout-ms`
feeds `Timeout => TimeSpan.FromMilliseconds(TimeoutMs)`; passing `--timeout-ms 0` or a negative
@@ -56,7 +56,16 @@ failure mode should be a readable up-front rejection.
shared helper on `TwinCATCommandBase`) and throw `CliFx.Exceptions.CommandException` with a
clear message when `TimeoutMs <= 0`, `IntervalMs <= 0`, or `AmsPort` falls outside `1..65535`.
**Resolution:** _(open)_
**Resolution:** Added a virtual `Validate()` helper on `TwinCATCommandBase` that rejects
`TimeoutMs <= 0` and `AmsPort` outside `1..65535` with a clean
`CliFx.Exceptions.CommandException` carrying the offending value. `SubscribeCommand` overrides
`Validate()` to add the `IntervalMs > 0` check. Each `ExecuteAsync` calls `Validate()` first
so the CLI surfaces "bad argument" up-front instead of letting the driver fail with an opaque
transport error. Covered by `TwinCATCommandBaseTests.Validate_rejects_zero_timeout`,
`Validate_rejects_negative_timeout`, `Validate_rejects_out_of_range_ams_port` (theory, 4
cases), `Validate_accepts_in_range_ams_port` (theory, 4 cases),
`SubscribeCommand_validate_rejects_zero_interval`,
`SubscribeCommand_validate_rejects_negative_interval`.
### Driver.TwinCAT.Cli-002
@@ -65,7 +74,7 @@ clear message when `TimeoutMs <= 0`, `IntervalMs <= 0`, or `AmsPort` falls outsi
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:46-58` |
| Status | Open |
| Status | Resolved |
**Description:** The `OnDataChange` handler calls `console.Output.WriteLine(line)` synchronously.
In native ADS-notification mode the event is raised from the `Beckhoff.TwinCAT.Ads`
@@ -83,7 +92,12 @@ serialised on one poll loop; the TwinCAT native path has no such serialisation.
`TextWriter.Synchronized`. At minimum, gate it so the banner is written before the
subscription is registered (it already is) and lock the per-event writes against each other.
**Resolution:** _(open)_
**Resolution:** Introduced a per-execution `writeLock` object inside
`SubscribeCommand.ExecuteAsync`. Both the `OnDataChange` handler's `WriteLine` and the
post-subscription "Subscribed to ..." banner take the lock, so notification-callback writes
cannot interleave with the main-thread banner or with each other. Lock is local to the
command so parallel process instances do not contend with one another (each owns its own
console).
### Driver.TwinCAT.Cli-003
@@ -92,7 +106,7 @@ subscription is registered (it already is) and lock the per-event writes against
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/SubscribeCommand.cs:56-58` |
| Status | Open |
| Status | Resolved |
**Description:** The subscribe banner reports the mechanism purely from the `--poll-only` flag
(`var mode = PollOnly ? "polling" : "ADS notification"`). The doc (`docs/Driver.TwinCAT.Cli.md`)
@@ -108,7 +122,15 @@ returned `ISubscriptionHandle.DiagnosticId`, which is `twincat-native-sub-*` for
path vs the `PollGroupEngine` handle for poll mode) or soften the wording to "(requested:
ADS notification)" so it does not over-claim.
**Resolution:** _(open)_
**Resolution:** Added internal static
`SubscribeCommand.DescribeMechanism(ISubscriptionHandle)` that maps
`DiagnosticId.StartsWith("twincat-native-sub-", Ordinal)` to `"ADS notification"` and
anything else to `"polling"`. The banner now reads from the handle the driver actually
returned, so the line cannot disagree with what the driver did even if a future fallback
lands the subscription somewhere unexpected. Covered by
`SubscribeCommandMechanismTests.DescribeMechanism_returns_ADS_notification_for_native_handle`
(theory, 3 cases) and `DescribeMechanism_returns_polling_for_anything_else` (theory, 4 cases
including an ordinal case-sensitivity guard).
### Driver.TwinCAT.Cli-004
@@ -117,7 +139,7 @@ ADS notification)" so it does not over-claim.
| Severity | Low |
| Category | Design-document adherence |
| Location | `TwinCATCommandBase.cs:26-29`, `Commands/BrowseCommand.cs` |
| Status | Open |
| Status | Resolved |
**Description:** `--poll-only` is declared on `TwinCATCommandBase`, so it is inherited by
`browse`. `BrowseCommand` only ever calls `DiscoverAsync` — it never subscribes — so
@@ -131,7 +153,15 @@ disagree.
flag) onto an intermediate base shared by only `probe`/`read`/`subscribe`, or override/hide it
for `browse`. Alternatively document explicitly that the flag is a no-op for `browse`.
**Resolution:** _(open)_
**Resolution:** Introduced an intermediate `TwinCATTagCommandBase : TwinCATCommandBase` that
hosts the `--poll-only` flag and the `BuildOptions(...)` helper. `ProbeCommand`,
`ReadCommand`, `WriteCommand`, and `SubscribeCommand` inherit from this intermediate (they all
build a tag-list `TwinCATDriverOptions`). `BrowseCommand` keeps inheriting from
`TwinCATCommandBase` directly, so `--poll-only` no longer surfaces in `browse --help`. Browse
sets `UseNativeNotifications = true` on its inline options (irrelevant either way for the
discover-only path, but matches production wiring). Covered by
`TwinCATCommandBaseTests.BrowseCommand_does_not_expose_poll_only_flag` and
`ProbeCommand_still_exposes_poll_only_flag` (both reflect over the public property surface).
### Driver.TwinCAT.Cli-005
@@ -140,7 +170,7 @@ for `browse`. Alternatively document explicitly that the flag is a no-op for `br
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Commands/ProbeCommand.cs:23`, `Commands/ReadCommand.cs:20`, `Commands/WriteCommand.cs:20`, `Commands/SubscribeCommand.cs:18` |
| Status | Open |
| Status | Resolved |
**Description:** The `--type` option is declared with the short alias `-t` on `read`, `write`,
and `subscribe`, but `ProbeCommand` declares `[CommandOption("type", ...)]` with no short
@@ -151,7 +181,10 @@ take the same `TwinCATDataType` option.
**Recommendation:** Add the `'t'` short alias to `ProbeCommand`'s `--type` option to match the
other three commands.
**Resolution:** _(open)_
**Resolution:** Added the `'t'` short alias on `ProbeCommand.DataType`'s `[CommandOption]`,
matching read/write/subscribe so muscle memory carries between the four verbs. Covered by
`TwinCATCommandBaseTests.ProbeCommand_type_option_carries_short_alias_t` which asserts the
`CommandOptionAttribute.ShortName` is `'t'`.
### Driver.TwinCAT.Cli-006
@@ -160,7 +193,7 @@ other three commands.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/WriteCommandParseValueTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The only test file covers `WriteCommand.ParseValue` and
`ReadCommand.SynthesiseTagName`. Other deterministic, router-independent logic is untested:
@@ -177,7 +210,20 @@ module's scope but worth flagging to whoever owns the test tree.
`BuildOptions` field wiring, and for the `CollectingAddressSpaceBuilder` prefix/max filtering
and access-classification logic.
**Resolution:** _(open)_
**Resolution:** Added two new test classes. `TwinCATCommandBaseTests` covers the Gateway
canonical form, round-trip through `TwinCATAmsAddress.TryParse`, DriverInstanceId
composition, Timeout projection, BuildOptions field wiring (devices, tags, timeout, probe
disabled, controller-browse disabled, UseNativeNotifications default true), and the PollOnly
toggle flipping UseNativeNotifications. `BrowseCommandFilterTests` covers
`CollectingAddressSpaceBuilder` (records variables in call order, treats `Folder` as a
same-builder pass-through), `BrowseCommand.FilterByPrefix` (empty/null prefix passes
everything, case-sensitive ordinal match), `BrowseCommand.PrintLimit` (max <= 0 = unbounded,
caps when matched > max, no padding when matched < max), and `BrowseCommand.AccessTag`
(ViewOnly -> RO, every other classification -> RW, theory over all 6 non-ViewOnly values).
`BrowseCommand.CollectingAddressSpaceBuilder` made `internal` (was `private`) so the test
project can construct it directly via the existing `InternalsVisibleTo` hook. Total tests
for this assembly went from 27 to 69. The stale empty sibling test directory mention is
left out of scope as noted.
### Driver.TwinCAT.Cli-007
@@ -186,7 +232,7 @@ and access-classification logic.
| Severity | Low |
| Category | Documentation & comments |
| Location | `TwinCATCommandBase.cs:31-36` |
| Status | Open |
| Status | Resolved |
**Description:** The `Timeout` override has an empty `init` accessor with the comment
`/* driven by TimeoutMs */`. Because the base `DriverCommandBase.Timeout` is declared
@@ -199,4 +245,9 @@ gives no hint of the deliberate no-op. This is a maintainability/clarity nit, no
computed projection of `--timeout-ms` and the `init` accessor is intentionally a no-op, so the
design intent survives refactoring.
**Resolution:** _(open)_
**Resolution:** Replaced the `<inheritdoc/>` on `TwinCATCommandBase.Timeout` with an explicit
`<summary>` documenting that `Timeout` is projected from `TimeoutMs`, that the `init`
accessor required by the abstract base property is intentionally a no-op, and that adding a
backing field would cause the two to drift on every refactor. The inner-block comment was
tightened to point at the XML summary so the design intent survives whichever doc surface a
future maintainer reads first.
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -112,7 +112,7 @@ does not support UDT tags, and `BrowseSymbolsAsync` already correctly yields
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `TwinCATDataType.cs:24-27` |
| Status | Open |
| Status | Resolved |
**Description:** The inline comments for the IEC time types are inaccurate. TwinCAT `TIME` is
a duration (32-bit, milliseconds) — not "ms since epoch of day". `DATE` is stored as seconds
@@ -125,7 +125,7 @@ implementer who tries to add proper conversion.
date/time semantics are intended to be exposed properly, track a follow-up to decode them to
`DriverDataType.DateTime`; otherwise document that they surface as raw counters.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — rewrote the inline comments to match the actual IEC 61131-3 / TwinCAT encoding (TIME = duration in ms, DATE = seconds since 1970-01-01 truncated to a day boundary, DT = seconds since 1970-01-01, TOD = ms since midnight) and added a block comment documenting that the driver surfaces them as raw UDINT counters via `DriverDataType.UInt32`. Test `Iec_time_types_map_to_uint32_raw_counter` pins the mapping.
### Driver.TwinCAT-005
@@ -156,7 +156,7 @@ catch), native-notification registration failures, and host state transitions
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `TwinCATDriver.cs:406-411` |
| Status | Open |
| Status | Resolved |
**Description:** `ResolveHost` falls back to `DriverInstanceId` when there are no configured
devices and the reference is unknown. `DriverInstanceId` is a logical config-DB identifier,
@@ -169,7 +169,7 @@ connectivity-status row.
empty string or a documented unresolved marker), or document why the instance ID is the chosen
fallback. Prefer the first device HostAddress only when one exists (already done).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `ResolveHost` now returns `TwinCATDriver.UnresolvedHostSentinel` (empty string) when no devices are configured, replacing the `DriverInstanceId` collision with `GetHostStatuses()` rows. The sentinel is publicly documented on the driver type. Updated `ResolveHost_falls_back_to_unresolved_sentinel_when_no_devices` (was `_to_DriverInstanceId_`) and added `ResolveHost_returns_unresolved_sentinel_when_no_devices` + `ResolveHost_unresolved_sentinel_matches_no_GetHostStatuses_entry` regressions.
### Driver.TwinCAT-007
@@ -361,7 +361,7 @@ part of the documented driver contract, not optional.
| Severity | Low |
| Category | Design-document adherence |
| Location | `TwinCATDriverOptions.cs:41-43`, `TwinCATDriverOptions.cs:57-62`, `AdsTwinCATClient.cs:145` |
| Status | Open |
| Status | Resolved |
**Description:** Several drifts between the implemented config surface and
`docs/v2/driver-specs.md` section 6. The spec connection-settings list has separate `Host`
@@ -376,7 +376,7 @@ the probe path connects via `_options.Timeout` — a dead config field. The spec
shape (the doc is DRAFT, so updating it is acceptable). Remove or wire up
`TwinCATProbeOptions.Timeout`. Expose `NotificationMaxDelayMs` if batching control is wanted.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `TwinCATProbeOptions.Timeout` is now wired into `EnsureConnectedAsync` via an optional `timeoutOverride` parameter that the probe loop passes (reads / writes keep the driver-level `_options.Timeout`). Added a `TwinCATDriverOptions.NotificationMaxDelayMs` config knob (parsed from `driverConfigJson` via `TwinCATDriverConfigDto.NotificationMaxDelayMs`) and threaded it through `ITwinCATClient.AddNotificationAsync` so `NotificationSettings` carries the configured max-delay instead of the hard-coded 0. The `Host` / `AmsNetId` / `AmsPort` triple in the spec was already implemented as the single `HostAddress` (parsed `ads://{netId}:{port}` URI) — kept as-is to match the v2 driver convention; covered by `TwinCATAmsAddress`. Regression tests: `ProbeOptions_Timeout_is_applied_to_probe_calls`, `NotificationMaxDelayMs_is_exposed_on_driver_options`, `NotificationMaxDelayMs_parses_from_driver_config_json`.
### Driver.TwinCAT-015
@@ -385,7 +385,7 @@ shape (the doc is DRAFT, so updating it is acceptable). Remove or wire up
| Severity | Low |
| Category | Code organization & conventions |
| Location | `TwinCATDriver.cs:431-432` |
| Status | Open |
| Status | Resolved |
**Description:** `Dispose()` runs `DisposeAsync().AsTask().GetAwaiter().GetResult()`
sync-over-async. `docs/v2/driver-stability.md` section Galaxy explicitly lists "sync-over-async
@@ -399,7 +399,7 @@ here — cancelling token sources, disposing clients, clearing dictionaries —
synchronous, and `PollGroupEngine.DisposeAsync` completes synchronously, so factor the
synchronous teardown out so `Dispose()` does not block on a `Task`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `Dispose()` now does an inline synchronous teardown with no `await` and no captured sync context: dispose native subscriptions, drive `PollGroupEngine.DisposeAsync` via `.AsTask().Wait(5s)` (no context capture), per-device `ProbeCts.Cancel()` + `ProbeTask.Wait(2s)`, `DisposeClient()` / `DisposeGate()`, then clear the dictionaries. `DisposeAsync` still routes through `ShutdownAsync` for genuinely async callers. Regression test `Dispose_does_not_block_on_async_in_default_synchronization_context` runs `Dispose()` inside a single-threaded `SynchronizationContext` that would deadlock a sync-over-async teardown and asserts it completes within 5s.
### Driver.TwinCAT-016
@@ -408,7 +408,7 @@ synchronous teardown out so `Dispose()` does not block on a `Task`.
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` |
| Status | Open |
| Status | Resolved |
**Description:** Unit coverage exists for AMS-address parsing, symbol-path parsing, read/write,
native notifications, symbol browse, and the capability surface. Gaps tied to the findings
@@ -423,4 +423,4 @@ without truncation (Driver.TwinCAT-002).
addressed, especially a concurrency stress test for `EnsureConnectedAsync` and a
`ReinitializeAsync`-applies-new-config test.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — the previously-closed High findings each grew their regression coverage as they were resolved (see `TwinCATHighFindingsRegressionTests`: `ReinitializeAsync_applies_changed_device_config` for -001, `LInt_read_round_trips_value_above_int_MaxValue` + `DataType_mapping_preserves_width_and_signedness` for -002, `Concurrent_reads_on_one_device_create_a_single_client` + `Concurrent_reads_and_writes_share_one_client` for -007, `Symbol_version_changed_raises_OnRediscoveryNeeded` + `TwinCATDriver_implements_IRediscoverable` for -013). This pass added the two remaining gaps: `Structure_typed_pre_declared_tag_is_rejected_at_config_parse` (-003) and `Probe_loop_and_read_share_one_client_per_device` (-009 disposal-race coverage races 64 readers against the probe loop for 500ms and asserts a single client / single connect). All coverage lives in the test files `TwinCATHighFindingsRegressionTests.cs` and the new `TwinCATLowFindingsRegressionTests.cs`.
+118 -107
View File
@@ -12,126 +12,41 @@ Each module's `findings.md` is the source of truth; this file is generated from
|---|---|---|---|---|---|---|
| [Admin](Admin/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 13 |
| [Analyzers](Analyzers/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 7 |
| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 10 |
| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 11 |
| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 10 |
| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Configuration](Configuration/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 11 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 13 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 16 |
| [Core.VirtualTags](Core.VirtualTags/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 13 |
| [Driver.AbCip](Driver.AbCip/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 15 |
| [Driver.AbCip.Cli](Driver.AbCip.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 8 |
| [Driver.AbCip.Cli](Driver.AbCip.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
| [Driver.AbLegacy](Driver.AbLegacy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 13 |
| [Driver.AbLegacy.Cli](Driver.AbLegacy.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 7 |
| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 6 |
| [Driver.AbLegacy.Cli](Driver.AbLegacy.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 7 |
| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 8 |
| [Driver.FOCAS](Driver.FOCAS/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Driver.FOCAS.Cli](Driver.FOCAS.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 5 |
| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 14 |
| [Driver.Historian.Wonderware](Driver.Historian.Wonderware/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 12 |
| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 10 |
| [Driver.Modbus](Driver.Modbus/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 12 |
| [Driver.Modbus.Addressing](Driver.Modbus.Addressing/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 3 | 9 |
| [Driver.Modbus.Cli](Driver.Modbus.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 8 |
| [Driver.OpcUaClient](Driver.OpcUaClient/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 15 |
| [Driver.FOCAS.Cli](Driver.FOCAS.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 5 |
| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-23 | `a9be809` | Reviewed | 0 | 18 |
| [Driver.Historian.Wonderware](Driver.Historian.Wonderware/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 10 |
| [Driver.Modbus](Driver.Modbus/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 12 |
| [Driver.Modbus.Addressing](Driver.Modbus.Addressing/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 9 |
| [Driver.Modbus.Cli](Driver.Modbus.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 8 |
| [Driver.OpcUaClient](Driver.OpcUaClient/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 15 |
| [Driver.S7](Driver.S7/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 14 |
| [Driver.S7.Cli](Driver.S7.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 4 | 7 |
| [Driver.TwinCAT](Driver.TwinCAT/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 16 |
| [Driver.TwinCAT.Cli](Driver.TwinCAT.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
| [Driver.S7.Cli](Driver.S7.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 7 |
| [Driver.TwinCAT](Driver.TwinCAT/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 16 |
| [Driver.TwinCAT.Cli](Driver.TwinCAT.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 7 |
| [Server](Server/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 0 | 15 |
## Pending findings
Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Client.CLI-002 | Low | Correctness & logic bugs | `Commands/SubscribeCommand.cs:129-137` | The summary computes `neverWentBad` as every target whose node-id key is absent from the `everBad` dictionary. A node that received no update at all is also absent from `everBad`, so it is counted in `neverWentBad` and printed under the he… |
| Client.CLI-003 | Low | Correctness & logic bugs | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` | Numeric command options accept any value with no range validation. `--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be supplied as `0` or a negative number. A negative `--depth`/`--max-depth` silently d… |
| Client.CLI-004 | Low | OtOpcUa conventions | `Commands/SubscribeCommand.cs:13-37` | `SubscribeCommand` is the only command in the module whose constructor and all `[CommandOption]` properties have no XML doc comments. Every other command (`ConnectCommand`, `ReadCommand`, `WriteCommand`, `BrowseCommand`, `AlarmsCommand`, `… |
| Client.CLI-006 | Low | Error handling & resilience | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` | Operator input-format errors surface as raw .NET exceptions rather than clean CLI errors. An unparseable start/end value throws `FormatException` straight out of `DateTime.Parse`; an invalid node id throws `FormatException`/`ArgumentExcept… |
| Client.CLI-007 | Low | Performance & resource management | `CommandBase.cs:112-123` | `ConfigureLogging` builds a new Serilog `LoggerConfiguration`, creates a logger, and assigns it to the static `Log.Logger` without disposing the previously assigned logger. For a single CLI invocation this leaks at most one logger and the… |
| Client.CLI-008 | Low | Documentation & comments | `docs/Client.CLI.md:158-217` | `docs/Client.CLI.md` is stale relative to the code at this commit. (1) The `subscribe` command section documents only `-n` and `-i`, but the code (`SubscribeCommand`) also exposes `-r/--recursive`, `--max-depth`, `-q/--quiet`, `--duration`… |
| Client.CLI-009 | Low | Code organization & conventions | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` | Both long-running commands attach an event handler (`service.DataChanged += ...`, `service.AlarmEvent += ...`) with a lambda and never detach it. Because the handler closes over `console`, the captured console and the closure remain refere… |
| Client.CLI-010 | Low | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` | The new `SubscribeCommand` capabilities are largely untested. The four `SubscribeCommandTests` cover only single-node subscribe, unsubscribe-on-cancel, disconnect-in-finally, and the subscription message. There is no test for the `--recurs… |
| Client.Shared-003 | Low | Correctness & logic bugs | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` | `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking the `Results` collection is non-empty. A malformed or service-level-faulted response (empty `Results` alongside a servic… |
| Client.Shared-004 | Low | OtOpcUa conventions | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` | `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchron… |
| Client.Shared-009 | Low | Error handling & resilience / Documentation & comments | `OpcUaClientService.cs:302-322` | `AcknowledgeAlarmAsync` is typed `Task<StatusCode>` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally `return StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAs… |
| Client.Shared-010 | Low | Performance & resource management | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` | `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call p… |
| Client.Shared-011 | Low | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` | The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race… |
| Client.UI-003 | Low | OtOpcUa conventions | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` | The csproj references `Serilog` and `Serilog.Sinks.Console`, and `docs/Client.UI.md` lists Serilog as the logging technology, but no source file in the module uses Serilog. `Program.BuildAvaloniaApp()` uses Avalonia's `LogToTrace()` and th… |
| Client.UI-004 | Low | OtOpcUa conventions | `Views/MainWindow.axaml.cs:125-138` | `OnBrowseCertPathClicked` uses `OpenFolderDialog`, which is obsolete in Avalonia 11.x (the version pinned in the csproj). The supported replacement is the `StorageProvider` API (`StorageProvider.OpenFolderPickerAsync`). Using the obsolete… |
| Client.UI-006 | Low | Error handling & resilience | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` | Many catch blocks swallow exceptions silently with an empty body and only a comment (`// Redundancy info not available`, `// Subscribe failed`, `// Subscription failed; no item added`, and others). When a subscribe, alarm-subscribe, or red… |
| Client.UI-009 | Low | Design-document adherence | `ViewModels/HistoryViewModel.cs:44-54` | `HistoryViewModel.AggregateTypes` exposes eight entries: `null` (Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`. `docs/Client.UI.md` ("Query Options" table) lists only "Raw (default), Average, Minimum, Maxi… |
| Client.UI-010 | Low | Code organization & conventions | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` | `DateTimeRangePicker` declares `MinDateTimeProperty` / `MaxDateTimeProperty` styled properties with public CLR accessors, but neither is read anywhere in the control. `TryParseDateTime`, `OnStartLostFocus`, and `OnEndLostFocus` never clamp… |
| Client.UI-011 | Low | Documentation & comments | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` | The certificate-store-path `TextBox` watermark reads `(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208 folder name. Per `CLAUDE.md` / `docs/Client.UI.md` the canonical path is now `{LocalAppData}/OtOpcUaClient/`… |
| Driver.AbCip.Cli-003 | Low | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:50-56,60-61` | The `OnDataChange` handler writes change lines to `console.Output` (a `TextWriter`) from the driver's poll-engine callback thread, while the command's main flow concurrently writes the "Subscribed to ... Ctrl+C to stop." line on the CLI th… |
| Driver.AbCip.Cli-004 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:28,58`; `AbCipCommandBase.cs:26-34` | `--interval-ms` (`IntervalMs`) is taken verbatim and passed as `TimeSpan.FromMilliseconds(IntervalMs)` to `SubscribeAsync` with no validation. A zero or negative value produces a non-positive `TimeSpan`; the option description claims "Poll… |
| Driver.AbCip.Cli-005 | Low | Performance & resource management | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` | `ConfigureLogging` assigns a freshly created Serilog logger to the process-global `Log.Logger` but never calls `Log.CloseAndFlush()`. For a short-lived one-shot command (`probe`, `read`, `write`) the process exit flushes the console sink,… |
| Driver.AbCip.Cli-006 | Low | Design-document adherence | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/AbCipCommandBase.cs:29-34` | `AbCipCommandBase` overrides the abstract `DriverCommandBase.Timeout` property with a getter derived from `TimeoutMs` and an empty `init` body (`init { /* driven by TimeoutMs */ }`). Because the override has no `[CommandOption]` attribute,… |
| Driver.AbCip.Cli-007 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName` — both pure static helpers. There is no coverage for `AbCipCommandBase.BuildOptions` (the flag-to-`AbCipDriverOptions` mapping that all four commands d… |
| Driver.AbCip.Cli-008 | Low | Documentation & comments | `docs/Driver.AbCip.Cli.md:8-9` | `docs/Driver.AbCip.Cli.md` opens with "Second of four driver test-client CLIs (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT)." The count "four" contradicts the chain that follows it (five names) and contradicts `docs/DriverClis.md`, whic… |
| Driver.AbLegacy.Cli-002 | Low | Correctness & logic bugs | `Commands/WriteCommand.cs:27-29`, `Program.cs:6-9` | The `--value` option help text states "booleans accept true/false/1/0", but `ParseBool` (`WriteCommand.cs:74-80`) and the error message also accept `on/off` and `yes/no`, and `DriverClis.md` documents the full `true/false/1/0/yes/no/on/off… |
| Driver.AbLegacy.Cli-003 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:47-53` | The `OnDataChange` handler calls `console.Output.WriteLine(line)` (the synchronous overload) directly from the `PollGroupEngine` poll thread. The poll engine raises change events from a background timer/loop thread, so two ticks that fire… |
| Driver.AbLegacy.Cli-004 | Low | Error handling & resilience | `Commands/ProbeCommand.cs:37-56`, `Commands/ReadCommand.cs:39-50`, `Commands/WriteCommand.cs:48-59`, `Commands/SubscribeCommand.cs:41-76` | Every command does `await using var driver = new AbLegacyDriver(...)` *and* an explicit `await driver.ShutdownAsync(...)` in the `finally`. `AbLegacyDriver` `DisposeAsync` itself calls `ShutdownAsync`, so the driver is shut down twice on t… |
| Driver.AbLegacy.Cli-005 | Low | Design-document adherence | `Commands/SubscribeCommand.cs:23-25`, `docs/Driver.AbLegacy.Cli.md:94-96` | The subscribe command interval option is `--interval-ms` (default 1000). `docs/Driver.AbLegacy.Cli.md` shows the subscribe example as `otopcua-ablegacy-cli subscribe ... -i 500`, which works because of the short alias `'i'`, but the doc ne… |
| Driver.AbLegacy.Cli-006 | Low | Code organization & conventions | `Commands/ProbeCommand.cs:20-22` | `ProbeCommand` declares its `--type` option with no short alias, while `ReadCommand`, `WriteCommand`, and `SubscribeCommand` all declare `--type` with the short alias `'t'`. `ProbeCommand` also gives `--address` the alias `'a'`, matching t… |
| Driver.AbLegacy.Cli-007 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file in the CLI test project covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Two behaviours that are pure logic (testable without a device) are uncovered: (1) `AbLegacyCommandBase.BuildOptions` — that it… |
| Driver.Cli.Common-004 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` | `FormatTable` calls `rows.Max(r => r.Tag.Length)` (and the same for the value and status columns) without guarding against empty input. When `tagNames` and `snapshots` are both empty (equal length, so the mismatch check at line 56 passes),… |
| Driver.Cli.Common-006 | Low | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` | Two minor doc inaccuracies. (1) The comment at `SnapshotFormatter.cs:71` states the "source-time column is fixed-width (ISO-8601 to ms) so no max-measurement needed" — true only when every snapshot has a non-null `SourceTimestampUtc`. `For… |
| Driver.FOCAS.Cli-001 | Low | Error handling & resilience | `Commands/WriteCommand.cs:58-68` | `WriteCommand.ParseValue` parses the numeric `--value` types (`Byte`/`Int16`/`Int32`/`Float32`/`Float64`) with `sbyte.Parse` / `short.Parse` / etc. These throw raw `FormatException` or `OverflowException` for malformed or out-of-range inpu… |
| Driver.FOCAS.Cli-002 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:45-51` | The `subscribe` command attaches an `OnDataChange` handler that calls the synchronous `console.Output.WriteLine`. `OnDataChange` is raised from the driver's `PollGroupEngine` tick thread, while the command's main flow writes the "Subscribe… |
| Driver.FOCAS.Cli-003 | Low | Error handling & resilience | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) | The numeric command options `--cnc-port`, `--timeout-ms`, and `--interval-ms` are accepted without range validation. A zero or negative `--cnc-port` produces an invalid `focas://host:<n>` string; `--timeout-ms 0` yields a zero `TimeSpan` o… |
| Driver.FOCAS.Cli-004 | Low | Performance & resource management | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` | Every command declares `await using var driver = new FocasDriver(...)` |
| Driver.FOCAS.Cli-005 | Low | Design-document adherence | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) | `docs/Driver.FOCAS.Cli.md` documents `BadDeviceFailure` and `BadCommunicationError` as the key diagnostic signals an operator reads off `probe` / `write` output ("A `BadCommunicationError` means ... `BadDeviceFailure` after a successful co… |
| Driver.Historian.Wonderware-004 | Low | Correctness and logic bugs | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` | `ToHistorianEvent` only assigns `historianEvent.Id` when `Guid.TryParse(dto.EventId, ...)` succeeds. If `EventId` is not a parseable GUID (or is empty), `Id` stays `Guid.Empty` and the event is written to the historian with an all-zeros id… |
| Driver.Historian.Wonderware-005 | Low | Concurrency and thread safety | `Backend/HistorianDataSource.cs:124`, `:126-127` | `GetHealthSnapshot` reads `_activeProcessNode` and `_activeEventNode` inside `_healthLock`, but those two fields are written under `_connectionLock` / `_eventConnectionLock` (lines 183, 243, 209-210, 266-269) — a different lock. The health… |
| Driver.Historian.Wonderware-007 | Low | Error handling and resilience | `Ipc/PipeServer.cs:70-75` | When `VerifyCaller` rejects the peer SID, the server logs the reason and calls `_current.Disconnect()` with no `HelloAck` frame sent. The shared-secret-mismatch and major-version-mismatch paths below it both send a rejecting `HelloAck` so… |
| Driver.Historian.Wonderware-008 | Low | Error handling and resilience | `Backend/HistorianDataSource.cs:301-307`, `:374-380` | When `query.StartQuery` returns `false`, `ReadRawAsync` and `ReadAggregateAsync` call `HandleConnectionError()` and return an empty result list. A failed `StartQuery` is not necessarily a connection failure — it can be a bad tag name, an i… |
| Driver.Historian.Wonderware-010 | Low | Performance and resource management | `Backend/HistorianConfiguration.cs:32-36`, `Backend/HistorianDataSource.cs` (all read methods) | `HistorianConfiguration.RequestTimeoutSeconds` is documented as the "outer safety timeout applied to sync-over-async Historian operations" and is copied around (`SdkAlarmHistorianWriteBackend.CloneConfigWithServerName:346`), but it is neve… |
| Driver.Historian.Wonderware-011 | Low | Design-document adherence | `Backend/HistorianDataSource.cs:9-12`, `Backend/IHistorianDataSource.cs:9-11`, `Backend/HistorianSample.cs:7-9`, `Backend/HistorianConfiguration.cs:7-9` | Several XML doc comments reference the retired v1 architecture as if it were current: "inside Galaxy.Host", "the Proxy maps returned samples", "the Host returns these across the IPC boundary as `GalaxyDataValue`", "Populated from ... the P… |
| Driver.Historian.Wonderware-012 | Low | Testing coverage | `Backend/HistorianDataSource.cs`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` | The unit-test suite covers `HistorianQualityMapper`, `HistorianClusterEndpointPicker`, `SdkAlarmHistorianWriteBackend`, `AahClientManagedAlarmEventWriter`, the IPC round trip, and `Program` alarm-writer wiring. `HistorianDataSource` itself… |
| Driver.Historian.Wonderware.Client-003 | Low | Concurrency & thread safety | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` | `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but read inside `GetHealthSnapshot` under `_healthLock`, and every other counter (`_totalSuccesses`, `_totalFailures`, `_consecutiveFailures`) is mutated only under `_hea… |
| Driver.Historian.Wonderware.Client-004 | Low | Concurrency & thread safety | `WonderwareHistorianClient.cs:203-267` | A sidecar-reported failure is recorded in two non-atomic steps under separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the caller calls `ThrowIfFailed` which calls `ReclassifySuccessAsFailure()` (line 256), d… |
| Driver.Historian.Wonderware.Client-006 | Low | Error handling & resilience | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` | `PipeChannel.InvokeAsync` retries exactly once on transport failure and otherwise propagates. The options expose `ReconnectInitialBackoff` and `ReconnectMaxBackoff` and `WonderwareHistorianClientOptions` documents them as exponential backo… |
| Driver.Historian.Wonderware.Client-008 | Low | Security | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` | The csproj suppresses two NuGet audit advisories (`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency with no inline comment recording why the suppression is safe, who reviewed it, or when it should be re… |
| Driver.Historian.Wonderware.Client-010 | Low | Documentation & comments | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` | Two doc/behaviour mismatches. (1) The `Dispose()` XML comment asserts the underlying channel async cleanup is non-blocking so the `GetAwaiter()/GetResult()` bridge is safe. `PipeChannel.DisposeAsync` calls `ResetTransport()`, which invokes… |
| Driver.Modbus-003 | Low | Concurrency & thread safety | `ModbusDriver.cs:59,188,241,259,266,726,745,759` | `_health` is a non-`volatile` reference field written from multiple threads (concurrent `ReadAsync` callers, the coalesced-read path, `WriteAsync` indirectly, and `ProbeLoopAsync`) and read by `GetHealth()`. Reference assignment is atomic… |
| Driver.Modbus-007 | Low | Design-document adherence | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` | Two design-vs-code drifts. (1) `MapDataType` maps `Int64`/`UInt64` to `DriverDataType.Int32` with the inline comment "widening to Int32 loses precision; PR 25 adds Int64 to DriverDataType". The address-space node for a 64-bit Modbus tag is… |
| Driver.Modbus-008 | Low | Documentation & comments | `ModbusDriver.cs:411-417,700-703,737-744` | Stale/misleading comments. (1) The `<summary>` block at `ModbusDriver.cs:411-417` says auto-prohibited ranges are "Cleared by ReinitializeAsync ... or by an explicit re-probe API (not yet shipped)" — the re-probe loop has shipped (#151, `R… |
| Driver.Modbus-009 | Low | Correctness & logic bugs | `ModbusDriver.cs:1160-1167`, `ModbusTcpTransport.cs:94-95` | Two edge cases. (1) `RegisterCount` for `ModbusDataType.String` computes `(tag.StringLength + 1) / 2`; a tag configured with `StringLength = 0` yields a register count of 0, flowing into `ReadOneAsync` as `totalRegs = 0` and producing an F… |
| Driver.Modbus-010 | Low | Error handling & resilience | `ModbusDriver.cs:864-868`, `ModbusDriverOptions.cs:116-125` | When `WriteOnChangeOnly` is enabled and `IsRedundantWrite` returns true, `WriteAsync` returns `WriteResult(0u)` (Good) without touching the wire. The suppression baseline (`_lastWrittenByRef`) is only invalidated by a *read* that returns a… |
| Driver.Modbus-011 | Low | Code organization & conventions | `ModbusDriver.cs:23-43,89-97,408-432` | Field and member declarations are interleaved with methods throughout `ModbusDriver`. `ResolveHost` (a public method) is the first member of the class, followed by `BuildSlaveHostName`, then a block of fields; `_lastPublishedByRef`/`_lastW… |
| Driver.Modbus-012 | Low | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/` | The unit suite is broad (coalescing, bisection, auto-recovery, byte order, arrays, BCD, RMW, caps, multi-unit, probe, reconnect, subscription). Gaps relative to the findings above: (1) no test exercises concurrent multi-subscription publis… |
| Driver.Modbus.Addressing-006 | Low | Error handling & resilience | `ModbusAddressParser.cs:297-301` | `TryParseFamilyNative` catches only `ArgumentException` and `OverflowException`. The current helpers throw only those (including `ArgumentOutOfRangeException`, which derives from `ArgumentException`), so today it is correct. But the parser… |
| Driver.Modbus.Addressing-007 | Low | Design-document adherence | `ModbusDataType.cs:91-95`, `docs/v2/dl205.md` section Strings | `ModbusStringByteOrder` (HighByteFirst / LowByteFirst) is defined in this assembly and documented as the DL205 low-byte-first string-packing knob, but `ParsedModbusAddress` has no field for it and `ModbusAddressParser` never produces or co… |
| Driver.Modbus.Addressing-009 | Low | Documentation & comments | `ModbusModiconAddress.cs:55-64`, `ModbusModiconAddress.cs:104-110` | The comments on `ModbusModiconAddress.TryParse` are slightly inaccurate. The remark that 5-digit Modicon is always exactly 5 chars (40001..49999) and 6-digit is exactly 6 (400001..465536-shaped) implies the leading digit is always 4, but t… |
| Driver.Modbus.Cli-003 | Low | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/ModbusCommandBase.cs:14-24` | `Port` (`int`) and `TimeoutMs` (`int`) accept any 32-bit value, including negatives and ports above 65535. `UnitId` is a `byte`, so it accepts 0-255 even though the option description and `docs/Driver.Modbus.Cli.md` both say the valid rang… |
| Driver.Modbus.Cli-004 | Low | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:61-67` | The `OnDataChange` handler is invoked from the driver's `PollGroupEngine` background thread and calls `console.Output.WriteLine` synchronously. An exception thrown inside this handler (e.g. an `IOException` on a redirected or closed stdout… |
| Driver.Modbus.Cli-005 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:21-54`; `Commands/ReadCommand.cs:46-75`; `Commands/WriteCommand.cs:54-89` | All three commands call `ConfigureLogging()` then `console.RegisterCancellationHandler()`, but if the operator presses Ctrl+C before `InitializeAsync` completes, the resulting `OperationCancelledException` propagates out of `ExecuteAsync`… |
| Driver.Modbus.Cli-006 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:35-53` | `probe` reports `Health: {health.State}` from `GetHealth()`. After a successful `InitializeAsync` the driver sets state to `Healthy` regardless of whether the subsequent probe register read returns Good or a Bad status code. `ReadAsync` do… |
| Driver.Modbus.Cli-007 | Low | Design-document adherence | `docs/Driver.Modbus.Cli.md:124-156`; `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ReadCommand.cs` | `docs/Driver.Modbus.Cli.md` devotes a whole "v2 addressing grammar" section to the industry-standard tag-address strings (`40001:F:CDAB`, `HR1:I`, `C100`, `V2000:F:CDAB`, etc.) and says "set the per-tag `addressString` field instead of the… |
| Driver.Modbus.Cli-008 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests/` | The test project covers only the two pure-function seams: `ReadCommand.SynthesiseTagName` and `WriteCommand.ParseValue`. There is no coverage for `WriteCommand`'s read-only-region rejection (`Region is not (Coils or HoldingRegisters)`), no… |
| Driver.OpcUaClient-011 | Low | Documentation & comments | `OpcUaClientDriver.cs:783-784` | The comment on the isArray computation states "-1 = scalar; 1+ = array dimensions; 0 = one-dimensional array". This is inaccurate against OPC UA ValueRank semantics: -3 is ScalarOrOneDimension, -2 is Any, -1 is Scalar, and 0 is OneOrMoreDi… |
| Driver.OpcUaClient-014 | Low | Performance & resource management | `OpcUaClientDriver.cs:904`, `:1035` | `MonitoredItem.Notification += (mi, args) => ...` (and the alarm-event equivalent) attaches a closure-capturing lambda to each monitored item's event. The lambda is never detached. When UnsubscribeAsync removes a subscription it calls Subs… |
| Driver.S7.Cli-004 | Low | Performance & resource management | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:36,53`, `Commands/ReadCommand.cs:45,54`, `Commands/WriteCommand.cs:51,60`, `Commands/SubscribeCommand.cs:39,73` | Every command declares the driver with `await using var driver = new S7Driver(...)` and *also* calls `await driver.ShutdownAsync(...)` in a `finally` block. `S7Driver.DisposeAsync` itself calls `ShutdownAsync`, so shutdown runs twice per c… |
| Driver.S7.Cli-005 | Low | Code organization & conventions | `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` | A stale directory `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` exists containing only an `obj/` folder — no `.csproj`, no source. The real test project lives at `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`. The empty direct… |
| Driver.S7.Cli-006 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. `S7CommandBase.BuildOptions` — which maps the host / port / CPU / rack / slot / timeout flags onto an `S7DriverOptions` and forces `Probe.Enabled = fa… |
| Driver.S7.Cli-007 | Low | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/SubscribeCommand.cs:45-51` | The Modbus CLI `SubscribeCommand` carries an explanatory comment on the `OnDataChange` handler ("Route every data-change event to the CliFx console (not System.Console — the analyzer flags it + IConsole is the testable abstraction)"). The… |
| Driver.TwinCAT-004 | Low | Correctness & logic bugs | `TwinCATDataType.cs:24-27` | The inline comments for the IEC time types are inaccurate. TwinCAT `TIME` is a duration (32-bit, milliseconds) — not "ms since epoch of day". `DATE` is stored as seconds since 1970-01-01 (truncated to a day boundary), not "days since 1970-… |
| Driver.TwinCAT-006 | Low | OtOpcUa conventions | `TwinCATDriver.cs:406-411` | `ResolveHost` falls back to `DriverInstanceId` when there are no configured devices and the reference is unknown. `DriverInstanceId` is a logical config-DB identifier, not a host address; `IPerCallHostResolver` consumers expect a host key… |
| Driver.TwinCAT-014 | Low | Design-document adherence | `TwinCATDriverOptions.cs:41-43`, `TwinCATDriverOptions.cs:57-62`, `AdsTwinCATClient.cs:145` | Several drifts between the implemented config surface and `docs/v2/driver-specs.md` section 6. The spec connection-settings list has separate `Host` (IP), `AmsNetId`, and `AmsPort` fields; the implementation collapses these into a single `… |
| Driver.TwinCAT-015 | Low | Code organization & conventions | `TwinCATDriver.cs:431-432` | `Dispose()` runs `DisposeAsync().AsTask().GetAwaiter().GetResult()` — sync-over-async. `docs/v2/driver-stability.md` section Galaxy explicitly lists "sync-over-async on the OPC UA stack thread" among the four 2026-04-13 stability findings… |
| Driver.TwinCAT-016 | Low | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` | Unit coverage exists for AMS-address parsing, symbol-path parsing, read/write, native notifications, symbol browse, and the capability surface. Gaps tied to the findings above: no test exercises `ReinitializeAsync` with a changed config (D… |
| Driver.TwinCAT.Cli-001 | Low | Correctness & logic bugs | `TwinCATCommandBase.cs:23-24`, `Commands/SubscribeCommand.cs:23-24`, `Commands/BrowseCommand.cs:21-24` | Numeric command options are accepted without range validation. `--timeout-ms` feeds `Timeout => TimeSpan.FromMilliseconds(TimeoutMs)`; passing `--timeout-ms 0` or a negative value yields `TimeSpan.Zero`/a negative `TimeSpan`, which is then… |
| Driver.TwinCAT.Cli-002 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:46-58` | The `OnDataChange` handler calls `console.Output.WriteLine(line)` synchronously. In native ADS-notification mode the event is raised from the `Beckhoff.TwinCAT.Ads` notification callback thread (see `TwinCATDriver.SubscribeAsync`, which in… |
| Driver.TwinCAT.Cli-003 | Low | Error handling & resilience | `Commands/SubscribeCommand.cs:56-58` | The subscribe banner reports the mechanism purely from the `--poll-only` flag (`var mode = PollOnly ? "polling" : "ADS notification"`). The doc (`docs/Driver.TwinCAT.Cli.md`) states the banner "announces which mechanism is in play". The CL… |
| Driver.TwinCAT.Cli-004 | Low | Design-document adherence | `TwinCATCommandBase.cs:26-29`, `Commands/BrowseCommand.cs` | `--poll-only` is declared on `TwinCATCommandBase`, so it is inherited by `browse`. `BrowseCommand` only ever calls `DiscoverAsync` — it never subscribes — so `UseNativeNotifications = !PollOnly` has no observable effect on a browse run. Th… |
| Driver.TwinCAT.Cli-005 | Low | Code organization & conventions | `Commands/ProbeCommand.cs:23`, `Commands/ReadCommand.cs:20`, `Commands/WriteCommand.cs:20`, `Commands/SubscribeCommand.cs:18` | The `--type` option is declared with the short alias `-t` on `read`, `write`, and `subscribe`, but `ProbeCommand` declares `[CommandOption("type", ...)]` with no short alias. An operator who has internalised `-t` from the other three verbs… |
| Driver.TwinCAT.Cli-006 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Other deterministic, router-independent logic is untested: `TwinCATCommandBase.Gateway` (the `ads://{netId}:{port}` string the driver's `TwinCATAmsAdd… |
| Driver.TwinCAT.Cli-007 | Low | Documentation & comments | `TwinCATCommandBase.cs:31-36` | The `Timeout` override has an empty `init` accessor with the comment `/* driven by TimeoutMs */`. Because the base `DriverCommandBase.Timeout` is declared `abstract { get; init; }`, the override must supply an `init`, but here it silently… |
_No pending findings._
## Closed findings
@@ -160,6 +75,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.AlarmHistorian-006 | High | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:103,135-216` |
| Core.ScriptedAlarms-001 | High | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:175`, `ScriptedAlarmEngine.cs:178`, `ScriptedAlarmEngine.cs:73`, `ScriptedAlarmEngine.cs:368` |
| Core.Scripting-002 | High | Resolved | Security | `ForbiddenTypeAnalyzer.cs:70` |
| Core.Scripting-012 | High | Resolved | Security | `ForbiddenTypeAnalyzer.cs:60-76`, `ScriptSandbox.cs:96-126` |
| Core.VirtualTags-001 | High | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:306` |
| Driver.AbCip-001 | High | Resolved | Correctness & logic bugs | `AbCipDriver.cs:111`, `AbCipDriver.cs:163-167` |
| Driver.AbCip-002 | High | Resolved | Correctness & logic bugs | `AbCipStatusMapper.cs:65-78` |
@@ -168,6 +84,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.AbLegacy-001 | High | Resolved | Correctness & logic bugs | `AbLegacyAddress.cs:54`, `AbLegacyDriver.cs:368-374` |
| Driver.AbLegacy-006 | High | Resolved | Concurrency & thread safety | `AbLegacyDriver.cs:107-158`, `AbLegacyDriver.cs:162-234`, `LibplctagLegacyTagRuntime.cs` |
| Driver.Cli.Common-001 | High | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:106-119` |
| Driver.Cli.Common-007 | High | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:129` |
| Driver.FOCAS-001 | High | Resolved | Correctness & logic bugs | `FocasDriverFactoryExtensions.cs:54-86`, `FocasDriverFactoryExtensions.cs:132-140` |
| Driver.FOCAS-002 | High | Resolved | Correctness & logic bugs | `WireFocasClient.cs:164-179`, `FocasDriver.cs:513`, `FocasDriver.cs:593` |
| Driver.Galaxy-002 | High | Resolved | Correctness & logic bugs | `Browse/DataTypeMap.cs:13`, `Runtime/MxValueDecoder.cs:9` |
@@ -234,6 +151,9 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.Scripting-004 | Medium | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:73` |
| Core.Scripting-007 | Medium | Resolved | Error handling & resilience | `TimedScriptEvaluator.cs:60` |
| Core.Scripting-010 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
| Core.Scripting-013 | Medium | Resolved | Security | `ScriptEvaluator.cs:202-225` (`BuildWrapperSource`) |
| Core.Scripting-014 | Medium | Resolved | Concurrency & thread safety | `CompiledScriptCache.cs:91-103` (`Clear`) |
| Core.Scripting-016 | Medium | Resolved | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:74-117`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs:139-182` |
| Core.VirtualTags-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` |
| Core.VirtualTags-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` |
| Core.VirtualTags-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` |
@@ -271,6 +191,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.Galaxy-009 | Medium | Resolved | Error handling & resilience | `GalaxyDriver.cs:354-371` |
| Driver.Galaxy-011 | Medium | Resolved | Performance & resource management | `GalaxyDriver.cs:411` |
| Driver.Galaxy-014 | Medium | Resolved | Testing coverage | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
| Driver.Galaxy-016 | Medium | Resolved | Performance & resource management | `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:43-47`, `libs/README.md:32-37` |
| Driver.Historian.Wonderware-002 | Medium | Resolved | Correctness and logic bugs | `Ipc/HistorianFrameHandler.cs:162`, `:181` |
| Driver.Historian.Wonderware-003 | Medium | Resolved | Correctness and logic bugs | `Backend/HistorianDataSource.cs:320-323`, `:457-460` |
| Driver.Historian.Wonderware-006 | Medium | Resolved | Error handling and resilience | `Ipc/PipeServer.cs:120-128` |
@@ -326,6 +247,25 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Analyzers-004 | Low | Resolved | Performance & resource management | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` |
| Analyzers-005 | Low | Resolved | Design-document adherence | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` |
| Analyzers-007 | Low | Resolved | Documentation & comments | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` |
| Client.CLI-002 | Low | Resolved | Correctness & logic bugs | `Commands/SubscribeCommand.cs:129-137` |
| Client.CLI-003 | Low | Resolved | Correctness & logic bugs | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` |
| Client.CLI-004 | Low | Resolved | OtOpcUa conventions | `Commands/SubscribeCommand.cs:13-37` |
| Client.CLI-006 | Low | Resolved | Error handling & resilience | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` |
| Client.CLI-007 | Low | Resolved | Performance & resource management | `CommandBase.cs:112-123` |
| Client.CLI-008 | Low | Resolved | Documentation & comments | `docs/Client.CLI.md:158-217` |
| Client.CLI-009 | Low | Resolved | Code organization & conventions | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` |
| Client.CLI-010 | Low | Resolved | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` |
| Client.Shared-003 | Low | Resolved | Correctness & logic bugs | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` |
| Client.Shared-004 | Low | Resolved | OtOpcUa conventions | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` |
| Client.Shared-009 | Low | Resolved | Error handling & resilience / Documentation & comments | `OpcUaClientService.cs:302-322` |
| Client.Shared-010 | Low | Resolved | Performance & resource management | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` |
| Client.Shared-011 | Low | Resolved | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` |
| Client.UI-003 | Low | Resolved | OtOpcUa conventions | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` |
| Client.UI-004 | Low | Resolved | OtOpcUa conventions | `Views/MainWindow.axaml.cs:125-138` |
| Client.UI-006 | Low | Resolved | Error handling & resilience | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` |
| Client.UI-009 | Low | Resolved | Design-document adherence | `ViewModels/HistoryViewModel.cs:44-54` |
| Client.UI-010 | Low | Resolved | Code organization & conventions | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` |
| Client.UI-011 | Low | Resolved | Documentation & comments | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` |
| Configuration-004 | Low | Resolved | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
| Configuration-005 | Low | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
| Configuration-007 | Low | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
@@ -347,14 +287,16 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Core.ScriptedAlarms-003 | Low | Resolved | Documentation & comments | `ScriptedAlarmEngine.cs:343`, `docs/ScriptedAlarms.md:107` |
| Core.ScriptedAlarms-006 | Low | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:232`, `ScriptedAlarmEngine.cs:369` |
| Core.ScriptedAlarms-008 | Low | Resolved | Performance & resource management | `Part9StateMachine.cs:261-268` |
| Core.ScriptedAlarms-009 | Low | Won't Fix | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
| Core.ScriptedAlarms-009 | Low | Resolved | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
| Core.ScriptedAlarms-010 | Low | Resolved | Design-document adherence | `ScriptedAlarmEngine.cs:325-336`, `AlarmPredicateContext.cs:33-40`, `MessageTemplate.cs:47` |
| Core.ScriptedAlarms-011 | Low | Resolved | Code organization & conventions | `Part9StateMachine.cs:275` |
| Core.ScriptedAlarms-013 | Low | Resolved | Documentation & comments | `ScriptedAlarmEngine.cs:66-81` |
| Core.Scripting-005 | Low | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:97` |
| Core.Scripting-006 | Low | Resolved | Concurrency & thread safety | `CompiledScriptCache.cs:55` |
| Core.Scripting-008 | Low | Won't Fix | Performance & resource management | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
| Core.Scripting-008 | Low | Resolved | Performance & resource management | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
| Core.Scripting-009 | Low | Resolved | Design-document adherence | `ForbiddenTypeAnalyzer.cs:45` |
| Core.Scripting-011 | Low | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` |
| Core.Scripting-015 | Low | Resolved | Correctness & logic bugs | `ScriptEvaluator.cs:234-270` (`ToCSharpTypeName`) |
| Core.VirtualTags-004 | Low | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:349` |
| Core.VirtualTags-006 | Low | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:177-182`, `:395-401` |
| Core.VirtualTags-007 | Low | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/TimerTriggerScheduler.cs:58` |
@@ -367,26 +309,95 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.AbCip-012 | Low | Resolved | Performance & resource management | `LibplctagTemplateReader.cs:15-35`, `AbCipDriver.cs:88-92` |
| Driver.AbCip-013 | Low | Resolved | Design-document adherence | `AbCipDriverOptions.cs:70-73`, `PlcFamilies/AbCipPlcFamilyProfile.cs:13-19`, `LibplctagTagRuntime.cs:16-27` |
| Driver.AbCip-015 | Low | Resolved | Documentation & comments | `AbCipDriver.cs:9-11`, `PlcTagHandle.cs:23-27,53-58`, `AbCipTemplateCache.cs:12-15`, `IAbCipTagEnumerator.cs:6-11`, `AbCipDriverOptions.cs:21` |
| Driver.AbCip.Cli-003 | Low | Resolved | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:50-56,60-61` |
| Driver.AbCip.Cli-004 | Low | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:28,58`; `AbCipCommandBase.cs:26-34` |
| Driver.AbCip.Cli-005 | Low | Resolved | Performance & resource management | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
| Driver.AbCip.Cli-006 | Low | Resolved | Design-document adherence | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/AbCipCommandBase.cs:29-34` |
| Driver.AbCip.Cli-007 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli.Tests/WriteCommandParseValueTests.cs` |
| Driver.AbCip.Cli-008 | Low | Resolved | Documentation & comments | `docs/Driver.AbCip.Cli.md:8-9` |
| Driver.AbLegacy-005 | Low | Resolved | OtOpcUa conventions | `AbLegacyDriver.cs` (whole file) |
| Driver.AbLegacy-011 | Low | Resolved | Performance & resource management | `AbLegacyDriver.cs:440` |
| Driver.AbLegacy-013 | Low | Resolved | Code organization & conventions | `AbLegacyDriver.cs:340-345`, `AbLegacyDriver.cs:238-264` |
| Driver.AbLegacy.Cli-002 | Low | Resolved | Correctness & logic bugs | `Commands/WriteCommand.cs:27-29`, `Program.cs:6-9` |
| Driver.AbLegacy.Cli-003 | Low | Resolved | Concurrency & thread safety | `Commands/SubscribeCommand.cs:47-53` |
| Driver.AbLegacy.Cli-004 | Low | Resolved | Error handling & resilience | `Commands/ProbeCommand.cs:37-56`, `Commands/ReadCommand.cs:39-50`, `Commands/WriteCommand.cs:48-59`, `Commands/SubscribeCommand.cs:41-76` |
| Driver.AbLegacy.Cli-005 | Low | Resolved | Design-document adherence | `Commands/SubscribeCommand.cs:23-25`, `docs/Driver.AbLegacy.Cli.md:94-96` |
| Driver.AbLegacy.Cli-006 | Low | Resolved | Code organization & conventions | `Commands/ProbeCommand.cs:20-22` |
| Driver.AbLegacy.Cli-007 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` |
| Driver.Cli.Common-004 | Low | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` |
| Driver.Cli.Common-006 | Low | Resolved | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` |
| Driver.Cli.Common-008 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:50-64` |
| Driver.FOCAS-007 | Low | Resolved | Error handling & resilience | `FocasDriver.cs:140-148`, `FocasDriver.cs:478-484`, `FocasDriver.cs:529-533`, `FocasAlarmProjection.cs:61-63` |
| Driver.FOCAS-008 | Low | Resolved | Performance & resource management | `FocasDriver.cs:201`, `FocasDriver.cs:253` |
| Driver.FOCAS-009 | Low | Resolved | Design-document adherence | `FocasDriverOptions.cs:110-115`, `FocasDriver.cs:468-486`, `FocasDriverFactoryExtensions.cs:75-80` |
| Driver.FOCAS-010 | Low | Resolved | Code organization & conventions | `IFocasClient.cs:210-227` (`FocasOpMode`), `FocasConstants.cs:42-78` (`FocasOperationMode`) |
| Driver.FOCAS-011 | Low | Resolved | Code organization & conventions | `IFocasClient.cs:275-287` (`FocasAlarmType`), `FocasAlarmProjection.cs:149-175` |
| Driver.FOCAS.Cli-001 | Low | Resolved | Error handling & resilience | `Commands/WriteCommand.cs:58-68` |
| Driver.FOCAS.Cli-002 | Low | Resolved | Concurrency & thread safety | `Commands/SubscribeCommand.cs:45-51` |
| Driver.FOCAS.Cli-003 | Low | Resolved | Error handling & resilience | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) |
| Driver.FOCAS.Cli-004 | Low | Resolved | Performance & resource management | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` |
| Driver.FOCAS.Cli-005 | Low | Resolved | Design-document adherence | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) |
| Driver.Galaxy-005 | Low | Resolved | OtOpcUa conventions | `Runtime/EventPump.cs:81-88` |
| Driver.Galaxy-010 | Low | Resolved | Security | `GalaxyDriver.cs:311-341` |
| Driver.Galaxy-012 | Low | Resolved | Performance & resource management | `Runtime/SubscriptionRegistry.cs:65-67`, `GalaxyDriver.cs:538`, `GalaxyDriver.cs:675` |
| Driver.Galaxy-013 | Low | Resolved | Design-document adherence | `GalaxyDriver.cs:14-27`, `GalaxyDriver.cs:374-382`, `Config/GalaxyDriverOptions.cs:84-86` |
| Driver.Galaxy-017 | Low | Deferred | Design-document adherence | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (no source change), gateway proto contract |
| Driver.Galaxy-018 | Low | Resolved | Documentation & comments | `libs/README.md:32-37`, `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:40-47` |
| Driver.Historian.Wonderware-004 | Low | Resolved | Correctness and logic bugs | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` |
| Driver.Historian.Wonderware-005 | Low | Resolved | Concurrency and thread safety | `Backend/HistorianDataSource.cs:124`, `:126-127` |
| Driver.Historian.Wonderware-007 | Low | Resolved | Error handling and resilience | `Ipc/PipeServer.cs:70-75` |
| Driver.Historian.Wonderware-008 | Low | Resolved | Error handling and resilience | `Backend/HistorianDataSource.cs:301-307`, `:374-380` |
| Driver.Historian.Wonderware-010 | Low | Resolved | Performance and resource management | `Backend/HistorianConfiguration.cs:32-36`, `Backend/HistorianDataSource.cs` (all read methods) |
| Driver.Historian.Wonderware-011 | Low | Resolved | Design-document adherence | `Backend/HistorianDataSource.cs:9-12`, `Backend/IHistorianDataSource.cs:9-11`, `Backend/HistorianSample.cs:7-9`, `Backend/HistorianConfiguration.cs:7-9` |
| Driver.Historian.Wonderware-012 | Low | Resolved | Testing coverage | `Backend/HistorianDataSource.cs`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` |
| Driver.Historian.Wonderware.Client-003 | Low | Resolved | Concurrency & thread safety | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
| Driver.Historian.Wonderware.Client-004 | Low | Resolved | Concurrency & thread safety | `WonderwareHistorianClient.cs:203-267` |
| Driver.Historian.Wonderware.Client-006 | Low | Resolved | Error handling & resilience | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
| Driver.Historian.Wonderware.Client-008 | Low | Resolved | Security | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
| Driver.Historian.Wonderware.Client-010 | Low | Resolved | Documentation & comments | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
| Driver.Modbus-003 | Low | Resolved | Concurrency & thread safety | `ModbusDriver.cs:59,188,241,259,266,726,745,759` |
| Driver.Modbus-007 | Low | Resolved | Design-document adherence | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` |
| Driver.Modbus-008 | Low | Resolved | Documentation & comments | `ModbusDriver.cs:411-417,700-703,737-744` |
| Driver.Modbus-009 | Low | Resolved | Correctness & logic bugs | `ModbusDriver.cs:1160-1167`, `ModbusTcpTransport.cs:94-95` |
| Driver.Modbus-010 | Low | Resolved | Error handling & resilience | `ModbusDriver.cs:864-868`, `ModbusDriverOptions.cs:116-125` |
| Driver.Modbus-011 | Low | Resolved | Code organization & conventions | `ModbusDriver.cs:23-43,89-97,408-432` |
| Driver.Modbus-012 | Low | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/` |
| Driver.Modbus.Addressing-006 | Low | Resolved | Error handling & resilience | `ModbusAddressParser.cs:297-301` |
| Driver.Modbus.Addressing-007 | Low | Resolved | Design-document adherence | `ModbusDataType.cs:91-95`, `docs/v2/dl205.md` section Strings |
| Driver.Modbus.Addressing-009 | Low | Resolved | Documentation & comments | `ModbusModiconAddress.cs:55-64`, `ModbusModiconAddress.cs:104-110` |
| Driver.Modbus.Cli-003 | Low | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/ModbusCommandBase.cs:14-24` |
| Driver.Modbus.Cli-004 | Low | Resolved | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:61-67` |
| Driver.Modbus.Cli-005 | Low | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:21-54`; `Commands/ReadCommand.cs:46-75`; `Commands/WriteCommand.cs:54-89` |
| Driver.Modbus.Cli-006 | Low | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:35-53` |
| Driver.Modbus.Cli-007 | Low | Resolved | Design-document adherence | `docs/Driver.Modbus.Cli.md:124-156`; `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ReadCommand.cs` |
| Driver.Modbus.Cli-008 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests/` |
| Driver.OpcUaClient-011 | Low | Resolved | Documentation & comments | `OpcUaClientDriver.cs:1007-1015` |
| Driver.OpcUaClient-014 | Low | Resolved | Performance & resource management | `OpcUaClientDriver.cs:1138`, `:1314` |
| Driver.S7-003 | Low | Resolved | Correctness & logic bugs | `S7Driver.cs:172`, `S7Driver.cs:255` |
| Driver.S7-005 | Low | Resolved | OtOpcUa conventions | `S7Driver.cs:33`, `S7Driver.cs:433` |
| Driver.S7-009 | Low | Resolved | Error handling & resilience | `S7Driver.cs:392` |
| Driver.S7-010 | Low | Resolved | Performance & resource management | `S7Driver.cs:504` |
| Driver.S7-013 | Low | Resolved | Code organization & conventions | `S7DriverOptions.cs:90`, `S7Driver.cs:300` |
| Driver.S7.Cli-004 | Low | Resolved | Performance & resource management | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:36,53`, `Commands/ReadCommand.cs:45,54`, `Commands/WriteCommand.cs:51,60`, `Commands/SubscribeCommand.cs:39,73` |
| Driver.S7.Cli-005 | Low | Resolved | Code organization & conventions | `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` |
| Driver.S7.Cli-006 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/WriteCommandParseValueTests.cs` |
| Driver.S7.Cli-007 | Low | Resolved | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/SubscribeCommand.cs:45-51` |
| Driver.TwinCAT-004 | Low | Resolved | Correctness & logic bugs | `TwinCATDataType.cs:24-27` |
| Driver.TwinCAT-006 | Low | Resolved | OtOpcUa conventions | `TwinCATDriver.cs:406-411` |
| Driver.TwinCAT-014 | Low | Resolved | Design-document adherence | `TwinCATDriverOptions.cs:41-43`, `TwinCATDriverOptions.cs:57-62`, `AdsTwinCATClient.cs:145` |
| Driver.TwinCAT-015 | Low | Resolved | Code organization & conventions | `TwinCATDriver.cs:431-432` |
| Driver.TwinCAT-016 | Low | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` |
| Driver.TwinCAT.Cli-001 | Low | Resolved | Correctness & logic bugs | `TwinCATCommandBase.cs:23-24`, `Commands/SubscribeCommand.cs:23-24`, `Commands/BrowseCommand.cs:21-24` |
| Driver.TwinCAT.Cli-002 | Low | Resolved | Concurrency & thread safety | `Commands/SubscribeCommand.cs:46-58` |
| Driver.TwinCAT.Cli-003 | Low | Resolved | Error handling & resilience | `Commands/SubscribeCommand.cs:56-58` |
| Driver.TwinCAT.Cli-004 | Low | Resolved | Design-document adherence | `TwinCATCommandBase.cs:26-29`, `Commands/BrowseCommand.cs` |
| Driver.TwinCAT.Cli-005 | Low | Resolved | Code organization & conventions | `Commands/ProbeCommand.cs:23`, `Commands/ReadCommand.cs:20`, `Commands/WriteCommand.cs:20`, `Commands/SubscribeCommand.cs:18` |
| Driver.TwinCAT.Cli-006 | Low | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/WriteCommandParseValueTests.cs` |
| Driver.TwinCAT.Cli-007 | Low | Resolved | Documentation & comments | `TwinCATCommandBase.cs:31-36` |
| Server-004 | Low | Resolved | OtOpcUa conventions | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` |
| Server-006 | Low | Resolved | Concurrency & thread safety | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` |
| Server-008 | Low | Resolved | Error handling & resilience | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` |
| Server-012 | Low | Resolved | Performance & resource management | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` |
| Server-014 | Low | Resolved | Code organization & conventions | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` |
| Server-015 | Low | Resolved | Documentation & comments | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` |
| Driver.Galaxy-015 | ~~Medium~~ Low (re-triaged 2026-05-23) | Resolved | ~~Security~~ Documentation & comments (re-triaged 2026-05-23) | `libs/MxGateway.Client.dll`, `libs/MxGateway.Contracts.dll`, `libs/README.md` |
+20
View File
@@ -0,0 +1,20 @@
# Multi-stage build of OtOpcUa.Host targeting linux-x64. Used by docker-dev/docker-compose.yml
# to spin four host containers (admin-a, admin-b, driver-a, driver-b) from a single image —
# Compose drives OTOPCUA_ROLES + Cluster:* env per container to differentiate them.
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src
COPY . .
RUN dotnet restore ZB.MOM.WW.OtOpcUa.slnx
RUN dotnet publish src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj \
-c Release -o /app --no-restore
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS runtime
WORKDIR /app
COPY --from=build /app ./
EXPOSE 9000
EXPOSE 4053
EXPOSE 4840
ENTRYPOINT ["dotnet", "OtOpcUa.Host.dll"]
+62
View File
@@ -0,0 +1,62 @@
# docker-dev
Mac-friendly four-node OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up an Akka cluster + SQL Server + OpenLDAP + Traefik in front of two admin nodes.
## Stack
| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022 (`ConfigDb` backing store) | host `14330` → container `1433` |
| `ldap` | OpenLDAP with dev users `alice` / `bob` | host `3893` → container `1389` |
| `admin-a` | OtOpcUa.Host, `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` |
| `admin-b` | OtOpcUa.Host, `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` |
| `driver-a` | OtOpcUa.Host, `OTOPCUA_ROLES=driver` | host `4840` → container `4840` |
| `driver-b` | OtOpcUa.Host, `OTOPCUA_ROLES=driver` | host `4841` → container `4840` |
| `traefik` | Routes `:80` to whichever admin-* currently passes `/health/active` | host `80`, dashboard `8080` |
All six containers share an Akka cluster bound to port `4053` inside the Compose network. The Akka `PublicHostname` of each container matches its Compose service name; the seed-node list points at `admin-a` so the other three join via that.
## Bring up
```bash
# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build
# wait ~15 seconds for SQL to come up + the cluster to form
open http://localhost # Blazor admin UI via Traefik
open http://localhost:8080 # Traefik dashboard
```
The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache.
## Auth (dev only)
Use one of the LDAP dev users from `LDAP_USERS` in `docker-compose.yml`:
| Username | Password |
|---|---|
| `alice` | `alice123` |
| `bob` | `bob123` |
The compose mounts everyone into `ou=FleetAdmin` so the dev role mapping resolves to `FleetAdmin`.
## Tear down
```bash
docker compose -f docker-dev/docker-compose.yml down -v
```
The `-v` drops the SQL + LDAP volumes; remove it to keep ConfigDb state across restarts.
## Failover smoke
1. Watch the Traefik dashboard at `http://localhost:8080`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service.
2. `docker compose -f docker-dev/docker-compose.yml stop admin-a``admin-b` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `admin-b` once its `/health/active` returns 200.
3. `docker compose -f docker-dev/docker-compose.yml start admin-a``admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it.
## Notes
- This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings.
- The OPC UA driver endpoints (`opc.tcp://localhost:4840`, `opc.tcp://localhost:4841`) are reachable directly from the host — Traefik is only in front of the admin HTTP surface.
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.
+130
View File
@@ -0,0 +1,130 @@
# docker-dev/ — Mac-friendly four-node fleet for v2 development + manual UI exercise.
#
# Stack:
# sql SQL Server 2022 (ConfigDb backing store)
# ldap OpenLDAP with the dev users from C:\publish\glauth\auth.md mirrored in
# admin-a OtOpcUa.Host with OTOPCUA_ROLES=admin (cluster seed)
# admin-b OtOpcUa.Host with OTOPCUA_ROLES=admin (joins admin-a)
# driver-a OtOpcUa.Host with OTOPCUA_ROLES=driver (joins via admin-a)
# driver-b OtOpcUa.Host with OTOPCUA_ROLES=driver (joins via admin-a)
# traefik Routes :80 to whichever admin-* currently passes /health/active
#
# Usage:
# docker compose -f docker-dev/docker-compose.yml up -d --build
# open http://localhost # Blazor admin UI via Traefik
# open http://localhost:8080 # Traefik dashboard
#
# Tear-down: docker compose -f docker-dev/docker-compose.yml down -v
name: otopcua-dev
services:
sql:
image: mcr.microsoft.com/mssql/server:2022-latest
environment:
ACCEPT_EULA: "Y"
SA_PASSWORD: "OtOpcUa!Dev123"
MSSQL_PID: Developer
ports:
- "14330:1433"
healthcheck:
test: ["CMD-SHELL", "/opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P 'OtOpcUa!Dev123' -No -Q 'SELECT 1' || exit 1"]
interval: 10s
timeout: 5s
retries: 20
ldap:
image: bitnami/openldap:2.6
environment:
LDAP_ROOT: "dc=lmxopcua,dc=local"
LDAP_ADMIN_USERNAME: "admin"
LDAP_ADMIN_PASSWORD: "ldapadmin"
LDAP_USERS: "alice,bob"
LDAP_PASSWORDS: "alice123,bob123"
LDAP_USER_DC: "ou=FleetAdmin"
ports:
- "3893:1389"
admin-a: &otopcua-host
build:
context: ..
dockerfile: docker-dev/Dockerfile
image: otopcua-host:dev
depends_on:
sql: { condition: service_healthy }
environment:
OTOPCUA_ROLES: "admin"
ASPNETCORE_URLS: "http://+:9000"
ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
Cluster__Hostname: "0.0.0.0"
Cluster__Port: "4053"
Cluster__PublicHostname: "admin-a"
Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
Cluster__Roles__0: "admin"
Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
Security__Jwt__Issuer: "otopcua-dev"
Security__Jwt__Audience: "otopcua-dev"
Authentication__Ldap__Server: "ldap"
Authentication__Ldap__Port: "1389"
Authentication__Ldap__AllowInsecureLdap: "true"
admin-b:
<<: *otopcua-host
environment:
OTOPCUA_ROLES: "admin"
ASPNETCORE_URLS: "http://+:9000"
ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
Cluster__Hostname: "0.0.0.0"
Cluster__Port: "4053"
Cluster__PublicHostname: "admin-b"
Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
Cluster__Roles__0: "admin"
Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
Security__Jwt__Issuer: "otopcua-dev"
Security__Jwt__Audience: "otopcua-dev"
Authentication__Ldap__Server: "ldap"
Authentication__Ldap__Port: "1389"
Authentication__Ldap__AllowInsecureLdap: "true"
driver-a:
<<: *otopcua-host
environment:
OTOPCUA_ROLES: "driver"
ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
Cluster__Hostname: "0.0.0.0"
Cluster__Port: "4053"
Cluster__PublicHostname: "driver-a"
Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
Cluster__Roles__0: "driver"
ports:
- "4840:4840"
driver-b:
<<: *otopcua-host
environment:
OTOPCUA_ROLES: "driver"
ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
Cluster__Hostname: "0.0.0.0"
Cluster__Port: "4053"
Cluster__PublicHostname: "driver-b"
Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
Cluster__Roles__0: "driver"
ports:
- "4841:4840"
traefik:
image: traefik:v3.1
command:
- --entrypoints.web.address=:80
- --providers.file.filename=/etc/traefik/dynamic.yml
- --providers.file.watch=true
- --api.insecure=true
ports:
- "80:80"
- "8080:8080"
volumes:
- ./traefik-dynamic.yml:/etc/traefik/dynamic.yml:ro
depends_on:
- admin-a
- admin-b
+21
View File
@@ -0,0 +1,21 @@
# docker-dev companion to scripts/install/traefik-dynamic.yml. Same routing rules,
# but the upstream targets are the Compose service names (admin-a, admin-b) on
# port 9000 instead of the Windows hostnames a bare-metal deployment would use.
http:
routers:
otopcua-admin:
entryPoints: ["web"]
rule: "PathPrefix(`/`)"
service: otopcua-admin
services:
otopcua-admin:
loadBalancer:
servers:
- url: "http://admin-a:9000"
- url: "http://admin-b:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s
+40 -16
View File
@@ -149,53 +149,77 @@ otopcua-cli browse -u opc.tcp://localhost:4840/OtOpcUa -U admin -P admin123 -r -
### subscribe
Monitors a node for value changes using OPC UA subscriptions. Prints each data change notification with timestamp, value, and status code. Runs until Ctrl+C, then unsubscribes and disconnects cleanly.
Monitors a node (or every Variable in its subtree) for value changes using OPC UA subscriptions.
Prints each data-change notification with timestamp, value, and status code, then prints a
summary on exit. Exits on Ctrl+C, or automatically after `--duration` seconds.
```bash
# Subscribe to a single node
otopcua-cli subscribe -u opc.tcp://localhost:4840 -n "ns=2;s=MyNode" -i 500
# Browse a subtree and subscribe to every Variable, run for 60 seconds, write the summary to disk
otopcua-cli subscribe -u opc.tcp://localhost:4840 -n "ns=3;s=ZB" -r --max-depth 4 \
--duration 60 --quiet --summary-file C:\Temp\subscribe-summary.txt
```
| Flag | Description |
|------|-------------|
| `-n` / `--node` | Node ID to monitor (required) |
| `-i` / `--interval` | Sampling/publishing interval in milliseconds (default: 1000) |
| `-n` / `--node` | Node ID to monitor (required). When `--recursive` is set, this is the browse root. |
| `-i` / `--interval` | Sampling interval in milliseconds (default: 1000) |
| `-r` / `--recursive` | Browse recursively from `--node` and subscribe to every Variable found |
| `--max-depth` | Maximum recursion depth when `--recursive` is set (default: 10) |
| `-q` / `--quiet` | Suppress per-update output; only print the final summary |
| `--duration` | Auto-exit after N seconds and print the summary (0 = run until Ctrl+C, default: 0) |
| `--summary-file` | Also write the summary to this file path on exit |
#### Summary buckets
The summary prints per-node counts across these buckets:
- **Ever went BAD during window** — node received at least one notification whose status was not Good.
- **NEVER went bad (suspect)** — node received at least one notification and every one was Good.
- **Last status GOOD / NOT-GOOD** — final observed status across nodes that received any update.
- **No update received at all** — node was subscribed but no notification arrived during the window.
### historyread
Reads historical data from a node. Supports raw history reads and aggregate (processed) history reads.
`--start` and `--end` are parsed with `CultureInfo.InvariantCulture` and treated as UTC; supply
them in ISO 8601 UTC form (`YYYY-MM-DDTHH:MM:SSZ`) for unambiguous behaviour across hosts.
```bash
# Raw history
otopcua-cli historyread -u opc.tcp://localhost:4840/OtOpcUa \
-n "ns=1;s=TestMachine_001.TestHistoryValue" \
--start "2026-03-25" --end "2026-03-30"
--start "2026-03-25T00:00:00Z" --end "2026-03-30T00:00:00Z"
# Aggregate: 1-hour average
otopcua-cli historyread -u opc.tcp://localhost:4840/OtOpcUa \
-n "ns=1;s=TestMachine_001.TestHistoryValue" \
--start "2026-03-25" --end "2026-03-30" \
--start "2026-03-25T00:00:00Z" --end "2026-03-30T00:00:00Z" \
--aggregate Average --interval 3600000
```
| Flag | Description |
|------|-------------|
| `-n` / `--node` | Node ID to read history for (required) |
| `--start` | Start time, ISO 8601 or date string (default: 24 hours ago) |
| `--end` | End time, ISO 8601 or date string (default: now) |
| `--start` | Start time in ISO 8601 UTC format, e.g. `2026-01-15T08:00:00Z` (default: 24 hours ago) |
| `--end` | End time in ISO 8601 UTC format, e.g. `2026-01-15T09:00:00Z` (default: now) |
| `--max` | Maximum number of values (default: 1000) |
| `--aggregate` | Aggregate function: Average, Minimum, Maximum, Count, Start, End |
| `--aggregate` | Aggregate function: Average, Minimum, Maximum, Count, Start, End, StandardDeviation |
| `--interval` | Processing interval in milliseconds for aggregates (default: 3600000) |
#### Aggregate mapping
| Name | OPC UA Node ID |
|------|---------------|
| `Average` | `AggregateFunction_Average` |
| `Minimum` | `AggregateFunction_Minimum` |
| `Maximum` | `AggregateFunction_Maximum` |
| `Count` | `AggregateFunction_Count` |
| `Start` | `AggregateFunction_Start` |
| `End` | `AggregateFunction_End` |
| Name | Aliases | OPC UA Node ID |
|------|---------|---------------|
| `Average` | `avg` | `AggregateFunction_Average` |
| `Minimum` | `min` | `AggregateFunction_Minimum` |
| `Maximum` | `max` | `AggregateFunction_Maximum` |
| `Count` | | `AggregateFunction_Count` |
| `Start` | `first` | `AggregateFunction_Start` |
| `End` | `last` | `AggregateFunction_End` |
| `StandardDeviation` | `stddev`, `stdev` | `AggregateFunction_StandardDeviationSample` |
### alarms
+1 -1
View File
@@ -198,7 +198,7 @@ All times are in UTC. Invalid input turns red on blur.
| Option | Description |
|--------|-------------|
| Aggregate | Raw (default), Average, Minimum, Maximum, Count, Start, End |
| Aggregate | Raw (default), Average, Minimum, Maximum, Count, Start, End, Standard Deviation |
| Interval (ms) | Processing interval for aggregate queries (shown only for aggregates) |
| Max Values | Maximum number of raw values to return (default 1000) |
+3 -2
View File
@@ -4,8 +4,9 @@ Ad-hoc probe / read / write / subscribe tool for ControlLogix / CompactLogix /
Micro800 / GuardLogix PLCs, talking to the **same** `AbCipDriver` the OtOpcUa
server uses (libplctag under the hood).
Second of four driver test-client CLIs (Modbus → AB CIP → AB Legacy → S7 →
TwinCAT). Shares `Driver.Cli.Common` with the others.
Second of six driver test-client CLIs (Modbus → AB CIP → AB Legacy → S7 →
TwinCAT → FOCAS). Shares `Driver.Cli.Common` with the others; see
[DriverClis.md](DriverClis.md) for the authoritative roster.
## Build + run
+3
View File
@@ -95,6 +95,9 @@ PLC-managed — use with caution.
otopcua-ablegacy-cli subscribe -g ab://192.168.1.20/1,0 -a N7:10 -t Int -i 500
```
`-i` / `--interval-ms` is the publishing interval in milliseconds — default
`1000`. `PollGroupEngine` floors sub-250 ms values, so `-i 100` runs at 250 ms.
## Known caveat — ab_server upstream gap
The integration-fixture `ab_server` Docker container accepts TCP but its PCCC
+8
View File
@@ -122,6 +122,14 @@ gives plausible values is the correct one for that device.
## v2 addressing grammar
> **CLI scope:** the `read` / `write` / `subscribe` commands accept only
> the structured `--region` + `--address` + `--type` triple. The
> address-string grammar below is a **`DriverConfig` JSON** feature
> consumed by the driver itself; it is not reachable from this CLI's
> flags. To experiment with it via the CLI, use the structured flags;
> to deploy spreadsheets as-is, hand-author a `DriverConfig` and run
> the server.
The driver accepts the industry-standard tag-address grammar so you can
paste tag spreadsheets from Wonderware / Kepware / Ignition without
per-row manual translation. Full reference + grammar rules:
+6 -3
View File
@@ -9,10 +9,13 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess
## Platform overview
- **Core** owns the OPC UA stack, address space, session/security/subscription machinery.
> **v2 (2026-05-26):** the separate `OtOpcUa.Server` + `OtOpcUa.Admin` services fused into a single role-gated `OtOpcUa.Host` binary, joined by an Akka.NET cluster. See [v2 design](plans/2026-05-26-akka-hosting-alignment-design.md) for the architectural decision.
- **Core** owns shared abstractions (driver capability contracts, scripting, virtual tags, alarm historian).
- **Drivers** plug in via capability interfaces in `ZB.MOM.WW.OtOpcUa.Core.Abstractions`: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, `IHistoryProvider`, `IPerCallHostResolver`. Each driver opts into whichever it supports.
- **Server** is the OPC UA endpoint process (net10, AnyCPU). Hosts every driver in-process. The Galaxy driver reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo); it is no longer hosted from this repo.
- **Admin** is the Blazor Server operator UI (net10, x64). Owns the Config DB draft/publish flow, ACL + role-grant authoring, fleet status + `/metrics` scrape endpoint.
- **Host** (`src/Server/ZB.MOM.WW.OtOpcUa.Host`) is the single fused binary (.NET 10, AnyCPU). `OTOPCUA_ROLES` env decides what to mount: `admin` (Blazor + control-plane singletons), `driver` (OPC UA endpoint + per-node actors), or both. See [ServiceHosting.md](ServiceHosting.md).
- **Cluster + ControlPlane + Runtime + AdminUI + Security** sit between Core and Host. The cluster glues per-node actors into one logical fleet; the control-plane singletons (deploy coordinator, audit writer, redundancy state) live on the admin role-leader. See [Redundancy.md](Redundancy.md).
- The Galaxy driver still reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo).
## Where to find what
+64 -74
View File
@@ -1,103 +1,93 @@
# Redundancy
# Redundancy (v2)
## Overview
OtOpcUa supports OPC UA **non-transparent** warm/hot redundancy. Two (or more) OtOpcUa Server processes run side-by-side, share the same Config DB, the same driver backends (Galaxy ZB, MXAccess runtime, remote PLCs), and advertise the same OPC UA node tree. Each process owns a distinct `ApplicationUri`; OPC UA clients see both endpoints via the standard `ServerUriArray` and pick one based on the `ServiceLevel` that each server publishes.
OtOpcUa supports OPC UA **non-transparent** warm/hot redundancy. Two or more `OtOpcUa.Host` processes run side-by-side, share the same Config DB, and join the same Akka.NET cluster. Each process owns a distinct `ApplicationUri`; OPC UA clients see both endpoints via the standard `ServerUriArray` and pick one based on the `ServiceLevel` byte that each server publishes.
The redundancy surface lives in `src/Server/ZB.MOM.WW.OtOpcUa.Server/Redundancy/`:
> **v2 change.** v1's operator-managed `ClusterNode.RedundancyRole` column + `RedundancyCoordinator` / `ApplyLeaseRegistry` / `PeerHttpProbeLoop` are gone. Primary/secondary is now derived from **Akka cluster role-leader** for the `driver` role. The operator no longer writes a role into the DB; cluster topology + health drive ServiceLevel automatically.
| Class | Role |
|---|---|
| `RedundancyCoordinator` | Process-singleton; owns the current `RedundancyTopology` loaded from the `ClusterNode` table. `RefreshAsync` re-reads after `sp_PublishGeneration` so operator role swaps take effect without a process restart. CAS-style swap (`Interlocked.Exchange`) means readers always see a coherent snapshot. |
| `RedundancyTopology` | Immutable `(ClusterId, Self, Peers, ServerUriArray, ValidityFlags)` snapshot. |
| `ApplyLeaseRegistry` | Tracks in-progress `sp_PublishGeneration` apply leases keyed on `(ConfigGenerationId, PublishRequestId)`. `await using` the disposable scope guarantees every exit path (success / exception / cancellation) decrements the lease; a stale-lease watchdog force-closes any lease older than `ApplyMaxDuration` (default 10 minutes) so a crashed publisher can't pin the node at `PrimaryMidApply`. |
| `PeerReachabilityTracker` | Maintains last-known reachability for each peer node over two independent probes — OPC UA ping and HTTP `/healthz`. Both must succeed for `peerReachable = true`. |
| `RecoveryStateManager` | Gates transitions out of the `Recovering*` bands; requires dwell + publish-witness satisfaction before allowing a return to nominal. |
| `ServiceLevelCalculator` | Pure function `(role, selfHealthy, peerUa, peerHttp, applyInProgress, recoveryDwellMet, topologyValid, operatorMaintenance) → byte`. |
| `RedundancyStatePublisher` | Orchestrates inputs into the calculator, pushes the resulting byte to the OPC UA `ServiceLevel` variable via an edge-triggered `OnStateChanged` event, and fires `OnServerUriArrayChanged` when the topology's `ServerUriArray` shifts. |
The runtime pieces live in:
## Data model
Per-node redundancy state lives in the Config DB `ClusterNode` table (`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ClusterNode.cs`):
| Column | Role |
|---|---|
| `NodeId` | Unique node identity; matches `Node:NodeId` in the server's bootstrap `appsettings.json`. |
| `ClusterId` | Foreign key into `ServerCluster`. |
| `RedundancyRole` | `Primary`, `Secondary`, or `Standalone` (`RedundancyRole` enum in `Configuration/Enums`). |
| `ServiceLevelBase` | Per-node base value used to bias nominal ServiceLevel output. |
| `ApplicationUri` | Unique-per-node OPC UA ApplicationUri advertised in endpoint descriptions. |
`ServerUriArray` is derived from the set of peer `ApplicationUri` values at topology-load time and republished when the topology changes.
## ServiceLevel matrix
`ServiceLevelCalculator` produces one of the following bands (see `ServiceLevelBand` enum in the same file):
| Band | Byte | Meaning |
| Component | Project | Role |
|---|---|---|
| `Maintenance` | 0 | Operator-declared maintenance. |
| `NoData` | 1 | Self-reported unhealthy (`/healthz` fails). |
| `InvalidTopology` | 2 | More than one Primary detected; both nodes self-demote. |
| `RecoveringBackup` | 30 | Backup post-fault, dwell not met. |
| `BackupMidApply` | 50 | Backup inside a publish-apply window. |
| `IsolatedBackup` | 80 | Primary unreachable; Backup says "take over if asked" — does **not** auto-promote (non-transparent model). |
| `AuthoritativeBackup` | 100 | Backup nominal. |
| `RecoveringPrimary` | 180 | Primary post-fault, dwell not met. |
| `PrimaryMidApply` | 200 | Primary inside a publish-apply window. |
| `IsolatedPrimary` | 230 | Primary with unreachable peer, retains authority. |
| `AuthoritativePrimary` | 255 | Primary nominal. |
| `ServiceLevelCalculator` | `OtOpcUa.ControlPlane.Redundancy` | Pure function `(NodeHealthInputs) → byte`. No side effects. |
| `RedundancyStateActor` | `OtOpcUa.ControlPlane.Redundancy` | Admin-role cluster singleton; subscribes to cluster topology events, debounces 250ms, broadcasts `RedundancyStateChanged` on the `redundancy-state` DPS topic. |
| `DbHealthProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; runs `SELECT 1` against ConfigDb every 5s. Read by health endpoint + redundancy calc. |
| `PeerOpcUaProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; pings peer `opc.tcp://peer:4840` (real probe call is staged for follow-up F12). |
| `ClusterRoleInfo` | `OtOpcUa.Cluster` | Live view of cluster membership + role-leader; exposes `IClusterRoleInfo` to the rest of the host. |
The reserved bands (0 Maintenance, 1 NoData, 2 InvalidTopology) take precedence over operational states per OPC UA Part 5 §6.3.34. Operational values occupy 2..255 so spec-compliant clients that treat "<3 = unhealthy" keep working.
## ServiceLevel tiers (Part 5 §6.5)
Standalone nodes (single-instance deployments) report `AuthoritativePrimary` when healthy and `PrimaryMidApply` during publish.
`ServiceLevelCalculator.Compute(NodeHealthInputs)` returns a byte in 0..255 by tier:
## Publish fencing and split-brain prevention
| Tier | Byte | Condition |
|---|---|---|
| Down | 0 | Member status is not `Up` or `Joining` (leaving, removed, exiting). |
| Critically degraded | 100 | ConfigDb unreachable AND data is stale. |
| Stale | 200 | Data stale but ConfigDb reachable. |
| Healthy follower | 240 | DB ok + OPC UA probe ok + not stale. |
| Healthy leader | 250 | Healthy + this node is the `driver` role-leader. |
Any Admin-triggered `sp_PublishGeneration` acquires an apply lease through `ApplyLeaseRegistry.BeginApplyLease`. While the lease is held:
Drivers write their computed byte into the OPC UA `ServiceLevel` Variable on each refresh. Clients with the standard redundancy heuristic ("pick the highest ServiceLevel") therefore prefer the role-leader and fall back to followers on its degradation.
- The calculator reports `PrimaryMidApply` / `BackupMidApply` — clients see the band shift and cut over to the unaffected peer rather than racing against a half-applied generation.
- `RedundancyCoordinator.RefreshAsync` is called at the end of the apply window so the post-publish topology becomes visible exactly once, atomically.
- The watchdog force-closes any lease older than `ApplyMaxDuration`; a stuck publisher therefore cannot strand a node at `PrimaryMidApply`.
## Data flow
Because role transitions are **operator-driven** (write `RedundancyRole` in the Config DB + publish), the Backup never auto-promotes. An `IsolatedBackup` at 80 is the signal that the operator should intervene; auto-failover is intentionally out of scope for the non-transparent model (decision #154).
```
Cluster topology event ──┐
DB health probe ─────────┤
OPC UA peer probe ───────┤
RedundancyStateActor (admin singleton)
│ debounce 250ms
DPS topic "redundancy-state"
Driver nodes' OpcUaPublishActor
ServiceLevelCalculator → byte
OPC UA ServiceLevel Variable
```
## Metrics
The admin singleton is the cluster's only `RedundancyStateActor`. If the admin leader fails over, the new admin node spins up its replacement, re-subscribes to cluster events, and publishes a fresh snapshot from the current `Cluster.State`. There is no DB-persisted state to recover.
`RedundancyMetrics` in `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/RedundancyMetrics.cs` registers the `ZB.MOM.WW.OtOpcUa.Redundancy` meter on the Admin process. Instruments:
## Configuration
| Name | Kind | Tags | Description |
|---|---|---|---|
| `otopcua.redundancy.role_transition` | Counter<long> | `cluster.id`, `node.id`, `from_role`, `to_role` | Incremented every time `FleetStatusPoller` observes a `RedundancyRole` change on a `ClusterNode` row. |
| `otopcua.redundancy.primary_count` | ObservableGauge<long> | `cluster.id` | Primary-role nodes per cluster — should be exactly 1 in nominal state. |
| `otopcua.redundancy.secondary_count` | ObservableGauge<long> | `cluster.id` | Secondary-role nodes per cluster. |
| `otopcua.redundancy.stale_count` | ObservableGauge<long> | `cluster.id` | Nodes whose `LastSeenAt` exceeded the stale threshold. |
Per-node identity comes from `appsettings.json` + the `OTOPCUA_ROLES` env var:
Admin `Program.cs` wires OpenTelemetry to the Prometheus exporter when `Metrics:Prometheus:Enabled=true` (default), exposing the meter under `/metrics`. The endpoint is intentionally unauthenticated — fleet conventions put it behind a reverse-proxy basic-auth gate if needed.
```json
{
"Cluster": {
"Hostname": "0.0.0.0",
"Port": 4053,
"PublicHostname": "node-a.lan",
"SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
"Roles": ["admin", "driver"]
}
}
```
## Real-time notifications (Admin UI)
```
OTOPCUA_ROLES=admin,driver
```
`FleetStatusPoller` in `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Hubs/` polls the `ClusterNode` table, records role transitions, updates `RedundancyMetrics.SetClusterCounts`, and pushes a `RoleChanged` SignalR event onto `FleetStatusHub` when a transition is observed. `RedundancyTab.razor` subscribes with `_hub.On<RoleChangedMessage>("RoleChanged", …)` so connected Admin sessions see role swaps the moment they happen.
Both nodes share the same `ConfigDb` connection string; `Cluster.PublicHostname` + `Roles` are what makes them distinct in cluster gossip. The first node bootstraps the cluster (its address goes in `SeedNodes`); the second node joins via the same `SeedNodes` list.
## Configuring a redundant pair
There is no longer a `Node:NodeId` setting, no `ClusterNode.RedundancyRole`, no `ServiceLevelBase`. NodeId is derived as `host:port` of the cluster `PublicHostname` (see `ClusterRoleInfo.LocalNode` for the formula).
Redundancy is configured **in the Config DB, not appsettings.json**. The fields that must differ between the two instances:
## Split-brain
| Field | Location | Instance 1 | Instance 2 |
|---|---|---|---|
| `NodeId` | `appsettings.json` `Node:NodeId` (bootstrap) | `node-a` | `node-b` |
| `ClusterNode.ApplicationUri` | Config DB | `urn:node-a:OtOpcUa` | `urn:node-b:OtOpcUa` |
| `ClusterNode.RedundancyRole` | Config DB | `Primary` | `Secondary` |
| `ClusterNode.ServiceLevelBase` | Config DB | typically 255 | typically 100 |
`akka.conf` configures Akka's split-brain resolver with `active-strategy = keep-oldest`, `stable-after = 15s`, and `failure-detector.threshold = 10.0`. Under a clean partition: the oldest member stays up + the smaller (or younger) side downs itself within ~15 seconds. The `RedundancyStateActor` on the surviving partition re-computes from the post-partition `Cluster.State`.
Shared between instances: `ClusterId`, Config DB connection string, published generation, cluster-level ACLs, UNS hierarchy, driver instances.
Role swaps, stand-alone promotions, and base-level adjustments all happen through the Admin UI `RedundancyTab` — the operator edits the `ClusterNode` row in a draft generation and publishes. `RedundancyCoordinator.RefreshAsync` picks up the new topology without a process restart.
There is no operator-driven role swap during a partition. Failover is what the cluster does automatically.
## Client-side failover
The OtOpcUa Client CLI at `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md) for the command reference.
The OtOpcUa Client CLI at `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md).
## Depth reference
For the full decision trail and implementation plan — topology invariants, peer-probe cadence, recovery-dwell policy, compliance-script guard against enum-value drift — see `docs/v2/plan.md` §Phase 6.3.
For the full design — message contracts, tiered calculator truth table, recovery semantics — see `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §6.
+1 -1
View File
@@ -35,7 +35,7 @@ new ScriptedAlarmDefinition(
## Predicate evaluation
Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them. The known resource limits — unbounded script-side memory, the per-publish accretion of dynamically-emitted script assemblies (Core.Scripting-008), and the orphan-thread CPU-budget caveat — are documented in that file as well.
Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them. The known resource limits — unbounded script-side memory and the orphan-thread CPU-budget caveat — are documented in that file as well; per-publish assembly accretion was resolved by the Core.Scripting-008 collectible-`AssemblyLoadContext` rewrite and no longer requires periodic server restarts.
`AlarmPredicateContext` (`AlarmPredicateContext.cs`) is the script's `ScriptContext` subclass:
+55 -41
View File
@@ -1,62 +1,76 @@
# Service Hosting
# Service Hosting (v2)
## Overview
A production OtOpcUa deployment runs **two or three processes**, each
with a distinct runtime and install surface:
A production OtOpcUa deployment runs **one binary per node**, plus the optional Wonderware historian sidecar:
| Process | Project | Runtime | Platform | Responsibility |
|---|---|---|---|---|
| **OtOpcUa Server** | `src/Server/ZB.MOM.WW.OtOpcUa.Server` | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every driver in-process (Modbus, S7, AbCip, AbLegacy, TwinCAT, FOCAS, OPC UA Client, Galaxy via mxaccessgw); exposes `/healthz`. |
| **OtOpcUa Admin** | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` | .NET 10 (ASP.NET Core / Blazor Server) | x64 | Operator UI for Config DB editing + fleet status, SignalR hubs (`FleetStatusHub`, `AlertHub`), Prometheus `/metrics`. |
| **OtOpcUa Wonderware Historian** *(optional)* | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` | .NET Framework 4.8 | x86 (32-bit) | Out-of-process sidecar exposing the Wonderware Historian SDK over a named pipe. Required only when `Historian:Wonderware:Enabled=true` in `appsettings.json`. |
| **OtOpcUa Host** | `src/Server/ZB.MOM.WW.OtOpcUa.Host` | .NET 10 | AnyCPU | Single fused binary. `OTOPCUA_ROLES` env decides what to mount: `admin` (Blazor + auth + control-plane singletons), `driver` (OPC UA endpoint + per-driver actors), or both. |
| **OtOpcUa Wonderware Historian** *(optional)* | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` | .NET Framework 4.8 | x86 (32-bit) | Out-of-process sidecar exposing the Wonderware Historian SDK over a named pipe. Required only when `Historian:Wonderware:Enabled=true`. |
Galaxy access uses a separately-installed **mxaccessgw** running out
of a sibling repo (`c:\Users\dohertj2\Desktop\mxaccessgw\`) — see
`docs/v2/Galaxy.ParityRig.md` for setup. The mxaccessgw owns the
MXAccess COM bitness constraint (its worker is x86 net48); nothing
in the OtOpcUa repo carries that constraint anymore. PR 7.2 retired
the legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared`
projects + the `OtOpcUaGalaxyHost` Windows service.
Galaxy access still uses the separately-installed **mxaccessgw** sidecar (see `docs/v2/Galaxy.ParityRig.md`); the gateway owns the MXAccess COM bitness constraint (its worker is x86 net48). Nothing in the OtOpcUa repo carries that constraint anymore.
## OtOpcUa Server
> **v2 change.** v1's separate `OtOpcUa.Server` + `OtOpcUa.Admin` Windows services merged into a single role-gated `OtOpcUa.Host` binary. Two installers became one (with a `-Roles` parameter). The whole DI graph is composed in `OtOpcUa.Host/Program.cs`; per-role wiring is conditional on the env var.
Hosted via `Microsoft.Extensions.Hosting` with `AddWindowsService`
(decision #30 — replaced TopShelf in v2). The host's `Build()`
returns immediately when launched interactively (e.g. `dotnet run`)
but blocks for SCM signals when running as a Windows service.
## Role gating
In-process drivers are registered at startup in `Program.cs`'s
`DriverFactoryRegistry` block; the `DriverInstance` rows in the
central Config DB select which driver factories materialise into
live `IDriver` instances. See `docs/v2/driver-specs.md` for the
per-driver `DriverConfig` JSON shapes.
`Program.cs` reads `OTOPCUA_ROLES`, parses it with `RoleParser`, and conditionally registers services:
## OtOpcUa Admin
| Role present | Wires |
|---|---|
| `admin` | `AddOtOpcUaAuth`, `AddAdminUI`, `AddSignalR`, `AddOtOpcUaAdminClients`, `MapOtOpcUaAuth`, `MapAdminUI<App>`, `MapOtOpcUaHubs`, `WithOtOpcUaControlPlaneSingletons` (5 admin singletons via `Akka.Hosting`) |
| `driver` | `WithOtOpcUaRuntimeActors` (DriverHostActor + DbHealthProbeActor) — and the OPC UA endpoint on port 4840 |
| Either / both | `AddOtOpcUaConfigDb`, `AddOtOpcUaCluster`, `AddOtOpcUaHealth` (`/health/ready`, `/health/active`, `/healthz`) |
Same hosting model; runs the Blazor Server UI + SignalR hubs.
Reads from the same Config DB the Server writes to.
Single-node dev: `OTOPCUA_ROLES=admin,driver`. Production: typically two admin nodes (HA pair) + N driver nodes.
## Akka cluster
The host joins an Akka.NET cluster bound to the address in `appsettings.json::Cluster`:
```json
{
"Cluster": {
"Hostname": "0.0.0.0",
"Port": 4053,
"PublicHostname": "node-a.lan",
"SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
"Roles": ["admin", "driver"]
}
}
```
- `WithOtOpcUaClusterBootstrap` (in `OtOpcUa.Cluster`) loads the embedded HOCON (split-brain resolver, pinned dispatcher, failure detector tuning) and overlays remote endpoint + cluster options.
- All cluster singletons + per-node actors live on this single ActorSystem — there is no second Akka instance.
See [Redundancy.md](Redundancy.md) for the role-leader + ServiceLevel story.
## Health endpoints
Both admin and driver nodes expose:
| Path | Status meaning |
|---|---|
| `/healthz` | Process alive. |
| `/health/ready` | ConfigDb reachable + cluster member state is `Up`. |
| `/health/active` | Admin-role leader (the node Traefik or an HA LB should route traffic to). |
Used by Traefik for the active-leader-only routing pattern (see [Task 63 traefik docs](v2/Architecture-v2.md) — TODO).
## OtOpcUa Wonderware Historian (optional)
When `Historian:Wonderware:Enabled=true`, the Server speaks to a
sidecar that wraps the Wonderware Historian SDK (which is .NET
Framework only). The pipe IPC contract is in
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/Contracts/`
and the sidecar's pipe handler lives at
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Pipe/`.
Install via the `-InstallWonderwareHistorian` switch on
`scripts/install/Install-Services.ps1`.
Unchanged from v1. Pipe IPC contract lives in `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/Contracts/`; sidecar pipe handler in `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Pipe/`. Install via `scripts/install/Install-Services.ps1 -InstallWonderwareHistorian`.
## Install / Uninstall
- `scripts/install/Install-Services.ps1` — installs `OtOpcUa` and
optionally `OtOpcUaWonderwareHistorian`.
- `scripts/install/Uninstall-Services.ps1` — stops + removes both,
plus `OtOpcUaGalaxyHost` if a pre-7.2 rig still carries it.
- `scripts/install/Install-Services.ps1 -Roles admin,driver` — installs `OtOpcUaHost`. v2 rewrite tracked as plan Task 62.
- `scripts/install/Uninstall-Services.ps1` — stops + removes the host service (and the historian sidecar if installed).
## Logging
Serilog with rolling-daily file sinks. Each service writes to
`%ProgramData%\OtOpcUa\<service>-*.log` plus stdout (NSSM-friendly).
Serilog with rolling-daily file sinks. Each host writes to `logs/otopcua-*.log` plus stdout (NSSM/systemd-friendly). Per-environment log level overrides go in `appsettings.{Environment}.json`.
## Depth reference
For the full host-architecture rationale (why fused vs. split, role-gating tradeoffs, multi-node deployment shapes), see `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §3-4.
+5 -3
View File
@@ -6,7 +6,7 @@ The runtime is split across two projects: `Core.Scripting` holds the Roslyn sand
## Roslyn script sandbox (`Core.Scripting`)
User scripts are compiled via `Microsoft.CodeAnalysis.CSharp.Scripting` against a `ScriptContext` subclass. `ScriptGlobals<TContext>` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
User scripts are compiled via `Microsoft.CodeAnalysis.CSharp` (regular compiler, not the scripting variant — the original `CSharpScript` pipeline was retired by the Core.Scripting-008 / -016 rewrite, see "Compile cache" below). Each script's source is pasted as the body of a synthesized `CompiledScript.Run(ScriptGlobals<TContext>)` method against a `ScriptContext` subclass. `ScriptGlobals<TContext>` exposes the context as a field named `ctx`, so scripts read `ctx.GetTag("...")` / `ctx.SetVirtualTag("...", ...)` / `ctx.Now` / `ctx.Logger` and return a value.
### Compile pipeline (`ScriptEvaluator<TContext, TResult>`)
@@ -30,11 +30,13 @@ Similarly, **`System.Threading.Tasks` is now denied** (Core.Scripting-003), whic
`ConcurrentDictionary<string, Lazy<ScriptEvaluator<...>>>` keyed on `SHA-256(UTF8(source))` rendered to hex. `Lazy<T>` with `ExecutionAndPublication` mode means two threads racing a miss compile exactly once. Failed compiles evict the entry (via the `TryRemove(KeyValuePair<,>)` overload so a concurrently re-added retry entry is not collateral damage — Core.Scripting-006) so a corrected retry can succeed (used during Admin UI authoring). No capacity bound — scripts are operator-authored and bounded by the config DB. Whitespace changes miss the cache on purpose. `Clear()` is called on config-publish.
**Per-publish assembly accretion (accepted limitation, Core.Scripting-008).** Each compiled `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>` delegate, which keeps the dynamically-emitted script assembly loaded for the process lifetime. Emitted assemblies in the default `AssemblyLoadContext` cannot be unloaded; `CompiledScriptCache.Clear()` drops the dictionary entries but does **not** unload the underlying assemblies. Across many config-publish generations (each `Clear()` followed by recompiling every script), the process accumulates dead script assemblies. For the expected "low thousands" of scripts this is benign, but a long-running server with very frequent publishes will see steady managed-memory growth that does not return until the process restarts. Out-of-process script evaluation or a collectible `AssemblyLoadContext` is a v3 concern; deployments with high-publish-frequency requirements should schedule a periodic server restart to reclaim the accrued assemblies.
**Per-publish assembly unload (Core.Scripting-008 resolved).** Each compiled `ScriptEvaluator` emits its script into a dedicated **collectible** `AssemblyLoadContext` — the BCL escape hatch for assemblies that can be unloaded. The compile path is hand-rolled `CSharpCompilation.Create` + `Emit(MemoryStream)` + `ScriptAssemblyLoadContext.LoadFromStream` rather than the legacy `CSharpScript.CreateDelegate` (which emits into the default ALC and cannot be unloaded). `ScriptEvaluator.Dispose()` calls `AssemblyLoadContext.Unload()` and `CompiledScriptCache.Clear()` disposes every materialised evaluator before dropping its dictionary entry, so the emitted assemblies become eligible for GC immediately after a config-publish. The reclaim is GC-timing-sensitive (Unload is *eligible-for-collection*, not synchronous); the next collection cycle reclaims them. Regression tests `Dispose_unloads_compiled_script_assembly_load_context` and `Clear_disposes_every_materialised_evaluator` in `CompiledScriptCacheTests` lock this contract via `WeakReference` + `GC.Collect()` assertions. Server restarts are no longer required to reclaim compiled-script memory.
**Scripting authoring convention.** With the collectible-ALC rewrite, the wrapper around a user script is an ordinary C# static method, not a Roslyn `Script` submission. The script body is pasted verbatim as the method body and must therefore end with an explicit `return …;` per ordinary C# rules — the legacy `CSharpScript` "last expression yields result" shorthand is gone. Every script in the existing test corpus already uses explicit `return`; this convention is operator-visible only when authoring a brand-new script from scratch.
### Per-evaluation timeout (`TimedScriptEvaluator<TContext, TResult>`)
Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying `ScriptRunner` keeps running on its thread-pool thread until the Roslyn runtime returns (documented trade-off; out-of-process evaluation is a v3 concern).
Wraps `ScriptEvaluator` with a wall-clock budget. Default `DefaultTimeout = 250ms`. Implementation pushes the inner `RunAsync` onto `Task.Run` (so a CPU-bound script can't hog the calling thread before `WaitAsync` registers its timeout) then awaits `runTask.WaitAsync(Timeout, ct)`. A `TimeoutException` from `WaitAsync` is wrapped as `ScriptTimeoutException`. Caller-supplied `CancellationToken` cancellation wins over the timeout and propagates as `OperationCanceledException` — so a shutdown cancel is not misclassified. **Known leak:** when a CPU-bound script times out, the underlying compiled-script delegate keeps running on its `Task.Run` thread-pool thread until it returns of its own accord (the CT is checked only at evaluator entry; once the script body is running, only the script returning or throwing will release the thread). The post-rewrite delegate is a regular C# `Func<>` bound to the synthesized `CompiledScript.Run` method, so this is a vanilla "synchronous CPU-bound work on a pool thread" leak rather than anything Roslyn-specific. Documented trade-off; out-of-process evaluation is a v3 concern.
### Script logger plumbing
@@ -0,0 +1,451 @@
# OtOpcUa v2 — Akka.NET + Fused Hosting Alignment with ScadaLink
**Status:** Design approved, ready for implementation planning
**Date:** 2026-05-26
**Branch:** `v2-akka-fuse`
**Sister project reference:** `~/Desktop/scadalink-design` (ScadaLink)
## 1. Motivation
OtOpcUa today runs as three separate processes (`OtOpcUa.Server` OPC UA host, `OtOpcUa.Admin` Blazor Server web UI, optional `OtOpcUaWonderwareHistorian` Framework sidecar) with manual operator-driven warm-redundancy failover. The sister project ScadaLink — owned by the same developer — solved similar problems with a fused single-binary, role-gated hosting model on top of an Akka.NET cluster.
The motivation for this refactor is twofold:
1. **Consistency.** A single developer (the project owner) moves between OtOpcUa and ScadaLink frequently. Sharing patterns — hosting, auth, actor hierarchy, deployment model — reduces cognitive overhead and makes fixes portable.
2. **Real HA improvements.** Upgrade OtOpcUa's manual operator-driven failover to automatic, Akka-cluster-driven failover with Traefik routing for the web UI. Preserve OPC UA dual-endpoint client-side failover semantics (clients connect to both nodes and pick based on `ServiceLevel`), now driven automatically by Akka cluster leadership.
## 2. Architecture overview
**One binary, role-gated.** `OtOpcUa.Host` (Microsoft.NET.Sdk.Web, .NET 10) replaces `OtOpcUa.Server` and `OtOpcUa.Admin`. Same binary on every node. Role configured via `OTOPCUA_ROLES` environment variable.
**Two Akka roles, single cluster:**
- **`admin`** — hosts Blazor web UI + cluster singletons. Singletons pinned via `ClusterSingletonManagerSettings.WithRole("admin")`. Traefik routes `/` to whichever Admin-role node `/health/active` reports as leader.
- **`driver`** — hosts OPC UA endpoint + per-node `DriverHostActor` hierarchy. Every Driver-role node always serves OPC UA; `ServiceLevel` computed by `RedundancyStateActor` is broadcast back to each Driver node and used to publish to the local OPC UA address space.
Roles are additive: `OTOPCUA_ROLES=admin`, `OTOPCUA_ROLES=driver`, or `OTOPCUA_ROLES=admin,driver`. Small deployments run both roles on both nodes; larger deployments separate them.
**Per-role leadership.** `Cluster.Get(system).State.RoleLeader("driver")` drives OPC UA `ServiceLevel`. `RoleLeader("admin")` drives `/health/active` (Traefik routing). These are independent — admin and driver leadership can land on different nodes if separated.
**Cluster membership.** Both seed nodes; keep-oldest split-brain resolver; `down-if-alone = on`; 15s stable-after; 2s heartbeat / 10s threshold. CoordinatedShutdown for graceful singleton handover. Exact ScadaLink tuning.
**OPC UA dual-endpoint preserved.** Driver-role nodes all bind `opc.tcp://0.0.0.0:4840`. Clients still see N endpoints in `ServerUriArray` and fail over via `ServiceLevel`. OPC UA spec compliance unchanged from today.
**Mac dev:** role `admin,driver,dev``dev` short-circuits Windows-only driver registration (Galaxy, Wonderware) with explicit `[DEV-STUB]` log lines.
## 3. Project & process restructure
Single solution, ScadaLink-style folder layout. Existing OtOpcUa naming convention (`ZB.MOM.WW.OtOpcUa.*`) preserved.
### New entry point & deletions
| Action | Project |
|---|---|
| **New** | `OtOpcUa.Host``Microsoft.NET.Sdk.Web`, single Program.cs, role-gated startup, `AddWindowsService` |
| **Delete** | `OtOpcUa.Server` (content migrates) |
| **Delete** | `OtOpcUa.Admin` (UI moves to library) |
### New libraries
| Project | Owns | ScadaLink analog |
|---|---|---|
| `OtOpcUa.Commons` | Entity POCOs, interfaces, message contracts (`Types/`, `Interfaces/`, `Entities/`, `Messages/`) | `ScadaLink.Commons` |
| `OtOpcUa.ConfigDb` | EF Core `DbContext`, repositories, `IAuditService`, migrations, Data Protection key store | `ScadaLink.ConfigurationDatabase` |
| `OtOpcUa.Cluster` | Akka HOCON, `AkkaHostedService`, split-brain resolver config, role-aware membership helpers, `IClusterRoleInfo` | (split out of ScadaLink Host) |
| `OtOpcUa.Security` | LDAP bind, cookie+JWT hybrid, JWT issuance, role mapping, `/auth/login`, `/auth/ping` endpoints | `ScadaLink.Security` |
| `OtOpcUa.ControlPlane` | Cluster singletons: `ConfigPublishCoordinator`, `AdminOperationsActor`, `AuditWriterActor`, `FleetStatusBroadcaster`, `RedundancyStateActor` | `ScadaLink.ManagementService` |
| `OtOpcUa.Runtime` | Per-node actors: `DriverHostActor`, `DriverInstanceActor`, `VirtualTagActor`, `ScriptedAlarmActor`, `OpcUaPublishActor`, `HistorianAdapterActor`, `PeerOpcUaProbeActor`, `DbHealthProbeActor` | `ScadaLink.SiteRuntime` |
| `OtOpcUa.OpcUaServer` | OPC UA app host, address-space build, `Phase7Composer` extraction | (in ScadaLink.SiteRuntime/DCL) |
| `OtOpcUa.AdminUI` | Blazor components, hubs (`FleetStatusHub`, `AlertHub`, `ScriptLogHub`), auth state provider, `MapAdminUI<TApp>()` | `ScadaLink.CentralUI` |
### Unchanged
- Driver projects (`OtOpcUa.Driver.Galaxy`, `.Modbus`, `.S7`, `.AbCip`, `.AbLegacy`, `.TwinCAT`, `.FOCAS`, `.OpcUaClient`) — still implement `IDriver`, now consumed by `DriverInstanceActor` instead of `DriverInstanceBootstrapper`.
- `OtOpcUa.Driver.Historian.Wonderware` — .NET Framework 4.8 sidecar, named-pipe IPC, wrapped by a `HistorianAdapterActor` in `OtOpcUa.Runtime`.
- `mxaccessgw` sibling repo — unchanged; Galaxy driver still talks gRPC to it.
### Tests
- `tests/OtOpcUa.Cluster.Tests` — split-brain, leadership transitions
- `tests/OtOpcUa.ControlPlane.Tests` — singleton actor unit tests via Akka.TestKit
- `tests/OtOpcUa.Runtime.Tests` — per-node actor tests, driver lifecycle
- `tests/OtOpcUa.Security.Tests` — LDAP, cookie+JWT roundtrip
- `tests/OtOpcUa.Host.IntegrationTests` — 2-node in-process cluster, deployment flow, failover, Mac-safe
- `tests/OtOpcUa.OpcUa.IntegrationTests` — real OPCFoundation client against stubbed Host
- `tests/OtOpcUa.E2E.Tests` — full stack with Traefik (nightly CI)
### Deploy
- `deploy/Install-Services.ps1` — installs one Windows Service per node (`OtOpcUaHost`), passes role via env var. Old script replaced.
- `deploy/traefik/` — Windows Traefik config + service registration for the leader-routed `/health/active`.
- `docker-dev/` (new, optional) — 2-node Mac dev compose with stubbed drivers + LDAP + SQL Server + Traefik.
Solution file: `OtOpcUa.slnx` (matches ScadaLink convention; switch from current `.sln`).
## 4. Actor hierarchy
### Per-node tree
Rooted under `OtOpcUa.Runtime`, one tree per Driver-role node:
```
DriverHostActor (per-node coordinator, started by Host)
├─ DriverInstanceActor (per DriverInstance row)
│ └─ children = pooled or per-subscription work
├─ VirtualTagActor (per VirtualTag row)
├─ ScriptedAlarmActor (per ScriptedAlarm row)
├─ OpcUaPublishActor (per-node bridge to OPCFoundation address space)
├─ HistorianAdapterActor (per-node, wraps Wonderware named-pipe sidecar)
├─ PeerOpcUaProbeActor (per-node, tests peer OPC UA stack health)
└─ DbHealthProbeActor (per-node, cached DB health probe)
```
### Cluster singletons
Pinned to `admin` role via `ClusterSingletonManagerSettings.WithRole("admin")`:
| Actor | Owns | Notes |
|---|---|---|
| `ConfigPublishCoordinator` | The deploy protocol. Writes `Deployment` row, broadcasts `DispatchDeployment(deploymentId)` via `DistributedPubSub` to every `DriverHostActor`, tracks apply ACKs per node. | Replaces `ApplyLeaseRegistry`. Resumes after failover by re-reading ConfigDb state — no Akka.Persistence. |
| `AdminOperationsActor` | All mutating admin ops (CRUD on equipment, drivers, scripts, namespaces, ACLs). Wraps each in an audit envelope. | UI calls via `ClusterSingletonProxy` (in-process when UI is on Admin node). |
| `AuditWriterActor` | Receives `AuditEvent` telemetry from any node, batch-inserts into `ConfigAuditLog`. | Idempotent on `EventId`. |
| `FleetStatusBroadcaster` | Aggregates Akka cluster member events + per-node `DriverHostStatus` heartbeats. Publishes diffs to `IHubContext<FleetStatusHub>` and `IHubContext<AlertHub>`. | Push-driven; replaces today's 5s `FleetStatusPoller`. |
| `RedundancyStateActor` | Subscribes to `ClusterEvent.IMemberEvent` + `ClusterEvent.LeaderChanged` + per-node health probes. Computes `ServiceLevel` byte + `ServerUriArray` per Driver node. Publishes to `DistributedPubSub` topic `redundancy-state`. | Source of truth for OPC UA redundancy. Local `OpcUaPublishActor` subscribes and writes to its OPCFoundation stack. |
### Supervision
| Actor | Strategy |
|---|---|
| `DriverHostActor` | `Resume` |
| `DriverInstanceActor` | `Restart` with backoff (1s → 30s, ×1.5, jitter) |
| `VirtualTagActor` | `Restart` with backoff |
| `ScriptedAlarmActor` | `Restart` with backoff; preserve alarm state via `PreRestart` hook |
| `OpcUaPublishActor` | `Resume` |
| `HistorianAdapterActor` | `Restart` with backoff; SQLite store-and-forward buffers during pipe outage |
| All singletons | `Resume`; resumable state in ConfigDb |
| Script execution actors (short-lived) | `Stop` on failure |
### State machines
- `DriverInstanceActor` — Become/Stash for `Connecting → Connected → Reconnecting → Failed`. Bad-quality publish on disconnect; transparent re-subscribe on reconnect. Write failures returned synchronously via `Ask` from `OpcUaPublishActor`.
- `ConfigPublishCoordinator``Idle → Publishing → AwaitingApplyAcks → Sealed`, with timeout-driven escalation if a node fails to ack within `ApplyMaxDuration` (default 10 min).
- `RedundancyStateActor` — recomputes on every membership event, debounced 250ms to coalesce bursts.
### Communication conventions
- **Tell** for hot-path internal traffic (driver values, alarm state changes, publish broadcasts).
- **Ask** only at system boundaries (UI controller → `AdminOperationsActor`, with explicit timeout + cancellation token).
- **DistributedPubSub** for cluster-wide broadcasts (`DispatchDeployment`, `RedundancyStateChanged`, `FleetStatusChanged`).
- Application-level **correlation IDs** on every request/response message.
- Messages live in `OtOpcUa.Commons.Messages.{Drivers,Deploy,Admin,Audit,Redundancy}` — additive-only evolution.
### Singleton persistence
No Akka.Persistence. Each singleton reads its resumable state from `ConfigDb` on `PreStart` (e.g., `ConfigPublishCoordinator` reads the current in-flight `Deployment` row + per-node `NodeDeploymentState`) and writes on every state transition.
### Mac-dev stubs
`DevNode` role short-circuits driver registration. `DriverInstanceActor` for any Galaxy/Wonderware row enters a `Stubbed` Become state that returns deterministic test values. Logged at INFO with `[DEV-STUB] driver={Name} reason=windows-only`.
## 5. Web hosting, auth, and SignalR
### Kestrel startup gated by `admin` role
`Program.cs` builds `WebApplicationBuilder`, registers all services, but only calls `app.MapBlazor<App>()`, `app.MapHub<...>()`, `app.MapStaticAssets()`, and auth endpoints when `admin ∈ roles`. Driver-only nodes still bind Kestrel for `/healthz` on `:4841` and nothing else.
### Authentication — cookie+JWT hybrid
| Layer | Config |
|---|---|
| Cookie scheme | `OtOpcUa.Auth`, HttpOnly, SameSite=Strict, Secure (prod) / SameAsRequest (dev). Sliding 30-min idle timeout. |
| Embedded JWT | HMAC-SHA256, 15-min expiry, claims = `sub`, `roles`, `nodeAcls`. |
| LDAP bind | `LdapAuthService.AuthenticateAsync(user, pw)` at `/auth/login` POST — preserved from current `OtOpcUa.Admin/Security`. |
| Role mapping | `RoleMapper.MapGroupsToRolesAsync()` — LDAP groups → `FleetAdmin` / `ConfigEditor` / `ReadOnly`. Stays as-is. |
| Token issuance | `/auth/token` returns bearer for external clients (CLI, automation). |
| Circuit expiry probe | `/auth/ping` returns 200/401, polled by `CookieAuthenticationStateProvider` to detect expiry from inside a SignalR circuit. |
| Failure mode | LDAP unreachable → new logins fail, active sessions continue. |
### Data Protection keys
`services.AddDataProtection().PersistKeysToDbContext<OtOpcUaConfigDbContext>().SetApplicationName("OtOpcUa")` — keys live in `ConfigDb` so a circuit started on Admin-node A survives if Traefik fails over to Admin-node B mid-session.
### SignalR hubs
Three existing hubs preserved (`/hubs/fleet`, `/hubs/alerts`, `/hubs/script-log`):
- **Today:** `FleetStatusPoller` polls SQL every 5s.
- **New:** `FleetStatusBroadcaster` singleton receives Akka cluster events + per-node telemetry, pushes diffs via `IHubContext<FleetStatusHub>`. No polling.
- `HubTokenService` bearer-token fallback retired — hubs are circuit-local, cookie auth flows through SignalR natively. External hub consumers use the bearer token from `/auth/token` with a `JwtBearer` authentication scheme declaration on the hub.
### UI → backend wiring
- **Reads:** Blazor components inject scoped repositories from DI and read directly from `ConfigDb`. No change from today.
- **Writes / mutating ops:** Components inject `IAdminOperationsClient` — a thin wrapper around `ClusterSingletonProxy` to `AdminOperationsActor`. Mutations are `Ask` with a 10s timeout + correlation ID. Audit envelope built UI-side, completed singleton-side.
- **Driver diagnostics:** Today's `DriverDiagnosticsClient` HTTP round-trip retires. UI components ask `IFleetDiagnosticsClient` which delegates to `ClusterClientReceptionist`-published actor messages.
### Health endpoints
| Endpoint | Returns | Used by |
|---|---|---|
| `/health/ready` | 200 once Akka member is `Up` + ConfigDb reachable + DataProtection key ring loaded | Service supervisor readiness gate |
| `/health/active` | 200 only on the Admin-role leader; 503 elsewhere | Traefik — routes browser traffic to leader |
| `/healthz` (existing) | 200 when Driver-role actor system is up + at least one driver registered (preserved on `:4841`) | Ops probes, OPC UA monitoring tools |
### Traefik
Windows Service (or external box). One route: `host=otopcua.*` → load-balance to `{admin-node-a:9000, admin-node-b:9000}` with `/health/active` health check, sticky sessions disabled (DataProtection key sharing handles continuity).
### appsettings structure
Mirrors ScadaLink's per-component options pattern: `Cluster:`, `Security:`, `ConfigDb:`, `OpcUa:`, `Drivers:`, `Historian:` sections, bound to options classes owned by their respective component projects.
## 6. Edit + Deploy flow (replaces draft/publish generations)
The single most consequential domain change: **drop the draft/publish `ConfigGeneration` lifecycle**. Edits are live; deploy is a snapshot+push, ScadaLink-style.
### Edit model
- `Equipment`, `Driver`, `DriverInstance`, `Namespace`, `UnsItem`, `Script`, `VirtualTag`, `ScriptedAlarm`, `NodeAcl` are edited **directly** via `AdminOperationsActor`. No draft staging, no `ConfigGeneration` lifecycle. Last-write-wins per row (rowversion column for stale-write detection only).
- Live edits do **not** affect running Driver-role nodes — running stacks reflect the *last-deployed* state. The UI shows a "drift" indicator when live ConfigDb state differs from last sealed deployment.
- Validation runs on edit (semantic checks: driver tag-path validity, script syntax, namespace name uniqueness) — pulled forward from deploy-time to edit-time.
### Deploy model
```
Admin UI "Deploy" → AdminOperationsActor.Ask(StartDeployment)
AdminOperationsActor:
→ snapshot ConfigDb current state
→ ConfigComposer.Flatten() → DeploymentArtifact
→ compute RevisionHash = SHA256(canonical-serialized artifact)
→ write Deployment row (DeploymentId GUID, RevisionHash, CreatedBy, CreatedAtUtc, Status=Dispatching)
→ Ask ConfigPublishCoordinator.DispatchDeployment(deploymentId)
ConfigPublishCoordinator (cluster singleton, admin role):
→ write Deployment.Status = Dispatching
→ DistributedPubSub Publish to "deployments" topic: DispatchDeployment(deploymentId, revisionHash)
→ schedule ApplyDeadline timer (ApplyMaxDuration, default 10 min)
DriverHostActor (per node, subscribed to "deployments"):
receive DispatchDeployment(deploymentId, revisionHash):
→ if currentDeploymentRevision == revisionHash → ack Applied (idempotent)
→ else:
→ acquire per-node ApplyLock (Become Applying(deploymentId))
→ write NodeDeploymentState row (NodeId, DeploymentId, StartedAtUtc)
→ fetch artifact: read DeploymentArtifact blob from ConfigDb by deploymentId
→ diff against current applied artifact → per-instance ApplyDelta plans
→ dispatch ApplyDelta to DriverInstanceActor / VirtualTagActor / ScriptedAlarmActor children
→ collect per-instance acks (all-or-nothing per node)
→ on full success: write GenerationSealedCache (LiteDb local), update NodeDeploymentState.AppliedAtUtc
→ on any instance Failure: rollback to previous deployment, mark NodeDeploymentState=Failed
→ Tell Coordinator: ApplyAck(deploymentId, nodeId, Applied | Failed(reason))
→ Become Steady
ConfigPublishCoordinator: collect ApplyAcks
→ all Driver nodes Applied → Deployment.Status = Sealed → DistributedPubSub PublishDeploymentSealed
→ any Failed → Deployment.Status = PartiallyFailed → broadcast DeploymentFailed
→ deadline elapsed before all acks → Deployment.Status = TimedOut → broadcast DeploymentTimedOut
```
### Per-instance operation lock
All mutating commands (deploy, disable, enable, delete) on a `DriverInstance` go through `DriverInstanceActor`, which serializes them via the actor mailbox — single-threaded by construction.
### Idempotency
- `DeploymentId` + `RevisionHash` together identify a deployment.
- `DriverHostActor` seeing a `DispatchDeployment` whose `RevisionHash` matches current applied state → immediate ack `Applied`, no work. Safe to redeliver.
- `Phase7Composer.ComposeAsync(artifact)` is pure; same artifact → same delta plan.
- `DriverInstanceActor.ApplyDelta(plan)` compares against current state, applies only diffs.
### Concurrency control
- Last-write-wins on edits (no optimistic concurrency on `Equipment`, `Driver`, `Script`, etc.) — matches ScadaLink template behavior.
- **Optimistic concurrency on `Deployment` and `NodeDeploymentState` rows** (rowversion column) — prevents two concurrent Coordinator instances (during failover) from corrupting state.
### Singleton failover during deploy
1. Old Coordinator wrote `Deployment.Status = Dispatching` + `NodeDeploymentState` rows before broadcast.
2. New Coordinator on takeover queries `Deployment` rows with non-terminal `Status`.
3. For each in-flight deployment, `Ask` every `DriverHostActor` (via cluster-aware actor selection) for current `NodeDeploymentState`.
4. Recompute outstanding-ack set; resume the deadline timer with the remaining time.
5. If apply deadline already passed → mark `Deployment.Status = TimedOut` for any unack'd nodes.
### Crash recovery on Driver node restart
- `DriverHostActor.PreStart` reads `NodeDeploymentState` for self.
- If row says `Applied` for some `DeploymentId` and matches last sealed cache → Become Steady on that artifact.
- If row says `Applying` (didn't reach Applied) → discard partial state, re-fetch the artifact, replay apply (idempotent).
- If ConfigDb unreachable → fall back to local LiteDb sealed cache, Become `Stale` (drops ServiceLevel via `RedundancyStateActor`). Background reconnect retries every 30s.
### Schema migration from today
| Today | New |
|---|---|
| `ConfigGeneration` (Draft/Published/Sealed lifecycle) | **Dropped** |
| `ClusterNodeGenerationState` | Renamed → `NodeDeploymentState` with `(NodeId, DeploymentId, Status, StartedAtUtc, AppliedAtUtc, RowVersion)` |
| `ClusterNode.RedundancyRole` column | **Dropped** (Akka leader-of-driver-role is source of truth) |
| `ConfigAuditLog` | Kept; deploy events added as new event types |
| (new) `Deployment` | `(DeploymentId, RevisionHash, Status, CreatedBy, CreatedAtUtc, ArtifactBlob varbinary(max), RowVersion)` |
| (new) `ConfigEdit` audit row per Equipment/Driver/Script edit | Live-edit history |
| (new) `DataProtectionKeys` | DataProtection key ring storage |
No more `ApplyLeaseRegistry` table or watchdog actor. Apply state lives in `NodeDeploymentState`; watchdog is a Coordinator-side scheduled message keyed by `DeploymentId`.
### Stale-config fallback
Preserved from today's `GenerationSealedCache`: local LiteDb cache holds last-applied `DeploymentArtifact`. On Host boot with ConfigDb unreachable, `DriverHostActor` boots from cache → Become `Stale``RedundancyStateActor` drops `ServiceLevel` for that node.
### Peer probes consolidated
| Today | New |
|---|---|
| `PeerHttpProbeLoop` (HTTP `/healthz`) | Retired — Akka failure detector replaces it |
| `PeerUaProbeLoop` (OPC UA `opc.tcp://peer:4840`) | **Retained** as `PeerOpcUaProbeActor` — tests whether the OPC UA stack itself (not just the process) is up. Feeds `RedundancyStateActor`. |
| `DbHealthCache` (cached DB probe) | Retained as `DbHealthProbeActor` per-node. Feeds `RedundancyStateActor` + `/health/ready`. |
### ServiceLevel computation in `RedundancyStateActor`
```
serviceLevel(node) =
base 240 if (cluster member Up AND db reachable AND not stale AND opc ua probe ok)
base 200 if (member Up AND db reachable AND stale)
base 100 if (member Up AND db unreachable AND stale)
base 0 if (member Down / Unreachable)
+10 bonus if Akka driver-role leader is this node
```
ServiceLevel bands match the existing `RedundancyStatePublisher` so OPC UA client behavior is unchanged from today. The leader-bonus replaces today's operator-managed `RedundancyRole = Primary`.
## 7. Error handling & failure modes
### Akka cluster failure modes
| Scenario | Behavior |
|---|---|
| Network partition (split-brain) | Keep-oldest resolver downs the smaller side after 15s stable-after. `down-if-alone = on` covers isolated nodes. |
| Admin leader process crash | Failure detector trips after 10s, downs the member, new singleton instance starts on remaining Admin node. Traefik `/health/active` probe fails over within 1 polling interval (~5s). |
| Driver-role node crash | RedundancyStateActor sees member Down → drops that node's ServiceLevel to 0 → OPC UA clients reconnect to surviving node. Both nodes were already running their own copy; no in-cluster recovery needed for that node's work. |
| Both Admin nodes down simultaneously | Web UI unavailable. Driver nodes continue serving OPC UA from last-sealed cache. No new deployments possible until Admin node recovers. |
| All Driver nodes down | OPC UA endpoints unavailable. Clients reconnect when any Driver node returns. ServiceLevel back to 240 once member Up + DB reachable + apply sealed. |
| Singleton handover during deploy | Coordinator state survives in `Deployment` + `NodeDeploymentState` ConfigDb rows. New Coordinator queries DriverHostActors via cluster-aware actor selection. Resume remaining deadline. |
### ConfigDb unavailability
- **At edit time:** AdminUI returns user-visible error. No retries — operator decides.
- **At deploy time:** Coordinator refuses to start dispatch if it can't write the `Deployment` row.
- **At Driver node boot:** Fall back to local LiteDb sealed cache. RedundancyStateActor drops `ServiceLevel`.
- **At singleton failover:** New Coordinator's `PreStart` retries via Polly (5 attempts, exponential backoff). If exhausted → singleton crashes → cluster restarts singleton on next viable Admin node.
### Driver / equipment failures
- Driver connection loss → `DriverInstanceActor` enters `Reconnecting` Become state, publishes bad-quality to OPC UA address space immediately, retries at fixed interval.
- Tag-path-resolution failure → retried periodically.
- Write failure to driver → returned synchronously to caller via `Ask` from `OpcUaPublishActor`.
- Driver process unresponsive (Galaxy gateway down) → `IDriver.HealthCheck` returns degraded → `DriverInstanceActor` reports to `DriverHostActor``RedundancyStateActor` factors into ServiceLevel.
### Wonderware historian sidecar
- Named-pipe disconnect → `HistorianAdapterActor` enters `Reconnecting`; alarm history rows buffered to local SQLite store-and-forward.
- Sidecar process crash → no in-cluster recovery (external process); operator restarts via Windows Service control.
### Auth failures
- LDAP unreachable → `/auth/login` returns 503. Active sessions continue with cached claims.
- JWT signature failure (key ring drift) → 401, session terminates. DataProtection keys in ConfigDb prevent this in the happy path.
- Cookie expired (sliding 30-min idle) → `/auth/ping` returns 401 → `CookieAuthenticationStateProvider` triggers UI logout.
### SignalR / circuit drops
- Blazor circuit dropped → `App.razor` reload script reconnects (preserved from today).
- Hub message loss during reconnect → `FleetStatusBroadcaster` resends current state to the reconnecting client on `OnConnectedAsync` (full snapshot, not just diffs).
### OPC UA stack failures
- Address-space corruption → `OpcUaPublishActor` logs ERROR, sends `RebuildAddressSpace` to itself; sequence number bump notifies clients to resubscribe.
- OPC UA listener bind failure (port collision) → Host fails readiness probe, supervisor restarts service.
### Audit invariants
- Audit write failures **never abort** the user-facing action. `AuditWriterActor` buffer overflow → log WARN, drop oldest (with counter metric). The action's success/failure path is authoritative.
- All deploy + edit events carry `ExecutionId` (per-request correlation) so audit rows for one operator action share an ID.
## 8. Testing strategy
Test projects mirror the new layering. Test infrastructure stays Mac-friendly: stubbed Windows-only drivers, ephemeral SQL Server (LocalDB on Windows / `mcr.microsoft.com/mssql/server` container on Mac), `OpenLDAP` container, all spun up via `tests/docker-compose.yml`.
### Layered test pyramid
| Layer | Project | What it covers |
|---|---|---|
| **Unit** | `OtOpcUa.Runtime.Tests` | Per-actor logic via `Akka.TestKit.Xunit2`. `DriverInstanceActor` state-machine transitions, `Phase7Composer` purity, `ScriptedAlarmActor` state machine, `VirtualTagActor` expression eval. Drivers mocked via `IDriver` test doubles. |
| **Unit** | `OtOpcUa.ControlPlane.Tests` | Singleton actor logic. `ConfigPublishCoordinator` happy path + timeout + concurrent ack ordering. `RedundancyStateActor` ServiceLevel computation truth table. `AuditWriterActor` batch flush + idempotency on duplicate `EventId`. |
| **Unit** | `OtOpcUa.Cluster.Tests` | Split-brain resolver config validation, role-aware membership helpers, HOCON parses. |
| **Unit** | `OtOpcUa.Security.Tests` | LDAP role mapping, JWT issuance, cookie+JWT roundtrip, `/auth/ping` expiry semantics. |
| **Integration** | `OtOpcUa.Host.IntegrationTests` | 2-node in-process Akka cluster. Real SQL Server, stubbed drivers. Tests: deploy happy path, deploy timeout, deploy with one node down, singleton failover mid-deploy, ConfigDb outage + stale-config fallback, edit-then-deploy roundtrip, audit row emission. |
| **Integration** | `OtOpcUa.OpcUa.IntegrationTests` | Real OPCFoundation client connects to a running stubbed Host. Asserts: dual endpoint visible, ServerUriArray populated, ServiceLevel reflects leader status, browse + read + write through `OpcUaPublishActor`, write failures returned synchronously. |
| **End-to-end** | `OtOpcUa.E2E.Tests` | Full Host with Traefik in front, two Admin nodes + two Driver nodes (4 processes via Docker). Verifies: web UI login via LDAP, deploy from UI flows to OPC UA stack, kill admin leader → Traefik fails over within 25s, kill driver node → OPC UA clients reconnect with correct ServiceLevel. CI nightly. |
### Failover-specific test cases
1. Kill Admin leader during `Dispatching` phase → new Coordinator resumes, deployment seals.
2. Kill Admin leader during `AwaitingApplyAcks` → new Coordinator queries DriverHostActors, completes ack collection.
3. Kill Driver node during `Applying` → Coordinator marks that node's `NodeDeploymentState=Failed` after deadline; surviving Driver nodes complete their apply.
4. Restart Driver node mid-deploy → on restart, replays apply (idempotent).
5. Akka split-brain (network partition between 2 admin nodes) → keep-oldest wins, smaller side downs itself within 15s.
6. Both Admin nodes restart simultaneously → deployments in `Dispatching` resume cleanly after cluster reforms.
7. Concurrent edits to the same `DriverInstance` from two UI sessions → last write wins, both audit rows present, no row corruption.
### Deploy idempotency tests
- Replay `DispatchDeployment` with same `DeploymentId/RevisionHash` → no work, ack `Applied`.
- Apply same `DeploymentArtifact` twice in a row → second application is a no-op.
- Crash DriverHostActor mid-apply, restart → resumes from `NodeDeploymentState`, completes idempotently.
### Property tests
- `Phase7Composer.ComposeAsync` is pure: same artifact → same plan, no side effects.
- `RedundancyStateActor` ServiceLevel computation: every combination of (member-state, db-ok, stale, opc-ok, is-leader) produces expected byte.
- Audit envelope generation: every mutating op produces exactly one audit row with stable `ExecutionId` correlation.
### Mac-dev test invariants
- All unit + integration tests run on macOS without Windows-only assemblies.
- Cluster tests use in-process Akka.Remote on 127.0.0.1.
- LDAP tests use `OpenLDAP` container or `Security:Ldap:DevStubMode=true`.
### Retired tests
Anything touching `ConfigGeneration` lifecycle, `ApplyLeaseRegistry`, `PeerHttpProbeLoop`, `FleetStatusPoller`, `RedundancyCoordinator` peer-probe loops, `RedundancyStatePublisher`.
## 9. Risks & open questions
1. **Akka.NET on .NET 10.** Verify Akka.NET 1.5+ targets .NET 10 cleanly.
2. **OPCFoundation SDK threading.** The OPC UA stack runs its own threadpool. `OpcUaPublishActor` must marshal writes via thread-safe wrappers; use a dedicated `synchronized-dispatcher` for actors that touch the OPC UA address space.
3. **Failure detector tuning.** ScadaLink's 2s/10s is tuned for site-to-central RTT. Benchmark before locking. Aggressive tuning + GC pauses → spurious singleton handover.
4. **ServiceLevel = Akka leader removes operator control.** No escape hatch in v1. If a customer needs one later, add a `PinnedPrimary` column to `ClusterNode` and an override path in `RedundancyStateActor`. Out of scope now.
5. **Long-lived v2 branch drift.** Monthly rebase from main, CI runs on v2 from day one.
6. **Schema migration is destructive.** Dropping `ConfigGeneration` + `ClusterNode.RedundancyRole` is one-way. Cutover must run on a quiesced system. Provide a `Migrate-To-V2.ps1` script that backs up ConfigDb, runs EF migrations, validates row counts, prints a summary.
7. **Wonderware + mxaccessgw still external processes.** Both untouched by this refactor. Future actorization would be a second refactor.
8. **Audit row volume.** Edit-heavy install ≈ 5k rows/day. Need monthly partition + 365-day retention same as ScadaLink #23.
## 10. Migration plan
Big-bang on `v2-akka-fuse` branch:
1. Branch `v2-akka-fuse` off `main`.
2. Add new projects: `OtOpcUa.Host`, `.Cluster`, `.Security`, `.ControlPlane`, `.Runtime`, `.ConfigDb`, `.Commons`, `.AdminUI`, `.OpcUaServer`. Convert to `OtOpcUa.slnx`.
3. Move ConfigDb access (EF context, repos, migrations) out of `Server` and `Admin` into `OtOpcUa.ConfigDb`. Add DataProtection key store table.
4. Move LDAP + cookie + JWT out of `Admin/Security` into `OtOpcUa.Security`. Adopt 15-min JWT / 30-min sliding cookie / `/auth/ping`.
5. Build `OtOpcUa.Cluster`: HOCON, `AkkaHostedService`, role-aware membership helpers, split-brain resolver.
6. Build `OtOpcUa.ControlPlane`: `ConfigPublishCoordinator`, `AdminOperationsActor`, `AuditWriterActor`, `FleetStatusBroadcaster`, `RedundancyStateActor`.
7. Build `OtOpcUa.Runtime`: `DriverHostActor`, `DriverInstanceActor`, `VirtualTagActor`, `ScriptedAlarmActor`, `OpcUaPublishActor`, `HistorianAdapterActor`, `PeerOpcUaProbeActor`, `DbHealthProbeActor`.
8. Migrate `Phase7Composer` to `OtOpcUa.OpcUaServer`; make it pure and unit-tested.
9. Move Blazor components from `Admin` into `OtOpcUa.AdminUI` library; replace `DriverDiagnosticsClient` HTTP with in-process actor calls; rewire `FleetStatusHub` / `AlertHub` / `ScriptLogHub` to be fed by `FleetStatusBroadcaster` `IHubContext`.
10. Build `OtOpcUa.Host` `Program.cs`: role-gated startup, health endpoints (`/health/ready`, `/health/active`, `/healthz`), `AddWindowsService`.
11. ConfigDb migration: add `Deployment`, `ConfigEdit`, `DataProtectionKeys` tables; rename `ClusterNodeGenerationState``NodeDeploymentState`; drop `ConfigGeneration`; drop `ClusterNode.RedundancyRole`. EF migration + idempotent SQL script + `Migrate-To-V2.ps1`.
12. Delete `OtOpcUa.Server`, `OtOpcUa.Admin`, `DriverInstanceBootstrapper`, `RedundancyCoordinator`, `RedundancyStatePublisher`, `ApplyLeaseRegistry`, `FleetStatusPoller`, `PeerHttpProbeLoop`, `HubTokenService`. Sweep any `*RedundancyRole*` references.
13. Update `deploy/Install-Services.ps1`: single Windows Service per node, role via env var, Traefik service registration.
14. Update docs in `docs/`: rewrite `Redundancy.md`, `ServiceHosting.md`; add `Cluster.md`, `ControlPlane.md`, `Runtime.md`. Add top-level `Architecture-v2.md` summary.
15. CI: add integration test job for the 2-node cluster + OPC UA roundtrip.
16. Tag the last v1 release on `main` for backport-only fixes. Merge `v2-akka-fuse``main` when GA.
File diff suppressed because it is too large Load Diff
@@ -0,0 +1,101 @@
{
"planPath": "docs/plans/2026-05-26-akka-hosting-alignment-plan.md",
"branch": "v2-akka-fuse",
"designDoc": "docs/plans/2026-05-26-akka-hosting-alignment-design.md",
"lastUpdated": "2026-05-26T00:00:00Z",
"tasks": [
{"id": 0, "subject": "Task 0: Create branch and central package management", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [], "commit": "2b81147"},
{"id": 1, "subject": "Task 1: Create OtOpcUa.Commons project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [2,3,4,5,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 2, "subject": "Task 2: Create OtOpcUa.Cluster project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,3,4,5,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 3, "subject": "Task 3: Create OtOpcUa.Security project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,4,5,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 4, "subject": "Task 4: Create OtOpcUa.ControlPlane project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,3,5,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 5, "subject": "Task 5: Create OtOpcUa.Runtime project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,3,4,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 6, "subject": "Task 6: Create OtOpcUa.OpcUaServer project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,3,4,5,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 7, "subject": "Task 7: Create OtOpcUa.AdminUI Razor class library", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,3,4,5,6,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 8, "subject": "Task 8: Create OtOpcUa.Host Web SDK project", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": [1,2,3,4,5,6,7], "blockedBy": [0], "commit": "30a2104"},
{"id": 9, "subject": "Task 9: Build green smoke check", "status": "completed", "classification": "trivial", "estMinutes": 2, "parallelizableWith": [], "blockedBy": [1,2,3,4,5,6,7,8], "commit": "30a2104"},
{"id": 10, "subject": "Task 10: Add Deployment entity", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [11,12,13], "blockedBy": [9], "commit": "8e2c4f2"},
{"id": 11, "subject": "Task 11: Add NodeDeploymentState entity", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [10,12,13], "blockedBy": [9], "commit": "8e2c4f2"},
{"id": 12, "subject": "Task 12: Add ConfigEdit audit entity", "status": "completed", "classification": "small", "estMinutes": 4, "parallelizableWith": [10,11,13], "blockedBy": [9], "commit": "8e2c4f2"},
{"id": 13, "subject": "Task 13: Add DataProtection keys table", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [10,11,12], "blockedBy": [9], "commit": "8e2c4f2"},
{"id": "14a", "subject": "Task 14a: Add RowVersion to live-edit entities", "status": "completed", "classification": "standard", "estMinutes": 10, "parallelizableWith": [], "blockedBy": [13], "commit": "4bb4ad8"},
{"id": "14b", "subject": "Task 14b: Decouple live-edit entities from ConfigGeneration", "status": "completed", "classification": "high-risk", "estMinutes": 30, "parallelizableWith": [], "blockedBy": ["14a"], "commit": "13d3aea"},
{"id": "14c", "subject": "Task 14c: Obsolete GenerationApplier/Diff/SealedCache", "status": "completed", "classification": "high-risk", "estMinutes": 20, "parallelizableWith": [], "blockedBy": ["14b"], "commit": "1ddf8bb"},
{"id": "14d", "subject": "Task 14d: Drop ClusterNode.RedundancyRole", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": ["14a","14b","14c"], "blockedBy": [13], "commit": "3c915e6"},
{"id": "14e", "subject": "Task 14e: Delete ConfigGeneration + ClusterNodeGenerationState", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": [], "blockedBy": ["14b","14c"], "commit": "e00f46d"},
{"id": "14f", "subject": "Task 14f: V2HostingAlignment EF migration (consolidator)", "status": "completed", "classification": "high-risk", "estMinutes": 15, "parallelizableWith": [], "blockedBy": ["14a","14b","14c","14d","14e"], "commit": "605dbf3"},
{"id": 15, "subject": "Task 15: Migrate-To-V2.ps1 idempotent script", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [16,17,18], "blockedBy": ["14f"], "commit": "c168c1c"},
{"id": 16, "subject": "Task 16: Common types (CorrelationId, ExecutionId, NodeId, ...)", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [17,18], "blockedBy": [9], "commit": "fee4a8c"},
{"id": 17, "subject": "Task 17: Akka message contracts", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [16,18], "blockedBy": [16], "commit": "5d3a5a4"},
{"id": 18, "subject": "Task 18: Common interfaces", "status": "completed", "classification": "small", "estMinutes": 4, "parallelizableWith": [16,17], "blockedBy": [16], "commit": "136234e"},
{"id": 19, "subject": "Task 19: HOCON config", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [20,21,22], "blockedBy": [2], "commit": "3d0f4dc"},
{"id": 20, "subject": "Task 20: AkkaHostedService implementation", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [19,21,22], "blockedBy": [2,18], "commit": "f184f8e"},
{"id": 21, "subject": "Task 21: Role parser from OTOPCUA_ROLES env", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [19,20,22], "blockedBy": [2], "commit": "dfb0636"},
{"id": 22, "subject": "Task 22: ClusterRoleInfo implementation", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [19,20,21], "blockedBy": [18,20], "commit": "c217c49"},
{"id": 23, "subject": "Task 23: Cluster test project + tests", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [19,20,21,22], "commit": "e0b6d56"},
{"id": 24, "subject": "Task 24: Move LdapAuthService into OtOpcUa.Security", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [25], "blockedBy": [3], "commit": "567b8ca"},
{"id": 25, "subject": "Task 25: JwtTokenService", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [24], "blockedBy": [3], "commit": "93316e3"},
{"id": 26, "subject": "Task 26: Cookie+JWT hybrid AddOtOpcUaAuth extension", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [27,28], "blockedBy": [13,24,25], "commit": "207fc6a"},
{"id": 27, "subject": "Task 27: /auth/login, /auth/ping, /auth/token endpoints", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [26,28], "blockedBy": [24,25], "commit": "8be84ba"},
{"id": 28, "subject": "Task 28: CookieAuthenticationStateProvider for Blazor", "status": "completed", "classification": "small", "estMinutes": 4, "parallelizableWith": [26,27], "blockedBy": [25], "commit": "e38f22e"},
{"id": 29, "subject": "Task 29: Security test project + tests", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [24,25,26,27,28], "commit": "38ea0c5"},
{"id": 30, "subject": "Task 30: ConfigPublishCoordinator happy path", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [32,33,34,35], "blockedBy": [4,17,18,10,11], "commit": "62e12da"},
{"id": 31, "subject": "Task 31: Coordinator timeout + failover recovery", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [32,33,34,35], "blockedBy": [30], "commit": "f193872"},
{"id": 32, "subject": "Task 32: AdminOperationsActor + StartDeployment", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [30,31,33,34,35], "blockedBy": [4,17,18,10,12], "commit": "ef683f5"},
{"id": 33, "subject": "Task 33: AuditWriterActor batched idempotent insert", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [30,31,32,34,35], "blockedBy": [4,17], "commit": "23f669c"},
{"id": 34, "subject": "Task 34: FleetStatusBroadcaster", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [30,31,32,33,35], "blockedBy": [4,17], "commit": "dd122c4"},
{"id": 35, "subject": "Task 35: RedundancyStateActor + ServiceLevelCalculator", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [30,31,32,33,34], "blockedBy": [4,17,18], "commit": "6b37f99"},
{"id": 36, "subject": "Task 36: Singleton registration extension (admin role)", "status": "completed", "classification": "standard", "estMinutes": 4, "parallelizableWith": [], "blockedBy": [30,31,32,33,34,35], "commit": "52bf4b3"},
{"id": 37, "subject": "Task 37: DriverHostActor scaffolding + PreStart recovery", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [41,42,43,44], "blockedBy": [5,17,18,11], "commit": "ed13013"},
{"id": 38, "subject": "Task 38: DriverHostActor DispatchDeployment handler", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [41,42,43,44], "blockedBy": [37], "commit": "ed13013"},
{"id": 39, "subject": "Task 39: DriverHostActor stale-config fallback", "status": "completed", "classification": "standard", "estMinutes": 4, "parallelizableWith": [41,42,43,44], "blockedBy": [38], "commit": "ed13013"},
{"id": 40, "subject": "Task 40: Runtime test project bootstrap", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [], "blockedBy": [37,38,39], "commit": "ed13013"},
{"id": 41, "subject": "Task 41: DriverInstanceActor state machine", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [42,43,44], "blockedBy": [5,17,40], "commit": "64c627f"},
{"id": 42, "subject": "Task 42: VirtualTagActor", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [41,43,44], "blockedBy": [5,17,40], "commit": "39729bf"},
{"id": 43, "subject": "Task 43: ScriptedAlarmActor", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [41,42,44], "blockedBy": [5,17,40], "commit": "95ef533"},
{"id": 44, "subject": "Task 44: OpcUaPublishActor on synchronized dispatcher", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [41,42,43], "blockedBy": [5,6,17,19,40], "commit": "e115f13"},
{"id": 45, "subject": "Task 45: HistorianAdapter + PeerOpcUaProbe + DbHealthProbe actors", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [37,40], "commit": "28639cb"},
{"id": 46, "subject": "Task 46: Extract OpcUaApplicationHost + Phase7Composer", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [6], "commit": "2877a88"},
{"id": 47, "subject": "Task 47: Phase7Composer purity + property tests", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [48,49,50,51,52], "blockedBy": [46], "commit": "b7c117a"},
{"id": 48, "subject": "Task 48: Move Blazor components into AdminUI library", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [47], "blockedBy": [7], "commit": "1a067e6"},
{"id": 49, "subject": "Task 49: Move SignalR hubs and rewire to FleetStatusBroadcaster", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [50,51,52], "blockedBy": [34,48], "commit": "26d8f2f"},
{"id": 50, "subject": "Task 50: IAdminOperationsClient via ClusterSingletonProxy", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [49,51,52], "blockedBy": [18,32,48], "commit": "f022499"},
{"id": 51, "subject": "Task 51: Replace DriverDiagnosticsClient with IFleetDiagnosticsClient", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [49,50,52], "blockedBy": [18,48], "commit": "b83f099"},
{"id": 52, "subject": "Task 52: Drift indicator + Deploy button UI", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [49,50,51], "blockedBy": [50,48], "commit": "f167808"},
{"id": 53, "subject": "Task 53: Host Program.cs role-gated startup", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [54,55], "blockedBy": [8,15,20,21,22,26,36,40,45,46,48,49], "commit": "e2b357f"},
{"id": 54, "subject": "Task 54: Health endpoints + appsettings layout", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [53,55], "blockedBy": [8,22], "commit": "fa1d685"},
{"id": 55, "subject": "Task 55: Mac dev mode + DEV-STUB drivers", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [53,54], "blockedBy": [41], "commit": "8b4de80"},
{"id": 56, "subject": "Task 56: Delete OtOpcUa.Server + OtOpcUa.Admin projects", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [53,54,55], "commit": "76310b8"},
{"id": 57, "subject": "Task 57: Build & test green check", "status": "completed", "classification": "trivial", "estMinutes": 3, "parallelizableWith": [], "blockedBy": [56], "commit": "76310b8"},
{"id": 58, "subject": "Task 58: 2-node integration test harness", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [57], "commit": "d6fac2d", "deviation": "Also consolidated to a single Akka.Hosting ActorSystem — Program.cs ran two competing ActorSystems (custom AkkaHostedService + Akka.Hosting AddAkka). Cluster singletons landed on the bare one. Fixed in this commit; AkkaHostedService.cs deleted. docker-compose.yml (SQL+OpenLDAP for real local runs) deferred — harness uses EF in-memory."},
{"id": 59, "subject": "Task 59: Deploy + failover integration tests", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [60], "blockedBy": [58], "commit": "5cfbe8b", "deviation": "Happy-path + idempotency landed. Failover scenarios (kill-mid-apply, split-brain, restart-during-deploy) deferred as F22 — they need node-down/restart primitives on the harness. Two production bugs fixed in this commit: (1) coordinator missing DPS subscription for ACKs, (2) NodeId collision on shared loopback host."},
{"id": 60, "subject": "Task 60: OPC UA dual-endpoint + ServiceLevel tests", "status": "pending", "classification": "standard", "estMinutes": 5, "parallelizableWith": [59], "blockedBy": [58]},
{"id": 61, "subject": "Task 61: E2E test infrastructure + GitHub Actions CI", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [59,60], "commit": "253fb60", "deviation": "CI workflow files landed but E2E test project (tests/Server/ZB.MOM.WW.OtOpcUa.E2ETests) deferred — it lands when F10/F11/F12 wire enough engine for an end-to-end round-trip to be meaningful. The E2E workflow runs against the docker-dev fleet but its --filter Category=E2E currently matches zero tests."},
{"id": 62, "subject": "Task 62: Rewrite Install-Services.ps1", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [63,64,65], "blockedBy": [53], "commit": "e40615d"},
{"id": 63, "subject": "Task 63: Traefik config + docker-dev compose", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [62,64,65], "blockedBy": [53], "commit": "7e3b56c", "deviation": "Untested on macOS (no local Docker). Compose file should work — exercise + adjust on first run against a real Docker host."},
{"id": 64, "subject": "Task 64: Update existing docs (Redundancy, ServiceHosting, security)", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [62,63,65], "blockedBy": [57], "commit": "3c3fef9", "deviation": "Redundancy.md + ServiceHosting.md full rewrites. security.md v2 banner only — full per-section rewrite waits for F15 (Admin pages migration) since security.md references many pages that will move. README.md platform-overview updated."},
{"id": 65, "subject": "Task 65: New v2 docs (Architecture-v2, Cluster, ControlPlane, Runtime)", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [62,63,64], "blockedBy": [57], "commit": "1689901"},
{"id": "F1", "subject": "Follow-up: AuthEndpoints integration tests against fused Host", "status": "completed", "classification": "small", "estMinutes": 10, "parallelizableWith": ["F2"], "blockedBy": [53], "commit": "463512d", "origin": "Deviation from Task 29 (commit 38ea0c5) — deferred until Task 53 wires AddOtOpcUaAuth/MapOtOpcUaAuth in Program. Add WebApplicationFactory<OtOpcUa.Host.Program> tests for /auth/login (204/401/503), /auth/ping (401/200), /auth/token (200+JWT), /auth/logout (204+cookie clear) using a stub ILdapAuthService.", "deviation": "Used HostBuilder + TestServer directly (Security.Tests/AuthEndpointsIntegrationTests) instead of WebApplicationFactory<Program> — Host needs Akka cluster bootstrap that's out of scope for this contract test. Cluster-mode auth coverage belongs in Task 58."},
{"id": "F2", "subject": "Follow-up: Replace JwtBearer BuildServiceProvider antipattern with IPostConfigureOptions", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": ["F1"], "blockedBy": [], "commit": "45a8c79", "origin": "Deviation from Task 26 (commit 207fc6a) — AddOtOpcUaAuth uses services.BuildServiceProvider().CreateScope() inside .AddJwtBearer lambda (ASP0000). Refactor to IPostConfigureOptions<JwtBearerOptions> so validation parameters resolve lazily from the real request provider."},
{"id": "F3", "subject": "Follow-up: Add EventId unique column to ConfigAuditLog for cross-restart audit idempotency", "status": "completed", "classification": "small", "estMinutes": 15, "parallelizableWith": ["F4"], "blockedBy": [], "commit": "f57f61d", "origin": "Deviation from Task 33 — AuditWriterActor only dedups in-buffer; ConfigAuditLog lacks EventId column so a duplicate AuditEvent that arrives after a flush becomes a duplicate row. Add nullable EventId Guid + filtered unique index, migration, and refactor AuditWriterActor.WrapDetails away."},
{"id": "F4", "subject": "Follow-up: Harden AuditWriterActor.WrapDetails JSON synthesis with System.Text.Json", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": ["F3"], "blockedBy": [], "commit": "f57f61d", "deviation": "Moot — F3 deleted WrapDetails entirely (EventId/CorrelationId now live in dedicated columns).", "origin": "Self-review of Task 33 — WrapDetails uses string concat; malformed caller DetailsJson would produce invalid JSON and trip the CK_ConfigAuditLog_DetailsJson_IsJson constraint, killing the entire flush batch. Discard this task if F3 lands first (F3 removes WrapDetails entirely)."},
{"id": "F5", "subject": "Follow-up: ConfigPublishCoordinator multi-node happy-path test", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "5cfbe8b", "deviation": "Delivered by Task 59 — DeployHappyPathTests.StartDeployment_seals_after_both_nodes_apply exercises the exact 'dispatch to N driver nodes, all ack, seals' flow via the real 2-node TwoNodeClusterHarness rather than a multi-system TestKit. Cleaner because it tests the production code path end-to-end.", "origin": "Self-review of Task 30 — single-ActorSystem TestKit can't simulate the plan's 'dispatch to N driver nodes, all ack, seals' happy path because DiscoverDriverNodes() needs real cluster membership. Add a multi-system test (two ActorSystems joined into one cluster, driver-role on the second)."},
{"id": "F6", "subject": "Follow-up: RedundancyStateActor publisher abstraction so tests don't need DPS bootstrap", "status": "completed", "classification": "small", "estMinutes": 10, "parallelizableWith": [], "blockedBy": [], "commit": "dfc143c", "origin": "Self-review of Task 35 — RedundancyStateActorTests are skipped because single-node DistributedPubSub bootstrap is unreliable in TestKit. Inject an Action<object> broadcast so tests can replace it with a probe; un-skip both tests."},
{"id": "F7", "subject": "Follow-up: DriverInstanceActor full engine wiring (subscriptions, writes, ApplyDelta diff)", "status": "completed", "classification": "standard", "estMinutes": 45, "parallelizableWith": [], "blockedBy": [44], "origin": "Self-review of Task 41 — subscription publishing, ApplyDelta diffing, bad-quality-on-disconnect, write path, and supervisor backoff are stubbed. Wire after OpcUaPublishActor lands.", "shipped": "All three pieces landed: (1) spawn lifecycle in DriverHostActor (DriverSpawnPlanner + IDriverFactory seam) — da14149, (2) ISubscribable wiring + OPC UA status-code → OpcUaQuality severity-bit mapping + DetachSubscription on disconnect/PostStop, (3) IWritable.WriteAsync write path with 5s timeout, status-code bubble-up, and AttributeValuePublished published to parent on every OnDataChange — both shipped in the F7-residual batch. Host DI binding (DriverFactoryBootstrap registers AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT factories) lives in src/Server/ZB.MOM.WW.OtOpcUa.Host/Drivers/."},
{"id": "F8", "subject": "Follow-up: VirtualTagActor engine wiring (compile expression, subscribe deps, publish result)", "status": "partial", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "origin": "Self-review of Task 42 — VirtualTagEngine.Evaluate not called; DependencyValueChanged just buffers.", "shipped": "(1) IVirtualTagEvaluator seam + NullVirtualTagEvaluator default. VirtualTagActor calls evaluator on DependencyValueChanged, dedupes unchanged results, emits EvaluationResult to parent, publishes Warning ScriptLogEntry on failure. (2) DependencyMuxActor in Runtime fans out DriverInstanceActor.AttributeValuePublished from DriverHostActor through to interested VirtualTagActor subscribers. VirtualTagActor takes dependencyRefs + mux ActorRef in Props, registers interest in PreStart, unregisters in PostStop. WithOtOpcUaRuntimeActors spawns the mux + threads it into DriverHostActor. Production binding to Core.VirtualTags.VirtualTagEngine (expression compile + dep extraction) still TODO — split as F8b."},
{"id": "F9", "subject": "Follow-up: ScriptedAlarmActor engine wiring + state persistence", "status": "partial", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "origin": "Self-review of Task 43 — AlarmConditionService not called; PreRestart persistence to ScriptedAlarmState DB not wired; HistorianAdapter rows not emitted.", "shipped": "(1) IScriptedAlarmEvaluator seam + NullScriptedAlarmEvaluator default. ScriptedAlarmActor takes AlarmConfig (id/name/path/severity/predicate), evaluates on DependencyValueChanged, publishes AlarmTransitionEvent + ScriptLogEntry on every transition. (2) IAlarmActorStateStore seam in Commons.Engines + NullAlarmActorStateStore default + EfAlarmActorStateStore production adapter over the ScriptedAlarmState entity. ScriptedAlarmActor PreStart loads + restores; every Transition fires a fire-and-forget save with lastAckUser. Predicate binding to Core.ScriptedAlarms.ScriptedAlarmEngine still TODO — split as F9b."},
{"id": "F10", "subject": "Follow-up: OpcUaPublishActor SDK integration (address-space writes + ServiceLevel + RebuildAddressSpace)", "status": "partial", "classification": "high-risk", "estMinutes": 60, "parallelizableWith": [], "blockedBy": [47], "origin": "Self-review of Task 44 — SDK calls stubbed; counters only. Wire after Phase 7 OpcUaServer extraction.", "shipped": "(1) IOpcUaAddressSpaceSink + IServiceLevelPublisher seams in Commons.OpcUa with Null* defaults. OpcUaPublishActor routes through the sink, dedupes ServiceLevelChanged, subscribes to redundancy-state DPS topic, maps redundancy snapshot to a coarse ServiceLevel (Primary+leader=240, Primary=200, Secondary=100, Detached=0). (2) OtOpcUaNodeManager (CustomNodeManager2) + OtOpcUaSdkServer (StandardServer subclass) + SdkAddressSpaceSink in OpcUaServer — lazy variable creation on first WriteValue, WriteAlarmState shape, RebuildAddressSpace tear-down. Variable updates propagate via ClearChangeMasks so subscribed OPC UA clients see them. Tests boot a real StandardServer + verify sink writes show up in the manager. Production wiring through OpcUaApplicationHost.StartAsync (default server = OtOpcUaSdkServer) + IServiceLevelPublisher SDK binding + #109 OpcUaPublishActor→Phase7Applier integration are the remaining pieces."},
{"id": "F11", "subject": "Follow-up: HistorianAdapterActor named-pipe IPC + SqliteStoreAndForwardSink wiring", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "6861381", "deviationNotes": "Reshaped HistorianAdapterActor around the existing IAlarmHistorianSink abstraction (alarm-event shape, not the original tag-history-row stub). Defaults to NullAlarmHistorianSink; production deployments wire SqliteStoreAndForwardSink + WonderwareHistorianClient via AddOtOpcUaRuntime overrides. Actor now exposes GetStatus returning HistorianSinkStatus for diagnostics. Named-pipe transport implementation lives unchanged in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/WonderwareHistorianClient.cs — the actor is intentionally just a fire-and-forget bridge.", "origin": "Self-review of Task 45 — stub buffers in-memory; named-pipe + SQLite store-and-forward not wired."},
{"id": "F12", "subject": "Follow-up: PeerOpcUaProbeActor real opc.tcp ping (replace Ok=true stub)", "status": "completed", "classification": "small", "estMinutes": 20, "parallelizableWith": [], "blockedBy": [], "commit": "b06e3ae", "deviation": "TCP-connect probe rather than full OPC UA Hello/Acknowledge handshake. Enough for the redundancy calc; deeper liveness signals can layer on later without changing the actor's contract.", "origin": "Self-review of Task 45 — RunProbe always returns Ok=true; replace with OPC UA Client connect."},
{"id": "F13", "subject": "Follow-up: Full OpcUaApplicationHost extraction (security/alarms/history/observability)", "status": "partial", "classification": "high-risk", "estMinutes": 120, "parallelizableWith": [], "blockedBy": [], "commit": "36c4751-partial", "deviationNotes": "F13a (cert auto-creation) shipped in 36c4751. Remaining: endpoint-security wiring (SecurityProfileResolver into ServerConfiguration.SecurityPolicies), LDAP user-token validator (the OPC UA UserNameToken path; HTTP-layer LDAP auth is separate and already in OtOpcUa.Security), scripted-alarm node manager creation, history backend wiring, observability hooks (OpenTelemetry metrics + traces). These are gated by F10's OpcUaPublishActor SDK integration — until F10 lands, nothing instantiates OpcUaApplicationHost so the missing wiring is dead weight.", "origin": "Self-review of Task 46 — facade only boots ApplicationInstance + StandardServer. Legacy 391-line file pulls Server.Security/Alarms/History/Observability. Pull those into thin OpcUaServer interfaces."},
{"id": "F14", "subject": "Follow-up: Migrate side-effecting Phase7Composer (EquipmentNodeWalker, trace logs, node cache)", "status": "partial", "classification": "standard", "estMinutes": 60, "parallelizableWith": [], "blockedBy": [], "origin": "Self-review of Task 47 — pure version covers the projection. Walker + alarm sink registration + cache mutation stay in legacy until split into Phase7Applier.", "shipped": "Phase7Plan + Phase7Planner.Compute (pure diff over EquipmentNodes/DriverInstancePlans/ScriptedAlarmPlans by stable id, with Added/Removed/Changed lists). Phase7Applier consumes plan + IOpcUaAddressSpaceSink: drives RebuildAddressSpace on Equipment/Alarm topology change, writes inactive AlarmState for removed nodes, catches + logs sink faults. Driver-only changes correctly skip the rebuild (DriverHostActor's spawn-plan in Runtime handles those). Walker integration with the real SDK NodeManager is the remaining piece — split as F14b (consumes the existing EquipmentNodeWalker once F10b lands an SDK builder)."},
{"id": "F15", "subject": "Follow-up: Migrate 47 legacy Admin Blazor components into AdminUI library", "status": "completed", "classification": "high-risk", "estMinutes": 180, "commit": "Phase A-D (read views) + F15.2 batches 1-4 (live-edit CRUD) + F15.3 (live alerts/script-log/CSV import/Monaco)", "deviationNotes": "All 4 phases of read-only views shipped: Phase A (shell/auth/fleet/hosts), B (cluster CRUD + Overview/Redundancy), C (Equipment/UNS/Namespaces/Drivers/Tags/ACLs), D (Audit/VirtualTags/ScriptedAlarms/Scripts/RoleGrants/Certificates/Reservations/AlarmsHistorian). Per Q1Q5 of docs/v2/AdminUI-rebuild-plan.md: typed driver editors deferred, top-level VirtualTags/ScriptedAlarms kept (Q2 reversed for cross-cluster discoverability), routes-not-tabs adopted, fleet-wide LDAP→role map only, generic login errors. Live-edit forms (F15.2) and ScriptLog page (depends on F16 ScriptLogHub) are explicit follow-ups.", "parallelizableWith": [], "blockedBy": [], "origin": "Self-review of Task 48 — only MapAdminUI scaffold + 1 new page (Deployments). 47 pages stay in legacy Admin (accepted-broken) until Task 56."},
{"id": "F16", "subject": "Follow-up: Bridge FleetStatusBroadcaster → SignalR hubs (FleetStatusHub / AlertHub / ScriptLogHub)", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "f18c285", "deviation": "FleetStatusHub bridge landed. AlertHub + ScriptLogHub deferred — they need upstream message contracts that aren't defined yet (alerts emerge from F9 ScriptedAlarmActor, script logs from F8 VirtualTagActor).", "origin": "Self-review of Task 49 — hubs are passive Hub subclasses; the bridge from FleetStatusBroadcaster.broadcast → IHubContext is not wired."},
{"id": "F17", "subject": "Follow-up: FleetDiagnosticsClient real Akka ActorSelection round-trip (GetDiagnosticsRequest)", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "8f32b89", "origin": "Self-review of Task 51 — client returns an empty snapshot stub. Add GetDiagnosticsRequest contract + DriverHostActor handler + real Ask/Reply."},
{"id": "F18", "subject": "Follow-up: Thread HttpContext.User.Identity.Name into Deployments page (createdBy)", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [], "commit": "b266f63", "origin": "Self-review of Task 52 — Deployments.razor hardcodes createdBy=\"(current user)\"; needs @inject AuthenticationStateProvider."},
{"id": "F19", "subject": "Follow-up: RuntimeStartup extension for driver-role node-actor spawn", "status": "completed", "classification": "standard", "estMinutes": 20, "parallelizableWith": [], "blockedBy": [], "commit": "09d6676", "origin": "Self-review of Task 53 — only admin-role singletons are wired via WithOtOpcUaControlPlaneSingletons. Driver-role nodes need a parallel WithOtOpcUaRuntimeActors that spawns DriverHostActor."},
{"id": "F20", "subject": "Follow-up: Wire DriverInstanceActor.ShouldStub() into DriverHostActor child spawn", "status": "completed", "classification": "small", "estMinutes": 10, "parallelizableWith": ["F7"], "blockedBy": [], "origin": "Self-review of Task 55 — ShouldStub helper exists but nothing calls it. Folds into F7 when DriverHostActor learns to spawn DriverInstanceActor children.", "shipped": "DriverHostActor.SpawnChild now calls DriverInstanceActor.ShouldStub(type, _localRoles) and routes Windows-only driver types to the stub path on non-Windows / dev-role hosts. Verified by DriverHostActorReconcileTests.Galaxy_on_non_windows_is_stubbed_by_ShouldStub_check."},
{"id": "F21", "subject": "Follow-up: docker-compose.yml for Host.IntegrationTests (real SQL Server + OpenLDAP)", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "b0a2bb0", "deviationNotes": "Stack shipped (SQL on 14331, OpenLDAP on 3894). HarnessMode reads OTOPCUA_HARNESS_USE_SQL=1 / USE_LDAP=1 from env; SQL mode uses per-harness unique DB via EnsureCreated. Compose itself not local-validated — DESKTOP-6JL3KKO has no Docker per CLAUDE.md; CI on Linux will exercise the real path. The xunit test-trait split was punted — env vars are simpler and cover the same use case (one suite, two modes, no test-class duplication).", "origin": "Deviation from Task 58 — TwoNodeClusterHarness uses EF InMemoryDatabase + StubLdapAuthService. For Mac-friendly local runs against real SQL constraints + LDAP, add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/docker-compose.yml (SQL Server + OpenLDAP), wire EF SqlServer provider behind an env var (OTOPCUA_HARNESS_USE_SQL=1), and add a test trait so CI can run both modes."},
{"id": "F22", "subject": "Follow-up: failover scenario integration tests (kill-mid-apply, split-brain, restart-during-deploy)", "status": "completed", "classification": "standard", "estMinutes": 60, "parallelizableWith": [], "blockedBy": [], "commit": "cd5540c", "deviationNotes": "Shipped 3 scenarios on the existing 2-node harness: stop-shrinks, restart-rejoins-same-port, deploy-with-one-node-down. Split-brain via simulated partition deferred — Akka.Hosting + xunit don't expose transport-level interference cleanly. The graceful-shutdown + rejoin path is what production actually exercises; ungraceful kill-mid-apply non-deterministic under SBR's 15s stable-after.", "origin": "Deviation from Task 59 — happy-path + idempotency landed but design §8 cases 3-7 need controlled node-down primitives on TwoNodeClusterHarness (StopNodeAsync, RestartNodeAsync, PartitionBetweenAsync). Add those + 5 scenario tests."}
]
}
+13 -3
View File
@@ -583,10 +583,20 @@ language binding.
**Depends on:** A.1 merged (proto change live).
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\clients\`):
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\src\` for .NET — note the sibling
repo restructured after this plan was written; `clients/dotnet/MxGateway.Client.csproj`
no longer exists, the proto contracts now live in
`src/ZB.MOM.WW.MxGateway.Contracts/` under the new namespace
`ZB.MOM.WW.MxGateway.Contracts.Proto[.Galaxy]`; the OtOpcUa driver currently
consumes vendored binaries from the pre-restructure build — see
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/README.md`):
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; just
rebuild `MxGateway.Client.csproj` after pulling A.1.
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; rebuild
`src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj`
after pulling A.1. (If unwinding the driver's vendored binaries onto the
new contracts namespace as part of the alarm work, namespace-rename + a
reimplementation of the missing `MxGatewayClient` / `MxGatewaySession`
wrappers is also in scope.)
2. **Python** — run `clients\python\generate-proto.ps1`; commit the
regenerated `_pb2.py` + `_pb2_grpc.py` files under
`clients\python\src\`.
+3 -4
View File
@@ -1,6 +1,6 @@
# High-Level Requirements
> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as three cooperating processes (Server, Admin, Galaxy.Host). The Galaxy integration is one of seven shipped drivers. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 are new and cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
> **Revision** — Refreshed 2026-05-23 for the OtOpcUa v2 multi-driver platform. The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the **OtOpcUa** multi-driver OPC UA platform deployed as two cooperating processes (Server, Admin). The Galaxy integration is one of seven shipped drivers and is now an in-process Tier-A driver that talks gRPC to a separately installed `mxaccessgw` gateway (sibling repo) — PR 7.2 (2026-04-30) retired the legacy out-of-process `Galaxy.Host` Windows service. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-017 cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.
## HLR-001: OPC UA Server
@@ -28,11 +28,10 @@ Drivers whose backend has a native change signal (e.g. Galaxy's `time_of_last_de
## HLR-007: Service Hosting
The system shall be deployed as three cooperating Windows services:
The system shall be deployed as two cooperating Windows services (the legacy `OtOpcUa.Galaxy.Host` x86 host was retired in PR 7.2 — Galaxy access now flows through the separately installed `mxaccessgw` gateway, which lives in a sibling repository and is not part of the OtOpcUa deployment):
- **OtOpcUa.Server** — .NET 10 x64, `Microsoft.Extensions.Hosting` + `AddWindowsService`, hosts all non-Galaxy drivers in-process and the OPC UA endpoint.
- **OtOpcUa.Server** — .NET 10 AnyCPU, `Microsoft.Extensions.Hosting` + `AddWindowsService` (decision #30 replaced the original TopShelf choice), hosts every driver in-process — including the new Tier-A `GalaxyDriver` that speaks gRPC to `mxaccessgw` and the OPC UA endpoint.
- **OtOpcUa.Admin** — .NET 10 x64 Blazor Server web app, hosts the admin UI, SignalR hubs for live updates, `/metrics` Prometheus endpoint, and audit log writers.
- **OtOpcUa.Galaxy.Host** — .NET Framework 4.8 x86 (TopShelf), hosts MXAccess COM + Galaxy Repository SQL + Historian plugin. Talks to `Driver.Galaxy.Proxy` inside `OtOpcUa.Server` via a named pipe (MessagePack over length-prefixed frames, per-process shared secret, SID-restricted ACL).
## HLR-008: Logging
+14
View File
@@ -1,5 +1,19 @@
# Security
> **v2 status (2026-05-26).** The four security concerns below are unchanged in v2.
> Paths + project names moved: `OtOpcUa.Server/Security/``OtOpcUa.Security/`
> (`Ldap/`, `Jwt/`, `Endpoints/AuthEndpoints.cs`), `OtOpcUa.Admin` is gone (its
> auth + role-grant pages live in `OtOpcUa.AdminUI`), and Admin auth policies
> register in `OtOpcUa.Host/Program.cs` via `AddOtOpcUaAuth` rather than in a
> separate Admin process. The v2 `Security:Jwt` section adds JWT bearer auth
> alongside the existing cookie scheme (`AddJwtBearer` wired via
> `IPostConfigureOptions<JwtBearerOptions>` in `OtOpcUa.Security`). DataProtection
> keys persist to the shared `ConfigDb.DataProtectionKeys` table so cookies
> survive failover between admin-role nodes.
>
> See `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §5 for the v2
> auth + DataProtection rationale.
OtOpcUa has four independent security concerns. This document covers all four:
1. **Transport security** — OPC UA secure channel (signing, encryption, X.509 trust).
+265
View File
@@ -0,0 +1,265 @@
# Admin UI rebuild plan (F15)
**Status:** UX kickoff — proposals to react to before any per-page rebuild starts.
**Last updated:** 2026-05-26 on `v2-akka-fuse`.
## Why this isn't a straight port
The v1 Admin UI was built around `ConfigGeneration` draft → publish:
operators edited a **draft** generation, the system computed a **diff** against the
last published one, and a manual **Publish** sealed it. Six full pages
(`DraftEditor`, `DiffViewer`, `DiffSection`, `Generations`, plus the per-tab
"viewing draft N" header) lived to make this workflow legible.
v2 replaces that with **live-edit + snapshot-deploy** (decisions #14a#14e on this
branch). Edits write directly to live tables guarded by `RowVersion`
concurrency; deploying is a single click that snapshots the current live state
and dispatches it via Akka. Drift between "current live" and "last sealed
deployment" surfaces as a one-line indicator on the
[Deployments](../../src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Deployments.razor)
page.
That collapses **six pages → zero** before we ship a line of new Razor. The
remaining ~41 legacy pages map to ~30 v2 pages once redundant fleet-wide views
fold into their cluster-tab equivalents.
## Inventory: 47 legacy pages → v2 disposition
Source: `git show 76310b8^ -- 'src/Server/ZB.MOM.WW.OtOpcUa.Admin/**/*.razor'`.
### Site shell (5 files) — port
| Legacy | v2 status | Notes |
|---|---|---|
| `App.razor`, `Routes.razor`, `_Imports.razor` | Port | Boilerplate; minor render-mode tweaks |
| `Layout/MainLayout.razor` | ✅ Already in v2 | Done in Task 48 |
| `Components/Pages/Login.razor`, `Account.razor` | Port | Auth endpoints changed (cookie+JWT hybrid, Task 26); login form posts to `/auth/login` now |
### Shared widgets (5 files) — port
| Legacy | v2 status |
|---|---|
| `StatusBadge.razor` | ✅ Already in v2 |
| `LoadingSpinner.razor` | ✅ Already in v2 |
| `ToastNotification.razor` | ✅ Already in v2 |
| `ClusterAuthorizeView.razor`, `RedirectToLogin.razor` | Port — adjust for v2 `IUserAuthenticator` |
### Fleet (1 file) — reshape
| Legacy | v2 strategy |
|---|---|
| `Fleet.razor` | **Reshape.** Drop the v1 "poller reads central DB" data source. v2 reads `NodeDeploymentState` (Applied / Failed / Stale per node) + subscribes to `FleetStatusHub` for live `ServiceLevel` updates (already wired in F16) + queries `IFleetDiagnosticsClient.GetDiagnostics` (F17) for per-node driver health. Single page, similar shape to v1. |
### Cluster CRUD (3 files) — port
| Legacy | v2 strategy |
|---|---|
| `ClustersList.razor` | Port |
| `NewCluster.razor` | Port |
| `ClusterDetail.razor` | **Port — drop draft/publish chrome.** No "New draft" button; no "current published" sidebar. Replace with "Last deployed" badge + a "Deploy" button (already a SignalR-aware widget on the Deployments page; this becomes a cluster-scoped variant). |
### Draft/publish workflow (4 files) — **drop entirely**
| Legacy | v2 strategy |
|---|---|
| `DraftEditor.razor` | **Drop.** No drafts in v2. |
| `DiffViewer.razor` | **Drop.** Drift indicator replaces it on Deployments page. |
| `DiffSection.razor` | **Drop.** |
| `Generations.razor` | **Drop — replaced by `Deployments.razor`** (already shipped in v2 ahead of F15). |
### Cluster tabs (11 files) — port as live-edit forms
Each becomes a live-edit surface: load the entity, bind to a form, save with
`RowVersion` concurrency check (409 on conflict → toast + reload). No "viewing
draft N" header; no per-tab snapshot view.
| Legacy tab | v2 strategy |
|---|---|
| `EquipmentTab.razor` | Port — UNS path tree picker stays |
| `UnsTab.razor` | Port — same |
| `NamespacesTab.razor` | Port |
| `DriversTab.razor` | Port — **driver-type-specific editors are a separate question (see below)** |
| `TagsTab.razor` | Port |
| `AclsTab.razor` | Port — wire to v2 LDAP group → role mapping (see RoleGrants question) |
| `RedundancyTab.razor` | Port — surface v2 `ServiceLevel` calc (Task 35) instead of v1 redundancy state machine |
| `ScriptedAlarmsTab.razor` | Port |
| `ScriptsTab.razor` | Port |
| `VirtualTagsTab.razor` | Port |
| `AuditTab.razor` | Port — wire to v2 `ConfigAuditLog` (post-F3 schema: `EventId`, `CorrelationId` columns) |
### Cluster-scoped editors (3 files) — port as reusable inputs
| Legacy | v2 strategy |
|---|---|
| `IdentificationFields.razor` | Port |
| `ImportEquipment.razor` | Port |
| `ScriptEditor.razor` | Port |
### Cross-cluster pages (8 files) — mixed
| Legacy | v2 strategy |
|---|---|
| `Hosts.razor` | Port — reshape to "Akka cluster members" (showing `host:port` NodeIds, roles, redundancy state) |
| `Certificates.razor` | Port — F13a's `PkiStoreRoot` becomes the data source |
| `Reservations.razor` | Port |
| `RoleGrants.razor` | **Reshape.** v1 was cluster-scoped role grants; v2 uses LDAP group → role mapping (see Q4 below) |
| `AlarmsHistorian.razor` | Port — wire to F11's `HistorianAdapterActor.GetStatus` (queue depth + drain state) |
| `ScriptLog.razor` | Port — needs SignalR hub bridge (F16 deferred ScriptLogHub) |
| `ScriptedAlarms.razor` (top-level) | **Possibly drop** (see Q2 below) |
| `VirtualTags.razor` (top-level) | **Possibly drop** (see Q2 below) |
### Driver-typed editors (5 files) — sequencing decision needed
| Legacy | v2 strategy |
|---|---|
| `Drivers/FocasDetail.razor` | Defer — JSON editor in `DriversTab` covers the same config initially |
| `Modbus/ModbusOptionsEditor.razor` | Same |
| `Modbus/ModbusAddressEditor.razor` | Same |
| `Modbus/ModbusAddressPreview.razor` | Same |
| `Modbus/ModbusDiagnostics.razor` | Port — separate from the config editor, this is operational telemetry |
### Account (1 file) — port
| Legacy | v2 strategy |
|---|---|
| `Account.razor` | Port — minor reshape for JWT (token expiry UI, refresh button) |
## Summary by disposition
| Disposition | Count |
|---|---|
| Already in v2 | 5 |
| Port as-is | 22 |
| Port + reshape | 7 |
| **Drop (replaced by live-edit / Deployments page)** | **5** |
| Drop (redundant with cluster tab) | 2 (pending Q2) |
| Defer (driver-typed editors) | 4 |
| **Total active rebuild** | ~30 pages |
## Open design questions
These need answers before per-page sequencing starts. They drive how many
phases the rebuild takes and what gets cut.
### Q1 — Driver-typed editors: ship now or defer?
**Context.** v1 had typed editors for Modbus + FOCAS driver config. They sat
behind a generic JSON editor for the other six driver types. The typed editors
caught operator typos that the JSON editor missed (port ranges, slave-ID
collisions, address-map overlaps).
**Options.**
- **Defer all typed editors.** Ship `DriversTab` with a JSON editor first; add
typed editors per-driver as field requests come in. Saves ~1 day on F15.
- **Port the existing two.** Modbus + FOCAS were already validated against
field use. The other six driver types stay JSON-only.
- **Ship all eight typed editors.** Most work, best UX. ~3 extra days on F15.
**Recommendation:** Defer. The OPC UA dual-endpoint tests + driver
engine wiring (F7-F10) are higher-leverage and need attention first.
### Q2 — Top-level `ScriptedAlarms.razor` and `VirtualTags.razor`: keep or drop?
**Context.** In v1, these were fleet-wide views of every scripted alarm and
virtual tag across every cluster. The cluster tabs let you edit them; the
top-level pages let you find them across clusters.
**Options.**
- **Drop.** Fleet-wide view is rare; cluster scope covers 95% of use.
- **Keep as read-only.** Cross-cluster search + drill-down to the per-cluster tab.
**Recommendation:** Drop, but expose a global search on the top nav that
matches cluster + alarm/tag names if operators ask.
### Q3 — ClusterDetail: 10 tabs or split routes?
**Context.** v1 had 10 nav-tabs inside `ClusterDetail.razor`. Some are very
heavy (Tags can be 10k rows; AuditTab streams). All 10 share render state.
**Options.**
- **Keep tabs.** Familiar; one URL per cluster.
- **Split into routes.** `/clusters/{id}/equipment`, `/clusters/{id}/tags`,
etc. Better deep-linking, better load (one tab's data per page), easier auth
scoping.
**Recommendation:** Split into routes. The v1 monolith was already groaning
under the live-update SignalR fan-in; routes let each surface manage its own
subscription lifecycle.
### Q4 — RoleGrants: cluster-scoped table or LDAP group → role map?
**Context.** v1 had a per-cluster `RoleGrants` table where you mapped users to
cluster-scoped roles (ClusterAdmin, ClusterOperator, etc.). v2 introduced
LDAP-driven auth: LDAP group membership maps to OPC UA permissions
(`ReadOnly`, `WriteOperate`, `WriteTune`, `WriteConfigure`, `AlarmAck`)
fleet-wide.
**Options.**
- **Keep v1 model.** Cluster-scoped grants survive; LDAP just provides the
username.
- **Replace with fleet-wide LDAP-group → role mapping.** v2's `LdapOptions`
already has a `GroupToRole` dictionary; surface that in a single fleet-level
page.
- **Both.** LDAP map for fleet-wide defaults; per-cluster overrides for
scoping.
**Recommendation:** Fleet-wide LDAP-group → role map only. Per-cluster scoping
adds combinatorial complexity that v2's redundancy model doesn't need
(every driver-role node runs every driver in the fleet).
### Q5 — Login UI: backed by `/auth/login` (cookie+JWT hybrid) — what about LDAP error UX?
**Context.** v2's `/auth/login` does an LDAP bind. Failures come back as
specific reasons (invalid creds vs. service-account misconfig vs. server
unreachable). The default behavior is to lump them all into "Login failed."
**Options.**
- **Generic "Login failed."** Safer; doesn't leak whether the username exists.
- **Specific error categories.** Helps operators diagnose deploy issues.
**Recommendation:** Generic for production deployments, specific when
`Authentication:Ldap:AllowInsecureLdap=true` (dev mode signal).
## Proposed sequencing (4 phases)
Each phase is independently mergeable. The branch ships when Phase A is in;
Phases BD can follow as smaller PRs.
### Phase A — Shell + auth + fleet (minimum-viable Admin)
~½–1 day. Ships a working admin surface with no config editing.
- Port `App.razor`, `Routes.razor`, `_Imports.razor`
- Port `Login.razor` (post Q5)
- Port `Account.razor`
- Reshape `Fleet.razor` against v2 data sources
- Port `Hosts.razor` reshape
### Phase B — Cluster CRUD + Overview/Redundancy tabs
~1 day. Adds cluster browse + readonly redundancy view.
- Port `ClustersList`, `NewCluster`, `ClusterDetail` (Overview tab only)
- Port `RedundancyTab` (read-only — surfaces v2 `ServiceLevel`)
- Split into routes if Q3 = split
### Phase C — Config editor tabs
~2 days. The big chunk — the live-edit config surface.
- `EquipmentTab`, `UnsTab`, `NamespacesTab`
- `DriversTab` (JSON-only initially per Q1)
- `TagsTab`
- `AclsTab` post Q4 reshape
- `ImportEquipment`, `IdentificationFields`
### Phase D — Logic + ops pages
~1 day.
- `VirtualTagsTab`, `ScriptedAlarmsTab`, `ScriptsTab`, `ScriptEditor`
- `AuditTab` against new ConfigAuditLog schema
- `RoleGrants` post Q4 reshape
- `Certificates`
- `Reservations`
- `AlarmsHistorian`, `ScriptLog` (depends on F16 ScriptLogHub deferred)
## Out of scope for F15
- Typed driver editors (Q1, deferred unless reversed)
- Top-level fleet-wide ScriptedAlarms / VirtualTags pages (Q2, recommended drop)
- Per-cluster RoleGrants (Q4, recommended drop)
- ScriptLogHub SignalR bridge (F16 deferred — only needed for Phase D's
ScriptLog page; can move to a separate F16-extension follow-up)
+127
View File
@@ -0,0 +1,127 @@
# OtOpcUa v2 Architecture
Single-page tour of the v2 layout. For decision history + tradeoffs, see [`2026-05-26-akka-hosting-alignment-design.md`](../plans/2026-05-26-akka-hosting-alignment-design.md).
## Big picture
```
┌─────────────────────────────────────────────┐
│ OtOpcUa.Host │ (fused binary)
│ │
│ reads OTOPCUA_ROLES env, mounts: │
│ ┌─────────────────────────────────────┐ │
│ │ admin → Blazor + auth + control- │ │
│ │ plane singletons │ │
│ │ driver → OPC UA endpoint + │ │
│ │ per-node actors │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
│ joins
┌─────────────────────────────────────────────┐
│ Akka.NET cluster │
│ (split-brain resolver: keep-oldest, 15s) │
└─────────────────────────────────────────────┘
shared by every node: ┌─────────────────┐
│ ConfigDb (SQL) │ live-edit + Deployment artifacts + audit
└─────────────────┘
```
The v1 setup was two separate Windows services (`OtOpcUa.Server` + `OtOpcUa.Admin`) talking through the DB. v2 collapses them into one binary with role gating, and adds an Akka cluster so admin singletons can drive deploys and the redundancy story is automatic.
## Project layout
```
src/Core/ shared abstractions, no Server deps
ZB.MOM.WW.OtOpcUa.Commons types + Akka message contracts + interfaces
ZB.MOM.WW.OtOpcUa.Cluster HOCON, AkkaClusterOptions, IClusterRoleInfo
ZB.MOM.WW.OtOpcUa.Configuration EF Core DbContext + entities
src/Server/ server-side projects
ZB.MOM.WW.OtOpcUa.Security cookie+JWT auth, LDAP, JwtTokenService
ZB.MOM.WW.OtOpcUa.ControlPlane admin-role cluster singletons
ZB.MOM.WW.OtOpcUa.Runtime driver-role per-node actors
ZB.MOM.WW.OtOpcUa.OpcUaServer OPC UA endpoint facade + Phase7Composer
ZB.MOM.WW.OtOpcUa.AdminUI Blazor Razor class library
ZB.MOM.WW.OtOpcUa.Host fused binary (Program.cs)
```
| Project | Role | Doc |
|---|---|---|
| Cluster | Bootstrap + cluster topology view | [Cluster.md](Cluster.md) |
| ControlPlane | Admin singletons (deploy, audit, fleet, redundancy) | [ControlPlane.md](ControlPlane.md) |
| Runtime | Driver-role actor tree | [Runtime.md](Runtime.md) |
| Security | Cookie+JWT auth, LDAP, /auth/* endpoints | [../security.md](../security.md) |
| OpcUaServer | OPC UA endpoint host + composer | [../OpcUaServer.md](../OpcUaServer.md) |
| Host | Role-gated DI graph + Program.cs | [../ServiceHosting.md](../ServiceHosting.md) |
## Role gating
`Program.cs` reads `OTOPCUA_ROLES` once (per process) and decides what to wire:
```csharp
var roles = RoleParser.Parse(Environment.GetEnvironmentVariable("OTOPCUA_ROLES"));
var hasAdmin = roles.Contains("admin");
var hasDriver = roles.Contains("driver");
builder.Services.AddOtOpcUaConfigDb(builder.Configuration);
builder.Services.AddOtOpcUaCluster(builder.Configuration);
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
ab.WithOtOpcUaClusterBootstrap(sp); // HOCON + remote + cluster options
if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
if (hasAdmin)
{
builder.Services.AddOtOpcUaAuth(builder.Configuration);
builder.Services.AddAdminUI();
// SignalR, AdminOpsClient, etc.
}
builder.Services.AddOtOpcUaHealth();
```
There is a **single** ActorSystem. Cluster singletons + per-node actors share it via the `Akka.Hosting` registry. This was a v2 fix (the initial Phase 9 wiring ran two ActorSystems by mistake; see commit `d6fac2d`).
## Live-edit vs draft/publish
v1 had `ConfigGeneration(Draft|Published)` with every live-edit entity FK'd to a generation. Edits accumulated in a Draft until Publish promoted them.
v2 removes that entirely:
- No `ConfigGeneration` table, no `GenerationId` columns.
- Every live-edit entity has a `RowVersion` (`IsRowVersion()`) for last-write-wins.
- Audit goes to `ConfigEdit` (per-row delta) and `ConfigAuditLog` (event-level).
- Deploys snapshot the *current* DB state into an immutable `Deployment.ArtifactBlob` + its `RevisionHash`. That artifact is what driver nodes apply.
See [ControlPlane.md § Deploy flow](ControlPlane.md#deploy-flow) for the end-to-end dispatch + ACK + seal sequence.
## NodeId
Each cluster member has a `NodeId` derived as `{PublicHostname}:{Port}` of the Akka remote endpoint. `ClusterRoleInfo.LocalNode` + `ConfigPublishCoordinator.DiscoverDriverNodes()` use the same formula so they always agree. The port suffix makes loopback test deployments distinguishable (commit `5cfbe8b`); in production the hostname alone is already unique.
## Health endpoints
| Path | Returns 200 when… |
|---|---|
| `/healthz` | Process is alive (no checks). |
| `/health/ready` | DB reachable + this node is `Up` in the cluster. |
| `/health/active` | This node is the admin role-leader (used by Traefik/HA-LB to pin traffic). |
## What lives where (quick map)
| Concern | Project | Entry point |
|---|---|---|
| Read OTOPCUA_ROLES | `Cluster.RoleParser` | static `Parse(string?)` |
| Cluster lifecycle | `Cluster.WithOtOpcUaClusterBootstrap` | extension on `AkkaConfigurationBuilder` |
| Local node identity | `Cluster.IClusterRoleInfo.LocalNode` | DI singleton |
| Admin singletons | `ControlPlane.WithOtOpcUaControlPlaneSingletons` | extension on `AkkaConfigurationBuilder` |
| Driver actors | `Runtime.WithOtOpcUaRuntimeActors` | extension on `AkkaConfigurationBuilder` |
| Auth pipeline | `Security.AddOtOpcUaAuth` + `MapOtOpcUaAuth` | extensions on `IServiceCollection` / `IEndpointRouteBuilder` |
| OPC UA facade | `OpcUaServer.OpcUaApplicationHost` | runtime host, started by driver-role startup |
| Health endpoints | `Host.Health.AddOtOpcUaHealth` + `MapOtOpcUaHealth` | extensions on `IServiceCollection` / `IEndpointRouteBuilder` |
+102
View File
@@ -0,0 +1,102 @@
# OtOpcUa.Cluster
Akka.NET cluster bootstrap + topology view. Used by every other server-side project to talk to the live cluster.
Path: `src/Core/ZB.MOM.WW.OtOpcUa.Cluster/`
## Public surface
| Type | Role |
|---|---|
| `AkkaClusterOptions` | DI-bound options from `appsettings.json::Cluster`. Hostname/Port/PublicHostname/SeedNodes/Roles. |
| `IClusterRoleInfo` (interface in Commons) | Live view of cluster membership + role-leader topology. Thread-safe + event-raising. |
| `ClusterRoleInfo` | Implementation. Subscribes to `ClusterEvent.IMemberEvent` + `RoleLeaderChanged` + `LeaderChanged`. |
| `HoconLoader.LoadBaseConfig()` | Reads the embedded `Resources/akka.conf`. |
| `RoleParser.Parse(string?)` | Parses `OTOPCUA_ROLES` env var into a deduped `string[]`. |
| `ServiceCollectionExtensions.AddOtOpcUaCluster(configuration)` | Binds options + registers `IClusterRoleInfo` singleton. **Does not** start an ActorSystem. |
| `WithOtOpcUaClusterBootstrap(serviceProvider)` | Extension on `AkkaConfigurationBuilder`. Loads embedded HOCON + applies `WithRemoting(...)` + `WithClustering(...)` from options. |
## Bootstrap flow
```csharp
// Program.cs
builder.Services.AddOtOpcUaCluster(builder.Configuration);
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
ab.WithOtOpcUaClusterBootstrap(sp); // HOCON + remote + cluster
// …singletons + node actors layered on
});
```
Order matters: `AddOtOpcUaCluster` must come before `AddAkka` so the options binding has run by the time the `AddAkka` lambda fires. Inside the lambda, `WithOtOpcUaClusterBootstrap` resolves `IOptions<AkkaClusterOptions>` from `sp` and writes them into the Akka builder.
The single ActorSystem this produces is what every other v2 piece runs on. There is no second Akka instance — that was a Phase 9 bug (commit `d6fac2d` consolidated).
## Embedded HOCON
`src/Core/ZB.MOM.WW.OtOpcUa.Cluster/Resources/akka.conf` contains:
| Setting | Value | Why |
|---|---|---|
| `akka.actor.provider` | `cluster` | Required for `Cluster.Get(system)` to work. |
| `akka.cluster.split-brain-resolver.active-strategy` | `keep-oldest` | Smaller/younger side downs itself on partition. |
| `akka.cluster.split-brain-resolver.stable-after` | `15s` | Time before SBR acts. |
| `akka.cluster.failure-detector.threshold` | `10.0` | Higher than default (8.0) for GC-pause tolerance. |
| `opcua-synchronized-dispatcher.type` | `PinnedDispatcher` | Dedicated thread for `OpcUaPublishActor` so SDK calls stay marshalled. |
The Cluster.Tests project verifies these key values stay correct (`HoconLoaderTests`).
## Configuration
```json
{
"Cluster": {
"Hostname": "0.0.0.0",
"Port": 4053,
"PublicHostname": "node-a.lan",
"SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
"Roles": ["admin", "driver"]
}
}
```
- `Hostname`: interface to bind. `0.0.0.0` listens on every interface.
- `Port`: TCP port for cluster gossip. Default 4053.
- `PublicHostname`: address advertised in cluster gossip. Must be reachable by every other node.
- `SeedNodes`: where new nodes go to join. List one (or two) stable nodes. First node bootstraps the cluster from its own address.
- `Roles`: free-form tags Akka gossip propagates. v2 uses `admin` + `driver`; per-role wiring in `Program.cs` reads `OTOPCUA_ROLES` env var, not this list — these two should stay in sync.
## IClusterRoleInfo
Anywhere in the host that needs the local node's identity or a view of who-else-is-in-the-cluster, inject `IClusterRoleInfo`:
```csharp
public sealed class MyService(IClusterRoleInfo cluster)
{
public NodeId Self => cluster.LocalNode;
public IReadOnlyList<NodeId> Drivers => cluster.MembersWithRole("driver");
public NodeId? AdminLeader => cluster.RoleLeader("admin");
public MyService(IClusterRoleInfo cluster)
{
cluster.RoleLeaderChanged += (_, e) =>
Console.WriteLine($"role={e.Role}: {e.PreviousLeader} → {e.NewLeader}");
}
}
```
`LocalNode` is `{PublicHostname}:{Port}` (the port suffix lets loopback test deployments stay distinct; production hostnames are already unique). `ConfigPublishCoordinator` uses the same `{host}:{port}` formula so the expected-ack set and the driver self-identification agree (commit `5cfbe8b`).
## Lifecycle
Akka.Hosting owns the lifecycle: `IHostedService` starts the ActorSystem at host start, runs `CoordinatedShutdown.ClusterLeavingReason` on host stop. The Cluster project does not register its own `IHostedService` (the v1 `AkkaHostedService` was deleted in commit `d6fac2d`).
## Tests
`tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests/` covers:
- `HoconLoaderTests` — embedded resource loads + key settings parse correctly.
- `RoleParserTests` — comma-split + dedup + trim semantics.
Cross-project integration is in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/` (cluster formation, deploy round-trip).
+99
View File
@@ -0,0 +1,99 @@
# OtOpcUa.ControlPlane
Five admin-role cluster singletons that drive the v2 deploy, audit, fleet, and redundancy stories. Path: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/`.
## Singletons
| Actor | File | Marker key | Role |
|---|---|---|---|
| `ConfigPublishCoordinator` | `Coordinators/ConfigPublishCoordinator.cs` | `ConfigPublishCoordinatorKey` | Dispatches `DispatchDeployment`, collects `ApplyAck`s, seals/fails/times-out. |
| `AdminOperationsActor` | `AdminOperations/AdminOperationsActor.cs` | `AdminOperationsActorKey` | Receives `StartDeployment` from the UI, snapshots ConfigDb via `ConfigComposer`, persists `Deployment` row + `ConfigEdit` marker, tells the coordinator to dispatch. |
| `AuditWriterActor` | `Audit/AuditWriterActor.cs` | `AuditWriterActorKey` | Batched `ConfigAuditLog` writer. Flushes every 500 events or 5 s. In-buffer dedup; cross-restart dedup tracked as F3. |
| `FleetStatusBroadcaster` | `Fleet/FleetStatusBroadcaster.cs` | `FleetStatusBroadcasterKey` | Aggregates per-node `FleetNodeStatus` heartbeats; publishes `FleetStatusChanged` on the `fleet-status` DPS topic (SignalR bridge tracked as F16). |
| `RedundancyStateActor` | `Redundancy/RedundancyStateActor.cs` | `RedundancyStateActorKey` | Cluster-event subscriber; debounces 250 ms; publishes `RedundancyStateChanged` on the `redundancy-state` DPS topic. |
All five register via `WithOtOpcUaControlPlaneSingletons()` (extension on `AkkaConfigurationBuilder`). Each uses `ClusterSingletonOptions { Role = "admin" }` so the singleton runs on the admin role-leader and migrates to the next admin node on failover.
```csharp
// Program.cs (admin role only)
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
ab.WithOtOpcUaClusterBootstrap(sp);
if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
```
Resolve from anywhere via `IRequiredActor<T>` or the `ActorRegistry`:
```csharp
public sealed class AdminOperationsClient(ActorRegistry registry) : IAdminOperationsClient
{
private readonly IActorRef _proxy = registry.Get<AdminOperationsActorKey>();
// ...
}
```
## Deploy flow
```
UI → IAdminOperationsClient.StartDeploymentAsync(createdBy)
│ Ask the AdminOperationsActor singleton proxy
AdminOperationsActor
│ ConfigComposer.SnapshotAndFlattenAsync(db) → ConfigArtifact(blob, revHash)
│ insert Deployment(Dispatching) + ConfigEdit marker
│ Tell coordinator → DispatchDeployment
ConfigPublishCoordinator
│ DiscoverDriverNodes() → expected ACK set (host:port per member)
│ insert NodeDeploymentState(Applying) per driver
│ Publish DispatchDeployment on "deployments" topic
│ Start apply-deadline timer (2 min default)
▼ DistributedPubSub
DriverHostActor (on each driver node — subscribed to "deployments")
│ PreStart subscribed; current state Steady(rev)
│ if currentRev == msg.rev → immediate ApplyAck(Applied) (idempotent)
│ else Become(Applying) → write NodeDeploymentStatus → ApplyAck
▼ via "deployment-acks" topic
ConfigPublishCoordinator (subscribed to "deployment-acks" in PreStart)
│ PersistNodeAck + collect
│ all-Applied → Sealed
│ any-Failed → PartiallyFailed
│ deadline → TimedOut
```
The dedicated `deployment-acks` topic + coordinator subscription was added in commit `5cfbe8b`. Before that, ACKs were published back on `deployments` and the coordinator (not subscribed) silently dropped them — deployments hung at `AwaitingApplyAcks` forever in multi-node tests.
### Failover recovery
If the admin singleton fails over mid-deploy, the new instance's `PreStart` queries `NodeDeploymentState` for any `Dispatching`/`AwaitingApplyAcks` row, rebuilds `_expectedAcks` + `_acks` from persisted state, and resumes the deadline timer. See `Coordinators/ConfigPublishCoordinator.cs::PreStart`.
## ConfigComposer
Pure function `SnapshotAndFlattenAsync(db) → ConfigArtifact(byte[], string)`:
1. Reads every live-edit table.
2. Serialises to a stable byte[] (deterministic ordering).
3. Computes SHA-256 over the bytes → 64-hex `RevisionHash`.
Same DB state → same artifact + same hash. That's what makes the `NoChanges` outcome work (AdminOperations compares the proposed hash to the last sealed deployment's hash).
## ServiceLevelCalculator
Pure function exposed at `Redundancy/ServiceLevelCalculator.Compute(NodeHealthInputs)`. Returns the OPC UA `ServiceLevel` byte per the truth table in [Redundancy.md](../Redundancy.md#servicelevel-tiers-part-5-65). No side effects; trivially unit-testable.
## DPS topics
| Topic | Publisher | Subscribers |
|---|---|---|
| `deployments` | ConfigPublishCoordinator | DriverHostActor (per-node) |
| `deployment-acks` | DriverHostActor | ConfigPublishCoordinator |
| `fleet-status` | FleetStatusBroadcaster | (SignalR bridge — F16) |
| `redundancy-state` | RedundancyStateActor | (per-node ServiceLevel calc — F10) |
## Tests
`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/` — 29 tests covering coordinator (happy path, timeout, failover recovery), AdminOps (StartDeployment outcomes), AuditWriter (batching, dedup), FleetStatusBroadcaster (heartbeat staleness), RedundancyStateActor (debounce, snapshot), ConfigComposer (purity), ServiceLevelCalculator (truth table).
Multi-node tests (cross-ActorSystem) are in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/`.
+6 -3
View File
@@ -151,8 +151,11 @@ substantive driver change, and revise this table when the data does.
`SubscriptionRegistry`, or a downstream consumer retaining
`DataValueSnapshot` references past their useful life.
## Scripted-alarm engine — known hot-path allocations
## Scripted-alarm engine — hot-path allocation reuse
`ScriptedAlarmEngine.BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` and `AlarmPredicateContext` on every predicate evaluation — i.e. once per upstream tag change per referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. (Core.ScriptedAlarms-009)
`ScriptedAlarmEngine` keeps a per-alarm reusable evaluation scratch in `_scratchByAlarmId` — the read-cache `Dictionary<string, DataValueSnapshot>` and the `AlarmPredicateContext` are allocated once per alarm (on first evaluation) and refilled in place across every subsequent predicate evaluation. The hot path no longer allocates a fresh dictionary + context per upstream tag change. (Core.ScriptedAlarms-009)
The allocations are deliberate for now: predicate evaluation is already serialised under `_evalGate`, so a single reused scratch dictionary would be safe, but the per-call dictionary keeps the evaluation surface immutable and trivially safe against future refactors. If a future scripted-alarm soak surfaces allocation pressure on this path, the mitigation is a per-alarm scratch buffer cleared between evaluations — note here before changing the engine.
Safety: reuse is serialised under `_evalGate`, so two threads can never observe the same scratch in a half-refilled state. The context wraps the read-cache by reference, so refilling the dictionary is what the predicate's `ctx.GetTag(path)` calls observe. `LoadAsync` clears `_scratchByAlarmId` alongside `_alarms` so a config-publish drops the prior generation's scratch (a new generation may carry different `Inputs` / `Logger`). Regression tests in `ScriptedAlarmEngineTests` lock the reuse contract:
- `Reevaluation_reuses_the_same_read_cache_dictionary` — asserts dictionary instance identity across two evaluations.
- `Reevaluation_reuses_the_same_predicate_context` — same, for the context.
- `LoadAsync_drops_the_prior_generations_scratch` — asserts a publish resets the scratch.
+126
View File
@@ -0,0 +1,126 @@
# OtOpcUa.Runtime
Driver-role actor tree — one set per node. Path: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/`.
## Actor tree
```
DriverHostActor (per node)
│ state machine: Steady ⇄ Applying ⇄ Stale
├──▶ DriverInstanceActor (per configured DriverInstance row)
│ state: Connecting → Connected → Reconnecting (or Stubbed)
├──▶ VirtualTagActor (per VirtualTag row)
│ compiles + evaluates expression, publishes derived value
├──▶ ScriptedAlarmActor (per ScriptedAlarm row)
│ state: Inactive ⇄ Active ⇄ Acknowledged
├──▶ OpcUaPublishActor (per node, pinned dispatcher)
│ marshalled OPC UA SDK writes + RebuildAddressSpace
├──▶ HistorianAdapterActor (per node)
│ pipe IPC to Wonderware historian sidecar
├──▶ PeerOpcUaProbeActor (per peer node)
│ opc.tcp ping → redundancy-state DPS topic
└──▶ DbHealthProbeActor (per node)
cached SELECT 1; consumed by /health/ready + redundancy calc
```
## Public surface
| Type | File |
|---|---|
| `WithOtOpcUaRuntimeActors()` | `ServiceCollectionExtensions.cs` — extension on `AkkaConfigurationBuilder`. Spawns `DriverHostActor` + `DbHealthProbeActor` on the host's ActorSystem. |
| `DriverHostActor` | `Drivers/DriverHostActor.cs` |
| `DriverInstanceActor` | `Drivers/DriverInstanceActor.cs` |
| `VirtualTagActor` | `VirtualTags/VirtualTagActor.cs` |
| `ScriptedAlarmActor` | `ScriptedAlarms/ScriptedAlarmActor.cs` |
| `OpcUaPublishActor` | `OpcUa/OpcUaPublishActor.cs` |
| `HistorianAdapterActor` | `Historian/HistorianAdapterActor.cs` |
| `PeerOpcUaProbeActor` | `Health/PeerOpcUaProbeActor.cs` |
| `DbHealthProbeActor` | `Health/DbHealthProbeActor.cs` |
Marker keys for registry lookup: `DriverHostActorKey`, `DbHealthProbeActorKey`.
## DriverHostActor
Per-node supervisor with three Become states:
| State | Meaning |
|---|---|
| `Steady(rev)` | Caught up. `DispatchDeployment` with `msg.rev == currentRev` → immediate `ApplyAck(Applied)` (idempotent). New rev → `Become(Applying)`. |
| `Applying(id)` | Apply in progress. Further `DispatchDeployment` for in-flight ID → debug-log + ignore. For new ID → defer via `Self.Forward`. |
| `Stale` | ConfigDb unreachable on bootstrap. Periodic `RetryConfigDbConnection` tries to advance to `Steady`. |
`PreStart`:
1. Subscribe to `deployments` DPS topic.
2. Read most-recent `NodeDeploymentState` for this node from ConfigDb.
3. If `Applied` → restore `_currentRevision`, `Become(Steady)`.
4. If `Applying` (orphan from crash) → replay apply (idempotent).
5. If `Failed``Become(Steady)` at last known rev.
6. DB unreachable → `Become(Stale)`, start retry timer.
ACK publishing: when no `_coordinatorOverride` is set (production), `SendAck` publishes on the dedicated `deployment-acks` DPS topic which the coordinator subscribes to (commit `5cfbe8b`).
## DriverInstanceActor
Per-driver-instance child. State machine:
- `Connecting` → first attempt to reach the underlying driver
- `Connected` → subscriptions active, reads/writes flow
- `Reconnecting` → temporary disconnect; backoff retry
- `Stubbed` → DEV-STUB mode for Windows-only drivers (Galaxy, Wonderware Historian) on non-Windows or when `roles` contains `dev`
`ShouldStub(driverType, roles)` returns `true` for `"Galaxy" | "Historian.Wonderware"` on non-Windows; the actor goes straight to `Stubbed` and returns deterministic success without touching real hardware. Wiring this into the DriverHost child-spawn path is follow-up F20 (folds into F7).
Engine wiring (subscription publishing, ApplyDelta diff, bad-quality-on-disconnect, write path, supervisor backoff) is stubbed — tracked as F7. Tests exercise message contracts, not engine behaviour.
## VirtualTagActor / ScriptedAlarmActor
Skeleton state machines + message handlers. Engine work:
- `VirtualTagEngine.Evaluate()` not yet called from `VirtualTagActor.DependencyValueChanged` (F8).
- `AlarmConditionService` not yet called from `ScriptedAlarmActor` (F9).
- `ScriptedAlarmState` DB persistence on `PreRestart` not wired (F9).
## OpcUaPublishActor
The only actor on the **pinned dispatcher** (`opcua-synchronized-dispatcher` from `akka.conf`). All OPC UA SDK address-space writes go through it so the SDK's threading model isn't violated.
Message contracts are defined; actual SDK calls are stubbed (counters only). Real address-space writes + `ServiceLevel` Variable updates + `RebuildAddressSpace` after a deploy land in F10 (gated on F13 — full `OpcUaApplicationHost` extraction).
## HistorianAdapterActor, PeerOpcUaProbeActor
Both have message contracts wired. Engine integration deferred:
- `HistorianAdapterActor` — named-pipe IPC to the Wonderware historian sidecar + `SqliteStoreAndForwardSink` (F11).
- `PeerOpcUaProbeActor` — real `opc.tcp://peer:4840` ping (F12). Current stub always returns `Ok=true`.
## DbHealthProbeActor
`Ask<DbHealthStatus>` returns cached state (refreshed every 5 s by an internal `SELECT 1`). Consumed by `/health/ready` and `RedundancyStateActor`.
## Lifecycle wiring
```csharp
// Program.cs (driver role only)
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
ab.WithOtOpcUaClusterBootstrap(sp);
if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
```
`WithOtOpcUaRuntimeActors` resolves `IDbContextFactory<OtOpcUaConfigDbContext>` + `IClusterRoleInfo` from DI, then spawns `DbHealthProbeActor` and `DriverHostActor` as top-level `/user/` actors. Both register marker keys in `ActorRegistry` so the registry lookup works from anywhere.
## Tests
`tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` — 16 tests covering DriverHostActor (Steady ack, Applying transitions, Stale recovery), DriverInstanceActor (state machine, stub mode), VirtualTagActor + ScriptedAlarmActor (message contracts), OpcUaPublishActor (props + message acceptance), DbHealthProbe + PeerOpcUaProbe (probe loop), and the `WithOtOpcUaRuntimeActors` registration round-trip.
End-to-end deploy from admin → driver via the cluster is in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DeployHappyPathTests.cs`.
+9
View File
@@ -43,6 +43,15 @@ that a naive Modbus client will byte-swap [1][2].
really "read 10 consecutive holding registers starting at the Modbus address
that V2000 translates to (see next section), unpack each register low-byte
then high-byte, stop at the first `0x00`."
- **Grammar scope** (Driver.Modbus.Addressing-007): the
`ModbusStringByteOrder` knob (HighByteFirst / LowByteFirst) is **not**
expressible through the `ModbusAddressParser` grammar string — the 3rd grammar
field is the multi-register word/byte order (ABCD/CDAB/BADC/DCBA) and the 4th
is the array count, so there is no token slot for the per-string byte order.
Tags that need low-byte-first packing on DL205 must set
`ModbusTagDefinition.StringByteOrder = LowByteFirst` via the structured tag
form (the driver config DTO). The grammar default produces high-byte-first
strings (matches Ignition / Kepware default behaviour).
Test names:
`DL205_String_low_byte_first_within_register`,
@@ -29,7 +29,7 @@ Tie-in capability — **historian alarm sink**:
| 3 | Evaluation trigger = **change-driven + timer-driven**; operator chooses per-tag | Change-driven is cheap at steady state; timer is the escape hatch for polling derivations that don't have a discrete "input changed" signal. |
| 4 | Script shape = **Shape A — one script per virtual tag/alarm**; `return` produces the value (or `bool` for alarm condition) | Minimal surface; no predicate/action split. Alarm side-effects (severity, message) configured out-of-band, not in the script. |
| 5 | Alarm fidelity = **full OPC UA Part 9** | Uniform with Galaxy + ALMD on the wire; client-side tooling (HMIs, historians, event pipelines) gets one shape. |
| 6 | Sandbox = **read-only context**; scripts can only read any tag + write to virtual tags | Strict Roslyn `ScriptOptions` allow-list. Authoritative deny-list (`ForbiddenTypeAnalyzer`): namespace-prefix deny `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks` (Task / Parallel fan-out — Core.Scripting-003), `System.Runtime.InteropServices`, `Microsoft.Win32`; type-granular deny `System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread` (these live directly in the allow-listed `System` / `System.Threading` namespaces, so a prefix rule cannot reach them without blocking primitives — Core.Scripting-001 / -009). |
| 6 | Sandbox = **read-only context**; scripts can only read any tag + write to virtual tags | Strict Roslyn `ScriptOptions` allow-list. Authoritative deny-list (`ForbiddenTypeAnalyzer`): namespace-prefix deny `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks` (Task / Parallel fan-out — Core.Scripting-003), `System.Runtime.InteropServices`, `System.Runtime.Loader` (AssemblyLoadContext et al. — Core.Scripting-012), `Microsoft.Win32`; type-granular deny `System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`, `System.Threading.ThreadPool` (Core.Scripting-012), `System.Threading.Timer` (Core.Scripting-012) (these live directly in the allow-listed `System` / `System.Threading` namespaces, so a prefix rule cannot reach them without blocking primitives — Core.Scripting-001 / -009 / -012). |
| 7 | Dependency declaration = **AST inference**; operator doesn't maintain a separate dependency list | `CSharpSyntaxWalker` extracts `ctx.GetTag("path")` string-literal calls at compile time; dynamic paths rejected at publish. |
| 8 | Config storage = **config DB with generation-sealed cache** (same as driver instances) | Virtual tags + alarms publish atomically in the same generation as the driver instance config they may depend on. |
| 9 | Script return value shape (`ctx.GetTag`) = **`DataValue { Value, StatusCode, Timestamp }`** | Scripts branch on quality naturally without separate `ctx.GetQuality(...)` calls. |
@@ -89,7 +89,7 @@ Tie-in capability — **historian alarm sink**:
### Stream A — `Core.Scripting` (Roslyn engine + sandbox + AST inference + logger) — **2 weeks**
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers.
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers. _(Implementation note 2026-05-23 — superseded by Core.Scripting-008 / -016: the `CSharpScript`/`ScriptRunner` path was replaced with a hand-rolled `CSharpCompilation.Create``Emit(MemoryStream)` → collectible `ScriptAssemblyLoadContext.LoadFromStream` pipeline so per-publish ALC accretion is reclaimable, and engines route compiles through `CompiledScriptCache` rather than calling `ScriptEvaluator.Compile` directly. The reference list was correspondingly widened from the narrow allow-list above to the full BCL `TRUSTED_PLATFORM_ASSEMBLIES` set (filtered to `System.*` + `netstandard` + `Microsoft.Win32.Registry`) because the new pipeline can't compile against the old narrow set; `ForbiddenTypeAnalyzer` is now the sole security gate, consistent with how Core.Scripting-001 / -002 established the analyzer must be the real boundary because type forwarding makes any references-list-only restriction porous. See `docs/VirtualTags.md` "Compile cache" for the current implementation contract.)_
2. **A.2** `DependencyExtractor : CSharpSyntaxWalker`. Visits every `InvocationExpressionSyntax` targeting `ctx.GetTag` / `ctx.SetVirtualTag`; accepts only a `LiteralExpressionSyntax` argument. Non-literal arguments (concat, variable, method call) → publish-time rejection with an actionable error pointing the operator at the exact span. Outputs `IReadOnlySet<string> Inputs` + `IReadOnlySet<string> Outputs`.
3. **A.3** Compile cache. `(source_hash) → compiled Script<T>`. Recompile only when source changes. Warm on `SealedBootstrap`.
4. **A.4** Per-evaluation timeout wrapper (default 250ms; configurable per tag). Timeout = tag quality `BadInternalError` + structured warning log. Keeps a single runaway script from starving the engine.
@@ -162,7 +162,7 @@ Tie-in capability — **historian alarm sink**:
## Compliance Checks (run at exit gate)
- [ ] **Sandbox escape**: attempts to reference any deny-listed namespace prefix (`System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks`, `System.Runtime.InteropServices`, `Microsoft.Win32`) or any of the type-granular forbidden types (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`) fail at script compile with an actionable error. Vectors include direct calls, `typeof(T)`, generic type arguments, casts, `is`/`as` patterns, `default(T)`, array element types, and explicitly-typed local declarations.
- [ ] **Sandbox escape**: attempts to reference any deny-listed namespace prefix (`System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Tasks`, `System.Runtime.InteropServices`, `System.Runtime.Loader`, `Microsoft.Win32`) or any of the type-granular forbidden types (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`, `System.Threading.Thread`, `System.Threading.ThreadPool`, `System.Threading.Timer`) fail at script compile with an actionable error. Vectors include direct calls, `typeof(T)`, generic type arguments, casts, `is`/`as` patterns, `default(T)`, array element types, and explicitly-typed local declarations.
- [ ] **Dependency inference**: `ctx.GetTag(myStringVar)` (non-literal path) is rejected at publish with a span-pointed error; `ctx.GetTag("Line1/Speed")` is accepted + appears in the inferred input set.
- [ ] **Change cascade**: tag A → virtual tag B → virtual tag C. When A changes, B recomputes, then C recomputes. Single change event triggers the full cascade in topological order within one evaluation pass.
- [ ] **Cycle rejection**: publish a config where virtual tag B depends on A and A depends on B. Publish fails pre-commit with a clear cycle message.
+2 -2
View File
@@ -193,7 +193,7 @@ ConfigurationService
- Compact binary format, faster than JSON, good fit for high-frequency data change callbacks
- Simpler than gRPC on .NET 4.8 (which needs legacy `Grpc.Core` native library)
**Decided: Galaxy Host is a separate Windows service.**
**Decided: Galaxy Host is a separate Windows service.** _(Reversed by PR 7.2, 2026-04-30 — see PR 7.2's commit `ae7106d` and the project_galaxy_via_mxgateway memory entry. The legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects + the `OtOpcUaGalaxyHost` Windows service were retired; Galaxy access now flows through the in-process Tier-A `GalaxyDriver` talking gRPC to a separately installed `mxaccessgw` gateway sibling repo. The reasoning below was correct for the original LMX/x86-COM architecture; the gateway sibling repo now owns those constraints externally.)_
- Independent lifecycle from the OtOpcUa Server
- Can be restarted without affecting the main server or other drivers
- Galaxy.Proxy detects connection loss, sets Bad quality on Galaxy nodes, reconnects when Host comes back
@@ -801,7 +801,7 @@ aggregate runner (#253); server-side factory + seed SQL per driver (#210#213)
| 26 | Admin deploys on same server (co-hosted) | Simplifies deployment; can also run on separate management host | 2026-04-16 |
| 27 | Admin scaffold early, driver-specific screens deferred | Core CRUD for instances/drivers first; per-driver config UI added with each driver | 2026-04-16 |
| 28 | Named pipes for Galaxy IPC | Fast, no port conflicts, native to both .NET 4.8 and .NET 10 | 2026-04-16 |
| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 |
| 29 | Galaxy Host is a separate Windows service | Independent lifecycle, can restart without affecting main server or other drivers | 2026-04-16 (**reversed PR 7.2, 2026-04-30** — Galaxy is now an in-process Tier-A driver talking gRPC to the sibling `mxaccessgw` gateway; see the decision body above) |
| 30 | Drop TopShelf, use Microsoft.Extensions.Hosting | Built-in Windows Service support in .NET 10, no third-party dependency | 2026-04-16 |
| 31 | Mono-repo for all drivers | Simpler dependency management, single CI pipeline, shared abstractions | 2026-04-16 |
| 32 | MessagePack serialization for Galaxy IPC | Binary, fast, works on .NET 4.8+ and .NET 10 via MessagePack-CSharp NuGet | 2026-04-16 |
+1 -1
View File
@@ -67,7 +67,7 @@ Remaining Phase 6.3 surfaces (hardening, not release-blocking):
- **AB CIP** (PRs #202222) — `Driver.AbCip`, `Driver.AbCip.IntegrationTests` (7 tests), AB CIP Cli. Live-boot verified against a ControlLogix rig.
- **AB Legacy** (PRs #202, #223) — `Driver.AbLegacy`, `Driver.AbLegacy.IntegrationTests` (2 tests), AB Legacy Cli. PCCC cip-path workaround for SLC/MicroLogix.
- **TwinCAT ADS** (PRs #205, this branch `task-galaxy-e2e`) — `Driver.TwinCAT`, `Driver.TwinCAT.IntegrationTests` (2 tests), TwinCAT Cli. TCBSD/ESXi fixture for e2e since local Hyper-V / TwinCAT RTIME are mutually exclusive on the dev box.
- **TwinCAT ADS** (PR #205) — `Driver.TwinCAT`, `Driver.TwinCAT.IntegrationTests` (2 tests), TwinCAT Cli. TCBSD/ESXi fixture for e2e since local Hyper-V / TwinCAT RTIME are mutually exclusive on the dev box.
- **FOCAS** (PRs #173, #199 + this session's migration) — `Driver.FOCAS` with an **in-process managed `FocasWireClient`** that speaks FOCAS/2 over TCP directly. Tier-C isolation retired — `Driver.FOCAS.Host` + `Driver.FOCAS.Shared` + `FwlibNative` P/Invoke + shim DLL + NSSM service all deleted. `Driver.FOCAS.IntegrationTests` covers 9 scenarios (fixed tree identity/axes/program/timers/spindle + user-authored PARAM/MACRO/PMC reads, Browse, Subscribe, IAlarmSource raise/clear, Probe transitions).
Decision recorded: FOCAS is **read-only** against the CNC by design — writes return `BadNotWritable`. See `docs/drivers/FOCAS.md` + `docs/drivers/FOCAS-Test-Fixture.md` for the deployment + coverage map.
+85 -51
View File
@@ -1,46 +1,63 @@
<#
.SYNOPSIS
Registers the v2 Windows services on a node: OtOpcUa (main server, net10) and
optionally OtOpcUaWonderwareHistorian (Wonderware historian sidecar).
Registers the v2 Windows service on a node: OtOpcUaHost (fused binary, .NET 10)
and optionally OtOpcUaWonderwareHistorian (Wonderware historian sidecar, net48 x86).
.DESCRIPTION
PR 7.2 retired the legacy out-of-process OtOpcUaGalaxyHost service alongside the
GalaxyProxyDriver / GalaxyHost / GalaxyShared projects. Galaxy access now flows
through the in-process GalaxyDriver talking gRPC to a separately-installed
mxaccessgw. The mxaccessgw server runs out of its own repo
(`c:\Users\dohertj2\Desktop\mxaccessgw\`) see
`docs/v2/Galaxy.ParityRig.md` for the gw setup recipe.
v2 consolidates the legacy OtOpcUa + OtOpcUaAdmin services into a single role-gated
OtOpcUaHost binary. The -Roles parameter sets the OTOPCUA_ROLES service env so
Program.cs decides what to mount (admin / driver / both). The Wonderware historian
sidecar logic is unchanged from v1; install it with -InstallWonderwareHistorian.
Galaxy access flows through the mxaccessgw sibling repo (separate service); see
docs/v2/Galaxy.ParityRig.md for the gateway setup.
.PARAMETER InstallRoot
Where the binaries live (typically C:\Program Files\OtOpcUa).
Where the binaries live (typically C:\Program Files\OtOpcUa). The OtOpcUaHost
service runs OtOpcUa.Host.exe from this directory; publish the Host project there
with `dotnet publish -c Release -r win-x64 --self-contained` first.
.PARAMETER ServiceAccount
Service account SID or DOMAIN\name. The OtOpcUa service runs under this account.
Service account SID or DOMAIN\name. The OtOpcUaHost service runs under this account.
.PARAMETER Roles
Comma-separated cluster roles for this node. One of:
- "admin,driver" single-node dev or all-in-one production node
- "admin" admin-only HA pair member (Blazor + control-plane singletons)
- "driver" driver-only node (OPC UA endpoint + per-node actors)
Written to the service env as OTOPCUA_ROLES.
.PARAMETER HttpPort
HTTP port for the AdminUI + auth endpoints. Default 9000. Written as ASPNETCORE_URLS.
Ignored on driver-only nodes (no Blazor surface).
.PARAMETER InstallWonderwareHistorian
Gate the OtOpcUaWonderwareHistorian sidecar install. Off by default; set when
the deployment uses the Wonderware historian for history reads + alarm-event
persistence.
Gate the OtOpcUaWonderwareHistorian sidecar install. Off by default; set when the
deployment uses the Wonderware historian for history reads + alarm-event persistence.
.PARAMETER HistorianSharedSecret
Per-process secret passed to the Historian sidecar via env var. Generated
freshly per install when not supplied.
Per-process secret passed to the historian sidecar via env var. Generated freshly
per install when not supplied.
.EXAMPLE
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' -ServiceAccount 'OTOPCUA\svc-otopcua'
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' `
-ServiceAccount 'OTOPCUA\svc-otopcua' -Roles 'admin,driver'
.EXAMPLE
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' -ServiceAccount 'OTOPCUA\svc-otopcua' `
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' `
-ServiceAccount 'OTOPCUA\svc-otopcua' -Roles 'driver' `
-InstallWonderwareHistorian
#>
[CmdletBinding()]
param(
[Parameter(Mandatory)] [string]$InstallRoot,
[Parameter(Mandatory)] [string]$ServiceAccount,
[Parameter(Mandatory)] [ValidateSet('admin', 'driver', 'admin,driver', 'driver,admin')]
[string]$Roles,
[int]$HttpPort = 9000,
# PR 3.W — Wonderware historian sidecar. Optional; gates the
# OtOpcUaWonderwareHistorian service. Secret + pipe defaults match the server's
# Historian:Wonderware appsettings block.
# Wonderware historian sidecar. Optional; gates the OtOpcUaWonderwareHistorian
# service. Secret + pipe defaults match the server's Historian:Wonderware appsettings.
[switch]$InstallWonderwareHistorian,
[string]$HistorianSharedSecret,
[string]$HistorianPipeName = 'OtOpcUaWonderwareHistorian',
@@ -51,18 +68,19 @@ param(
$ErrorActionPreference = 'Stop'
if (-not (Test-Path "$InstallRoot\OtOpcUa.Server.exe")) {
Write-Error "OtOpcUa.Server.exe not found at $InstallRoot — copy the publish output first"
if (-not (Test-Path "$InstallRoot\OtOpcUa.Host.exe")) {
Write-Error "OtOpcUa.Host.exe not found at $InstallRoot — copy the publish output first"
exit 1
}
# Generate fresh shared secrets per install if not supplied.
function New-SharedSecret {
$bytes = New-Object byte[] 32
[System.Security.Cryptography.RandomNumberGenerator]::Create().GetBytes($bytes)
return [Convert]::ToBase64String($bytes)
}
if ($InstallWonderwareHistorian -and -not $HistorianSharedSecret) { $HistorianSharedSecret = New-SharedSecret }
if ($InstallWonderwareHistorian -and -not $HistorianSharedSecret) {
$HistorianSharedSecret = New-SharedSecret
}
if ($InstallWonderwareHistorian -and -not (Test-Path "$InstallRoot\WonderwareHistorian\OtOpcUa.Driver.Historian.Wonderware.exe")) {
Write-Error "OtOpcUa.Driver.Historian.Wonderware.exe not found at $InstallRoot\WonderwareHistorian — copy the publish output first"
@@ -76,10 +94,7 @@ $sid = if ($ServiceAccount.StartsWith('S-1-')) {
(New-Object System.Security.Principal.NTAccount $ServiceAccount).Translate([System.Security.Principal.SecurityIdentifier]).Value
}
# --- Install OtOpcUaWonderwareHistorian (PR 3.W) — separate sidecar that exposes the
# Wonderware Historian SDK via a named-pipe protocol consumed by the .NET 10 server.
# Optional: only installed when -InstallWonderwareHistorian is supplied. Depends on the
# hard AVEVA services that host the historian SDK runtime path.
# --- OtOpcUaWonderwareHistorian sidecar (optional, unchanged from v1) -------
$historianDepend = $null
if ($InstallWonderwareHistorian) {
$historianEnv = @(
@@ -87,14 +102,10 @@ if ($InstallWonderwareHistorian) {
"OTOPCUA_ALLOWED_SID=$sid"
"OTOPCUA_HISTORIAN_SECRET=$HistorianSharedSecret"
"OTOPCUA_HISTORIAN_ENABLED=true"
# Default-on when the historian sidecar is installed; flip to false for a
# read-only deployment that still loads aahClientManaged for reads but
# rejects WriteAlarmEvents frames.
"OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true"
"OTOPCUA_HISTORIAN_SERVER=$HistorianServer"
"OTOPCUA_HISTORIAN_PORT=$HistorianPort"
) -join "`0"
$historianEnv += "`0`0"
)
Write-Host "Installing OtOpcUaWonderwareHistorian..."
& sc.exe create OtOpcUaWonderwareHistorian binPath= "`"$InstallRoot\WonderwareHistorian\OtOpcUa.Driver.Historian.Wonderware.exe`"" `
@@ -105,36 +116,59 @@ if ($InstallWonderwareHistorian) {
& sc.exe config OtOpcUaWonderwareHistorian start= delayed-auto | Out-Null
$svcKey = "HKLM:\SYSTEM\CurrentControlSet\Services\OtOpcUaWonderwareHistorian"
$envValue = $historianEnv.Split("`0") | Where-Object { $_ -ne '' }
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $envValue
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $historianEnv
& sc.exe failure OtOpcUaWonderwareHistorian reset= 86400 actions= restart/5000/restart/30000/restart/60000 | Out-Null
$historianDepend = 'OtOpcUaWonderwareHistorian'
}
# --- Install OtOpcUa. Galaxy access flows through GalaxyDriver → mxaccessgw (gRPC),
# so OtOpcUa no longer depends on a sibling service for Galaxy connectivity. The
# mxaccessgw is installed separately. When the Wonderware sidecar is installed,
# depend on it for startup ordering.
$otOpcUaDepends = @()
if ($historianDepend) { $otOpcUaDepends += $historianDepend }
# --- OtOpcUaHost (the fused v2 binary) --------------------------------------
$normalisedRoles = ($Roles -split ',' | ForEach-Object { $_.Trim() } | Sort-Object -Unique) -join ','
Write-Host "Installing OtOpcUa..."
$hasAdmin = $normalisedRoles -split ',' -contains 'admin'
$hostEnv = @(
"OTOPCUA_ROLES=$normalisedRoles",
'DOTNET_ENVIRONMENT=Production'
)
if ($hasAdmin) {
$hostEnv += "ASPNETCORE_URLS=http://+:$HttpPort"
}
$hostDepends = @()
if ($historianDepend) { $hostDepends += $historianDepend }
Write-Host "Installing OtOpcUaHost (roles=$normalisedRoles)..."
$createArgs = @(
'create', 'OtOpcUa',
'binPath=', "`"$InstallRoot\OtOpcUa.Server.exe`"",
'DisplayName=', 'OtOpcUa Server',
'create', 'OtOpcUaHost',
'binPath=', "`"$InstallRoot\OtOpcUa.Host.exe`"",
'DisplayName=', "OtOpcUa Host ($normalisedRoles)",
'start=', 'auto',
'obj=', $ServiceAccount
)
if ($otOpcUaDepends.Count -gt 0) {
$createArgs += @('depend=', ($otOpcUaDepends -join '/'))
if ($hostDepends.Count -gt 0) {
$createArgs += @('depend=', ($hostDepends -join '/'))
}
& sc.exe @createArgs | Out-Null
# Env block via registry MultiString (sc.exe doesn't take env directly).
$svcKey = "HKLM:\SYSTEM\CurrentControlSet\Services\OtOpcUaHost"
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $hostEnv
# Restart-on-failure: 5s, 30s, 60s; reset counter after a clean 24h run.
& sc.exe failure OtOpcUaHost reset= 86400 actions= restart/5000/restart/30000/restart/60000 | Out-Null
Write-Host ""
Write-Host "Installed. Start with:"
Write-Host "Installed OtOpcUaHost:"
Write-Host " Roles: $normalisedRoles"
if ($hasAdmin) { Write-Host " HTTP port: $HttpPort" }
Write-Host " Binary: $InstallRoot\OtOpcUa.Host.exe"
Write-Host " Account: $ServiceAccount"
Write-Host ""
Write-Host "Start with:"
if ($InstallWonderwareHistorian) { Write-Host " sc.exe start OtOpcUaWonderwareHistorian" }
Write-Host " sc.exe start OtOpcUa"
Write-Host " sc.exe start OtOpcUaHost"
if ($InstallWonderwareHistorian) {
Write-Host ""
Write-Host "Wonderware historian shared secret (configure into appsettings.json Historian:Wonderware:SharedSecret):"
@@ -142,5 +176,5 @@ if ($InstallWonderwareHistorian) {
}
Write-Host ""
Write-Host "NOTE: Galaxy access flows through mxaccessgw — install + run that separately"
Write-Host " per docs/v2/Galaxy.ParityRig.md. OtOpcUa connects via the Galaxy.Gateway"
Write-Host " section of appsettings.json (default endpoint http://localhost:5120)."
Write-Host " per docs/v2/Galaxy.ParityRig.md. OtOpcUaHost connects via the"
Write-Host " Galaxy.Gateway section of appsettings.json (default http://localhost:5120)."
+68
View File
@@ -0,0 +1,68 @@
<#
.SYNOPSIS
Installs Traefik as a Windows service that routes admin HTTP traffic to whichever
OtOpcUa.Host node holds the admin role-leader (via /health/active).
.DESCRIPTION
Downloads the Traefik Windows binary into $InstallRoot, drops traefik.yml +
traefik-dynamic.yml from this directory next to it, and registers Traefik as a
Windows service via sc.exe with restart-on-failure.
Companion to Install-Services.ps1. Run on the box that fronts the admin HTTP
traffic (typically a separate node from OtOpcUaHost, or co-located on the
primary admin node).
.PARAMETER InstallRoot
Where the Traefik binary + config land. Default 'C:\Program Files\Traefik'.
.PARAMETER TraefikVersion
Traefik version to download. Default 'v3.1.6'.
.EXAMPLE
.\Install-Traefik.ps1 -InstallRoot 'C:\Program Files\Traefik'
#>
[CmdletBinding()]
param(
[string]$InstallRoot = 'C:\Program Files\Traefik',
[string]$TraefikVersion = 'v3.1.6'
)
$ErrorActionPreference = 'Stop'
if (-not (Test-Path $InstallRoot)) {
New-Item -ItemType Directory -Path $InstallRoot | Out-Null
}
$zip = Join-Path $env:TEMP "traefik-$TraefikVersion.zip"
$url = "https://github.com/traefik/traefik/releases/download/$TraefikVersion/traefik_${TraefikVersion}_windows_amd64.zip"
Write-Host "Downloading Traefik $TraefikVersion..."
Invoke-WebRequest -Uri $url -OutFile $zip
Expand-Archive -Path $zip -DestinationPath $InstallRoot -Force
Remove-Item $zip
$scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
Copy-Item -Force (Join-Path $scriptDir 'traefik.yml') $InstallRoot
Copy-Item -Force (Join-Path $scriptDir 'traefik-dynamic.yml') (Join-Path $InstallRoot 'dynamic.yml')
# Traefik reads dynamic.yml from /etc/traefik on Linux; on Windows place it next to the
# binary and point the file provider at it. Edit traefik.yml's `filename:` if you want
# to change the location.
(Get-Content -Raw (Join-Path $InstallRoot 'traefik.yml')) `
-replace '/etc/traefik/dynamic.yml', (Join-Path $InstallRoot 'dynamic.yml').Replace('\', '/') `
| Set-Content (Join-Path $InstallRoot 'traefik.yml')
Write-Host "Installing Traefik Windows service..."
& sc.exe create OtOpcUaTraefik binPath= "`"$InstallRoot\traefik.exe`" --configFile=`"$InstallRoot\traefik.yml`"" `
DisplayName= 'OtOpcUa Traefik (admin HTTP front door)' `
start= auto | Out-Null
& sc.exe failure OtOpcUaTraefik reset= 86400 actions= restart/5000/restart/30000/restart/60000 | Out-Null
Write-Host ""
Write-Host "Installed OtOpcUaTraefik. Edit:"
Write-Host " $InstallRoot\dynamic.yml (router + service definitions)"
Write-Host "Start with:"
Write-Host " sc.exe start OtOpcUaTraefik"
Write-Host ""
Write-Host "Traefik dashboard: http://localhost:8080 (turn off api.insecure in production)"
+11 -11
View File
@@ -43,11 +43,11 @@ function Test-NssmService([string]$Name) {
# Step 1: Stop in reverse dependency order
# ------------------------------------------------------------------------
Step "Stopping services (OtOpcUa OtOpcUaWonderwareHistorian MxAccessGw)"
Step "Stopping services (OtOpcUaHost > OtOpcUaWonderwareHistorian > MxAccessGw)"
foreach ($name in @('OtOpcUa', 'OtOpcUaWonderwareHistorian', 'MxAccessGw')) {
foreach ($name in @('OtOpcUaHost', 'OtOpcUaWonderwareHistorian', 'MxAccessGw')) {
if (Test-NssmService $name) {
Run { nssm stop $name } "stop $name"
Run { Stop-Service $name -Force -ErrorAction SilentlyContinue } "stop $name"
}
else {
Write-Host " ($name not installed; skipping)" -ForegroundColor DarkGray
@@ -56,7 +56,7 @@ foreach ($name in @('OtOpcUa', 'OtOpcUaWonderwareHistorian', 'MxAccessGw')) {
if (-not $WhatIf) {
Start-Sleep -Seconds 3
Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Host, OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
ForEach-Object {
Write-Host " killing residual process $($_.ProcessName) (PID=$($_.Id))" -ForegroundColor DarkYellow
Stop-Process -Id $_.Id -Force -ErrorAction SilentlyContinue
@@ -109,14 +109,14 @@ Run {
# Step 4: Refresh OtOpcUa + Wonderware historian sidecar
# ------------------------------------------------------------------------
Step "Publishing OtOpcUa server + Wonderware historian sidecar from $RepoRoot"
Step "Publishing OtOpcUa.Host + Wonderware historian sidecar from $RepoRoot"
Run {
& dotnet publish "$RepoRoot\src\Server\ZB.MOM.WW.OtOpcUa.Server" `
& dotnet publish "$RepoRoot\src\Server\ZB.MOM.WW.OtOpcUa.Host" `
-c Release -o (Join-Path $PublishRoot "lmxopcua") | Out-Null
& dotnet publish "$RepoRoot\src\Drivers\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
-c Release -o (Join-Path $PublishRoot "lmxopcua\WonderwareHistorian") | Out-Null
} "dotnet publish (Server + sidecar)"
} "dotnet publish (Host + sidecar)"
# ------------------------------------------------------------------------
# Step 5: Service env block — ensure OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED
@@ -143,16 +143,16 @@ if (Test-NssmService 'OtOpcUaWonderwareHistorian') {
# Step 6: Start in forward dependency order
# ------------------------------------------------------------------------
Step "Starting services (MxAccessGw OtOpcUaWonderwareHistorian OtOpcUa)"
Step "Starting services (MxAccessGw > OtOpcUaWonderwareHistorian > OtOpcUaHost)"
foreach ($pair in @(
@{ Name = 'MxAccessGw'; Wait = 4 },
@{ Name = 'OtOpcUaWonderwareHistorian'; Wait = 4 },
@{ Name = 'OtOpcUa'; Wait = 8 }
@{ Name = 'OtOpcUaHost'; Wait = 8 }
)) {
$name = $pair.Name
if (Test-NssmService $name) {
Run { nssm start $name } "start $name"
Run { Start-Service $name } "start $name"
if (-not $WhatIf) { Start-Sleep -Seconds $pair.Wait }
}
else {
@@ -167,7 +167,7 @@ foreach ($pair in @(
Step "Smoke verification"
if (-not $WhatIf) {
foreach ($name in @('MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUa')) {
foreach ($name in @('MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUaHost')) {
if (Test-NssmService $name) {
$status = (Get-Service $name).Status
$color = if ($status -eq 'Running') { 'Green' } else { 'Red' }
+7 -6
View File
@@ -3,16 +3,17 @@
Stops + removes the v2 services. Mirrors Install-Services.ps1.
.DESCRIPTION
PR 7.2 retired the legacy OtOpcUaGalaxyHost service. Galaxy access now flows
through the in-process GalaxyDriver against a separately-installed mxaccessgw.
OtOpcUaGalaxyHost is included in the cleanup loop below so this script safely
removes it from any rig still carrying the legacy service from a pre-7.2
install.
Removes the v2 OtOpcUaHost service plus the optional OtOpcUaWonderwareHistorian
sidecar. Also cleans up legacy service names from prior installs:
- OtOpcUa (v1 server) replaced by OtOpcUaHost in v2
- OtOpcUaAdmin (v1 admin) fused into OtOpcUaHost in v2
- OtOpcUaGalaxyHost (pre-7.2 Galaxy host) long-retired
#>
[CmdletBinding()] param()
$ErrorActionPreference = 'Continue'
foreach ($svc in 'OtOpcUa', 'OtOpcUaWonderwareHistorian', 'OtOpcUaGalaxyHost') {
foreach ($svc in 'OtOpcUaHost', 'OtOpcUaWonderwareHistorian',
'OtOpcUa', 'OtOpcUaAdmin', 'OtOpcUaGalaxyHost') {
if (Get-Service $svc -ErrorAction SilentlyContinue) {
Write-Host "Stopping $svc..."
Stop-Service $svc -Force -ErrorAction SilentlyContinue
+24
View File
@@ -0,0 +1,24 @@
# Dynamic (file-provider) Traefik config for the OtOpcUa admin HTTP routing.
# Picked up by traefik.yml's file provider (with watch: true) so router/service
# edits hot-reload without a Traefik restart.
http:
routers:
otopcua-admin:
entryPoints: ["web"]
rule: "HostRegexp(`otopcua.*`)"
service: otopcua-admin
services:
otopcua-admin:
loadBalancer:
servers:
- url: "http://admin-a:9000"
- url: "http://admin-b:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s
# Default expected status is 2xx. Followers return 503 from
# /health/active so Traefik will drop them from the balancer
# within the next interval after a leadership change.
+30
View File
@@ -0,0 +1,30 @@
# Traefik static configuration for the OtOpcUa fleet HTTP front door.
#
# Routes admin-role HTTP traffic (Blazor + auth + SignalR + /auth/*) to whichever
# OtOpcUa.Host node currently holds the admin role-leader. Uses the /health/active
# endpoint as the active-leader signal: a node returns 200 only when it is the
# Akka admin role-leader; followers return 503 and Traefik routes around them.
#
# OPC UA traffic is NOT routed through Traefik — clients connect directly to
# opc.tcp://node:4840 on every driver node and use the standard ServiceLevel
# heuristic for failover.
entryPoints:
web:
address: ":80"
providers:
file:
filename: /etc/traefik/dynamic.yml
watch: true
api:
insecure: true
dashboard: true
log:
level: INFO
format: common
accessLog:
format: common
+59
View File
@@ -0,0 +1,59 @@
<#
.SYNOPSIS
Idempotent migration runner that takes the OtOpcUaConfig database from the v1 schema
(with ConfigGeneration / ClusterNodeGenerationState) to the v2 hosting-aligned schema
(with Deployment / NodeDeploymentState / ConfigEdit / DataProtectionKeys).
.DESCRIPTION
Backs the database up, applies the idempotent EF migration script, then validates that
expected tables exist and legacy tables are gone. Safe to re-run the EF script itself
is idempotent, and the backup picks a unique filename per invocation.
.PARAMETER ConnectionString
Mandatory. Full ADO.NET connection string with permissions to BACKUP DATABASE and
apply DDL on the target ConfigDb.
.PARAMETER BackupPath
Optional. Full path for the backup file. Defaults to a timestamped path under $env:TEMP.
.EXAMPLE
.\Migrate-To-V2.ps1 -ConnectionString "Server=sql01;Database=OtOpcUaConfig;Trusted_Connection=True;TrustServerCertificate=True"
#>
[CmdletBinding()]
param(
[Parameter(Mandatory)][string] $ConnectionString,
[string] $BackupPath = "$env:TEMP\OtOpcUa-V1-Backup-$(Get-Date -Format yyyyMMddHHmmss).bak"
)
$ErrorActionPreference = 'Stop'
if (-not (Get-Command Invoke-Sqlcmd -ErrorAction SilentlyContinue)) {
throw "Invoke-Sqlcmd not available. Install module: Install-Module SqlServer -Scope CurrentUser"
}
Write-Host "Step 1/4 — Backup ConfigDb to $BackupPath" -ForegroundColor Cyan
Invoke-Sqlcmd -ConnectionString $ConnectionString `
-Query "BACKUP DATABASE [OtOpcUaConfig] TO DISK = '$BackupPath' WITH FORMAT, COMPRESSION"
Write-Host "Step 2/4 — Row counts (before)" -ForegroundColor Cyan
$beforeCounts = Invoke-Sqlcmd -ConnectionString $ConnectionString -InputFile "$PSScriptRoot\count-rows.sql"
$beforeCounts | Format-Table
Write-Host "Step 3/4 — Apply Migrate-To-V2.sql" -ForegroundColor Cyan
Invoke-Sqlcmd -ConnectionString $ConnectionString -InputFile "$PSScriptRoot\Migrate-To-V2.sql" -QueryTimeout 1800
Write-Host "Step 4/4 — Row counts (after) + validation" -ForegroundColor Cyan
$afterCounts = Invoke-Sqlcmd -ConnectionString $ConnectionString -InputFile "$PSScriptRoot\count-rows.sql"
$afterCounts | Format-Table
$tablesNow = (Invoke-Sqlcmd -ConnectionString $ConnectionString `
-Query "SELECT name FROM sys.tables ORDER BY name").name
foreach ($t in 'Deployment','NodeDeploymentState','ConfigEdit','DataProtectionKeys') {
if ($tablesNow -notcontains $t) { throw "Expected v2 table $t missing." }
}
foreach ($t in 'ConfigGeneration','ClusterNodeGenerationState') {
if ($tablesNow -contains $t) { throw "Legacy v1 table $t still present." }
}
Write-Host "Migration complete. Backup at $BackupPath" -ForegroundColor Green
File diff suppressed because it is too large Load Diff
+26
View File
@@ -0,0 +1,26 @@
-- Per-table row counts for pre/post-migration audit.
-- Covers every table relevant to the v1 -> v2 transition so the operator can confirm
-- live-edit data was preserved and v2 tables came up empty.
SELECT TableName = t.name, [Rows] = SUM(p.[rows])
FROM sys.tables t
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0,1)
WHERE t.name IN (
-- Live-edit configuration (rows must survive)
'ServerCluster','ClusterNode','ClusterNodeCredential',
'Namespace','UnsArea','UnsLine',
'DriverInstance','Device','Equipment','Tag','PollGroup','VirtualTag',
'NodeAcl','ExternalIdReservation',
'Script','ScriptedAlarm','ScriptedAlarmState',
'LdapGroupRoleMapping',
'EquipmentImportBatch','EquipmentImportRow',
-- Status tables (rebuilt at runtime; counts informational)
'DriverHostStatus','DriverInstanceResilienceStatus',
-- Audit (preserved)
'ConfigAuditLog',
-- v2 deploy model (empty pre-migration, populated post)
'Deployment','NodeDeploymentState','ConfigEdit','DataProtectionKeys'
)
GROUP BY t.name
ORDER BY t.name;
GO
@@ -109,8 +109,16 @@ public abstract class CommandBase : ICommand
/// <summary>
/// Configures Serilog based on the verbose flag.
/// </summary>
/// <remarks>
/// Disposes the previously assigned <see cref="Log.Logger" /> via <see cref="Log.CloseAndFlush" />
/// before installing the new one, so repeated CLI invocations (e.g. in the test suite) do not
/// leak the prior logger's console sink.
/// </remarks>
protected void ConfigureLogging()
{
// Dispose any previously installed logger before swapping in a new one.
Log.CloseAndFlush();
var config = new LoggerConfiguration();
if (Verbose)
config.MinimumLevel.Debug()
@@ -1,6 +1,8 @@
using System.Threading.Channels;
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
@@ -43,14 +45,25 @@ public class AlarmsCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
if (Interval <= 0)
throw new CommandException($"--interval must be greater than 0 (was {Interval}).");
NodeId? sourceNodeId;
try
{
sourceNodeId = NodeIdParser.Parse(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var sourceNodeId = NodeIdParser.Parse(NodeId);
// Channel serialises SDK notification-thread writes to the main async loop so
// that concurrent alarm callbacks never interleave on the shared TextWriter.
var outputChannel = Channel.CreateUnbounded<string>(
@@ -1,4 +1,5 @@
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
@@ -42,13 +43,25 @@ public class BrowseCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
if (Depth <= 0)
throw new CommandException($"--depth must be greater than 0 (was {Depth}).");
NodeId? startNode;
try
{
startNode = NodeIdParser.Parse(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var startNode = NodeIdParser.Parse(NodeId);
var maxDepth = Recursive ? Depth : 1;
await BrowseNodeAsync(service, console, startNode, maxDepth, 0, ct);
@@ -1,5 +1,6 @@
using System.Globalization;
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
@@ -62,22 +63,65 @@ public class HistoryReadCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
if (MaxValues <= 0)
throw new CommandException($"--max must be greater than 0 (was {MaxValues}).");
if (!string.IsNullOrEmpty(Aggregate) && IntervalMs <= 0)
throw new CommandException($"--interval must be greater than 0 (was {IntervalMs}).");
NodeId nodeId;
try
{
nodeId = NodeIdParser.ParseRequired(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
DateTime start, end;
try
{
start = string.IsNullOrEmpty(StartTime)
? DateTime.UtcNow.AddHours(-24)
: DateTime.Parse(StartTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
}
catch (FormatException ex)
{
throw new CommandException($"Invalid --start value '{StartTime}': {ex.Message}. Expected ISO 8601 UTC format, e.g. 2026-01-15T08:00:00Z.");
}
try
{
end = string.IsNullOrEmpty(EndTime)
? DateTime.UtcNow
: DateTime.Parse(EndTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
}
catch (FormatException ex)
{
throw new CommandException($"Invalid --end value '{EndTime}': {ex.Message}. Expected ISO 8601 UTC format, e.g. 2026-01-15T08:00:00Z.");
}
AggregateType aggregateType = default;
if (!string.IsNullOrEmpty(Aggregate))
{
try
{
aggregateType = ParseAggregateType(Aggregate);
}
catch (ArgumentException ex)
{
throw new CommandException($"Invalid --aggregate value: {ex.Message}");
}
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var nodeId = NodeIdParser.ParseRequired(NodeId);
var start = string.IsNullOrEmpty(StartTime)
? DateTime.UtcNow.AddHours(-24)
: DateTime.Parse(StartTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
var end = string.IsNullOrEmpty(EndTime)
? DateTime.UtcNow
: DateTime.Parse(EndTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
IReadOnlyList<DataValue> values;
if (string.IsNullOrEmpty(Aggregate))
@@ -88,7 +132,6 @@ public class HistoryReadCommand : CommandBase
}
else
{
var aggregateType = ParseAggregateType(Aggregate);
await console.Output.WriteLineAsync(
$"History for {NodeId} ({Aggregate}, interval={IntervalMs}ms)");
values = await service.HistoryReadAggregateAsync(
@@ -1,5 +1,7 @@
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
@@ -29,13 +31,23 @@ public class ReadCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
NodeId nodeId;
try
{
nodeId = NodeIdParser.ParseRequired(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var nodeId = NodeIdParser.ParseRequired(NodeId);
var value = await service.ReadValueAsync(nodeId, ct);
await console.Output.WriteLineAsync($"Node: {NodeId}");
@@ -1,6 +1,7 @@
using System.Collections.Concurrent;
using System.Threading.Channels;
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
@@ -12,42 +13,92 @@ namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
[Command("subscribe", Description = "Monitor a node for value changes")]
public class SubscribeCommand : CommandBase
{
/// <summary>
/// Creates the subscribe command used to monitor a node (or a subtree of nodes) for data-change
/// notifications.
/// </summary>
/// <param name="factory">The factory that creates the shared client service for the command run.</param>
public SubscribeCommand(IOpcUaClientServiceFactory factory) : base(factory)
{
}
/// <summary>
/// Gets the node ID to monitor. When <see cref="Recursive" /> is set, this node is the browse root
/// and every <c>Variable</c> child it reaches is subscribed.
/// </summary>
[CommandOption("node", 'n', Description = "Node ID to monitor", IsRequired = true)]
public string NodeId { get; init; } = default!;
/// <summary>
/// Gets the sampling interval, in milliseconds, requested for every monitored item.
/// </summary>
[CommandOption("interval", 'i', Description = "Sampling interval in milliseconds")]
public int Interval { get; init; } = 1000;
/// <summary>
/// Gets a value indicating whether the command should browse from <see cref="NodeId" />
/// and subscribe to every <c>Variable</c> in the subtree.
/// </summary>
[CommandOption("recursive", 'r', Description = "Browse recursively from --node and subscribe to every Variable found")]
public bool Recursive { get; init; }
/// <summary>
/// Gets the maximum recursion depth applied while collecting variables when <see cref="Recursive" /> is set.
/// </summary>
[CommandOption("max-depth", Description = "Maximum recursion depth when --recursive is set")]
public int MaxDepth { get; init; } = 10;
/// <summary>
/// Gets a value indicating whether per-update lines should be suppressed in favour of the final summary only.
/// </summary>
[CommandOption("quiet", 'q', Description = "Suppress per-update output; only print a final summary on Ctrl+C")]
public bool Quiet { get; init; }
/// <summary>
/// Gets the duration, in seconds, before the command auto-exits and prints its summary.
/// A value of <c>0</c> means the command runs until Ctrl+C.
/// </summary>
[CommandOption("duration", Description = "Auto-exit after N seconds and print summary (0 = run until Ctrl+C)")]
public int DurationSeconds { get; init; } = 0;
/// <summary>
/// Gets the optional path that the command should write the final summary to on exit, in addition to stdout.
/// </summary>
[CommandOption("summary-file", Description = "Write summary to this file path on exit (in addition to stdout)")]
public string? SummaryFile { get; init; }
/// <summary>
/// Connects to the server, subscribes to <see cref="NodeId" /> (or its subtree when recursive),
/// streams data-change notifications to the console, and prints a summary when the command exits.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
if (Interval <= 0)
throw new CommandException($"--interval must be greater than 0 (was {Interval}).");
if (Recursive && MaxDepth <= 0)
throw new CommandException($"--max-depth must be greater than 0 (was {MaxDepth}).");
if (DurationSeconds < 0)
throw new CommandException($"--duration must be 0 or a positive number (was {DurationSeconds}).");
NodeId rootNodeId;
try
{
rootNodeId = NodeIdParser.ParseRequired(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var rootNodeId = NodeIdParser.ParseRequired(NodeId);
var targets = new List<(NodeId nodeId, string displayPath)>();
if (Recursive)
{
@@ -1,4 +1,5 @@
using CliFx.Attributes;
using CliFx.Exceptions;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
@@ -37,14 +38,23 @@ public class WriteCommand : CommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
NodeId nodeId;
try
{
nodeId = NodeIdParser.ParseRequired(NodeId);
}
catch (Exception ex) when (ex is FormatException or ArgumentException)
{
throw new CommandException($"Invalid --node value: {ex.Message}");
}
IOpcUaClientService? service = null;
try
{
var ct = console.RegisterCancellationHandler();
(service, _) = await CreateServiceAndConnectAsync(ct);
var nodeId = NodeIdParser.ParseRequired(NodeId);
// Read current value to determine type for conversion
var currentValue = await service.ReadValueAsync(nodeId, ct);
var typedValue = ValueConverter.ConvertValue(Value, currentValue.Value);
@@ -9,9 +9,9 @@
</PropertyGroup>
<ItemGroup>
<PackageReference Include="CliFx" Version="2.3.6"/>
<PackageReference Include="Serilog" Version="4.2.0"/>
<PackageReference Include="Serilog.Sinks.Console" Version="6.0.0"/>
<PackageReference Include="CliFx"/>
<PackageReference Include="Serilog"/>
<PackageReference Include="Serilog.Sinks.Console"/>
</ItemGroup>
<ItemGroup>
@@ -14,7 +14,13 @@ internal sealed class DefaultApplicationConfigurationFactory : IApplicationConfi
public async Task<ApplicationConfiguration> CreateAsync(ConnectionSettings settings, CancellationToken ct)
{
var storePath = settings.CertificateStorePath;
// Resolve the canonical PKI path lazily on first use so constructing a
// ConnectionSettings instance — including the throwaway copies the client
// service builds per failover attempt — does not touch the filesystem.
// Callers that supply an explicit path override the default.
var storePath = string.IsNullOrWhiteSpace(settings.CertificateStorePath)
? ClientStoragePaths.GetPkiPath()
: settings.CertificateStorePath;
var config = new ApplicationConfiguration
{
@@ -24,9 +24,47 @@ internal sealed class DefaultEndpointDiscovery : IEndpointDiscovery
using var client = DiscoveryClient.Create(new Uri(endpointUrl));
var allEndpoints = client.GetEndpoints(null);
return EndpointSelector.SelectBest(allEndpoints, endpointUrl, requestedMode);
}
}
/// <summary>
/// Pure best-endpoint selection logic, extracted from <see cref="DefaultEndpointDiscovery"/>
/// so it can be unit tested without standing up a real <see cref="DiscoveryClient"/>.
/// </summary>
internal static class EndpointSelector
{
private static readonly ILogger Logger = Log.ForContext(typeof(EndpointSelector));
/// <summary>
/// Picks the best endpoint from the discovery response that matches the requested
/// security mode, preferring <c>Basic256Sha256</c>, and rewrites the endpoint URL
/// host to match the user-supplied URL when the discovery response advertises a
/// different hostname.
/// </summary>
/// <param name="allEndpoints">Endpoints returned by the discovery query, in any order.</param>
/// <param name="endpointUrl">The endpoint URL the operator supplied; supplies the hostname rewrite target.</param>
/// <param name="requestedMode">The requested OPC UA message security mode.</param>
/// <exception cref="InvalidOperationException">
/// Thrown when no endpoint matches <paramref name="requestedMode"/>; the message lists the
/// security mode + policy combinations the server returned so operators can diagnose mismatches.
/// </exception>
public static EndpointDescription SelectBest(
IEnumerable<EndpointDescription> allEndpoints,
string endpointUrl,
MessageSecurityMode requestedMode)
{
ArgumentNullException.ThrowIfNull(allEndpoints);
if (string.IsNullOrWhiteSpace(endpointUrl))
throw new ArgumentException("Endpoint URL must not be null or empty.", nameof(endpointUrl));
// Materialise once so we can both iterate and produce a diagnostic message
// without re-running the underlying discovery enumeration.
var endpoints = allEndpoints.ToList();
EndpointDescription? best = null;
foreach (var ep in allEndpoints)
foreach (var ep in endpoints)
{
if (ep.SecurityMode != requestedMode)
continue;
@@ -37,18 +75,21 @@ internal sealed class DefaultEndpointDiscovery : IEndpointDiscovery
continue;
}
// Prefer Basic256Sha256 when multiple endpoints match the requested mode.
if (ep.SecurityPolicyUri == SecurityPolicies.Basic256Sha256)
best = ep;
}
if (best == null)
{
var available = string.Join(", ", allEndpoints.Select(e => $"{e.SecurityMode}/{e.SecurityPolicyUri}"));
var available = string.Join(", ", endpoints.Select(e => $"{e.SecurityMode}/{e.SecurityPolicyUri}"));
throw new InvalidOperationException(
$"No endpoint found with security mode '{requestedMode}'. Available endpoints: {available}");
}
// Rewrite endpoint URL hostname to match user-supplied hostname
// Rewrite endpoint URL hostname to match user-supplied hostname. Necessary
// when the OPC UA server returns a discovery URL using a different hostname
// (e.g. internal DNS name) than the one the operator routed to.
var serverUri = new Uri(best.EndpointUrl);
var requestedUri = new Uri(endpointUrl);
if (serverUri.Host != requestedUri.Host)
@@ -73,6 +73,17 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
var writeCollection = new WriteValueCollection { writeValue };
var response = await _session.WriteAsync(null, writeCollection, ct);
// A malformed or service-level-faulted response can come back with an empty
// Results collection alongside a service fault. Surface the service result
// (or BadUnexpectedError) rather than letting Results[0] throw
// IndexOutOfRangeException upstream.
if (response.Results == null || response.Results.Count == 0)
{
var serviceResult = response.ResponseHeader?.ServiceResult.Code ?? StatusCodes.BadUnexpectedError;
throw new ServiceResultException(serviceResult,
$"Write response contained no results for node {nodeId}.");
}
return response.Results[0];
}
@@ -143,15 +154,18 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
if (continuationPoint != null)
nodesToRead[0].ContinuationPoint = continuationPoint;
_session.HistoryRead(
// Use the async overload so this method is genuinely asynchronous,
// honors the cancellation token, and does not block the caller's thread
// (which would block the UI dispatcher for client.ui consumers).
var response = await _session.HistoryReadAsync(
null,
new ExtensionObject(details),
TimestampsToReturn.Source,
continuationPoint != null,
nodesToRead,
out var results,
out _);
ct).ConfigureAwait(false);
var results = response.Results;
if (results == null || results.Count == 0)
break;
@@ -186,15 +200,17 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
new HistoryReadValueId { NodeId = nodeId }
};
_session.HistoryRead(
// Use the async overload so the method honors the cancellation token and
// does not block on a synchronous service round-trip.
var response = await _session.HistoryReadAsync(
null,
new ExtensionObject(details),
TimestampsToReturn.Source,
false,
nodesToRead,
out var results,
out _);
ct).ConfigureAwait(false);
var results = response.Results;
var allValues = new List<DataValue>();
if (results != null && results.Count > 0)
@@ -229,7 +245,9 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
{
try
{
if (_session.Connected) _session.Close();
// Use the async overload so the caller does not block on the close
// service round-trip and the cancellation token is honored.
if (_session.Connected) await _session.CloseAsync(ct).ConfigureAwait(false);
}
catch (Exception ex)
{
@@ -270,6 +288,15 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter
},
ct);
// An empty Results collection paired with a service fault must surface as
// a ServiceResultException, not an IndexOutOfRangeException from Results[0].
if (result.Results == null || result.Results.Count == 0)
{
var serviceResult = result.ResponseHeader?.ServiceResult.Code ?? StatusCodes.BadUnexpectedError;
throw new ServiceResultException(serviceResult,
$"Call response contained no results for method {methodId} on {objectId}.");
}
var callResult = result.Results[0];
if (StatusCode.IsBad(callResult.StatusCode))
throw new ServiceResultException(callResult.StatusCode);
@@ -96,6 +96,11 @@ public interface IOpcUaClientService : IDisposable
/// <param name="eventId">The event identifier returned by the OPC UA server for the alarm event.</param>
/// <param name="comment">The operator acknowledgment comment to write with the method call.</param>
/// <param name="ct">The cancellation token that aborts the acknowledgment request.</param>
/// <returns>
/// <see cref="StatusCodes.Good"/> on success, or the server's bad <see cref="StatusCode"/>
/// (from the underlying <see cref="ServiceResultException"/>) when the acknowledge call
/// returns a bad result. Other transport-level failures still surface as exceptions.
/// </returns>
Task<StatusCode> AcknowledgeAlarmAsync(string conditionNodeId, byte[] eventId, string comment, CancellationToken ct = default);
/// <summary>
@@ -41,11 +41,13 @@ public sealed class ConnectionSettings
public bool AutoAcceptCertificates { get; set; } = true;
/// <summary>
/// Path to the certificate store. Defaults to a subdirectory under LocalApplicationData
/// resolved via <see cref="ClientStoragePaths"/> so the one-shot legacy-folder migration
/// runs before the path is returned.
/// Path to the certificate store. Defaults to <see cref="string.Empty"/>; the
/// consuming application configuration factory resolves the canonical path via
/// <see cref="ClientStoragePaths.GetPkiPath"/> lazily on first connect, so
/// constructing settings — including the throwaway copies built per failover
/// attempt — does not touch disk or run the legacy-folder migration probe.
/// </summary>
public string CertificateStorePath { get; set; } = ClientStoragePaths.GetPkiPath();
public string CertificateStorePath { get; set; } = string.Empty;
/// <summary>
/// Validates the settings and throws if any required values are missing or invalid.
@@ -353,11 +353,24 @@ public sealed class OpcUaClientService : IOpcUaClientService
: NodeId.Parse(conditionNodeId + ".Condition");
var acknowledgeMethodId = MethodIds.AcknowledgeableConditionType_Acknowledge;
await _session!.CallMethodAsync(
conditionObjId,
acknowledgeMethodId,
[eventId, new LocalizedText(comment)],
ct);
// CallMethodAsync throws ServiceResultException on a bad call result;
// surface that as the returned StatusCode so callers using the documented
// `Task<StatusCode>` contract (e.g. `if (StatusCode.IsBad(result))`) see
// the failure instead of an uncaught exception they did not anticipate.
try
{
await _session!.CallMethodAsync(
conditionObjId,
acknowledgeMethodId,
[eventId, new LocalizedText(comment)],
ct);
}
catch (ServiceResultException ex)
{
Logger.Warning(ex, "Failed to acknowledge alarm on {ConditionId} (status {Status})",
conditionNodeId, ex.StatusCode);
return ex.StatusCode;
}
Logger.Debug("Acknowledged alarm on {ConditionId}", conditionNodeId);
return StatusCodes.Good;
@@ -8,8 +8,8 @@
</PropertyGroup>
<ItemGroup>
<PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Client" Version="1.5.378.106"/>
<PackageReference Include="Serilog" Version="4.2.0"/>
<PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Client"/>
<PackageReference Include="Serilog"/>
</ItemGroup>
<ItemGroup>
@@ -30,12 +30,6 @@ public partial class DateTimeRangePicker : UserControl
public static readonly StyledProperty<string> EndTextProperty =
AvaloniaProperty.Register<DateTimeRangePicker, string>(nameof(EndText), defaultValue: "");
public static readonly StyledProperty<DateTimeOffset?> MinDateTimeProperty =
AvaloniaProperty.Register<DateTimeRangePicker, DateTimeOffset?>(nameof(MinDateTime));
public static readonly StyledProperty<DateTimeOffset?> MaxDateTimeProperty =
AvaloniaProperty.Register<DateTimeRangePicker, DateTimeOffset?>(nameof(MaxDateTime));
private bool _isUpdating;
public DateTimeRangePicker()
@@ -67,18 +61,6 @@ public partial class DateTimeRangePicker : UserControl
set => SetValue(EndTextProperty, value);
}
public DateTimeOffset? MinDateTime
{
get => GetValue(MinDateTimeProperty);
set => SetValue(MinDateTimeProperty, value);
}
public DateTimeOffset? MaxDateTime
{
get => GetValue(MaxDateTimeProperty);
set => SetValue(MaxDateTimeProperty, value);
}
protected override void OnLoaded(RoutedEventArgs e)
{
base.OnLoaded(e);
@@ -1,4 +1,6 @@
using Avalonia;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
namespace ZB.MOM.WW.OtOpcUa.Client.UI;
@@ -7,8 +9,16 @@ public class Program
[STAThread]
public static void Main(string[] args)
{
BuildAvaloniaApp()
.StartWithClassicDesktopLifetime(args);
ConfigureLogging();
try
{
BuildAvaloniaApp()
.StartWithClassicDesktopLifetime(args);
}
finally
{
Log.CloseAndFlush();
}
}
public static AppBuilder BuildAvaloniaApp()
@@ -18,4 +28,35 @@ public class Program
.WithInterFont()
.LogToTrace();
}
/// <summary>
/// Initializes the Serilog root logger with a console sink + a rolling daily file sink
/// under <c>{LocalAppData}/OtOpcUaClient/logs/</c>. CLAUDE.md mandates Serilog with a
/// rolling daily file sink as the project standard; this is also the only way the swallow
/// blocks in the alarms / subscriptions / redundancy view-models surface a diagnosable
/// trace when an operator hits a problem in the field.
/// </summary>
private static void ConfigureLogging()
{
var logsDir = Path.Combine(ClientStoragePaths.GetRoot(), "logs");
try
{
Directory.CreateDirectory(logsDir);
}
catch
{
// Best-effort; file sink will gracefully fall back if the dir can't be created.
}
Log.Logger = new LoggerConfiguration()
.MinimumLevel.Information()
.Enrich.FromLogContext()
.WriteTo.Console()
.WriteTo.File(
path: Path.Combine(logsDir, "client-ui-.log"),
rollingInterval: RollingInterval.Day,
retainedFileCountLimit: 14,
shared: true)
.CreateLogger();
}
}
@@ -2,6 +2,7 @@ using System.Collections.ObjectModel;
using CommunityToolkit.Mvvm.ComponentModel;
using CommunityToolkit.Mvvm.Input;
using Opc.Ua;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
using ZB.MOM.WW.OtOpcUa.Client.UI.Services;
@@ -13,9 +14,18 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
/// </summary>
public partial class AlarmsViewModel : ObservableObject
{
private static readonly ILogger Logger = Log.ForContext<AlarmsViewModel>();
private readonly IUiDispatcher _dispatcher;
private readonly IOpcUaClientService _service;
/// <summary>
/// Last user-visible status message — set when an alarm subscribe / unsubscribe / refresh
/// operation fails so the shell can surface the diagnostic instead of silently dropping it.
/// Genuine failures are distinguished from "feature not supported" (condition refresh).
/// </summary>
[ObservableProperty] private string? _statusMessage;
[ObservableProperty] private int _interval = 1000;
[ObservableProperty]
@@ -95,19 +105,25 @@ public partial class AlarmsViewModel : ObservableObject
await _service.SubscribeAlarmsAsync(sourceNodeId, Interval);
IsSubscribed = true;
StatusMessage = null;
try
{
await _service.RequestConditionRefreshAsync();
}
catch
catch (Exception refreshEx)
{
// Refresh not supported
// Condition refresh is optional on the server side — log at info level and surface
// a soft notice rather than a hard failure so the operator can tell apart "server
// does not advertise refresh" from a genuine subscribe failure.
Logger.Information(refreshEx, "RequestConditionRefresh not supported by server");
StatusMessage = "Condition refresh not supported by server (subscribed).";
}
}
catch
catch (Exception ex)
{
// Subscribe failed
Logger.Warning(ex, "SubscribeAlarms failed for {Source}", MonitoredNodeIdText ?? "(all)");
StatusMessage = $"Subscribe to alarms failed: {ex.Message}";
}
}
@@ -123,10 +139,12 @@ public partial class AlarmsViewModel : ObservableObject
{
await _service.UnsubscribeAlarmsAsync();
IsSubscribed = false;
StatusMessage = null;
}
catch
catch (Exception ex)
{
// Unsubscribe failed
Logger.Warning(ex, "UnsubscribeAlarms failed");
StatusMessage = $"Unsubscribe alarms failed: {ex.Message}";
}
}
@@ -136,10 +154,14 @@ public partial class AlarmsViewModel : ObservableObject
try
{
await _service.RequestConditionRefreshAsync();
StatusMessage = null;
}
catch
catch (Exception ex)
{
// Refresh failed
// Same as the subscribe-time fallback: refresh is server-side optional. Information-
// level log + soft status so the operator sees why an explicit refresh did nothing.
Logger.Information(ex, "RequestConditionRefresh not supported by server");
StatusMessage = "Condition refresh not supported by server.";
}
}
@@ -189,19 +211,22 @@ public partial class AlarmsViewModel : ObservableObject
await _service.SubscribeAlarmsAsync(nodeId, Interval);
IsSubscribed = true;
StatusMessage = null;
try
{
await _service.RequestConditionRefreshAsync();
}
catch
catch (Exception refreshEx)
{
// Refresh not supported
Logger.Information(refreshEx, "RequestConditionRefresh not supported by server (restore path)");
StatusMessage = "Condition refresh not supported by server (restored subscription).";
}
}
catch
catch (Exception ex)
{
// Subscribe failed
Logger.Warning(ex, "RestoreAlarmSubscription failed for {Source}", sourceNodeId);
StatusMessage = $"Restore alarm subscription failed: {ex.Message}";
}
}
@@ -1,6 +1,7 @@
using System.Collections.ObjectModel;
using CommunityToolkit.Mvvm.ComponentModel;
using CommunityToolkit.Mvvm.Input;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
using ZB.MOM.WW.OtOpcUa.Client.UI.Services;
@@ -12,6 +13,8 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
/// </summary>
public partial class MainWindowViewModel : ObservableObject, IDisposable
{
private static readonly ILogger Logger = Log.ForContext<MainWindowViewModel>();
private readonly IUiDispatcher _dispatcher;
private readonly IOpcUaClientServiceFactory _factory;
private readonly ISettingsService _settingsService;
@@ -137,6 +140,15 @@ public partial class MainWindowViewModel : ObservableObject, IDisposable
{
if (args.PropertyName == nameof(AlarmsViewModel.ActiveAlarmCount))
_dispatcher.Post(() => ActiveAlarmCount = Alarms.ActiveAlarmCount);
else if (args.PropertyName == nameof(AlarmsViewModel.StatusMessage)
&& !string.IsNullOrEmpty(Alarms.StatusMessage))
_dispatcher.Post(() => StatusMessage = Alarms.StatusMessage!);
};
Subscriptions.PropertyChanged += (_, args) =>
{
if (args.PropertyName == nameof(SubscriptionsViewModel.StatusMessage)
&& !string.IsNullOrEmpty(Subscriptions.StatusMessage))
_dispatcher.Post(() => StatusMessage = Subscriptions.StatusMessage!);
};
History = new HistoryViewModel(_service, _dispatcher);
@@ -244,15 +256,17 @@ public partial class MainWindowViewModel : ObservableObject, IDisposable
SessionLabel = $"{info.ServerName} | Session: {info.SessionName} ({info.SessionId})";
});
// Load redundancy info
// Load redundancy info — the server may not implement the redundancy facet, in which
// case we leave RedundancyInfo null but log so a field diagnosis can tell the difference
// between "facet not advertised" and "facet errored". The connection itself stays up.
try
{
var redundancy = await _service!.GetRedundancyInfoAsync();
_dispatcher.Post(() => RedundancyInfo = redundancy);
}
catch
catch (Exception redundancyEx)
{
// Redundancy info not available
Logger.Information(redundancyEx, "GetRedundancyInfo unavailable on this server");
}
// Load root nodes
@@ -2,6 +2,7 @@ using System.Collections.ObjectModel;
using CommunityToolkit.Mvvm.ComponentModel;
using CommunityToolkit.Mvvm.Input;
using Opc.Ua;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
using ZB.MOM.WW.OtOpcUa.Client.UI.Services;
@@ -13,9 +14,17 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
/// </summary>
public partial class SubscriptionsViewModel : ObservableObject
{
private static readonly ILogger Logger = Log.ForContext<SubscriptionsViewModel>();
private readonly IUiDispatcher _dispatcher;
private readonly IOpcUaClientService _service;
/// <summary>
/// Last user-visible status message — set when a subscribe/unsubscribe operation fails so the
/// shell can surface the diagnostic instead of silently dropping the error. Cleared on success.
/// </summary>
[ObservableProperty] private string? _statusMessage;
[ObservableProperty]
[NotifyCanExecuteChangedFor(nameof(AddSubscriptionCommand))]
[NotifyCanExecuteChangedFor(nameof(RemoveSubscriptionCommand))]
@@ -85,11 +94,13 @@ public partial class SubscriptionsViewModel : ObservableObject
{
ActiveSubscriptions.Add(new SubscriptionItemViewModel(nodeIdStr, interval));
SubscriptionCount = ActiveSubscriptions.Count;
StatusMessage = null;
});
}
catch
catch (Exception ex)
{
// Subscription failed; no item added
Logger.Warning(ex, "AddSubscription failed for {NodeId}", nodeIdStr);
_dispatcher.Post(() => StatusMessage = $"Subscribe failed for {nodeIdStr}: {ex.Message}");
}
}
@@ -116,9 +127,11 @@ public partial class SubscriptionsViewModel : ObservableObject
_dispatcher.Post(() => ActiveSubscriptions.Remove(item));
}
catch
catch (Exception ex)
{
// Unsubscribe failed for this item; continue with others
Logger.Warning(ex, "Unsubscribe failed for {NodeId}", item.NodeId);
_dispatcher.Post(() => StatusMessage = $"Unsubscribe failed for {item.NodeId}: {ex.Message}");
// Continue with the other items in the batch.
}
}
@@ -146,11 +159,13 @@ public partial class SubscriptionsViewModel : ObservableObject
{
ActiveSubscriptions.Add(new SubscriptionItemViewModel(nodeIdStr, intervalMs));
SubscriptionCount = ActiveSubscriptions.Count;
StatusMessage = null;
});
}
catch
catch (Exception ex)
{
// Subscription failed
Logger.Warning(ex, "AddSubscriptionForNode failed for {NodeId}", nodeIdStr);
_dispatcher.Post(() => StatusMessage = $"Subscribe failed for {nodeIdStr}: {ex.Message}");
}
}
@@ -186,9 +201,10 @@ public partial class SubscriptionsViewModel : ObservableObject
foreach (var child in children)
await AddSubscriptionRecursiveAsync(child.NodeId, child.NodeClass, intervalMs, maxDepth, currentDepth + 1);
}
catch
catch (Exception ex)
{
// Browse failed for this node; skip it
Logger.Warning(ex, "Recursive browse failed for {NodeId}; skipping subtree", nodeIdStr);
_dispatcher.Post(() => StatusMessage = $"Browse failed for {nodeIdStr}: {ex.Message}");
}
}
@@ -78,7 +78,7 @@
<TextBox Text="{Binding CertificateStorePath}"
Width="370"
IsReadOnly="True"
Watermark="(default: AppData/LmxOpcUaClient/pki)" />
Watermark="(default: AppData/OtOpcUaClient/pki)" />
<Button Name="BrowseCertPathButton"
Content="..."
Width="30"
@@ -2,6 +2,7 @@ using System.ComponentModel;
using System.Reflection;
using Avalonia.Controls;
using Avalonia.Interactivity;
using Avalonia.Platform.Storage;
using SkiaSharp;
using Svg.Skia;
using ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
@@ -126,15 +127,34 @@ public partial class MainWindow : Window
{
if (DataContext is not MainWindowViewModel vm) return;
var dialog = new OpenFolderDialog
var topLevel = TopLevel.GetTopLevel(this);
if (topLevel == null) return;
IStorageFolder? startLocation = null;
if (!string.IsNullOrEmpty(vm.CertificateStorePath))
{
try
{
startLocation = await topLevel.StorageProvider.TryGetFolderFromPathAsync(vm.CertificateStorePath);
}
catch
{
// Best-effort: if the existing path can't be resolved (missing/permission), open the dialog without it.
}
}
var folders = await topLevel.StorageProvider.OpenFolderPickerAsync(new FolderPickerOpenOptions
{
Title = "Select Certificate Store Folder",
Directory = vm.CertificateStorePath
};
AllowMultiple = false,
SuggestedStartLocation = startLocation
});
var result = await dialog.ShowAsync(this);
if (!string.IsNullOrEmpty(result))
vm.CertificateStorePath = result;
if (folders.Count == 0) return;
var picked = folders[0].TryGetLocalPath();
if (!string.IsNullOrEmpty(picked))
vm.CertificateStorePath = picked;
}
protected override void OnClosing(WindowClosingEventArgs e)
@@ -9,16 +9,17 @@
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Avalonia" Version="11.2.7"/>
<PackageReference Include="Avalonia.Desktop" Version="11.2.7"/>
<PackageReference Include="Avalonia.Svg.Skia" Version="11.2.0.2"/>
<PackageReference Include="Avalonia.Themes.Fluent" Version="11.2.7"/>
<PackageReference Include="Avalonia.Fonts.Inter" Version="11.2.7"/>
<PackageReference Include="Avalonia.Controls.DataGrid" Version="11.2.7"/>
<PackageReference Include="Avalonia.Diagnostics" Version="11.2.7"/>
<PackageReference Include="CommunityToolkit.Mvvm" Version="8.4.0"/>
<PackageReference Include="Serilog" Version="4.2.0"/>
<PackageReference Include="Serilog.Sinks.Console" Version="6.0.0"/>
<PackageReference Include="Avalonia"/>
<PackageReference Include="Avalonia.Desktop"/>
<PackageReference Include="Avalonia.Svg.Skia"/>
<PackageReference Include="Avalonia.Themes.Fluent"/>
<PackageReference Include="Avalonia.Fonts.Inter"/>
<PackageReference Include="Avalonia.Controls.DataGrid"/>
<PackageReference Include="Avalonia.Diagnostics"/>
<PackageReference Include="CommunityToolkit.Mvvm"/>
<PackageReference Include="Serilog"/>
<PackageReference Include="Serilog.Sinks.Console"/>
<PackageReference Include="Serilog.Sinks.File"/>
</ItemGroup>
<ItemGroup>
@@ -34,4 +35,11 @@
<EmbeddedResource Include="Assets\app-icon.svg" />
</ItemGroup>
<ItemGroup>
<!-- Tmds.DBus.Protocol 0.20.0 reaches this project transitively from Avalonia.Desktop on
Linux/macOS only. We do not ship Linux/macOS builds of the Client.UI to end users;
this advisory affects dev-tooling code paths only. -->
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-xrw6-gwf8-vvr9"/>
</ItemGroup>
</Project>
@@ -0,0 +1,26 @@
namespace ZB.MOM.WW.OtOpcUa.Cluster;
public sealed class AkkaClusterOptions
{
public const string SectionName = "Cluster";
public string SystemName { get; set; } = "otopcua";
public string Hostname { get; set; } = "0.0.0.0";
public int Port { get; set; } = 4053;
/// <summary>
/// Hostname advertised in cluster gossip. Must be reachable by other nodes.
/// In docker-compose this is the container DNS name; in bare metal it's the
/// host's stable LAN address.
/// </summary>
public string PublicHostname { get; set; } = "127.0.0.1";
public string[] SeedNodes { get; set; } = Array.Empty<string>();
/// <summary>
/// Cluster roles for this node. When empty the role list comes from
/// <c>OTOPCUA_ROLES</c> via <see cref="RoleParser"/>. Allowed values:
/// <c>admin</c>, <c>driver</c>, <c>dev</c>.
/// </summary>
public string[] Roles { get; set; } = Array.Empty<string>();
}
@@ -0,0 +1,188 @@
using Akka.Actor;
using Akka.Cluster;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.OtOpcUa.Commons.Interfaces;
using CommonsNodeId = ZB.MOM.WW.OtOpcUa.Commons.Types.NodeId;
namespace ZB.MOM.WW.OtOpcUa.Cluster;
/// <summary>
/// Thread-safe live view of cluster membership and role topology. Subscribes to
/// <see cref="ClusterEvent.IMemberEvent"/>, <see cref="ClusterEvent.RoleLeaderChanged"/>, and
/// <see cref="ClusterEvent.LeaderChanged"/> through an internal subscriber actor and keeps
/// a snapshot of role-to-members + role-to-leader. The CLR-facing event surface is
/// <see cref="IClusterRoleInfo.RoleLeaderChanged"/>.
/// </summary>
public sealed class ClusterRoleInfo : IClusterRoleInfo, IDisposable
{
private readonly Akka.Cluster.Cluster _cluster;
private readonly ILogger<ClusterRoleInfo> _logger;
private readonly CommonsNodeId _localNode;
private readonly HashSet<string> _localRoles;
private readonly object _lock = new();
private readonly Dictionary<string, Member?> _roleLeaders = new(StringComparer.Ordinal);
private readonly Dictionary<string, HashSet<Member>> _membersByRole = new(StringComparer.Ordinal);
private IActorRef? _subscriber;
public ClusterRoleInfo(ActorSystem system, IOptions<AkkaClusterOptions> options, ILogger<ClusterRoleInfo> logger)
{
_cluster = Akka.Cluster.Cluster.Get(system);
_logger = logger;
// NodeId encodes host:port so cluster members on shared hosts (test loopback, dev VMs
// sharing a bind IP) stay distinct. Production hosts have unique DNS names so the port
// suffix is harmless redundancy.
_localNode = CommonsNodeId.Parse($"{options.Value.PublicHostname}:{options.Value.Port}");
_localRoles = new HashSet<string>(options.Value.Roles, StringComparer.Ordinal);
SeedFromCurrentState();
_subscriber = system.ActorOf(Props.Create(() => new SubscriberActor(this)), "clusterroleinfo-subscriber");
}
public CommonsNodeId LocalNode => _localNode;
public IReadOnlySet<string> LocalRoles => _localRoles;
public bool HasRole(string role) => _localRoles.Contains(role);
public IReadOnlyList<CommonsNodeId> MembersWithRole(string role)
{
lock (_lock)
{
if (!_membersByRole.TryGetValue(role, out var members)) return Array.Empty<CommonsNodeId>();
return members
.Select(m => ToNodeId(m.Address))
.ToArray();
}
}
public CommonsNodeId? RoleLeader(string role)
{
lock (_lock)
{
return _roleLeaders.TryGetValue(role, out var leader) && leader is not null
? ToNodeId(leader.Address)
: (CommonsNodeId?)null;
}
}
public event EventHandler<RoleLeaderChangedEventArgs>? RoleLeaderChanged;
private void SeedFromCurrentState()
{
var snapshot = _cluster.State;
lock (_lock)
{
foreach (var member in snapshot.Members)
foreach (var role in member.Roles)
{
if (!_membersByRole.TryGetValue(role, out var set))
_membersByRole[role] = set = new HashSet<Member>();
set.Add(member);
}
foreach (var role in snapshot.Members.SelectMany(m => m.Roles).Distinct())
{
var leaderAddr = _cluster.State.RoleLeader(role);
_roleLeaders[role] = leaderAddr is not null
? snapshot.Members.FirstOrDefault(m => m.Address == leaderAddr)
: null;
}
}
}
internal void HandleMemberEvent(ClusterEvent.IMemberEvent evt)
{
lock (_lock)
{
switch (evt)
{
case ClusterEvent.MemberUp up:
foreach (var role in up.Member.Roles)
{
if (!_membersByRole.TryGetValue(role, out var set))
_membersByRole[role] = set = new HashSet<Member>();
set.Add(up.Member);
}
break;
case ClusterEvent.MemberRemoved removed:
foreach (var role in removed.Member.Roles)
if (_membersByRole.TryGetValue(role, out var set))
set.Remove(removed.Member);
break;
}
}
}
internal void HandleRoleLeaderChanged(ClusterEvent.RoleLeaderChanged evt)
{
CommonsNodeId? previous = null;
CommonsNodeId? next = null;
var raise = false;
lock (_lock)
{
_roleLeaders.TryGetValue(evt.Role, out var prevMember);
if (prevMember is not null)
previous = ToNodeId(prevMember.Address);
var nextMember = evt.Leader is null
? null
: _cluster.State.Members.FirstOrDefault(m => m.Address == evt.Leader);
_roleLeaders[evt.Role] = nextMember;
if (nextMember is not null)
next = ToNodeId(nextMember.Address);
raise = !Nullable.Equals(previous, next);
}
if (!raise) return;
try
{
RoleLeaderChanged?.Invoke(this, new RoleLeaderChangedEventArgs
{
Role = evt.Role,
PreviousLeader = previous,
NewLeader = next,
});
}
catch (Exception ex)
{
_logger.LogWarning(ex, "RoleLeaderChanged subscriber threw for role {Role}", evt.Role);
}
}
private static CommonsNodeId ToNodeId(Akka.Actor.Address address) =>
CommonsNodeId.Parse($"{address.Host ?? string.Empty}:{address.Port ?? 0}");
public void Dispose()
{
_subscriber?.Tell(PoisonPill.Instance);
_subscriber = null;
}
private sealed class SubscriberActor : ReceiveActor
{
public SubscriberActor(ClusterRoleInfo owner)
{
Receive<ClusterEvent.IMemberEvent>(e => owner.HandleMemberEvent(e));
Receive<ClusterEvent.RoleLeaderChanged>(e => owner.HandleRoleLeaderChanged(e));
Receive<ClusterEvent.LeaderChanged>(_ => { /* no-op for now; reserved for ServiceLevel calc */ });
Receive<ClusterEvent.CurrentClusterState>(_ => { /* seeded from initial snapshot */ });
}
protected override void PreStart()
{
Akka.Cluster.Cluster.Get(Context.System).Subscribe(
Self,
ClusterEvent.InitialStateAsEvents,
typeof(ClusterEvent.IMemberEvent),
typeof(ClusterEvent.LeaderChanged),
typeof(ClusterEvent.RoleLeaderChanged));
}
protected override void PostStop() =>
Akka.Cluster.Cluster.Get(Context.System).Unsubscribe(Self);
}
}
@@ -0,0 +1,15 @@
namespace ZB.MOM.WW.OtOpcUa.Cluster;
public static class HoconLoader
{
private const string ResourceName = "ZB.MOM.WW.OtOpcUa.Cluster.Resources.akka.conf";
public static string LoadBaseConfig()
{
using var stream = typeof(HoconLoader).Assembly.GetManifestResourceStream(ResourceName)
?? throw new InvalidOperationException(
$"Embedded resource '{ResourceName}' not found. Verify EmbeddedResource glob in csproj.");
using var reader = new StreamReader(stream);
return reader.ReadToEnd();
}
}
@@ -0,0 +1,73 @@
# Base Akka.NET cluster configuration for OtOpcUa fused-host nodes.
#
# Roles, seed nodes, public hostname/port, and the actor system name are overlaid
# at runtime by AkkaHostedService — see ZB.MOM.WW.OtOpcUa.Cluster/AkkaHostedService.cs.
# Everything else here is the cluster-wide tuning that should match across nodes.
#
# Tuning sourced from ScadaLink (ScadaLink.Host/Actors/AkkaHostedService.BuildHocon);
# any divergence must be deliberate and recorded in docs/v2/Architecture.md.
akka {
extensions = [
"Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubExtensionProvider, Akka.Cluster.Tools"
]
actor {
provider = cluster
}
remote {
dot-netty.tcp {
hostname = "0.0.0.0"
port = 4053
}
transport-failure-detector {
heartbeat-interval = 2s
acceptable-heartbeat-pause = 10s
}
}
cluster {
seed-nodes = []
roles = []
min-nr-of-members = 1
split-brain-resolver {
active-strategy = "keep-oldest"
stable-after = 15s
keep-oldest {
down-if-alone = on
}
}
failure-detector {
heartbeat-interval = 2s
threshold = 10.0
acceptable-heartbeat-pause = 10s
}
down-removal-margin = 15s
run-coordinated-shutdown-when-down = on
singleton {
singleton-name = "singleton"
}
singleton-proxy {
singleton-identification-interval = 1s
}
}
coordinated-shutdown {
run-by-clr-shutdown-hook = on
default-phase-timeout = 30s
}
}
# Pinned dispatcher used by OpcUaPublishActor (Task 44) so the OPC UA SDK sees
# only one thread per actor instance — its session/subscription locks expect
# strict single-threaded access.
opcua-synchronized-dispatcher {
type = "PinnedDispatcher"
executor = "thread-pool-executor"
throughput = 1
}
@@ -0,0 +1,29 @@
namespace ZB.MOM.WW.OtOpcUa.Cluster;
public static class RoleParser
{
private static readonly HashSet<string> Allowed = new(StringComparer.Ordinal)
{
"admin", "driver", "dev",
};
public static string[] Parse(string? raw)
{
if (string.IsNullOrWhiteSpace(raw)) return Array.Empty<string>();
var roles = raw
.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
.Select(r => r.ToLowerInvariant())
.Distinct()
.ToArray();
foreach (var r in roles)
{
if (!Allowed.Contains(r))
throw new ArgumentException(
$"Unknown role '{r}'. Allowed: {string.Join(", ", Allowed)}.", nameof(raw));
}
return roles;
}
}
@@ -0,0 +1,67 @@
using Akka.Cluster.Hosting;
using Akka.Hosting;
using Akka.Remote.Hosting;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.OtOpcUa.Commons.Interfaces;
namespace ZB.MOM.WW.OtOpcUa.Cluster;
public static class ServiceCollectionExtensions
{
/// <summary>
/// Binds <see cref="AkkaClusterOptions"/> and registers <see cref="IClusterRoleInfo"/>. The
/// actual ActorSystem + cluster bootstrap is layered on inside the host's <c>AddAkka(...)</c>
/// configurator via <see cref="WithOtOpcUaClusterBootstrap"/> — keeping the entire Akka graph
/// under Akka.Hosting's management so cluster singletons land on the same ActorSystem.
/// </summary>
public static IServiceCollection AddOtOpcUaCluster(this IServiceCollection services, IConfiguration configuration)
{
services.AddOptions<AkkaClusterOptions>()
.Bind(configuration.GetSection(AkkaClusterOptions.SectionName));
services.AddSingleton<IClusterRoleInfo, ClusterRoleInfo>();
return services;
}
/// <summary>
/// Configures the Akka.Hosting builder with the embedded OtOpcUa HOCON (split-brain resolver,
/// pinned dispatcher, failure detector tuning) + remote endpoint + cluster bootstrap derived
/// from <see cref="AkkaClusterOptions"/>.
///
/// Wire from Program.cs:
/// <code>
/// services.AddAkka("otopcua", (ab, sp) =>
/// {
/// ab.WithOtOpcUaClusterBootstrap(sp);
/// if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
/// if (hasDriver) ab.WithOtOpcUaRuntimeActors();
/// });
/// </code>
/// </summary>
public static AkkaConfigurationBuilder WithOtOpcUaClusterBootstrap(
this AkkaConfigurationBuilder builder,
IServiceProvider serviceProvider)
{
var options = serviceProvider.GetRequiredService<IOptions<AkkaClusterOptions>>().Value;
builder.AddHocon(HoconLoader.LoadBaseConfig(), HoconAddMode.Append);
builder.WithRemoting(new RemoteOptions
{
HostName = options.Hostname,
Port = options.Port,
PublicHostName = options.PublicHostname,
});
builder.WithClustering(new ClusterOptions
{
SeedNodes = options.SeedNodes,
Roles = options.Roles,
});
return builder;
}
}
@@ -0,0 +1,32 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<RootNamespace>ZB.MOM.WW.OtOpcUa.Cluster</RootNamespace>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Akka.Hosting"/>
<PackageReference Include="Akka.Cluster"/>
<PackageReference Include="Akka.Cluster.Hosting"/>
<PackageReference Include="Akka.Cluster.Tools"/>
<PackageReference Include="Akka.Remote.Hosting"/>
<PackageReference Include="Microsoft.Extensions.Hosting"/>
<PackageReference Include="Microsoft.Extensions.Options.ConfigurationExtensions"/>
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Commons\ZB.MOM.WW.OtOpcUa.Commons.csproj"/>
</ItemGroup>
<ItemGroup>
<EmbeddedResource Include="Resources\akka.conf"/>
</ItemGroup>
<ItemGroup>
<!-- OpenTelemetry.Api 1.9.0 reaches this project transitively from Akka.Cluster.Hosting.
Bump arrives when Akka updates its OTel dependency; tracked separately. -->
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-g94r-2vxg-569j"/>
</ItemGroup>
</Project>
@@ -0,0 +1,41 @@
namespace ZB.MOM.WW.OtOpcUa.Commons.Engines;
/// <summary>
/// Persistence seam for <c>ScriptedAlarmActor</c>'s in-memory state across actor restarts.
/// Captures only the slice the actor's 3-state machine needs (Inactive / Active /
/// Acknowledged + last transition + last-ack user). The fuller GxP audit trail
/// (<see cref="Configuration.Entities.ScriptedAlarmState"/>'s Comments/Confirmed/Shelving)
/// stays in the production engine binding — this seam is the small surface the actor
/// consumes directly.
/// </summary>
public interface IAlarmActorStateStore
{
Task<AlarmActorStateSnapshot?> LoadAsync(string alarmId, CancellationToken ct);
Task SaveAsync(AlarmActorStateSnapshot snapshot, CancellationToken ct);
}
/// <summary>Persisted slice of <c>ScriptedAlarmActor</c>'s state. Active is NOT persisted —
/// it re-derives from the evaluator on startup per Phase 7 decision #14. <c>State</c> here
/// distinguishes Acknowledged vs not-yet-acknowledged for cases where the actor came up
/// Active and operator interaction had already happened.</summary>
/// <param name="AlarmId">Matches <c>ScriptedAlarm.ScriptedAlarmId</c>.</param>
/// <param name="State">Inactive / Active / Acknowledged — the actor's 3-state enum, projected to string.</param>
/// <param name="LastTransitionUtc">When the actor last transitioned.</param>
/// <param name="LastAckUser">Who acknowledged most recently. Null when never acked.</param>
public sealed record AlarmActorStateSnapshot(
string AlarmId,
string State,
DateTime LastTransitionUtc,
string? LastAckUser);
/// <summary>No-op default. Bound when no production store is configured (tests, smoke runs).
/// Load returns null → actor boots Inactive; Save is a no-op so state doesn't leak.</summary>
public sealed class NullAlarmActorStateStore : IAlarmActorStateStore
{
public static readonly NullAlarmActorStateStore Instance = new();
private NullAlarmActorStateStore() { }
public Task<AlarmActorStateSnapshot?> LoadAsync(string alarmId, CancellationToken ct) =>
Task.FromResult<AlarmActorStateSnapshot?>(null);
public Task SaveAsync(AlarmActorStateSnapshot snapshot, CancellationToken ct) =>
Task.CompletedTask;
}
@@ -0,0 +1,30 @@
namespace ZB.MOM.WW.OtOpcUa.Commons.Engines;
/// <summary>
/// Abstraction over the scripted-alarm predicate engine. Production binds this to a
/// wrapper around <c>ScriptedAlarmEngine</c> from <c>Core.ScriptedAlarms</c>; default
/// binding is <see cref="NullScriptedAlarmEvaluator"/> which keeps the alarm in its
/// current state (so an unconfigured node never spuriously alarms).
/// </summary>
public interface IScriptedAlarmEvaluator
{
ScriptedAlarmEvalResult Evaluate(string alarmId, string predicate, IReadOnlyDictionary<string, object?> dependencies);
}
/// <summary>Result of one alarm-predicate evaluation. <c>Active</c> is only meaningful when
/// <c>Success</c> is true; on failure the caller should keep the prior state and log Reason.</summary>
public sealed record ScriptedAlarmEvalResult(bool Success, bool Active, string? Reason)
{
public static ScriptedAlarmEvalResult Ok(bool active) => new(true, active, null);
public static ScriptedAlarmEvalResult Failure(string reason) => new(false, false, reason);
}
/// <summary>Default that always returns <c>Active = false, Success = true</c>. Safe no-op:
/// no alarm fires when no real engine is bound.</summary>
public sealed class NullScriptedAlarmEvaluator : IScriptedAlarmEvaluator
{
public static readonly NullScriptedAlarmEvaluator Instance = new();
private NullScriptedAlarmEvaluator() { }
public ScriptedAlarmEvalResult Evaluate(string alarmId, string predicate, IReadOnlyDictionary<string, object?> dependencies)
=> ScriptedAlarmEvalResult.Ok(active: false);
}
@@ -0,0 +1,36 @@
namespace ZB.MOM.WW.OtOpcUa.Commons.Engines;
/// <summary>
/// Abstraction over the compiled virtual-tag expression engine. Runtime consumes this so
/// <see cref="VirtualTagActor"/> can stay free of Roslyn / scripting machinery and the
/// production wiring binds an adapter over <c>VirtualTagEngine</c> from
/// <c>Core.VirtualTags</c>.
/// </summary>
public interface IVirtualTagEvaluator
{
/// <summary>
/// Evaluate <paramref name="expression"/> against the snapshot in
/// <paramref name="dependencies"/>. Implementations must not throw — script failures
/// are reported via <see cref="VirtualTagEvalResult.Failure"/>.
/// </summary>
VirtualTagEvalResult Evaluate(string virtualTagId, string expression, IReadOnlyDictionary<string, object?> dependencies);
}
/// <summary>Result of one virtual-tag expression eval. Stash a Reason on every Failure so
/// callers can emit a useful <c>ScriptLogEntry</c> to operators.</summary>
public sealed record VirtualTagEvalResult(bool Success, object? Value, string? Reason)
{
public static readonly VirtualTagEvalResult NoChange = new(true, null, "no-change");
public static VirtualTagEvalResult Ok(object? value) => new(true, value, null);
public static VirtualTagEvalResult Failure(string reason) => new(false, null, reason);
}
/// <summary>Returns <see cref="VirtualTagEvalResult.NoChange"/> from every call. Bound by default
/// when the production <c>VirtualTagEngine</c> adapter hasn't been registered (Mac dev, tests).</summary>
public sealed class NullVirtualTagEvaluator : IVirtualTagEvaluator
{
public static readonly NullVirtualTagEvaluator Instance = new();
private NullVirtualTagEvaluator() { }
public VirtualTagEvalResult Evaluate(string virtualTagId, string expression, IReadOnlyDictionary<string, object?> dependencies)
=> VirtualTagEvalResult.NoChange;
}
@@ -0,0 +1,13 @@
using ZB.MOM.WW.OtOpcUa.Commons.Messages.Admin;
namespace ZB.MOM.WW.OtOpcUa.Commons.Interfaces;
/// <summary>
/// Cluster-singleton-proxy client for the <c>AdminOperationsActor</c>. The Blazor UI calls
/// this from any host (admin or driver role); the proxy routes the request to whichever node
/// holds the admin singleton.
/// </summary>
public interface IAdminOperationsClient
{
Task<StartDeploymentResult> StartDeploymentAsync(string createdBy, CancellationToken ct);
}
@@ -0,0 +1,20 @@
using ZB.MOM.WW.OtOpcUa.Commons.Types;
namespace ZB.MOM.WW.OtOpcUa.Commons.Interfaces;
/// <summary>
/// Live view of the local node's identity and the cluster's role topology. Implemented by
/// <c>ClusterRoleInfo</c> in <c>OtOpcUa.Cluster</c>; consumed by everything that needs to
/// distinguish admin-role vs driver-role members or react to role-leader changes (e.g. OPC UA
/// ServiceLevel computation).
/// </summary>
public interface IClusterRoleInfo
{
NodeId LocalNode { get; }
IReadOnlySet<string> LocalRoles { get; }
bool HasRole(string role);
IReadOnlyList<NodeId> MembersWithRole(string role);
NodeId? RoleLeader(string role);
event EventHandler<RoleLeaderChangedEventArgs>? RoleLeaderChanged;
}

Some files were not shown because too many files have changed in this diff Show More