lmxopcua/docs/v2/lmx-followups.md
Joseph Doherty aa8834a231 Phase 3 PR 40 — LiveStackSmokeTests: write-roundtrip + subscribe-receives-OnDataChange against the live Galaxy. Finishes LMX #5 by exercising the IWritable + ISubscribable capability paths end-to-end through the Proxy → OtOpcUaGalaxyHost service → MXAccess → real Galaxy.
Two new facts target DelmiaReceiver_001.TestAttribute — the writable Boolean UDA on the TestMachine_001 hierarchy in this dev Galaxy. The user nominated TestMachine_001 (the deployed test-target object) as a scratch surface for live testing; ZB query showed DelmiaReceiver_001 carries one dynamic_attribute named TestAttribute (mx_data_type=1=Boolean, lock_type=0=writable, security_classification=1=Operate). Naming makes the intent obvious — the attribute exists for exactly this kind of integration testing — and Boolean keeps the assertions simple (invert, write, read back).
Write_then_read_roundtrips_a_writable_Boolean_attribute_on_TestMachine_001: reads the current value as the baseline (Galaxy may return Uncertain quality until the Engine has scanned the attribute at least once — we don't read into a typed bool until Status is Good), inverts it, writes via IWritable, then polls reads in a 5s loop until either the new value comes back or the budget expires. The scan-window poll (rather than a single read after a fixed delay) accommodates Galaxy's variable scan latency on a fresh service start. Restore-on-finally writes the original value back so re-running the test doesn't accumulate a flipped TestAttribute on the dev box (Galaxy holds UDA values across runs since they're deployed). Best-effort restore — swallows exceptions so a failure in restore doesn't mask the primary assertion.
Subscribe_fires_OnDataChange_with_initial_value_then_again_after_a_write: subscribes to the same attribute with a 250ms publishing interval, captures every OnDataChange notification onto a thread-safe ConcurrentQueue (MXAccess advisory fires on its own thread per Galaxy's COM apartment model — must not block it), waits up to 5s for the initial-value callback (per ISubscribable's contract: 'driver MAY fire OnDataChange immediately with the current value'), records the queue depth as a baseline, writes the toggled value, waits up to 8s for at least one MORE notification, then searches the queue tail for the notification carrying the toggled value (initial value may appear multiple times before the write commits — looking at the tail finds the post-write delta even if the queue grew during the wait window). Unsubscribes on finally + restores baseline.
Both tests use Convert.ToBoolean(value ?? false) to defensively handle the Boxed-vs-typed quirk in MessagePack-deserialized Galaxy values — depending on the wire encoding the Boolean might come back as System.Boolean or System.Object boxing one. Convert.ToBoolean handles both. Same pattern in OnReadValue's existing usage.
WaitForAsync helper does the loop+budget pattern shared by both tests.
PR 40 is the code side of LMX #5's final two deferred facts. To actually run them green requires re-executing from a normal (non-admin) PowerShell — the elevated-shell skip from PR 39 fires correctly under bash + sc.exe-context (verified). lmx-followups.md #5 updated to note the new facts + the run command + the one remaining genuine follow-up (alarm-condition fact when an alarm-flagged attribute is deployed on TestMachine_001).
Test posture from elevated bash: 7 LiveStackSmokeTests facts discovered (was 5; +2 new), all skip cleanly with the elevation message. Build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:38:34 -04:00

LMX Galaxy bridge — remaining follow-ups

State after PR 19: the Galaxy driver is functionally at v1 parity through the IDriver abstraction; the OPC UA server runs with LDAP-authenticated Basic256Sha256 endpoints and alarms are observable through AlarmConditionState.ReportEvent. The items below are what remains LMX-specific before the stack can fully replace the v1 deployment, in rough priority order.

1. Proxy-side IHistoryProvider for ReadAtTime / ReadEvents — DONE (PRs 35 + 38)

PR 35 extended IHistoryProvider with ReadAtTimeAsync + ReadEventsAsync (default throwing implementations so existing impls keep compiling), added the HistoricalEvent + HistoricalEventsResult records to Core.Abstractions, and implemented both methods in GalaxyProxyDriver on top of the PR 10 / PR 11 IPC messages.
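The "default throwing implementations" approach can be illustrated with C# default interface methods. This is a minimal sketch, not the real interface: `IHistoryProviderSketch`, `HistorySample`, `LegacyProvider`, the `ReadRawAsync` member, and every parameter list are hypothetical stand-ins, since the actual signatures in Core.Abstractions are not shown here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sample shape; the real records live in Core.Abstractions.
public sealed record HistorySample(DateTime TimestampUtc, object? Value);

public interface IHistoryProviderSketch
{
    // Stand-in for a member that every existing implementation already provides.
    Task<IReadOnlyList<HistorySample>> ReadRawAsync(
        string fullReference, DateTime startUtc, DateTime endUtc, CancellationToken ct);

    // Newer member with a default body: implementations written before it
    // existed still compile, and callers get an explicit failure at call time.
    Task<IReadOnlyList<HistorySample>> ReadAtTimeAsync(
        string fullReference, IReadOnlyList<DateTime> timestampsUtc, CancellationToken ct)
        => throw new NotSupportedException("ReadAtTime is not implemented by this driver.");
}

// An implementation that predates ReadAtTimeAsync and does not override it.
public sealed class LegacyProvider : IHistoryProviderSketch
{
    public Task<IReadOnlyList<HistorySample>> ReadRawAsync(
        string fullReference, DateTime startUtc, DateTime endUtc, CancellationToken ct)
        => Task.FromResult<IReadOnlyList<HistorySample>>(Array.Empty<HistorySample>());
}
```

Note that a default member is only reachable through the interface type, so a caller holding a `LegacyProvider` variable has to go through `IHistoryProviderSketch` to hit the throwing default.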

PR 38 wired the OPC UA HistoryRead service-handler through DriverNodeManager by overriding CustomNodeManager2's four per-kind hooks — HistoryReadRawModified / HistoryReadProcessed / HistoryReadAtTime / HistoryReadEvents. Each walks nodesToProcess, resolves the driver-side full reference from NodeId.Identifier, dispatches to the right IHistoryProvider method, and populates the paired results + errors lists (both must be set — the MasterNodeManager merges them and a Good result with an unset error slot serializes as BadHistoryOperationUnsupported on the wire). Historized variables gain AccessLevels.HistoryRead so the stack dispatches; the driver root folder gains EventNotifiers.HistoryRead so HistoryReadEvents can target it.

Aggregate translation uses a small MapAggregate helper that handles Average / Minimum / Maximum / Total / Count (the enum surface the driver exposes) and returns null for unsupported aggregates so the handler can surface BadAggregateNotSupported. Raw+Processed+AtTime wrap driver samples as HistoryData in an ExtensionObject; Events emits a HistoryEvent with the standard BaseEventType field list (EventId / SourceName / Message / Severity / Time / ReceiveTime) — custom SelectClause evaluation is an explicit follow-up.
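The null-for-unsupported contract can be sketched as follows. This is an assumption-heavy illustration: the real helper presumably maps an OPC UA aggregate identifier, whereas this version takes a plain aggregate name, and `HistoryAggregate` and `AggregateMapSketch` are hypothetical names chosen for the example.

```csharp
#nullable enable

// Hypothetical mirror of the five aggregates the driver is described as exposing.
public enum HistoryAggregate { Average, Minimum, Maximum, Total, Count }

public static class AggregateMapSketch
{
    // Returns the driver-side aggregate, or null when the requested aggregate
    // has no driver equivalent. The caller turns null into BadAggregateNotSupported.
    public static HistoryAggregate? MapAggregate(string? aggregateName) => aggregateName switch
    {
        "Average" => HistoryAggregate.Average,
        "Minimum" => HistoryAggregate.Minimum,
        "Maximum" => HistoryAggregate.Maximum,
        "Total" => HistoryAggregate.Total,
        "Count" => HistoryAggregate.Count,
        _ => null,
    };
}
```

Returning null rather than throwing keeps the status-code decision in the handler, which is the only layer that knows how to report a per-node error.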

Tests:

  • DriverNodeManagerHistoryMappingTests — 12 unit cases pinning MapAggregate, BuildHistoryData, BuildHistoryEvent, ToDataValue.
  • HistoryReadIntegrationTests — 5 end-to-end cases drive a real OPC UA client (Session.HistoryRead) against a fake IHistoryProvider driver through the running stack. Covers raw round-trip, processed with Average aggregate, unsupported aggregate → BadAggregateNotSupported, at-time timestamp forwarding, and events field-list shape.

Deferred:

  • Continuation-point plumbing via Session.Save/RestoreHistoryContinuationPoint. Driver returns null continuations today so the pass-through is fine.
  • Per-SelectClause evaluation in HistoryReadEvents — clients that send a custom field selection currently get the standard BaseEventType layout.

2. Write-gating by role — DONE (PR 26)

Landed in PR 26. WriteAuthzPolicy in Server/Security/ maps SecurityClassification → required role (FreeAccess → no role required, Operate/SecuredWrite → WriteOperate, Tune → WriteTune, Configure/VerifiedWrite → WriteConfigure, ViewOnly → deny regardless). DriverNodeManager caches the classification per variable during discovery and checks the session's roles (via IRoleBearer) in OnWriteValue before calling IWritable.WriteAsync. Roles do not cascade — a session with WriteOperate can't write a Tune attribute unless it also carries WriteTune.
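A minimal sketch of the classification-to-role check, assuming roles are carried as plain strings. `WriteAuthzSketch`, its `IsAllowed` signature, and the local `SecurityClassification` enum are hypothetical; only the classification names and role names come from the description above.

```csharp
using System.Collections.Generic;

// Local stand-in for the driver-facing classification enum.
public enum SecurityClassification
{
    FreeAccess, Operate, SecuredWrite, Tune, Configure, VerifiedWrite, ViewOnly
}

public static class WriteAuthzSketch
{
    // Each classification requires exactly one role. Roles do not cascade,
    // so holding WriteConfigure does not imply WriteTune or WriteOperate.
    public static bool IsAllowed(SecurityClassification classification, ISet<string> sessionRoles)
        => classification switch
        {
            SecurityClassification.FreeAccess => true,
            SecurityClassification.Operate or SecurityClassification.SecuredWrite
                => sessionRoles.Contains("WriteOperate"),
            SecurityClassification.Tune => sessionRoles.Contains("WriteTune"),
            SecurityClassification.Configure or SecurityClassification.VerifiedWrite
                => sessionRoles.Contains("WriteConfigure"),
            // ViewOnly, plus any value this sketch does not know about, is denied.
            _ => false,
        };
}
```

The default arm denies, so a classification added later fails closed until it is mapped explicitly.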

See feedback_acl_at_server_layer.md in memory for the architectural directive that authz stays at the server layer and never delegates to driver-specific auth.

3. Admin UI client-cert trust management — DONE (PR 28)

PR 28 shipped /certificates in the Admin UI. CertTrustService reads the OPC UA server's PKI store root (OpcUaServerOptions.PkiStoreRoot — default %ProgramData%\OtOpcUa\pki) and lists rejected + trusted certs by parsing the .der files directly, so it has no Opc.Ua dependency and runs on any Admin host that can reach the shared PKI directory.

Operator actions: Trust (moves rejected/certs/*.der → trusted/certs/*.der), Delete rejected, Revoke trust. The OPC UA stack re-reads the trusted store on each new client handshake, so no explicit reload signal is needed — operators retry the rejected client's connection after trusting.
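The Trust action is, at its core, a file move between two store folders. The sketch below shows that move using only System.IO; `CertTrustSketch` and its `Trust` method are hypothetical names, and the real CertTrustService may validate, log, or handle name collisions differently.

```csharp
using System.IO;

public static class CertTrustSketch
{
    // Moves one rejected certificate into the trusted store and returns the new path.
    // pkiStoreRoot corresponds to the configured PKI store root directory.
    public static string Trust(string pkiStoreRoot, string derFileName)
    {
        // Strip any directory part so a crafted name cannot escape the store.
        var safeName = Path.GetFileName(derFileName);

        var source = Path.Combine(pkiStoreRoot, "rejected", "certs", safeName);
        var targetDir = Path.Combine(pkiStoreRoot, "trusted", "certs");
        Directory.CreateDirectory(targetDir); // no-op when the folder already exists

        var target = Path.Combine(targetDir, safeName);
        File.Move(source, target, overwrite: true);
        return target;
    }
}
```

Because the server re-reads the trusted store on each handshake, nothing beyond the move is needed for the next connection attempt to succeed.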

Deferred: flipping AutoAcceptUntrustedClientCertificates to false as the deployment default. That's a production-hardening config change, not a code gap — the Admin UI is now ready to be the trust gate.

4. Live-LDAP integration test — DONE (PR 31)

PR 31 shipped Server.Tests/LdapUserAuthenticatorLiveTests.cs — 6 live-bind tests against the dev GLAuth instance at localhost:3893, skipped cleanly when the port is unreachable. Covers: valid bind, wrong password, unknown user, empty credentials, single-group → WriteOperate mapping, multi-group admin user surfacing all mapped roles.

Also added UserNameAttribute to LdapOptions (default uid for RFC 2307 compat) so Active Directory deployments can configure sAMAccountName / userPrincipalName without code changes. LdapUserAuthenticatorAdCompatTests (5 unit guards) pins the AD-shape DN parsing + filter escape behaviors. See docs/security.md §"Active Directory configuration" for the AD appsettings snippet.
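A configurable user-name attribute only stays safe if the user-supplied value is escaped before it is placed in the search filter. The sketch below applies the RFC 4515 escapes for the five special characters; `LdapFilterSketch`, `Escape`, and `BuildUserFilter` are hypothetical names, and the authenticator's real filter shape may include additional clauses.

```csharp
using System.Text;

public static class LdapFilterSketch
{
    // Escapes the characters RFC 4515 reserves inside a filter assertion value.
    public static string Escape(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (var c in value)
        {
            switch (c)
            {
                case '\\': sb.Append(@"\5c"); break;
                case '*': sb.Append(@"\2a"); break;
                case '(': sb.Append(@"\28"); break;
                case ')': sb.Append(@"\29"); break;
                case '\0': sb.Append(@"\00"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // userNameAttribute would come from configuration: "uid" by default,
    // "sAMAccountName" or "userPrincipalName" for Active Directory.
    public static string BuildUserFilter(string userNameAttribute, string userName)
        => $"({userNameAttribute}={Escape(userName)})";
}
```

With this shape, `BuildUserFilter("sAMAccountName", "jdoe")` yields `(sAMAccountName=jdoe)`, while a wildcard in the user name is neutralized instead of widening the search.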

Deferred: asserting session.Identity end-to-end on the server side (i.e. drive a full OPC UA session with username/password, then read an IHostConnectivityProbe-style "whoami" node to verify the role surfaced). That needs a test-only address-space node and is a separate PR.

5. Full Galaxy live-service smoke test against the merged v2 stack — IN PROGRESS (PRs 36 + 37)

PR 36 shipped the prerequisites helper (AvevaPrerequisites) that probes every dependency a live smoke test needs and produces actionable skip messages.

PR 37 shipped the live-stack smoke test project structure: tests/Driver.Galaxy.Proxy.Tests/LiveStack/ with LiveStackFixture (connects to the already-running OtOpcUaGalaxyHost Windows service via named pipe; never spawns the Host process) and LiveStackSmokeTests covering:

  • Fixture initializes successfully (IPC handshake succeeds end-to-end).
  • Driver reports DriverState.Healthy post-handshake.
  • DiscoverAsync returns at least one variable from the live Galaxy.
  • GetHostStatuses reports at least one Platform/AppEngine host.
  • ReadAsync on a discovered variable round-trips through Proxy → Host pipe → MXAccess → back without a BadInternalError.

Shared secret + pipe name resolve from OTOPCUA_GALAXY_SECRET / OTOPCUA_GALAXY_PIPE env vars, falling back to reading the service's registry-stored Environment values (requires elevated test host).
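The environment-first, fallback-second resolution order might look like the sketch below. `LiveStackConfigSketch.Resolve` is a hypothetical helper, and the fallback is passed in as a delegate so the example does not guess at how the fixture actually reads the service's registry-stored values.

```csharp
#nullable enable
using System;

public static class LiveStackConfigSketch
{
    // Prefers the environment variable; invokes the fallback only when the
    // variable is unset or blank. The fallback is where an elevated registry
    // read would go in the real fixture.
    public static string? Resolve(string envVarName, Func<string?> fallback)
    {
        var fromEnv = Environment.GetEnvironmentVariable(envVarName);
        return string.IsNullOrWhiteSpace(fromEnv) ? fallback() : fromEnv.Trim();
    }
}
```

Keeping the fallback lazy matters here: when the environment variable is set, the code path that needs elevation is never touched.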

PR 40 added the write + subscribe facts targeting DelmiaReceiver_001.TestAttribute (the writable Boolean UDA the dev Galaxy ships under TestMachine_001) — write-then-read with a 5s scan-window poll + restore-on-finally, and subscribe-then-write asserting both an initial-value OnDataChange and a post-write OnDataChange. PR 39 added the elevated-shell short-circuit so a developer running from an admin window gets an actionable skip instead of UnauthorizedAccessException.
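Both facts rely on the same two mechanics: polling against a time budget, and searching only the notifications that arrived after a recorded baseline. The sketch below illustrates them with BCL types only. The method shapes are assumptions; the real WaitForAsync helper and the subscription payload type are not shown in this document.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class LiveStackPollingSketch
{
    // Re-evaluates `condition` until it returns true or `budget` elapses.
    // Returns true on success and false on timeout; the caller asserts on the result.
    public static async Task<bool> WaitForAsync(
        Func<Task<bool>> condition, TimeSpan budget, TimeSpan? pollInterval = null)
    {
        var interval = pollInterval ?? TimeSpan.FromMilliseconds(200);
        var clock = Stopwatch.StartNew();
        while (true)
        {
            if (await condition()) return true;
            if (clock.Elapsed >= budget) return false;
            await Task.Delay(interval);
        }
    }

    // Looks only at notifications enqueued after `baselineDepth`, so repeated
    // initial-value callbacks cannot satisfy the post-write assertion.
    public static bool TailContains(
        ConcurrentQueue<object?> notifications, int baselineDepth, bool expected)
        => notifications.ToArray()
            .Skip(baselineDepth)
            // Convert.ToBoolean accepts both a boxed bool and other convertible values.
            .Any(v => Convert.ToBoolean(v ?? false) == expected);
}
```

In a test, the subscription callback would only enqueue onto the ConcurrentQueue and return, keeping the advisory thread unblocked, while the test thread awaits `WaitForAsync` with a condition built on `TailContains`.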

Run the live tests (from a NORMAL non-admin PowerShell):

```powershell
$env:OTOPCUA_GALAXY_SECRET = Get-Content C:\Users\dohertj2\Desktop\lmxopcua\.local\galaxy-host-secret.txt
cd C:\Users\dohertj2\Desktop\lmxopcua
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests --filter "FullyQualifiedName~LiveStackSmokeTests"
```

Expected: 7/7 pass against the running OtOpcUaGalaxyHost service.

Remaining for #5 in production-grade form:

  • Confirm the suite passes from a non-elevated shell (operator action).
  • Add similar facts for an alarm-source attribute once TestMachine_001 (or a sibling) carries a deployed alarm condition — the current dev Galaxy's TestAttribute isn't alarm-flagged.

6. Second driver instance on the same server — DONE (PR 32)

Server.Tests/MultipleDriverInstancesIntegrationTests.cs registers two drivers with distinct DriverInstanceIds on one DriverHost, spins up the full OPC UA server, and asserts three behaviors: (1) each driver's namespace URI (urn:OtOpcUa:{id}) resolves to a distinct index in the client's NamespaceUris, (2) browsing one subtree returns that driver's folder and does NOT leak the other driver's folder, (3) reads route to the correct driver — the alpha instance returns 42 while beta returns 99, so a misroute would surface at the assertion layer.

Deferred: the alarm-event multi-driver parity case (two drivers each raising a GalaxyAlarmEvent, assert each condition lands on its owning instance's condition node). Alarm tracking already has its own integration test (AlarmSubscription*); the multi-driver alarm case would need a stub IAlarmSource that's worth its own focused PR.

7. Host-status per-AppEngine granularity → Admin UI dashboard — DONE (PRs 33 + 34)

PR 33 landed the data layer: DriverHostStatus entity + migration with composite key (NodeId, DriverInstanceId, HostName) and two query-supporting indexes (per-cluster drill-down on NodeId, stale-row detection on LastSeenUtc).

PR 34 wired the publisher + consumer. HostStatusPublisher is a BackgroundService in the Server process that walks every registered IHostConnectivityProbe-capable driver every 10s, calls GetHostStatuses(), and upserts rows (LastSeenUtc advances each tick; State + StateChangedUtc update on transitions). Admin UI /hosts page groups by cluster, shows four summary cards (Hosts / Running / Stale / Faulted), and flags rows whose LastSeenUtc is older than 30s as Stale so operators see crashed Servers without waiting for a state change.
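The stale rule and the four summary cards can be sketched as a pure function over the persisted rows. Several details here are assumptions: the row is a simplified record rather than the DriverHostStatus entity, State is modeled as a string, and stale rows are excluded from the Running and Faulted counts, which the page may or may not do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for a persisted host-status row.
public sealed record HostStatusRow(
    string NodeId, string DriverInstanceId, string HostName, string State, DateTime LastSeenUtc);

public sealed record HostSummary(int Hosts, int Running, int Stale, int Faulted);

public static class HostStatusSketch
{
    // Three missed 10s publisher ticks.
    public static readonly TimeSpan StaleAfter = TimeSpan.FromSeconds(30);

    // A row is stale when the publisher has not refreshed it within the window,
    // regardless of the last state it reported.
    public static bool IsStale(HostStatusRow row, DateTime nowUtc)
        => nowUtc - row.LastSeenUtc > StaleAfter;

    public static HostSummary Summarize(IReadOnlyCollection<HostStatusRow> rows, DateTime nowUtc)
        => new(
            Hosts: rows.Count,
            Running: rows.Count(r => r.State == "Running" && !IsStale(r, nowUtc)),
            Stale: rows.Count(r => IsStale(r, nowUtc)),
            Faulted: rows.Count(r => r.State == "Faulted" && !IsStale(r, nowUtc)));
}
```

Passing `nowUtc` in rather than reading the clock inside keeps the rule deterministic, which makes the 30s boundary straightforward to unit-test.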

Deferred as follow-ups:

  • Event-driven push (subscribe to OnHostStatusChanged per driver for sub-heartbeat latency). Adds DriverHost lifecycle-event plumbing; 10s polling is fine for operator-scale use.
  • Failure-count column — needs the publisher to track a transition history per host, not just current-state.
  • SignalR fan-out to the Admin page (currently the page polls the DB, not a hub). The DB-polled version is fine at current cadence but a hub push would eliminate the 10s race where a new row sits in the DB before the Admin page notices.