lmxopcua/docs/v2/lmx-followups.md
Joseph Doherty 52a29100b1 Phase 3 PR 38 — DriverNodeManager HistoryRead override (LMX #1 finish). Wires the OPC UA HistoryRead service through CustomNodeManager2's four protected per-kind hooks — HistoryReadRawModified / HistoryReadProcessed / HistoryReadAtTime / HistoryReadEvents — each dispatching to the driver's IHistoryProvider capability (PR 35 for ReadAtTime + ReadEvents on top of PR 19-era ReadRaw + ReadProcessed). Was the last missing piece of the end-to-end HistoryRead path: PR 10 + PR 11 shipped the Galaxy.Host IPC contracts, PR 35 surfaced them on IHistoryProvider + GalaxyProxyDriver, but no server-side handler bridged OPC UA HistoryRead service requests onto the capability interface. Now it does.
Per-kind override shape: each hook receives the pre-filtered nodesToProcess list (NodeHandles for nodes this manager claimed), iterates them, resolves handle.NodeId.Identifier to the driver-side full reference string, and dispatches to the right IHistoryProvider method. Write back into the outer results + errors slots at handle.Index (not the local loop counter — nodesToProcess is a filtered subset of nodesToRead, so indexing by the loop counter lands in the wrong slot for mixed-manager batches). WriteResult helper sets both results[i] AND errors[i]; this matters because MasterNodeManager merges them and leaving errors[i] at its default (BadHistoryOperationUnsupported) overrides a Good result with Unsupported on the wire — this was the subtle failure mode that masked a correctly-constructed HistoryData response during debugging. Failure-isolation per node: NotSupportedException from a driver that doesn't implement a particular HistoryProvider method translates to BadHistoryOperationUnsupported in that slot; generic exceptions log and surface BadInternalError; unresolvable NodeIds get BadNodeIdUnknown. The batch continues unconditionally.
Aggregate mapping: MapAggregate translates ObjectIds.AggregateFunction_Average / Minimum / Maximum / Total / Count to the driver's HistoryAggregateType enum. Null for anything else (e.g. TimeAverage, Interpolative) so the handler surfaces BadAggregateNotSupported at the batch level — per Part 13, one unsupported aggregate means the whole request fails since ReadProcessedDetails carries one aggregate list for all nodes. BuildHistoryData wraps driver DataValueSnapshots as Opc.Ua.HistoryData in an ExtensionObject; BuildHistoryEvent wraps HistoricalEvents as Opc.Ua.HistoryEvent with the canonical BaseEventType field list (EventId, SourceName, Message, Severity, Time, ReceiveTime — the order OPC UA clients that didn't customize the SelectClause expect). ToDataValue preserves null SourceTimestamp (Galaxy historian rows often carry only ServerTimestamp) — synthesizing a SourceTimestamp would lie about actual sample time.
Two address-space changes were required to make the stack dispatch reach the per-kind hooks at all: (1) historized variables get AccessLevels.HistoryRead added to their AccessLevel byte — the base's early-gate check on (variable.AccessLevel & HistoryRead != 0) was rejecting requests before our override ever ran; (2) the driver-root folder gets EventNotifiers.HistoryRead | SubscribeToEvents so HistoryReadEvents can target it (the conventional pattern for alarm-history browse against a driver-owned object). Document the 'set both bits' requirement inline since it's not obvious from the surface API.
OpcHistoryReadResult alias: Opc.Ua.HistoryReadResult (service-layer per-node result) collides with Core.Abstractions.HistoryReadResult (driver-side samples + continuation point) by type name; the alias 'using OpcHistoryReadResult = Opc.Ua.HistoryReadResult' keeps the override signatures unambiguous and the test project applies the mirror pattern for its stub driver impl.
Tests — DriverNodeManagerHistoryMappingTests (12 new Category=Unit cases): MapAggregate translates each supported aggregate NodeId via reflection-backed theory (guards against the stack renaming AggregateFunction_* constants); returns null for unsupported NodeIds (TimeAverage) and null input; BuildHistoryData wraps samples with correct DataValues + SourceTimestamp preservation; BuildHistoryEvent emits the 6-element BaseEventType field list in canonical order (regression guard for a future 'respect the client's SelectClauses' change); null SourceName / Message translate to empty-string Variants (nullable-Variant refactor trap); ToDataValue preserves StatusCode + both timestamps; ToDataValue leaves SourceTimestamp at default when the snapshot omits it. HistoryReadIntegrationTests (5 new Category=Integration): drives a real OPC UA client Session.HistoryRead against a fake HistoryDriver through the running server. Covers raw round-trip (verifies per-node DataValue ordering + values); processed with Average aggregate (captures the driver's received aggregate + interval, asserting MapAggregate routed correctly); unsupported aggregate (TimeAverage → BadAggregateNotSupported); at-time (forwards the per-timestamp list); events (BaseEventType field list shape, SelectClauses populated to satisfy the stack's filter validator). Server.Tests Unit: 55 pass / 0 fail (43 prior + 12 new mapping). Server.Tests Integration: 14 pass / 0 fail (9 prior + 5 new history). Full solution build clean, 0 errors.
lmx-followups.md #1 updated to 'DONE (PRs 35 + 38)' with two explicit deferred items: continuation-point plumbing (driver returns null today so pass-through is fine) and per-SelectClause evaluation in HistoryReadEvents (clients with custom field selections get the canonical BaseEventType layout today).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:50:23 -04:00


LMX Galaxy bridge — remaining follow-ups

State after PR 19: the Galaxy driver is functionally at v1 parity through the IDriver abstraction; the OPC UA server runs with LDAP-authenticated Basic256Sha256 endpoints and alarms are observable through AlarmConditionState.ReportEvent. The items below are what remains LMX-specific before the stack can fully replace the v1 deployment, in rough priority order.

1. Proxy-side IHistoryProvider for ReadAtTime / ReadEvents — DONE (PRs 35 + 38)

PR 35 extended IHistoryProvider with ReadAtTimeAsync + ReadEventsAsync (default throwing implementations so existing impls keep compiling), added the HistoricalEvent + HistoricalEventsResult records to Core.Abstractions, and implemented both methods in GalaxyProxyDriver on top of the PR 10 / PR 11 IPC messages.

PR 38 wired the OPC UA HistoryRead service through DriverNodeManager by overriding CustomNodeManager2's four per-kind hooks: HistoryReadRawModified, HistoryReadProcessed, HistoryReadAtTime, and HistoryReadEvents. Each walks nodesToProcess, resolves the driver-side full reference from NodeId.Identifier, dispatches to the right IHistoryProvider method, and populates the paired results + errors lists at handle.Index. Both must be set: the MasterNodeManager merges them, and a Good result with an unset error slot serializes as BadHistoryOperationUnsupported on the wire. Historized variables gain AccessLevels.HistoryRead so the stack dispatches; the driver root folder gains EventNotifiers.HistoryRead | SubscribeToEvents so HistoryReadEvents can target it.
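The paired-write rule is easier to see in isolation. The following is a minimal sketch, not the DriverNodeManager code: `Status`, `Handle`, and `HistoryDispatchSketch` are hypothetical stand-ins for the stack's status codes, node handles, and the override body, and history payloads are reduced to strings. It assumes each handle carries its position in the original request, and it shows one outcome per node so a failure never aborts the batch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the stack's status codes and node handles.
public enum Status { Good, BadHistoryOperationUnsupported, BadInternalError, BadNodeIdUnknown }

// Index is the node's slot in the original request, not its position in the filtered subset.
public sealed record Handle(int Index, string? FullReference);

public static class HistoryDispatchSketch
{
    // Walks the filtered subset and writes into the outer, request-sized arrays.
    public static void Process(
        IReadOnlyList<Handle> nodesToProcess,
        Func<string, string> readHistory,   // stand-in for an IHistoryProvider call
        string?[] results,
        Status[] errors)
    {
        foreach (var handle in nodesToProcess)
        {
            if (handle.FullReference is null)
            {
                Write(results, errors, handle.Index, null, Status.BadNodeIdUnknown);
                continue;
            }

            try
            {
                Write(results, errors, handle.Index, readHistory(handle.FullReference), Status.Good);
            }
            catch (NotSupportedException)
            {
                // The driver does not implement this history method.
                Write(results, errors, handle.Index, null, Status.BadHistoryOperationUnsupported);
            }
            catch (Exception)
            {
                // A real handler would also log here; the batch continues either way.
                Write(results, errors, handle.Index, null, Status.BadInternalError);
            }
        }
    }

    // Always sets both slots so a Good result is never paired with a stale error.
    private static void Write(string?[] results, Status[] errors, int index, string? data, Status status)
    {
        results[index] = data;
        errors[index] = status;
    }
}
```

Routing every outcome through one `Write` helper is the design point: there is no code path that can set a result and forget the matching error slot.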

Aggregate translation uses a small MapAggregate helper that handles Average / Minimum / Maximum / Total / Count (the enum surface the driver exposes) and returns null for unsupported aggregates so the handler can surface BadAggregateNotSupported. Raw+Processed+AtTime wrap driver samples as HistoryData in an ExtensionObject; Events emits a HistoryEvent with the standard BaseEventType field list (EventId / SourceName / Message / Severity / Time / ReceiveTime) — custom SelectClause evaluation is an explicit follow-up.
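A minimal sketch of the aggregate translation, under stated assumptions: the real helper takes the stack's aggregate NodeIds (ObjectIds.AggregateFunction_*), while this version keys on plain strings so it compiles without the OPC UA package. The class name and the string keys are hypothetical; the enum members follow the list above.

```csharp
using System.Collections.Generic;

// Driver-side enum surface as listed in the text.
public enum HistoryAggregateType { Average, Minimum, Maximum, Total, Count }

public static class AggregateMappingSketch
{
    // String keys stand in for the ObjectIds.AggregateFunction_* NodeIds (assumption).
    private static readonly Dictionary<string, HistoryAggregateType> Supported =
        new Dictionary<string, HistoryAggregateType>
        {
            ["AggregateFunction_Average"] = HistoryAggregateType.Average,
            ["AggregateFunction_Minimum"] = HistoryAggregateType.Minimum,
            ["AggregateFunction_Maximum"] = HistoryAggregateType.Maximum,
            ["AggregateFunction_Total"] = HistoryAggregateType.Total,
            ["AggregateFunction_Count"] = HistoryAggregateType.Count,
        };

    // Returns null for null input or any aggregate the driver does not expose,
    // leaving the caller to report BadAggregateNotSupported.
    public static HistoryAggregateType? MapAggregate(string? aggregateId)
    {
        if (aggregateId != null && Supported.TryGetValue(aggregateId, out var mapped))
        {
            return mapped;
        }
        return null;
    }
}
```

Returning a nullable value instead of throwing keeps the "unsupported" decision in the handler, which is where the status code is chosen.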

Tests:

  • DriverNodeManagerHistoryMappingTests — 12 unit cases pinning MapAggregate, BuildHistoryData, BuildHistoryEvent, ToDataValue.
  • HistoryReadIntegrationTests — 5 end-to-end cases drive a real OPC UA client (Session.HistoryRead) against a fake IHistoryProvider driver through the running stack. Covers raw round-trip, processed with Average aggregate, unsupported aggregate → BadAggregateNotSupported, at-time timestamp forwarding, and events field-list shape.

Deferred:

  • Continuation-point plumbing via Session.Save/RestoreHistoryContinuationPoint. Driver returns null continuations today so the pass-through is fine.
  • Per-SelectClause evaluation in HistoryReadEvents — clients that send a custom field selection currently get the standard BaseEventType layout.

2. Write-gating by role — DONE (PR 26)

Landed in PR 26. WriteAuthzPolicy in Server/Security/ maps SecurityClassification → required role (FreeAccess → no role required, Operate/SecuredWrite → WriteOperate, Tune → WriteTune, Configure/VerifiedWrite → WriteConfigure, ViewOnly → deny regardless). DriverNodeManager caches the classification per variable during discovery and checks the session's roles (via IRoleBearer) in OnWriteValue before calling IWritable.WriteAsync. Roles do not cascade: a session with WriteOperate can't write a Tune attribute unless it also carries WriteTune.
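The classification-to-role mapping can be sketched as a single switch. This is an illustration only: the enum members and role names come from the paragraph above, but `WriteAuthzSketch.IsAllowed` and its signature are hypothetical, not the WriteAuthzPolicy API.

```csharp
using System.Collections.Generic;

public enum SecurityClassification
{
    FreeAccess, Operate, SecuredWrite, Tune, Configure, VerifiedWrite, ViewOnly
}

public static class WriteAuthzSketch
{
    // Each classification names exactly one required role; holding a different
    // write role never implies it (no cascade).
    public static bool IsAllowed(SecurityClassification classification, ICollection<string> sessionRoles)
    {
        switch (classification)
        {
            case SecurityClassification.FreeAccess:
                return true;                                   // no role required
            case SecurityClassification.Operate:
            case SecurityClassification.SecuredWrite:
                return sessionRoles.Contains("WriteOperate");
            case SecurityClassification.Tune:
                return sessionRoles.Contains("WriteTune");
            case SecurityClassification.Configure:
            case SecurityClassification.VerifiedWrite:
                return sessionRoles.Contains("WriteConfigure");
            default:
                return false;                                  // ViewOnly and anything unknown: deny
        }
    }
}
```

For example, a session holding only WriteOperate is refused on a Tune attribute, and a session holding every write role is still refused on a ViewOnly attribute.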

See feedback_acl_at_server_layer.md in memory for the architectural directive that authz stays at the server layer and never delegates to driver-specific auth.

3. Admin UI client-cert trust management — DONE (PR 28)

PR 28 shipped /certificates in the Admin UI. CertTrustService reads the OPC UA server's PKI store root (OpcUaServerOptions.PkiStoreRoot — default %ProgramData%\OtOpcUa\pki) and lists rejected + trusted certs by parsing the .der files directly, so it has no Opc.Ua dependency and runs on any Admin host that can reach the shared PKI directory.

Operator actions: Trust (moves rejected/certs/*.der → trusted/certs/*.der), Delete rejected, Revoke trust. The OPC UA stack re-reads the trusted store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected client's connection after trusting.
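Because trust is expressed purely as file location, the Trust action reduces to a file move. A minimal sketch follows, assuming the rejected/certs and trusted/certs layout named above; `CertTrustSketch.Trust` is a hypothetical helper, not the CertTrustService API.

```csharp
using System.IO;

public static class CertTrustSketch
{
    // Moves one rejected certificate into the trusted store under the same PKI root
    // and returns the new path.
    public static string Trust(string pkiStoreRoot, string fileName)
    {
        // Keep only the file-name part so the caller cannot pass a directory path.
        var safeName = Path.GetFileName(fileName);

        var source = Path.Combine(pkiStoreRoot, "rejected", "certs", safeName);
        var targetDir = Path.Combine(pkiStoreRoot, "trusted", "certs");
        Directory.CreateDirectory(targetDir);   // no-op when the directory already exists

        var target = Path.Combine(targetDir, safeName);
        File.Move(source, target);              // throws if the target file already exists
        return target;
    }
}
```

Revoking trust would be the same move in the opposite direction, or a delete, depending on how the service treats revoked certificates; the source does not say which.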

Deferred: flipping AutoAcceptUntrustedClientCertificates to false as the deployment default. That's a production-hardening config change, not a code gap — the Admin UI is now ready to be the trust gate.

4. Live-LDAP integration test — DONE (PR 31)

PR 31 shipped Server.Tests/LdapUserAuthenticatorLiveTests.cs — 6 live-bind tests against the dev GLAuth instance at localhost:3893, skipped cleanly when the port is unreachable. Covers: valid bind, wrong password, unknown user, empty credentials, single-group → WriteOperate mapping, multi-group admin user surfacing all mapped roles.
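One way to implement the "skip when the port is unreachable" guard is a short TCP connect probe run before any bind attempt. This is a sketch under assumptions: `LdapReachabilitySketch` is a hypothetical name, and how the live tests actually detect reachability and signal the skip to the test runner is not stated in the source.

```csharp
using System;
using System.Net.Sockets;

public static class LdapReachabilitySketch
{
    // Returns true only when a TCP connection is established within the timeout.
    public static bool IsPortOpen(string host, int port, TimeSpan timeout)
    {
        try
        {
            using (var client = new TcpClient())
            {
                var connect = client.ConnectAsync(host, port);
                // Wait returns false on timeout and throws if the connect attempt faulted.
                return connect.Wait(timeout) && client.Connected;
            }
        }
        catch (Exception)
        {
            // Connection refused, host not found, and similar: treat as unreachable.
            return false;
        }
    }
}
```

A test could call `IsPortOpen("localhost", 3893, TimeSpan.FromSeconds(1))` first and skip when it returns false.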

Also added UserNameAttribute to LdapOptions (default uid for RFC 2307 compat) so Active Directory deployments can configure sAMAccountName / userPrincipalName without code changes. LdapUserAuthenticatorAdCompatTests (5 unit guards) pins the AD-shape DN parsing + filter escape behaviors. See docs/security.md §"Active Directory configuration" for the AD appsettings snippet.
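A configurable user-name attribute only stays safe if the user-supplied value is escaped before it enters the search filter. The sketch below applies the RFC 4515 escapes for filter values; the `(attribute=value)` filter shape and the names `LdapFilterSketch`, `Escape`, and `BuildUserFilter` are assumptions, since the source does not show the filter that LdapUserAuthenticator builds.

```csharp
using System.Text;

public static class LdapFilterSketch
{
    // Escapes the characters that are special inside an LDAP filter value (RFC 4515).
    public static string Escape(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (var c in value)
        {
            switch (c)
            {
                case '\\': sb.Append("\\5c"); break;
                case '*': sb.Append("\\2a"); break;
                case '(': sb.Append("\\28"); break;
                case ')': sb.Append("\\29"); break;
                case '\0': sb.Append("\\00"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // userNameAttribute comes from configuration (uid, sAMAccountName, userPrincipalName);
    // only the user-supplied name is escaped.
    public static string BuildUserFilter(string userNameAttribute, string userName)
    {
        return "(" + userNameAttribute + "=" + Escape(userName) + ")";
    }
}
```

With this shape, switching a deployment from uid to sAMAccountName changes only the configured attribute, and a login name such as `a*(b)` cannot widen the search.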

Deferred: asserting session.Identity end-to-end on the server side (i.e. drive a full OPC UA session with username/password, then read an IHostConnectivityProbe-style "whoami" node to verify the role surfaced). That needs a test-only address-space node and is a separate PR.

5. Full Galaxy live-service smoke test against the merged v2 stack — IN PROGRESS (PRs 36 + 37)

PR 36 shipped the prerequisites helper (AvevaPrerequisites) that probes every dependency a live smoke test needs and produces actionable skip messages.

PR 37 shipped the live-stack smoke test project structure: tests/Driver.Galaxy.Proxy.Tests/LiveStack/ with LiveStackFixture (connects to the already-running OtOpcUaGalaxyHost Windows service via named pipe; never spawns the Host process) and LiveStackSmokeTests covering:

  • Fixture initializes successfully (IPC handshake succeeds end-to-end).
  • Driver reports DriverState.Healthy post-handshake.
  • DiscoverAsync returns at least one variable from the live Galaxy.
  • GetHostStatuses reports at least one Platform/AppEngine host.
  • ReadAsync on a discovered variable round-trips through Proxy → Host pipe → MXAccess → back without a BadInternalError.

Shared secret + pipe name resolve from OTOPCUA_GALAXY_SECRET / OTOPCUA_GALAXY_PIPE env vars, falling back to reading the service's registry-stored Environment values (requires elevated test host).
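The resolution order can be sketched as "environment variable first, fallback second". In this illustration the fallback is a delegate so the example runs anywhere; in the fixture it would read the service's registry-stored Environment values. `GalaxySettingsSketch.Resolve` is a hypothetical helper name.

```csharp
using System;

public static class GalaxySettingsSketch
{
    // Returns the environment variable when it is set and non-empty;
    // otherwise defers to the supplied fallback lookup.
    public static string? Resolve(string variableName, Func<string, string?> fallback)
    {
        var value = Environment.GetEnvironmentVariable(variableName);
        return string.IsNullOrEmpty(value) ? fallback(variableName) : value;
    }
}
```

Keeping the fallback behind a delegate also means the elevated registry read only happens when the environment variable is absent.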

Remaining:

  • Install + run the OtOpcUaGalaxyHost + OtOpcUa services on the dev box (scripts/install/Install-Services.ps1) so the skip-on-unready tests actually execute and the smoke PR lands green.
  • Subscribe-and-receive-data-change fact (needs a known tag that actually ticks; deferred until operators confirm a scratch tag exists).
  • Write-and-roundtrip fact (needs a test-only UDA or agreed scratch tag so we can't accidentally mutate a process-critical value).

6. Second driver instance on the same server — DONE (PR 32)

Server.Tests/MultipleDriverInstancesIntegrationTests.cs registers two drivers with distinct DriverInstanceIds on one DriverHost, spins up the full OPC UA server, and asserts three behaviors: (1) each driver's namespace URI (urn:OtOpcUa:{id}) resolves to a distinct index in the client's NamespaceUris, (2) browsing one subtree returns that driver's folder and does NOT leak the other driver's folder, (3) reads route to the correct driver — the alpha instance returns 42 while beta returns 99, so a misroute would surface at the assertion layer.

Deferred: the alarm-event multi-driver parity case (two drivers each raising a GalaxyAlarmEvent, assert each condition lands on its owning instance's condition node). Alarm tracking already has its own integration test (AlarmSubscription*); the multi-driver alarm case would need a stub IAlarmSource that's worth its own focused PR.

7. Host-status per-AppEngine granularity → Admin UI dashboard — DONE (PRs 33 + 34)

PR 33 landed the data layer: DriverHostStatus entity + migration with composite key (NodeId, DriverInstanceId, HostName) and two query-supporting indexes (per-cluster drill-down on NodeId, stale-row detection on LastSeenUtc).

PR 34 wired the publisher + consumer. HostStatusPublisher is a BackgroundService in the Server process that walks every registered IHostConnectivityProbe-capable driver every 10s, calls GetHostStatuses(), and upserts rows (LastSeenUtc advances each tick; State + StateChangedUtc update on transitions). Admin UI /hosts page groups by cluster, shows four summary cards (Hosts / Running / Stale / Faulted), and flags rows whose LastSeenUtc is older than 30s as Stale so operators see crashed Servers without waiting for a state change.
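The stale rule and the four summary cards can be sketched as a pure function over the rows. Everything here is illustrative: `HostRow`, `HostSummary`, and the "Running" / "Faulted" state strings are hypothetical, and whether a stale row still counts toward Running on the real page is an assumption (this sketch excludes it).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record HostRow(string HostName, string State, DateTime LastSeenUtc);

public sealed record HostSummary(int Hosts, int Running, int Stale, int Faulted);

public static class HostStatusSketch
{
    // Three missed 10s publisher ticks: the 30s threshold described above.
    public static readonly TimeSpan StaleAfter = TimeSpan.FromSeconds(30);

    public static bool IsStale(HostRow row, DateTime nowUtc)
    {
        return nowUtc - row.LastSeenUtc > StaleAfter;
    }

    public static HostSummary Summarize(IReadOnlyCollection<HostRow> rows, DateTime nowUtc)
    {
        return new HostSummary(
            Hosts: rows.Count,
            // Assumption: a row that stopped heartbeating is not counted as Running.
            Running: rows.Count(r => r.State == "Running" && !IsStale(r, nowUtc)),
            Stale: rows.Count(r => IsStale(r, nowUtc)),
            Faulted: rows.Count(r => r.State == "Faulted"));
    }
}
```

Passing `nowUtc` in rather than reading the clock inside keeps the rule deterministic under test.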

Deferred as follow-ups:

  • Event-driven push (subscribe to OnHostStatusChanged per driver for sub-heartbeat latency). Adds DriverHost lifecycle-event plumbing; 10s polling is fine for operator-scale use.
  • Failure-count column — needs the publisher to track a transition history per host, not just current-state.
  • SignalR fan-out to the Admin page (currently the page polls the DB, not a hub). The DB-polled version is fine at current cadence but a hub push would eliminate the 10s race where a new row sits in the DB before the Admin page notices.