lmxopcua/docs/plans/opcuaclient-plan.md
Joseph Doherty 2d07d716dc Recover stashed driver-gaps work from pre-v2-mxgw-merge working tree
Captures uncommitted work that lived in the working tree on
v2-mxgw-integration but was orthogonal to the migration. Stashed
during the v2-mxgw merge to master (2026-04-30) and replanted here on
a feature branch off master so it's git-visible rather than living in
the stash list.

Two distinct buckets:

1. Tracked fixture/config refinements (10 files, ~36 lines):
   - scripts/e2e/test-opcuaclient.ps1
   - src/ZB.MOM.WW.OtOpcUa.Admin/appsettings.json
   - 5 docker-compose.yml under tests/.../IntegrationTests/Docker/
     (AbCip, Modbus, OpcUaClient, S7)
   - 4 fixture .cs files (AbServerFixture, ModbusSimulatorFixture,
     OpcPlcFixture, Snap7ServerFixture)

2. Untracked driver-gaps queue artifacts (~8000 lines):
   - docs/plans/{abcip,ablegacy,focas,opcuaclient,s7,twincat}-plan.md
     — per-driver gap plans
   - docs/featuregaps.md — cross-cutting analysis
   - docs/v2/focas-deployment.md, docs/v2/implementation/focas-simulator-plan.md
   - followup.md — auto/driver-gaps queue follow-ups
   - scripts/queue/ — PR-queue automation tooling (12 files including
     pr-manifest.yaml at 1473 lines)

This commit is a snapshot for recoverability — review and split into
focused PRs (or discard) before merging anywhere downstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:28:01 -04:00


OpcUaClient Driver — Implementation Plan

Source of gap analysis: featuregaps.md → OpcUaClient

Covers Build = Yes items only. Numbering matches the featuregaps Recommendations table.

Summary

The OpcUaClient driver already ships 8/8 capability interfaces and a working end-to-end Session/Subscription/MonitoredItem/HistoryRead pipeline backed by the OPC Foundation OPCFoundation.NetStandard.Opc.Ua.Client SDK. Most of the 14 Build = Yes gaps are operability or curation knobs — config surface + plumbing into existing SDK calls — rather than new protocol implementation. A small number need genuinely new SDK plumbing (Reverse Connect, ModelChangeEvent subscribe) and one (ReadEventsAsync) needs a coordinated cross-driver interface change.

The plan groups the work into five phases, ordered to deliver per-tag / per-subscription operability first (highest-frequency operator pain), then curation, then change tracking, then connectivity, then historical+HA. Each PR sticks to one feature-gap row so reviews stay narrow.

Phased delivery

| Phase | Theme | Gaps | PRs | Notes |
|---|---|---|---|---|
| 1 | Operability knobs | #5, #6, #15, #17, #20 | 5 | Pure SDK config surface; no new wire flows |
| 2 | Discovery & curation | #2, #7, #8, #9 | 4 | Touches ITagDiscovery + adds method invoke |
| 3 | Change tracking | #10 | 1 | New session-level subscription on Server node |
| 4 | Connectivity | #1 | 1 | Reverse Connect — new listener path |
| 5 | Historical & redundancy | #12, #13, #14 | 3 | Includes the cross-driver IHistoryProvider change |

Total: 14 PRs across 5 phases. Phases 1-3 land independently against the existing single-session model. Phase 4 ships in parallel with phases 2-3 since it doesn't touch OpcUaClientDriver proper. Phase 5's first PR is a prerequisite for the ReadEventsAsync work in every other history-capable driver and must coordinate with them.

Per-PR detail

Phase 1 — Operability knobs

PR-1: Per-subscription tuning (gap #6)

Goal: lift the hard-coded KeepAliveCount=10, LifetimeCount=1000, MaxNotificationsPerPublish=0, Priority=0, PublishingInterval floor of 50 ms into OpcUaClientDriverOptions so high-event-rate servers can be defended against (MaxNotificationsPerPublish=0 is unlimited — the documented DoS surface) and high-tag-count deployments can split by priority.

SDK API:

  • Subscription.SetPublishingMode(bool, ct) for runtime enable/disable
  • SubscriptionOptions.PublishingInterval / KeepAliveCount / LifetimeCount / MaxNotificationsPerPublish / Priority set at create-time
  • New options class OpcUaSubscriptionDefaults (publish interval floor, keep-alive count, lifetime count, max notifications, priority)

Files:

  • src/.../OpcUaClient/OpcUaClientDriverOptions.cs — add Subscriptions sub-section
  • src/.../OpcUaClient/OpcUaClientDriver.cs: SubscribeAsync reads from options
  • src/.../OpcUaClient/OpcUaClientDriver.cs: SubscribeAlarmsAsync reuses the same defaults but with Priority=1 (higher than the data subscriptions' default of 0) so alarms aren't starved during data bursts

Tests: OpcUaClientSubscribeAndProbeTests — assert options propagate; add a stress unit test (mocked Subscription) that asserts custom MaxNotificationsPerPublish is forwarded so a value > 0 actually reaches the SDK.

Risks: Setting LifetimeCount too low against a server with publish-throttling can drop subscriptions; document the formula (LifetimeCount >= 3 * KeepAliveCount).
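The LifetimeCount rule above could be enforced when the options block is loaded. A minimal sketch, assuming a hypothetical `SubscriptionDefaults` shape whose fields mirror the knobs listed for this PR; the class, field names and messages are illustrative and are not the planned OpcUaSubscriptionDefaults type:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriptionDefaults:
    """Hypothetical stand-in for the planned Subscriptions options block."""
    publishing_interval_ms: int = 50
    keep_alive_count: int = 10
    lifetime_count: int = 1000
    max_notifications_per_publish: int = 0  # 0 means unlimited
    priority: int = 0


def validate(defaults: SubscriptionDefaults) -> list[str]:
    """Return human-readable problems; an empty list means the block is usable."""
    problems = []
    if defaults.keep_alive_count < 1:
        problems.append("KeepAliveCount must be at least 1")
    if defaults.lifetime_count < 3 * defaults.keep_alive_count:
        problems.append(
            f"LifetimeCount ({defaults.lifetime_count}) must be >= "
            f"3 * KeepAliveCount ({3 * defaults.keep_alive_count})"
        )
    if defaults.max_notifications_per_publish < 0:
        problems.append("MaxNotificationsPerPublish must be >= 0")
    return problems
```

Returning a list rather than raising on the first problem lets a config loader report every issue in one pass.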

Docs / fixture / e2e: new "Subscription tuning" subsection in docs/drivers/OpcUaClient.md (create if missing) documenting the Subscriptions options block with the LifetimeCount >= 3 * KeepAliveCount formula; cross-link from the "Advanced options" section of docs/Client.CLI.md so CLI users discover the knobs. Fixture: opc-plc already publishes fast tickers (FastUInt1 @ 100 ms) sufficient for coverage — no fixture-side change. Integration test in tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/ asserting custom KeepAliveCount / Priority reach the wire (capture via OpcPlcFixture keepalive count). E2E: extend scripts/e2e/test-opcuaclient.ps1 with a stage that sets a non-default publish interval and confirms the local subscription honours it.


PR-2: Per-tag advanced subscription tuning incl. deadband (gap #5)

Goal: surface SamplingInterval, QueueSize, DiscardOldest, MonitoringMode, and DataChangeFilter (DeadbandType=Absolute/Percent + Trigger=Status/StatusValue/StatusValueTimestamp) per-tag. Deadband is the baseline analog noise filter every commercial UA aggregator ships and the single feature most likely to cut bandwidth on busy plants.

SDK API:

  • MonitoredItem.Filter = new DataChangeFilter { Trigger = DataChangeTrigger.StatusValue, DeadbandType = (uint)DeadbandType.Absolute, DeadbandValue = 0.5 }
  • MonitoredItemOptions.QueueSize / DiscardOldest / SamplingInterval / MonitoringMode
  • Per-tag override structure: extend the SubscribeAsync parameter shape (or add an overload accepting a IReadOnlyList<MonitoredTagSpec>) — note this requires coordinating with ISubscribable so the per-tag carrier reaches the driver.

Files:

  • src/.../Core.Abstractions/ISubscribable.cs — add overload SubscribeAsync(IReadOnlyList<MonitoredTagSpec>, ...) keeping old API for source compat
  • src/.../OpcUaClient/OpcUaClientDriver.cs — translate spec → SDK filter

Tests: assert DataChangeFilter lands on the MonitoredItem.Filter for each kind of trigger; assert PercentDeadband requires server-side EURange (server returns BadFilterNotAllowed if not configured) — capture the StatusCode and surface as a usable error.
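For the docs' worked examples, the difference between the two deadband kinds can be shown with a small sketch of the comparison the upstream server is expected to perform. The function name and the string-typed `deadband_type` are illustrative assumptions; the real evaluation happens server-side, not in the driver:

```python
def exceeds_deadband(last_reported, current, deadband_type, deadband_value, eu_range=None):
    """Decide whether a new sample differs enough from the last reported one.

    deadband_type: "Absolute" or "Percent".
    eu_range: (low, high) tuple; required for "Percent".
    """
    delta = abs(last_reported - current)
    if deadband_type == "Absolute":
        # Absolute: the change itself must exceed the configured value.
        return delta > deadband_value
    if deadband_type == "Percent":
        if eu_range is None:
            raise ValueError("Percent deadband needs an EURange on the node")
        low, high = eu_range
        # Percent: the threshold is a percentage of the engineering-unit span.
        return delta > (deadband_value / 100.0) * (high - low)
    raise ValueError(f"unknown deadband type: {deadband_type}")
```

With an EURange of 0 to 100 and a 2 percent deadband the threshold is 2.0 units, so a move from 50.0 to 51.0 is suppressed while a move to 53.0 is reported.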

Risks: cross-cutting ISubscribable change. Mitigation: ship the overload as additive — existing single-arg path still exists.

Docs / fixture / e2e: new "Per-tag deadband and monitoring filters" section in docs/drivers/OpcUaClient.md (create if missing) with worked examples of Absolute vs Percent deadband + the EURange prerequisite; update docs/Client.CLI.md subscribe command page with the new tag-config syntax for --deadband / --queue-size / --discard-oldest; update docs/Client.UI.md Subscriptions tab section to mirror. Fixture: OpcPlcFixture / OpcPlcProfile seeds an analog (StepUp already oscillates) and confirms EURange is published — extend the profile to flag noisy nodes. Integration test in tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/ asserts publish suppression below the deadband threshold. E2E: add a -DeadbandValue stage to scripts/e2e/test-opcuaclient.ps1 (and a deadband knob to scripts/e2e/e2e-config.sample.json) that subscribes and asserts no spurious updates within the band.


PR-3: Honor server OperationLimits (gap #15)

Goal: read Server.ServerCapabilities.OperationLimits.MaxNodesPerRead / Write / Browse / HistoryReadData once after Session activation, cache, and chunk batch operations to those caps client-side. Today the SDK chunks on its internal default; against an undersized embedded UA server this results in BadTooManyOperations.

SDK API:

  • After session open: Session.ReadAsync of VariableIds.Server_ServerCapabilities_OperationLimits_MaxNodesPerRead + sibling NodeIds. The SDK exposes Session.OperationLimits after FetchOperationLimits is called — prefer that path.
  • Session.FetchOperationLimitsAsync(ct) (1.5+); fallback: explicit Read.

Files:

  • src/.../OpcUaClient/OpcUaClientDriver.cs — call FetchOperationLimitsAsync post-OpenSessionOnEndpointAsync; honour caps in ReadAsync, WriteAsync, BrowseRecursiveAsync, EnrichAndRegisterVariablesAsync, ExecuteHistoryReadAsync.

Tests: mock Session.OperationLimits to a value below the test batch size and assert the driver issues N wire calls instead of one.

Risks: a zero on the server means "no limit" per Part 5 — don't divide by zero.
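The chunking rule, including the zero-means-unlimited case, can be sketched independently of the SDK. `chunk_by_limit` is a hypothetical helper name, not an existing driver method:

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunk_by_limit(items: Sequence[T], server_limit: int) -> Iterator[Sequence[T]]:
    """Yield slices no larger than server_limit; 0 means the server set no limit."""
    if server_limit < 0:
        raise ValueError("operation limits are unsigned")
    if server_limit == 0 or len(items) <= server_limit:
        # Unlimited, or the batch already fits: one wire call (none if empty).
        if items:
            yield items
        return
    for start in range(0, len(items), server_limit):
        yield items[start:start + server_limit]
```

Treating 0 as a special case before any arithmetic avoids the divide-by-zero named in the risk and keeps the single-call path unchanged for servers that report no cap.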

Docs / fixture / e2e: new "Server OperationLimits handling" subsection in docs/drivers/OpcUaClient.md documenting the auto-fetch behaviour, the zero-means-unlimited semantics, and how to override via options if the server reports an under-truthful value. Fixture: opc-plc publishes the standard ServerCapabilities tree out of the box — no container-side change; the OpcPlcFixture seed validates the IDs at collection init. Integration test asserts batch reads chunk to the fetched cap. No e2e change needed (the script's batch sizes are already small).


PR-4: Diagnostics counters (gap #17)

Goal: expose per-driver counters on DriverHealth (or a sibling DriverDiagnostics surface): publish-request count, notifications-per-second EWMA, missing-publish-request count, dropped-notification rate, session resets count. Operators currently see only LastSuccessfulRead + last error.

SDK API:

  • Subscription.Notification event fires per published notification — bump a counter
  • Subscription.PublishStateChanged event for missed-publish detection
  • Session.PublishError event for channel-level errors
  • Session.SessionClosing/SessionConfigurationChanged for session-reset attribution

Files:

  • src/.../OpcUaClient/OpcUaClientDriver.cs — instrument hooks; expose via IDriver.GetDiagnostics() or extend DriverHealth
  • src/.../Core.Abstractions/IDriver.cs — confirm where the counter shape lives; if DriverHealth is too rigid, add IDriverDiagnostics (mirrors the Modbus driver-diagnostics RPC pattern from #154)

Tests: synthetic notification fan-out → assert counters increment; session close → assert reset count bumps.

Risks: counters need to be lock-free hot-path safe; use Interlocked.Increment and a single sliding-window clock per counter.
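One way to compute the notifications-per-second EWMA is a time-aware exponential decay. The sketch below is single-threaded and takes the clock as a parameter so it stays deterministic; the class name and the 5 s half-life are assumptions, and a hot-path version would still need the lock-free treatment described in the risk:

```python
import math


class RateEwma:
    """Exponentially weighted notifications-per-second estimate."""

    def __init__(self, half_life_s: float = 5.0):
        self._decay = math.log(2.0) / half_life_s
        self._last_t = None
        self._pending = 0
        self.rate = 0.0   # smoothed notifications per second
        self.total = 0    # monotonically increasing raw count

    def record(self, count: int, now_s: float) -> None:
        self.total += count
        if self._last_t is None:
            # The first batch only starts the clock; there is no interval yet.
            self._last_t = now_s
            return
        self._pending += count
        dt = now_s - self._last_t
        if dt <= 0:
            return  # same timestamp: fold into the next interval
        alpha = 1.0 - math.exp(-self._decay * dt)
        self.rate += alpha * (self._pending / dt - self.rate)
        self._pending = 0
        self._last_t = now_s
```

Scaling `alpha` by the elapsed time keeps the estimate stable when publish responses arrive at irregular intervals.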

Docs / fixture / e2e: new "Driver diagnostics" section in docs/drivers/OpcUaClient.md enumerating each counter and the event that bumps it; cross-link to the driver-diagnostics Admin RPC documented for Modbus (#154 pattern). Fixture: no opc-plc change required. Integration test exercises IDriverDiagnostics after forcing a session close. E2E: extend scripts/e2e/test-opcuaclient.ps1 with a "diagnostics snapshot" stage that asserts publish/notification counters are non-zero after the subscribe stage.


PR-5: CRL / revocation handling (gap #20)

Goal: explicit revoked-cert handling in CertificateValidator plus a RejectSHA1SignedCertificates knob. Today the validator hooks BadCertificateUntrusted only — a revoked cert silently fails as "untrusted" with no operator-visible distinction.

SDK API:

  • CertificateValidator.CertificateValidation event — inspect e.Error.StatusCode for BadCertificateRevoked, BadCertificateRevocationUnknown, BadCertificateIssuerRevocationUnknown, BadCertificatePolicyCheckFailed
  • SecurityConfiguration.RejectSHA1SignedCertificates, SecurityConfiguration.RejectUnknownRevocationStatus, SecurityConfiguration.MinimumCertificateKeySize — direct config bool/int knobs already on the SDK type
  • CertificateTrustList.AddCRL / per-store CRL directories under %LocalAppData%\OtOpcUa\pki\{trusted,issuers}\crl\

Files:

  • src/.../OpcUaClient/OpcUaClientDriver.cs: BuildApplicationConfigurationAsync honours new options; the validator handler distinguishes revoked vs untrusted in the surfaced error message
  • src/.../OpcUaClient/OpcUaClientDriverOptions.cs — add RejectSHA1SignedCertificates, RejectUnknownRevocationStatus, MinimumCertificateKeySize

Tests: feed a SHA1-signed test cert and a revoked cert through the validator with the new knobs on/off.
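The revoked-versus-untrusted distinction could be expressed as a small classification table. The status code names below come from the SDK API list above; the message wording, and the choice to accept with a warning when RejectUnknownRevocationStatus is off, are illustrative assumptions:

```python
# Status code names are taken from the PR-5 bullet list; everything else
# in this sketch is hypothetical.
REVOKED = {"BadCertificateRevoked"}
REVOCATION_UNKNOWN = {
    "BadCertificateRevocationUnknown",
    "BadCertificateIssuerRevocationUnknown",
}


def classify_cert_failure(status_name: str, reject_unknown_revocation: bool) -> tuple[bool, str]:
    """Return (accept, operator_message) for a validation failure."""
    if status_name in REVOKED:
        return False, "certificate has been revoked (listed in a CRL)"
    if status_name in REVOCATION_UNKNOWN:
        if reject_unknown_revocation:
            return False, "revocation status could not be determined; rejected by policy"
        return True, "revocation status could not be determined; accepted with a warning"
    if status_name == "BadCertificateUntrusted":
        return False, "certificate is not in the trusted store"
    return False, f"certificate rejected: {status_name}"
```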

Risks: PKI directory layout changes — existing deployments need a migration note.

Docs / fixture / e2e: new "Certificate revocation and SHA1 rejection" subsection in docs/drivers/OpcUaClient.md documenting the CRL directory layout under %LocalAppData%\OtOpcUa\pki\{trusted,issuers}\crl\ and the new options (with a migration note for existing PKI stores); cross-link from docs/security.md. Fixture: extend OpcPlcFixture / Docker/docker-compose.yml with an optional secured endpoint variant and a SHA1-signed test cert checked into the test project's resources for the validator unit test. Integration test exercises a revoked cert via a local CRL drop. E2E: add a -Insecure:$false smoke stage to scripts/e2e/test-opcuaclient.ps1 that asserts a revoked cert produces a distinguishable error message.


Phase 2 — Discovery & curation

PR-6: Discovery URL FindServers (gap #2)

Goal: accept a discovery URL (opc.tcp://host:4840 pointing at the LDS or the server's own discovery endpoint) and surface advertised servers + endpoints to the operator without manual policy/mode tuple copy.

SDK API:

  • DiscoveryClient.CreateAsync(appConfig, new Uri(url), DiagnosticsMasks.None, ct)
  • DiscoveryClient.FindServersAsync(null, ct) → ApplicationDescription[]
  • DiscoveryClient.GetEndpointsAsync(null, ct) per advertised DiscoveryUrl

Files:

  • src/.../OpcUaClient/OpcUaClientDriver.cs — new internal DiscoverServersAsync helper; extend the Admin-side discovery RPC to invoke it (driver-diagnostics pattern from #154)
  • src/.../OpcUaClient/OpcUaClientDriverOptions.cs — add DiscoveryUrl knob (alternative to explicit EndpointUrls — when set the driver runs FindServers at init and feeds the result into the failover candidate list)

Tests: mock DiscoveryClient returning two advertised servers each with three endpoints; assert the candidate list reflects the policy/mode filter applied client-side.
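The client-side policy/mode filter exercised by that test might look like the following sketch. `Endpoint` and `build_candidates` are hypothetical names, and the string-typed policy and mode stand in for the SDK's own types:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    url: str
    security_policy: str  # e.g. "None" or "Basic256Sha256"
    security_mode: str    # "None", "Sign" or "SignAndEncrypt"


def build_candidates(servers: dict[str, list[Endpoint]], policy: str, mode: str) -> list[str]:
    """Flatten advertised endpoints, keeping only the configured policy/mode.

    `servers` maps an application URI to the endpoints it advertises.
    Order is preserved and duplicate URLs are dropped.
    """
    candidates: list[str] = []
    for endpoints in servers.values():
        for ep in endpoints:
            matches = ep.security_policy == policy and ep.security_mode == mode
            if matches and ep.url not in candidates:
                candidates.append(ep.url)
    return candidates
```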

Risks: FindServers itself usually requires SecurityMode=None — spec out in the doc that the discovery channel is unsecured even when the data channel will be encrypted.

Docs / fixture / e2e: new "Discovery URL (FindServers)" section in docs/drivers/OpcUaClient.md with the unsecured-discovery-vs-secured-data caveat called out; cross-link from docs/Client.CLI.md if a discover CLI command surfaces. Fixture: opc-plc already responds to FindServers on the same endpoint — OpcPlcFixture adds a discovery probe at collection init. Integration test exercises the helper against the live opc-plc container and asserts at least one ApplicationDescription returned. E2E: replace the hard-coded -RemoteUrl stage in scripts/e2e/test-opcuaclient.ps1 with an optional -DiscoveryUrl mode that picks the first advertised endpoint.


PR-7: Selective import / namespace remap (gap #7)

Goal: per-branch include/exclude rules, namespace-URI remapping, and re-keyed BrowseNames — the curation surface every commercial aggregator ships.

Approach: extend OpcUaClientDriverOptions with a Curation section:

  • IncludePaths: string[] — glob or NodeId-rooted prefix list; only paths matching are imported
  • ExcludePaths: string[] — wins over Include (Include is allow-list, Exclude is block-list)
  • NamespaceRemap: Dictionary<string,string> — upstream NS URI → local-side alias for BrowseName generation
  • RootAlias: string — default "Remote"; replaces the hardcoded folder name today

SDK API — none new; this is pure local filtering inside BrowseRecursiveAsync and EnrichAndRegisterVariablesAsync.

Files:

  • src/.../OpcUaClient/OpcUaClientDriverOptions.cs
  • src/.../OpcUaClient/OpcUaClientDriver.cs: BrowseRecursiveAsync consults the rule set; helper MapNamespaceForBrowseName handles NS remap

Tests: synthetic browse tree, exercise include/exclude/remap each independently and combined; verify the cap accounting in MaxDiscoveredNodes excludes filtered nodes.

Risks: glob semantics — pin to a small subset (*, ? only — no character classes or **) to keep the doc + behaviour simple.
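A sketch of the restricted glob subset and the exclude-wins rule follows. Two points are assumptions not settled by the plan: `*` is allowed to match across `/` separators, and an empty include list imports everything. Function names are hypothetical:

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern that supports only '*' and '?' into a regex.

    Every other character, including '[' and ']', is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_imported(path: str, include: list[str], exclude: list[str]) -> bool:
    """Exclude wins over include; an empty include list allows everything."""
    if any(glob_to_regex(p).fullmatch(path) for p in exclude):
        return False
    if not include:
        return True
    return any(glob_to_regex(p).fullmatch(path) for p in include)
```

Hand-translating the pattern, rather than reusing a general glob library, is what keeps character classes out of the supported syntax.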

Docs / fixture / e2e: new "Curation: include/exclude and namespace remap" section in docs/drivers/OpcUaClient.md with worked examples of each rule kind and the supported glob subset; update docs/drivers/OpcUaClient-Test-Fixture.md "Coverage map" with the new filtering rows. Fixture: extend OpcPlcProfile to enumerate which upstream namespaces are exercised so curation tests can target them. Integration test seeds an Include + Exclude + Remap rule and asserts the local tree reflects the filter. E2E: add a -IncludePath / -NamespaceRemap set of params to scripts/e2e/test-opcuaclient.ps1 that asserts the local browse depth matches the rule.


PR-8: Type definition mirroring (gap #8)

Goal: walk the upstream Types folder (ObjectTypes, VariableTypes, DataTypes, ReferenceTypes) and project them into the local address space so downstream UI clients keep type-aware rendering and structured DataTypes decode correctly.

SDK API:

  • Session.NodeCache.FetchNode(typeNodeId) for type metadata
  • Session.LoadDataTypeSystem — for structured DataType encoding
  • Session.FetchTypeTree(NodeIdCollection) — populates the session's type cache from the server

Files:

  • src/.../OpcUaClient/OpcUaClientDriver.cs — new pass-3 in DiscoverAsync that walks i=86 (Types folder) under the curation rules, registers a parallel type subtree, and links variables to their TypeDefinition via HasTypeDefinition references on the address-space builder
  • src/.../Core.Abstractions/IAddressSpaceBuilder.cs — confirm whether the builder accepts type nodes; if not, extend it (this likely is a prerequisite — if so, it gets its own preceding PR-8a)

Tests: mock browse returning BaseObjectType -> DerivedThing; assert local builder receives the type node + the HasTypeDefinition link.

Risks: significant. Type mirroring touches IAddressSpaceBuilder which is a cross-cutting interface every driver depends on. If IAddressSpaceBuilder already supports type nodes (Galaxy has type-like templates), reuse that surface; otherwise this PR splits.

Docs / fixture / e2e: new "Type mirroring" section in docs/drivers/OpcUaClient.md documenting which type nodes get walked and how downstream UA clients see the HasTypeDefinition references; also note in docs/Client.UI.md that the Browse tree now shows mirrored types. Fixture: opc-plc already exposes the standard Types folder; extend OpcPlcProfile to assert at least one custom ObjectType is present. Integration test browses the local Types folder post-discovery and asserts the upstream type chain landed. No e2e change needed beyond extending the existing browse stage to walk under Types.


PR-9: Method node mirroring + Call passthrough (gap #9)

Goal: discover NodeClass.Method nodes in the browse pass, expose them on the local address space, and forward Call invocations as Session.CallAsync against the upstream node. The driver already calls AcknowledgeableConditionType.Acknowledge for A&C — generalize that path.

SDK API:

  • Session.CallAsync(requestHeader, methodsToCall: CallMethodRequestCollection, ct) returning CallMethodResultCollection
  • Browse already covers Method nodes by lifting the NodeClassMask; need to additionally browse HasProperty to discover InputArguments / OutputArguments for argument translation

Files:

  • src/.../Core.Abstractions/IDriver.cs — add IMethodInvoker capability interface (this is a NEW capability, not a tweak to an existing one)
  • src/.../OpcUaClient/OpcUaClientDriver.cs — implement IMethodInvoker.InvokeAsync(string objectId, string methodId, IReadOnlyList<object?> inputs, ct); refactor AcknowledgeAsync to reuse the common path
  • src/.../Server/... node-manager — wire IMethodInvoker to the OPC UA server's MethodNode.OnCallMethod hook so downstream Call requests reach the driver

Tests: mock Session.CallAsync returning Good + an output collection; assert pass-through fidelity. Also assert per-argument BadInvalidArgument codes pass through.

Risks: high — adds a new capability interface. Other drivers that could support methods (Galaxy via OnExecute scripts, FOCAS via FOCAS commands) gain a clean extension point but each is its own follow-up.

Docs / fixture / e2e: new "Method nodes and Call passthrough" section in docs/drivers/OpcUaClient.md explaining how method calls flow through the aggregator (input/output argument translation, error-code passthrough); add a call command page to docs/Client.CLI.md covering the new path; mirror in docs/Client.UI.md if a UI surface ships. Fixture: opc-plc already exposes the standard Server.GetMonitoredItems method — OpcPlcFixture registers it as the canonical method-call target. Integration test in tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/ invokes Server.GetMonitoredItems through the aggregator. E2E: add a -MethodNodeId stage to scripts/e2e/test-opcuaclient.ps1 that calls the method through the local server and asserts the output matches the direct upstream call.


Phase 3 — Change tracking

PR-10: Auto re-import on ModelChangeEvent (gap #10)

Goal: subscribe to BaseModelChangeEventType / GeneralModelChangeEventType on the upstream server's i=2253 Server node so when the upstream topology changes (new tag added, type modified) the driver triggers a ReinitializeAsync-style re-import without operator action.

SDK API:

  • A second Subscription on the Session, monitoring Server node (ObjectIds.Server) with an EventFilter whose SelectClauses reference BaseModelChangeEventType and (optionally) GeneralModelChangeEventType Changes property
  • On notification: enqueue a debounced re-discover (don't react to every event during a bulk topology edit — coalesce 2-5s window)

Files:

  • src/.../OpcUaClient/OpcUaClientDriver.cs — add _modelChangeSubscription field; new SubscribeModelChangesAsync invoked at the end of InitializeAsync; debounce timer that calls ReinitializeAsync on the driver host
  • src/.../OpcUaClient/OpcUaClientDriverOptions.cs — add WatchModelChanges: bool (default true) + ModelChangeDebounce: TimeSpan (default 5s)

Tests: synthetic event injection on the mock Session's notification stream; assert one debounced re-import call regardless of N events arriving in the window.
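The coalescing behaviour that test asserts can be modelled with a trailing-edge debouncer driven by an explicit clock. This is a sketch under the assumption that the window restarts on every event; the class name is hypothetical, and a real implementation would likely also cap the total wait so a never-ending burst still triggers a re-import:

```python
class Debouncer:
    """Fire once after `window_s` seconds of quiet following the last event."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._deadline = None
        self.fired = 0

    def on_event(self, now_s: float) -> None:
        # Each event pushes the deadline out, so a burst collapses to one firing.
        self._deadline = now_s + self.window_s

    def poll(self, now_s: float) -> bool:
        """Return True exactly once when the quiet window has elapsed."""
        if self._deadline is not None and now_s >= self._deadline:
            self._deadline = None
            self.fired += 1
            return True
        return False
```

Passing the time in as an argument makes the N-events-one-reimport assertion straightforward to write without real timers.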

Risks: re-import while a downstream client is mid-browse — needs serialization on _gate like the rest of the driver; document that clients see a brief gap in the address space during reload.

Docs / fixture / e2e: new "Auto re-import on ModelChangeEvent" section in docs/drivers/OpcUaClient.md documenting the debounce window, the _gate serialization, and the brief browse-gap during reload. Fixture: opc-plc supports runtime topology mutation via the addnode/addtag HTTP control endpoint — extend OpcPlcFixture with a helper that triggers a model change. Integration test asserts a single re-import call after a burst of synthetic model change events. E2E: add a "topology change" stage to scripts/e2e/test-opcuaclient.ps1 that calls the opc-plc control endpoint, then asserts the local server reflects the new node within the debounce window.


Phase 4 — Connectivity

PR-11: Reverse Connect (gap #1)

Goal: support server-initiated client connect for OT-DMZ outbound-only firewalls. The upstream server connects to us on a TCP listener; we respond as the client. Hard requirement for many regulated plant networks.

SDK API:

  • Opc.Ua.Client.ReverseConnectManager — manages a TCP listener on the configured port and dispatches incoming reverse-connect requests
  • ReverseConnectManager.AddEndpoint(Uri reverseEndpoint) — listener URI e.g. opc.tcp://0.0.0.0:4844
  • ReverseConnectManager.WaitForConnection(endpointUrl, serverUri, ct) — blocks until the configured server initiates a reverse connect
  • Session.Create(appConfig, reverseConnection, endpoint, ...) — alternative session-create overload accepting the ITransportWaitingConnection returned by the manager

Files:

  • src/.../OpcUaClient/OpcUaClientDriverOptions.cs — add ReverseConnect: { Enabled, ListenerUrl, ExpectedServerUri } section
  • src/.../OpcUaClient/OpcUaClientDriver.cs — when reverse-connect is enabled, replace the failover sweep with WaitForConnection and fall through into the same session-create path
  • New helper ReverseConnectListener — owns the manager lifecycle, one listener per driver-host process (singleton across instances if multiple reverse-connect drivers are configured)

Tests: spin up a ReverseConnectClient test against an opc-plc container started with --rc opc.tcp://host:4844 to verify end-to-end. Unit tests mock ITransportWaitingConnection.

Risks: highest of the plan. Reverse Connect changes the listen-vs-dial direction; if multiple OpcUaClient driver instances both listen on the same port the manager must multiplex. opc-plc supports reverse connect (--rc flag) so the integration test pattern from docs/drivers/OpcUaClient-Test-Fixture.md extends cleanly.
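The multiplexing concern can be illustrated with a registry that routes each incoming reverse connection to the driver instance expecting that server URI. This is a hypothetical model of the shared-listener idea, not the SDK's ReverseConnectManager; all names are illustrative:

```python
class ReverseConnectRegistry:
    """Route incoming reverse connections by the server URI they announce."""

    def __init__(self):
        self._waiters: dict[str, str] = {}  # expected server URI -> driver instance name

    def register(self, expected_server_uri: str, driver_name: str) -> None:
        if expected_server_uri in self._waiters:
            raise ValueError(f"{expected_server_uri} is already claimed by "
                             f"{self._waiters[expected_server_uri]}")
        self._waiters[expected_server_uri] = driver_name

    def unregister(self, expected_server_uri: str) -> None:
        self._waiters.pop(expected_server_uri, None)

    def dispatch(self, incoming_server_uri: str):
        """Return the driver that should take this connection, or None to reject it."""
        return self._waiters.get(incoming_server_uri)
```

Rejecting a duplicate registration up front surfaces a misconfiguration (two driver instances expecting the same upstream server) at init rather than at connect time.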

Docs / fixture / e2e: new "Reverse Connect" section in docs/drivers/OpcUaClient.md (create if missing) documenting the listener URL config, the OT-DMZ outbound-only use case, and the shared-listener singleton model; update docs/drivers/OpcUaClient-Test-Fixture.md with the new "Reverse Connect coverage" row. Fixture: extend Docker/docker-compose.yml with an opc-plc-rc service variant that adds --rc opc.tcp://host.docker.internal:4844; OpcPlcFixture gains a [CollectionDefinition] that wires up the reverse-connect listener on the test side. Integration test asserts a session opens via the reverse path. E2E: add a -ReverseConnect switch to scripts/e2e/test-opcuaclient.ps1 that flips the driver to listener mode and verifies the bridge stage still passes.


Phase 5 — Historical & redundancy

PR-12: IHistoryProvider.ReadEventsAsync interface fix + driver impl (gap #12)

Goal: extend IHistoryProvider.ReadEventsAsync to carry an EventFilter SelectClauses parameter so HistoryRead Events can return the right field projection, and implement the OPC UA Client passthrough.

This is a cross-driver concern. IHistoryProvider lives in Core.Abstractions and every driver that opts into history (Galaxy, OpcUaClient, plus any future historian-backed Tier-A driver) inherits the default. Changing the signature is source-breaking — coordinate as one PR that:

  1. Adds the IReadOnlyList<EventFieldProjection> (or equivalent abstract EventFilterSpec) parameter
  2. Updates Galaxy's existing override (currently the only override) to honour the projection (best-effort — the Galaxy A&E log has a fixed field set so most projections degrade to the default columns)
  3. Lands the OpcUaClient passthrough using Session.HistoryReadAsync with ReadEventDetails

SDK API:

  • ReadEventDetails { StartTime, EndTime, NumValuesPerNode, Filter }
  • Session.HistoryReadAsync is already the call we use for Raw — pass new ExtensionObject(new ReadEventDetails { ... }) for events
  • HistoryEvent.Events: HistoryEventFieldList[] — unwrap into HistoricalEvent records
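Unwrapping the returned field lists into records relies on the values arriving in the same order as the requested SelectClauses. A minimal sketch, with plain dictionaries standing in for the HistoricalEvent record and `project_events` as a hypothetical helper name:

```python
def project_events(select_fields: list[str], field_lists: list[list]) -> list[dict]:
    """Pair each returned value list with the requested field names, in order."""
    records = []
    for values in field_lists:
        if len(values) != len(select_fields):
            raise ValueError(
                "server returned a field list that does not match the SelectClauses"
            )
        records.append(dict(zip(select_fields, values)))
    return records
```

Checking the lengths before zipping avoids silently truncating a record when a server returns fewer fields than were requested.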

Files:

  • src/.../Core.Abstractions/IHistoryProvider.cs — interface change
  • src/.../Driver.Galaxy.../*HistoryProvider*.cs — adjust signature
  • src/.../OpcUaClient/OpcUaClientDriver.cs — implement ReadEventsAsync; reuse ExecuteHistoryReadAsync shape
  • Server-side history facade — propagate the new parameter

Tests: integration test against opc-plc with --alm (alarm sim already enabled per the fixture doc) — verify the SelectClause projection comes back correctly.

Risks: the cross-driver interface change is the riskiest single ergonomic call in this plan. If we can't fit the new parameter without breaking every driver's IHistoryProvider impl, fall back to a sibling IEventHistoryProvider interface and only the OPC UA Client + Galaxy implement it. Decide this in the PR review.

Docs / fixture / e2e: new "HistoryRead Events" section in docs/drivers/OpcUaClient.md documenting the EventFilter-aware passthrough; update docs/Client.CLI.md historyread page to cover event-mode reads. Cross-driver doc updates (this PR adds an "IHistoryProvider.ReadEventsAsync signature change — see docs/plans/opcuaclient-plan.md PR-12" note to every other driver plan that has a history surface): docs/plans/abcip-plan.md, docs/plans/ablegacy-plan.md, docs/plans/focas-plan.md, docs/plans/s7-plan.md, docs/plans/twincat-plan.md, the Galaxy plan family (docs/plans/galaxy-*.md if/when present, and the LMX equivalent if it lands), and any Modbus plan. Galaxy is the only existing implementor and gets a real signature update in this PR; the others get a heads-up note so future work tracks the new shape. Fixture: opc-plc runs with --alm already (per existing fixture doc) — no compose change. Integration test issues a HistoryRead Events with a non-default SelectClause and asserts the projected fields. E2E: extend scripts/e2e/test-opcuaclient.ps1 with a "history events" stage gated on the --alm simulator producing at least one event.


PR-13: Full Aggregate function set (gap #13)

Goal: extend HistoryAggregateType from the 5 enum values today (Average/Minimum/Maximum/Total/Count) to the OPC UA Part 13 standard catalog of 30+ aggregates that historian-class clients expect.

SDK API: ObjectIds.AggregateFunction_* constants — one per aggregate. The SDK already exposes them; this is pure mapping work.

Aggregates to add (Part 13 §5):

  • TimeAverage, TimeAverage2
  • Interpolative
  • MinimumActualTime, MaximumActualTime, Range, Range2
  • AnnotationCount, DurationGood, DurationBad, PercentGood, PercentBad
  • WorstQuality, WorstQuality2
  • StandardDeviationSample, StandardDeviationPopulation, VarianceSample, VariancePopulation
  • NumberOfTransitions
  • Start, End, Delta, StartBound, EndBound
  • DurationInStateZero, DurationInStateNonZero

Files:

  • src/.../Core.Abstractions/IHistoryProvider.cs — extend HistoryAggregateType enum (additive — existing values keep their ordinal)
  • src/.../OpcUaClient/OpcUaClientDriver.cs: MapAggregateToNodeId switch grows; default arm rejects out of range

Tests: parametrized unit test sweeping every enum value — assert each maps to a non-null NodeId in the SDK's well-known set.
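The shape of that sweep can be sketched with a small enum and a lookup table. Only a subset of aggregates is shown; the ordinals for the five existing values assume the order listed in the goal, and the `AggregateFunction_*` strings are symbolic stand-ins for the SDK constants rather than real NodeIds:

```python
from enum import Enum


class HistoryAggregateType(Enum):
    # Existing values keep their ordinals (assumed 0..4 in the listed order).
    AVERAGE = 0
    MINIMUM = 1
    MAXIMUM = 2
    TOTAL = 3
    COUNT = 4
    # Additive values from the Part 13 list (subset for illustration).
    TIME_AVERAGE = 5
    INTERPOLATIVE = 6
    RANGE = 7


AGGREGATE_NODE_NAMES = {
    HistoryAggregateType.AVERAGE: "AggregateFunction_Average",
    HistoryAggregateType.MINIMUM: "AggregateFunction_Minimum",
    HistoryAggregateType.MAXIMUM: "AggregateFunction_Maximum",
    HistoryAggregateType.TOTAL: "AggregateFunction_Total",
    HistoryAggregateType.COUNT: "AggregateFunction_Count",
    HistoryAggregateType.TIME_AVERAGE: "AggregateFunction_TimeAverage",
    HistoryAggregateType.INTERPOLATIVE: "AggregateFunction_Interpolative",
    HistoryAggregateType.RANGE: "AggregateFunction_Range",
}


def map_aggregate(aggregate: HistoryAggregateType) -> str:
    """Return the symbolic aggregate name, rejecting anything unmapped."""
    try:
        return AGGREGATE_NODE_NAMES[aggregate]
    except KeyError:
        raise ValueError(f"unsupported aggregate: {aggregate!r}") from None
```

A sweep over every enum member then catches the case where a new value is added to the enum but forgotten in the mapping.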

Risks: low — this is mapping work. Drivers without a real historian (everything except Galaxy + OpcUaClient) keep throwing NotSupported.

Docs / fixture / e2e: extend the "HistoryRead aggregates" section in docs/drivers/OpcUaClient.md with the full Part 13 catalog and which aggregates require server-side support; update docs/Client.CLI.md historyread page enumerating the new --aggregate values. Fixture: opc-plc historian support is limited — flag in docs/drivers/OpcUaClient-Test-Fixture.md that the new aggregates are unit-tested via the SDK's well-known NodeId set, not exercised wire-side. Integration test sweeps every enum value and asserts the mapping; gated-skip for aggregates the live opc-plc image doesn't honour. No e2e change.


PR-14: ServerUriArray redundant failover (gap #14)

Goal: read upstream Server.ServerArray / ServerStatus.ServerArray and ServerRedundancyType.RedundancySupport at session activation; when the upstream server advertises non-None redundancy, fail over mid-session on ServiceLevel drop without losing client subscriptions. Today our EndpointUrls is a one-shot connect-attempt list, not a live redundancy group.

SDK API:

  • Session.ReadValueAsync(VariableIds.Server_ServerStatus_ServerArray, ct) → URI list
  • Session.ReadValueAsync(VariableIds.Server_ServiceLevel, ct) polled or subscribed via MonitoredItem
  • Subscribe Server_ServiceLevel on the existing alarm subscription so drops propagate via the publish channel
  • On low-ServiceLevel: open a parallel session against the next URI in ServerArray, Session.TransferSubscriptionsAsync(otherSession, ...) the live subscriptions, swap Session reference

Files:

  • src/.../OpcUaClient/OpcUaClientDriver.cs — new MonitorServerRedundancyAsync method; integrate with the existing OnKeepAlive / SessionReconnectHandler machinery so reconnect and redundancy-failover share the subscription-transfer code path
  • src/.../OpcUaClient/OpcUaClientDriverOptions.cs — add Redundancy: { Enabled, ServiceLevelThreshold (default 200) }
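As a rough sketch of how the Redundancy options and the failover trigger could fit together, the snippet below models only the decision step, with no SDK calls. `RedundancyOptions`, `FailoverSketch`, and `PickFailoverTarget` are hypothetical names, and treating "low" as strictly below the threshold is an assumption the plan does not pin down.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape for the Redundancy options block.
public sealed record RedundancyOptions
{
    public bool Enabled { get; init; }

    // ServiceLevel is a byte in OPC UA; 200 is the default proposed in this plan.
    public byte ServiceLevelThreshold { get; init; } = 200;
}

public static class FailoverSketch
{
    // Returns the URI to fail over to, or null when the session should stay put.
    // Assumption: failover triggers when ServiceLevel is strictly below the threshold.
    public static string? PickFailoverTarget(
        RedundancyOptions options,
        byte serviceLevel,
        string activeUri,
        IReadOnlyList<string> serverArray)
    {
        if (!options.Enabled || serviceLevel >= options.ServiceLevelThreshold)
        {
            return null;
        }

        // Locate the active server; -1 means it is not listed, so the scan starts at index 0.
        var activeIndex = -1;
        for (var i = 0; i < serverArray.Count; i++)
        {
            if (string.Equals(serverArray[i], activeUri, StringComparison.Ordinal))
            {
                activeIndex = i;
                break;
            }
        }

        // Walk the array round-robin, starting just after the active entry.
        for (var step = 1; step <= serverArray.Count; step++)
        {
            var candidate = serverArray[(activeIndex + step) % serverArray.Count];
            if (!string.Equals(candidate, activeUri, StringComparison.Ordinal))
            {
                return candidate;
            }
        }

        return null; // No alternative server advertised.
    }
}
```

Keeping the decision a pure function makes the threshold behaviour unit-testable without two live opc-plc containers; the session open and subscription transfer stay in the driver.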

Tests: with two opc-plc containers behind the driver, artificially drop ServiceLevel on the active one and assert the secondary takes over; assert subscription handles stay valid.

Risks: redundancy is the second-riskiest item after Reverse Connect. The SDK's TransferSubscriptions has known edge cases when the secondary's SecureChannel rejects the source-channel's authentication token; document that the secondary must trust the same client cert as the primary.

Docs / fixture / e2e: new "Upstream redundancy (ServerArray)" section in docs/drivers/OpcUaClient.md with the ServiceLevel threshold, the shared-cert prerequisite for TransferSubscriptions, and the ops runbook for forcing a failover; cross-link from docs/Redundancy.md (which today covers OUR server's redundancy — add a "vs upstream-side redundancy" note). Fixture: extend Docker/docker-compose.yml with a second opc-plc-secondary service on a different port; OpcPlcFixture gains a multi-endpoint variant. Integration test drops the active server's ServiceLevel and asserts the secondary takes over with subscription handles intact. E2E: add a -PrimaryUrl / -SecondaryUrl pair to scripts/e2e/test-opcuaclient.ps1 (and matching keys to scripts/e2e/e2e-config.sample.json) that scripts a primary stop + asserts the bridge stage continues to pass.


Documentation, fixture, and e2e impact

Consolidated index of every doc page, fixture asset, and e2e script touched by the plan above. Authoritative for review — if a PR's Docs / fixture / e2e line references a path not listed here, that's a checklist gap.

Driver user docs

  • docs/drivers/OpcUaClient.md — create on first PR that needs it (PR-1) if not present, then extend with one section per PR-1 through PR-14 covering: subscription tuning, per-tag deadband, OperationLimits handling, diagnostics counters, CRL/SHA1, FindServers, curation, type mirroring, methods, ModelChangeEvent, Reverse Connect, history events, aggregates, upstream redundancy.
  • docs/drivers/OpcUaClient-Test-Fixture.md — coverage map updated for curation (PR-7), Reverse Connect (PR-11), aggregates note (PR-13), redundancy multi-endpoint variant (PR-14).
  • docs/Client.CLI.md — extended for subscribe deadband syntax (PR-2), any discover command (PR-6), call command (PR-9), historyread event mode (PR-12), --aggregate enum expansion (PR-13).
  • docs/Client.UI.md — extended for Subscriptions tab deadband fields (PR-2), Browse-tree type rendering note (PR-8), Method-call surface (PR-9) if it ships.
  • docs/security.md — cross-link from PR-5 (CRL/SHA1 knobs).
  • docs/Redundancy.md — cross-link from PR-14 (note distinguishing server-side redundancy from upstream-side redundancy).

Fixture assets

  • tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/Docker/docker-compose.yml — add opc-plc-rc (PR-11) and opc-plc-secondary (PR-14) service variants; optional secured endpoint (PR-5).
  • tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcPlcFixture.cs — discovery probe at collection init (PR-6), reverse-connect listener (PR-11), multi-endpoint variant (PR-14), model-change helper (PR-10).
  • tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcPlcProfile.cs — flag noisy analogs for deadband (PR-2), enumerate exercised namespaces for curation (PR-7), record at least one custom ObjectType (PR-8).
  • New integration tests added per PR; all live under the existing tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/ collection.
  • Test certs (PR-5): SHA1-signed + revoked test fixtures checked into the unit-test project's resources.

E2E scripts

  • scripts/e2e/test-opcuaclient.ps1 — new stages added per PR (subscription tuning PR-1, deadband PR-2, diagnostics PR-4, CRL PR-5, discovery PR-6, curation PR-7, method call PR-9, topology change PR-10, reverse connect PR-11, history events PR-12, redundancy failover PR-14). The script is the single integration point for every driver-level e2e — keep the stages ordered top-down by phase.
  • scripts/e2e/e2e-config.sample.json — new keys: deadband, discoveryUrl, includePath, namespaceRemap, methodNodeId, reverseConnect, primaryUrl, secondaryUrl.
  • scripts/e2e/test-all.ps1 — no structural change; the existing opcuaclient block forwards new params after wiring them through e2e-config.sample.json.

Cross-driver impact (PR-12 — IHistoryProvider.ReadEventsAsync)

PR-12 changes the IHistoryProvider.ReadEventsAsync signature in Core.Abstractions (or introduces a sibling IEventHistoryProvider — pinned in PR-12 review per Open Question 2). The signature change is source-breaking for every driver that opts into history. PR-12 must add an explicit "interface change — adopt new signature when this driver implements ReadEventsAsync" note to:

  • docs/plans/abcip-plan.md
  • docs/plans/ablegacy-plan.md
  • docs/plans/focas-plan.md
  • docs/plans/s7-plan.md
  • docs/plans/twincat-plan.md
  • The Galaxy plan family — docs/plans/galaxy-*.md if/when those pages exist; Galaxy is the only current implementor and gets the real signature update in PR-12, not just a note.
  • The LMX plan — docs/plans/lmx-*.md if/when it lands (current state: the LMX driver's history surface is implicit through Galaxy; revisit during PR-12 review).
  • A Modbus plan page if/when one exists; Modbus does not implement history today but the heads-up note tracks the cross-driver shape.

The cross-driver note should be a single paragraph along these lines: "Heads up: the IHistoryProvider.ReadEventsAsync method gained an EventFilterSpec parameter in OpcUaClient PR-12 (docs/plans/opcuaclient-plan.md). If/when this driver implements event history, adopt the new signature." This pattern keeps each driver plan stable while one PR owns the cross-cutting breakage.


Skip-rated items (for context)

These featuregaps rows are Build = No and intentionally omitted from the plan above:

| #  | Gap | Why we're skipping |
|----|-----|--------------------|
| 3  | Multicast / LDS-ME registration | Server-side responsibility, not aggregator's. |
| 4  | GDS push management (Part 12) | Significant infra; rare for our deployment scale. |
| 11 | HistoryUpdate / Modified / Annotation passthrough | MES backfill scope; defer. |
| 16 | Connection / session pooling for multi-instance scale-out | Premature; current per-instance model is simple and adequate. |
| 18 | Kerberos / OAuth2 / JWT identity | Significant security work; defer until AD integration drives it (separate workstream). |
| 19 | Write attribute scope beyond Value | Niche; rarely used in OPC UA practice. |

If any of these get prioritized later they slot cleanly between the phases above — none have prerequisites among the Build = Yes items.

Open questions

  1. ISubscribable overload vs new method (PR-2): per-tag spec carrier is needed for deadband; do we extend the existing SubscribeAsync overload or add SubscribeWithSpecsAsync? The former is source-breaking but cleaner; the latter is additive but leaves two parallel paths.
  2. IHistoryProvider.ReadEventsAsync shape (PR-12): does the EventFilterSpec parameter live on IHistoryProvider (one interface, every driver gets it) or on a sibling IEventHistoryProvider (two interfaces, only event-history drivers implement)? Memory entry suggests the former; preference depends on whether non-OPC-UA drivers ever expect to project arbitrary event fields. Pin this in PR-12 review.
  3. IMethodInvoker capability (PR-9): does this become the 9th capability interface (currently 8/8) or is it folded into IWritable as a method-invoke variant? Adding a 9th interface is the cleaner model and matches the spec layering.
  4. Type mirroring address-space surface (PR-8): does IAddressSpaceBuilder already accept type nodes? If yes, PR-8 is straightforward; if no, it splits into a prerequisite PR-8a that extends the builder, then PR-8b for the OPC UA Client wire-up. The answer determines whether PR-8 ships in Phase 2 or slips to a later phase.
  5. Reverse Connect listener ownership (PR-11): one listener per driver instance (port collision when multiple reverse-connect drivers run in the same process) vs one shared listener with an expectedServerUri dispatcher. Shared is the right answer; pin the singleton lifetime to the driver-host.
  6. Phase 1 ship order: PR-1, PR-3, PR-4, PR-5 are independent and can land in parallel. PR-2 depends on the ISubscribable interface decision (Q1) — recommend landing PR-1 first to validate the OpcUaSubscriptionDefaults shape, then PR-2.
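For question 5, the shared-listener option could be sketched as a small registry keyed by expectedServerUri. This is an illustration only: `ReverseConnectDispatcherSketch`, `Register`, and `Dispatch` are hypothetical names, the callback shape is assumed, and the real listener would hand over an SDK connection object instead of an endpoint URL string.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical driver-host singleton: one listener, many drivers, routed by server URI.
public sealed class ReverseConnectDispatcherSketch
{
    private readonly ConcurrentDictionary<string, Action<string>> _handlers =
        new(StringComparer.Ordinal);

    // A driver instance claims the server URI it expects; disposing releases the claim.
    public IDisposable Register(string expectedServerUri, Action<string> onIncomingConnection)
    {
        if (!_handlers.TryAdd(expectedServerUri, onIncomingConnection))
        {
            throw new InvalidOperationException($"A driver already claims '{expectedServerUri}'.");
        }

        return new Registration(this, expectedServerUri);
    }

    // Called by the shared listener for each incoming connection.
    // Returns false when no driver has claimed the announcing server.
    public bool Dispatch(string serverUri, string endpointUrl)
    {
        if (!_handlers.TryGetValue(serverUri, out var handler))
        {
            return false;
        }

        handler(endpointUrl);
        return true;
    }

    private sealed class Registration : IDisposable
    {
        private readonly ReverseConnectDispatcherSketch _owner;
        private readonly string _serverUri;

        public Registration(ReverseConnectDispatcherSketch owner, string serverUri)
        {
            _owner = owner;
            _serverUri = serverUri;
        }

        public void Dispose() => _owner._handlers.TryRemove(_serverUri, out _);
    }
}
```

Rejecting a duplicate claim at registration time surfaces a misconfiguration (two drivers expecting the same upstream server) at startup instead of as a silent routing race.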