lmxopcua/docs/plans/2026-05-28-driver-browsers-design.md
Joseph Doherty fcd0b9b355 docs: design for live address browsers (OpcUaClient + Galaxy)
Approved design for the deferred follow-up from PR #f9fc7dd's driver-pages
work. Lazy tree browse via per-driver IDriverBrowser registered in AdminUI
DI, sessions held in-process with TTL reaper. Detailed sequencing for the
writing-plans handoff is in section 9.
2026-05-28 15:19:52 -04:00


Live address browsers for OpcUaClient + Galaxy drivers — design

Status: approved 2026-05-28. Implementation plan to follow via writing-plans.

Builds on: the PR that shipped driver-specific AdminUI pages (commit 0d3ec46). Both OpcUaClientAddressPickerBody.razor and GalaxyAddressPickerBody.razor were intentionally shipped as static stubs ("enter the string manually") with live browse deferred to this follow-up.

Goal: Add lazy, ad-hoc browse trees to the OpcUaClient and Galaxy address pickers in the AdminUI, so operators can navigate the remote server's (or galaxy's) hierarchy and pick an address rather than typing it.

Architecture: A new IDriverBrowser abstraction registered per driver type (parallel to the runtime's IDriverProbe), with implementations housed in sibling *.Browser projects under src/Drivers/. AdminUI owns the live browse sessions in-process via a BrowseSessionRegistry singleton with a 2-minute idle TTL and an IHostedService reaper. Razor picker bodies talk to a scoped IBrowserSessionService; no actor messages on the hot path.

Tech stack: .NET 10 / Blazor Server / OPCFoundation.NetStandard.Opc.Ua.Client / ZB.MOM.WW.MxGateway.Client (sibling repo, lazy-browse API already shipped).


1. Architecture

Abstraction

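// Commons (shared)
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Per-driver factory: one implementation per driver type, resolved from DI.
public interface IDriverBrowser {
    string DriverType { get; }                                      // "OpcUaClient", "Galaxy", ...
    Task<IBrowseSession> OpenAsync(string configJson, CancellationToken ct);
}

// One live browse session; DisposeAsync closes the remote connection.
public interface IBrowseSession : IAsyncDisposable {
    Guid Token { get; }
    DateTime LastUsedUtc { get; }                                   // refreshed on every Root/Expand/Attributes call
    Task<IReadOnlyList<BrowseNode>> RootAsync(CancellationToken ct);
    Task<IReadOnlyList<BrowseNode>> ExpandAsync(string nodeId, CancellationToken ct);
    Task<IReadOnlyList<AttributeInfo>> AttributesAsync(string nodeId, CancellationToken ct); // empty for OPC UA
}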

public sealed record BrowseNode(
    string NodeId,             // address persisted on commit
    string DisplayName,
    BrowseNodeKind Kind,       // Folder | Leaf
    bool HasChildrenHint);

public sealed record AttributeInfo(
    string Name,               // e.g. "DownloadPath"
    string DriverDataType,
    bool   IsArray,
    string SecurityClass);     // FreeAccess | Operate | Tune | Configure | ViewOnly

public enum BrowseNodeKind { Folder, Leaf }

Session lifecycle

  1. Razor picker body calls BrowserSessionService.OpenAsync(driverType, formJson)
  2. Service resolves IDriverBrowser from DI by driver type, calls OpenAsync(json)
  3. Returns IBrowseSession; service registers it in BrowseSessionRegistry under a new Guid token
  4. Razor stores token, calls RootAsync(token) to populate the initial tree
  5. Each subsequent expand-click calls ExpandAsync(token, nodeId)
  6. Picker body's IAsyncDisposable.DisposeAsync fires CloseAsync(token) on tear-down
  7. BrowseSessionReaper (IHostedService) ticks every 30s, evicts any session where (UtcNow - LastUsedUtc) > 2 min, awaits DisposeAsync

The session genuinely has no value to other cluster nodes — it's tied to one circuit. Hosting it in-process avoids cross-cluster Ask latency on every folder click.
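
The token-keyed part of the lifecycle above can be sketched in a few lines. This is a minimal sketch, not the approved implementation: `IDemoSession`, `DemoSessionRegistry`, `DemoSession`, and the method names `Register`, `Find`, and `CloseAsync` are hypothetical stand-ins for the real `IBrowseSession` and `BrowseSessionRegistry`.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical, trimmed stand-in for IBrowseSession (only what the registry needs).
public interface IDemoSession : IAsyncDisposable
{
    DateTime LastUsedUtc { get; }
}

// Sketch of a token-keyed registry; names are illustrative assumptions.
public sealed class DemoSessionRegistry
{
    private readonly ConcurrentDictionary<Guid, IDemoSession> _sessions = new();

    // Step 3: register the opened session under a fresh token.
    public Guid Register(IDemoSession session)
    {
        var token = Guid.NewGuid();
        _sessions[token] = session;
        return token;
    }

    // Steps 4-5: look the session up on each root/expand call; null if reaped or unknown.
    public IDemoSession? Find(Guid token) =>
        _sessions.TryGetValue(token, out var session) ? session : null;

    // Steps 6-7: remove first, then dispose outside the dictionary.
    public async Task<bool> CloseAsync(Guid token)
    {
        if (!_sessions.TryRemove(token, out var session)) return false;
        await session.DisposeAsync();
        return true;
    }

    public int Count => _sessions.Count;
}

// Trivial session used only to exercise the registry.
public sealed class DemoSession : IDemoSession
{
    public DateTime LastUsedUtc { get; } = DateTime.UtcNow;
    public bool Disposed { get; private set; }

    public ValueTask DisposeAsync()
    {
        Disposed = true;
        return ValueTask.CompletedTask;
    }
}
```

Removing the entry before disposing means a second close of the same token (for example, the picker's dispose racing the reaper) is a no-op rather than a double dispose.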


2. Components

New projects

| Path | Purpose |
|---|---|
| src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser/ | OPC UA browser impl + session |
| src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browser/ | Galaxy browser impl + session |
| tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser.Tests/ | Unit tests (use opc-plc fixture) |
| tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browser.Tests/ | Unit tests (fake transport) |

Driver-specific browsers live in sibling projects so AdminUI doesn't drag the runtime Driver.* projects (and their full SDK chains) through a transitive reference.

New abstractions

| Path | Purpose |
|---|---|
| src/Core/ZB.MOM.WW.OtOpcUa.Commons/Browsing/IDriverBrowser.cs | Per-driver factory |
| src/Core/ZB.MOM.WW.OtOpcUa.Commons/Browsing/IBrowseSession.cs | Session contract |
| src/Core/ZB.MOM.WW.OtOpcUa.Commons/Browsing/BrowseNode.cs | BrowseNode record + BrowseNodeKind enum + AttributeInfo |

AdminUI plumbing

| Path | Purpose |
|---|---|
| src/Server/.../AdminUI/Browsing/BrowseSessionRegistry.cs | Singleton, ConcurrentDictionary<Guid, IBrowseSession> |
| src/Server/.../AdminUI/Browsing/BrowseSessionReaper.cs | IHostedService, 30s tick, 2 min idle TTL |
| src/Server/.../AdminUI/Browsing/IBrowserSessionService.cs | Scoped DI service for Razor |
| src/Server/.../AdminUI/Browsing/BrowserSessionService.cs | Impl: resolve driver, register session, enforce per-call timeouts |
| src/Server/.../AdminUI/Components/Shared/Drivers/DriverBrowseTree.razor | Shared lazy tree component with per-node text filter |

Modified files

| Path | Change |
|---|---|
| src/Server/.../Pickers/OpcUaClientAddressPickerBody.razor | Add Browse button + DriverBrowseTree; keep manual entry |
| src/Server/.../Pickers/GalaxyAddressPickerBody.razor | Same shape + side-panel for attribute pick |
| src/Server/.../AdminUI/Program.cs | Register IDriverBrowser services + registry + reaper |
| src/Drivers/.../OpcUaClient.Contracts/NamespaceMap.cs | Extract from runtime Driver.OpcUaClient for shared use |
| ZB.MOM.WW.OtOpcUa.slnx | Add the four new projects |

3. Data flow

Open → tree → pick (OpcUaClient as worked example; Galaxy identical except attribute side-panel before commit):

Razor picker body          BrowserSessionService          IDriverBrowser           Remote
        |                          |                            |                       |
   click Browse ────────► OpenAsync(driverType, json) ─► OpenAsync(json) ────────► connect + activate session
        | ◄────────────────  token (Guid)               ◄───── IBrowseSession            |
        |                          |                            |                       |
   render tree ─────────► RootAsync(token) ─────────────► session.RootAsync ─────► BrowseAsync(ObjectsFolder)
        | ◄──────────────── BrowseNode[]                 ◄───── refs                     |
        |                          |                            |                       |
   click folder ────────► ExpandAsync(token, nodeId) ──► session.ExpandAsync ───► BrowseAsync(nodeId)
        | ◄──────────────── BrowseNode[]                 ◄───── refs                     |
        |                          |                            |                       |
   click leaf + commit ─► CloseAsync(token) ─────────► session.DisposeAsync ───► CloseSession
        |                          |                            |                       |

Galaxy two-stage attribute pick: after the user selects an object (Folder) in the tree, the picker body calls AttributesAsync(token, tagName) and renders the result as a side-panel. The user picks an attribute; the committed address is tag_name.AttributeName.

Stable address format:

  • OpcUaClient: nsu=<uri>;<localid> via NamespaceMap.ToStableReference — survives remote namespace-table reorder across restarts
  • Galaxy: tag_name (the globally unique system name) — already stable by definition
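
To illustrate why the `nsu=` form survives a namespace-table reorder, here is a minimal sketch. It is not the real `NamespaceMap`: `StableRefDemo`, its two methods, and the plain `string[]` namespace table are hypothetical, and it assumes node IDs in the conventional `ns=<index>;<type>=<value>` text form with no `;` inside the namespace URI.

```csharp
using System;

// Hypothetical helper; the real NamespaceMap.ToStableReference / TryResolve may differ.
public static class StableRefDemo
{
    // "ns=2;s=Tank.Level" -> "nsu=<uri of index 2>;s=Tank.Level"
    public static string ToStable(string nodeId, string[] namespaceUris)
    {
        // A NodeId without an "ns=" prefix is in namespace 0 by convention.
        if (!nodeId.StartsWith("ns=", StringComparison.Ordinal))
            return "nsu=" + namespaceUris[0] + ";" + nodeId;

        int semi = nodeId.IndexOf(';');
        if (semi < 0) throw new FormatException("Missing ';' after namespace index.");
        int index = int.Parse(nodeId.Substring(3, semi - 3));
        return "nsu=" + namespaceUris[index] + ";" + nodeId.Substring(semi + 1);
    }

    // Resolves the stable form against the namespace table of the current session.
    public static bool TryResolve(string stable, string[] namespaceUris, out string nodeId)
    {
        nodeId = "";
        if (!stable.StartsWith("nsu=", StringComparison.Ordinal)) return false;
        int semi = stable.IndexOf(';');
        if (semi < 0) return false;

        string uri = stable.Substring(4, semi - 4);
        int index = Array.IndexOf(namespaceUris, uri);
        if (index < 0) return false; // URI no longer published by the server

        nodeId = "ns=" + index + ";" + stable.Substring(semi + 1);
        return true;
    }
}
```

If the remote server restarts with two namespaces swapped, the stored `nsu=` string resolves to a different numeric index but the same node.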

Per-node text filter: purely client-side over the already-loaded node.Children. No round-trip on filter input.
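
A minimal sketch of such a filter, assuming a case-insensitive substring match on the display name (the matching rule is not specified in this design; `DemoNode` and `NodeFilterDemo` are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed stand-in for BrowseNode.
public sealed record DemoNode(string NodeId, string DisplayName);

public static class NodeFilterDemo
{
    // Filters children that are already loaded; never triggers a browse call.
    public static IReadOnlyList<DemoNode> Apply(IReadOnlyList<DemoNode> loadedChildren, string filter)
    {
        if (string.IsNullOrWhiteSpace(filter)) return loadedChildren;

        var needle = filter.Trim();
        return loadedChildren
            .Where(n => n.DisplayName.Contains(needle, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```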


4. OpcUaClient browser specifics

Connection

  • Reuses OpcUaClientDriverOptions (deserialize with UnmappedMemberHandling.Skip)
  • Builds a separate ApplicationConfiguration from the runtime driver — PKI root at %LocalAppData%/OtOpcUa/adminui-browse-pki/ (separate cert store)
  • ApplicationName = "OtOpcUa AdminUI Browse", ApplicationUri = "urn:OtOpcUa:AdminUI:Browse"
  • Endpoint selection: same DiscoveryClient.GetEndpointsAsync → filter (policy, mode) as the runtime driver
  • One endpoint only (no failover) — interactive use; user retries with different URL on failure
  • Bounded by OpcUaClientDriverOptions.PerEndpointConnectTimeout (clamped [5, 30]s)

Namespace map

  • NamespaceMap class extracted to OpcUaClient.Contracts so both runtime and Browser projects share one impl
  • Browser builds the map from the live session on open; uses ToStableReference for outbound NodeIds; uses TryResolve for inbound

Lazy browse

  • One level per click using Session.BrowseAsync + BrowseNextAsync continuation-point loop
  • BrowseDescriptionCollection filters to NodeClass.Object | NodeClass.Variable, ResultMask = BrowseName | DisplayName | NodeClass
  • BrowseNode.HasChildrenHint = (Kind == Folder) — heuristic; saves a per-node round-trip
  • Inside-session calls guarded by SemaphoreSlim _gate (same pattern as runtime driver — OPC UA Session.BrowseAsync not thread-safe)
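
The gate plus continuation-point loop described above can be sketched without the SDK. This is a sketch under stated assumptions: `IPagedSource`, `BrowsePage`, `GatedBrowser`, and `FakePagedSource` are hypothetical stand-ins, and the real `Session.BrowseAsync` / `BrowseNextAsync` signatures in the OPC Foundation SDK are different.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical page shape: references plus an optional continuation point.
public sealed record BrowsePage(IReadOnlyList<string> Refs, byte[]? Continuation);

// Hypothetical stand-in for the SDK session.
public interface IPagedSource
{
    Task<BrowsePage> BrowseAsync(string nodeId, CancellationToken ct);
    Task<BrowsePage> BrowseNextAsync(byte[] continuation, CancellationToken ct);
}

public sealed class GatedBrowser
{
    private readonly SemaphoreSlim _gate = new(1, 1); // one browse at a time per session
    private readonly IPagedSource _source;

    public GatedBrowser(IPagedSource source) => _source = source;

    // Expands one level: first page, then follow continuation points until exhausted.
    public async Task<IReadOnlyList<string>> ExpandAsync(string nodeId, CancellationToken ct)
    {
        await _gate.WaitAsync(ct);
        try
        {
            var all = new List<string>();
            var page = await _source.BrowseAsync(nodeId, ct);
            all.AddRange(page.Refs);
            while (page.Continuation is { Length: > 0 } continuation)
            {
                page = await _source.BrowseNextAsync(continuation, ct);
                all.AddRange(page.Refs);
            }
            return all;
        }
        finally
        {
            _gate.Release();
        }
    }
}

// In-memory source that serves fixed pages; the continuation point is the next page index.
public sealed class FakePagedSource : IPagedSource
{
    private readonly string[][] _pages;
    public FakePagedSource(params string[][] pages) => _pages = pages;

    public Task<BrowsePage> BrowseAsync(string nodeId, CancellationToken ct) => Page(0);
    public Task<BrowsePage> BrowseNextAsync(byte[] continuation, CancellationToken ct) => Page(continuation[0]);

    private Task<BrowsePage> Page(int index)
    {
        byte[]? next = index + 1 < _pages.Length ? new[] { (byte)(index + 1) } : null;
        return Task.FromResult(new BrowsePage(_pages[index], next));
    }
}
```

Holding the gate for the whole loop keeps a second expand on the same session from interleaving with an unfinished continuation chain.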

Cert handling

  • AutoAcceptCertificates = true honored with parity to runtime + log warning + per-session unwire on dispose
  • AutoAcceptCertificates = false + untrusted cert → OpenAsync fails with SDK error message in the UI

Reconnect handling

  • None. Browse sessions are short-lived (2 min idle TTL). Keep-alive failure → UI surfaces error chip → user re-clicks Browse.

5. Galaxy browser specifics

Connection

  • Reuses GalaxyDriverOptions (deserialize with UnmappedMemberHandling.Skip)
  • Opens MxGatewaySession with ClientName = "OtOpcUa-AdminUI-Browse" — distinct from runtime driver's name so the gateway can attribute load
  • Per-call gateway client built via session.GalaxyRepository(opts.GalaxyName)

Lazy browse

  • Root: client.BrowseAsync(new BrowseChildrenOptions(), ct) → IReadOnlyList<LazyBrowseNode>
  • Expand: cached LazyBrowseNode lookup by tag_name, then node.ExpandAsync(ct) (gateway client handles paging internally)
  • No internal gate — LazyBrowseNode.ExpandAsync already has its own lock; gateway client is thread-safe across distinct calls

Two-stage attribute pick

  • Galaxy BrowseNode.Kind is always Folder — leaves don't exist at tree level
  • When the user clicks an object node, picker body calls AttributesAsync(token, tagName) and shows the result as a side-panel listing (Name, DriverDataType, IsArray, SecurityClass)
  • On attribute click, committed address is $"{tagName}.{attrName}"
  • Backing call: either BrowseChildrenOptions { IncludeAttributes = true } filtered to the GobjectId, or a dedicated GetAttributesAsync(GobjectId, ct) — to be confirmed during plan write against the gateway client surface

Filters in v1

  • Per-node text filter (client-side) for tree navigation
  • Server-side filters (TagNameGlob, AlarmBearingOnly, HistorizedOnly) deferred to a follow-up — easy to add later without breaking the wire (the session is constructed today with new BrowseChildrenOptions())

6. Error handling, timeouts, TTL

Failures

  • OpenAsync → catches Exception, logs Info, returns typed BrowseOpenResult(Ok: false, Message, Token: Empty). UI shows red chip with truncated SDK message
  • ExpandAsync / AttributesAsync → same shape per-call. Failed branch shows error chip; rest of tree intact; session stays alive
  • BrowseSessionNotFoundException when token unknown (session reaped or never existed)
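
The open-failure shape above might look like the following. The record fields follow the `BrowseOpenResult(Ok, Message, Token)` shape named in this section; the `OpenResultDemo` wrapper, the `open` delegate, and the 200-character truncation limit are assumptions for illustration only, and logging is omitted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed record BrowseOpenResult(bool Ok, string Message, Guid Token);

public static class OpenResultDemo
{
    private const int MaxMessageLength = 200; // assumed UI truncation limit

    // Wraps an open attempt so the caller always receives a typed result instead of an exception.
    public static async Task<BrowseOpenResult> TryOpenAsync(
        Func<CancellationToken, Task<Guid>> open, CancellationToken ct)
    {
        try
        {
            var token = await open(ct);
            return new BrowseOpenResult(true, "", token);
        }
        catch (Exception ex)
        {
            var message = ex.Message.Length > MaxMessageLength
                ? ex.Message.Substring(0, MaxMessageLength)
                : ex.Message;
            return new BrowseOpenResult(false, message, Guid.Empty);
        }
    }
}
```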

Timeouts

  • Per-call expand/attributes: 20 s via CancellationTokenSource.CreateLinkedTokenSource(callerCt) plus CancelAfter in BrowserSessionService
  • Session open: 30 s ceiling; OPC UA reuses PerEndpointConnectTimeout (default 10 s), Galaxy hardcodes 30 s for MxGatewaySession.OpenAsync
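
A minimal sketch of the per-call timeout, using the standard linked-token pattern. `CallTimeoutDemo` and `WithTimeoutAsync` are hypothetical names; the real service may structure this differently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallTimeoutDemo
{
    // Runs `call` with a token that fires on caller cancellation OR after `timeout`.
    public static async Task<T> WithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> call, TimeSpan timeout, CancellationToken callerCt)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerCt);
        cts.CancelAfter(timeout);
        return await call(cts.Token);
    }
}
```

A call such as `WithTimeoutAsync(ct => session.ExpandAsync(nodeId, ct), TimeSpan.FromSeconds(20), callerCt)` would then surface a hung remote as an `OperationCanceledException`, provided the underlying call honors the token.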

TTL & reaping

  • LastUsedUtc set on every RootAsync/ExpandAsync/AttributesAsync
  • Reaper: IHostedService with PeriodicTimer(30s). On each tick: snapshot keys; for any session with (UtcNow - LastUsedUtc) > 120s: TryRemove then await DisposeAsync outside the dictionary
  • Concurrent ExpandAsync racing eviction → caller catches closed-session error → service translates to BrowseSessionNotFoundException
  • On AdminUI shutdown: StopAsync walks the registry once and disposes all sessions
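
One reaper tick can be sketched as a pure sweep with the clock passed in, which is also what makes virtual-time tests straightforward. `ISweepable`, `ReaperSweepDemo`, and `FakeIdleSession` are hypothetical names; the hosting `PeriodicTimer` loop is omitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical, trimmed stand-in for IBrowseSession.
public interface ISweepable : IAsyncDisposable
{
    DateTime LastUsedUtc { get; }
}

public static class ReaperSweepDemo
{
    // One tick: evict every session idle for longer than idleTtl; returns the eviction count.
    public static async Task<int> SweepAsync(
        ConcurrentDictionary<Guid, ISweepable> sessions, DateTime utcNow, TimeSpan idleTtl)
    {
        var evicted = 0;
        foreach (var token in sessions.Keys) // Keys returns a snapshot copy
        {
            if (!sessions.TryGetValue(token, out var session)) continue;
            if (utcNow - session.LastUsedUtc <= idleTtl) continue;

            // Remove first so no new caller can find it, then dispose outside the dictionary.
            if (sessions.TryRemove(token, out var removed))
            {
                await removed.DisposeAsync();
                evicted++;
            }
        }
        return evicted;
    }
}

// Test double with a fixed last-used time.
public sealed class FakeIdleSession : ISweepable
{
    public FakeIdleSession(DateTime lastUsedUtc) => LastUsedUtc = lastUsedUtc;
    public DateTime LastUsedUtc { get; }
    public bool Disposed { get; private set; }

    public ValueTask DisposeAsync()
    {
        Disposed = true;
        return ValueTask.CompletedTask;
    }
}
```

An expand that grabbed the session just before `TryRemove` can still hit a disposed session, which is the race the previous bullet translates to `BrowseSessionNotFoundException`.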

Concurrency

  • BrowseSessionRegistry = ConcurrentDictionary<Guid, IBrowseSession> — no extra lock
  • OpcUaClient session serializes browse on SemaphoreSlim; Galaxy session relies on its internal locks

Component dispose

  • Razor picker body implements IAsyncDisposable
  • Fires CloseAsync(token) fire-and-forget (no await) so circuit teardown isn't blocked by a remote round-trip (gRPC or OPC UA CloseSession)
  • Reaper is the safety net if dispose doesn't fire

Logging

  • Serilog. Info at open + close, Debug at close-with-reason (user-close | idle-ttl | shutdown), Info on failure
  • No per-expand logging (noise)

Audit trail

  • None — browse is read-only and doesn't mutate config or driver state (matches probe pattern)

7. Security & auth

Role gating

  • Browse button gated by existing DriverOperator LDAP policy — same as Reconnect/Restart in DriverStatusPanel
  • Picker bodies check policy in OnInitializedAsync via IAuthorizationService and AuthenticationStateProvider
  • Manual entry stays available regardless of role

Credentials in JSON

  • Form JSON posted to BrowserSessionService.OpenAsync contains plaintext passwords / API keys — same as the existing TestDriverConnect probe
  • JSON is deserialized into typed Options → used to build SDK config → both released; no _lastConfigJson cached field anywhere in the registry or session impls
  • Browse session tokens are Guid.NewGuid() and only ever cross the authenticated Blazor circuit

Cert handling

  • AutoAcceptCertificates = true honored with log warning + per-session unwire on dispose
  • Browse PKI store separate from runtime PKI — browse-time accept doesn't poison the runtime driver's trust store

Rate limiting

  • None. DriverOperator role gating + 2-minute TTL is the budget. A bad actor with DriverOperator already has Reconnect/Restart capability

Multi-replica AdminUI

  • Sticky cookies (already configured via Traefik) pin a user to one replica → BrowseSessionRegistry is always co-located with the circuit that created the token
  • Failover → token invalid on new replica → UI re-opens gracefully

8. Testing

Unit tests — per-driver browsers

  • tests/Drivers/.../OpcUaClient.Browser.Tests/: against opc-plc at opc.tcp://10.100.0.35:50000. OpcUaClientBrowseSessionTests, OpcUaClientDriverBrowserTests (bad endpoint, auth rejected, bad JSON)
  • tests/Drivers/.../Galaxy.Browser.Tests/: fake IGalaxyRepositoryClientTransport (precedent in gateway-client repo). GalaxyBrowseSessionTests, GalaxyDriverBrowserTests

Unit tests — AdminUI plumbing (added to existing tests/Server/AdminUI.Tests/)

  • BrowseSessionRegistryTests: register/get/remove, concurrent registration
  • BrowseSessionReaperTests: virtual time, idle eviction, non-idle preservation, eviction-vs-in-flight-expand race
  • BrowserSessionServiceTests: open→root→expand→close, unknown driver type, per-call timeout enforced

Component tests

  • DriverBrowseTree lazy-expand contract with fake IBrowserSessionService; per-node filter filters DOM but does not call ExpandAsync; click caching
  • Picker bodies: Browse button hidden when !_canOperate; manual entry still works

Integration tests (opt-in, fixture-gated)

  • tests/Drivers/.../OpcUaClient.Browser.IntegrationTests/: end-to-end against opc-plc, 3-level expand + round-trip resolve. Skipped unless OPCUA_SIM_ENDPOINT set
  • No Galaxy integration suite in v1 (requires wonder-app-vd03; deferred)

Specific regression tests

  • Namespace-stable round-trip: open → browse → take returned NodeId string → ExpandAsync(string) → must resolve back to same NodeId
  • TTL reaper racing live ExpandAsync: TryRemove while expand is in-flight → safe, translates to BrowseSessionNotFoundException

Verification at PR time

  • dotnet build ZB.MOM.WW.OtOpcUa.slnx clean
  • dotnet test tests/Server/.../AdminUI.Tests/ green (existing 51 + new ~12)
  • dotnet test tests/Drivers/.../OpcUaClient.Browser.Tests/ with the opc-plc fixture started via lmxopcua-fix up opcuaclient
  • dotnet test tests/Drivers/.../Galaxy.Browser.Tests/ (no fixture)
  • Manual smoke: run AdminUI, edit an OpcUaClient driver, click Browse against opc-plc, pick a variable, verify the stored NodeId reads cleanly via Client CLI

9. Implementation sequencing (for plan-writing)

Suggested phase split — each phase shippable + reviewable independently:

  1. Phase 1 — Abstractions. Add IDriverBrowser, IBrowseSession, BrowseNode, AttributeInfo, BrowseNodeKind to Commons. Empty build.
  2. Phase 2 — Extract NamespaceMap. Move from runtime Driver.OpcUaClient to Driver.OpcUaClient.Contracts; update runtime ref.
  3. Phase 3 — OpcUaClient browser. New Driver.OpcUaClient.Browser project; impl + unit tests against opc-plc.
  4. Phase 4 — Galaxy browser. New Driver.Galaxy.Browser project; impl + unit tests with fake transport. Confirm attribute-fetch API surface on GalaxyRepositoryClient.
  5. Phase 5 — AdminUI plumbing. BrowseSessionRegistry, BrowseSessionReaper, BrowserSessionService, DI wire-up in Program.cs. Unit tests.
  6. Phase 6 — Shared DriverBrowseTree.razor. Lazy tree component with per-node filter. Component tests with fake service.
  7. Phase 7 — Wire pickers. Update OpcUaClientAddressPickerBody.razor and GalaxyAddressPickerBody.razor to use DriverBrowseTree + DriverOperator gating + (Galaxy) attribute side-panel. Manual smoke test.
  8. Phase 8 — Integration test + docs. Opt-in opc-plc integration suite, design doc cross-references in docs/, CLAUDE.md (or docs/security.md) updates if needed.

Decisions table

| # | Decision | Rationale |
|---|---|---|
| 1 | Ad-hoc browse using form JSON | Mirrors TestDriverConnect probe; works for new drafts and existing drivers uniformly |
| 2 | Tree + lazy load both drivers | Galaxy gateway just shipped LazyBrowseNode.ExpandAsync — symmetric UX possible |
| 3 | AdminUI-hosted via IDriverBrowser factory | Browse is interactive (≥10 calls/session); cross-cluster Ask hop would multiply latency; session has no value to other nodes |
| 4 | Sibling *.Browser projects | Keep AdminUI from pulling runtime Driver.* projects' SDK chains |
| 5 | NamespaceMap to OpcUaClient.Contracts | Shared between runtime + browser, no new project needed |
| 6 | Separate browse PKI store | Browse-time cert accept must not poison runtime driver's trust store |
| 7 | Per-node client-side text filter (v1) | Quick UX win; server-side filters deferred |
| 8 | 2 min idle TTL, 30s reaper tick | Matches typical user cadence; bounds resource exposure |
| 9 | 20 s per-call / 30 s open timeouts | Interactive feel; longer hangs almost always mean broken remote |
| 10 | DriverOperator role gating | Live remote connection is operationally privileged; matches Reconnect/Restart precedent |
| 11 | No audit trail | Browse is read-only; matches probe pattern |
| 12 | Galaxy two-stage attribute side-panel | One modal, no extra clicks vs. two-modal flow |