lmxopcua/docs/reqs/MxAccessClientReqs.md
Joseph Doherty 48970af416 Doc refresh (task #205) — requirements updated for multi-driver OtOpcUa three-process deploy
Per-file summary:

- docs/reqs/OpcUaServerReqs.md — rewritten driver-agnostic. OPC-001..OPC-013 re-scoped to multi-driver address-space composition + capability dispatch; OPC-014 AuthorizationGate + permission trie; OPC-015 dynamic ServiceLevel via RedundancyCoordinator; OPC-017 surgical generation-apply rebuild; OPC-012 capability dispatch via CapabilityInvoker (decision #143 idempotence-aware retry); OPC-013 per-host Polly isolation (decision #144); OPC-019 OpenTelemetry metrics. Transport-security profile matrix (OPC-010) + UserName/LDAP (OPC-011) preserved.

- docs/reqs/GalaxyRepositoryReqs.md — scope clarified as Galaxy-driver-only (not platform). GR-001..GR-004 tied to ITagDiscovery.DiscoverAsync + IRediscoverable; all SQL runs inside OtOpcUa.Galaxy.Host and streams to Proxy via named pipe. GR-008 capability wrapping via CapabilityInvoker added. Cross-links to docs/v2/driver-specs.md + docs/GalaxyRepository.md.

- docs/reqs/MxAccessClientReqs.md — scope clarified as Galaxy-Host-only. MXA-001..MXA-009 preserved (STA pump, register/unregister, subscription refcount, auto-reconnect, probe, COM cleanup, operation metrics, error translation). MXA-010 Proxy-side capability wrapping + MXA-011 pipe ACL + per-process shared secret (OTOPCUA_ALLOWED_SID / OTOPCUA_GALAXY_SECRET) added.

- docs/reqs/ServiceHostReqs.md — rewritten for three-process deployment. Shared section (SVC-SHARED-001/002) for Serilog + bootstrap-only appsettings. SRV-* for OtOpcUa.Server (net10 x64, Microsoft.Extensions.Hosting + AddWindowsService, in-process driver hosting, redundancy-node bootstrap). ADM-* for OtOpcUa.Admin (Blazor Server, cookie+LDAP auth, CanEdit/CanPublish policies, sole DB writer, Prometheus /metrics, audit logging). GHX-* for OtOpcUa.Galaxy.Host (TopShelf, net48 x86, named-pipe IPC bootstrap, STA backend lifecycle, crash handling tied to supervisor).

- docs/reqs/ClientRequirements.md — restructured as numbered, verifiable requirements. SHR-* for Client.Shared (single IOpcUaClientService, ConnectionSettings, failover, cross-platform certs, type-coercing write, UI-thread neutrality). CLI-001..CLI-011 cover connect/read/write/browse/subscribe/historyread/alarms/redundancy. UI-001..UI-008 cover connection panel, tree browser, each tab, connection-state reflection, cross-platform build. Reference design content (IOpcUaClientService shape, models, view-model map, mock layout) preserved.

- docs/reqs/StatusDashboardReqs.md — retired cleanly. Replaced with a pointer to docs/v2/admin-ui.md + HLR-015 / HLR-016 / HLR-017 / ADM-*. Mapping table shows each retired DASH-001..DASH-009 requirement's replacement (live cluster-node view via SignalR, Prometheus metrics, driver-instance detail views, etc.). Note that a formal AdminUiReqs.md can be written later if needed for cert compliance.

HighLevelReqs.md was already at the target shape (HLR-001..HLR-018 with Revision header noting retired HLR-009) as of commit f217636; verified identical and no additional edit required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:31:58 -04:00


Galaxy Driver — MXAccess Client Requirements

Revision — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). Scope narrowed: this document covers the MXAccess surface inside OtOpcUa.Galaxy.Host (.NET Framework 4.8 x86 Windows service). The in-server Driver.Galaxy.Proxy implements the IReadable / IWritable / ISubscribable / IAlarmSource / IHistoryProvider capability interfaces and routes every wire call through the named pipe to this Host process. The STA thread + reconnect playback + subscription refcount requirements from v1 are preserved; what changed is where they live (Host service, not the Server process). MXA-010 (proxy-side wrapping) and MXA-011 (pipe ACL / shared secret) are new.

Parent: HLR-002, HLR-005, HLR-007

Driver scope: Galaxy only. Process scope: OtOpcUa.Galaxy.Host (Host side) and Driver.Galaxy.Proxy (server-side forwarder).

MXA-001: STA Thread with Message Pump

All MXAccess COM objects shall be created and called on a dedicated STA thread running a Win32 message pump to ensure COM callbacks are delivered.

Acceptance Criteria

  • A dedicated thread is created with ApartmentState.STA before any MXAccess COM object is instantiated; implementation lives in StaPump inside OtOpcUa.Galaxy.Host.
  • The thread runs a Win32 message pump using GetMessage / TranslateMessage / DispatchMessage.
  • Work items are marshalled to the STA thread via PostThreadMessage(WM_APP) and a concurrent queue.
  • All COM object creation (LMXProxyServer), method calls, and event callbacks happen on this thread.
  • The thread is named Galaxy.Sta so it can be identified in diagnostics (thread dumps, debugger).

Details

  • If the STA thread dies unexpectedly, log Fatal and trigger Host service shutdown. The supervisor restarts the Host under its driver-stability policy (docs/v2/driver-stability.md). COM objects on the dead thread are unrecoverable; no in-process recovery is attempted.
  • RunAsync(Action) returns a Task that completes when the action executes on the STA thread. Callers can await it.
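
The pump described above can be sketched as follows. This is a minimal illustration, assuming a Windows process; only the StaPump name, the Galaxy.Sta thread name, RunAsync(Action), and the WM_APP + concurrent-queue mechanism come from this document. The field names, the PeekMessage call used to force creation of the message queue, and the WM_QUIT shutdown path are assumptions, not the Host's actual implementation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

public sealed class StaPump : IDisposable
{
    private const uint WM_QUIT = 0x0012, WM_APP = 0x8000, PM_NOREMOVE = 0;

    [StructLayout(LayoutKind.Sequential)]
    private struct MSG
    {
        public IntPtr hwnd; public uint message; public IntPtr wParam;
        public IntPtr lParam; public uint time; public int ptX; public int ptY;
    }

    [DllImport("user32.dll")] private static extern int GetMessage(out MSG msg, IntPtr hWnd, uint min, uint max);
    [DllImport("user32.dll")] private static extern bool PeekMessage(out MSG msg, IntPtr hWnd, uint min, uint max, uint remove);
    [DllImport("user32.dll")] private static extern bool TranslateMessage(ref MSG msg);
    [DllImport("user32.dll")] private static extern IntPtr DispatchMessage(ref MSG msg);
    [DllImport("user32.dll")] private static extern bool PostThreadMessage(uint threadId, uint msg, IntPtr wParam, IntPtr lParam);
    [DllImport("kernel32.dll")] private static extern uint GetCurrentThreadId();

    private readonly ConcurrentQueue<Action> _work = new ConcurrentQueue<Action>();
    private readonly ManualResetEventSlim _ready = new ManualResetEventSlim();
    private readonly Thread _thread;
    private uint _threadId;

    public StaPump()
    {
        _thread = new Thread(Pump) { Name = "Galaxy.Sta", IsBackground = true };
        _thread.SetApartmentState(ApartmentState.STA);   // must be set before Start
        _thread.Start();
        _ready.Wait();                                   // queue exists, posting is now safe
    }

    // Queues the action and wakes the pump; the Task completes after it ran on the STA thread.
    public Task RunAsync(Action action)
    {
        var tcs = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
        _work.Enqueue(() =>
        {
            try { action(); tcs.SetResult(null); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        PostThreadMessage(_threadId, WM_APP, IntPtr.Zero, IntPtr.Zero);
        return tcs.Task;
    }

    private void Pump()
    {
        MSG msg;
        PeekMessage(out msg, IntPtr.Zero, 0, 0, PM_NOREMOVE);   // forces message-queue creation
        _threadId = GetCurrentThreadId();
        _ready.Set();

        while (GetMessage(out msg, IntPtr.Zero, 0, 0) > 0)      // 0 = WM_QUIT, -1 = error
        {
            if (msg.message == WM_APP)
            {
                Action item;
                while (_work.TryDequeue(out item)) item();
            }
            else
            {
                TranslateMessage(ref msg);
                DispatchMessage(ref msg);                       // delivers COM callbacks
            }
        }
    }

    public void Dispose()
    {
        PostThreadMessage(_threadId, WM_QUIT, IntPtr.Zero, IntPtr.Zero);
        _thread.Join();
    }
}
```

A caller would write `await pump.RunAsync(() => { /* create or call the COM object */ });`. The sketch omits the Fatal-log-and-shutdown path required when the thread dies unexpectedly.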

MXA-002: Connection Lifecycle

The Host shall support Register/Unregister lifecycle with the LMXProxyServer COM object, tracking the connection handle.

Acceptance Criteria

  • Register(clientName) is called on the STA thread and returns a positive connection handle on success.
  • Handle ≤ 0 → descriptive error thrown; Host reports DriverHealth.Unavailable via the pipe so the Proxy reports Bad quality to the core.
  • Unregister(handle) is called during disconnect after all subscriptions are removed.
  • Client name comes from OTOPCUA_GALAXY_CLIENT_NAME environment variable; default OtOpcUa-Galaxy.Host. Must be unique per MXAccess registration (a cluster's Primary and Secondary each get their own client-name suffix via node override).
  • Connection state transitions: Disconnected → Connecting → Connected → Disconnecting → Disconnected (and Error from any state).

Details

  • ConnectedSince (UTC) recorded after successful Register.
  • ReconnectCount tracked for diagnostics and /metrics.
  • State changes are emitted over the pipe as DriverHealth updates.
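
The state transitions listed above can be expressed as a small guard. This is a sketch under assumptions: the type and member names are hypothetical, the StateChanged event stands in for the DriverHealth update sent over the pipe, and the Error → Connecting recovery edge is assumed (the requirement only states that Error is reachable from any state).

```csharp
using System;
using System.Collections.Generic;

public enum ConnectionState { Disconnected, Connecting, Connected, Disconnecting, Error }

public sealed class ConnectionStateMachine
{
    // Forward edges from MXA-002, plus one assumed recovery edge out of Error.
    private static readonly Dictionary<ConnectionState, ConnectionState> Next =
        new Dictionary<ConnectionState, ConnectionState>
        {
            { ConnectionState.Disconnected,  ConnectionState.Connecting },
            { ConnectionState.Connecting,    ConnectionState.Connected },
            { ConnectionState.Connected,     ConnectionState.Disconnecting },
            { ConnectionState.Disconnecting, ConnectionState.Disconnected },
            { ConnectionState.Error,         ConnectionState.Connecting },   // assumption
        };

    public ConnectionState Current { get; private set; } = ConnectionState.Disconnected;

    // UTC time of the last successful Register; null while not connected.
    public DateTime? ConnectedSince { get; private set; }

    // Hypothetical hook: the Host would emit a DriverHealth update from here.
    public event Action<ConnectionState, ConnectionState> StateChanged;

    public bool TryTransition(ConnectionState target)
    {
        bool allowed = target == ConnectionState.Error || Next[Current] == target;
        if (!allowed) return false;

        ConnectionState previous = Current;
        Current = target;
        ConnectedSince = target == ConnectionState.Connected ? DateTime.UtcNow : (DateTime?)null;
        StateChanged?.Invoke(previous, target);
        return true;
    }
}
```

Rejecting an illegal transition with a return value rather than an exception is a design choice of the sketch; the requirement does not say how invalid transitions are handled.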

MXA-003: Tag Subscription

The Host shall support subscribing to tags via AddItem + AdviseSupervisory, receiving value updates through OnDataChange callbacks.

Acceptance Criteria

  • Subscribe sequence: AddItem(handle, address) returns item handle, then AdviseSupervisory(handle, itemHandle) starts the subscription.
  • OnDataChange callback delivers value, quality, timestamp, and MXSTATUS_PROXY array.
  • Item address format: tag_name.AttributeName for scalars, tag_name.AttributeName[] for whole arrays.
  • AddItem failure → Warning logged, failure propagated over the pipe to the Proxy.
  • Bidirectional maps of address ↔ itemHandle maintained for callback resolution.
  • Multi-client refcounting: two Proxy-side subscribe calls for the same address produce one MXAccess subscription. Each unsubscribe decrements the refcount, and only the last one (refcount reaching zero) triggers UnAdvise / RemoveItem.

Details

  • AdviseSupervisory (not Advise) is used because this is a background service without an interactive user session.
  • Stored subscriptions dictionary maps address → callback for reconnect replay.
  • On reconnect, every entry in stored subscriptions is re-subscribed (AddItem + AdviseSupervisory with new handles).
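
The refcounting, handle maps, and reconnect replay described above could be kept in one table. The sketch below is illustrative only: SubscriptionTable and its members are hypothetical names, the two delegates stand in for the AddItem + AdviseSupervisory and UnAdvise + RemoveItem call pairs, and the per-address callback that the stored-subscriptions dictionary holds is left out for brevity. It assumes all calls arrive on the STA thread, so it takes no locks.

```csharp
using System;
using System.Collections.Generic;

public sealed class SubscriptionTable
{
    private readonly Func<string, int> _addAndAdvise;      // AddItem + AdviseSupervisory, returns item handle
    private readonly Action<int> _unAdviseAndRemove;       // UnAdvise + RemoveItem
    private readonly Dictionary<string, int> _handleByAddress = new Dictionary<string, int>();
    private readonly Dictionary<int, string> _addressByHandle = new Dictionary<int, string>();
    private readonly Dictionary<string, int> _refCounts = new Dictionary<string, int>();

    public SubscriptionTable(Func<string, int> addAndAdvise, Action<int> unAdviseAndRemove)
    {
        _addAndAdvise = addAndAdvise;
        _unAdviseAndRemove = unAdviseAndRemove;
    }

    public void Subscribe(string address)
    {
        int count;
        if (_refCounts.TryGetValue(address, out count))
        {
            _refCounts[address] = count + 1;   // already advised: only bump the refcount
            return;
        }
        Bind(address);
        _refCounts[address] = 1;
    }

    public void Unsubscribe(string address)
    {
        int count;
        if (!_refCounts.TryGetValue(address, out count)) return;
        if (count > 1) { _refCounts[address] = count - 1; return; }

        // Last reference released: tear down the single MXAccess subscription.
        int handle = _handleByAddress[address];
        _unAdviseAndRemove(handle);
        _handleByAddress.Remove(address);
        _addressByHandle.Remove(handle);
        _refCounts.Remove(address);
    }

    // Used from OnDataChange to map an item handle back to its address.
    public bool TryResolve(int itemHandle, out string address)
    {
        return _addressByHandle.TryGetValue(itemHandle, out address);
    }

    // After reconnect the old handles are invalid: re-advise each address once, keeping refcounts.
    public void ReplayAfterReconnect()
    {
        _handleByAddress.Clear();
        _addressByHandle.Clear();
        foreach (string address in _refCounts.Keys) Bind(address);
    }

    private void Bind(string address)
    {
        int handle = _addAndAdvise(address);
        _handleByAddress[address] = handle;
        _addressByHandle[handle] = address;
    }
}
```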

MXA-004: Tag Read/Write

The Host shall support synchronous-style read and write operations, marshalled to the STA thread, with configurable timeouts.

Acceptance Criteria

  • Read pattern: prefer cached subscription value; fall back to subscribe-get-first-value-unsubscribe (AddItem → AdviseSupervisory → wait for OnDataChange → UnAdvise → RemoveItem).
  • Write: AddItem → AdviseSupervisory → Write() → await OnWriteComplete callback → cleanup.
  • Read timeout: Galaxy:ReadTimeoutSeconds in driver config (default 5 seconds) — enforced on the Host side in addition to the Proxy-side Polly Timeout leg.
  • Write timeout: Galaxy:WriteTimeoutSeconds (default 5 seconds), likewise enforced on the Host side in addition to the Proxy-side Polly Timeout leg.
  • Concurrent operation limit: configurable semaphore (Galaxy:MaxConcurrentOperations, default 10).
  • All operations marshalled to the STA thread.

Details

  • Write passes security classification -1 (no security) to MXAccess. The Galaxy runtime still enforces its own security; OtOpcUa authorization is enforced server-side before the call ever reaches the pipe (per OPC-014 AuthorizationGate).
  • OnWriteComplete: check MXSTATUS_PROXY.success. If 0, extract detail code and propagate as an error over the pipe.
  • COM exceptions translated to meaningful error messages.
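
The Host-side timeout and the Galaxy:MaxConcurrentOperations limit could be combined in one gate, roughly as below. OperationGate is a hypothetical name and the shape is an assumption; the operation delegate stands in for the STA-marshalled read or write sequence. Note that a timeout here only stops waiting: the caller still has to run the UnAdvise / RemoveItem cleanup for the abandoned operation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class OperationGate
{
    private readonly SemaphoreSlim _slots;
    private readonly TimeSpan _timeout;

    public OperationGate(int maxConcurrentOperations, TimeSpan timeout)
    {
        _slots = new SemaphoreSlim(maxConcurrentOperations, maxConcurrentOperations);
        _timeout = timeout;
    }

    public async Task<T> RunAsync<T>(Func<Task<T>> operation)
    {
        await _slots.WaitAsync().ConfigureAwait(false);   // wait for a free slot
        try
        {
            using (var cts = new CancellationTokenSource())
            {
                Task<T> work = operation();
                Task winner = await Task.WhenAny(work, Task.Delay(_timeout, cts.Token)).ConfigureAwait(false);
                cts.Cancel();                             // stop the timer if the work finished first
                if (winner != work)
                    throw new TimeoutException("MXAccess operation exceeded " + _timeout.TotalSeconds + " s.");
                return await work.ConfigureAwait(false);  // propagates the result or the operation's exception
            }
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

With the defaults from the acceptance criteria, the gate would be constructed as `new OperationGate(10, TimeSpan.FromSeconds(5))`.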

MXA-005: Auto-Reconnect

The Host shall monitor connection health and automatically reconnect on failure, replaying all stored subscriptions after reconnect.

Acceptance Criteria

  • Monitor loop runs on a background thread at Galaxy:MonitorIntervalSeconds (default 5 seconds).
  • On disconnect, attempt reconnect. On success, replay all stored subscriptions.
  • On reconnect failure, log Warning and retry at next interval (no exponential backoff inside the Host; the Proxy-side Polly pipeline handles cross-process backoff against pipe failures).
  • Reconnect count is incremented on each successful reconnect.
  • Monitor loop is cancellable for clean Host shutdown.

Details

  • Reconnect cleans up old COM objects before creating new ones.
  • After reconnect, probe subscription (MXA-006) is re-established first, then stored subscriptions.
  • No max retry limit — keep trying indefinitely until the Host service is stopped.
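
A fixed-interval, cancellable monitor loop along these lines would satisfy the criteria above. ReconnectMonitor and its delegate parameters are hypothetical; tryReconnect is assumed to encapsulate the COM cleanup, Register, probe re-subscription, and replay of stored subscriptions, returning true on success.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ReconnectMonitor
{
    private readonly TimeSpan _interval;
    private readonly Func<bool> _isConnected;
    private readonly Func<bool> _tryReconnect;
    private readonly Action<string> _logWarning;
    private int _reconnectCount;

    public ReconnectMonitor(TimeSpan interval, Func<bool> isConnected,
                            Func<bool> tryReconnect, Action<string> logWarning)
    {
        _interval = interval;
        _isConnected = isConnected;
        _tryReconnect = tryReconnect;
        _logWarning = logWarning;
    }

    public int ReconnectCount { get { return Volatile.Read(ref _reconnectCount); } }

    public async Task RunAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            if (!_isConnected())
            {
                string failure = null;
                try { if (!_tryReconnect()) failure = "reconnect returned false"; }
                catch (Exception ex) { failure = ex.Message; }

                if (failure == null) Interlocked.Increment(ref _reconnectCount);   // successes only
                else _logWarning("Reconnect failed (" + failure + "); retrying at next interval.");
            }

            // Fixed interval, no backoff and no retry cap, per MXA-005.
            try { await Task.Delay(_interval, ct).ConfigureAwait(false); }
            catch (OperationCanceledException) { return; }
        }
    }
}
```

The loop would typically be started with `Task.Run(() => monitor.RunAsync(token))` so it stays off the STA thread, and stopped by cancelling the token during Host shutdown.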

MXA-006: Probe-Based Health Monitoring

The Host shall optionally subscribe to a configurable probe tag and use OnDataChange callback staleness to detect silent connection failures.

Acceptance Criteria

  • Probe tag address configured via Galaxy:ProbeTag. If unset, probe monitoring is disabled.
  • Track _lastProbeValueTime (UTC) updated on each OnDataChange for the probe tag.
  • If DateTime.UtcNow - _lastProbeValueTime > staleThreshold, force disconnect and reconnect.
  • Stale threshold: Galaxy:ProbeStaleThresholdSeconds (default 60 seconds).
  • The Proxy implements IHostConnectivityProbe so that the core's CapabilityInvoker records probe outcomes with DriverCapability.Probe telemetry.

Details

  • The probe tag should be an attribute the Galaxy runtime updates regularly (platform heartbeat, area timestamp). Specific tag is site-dependent.
  • After forced reconnect, reset _lastProbeValueTime to DateTime.UtcNow.
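
The staleness check reduces to a timestamp comparison. In the sketch below, ProbeWatchdog and the injected clock are assumptions made so the logic can be exercised without real time passing; the timestamp is stored as ticks and accessed through Interlocked because a 64-bit DateTime field is not guaranteed to be read atomically in an x86 process, where the callback thread and the monitor thread may both touch it.

```csharp
using System;
using System.Threading;

public sealed class ProbeWatchdog
{
    private readonly TimeSpan _staleThreshold;
    private readonly Func<DateTime> _utcNow;     // injected clock (assumption, for testability)
    private long _lastProbeValueTicks;           // plays the role of _lastProbeValueTime

    public ProbeWatchdog(TimeSpan staleThreshold, Func<DateTime> utcNow)
    {
        _staleThreshold = staleThreshold;
        _utcNow = utcNow;
        Touch();
    }

    // Call from OnDataChange when the callback belongs to the probe tag.
    public void OnProbeDataChange() { Touch(); }

    // Call after a forced reconnect so the fresh connection gets a full threshold window.
    public void ResetAfterReconnect() { Touch(); }

    // True when now - last probe update > threshold; the monitor loop then forces a reconnect.
    public bool IsStale
    {
        get
        {
            long last = Interlocked.Read(ref _lastProbeValueTicks);
            return _utcNow().Ticks - last > _staleThreshold.Ticks;
        }
    }

    private void Touch()
    {
        Interlocked.Exchange(ref _lastProbeValueTicks, _utcNow().Ticks);
    }
}
```

In production the clock argument would simply be `() => DateTime.UtcNow`, with the threshold taken from Galaxy:ProbeStaleThresholdSeconds.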

MXA-007: COM Cleanup

On disconnect or disposal, the Host shall unwire event handlers, unadvise/remove all items, unregister, and release COM objects via Marshal.ReleaseComObject.

Acceptance Criteria

  • Cleanup order: UnAdvise all active subscriptions → RemoveItem all items → unwire OnDataChange and OnWriteComplete handlers → Unregister → Marshal.ReleaseComObject.
  • On dispose: run disconnect if still connected, then dispose STA thread.
  • Each cleanup step wrapped in try/catch (cleanup must not throw).
  • After cleanup: handle maps cleared, pending write TaskCompletionSource (TCS) entries abandoned, COM reference set to null.

Details

  • Stored subscriptions are NOT cleared on disconnect (preserved for reconnect replay). Only cleared on Dispose.
  • Event handlers unwired BEFORE Unregister (else callbacks may fire on a dead object).
  • Marshal.ReleaseComObject in a finally block, always.
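
One way to guarantee both the ordering and the "cleanup must not throw" rule is to run the steps through a small sequencer that isolates each one. CleanupSequence and the delegate parameters below are hypothetical; in the Host the delegates would wrap the real MXAccess calls and Marshal.ReleaseComObject.

```csharp
using System;
using System.Collections.Generic;

public sealed class CleanupSequence
{
    private readonly List<KeyValuePair<string, Action>> _steps = new List<KeyValuePair<string, Action>>();

    public CleanupSequence Then(string name, Action step)
    {
        _steps.Add(new KeyValuePair<string, Action>(name, step));
        return this;
    }

    // Runs every step in order. A failing step is logged and counted but never
    // prevents the later steps from running, so the COM release always executes.
    public int Run(Action<string, Exception> logWarning)
    {
        int failures = 0;
        foreach (KeyValuePair<string, Action> step in _steps)
        {
            try { step.Value(); }
            catch (Exception ex) { failures++; logWarning(step.Key, ex); }
        }
        return failures;
    }

    // The MXA-007 order: handlers are unwired before Unregister, release comes last.
    public static CleanupSequence ForDisconnect(Action unAdviseAll, Action removeAllItems,
        Action unwireHandlers, Action unregister, Action releaseComObject, Action clearState)
    {
        return new CleanupSequence()
            .Then("UnAdvise", unAdviseAll)
            .Then("RemoveItem", removeAllItems)
            .Then("Unwire handlers", unwireHandlers)
            .Then("Unregister", unregister)
            .Then("ReleaseComObject", releaseComObject)
            .Then("Clear state", clearState);   // handle maps, pending writes, COM reference
    }
}
```

Because every step is caught individually, the release step behaves like code in a finally block: it runs even when Unregister throws.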

MXA-008: Operation Metrics

The Host shall record timing and success/failure for Read, Write, and Subscribe operations.

Acceptance Criteria

  • Each operation records duration (ms) + success/failure.
  • Metrics exposed over the pipe to the Proxy, which re-publishes them via OpenTelemetry → Prometheus under DriverInstanceId = "galaxy-*", HostName = "galaxy.host".
  • Rolling 1000-entry buffer for percentile calculation.
  • Uses an ITimingScope pattern: using (var scope = metrics.BeginOperation("read")) { ... }.
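
A rolling buffer with a disposable timing scope might look like the sketch below. Only the ITimingScope name, the BeginOperation call shape, and the 1000-entry size come from this document; the MarkFailed member, the per-operation series, and the nearest-rank percentile method are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public interface ITimingScope : IDisposable
{
    void MarkFailed();   // assumed member: scopes count as success unless marked
}

public sealed class OperationMetrics
{
    private sealed class Series
    {
        public readonly double[] DurationsMs = new double[1000];   // rolling window
        public int Next, Count;
        public long Successes, Failures;
    }

    private readonly Dictionary<string, Series> _series = new Dictionary<string, Series>();
    private readonly object _gate = new object();

    public ITimingScope BeginOperation(string name) { return new Scope(this, name); }

    public void Record(string name, double durationMs, bool success)
    {
        lock (_gate)
        {
            Series s;
            if (!_series.TryGetValue(name, out s)) { s = new Series(); _series[name] = s; }
            s.DurationsMs[s.Next] = durationMs;                    // overwrite the oldest entry when full
            s.Next = (s.Next + 1) % s.DurationsMs.Length;
            if (s.Count < s.DurationsMs.Length) s.Count++;
            if (success) s.Successes++; else s.Failures++;
        }
    }

    public long Failures(string name)
    {
        lock (_gate) { Series s; return _series.TryGetValue(name, out s) ? s.Failures : 0; }
    }

    // Nearest-rank percentile over the retained samples; returns 0 when there are none.
    public double Percentile(string name, double p)
    {
        lock (_gate)
        {
            Series s;
            if (!_series.TryGetValue(name, out s) || s.Count == 0) return 0;
            var sorted = new double[s.Count];
            Array.Copy(s.DurationsMs, sorted, s.Count);
            Array.Sort(sorted);
            int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
            return sorted[Math.Max(rank, 1) - 1];
        }
    }

    private sealed class Scope : ITimingScope
    {
        private readonly OperationMetrics _owner;
        private readonly string _name;
        private readonly Stopwatch _watch = Stopwatch.StartNew();
        private bool _failed, _disposed;

        public Scope(OperationMetrics owner, string name) { _owner = owner; _name = name; }
        public void MarkFailed() { _failed = true; }

        public void Dispose()
        {
            if (_disposed) return;
            _disposed = true;
            _owner.Record(_name, _watch.Elapsed.TotalMilliseconds, !_failed);
        }
    }
}
```

A failing operation would call `scope.MarkFailed()` inside the using block (for example from a catch clause) before the scope is disposed.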

MXA-009: Error Code Translation

The Host shall translate known MXAccess error codes from MXSTATUS_PROXY.detail into human-readable messages for logging and OPC UA status propagation.

Acceptance Criteria

  • Error 1008 → "User lacks security permission"
  • Error 1012 → "Secured write required (one signature)"
  • Error 1013 → "Verified write required (two signatures)"
  • Unknown error codes logged with their numeric value.
  • Translated messages flow back through the pipe and surface in OPC UA StatusCode descriptions and Server logs.
  • Errors 1008 / 1012 / 1013 on write operations map to Bad_UserAccessDenied at the OPC UA surface.
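
The mapping above fits in a lookup table. MxErrorTranslator is a hypothetical name, and treating the detail code as an int is an assumption about how MXSTATUS_PROXY.detail is surfaced to managed code.

```csharp
using System.Collections.Generic;

public static class MxErrorTranslator
{
    private static readonly Dictionary<int, string> Known = new Dictionary<int, string>
    {
        { 1008, "User lacks security permission" },
        { 1012, "Secured write required (one signature)" },
        { 1013, "Verified write required (two signatures)" },
    };

    // Human-readable text for logs and status descriptions; unknown codes keep their numeric value.
    public static string Describe(int detail)
    {
        string message;
        return Known.TryGetValue(detail, out message)
            ? message
            : "Unknown MXAccess error " + detail;
    }

    // True for the write errors that surface as Bad_UserAccessDenied at the OPC UA layer.
    public static bool MapsToUserAccessDenied(int detail)
    {
        return Known.ContainsKey(detail);
    }
}
```

The sketch deliberately does not choose an OPC UA status for unknown codes, since the requirement leaves that unspecified.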

MXA-010: Proxy-Side Capability Wrapping

Driver.Galaxy.Proxy shall implement the capability interfaces as thin forwarders that serialize every call through the named pipe and route every call through CapabilityInvoker.

Acceptance Criteria

  • Driver.Galaxy.Proxy implements IDriver + IReadable + IWritable + ISubscribable + ITagDiscovery + IRediscoverable + IAlarmSource + IHistoryProvider + IHostConnectivityProbe.
  • Each implementation uses CapabilityInvoker.InvokeAsync(DriverCapability.<...>, …) — direct pipe calls bypassing the invoker are caught by Roslyn OTOPCUA0001.
  • Each method serializes a MessagePack request frame, sends over the pipe, awaits the response frame, deserializes, returns.
  • Pipe disconnect mid-call → CapabilityInvoker's circuit breaker counts the failure; sustained disconnect opens the circuit and Galaxy nodes surface Bad quality until the pipe reconnects.
  • Proxy tolerates Host service restarts — it automatically reconnects and replays subscription setup (parallel to MXA-005 but across the IPC boundary).

MXA-011: Pipe Security

The named pipe between Proxy and Host shall be restricted to the Server's runtime principal via SID-based ACL and authenticated with a per-process shared secret.

Acceptance Criteria

  • Pipe name from OTOPCUA_GALAXY_PIPE environment variable; default OtOpcUaGalaxy.
  • Allowed SID passed as OTOPCUA_ALLOWED_SID — only the declared principal (typically the Server service account) can open the pipe; Administrators is explicitly NOT granted (per the project_galaxy_host_installed memory note).
  • Shared secret passed via OTOPCUA_GALAXY_SECRET at spawn time; the Proxy must present the matching secret on the opening handshake.
  • Secret is process-scoped (regenerated per Host restart) and never persisted to disk or Config DB.
  • Pipe ACL denials are logged as Warning with the rejected principal SID.

Details

  • Environment variables are passed by the supervisor launching the Host (docs/v2/driver-stability.md).
  • Dev-box secret is stored at .local/galaxy-host-secret.txt for NSSM-wrapped development runs (memory note: project_galaxy_host_installed).
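
The secret handling on both sides of the handshake could be sketched as follows. Everything here is an assumption for illustration: PipeHandshake is a hypothetical name, the 32-byte length and Base64 encoding are arbitrary choices, and the comparison without early exit is a common hardening practice rather than a stated requirement. The SID-based pipe ACL is not shown.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PipeHandshake
{
    // Supervisor side: a fresh random secret per Host launch, handed over via
    // OTOPCUA_GALAXY_SECRET and never written to disk or the Config DB.
    public static string NewSecret()
    {
        var bytes = new byte[32];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return Convert.ToBase64String(bytes);
    }

    // Host side: compare the presented secret against the expected one.
    // Every overlapping byte is inspected, so timing does not reveal the
    // position of the first mismatch.
    public static bool SecretMatches(string expected, string presented)
    {
        if (string.IsNullOrEmpty(expected) || presented == null) return false;

        byte[] a = Encoding.UTF8.GetBytes(expected);
        byte[] b = Encoding.UTF8.GetBytes(presented);
        int diff = a.Length ^ b.Length;
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++) diff |= a[i] ^ b[i];
        return diff == 0;
    }
}
```

On the Host, the expected value would come from `Environment.GetEnvironmentVariable("OTOPCUA_GALAXY_SECRET")`, and a failed match would close the pipe connection before any request frame is processed.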