Galaxy Driver — MXAccess Client Requirements
Revision: Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). Scope narrowed: this document covers the MXAccess surface inside `OtOpcUa.Galaxy.Host` (.NET Framework 4.8 x86 Windows service). The in-server `Driver.Galaxy.Proxy` implements the `IReadable` / `IWritable` / `ISubscribable` / `IAlarmSource` / `IHistoryProvider` capability interfaces and routes every wire call through the named pipe to this Host process. The STA thread + reconnect playback + subscription refcount requirements from v1 are preserved; what changed is where they live (Host service, not the Server process). MXA-010 (proxy-side wrapping) and MXA-011 (pipe ACL / shared secret) are new.
Parent: HLR-002, HLR-005, HLR-007
Driver scope: Galaxy only. Process scope: OtOpcUa.Galaxy.Host (Host side) and Driver.Galaxy.Proxy (server-side forwarder).
MXA-001: STA Thread with Message Pump
All MXAccess COM objects shall be created and called on a dedicated STA thread running a Win32 message pump to ensure COM callbacks are delivered.
Acceptance Criteria
- A dedicated thread is created with `ApartmentState.STA` before any MXAccess COM object is instantiated; implementation lives in `StaPump` inside `OtOpcUa.Galaxy.Host`.
- The thread runs a Win32 message pump using `GetMessage`/`TranslateMessage`/`DispatchMessage`.
- Work items are marshalled to the STA thread via `PostThreadMessage(WM_APP)` and a concurrent queue.
- All COM object creation (`LMXProxyServer`), method calls, and event callbacks happen on this thread.
- Thread name `Galaxy.Sta` (for diagnostics).
Details
- If the STA thread dies unexpectedly, log Fatal and trigger Host service shutdown. The supervisor restarts the Host under its driver-stability policy (`docs/v2/driver-stability.md`). COM objects on the dead thread are unrecoverable; no in-process recovery is attempted.
- `RunAsync(Action)` returns a `Task` that completes when the action executes on the STA thread. Callers can `await` it.
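The marshalling contract above (queue work to one named thread, hand the caller something awaitable) can be illustrated with a minimal, language-neutral sketch. It is written in Python for brevity and is not the Host's `StaPump`: there is no COM and no Win32 message loop here, a blocking queue stands in for the pump, and `StaWorker` / `run_async` are hypothetical names.

```python
import queue
import threading
from concurrent.futures import Future


class StaWorker:
    """Runs every submitted callable on one dedicated, named thread."""

    def __init__(self, name="Galaxy.Sta"):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._pump, name=name, daemon=True)
        self._thread.start()

    def _pump(self):
        # Stand-in for the message loop: block until a work item arrives.
        while True:
            item = self._queue.get()
            if item is None:          # sentinel posted by stop()
                return
            func, future = item
            try:
                future.set_result(func())
            except Exception as exc:  # surface the failure to the caller
                future.set_exception(exc)

    def run_async(self, func):
        """Queue func for the worker thread; the Future completes after it runs."""
        future = Future()
        self._queue.put((func, future))
        return future

    def stop(self):
        self._queue.put(None)
        self._thread.join()
```

A caller would use `worker.run_async(fn).result(timeout=...)`, which plays the role of awaiting the returned `Task`.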
MXA-002: Connection Lifecycle
The Host shall support Register/Unregister lifecycle with the LMXProxyServer COM object, tracking the connection handle.
Acceptance Criteria
- `Register(clientName)` is called on the STA thread and returns a positive connection handle on success.
- Handle ≤ 0 → descriptive error thrown; Host reports `DriverHealth.Unavailable` via the pipe so the Proxy reports Bad quality to the core.
- `Unregister(handle)` is called during disconnect after all subscriptions are removed.
- Client name comes from the `OTOPCUA_GALAXY_CLIENT_NAME` environment variable; default `OtOpcUa-Galaxy.Host`. Must be unique per MXAccess registration (a cluster's Primary and Secondary each get their own client-name suffix via node override).
- Connection state transitions: Disconnected → Connecting → Connected → Disconnecting → Disconnected (and Error from any state).
Details
- `ConnectedSince` (UTC) recorded after successful Register.
- `ReconnectCount` tracked for diagnostics and `/metrics`.
- State changes are emitted over the pipe as `DriverHealth` updates.
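As an illustration of the lifecycle rules in MXA-002, the sketch below encodes the listed state transitions and the positive-handle check. It is a Python model, not the Host code. The names `ConnState`, `can_transition` and `check_register_result` are hypothetical, and the exit from Error (Error → Connecting) is an assumption, since the requirement only says Error is reachable from any state.

```python
from enum import Enum


class ConnState(Enum):
    DISCONNECTED = "Disconnected"
    CONNECTING = "Connecting"
    CONNECTED = "Connected"
    DISCONNECTING = "Disconnecting"
    ERROR = "Error"


# Forward path from the requirement. ERROR -> CONNECTING is an assumption:
# the requirement does not say how the Error state is left.
_ALLOWED = {
    ConnState.DISCONNECTED: {ConnState.CONNECTING},
    ConnState.CONNECTING: {ConnState.CONNECTED},
    ConnState.CONNECTED: {ConnState.DISCONNECTING},
    ConnState.DISCONNECTING: {ConnState.DISCONNECTED},
    ConnState.ERROR: {ConnState.CONNECTING},
}


def can_transition(current, target):
    """Error is reachable from any state; everything else follows _ALLOWED."""
    if target is ConnState.ERROR:
        return True
    return target in _ALLOWED[current]


def check_register_result(handle):
    """Register must return a positive handle; anything else is a failure."""
    if handle <= 0:
        raise ConnectionError(f"Register returned non-positive handle {handle}")
    return handle
```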
MXA-003: Tag Subscription
The Host shall support subscribing to tags via AddItem + AdviseSupervisory, receiving value updates through OnDataChange callbacks.
Acceptance Criteria
- Subscribe sequence: `AddItem(handle, address)` returns item handle, then `AdviseSupervisory(handle, itemHandle)` starts the subscription.
- `OnDataChange` callback delivers value, quality, timestamp, and MXSTATUS_PROXY array.
- Item address format: `tag_name.AttributeName` for scalars, `tag_name.AttributeName[]` for whole arrays.
- AddItem failure → Warning logged, failure propagated over the pipe to the Proxy.
- Bidirectional maps of `address ↔ itemHandle` maintained for callback resolution.
- Multi-client refcounting: two Proxy-side subscribe calls for the same address produce one MXAccess subscription; refcount decrement on the last unsubscribe triggers `UnAdvise`/`RemoveItem`.
Details
- `AdviseSupervisory` (not `Advise`) is used because this is a background service without an interactive user session.
- Stored subscriptions dictionary maps address → callback for reconnect replay.
- On reconnect, every entry in stored subscriptions is re-subscribed (AddItem + AdviseSupervisory with new handles).
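The refcounting and bidirectional-map rules in MXA-003 can be sketched as follows. This is a Python model under stated assumptions: `SubscriptionTable` and `FakeClient` are hypothetical names, and the client methods only mirror the MXAccess call names from the text (`AddItem`, `AdviseSupervisory`, `UnAdvise`, `RemoveItem`) without touching COM.

```python
class SubscriptionTable:
    """Refcounted subscriptions over a client exposing the four MXAccess-style calls."""

    def __init__(self, client, conn_handle):
        self._client = client
        self._conn = conn_handle
        self._refcount = {}            # address -> number of subscribers
        self._handle_by_address = {}
        self._address_by_handle = {}

    def subscribe(self, address):
        count = self._refcount.get(address, 0)
        if count == 0:
            # First subscriber: create the single underlying subscription.
            item = self._client.add_item(self._conn, address)
            self._client.advise_supervisory(self._conn, item)
            self._handle_by_address[address] = item
            self._address_by_handle[item] = address
        self._refcount[address] = count + 1

    def unsubscribe(self, address):
        count = self._refcount.get(address, 0)
        if count == 0:
            return                     # nothing to release
        if count > 1:
            self._refcount[address] = count - 1
            return
        # Last subscriber gone: tear the underlying subscription down.
        item = self._handle_by_address.pop(address)
        del self._address_by_handle[item]
        del self._refcount[address]
        self._client.unadvise(self._conn, item)
        self._client.remove_item(self._conn, item)

    def address_for(self, item_handle):
        """Resolve a callback's item handle back to its address."""
        return self._address_by_handle.get(item_handle)


class FakeClient:
    """Hypothetical stand-in that records calls instead of talking to MXAccess."""

    def __init__(self):
        self.calls = []
        self._next = 100

    def add_item(self, conn, address):
        self._next += 1
        self.calls.append(("add_item", address))
        return self._next

    def advise_supervisory(self, conn, item):
        self.calls.append(("advise_supervisory", item))

    def unadvise(self, conn, item):
        self.calls.append(("unadvise", item))

    def remove_item(self, conn, item):
        self.calls.append(("remove_item", item))
```

Two `subscribe` calls for the same address leave exactly one `add_item` in `FakeClient.calls`; only the second `unsubscribe` produces the `unadvise` / `remove_item` pair.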
MXA-004: Tag Read/Write
The Host shall support synchronous-style read and write operations, marshalled to the STA thread, with configurable timeouts.
Acceptance Criteria
- Read pattern: prefer cached subscription value; fall back to subscribe-get-first-value-unsubscribe (AddItem → AdviseSupervisory → wait for OnDataChange → UnAdvise → RemoveItem).
- Write: AddItem → AdviseSupervisory → `Write()` → await `OnWriteComplete` callback → cleanup.
- Read timeout: `Galaxy:ReadTimeoutSeconds` in driver config (default 5 seconds), enforced on the Host side in addition to the Proxy-side Polly `Timeout` leg.
- Write timeout: `Galaxy:WriteTimeoutSeconds` (default 5 seconds), enforced similarly.
- Concurrent operation limit: configurable semaphore (`Galaxy:MaxConcurrentOperations`, default 10).
- All operations marshalled to the STA thread.
Details
- Write uses security classification `-1` (no security). Galaxy runtime enforces security; OtOpcUa authorization is enforced server-side before the call ever reaches the pipe (per OPC-014 `AuthorizationGate`).
- `OnWriteComplete`: check `MXSTATUS_PROXY.success`. If 0, extract detail code and propagate as an error over the pipe.
- COM exceptions translated to meaningful error messages.
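A minimal sketch of the MXA-004 read pattern, in Python and without COM: prefer the cached value, otherwise subscribe, wait for the first value up to a timeout, and always unsubscribe. `Reader`, `ReadTimeout`, and the `subscribe` / `unsubscribe` callables are hypothetical stand-ins for the AddItem/AdviseSupervisory and UnAdvise/RemoveItem steps. Letting cached reads bypass the semaphore is an assumption of this sketch.

```python
import threading


class ReadTimeout(Exception):
    pass


class Reader:
    def __init__(self, subscribe, unsubscribe, timeout_s=5.0, max_concurrent=10):
        self._subscribe = subscribe        # subscribe(address, on_value)
        self._unsubscribe = unsubscribe    # unsubscribe(address)
        self._timeout_s = timeout_s
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self.cache = {}                    # address -> last value from a live subscription

    def read(self, address):
        if address in self.cache:
            return self.cache[address]     # preferred path: no extra round trip
        with self._slots:                  # cap concurrent one-shot operations
            arrived = threading.Event()
            box = []

            def on_value(value):
                box.append(value)
                arrived.set()

            self._subscribe(address, on_value)
            try:
                if not arrived.wait(self._timeout_s):
                    raise ReadTimeout(f"no value for {address} within {self._timeout_s}s")
                return box[0]
            finally:
                self._unsubscribe(address)  # cleanup runs on success and on timeout
```

The `finally` clause is the point of the sketch: a timed-out read must not leave a one-shot subscription behind.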
MXA-005: Auto-Reconnect
The Host shall monitor connection health and automatically reconnect on failure, replaying all stored subscriptions after reconnect.
Acceptance Criteria
- Monitor loop runs on a background thread at `Galaxy:MonitorIntervalSeconds` (default 5 seconds).
- On disconnect, attempt reconnect. On success, replay all stored subscriptions.
- On reconnect failure, log Warning and retry at next interval (no exponential backoff inside the Host; the Proxy-side Polly pipeline handles cross-process backoff against pipe failures).
- Reconnect count is incremented on each successful reconnect.
- Monitor loop is cancellable for clean Host shutdown.
Details
- Reconnect cleans up old COM objects before creating new ones.
- After reconnect, probe subscription (MXA-006) is re-established first, then stored subscriptions.
- No max retry limit — keep trying indefinitely until the Host service is stopped.
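The monitor behaviour in MXA-005 (fixed interval, no backoff, unlimited retries, probe first on replay, cancellable) can be modelled with the Python sketch below. The `client` object and its `is_connected` / `reconnect` / `subscribe` methods are hypothetical, as is the `monitor_loop` name. The real loop runs inside the .NET Host.

```python
import threading


def monitor_loop(client, stored_subscriptions, stop_event,
                 interval_s=5.0, probe_tag=None, log=print):
    """Fixed-interval health check; returns the reconnect count once stopped."""
    reconnects = 0
    while not stop_event.is_set():
        if not client.is_connected():
            try:
                client.reconnect()
            except Exception as exc:
                # No backoff and no retry cap: just wait for the next tick.
                log(f"WARN reconnect failed: {exc}")
            else:
                reconnects += 1              # counted on success only
                if probe_tag is not None:
                    client.subscribe(probe_tag)   # probe is re-established first
                for address in stored_subscriptions:
                    client.subscribe(address)     # replay with fresh handles
        # Event.wait doubles as a cancellable sleep for clean shutdown.
        stop_event.wait(interval_s)
    return reconnects
```

Setting `stop_event` from another thread ends the loop at the next wake-up, which models the "cancellable for clean Host shutdown" criterion.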
MXA-006: Probe-Based Health Monitoring
The Host shall optionally subscribe to a configurable probe tag and use OnDataChange callback staleness to detect silent connection failures.
Acceptance Criteria
- Probe tag address configured via `Galaxy:ProbeTag`. If unset, probe monitoring is disabled.
- Track `_lastProbeValueTime` (UTC) updated on each OnDataChange for the probe tag.
- If `DateTime.UtcNow - _lastProbeValueTime > staleThreshold`, force disconnect and reconnect.
- Stale threshold: `Galaxy:ProbeStaleThresholdSeconds` (default 60 seconds).
- Implements `IHostConnectivityProbe` on the Proxy side so the core's `CapabilityInvoker` records probe outcomes with `DriverCapability.Probe` telemetry.
Details
- The probe tag should be an attribute the Galaxy runtime updates regularly (platform heartbeat, area timestamp). Specific tag is site-dependent.
- After forced reconnect, reset `_lastProbeValueTime` to `DateTime.UtcNow`.
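The staleness rule in MXA-006 reduces to one timestamp comparison. The Python sketch below models it with an injectable clock so the threshold can be exercised without waiting. `ProbeMonitor` and its method names are hypothetical; only the comparison and the reset-after-reconnect step come from the requirement.

```python
from datetime import datetime, timedelta, timezone


class ProbeMonitor:
    def __init__(self, stale_threshold_s=60, now=lambda: datetime.now(timezone.utc)):
        self._threshold = timedelta(seconds=stale_threshold_s)
        self._now = now                        # injectable clock for tests
        self._last_probe_value_time = now()

    def on_data_change(self):
        """Call from the probe tag's data-change callback."""
        self._last_probe_value_time = self._now()

    def is_stale(self):
        # Strictly greater than, matching the '>' in the acceptance criterion.
        return self._now() - self._last_probe_value_time > self._threshold

    def on_forced_reconnect(self):
        # Restart the clock, as the Details section requires after a forced reconnect.
        self._last_probe_value_time = self._now()
```

A monitor loop would call `is_stale()` on each tick and, when it returns true, force the disconnect/reconnect and then call `on_forced_reconnect()`.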
MXA-007: COM Cleanup
On disconnect or disposal, the Host shall unwire event handlers, unadvise/remove all items, unregister, and release COM objects via Marshal.ReleaseComObject.
Acceptance Criteria
- Cleanup order: UnAdvise all active subscriptions → RemoveItem all items → unwire OnDataChange and OnWriteComplete handlers → Unregister → `Marshal.ReleaseComObject`.
- On dispose: run disconnect if still connected, then dispose STA thread.
- Each cleanup step wrapped in try/catch (cleanup must not throw).
- After cleanup: handle maps cleared, pending write TCS entries abandoned, COM reference set to null.
Details
- Stored subscriptions are NOT cleared on disconnect (preserved for reconnect replay). Only cleared on Dispose.
- Event handlers unwired BEFORE Unregister (else callbacks may fire on a dead object).
- `Marshal.ReleaseComObject` in a `finally` block, always.
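The ordering and "must not throw" rules in MXA-007 can be shown with a small Python model: each step is isolated in its own try/except, and the release step sits in a `finally` block. The `cleanup` function and the step names passed to it are hypothetical. In the Host the release step would be `Marshal.ReleaseComObject`.

```python
def cleanup(steps, release, log=print):
    """Run (name, callable) steps in order; a failing step never stops the rest."""
    try:
        for name, step in steps:
            try:
                step()
            except Exception as exc:
                log(f"WARN cleanup step {name!r} failed: {exc}")
    finally:
        # The release step always runs, and is itself not allowed to throw.
        try:
            release()
        except Exception as exc:
            log(f"WARN release failed: {exc}")
```

Passing the steps as an ordered list keeps the sequence (unadvise, remove items, unwire handlers, unregister) visible at the call site rather than buried in control flow.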
MXA-008: Operation Metrics
The MXAccess Host shall record timing and success/failure for Read, Write, and Subscribe operations.
Acceptance Criteria
- Each operation records duration (ms) + success/failure.
- Metrics exposed over the pipe to the Proxy, which re-publishes them via OpenTelemetry → Prometheus under `DriverInstanceId = "galaxy-*"`, `HostName = "galaxy.host"`.
- Rolling 1000-entry buffer for percentile calculation.
- Uses an `ITimingScope` pattern: `using (var scope = metrics.BeginOperation("read")) { ... }`.
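A rough Python analogue of the timing-scope and rolling-buffer criteria in MXA-008 is sketched below. `OperationMetrics` is a hypothetical name. Keeping one buffer per operation name and using the nearest-rank percentile method are assumptions of this sketch, since the requirement specifies neither.

```python
import math
import time
from collections import deque
from contextlib import contextmanager


class OperationMetrics:
    def __init__(self, capacity=1000):
        self._capacity = capacity
        self._samples = {}             # operation name -> deque of (duration_ms, ok)

    def record(self, operation, duration_ms, ok):
        buf = self._samples.setdefault(operation, deque(maxlen=self._capacity))
        buf.append((duration_ms, ok))  # oldest entry drops off once the buffer is full

    @contextmanager
    def begin_operation(self, operation):
        """Time the enclosed block; an exception marks the sample as failed."""
        start = time.perf_counter()
        ok = False
        try:
            yield
            ok = True
        finally:
            self.record(operation, (time.perf_counter() - start) * 1000.0, ok)

    def percentile(self, operation, pct):
        """Nearest-rank percentile over the retained durations, or None if empty."""
        durations = sorted(d for d, _ in self._samples.get(operation, ()))
        if not durations:
            return None
        rank = max(1, math.ceil(len(durations) * pct / 100))
        return durations[rank - 1]

    def failure_count(self, operation):
        return sum(1 for _, ok in self._samples.get(operation, ()) if not ok)
```

Usage mirrors the `using` scope from the criterion: `with metrics.begin_operation("read"): ...`.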
MXA-009: Error Code Translation
The Host shall translate known MXAccess error codes from MXSTATUS_PROXY.detail into human-readable messages for logging and OPC UA status propagation.
Acceptance Criteria
- Error 1008 → "User lacks security permission"
- Error 1012 → "Secured write required (one signature)"
- Error 1013 → "Verified write required (two signatures)"
- Unknown error codes logged with their numeric value.
- Translated messages flow back through the pipe and surface in OPC UA `StatusCode` descriptions and Server logs.
- Errors 1008 / 1012 / 1013 on write operations map to `Bad_UserAccessDenied` at the OPC UA surface.
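The translation table in MXA-009 is small enough to show directly. The Python sketch below uses only the three codes and messages listed above. The function names are hypothetical, and returning `None` for unmapped write failures reflects that this requirement does not say which status they surface as.

```python
MX_DETAIL_MESSAGES = {
    1008: "User lacks security permission",
    1012: "Secured write required (one signature)",
    1013: "Verified write required (two signatures)",
}

# Codes that surface as Bad_UserAccessDenied when the failed operation is a write.
ACCESS_DENIED_ON_WRITE = frozenset(MX_DETAIL_MESSAGES)


def translate_detail(code):
    """Return the known message, or a fallback that keeps the numeric value."""
    return MX_DETAIL_MESSAGES.get(code, f"Unknown MXAccess error {code}")


def write_status_name(code):
    """Status name for a failed write, or None when this requirement gives no mapping."""
    return "Bad_UserAccessDenied" if code in ACCESS_DENIED_ON_WRITE else None
```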
MXA-010: Proxy-Side Capability Wrapping
Driver.Galaxy.Proxy shall implement the capability interfaces as thin forwarders that serialize every call through the named pipe and route every call through CapabilityInvoker.
Acceptance Criteria
- `Driver.Galaxy.Proxy` implements `IDriver` + `IReadable` + `IWritable` + `ISubscribable` + `ITagDiscovery` + `IRediscoverable` + `IAlarmSource` + `IHistoryProvider` + `IHostConnectivityProbe`.
- Each implementation uses `CapabilityInvoker.InvokeAsync(DriverCapability.<...>, …)`; direct pipe calls bypassing the invoker are caught by Roslyn OTOPCUA0001.
- Each method serializes a MessagePack request frame, sends over the pipe, awaits the response frame, deserializes, returns.
- Pipe disconnect mid-call → `CapabilityInvoker`'s circuit breaker counts the failure; sustained disconnect opens the circuit and Galaxy nodes surface Bad quality until the pipe reconnects.
- Proxy tolerates Host service restarts: it automatically reconnects and replays subscription setup (parallel to MXA-005 but across the IPC boundary).
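The circuit-breaker criterion in MXA-010 can be pictured with a generic consecutive-failure breaker. This Python sketch is not `CapabilityInvoker`; its actual policy and thresholds are not stated here. The class name, the threshold of 3, and the reset-on-reconnect hook are all assumptions made for illustration.

```python
class CircuitOpenError(Exception):
    pass


class ConsecutiveFailureBreaker:
    def __init__(self, failure_threshold=3):
        self._threshold = failure_threshold    # hypothetical value
        self._failures = 0

    @property
    def is_open(self):
        return self._failures >= self._threshold

    def invoke(self, call):
        if self.is_open:
            # Fail fast; the caller would surface Bad quality for Galaxy nodes.
            raise CircuitOpenError("pipe circuit open")
        try:
            result = call()
        except ConnectionError:
            self._failures += 1                # one more pipe failure counted
            raise
        self._failures = 0                     # a success resets the count
        return result

    def on_pipe_reconnected(self):
        self._failures = 0                     # let calls through again
```

An isolated disconnect only increments the counter; the circuit opens after sustained failures, which matches the distinction the criterion draws between a mid-call disconnect and a sustained one.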
MXA-011: Pipe Security
The named pipe between Proxy and Host shall be restricted to the Server's runtime principal via SID-based ACL and authenticated with a per-process shared secret.
Acceptance Criteria
- Pipe name from the `OTOPCUA_GALAXY_PIPE` environment variable; default `OtOpcUaGalaxy`.
- Allowed SID passed as `OTOPCUA_ALLOWED_SID`; only the declared principal (typically the Server service account) can open the pipe. `Administrators` is explicitly NOT granted (per the `project_galaxy_host_installed` memory note).
- Shared secret passed via `OTOPCUA_GALAXY_SECRET` at spawn time; the Proxy must present the matching secret on the opening handshake.
- Secret is process-scoped (regenerated per Host restart) and never persisted to disk or Config DB.
- Pipe ACL denials are logged as Warning with the rejected principal SID.
Details
- Environment variables are passed by the supervisor launching the Host (`docs/v2/driver-stability.md`).
- Dev-box secret is stored at `.local/galaxy-host-secret.txt` for NSSM-wrapped development runs (memory note: `project_galaxy_host_installed`).
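The shared-secret part of MXA-011 can be sketched in a few lines of Python: a fresh random secret per process and a constant-time comparison on the opening handshake. The function names and the 32-byte length are assumptions. The pipe ACL itself is Windows-specific and is not modelled here.

```python
import hmac
import secrets


def new_process_secret():
    """Fresh random secret, meant to live in memory for one process lifetime."""
    return secrets.token_hex(32)    # 32 random bytes, hex-encoded


def handshake_ok(expected_secret, presented_secret):
    """Constant-time comparison, so timing does not reveal how much matched."""
    return hmac.compare_digest(expected_secret.encode(), presented_secret.encode())
```

In this model the supervisor would call `new_process_secret()` on each Host start and hand the value to both sides through the environment, so a restarted Host never accepts a secret from a previous run.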