Files
lmxopcua/docs/v1/reqs/MxAccessClientReqs.md
Joseph Doherty 006af51768 docs: post-PR-7.2 cleanup — audit + three-track scrub
Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
  text; leads with the multi-driver .NET 10 server identity and points
  at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
  Tier-C out-of-process spec with a Tier-A in-process description
  matching the current GalaxyDriver code, with the four-section
  GalaxyDriverOptions JSON shape pulled verbatim from
  Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
  current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
  docs/v2/Galaxy.ParityMatrix.md,
  docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
  " Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
  also fixes two dead links (`docs/Galaxy.Driver.md` and
  `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
  AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
  Subscriptions (top-level); drivers/Galaxy-Repository,
  drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
  reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
  v2 two-process deploy shape (Server + Admin + optional
  OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
  scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
  EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:59:59 -04:00

206 lines
12 KiB
Markdown

# Galaxy Driver — MXAccess Client Requirements
> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). Scope narrowed: this document covers the MXAccess surface **inside `OtOpcUa.Galaxy.Host`** (.NET Framework 4.8 x86 Windows service). The in-server `Driver.Galaxy.Proxy` implements the `IReadable` / `IWritable` / `ISubscribable` / `IAlarmSource` / `IHistoryProvider` capability interfaces and routes every wire call through the named pipe to this Host process. The STA thread + reconnect playback + subscription refcount requirements from v1 are preserved; what changed is where they live (Host service, not the Server process). MXA-010 (proxy-side wrapping) and MXA-011 (pipe ACL / shared secret) are new.
Parent: [HLR-002](HighLevelReqs.md#hlr-002-multi-driver-plug-in-model), [HLR-005](HighLevelReqs.md#hlr-005-live-data-access), [HLR-007](HighLevelReqs.md#hlr-007-service-hosting)
Driver scope: Galaxy only. Process scope: `OtOpcUa.Galaxy.Host` (Host side) and `Driver.Galaxy.Proxy` (server-side forwarder).
## MXA-001: STA Thread with Message Pump
All MXAccess COM objects shall be created and called on a dedicated STA thread running a Win32 message pump to ensure COM callbacks are delivered.
### Acceptance Criteria
- A dedicated thread is created with `ApartmentState.STA` before any MXAccess COM object is instantiated; implementation lives in `StaPump` inside `OtOpcUa.Galaxy.Host`.
- The thread runs a Win32 message pump using `GetMessage` / `TranslateMessage` / `DispatchMessage`.
- Work items are marshalled to the STA thread via `PostThreadMessage(WM_APP)` and a concurrent queue.
- All COM object creation (`LMXProxyServer`), method calls, and event callbacks happen on this thread.
- Thread name `Galaxy.Sta` (for diagnostics).
### Details
- If the STA thread dies unexpectedly, log Fatal and trigger Host service shutdown. The supervisor restarts the Host under its driver-stability policy (`docs/v2/driver-stability.md`). COM objects on the dead thread are unrecoverable; no in-process recovery is attempted.
- `RunAsync(Action)` returns a `Task` that completes when the action executes on the STA thread. Callers can `await` it.
---
## MXA-002: Connection Lifecycle
The Host shall support Register/Unregister lifecycle with the `LMXProxyServer` COM object, tracking the connection handle.
### Acceptance Criteria
- `Register(clientName)` is called on the STA thread and returns a positive connection handle on success.
- Handle ≤ 0 → descriptive error thrown; Host reports `DriverHealth.Unavailable` via the pipe so the Proxy reports Bad quality to the core.
- `Unregister(handle)` is called during disconnect after all subscriptions are removed.
- Client name comes from `OTOPCUA_GALAXY_CLIENT_NAME` environment variable; default `OtOpcUa-Galaxy.Host`. Must be unique per MXAccess registration (a cluster's Primary and Secondary each get their own client-name suffix via node override).
- Connection state transitions: Disconnected → Connecting → Connected → Disconnecting → Disconnected (and Error from any state).
### Details
- `ConnectedSince` (UTC) recorded after successful Register.
- `ReconnectCount` tracked for diagnostics and `/metrics`.
- State changes are emitted over the pipe as `DriverHealth` updates.
---
## MXA-003: Tag Subscription
The Host shall support subscribing to tags via AddItem + AdviseSupervisory, receiving value updates through OnDataChange callbacks.
### Acceptance Criteria
- Subscribe sequence: `AddItem(handle, address)` returns item handle, then `AdviseSupervisory(handle, itemHandle)` starts the subscription.
- `OnDataChange` callback delivers value, quality, timestamp, and MXSTATUS_PROXY array.
- Item address format: `tag_name.AttributeName` for scalars, `tag_name.AttributeName[]` for whole arrays.
- AddItem failure → Warning logged, failure propagated over the pipe to the Proxy.
- Bidirectional maps of `address ↔ itemHandle` maintained for callback resolution.
- Multi-client refcounting: two Proxy-side subscribe calls for the same address produce one MXAccess subscription; refcount decrement on the last unsubscribe triggers `UnAdvise` / `RemoveItem`.
### Details
- `AdviseSupervisory` (not `Advise`) is used because this is a background service without an interactive user session.
- Stored subscriptions dictionary maps address → callback for reconnect replay.
- On reconnect, every entry in stored subscriptions is re-subscribed (AddItem + AdviseSupervisory with new handles).
---
## MXA-004: Tag Read/Write
The Host shall support synchronous-style read and write operations, marshalled to the STA thread, with configurable timeouts.
### Acceptance Criteria
- Read pattern: prefer cached subscription value; fall back to subscribe-get-first-value-unsubscribe (AddItem → AdviseSupervisory → wait for OnDataChange → UnAdvise → RemoveItem).
- Write: AddItem → AdviseSupervisory → `Write()` → await `OnWriteComplete` callback → cleanup.
- Read timeout: `Galaxy:ReadTimeoutSeconds` in driver config (default 5 seconds) — enforced on the Host side in addition to the Proxy-side Polly `Timeout` leg.
- Write timeout: `Galaxy:WriteTimeoutSeconds` (default 5 seconds) — enforced similarly.
- Concurrent operation limit: configurable semaphore (`Galaxy:MaxConcurrentOperations`, default 10).
- All operations marshalled to the STA thread.
### Details
- Write uses security classification `-1` (no security). Galaxy runtime enforces security; OtOpcUa authorization is enforced server-side before the call ever reaches the pipe (per OPC-014 `AuthorizationGate`).
- `OnWriteComplete`: check `MXSTATUS_PROXY.success`. If 0, extract detail code and propagate as an error over the pipe.
- COM exceptions translated to meaningful error messages.
---
## MXA-005: Auto-Reconnect
The Host shall monitor connection health and automatically reconnect on failure, replaying all stored subscriptions after reconnect.
### Acceptance Criteria
- Monitor loop runs on a background thread at `Galaxy:MonitorIntervalSeconds` (default 5 seconds).
- On disconnect, attempt reconnect. On success, replay all stored subscriptions.
- On reconnect failure, log Warning and retry at next interval (no exponential backoff inside the Host; the Proxy-side Polly pipeline handles cross-process backoff against pipe failures).
- Reconnect count is incremented on each successful reconnect.
- Monitor loop is cancellable for clean Host shutdown.
### Details
- Reconnect cleans up old COM objects before creating new ones.
- After reconnect, probe subscription (MXA-006) is re-established first, then stored subscriptions.
- No max retry limit — keep trying indefinitely until the Host service is stopped.
---
## MXA-006: Probe-Based Health Monitoring
The Host shall optionally subscribe to a configurable probe tag and use OnDataChange callback staleness to detect silent connection failures.
### Acceptance Criteria
- Probe tag address configured via `Galaxy:ProbeTag`. If unset, probe monitoring is disabled.
- Track `_lastProbeValueTime` (UTC) updated on each OnDataChange for the probe tag.
- If `DateTime.UtcNow - _lastProbeValueTime > staleThreshold`, force disconnect and reconnect.
- Stale threshold: `Galaxy:ProbeStaleThresholdSeconds` (default 60 seconds).
- Implements `IHostConnectivityProbe` on the Proxy side so the core's `CapabilityInvoker` records probe outcomes with `DriverCapability.Probe` telemetry.
### Details
- The probe tag should be an attribute the Galaxy runtime updates regularly (platform heartbeat, area timestamp). Specific tag is site-dependent.
- After forced reconnect, reset `_lastProbeValueTime` to `DateTime.UtcNow`.
---
## MXA-007: COM Cleanup
On disconnect or disposal, the Host shall unwire event handlers, unadvise/remove all items, unregister, and release COM objects via `Marshal.ReleaseComObject`.
### Acceptance Criteria
- Cleanup order: UnAdvise all active subscriptions → RemoveItem all items → unwire OnDataChange and OnWriteComplete handlers → Unregister → `Marshal.ReleaseComObject`.
- On dispose: run disconnect if still connected, then dispose STA thread.
- Each cleanup step wrapped in try/catch (cleanup must not throw).
- After cleanup: handle maps cleared, pending write TCS entries abandoned, COM reference set to null.
### Details
- Stored subscriptions are NOT cleared on disconnect (preserved for reconnect replay). Only cleared on Dispose.
- Event handlers unwired BEFORE Unregister (else callbacks may fire on a dead object).
- `Marshal.ReleaseComObject` in a `finally` block, always.
---
## MXA-008: Operation Metrics
The MXAccess Host shall record timing and success/failure for Read, Write, and Subscribe operations.
### Acceptance Criteria
- Each operation records duration (ms) + success/failure.
- Metrics exposed over the pipe to the Proxy, which re-publishes them via OpenTelemetry → Prometheus under `DriverInstanceId = "galaxy-*"`, `HostName = "galaxy.host"`.
- Rolling 1000-entry buffer for percentile calculation.
- Uses an `ITimingScope` pattern: `using (var scope = metrics.BeginOperation("read")) { ... }`.
---
## MXA-009: Error Code Translation
The Host shall translate known MXAccess error codes from `MXSTATUS_PROXY.detail` into human-readable messages for logging and OPC UA status propagation.
### Acceptance Criteria
- Error 1008 → "User lacks security permission"
- Error 1012 → "Secured write required (one signature)"
- Error 1013 → "Verified write required (two signatures)"
- Unknown error codes logged with their numeric value.
- Translated messages flow back through the pipe and surface in OPC UA `StatusCode` descriptions and Server logs.
- Errors 1008 / 1012 / 1013 on write operations map to `Bad_UserAccessDenied` at the OPC UA surface.
---
## MXA-010: Proxy-Side Capability Wrapping
`Driver.Galaxy.Proxy` shall implement the capability interfaces as thin forwarders that serialize every call through the named pipe and route every call through `CapabilityInvoker`.
### Acceptance Criteria
- `Driver.Galaxy.Proxy` implements `IDriver` + `IReadable` + `IWritable` + `ISubscribable` + `ITagDiscovery` + `IRediscoverable` + `IAlarmSource` + `IHistoryProvider` + `IHostConnectivityProbe`.
- Each implementation uses `CapabilityInvoker.InvokeAsync(DriverCapability.<...>, …)` — direct pipe calls bypassing the invoker are caught by Roslyn **OTOPCUA0001**.
- Each method serializes a MessagePack request frame, sends over the pipe, awaits the response frame, deserializes, returns.
- Pipe disconnect mid-call → `CapabilityInvoker`'s circuit breaker counts the failure; sustained disconnect opens the circuit and Galaxy nodes surface Bad quality until the pipe reconnects.
- Proxy tolerates Host service restarts — it automatically reconnects and replays subscription setup (parallel to MXA-005 but across the IPC boundary).
---
## MXA-011: Pipe Security
The named pipe between Proxy and Host shall be restricted to the Server's runtime principal via SID-based ACL and authenticated with a per-process shared secret.
### Acceptance Criteria
- Pipe name from `OTOPCUA_GALAXY_PIPE` environment variable; default `OtOpcUaGalaxy`.
- Allowed SID passed as `OTOPCUA_ALLOWED_SID` — only the declared principal (typically the Server service account) can open the pipe; `Administrators` is explicitly NOT granted (per the `project_galaxy_host_installed` memory note).
- Shared secret passed via `OTOPCUA_GALAXY_SECRET` at spawn time; the Proxy must present the matching secret on the opening handshake.
- Secret is process-scoped (regenerated per Host restart) and never persisted to disk or Config DB.
- Pipe ACL denials are logged as Warning with the rejected principal SID.
### Details
- Environment variables are passed by the supervisor launching the Host (`docs/v2/driver-stability.md`).
- Dev-box secret is stored at `.local/galaxy-host-secret.txt` for NSSM-wrapped development runs (memory note: `project_galaxy_host_installed`).