Doc refresh (task #205) — requirements updated for multi-driver OtOpcUa three-process deploy
Per-file summary:

- docs/reqs/OpcUaServerReqs.md — rewritten driver-agnostic. OPC-001..OPC-013 re-scoped to multi-driver address-space composition + capability dispatch; OPC-014 AuthorizationGate + permission trie; OPC-015 dynamic ServiceLevel via RedundancyCoordinator; OPC-017 surgical generation-apply rebuild; OPC-012 capability dispatch via CapabilityInvoker (decision #143 idempotence-aware retry); OPC-013 per-host Polly isolation (decision #144); OPC-019 OpenTelemetry metrics. Transport-security profile matrix (OPC-010) + UserName/LDAP (OPC-011) preserved.
- docs/reqs/GalaxyRepositoryReqs.md — scope clarified as Galaxy-driver-only (not platform). GR-001..GR-004 tied to ITagDiscovery.DiscoverAsync + IRediscoverable; all SQL runs inside OtOpcUa.Galaxy.Host and streams to Proxy via named pipe. GR-008 capability wrapping via CapabilityInvoker added. Cross-links to docs/v2/driver-specs.md + docs/GalaxyRepository.md.
- docs/reqs/MxAccessClientReqs.md — scope clarified as Galaxy-Host-only. MXA-001..MXA-009 preserved (STA pump, register/unregister, subscription refcount, auto-reconnect, probe, COM cleanup, operation metrics, error translation). MXA-010 Proxy-side capability wrapping + MXA-011 pipe ACL + per-process shared secret (OTOPCUA_ALLOWED_SID / OTOPCUA_GALAXY_SECRET) added.
- docs/reqs/ServiceHostReqs.md — rewritten for three-process deployment. Shared section (SVC-SHARED-001/002) for Serilog + bootstrap-only appsettings. SRV-* for OtOpcUa.Server (net10 x64, Microsoft.Extensions.Hosting + AddWindowsService, in-process driver hosting, redundancy-node bootstrap). ADM-* for OtOpcUa.Admin (Blazor Server, cookie+LDAP auth, CanEdit/CanPublish policies, sole DB writer, Prometheus /metrics, audit logging). GHX-* for OtOpcUa.Galaxy.Host (TopShelf, net48 x86, named-pipe IPC bootstrap, STA backend lifecycle, crash handling tied to supervisor).
- docs/reqs/ClientRequirements.md — restructured as numbered, verifiable requirements. SHR-* for Client.Shared (single IOpcUaClientService, ConnectionSettings, failover, cross-platform certs, type-coercing write, UI-thread neutrality). CLI-001..CLI-011 cover connect/read/write/browse/subscribe/historyread/alarms/redundancy. UI-001..UI-008 cover connection panel, tree browser, each tab, connection-state reflection, cross-platform build. Reference design content (IOpcUaClientService shape, models, view-model map, mock layout) preserved.
- docs/reqs/StatusDashboardReqs.md — retired cleanly. Replaced with a pointer to docs/v2/admin-ui.md + HLR-015 / HLR-016 / HLR-017 / ADM-*. Mapping table shows each retired DASH-001..DASH-009 requirement's replacement (live cluster-node view via SignalR, Prometheus metrics, driver-instance detail views, etc.). Note that a formal AdminUiReqs.md can be written later if needed for cert compliance.

HighLevelReqs.md was already at the target shape (HLR-001..HLR-018 with Revision header noting retired HLR-009) as of commit f217636; verified identical and no additional edit required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Galaxy Driver — MXAccess Client Requirements

> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). Scope narrowed: this document covers the MXAccess surface **inside `OtOpcUa.Galaxy.Host`** (.NET Framework 4.8 x86 Windows service). The in-server `Driver.Galaxy.Proxy` implements the `IReadable` / `IWritable` / `ISubscribable` / `IAlarmSource` / `IHistoryProvider` capability interfaces and routes every wire call through the named pipe to this Host process. The STA thread + reconnect playback + subscription refcount requirements from v1 are preserved; what changed is where they live (Host service, not the Server process). MXA-010 (proxy-side wrapping) and MXA-011 (pipe ACL / shared secret) are new.

Parent: [HLR-002](HighLevelReqs.md#hlr-002-multi-driver-plug-in-model), [HLR-005](HighLevelReqs.md#hlr-005-live-data-access), [HLR-007](HighLevelReqs.md#hlr-007-service-hosting)

Driver scope: Galaxy only. Process scope: `OtOpcUa.Galaxy.Host` (Host side) and `Driver.Galaxy.Proxy` (server-side forwarder).

## MXA-001: STA Thread with Message Pump

All MXAccess COM objects shall be created and called on a dedicated STA thread …

### Acceptance Criteria

- A dedicated thread is created with `ApartmentState.STA` before any MXAccess COM object is instantiated; the implementation lives in `StaPump` inside `OtOpcUa.Galaxy.Host`.
- The thread runs a Win32 message pump using `GetMessage` / `TranslateMessage` / `DispatchMessage`.
- Work items are marshalled to the STA thread via `PostThreadMessage(WM_APP)` and a concurrent queue.
- The STA thread processes work items between message pump iterations.
- All COM object creation (the `LMXProxyServer` instance), method calls, and event callbacks happen on this thread.
- The thread is named `Galaxy.Sta` for diagnostics.

### Details

- If the STA thread dies unexpectedly, log Fatal and trigger Host service shutdown. The supervisor restarts the Host under its driver-stability policy (`docs/v2/driver-stability.md`). COM objects on the dead thread are unrecoverable; no in-process recovery is attempted.
- `RunAsync(Action)` returns a `Task` that completes when the action executes on the STA thread. Callers can `await` it.

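The marshalling contract above has three parts: one dedicated thread, a queue of work items, and a `RunAsync` that completes only after the action has run on that thread. It can be sketched portably as follows. This is a minimal sketch, not `StaPump`. A `BlockingCollection<Action>` stands in for the Win32 queue, so the sketch does not pump `GetMessage` / `DispatchMessage` and does not use `PostThreadMessage`. The class name `StaDispatcherSketch` is hypothetical. STA is requested through `TrySetApartmentState`, which only takes effect on Windows.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, portable stand-in for StaPump: a BlockingCollection replaces the
// Win32 message queue, so this does NOT pump GetMessage/DispatchMessage.
public sealed class StaDispatcherSketch : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public StaDispatcherSketch(string name = "Galaxy.Sta")
    {
        _thread = new Thread(Run) { Name = name, IsBackground = true };
        // STA is honoured on Windows only; TrySetApartmentState returns false elsewhere.
        _thread.TrySetApartmentState(ApartmentState.STA);
        _thread.Start();
    }

    public int ManagedThreadId => _thread.ManagedThreadId;

    // Completes (or faults) once the action has run on the dedicated thread.
    public Task RunAsync(Action action)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _work.Add(() =>
        {
            try { action(); tcs.SetResult(true); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    // Single consumer: every queued action runs on this one thread, in order.
    private void Run()
    {
        foreach (var item in _work.GetConsumingEnumerable())
            item();
    }

    public void Dispose()
    {
        _work.CompleteAdding();   // lets Run() drain the queue and exit
        _thread.Join();
        _work.Dispose();
    }
}
```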
---
## MXA-002: Connection Lifecycle

The Host shall support the Register/Unregister lifecycle of the `LMXProxyServer` COM object, tracking the connection handle.

### Acceptance Criteria

- `Register(clientName)` is called on the STA thread and returns a positive connection handle on success.
- If `Register` returns a handle ≤ 0, a descriptive error is thrown and the Host reports `DriverHealth.Unavailable` over the pipe so the Proxy reports Bad quality to the core.
- `Unregister(handle)` is called during disconnect after all subscriptions are removed.
- The client name comes from the `OTOPCUA_GALAXY_CLIENT_NAME` environment variable (default `OtOpcUa-Galaxy.Host`) and must be unique per MXAccess registration; a cluster's Primary and Secondary each get their own client-name suffix via node override.
- Connection state transitions: Disconnected → Connecting → Connected → Disconnecting → Disconnected (and Error from any state).

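The transition rule in the last criterion can be captured as a lookup table, which makes illegal transitions easy to reject and to test. This is a minimal sketch with hypothetical type names (`MxConnectionState`, `MxConnectionStateRules`). The text does not say how the Error state is left, so the `Error → Disconnected` entry below is an assumption.

```csharp
using System;
using System.Collections.Generic;

public enum MxConnectionState { Disconnected, Connecting, Connected, Disconnecting, Error }

// Transition table for MXA-002. Error -> Disconnected is an ASSUMPTION (the text does
// not say how Error is left); it lets the MXA-005 reconnect loop start over from Disconnected.
public static class MxConnectionStateRules
{
    private static readonly Dictionary<MxConnectionState, MxConnectionState> Next =
        new Dictionary<MxConnectionState, MxConnectionState>
        {
            { MxConnectionState.Disconnected,  MxConnectionState.Connecting },
            { MxConnectionState.Connecting,    MxConnectionState.Connected },
            { MxConnectionState.Connected,     MxConnectionState.Disconnecting },
            { MxConnectionState.Disconnecting, MxConnectionState.Disconnected },
            { MxConnectionState.Error,         MxConnectionState.Disconnected },   // assumed
        };

    public static bool IsAllowed(MxConnectionState from, MxConnectionState to)
    {
        if (to == MxConnectionState.Error) return true;   // Error is reachable from any state
        MxConnectionState expected;
        return Next.TryGetValue(from, out expected) && expected == to;
    }
}
```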
### Details

- `ConnectedSince` (UTC) recorded after successful Register.
- `ReconnectCount` tracked for diagnostics and `/metrics`.
- State changes are emitted over the pipe as `DriverHealth` updates.

---
## MXA-003: Tag Subscription

The Host shall support subscribing to tags via AddItem + AdviseSupervisory, receiving value updates through OnDataChange callbacks.

### Acceptance Criteria

- Subscribe sequence: `AddItem(handle, address)` returns an item handle, then `AdviseSupervisory(handle, itemHandle)` starts the subscription.
- `OnDataChange` callback delivers value, quality, timestamp, and MXSTATUS_PROXY array.
- Item address format: `tag_name.AttributeName` for scalars, `tag_name.AttributeName[]` for whole arrays.
- AddItem failure → Warning logged, failure propagated over the pipe to the Proxy.
- Bidirectional maps of `address ↔ itemHandle` maintained for callback resolution.
- Multi-client refcounting: two Proxy-side subscribe calls for the same address produce one MXAccess subscription; when the last subscriber unsubscribes (refcount reaches zero), the Host calls `UnAdvise` / `RemoveItem`.

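The refcounting and bidirectional-map criteria can be sketched as a small table that only calls into MXAccess on the first subscribe and the last unsubscribe for an address. This is a minimal sketch under assumptions. `MxSubscriptionTable` is a hypothetical name. The two delegates stand in for the `AddItem` + `AdviseSupervisory` and `UnAdvise` + `RemoveItem` calls, which are not modelled. Reconnect replay (see Details) is omitted.

```csharp
using System;
using System.Collections.Generic;

// Refcounting sketch for MXA-003. addAndAdvise / unAdviseAndRemove are hypothetical
// delegates wrapping AddItem+AdviseSupervisory and UnAdvise+RemoveItem. Not thread-safe:
// every call is assumed to be marshalled onto the STA thread (MXA-001).
public sealed class MxSubscriptionTable
{
    private readonly Func<string, int> _addAndAdvise;
    private readonly Action<int> _unAdviseAndRemove;
    private readonly Dictionary<string, int> _handleByAddress = new Dictionary<string, int>();
    private readonly Dictionary<int, string> _addressByHandle = new Dictionary<int, string>();
    private readonly Dictionary<string, int> _refCount = new Dictionary<string, int>();

    public MxSubscriptionTable(Func<string, int> addAndAdvise, Action<int> unAdviseAndRemove)
    {
        _addAndAdvise = addAndAdvise;
        _unAdviseAndRemove = unAdviseAndRemove;
    }

    public void Subscribe(string address)
    {
        int count;
        if (_refCount.TryGetValue(address, out count))
        {
            _refCount[address] = count + 1;   // already advised: no new MXAccess call
            return;
        }
        // If AddItem fails, the exception propagates and nothing is recorded.
        int itemHandle = _addAndAdvise(address);
        _handleByAddress[address] = itemHandle;
        _addressByHandle[itemHandle] = address;
        _refCount[address] = 1;
    }

    public void Unsubscribe(string address)
    {
        int count;
        if (!_refCount.TryGetValue(address, out count)) return;
        if (count > 1) { _refCount[address] = count - 1; return; }

        int itemHandle = _handleByAddress[address];   // last subscriber is gone
        _unAdviseAndRemove(itemHandle);
        _refCount.Remove(address);
        _handleByAddress.Remove(address);
        _addressByHandle.Remove(itemHandle);
    }

    // OnDataChange reports an item handle; map it back to the tag address.
    public bool TryResolve(int itemHandle, out string address)
    {
        return _addressByHandle.TryGetValue(itemHandle, out address);
    }
}
```

Keeping both maps lets an `OnDataChange` callback, which carries only the item handle, find its address with a single dictionary lookup instead of a scan.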
### Details

- `AdviseSupervisory` (not `Advise`) is used because this is a background service without an interactive user session.
- Stored subscriptions dictionary maps address → callback for reconnect replay.
- On reconnect, every entry in stored subscriptions is re-subscribed (AddItem + AdviseSupervisory with new handles).

---
## MXA-004: Tag Read/Write

The Host shall support synchronous-style read and write operations, marshalled to the STA thread, with configurable timeouts.

### Acceptance Criteria

- Read pattern: prefer the cached subscription value; fall back to subscribe-get-first-value-unsubscribe (AddItem → AdviseSupervisory → wait for OnDataChange → UnAdvise → RemoveItem).
- Write: AddItem → AdviseSupervisory → `Write()` → await `OnWriteComplete` callback → cleanup.
- Read timeout: `Galaxy:ReadTimeoutSeconds` in driver config (default 5 seconds), enforced on the Host side in addition to the Proxy-side Polly `Timeout` leg.
- Write timeout: `Galaxy:WriteTimeoutSeconds` (default 5 seconds), likewise enforced on the Host side in addition to the Proxy-side Polly `Timeout` leg.
- Concurrent operation limit: configurable semaphore (`Galaxy:MaxConcurrentOperations`, default 10).
- All operations marshalled to the STA thread.

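The read criteria combine three rules: use the cached value first, otherwise do a one-shot read, and bound that read by both the Host-side timeout and the concurrency semaphore. A sketch of how they fit together follows. `MxReadPathSketch` and the `oneShotRead` delegate, which stands in for the AddItem → … → RemoveItem sequence, are hypothetical. A real implementation would also cancel and clean up an abandoned one-shot item after a timeout; this sketch does not.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the MXA-004 read path. oneShotRead is a hypothetical delegate for the
// AddItem -> AdviseSupervisory -> first OnDataChange -> UnAdvise -> RemoveItem sequence.
public sealed class MxReadPathSketch
{
    private readonly ConcurrentDictionary<string, object> _cache = new ConcurrentDictionary<string, object>();
    private readonly Func<string, Task<object>> _oneShotRead;
    private readonly TimeSpan _timeout;
    private readonly SemaphoreSlim _gate;   // Galaxy:MaxConcurrentOperations

    public MxReadPathSketch(Func<string, Task<object>> oneShotRead, TimeSpan timeout, int maxConcurrentOperations = 10)
    {
        _oneShotRead = oneShotRead;
        _timeout = timeout;
        _gate = new SemaphoreSlim(maxConcurrentOperations, maxConcurrentOperations);
    }

    // Called from OnDataChange for addresses with an active subscription.
    public void UpdateCache(string address, object value) { _cache[address] = value; }

    public async Task<object> ReadAsync(string address)
    {
        object cached;
        if (_cache.TryGetValue(address, out cached)) return cached;   // prefer the live value

        await _gate.WaitAsync().ConfigureAwait(false);
        try
        {
            Task<object> read = _oneShotRead(address);
            Task winner = await Task.WhenAny(read, Task.Delay(_timeout)).ConfigureAwait(false);
            if (winner != read)
                throw new TimeoutException($"Read of '{address}' exceeded {_timeout.TotalSeconds}s");
            return await read.ConfigureAwait(false);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```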
### Details

- Write uses security classification `-1` (no security). The Galaxy runtime enforces security; OtOpcUa authorization is enforced server-side before the call ever reaches the pipe (per OPC-014 `AuthorizationGate`).
- `OnWriteComplete`: check `MXSTATUS_PROXY.success`. If 0, extract the detail code and propagate it as an error over the pipe.
- `COMException`s (with their HRESULT) are caught and translated to meaningful error messages.

---
## MXA-005: Auto-Reconnect

The Host shall monitor connection health and automatically reconnect on failure, replaying all stored subscriptions after reconnect.

### Acceptance Criteria

- Monitor loop runs on a background thread at `Galaxy:MonitorIntervalSeconds` (default 5 seconds).
- On disconnect, attempt reconnect. On success, replay all stored subscriptions.
- On reconnect failure, log Warning and retry at next interval (no exponential backoff inside the Host; the Proxy-side Polly pipeline handles cross-process backoff against pipe failures).
- Reconnect count is incremented on each successful reconnect.
- Monitor loop is cancellable for clean Host shutdown.

### Details

- Reconnect cleans up old COM objects before creating new ones.
- After reconnect, probe subscription (MXA-006) is re-established first, then stored subscriptions.
- No max retry limit — keep trying indefinitely until the Host service is stopped.

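The loop described above works like this. It checks connectivity at a fixed interval and never backs off or gives up. On success it bumps the reconnect count, then replays the probe before the stored subscriptions. It exits cleanly on cancellation. A minimal sketch follows. `MxReconnectMonitorSketch` and its four delegates are hypothetical stand-ins for the Host's connection checks and replay steps, which in the Host would be marshalled onto the STA thread; logging is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the MXA-005 monitor loop. All delegates are hypothetical; in the Host they
// would marshal onto the STA thread. Logging (Warning on failure) is omitted.
public sealed class MxReconnectMonitorSketch
{
    private readonly Func<bool> _isConnected;
    private readonly Func<bool> _tryReconnect;        // cleans up old COM objects, registers again
    private readonly Action _replayProbe;             // MXA-006 probe first...
    private readonly Action _replayStoredSubscriptions; // ...then stored subscriptions

    public int ReconnectCount { get; private set; }

    public MxReconnectMonitorSketch(Func<bool> isConnected, Func<bool> tryReconnect,
                                    Action replayProbe, Action replayStoredSubscriptions)
    {
        _isConnected = isConnected;
        _tryReconnect = tryReconnect;
        _replayProbe = replayProbe;
        _replayStoredSubscriptions = replayStoredSubscriptions;
    }

    public async Task RunAsync(TimeSpan interval, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            if (!_isConnected() && _tryReconnect())
            {
                ReconnectCount++;
                _replayProbe();
                _replayStoredSubscriptions();
            }
            // A failed attempt simply waits for the next tick: fixed interval, no backoff, no retry cap.
            try { await Task.Delay(interval, ct).ConfigureAwait(false); }
            catch (OperationCanceledException) { break; }   // clean shutdown
        }
    }
}
```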
---
## MXA-006: Probe-Based Health Monitoring

The Host shall optionally subscribe to a configurable probe tag and use OnDataChange callback staleness to detect silent connection failures.

### Acceptance Criteria

- Subscribe to a configurable probe tag (a known-good Galaxy attribute that changes periodically).
- Probe tag address configured via `Galaxy:ProbeTag`. If unset, probe monitoring is disabled.
- Track `_lastProbeValueTime` (UTC) updated on each OnDataChange for the probe tag.
- If `DateTime.UtcNow - _lastProbeValueTime > staleThreshold`, force disconnect and reconnect.
- Stale threshold: `Galaxy:ProbeStaleThresholdSeconds` (default 60 seconds).
- The Proxy implements `IHostConnectivityProbe` so the core's `CapabilityInvoker` records probe outcomes with `DriverCapability.Probe` telemetry.

### Details

- The probe tag should be an attribute the Galaxy runtime updates regularly (platform heartbeat, area timestamp). The specific tag is site-dependent.
- After a forced reconnect, reset `_lastProbeValueTime` to `DateTime.UtcNow` so the new connection gets a full threshold window.

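The staleness rule is a single comparison, but two threads touch the timestamp: `OnDataChange` writes it on the STA thread, and the monitor loop reads it from another thread. The Host is x86, where a plain 64-bit write is not guaranteed to be atomic. This minimal sketch therefore stores ticks through `Interlocked` and injects the clock so the rule can be tested without waiting. `ProbeWatchdog` is a hypothetical name.

```csharp
using System;
using System.Threading;

// MXA-006 staleness rule with an injected clock. Ticks are stored via Interlocked because
// the Host is x86, where a plain 64-bit DateTime write is not guaranteed to be atomic.
public sealed class ProbeWatchdog
{
    private readonly long _staleThresholdTicks;
    private long _lastProbeValueTicks;   // plays the role of _lastProbeValueTime (UTC)

    public ProbeWatchdog(TimeSpan staleThreshold, DateTime nowUtc)
    {
        _staleThresholdTicks = staleThreshold.Ticks;
        _lastProbeValueTicks = nowUtc.Ticks;
    }

    // Call from OnDataChange when the item handle belongs to the probe tag.
    public void OnProbeValue(DateTime nowUtc)
    {
        Interlocked.Exchange(ref _lastProbeValueTicks, nowUtc.Ticks);
    }

    // True means: force disconnect and reconnect (strictly greater than the threshold).
    public bool IsStale(DateTime nowUtc)
    {
        return nowUtc.Ticks - Interlocked.Read(ref _lastProbeValueTicks) > _staleThresholdTicks;
    }

    // After a forced reconnect, give the new connection a full threshold window.
    public void ResetAfterReconnect(DateTime nowUtc)
    {
        Interlocked.Exchange(ref _lastProbeValueTicks, nowUtc.Ticks);
    }
}
```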
---
## MXA-007: COM Cleanup

On disconnect or disposal, the Host shall unadvise/remove all items, unwire event handlers, unregister, and release COM objects via `Marshal.ReleaseComObject`.

### Acceptance Criteria

- Cleanup order: UnAdvise all active subscriptions → RemoveItem all items → unwire OnDataChange and OnWriteComplete handlers → Unregister → `Marshal.ReleaseComObject`.
- On dispose: run disconnect if still connected, then dispose the STA thread.
- Each cleanup step wrapped in try/catch (cleanup must not throw).
- After cleanup: handle maps cleared, pending write TCS entries abandoned, COM reference set to null.

### Details

- Stored subscriptions are NOT cleared on disconnect (preserved for reconnect replay). Only cleared on Dispose.
- Event handlers unwired BEFORE Unregister (else callbacks may fire on a dead object).
- `Marshal.ReleaseComObject` in a `finally` block, always.

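The ordering and no-throw rules above can be sketched as a runner that attempts every step even when an earlier one fails and always releases the COM object last. This is a minimal sketch: `ComCleanupSketch` is hypothetical, and each delegate stands in for the real call (`UnAdvise`, `RemoveItem`, event unwiring, `Unregister`, `Marshal.ReleaseComObject`). Collected exceptions are returned for logging instead of thrown.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the MXA-007 ordering rule: every step is attempted even if an earlier one
// throws, and the COM release runs in finally. All delegates are hypothetical stand-ins.
public static class ComCleanupSketch
{
    public static List<Exception> Run(IEnumerable<Action> unAdvise, IEnumerable<Action> removeItems,
                                      Action unwireHandlers, Action unregister, Action releaseComObject)
    {
        var errors = new List<Exception>();
        try
        {
            foreach (var step in unAdvise) TryStep(step, errors);
            foreach (var step in removeItems) TryStep(step, errors);
            TryStep(unwireHandlers, errors);   // before Unregister, so no callback hits a dead object
            TryStep(unregister, errors);
        }
        finally
        {
            TryStep(releaseComObject, errors);   // always runs, last
        }
        return errors;   // for logging; cleanup itself never throws
    }

    private static void TryStep(Action step, List<Exception> errors)
    {
        try { step(); }
        catch (Exception ex) { errors.Add(ex); }
    }
}
```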
---
## MXA-008: Operation Metrics

The Host shall record timing and success/failure for Read, Write, and Subscribe operations.

### Acceptance Criteria

- Each operation records duration (ms) + success/failure.
- Metrics exposed over the pipe to the Proxy, which re-publishes them via OpenTelemetry → Prometheus under `DriverInstanceId = "galaxy-*"`, `HostName = "galaxy.host"`.
- Rolling 1000-entry buffer for percentile calculation.
- Uses an `ITimingScope` pattern: `using (var scope = metrics.BeginOperation("read")) { ... }`.

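A rolling 1000-entry buffer with a disposable timing scope could look like the following. This is a minimal sketch under several assumptions. The text names the `ITimingScope` pattern but not its members, so the concrete `TimingScope` class and its `Succeed()` call are hypothetical; a scope disposed without `Succeed()` is recorded as a failure. Percentiles use the nearest-rank method, which the text also does not specify. Success counts are kept over the lifetime, while durations keep only the newest 1000 samples per operation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// MXA-008 sketch: per-operation rolling buffer of the newest 1000 durations plus lifetime
// success counts. Nearest-rank percentiles and Succeed() are ASSUMPTIONS.
public sealed class OperationMetricsSketch
{
    private const int Capacity = 1000;
    private readonly object _lock = new object();
    private readonly Dictionary<string, Queue<double>> _durations = new Dictionary<string, Queue<double>>();
    private readonly Dictionary<string, long> _total = new Dictionary<string, long>();
    private readonly Dictionary<string, long> _succeeded = new Dictionary<string, long>();

    public TimingScope BeginOperation(string name) { return new TimingScope(this, name); }

    public void Record(string name, double milliseconds, bool success)
    {
        lock (_lock)
        {
            Queue<double> buffer;
            if (!_durations.TryGetValue(name, out buffer))
            {
                buffer = new Queue<double>();
                _durations[name] = buffer;
                _total[name] = 0;
                _succeeded[name] = 0;
            }
            buffer.Enqueue(milliseconds);
            if (buffer.Count > Capacity) buffer.Dequeue();   // drop the oldest sample
            _total[name]++;
            if (success) _succeeded[name]++;
        }
    }

    // Nearest-rank percentile over the rolling buffer (p in 0..100).
    public double Percentile(string name, double p)
    {
        lock (_lock)
        {
            double[] sorted = _durations[name].OrderBy(d => d).ToArray();
            int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
            return sorted[Math.Max(rank, 1) - 1];
        }
    }

    public double SuccessRate(string name)
    {
        lock (_lock) { return (double)_succeeded[name] / _total[name]; }
    }

    public sealed class TimingScope : IDisposable
    {
        private readonly OperationMetricsSketch _owner;
        private readonly string _name;
        private readonly Stopwatch _stopwatch = Stopwatch.StartNew();
        private bool _success;

        internal TimingScope(OperationMetricsSketch owner, string name) { _owner = owner; _name = name; }

        public void Succeed() { _success = true; }   // not called => recorded as a failure

        public void Dispose() { _owner.Record(_name, _stopwatch.Elapsed.TotalMilliseconds, _success); }
    }
}
```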
---
## MXA-009: Error Code Translation

The Host shall translate known MXAccess error codes from `MXSTATUS_PROXY.detail` into human-readable messages for logging and OPC UA status propagation.

### Acceptance Criteria

- Error 1008 → "User lacks security permission"
- Error 1012 → "Secured write required (one signature)"
- Error 1013 → "Verified write required (two signatures)"
- Unknown error codes logged with their numeric value.
- Translated messages flow back through the pipe and surface in OPC UA `StatusCode` descriptions and Server logs.
- Errors 1008 / 1012 / 1013 on write operations map to `Bad_UserAccessDenied` at the OPC UA surface.

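The translation table above fits in a dictionary lookup with a numeric fallback. This minimal sketch uses the three messages and the `Bad_UserAccessDenied` mapping from the criteria. `MxErrorTranslator`, the wording of the unknown-code fallback, and returning the status as a name string rather than an SDK `StatusCode` value are hypothetical choices. The text does not say which status unknown write errors map to, so the sketch returns `null` for them.

```csharp
using System.Collections.Generic;

// Sketch of MXA-009. Messages and the Bad_UserAccessDenied mapping come from the
// acceptance criteria; the fallback wording and the string-typed status are assumptions.
public static class MxErrorTranslator
{
    private static readonly Dictionary<int, string> Known = new Dictionary<int, string>
    {
        { 1008, "User lacks security permission" },
        { 1012, "Secured write required (one signature)" },
        { 1013, "Verified write required (two signatures)" },
    };

    // Unknown codes keep their numeric value so they remain searchable in logs.
    public static string Describe(int detail)
    {
        string message;
        return Known.TryGetValue(detail, out message) ? message : "Unknown MXAccess error " + detail;
    }

    // OPC UA status name for a failed write; null means "not specified by MXA-009".
    public static string WriteStatusName(int detail)
    {
        return Known.ContainsKey(detail) ? "Bad_UserAccessDenied" : null;
    }
}
```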
---
## MXA-010: Proxy-Side Capability Wrapping

`Driver.Galaxy.Proxy` shall implement the capability interfaces as thin forwarders that serialize every call through the named pipe and route every call through `CapabilityInvoker`.

### Acceptance Criteria

- `Driver.Galaxy.Proxy` implements `IDriver` + `IReadable` + `IWritable` + `ISubscribable` + `ITagDiscovery` + `IRediscoverable` + `IAlarmSource` + `IHistoryProvider` + `IHostConnectivityProbe`.
- Each implementation uses `CapabilityInvoker.InvokeAsync(DriverCapability.<...>, …)`; direct pipe calls that bypass the invoker are flagged by the Roslyn analyzer diagnostic **OTOPCUA0001**.
- Each method serializes a MessagePack request frame, sends it over the pipe, awaits the response frame, deserializes it, and returns the result.
- Pipe disconnect mid-call → `CapabilityInvoker`'s circuit breaker counts the failure; sustained disconnect opens the circuit and Galaxy nodes surface Bad quality until the pipe reconnects.
- The Proxy tolerates Host service restarts: it automatically reconnects and replays subscription setup (parallel to MXA-005 but across the IPC boundary).

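The serialize-send-await-deserialize round trip needs some way to delimit frames on the pipe byte stream. The text says frames are MessagePack but does not define the framing, so the 4-byte length prefix below is an assumption. The payload is treated as opaque bytes that the caller has already MessagePack-serialized. `PipeFraming` is a hypothetical name. In the Proxy, each exchange would run inside `CapabilityInvoker.InvokeAsync`, which is not reproduced here. An `EndOfStreamException` mid-frame is the kind of failure the circuit breaker would count.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Length-prefixed framing sketch for MXA-010. The 4-byte prefix is an ASSUMPTION; the
// payload is opaque (already MessagePack-serialized by the caller).
public static class PipeFraming
{
    public static async Task WriteFrameAsync(Stream pipe, byte[] payload)
    {
        // BitConverter uses machine byte order; both the x86 Host and x64 Server are little-endian.
        byte[] header = BitConverter.GetBytes(payload.Length);
        await pipe.WriteAsync(header, 0, header.Length).ConfigureAwait(false);
        await pipe.WriteAsync(payload, 0, payload.Length).ConfigureAwait(false);
        await pipe.FlushAsync().ConfigureAwait(false);
    }

    public static async Task<byte[]> ReadFrameAsync(Stream pipe)
    {
        byte[] header = await ReadFullyAsync(pipe, 4).ConfigureAwait(false);
        int length = BitConverter.ToInt32(header, 0);
        if (length < 0) throw new InvalidDataException("Negative frame length");
        return await ReadFullyAsync(pipe, length).ConfigureAwait(false);
    }

    // A pipe read may return fewer bytes than requested, so loop until the frame is complete.
    private static async Task<byte[]> ReadFullyAsync(Stream pipe, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = await pipe.ReadAsync(buffer, offset, count - offset).ConfigureAwait(false);
            if (read == 0) throw new EndOfStreamException("Pipe closed mid-frame");
            offset += read;
        }
        return buffer;
    }
}
```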
---
## MXA-011: Pipe Security

The named pipe between Proxy and Host shall be restricted to the Server's runtime principal via SID-based ACL and authenticated with a per-process shared secret.

### Acceptance Criteria

- The pipe name comes from the `OTOPCUA_GALAXY_PIPE` environment variable (default `OtOpcUaGalaxy`).
- The allowed SID is passed as `OTOPCUA_ALLOWED_SID`: only the declared principal (typically the Server service account) can open the pipe; `Administrators` is explicitly NOT granted (per the `project_galaxy_host_installed` memory note).
- The shared secret is passed via `OTOPCUA_GALAXY_SECRET` at spawn time; the Proxy must present the matching secret on the opening handshake.
- The secret is process-scoped (regenerated per Host restart) and never persisted to disk or Config DB.
- Pipe ACL denials are logged as Warning with the rejected principal SID.

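The secret-handling criteria can be sketched in three parts: a fresh random secret per Host start, a handshake comparison, and the pipe-name default. This is a minimal sketch under assumptions. `PipeSecret` is hypothetical, and so are the 32-byte length and Base64 encoding. The secret is assumed to be generated by whatever spawns the Host. The comparison is constant-time over the common length so a mismatch does not reveal how many leading characters were right. The ACL itself (a `PipeSecurity` granting only the allowed SID) is left out because its API differs between .NET Framework 4.8 and .NET Core-based runtimes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Sketch of MXA-011 secret handling. Secret size and encoding are ASSUMPTIONS; the
// secret lives only in memory and in the spawned Host's environment block.
public static class PipeSecret
{
    // Fresh per Host start (assumed to be produced by whatever spawns the Host).
    public static string Generate()
    {
        var bytes = new byte[32];
        using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(bytes);
        return Convert.ToBase64String(bytes);
    }

    // Constant-time over the common length; the secret's length is not itself secret.
    public static bool Matches(string expected, string presented)
    {
        if (expected == null || presented == null) return false;
        byte[] a = Encoding.UTF8.GetBytes(expected), b = Encoding.UTF8.GetBytes(presented);
        int diff = a.Length ^ b.Length;
        for (int i = 0; i < Math.Min(a.Length, b.Length); i++) diff |= a[i] ^ b[i];
        return diff == 0;
    }

    public static string PipeName()
    {
        return Environment.GetEnvironmentVariable("OTOPCUA_GALAXY_PIPE") ?? "OtOpcUaGalaxy";
    }
}
```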
### Details

- Environment variables are passed by the supervisor launching the Host (`docs/v2/driver-stability.md`).
- Dev-box secret is stored at `.local/galaxy-host-secret.txt` for NSSM-wrapped development runs (memory note: `project_galaxy_host_installed`).