Files
lmxopcua/docs/v2/lmx-followups.md
Joseph Doherty bf329b05d8 Phase 3 PR 35 — IHistoryProvider gains ReadAtTimeAsync + ReadEventsAsync; GalaxyProxyDriver implements both. Extends Core.Abstractions.IHistoryProvider with two new methods that round out the OPC UA Part 11 HistoryRead surface (HistoryReadAtTime + HistoryReadEvents are the last two modes not covered by the PR 19-era ReadRawAsync + ReadProcessedAsync) and wires GalaxyProxyDriver to call the existing PR-10/PR-11 IPC contracts the Host already implements.
Interface additions use C# default interface implementations that throw NotSupportedException — existing IHistoryProvider implementations keep compiling, only drivers whose backend carries the relevant capability override. This matches the 'capabilities are optional per driver' design already used by IHistoryProvider.ReadProcessedAsync's docs (Modbus / OPC UA Client drivers never had an event historian and the default-throw path lets callers see BadHistoryOperationUnsupported naturally). New HistoricalEvent record models one historian row (EventId, SourceName, EventTimeUtc + ReceivedTimeUtc — process vs historian-persist timestamps, Message, Severity mapped to OPC UA's 1-1000 range); HistoricalEventsResult pairs the event list with a continuation-point token for future batching. Both live in Core.Abstractions so downstream (Proxy, Host, Server) reference a single domain shape — no Shared-contract leak into the driver-facing interface.
GalaxyProxyDriver.ReadAtTimeAsync maps the domain DateTime[] to Unix-ms longs, calls CallAsync on the existing MessageKind.HistoryReadAtTimeRequest, and trusts the Host's one-sample-per-requested-timestamp contract (the Host pads with bad-quality snapshots for timestamps it can't interpolate; re-aligning on the Proxy side would duplicate the Host's interpolation policy logic). ReadEventsAsync does the same for HistoryReadEventsRequest; ToHistoricalEvent translates GalaxyHistoricalEvent (MessagePack-annotated, Unix-ms) to the domain record, explicitly tagging DateTimeKind.Utc on both timestamp fields so downstream serializers (JSON, OPC UA types) don't apply an unexpected local-time offset.
Tests — HistoricalEventMappingTests (3 new Proxy.Tests unit cases): every field maps correctly from wire to domain; null SourceName and null DisplayText preserve through the mapping (system events without a source come out with null so callers can distinguish them from alarm events); both timestamps come out as DateTimeKind.Utc (regression guard against a future refactor using DateTime.FromFileTimeUtc or similar that defaults to Unspecified). Driver.Galaxy.Proxy.Tests Unit suite: 17 pass / 0 fail (14 prior + 3 new). Full solution build clean, 0 errors.
Scope exclusions — DriverNodeManager HistoryRead service-handler wiring (on the OPC UA Server side, where HistoryReadAtTime and HistoryReadEvents service requests land) and the full-loop integration test (OPC UA client → server → IPC → Host → HistorianDataSource → back) are deferred to a focused follow-up PR. The capability surface is the load-bearing change; wiring the service handlers is mechanical in comparison and worth its own PR for reviewability. docs/v2/lmx-followups.md #1 updated with the split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:08:27 -04:00

137 lines
7.1 KiB
Markdown

# LMX Galaxy bridge — remaining follow-ups
State after PR 19: the Galaxy driver is functionally at v1 parity through the
`IDriver` abstraction; the OPC UA server runs with LDAP-authenticated
Basic256Sha256 endpoints and alarms are observable through
`AlarmConditionState.ReportEvent`. The items below are what remains LMX-
specific before the stack can fully replace the v1 deployment, in
rough priority order.
## 1. Proxy-side `IHistoryProvider` for `ReadAtTime` / `ReadEvents`
**Status**: Capability surface complete (PR 35). OPC UA HistoryRead service-handler
wiring in `DriverNodeManager` remains as the next step; integration-test still
pending.
PR 35 extended `IHistoryProvider` with `ReadAtTimeAsync` + `ReadEventsAsync`
(default throwing implementations so existing impls keep compiling), added the
`HistoricalEvent` + `HistoricalEventsResult` records to
`Core.Abstractions`, and implemented both methods in `GalaxyProxyDriver` on top
of the PR 10 / PR 11 IPC messages. Wire-to-domain mapping (`ToHistoricalEvent`)
is unit-tested for field fidelity, null-preservation, and `DateTimeKind.Utc`.
**Remaining**:
- `DriverNodeManager` wires the new capability methods onto `HistoryRead`
`AtTime` + `Events` service handlers.
- Integration test: OPC UA client calls `HistoryReadAtTime` / `HistoryReadEvents`,
value flows through IPC to the Host's `HistorianDataSource`, back to the client.
## 2. Write-gating by role — **DONE (PR 26)**
Landed in PR 26. `WriteAuthzPolicy` in `Server/Security/` maps
`SecurityClassification` → required role (`FreeAccess` → no role required,
`Operate`/`SecuredWrite``WriteOperate`, `Tune``WriteTune`,
`Configure`/`VerifiedWrite``WriteConfigure`, `ViewOnly` → deny regardless).
`DriverNodeManager` caches the classification per variable during discovery and
checks the session's roles (via `IRoleBearer`) in `OnWriteValue` before calling
`IWritable.WriteAsync`. Roles do not cascade — a session with `WriteOperate`
can't write a `Tune` attribute unless it also carries `WriteTune`.
See `feedback_acl_at_server_layer.md` in memory for the architectural directive
that authz stays at the server layer and never delegates to driver-specific auth.
## 3. Admin UI client-cert trust management — **DONE (PR 28)**
PR 28 shipped `/certificates` in the Admin UI. `CertTrustService` reads the OPC
UA server's PKI store root (`OpcUaServerOptions.PkiStoreRoot` — default
`%ProgramData%\OtOpcUa\pki`) and lists rejected + trusted certs by parsing the
`.der` files directly, so it has no `Opc.Ua` dependency and runs on any
Admin host that can reach the shared PKI directory.
Operator actions: Trust (moves `rejected/certs/*.der``trusted/certs/*.der`),
Delete rejected, Revoke trust. The OPC UA stack re-reads the trusted store on
each new client handshake, so no explicit reload signal is needed —
operators retry the rejected client's connection after trusting.
Deferred: flipping `AutoAcceptUntrustedClientCertificates` to `false` as the
deployment default. That's a production-hardening config change, not a code
gap — the Admin UI is now ready to be the trust gate.
## 4. Live-LDAP integration test — **DONE (PR 31)**
PR 31 shipped `Server.Tests/LdapUserAuthenticatorLiveTests.cs` — 6 live-bind
tests against the dev GLAuth instance at `localhost:3893`, skipped cleanly
when the port is unreachable. Covers: valid bind, wrong password, unknown
user, empty credentials, single-group → WriteOperate mapping, multi-group
admin user surfacing all mapped roles.
Also added `UserNameAttribute` to `LdapOptions` (default `uid` for RFC 2307
compat) so Active Directory deployments can configure `sAMAccountName` /
`userPrincipalName` without code changes. `LdapUserAuthenticatorAdCompatTests`
(5 unit guards) pins the AD-shape DN parsing + filter escape behaviors. See
`docs/security.md` §"Active Directory configuration" for the AD appsettings
snippet.
Deferred: asserting `session.Identity` end-to-end on the server side (i.e.
drive a full OPC UA session with username/password, then read an
`IHostConnectivityProbe`-style "whoami" node to verify the role surfaced).
That needs a test-only address-space node and is a separate PR.
## 5. Full Galaxy live-service smoke test against the merged v2 stack
**Status**: Individual pieces have live smoke tests (PR 5 MXAccess, PR 13
probe manager, PR 14 alarm tracker), but the full loop — OPC UA client →
`OtOpcUaServer``GalaxyProxyDriver` (in-process) → named-pipe to
Galaxy.Host subprocess → live MXAccess runtime → real Galaxy objects — has
no single end-to-end smoke test.
**To do**:
- Test that spawns the full topology, discovers a deployed Galaxy object,
subscribes to one of its attributes, writes a value back, and asserts the
write round-tripped through MXAccess. Skip when ArchestrA isn't running.
## 6. Second driver instance on the same server — **DONE (PR 32)**
`Server.Tests/MultipleDriverInstancesIntegrationTests.cs` registers two
drivers with distinct `DriverInstanceId`s on one `DriverHost`, spins up the
full OPC UA server, and asserts three behaviors: (1) each driver's namespace
URI (`urn:OtOpcUa:{id}`) resolves to a distinct index in the client's
NamespaceUris, (2) browsing one subtree returns that driver's folder and
does NOT leak the other driver's folder, (3) reads route to the correct
driver — the alpha instance returns 42 while beta returns 99, so a misroute
would surface at the assertion layer.
Deferred: the alarm-event multi-driver parity case (two drivers each raising
a `GalaxyAlarmEvent`, assert each condition lands on its owning instance's
condition node). Alarm tracking already has its own integration test
(`AlarmSubscription*`); the multi-driver alarm case would need a stub
`IAlarmSource` that's worth its own focused PR.
## 7. Host-status per-AppEngine granularity → Admin UI dashboard — **DONE (PRs 33 + 34)**
**PR 33** landed the data layer: `DriverHostStatus` entity + migration with
composite key `(NodeId, DriverInstanceId, HostName)` and two query-supporting
indexes (per-cluster drill-down on `NodeId`, stale-row detection on
`LastSeenUtc`).
**PR 34** wired the publisher + consumer. `HostStatusPublisher` is a
`BackgroundService` in the Server process that walks every registered
`IHostConnectivityProbe`-capable driver every 10s, calls
`GetHostStatuses()`, and upserts rows (`LastSeenUtc` advances each tick;
`State` + `StateChangedUtc` update on transitions). Admin UI `/hosts` page
groups by cluster, shows four summary cards (Hosts / Running / Stale /
Faulted), and flags rows whose `LastSeenUtc` is older than 30s as Stale so
operators see crashed Servers without waiting for a state change.
Deferred as follow-ups:
- Event-driven push (subscribe to `OnHostStatusChanged` per driver for
sub-heartbeat latency). Adds DriverHost lifecycle-event plumbing;
10s polling is fine for operator-scale use.
- Failure-count column — needs the publisher to track a transition history
per host, not just current-state.
- SignalR fan-out to the Admin page (currently the page polls the DB, not
a hub). The DB-polled version is fine at current cadence but a hub push
would eliminate the 10s race where a new row sits in the DB before the
Admin page notices.