Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, and avoids DriverHost lifecycle-event plumbing that doesn't exist today. Event-driven push for sub-heartbeat latency is a straightforward follow-up.

Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case). StaleThreshold = 30s: covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses.

Admin Components/Pages/Hosts.razor: FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode). Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27). Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default. State badge maps DriverHostState enum to bootstrap color classes. Sidebar link added between 'Fleet status' and 'Clusters'.

Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs scoped via NodeOptions.ConfigDbConnectionString (no Admin-style manual raw SQL; the DbContext is the only access path, which keeps migrations owner-of-record).

Tests: HostStatusPublisherTests (4 new Integration cases, using a per-run throwaway DB matching the FleetStatusPollerTests pattern):
- publisher upserts one row per host from each probe-capable driver and skips non-probe drivers;
- second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end);
- state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision so the comparison uses a 1ms tolerance, documented inline);
- MapState translates every HostState enum member.

Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# LMX Galaxy bridge: remaining follow-ups

State after PR 19: the Galaxy driver is functionally at v1 parity through the
`IDriver` abstraction; the OPC UA server runs with LDAP-authenticated
Basic256Sha256 endpoints and alarms are observable through
`AlarmConditionState.ReportEvent`. The items below are what remains
LMX-specific before the stack can fully replace the v1 deployment, in
rough priority order.

## 1. Proxy-side `IHistoryProvider` for `ReadAtTime` / `ReadEvents`

**Status**: Host-side IPC shipped (PR 10 + PR 11). Proxy consumer not written.

PR 10 added `HistoryReadAtTimeRequest/Response` on the IPC wire and
`MxAccessGalaxyBackend.HistoryReadAtTimeAsync` delegates to
`HistorianDataSource.ReadAtTimeAsync`. PR 11 did the same for events
(`HistoryReadEventsRequest/Response` + `GalaxyHistoricalEvent`). The Proxy
side (`GalaxyProxyDriver`) doesn't call those yet:
`Core.Abstractions.IHistoryProvider` only exposes `ReadRawAsync` +
`ReadProcessedAsync`.

**To do**:
- Extend `IHistoryProvider` with `ReadAtTimeAsync(string, DateTime[], …)` and
  `ReadEventsAsync(string?, DateTime, DateTime, int, …)`.
- `GalaxyProxyDriver` calls the new IPC message kinds.
- `DriverNodeManager` wires the new capability methods onto `HistoryRead`
  `AtTime` + `Events` service handlers.
- Integration test: OPC UA client calls `HistoryReadAtTime` / `HistoryReadEvents`,
  value flows through IPC to the Host's `HistorianDataSource`, back to the client.

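
As a rough illustration of the first to-do item, the two proposed members could look something like the sketch below. Everything beyond the parameter types listed above is an assumption: the interface name `IHistoryProviderSketch`, the `HistorySample` / `HistoryEvent` result records, the parameter names, and the trailing `CancellationToken` (the real trailing parameters are elided above and are not known here). The in-memory fake is only there to make the sketch exercisable; it ignores the tag reference and simply holds the last known value, which is a simplification of real at-time semantics.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result shapes; the real return types are not specified in this document.
public sealed record HistorySample(DateTime TimestampUtc, double? Value);
public sealed record HistoryEvent(string SourceName, DateTime TimestampUtc, string Message);

// Sketch of the two proposed additions only. The existing ReadRawAsync and
// ReadProcessedAsync members are intentionally omitted.
public interface IHistoryProviderSketch
{
    Task<IReadOnlyList<HistorySample>> ReadAtTimeAsync(
        string fullReference, DateTime[] timestampsUtc, CancellationToken ct);

    Task<IReadOnlyList<HistoryEvent>> ReadEventsAsync(
        string? sourceName, DateTime startUtc, DateTime endUtc, int maxEvents, CancellationToken ct);
}

// In-memory fake for a single tag: returns the last sample at or before each
// requested timestamp, or a null value when no earlier sample exists.
public sealed class InMemoryHistoryProvider : IHistoryProviderSketch
{
    private readonly List<HistorySample> _samples;
    private readonly List<HistoryEvent> _events;

    public InMemoryHistoryProvider(IEnumerable<HistorySample> samples, IEnumerable<HistoryEvent> events)
    {
        _samples = samples.OrderBy(s => s.TimestampUtc).ToList();
        _events = events.OrderBy(e => e.TimestampUtc).ToList();
    }

    public Task<IReadOnlyList<HistorySample>> ReadAtTimeAsync(
        string fullReference, DateTime[] timestampsUtc, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        IReadOnlyList<HistorySample> result = timestampsUtc
            .Select(t => new HistorySample(t, _samples.LastOrDefault(s => s.TimestampUtc <= t)?.Value))
            .ToList();
        return Task.FromResult(result);
    }

    public Task<IReadOnlyList<HistoryEvent>> ReadEventsAsync(
        string? sourceName, DateTime startUtc, DateTime endUtc, int maxEvents, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        IReadOnlyList<HistoryEvent> result = _events
            .Where(e => sourceName is null || e.SourceName == sourceName) // null means "all sources"
            .Where(e => e.TimestampUtc >= startUtc && e.TimestampUtc <= endUtc)
            .Take(maxEvents)
            .ToList();
        return Task.FromResult(result);
    }
}
```

A fake of this shape could also stand in for the Host during Proxy-side unit tests, before the full IPC integration test exists.
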
## 2. Write-gating by role: **DONE (PR 26)**

Landed in PR 26. `WriteAuthzPolicy` in `Server/Security/` maps
`SecurityClassification` → required role (`FreeAccess` → no role required,
`Operate`/`SecuredWrite` → `WriteOperate`, `Tune` → `WriteTune`,
`Configure`/`VerifiedWrite` → `WriteConfigure`, `ViewOnly` → deny regardless).
`DriverNodeManager` caches the classification per variable during discovery and
checks the session's roles (via `IRoleBearer`) in `OnWriteValue` before calling
`IWritable.WriteAsync`. Roles do not cascade: a session with `WriteOperate`
can't write a `Tune` attribute unless it also carries `WriteTune`.

See `feedback_acl_at_server_layer.md` in memory for the architectural directive
that authz stays at the server layer and never delegates to driver-specific auth.

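
The classification-to-role mapping described in this section can be sketched as a single switch. This is a minimal illustration, not the shipped `WriteAuthzPolicy`: the class name `WriteAuthzPolicySketch`, the `IsAllowed` signature, the use of plain strings for roles, and the fail-closed default are all assumptions; only the classification and role names come from the text above.

```csharp
using System;
using System.Collections.Generic;

// Member names are taken from the text above; the enum declaration itself is illustrative.
public enum SecurityClassification
{
    FreeAccess, Operate, SecuredWrite, Tune, Configure, VerifiedWrite, ViewOnly
}

public static class WriteAuthzPolicySketch
{
    // Returns true when a session holding exactly these roles may write a
    // variable with the given classification. Roles do not cascade: each
    // classification checks for its own role and nothing else.
    public static bool IsAllowed(SecurityClassification classification, ICollection<string> roles) =>
        classification switch
        {
            SecurityClassification.FreeAccess => true,
            SecurityClassification.Operate or SecurityClassification.SecuredWrite
                => roles.Contains("WriteOperate"),
            SecurityClassification.Tune => roles.Contains("WriteTune"),
            SecurityClassification.Configure or SecurityClassification.VerifiedWrite
                => roles.Contains("WriteConfigure"),
            SecurityClassification.ViewOnly => false, // denied regardless of roles
            _ => false,                               // assumption: unknown values fail closed
        };
}
```

For example, `IsAllowed(SecurityClassification.Tune, new HashSet<string> { "WriteOperate" })` is false under this sketch, which matches the no-cascade rule.
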
## 3. Admin UI client-cert trust management: **DONE (PR 28)**

PR 28 shipped `/certificates` in the Admin UI. `CertTrustService` reads the OPC
UA server's PKI store root (`OpcUaServerOptions.PkiStoreRoot`, default
`%ProgramData%\OtOpcUa\pki`) and lists rejected + trusted certs by parsing the
`.der` files directly, so it has no `Opc.Ua` dependency and runs on any
Admin host that can reach the shared PKI directory.

Operator actions: Trust (moves `rejected/certs/*.der` → `trusted/certs/*.der`),
Delete rejected, Revoke trust. The OPC UA stack re-reads the trusted store on
each new client handshake, so no explicit reload signal is needed;
operators retry the rejected client's connection after trusting.

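
The Trust action amounts to a file move inside the PKI store. The sketch below shows that move using only `System.IO`; it is not the shipped `CertTrustService`. The class and method names are hypothetical, and the file-name sanitising and overwrite behaviour are assumptions. Parsing the `.der` contents for display is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CertTrustSketch
{
    // Lists the .der file names currently under <pkiRoot>/rejected/certs.
    public static IReadOnlyList<string> ListRejected(string pkiRoot)
    {
        var dir = Path.Combine(pkiRoot, "rejected", "certs");
        if (!Directory.Exists(dir)) return Array.Empty<string>();
        return Directory.EnumerateFiles(dir, "*.der")
            .Select(p => Path.GetFileName(p)!)
            .OrderBy(n => n, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    // Moves one rejected certificate into the trusted store and returns its new path.
    public static string TrustRejected(string pkiRoot, string fileName)
    {
        // Keep only the file-name part so a caller cannot escape the store via "..\".
        var safeName = Path.GetFileName(fileName);
        var source = Path.Combine(pkiRoot, "rejected", "certs", safeName);
        var targetDir = Path.Combine(pkiRoot, "trusted", "certs");
        Directory.CreateDirectory(targetDir); // no-op when the directory already exists
        var target = Path.Combine(targetDir, safeName);
        File.Move(source, target, overwrite: true);
        return target;
    }
}
```

Because the server re-reads the trusted store on each handshake, nothing else has to happen after the move in this sketch.
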
Deferred: flipping `AutoAcceptUntrustedClientCertificates` to `false` as the
deployment default. That's a production-hardening config change, not a code
gap; the Admin UI is now ready to be the trust gate.

## 4. Live-LDAP integration test: **DONE (PR 31)**

PR 31 shipped `Server.Tests/LdapUserAuthenticatorLiveTests.cs`: 6 live-bind
tests against the dev GLAuth instance at `localhost:3893`, skipped cleanly
when the port is unreachable. Covers: valid bind, wrong password, unknown
user, empty credentials, single-group → WriteOperate mapping, multi-group
admin user surfacing all mapped roles.

Also added `UserNameAttribute` to `LdapOptions` (default `uid` for RFC 2307
compat) so Active Directory deployments can configure `sAMAccountName` /
`userPrincipalName` without code changes. `LdapUserAuthenticatorAdCompatTests`
(5 unit guards) pins the AD-shape DN parsing + filter escape behaviors. See
`docs/security.md` §"Active Directory configuration" for the AD appsettings
snippet.

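
To make the filter-escape behavior concrete, here is a minimal sketch of building a user search filter from a configurable attribute name with RFC 4515 value escaping. It is not the project's authenticator: `LdapFilterSketch`, `Escape`, and `BuildUserFilter` are hypothetical names, and the assumption that the filter has the simple `(attribute=value)` shape is mine.

```csharp
using System;
using System.Text;

public static class LdapFilterSketch
{
    // Escapes the characters that RFC 4515 requires to be escaped in a filter
    // assertion value: backslash, asterisk, parentheses, and NUL.
    public static string Escape(string value)
    {
        var sb = new StringBuilder(value.Length + 8);
        foreach (var ch in value)
        {
            switch (ch)
            {
                case '\\': sb.Append(@"\5c"); break;
                case '*': sb.Append(@"\2a"); break;
                case '(': sb.Append(@"\28"); break;
                case ')': sb.Append(@"\29"); break;
                case '\0': sb.Append(@"\00"); break;
                default: sb.Append(ch); break;
            }
        }
        return sb.ToString();
    }

    // userNameAttribute would come from configuration, e.g. "uid" or "sAMAccountName".
    public static string BuildUserFilter(string userNameAttribute, string userName) =>
        $"({userNameAttribute}={Escape(userName)})";
}
```

With this sketch, a hostile user name such as `jo)(uid=*` stays a single literal value inside the filter instead of injecting a second clause.
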
Deferred: asserting `session.Identity` end-to-end on the server side (i.e.
drive a full OPC UA session with username/password, then read an
`IHostConnectivityProbe`-style "whoami" node to verify the role surfaced).
That needs a test-only address-space node and is a separate PR.

## 5. Full Galaxy live-service smoke test against the merged v2 stack

**Status**: Individual pieces have live smoke tests (PR 5 MXAccess, PR 13
probe manager, PR 14 alarm tracker), but the full loop (OPC UA client →
`OtOpcUaServer` → `GalaxyProxyDriver` (in-process) → named-pipe to
Galaxy.Host subprocess → live MXAccess runtime → real Galaxy objects) has
no single end-to-end smoke test.

**To do**:
- Test that spawns the full topology, discovers a deployed Galaxy object,
  subscribes to one of its attributes, writes a value back, and asserts the
  write round-tripped through MXAccess. Skip when ArchestrA isn't running.

## 6. Second driver instance on the same server: **DONE (PR 32)**

`Server.Tests/MultipleDriverInstancesIntegrationTests.cs` registers two
drivers with distinct `DriverInstanceId`s on one `DriverHost`, spins up the
full OPC UA server, and asserts three behaviors: (1) each driver's namespace
URI (`urn:OtOpcUa:{id}`) resolves to a distinct index in the client's
NamespaceUris, (2) browsing one subtree returns that driver's folder and
does NOT leak the other driver's folder, (3) reads route to the correct
driver: the alpha instance returns 42 while beta returns 99, so a misroute
would surface at the assertion layer.

Deferred: the alarm-event multi-driver parity case (two drivers each raising
a `GalaxyAlarmEvent`, assert each condition lands on its owning instance's
condition node). Alarm tracking already has its own integration test
(`AlarmSubscription*`); the multi-driver alarm case would need a stub
`IAlarmSource` that's worth its own focused PR.

## 7. Host-status per-AppEngine granularity → Admin UI dashboard: **DONE (PRs 33 + 34)**

**PR 33** landed the data layer: `DriverHostStatus` entity + migration with
composite key `(NodeId, DriverInstanceId, HostName)` and two query-supporting
indexes (per-cluster drill-down on `NodeId`, stale-row detection on
`LastSeenUtc`).

**PR 34** wired the publisher + consumer. `HostStatusPublisher` is a
`BackgroundService` in the Server process that walks every registered
`IHostConnectivityProbe`-capable driver every 10s, calls
`GetHostStatuses()`, and upserts rows (`LastSeenUtc` advances each tick;
`State` + `StateChangedUtc` update on transitions). Admin UI `/hosts` page
groups by cluster, shows four summary cards (Hosts / Running / Stale /
Faulted), and flags rows whose `LastSeenUtc` is older than 30s as Stale so
operators see crashed Servers without waiting for a state change.

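
The upsert and stale-detection rules above can be illustrated with an in-memory stand-in for the `DriverHostStatus` table. This is a sketch under stated assumptions, not the shipped publisher: the dictionary replaces the database, `HostStatusStoreSketch` and `HostStatusRow` are hypothetical names, the `DriverHostState` members other than `Running` and `Faulted` are invented for illustration, and the caller supplies `nowUtc` so the behavior is deterministic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical member list; only Running and Faulted are named in this document.
public enum DriverHostState { Unknown, Running, Stopped, Faulted }

public sealed class HostStatusRow
{
    public string NodeId { get; init; } = "";
    public string DriverInstanceId { get; init; } = "";
    public string HostName { get; init; } = "";
    public DriverHostState State { get; set; }
    public DateTime StateChangedUtc { get; set; }
    public DateTime LastSeenUtc { get; set; }
}

public sealed class HostStatusStoreSketch
{
    public static readonly TimeSpan StaleThreshold = TimeSpan.FromSeconds(30);

    // Keyed by the composite key (NodeId, DriverInstanceId, HostName).
    private readonly Dictionary<(string, string, string), HostStatusRow> _rows = new();

    public int Count => _rows.Count;

    // One publisher tick for one host: insert on first sight, otherwise update in place.
    public HostStatusRow Upsert(string nodeId, string driverInstanceId, string hostName,
                                DriverHostState state, DateTime nowUtc)
    {
        var key = (nodeId, driverInstanceId, hostName);
        if (!_rows.TryGetValue(key, out var row))
        {
            row = new HostStatusRow
            {
                NodeId = nodeId, DriverInstanceId = driverInstanceId, HostName = hostName,
                State = state, StateChangedUtc = nowUtc,
            };
            _rows[key] = row;
        }
        else if (row.State != state)
        {
            // StateChangedUtc moves only on a real transition.
            row.State = state;
            row.StateChangedUtc = nowUtc;
        }
        row.LastSeenUtc = nowUtc; // advances on every tick
        return row;
    }

    // A row is stale when the publisher has not refreshed it within the threshold.
    public static bool IsStale(HostStatusRow row, DateTime nowUtc) =>
        nowUtc - row.LastSeenUtc > StaleThreshold;
}
```

Keeping `LastSeenUtc` separate from `StateChangedUtc` is what lets the page distinguish "still Running, still reporting" from "last reported Running, then went silent".
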
Deferred as follow-ups:

- Event-driven push (subscribe to `OnHostStatusChanged` per driver for
  sub-heartbeat latency). Adds DriverHost lifecycle-event plumbing;
  10s polling is fine for operator-scale use.
- Failure-count column: needs the publisher to track a transition history
  per host, not just current-state.
- SignalR fan-out to the Admin page (currently the page polls the DB, not
  a hub). The DB-polled version is fine at current cadence but a hub push
  would eliminate the 10s race where a new row sits in the DB before the
  Admin page notices.