Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, avoids DriverHost lifecycle-event plumbing that doesn't exist today. Event-driven push for sub-heartbeat latency is a straightforward follow-up.

Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case). StaleThreshold = 30s, which covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses.

Admin Components/Pages/Hosts.razor: FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode). Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27). Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default. State badge maps DriverHostState enum to Bootstrap color classes. Sidebar link added between 'Fleet status' and 'Clusters'.

Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs scoped via NodeOptions.ConfigDbConnectionString (no Admin-style manual raw SQL; the DbContext is the only access path, which keeps the migrations as owner of record).

Tests: HostStatusPublisherTests (4 new Integration cases, using a per-run throwaway DB matching the FleetStatusPollerTests pattern): publisher upserts one row per host from each probe-capable driver and skips non-probe drivers; second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end); state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision, so the comparison uses a 1ms tolerance, documented inline); MapState translates every HostState enum member.

Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LMX Galaxy bridge — remaining follow-ups
State after PR 19: the Galaxy driver is functionally at v1 parity through the
IDriver abstraction; the OPC UA server runs with LDAP-authenticated
Basic256Sha256 endpoints and alarms are observable through
AlarmConditionState.ReportEvent. The items below are what remains LMX-
specific before the stack can fully replace the v1 deployment, in
rough priority order.
1. Proxy-side IHistoryProvider for ReadAtTime / ReadEvents
Status: Host-side IPC shipped (PR 10 + PR 11). Proxy consumer not written.
PR 10 added HistoryReadAtTimeRequest/Response on the IPC wire and
MxAccessGalaxyBackend.HistoryReadAtTimeAsync delegates to
HistorianDataSource.ReadAtTimeAsync. PR 11 did the same for events
(HistoryReadEventsRequest/Response + GalaxyHistoricalEvent). The Proxy
side (GalaxyProxyDriver) doesn't call those yet — Core.Abstractions.IHistoryProvider
only exposes ReadRawAsync + ReadProcessedAsync.
To do:
- Extend IHistoryProvider with ReadAtTimeAsync(string, DateTime[], …) and ReadEventsAsync(string?, DateTime, DateTime, int, …).
- GalaxyProxyDriver calls the new IPC message kinds.
- DriverNodeManager wires the new capability methods onto HistoryReadAtTime + Events service handlers.
- Integration test: OPC UA client calls HistoryReadAtTime / HistoryReadEvents, value flows through IPC to the Host's HistorianDataSource, back to the client.
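The delegation shape described in the to-do list can be illustrated with a small sketch. It is written in Python purely as a language-neutral illustration; the real components are C#. `Sample`, `HistoryIpcClient`, `ProxyHistoryProvider`, and `FakeIpc` are hypothetical names, and the actual DTOs and method signatures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """Hypothetical value record; the real DTO shape is not shown in this note."""
    timestamp: datetime
    value: Optional[float]


class HistoryIpcClient(Protocol):
    """Stand-in for the pipe client that would send HistoryReadAtTimeRequest."""

    def history_read_at_time(
        self, tag: str, timestamps: Sequence[datetime]
    ) -> List[Sample]: ...


class ProxyHistoryProvider:
    """Proxy-side provider that only forwards; the Host does the historian work."""

    def __init__(self, ipc: HistoryIpcClient) -> None:
        self._ipc = ipc

    def read_at_time(self, tag: str, timestamps: Sequence[datetime]) -> List[Sample]:
        if not timestamps:
            return []  # nothing to ask the Host for, so skip the IPC round trip
        return self._ipc.history_read_at_time(tag, list(timestamps))


class FakeIpc:
    """Test double that records calls and echoes one sample per timestamp."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, List[datetime]]] = []

    def history_read_at_time(self, tag, timestamps):
        self.calls.append((tag, list(timestamps)))
        return [Sample(ts, 1.0) for ts in timestamps]


if __name__ == "__main__":
    provider = ProxyHistoryProvider(FakeIpc())
    when = [datetime(2024, 1, 1, tzinfo=timezone.utc)]
    print(provider.read_at_time("Tank.Level", when))
```

Keeping the proxy a thin forwarder means the integration test in the last bullet exercises the Host's historian path rather than any proxy-side logic.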
2. Write-gating by role — DONE (PR 26)
Landed in PR 26. WriteAuthzPolicy in Server/Security/ maps
SecurityClassification → required role (FreeAccess → no role required,
Operate/SecuredWrite → WriteOperate, Tune → WriteTune,
Configure/VerifiedWrite → WriteConfigure, ViewOnly → deny regardless).
DriverNodeManager caches the classification per variable during discovery and
checks the session's roles (via IRoleBearer) in OnWriteValue before calling
IWritable.WriteAsync. Roles do not cascade — a session with WriteOperate
can't write a Tune attribute unless it also carries WriteTune.
See feedback_acl_at_server_layer.md in memory for the architectural directive
that authz stays at the server layer and never delegates to driver-specific auth.
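The classification-to-role mapping and the no-cascade rule can be sketched as follows. This is a Python illustration of the behavior described above, not the C# WriteAuthzPolicy itself; the function name `is_write_allowed` and the use of plain strings for roles are assumptions.

```python
from enum import Enum
from typing import Dict, Optional, Set


class SecurityClassification(Enum):
    FREE_ACCESS = "FreeAccess"
    OPERATE = "Operate"
    SECURED_WRITE = "SecuredWrite"
    TUNE = "Tune"
    CONFIGURE = "Configure"
    VERIFIED_WRITE = "VerifiedWrite"
    VIEW_ONLY = "ViewOnly"


# Classification -> role required to write. None means no role is required.
# ViewOnly is deliberately absent: it is denied regardless of roles.
REQUIRED_ROLE: Dict[SecurityClassification, Optional[str]] = {
    SecurityClassification.FREE_ACCESS: None,
    SecurityClassification.OPERATE: "WriteOperate",
    SecurityClassification.SECURED_WRITE: "WriteOperate",
    SecurityClassification.TUNE: "WriteTune",
    SecurityClassification.CONFIGURE: "WriteConfigure",
    SecurityClassification.VERIFIED_WRITE: "WriteConfigure",
}


def is_write_allowed(
    classification: SecurityClassification, session_roles: Set[str]
) -> bool:
    """Return True when the session carries exactly the role the attribute needs."""
    if classification is SecurityClassification.VIEW_ONLY:
        return False
    required = REQUIRED_ROLE[classification]
    # Exact membership check: holding WriteOperate does not imply WriteTune.
    return required is None or required in session_roles


if __name__ == "__main__":
    print(is_write_allowed(SecurityClassification.TUNE, {"WriteOperate"}))
```

Because the check is plain set membership, a role never implies another one, which is the non-cascading behavior the section describes.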
3. Admin UI client-cert trust management — DONE (PR 28)
PR 28 shipped /certificates in the Admin UI. CertTrustService reads the OPC
UA server's PKI store root (OpcUaServerOptions.PkiStoreRoot — default
%ProgramData%\OtOpcUa\pki) and lists rejected + trusted certs by parsing the
.der files directly, so it has no Opc.Ua dependency and runs on any
Admin host that can reach the shared PKI directory.
Operator actions: Trust (moves rejected/certs/*.der → trusted/certs/*.der),
Delete rejected, Revoke trust. The OPC UA stack re-reads the trusted store on
each new client handshake, so no explicit reload signal is needed —
operators retry the rejected client's connection after trusting.
Deferred: flipping AutoAcceptUntrustedClientCertificates to false as the
deployment default. That's a production-hardening config change, not a code
gap — the Admin UI is now ready to be the trust gate.
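The Trust action is essentially a file move inside the PKI store. A minimal Python sketch of that move, assuming the rejected/certs and trusted/certs layout named above; `trust_certificate` is a hypothetical helper, not the CertTrustService API:

```python
import shutil
from pathlib import Path


def trust_certificate(pki_root: Path, file_name: str) -> Path:
    """Move one rejected .der file into the trusted store and return its new path."""
    # Reject anything that is not a bare file name, so a crafted name such as
    # "../../x.der" cannot escape the store.
    if Path(file_name).name != file_name:
        raise ValueError(f"not a plain file name: {file_name!r}")

    source = pki_root / "rejected" / "certs" / file_name
    if source.suffix.lower() != ".der" or not source.is_file():
        raise FileNotFoundError(f"no rejected certificate named {file_name!r}")

    target_dir = pki_root / "trusted" / "certs"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.move(str(source), str(target))
    return target
```

No reload call appears in the sketch because, as noted above, the stack re-reads the trusted store on each new client handshake.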
4. Live-LDAP integration test — DONE (PR 31)
PR 31 shipped Server.Tests/LdapUserAuthenticatorLiveTests.cs — 6 live-bind
tests against the dev GLAuth instance at localhost:3893, skipped cleanly
when the port is unreachable. Covers: valid bind, wrong password, unknown
user, empty credentials, single-group → WriteOperate mapping, multi-group
admin user surfacing all mapped roles.
Also added UserNameAttribute to LdapOptions (default uid for RFC 2307
compat) so Active Directory deployments can configure sAMAccountName /
userPrincipalName without code changes. LdapUserAuthenticatorAdCompatTests
(5 unit guards) pins the AD-shape DN parsing + filter escape behaviors. See
docs/security.md §"Active Directory configuration" for the AD appsettings
snippet.
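To illustrate why a configurable UserNameAttribute and filter escaping go together, here is a small Python sketch that builds a search filter using the RFC 4515 escape sequences. The function names are hypothetical, and the project's own escaping code may differ in detail.

```python
# RFC 4515 requires these characters to be escaped inside a filter value.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape one filter value character by character (no double escaping)."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_user_filter(user_name: str, user_name_attribute: str = "uid") -> str:
    """Build an equality filter on the configured user-name attribute."""
    return f"({user_name_attribute}={escape_filter_value(user_name)})"


if __name__ == "__main__":
    print(build_user_filter("jdoe"))
    print(build_user_filter("jdoe", "sAMAccountName"))
```

Switching a deployment from RFC 2307 to Active Directory then only changes the attribute argument; the escaping stays the same.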
Deferred: asserting session.Identity end-to-end on the server side (i.e.
drive a full OPC UA session with username/password, then read an
IHostConnectivityProbe-style "whoami" node to verify the role surfaced).
That needs a test-only address-space node and is a separate PR.
5. Full Galaxy live-service smoke test against the merged v2 stack
Status: Individual pieces have live smoke tests (PR 5 MXAccess, PR 13
probe manager, PR 14 alarm tracker), but the full loop — OPC UA client →
OtOpcUaServer → GalaxyProxyDriver (in-process) → named-pipe to
Galaxy.Host subprocess → live MXAccess runtime → real Galaxy objects — has
no single end-to-end smoke test.
To do:
- Test that spawns the full topology, discovers a deployed Galaxy object, subscribes to one of its attributes, writes a value back, and asserts the write round-tripped through MXAccess. Skip when ArchestrA isn't running.
6. Second driver instance on the same server — DONE (PR 32)
Server.Tests/MultipleDriverInstancesIntegrationTests.cs registers two
drivers with distinct DriverInstanceIds on one DriverHost, spins up the
full OPC UA server, and asserts three behaviors: (1) each driver's namespace
URI (urn:OtOpcUa:{id}) resolves to a distinct index in the client's
NamespaceUris, (2) browsing one subtree returns that driver's folder and
does NOT leak the other driver's folder, (3) reads route to the correct
driver — the alpha instance returns 42 while beta returns 99, so a misroute
would surface at the assertion layer.
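The routing property the test asserts can be shown with a toy model. This Python sketch is an assumption-laden illustration: `NamespaceRouter` is hypothetical, and its indexes are positions in the sketch's own table, not real OPC UA namespace indexes. Only the urn:OtOpcUa:{id} URI form comes from the text above.

```python
from typing import Callable, Dict, List


def namespace_uri(driver_instance_id: str) -> str:
    """URI form taken from the note above."""
    return f"urn:OtOpcUa:{driver_instance_id}"


class NamespaceRouter:
    """Maps each driver's namespace URI to an index and routes reads by index."""

    def __init__(self) -> None:
        self._uris: List[str] = []
        self._readers: Dict[str, Callable[[str], int]] = {}

    def register(self, driver_instance_id: str, read: Callable[[str], int]) -> int:
        uri = namespace_uri(driver_instance_id)
        if uri in self._readers:
            raise ValueError(f"duplicate driver instance id: {driver_instance_id}")
        self._uris.append(uri)
        self._readers[uri] = read
        return len(self._uris) - 1

    def index_of(self, driver_instance_id: str) -> int:
        return self._uris.index(namespace_uri(driver_instance_id))

    def read(self, namespace_index: int, node: str) -> int:
        # The index alone selects the owning driver, so a misroute would
        # return the other driver's sentinel value.
        return self._readers[self._uris[namespace_index]](node)


if __name__ == "__main__":
    router = NamespaceRouter()
    alpha = router.register("alpha", lambda node: 42)
    beta = router.register("beta", lambda node: 99)
    print(router.read(alpha, "Value"), router.read(beta, "Value"))
```

Giving each stub driver a distinct sentinel value is what makes a misroute visible as a plain assertion failure.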
Deferred: the alarm-event multi-driver parity case (two drivers each raising
a GalaxyAlarmEvent, assert each condition lands on its owning instance's
condition node). Alarm tracking already has its own integration test
(AlarmSubscription*); the multi-driver alarm case would need a stub
IAlarmSource that's worth its own focused PR.
7. Host-status per-AppEngine granularity → Admin UI dashboard — DONE (PRs 33 + 34)
PR 33 landed the data layer: DriverHostStatus entity + migration with
composite key (NodeId, DriverInstanceId, HostName) and two query-supporting
indexes (per-cluster drill-down on NodeId, stale-row detection on
LastSeenUtc).
PR 34 wired the publisher + consumer. HostStatusPublisher is a
BackgroundService in the Server process that walks every registered
IHostConnectivityProbe-capable driver every 10s, calls
GetHostStatuses(), and upserts rows (LastSeenUtc advances each tick;
State + StateChangedUtc update on transitions). Admin UI /hosts page
groups by cluster, shows four summary cards (Hosts / Running / Stale /
Faulted), and flags rows whose LastSeenUtc is older than 30s as Stale so
operators see crashed Servers without waiting for a state change.
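The upsert and stale rules described above can be sketched with an in-memory table. This is a Python illustration only; the real publisher is a C# BackgroundService writing through EF Core, and `HostStatusRow`, `upsert`, and `is_stale` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

STALE_THRESHOLD = timedelta(seconds=30)

# Composite key from the note above: (NodeId, DriverInstanceId, HostName).
Key = Tuple[str, str, str]


@dataclass
class HostStatusRow:
    state: str
    state_changed_utc: datetime
    last_seen_utc: datetime


def upsert(table: Dict[Key, HostStatusRow], key: Key, state: str, now: datetime) -> None:
    """Insert on first sight; otherwise advance LastSeenUtc and track transitions."""
    row = table.get(key)
    if row is None:
        table[key] = HostStatusRow(state, now, now)
        return
    row.last_seen_utc = now  # advances on every tick
    if row.state != state:   # only a real transition touches StateChangedUtc
        row.state = state
        row.state_changed_utc = now


def is_stale(row: HostStatusRow, now: datetime) -> bool:
    """A row is stale once its last heartbeat is older than the threshold."""
    return now - row.last_seen_utc > STALE_THRESHOLD


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    table: Dict[Key, HostStatusRow] = {}
    key = ("node-a", "galaxy-1", "AppEngine1")
    upsert(table, key, "Running", t0)
    print(is_stale(table[key], t0 + timedelta(seconds=45)))
```

Staleness is computed by the reader from LastSeenUtc rather than stored, which is why a crashed Server shows up without any further write from the publisher.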
Deferred as follow-ups:
- Event-driven push (subscribe to OnHostStatusChanged per driver for sub-heartbeat latency). Adds DriverHost lifecycle-event plumbing; 10s polling is fine for operator-scale use.
- Failure-count column: needs the publisher to track a transition history per host, not just current-state.
- SignalR fan-out to the Admin page (currently the page polls the DB, not a hub). The DB-polled version is fine at current cadence but a hub push would eliminate the 10s race where a new row sits in the DB before the Admin page notices.