Files
lmxopcua/docs/v2/lmx-followups.md
Joseph Doherty ed88835d34 Phase 3 PR 28 — Admin UI cert-trust management page. New /certificates route (FleetAdmin-only) surfaces the OPC UA server's PKI store rejected + trusted certs and gives operators Trust / Delete / Revoke actions so rejected client certs can be promoted without touching disk. CertTrustService reads $PkiStoreRoot/{rejected,trusted}/certs/*.der files directly via X509CertificateLoader — no Opc.Ua dependency in the Admin project, which keeps the Admin host runnable on a machine that doesn't have the full Server install locally (only needs the shared PKI directory reachable; typical deployment has Admin + Server side-by-side on the same box and PkiStoreRoot defaults match so a plain-vanilla install needs no override). CertTrustOptions bound from the Admin's 'CertTrust:PkiStoreRoot' section, default %ProgramData%\OtOpcUa\pki (matches OpcUaServerOptions.PkiStoreRoot default). Trust action moves the .der from rejected/certs/ to trusted/certs/ via File.Move(overwrite:true) — idempotent, tolerates a concurrent operator doing the same move. Delete wipes the file. Revoke removes from trusted/certs/ (Opc.Ua re-reads the Directory store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected connection after trusting). Thumbprint matching is case-insensitive because X509Certificate2.Thumbprint is upper-case hex but operators copy-paste from logs that sometimes lowercase it. Malformed files in the store are logged + skipped — a single bad .der can't take the whole management page offline. Missing store directories produce empty lists rather than exceptions so a pristine install (Server never run yet, no rejected/trusted dirs yet) doesn't crash the page.
Razor page layout: two tables (Rejected / Trusted) with Subject / Issuer / Thumbprint / Valid-window / Actions columns, status banner after each action with success or warning kind ('file missing' = another admin handled it), FleetAdmin-only via [Authorize(Roles=AdminRoles.FleetAdmin)]. Each action invokes LogActionAsync which Serilog-logs the authenticated admin user + thumbprint + action for an audit trail — DB-level ConfigAuditLog persistence is deferred because its schema is cluster-scoped and cert actions are cluster-agnostic; Serilog + CertTrustService's filesystem-op info logs give the forensic trail in the meantime. Sidebar link added to MainLayout between Reservations and the future Account page.
Tests — CertTrustServiceTests (9 new unit cases): ListRejected parses Subject + Thumbprint + store kind from a self-signed test cert written into rejected/certs/; rejected and trusted stores are kept separate; TrustRejected moves the file and the Rejected list is empty afterwards; TrustRejected with a thumbprint not in rejected returns false without touching trusted; DeleteRejected removes the file; UntrustCert removes from trusted only; thumbprint match is case-insensitive (operator UX); missing store directories produce empty lists instead of throwing DirectoryNotFoundException (pristine-install tolerance); a junk .der in the store is logged + skipped and the valid certs still surface (one bad file doesn't break the page). Full Admin.Tests Unit suite: 23 pass / 0 fail (14 prior + 9 new). Full Admin build clean — 0 errors, 0 warnings.
lmx-followups.md #3 marked DONE with a cross-reference to this PR and a note that flipping AutoAcceptUntrustedClientCertificates to false as the production default is a deployment-config follow-up, not a code gap — the Admin UI is now ready to be the trust gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:37:55 -04:00

110 lines
5.4 KiB
Markdown

# LMX Galaxy bridge — remaining follow-ups
State after PR 19: the Galaxy driver is functionally at v1 parity through the
`IDriver` abstraction; the OPC UA server runs with LDAP-authenticated
Basic256Sha256 endpoints and alarms are observable through
`AlarmConditionState.ReportEvent`. The items below are what remains LMX-
specific before the stack can fully replace the v1 deployment, in
rough priority order.
## 1. Proxy-side `IHistoryProvider` for `ReadAtTime` / `ReadEvents`
**Status**: Host-side IPC shipped (PR 10 + PR 11). Proxy consumer not written.
PR 10 added `HistoryReadAtTimeRequest/Response` on the IPC wire and
`MxAccessGalaxyBackend.HistoryReadAtTimeAsync` delegates to
`HistorianDataSource.ReadAtTimeAsync`. PR 11 did the same for events
(`HistoryReadEventsRequest/Response` + `GalaxyHistoricalEvent`). The Proxy
side (`GalaxyProxyDriver`) doesn't call those yet — `Core.Abstractions.IHistoryProvider`
only exposes `ReadRawAsync` + `ReadProcessedAsync`.
**To do**:
- Extend `IHistoryProvider` with `ReadAtTimeAsync(string, DateTime[], …)` and
`ReadEventsAsync(string?, DateTime, DateTime, int, …)`.
- `GalaxyProxyDriver` calls the new IPC message kinds.
- `DriverNodeManager` wires the new capability methods onto `HistoryRead`
`AtTime` + `Events` service handlers.
- Integration test: OPC UA client calls `HistoryReadAtTime` / `HistoryReadEvents`,
value flows through IPC to the Host's `HistorianDataSource`, back to the client.
## 2. Write-gating by role — **DONE (PR 26)**
Landed in PR 26. `WriteAuthzPolicy` in `Server/Security/` maps
`SecurityClassification` → required role (`FreeAccess` → no role required,
`Operate`/`SecuredWrite``WriteOperate`, `Tune``WriteTune`,
`Configure`/`VerifiedWrite``WriteConfigure`, `ViewOnly` → deny regardless).
`DriverNodeManager` caches the classification per variable during discovery and
checks the session's roles (via `IRoleBearer`) in `OnWriteValue` before calling
`IWritable.WriteAsync`. Roles do not cascade — a session with `WriteOperate`
can't write a `Tune` attribute unless it also carries `WriteTune`.
See `feedback_acl_at_server_layer.md` in memory for the architectural directive
that authz stays at the server layer and never delegates to driver-specific auth.
## 3. Admin UI client-cert trust management — **DONE (PR 28)**
PR 28 shipped `/certificates` in the Admin UI. `CertTrustService` reads the OPC
UA server's PKI store root (`OpcUaServerOptions.PkiStoreRoot` — default
`%ProgramData%\OtOpcUa\pki`) and lists rejected + trusted certs by parsing the
`.der` files directly, so it has no `Opc.Ua` dependency and runs on any
Admin host that can reach the shared PKI directory.
Operator actions: Trust (moves `rejected/certs/*.der``trusted/certs/*.der`),
Delete rejected, Revoke trust. The OPC UA stack re-reads the trusted store on
each new client handshake, so no explicit reload signal is needed —
operators retry the rejected client's connection after trusting.
Deferred: flipping `AutoAcceptUntrustedClientCertificates` to `false` as the
deployment default. That's a production-hardening config change, not a code
gap — the Admin UI is now ready to be the trust gate.
## 4. Live-LDAP integration test
**Status**: PR 19 unit-tested the auth-flow shape; the live bind path is
exercised only by the pre-existing `Admin.Tests/LdapLiveBindTests.cs` which
uses the same Novell library against a running GLAuth at `localhost:3893`.
**To do**:
- Add `OpcUaServerIntegrationTests.Valid_username_authenticates_against_live_ldap`
with the same skip-when-unreachable guard.
- Assert `session.Identity` on the server side carries the expected role
after bind — requires exposing a test hook or reading identity from a
new `IHostConnectivityProbe`-style "whoami" variable in the address space.
## 5. Full Galaxy live-service smoke test against the merged v2 stack
**Status**: Individual pieces have live smoke tests (PR 5 MXAccess, PR 13
probe manager, PR 14 alarm tracker), but the full loop — OPC UA client →
`OtOpcUaServer``GalaxyProxyDriver` (in-process) → named-pipe to
Galaxy.Host subprocess → live MXAccess runtime → real Galaxy objects — has
no single end-to-end smoke test.
**To do**:
- Test that spawns the full topology, discovers a deployed Galaxy object,
subscribes to one of its attributes, writes a value back, and asserts the
write round-tripped through MXAccess. Skip when ArchestrA isn't running.
## 6. Second driver instance on the same server
**Status**: `DriverHost.RegisterAsync` supports multiple drivers; the OPC UA
server creates one `DriverNodeManager` per driver and isolates their
subtrees under distinct namespace URIs. Not proven with two active
`GalaxyProxyDriver` instances pointing at different Galaxies.
**To do**:
- Integration test that registers two driver instances, each with a distinct
`DriverInstanceId` + endpoint in its own session, asserts nodes from both
appear under the correct subtrees, alarm events land on the correct
instance's condition nodes.
## 7. Host-status per-AppEngine granularity → Admin UI dashboard
**Status**: PR 13 ships per-platform/per-AppEngine `ScanState` probing; PR 17
surfaces the resulting `OnHostStatusChanged` events through OPC UA. Admin
UI doesn't render a per-host dashboard yet.
**To do**:
- SignalR hub push of `HostStatusChangedEventArgs` to the Admin UI.
- Dashboard page showing each tracked host, current state, last transition
time, failure count.