lmxopcua/docs/v2/lmx-followups.md
Joseph Doherty ed88835d34 Phase 3 PR 28 — Admin UI cert-trust management page. New /certificates route (FleetAdmin-only) surfaces the OPC UA server's PKI store rejected + trusted certs and gives operators Trust / Delete / Revoke actions so rejected client certs can be promoted without touching disk. CertTrustService reads $PkiStoreRoot/{rejected,trusted}/certs/*.der files directly via X509CertificateLoader — no Opc.Ua dependency in the Admin project, which keeps the Admin host runnable on a machine that doesn't have the full Server install locally (only needs the shared PKI directory reachable; typical deployment has Admin + Server side-by-side on the same box and PkiStoreRoot defaults match so a plain-vanilla install needs no override). CertTrustOptions bound from the Admin's 'CertTrust:PkiStoreRoot' section, default %ProgramData%\OtOpcUa\pki (matches OpcUaServerOptions.PkiStoreRoot default). Trust action moves the .der from rejected/certs/ to trusted/certs/ via File.Move(overwrite:true) — idempotent, tolerates a concurrent operator doing the same move. Delete wipes the file. Revoke removes from trusted/certs/ (Opc.Ua re-reads the Directory store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected connection after trusting). Thumbprint matching is case-insensitive because X509Certificate2.Thumbprint is upper-case hex but operators copy-paste from logs that sometimes lowercase it. Malformed files in the store are logged + skipped — a single bad .der can't take the whole management page offline. Missing store directories produce empty lists rather than exceptions so a pristine install (Server never run yet, no rejected/trusted dirs yet) doesn't crash the page.
Razor page layout: two tables (Rejected / Trusted) with Subject / Issuer / Thumbprint / Valid-window / Actions columns, status banner after each action with success or warning kind ('file missing' = another admin handled it), FleetAdmin-only via [Authorize(Roles=AdminRoles.FleetAdmin)]. Each action invokes LogActionAsync which Serilog-logs the authenticated admin user + thumbprint + action for an audit trail — DB-level ConfigAuditLog persistence is deferred because its schema is cluster-scoped and cert actions are cluster-agnostic; Serilog + CertTrustService's filesystem-op info logs give the forensic trail in the meantime. Sidebar link added to MainLayout between Reservations and the future Account page.
Tests — CertTrustServiceTests (9 new unit cases): ListRejected parses Subject + Thumbprint + store kind from a self-signed test cert written into rejected/certs/; rejected and trusted stores are kept separate; TrustRejected moves the file and the Rejected list is empty afterwards; TrustRejected with a thumbprint not in rejected returns false without touching trusted; DeleteRejected removes the file; UntrustCert removes from trusted only; thumbprint match is case-insensitive (operator UX); missing store directories produce empty lists instead of throwing DirectoryNotFoundException (pristine-install tolerance); a junk .der in the store is logged + skipped and the valid certs still surface (one bad file doesn't break the page). Full Admin.Tests Unit suite: 23 pass / 0 fail (14 prior + 9 new). Full Admin build clean — 0 errors, 0 warnings.
lmx-followups.md #3 marked DONE with a cross-reference to this PR and a note that flipping AutoAcceptUntrustedClientCertificates to false as the production default is a deployment-config follow-up, not a code gap — the Admin UI is now ready to be the trust gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:37:55 -04:00


LMX Galaxy bridge — remaining follow-ups

State after PR 19: the Galaxy driver is functionally at v1 parity through the IDriver abstraction; the OPC UA server runs with LDAP-authenticated Basic256Sha256 endpoints and alarms are observable through AlarmConditionState.ReportEvent. The items below are what remains LMX-specific before the stack can fully replace the v1 deployment, in rough priority order.

1. Proxy-side IHistoryProvider for ReadAtTime / ReadEvents

Status: Host-side IPC shipped (PR 10 + PR 11). Proxy consumer not written.

PR 10 added HistoryReadAtTimeRequest/Response on the IPC wire and MxAccessGalaxyBackend.HistoryReadAtTimeAsync delegates to HistorianDataSource.ReadAtTimeAsync. PR 11 did the same for events (HistoryReadEventsRequest/Response + GalaxyHistoricalEvent). The Proxy side (GalaxyProxyDriver) doesn't call those yet — Core.Abstractions.IHistoryProvider only exposes ReadRawAsync + ReadProcessedAsync.

To do:

  • Extend IHistoryProvider with ReadAtTimeAsync(string, DateTime[], …) and ReadEventsAsync(string?, DateTime, DateTime, int, …).
  • GalaxyProxyDriver calls the new IPC message kinds.
  • DriverNodeManager wires the new capability methods onto HistoryRead AtTime + Events service handlers.
  • Integration test: OPC UA client calls HistoryReadAtTime / HistoryReadEvents, value flows through IPC to the Host's HistorianDataSource, back to the client.
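As a rough illustration of the first to-do item, the two new members might be shaped like the sketch below. Only the leading parameter types come from the list above; the interface name IHistoryProviderSketch, the result records, and every parameter name are hypothetical stand-ins, and whatever sits behind the "…" in the original signatures is deliberately not guessed here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical result shapes; the real types are not shown in this document.
public sealed record HistorySampleSketch(DateTime TimestampUtc, object? Value);
public sealed record HistoryEventSketch(DateTime TimestampUtc, string Message);

// Stand-in for Core.Abstractions.IHistoryProvider. The existing members
// (ReadRawAsync, ReadProcessedAsync) are omitted from this sketch.
public interface IHistoryProviderSketch
{
    // Source signature: ReadAtTimeAsync(string, DateTime[], …).
    // Trailing parameters are elided in the source and left out here.
    Task<IReadOnlyList<HistorySampleSketch>> ReadAtTimeAsync(
        string fullReference,
        DateTime[] timestampsUtc);

    // Source signature: ReadEventsAsync(string?, DateTime, DateTime, int, …).
    // Trailing parameters are elided in the source and left out here.
    Task<IReadOnlyList<HistoryEventSketch>> ReadEventsAsync(
        string? sourceName,
        DateTime startUtc,
        DateTime endUtc,
        int maxEvents);
}
```

Existing implementers would need to provide both members once the real interface grows, which is why the proxy-side change and the DriverNodeManager wiring are listed together.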

2. Write-gating by role — DONE (PR 26)

Landed in PR 26. WriteAuthzPolicy in Server/Security/ maps SecurityClassification → required role (FreeAccess → no role required, Operate/SecuredWrite → WriteOperate, Tune → WriteTune, Configure/VerifiedWrite → WriteConfigure, ViewOnly → deny regardless). DriverNodeManager caches the classification per variable during discovery and checks the session's roles (via IRoleBearer) in OnWriteValue before calling IWritable.WriteAsync. Roles do not cascade: a session with WriteOperate can't write a Tune attribute unless it also carries WriteTune.
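The mapping and the no-cascade rule described above can be sketched as a single pure function. This is an illustration only, not the shipped WriteAuthzPolicy: the enum and the class name WriteAuthzPolicySketch are stand-ins, the roles are modelled as plain strings, and failing closed on an unknown classification is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Classification names come from the text above; the enum itself is a stand-in.
public enum SecurityClassification
{
    FreeAccess, Operate, SecuredWrite, Tune, Configure, VerifiedWrite, ViewOnly
}

public static class WriteAuthzPolicySketch
{
    // Returns true when a session holding sessionRoles may write a variable
    // with the given classification. Each classification needs its own role;
    // holding a different write role never implies another one.
    public static bool IsAllowed(SecurityClassification classification,
                                 ICollection<string> sessionRoles)
    {
        switch (classification)
        {
            case SecurityClassification.FreeAccess:
                return true;                                   // no role required
            case SecurityClassification.ViewOnly:
                return false;                                  // denied regardless of roles
            case SecurityClassification.Operate:
            case SecurityClassification.SecuredWrite:
                return sessionRoles.Contains("WriteOperate");
            case SecurityClassification.Tune:
                return sessionRoles.Contains("WriteTune");
            case SecurityClassification.Configure:
            case SecurityClassification.VerifiedWrite:
                return sessionRoles.Contains("WriteConfigure");
            default:
                return false;                                  // assumption: unknown values fail closed
        }
    }
}
```

Keeping the check as a pure function of (classification, roles) makes it cheap to call from a write handler, since the classification is already cached per variable.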

See feedback_acl_at_server_layer.md in memory for the architectural directive that authz stays at the server layer and never delegates to driver-specific auth.

3. Admin UI client-cert trust management — DONE (PR 28)

PR 28 shipped /certificates in the Admin UI. CertTrustService reads the OPC UA server's PKI store root (OpcUaServerOptions.PkiStoreRoot — default %ProgramData%\OtOpcUa\pki) and lists rejected + trusted certs by parsing the .der files directly, so it has no Opc.Ua dependency and runs on any Admin host that can reach the shared PKI directory.

Operator actions: Trust (moves rejected/certs/*.der → trusted/certs/*.der), Delete rejected, Revoke trust. The OPC UA stack re-reads the trusted store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected client's connection after trusting.
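A minimal sketch of the Trust action, assuming .NET 9 or later for X509CertificateLoader. The class and method names here (CertTrustSketch, TrustRejected) are illustrative and the shipped CertTrustService may differ in detail; the behaviours shown (case-insensitive thumbprint match, skipping malformed files, tolerating a missing directory or a concurrent move) follow the PR description.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

public static class CertTrustSketch
{
    // Moves the rejected certificate with the given thumbprint into trusted/certs.
    // Returns false when the store directory or a matching file is missing.
    public static bool TrustRejected(string pkiStoreRoot, string thumbprint)
    {
        var rejectedDir = Path.Combine(pkiStoreRoot, "rejected", "certs");
        var trustedDir = Path.Combine(pkiStoreRoot, "trusted", "certs");
        if (!Directory.Exists(rejectedDir)) return false;   // pristine install: nothing to trust

        foreach (var path in Directory.EnumerateFiles(rejectedDir, "*.der"))
        {
            string found;
            try
            {
                using var cert = X509CertificateLoader.LoadCertificateFromFile(path);
                found = cert.Thumbprint;
            }
            catch (CryptographicException)
            {
                continue;                                    // malformed .der: skip, keep scanning
            }

            // Operators may paste lower-case thumbprints copied from logs.
            if (!string.Equals(found, thumbprint, StringComparison.OrdinalIgnoreCase)) continue;

            Directory.CreateDirectory(trustedDir);
            try
            {
                File.Move(path, Path.Combine(trustedDir, Path.GetFileName(path)), overwrite: true);
                return true;
            }
            catch (FileNotFoundException)
            {
                return false;                                // another operator handled it first
            }
        }
        return false;
    }
}
```

A caller would map the false result onto the page's warning banner and the true result onto the success banner.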

Deferred: flipping AutoAcceptUntrustedClientCertificates to false as the deployment default. That's a production-hardening config change, not a code gap — the Admin UI is now ready to be the trust gate.

4. Live-LDAP integration test

Status: PR 19 unit-tested the auth-flow shape; the live bind path is exercised only by the pre-existing Admin.Tests/LdapLiveBindTests.cs, which uses the same Novell library against a running GLAuth at localhost:3893.

To do:

  • Add OpcUaServerIntegrationTests.Valid_username_authenticates_against_live_ldap with the same skip-when-unreachable guard.
  • Assert session.Identity on the server side carries the expected role after bind — requires exposing a test hook or reading identity from a new IHostConnectivityProbe-style "whoami" variable in the address space.
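One way the skip-when-unreachable guard mentioned above could look is a plain TCP probe with a timeout. This is a sketch under assumptions: the helper name LdapReachabilitySketch is hypothetical, and the existing guard in LdapLiveBindTests.cs may be implemented differently.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class LdapReachabilitySketch
{
    // Returns true when a TCP connection to host:port succeeds within the timeout.
    // It only proves that something is listening, not that an LDAP bind would succeed.
    public static async Task<bool> IsReachableAsync(string host, int port, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        using var client = new TcpClient();
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (SocketException)
        {
            return false;   // connection refused or host unreachable
        }
        catch (OperationCanceledException)
        {
            return false;   // timed out
        }
    }
}
```

A live test would call IsReachableAsync("localhost", 3893, ...) first and skip or return early when it yields false, so machines without GLAuth do not report failures.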

5. Full Galaxy live-service smoke test against the merged v2 stack

Status: Individual pieces have live smoke tests (PR 5 MXAccess, PR 13 probe manager, PR 14 alarm tracker), but the full loop, OPC UA client → OtOpcUaServer → GalaxyProxyDriver (in-process) → named-pipe to Galaxy.Host subprocess → live MXAccess runtime → real Galaxy objects, has no single end-to-end smoke test.

To do:

  • Test that spawns the full topology, discovers a deployed Galaxy object, subscribes to one of its attributes, writes a value back, and asserts the write round-tripped through MXAccess. Skip when ArchestrA isn't running.

6. Second driver instance on the same server

Status: DriverHost.RegisterAsync supports multiple drivers; the OPC UA server creates one DriverNodeManager per driver and isolates their subtrees under distinct namespace URIs. Not proven with two active GalaxyProxyDriver instances pointing at different Galaxies.

To do:

  • Integration test that registers two driver instances, each with a distinct DriverInstanceId + endpoint in its own session, and asserts that nodes from both appear under the correct subtrees and that alarm events land on the correct instance's condition nodes.

7. Host-status per-AppEngine granularity → Admin UI dashboard

Status: PR 13 ships per-platform/per-AppEngine ScanState probing; PR 17 surfaces the resulting OnHostStatusChanged events through OPC UA. Admin UI doesn't render a per-host dashboard yet.

To do:

  • SignalR hub push of HostStatusChangedEventArgs to the Admin UI.
  • Dashboard page showing each tracked host, current state, last transition time, failure count.
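The dashboard's view model could be a small fold over status-change events, sketched below. Everything here is hypothetical: the real HostStatusChangedEventArgs shape is not shown in this document, so HostStatusChange, HostStateSketch, and the definition of a "failure" as a transition into Faulted are assumptions made for illustration. The SignalR push itself is out of scope for the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical state set; the real ScanState values may differ.
public enum HostStateSketch { Unknown, Running, Stopped, Faulted }

// Hypothetical stand-in for the payload carried by HostStatusChangedEventArgs.
public sealed record HostStatusChange(string HostName, HostStateSketch NewState, DateTime TimestampUtc);

// One dashboard row: tracked host, current state, last transition time, failure count.
public sealed record HostDashboardRow(string HostName, HostStateSketch State, DateTime LastTransitionUtc, int FailureCount);

public sealed class HostDashboardModel
{
    private readonly Dictionary<string, HostDashboardRow> _rows =
        new Dictionary<string, HostDashboardRow>(StringComparer.OrdinalIgnoreCase);

    public IReadOnlyDictionary<string, HostDashboardRow> Rows => _rows;

    // Folds one event into the model. Not thread-safe; callers serialize access.
    public void Apply(HostStatusChange change)
    {
        _rows.TryGetValue(change.HostName, out var previous);

        // A repeated report of the same state is not a transition: keep the old row.
        if (previous != null && previous.State == change.NewState) return;

        var failures = previous?.FailureCount ?? 0;
        if (change.NewState == HostStateSketch.Faulted) failures++;   // assumption: entering Faulted counts as a failure

        _rows[change.HostName] = new HostDashboardRow(
            change.HostName, change.NewState, change.TimestampUtc, failures);
    }
}
```

Keeping the aggregation separate from the hub means the same model can back both the initial page render and the incremental pushes.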