Commit Graph

76 Commits

Author SHA1 Message Date
Joseph Doherty
d6fac2d81d test(host): 2-node integration test harness + consolidate to one ActorSystem (Task 58)
Builds TwoNodeClusterHarness: two in-process Host-equivalent nodes sharing
an in-memory ConfigDb. Forms a 2-member Akka cluster. ClusterFormationTests
proves both nodes see each other as admin+driver role members.

Fixes a real production bug uncovered while wiring the harness — Program.cs
ran two separate ActorSystems (one from AddOtOpcUaCluster.AkkaHostedService
with cluster HOCON, one from Akka.Hosting.AddAkka with bare HOCON). Cluster
singletons landed on the bare ActorSystem and could not actually form a
cluster ("Configuration does not contain `akka.cluster` node").

Consolidation:
- AddOtOpcUaCluster now only binds AkkaClusterOptions + registers IClusterRoleInfo
- New WithOtOpcUaClusterBootstrap pushes embedded HOCON + Remote/Cluster options
  into Akka.Hosting's AkkaConfigurationBuilder
- AkkaHostedService.cs deleted — Akka.Hosting now owns the lifecycle
- Program.cs + harness call WithOtOpcUaClusterBootstrap inside AddAkka

Why not WebApplicationFactory<Program>? Program.cs reads OTOPCUA_ROLES from
process env (shared across in-process WAFs); the harness replays Program.cs's
DI graph from a clean WebApplicationBuilder per node with per-node config
overrides. Same production extensions, isolated config + Kestrel + Akka ports.

Tests: 93 v2 tests pass (was 91 + 2 new cluster formation), 0 skipped.
2026-05-26 06:27:04 -04:00
Joseph Doherty
45a8c79ffe refactor(security): JwtBearer validation via IPostConfigureOptions (F2)
Eliminates the services.BuildServiceProvider() captive-provider antipattern
(ASP0000) inside AddJwtBearer. The new ConfigureJwtBearerFromTokenService
resolves JwtTokenService from the real DI container at runtime and stays
in lock-step with JwtTokenService.BuildValidationParameters.

All 27 Security.Tests stay green, including the F1 integration tests that
exercise /auth/token through the real bearer pipeline.
2026-05-26 06:18:00 -04:00
Joseph Doherty
b266f63cd7 feat(adminui): thread User.Identity.Name into Deployments createdBy (F18)
Injects AuthenticationStateProvider and reads the current user's identity
name on Deploy click, replacing the "(current user)" placeholder.
Anonymous case falls back to "(anonymous)" — should never hit in practice
since the page requires FleetAdmin/ConfigEditor.
2026-05-26 06:17:53 -04:00
Joseph Doherty
dfc143cdeb feat(controlplane): RedundancyStateActor broadcast override + un-skip tests (F6)
Mirrors the publisher-injection pattern from FleetStatusBroadcaster and
PeerOpcUaProbeActor: Props accepts an optional Action<object> override so
tests can use a TestProbe sink instead of bootstrapping DistributedPubSub
(unreliable single-node in TestKit).

Un-skips the two RedundancyStateActor tests deferred under F6.
2026-05-26 06:16:32 -04:00
Joseph Doherty
09d6676e1f feat(runtime): WithOtOpcUaRuntimeActors extension for driver-role node startup (F19)
Mirrors WithOtOpcUaControlPlaneSingletons for the driver role. Spawns
DriverHostActor + DbHealthProbeActor on the host's ActorSystem and
registers both under marker keys. Host's Program.cs now calls it when
the node carries the driver role, so driver-only and admin+driver
deployments both auto-bootstrap the per-node actors.

Integration test covers the registration round-trip via Microsoft.Extensions.Hosting
+ Akka.Hosting AddAkka.
2026-05-26 06:09:37 -04:00
Joseph Doherty
76310b8829 chore(cleanup): delete OtOpcUa.Server, OtOpcUa.Admin, and obsolete v1 tests
Task 56: removes the legacy in-process Server + Admin Web project + their test
projects (Server.Tests, Admin.Tests, Admin.E2ETests). The fused OtOpcUa.Host
binary built across Phases 1-9 is now the sole production entry point.

What happened to the 47 legacy Admin Blazor pages: per follow-up F15, the
v1 architecture's draft/publish UX is replaced by v2's live-edit + snapshot-
deploy model, so a 1:1 migration is not meaningful. The mechanical move via
git mv preserves the history; service classes + page bodies that referenced
removed v1 types (ConfigGeneration, RedundancyRole, GenerationId) were
deleted. AdminUI now ships a minimal Home page + the v2 Deployments page.

Per-page rebuild against the v2 surface is tracked as F15. The v2 Deployments
page (Task 52) is the only first-party UI shipping in this PR.

Task 57: solution build green; 84+ tests green across active v2 + legacy
driver test projects.
2026-05-26 05:38:31 -04:00
Joseph Doherty
8b4de8080b feat(runtime): DEV-STUB mode for Windows-only drivers on non-Windows or dev role 2026-05-26 05:23:02 -04:00
Joseph Doherty
fa1d685ccd feat(host): health endpoints + per-environment appsettings layout 2026-05-26 05:23:01 -04:00
Joseph Doherty
e2b357f89a feat(host): role-gated Program.cs composes all v2 components 2026-05-26 05:22:59 -04:00
Joseph Doherty
f167808a2c feat(adminui): Deployments page with drift indicator and Deploy button 2026-05-26 05:18:00 -04:00
Joseph Doherty
b83f099394 feat(adminui): IFleetDiagnosticsClient skeleton (Akka round-trip tracked as F17) 2026-05-26 05:17:59 -04:00
Joseph Doherty
f022499e7f feat(adminui): IAdminOperationsClient backed by ClusterSingletonProxy 2026-05-26 05:17:58 -04:00
Joseph Doherty
26d8f2f620 feat(adminui): FleetStatusHub + AlertHub + MapOtOpcUaHubs (broadcaster bridge tracked as F16) 2026-05-26 05:17:56 -04:00
Joseph Doherty
1a067e609c refactor(adminui): MapAdminUI extension + AddAdminUI DI (47-component migration tracked as F15) 2026-05-26 05:17:55 -04:00
Joseph Doherty
b7c117ab31 feat(opcua): pure Phase7Composer + purity tests (side-effects tracked as F14) 2026-05-26 05:14:45 -04:00
Joseph Doherty
2877a883cd feat(opcua): OpcUaApplicationHost facade in OpcUaServer (full extraction tracked as F13) 2026-05-26 05:14:39 -04:00
Joseph Doherty
28639cb14d feat(runtime): HistorianAdapter + PeerOpcUaProbe + DbHealthProbe actors (engine wiring tracked as F11/F12) 2026-05-26 05:09:06 -04:00
Joseph Doherty
e115f13104 feat(runtime): OpcUaPublishActor on synchronized dispatcher (SDK wiring tracked as F10) 2026-05-26 05:09:04 -04:00
Joseph Doherty
95ef533822 feat(runtime): ScriptedAlarmActor state machine (engine wiring tracked as F9) 2026-05-26 05:09:03 -04:00
Joseph Doherty
39729bfe21 feat(runtime): VirtualTagActor skeleton (engine wiring tracked as F8) 2026-05-26 05:09:01 -04:00
Joseph Doherty
64c627f8d6 feat(runtime): DriverInstanceActor state machine with Connecting/Connected/Reconnecting 2026-05-26 05:05:36 -04:00
Joseph Doherty
ed130135ca feat(runtime): DriverHostActor state machine with PreStart recovery + DispatchDeployment + stale fallback 2026-05-26 05:02:42 -04:00
Joseph Doherty
52bf4b3371 feat(controlplane): WithOtOpcUaControlPlaneSingletons registration extension (admin role) 2026-05-26 04:57:09 -04:00
Joseph Doherty
dd122c4ca9 feat(controlplane): FleetStatusBroadcaster push-driven from cluster events + heartbeats 2026-05-26 04:57:07 -04:00
Joseph Doherty
f193872891 feat(controlplane): ConfigPublishCoordinator deadline timeout + failover PreStart recovery 2026-05-26 04:57:05 -04:00
Joseph Doherty
6b37f997ad feat(controlplane): RedundancyStateActor with debounced topology publish 2026-05-26 04:53:31 -04:00
Joseph Doherty
62e12dab95 feat(controlplane): ConfigPublishCoordinator happy path with NodeDeploymentState seeding 2026-05-26 04:53:29 -04:00
Joseph Doherty
ef683f5073 feat(controlplane): AdminOperationsActor + ConfigComposer + StartDeployment flow 2026-05-26 04:53:28 -04:00
Joseph Doherty
23f669c376 feat(controlplane): AuditWriterActor with batched in-buffer-dedup insert 2026-05-26 04:44:01 -04:00
Joseph Doherty
14acab5a58 feat(controlplane): ServiceLevelCalculator + ControlPlane.Tests harness 2026-05-26 04:43:59 -04:00
Joseph Doherty
e38f22e3c2 feat(security): CookieAuthenticationStateProvider for Blazor circuit expiry detection 2026-05-26 04:35:50 -04:00
Joseph Doherty
8be84ba27b feat(security): /auth/login, /auth/ping, /auth/token endpoints 2026-05-26 04:35:49 -04:00
Joseph Doherty
207fc6aba9 feat(security): cookie+JWT hybrid auth via AddOtOpcUaAuth 2026-05-26 04:35:48 -04:00
Joseph Doherty
93316e3431 feat(security): JwtTokenService with HS256 + 15-min expiry 2026-05-26 04:35:46 -04:00
Joseph Doherty
567b8cac1d refactor(security): move LdapAuthService into OtOpcUa.Security library 2026-05-26 04:35:42 -04:00
Joseph Doherty
c168c1c9c6 feat(migration): add Migrate-To-V2.ps1 idempotent migration runner 2026-05-26 04:26:01 -04:00
Joseph Doherty
30a2104fa5 feat(scaffold): introduce 8 v2 component projects
Adds the empty project skeletons that subsequent v2 tasks fill in:

  src/Core/ZB.MOM.WW.OtOpcUa.Commons      (types, interfaces, message contracts)
  src/Core/ZB.MOM.WW.OtOpcUa.Cluster      (Akka.Hosting + cluster wiring)
  src/Server/ZB.MOM.WW.OtOpcUa.Security   (cookie+JWT auth, LDAP)
  src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane (admin-role cluster singletons)
  src/Server/ZB.MOM.WW.OtOpcUa.Runtime    (per-node driver actors)
  src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer (OPC UA SDK application host)
  src/Server/ZB.MOM.WW.OtOpcUa.AdminUI    (Razor class library)
  src/Server/ZB.MOM.WW.OtOpcUa.Host       (single fused web binary)

Each project sets TreatWarningsAsErrors=true in its own csproj (per the
Directory.Build.props deviation note in the previous commit). NuGetAuditSuppress
entries cover transitive vulnerability advisories the new strictness surfaces:

  - GHSA-g94r-2vxg-569j (OpenTelemetry.Api 1.9.0 via Akka.Cluster.Hosting/Tools)
  - GHSA-h958-fxgg-g7w3 (Opc.Ua.Core 1.5.374.126 via OpcUaServer)
  - GHSA-37gx-xxp4-5rgx + GHSA-w3x6-4m5h-cxqf (legacy advisories already accepted)

OpcUaServer pins OPCFoundation.NetStandard.Opc.Ua.Configuration to 1.5.374.126
via VersionOverride to match Opc.Ua.Server's transitive Opc.Ua.Core (same
constraint as the legacy Server project).

Runtime does NOT project-reference any concrete Driver.* assemblies; drivers
load reflectively at runtime (Phase 6). Runtime gets the IDriver contract
through Core.Abstractions instead.

Host's Microsoft.Extensions.Hosting.WindowsServices is conditional on the
Windows OS so the project builds on macOS dev machines.

Build verification: dotnet build -> 438 warnings (all pre-existing xUnit1051
in legacy Server.Tests/Admin.Tests), 0 errors. Closes Task 9 (build green
smoke check, no separate commit).
2026-05-26 03:44:56 -04:00
Joseph Doherty
2b811477d1 chore(build): introduce central package management for v2
Adds Directory.Packages.props (ManagePackageVersionsCentrally) and
Directory.Build.props (net10.0/nullable/implicit usings/LangVersion latest).
Strips Version attributes from every csproj PackageReference and consolidates
versions into the central file.

Side fixes (necessary to keep the build green on .NET SDK 10.0.105 on macOS):

- Microsoft.CodeAnalysis.CSharp{,.Workspaces}: 5.3.0 -> 5.0.0. The 5.3.0
  analyzer DLL references compiler 5.3.0.0 and the local SDK ships compiler
  5.0.0.0, producing CS9057 on every project that loaded the Analyzers
  output. Master itself was broken on this machine pre-change.
- Server + Server.Tests pin OPCFoundation.NetStandard.Opc.Ua.{Configuration,
  Client} to 1.5.374.126 via VersionOverride, matching Opc.Ua.Server's
  pin. Mixing 1.5.378.106 Opc.Ua.Core transitively with 1.5.374.126
  Opc.Ua.Server breaks CustomNodeManager2 override signatures
  (CS0115 on LoadPredefinedNodes/Browse/HistoryRead*) and CS7069 in
  the tests. The pin disappears when the legacy Server project is
  deleted in Task 56.
- Client.UI + Client.UI.Tests: NuGetAuditSuppress for
  GHSA-xrw6-gwf8-vvr9 (Tmds.DBus.Protocol 0.20.0 reaches both projects
  transitively from Avalonia.Desktop on Linux/macOS only).

Deviation from the plan: TreatWarningsAsErrors=true is NOT set in
Directory.Build.props because the pre-v2 Admin/Server test projects carry
~240 xUnit1051 analyzer warnings that would fail the build. New v2 projects
opt in via their own csproj; the global flag can return once the legacy
projects are deleted in Task 56.
2026-05-26 03:40:24 -04:00
Joseph Doherty
866dc03fac style(ui): align admin styling with ScadaLink master conventions
- Move CSS into wwwroot/css/ (theme.css, site.css); sidebar 218 -> 220px
- Add hamburger + Bootstrap collapse for <lg viewports
- Add Components/Shared/ with LoadingSpinner, ToastNotification, StatusBadge
- Replace .page-title with flex + <h4 class="mb-0"> across 20 pages
- Convert NewCluster + IdentificationFields forms to card + h6 subsection pattern
2026-05-26 01:12:57 -04:00
Joseph Doherty
6134050ceb fix(server): resolve Low code-review findings (Server-004,006,008,012,014,015)
- Server-004: pass the role-derived display name to UserIdentity's base
  ctor (the SDK's DisplayName has no public setter) and drop the dead
  Display property; make RoleBasedIdentity internal sealed.
- Server-006: derive a bounded CancellationToken from the SDK's
  OperationContext.OperationDeadline in OnReadValue / OnWriteValue so a
  stalled driver call can no longer pin the request thread.
- Server-008: mark handled slots via CallMethodRequest.Processed = true
  in RouteScriptedAlarmMethodCalls (the SDK skips on Processed, not on a
  Good error slot).
- Server-012: PeerHttpProbeLoop.ProbeAsync stops mutating client.Timeout
  per call; uses a per-request CancellationTokenSource linked to the
  shutdown token instead.
- Server-014: wire SealedBootstrap into Program.cs via AddSealedBootstrap
  + OpcUaServerService so the generation-sealed cache + stale-config flag
  + resilient reader actually run; /healthz now reflects cache-fallback
  state.
- Server-015: replace the stale 'PR 16 / PR 17 minimum-viable scope'
  class summaries on OtOpcUaServer and OpcUaServerOptions with the
  shipped LDAP + anonymous-role + configurable security-profile prose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:24:20 -04:00
Joseph Doherty
2b33b64a58 fix(admin): resolve Low code-review findings (Admin-010,011,012)
- Admin-010: vendor Bootstrap 5.3.3 (CSS + JS bundle + maps + provenance
  README) under wwwroot/lib/bootstrap and reference local paths from
  App.razor — Admin no longer pulls Bootstrap from jsDelivr.
- Admin-011: swap FleetStatusPoller's three plain dictionaries for
  ConcurrentDictionary so ResetCache can't race a poll tick.
- Admin-012: drop the EquipmentId column from EquipmentCsvImporter (per
  admin-ui.md — equipment id is system-derived from EquipmentUuid);
  EquipmentImportBatchService and the textarea placeholder updated to
  match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:24:07 -04:00
Joseph Doherty
8d5dbb46f2 fix(admin): authenticate SignalR hub clients with a bearer-token scheme
The Admin-003 fix gated every SignalR hub with [Authorize], but the server-side
Blazor HubConnection clients had no way to authenticate: the browser's HttpOnly
auth cookie is not reachable from the interactive circuit, so every hub negotiate
returned 401 and the Admin live-update feature was non-functional app-wide
(silently degraded on Hosts/ScriptLog, fatal on the cluster pages).

Introduce a token-based hub auth path:
- HubTokenService mints/validates short-lived tokens using ASP.NET Core Data
  Protection (the same primitive that protects the auth cookie — no signing-key
  management, no new packages). Tokens carry the user's name + roles.
- HubTokenAuthenticationHandler is a custom "HubToken" auth scheme that reads the
  token from the Authorization: Bearer header (negotiate) or the access_token
  query parameter (WebSocket upgrade).
- The "HubClients" authorization policy runs both the cookie and HubToken
  schemes; the hub endpoints use RequireAuthorization("HubClients").
- AdminHubConnectionFactory builds hub connections with an AccessTokenProvider
  that mints a fresh token for the circuit's authenticated user on every
  (re)connect. All six hub-consuming pages now resolve connections through it.

Hub negotiate now returns 200 and the WebSocket upgrades (101); live updates
work. The best-effort try/catch guards added previously are kept as defence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 12:06:29 -04:00
Joseph Doherty
f2545392e0 fix(admin): stop SignalR hub-connect failure from crashing cluster pages
The Admin-003 fix gated every SignalR hub with [Authorize]/RequireAuthorization,
but the server-side HubConnection clients on ClusterDetail, AclsTab, RedundancyTab
and RoleGrants cannot forward the browser's HttpOnly auth cookie — so the hub
negotiate returns 401. Those four pages called HubConnection.StartAsync()
unguarded, so the 401 surfaced as an unhandled exception (a 500 page for the
prerendered ClusterDetail, a broken circuit for the others).

Wrap StartAsync/SendAsync in try/catch on all four, matching the established
best-effort pattern already used in Hosts.razor and ScriptLog.razor: the live
banner / live refresh degrades but the page renders. Restoring functional hub
live-updates needs a token-based hub auth scheme (cookie forwarding is not
viable across the prerender/interactive boundary) and is left as follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:56:06 -04:00
Joseph Doherty
0f3b74ad87 fix(server): wire PermissionTrieCache into AuthorizationGate for generation pinning
Core-002 fixed TriePermissionEvaluator to evaluate each request against
the session's bound AuthGenerationId rather than whatever the cache
currently holds. AuthorizationGate.BuildSessionState was not updated at
the same time: it hardcoded AuthGenerationId = 0, so the evaluator's
GetTrie(cluster, 0) call returned null for any generation != 0, causing
every gated operation to silently fail with NotGranted regardless of
actual grants. The 42 gate/matrix/deferred-hardening tests all started
failing as a result.

Fix: add an optional PermissionTrieCache parameter to AuthorizationGate;
BuildSessionState now stamps AuthGenerationId from the cache's current
generation for the session's cluster. AuthorizationBootstrap.BuildGateAsync
passes the cache it creates. All 7 test MakeGate helpers updated to pass
the cache so tests produce a valid AuthGenerationId. 433/433 server tests
now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:25:39 -04:00
Joseph Doherty
2dd0bd4198 fix(server): resolve Medium code-review finding (Server-013)
Replace silent Enum.TryParse fallback to None with a ParseSecurityProfile
helper that emits a startup Log.Warning naming the unsupported value and
listing recognised profiles; operators now see the misconfiguration
before any client connects rather than getting an unexplained None posture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:03:35 -04:00
Joseph Doherty
a00f0338b5 fix(server): resolve Medium code-review finding (Server-011)
Advertise UserName token policy on any non-None security profile when
Ldap.Enabled; emit a startup LogWarning when Ldap.Enabled=true but
SecurityProfile=None so the misconfiguration is surfaced before clients
connect rather than silently producing no credential path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:01:43 -04:00
Joseph Doherty
6075254f38 fix(server): resolve Medium code-review finding (Server-010)
Default AutoAcceptUntrustedClientCertificates to false in both
OpcUaServerOptions and Program.cs config fallback, aligning with
docs/security.md; auto-accept is now explicitly opt-in for dev use only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:00:24 -04:00
Joseph Doherty
fccb529d5f fix(server): resolve Medium code-review finding (Server-007)
Add configDbHealthy parameter to OpcUaApplicationHost; wire a
DbHealthCache (CanConnectAsync cached 10 s) in Program.cs so /healthz
reflects real config-DB reachability instead of the previous always-true
default; /healthz now returns 503 on a DB outage unless stale-config
cache is warm.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:59:08 -04:00
Joseph Doherty
8e8199752f fix(server): resolve Medium code-review finding (Server-005)
Add _nodeManagerDisposed field; set it under Lock in Dispose before
detaching the alarm-service handler; check it in OnAlarmServiceTransition
under the same Lock so an in-flight transition cannot dispatch to a
ConditionSink whose DriverNodeManager is being concurrently disposed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:56:01 -04:00
Joseph Doherty
2003b343bf fix(server): resolve Medium code-review finding (Server-003)
Fix ReadRawAsync: correct XML doc from newest-first to oldest-first
(ascending source timestamp per OPC UA Part 11); move maxValuesPerNode
cap inside the time-window filter loop so paging limits apply to
in-window results only, not the whole buffer snapshot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:54:08 -04:00