Commit Graph

1053 Commits

Author SHA1 Message Date
Joseph Doherty
d055cb059e docs(plans): mark F15 partial — Phases A–D shipped
Some checks failed
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 08:02:02 -04:00
Joseph Doherty
74161f9460 feat(adminui): F15 Phase D — logic + ops pages
Some checks failed
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
v2-ci / build (push) Has been cancelled
- ClusterAudit (/clusters/{id}/audit) — reads ConfigAuditLog with the
  EventId/CorrelationId columns added in F3; shown as a Cluster tab
- VirtualTags (/virtual-tags)            — fleet-wide read view
- ScriptedAlarms (/scripted-alarms)      — fleet-wide read view
- Scripts (/scripts)                     — fleet-wide; expandable code preview
- RoleGrants (/role-grants)              — per Q4, surfaces the fleet-wide
                                           LDAP-group → role mapping from
                                           Authentication:Ldap:GroupToRole
                                           (read-only; reload via host restart)
- Certificates (/certificates)           — own/trusted/issuer/rejected store
                                           contents resolved against
                                           OpcUa:PkiStoreRoot config (F13a)
- Reservations (/reservations)           — ExternalIdReservation table
- AlarmsHistorian (/alarms-historian)    — live HistorianAdapterActor sink
                                           status via the F11 GetStatus query;
                                           5s polling

ScriptLog deferred (needs the F16-deferred ScriptLogHub bridge).
ClusterNav extended with the Audit tab.

Adds an AdminUI → Runtime project reference so the historian status page can
inject IRequiredActor<HistorianAdapterActorKey>. NuGet audit suppression for
the transitive Opc.Ua.Core advisory mirrored from the Runtime project.

All 104 v2 tests still green.
2026-05-26 08:01:23 -04:00
Joseph Doherty
396052a126 feat(adminui): F15 Phase C — config-tab read views (Equipment/UNS/Namespaces/Drivers/Tags/ACLs)
Some checks failed
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Per Q3 of the rebuild plan, each v1 ClusterDetail tab becomes a separate
route under /clusters/{id}/<tab>. This batch adds read-only table views
for the six core config entity types; live-edit forms with RowVersion
concurrency land in Phase C.2 once the read-view shape is reviewed.

- ClusterEquipment    /clusters/{id}/equipment   — joins via DriverInstance
                                                   so the cluster scope works
- ClusterUns          /clusters/{id}/uns         — Areas + Lines tables
- ClusterNamespaces   /clusters/{id}/namespaces  — Kind + URI + Enabled chip
- ClusterDrivers      /clusters/{id}/drivers     — collapsed list with JSON
                                                   config expandable per Q1
                                                   (typed editors deferred)
- ClusterTags         /clusters/{id}/tags        — first 200 by name + filter
- ClusterAcls         /clusters/{id}/acls        — LDAP group + scope +
                                                   NodePermissions bits

Shared ClusterNav.razor extracted; ClusterOverview + ClusterRedundancy
updated to use it. _Imports.razor adds Components.Shared so the shared
nav is in scope across pages.
2026-05-26 07:56:39 -04:00
Joseph Doherty
fd0cc4dfdb feat(adminui): F15 Phase B — cluster CRUD + Overview/Redundancy routes
Some checks failed
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
- ClustersList (/clusters) — table view, row-click opens detail
- NewCluster (/clusters/new) — EditForm with DataAnnotations; redundancy
  mode + node-count coupling enforced client-side (None→1, Warm/Hot→2);
  CreatedBy taken from AuthenticationStateProvider
- ClusterOverview (/clusters/{id}) — cluster details + last-deployment
  badge + node list. Per Q3, the legacy 10-tab monolith is split into
  separate routes; this page hosts the Overview "tab" as its primary slot
- ClusterRedundancy (/clusters/{id}/redundancy) — static ServiceLevelBase
  config view; live ServiceLevel comes via RedundancyStateActor DPS topic
  (deferred to its own follow-up once the SignalR bridge lands)

The other 8 v1 cluster tabs (Equipment, UNS, Namespaces, Drivers, Tags,
ACLs, ScriptedAlarms, Scripts, Audit) land in Phase C/D.
2026-05-26 07:52:41 -04:00
Joseph Doherty
850d6774ea feat(adminui): F15 Phase A — shell + auth + fleet + hosts pages
Some checks failed
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Implements Phase A of the F15 rebuild plan: minimum-viable Admin surface
with a working sign-in path and a fleet-state landing page. Decisions Q1–Q5
of docs/v2/AdminUI-rebuild-plan.md were taken as recommended.

- App.razor (moved into AdminUI library from the Host stub; vendored
  Bootstrap from RCL wwwroot — no public CDN, air-gap safe)
- Routes.razor (AuthorizeRouteView enforces page-level [Authorize])
- RedirectToLogin.razor (preserves returnUrl through the auth hop)
- Login.razor (static SSR, posts to /auth/login; Q5 wording about
  generic-vs-specific LDAP errors)
- Account.razor (identity + fleet roles + raw LDAP groups; Q4 — no
  per-cluster grants; fleet-wide LDAP-group → role mapping only)
- Fleet.razor (per-node deployment status: reads NodeDeploymentState
  + unions with IClusterRoleInfo.MembersWithRole("driver") so freshly-
  joined nodes appear as "waiting"; 10s auto-refresh)
- Hosts.razor (Akka cluster topology: members, status, roles, role-
  leader; 5s auto-refresh)

Host's stub App.razor deleted; Program.cs now points at
AdminUI.Components.App via an added using.

All 104 v2 tests remain green.
2026-05-26 07:49:35 -04:00
Joseph Doherty
5c754ecffd docs(v2): F15 UX kickoff — AdminUI rebuild plan
Some checks failed
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
47-page legacy inventory mapped to v2 disposition (5 already done, 22 port
as-is, 7 reshape, 5 dropped because live-edit replaces draft/publish, 4
deferred driver-typed editors). Net ~30 active pages to rebuild.

Five open design questions surfaced for review before per-page work starts:
Q1 driver-typed editors (defer vs. ship), Q2 top-level fleet-wide views
(drop vs. keep), Q3 ClusterDetail tabs vs. split routes, Q4 RoleGrants
cluster-scoped vs. LDAP-group fleet-wide, Q5 Login error UX.

Proposed 4-phase sequencing (~5 days total): shell+auth+fleet, cluster
CRUD, config tabs, logic+ops. Each phase independently mergeable.
2026-05-26 07:38:58 -04:00
Joseph Doherty
68c6f36cfe docs(plans): mark F13a partial-complete (36c4751)
Some checks failed
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:35:04 -04:00
Joseph Doherty
36c4751571 feat(opcua): F13a — cert auto-creation in OpcUaApplicationHost
Adds OPC UA SDK's CheckApplicationInstanceCertificate call to
OpcUaApplicationHost.StartAsync, removing the v1 friction of needing to
pre-create the PKI directory tree before booting.

- New OpcUaApplicationHostOptions.PkiStoreRoot (defaults to "pki")
- BuildConfigurationAsync now derives own/issuer/trusted/rejected from
  PkiStoreRoot so the cert paths are configurable + consistent
- EnsureApplicationCertificateAsync runs before StandardServer.Start, and
  fails fast with a clear message if the SDK can't produce a valid cert
- 2 new tests: fresh-tree creates a cert, second boot reuses it

Partial slice of follow-up F13. Endpoint-security, user-token validator,
and observability wiring still pending in the F13 follow-up. OpcUaServer
tests: 4 → 6.
2026-05-26 07:34:48 -04:00
Joseph Doherty
229282ad8b docs(plans): mark F21 complete (b0a2bb0)
Some checks failed
v2-ci / build (push) Failing after 43s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:25:36 -04:00
Joseph Doherty
b0a2bb037d test(integration): F21 — docker-compose + env-driven SQL/LDAP harness mode
Adds a real-infra mode for the integration test harness alongside the default
in-memory mode. Drops the previously-untested code paths (EF SqlServer
behaviors, real LDAP bind) under env-var control without breaking the
zero-infra default that CI runs.

- docker-compose.yml — minimal SQL 2022 (14331) + OpenLDAP (3894) stack
  (ports chosen to coexist with docker-dev/ on 14330/3893)
- HarnessMode record reads OTOPCUA_HARNESS_USE_SQL=1 / USE_LDAP=1 from env
- SQL mode: per-harness unique DB OtOpcUa_Harness_{guid}, EnsureCreated
  at startup, EnsureDeleted on dispose (best-effort)
- LDAP mode: drops StubLdapAuthService and configures real LdapAuthService
  against the compose'd OpenLDAP via Authentication:Ldap:* config keys
- Microsoft.EntityFrameworkCore.SqlServer added to the test project
- README documents both modes + the macOS no-Docker caveat

Default in-memory mode unchanged — all 9 existing tests still pass.
2026-05-26 07:25:16 -04:00
Joseph Doherty
ba6e5dd7f9 docs(plans): mark F11 + F22 complete
Some checks failed
v2-ci / build (push) Failing after 49s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
F22 → cd5540c (failover scenarios on TwoNodeClusterHarness)
F11 → 6861381 (HistorianAdapterActor → IAlarmHistorianSink bridge)

Branch follow-ups: 11/22 → 13/22 done. Remaining 9 are engine wiring
gated on real drivers/SDKs (F7-F10, F13-F14), Admin UI rebuild (F15),
fold-in to F7 (F20), and SQL/LDAP harness mode (F21).
2026-05-26 07:19:07 -04:00
Joseph Doherty
686138123f feat(runtime): F11 — HistorianAdapterActor wired to IAlarmHistorianSink
Some checks failed
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Reshapes the placeholder buffered-counter actor into a thin fire-and-forget
bridge over the existing IAlarmHistorianSink contract. Default sink is
NullAlarmHistorianSink; production deployments override the DI binding to
SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the v1
components in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware*
are reused verbatim — actor is just a mailbox-friendly entry point).

- HistorianAdapterActor.Props(IAlarmHistorianSink?) — null defaults to NullAlarmHistorianSink
- Receive<AlarmHistorianEvent>: fire-and-forget sink.EnqueueAsync
- Receive<GetStatus>: returns sink.GetStatus() (queue depth + drain state)
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers the default sink
- WithOtOpcUaRuntimeActors spawns the actor + registers HistorianAdapterActorKey
- Program.cs calls AddOtOpcUaRuntime when hasDriver

Tests: 2 new (forward-to-sink + GetStatus). Runtime suite 17 → 18.
2026-05-26 07:18:08 -04:00
Joseph Doherty
cd5540cb1a test(integration): F22 — failover scenario tests + harness Stop/Restart primitives
Extends TwoNodeClusterHarness with three lifecycle primitives:
- StopNodeBAsync()      — graceful CoordinatedShutdown (Cluster.Leave)
- RestartNodeBAsync()   — rebuild node B on same Akka port + same in-memory DB
- WaitForClusterSizeAsync(n) — converge assertion helper

Adds three failover scenario tests:
- Stopping node B shrinks cluster to 1 Up member
- Restarted node B rejoins on the same Akka port
- Deployment started with B down seals with a single NodeDeploymentState
  (validates ConfigPublishCoordinator.DiscoverDriverNodes snapshots
   membership at dispatch time)

Closes follow-up F22. Integration test count: 6 → 9 (+3).
2026-05-26 07:13:14 -04:00
Joseph Doherty
4e6ef648d1 docs(plans): mark F16 complete
Some checks failed
v2-ci / build (push) Failing after 3m8s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:01:54 -04:00
Joseph Doherty
f18c285cca feat(adminui): FleetStatusSignalRBridge — DPS → SignalR forwarding (F16)
Some checks failed
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
v2-ci / build (push) Has been cancelled
New per-admin-node actor that subscribes to the fleet-status DistributedPubSub
topic + forwards every FleetStatusChanged snapshot to all SignalR clients
connected to FleetStatusHub via IHubContext.

Wired via WithOtOpcUaSignalRBridges (new AkkaConfigurationBuilder extension in
AdminUI.Hubs) — Program.cs calls it inside the if(hasAdmin) block alongside
WithOtOpcUaControlPlaneSingletons.

Per-node subscription rather than cluster-singleton: every admin node forwards
its own snapshots to its own connected clients. Simpler than singleton
coordination + acceptable because the messages are small and SignalR fan-out
is per-node anyway.
2026-05-26 07:01:08 -04:00
Joseph Doherty
7a6b016d9e docs(plans): mark F17 complete
Some checks failed
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:58:27 -04:00
Joseph Doherty
8f32b89fb9 feat(adminui): FleetDiagnosticsClient real Akka ActorSelection round-trip (F17)
Some checks failed
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
- New Commons.Messages.Fleet.GetDiagnostics request record.
- DriverHostActor handles GetDiagnostics in all three states (Steady, Applying,
  Stale); replies with a NodeDiagnosticsSnapshot built from _currentRevision
  + the local NodeId. Drivers list is empty until F7 wires the per-instance
  children.
- FleetDiagnosticsClient now resolves the target via ActorSelection at
  akka.tcp://{system}@{nodeId}/user/driver-host and Asks with a 3s timeout.
  On timeout/peer-down it returns an empty snapshot so the UI degrades
  gracefully rather than throwing.

Two new integration tests in Host.IntegrationTests:
- GetDiagnostics_returns_snapshot_with_target_NodeId verifies the
  cross-node Ask/Reply works.
- GetDiagnostics_after_deploy_reports_current_revision exercises the
  end-to-end path: AdminOps starts a deployment, both DriverHostActors
  apply, then diagnostics reports the new revision on both nodes.

All 98 v2 tests pass (was 96 + 2 new).
2026-05-26 06:58:11 -04:00
Joseph Doherty
337a691629 docs(plans): mark F3, F4, F5, F12 follow-ups complete
Some checks failed
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:55:39 -04:00
Joseph Doherty
b06e3ae740 feat(runtime): PeerOpcUaProbeActor real TCP-connect probe (F12)
Replaces the Ok=true stub with a TCP connect to the peer's OPC UA port (4840
default) with a 2s timeout. A successful connect indicates the OPC UA server
process is up + accepting connections — enough for the redundancy calculator
to treat the peer as live. A full secure-channel Hello/Acknowledge handshake
is overkill for what the redundancy calc consumes and would pull in the OPC
UA Client SDK + a PKI setup. Upgrade later if a deeper liveness signal is ever
required.

Probe extracts the host from NodeId by stripping the :port suffix (commit
5cfbe8b encoded host:port into NodeId for cluster-member identity).

Tests: 2 new tests — Ok=true against a live TcpListener on a chosen port,
Ok=false against an unreachable endpoint. All 17 Runtime tests pass (was 16
covering only the message-contract surface).
2026-05-26 06:54:51 -04:00
Joseph Doherty
f57f61deac feat(audit): EventId + CorrelationId columns + filtered unique index (F3 + F4)
ConfigAuditLog gains two nullable columns (EventId, CorrelationId) + a filtered
unique index UX_ConfigAuditLog_EventId. EF migration
20260526105027_AddConfigAuditLogEventIdColumns is additive (nullable + filtered
index = legacy rows backfill cleanly).

AuditWriterActor now writes EventId + CorrelationId into the dedicated columns
instead of synthesising a JSON wrapper into DetailsJson. Cross-restart dedup
is now real: a retry of an already-flushed batch hits the unique index and
SaveChanges throws; the existing catch drops the duplicate without losing the
rest of the batch.

WrapDetails helper deleted — F4 (its JSON hardening) becomes moot.

AuditWriterActorTests.Details_wrapper_embeds_eventId_and_correlationId renamed
+ rewritten to assert against the columns. All 29 ControlPlane tests pass,
all 95 v2 tests green.
2026-05-26 06:52:53 -04:00
Joseph Doherty
8e5c8e29f7 docs(plans): mark Task 61 complete
Some checks failed
v2-ci / build (push) Failing after 1m21s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:49:06 -04:00
Joseph Doherty
253fb60459 ci: v2 build + unit + integration workflow, nightly E2E (Task 61)
.github/workflows/v2-ci.yml runs on push/PR to v2-akka-fuse + master:
  - build job: dotnet restore + build (Release)
  - unit-tests job: matrix over the 5 v2 test projects (Cluster, ControlPlane,
    Runtime, Security, OpcUaServer) with Category!=E2E
  - integration job: Host.IntegrationTests with Category!=E2E

.github/workflows/v2-e2e.yml runs nightly at 03:00 UTC + workflow_dispatch:
  - Brings up the docker-dev four-node fleet (admin pair + driver pair + SQL
    + LDAP + Traefik)
  - Waits up to 60s for /health/active to return 200
  - Runs Category=E2E only
  - Always tears down with -v

Both workflows pin .NET 10 via actions/setup-dotnet (no global.json so any
10.0 SDK works). Compatible with both GitHub Actions and Gitea Actions
(act_runner). The E2E filter currently matches zero tests because the
tests/Server/ZB.MOM.WW.OtOpcUa.E2ETests project doesn't exist yet — it lands
when F10/F11/F12 wire enough engine for an end-to-end round-trip to be
meaningful.
2026-05-26 06:48:52 -04:00
Joseph Doherty
8ac71db464 docs(plans): mark Tasks 62, 63, 64, 65 complete 2026-05-26 06:46:55 -04:00
Joseph Doherty
7e3b56c27d feat(deploy): Traefik active-leader routing + docker-dev compose (Task 63)
- scripts/install/traefik.yml + traefik-dynamic.yml: Traefik static + dynamic
  config. One :80 entry point, one router on HostRegexp(otopcua.*), one
  service load-balancing admin-a:9000 + admin-b:9000 with /health/active health
  check (interval 5s, timeout 2s, expected 200). Followers return 503 from
  /health/active so Traefik drops them within the next interval after a
  leadership change.

- scripts/install/Install-Traefik.ps1: downloads Traefik for Windows, drops the
  yml configs, registers the OtOpcUaTraefik Windows service via sc.exe with
  restart-on-failure. Companion to Install-Services.ps1.

- docker-dev/{Dockerfile,docker-compose.yml,traefik-dynamic.yml,README.md}:
  Mac-friendly four-node fleet (admin-a + admin-b + driver-a + driver-b) plus
  SQL Server 2022 + OpenLDAP + Traefik. Single OtOpcUa.Host image built once;
  Compose drives OTOPCUA_ROLES + Cluster:* per container to differentiate the
  four hosts. README walks through bring-up + failover smoke + the dev LDAP
  users.

Note: untested on macOS (no local Docker — see docs/v2/dev-environment.md).
2026-05-26 06:46:40 -04:00
Joseph Doherty
e40615dad5 feat(install): rewrite Install/Refresh/Uninstall-Services.ps1 for v2 fused Host (Task 62)
- Install-Services.ps1: installs OtOpcUaHost (single fused binary) replacing
  the v1 OtOpcUa + OtOpcUaAdmin pair. Required -Roles param writes OTOPCUA_ROLES
  to the service env so Program.cs decides what to mount (admin / driver / both).
  -HttpPort param (default 9000) writes ASPNETCORE_URLS on admin-role nodes.
  sc.exe restart-on-failure: 5s, 30s, 60s; reset counter after 24h clean run.
  Wonderware historian sidecar install logic preserved from v1.

- Uninstall-Services.ps1: removes OtOpcUaHost + cleans up legacy v1 names
  (OtOpcUa, OtOpcUaAdmin) and the long-retired OtOpcUaGalaxyHost.

- Refresh-Services.ps1: updated service names (OtOpcUa -> OtOpcUaHost), publish
  path (ZB.MOM.WW.OtOpcUa.Server -> ZB.MOM.WW.OtOpcUa.Host), process names
  (OtOpcUa.Server -> OtOpcUa.Host). Switched nssm stop/start calls to
  Stop-Service/Start-Service so the script works whether the underlying
  service was installed via nssm or sc.exe.
2026-05-26 06:44:35 -04:00
Joseph Doherty
1689901c0e docs(v2): Architecture-v2 + Cluster + ControlPlane + Runtime overviews (Task 65)
Four new docs at docs/v2/ giving a single-page tour of each v2 piece:
- Architecture-v2.md: top-level mental model (fused Host + roles + cluster + live-edit)
- Cluster.md: AkkaClusterOptions + IClusterRoleInfo + WithOtOpcUaClusterBootstrap
- ControlPlane.md: 5 admin singletons + DPS topics + deploy flow + failover recovery
- Runtime.md: per-node actor tree + state machines + engine-wiring follow-up map

Each links back to the design doc for depth. Architecture-v2 cross-references
the other three + ServiceHosting + Redundancy + security.
2026-05-26 06:41:48 -04:00
Joseph Doherty
3c3fef911c docs: v2 updates to Redundancy, ServiceHosting, security, README (Task 64)
- Redundancy.md: full rewrite — Akka-leader-driven ServiceLevel replaces
  operator-managed RedundancyRole. Documents the 5-tier ServiceLevelCalculator,
  RedundancyStateActor cluster singleton, and the DPS data flow.

- ServiceHosting.md: full rewrite — single fused OtOpcUa.Host binary with
  OTOPCUA_ROLES env gating. Documents the conditional DI graph and the new
  health endpoints (/health/ready, /health/active, /healthz).

- security.md: v2 banner at top covering path/project renames + new JWT bearer
  + DataProtection persisted to ConfigDb. Body unchanged because the 4-concern
  security model is unchanged in v2; full per-section rewrite waits for F15
  (Admin pages migration) since security.md references many pages that move.

- README.md: platform overview updated to v2 (fused Host + role gating).
2026-05-26 06:38:55 -04:00
Joseph Doherty
a8becc9c46 docs(plans): mark Task 59 complete; track F22 failover scenarios 2026-05-26 06:35:03 -04:00
Joseph Doherty
5cfbe8b5dd test(host): deploy happy-path + idempotency integration tests (Task 59)
DeployHappyPathTests exercises the full deploy pipeline on the 2-node harness:
AdminOperationsActor → ConfigPublishCoordinator → DistributedPubSub →
DriverHostActor on both nodes → ApplyAck → coordinator seals. Verifies both
NodeDeploymentState rows reach Applied and Deployment.Status reaches Sealed.

Exposed + fixed two production bugs along the way:

1. Coordinator was publishing DispatchDeployment on the "deployments" topic but
   never subscribed to anything — DriverHostActor ACKs published on the same
   topic could not reach it. Added dedicated "deployment-acks" topic with
   coordinator subscription in PreStart, and DriverHostActor publishes ACKs
   there.

2. NodeId derivation used member.Address.Host only — two cluster members on a
   shared loopback host (test harness, dev VMs) collided to one identity. The
   coordinator's expected-ack set became {1} and the system sealed after only
   half the nodes acked. Switched to host:port everywhere (ClusterRoleInfo +
   coordinator) so loopback nodes stay distinct and production identities are
   harmlessly more specific.

Tests: 95 v2 tests pass (was 93 + 2 deploy tests), 0 skipped.

Failover scenarios (design §8 cases 3-7: node-kill-mid-apply, split-brain,
restart-during-deploy) deferred — they need controlled node-down primitives
on the harness. Tracked as F22 (failover scenario test cases).
2026-05-26 06:34:36 -04:00
Joseph Doherty
62e3cd6599 docs(plans): mark Task 58 complete; track F21 docker-compose follow-up 2026-05-26 06:27:33 -04:00
Joseph Doherty
d6fac2d81d test(host): 2-node integration test harness + consolidate to one ActorSystem (Task 58)
Builds TwoNodeClusterHarness: two in-process Host-equivalent nodes sharing
an in-memory ConfigDb. Forms a 2-member Akka cluster. ClusterFormationTests
proves both nodes see each other as admin+driver role members.

Fixes a real production bug uncovered while wiring the harness — Program.cs
ran two separate ActorSystems (one from AddOtOpcUaCluster.AkkaHostedService
with cluster HOCON, one from Akka.Hosting.AddAkka with bare HOCON). Cluster
singletons landed on the bare ActorSystem and could not actually form a
cluster ("Configuration does not contain `akka.cluster` node").

Consolidation:
- AddOtOpcUaCluster now only binds AkkaClusterOptions + registers IClusterRoleInfo
- New WithOtOpcUaClusterBootstrap pushes embedded HOCON + Remote/Cluster options
  into Akka.Hosting's AkkaConfigurationBuilder
- AkkaHostedService.cs deleted — Akka.Hosting now owns the lifecycle
- Program.cs + harness call WithOtOpcUaClusterBootstrap inside AddAkka

Why not WebApplicationFactory<Program>? Program.cs reads OTOPCUA_ROLES from
process env (shared across in-process WAFs); the harness replays Program.cs's
DI graph from a clean WebApplicationBuilder per node with per-node config
overrides. Same production extensions, isolated config + Kestrel + Akka ports.

Tests: 93 v2 tests pass (was 91 + 2 new cluster formation), 0 skipped.
2026-05-26 06:27:04 -04:00
Joseph Doherty
bb353c4d43 docs(plans): mark F1, F2, F6, F18, F19 follow-ups complete
Post-compaction batch of 5 follow-ups landed:
- F19 (09d6676): WithOtOpcUaRuntimeActors extension for driver-role spawn
- F1  (463512d): AuthEndpoints integration tests via TestServer
- F6  (dfc143c): RedundancyStateActor broadcast override + un-skip 2 tests
- F18 (b266f63): User.Identity.Name into Deployments createdBy
- F2  (45a8c79): JwtBearer validation via IPostConfigureOptions
2026-05-26 06:18:31 -04:00
Joseph Doherty
45a8c79ffe refactor(security): JwtBearer validation via IPostConfigureOptions (F2)
Eliminates the services.BuildServiceProvider() captive-provider antipattern
(ASP0000) inside AddJwtBearer. The new ConfigureJwtBearerFromTokenService
resolves JwtTokenService from the real DI container at runtime and stays
in lock-step with JwtTokenService.BuildValidationParameters.

All 27 Security.Tests stay green, including the F1 integration tests that
exercise /auth/token through the real bearer pipeline.
2026-05-26 06:18:00 -04:00
Joseph Doherty
b266f63cd7 feat(adminui): thread User.Identity.Name into Deployments createdBy (F18)
Injects AuthenticationStateProvider and reads the current user's identity
name on Deploy click, replacing the "(current user)" placeholder.
Anonymous case falls back to "(anonymous)" — should never hit in practice
since the page requires FleetAdmin/ConfigEditor.
2026-05-26 06:17:53 -04:00
Joseph Doherty
dfc143cdeb feat(controlplane): RedundancyStateActor broadcast override + un-skip tests (F6)
Mirrors the publisher-injection pattern from FleetStatusBroadcaster and
PeerOpcUaProbeActor: Props accepts an optional Action<object> override so
tests can use a TestProbe sink instead of bootstrapping DistributedPubSub
(unreliable single-node in TestKit).

Un-skips the two RedundancyStateActor tests deferred under F6.
2026-05-26 06:16:32 -04:00
Joseph Doherty
463512d1d8 test(security): AuthEndpoints integration tests via TestServer (F1)
7 tests exercise AddOtOpcUaAuth + MapOtOpcUaAuth end-to-end against an
in-memory ConfigDb + stub ILdapAuthService. Covers /auth/login (204/401/503),
/auth/ping (401/200), /auth/token (200+JWT shape), /auth/logout (204+clear-cookie).

Scope is the auth contract — not the fused Host bootstrap (cluster + role
gating belongs in the Task 58 multi-node harness). HostBuilder + TestServer
is used directly instead of WebApplicationFactory<Program> because the
test project has no Program entry point and Host needs Akka cluster up.
2026-05-26 06:15:07 -04:00
Joseph Doherty
09d6676e1f feat(runtime): WithOtOpcUaRuntimeActors extension for driver-role node startup (F19)
Mirrors WithOtOpcUaControlPlaneSingletons for the driver role. Spawns
DriverHostActor + DbHealthProbeActor on the host's ActorSystem and
registers both under marker keys. Host's Program.cs now calls it when
the node carries the driver role, so driver-only and admin+driver
deployments both auto-bootstrap the per-node actors.

Integration test covers the registration round-trip via Microsoft.Extensions.Hosting
+ Akka.Hosting AddAkka.
2026-05-26 06:09:37 -04:00
Joseph Doherty
698709a578 docs(plans): mark Tasks 56+57 complete 2026-05-26 05:39:07 -04:00
Joseph Doherty
76310b8829 chore(cleanup): delete OtOpcUa.Server, OtOpcUa.Admin, and obsolete v1 tests
Task 56: removes the legacy in-process Server + Admin Web project + their test
projects (Server.Tests, Admin.Tests, Admin.E2ETests). The fused OtOpcUa.Host
binary built across Phases 1-9 is now the sole production entry point.

What happened to the 47 legacy Admin Blazor pages: per follow-up F15, the
v1 architecture's draft/publish UX is replaced by v2's live-edit + snapshot-
deploy model, so a 1:1 migration is not meaningful. The mechanical move via
git mv preserves the history; service classes + page bodies that referenced
removed v1 types (ConfigGeneration, RedundancyRole, GenerationId) were
deleted. AdminUI now ships a minimal Home page + the v2 Deployments page.

Per-page rebuild against the v2 surface is tracked as F15. The v2 Deployments
page (Task 52) is the only first-party UI shipping in this PR.

Task 57: solution build green; 84+ tests green across active v2 + legacy
driver test projects.
2026-05-26 05:38:31 -04:00
Joseph Doherty
2b75ce3876 docs(plans): mark Phase 9 tasks 53-55 complete; track F19/F20 follow-ups 2026-05-26 05:23:48 -04:00
Joseph Doherty
8b4de8080b feat(runtime): DEV-STUB mode for Windows-only drivers on non-Windows or dev role 2026-05-26 05:23:02 -04:00
Joseph Doherty
fa1d685ccd feat(host): health endpoints + per-environment appsettings layout 2026-05-26 05:23:01 -04:00
Joseph Doherty
e2b357f89a feat(host): role-gated Program.cs composes all v2 components 2026-05-26 05:22:59 -04:00
Joseph Doherty
eb4280b7eb docs(plans): add F15-F18 follow-ups for Phase 8 deferred scope 2026-05-26 05:19:02 -04:00
Joseph Doherty
8a1f97b27f docs(plans): mark Phase 8 tasks 48-52 complete; track F15-F18 follow-ups 2026-05-26 05:18:37 -04:00
Joseph Doherty
f167808a2c feat(adminui): Deployments page with drift indicator and Deploy button 2026-05-26 05:18:00 -04:00
Joseph Doherty
b83f099394 feat(adminui): IFleetDiagnosticsClient skeleton (Akka round-trip tracked as F17) 2026-05-26 05:17:59 -04:00
Joseph Doherty
f022499e7f feat(adminui): IAdminOperationsClient backed by ClusterSingletonProxy 2026-05-26 05:17:58 -04:00
Joseph Doherty
26d8f2f620 feat(adminui): FleetStatusHub + AlertHub + MapOtOpcUaHubs (broadcaster bridge tracked as F16) 2026-05-26 05:17:56 -04:00
Joseph Doherty
1a067e609c refactor(adminui): MapAdminUI extension + AddAdminUI DI (47-component migration tracked as F15) 2026-05-26 05:17:55 -04:00