315 Commits

Author SHA1 Message Date
Joseph Doherty bd6c0b4d3d docs: complete XML doc comments via fixdocs (2757 to 131 findings)
Add missing <returns>/<param>/<summary>/<typeparam> tags and clean up
misused inheritdoc across 481 files so the documented API surface is
complete. Documentation-only (zero code lines changed). The 131 remaining
findings are inheritdoc-style warnings deliberately left to preserve
hand-written implementation rationale (plan-decision notes, race-condition
explanations).
2026-06-03 12:34:34 -04:00
Joseph Doherty c6d9b20d9f chore(adminui): prune kit-duplicate + dead shell CSS from site.css
v2-ci / build (push) Failing after 6m40s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
The ZB.MOM.WW.Theme cutover left site.css carrying a near-verbatim copy of the
kit's layout.css (.app-shell/.side-rail/.rail-link/.rail-foot/.login-*) plus two
dead rules (#sidebar-collapse — the kit emits #theme-rail; .rail-eyebrow-chevron
— rendered by the deleted NavSection.razor). Those duplicates loaded after the
kit and could silently override it. Removed them; kept only the app-only rules
the kit does not provide: .rail-eyebrow (footer Session label) and
.chip-alert/.chip-caution (domain status variants). 167 lines removed; builds clean.
2026-06-03 04:37:23 -04:00
Joseph Doherty 11de14d12e refactor(adminui): explicit ClaimTypes.Role footer filter; fix stale NavSidebar comment
v2-ci / build (push) Failing after 45s
2026-06-03 03:18:08 -04:00
Joseph Doherty aadbf49678 feat(adminui): LoginCard sign-in; remove dead StatusBadge 2026-06-03 03:13:23 -04:00
Joseph Doherty 70d764b063 feat(adminui): MainLayout delegates to ZB.MOM.WW.Theme ThemeShell + kit nav 2026-06-03 03:10:49 -04:00
Joseph Doherty 11bcff6af5 refactor(adminui): drop vendored theme.css/fonts/nav-state.js; keep app-only CSS in site.css 2026-06-03 03:07:21 -04:00
Joseph Doherty de41963587 feat(adminui): use ZB.MOM.WW.Theme ThemeHead + ThemeScripts 2026-06-03 03:03:45 -04:00
Joseph Doherty a78b212c95 build(adminui): reference ZB.MOM.WW.Theme 0.2.0 2026-06-03 03:02:23 -04:00
Joseph Doherty 075c0e69da feat(audit): OtOpcUa IAuditActorAccessor seam + HTTP impl (audit Actor from Auth principal) (Phase 3)
v2-ci / build (push) Failing after 40s
Introduces the IAuditActorAccessor seam and HttpAuditActorAccessor impl so the
ZB.MOM.WW.Audit.AuditEvent Actor field can be sourced from the authenticated Blazor
cookie principal (ZbClaimTypes.Username) when structured emitters are added. Adds the
AuditActor.Resolve static helper (accessor value → SystemFallback/"system") as the
canonical pattern for future emit sites. Wires DI in AddOtOpcUaAuth (TryAddScoped) with
AddHttpContextAccessor(). The structured AuditEvent path remains DORMANT — no live emit
sites exist; seam is forward-looking. SP-based audit path left untouched. 9 new unit
tests all green; Security (54) and ControlPlane (45) test suites fully pass.
2026-06-02 15:25:49 -04:00
Joseph Doherty b7f5e887ee feat(audit): OtOpcUa ConfigAuditLog.Outcome column + migration + ClusterAudit visibility fix (Task 2.2)
Persist the canonical AuditOutcome and make structured audit rows visible.

- ConfigAuditLog gains a nullable Outcome column, stored as the AuditOutcome
  enum member name (nvarchar(16), mirroring how AdminRole is persisted). The
  AuditWriterActor flush now writes Outcome = evt.Outcome.ToString(). Nullable so
  legacy rows and the bespoke stored-procedure path (no derived outcome) write
  NULL.
- Migration 20260602135350_AddConfigAuditLogOutcome: additive nullable column,
  no backfill. Up adds the column, Down drops it. Chains after
  20260602112419_CanonicalizeAdminRoles; `dotnet ef migrations
  has-pending-model-changes` is clean.
- ClusterAudit visibility fix: the page filtered solely on ClusterId, but the
  structured AuditWriterActor path stamps NodeId (ClusterId null), so those rows
  were invisible. Extracted ClusterAuditQuery.ForClusterAsync (shared by the page
  and tests) which ORs in rows whose NodeId belongs to a node in the cluster —
  membership resolved from ClusterNode (NodeId -> ClusterId). SP-path
  ClusterId-stamped rows still match.

Tests: ControlPlane 45/45 (adds Outcome persistence + Denied-outcome asserts);
new Configuration ClusterAuditQueryTests 3/3 (both-paths visible, other-cluster
excluded, page-size cap); AdminUI 121/121. Configuration Unit suite is green on a
clean run (a pre-existing timing flake in ResilientConfigReaderTests, untouched
here, occasionally fails under parallel load and passes in isolation).
2026-06-02 09:59:22 -04:00
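The visibility fix described above (match rows stamped with the ClusterId directly, or rows whose NodeId belongs to a node in that cluster) can be pictured with a small sketch. It is written in Python for brevity and is not the project's C# ClusterAuditQuery.ForClusterAsync. The `AuditRow` shape, the `rows_for_cluster` name, the plain dict standing in for the ClusterNode lookup, and taking the first `page_size` matches in input order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AuditRow:
    """Hypothetical stand-in for a ConfigAuditLog row."""
    row_id: int
    cluster_id: Optional[str]  # set by the stored-procedure path
    node_id: Optional[str]     # set by the structured writer path


def rows_for_cluster(
    rows: List[AuditRow],
    node_to_cluster: Dict[str, str],
    cluster_id: str,
    page_size: int,
) -> List[AuditRow]:
    """Return up to page_size rows that belong to cluster_id via either path."""
    matches = []
    for row in rows:
        # Path 1: the row carries the cluster id itself.
        direct = row.cluster_id == cluster_id
        # Path 2: the row carries a node id that resolves to the cluster.
        via_node = (
            row.node_id is not None
            and node_to_cluster.get(row.node_id) == cluster_id
        )
        if direct or via_node:
            matches.append(row)
    # Cap the result; ordering is left as given (an assumption of this sketch).
    return matches[:page_size]
```

Before the fix, only the first condition was checked, so rows that carried a NodeId but a null ClusterId never matched.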
Joseph Doherty 933dd1a874 feat(audit): OtOpcUa adopt canonical ZB.MOM.WW.Audit.AuditEvent + AuditWriterActor:IAuditWriter + Outcome derivation (Task 2.1)
Deep-adopt the shared audit record. Deletes the bespoke 8-field positional
Commons AuditEvent and repoints the writer path at ZB.MOM.WW.Audit.AuditEvent
(0.1.0, feed-mapped via dohertj2-gitea). Adds the package reference to both
Commons and ControlPlane.

- AuditWriterActor now implements IAuditWriter: WriteAsync(evt, ct) is a
  best-effort, never-throwing entry point that Self.Tell()s the event onto the
  same batching/dedup/flush pipeline and returns Task.CompletedTask. Existing
  Receive<AuditEvent> + 500/5s batching + two-layer dedup unchanged.
- Flush mapping updated for the canonical field types: OccurredAtUtc is now
  DateTimeOffset (.UtcDateTime into the datetime2 column), SourceNode is string?
  (was NodeId.Value), CorrelationId is Guid? (stored null when null). Outcome is
  NOT yet persisted (column lands in Task 2.2).
- New AuditOutcomeMapper.FromAction maps the OtOpcUa action vocabulary to the
  required canonical Outcome: OpcUaAccessDenied / CrossClusterNamespaceAttempt ->
  Denied; config verbs (DraftCreated/Edited, Published, RolledBack, NodeApplied,
  ClusterCreated, NodeAdded, CredentialAdded/Disabled, ExternalIdReleased) ->
  Success. OtOpcUa emits no Failure events.

The Akka message shape changed, but the structured audit path is dormant (zero
production emit/Tell sites; all live audit flows through the bespoke SP path),
so there is no rolling-deploy wire-compat concern. Tested-not-exercised by
design.

ControlPlane.Tests: 44/44 green (AuditWriterActor suite rewritten to construct
the canonical record + assert the Outcome derivation table + the WriteAsync
best-effort/mailbox-routing contract + null SourceNode/CorrelationId handling).
2026-06-02 09:53:12 -04:00
Joseph Doherty c1619d95f5 feat(auth)!: OtOpcUa canonical control-plane roles + config-DB migration (Task 1.7)
Standardize the control-plane admin role VALUES on the canonical six
(ZB.MOM.WW.Auth CanonicalRole). OtOpcUa uses four:
  ConfigViewer   -> Viewer
  ConfigEditor   -> Designer
  FleetAdmin     -> Administrator
  DriverOperator -> Operator   (appsettings-only string role)

This is a rename, not a permission change: enforcement semantics are
preserved (whoever could deploy/administer/operate before still can).

- AdminRole enum members renamed (persisted as string names via
  HasConversion<string>); RoleGrants.razor dropdown default updated.
- EF DATA migration CanonicalizeAdminRoles rewrites existing
  LdapGroupRoleMapping.Role rows old->new (Up) and back (Down); schema /
  model snapshot byte-identical (no pending model changes).
- Enforcement role STRINGS canonicalized:
  * Security policies keep their NAMES ("DriverOperator"/"FleetAdmin")
    but require canonical roles: RequireRole("Operator","Administrator")
    and RequireRole("Administrator").
  * Deployments.razor [Authorize(Roles="Administrator,Designer")].
  * DevStub now grants "Administrator"; LdapOptions/doc-comment examples
    canonicalized.
- Data-plane authorization (NodePermissions/NodeAcl/IPermissionEvaluator/
  TriePermissionEvaluator/UserAuthorizationState) UNTOUCHED.
- New CanonicalAdminRolesTests pins canonical claim values end-to-end and
  the real registered policies; existing role-string tests updated.
2026-06-02 07:30:00 -04:00
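The rename table above lends itself to a reversible data rewrite: Up applies old to new, Down applies the inverse. The following Python sketch shows that idea only; it is not the EF migration itself. The helper names `migrate_up` and `migrate_down` and the choice to pass unrecognized values through unchanged are assumptions, and the commit does not state which of the four names actually occur in persisted rows.

```python
# Rename table taken from the commit message (old name -> canonical name).
OLD_TO_NEW = {
    "ConfigViewer": "Viewer",
    "ConfigEditor": "Designer",
    "FleetAdmin": "Administrator",
    "DriverOperator": "Operator",
}

# The inverse table drives the Down direction. Building it from OLD_TO_NEW
# keeps the two directions from drifting apart.
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}


def migrate_up(roles):
    """Rewrite legacy role names to canonical ones; leave other values alone."""
    return [OLD_TO_NEW.get(role, role) for role in roles]


def migrate_down(roles):
    """Rewrite canonical role names back to the legacy ones."""
    return [NEW_TO_OLD.get(role, role) for role in roles]
```

Because the mapping is one-to-one, applying Up and then Down returns the original values, which is the property a data-only migration with a Down path needs.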
Joseph Doherty 8ba289f975 chore(auth): OtOpcUa unify dev LDAP base DN to dc=zb,dc=local (Task 1.6)
Replace all dev-directory dc=lmxopcua,dc=local references with dc=zb,dc=local
across LdapOptions default, integration harness overrides, docker-compose LDAP_ROOT,
AclEdit placeholder DN, and dev/smoke-test docs. CN/OU prefixes preserved.
2026-06-02 06:45:23 -04:00
Joseph Doherty d0777eee29 fix(auth): OtOpcUa Task 1.5 review — pin JWT role-claim test + document issued-only JWT role key
Fix 1 (test): Token_payload_uses_canonical_zb_claim_keys now asserts that the JWT
payload carries at least one role under JwtTokenService.RoleClaimType ("Role"),
pinning the role-key contract so a future rename is caught immediately. Adds a
comment explaining why alice has roles (appsettings "ReadOnly"→"ConfigViewer"
baseline). Adds missing `using ZB.MOM.WW.OtOpcUa.Security.Jwt` to the test file.

Fix 2 (no-validation path — no AddJwtBearer in production pipeline): grep of src/
confirms no AddJwtBearer / JwtBearer scheme in ServiceCollectionExtensions or Host;
the ServiceCollectionExtensions doc comment explicitly states "no JwtBearer parallel
scheme". RoleClaimType intentionally stays the short "Role" key. Three changes:
  - RoleClaimType doc comment documents issued-only nature, the caveat that a
    JwtBearer scheme MUST use BuildValidationParameters(), and that BuildValidationParameters
    is already wired to set RoleClaimType+NameClaimType correctly.
  - Issue() inline comment at the role-mint site references RoleClaimType docs.
  - BuildValidationParameters() now sets RoleClaimType=RoleClaimType and
    NameClaimType=UsernameClaimType so that if it is ever passed to AddJwtBearer,
    role/name resolution is correct without any extra wiring. TryValidate() is
    refactored to delegate to BuildValidationParameters() so the two can never drift.

All 35 security tests green.
2026-06-02 06:30:10 -04:00
Joseph Doherty 83856b7c27 feat(auth): OtOpcUa adopt ZbClaimTypes + ZbCookieDefaults, keep cookie name (Task 1.5)
Add ZB.MOM.WW.Auth.AspNetCore package ref to Security project (version 0.1.1
from central PM). Alias JwtTokenService.UsernameClaimType and DisplayNameClaimType
to ZbClaimTypes.Username ("zb:username") and ZbClaimTypes.DisplayName ("zb:displayname")
so every mint/read site inherits the canonical spelling. AuthEndpoints login path now
emits ZbClaimTypes.Name (= ClaimTypes.Name, populates Identity.Name) instead of
ClaimTypes.NameIdentifier (no other read site used it), and references ZbClaimTypes.Role
(= ClaimTypes.Role) for role claims so [Authorize(Roles=...)] continues to resolve.
Cookie hardening now flows through ZbCookieDefaults.Apply (sets HttpOnly, SameSite=Strict,
SlidingExpiration, SecurePolicy, ExpireTimeSpan) followed by opts.Cookie.Name = v.Name to
preserve the OtOpcUa-specific "ZB.MOM.WW.OtOpcUa.Auth" cookie name. Two new tests added
to AuthEndpointsIntegrationTests assert canonical ZbClaimTypes on the cookie principal and
canonical zb: keys in the JWT payload; all 35 security tests green.
2026-06-02 06:11:00 -04:00
Joseph Doherty c4f315ec90 fix(auth): OtOpcUa 1.2 review fixes — startup insecure-transport guard + Ldaps in prod overlays, test fidelity, 0.1.1 pin 2026-06-02 01:37:29 -04:00
Joseph Doherty 257caa7bd1 feat(auth): cut OtOpcUa over to ZB.MOM.WW.Auth.Ldap; preserve DevStubMode; route roles via IGroupRoleMapper (Task 1.2/1.4) 2026-06-02 00:55:10 -04:00
Joseph Doherty 6534875476 feat(auth): add IGroupRoleMapper<string> seam (Task 1.1) 2026-06-02 00:29:45 -04:00
Joseph Doherty d2d7730830 build: add ZB.MOM.WW.Auth/Audit feed mapping + version pins
Maps ZB.MOM.WW.Auth, ZB.MOM.WW.Auth.*, ZB.MOM.WW.Audit to the gitea feed
and pins Auth.Abstractions/Ldap/AspNetCore + Audit at 0.1.0. No project
references yet (added during Phase 1/2 adoption). OtOpcUa omits Auth.ApiKeys
(OPC UA transport security).
2026-06-02 00:16:39 -04:00
Joseph Doherty 2844180865 fix: honor LdapOptions.Enabled at runtime; dedupe ILdapAuthService registration; +SearchBase test, doc fix
v2-ci / build (push) Failing after 41s
2026-06-01 23:03:12 -04:00
Joseph Doherty d3ab2bfbaf fix: bind OtOpcUa LdapOptions from real Security:Ldap section; gate validator on DevStubMode 2026-06-01 22:46:09 -04:00
Joseph Doherty 88e773af36 feat: validate OpcUa host options at startup (route through IOptions + ValidateOnStart) 2026-06-01 18:45:55 -04:00
Joseph Doherty f35ebd7aaf feat: add fail-fast LDAP options validation in OtOpcUa via ZB.MOM.WW.Configuration 2026-06-01 18:32:44 -04:00
Joseph Doherty 0cbb82e466 build: add ZB.MOM.WW.Configuration feed mapping + version pin 2026-06-01 18:10:28 -04:00
Joseph Doherty 7b6884031d Merge feat/telemetry-followons: telemetry follow-ons for OtOpcUa
v2-ci / build (push) Failing after 34s
Serilog.AspNetCore/Extensions.Hosting/Settings.Configuration aligned to 10.0.0;
config-driven OTLP exporter opt-in (default Prometheus; also makes recorded
spans exportable when OTLP is configured).
2026-06-01 17:17:23 -04:00
Joseph Doherty 7ff7a60ae0 feat(otopcua): config-driven OTLP exporter opt-in (default Prometheus) 2026-06-01 16:40:24 -04:00
Joseph Doherty 8faa2bf23d build(otopcua): align Serilog.AspNetCore/Extensions.Hosting/Settings.Configuration to 10.0.0 2026-06-01 16:35:34 -04:00
Joseph Doherty 2099713ed8 Merge feat/adopt-zb-telemetry: adopt ZB.MOM.WW.Telemetry across OtOpcUa
v2-ci / build (push) Failing after 51s
AddZbTelemetry (shared OTel Resource identity + standard instrumentation;
kept meter ZB.MOM.WW.OtOpcUa + /metrics) and AddZbSerilog (shared enrichers +
trace correlation; sinks moved to appsettings). Behaviour-preserving.
2026-06-01 16:05:34 -04:00
Joseph Doherty c05ffc7b39 build(otopcua): add <clear/> to NuGet.config packageSources for supply-chain hygiene parity 2026-06-01 16:03:15 -04:00
Joseph Doherty 60017177cb feat(otopcua): adopt AddZbSerilog (shared enrichers + trace correlation); sinks to config 2026-06-01 15:41:21 -04:00
Joseph Doherty 26bae36f8b feat(otopcua): wire OTel via AddZbTelemetry (shared Resource + std instrumentation) 2026-06-01 15:33:28 -04:00
Joseph Doherty 368390ea9d build(otopcua): reference ZB.MOM.WW.Telemetry packages from Gitea feed 2026-06-01 15:29:46 -04:00
Joseph Doherty 8f950722c6 Merge feat/adopt-zb-health: adopt ZB.MOM.WW.Health shared probes (OtOpcUaCompat policy, admin-leader, ProbeQuery)
v2-ci / build (push) Failing after 5m5s
2026-06-01 14:07:02 -04:00
Joseph Doherty 1d729fb0f8 feat: adopt shared ZB.MOM.WW.Health probes (preserve tiers + OtOpcUaCompat policy) 2026-06-01 13:36:28 -04:00
Joseph Doherty 0b99aceacb build: reference ZB.MOM.WW.Health packages from the Gitea feed 2026-06-01 13:30:13 -04:00
Joseph Doherty d57b42bcd6 chore: gitignore local credentials file and runtime PKI store
v2-ci / build (push) Failing after 45s
sql_login.txt holds DB creds and the Host pki/ dir is the runtime OPC UA
certificate store (private keys + issued/trusted certs); neither belongs
in source control, and ignoring them prevents an accidental `git add .`
2026-05-31 10:27:59 -04:00
Joseph Doherty 5e87f7e16f docs(alarms): record 2026-05-31 live re-confirmation of native alarm feed
v2-ci / build (push) Failing after 41s
Independently re-ran the D.1 alarm-source smoke against the live gateway
(10.100.0.48:5120) to back the native MxAccess alarm-event claim with a
fresh empirical run, not just the original 2026-05-29 capture.
2026-05-31 10:12:47 -04:00
Joseph Doherty 695fa6408b docs(alarms): record native alarms verified working; add D.1 smoke
v2-ci / build (push) Failing after 47s
The 2026-04-30 alarm plan banners claimed worker-side native alarm
subscription was blocked on a COM-bitness finding. That's stale: the
mxaccessgw .NET client now has true MxAccess alarm-event support, and a
live StreamAlarms check (+ new Skip-gated GatewayGalaxyAlarmFeedLiveTests
through the lmxopcua consumer) confirms native alarms — operator comment,
category, severity, timestamps — flow end-to-end. Reconcile both plan docs
to reality and add docs/plans/alarms-d1-smoke-artifact.md as the D.1
alarm-source deliverable. Historian-write live smoke + full server->A&C
round-trip remain (Windows parity rig only).
2026-05-31 09:59:01 -04:00
Joseph Doherty 61193629b6 fix(adminui): wire Test Connect probes + live panels on admin-only nodes
v2-ci / build (push) Failing after 36s
Both bugs surfaced only on split-role deployments (the MAIN cluster's
admin-only nodes), where the AdminUI runs without the driver role.

- Test Connect returned "No probe registered" for every driver: the
  IDriverProbe set was registered only under the driver role, but the
  admin-operations singleton that consumes it is pinned to admin. Extract
  AddOtOpcUaDriverProbes() (idempotent via TryAddEnumerable) and call it
  in the hasAdmin path too.

- Live driver-status/alerts/script-log panels showed "SignalR error:
  Connection refused": these Blazor Server components opened a HubConnection
  to their own hub via the browser's public URL, which server-side code
  can't reach behind Traefik (host :9200 -> container :9000). Read the
  in-process source directly instead -- DriverStatus via
  IDriverStatusSnapshotStore.SnapshotChanged, Alerts/ScriptLog via a new
  IInProcessBroadcaster<T>. Fleet status was unaffected (reads DB/ActorSystem).

Adds unit tests for probe registration, the snapshot-store event, and the
broadcaster.
2026-05-29 16:38:32 -04:00
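The second fix replaces a loopback hub connection with a direct in-process subscription. A minimal Python sketch of that pattern follows. It is not the project's IInProcessBroadcaster<T>, whose actual members are not shown in this log; the `subscribe`/`publish` names, the returned unsubscribe callable, and the lack of thread safety are assumptions of this illustration.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class InProcessBroadcaster(Generic[T]):
    """Fan out published items to handlers registered in the same process."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[T], None]] = []

    def subscribe(self, handler: Callable[[T], None]) -> Callable[[], None]:
        """Register a handler and return a callable that removes it again."""
        self._handlers.append(handler)

        def unsubscribe() -> None:
            if handler in self._handlers:
                self._handlers.remove(handler)

        return unsubscribe

    def publish(self, item: T) -> None:
        """Deliver item to every current handler."""
        # Iterate over a copy so a handler may unsubscribe during delivery.
        for handler in list(self._handlers):
            handler(item)
```

A UI component would subscribe when it is initialized and call the returned function when it is disposed, so no network hop (and no reverse-proxy port mapping) is involved.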
Joseph Doherty e3a27422a1 fix(adminui): Galaxy editor 500 — read DriverConfig case-insensitively + null-safe FromRecord
v2-ci / build (push) Failing after 39s
GalaxyDriverPage deserialized DriverConfig with case-sensitive camelCase opts, but the
persisted/seeded config is PascalCase (the runtime reads it case-insensitively). So all four
nested option records read as null -> FromRecord NRE (HTTP 500) on edit, and the form would
have shown defaults instead of the real config (risking a clobber on save). Fix: add
PropertyNameCaseInsensitive=true (matches the runtime) so real values load, plus null-coalesce
the nested records in FromRecord as defense-in-depth. Regression test asserts the seeded
PascalCase config loads its real values.
2026-05-29 12:45:44 -04:00
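The failure mode and the two-part fix (case-insensitive property matching plus null-coalescing the nested records) can be shown with a short Python sketch. It does not use System.Text.Json and is not the GalaxyDriverPage code. The `Connection`, `Host`, and `Port` property names, the helper names, and the fallback values are hypothetical.

```python
import json


def get_ci(record, name):
    """Look up a key ignoring case; return None when it is absent."""
    wanted = name.lower()
    for key, value in record.items():
        if key.lower() == wanted:
            return value
    return None


def load_editor_form(raw_json):
    """Build form values from stored config, tolerating casing and gaps."""
    config = json.loads(raw_json)
    # Null-coalesce the nested record so a missing section cannot crash the page.
    connection = get_ci(config, "connection") or {}
    host = get_ci(connection, "host")
    port = get_ci(connection, "port")
    return {
        "host": host if host is not None else "",  # hypothetical default
        "port": port if port is not None else 0,   # hypothetical default
    }
```

With a case-sensitive lookup, a stored `Connection` key would not match `connection`, the nested record would read as missing, and the form would show defaults instead of the saved values.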
Joseph Doherty 32d7fd7cc9 fix(galaxy): complete PR 7.2 rename — use canonical GalaxyMxGateway driver type
v2-ci / build (push) Failing after 48s
The driver/factory/seed use 'GalaxyMxGateway' (legacy 'Galaxy' was retired),
but the AdminUI editor router, GalaxyDriverPage, address picker, identity
dropdown, the Galaxy browser/probe, and DraftValidator still keyed on 'Galaxy'.
Result: the seeded GalaxyMxGateway driver couldn't be edited ('no editor
registered'), UI-created Galaxy drivers wrote a type with no factory, and a
SystemPlatform-bound GalaxyMxGateway driver failed publish validation.
Align all stragglers to GalaxyMxGateway (+ failing-test-first DraftValidator
coverage). ShouldStub's 'Galaxy' legacy safety-net left intact.
2026-05-29 12:31:55 -04:00
Joseph Doherty de666b24c3 test: fix Galaxy-tag Phase7 test fixtures + S7 CLI enum; add MaterialiseGalaxyTags coverage
v2-ci / build (push) Failing after 38s
Completes the test side of the in-progress Galaxy-tag workstream:
- Phase7ApplierTests / Phase7ApplierHierarchyTests: supply the now-required
  Galaxy-tag args to Phase7Plan / Phase7CompositionResult.
- Add genuine coverage for Phase7Applier.MaterialiseGalaxyTags (folder-per-distinct-path,
  variable-per-tag node-id derivation, folder dedupe) + added-Galaxy-tags-trigger-rebuild.
- S7.Cli.Tests: use the project's S7CpuType (CLI option type) instead of S7.Net.CpuType.
Whole solution now builds 0/0; OpcUaServer.Tests 52, S7.Cli.Tests 36 green.
2026-05-29 12:18:01 -04:00
Joseph Doherty a4fb97aef8 chore(docker-dev): remap Traefik to host port 9200
v2-ci / build (push) Failing after 2m6s
Host :80 collides with the sister scadabridge-traefik dev stack; bind the
OtOpcUa Traefik :80 entrypoint to host 9200 instead (admin UI now at
http://localhost:9200). Dashboard already on 8089 to avoid the same clash.
2026-05-29 12:09:21 -04:00
Joseph Doherty da4634d67e fix(tests,cli): implement IOpcUaAddressSpaceSink.EnsureVariable in test fakes; fix CLI CS1587
v2-ci / build (push) Failing after 44s
Resolves the 12 reported build errors (7 CS0535 sink fakes + 5 CLI CS1587).
Runtime.Tests green (74). NOTE: OpcUaServer.Tests still has pre-existing CS7036
errors from the in-progress Galaxy-tag workstream (Phase7Plan/Phase7CompositionResult
new required params) — separate, test-only, not addressed here.
2026-05-29 10:19:32 -04:00
Joseph Doherty 869be660fd fix(adminui): strip stale Phase C.2 / rebuild-plan roadmap notes from cluster list pages
v2-ci / build (push) Failing after 49s
Removes the internal-roadmap deferral banners (the original request that
seeded this work); kept the genuinely useful operator descriptions.
2026-05-29 10:12:15 -04:00
Joseph Doherty a8916c3e08 docs(adminui): correct stale follow-up source comments (F15/F16/Phase4/TODO 3.3-3.4)
v2-ci / build (push) Failing after 46s
2026-05-29 10:00:58 -04:00
Joseph Doherty 79b2345834 fix(adminui): disable RoleGrants buttons during save (review) 2026-05-29 09:58:05 -04:00
Joseph Doherty 4df5b849ac fix(security): let OperationCanceledException propagate from login role merge (review) 2026-05-29 09:56:09 -04:00
Joseph Doherty a58151e99e feat(adminui): editable DB-backed LDAP role map (global, FleetAdmin-gated) 2026-05-29 09:55:07 -04:00
Joseph Doherty 1fd093d95d test(config): global LdapGroupRoleMapping CRUD 2026-05-29 09:52:47 -04:00
Joseph Doherty f210f09caf feat(security): merge DB-backed LDAP role grants into login claims 2026-05-29 09:51:22 -04:00
Joseph Doherty 042f3b6a65 feat(security): add FleetAdmin authorization policy 2026-05-29 09:48:31 -04:00
Joseph Doherty bc40388914 chore(di): register ILdapGroupRoleMappingService 2026-05-29 09:47:10 -04:00
Joseph Doherty b719194046 feat(security): RoleMapper.Merge — additive DB-backed role grants 2026-05-29 09:43:12 -04:00
Joseph Doherty 7570df76d3 feat(adminui): editable OpcUaClient endpoint URL list via CollectionEditor 2026-05-29 09:41:09 -04:00
Joseph Doherty 244949caa3 feat(adminui): editable S7 tag list via CollectionEditor 2026-05-29 09:37:12 -04:00
Joseph Doherty a5a0d06dbe feat(adminui): editable FOCAS device + tag lists via CollectionEditor 2026-05-29 09:33:53 -04:00
Joseph Doherty 6882761f4c feat(adminui): editable TwinCAT device + tag lists via CollectionEditor 2026-05-29 09:29:57 -04:00
Joseph Doherty 15f3797f1e feat(adminui): editable AbLegacy device + tag lists via CollectionEditor 2026-05-29 09:26:25 -04:00
Joseph Doherty 534d670b21 feat(adminui): editable AbCip device + tag lists via CollectionEditor 2026-05-29 09:22:51 -04:00
Joseph Doherty b351a81c8f fix(adminui): preserve un-edited Modbus tag fields across edit (review)
Capture the original ModbusTagDefinition as _source in ModbusTagRow and
rewrite ToDefinition() to use 'with {}', so StringByteOrder, ArrayCount,
Deadband, UnitId, and CoalesceProhibited survive a load→edit→save cycle.
2026-05-29 09:18:36 -04:00
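The `with {}` fix in b351a81c8f above can be illustrated with a minimal sketch. `TagDefinition` and `TagRow` are hypothetical, trimmed-down stand-ins (the real `ModbusTagDefinition` and `ModbusTagRow` are not shown in this log); the point is only that copying the captured source record carries un-edited fields through a load, edit, save cycle.

```csharp
using System;

// Hypothetical stand-in for ModbusTagDefinition; only the shape matters here.
public sealed record TagDefinition(string Name, int Address, int ArrayCount, double Deadband);

// Hypothetical editor row: keeps the loaded definition and exposes only the edited fields.
public sealed class TagRow
{
    private readonly TagDefinition _source;

    public TagRow(TagDefinition source)
    {
        _source = source;
        Name = source.Name;
        Address = source.Address;
    }

    public string Name { get; set; }
    public int Address { get; set; }

    // Copy the original and overwrite only what the form edits;
    // ArrayCount and Deadband ride along untouched.
    public TagDefinition ToDefinition() => _source with { Name = Name, Address = Address };
}
```

Building a fresh `TagDefinition` from the row's fields alone would reset anything the form does not bind, which is presumably the failure the commit describes.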
Joseph Doherty f655efc570 feat(adminui): typed resilience override form replaces JSON textarea 2026-05-29 09:15:54 -04:00
Joseph Doherty c4116e54c9 feat(adminui): editable Modbus tag list via CollectionEditor 2026-05-29 09:14:06 -04:00
Joseph Doherty c3fec1426c fix(adminui): case-insensitive resilience policy keys + malformed-json test (review) 2026-05-29 09:10:41 -04:00
Joseph Doherty a2761e4b98 fix(adminui): key CollectionEditor rows by identity (code review) 2026-05-29 09:08:02 -04:00
Joseph Doherty 4a469fbe06 feat(adminui): typed resilience override form model + tests 2026-05-29 09:06:45 -04:00
Joseph Doherty e2fa6754bb feat(adminui): add generic CollectionEditor<TRow> modal list editor 2026-05-29 09:03:03 -04:00
Joseph Doherty b76561a780 docs(adminui): implementation plan + task persistence for deferred follow-ups
19 tasks across WS1 (driver collection editors), WS2 (typed resilience
form), WS3 (editable DB-backed LDAP role map, global), WS4 (cleanup).
2026-05-29 08:59:55 -04:00
Joseph Doherty c49fccbe0c docs(adminui): design for completing deferred follow-ups
Driver collection editors (modal-per-row shared shell), resilience typed
form, editable DB-backed LDAP->role map (global roles, live on next
sign-in), and stale-comment/note cleanup. Roles intentionally global —
no per-cluster permissions.
2026-05-29 08:45:50 -04:00
Joseph Doherty 5622e51006 fix(adminui): clean up dev-migration note on Home page
Removed the F15 follow-up annotation that was visible to end users.
Replaced with a one-line orientation pointer to the nav.
2026-05-29 08:02:57 -04:00
Joseph Doherty 9e479ce675 test(security): fix Logout_clears_the_cookie
v2-ci / build (push) Failing after 44s
Two pre-existing test bugs surfaced by the auth-alignment branch:
 - Test wanted the 204/JSON contract but never sent Accept:
   application/json — endpoint correctly returned 302 (form POST).
 - Cookie-name assertion still used OtOpcUa.Auth= (now
   ZB.MOM.WW.OtOpcUa.Auth= since the Task 1 default change).

Endpoint behavior is intentional and untouched.
2026-05-29 08:01:26 -04:00
Joseph Doherty af691f3291 fix(security): correct challenge tests to match framework reality
ASP.NET Core's cookie-handler IsAjaxRequest heuristic only checks
X-Requested-With (not Accept). Drop the third test (Accept: application/json
was assumed to → 401 but actually → 302) and the Location.ShouldBeNull
assertion on the XHR test (framework still writes Location alongside 401;
clients ignore it). Renamed _ajax_ → _xhr_ for accuracy. Design doc
updated to match.
2026-05-29 07:58:18 -04:00
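The behaviour af691f3291 describes can be modelled with a simplified sketch. This is not the framework's code: headers are reduced to a plain dictionary, `ChallengeSketch` is a hypothetical name, and only the `X-Requested-With` header check mentioned in the commit is modelled.

```csharp
using System;
using System.Collections.Generic;

public static class ChallengeSketch
{
    // Simplified stand-in for the heuristic: only X-Requested-With is consulted,
    // so an Accept header never influences the outcome.
    public static bool LooksLikeXhr(IReadOnlyDictionary<string, string> headers) =>
        headers.TryGetValue("X-Requested-With", out var value)
        && string.Equals(value, "XMLHttpRequest", StringComparison.Ordinal);

    // 401 for XHR callers, 302 redirect for everything else.
    public static int ChallengeStatus(IReadOnlyDictionary<string, string> headers) =>
        LooksLikeXhr(headers) ? 401 : 302;
}
```

Under this model a caller sending only `Accept: application/json` gets the 302, which matches why the third test was dropped.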
Joseph Doherty 453340e71e test(security): add browser-vs-AJAX challenge tests for root path
Adds protected MapGet("/") in the test host plus three [Fact] methods
exercising the cookie scheme's challenge heuristic for the root route:
browser (Accept: text/html), AJAX (X-Requested-With: XMLHttpRequest),
and JSON (Accept: application/json) callers. Also adds a no-redirect
HttpClient helper so the 302 + Location can be asserted directly.
2026-05-29 07:56:15 -04:00
Joseph Doherty b64d670303 style(security): use Authorization namespace import (code-review cleanup) 2026-05-29 07:51:29 -04:00
Joseph Doherty c83e9397e6 chore(security): drop Microsoft.AspNetCore.Authentication.JwtBearer (unused) 2026-05-29 07:50:47 -04:00
Joseph Doherty 74b9218a92 refactor(security): drop JwtBearer parallel scheme, externalize cookie config
Single Cookie auth scheme; framework default challenge restores 302 → /login
for browsers + 401 for AJAX. OtOpcUaCookieOptions now flows through to
CookieAuthenticationOptions via PostConfigure (fixes a latent bug where the
options class was bound but ignored). Cookie name moves to
ZB.MOM.WW.OtOpcUa.Auth; existing sessions get a one-time forced sign-out.
2026-05-29 07:47:58 -04:00
Joseph Doherty 532e9933f3 feat(security): extend OtOpcUaCookieOptions with RequireHttpsCookie + ZB.MOM.WW cookie name default 2026-05-29 07:44:33 -04:00
Joseph Doherty ee8add4416 docs: implementation plan for auth/login alignment with ScadaBridge
5 tasks following Section 6 of the approved design (bc4fce5). Tasks 3 and 4
parallelizable. Each task carries Classification + Estimated implement time
+ Parallelizable-with metadata for subagent dispatch.
2026-05-29 07:43:11 -04:00
Joseph Doherty bc4fce5fbe docs: design for auth/login alignment with ScadaBridge
Removes the JwtBearer parallel scheme + non-redirect 401 challenge that left
browsers staring at Chrome's HTTP_RESPONSE_CODE_FAILURE page on protected
GETs. JWT keeps minting (cookie payload only); cookie config flows through
the existing-but-unused OtOpcUaCookieOptions via PostConfigure (same pattern
ScadaBridge uses).
2026-05-29 07:39:11 -04:00
Joseph Doherty 7a0b8525a9 chore(docker-dev): rotate GALAXY_MXGW_API_KEY default to new credential
Replaces the old fallback (mxgw_otopcua_…UY_NKlBl3) with the freshly issued
mxgw_otopcua2_GI7-… on all 8 host services. Gateway endpoint stays at
http://10.100.0.48:5120 (seed-clusters.sql already points there). Operators
who set GALAXY_MXGW_API_KEY in their shell continue to override the default
unchanged.
2026-05-29 07:18:23 -04:00
Joseph Doherty 560b327ee1 refactor(galaxy): migrate to ZB.MOM.WW.MxGateway.* nupkg packages
v2-ci / build (push) Failing after 33s
Imports the freshly-rebuilt ZB.MOM.WW.MxGateway.Client + ZB.MOM.WW.MxGateway.Contracts
nupkgs (0.1.0) from /tmp/mxgw-dist. Replaces the vendored libs/ DLLs and the
pre-restructure MxGateway.* namespaces across the runtime Galaxy driver,
Galaxy.Browser, and their tests.

Key changes:
- nuget-packages/ added as a local feed via NuGet.config; .gitignore exempts it
  from the *.nupkg rule so the packages are tracked
- Directory.Packages.props pins both packages at 0.1.0
- 4 csprojs swap <Reference HintPath="libs/...dll"/> for <PackageReference/>
- 36 .cs files renamed `using MxGateway.*` -> `using ZB.MOM.WW.MxGateway.*`
- libs/ removed (vendored DLLs + README.md)

GalaxyBrowseSession rewritten around the new lazy API:
- RootAsync calls GalaxyRepositoryClient.BrowseAsync (returns LazyBrowseNodes)
  and caches them by TagName instead of bulk-fetching the whole hierarchy
- ExpandAsync looks up the cached LazyBrowseNode and calls its ExpandAsync,
  giving true one-wire-call-per-click instead of in-memory parent/child scan
- _byGobjectId + _hasChildrenSet dropped (LazyBrowseNode carries HasChildrenHint)
- AttributesAsync unchanged (already uses DiscoverHierarchyAsync MaxDepth=0)

Tests: Galaxy.Tests 245/245, Galaxy.Browser.Tests 10/10, AdminUI.Tests 66/66.
Pre-existing 12 solution errors unchanged (test sinks + Cli XML comments).
2026-05-29 07:14:18 -04:00
Joseph Doherty d1b6cff085 docs: link driver-browsers design from CLAUDE.md 2026-05-28 16:23:28 -04:00
Joseph Doherty ef17d2e595 fix(adminui): picker DisposeAsync is fire-and-forget per design 2026-05-28 16:21:24 -04:00
Joseph Doherty e439100937 fix(adminui): DriverBrowseTree uses local field, not parameter mutation 2026-05-28 16:18:58 -04:00
Joseph Doherty 7c9621040e feat(adminui): wire Galaxy picker to live browser + attribute side-panel 2026-05-28 16:17:34 -04:00
Joseph Doherty 1b0baf7025 feat(adminui): wire OpcUaClient picker to live browser 2026-05-28 16:16:37 -04:00
Joseph Doherty f31af0093f test(opcuaclient.browser): opc-plc integration round-trip 2026-05-28 16:13:43 -04:00
Joseph Doherty 6e365ef1a9 feat(adminui): shared lazy DriverBrowseTree component with per-node filter 2026-05-28 16:13:03 -04:00
Joseph Doherty 1dbd3b2a6d feat(adminui): register browse services in AddAdminUI 2026-05-28 16:11:13 -04:00
Joseph Doherty 48c3c56073 test(galaxy.browser): unit + fake-transport session coverage 2026-05-28 16:07:13 -04:00
Joseph Doherty 5475ab2aa3 test(opcuaclient.browser): unit + opc-plc live coverage 2026-05-28 16:04:25 -04:00
Joseph Doherty 1a143beeb9 feat(galaxy.browser): add transient gateway-connection factory
GalaxyDriverBrowser opens an ad-hoc GalaxyRepositoryClient from the
AdminUI's persisted Galaxy options and hands it to a GalaxyBrowseSession
for the address picker. Mirrors GalaxyDriver.BuildClientOptions field-
for-field so the gateway sees an identical option shape, with API-key
resolution inlined (env:/file:/dev: prefixes) so the Browser project
needn't take a hard reference on Driver.Galaxy.

Connect phase runs under a 30s budget linked to the caller's CT and
includes a TestConnectionAsync call so auth/TLS/DNS failures surface
inside the budget instead of waiting for the first DiscoverHierarchy
round-trip. On any post-Create exception the client is disposed before
the throw propagates.

Refactored GalaxyBrowseSession to take only GalaxyRepositoryClient —
browse never needs MxGatewaySession (that's only for live subscribe/
write paths), and constructing one outside the runtime driver isn't
straightforward. The session now disposes _client in DisposeAsync; the
_session field/parameter is gone.
2026-05-28 15:59:57 -04:00
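The connect-phase pattern in 1a143beeb9 (a time budget linked to the caller's token, plus dispose-on-failure) might look roughly like the sketch below. `ConnectBudget.OpenAsync` and its delegate parameters are hypothetical; the real factory works with `GalaxyRepositoryClient`, whose API is not shown in this log.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ConnectBudget
{
    // Runs connect under a time budget linked to the caller's token.
    // If connect fails after the client exists, the client is disposed before rethrowing.
    public static async Task<T> OpenAsync<T>(
        Func<T> create,
        Func<T, CancellationToken, Task> connect,
        TimeSpan budget,
        CancellationToken callerToken) where T : IAsyncDisposable
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(budget); // fires on whichever comes first: budget or caller cancel

        var client = create();
        try
        {
            await connect(client, cts.Token);
            return client;
        }
        catch
        {
            await client.DisposeAsync();
            throw;
        }
    }
}
```

A caller would pass `TimeSpan.FromSeconds(30)` as the budget and put both the connect call and the connection test inside the `connect` delegate, so that either failure is observed inside the same window.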
Joseph Doherty 641b2ecbcf fix(opcuaclient.browser): volatile _disposed for cross-thread visibility 2026-05-28 15:54:33 -04:00
Joseph Doherty 09d1bbac00 feat(opcuaclient.browser): add transient-session factory 2026-05-28 15:53:17 -04:00
Joseph Doherty b869af2b3d fix(galaxy.browser): volatile _disposed, RootAsync gate, O(1) child hint 2026-05-28 15:51:31 -04:00
Joseph Doherty 56be42913c feat(opcuaclient.browser): add lazy browse session impl 2026-05-28 15:48:56 -04:00
Joseph Doherty dc8a2dd52c test(adminui): browse session registry, reaper, service 2026-05-28 15:44:20 -04:00
Joseph Doherty d605d0b20d feat(galaxy.browser): add lazy browse session with attribute fetch 2026-05-28 15:42:19 -04:00
Joseph Doherty 85676db3a5 feat(opcuaclient.browser): scaffold project + slnx entry 2026-05-28 15:39:14 -04:00
Joseph Doherty bec2988309 feat(adminui): in-process browse session registry + TTL reaper + service 2026-05-28 15:36:19 -04:00
Joseph Doherty 7cd5cde315 refactor(opcuaclient): move NamespaceMap to Contracts, make public
Browser project (Phase 3) needs to share namespace-stable address encoding
with the runtime driver. Move keeps the same namespace, so existing usages
in OpcUaClientDriver compile unchanged.
2026-05-28 15:35:21 -04:00
Joseph Doherty 7c92297d0e feat(galaxy.browser): scaffold project + slnx entry 2026-05-28 15:35:14 -04:00
Joseph Doherty 81f09a7054 feat(commons): add IDriverBrowser/IBrowseSession/BrowseNode abstractions 2026-05-28 15:32:01 -04:00
Joseph Doherty c962b86bde docs: implementation plan for driver browsers (OpcUaClient + Galaxy)
18-task plan following Section 9 of the approved design. Phases 3 & 4
parallelizable. Each task carries Classification + Estimated implement
time + Parallelizable-with metadata to drive subagent dispatch.
2026-05-28 15:29:40 -04:00
Joseph Doherty fcd0b9b355 docs: design for live address browsers (OpcUaClient + Galaxy)
Approved design for the deferred follow-up from PR #f9fc7dd's driver-pages
work. Lazy tree browse via per-driver IDriverBrowser registered in AdminUI
DI, sessions held in-process with TTL reaper. Detailed sequencing for the
writing-plans handoff is in section 9.
2026-05-28 15:19:52 -04:00
Joseph Doherty 0d3ec46c14 fix(adminui): capture audit username at click time, not at panel init
v2-ci / build (push) Failing after 48s
DriverStatusPanel previously cached the username in a field at
OnInitializedAsync and forwarded the cached value into RestartDriver
/ ReconnectDriver messages. A token refresh or claim change mid-
circuit would land the stale name in the audit ConfigEdit row.
Re-reads AuthenticationStateProvider at button-click time so the
audit entry reflects the current principal.
2026-05-28 11:58:12 -04:00
Joseph Doherty 662f3f9f5c refactor(driver-pages): address Phase 6/8 deep-review findings
v2-ci / build (push) Failing after 32s
- Topic-name drift fix: DriverHealthChanged.TopicName and
  DriverControlTopic.Name now live on the message contracts in
  Commons. AkkaDriverHealthPublisher, DriverStatusSignalRBridge,
  DriverHostActor, and AdminOperationsActor all delegate to the
  single constant so a rename can't silently desynchronise
  publisher and subscriber.
- DriverStatusPanel._opResultClearTimer switched from
  System.Timers.Timer to System.Threading.Timer + awaited
  DisposeAsync. Prevents an in-flight 8s clear-callback from
  invoking StateHasChanged on a component whose hub has already
  been released.
- PublishHealthSnapshot deduplicates against the last published
  (state, lastSuccess, lastError, errorCount) fingerprint. The
  30s heartbeat no longer floods the SignalR layer with identical
  Healthy snapshots — newly-joined clients still warm up via the
  snapshot store on JoinDriver.
2026-05-28 11:52:20 -04:00
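The snapshot deduplication in 662f3f9f5c could be sketched as a tuple comparison against the last published value. `SnapshotDeduper` and the field types are assumptions for illustration; the commit names only the four fingerprint components.

```csharp
using System;

public sealed class SnapshotDeduper
{
    // Last published fingerprint; null until the first publish.
    private (string State, DateTime? LastSuccess, string LastError, int ErrorCount)? _last;

    // Returns true when the snapshot differs from the previous one and should be sent.
    public bool ShouldPublish(string state, DateTime? lastSuccess, string lastError, int errorCount)
    {
        var current = (state, lastSuccess, lastError, errorCount);
        if (_last is { } previous && previous.Equals(current)) return false;
        _last = current;
        return true;
    }
}
```

With this shape a periodic heartbeat can call `ShouldPublish` on every tick and only forward the snapshot when some component of the fingerprint changed.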
Joseph Doherty dcd2509548 refactor(driver-pages): address post-review follow-ups
- DriverInstanceSpec carries ClusterId from the deployment artifact;
  DriverHostActor threads the real cluster identity into
  DriverInstanceActor instead of the local NodeId. Old pre-PR
  artifacts without a ClusterId field fall back to the NodeId so
  in-flight deployments keep working.
- DriverHostActor.ChildEntry holds the full DriverInstanceSpec
  (was only carrying DriverType + LastConfigJson). Restart respawns
  preserve RowId, Name, Enabled, ClusterId — no placeholder values.
- Drop the unnecessary _faultLock on DriverInstanceActor — every
  read/write site runs inside an Akka message handler which is
  single-threaded per actor instance.
- DriverStatusPanel.DisposeAsync awaits Timer.DisposeAsync so an
  in-flight 5s tick can't invoke StateHasChanged on a component
  whose hub has already been torn down.
2026-05-28 11:41:46 -04:00
Joseph Doherty 64e4726fff docs(plans): mark all 48 driver-pages tasks complete in persistence file
Records final commit hashes + notes per task. Persistence file mirrors
the 43-commit branch state so future sessions can resume from the
correct checkpoint via /superpowers-extended-cc:executing-plans.
2026-05-28 11:32:45 -04:00
Joseph Doherty 494da22cd1 test(adminui): E2E scaffolding for Test Connect + Reconnect + Status hub
- DriverTestConnectE2eTests: 3 scenarios (sim/wrong-port/black-hole)
  against the Modbus Docker fixture. Sim + wrong-port skip if fixture
  unreachable; black-hole uses ModbusDriverProbe directly (no fixture).
- DriverReconnectE2eTests: message round-trip through AdminOperationsActor
  cluster singleton — Ok=true + audit write, without live driver side effect.
- DriverStatusHubE2eTests: bridge-mocked fallback — spawns
  DriverStatusSignalRBridge in the harness ActorSystem with a mock
  IHubContext, publishes DriverHealthChanged to the driver-health DPS
  topic, asserts store upsert + hub SendAsync call.
- DockerFixtureAvailability helper: TCP-connect probe for skip guards.
- Moq 4.20.72 added to central package management for hub mocking.
- Design doc §8.3 replaced with concrete pre-ship operator runbook.
2026-05-28 11:31:12 -04:00
Joseph Doherty 063005fefa feat(adminui): DriverTagPicker modal + 9 static address builders
- DriverTagPicker shell: modal chrome + per-driver picker body
  rendered as ChildContent.
- 9 picker bodies (Modbus/AbCip/AbLegacy/S7/TwinCat/FOCAS/
  OpcUaClient/Galaxy/Historian.Wonderware). 5 have computed
  builder logic + unit tests; 4 are free-text passthroughs
  (live browse for OPC UA + Galaxy is a documented follow-up).
- Each typed driver page gets a "Pick address" button that opens
  the modal with the matching body. Picked address surfaces in
  the modal footer for manual copy — no JS interop in v1.
2026-05-28 11:21:33 -04:00
Joseph Doherty ffcc8d1065 feat(adminui): Reconnect/Restart on DriverStatusPanel (DriverOperator-gated)
- RestartDriver / ReconnectDriver messages + AdminOperationsActor
  handlers (broadcast via driver-control DPS topic; audited via
  ConfigEdits).
- DriverHostActor subscribes to driver-control; locates the
  matching child DriverInstanceActor and stops+respawns it
  (Restart) or sends it a ForceReconnect internal message
  (Reconnect — re-enters Reconnecting state without full stop).
  DriverInstanceSpec constructor call uses named args to handle
  the full 6-parameter signature.
- New DriverOperator authorization policy mapped to DriverOperator
  or FleetAdmin role; documented in docs/security.md. Map LDAP
  group via GroupToRole (e.g. "ot-driver-operator": "DriverOperator").
- DriverStatusPanel renders Reconnect + Restart buttons when the
  user holds the DriverOperator policy (hidden otherwise). Restart
  requires an in-page Razor confirm block (no JS confirm, keeps
  SignalR event loop unblocked). Both buttons show a spinner and
  are disabled during in-flight; result chip auto-clears after 8s.
  Username sourced from AuthenticationStateProvider.

Reconnect resolves to "ForceReconnect" (re-enter Reconnecting,
not full stop+respawn) — transport drops and retries while actor
and in-memory state are preserved. All DriverInstanceActor states
handle ForceReconnect safely (no-op when already in transition).
2026-05-28 11:14:04 -04:00
Joseph Doherty 4b374fd177 feat(adminui): Test Connect button on every typed driver page
- AdminProbeService routes TestDriverConnect through
  IAdminOperationsClient with a 65s outer guard (actor side already
  clamps to [1,60]).
- Added generic AskAsync<T> to IAdminOperationsClient interface and
  AdminOperationsClient impl, delegating straight to the Akka proxy.
- DriverTestConnectButton renders the button + inline result chip,
  auto-clears after 30s, disables during in-flight.
- Wired into all 9 typed driver pages directly under the
  identity section. Sources timeout from the form's
  ProbeTimeoutSeconds; sources config JSON from the form's
  current Options (operator can test BEFORE saving).
2026-05-28 11:02:49 -04:00
Joseph Doherty 54f0dbddb9 fix(drivers): align probe DriverType strings with AdminUI keys
ModbusDriverProbe.DriverType was "Modbus" but the AdminUI's
ModbusDriverPage persists DriverInstance.DriverType = "ModbusTcp".
GalaxyDriverProbe used the runtime DriverTypeName constant
("GalaxyMxGateway") but the AdminUI saves "Galaxy". The probe DI
lookup is case-insensitive but not name-insensitive, so Test
Connect would fail to find a probe for these two drivers.
2026-05-28 10:55:15 -04:00
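The "case-insensitive but not name-insensitive" point in 54f0dbddb9 can be shown with an ordinary dictionary. The registry below is a hypothetical stand-in for the probe DI lookup, with string labels in place of probe instances.

```csharp
using System;
using System.Collections.Generic;

public static class ProbeLookupSketch
{
    // Hypothetical registry keyed by DriverType; values are just labels here.
    public static Dictionary<string, string> BuildRegistry() =>
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["ModbusTcp"] = "ModbusDriverProbe",
            ["Galaxy"] = "GalaxyDriverProbe",
        };
}
```

A lookup for `"modbustcp"` succeeds because only casing differs, while `"Modbus"` or `"GalaxyMxGateway"` finds nothing, which is why the probe strings had to be aligned with the keys the AdminUI persists.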
Joseph Doherty c19d124e89 feat(drivers): TCP-connect IDriverProbe for all 9 driver types
Cheap-and-fast probe: open TCP socket to the configured endpoint,
close immediately. Surfaces SocketError on failure, latency on
success, "timed out" on caller cancel. Sufficient for the AdminUI
Test Connect "can we reach the host?" question. Richer protocol-
level probes (OPC UA session open, FOCAS handshake, gRPC ping)
are a documented follow-up. Each probe registered as
AddSingleton<IDriverProbe, X> in DriverFactoryBootstrap so they
flow through DI into AdminOperationsActor.

Historian.Wonderware returns a clean "TCP probe not applicable"
result because it communicates over a Windows named pipe, not TCP.
Also adds OpcUaClient + Historian.Wonderware.Client project
references to Host.csproj (both were missing from the driver
ItemGroup).
2026-05-28 10:53:42 -04:00
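A TCP-connect probe of the kind c19d124e89 describes might look like the following sketch. `ProbeResult` and `TcpProbeSketch` are hypothetical names, and the result shape is an assumption; the real `IDriverProbe` contract is not shown in this log.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result shape for illustration.
public sealed record ProbeResult(bool Ok, string Message, TimeSpan Latency);

public static class TcpProbeSketch
{
    // Opens a TCP socket to the endpoint and closes it immediately.
    public static async Task<ProbeResult> ProbeAsync(string host, int port, CancellationToken ct)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            using var client = new TcpClient();
            await client.ConnectAsync(host, port, ct); // overload available on .NET 5 and later
            return new ProbeResult(true, "connected", sw.Elapsed);
        }
        catch (OperationCanceledException)
        {
            // The caller's token fired before the connect completed.
            return new ProbeResult(false, "timed out", sw.Elapsed);
        }
        catch (SocketException ex)
        {
            return new ProbeResult(false, ex.SocketErrorCode.ToString(), sw.Elapsed);
        }
    }
}
```

This answers only "can we reach the host?"; it says nothing about whether the protocol on the other end is the expected one.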
Joseph Doherty f3f328c25c feat(adminops): IDriverProbe + TestDriverConnect actor handler
- IDriverProbe abstraction in Core.Abstractions; one impl per driver
  type, resolved by DriverType string. Phase 7.3 + 7.4 add concrete
  probes for the 9 supported driver types.
- TestDriverConnect / TestDriverConnectResult messages.
- AdminOperationsActor.HandleTestDriverConnectAsync looks up the probe
  by DriverType, runs it with a [1,60]s clamped timeout, and returns
  success/latency or failure/message. Probes that throw or time out
  surface as soft failures.
2026-05-28 10:44:00 -04:00
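The clamped timeout and soft-failure handling in f3f328c25c could be sketched as below. `ProbeRunnerSketch` and the delegate-based probe are hypothetical; the actor handler itself is not shown in this log.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ProbeRunnerSketch
{
    // Clamp the requested timeout into [1, 60] seconds.
    public static TimeSpan ClampTimeout(int requestedSeconds) =>
        TimeSpan.FromSeconds(Math.Clamp(requestedSeconds, 1, 60));

    // Run a probe under the clamped timeout; exceptions and timeouts
    // are reported as a failure message rather than rethrown.
    public static async Task<(bool Ok, string Message)> RunAsync(
        Func<CancellationToken, Task> probe, int requestedSeconds)
    {
        using var cts = new CancellationTokenSource(ClampTimeout(requestedSeconds));
        try
        {
            await probe(cts.Token);
            return (true, "ok");
        }
        catch (OperationCanceledException)
        {
            return (false, "timed out");
        }
        catch (Exception ex)
        {
            return (false, ex.Message);
        }
    }
}
```

Keeping the clamp on the handler side means a caller that asks for 0 or 500 seconds still gets a bounded probe.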
Joseph Doherty 4584612a1a feat(adminui): DriverStatusPanel + wire into 9 typed pages
Live panel subscribed to the /hubs/driverstatus SignalR feed —
renders state chip, last-success age, 5-min error count, last
error message. Auto-reconnect; dimmed when no push arrives for 30s.
Hidden for new instances (nothing deployed yet); shown read-only
on every edit-mode page. Reconnect/Restart buttons land in Phase 8.
2026-05-28 10:29:43 -04:00
Joseph Doherty 4203b84d51 feat(runtime): publish DriverHealthChanged via DriverInstanceActor
- IDriverHealthPublisher in Core.Abstractions + NullDriverHealthPublisher
  no-op for tests/dev-stub paths.
- AkkaDriverHealthPublisher in Runtime forwards to the cluster-wide
  `driver-health` DPS topic.
- DriverInstanceActor instrumented to publish snapshots on every
  observable state change + a periodic 30s heartbeat so the AdminUI
  snapshot store warms up for newly-joined SignalR clients.
- Sliding 5-minute Faulted-count tracked per actor via Queue<DateTime>.
- DriverHostActor.SpawnChild threads clusterId (_localNode.Value) and
  the health publisher down to every DriverInstanceActor child.
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers
  AkkaDriverHealthPublisher as IDriverHealthPublisher singleton.
2026-05-28 10:22:44 -04:00
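The sliding 5-minute fault count from 4203b84d51 can be approximated with a time-ordered queue. `SlidingFaultCounter` is a hypothetical name, and passing the clock in as a parameter is an assumption made here to keep the sketch testable.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlidingFaultCounter
{
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(5);
    private readonly Queue<DateTime> _faults = new();

    public void RecordFault(DateTime utcNow)
    {
        _faults.Enqueue(utcNow);
        Trim(utcNow);
    }

    public int Count(DateTime utcNow)
    {
        Trim(utcNow);
        return _faults.Count;
    }

    // Drop entries older than the window; the queue is time-ordered,
    // so only the head needs checking.
    private void Trim(DateTime utcNow)
    {
        while (_faults.Count > 0 && utcNow - _faults.Peek() > Window)
            _faults.Dequeue();
    }
}
```

Because the log notes that actor message handlers are single-threaded per instance, a counter like this would need no lock when it lives inside the actor.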
Joseph Doherty 29370fde3c feat(adminui): add DriverStatusSignalRBridge + InMemory snapshot store 2026-05-28 10:13:30 -04:00
Joseph Doherty 3f23a1acd3 feat(adminui): add DriverStatusHub 2026-05-28 10:13:25 -04:00
Joseph Doherty 4d5c6ac892 feat(messages): add DriverHealthChanged DPS contract 2026-05-28 10:10:16 -04:00
Joseph Doherty c4086c243c fix(adminui): S7 typed page no longer wipes Tags on save
- S7DriverPage.FormModel now preserves Tags through Form ↔ Options
  translation (was hard-coding Tags = [] on every save, silently
  destroying any tag list that operators had configured).
- Add FormModel_RoundTrip tests for OpcUaClient and Historian
  mirror classes — both were translating Options ↔ form-model
  entirely untested.
- Surface S7 Tags in the round-trip test so this regression
  can't reach merge again.
2026-05-28 10:06:43 -04:00
Joseph Doherty a971db3ee5 refactor(adminui): retire generic DriverEdit.razor
All 9 driver types now have typed pages; DriverEditRouter dispatches
to them directly. Unknown DriverType strings (e.g. legacy rows) render
an explicit error notice instead of falling through to a generic
editor — the failure mode is now visible, not silent.
2026-05-28 09:59:25 -04:00
Joseph Doherty 5f8fa7004c feat(adminui): wire all 9 typed pages into DriverEditRouter map
DriverEditRouter now dispatches every known DriverType to its typed
page. The legacy DriverEdit fallback remains in ResolveComponentType
for forward-compatibility with as-yet-unknown driver types but is no
longer reached for any current driver.
2026-05-28 09:58:36 -04:00
Joseph Doherty 059a6218f7 feat(adminui): AbLegacy typed driver page 2026-05-28 09:57:07 -04:00
Joseph Doherty 8149739161 feat(adminui): FOCAS typed driver page
Adds FocasDriverPage.razor (route: /clusters/{id}/drivers/new/focas) with
typed sections for timeout, probe, AlarmProjection (enabled + poll interval),
HandleRecycle (enabled + interval in minutes), FixedTree (enabled + axis/
program/timer poll intervals), and read-only JSON views for Devices and Tags.
FormModel uses flat settable properties + FromOptions/ToOptions with
appropriate unit conversions (ms, minutes). Also adds
FocasDriverPageFormSerializationTests (3 tests: JSON round-trip, unknown-field
drop, FormModel round-trip covering all sub-options classes).
2026-05-28 09:56:53 -04:00
Joseph Doherty 2c16062457 feat(adminui): Historian.Wonderware typed driver page 2026-05-28 09:55:15 -04:00
Joseph Doherty dc21cbad53 feat(adminui): AbCip typed driver page 2026-05-28 09:55:13 -04:00
Joseph Doherty dfbf6793de feat(adminui): TwinCat typed driver page
Adds TwinCATDriverPage.razor (route: /clusters/{id}/drivers/new/twincat)
with typed fields for timeout, UseNativeNotifications, EnableControllerBrowse,
NotificationMaxDelayMs, probe sub-options (enabled/interval/timeout/admin
timeout), and read-only JSON views for Devices and Tags collections.
FormModel uses flat settable properties + FromOptions/ToOptions. Also adds
TwinCATDriverPageFormSerializationTests (3 tests). Fixes pre-existing
placeholder syntax error in AbCipDriverPage.razor (@raw_cpu_type in
attribute caused RZ9986).
2026-05-28 09:54:49 -04:00
Joseph Doherty a243cfd126 feat(adminui): Galaxy typed driver page 2026-05-28 09:52:31 -04:00
Joseph Doherty 5cad9b260e feat(adminui): S7 typed driver page
Adds S7DriverPage.razor (route: /clusters/{id}/drivers/new/s7) with
typed fields for host, port, CpuType InputSelect, rack, slot, timeout,
probe sub-options, and read-only JSON tag view. FormModel uses flat
settable properties and FromOptions/ToOptions round-trip; no
init-only bindings in Razor. Also adds
S7DriverPageFormSerializationTests (3 tests: JSON round-trip,
unknown-field drop, FormModel round-trip).
2026-05-28 09:52:10 -04:00
Joseph Doherty a3073d16bf feat(adminui): Modbus typed driver page 2026-05-28 09:52:01 -04:00
Joseph Doherty efcc2311e6 feat(adminui): OpcUaClient typed driver page 2026-05-28 09:50:34 -04:00
Joseph Doherty 7014c9376c feat(adminui): reference all 9 Driver.*.Contracts projects
Wires the POCO-only driver contracts into the AdminUI csproj so the
9 typed *DriverPage.razor components from Phase 4 can compile against
the real Options classes without dragging native driver deps in.
2026-05-28 09:42:12 -04:00
Joseph Doherty 27b3a014da refactor(adminui): hand /drivers routes to DriverTypePicker + DriverEditRouter
Removes both @page directives from DriverEdit.razor. The picker owns
/drivers/new; the router owns /drivers/{id} and dispatches via
DynamicComponent (currently falls back to DriverEdit for every driver
type — Phase 4 populates the type map one driver at a time).
2026-05-28 09:39:49 -04:00
Joseph Doherty 55e8bf70d9 feat(adminui): add DriverEditRouter dispatch page
Falls back to legacy DriverEdit until Phase 4 populates the type-map.
2026-05-28 09:38:35 -04:00
Joseph Doherty c0ce5d02bd feat(adminui): add DriverTypePicker landing page
Adds /clusters/{ClusterId}/drivers/new picker page (Task 3.1). Renders
a 9-card Bootstrap grid — one card per driver type — each linking to
/clusters/{ClusterId}/drivers/new/{slug}. No data fetch; type list is
hardcoded. Route collides with DriverEdit.razor's same directive; Task
3.3 removes the duplicate to resolve the runtime ambiguity.
2026-05-28 09:36:54 -04:00
Joseph Doherty a28f4cdd25 refactor(adminui): drive DriverEdit.razor through shared section components
No functional change — the identity, resilience, and save-bar are now
each in their own reusable component so the typed driver pages (Phase 4)
can share them. The middle "Driver config (JSON)" panel stays inlined
for now — it's replaced wholesale by typed forms in Phase 4.
2026-05-28 09:33:06 -04:00
Joseph Doherty a008530af6 feat(adminui): add DriverResilienceSection shared component 2026-05-28 09:29:41 -04:00
Joseph Doherty 1ff3875a19 feat(adminui): add DriverIdentitySection shared component 2026-05-28 09:28:29 -04:00
Joseph Doherty 85af126406 feat(adminui): add DriverFormShell shared component 2026-05-28 09:26:54 -04:00
Joseph Doherty f2f6eeb74e feat(drivers): expose ProbeTimeoutSeconds on every driver Options class
Adds a uniform [Range(1, 60)] ProbeTimeoutSeconds property to all 9
driver Options classes (Modbus 5s, AbCip 5s, AbLegacy 5s, S7 5s,
TwinCAT 10s, FOCAS 10s, OpcUaClient 15s, Galaxy 30s, Historian 15s).
Powers the AdminUI Test Connect button (Phase 7 of the plan).
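
Illustrative sketch (not part of the commit; Python is used for brevity, the project itself is C#): the defaults listed above and the [Range(1, 60)] bound could be modeled roughly like this. The names PROBE_TIMEOUT_DEFAULTS and validate_probe_timeout are hypothetical.

```python
# Hypothetical model of the per-driver ProbeTimeoutSeconds defaults
# quoted in the commit message (values in seconds).
PROBE_TIMEOUT_DEFAULTS = {
    "Modbus": 5,
    "AbCip": 5,
    "AbLegacy": 5,
    "S7": 5,
    "TwinCAT": 10,
    "FOCAS": 10,
    "OpcUaClient": 15,
    "Galaxy": 30,
    "Historian": 15,
}


def validate_probe_timeout(seconds: int) -> int:
    """Mimic a [Range(1, 60)] constraint: accept 1..60 inclusive."""
    if not 1 <= seconds <= 60:
        raise ValueError(f"ProbeTimeoutSeconds must be in 1..60, got {seconds}")
    return seconds
```

Every default sits inside the allowed range, so a freshly constructed Options object would pass validation without operator input.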
2026-05-28 09:21:50 -04:00
Joseph Doherty 8c0a32025d refactor(driver-historian-wonderware-client): extract WonderwareHistorianClientOptions to .Contracts
Move WonderwareHistorianClientOptions to a new
Driver.Historian.Wonderware.Client.Contracts sibling project. The record
had no using directives and uses only primitive types (string, TimeSpan)
so the contracts project is dependency-free.

Convert one doc-comment reference:
  <see cref="WonderwareHistorianClient"/> → <c>WonderwareHistorianClient</c>
per the approved decision — no compilable usings were present.

The runtime Driver.Historian.Wonderware.Client project gains a
ProjectReference to .Contracts; the .slnx is updated accordingly.
2026-05-28 09:16:49 -04:00
Joseph Doherty 5ffbc42d8c refactor(driver-galaxy): extract GalaxyDriverOptions to .Contracts
Move GalaxyDriverOptions (and nested records GalaxyGatewayOptions,
GalaxyMxAccessOptions, GalaxyRepositoryOptions, GalaxyReconnectOptions)
from Config/GalaxyDriverOptions.cs into a new Driver.Galaxy.Contracts
sibling project at the contracts root (no Config/ subdirectory). The
existing namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config is preserved
unchanged — it is a runtime ABI concern and all consumers already import
it via the namespace qualifier.

No doc-comment substitutions required — the only cref in the file
(<see cref="ApiKeySecretRef"/>) is an intra-type parameter reference
that resolves within the contracts project itself.

The options file had no using directives and no NuGet type surface;
the contracts project is dependency-free. The runtime Driver.Galaxy
project gains a ProjectReference to .Contracts; the .slnx is updated
accordingly.
2026-05-28 09:15:57 -04:00
Joseph Doherty 5f0e0482ed refactor(driver-opcuaclient): extract OpcUaClientDriverOptions to .Contracts
Move OpcUaClientDriverOptions and all companion enums (OpcUaTargetNamespaceKind,
OpcUaSecurityMode, OpcUaSecurityPolicy, OpcUaAuthType) to a new
Driver.OpcUaClient.Contracts sibling project. The options file had no
using directives — all types were defined in the same file — so no
NuGet mirror enum pattern was required.

Convert two doc-comment references:
  <see cref="OpcUaClientDriver.InitializeAsync"/> → <c>OpcUaClientDriver.InitializeAsync</c>
  <see cref="OpcUaClientDriver.ValidateNamespaceKind"/> → <c>OpcUaClientDriver.ValidateNamespaceKind</c>
per the approved decision — no compilable usings were present.

The runtime Driver.OpcUaClient project gains a ProjectReference to .Contracts;
the .slnx is updated accordingly.
2026-05-28 09:14:57 -04:00
Joseph Doherty d892ab9e12 refactor(driver-focas): extract FocasDriverOptions to .Contracts
Move FocasDriverOptions (and companion option types), FocasCncSeries,
and the FocasDataType enum to a new Driver.FOCAS.Contracts sibling
project. FocasDataTypeExtensions (which uses DriverDataType from
Core.Abstractions) stays in the runtime driver as FocasDataTypeExtensions.cs.

Convert two doc-comment references:
  <see cref="FocasDriver.InitializeAsync"/> → <c>FocasDriver.InitializeAsync</c>
  <see cref="FocasAddress.TryParse"/> → <c>FocasAddress.TryParse</c>
per the approved decision — no compilable usings were present in the
moved files.

The runtime Driver.FOCAS project gains a ProjectReference to .Contracts;
the .slnx is updated accordingly.
2026-05-28 09:13:10 -04:00
Joseph Doherty 9f62f2c242 refactor(driver-s7): extract S7DriverOptions to .Contracts with parallel CpuType enum
Introduces Driver.S7.Contracts (dependency-free POCO project) and moves
S7DriverOptions / S7ProbeOptions / S7TagDefinition / S7DataType into it.
Adds S7CpuType enum mirroring S7.Net.CpuType exactly (7 values with
explicit integer codes). Runtime S7CpuTypeMap bridges S7CpuType →
S7.Net.CpuType at the single Plc construction site in S7Driver.InitializeAsync.
S7DriverFactoryExtensions and S7CommandBase updated to use S7CpuType; test
files updated to match (S7_1500Profile, S7DriverScaffoldTests). AdminUI can
now reference Driver.S7.Contracts without pulling in S7netplus.
2026-05-28 09:08:27 -04:00
Joseph Doherty a88721ce31 refactor(driver-twincat): extract TwinCATDriverOptions to .Contracts
Move TwinCATDriverOptions and TwinCATDataType enum to a new
Driver.TwinCAT.Contracts sibling project. TwinCATDataTypeExtensions
(which uses DriverDataType from Core.Abstractions) stays in the
runtime driver as TwinCATDataTypeExtensions.cs.

Replace two doc-comment references:
  <see cref="Core.Abstractions.PollGroupEngine"/> → <c>PollGroupEngine</c>
  <see cref="TwinCATAmsAddress.TryParse"/> → <c>TwinCATAmsAddress.TryParse</c>
per the approved decision — no compilable usings were present.

The runtime Driver.TwinCAT project gains a ProjectReference to .Contracts;
the .slnx is updated accordingly.
2026-05-28 09:01:28 -04:00
Joseph Doherty 4902295211 refactor(driver-ablegacy): extract AbLegacyDriverOptions to .Contracts
Move AbLegacyDriverOptions, AbLegacyDataType enum, and
AbLegacyPlcFamilyProfile (including AbLegacyPlcFamily enum) to a new
Driver.AbLegacy.Contracts sibling project. All three files are zero-dep
after splitting AbLegacyDataTypeExtensions (which uses DriverDataType
from Core.Abstractions) into a new file that stays in the runtime driver.

Replace the doc-comment <see cref="AbLegacyAddress.TryParse"/> reference
with <c>AbLegacyAddress.TryParse</c> per the approved decision.
The PlcFamilies using directive is retained in the contracts project since
both namespaces live there.

The runtime Driver.AbLegacy project gains a ProjectReference to .Contracts;
the .slnx is updated accordingly.
2026-05-28 08:59:21 -04:00
Joseph Doherty b474d63335 refactor(driver-abcip): extract AbCipDriverOptions to .Contracts
Move AbCipDriverOptions (and AbCipDataType enum) to a new
Driver.AbCip.Contracts sibling project. AbCipDataTypeExtensions
(which uses DriverDataType from Core.Abstractions) stays in the
runtime driver as AbCipDataTypeExtensions.cs.

Replace two doc-comment references, <see cref="Core.Abstractions.IAlarmSource"/>
and <see cref="Core.Abstractions.IHostConnectivityProbe"/>, with their <c>X</c>
equivalents per the approved decision; no compilable using was present.

The runtime Driver.AbCip project gains a ProjectReference to .Contracts;
the .slnx is updated accordingly.
2026-05-28 08:57:36 -04:00
Joseph Doherty 5058a56645 refactor(driver-modbus): extract ModbusDriverOptions to .Contracts
Move ModbusDriverOptions (and companion option types) to a new
Driver.Modbus.Contracts sibling project. The contracts project
references only Driver.Modbus.Addressing (itself zero-dep and
Admin-safe) because ModbusDriverOptions.Probe/Family/Region
properties use enum types that live there.

Drop 'using ZB.MOM.WW.OtOpcUa.Core.Abstractions' and replace
<see cref="IHostConnectivityProbe"/> with <c>IHostConnectivityProbe</c>
per the approved decision — the using was doc-comment-only.

The runtime Driver.Modbus project gains a ProjectReference back to
.Contracts; the .slnx is updated accordingly.
2026-05-28 08:50:17 -04:00
Joseph Doherty dc12c3732e test(adminui): scaffold AdminUI.Tests project 2026-05-28 08:42:42 -04:00
Joseph Doherty c1c68c9134 docs(plans): AdminUI driver-specific pages implementation plan
48-task plan across 10 phases (Contracts split, shared sections,
router/picker, 9 typed pages, retire generic editor, live status,
Test Connect, Reconnect/Restart, address pickers, E2E). Tracked in
sibling .tasks.json with dependency graph.
2026-05-28 08:36:53 -04:00
Joseph Doherty af06648558 docs(plans): AdminUI driver-specific pages design
Replaces the generic JSON-blob DriverEdit page with typed per-driver
pages (all 9 drivers), Test Connect, live status panel with
Reconnect/Restart, and a per-driver tag/address picker. Live OPC UA +
Galaxy browse explicitly deferred to a follow-up.
2026-05-28 08:29:20 -04:00
Joseph Doherty 64e3fbe035 docs: backfill XML documentation across 756 files
v2-ci / build (push) Failing after 1m43s
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
2026-05-28 08:10:17 -04:00
Joseph Doherty f9fc7dd2e1 feat(host): wire UseWindowsService so sc.exe-installed service runs cleanly
v2-ci / build (push) Failing after 45s
v2-e2e / e2e (push) Failing after 37s
The v2 plan's blessed install path (scripts/install/Install-Services.ps1)
registers the host via `sc.exe create binPath=...OtOpcUa.Host.exe`, but the
binary never called `UseWindowsService`. Without it, the Service Control
Manager waits ~30s for the process to call SetServiceStatus(Running) and
then kills it — the install script's design was incomplete.

Two changes:

- Host.csproj: drop the `IsOSPlatform('Windows')` condition on the
  Microsoft.Extensions.Hosting.WindowsServices package reference so the
  package is always available. The runtime helper used by
  UseWindowsService gates on WindowsServiceHelpers.IsWindowsService()
  internally, so it's a no-op when running as a console app or under
  Linux/macOS — the binary stays cross-platform-buildable.

- Program.cs: call builder.Host.UseWindowsService(options =>
  options.ServiceName = "OtOpcUaHost") immediately after CreateBuilder.
  When the host is launched by SCM, WindowsServiceLifetime takes over
  the IHostLifetime slot and reports START/STOP correctly. When launched
  by `dotnet run` or `OtOpcUa.Host.exe` from a console, it's a no-op.

Verified end-to-end on wonder-app-vd03.zmr.zimmer.com: `sc.exe create`
followed by `sc.exe start OtOpcUaHost` transitions from START_PENDING to
RUNNING; /login + /health/ready + /health/active all return 200; service
survives SSH session close and auto-starts on boot per the AUTO_START
flag set by the installer script.
2026-05-26 17:07:52 -04:00
Joseph Doherty 7dfbca6469 feat(opcua): materialise SystemPlatform tags (Galaxy) as OPC UA variables
v2-ci / build (push) Failing after 47s
Closes the gap where Tag rows with EquipmentId=NULL + Namespace.Kind=SystemPlatform
(Galaxy hierarchy) existed in ConfigDb but were never surfaced in the OPC UA
address space. Now they materialise as Variable nodes under a folder named for
their FolderPath, browseable through any OPC UA client.

Layers touched:

- IOpcUaAddressSpaceSink: new EnsureVariable(nodeId, parentFolderId, displayName,
  dataType) signature on the sink interface, NullSink, DeferredSink, SdkSink.
- OtOpcUaNodeManager.EnsureVariable: creates a BaseDataVariableState parented
  under the named folder (or root), initial Value=null +
  StatusCode=BadWaitingForInitialData; resolves Tag.DataType strings to the
  matching OPC UA built-in NodeId. Idempotent.
- Phase7CompositionResult: new GalaxyTags collection of GalaxyTagPlan records
  carrying (TagId, DriverInstanceId, FolderPath, DisplayName, DataType,
  MxAccessRef). Constructor overloads keep existing call sites compiling.
- Phase7Composer.Compose: now takes Tag + Namespace inputs, filters for
  SystemPlatform-namespace tags with EquipmentId=NULL, emits GalaxyTagPlan
  rows with MXAccess ref "FolderPath.Name".
- Phase7Plan: new AddedGalaxyTags / RemovedGalaxyTags / ChangedGalaxyTags
  collections + GalaxyTagDelta record; IsEmpty + needsRebuild updated.
- Phase7Planner.Compute: diffs GalaxyTags by TagId via existing DiffById helper.
- DeploymentArtifact.ParseComposition: reads the Tags + Namespaces +
  DriverInstances arrays the ConfigComposer already emits, applies the same
  SystemPlatform filter, returns the same GalaxyTagPlan list as the composer
  so artifact-side and compose-side plans agree.
- Phase7Applier: new MaterialiseGalaxyTags pass that ensures one folder per
  distinct FolderPath then one Variable per tag. NodeId for the variable is
  "<FolderPath>.<Name>" matching the MXAccess ref so the future Galaxy
  SubscribeBulk wiring can address them directly.
- OpcUaPublishActor.RebuildAddressSpace: invokes MaterialiseGalaxyTags after
  MaterialiseHierarchy. _lastApplied initialiser updated for the new ctor.
- seed-clusters.sql: pre-existing TestMachine_001.TestAlarm001..003 rows
  needed no change — the composer/applier now picks them up automatically.

Verified end-to-end via docker-dev: deploy click → driver-a logs
"Phase7Applier: Galaxy tags materialised (tags=3, folders=1)" → OPC UA Client
CLI browses the three Variable nodes under TestMachine_001 folder. Reads
return BadWaitingForInitialData status (expected — Galaxy driver's
SubscribeBulk wiring to push values into the nodes is the remaining
follow-up).
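
Illustrative sketch (not part of the commit; Python is used for brevity, the project itself is C#): the composer filter, the "FolderPath.Name" MXAccess ref, and the diff-by-TagId step described above might look roughly like this. The Tag dataclass, galaxy_tag_plans, diff_by_id, and the non-SystemPlatform kind string are hypothetical stand-ins, not the real Phase7Composer or DiffById code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    # Hypothetical, simplified view of a ConfigDb Tag row joined to its Namespace.
    tag_id: str
    name: str
    folder_path: str
    namespace_kind: str
    equipment_id: Optional[str]
    data_type: str = "Boolean"


def galaxy_tag_plans(tags):
    """Keep SystemPlatform tags with no equipment and derive the MXAccess ref."""
    plans = {}
    for t in tags:
        if t.namespace_kind == "SystemPlatform" and t.equipment_id is None:
            plans[t.tag_id] = {
                "mx_ref": f"{t.folder_path}.{t.name}",
                "data_type": t.data_type,
            }
    return plans


def diff_by_id(old, new):
    """Return (added, removed, changed) id lists between two plan dicts."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

Keying both sides by TagId is what lets the artifact-side and compose-side plans be compared directly: identical inputs produce an empty diff, so no rebuild is triggered.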
2026-05-26 15:43:22 -04:00
Joseph Doherty 44b8a9c7ff fix(deploy): ClusterNode NodeId uses host:port + Traefik sticky cookie
v2-ci / build (push) Failing after 41s
Two bring-up issues found while clicking through the operator Deploy flow
on the docker-dev stack:

- ConfigPublishCoordinator computes expected-ack NodeIds from
  Akka.Cluster.State.Members as "{host}:{port}" (e.g. "driver-a:4053") to
  match ClusterRoleInfo's NodeId derivation. The seed had been using the
  bare service name ("driver-a"), so NodeDeploymentState INSERT hit FK
  violation 547 on NodeDeploymentState.NodeId → ClusterNode.NodeId. Seed
  now writes the full host:port form for every ClusterNode row.

- Blazor Server uses SignalR (WebSocket upgrade after the initial GET).
  Without sticky sessions, Traefik round-robins admin-a/admin-b and the
  WebSocket upgrade lands on the wrong backend, returning "No Connection
  with that ID: Status code '404'" so @onclick handlers never fire on the
  client. Added sticky.cookie (otopcua_lb, SameSite=Lax) to all three
  Traefik service loadBalancers so each session pins to one node.

Verified end-to-end: clicked "Deploy current configuration" on
/deployments → Deployment row sealed in ~70ms → driver-a + driver-b
spawn GalaxyMxGateway driver (stub=False) → GalaxyDriver connects to
http://10.100.0.48:5120 with the seeded ApiKeySecretRef=env:GALAXY_MXGW_API_KEY.
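
Illustrative sketch (not part of the commit; Python is used for brevity, the project itself is C#): the NodeId mismatch in the first bullet can be reproduced with a small check. The function names are hypothetical; only the "{host}:{port}" format and the driver-a:4053 example come from the message above.

```python
def expected_ack_node_ids(members):
    """Build NodeIds as "{host}:{port}" from (host, port) membership pairs."""
    return {f"{host}:{port}" for host, port in members}


def missing_cluster_nodes(members, seeded_node_ids):
    """Return expected NodeIds with no matching seeded ClusterNode row.

    Any id returned here is one whose NodeDeploymentState insert would
    violate the foreign key to ClusterNode.NodeId.
    """
    return sorted(expected_ack_node_ids(members) - set(seeded_node_ids))


members = [("driver-a", 4053), ("driver-b", 4053)]
print(missing_cluster_nodes(members, ["driver-a", "driver-b"]))            # old seed
print(missing_cluster_nodes(members, ["driver-a:4053", "driver-b:4053"]))  # fixed seed
```

With the bare service names every expected id is reported as missing; with the host:port form the list is empty.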
2026-05-26 15:10:11 -04:00
Joseph Doherty 60beb9128e feat(deploy,runtime): wire mxaccessgw connection — endpoint, key, seed row
v2-ci / build (push) Failing after 37s
User confirmed the mxaccessgw client (Galaxy driver) doesn't need Windows
— only the gateway worker has that constraint. This wires the Galaxy
driver into the docker-dev fleet:

- docker-compose.yml: GALAXY_MXGW_API_KEY env var on every host service
  (admin nodes harmlessly ignore it; driver-role nodes pick it up when
  the seeded DriverInstance resolves ApiKeySecretRef=env:GALAXY_MXGW_API_KEY).
  Default value matches the key the operator provided; override via shell
  env (GALAXY_MXGW_API_KEY=... docker compose up -d) to rotate without
  editing compose.
- seed-clusters.sql: now creates a SystemPlatform Namespace
  (MAIN-galaxy, urn:zb:docker-dev:galaxy) plus a GalaxyMxGateway
  DriverInstance (MAIN-galaxy-mxgw) in the MAIN cluster pointing at
  http://10.100.0.48:5120 with UseTls=false. Idempotent via IF NOT EXISTS.
- DriverInstanceActor.ShouldStub: clarified the doc comment — only the
  legacy "Galaxy" type name and "Historian.Wonderware" are Windows-only;
  the v2 "GalaxyMxGateway" driver is .NET 10 cross-platform (gRPC to an
  external gateway) and is NOT stubbed.
- README: documents the final operator step — sign in, click "Deploy
  current configuration" on /deployments to materialise the seeded
  Galaxy driver into a running gRPC connection. Raw DriverInstance rows
  don't spawn drivers on their own; the v2 lifecycle requires a sealed
  Deployment first.
2026-05-26 14:58:02 -04:00
Joseph Doherty 6884de9774 revert(adminui): restore 'OtOpcUa Admin' login title
v2-ci / build (push) Failing after 44s
User chose to revert the MxAccess Gateway rebrand on the login card. Keep
the layout fix from c064ec1 (no panel-head top strip; inline h1.login-title)
and just put the original product name back.
2026-05-26 14:50:06 -04:00
Joseph Doherty c064ec16cf fix(security,adminui): logout redirects to /login + restyle login card
v2-ci / build (push) Failing after 41s
Two small UX fixes:

- AuthEndpoints.LogoutAsync now redirects browser callers to /login after
  SignOutAsync instead of returning 204 NoContent. 204 was correct for the
  REST contract but left browsers stuck on the page they came from (the
  cookie was cleared but no navigation happened, so "Sign out" appeared
  to do nothing). API callers can still opt into the status-only behavior
  by sending `Accept: application/json`.

- Login.razor drops the .panel-head top strip; the sign-in card now reads
  as a self-contained form with an inline title "MxAccess Gateway Admin —
  sign in". Added a .login-title CSS class to site.css that matches the
  panel-head's typographic weight without the bar.
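
Illustrative sketch (not part of the commit; Python is used for brevity, the project itself is C#): the logout dispatch in the first bullet amounts to a small content-negotiation rule. The function name and the 302 status code are assumptions; the message above only states that browsers are redirected to /login and that `Accept: application/json` keeps the status-only 204.

```python
def logout_response(accept_header: str = ""):
    """Return (status_code, location) to send after the sign-out has happened.

    Assumption: the redirect uses 302; the commit does not name the status code.
    """
    # Strip parameters such as ";q=0.9" and compare media types case-insensitively.
    media_types = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "application/json" in media_types:
        return 204, None      # API caller: status only, no navigation
    return 302, "/login"      # browser caller: navigate back to the login page
```

A missing or HTML-oriented Accept header falls through to the redirect, which matches the stated goal that "Sign out" visibly does something in a browser.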
2026-05-26 14:47:53 -04:00
Joseph Doherty ed1c17bc7b fix(deploy,host): docker-dev bring-up — anon health probes, robust seeder
v2-ci / build (push) Failing after 32s
Two fixes surfaced while bringing up the docker-dev stack end-to-end:

- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
  /health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
  every probe and Traefik marks every backend unhealthy → all three cluster
  routes return 503.

- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
  sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
  IF NOT EXISTS BEGIN ... END blocks (CREATE PROCEDURE must be the first
  statement in its batch),
  so sqlcmd fails with "Must declare the scalar variable @FromGenerationId".
  EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
  the schema and applies row inserts. Migrations remain the operator's job:
      dotnet ef database update --project src/Core/.../Configuration \
                                --startup-project src/Server/.../Host

Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).

Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.
2026-05-26 14:37:01 -04:00
Joseph Doherty 1e64488c0d Merge branch 'v2-gap-closeout' — close audit gaps + dev-UX polish
v2-ci / build (push) Failing after 47s
Closes the four gaps from the 2026-05-26 hosting-alignment audit and
adds the supporting dev infrastructure that surfaced while smoke-testing
the fused Host:

Audit-gap closure:
  - feat(host): per-role appsettings overlays for admin / driver / admin-driver
  - feat(opcua): OpcUaApplicationHost.PeerApplicationUris populates Server.ServerArray
    via IServerInternal.ServerUris.Append; unit test + new OpcUaServer.IntegrationTests
    project carrying DualEndpointTests (real OPCFoundation client reads both peer URIs)
  - refactor(test): rename FailoverScenarioTests → FailoverDuringDeployTests
  - chore(cleanup): remove stale bin/obj shells for deleted v1 Server/Admin projects
  - ci(v2): integration matrix now runs both Host.IntegrationTests and
    OpcUaServer.IntegrationTests

Doc audit + refresh:
  - 3 commits rewriting stale paths and adding v2 architecture coverage across
    Redundancy / ServiceHosting / Cluster / OpcUaServer / security / Architecture-v2
    / v2-release-readiness / phase-7-status / README and 7 driver-touched doc files

Dev-UX (surfaced while smoke-testing in Chrome):
  - fix(host,security): UseStaticWebAssets, MapStaticAssets().AllowAnonymous,
    AddCascadingAuthenticationState, ILdapAuthService Scoped→Singleton,
    /auth/login Content-Type dispatch + DisableAntiforgery, real LdapOptions.DevStubMode
  - feat(adminui): ScadaLink-style sidebar — drop the top app-bar, brand in side rail,
    collapsible NavSection sections with cookie state (otopcua_nav), new LoginLayout
    (no rail), NavSidebar as the interactive island so MainLayout stays static-rendered
  - fix(adminui): refresh stale F9 stub copy on /alerts page

docker-dev deployment:
  - feat(deploy): add site-a + site-b 2-node clusters (fused admin+driver) — three
    isolated Akka meshes (disjoint seed lists) sharing the single OtOpcUa ConfigDb;
    Traefik routes via Host(`site-a.localhost`) / Host(`site-b.localhost`)
  - feat(deploy): one-shot cluster-seed Compose service applies an idempotent SQL
    seed (3 ServerCluster rows + 6 ClusterNode rows) so operators don't have to
    pre-populate via the Admin UI on every fresh bring-up

19 commits, all conventional-commits format. Branch was pushed and reviewed on
gitea before the merge.
2026-05-26 14:08:29 -04:00
Joseph Doherty f02071c9a2 feat(deploy): bake the ServerCluster/ClusterNode seed into docker-compose
Adds a one-shot cluster-seed service to docker-dev/docker-compose.yml
that pre-populates the three Akka clusters' scope rows in the shared
OtOpcUa ConfigDb so operators don't have to click through /clusters +
/hosts on every fresh bring-up.

Seed contents:
  ServerCluster   MAIN (Warm/2), SITE-A (Warm/2), SITE-B (Warm/2)
  ClusterNode     driver-a + driver-b  → MAIN
                  site-a-1 + site-a-2  → SITE-A
                  site-b-1 + site-b-2  → SITE-B

NodeCount + RedundancyMode honour the CK_ServerCluster check constraint.
ApplicationUri follows the urn:OtOpcUa:<NodeId> convention; uniqueness
across the fleet satisfies UX_ClusterNode_ApplicationUri.

Mechanism:
  - docker-dev/seed/seed-clusters.sql — idempotent INSERTs (IF NOT EXISTS
    guards on every row).
  - docker-dev/seed/entrypoint.sh — bash wrapper that waits for SQL to
    accept connections, then polls until dbo.ServerCluster exists (the
    host containers' EF auto-migration creates it on first boot), then
    applies the SQL script.
  - cluster-seed service uses mcr.microsoft.com/mssql-tools as the base
    image (bash + sqlcmd available), restart: "no" so it runs once.

Re-running `docker compose up` is safe: the seed exits cleanly on the
second run because every INSERT is guarded.

Manual re-seed: `docker compose run --rm cluster-seed`.
2026-05-26 14:06:47 -04:00
Joseph Doherty 993e012e55 fix(deploy): site clusters share the single OtOpcUa ConfigDb
The previous commit (961e094) gave each site cluster its own database
(OtOpcUa_SiteA / OtOpcUa_SiteB). That fights the architecture — ConfigDb
is multi-tenant by design: one schema with a ServerCluster table whose
rows scope the rest of the configuration via ClusterId. Per-cluster
databases would split the schema and force every singleton/coordinator
to point at a different connection string.

Correct model: one ConfigDb, three ServerCluster rows (MAIN / SITE-A /
SITE-B), each Akka cluster's ClusterNode rows pointing back at the
matching ClusterId. Akka mesh isolation is still enforced by the
disjoint seed-node lists (unchanged from the previous commit).

Compose: all eight host nodes now point at Server=sql,1433;Database=OtOpcUa
and the README documents the post-boot ServerCluster + ClusterNode rows
operators need to create via /clusters and /hosts before the runtime can
resolve its scope.
2026-05-26 14:02:24 -04:00
Joseph Doherty 961e09430a feat(deploy): add site-a + site-b 2-node clusters to docker-dev
Extends the docker-dev compose with two additional, fully-isolated Akka
clusters representing distinct sites. Each site is a 2-node fused
admin+driver cluster (OTOPCUA_ROLES=admin,driver on both nodes), backed
by its own ConfigDb database so configuration state stays separate from
the main cluster and from the other site.

Cluster isolation: the three meshes share the same Akka system name
"otopcua" and remoting port 4053 (inside each container's own network
namespace), but their seed-node lists are disjoint — main seeds at
admin-a, site-a seeds at site-a-1, site-b seeds at site-b-1 — so gossip
doesn't cross between them.
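
Illustrative sketch (not part of the commit; Python is used for brevity, the project itself is C#): the isolation argument above rests on the seed-node lists being pairwise disjoint, which can be checked mechanically. SEED_NODES and overlapping_seeds are hypothetical names; the hosts and port come from the paragraph above.

```python
from itertools import combinations

# Seed hosts per cluster as described above; 4053 is the shared remoting port.
SEED_NODES = {
    "main": {"admin-a:4053"},
    "site-a": {"site-a-1:4053"},
    "site-b": {"site-b-1:4053"},
}


def overlapping_seeds(seed_nodes):
    """Return (cluster, cluster, shared_seeds) for every pair sharing a seed."""
    return [
        (a, b, sorted(seed_nodes[a] & seed_nodes[b]))
        for a, b in combinations(sorted(seed_nodes), 2)
        if seed_nodes[a] & seed_nodes[b]
    ]
```

An empty result means no cluster lists another cluster's seed, which is the condition the paragraph relies on even though the system name and port are shared.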

Layout:
  Main cluster   ConfigDb=OtOpcUa        admin-a, admin-b, driver-a, driver-b
  Site A         ConfigDb=OtOpcUa_SiteA  site-a-1, site-a-2 (fused admin+driver)
  Site B         ConfigDb=OtOpcUa_SiteB  site-b-1, site-b-2 (fused admin+driver)

OPC UA endpoints exposed on host ports 4840-4845. Admin UIs reachable
through Traefik via Host-header routing:
  http://localhost               → main cluster (PathPrefix default)
  http://site-a.localhost        → site A
  http://site-b.localhost        → site B

`*.localhost` auto-resolves on macOS; Linux users add the two hosts to
/etc/hosts (or rely on the resolver's RFC 6761 behaviour).
2026-05-26 13:59:23 -04:00
Joseph Doherty a1a7646b33 fix(adminui): refresh stale F9 stub copy on /alerts page
ScriptedAlarmActor (Runtime/ScriptedAlarms) shipped a while back — the
"Engine wiring (F9 ScriptedAlarmActor) is pending" stub message was
misleading. Also drop the matching "(F9)" / "(future)" parentheticals
in the intro panel and frame the empty state as a current-window
condition, not a missing feature.
2026-05-26 13:53:09 -04:00
Joseph Doherty e4d0d82f7f feat(adminui): collapsible nav sidebar with cookie state + LoginLayout
Port the ScadaLink CentralUI sidebar pattern into the OtOpcUa AdminUI:

- Drop the top app-bar. Brand moves into the side rail's header — same
  visual rhythm as ScadaLink's NavMenu.
- New NavSection.razor: collapsible eyebrow toggle (rail-eyebrow-toggle CSS)
  with a chevron + label. Mirrors ScadaLink/Components/Layout/NavSection.
- New NavSidebar.razor: interactive island carrying the three section
  groups (Navigation / Scripting / Live) + session block. Marked
  @rendermode InteractiveServer; MainLayout itself stays static-rendered
  because layouts can't take a RenderFragment Body across an interactive
  boundary.
- New wwwroot/js/nav-state.js: window.navState.get/.set persists the
  expanded-section list to the otopcua_nav cookie (one-year lifetime,
  SameSite=Lax). Same shape as ScadaLink's scadabridge_nav.
- New LoginLayout.razor + @layout LoginLayout on Login.razor: the login
  page now renders without the side rail — clean centred card.
- MainLayout.razor: slimmed down to the d-flex shell + hamburger toggle +
  <NavSidebar/> + @Body.
- Login.razor: also drops the trailing "LDAP bind against the configured
  directory..." footer that the user asked to remove.
- site.css: adds .side-rail .brand styles (mirrored from ScadaLink) and
  the .rail-eyebrow-toggle / .rail-eyebrow-chevron / .rail-section-body
  styles for the new collapsible UI.

Auto-expand on page load: NavSidebar seeds the expanded set from the
current URL's first path segment (in OnInitialized so it works even on
the very first server render) and from the cookie (in OnAfterRenderAsync
once JS interop is available). LocationChanged hooks keep the expanded
state in sync as the user navigates between sections.
2026-05-26 13:48:35 -04:00
Joseph Doherty 2915755a7c fix(host,security): wire static assets, DI lifetimes, form login, dev-stub LDAP
Six interlocking fixes surfaced while smoke-testing the fused Host in a browser:

- Host/Program.cs: UseStaticWebAssets() opts into the RCL static-asset pipeline
  in any environment (auto-only in Development), MapStaticAssets().AllowAnonymous()
  exempts CSS/JS from the AddOtOpcUaAuth fallback policy, and
  AddCascadingAuthenticationState() lets <AuthorizeView/> work inside interactive
  components (NavSidebar's session block).
- Security/ServiceCollectionExtensions: ILdapAuthService Scoped → Singleton —
  consumed by the Singleton LdapOpcUaUserAuthenticator on driver-role nodes.
  Crash only surfaced in Development (ValidateOnBuild=true).
- Security/Endpoints/AuthEndpoints: /auth/login now dispatches on Content-Type —
  application/json keeps the original 204/401/503 contract for tests, and
  application/x-www-form-urlencoded (the browser <form>) gets a redirect dance.
  DisableAntiforgery on the login endpoint (it's the entry point, no prior session)
  and AllowAnonymous to override the fallback policy.
- Security/Ldap/LdapOptions + LdapAuthService: real DevStubMode property; when
  true the auth service bypasses the LDAP bind and returns a FleetAdmin role so
  dev/test can navigate the full Admin UI without GLAuth running.
- AdminUI/EndpointRouteBuilderExtensions: doc-comment update about static-asset
  flow (the actual MapStaticAssets call lives in Host/Program.cs).
2026-05-26 13:48:18 -04:00
Joseph Doherty a5c6ce279e docs(v2): finish path corrections in phase-7-status, admin-ui, OpcUaClient fixture 2026-05-26 12:09:47 -04:00
Joseph Doherty 59b3d9f295 docs: rewrite stale src/Server/Server|Admin/ paths to v2 project locations 2026-05-26 12:06:59 -04:00
Joseph Doherty 89095c15e3 docs(v2): update for gap-closeout — peer-URI discovery, role overlays, release status 2026-05-26 11:58:06 -04:00
Joseph Doherty bdae749b2b docs(plans): mark gap-closeout tasks complete 2026-05-26 11:48:05 -04:00
Joseph Doherty e8c4f18607 ci(v2): include OpcUaServer.IntegrationTests in integration matrix 2026-05-26 11:42:44 -04:00
Joseph Doherty cb936db7d6 fix(opcua): PopulateServerArray writes IServerInternal.ServerUris so clients see peers 2026-05-26 11:39:44 -04:00
Joseph Doherty a5412c16a3 fix(test): align DualEndpointTests SDK to 1.5.374.126 + sync API 2026-05-26 11:34:01 -04:00
Joseph Doherty dce2528c68 test(opcua): DualEndpointTests — real client reads peer URIs from Server.ServerArray 2026-05-26 11:29:53 -04:00
Joseph Doherty 83eda9e826 test(opcua): scaffold OtOpcUa.OpcUaServer.IntegrationTests project 2026-05-26 11:23:21 -04:00
Joseph Doherty 70ffd2849d feat(opcua): OpcUaApplicationHost publishes peer URIs in Server.ServerArray 2026-05-26 11:21:11 -04:00
Joseph Doherty 898a47746d feat(host): add per-role appsettings overlays for admin/driver/admin-driver 2026-05-26 11:19:10 -04:00
Joseph Doherty 25ce111981 refactor(test): rename FailoverScenarioTests → FailoverDuringDeployTests for plan parity 2026-05-26 11:18:13 -04:00
Joseph Doherty 7209bc99e2 docs(plans): gap-closeout plan + task persistence file 2026-05-26 11:15:59 -04:00
Joseph Doherty 2c49f18442 Merge branch 'v2-akka-fuse' — Akka + fused-host v2 architecture
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
129 commits implementing the v2 plan in full plus every load-bearing
follow-up. v2-akka-fuse is feature-complete and 210 tests green at
05a0596.

Architecture
- Single fused-host process (OtOpcUa.Host) replacing the v1 multi-process
  Server + Admin + Galaxy.Host split. Roles (admin, driver, historian)
  gate which Akka actors + ASP.NET surfaces wire up at boot.
- Akka.NET cluster (DistributedPubSub for fleet topics) with singleton
  AdminOperationsActor + ConfigPublishCoordinator on admin-role nodes;
  DriverHostActor + per-driver DriverInstanceActor + VirtualTagActor +
  ScriptedAlarmActor + OpcUaPublishActor on driver-role nodes.
- New AdminUI Razor class library (~42 pages, single-page edit-or-create
  + RowVersion concurrency) replaces the 47 legacy admin pages.

Production data path (end-to-end)
- ControlPlane composes deployment artifact → DistributedPubSub dispatch
  → DriverHostActor reconciles drivers → DriverInstanceActor binds real
  IDriver instances (read/subscribe/write) → AttributeValueUpdate flows
  to OpcUaPublishActor → SDK NodeManager writes visible to OPC UA
  clients with proper UNS Area/Line/Equipment folder hierarchy.

Security
- OPC UA transport: None / Basic256Sha256-Sign / SignAndEncrypt all
  exposed; auto-accept-untrusted-cert option for dev.
- LDAP-bound UserName auth via ImpersonateUser handler (same
  ILdapAuthService as Admin cookie/JWT).
- Cert auto-creation in PKI tree on first start.

Observability
- OtOpcUaTelemetry Meter + ActivitySource; 6 counters + histogram + 2
  spans across deploy / driver-lifecycle / virtual-tag-eval / alarm-
  transition / sink-write / service-level paths. Prometheus exporter
  mounted at /metrics.

Engines (production)
- RoslynVirtualTagEvaluator + RoslynScriptedAlarmEvaluator: compile
  user-script bodies through Core.Scripting sandbox, cache per
  expression, surface failures as Failure results without throwing.

Redundancy
- ServiceLevel through SdkServiceLevelPublisher → ServerObject.ServiceLevel
  so clients see the real role-derived byte (240 primary-leader,
  100 secondary).

Tests
- 210 v2 tests across Cluster (15), ControlPlane (29), Runtime (74),
  Security (27), OpcUaServer (48), Host.IntegrationTests (17). Plus
  2-node integration harness covering deploy + failover scenarios.

See docs/plans/2026-05-26-akka-hosting-alignment-plan.md for the full
task list (66/66) and
docs/plans/2026-05-26-akka-hosting-alignment-design.md for the design.
2026-05-26 11:00:53 -04:00
Joseph Doherty 05a0596fb1 feat(host): F9b RoslynScriptedAlarmEvaluator + #107 close engine DI
v2-ci / build (push) Failing after 39s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
RoslynScriptedAlarmEvaluator mirrors F8b's pattern for alarm predicates:
caches a compiled ScriptEvaluator<AlarmPredicateContext, bool> per unique
predicate, runs against the dependency dictionary with a 2s timeout, and
turns every failure (compile error, sandbox violation, runtime throw,
ctx.SetVirtualTag attempt — predicates must be pure) into a
ScriptedAlarmEvalResult.Failure. ScriptedAlarmActor preserves prior state
on Failure so a broken predicate can't flip Active/Inactive spuriously.

Program.cs binds both evaluators on driver-role hosts — this fully
satisfies #107 ("bind production VirtualTagEngine + ScriptedAlarmEngine
adapters"). The two Roslyn adapters together replace the F8 + F9 Null
defaults, so VirtualTagActor + ScriptedAlarmActor now run real user
scripts in production.

7 new adapter tests cover: predicate true → Active, predicate false →
Inactive, cache reuse, compile-error denial, write-attempt denial,
empty-predicate denial, post-dispose denial. Host.IntegrationTests now
17/17 green.

Closes #80 + #107. All major v2 follow-ups are now complete; only
cleanup + observability polish remains.
2026-05-26 10:58:04 -04:00
Joseph Doherty 219d10a22d feat(host): F8b RoslynVirtualTagEvaluator — production virtual-tag eval
RoslynVirtualTagEvaluator wraps Core.Scripting.ScriptEvaluator +
Core.VirtualTags.VirtualTagContext into a single-tag IVirtualTagEvaluator
adapter. Caches the compiled ScriptEvaluator per unique expression so
the second-and-onwards Evaluate is an in-process method call against the
dependency dictionary. Compile/sandbox/runtime errors all surface as
VirtualTagEvalResult.Failure rather than propagating exceptions through
the VirtualTagActor message loop.
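
The cache-then-evaluate behaviour described above can be sketched roughly as
follows. This is a language-neutral illustration in Python, not the project's
C# adapter: EvalResult and CachingEvaluator are hypothetical names, and
Python's built-in compile/eval merely stand in for the sandboxed
ScriptEvaluator (they provide no real sandboxing).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical stand-in for a success-or-failure evaluation result."""
    ok: bool
    value: Any = None
    error: Optional[str] = None


class CachingEvaluator:
    """Compiles each unique expression once and never raises to the caller."""

    def __init__(self) -> None:
        self._cache = {}          # expression text -> compiled code object
        self.compile_count = 0    # exposed so cache reuse can be observed

    def evaluate(self, expression: str, deps: dict) -> EvalResult:
        if not expression or not expression.strip():
            return EvalResult(ok=False, error="empty expression")
        try:
            code = self._cache.get(expression)
            if code is None:
                code = compile(expression, "<virtual-tag>", "eval")
                self.compile_count += 1
                self._cache[expression] = code
            # The dependency dictionary supplies the only visible names.
            value = eval(code, {"__builtins__": {}}, dict(deps))
            return EvalResult(ok=True, value=value)
        except Exception as exc:  # compile and runtime errors alike
            return EvalResult(ok=False, error=f"{type(exc).__name__}: {exc}")
```

Returning a result object instead of raising keeps the caller's message
handler free of try/catch logic, which is the design choice the paragraph
above describes.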

Single-tag scope: cross-tag ctx.SetVirtualTag writes are dropped + logged
because fan-out between actors is owned by DependencyMuxActor. Cycle
detection + cascade ordering stay in Core.VirtualTags.VirtualTagEngine
where they belong (loaded fleet-wide); this adapter keeps the actor
message handler simple.

Host adds Core.Scripting + Core.VirtualTags project refs, plus a
TargetWarningsAsErrors NU1608 suppression:
Microsoft.CodeAnalysis.CSharp.Scripting 4.12.0 pins Common to 4.12.0 but
ASP.NET Core transitively
brings Microsoft.CodeAnalysis.Common 5.0.0; the surface we use is stable
across the drift (verified by Core.Scripting.Tests).

Program.cs binds RoslynVirtualTagEvaluator → IVirtualTagEvaluator on
driver-role hosts, replacing the F8-default NullVirtualTagEvaluator so
VirtualTagActor evaluates real user scripts at runtime.

6 new adapter tests cover: simple expression sums, cache reuse across
calls, compile-error denial, runtime-throw denial, empty-expression
denial, post-dispose denial. Host.IntegrationTests now 10/10 green.

Closes #79. F9b + #107 next.
2026-05-26 10:55:56 -04:00
Joseph Doherty 607dc51dec feat(opcua): #85 UNS Area/Line/Equipment folder hierarchy in SDK
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Phase7Composer now carries UnsAreaProjection + UnsLineProjection lists so
the applier can materialise the full UNS topology in the OPC UA address
space. New IOpcUaAddressSpaceSink.EnsureFolder(folderNodeId, parentNodeId,
displayName) seam (no-op default, recorded in tests, forwarded by
DeferredAddressSpaceSink, implemented by SdkAddressSpaceSink). The SDK-
side OtOpcUaNodeManager gains an EnsureFolder API that creates
FolderState nodes with proper parent linkage; RebuildAddressSpace now
clears folders too so re-applies don't accumulate stale topology.

Phase7Applier.MaterialiseHierarchy walks composition.UnsAreas →
composition.UnsLines → composition.EquipmentNodes, calling EnsureFolder
with the correct parent at each level. Idempotent — calling twice with
the same composition is a no-op. OpcUaPublishActor.HandleRebuild invokes
it after Phase7Applier.Apply so OPC UA clients browsing the server now
see Area/Line/Equipment as proper folders rather than flat tag ids.
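
The Area → Line → Equipment walk with an idempotent EnsureFolder can be
pictured with the sketch below. It is a Python illustration under
assumptions, not the Phase7Applier code: the record field names, the "root"
parent id, and FolderSink are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    area_id: str
    name: str


@dataclass(frozen=True)
class Line:
    line_id: str
    area_id: str
    name: str


@dataclass(frozen=True)
class Equipment:
    equipment_id: str
    line_id: str
    name: str


class FolderSink:
    """Hypothetical sink: ensure_folder creates a folder only if absent."""

    def __init__(self):
        self.folders = {}   # folder id -> (parent id, display name)
        self.created = 0

    def ensure_folder(self, folder_id, parent_id, display_name):
        if folder_id in self.folders:
            return          # already present, so repeat calls are no-ops
        self.folders[folder_id] = (parent_id, display_name)
        self.created += 1


ROOT = "root"  # assumed id of the address-space root folder


def materialise_hierarchy(sink, areas, lines, equipment):
    """Create parents before children so every parent id already exists."""
    for area in areas:
        sink.ensure_folder(area.area_id, ROOT, area.name)
    for line in lines:
        sink.ensure_folder(line.line_id, line.area_id, line.name)
    for item in equipment:
        sink.ensure_folder(item.equipment_id, item.line_id, item.name)
```

Calling materialise_hierarchy twice with the same inputs leaves the sink
unchanged on the second pass, which mirrors the idempotency claim above.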

DeploymentArtifact.ParseComposition reads UnsAreas + UnsLines from the
JSON snapshot the ControlPlane emits, populating the new fields when
present.

Phase7Composer.Compose now accepts UnsAreas + UnsLines; a 3-arg overload
preserves the old signature for legacy callers + existing tests. The
Phase7CompositionResult convenience ctor likewise keeps the planner
tests working without UNS data.

3 new hierarchy tests (pure unit + boot-verify against a real
OtOpcUaSdkServer); OpcUaServer suite is 48/48 green (was 45, +3),
Runtime 74/74 unchanged.

Closes #85.
2026-05-26 10:48:56 -04:00
Joseph Doherty 9d86287d08 test(opcua): Task 60 ServiceLevel end-to-end through SDK
v2-ci / build (push) Failing after 49s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Boots a real StandardServer + OpcUaApplicationHost, wires
SdkServiceLevelPublisher into a DeferredServiceLevelPublisher (production
binding pattern), spawns OpcUaPublishActor against the deferred
publisher, sends RedundancyStateChanged snapshots, and asserts that
ServerObject.ServiceLevel.Value reflects the role-derived byte:

  Primary + RoleLeaderForDriver  → 240
  Secondary                      → 100

Together with the F13b endpoint-security tests (which already verify
ServerConfiguration.SecurityPolicies populates the three baseline
profiles), this closes Task 60's "dual-endpoint + ServiceLevel" scope.
Cross-node failover tests stay in the 2-node integration harness
(Task 59 / FailoverScenarioTests).

Runtime suite now 74 / 74 green (+2). Closes Task 60.
2026-05-26 10:40:58 -04:00
Joseph Doherty 2697af31d1 feat(opcua,host): #81 ServiceLevel SDK publisher
SdkServiceLevelPublisher writes Server.ServiceLevel through the SDK's
ServerObjectState — the standard OPC UA non-transparent-redundancy signal
clients use to pick a primary. Writes are guarded by DiagnosticsLock so
concurrent SDK diagnostics scans don't fight with our updates.

DeferredServiceLevelPublisher mirrors the DeferredAddressSpaceSink late-
binding pattern: Akka actors resolve IServiceLevelPublisher at construction,
hosted service swaps the SDK publisher in after StandardServer.Start. Host
Program.cs registers DeferredServiceLevelPublisher as the singleton bound
to IServiceLevelPublisher; OtOpcUaServerHostedService gets it injected and
fills it once IServerInternal is available.

Tests boot a real StandardServer on a free port (cross-platform), call
Publish, then verify ServerObject.ServiceLevel.Value reflects the write.
5 new tests; OpcUaServer suite now 45/45 green (was 40, +5).

Closes #81 residual. Unblocks Task 60 (OPC UA dual-endpoint + ServiceLevel
tests).
2026-05-26 10:37:42 -04:00
Joseph Doherty 52997ee164 feat(observability): F13d Prometheus + OpenTelemetry instrumentation
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter
+ ActivitySource so all instrumentation points emit through a single
named surface. Counters cover the hot paths:

  otopcua.deploy.applied               (outcome=ack|reject)
  otopcua.deploy.apply.duration        (s, histogram)
  otopcua.driver.lifecycle             (event=spawn|spawn_stub|stop|fault)
  otopcua.virtualtag.eval              (outcome=ok|fail|skip)
  otopcua.scriptedalarm.transition     (state=activated|acknowledged|cleared)
  otopcua.opcua.sink.write             (kind=value|alarm|rebuild)
  otopcua.redundancy.service_level_change (level=byte)

Plus two ActivitySource spans:

  otopcua.deploy.apply                 wraps DriverHostActor.ApplyAndAck
  otopcua.opcua.address_space_rebuild  wraps OpcUaPublishActor.HandleRebuild

Instruments are no-op until a listener attaches, so tests + dev hosts
pay nothing for unread telemetry.

Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter
+ ActivitySource to OpenTelemetry, attaches a Prometheus exporter) and
MapOtOpcUaMetrics() (mounts /metrics scrape endpoint). Driver-side
internals + ASP.NET request metrics deliberately stay off — the scrape
payload is scoped to OtOpcUa signals only.

Tests use MeterListener + ActivityListener to verify
VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and
RebuildAddressSpace actually emit on the central instruments. Runtime
suite is 72 / 72 green (+3).

Closes #105. Path A (F13b/c/d) complete; next batch options: #85 UNS
folder hierarchy in SDK, or F8b/F9b production engine bindings.
2026-05-26 10:29:40 -04:00
Joseph Doherty 21eac21409 feat(opcua,host): F13c LDAP-bound UserName validator
Adds IOpcUaUserAuthenticator seam in OpcUaServer.Security with a deny-all
NullOpcUaUserAuthenticator default. OpcUaApplicationHost subscribes to
SessionManager.ImpersonateUser after _application.Start so UserName tokens
flow through the authenticator and either attach a UserIdentity to the
session (Allow) or set IdentityValidationError = BadIdentityTokenRejected
(Deny / authenticator exception). Anonymous + X509 tokens fall through to
SDK defaults.

LdapOpcUaUserAuthenticator (Host project) bridges to the same
ILdapAuthService that AddOtOpcUaAuth uses for Admin cookies / JWT, so a
single LDAP source-of-truth governs both Admin control plane and OPC UA
data plane. Program.cs registers LdapOptions + LdapAuthService +
IOpcUaUserAuthenticator on driver-role hosts; admin-only nodes are
unchanged.

OtOpcUaServerHostedService threads the resolved authenticator into
OpcUaApplicationHost so the seam respects Host DI.

10 new tests: 6 in OpcUaServer.Tests cover the pure HandleImpersonation
static method (success / denial / anonymous fallthrough / authenticator-
throw / null-username / Null authenticator); 4 in Host.IntegrationTests
cover the LdapOpcUaUserAuthenticator adapter (LDAP allow → Allow with
roles, LDAP deny → Deny, exception → backend-error denial, display-name
fallback). OpcUaServer suite is 40 / 40 green.

Closes #104. Unblocks Task 60 (dual-endpoint + ServiceLevel tests) once
#81 residual lands.
2026-05-26 10:21:37 -04:00
Joseph Doherty 8b08566f41 feat(opcua): F13b endpoint security profiles — Sign + SignAndEncrypt
OpcUaApplicationHost.BuildConfigurationAsync now populates
ServerConfiguration.SecurityPolicies + UserTokenPolicies from the new
OpcUaSecurityProfile enum on OpcUaApplicationHostOptions. Defaults expose
all three baseline profiles (None + Basic256Sha256-Sign +
Basic256Sha256-SignAndEncrypt) matching docs/security.md. UserName tokens
are SDK-encrypted with the server cert so they work on None endpoints too;
F13c will plug the LDAP validator into SessionManager.

AutoAcceptUntrustedClientCertificates surfaces as an option for dev flows;
production keeps the default (false) and operators promote rejected certs
through the Admin UI.

InternalsVisibleTo added so BuildSecurityPolicies / BuildUserTokenPolicies
stay encapsulated but unit-testable. 6 new tests cover the pure builders +
two boot-verify cases (3-profile default + hardened single-profile),
bringing the suite to 34 / 34 passing.

Closes #103. Unblocks #104 (F13c LDAP user-token validator).
2026-05-26 10:15:04 -04:00
Joseph Doherty 50787823d3 feat(host,runtime): #108 Host DI bindings — OPC UA server + deferred sink
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Wires the OPC UA SDK into the fused Host's lifecycle on driver-role
nodes + spawns OpcUaPublishActor with the proper sink/publisher/dbFactory/
applier resolution. The full read+write data path is now live in
production: Deploy → DriverHost → OpcUaPublish → SDK NodeManager →
subscribed OPC UA clients.

DeferredAddressSpaceSink (Commons.OpcUa):
  - Thread-safe IOpcUaAddressSpaceSink wrapper that delegates to an
    inner sink swapped in at runtime. Needed because Akka actors
    resolve the sink at construction time, but the production sink
    (SdkAddressSpaceSink wrapping OtOpcUaNodeManager) only exists
    after the SDK StandardServer has started.
  - Defaults to NullOpcUaAddressSpaceSink so calls before swap are
    safe; SetSink(null) reverts (for graceful shutdown).
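
The late-binding pattern above can be sketched as follows. This is a Python
illustration rather than the C# implementation; DeferredSink, NullSink,
ListSink and the single write_value method are hypothetical simplifications
of the real sink interface.

```python
import threading


class NullSink:
    """Swallows writes; used before the real sink exists and after shutdown."""

    def write_value(self, node_id, value):
        pass


class ListSink:
    """Hypothetical 'real' sink that records writes for inspection."""

    def __init__(self):
        self.writes = []

    def write_value(self, node_id, value):
        self.writes.append((node_id, value))


class DeferredSink:
    """Handed out at construction time; the inner sink is swapped in later."""

    def __init__(self):
        self._lock = threading.Lock()
        self._null = NullSink()
        self._inner = self._null

    def set_sink(self, sink):
        # Passing None reverts to the null sink (graceful shutdown).
        with self._lock:
            self._inner = sink if sink is not None else self._null

    def write_value(self, node_id, value):
        # Read the reference under the lock, call outside it so a slow
        # inner sink never blocks a concurrent swap.
        with self._lock:
            inner = self._inner
        inner.write_value(node_id, value)
```

Consumers hold only the DeferredSink, so they never need to know whether the
server behind it has started yet.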

OtOpcUaServerHostedService (Host.OpcUa):
  - IHostedService that owns the OPC UA SDK lifecycle. Reads
    OpcUaApplicationHostOptions from the 'OpcUa' config section,
    creates an OtOpcUaSdkServer, boots it through OpcUaApplicationHost,
    then swaps a real SdkAddressSpaceSink into the DeferredAddressSpaceSink
    singleton.
  - SDK boot failure is logged + non-fatal — the rest of the host
    (admin UI, driver actors) keeps running. Stop reverts to null sink.

WithOtOpcUaRuntimeActors (Runtime):
  - Now spawns OpcUaPublishActor (new actor) + threads its ActorRef
    into DriverHostActor's Props so successful applies trigger the
    address-space rebuild pipeline.
  - Phase7Applier is constructed here from the resolved sink + a
    logger; OpcUaPublishActor takes both.
  - Prepends the opcua-synchronized-dispatcher HOCON so the extension
    is self-contained — consumers (Host, tests) don't need to redeclare
    the dispatcher block.
  - New OpcUaPublishActorKey + OpcUaPublishActorName for actor-registry
    resolution.
  - AddOtOpcUaRuntime now also TryAddSingleton's NullOpcUaAddressSpaceSink
    + NullServiceLevelPublisher so admin-only nodes (or tests that
    don't bind the Deferred sink) stay safe.

Host.Program.cs (driver-role only):
  - Binds DeferredAddressSpaceSink as singleton + as IOpcUaAddressSpaceSink
  - AddHostedService<OtOpcUaServerHostedService>()

Tests: OpcUaServer 24 -> 28 (+4 DeferredAddressSpaceSink unit tests),
Runtime 69 -> 69 (existing ServiceCollectionExtensionsTests extended
to verify the new mux + publish actor registration).

All 6 v2 test suites green: 177 tests passing.

Closes #108. Engine-wiring is now production-bound end-to-end on
driver-role nodes — Deploy reaches real OPC UA Variable nodes that
subscribed clients see.
2026-05-26 10:02:15 -04:00
Joseph Doherty 7e22e2250c feat(runtime): #109 OpcUaPublishActor — load artifact, compose, plan-diff, apply
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Closes the loop between F10b (SDK NodeManager) and F14 (Phase7Plan +
Phase7Applier). DriverHostActor's successful apply now triggers a
RebuildAddressSpace on the publish actor, which loads the latest
deployment artifact + walks composer → planner → applier through the
sink. The OPC UA address space tracks the deployed composition.

DeploymentArtifact:
  - New ParseComposition(blob) → Phase7CompositionResult that decodes
    Equipment + DriverInstance + ScriptedAlarm arrays into the
    projection records Phase7Planner consumes. Pascal-case property
    names mirror ConfigComposer.SnapshotAndFlattenAsync's output.
  - Each entity reader is tolerant: missing-id rows are dropped,
    natural-key sort matches Phase7Composer's contract.

OpcUaPublishActor:
  - New Props params: dbFactory + applier. When wired, RebuildAddressSpace
    does:
      1. LoadLatestArtifact (most recent Sealed Deployment.ArtifactBlob)
      2. ParseComposition → Phase7CompositionResult
      3. Phase7Planner.Compute(lastApplied, next) → Phase7Plan
      4. Empty plan ⇒ no-op (deploy of unchanged composition is benign)
      5. applier.Apply(plan) drives sink.RebuildAddressSpace +
         WriteAlarmState for removed nodes
      6. lastApplied = next so the next rebuild diffs forward
  - Without dbFactory/applier wiring, falls back to raw
    sink.RebuildAddressSpace — the dev/Mac path before #108 binds prod.
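
The six rebuild steps above amount to a load, diff, apply loop. A minimal
Python sketch under assumptions (not the actor's C# code): the composition is
modelled as a plain set of node ids, and AddressSpaceRebuilder plus its three
injected callables are hypothetical names.

```python
class AddressSpaceRebuilder:
    """Diffs the latest composition against the last applied one."""

    def __init__(self, load_latest_artifact, parse_composition, apply_plan):
        self._load = load_latest_artifact    # () -> blob or None
        self._parse = parse_composition      # blob -> iterable of node ids
        self._apply = apply_plan             # (added, removed) -> None
        self._last_applied = frozenset()

    def rebuild(self):
        """Returns True when a non-empty plan was applied."""
        blob = self._load()
        if blob is None:
            return False                     # no artifact: nothing to do
        nxt = frozenset(self._parse(blob))
        added = nxt - self._last_applied
        removed = self._last_applied - nxt
        if not added and not removed:
            return False                     # empty plan: benign no-op
        self._apply(added, removed)
        self._last_applied = nxt             # next rebuild diffs forward
        return True
```

Updating the last-applied snapshot only after a successful apply means a
failed apply is retried in full on the next rebuild.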

DriverHostActor:
  - New Props param opcUaPublishActor (IActorRef?). After successful
    ApplyAndAck (status Applied, ACK sent), tells the publish actor
    RebuildAddressSpace with the same correlation id so the audit trail
    threads through. Null publish actor ⇒ no trigger (admin-only nodes).

Tests: Runtime 63 -> 69 (+6):
- ParseComposition reads Equipment/Driver/Alarm sorted by natural key
- ParseComposition returns empty for empty blob
- Rebuild with dbFactory + sealed deployment artifact triggers exactly
  one sink.Rebuild call (Equipment topology added)
- Rebuild with no artifact is idempotent no-op
- Second rebuild with same composition is empty-plan no-op
- Rebuild without dbFactory falls back to raw sink.Rebuild (legacy path)

All 6 v2 test suites green: 173 tests passing.

Closes #109. Engine-wiring data flow is now end-to-end through:
  Deploy → DriverHostActor.ApplyAndAck → driver spawn + ACK +
    RebuildAddressSpace → OpcUaPublishActor → Phase7Applier → SDK
    NodeManager → subscribed OPC UA clients see the change.
2026-05-26 09:55:11 -04:00
Joseph Doherty d21f6947e1 feat(opcua): F10b SDK NodeManager binding — real OPC UA address-space writes
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
OtOpcUaNodeManager + SdkAddressSpaceSink: the v2 IOpcUaAddressSpaceSink
seam now has a production adapter against a real Opc.Ua.Server
CustomNodeManager2. Writes through OpcUaPublishActor's sink materialise
as real OPC UA Variable updates that subscribed clients see via the
standard ClearChangeMasks notification path.

OtOpcUaNodeManager (CustomNodeManager2):
  - Owns a ConcurrentDictionary<string, BaseDataVariableState> under a
    single namespace (https://zb.com/otopcua/ns) hung off Objects/.
  - WriteValue lazy-creates the variable on first write, sets Value +
    StatusCode (mapped from OpcUaQuality severity bits) + SourceTimestamp,
    then ClearChangeMasks to notify subscribers.
  - WriteAlarmState surfaces a [active, acknowledged] pair on a
    dedicated node id — full AlarmConditionState/event firing comes
    with #85 F14b (EquipmentNodeWalker SDK integration).
  - RebuildAddressSpace tears down every registered variable + clears
    the dictionary so the next write-pass starts fresh.
  - Address-space root folder is materialised in CreateAddressSpace.

SdkAddressSpaceSink: thin IOpcUaAddressSpaceSink → OtOpcUaNodeManager
bridge. Production DI binding (#108) constructs this once the host's
StandardServer has booted.

OtOpcUaSdkServer (StandardServer subclass): overrides
CreateMasterNodeManager to inject OtOpcUaNodeManager via the
MasterNodeManager additionalManagers ctor. NodeManager property
exposes the live instance so OpcUaApplicationHost callers can wrap
it in a sink.

Tests: OpcUaServer 20 -> 24 (+4):
- WriteValue creates + updates variables in the manager
- WriteAlarmState creates a node distinct from value writes
- RebuildAddressSpace clears everything; subsequent writes start fresh
- NullOpcUaAddressSpaceSink no-op sanity

Each test boots a real OpcUaApplicationHost on a free port with the
SDK certificate auto-create flow (F13a) intact — full integration
slice on macOS.

All 6 v2 test suites green: 167 tests passing.

F10 status updated to reflect SDK binding shipped. Residuals:
- #109 OpcUaPublishActor.RebuildAddressSpace → Phase7Applier wiring
- #108 Host DI default to SdkAddressSpaceSink when hasDriver
- #85 F14b EquipmentNodeWalker integration (proper AlarmConditionState
  + folder hierarchy)
- IServiceLevelPublisher SDK binding (writes Server.ServiceLevel node)
2026-05-26 09:49:44 -04:00
Joseph Doherty 7fa863f6da feat(runtime): #113 DependencyMuxActor — drivers → virtual-tag fan-out
v2-ci / build (push) Failing after 36s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
End-to-end data path is now wired on the read side: driver subscriptions
fire AttributeValuePublished → DriverHostActor → DependencyMuxActor →
DependencyValueChanged to every interested VirtualTagActor. Previously
the publish hit a dead-letter at the host.

DependencyMuxActor:
  - Per-node fan-out router. Maintains tagRef → Set<IActorRef> with a
    reverse subscriber → refs index so unregister/replace are O(refs).
  - Watches subscribers; Terminated triggers automatic unregister so
    dead virtual-tag actors stop receiving publishes.
  - Re-register replaces the prior interest set — no stale-ref leaks
    on actor restart.
  - Drops publishes for refs with no interested subscribers.
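
    The forward and reverse index described above can be sketched in Python
    as follows. This is an illustration, not the Akka.NET actor: the
    DependencyMux and Recorder classes and the receive method are
    hypothetical, and the Terminated watch is reduced to an explicit
    unregister call.

```python
class Recorder:
    """Hypothetical subscriber that records what it receives."""

    def __init__(self):
        self.seen = []

    def receive(self, ref, value):
        self.seen.append((ref, value))


class DependencyMux:
    """Fan-out router keyed by tag ref, with a reverse subscriber index."""

    def __init__(self):
        self._subs_by_ref = {}   # tag ref -> set of subscribers
        self._refs_by_sub = {}   # subscriber -> frozenset of tag refs

    def register(self, subscriber, refs):
        # Re-registering replaces the prior interest set.
        self.unregister(subscriber)
        refs = frozenset(refs)
        self._refs_by_sub[subscriber] = refs
        for ref in refs:
            self._subs_by_ref.setdefault(ref, set()).add(subscriber)

    def unregister(self, subscriber):
        # The reverse index makes this proportional to the subscriber's refs.
        for ref in self._refs_by_sub.pop(subscriber, ()):
            subs = self._subs_by_ref[ref]
            subs.discard(subscriber)
            if not subs:
                del self._subs_by_ref[ref]

    def publish(self, ref, value):
        """Delivers to interested subscribers; returns how many received it."""
        targets = tuple(self._subs_by_ref.get(ref, ()))
        for sub in targets:
            sub.receive(ref, value)
        return len(targets)
```

    A publish for a ref nobody registered returns 0 and touches no state,
    matching the drop behaviour in the last bullet.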

VirtualTagActor:
  - New Props params: dependencyRefs + mux ActorRef.
  - PreStart sends RegisterInterest to the mux; PostStop sends
    UnregisterInterest. Default both null so older callers stay quiet.

DriverHostActor:
  - New dependencyMux Props param. Steady + Applying states now
    receive AttributeValuePublished from their DriverInstance children
    and forward to the mux. Null mux is a no-op (dev/Mac).

ServiceCollectionExtensions:
  - WithOtOpcUaRuntimeActors spawns DependencyMuxActor before
    DriverHostActor and threads its ActorRef into the host's Props.
    New DependencyMuxActorKey + DependencyMuxActorName.

Tests: Runtime 57 -> 63 (+6):
- Mux forwards to only subscribers interested in each ref
- Publish for unregistered ref is dropped silently
- Unregister stops forwarding
- Re-register replaces prior interest set
- VirtualTagActor PreStart registration drives end-to-end eval
  (uses AwaitAssert to race-safely settle the PreStart Tell)
- DriverHostActor forwards AttributeValuePublished through to mux

All 6 v2 test suites green: 163 tests passing.

F8 (#79) state updated — dep subscribe seam shipped, Core.VirtualTags
production engine binding (compile + ITagUpstreamSource subscribe) is
the residual.
2026-05-26 09:43:06 -04:00
Joseph Doherty f427dc4f26 feat(runtime): #112 ScriptedAlarmActor state persistence via IAlarmActorStateStore
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
ScriptedAlarmActor now survives actor restart: PreStart loads from
the configured store + restores in-memory state; every Transition()
fires a fire-and-forget save. ActiveState still re-derives from the
evaluator on first tick (Phase 7 decision #14), but Acked state +
lastAckUser persist verbatim so operators don't re-ack across an
outage.

Three pieces:
- IAlarmActorStateStore seam in Commons.Engines, with the
  AlarmActorStateSnapshot record (alarmId / state / lastTransitionUtc
  / lastAckUser) and NullAlarmActorStateStore default.
- EfAlarmActorStateStore in Runtime.ScriptedAlarms — production
  adapter over the existing ScriptedAlarmState table in ConfigDb.
  Maps the actor's 3-state enum to the table's AckedState column
  (Active⇒Unacknowledged, Acknowledged⇒Acknowledged, Inactive⇒
  Acknowledged). Concurrency conflicts are logged + dropped — the
  next transition writes again.
- ScriptedAlarmActor PreStart load (async, piped back as
  StateRestored) + Transition save. New Props overload takes the
  store; default is NullAlarmActorStateStore so tests stay quiet.
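
The enum-to-column mapping above could be sketched as below. The enum
and class names are assumptions made for illustration; only the three
mapping arms come from the commit text.

```csharp
using System;

// Hypothetical names for the actor's 3-state enum and the table column's values.
public enum AlarmActorState { Inactive, Active, Acknowledged }
public enum AckedState { Unacknowledged, Acknowledged }

public static class AlarmStateMapping
{
    // Active => Unacknowledged, Acknowledged => Acknowledged, Inactive => Acknowledged.
    public static AckedState ToAckedState(AlarmActorState state) => state switch
    {
        AlarmActorState.Active => AckedState.Unacknowledged,
        AlarmActorState.Acknowledged => AckedState.Acknowledged,
        AlarmActorState.Inactive => AckedState.Acknowledged,
        _ => throw new ArgumentOutOfRangeException(nameof(state), state, null),
    };
}
```

The mapping is lossy on its own: Inactive and Acknowledged land on the
same column value, so a load path has to rely on other stored fields to
tell them apart. How the adapter does that is not stated here.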

Tests: Runtime 52 -> 57 (+5):
- Transition writes Active then Acknowledged snapshots with
  lastAckUser populated
- PreStart with persisted Active state restores so a subsequent
  AcknowledgeAlarm fires (not ignored as it would be from Inactive)
- Empty store boots Inactive (AcknowledgeAlarm correctly ignored)
- EfAlarmActorStateStore Save + Load round-trips via in-memory EF
- Load for unknown alarmId returns null

All 6 v2 test suites green: 157 tests passing.

Closes #112. F9 (#80) remaining residual is predicate binding to
Core.ScriptedAlarms.ScriptedAlarmEngine — split as F9b in tasks JSON.
2026-05-26 09:34:37 -04:00
Joseph Doherty 3e3f7588bd feat(runtime,host): close F7 — driver subscribe + write paths + Host DI
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Three pieces landed in one batch, closing F7-residual + Host DI #106:

Runtime/DriverInstanceActor:
  - Subscribe / Unsubscribe message contracts; the Connected state
    handles them via IDriver.ISubscribable. On every OnDataChange
    event the actor publishes AttributeValuePublished to its parent
    (DriverHostActor → OpcUaPublishActor). OPC UA StatusCode is
    mapped to the 3-state OpcUaQuality enum via severity bits
    (00=Good, 01=Uncertain, 10/11=Bad).
  - DetachSubscription tears the handler off the driver on
    DisconnectObserved, Unsubscribe, and PostStop so a stale handler
    never pushes to a dead actor.
  - WriteAttribute now dispatches IWritable.WriteAsync (batch of one)
    with a 5s CancellationTokenSource; status-code propagated to
    WriteAttributeResult on non-Good results.
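
The severity-bit mapping mentioned above can be sketched as a pure
function. It assumes the StatusCode arrives as a raw 32-bit value whose
top two bits carry the severity; the class and method names are
hypothetical.

```csharp
public enum OpcUaQuality { Good, Uncertain, Bad }

public static class StatusCodeQuality
{
    // Severity lives in bits 31..30 of the 32-bit StatusCode:
    // 00 = Good, 01 = Uncertain, 10 and 11 are both treated as Bad.
    public static OpcUaQuality FromStatusCode(uint statusCode) => (statusCode >> 30) switch
    {
        0 => OpcUaQuality.Good,
        1 => OpcUaQuality.Uncertain,
        _ => OpcUaQuality.Bad,
    };
}
```

Under this sketch 0x40000000 maps to Uncertain and 0x80000000 to Bad,
matching the quality-translation test listed below.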

Host:
  - New ProjectReferences to Core + every cross-platform driver
    assembly (AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT).
    Galaxy is net10 (gRPC client to mxaccessgw); the COM-bound net48
    Wonderware Historian driver stays out of the Host's reference
    closure — its .Client gRPC wrapper is what binds for historian
    needs.
  - New DriverFactoryBootstrap.AddOtOpcUaDriverFactories() registers
    a singleton DriverFactoryRegistry, invokes each driver's
    Register(registry, loggerFactory), and binds IDriverFactory to
    DriverFactoryRegistryAdapter. Replaces the F7 NullDriverFactory
    default so deploys actually materialise real IDriver instances
    on driver-role nodes. ShouldStub() still gates per-platform
    behaviour at spawn time.
  - Program.cs wires AddOtOpcUaDriverFactories() before AddAkka so
    the runtime extension can resolve IDriverFactory from DI.

Tests: Runtime 46 -> 52 (+6):
- Write returns success when StatusCode = Good
- Write propagates non-Good status code in failure Reason
- Subscribe forwards OnDataChange to parent as AttributeValuePublished
- Quality translation: Uncertain (0x40...) and Bad (0x80...)
- Subscribe against non-ISubscribable returns failure
- DisconnectObserved detaches handler so late events are dropped

All 6 v2 test suites green: 152 tests passing.

Closes F7. F7-residual sub-tasks #110 (subscribe) and #111 (write)
both shipped. Host DI binding #106 shipped.
2026-05-26 09:28:34 -04:00
Joseph Doherty c02f016f1d feat(opcua): F14 Phase7Plan + Phase7Applier
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Splits the side-effecting half of Phase7Composer (deferred at Task 47)
into two pieces that mirror DriverHostActor's spawn-plan pattern:

Phase7Plan + Phase7Planner.Compute (pure):
  Diff two Phase7CompositionResult snapshots by stable id (EquipmentId,
  DriverInstanceId, ScriptedAlarmId). Emits Added/Removed/Changed lists
  per entity class. Added/Removed are sorted by id for deterministic
  apply order. Changed wraps both Previous and Current projections so
  consumers can decide between in-place mutation and tear-down +
  rebuild.
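
A generic sketch of the diff-by-stable-id idea, assuming each entity
class can be keyed by a string id and compared by value. SnapshotDiff,
EntityDiff and Change are hypothetical names, not the Phase7Plan types;
sorting the Changed list is a choice made here, since the commit only
states that Added/Removed are sorted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Wraps both projections so a consumer can pick mutate-in-place or rebuild.
public sealed record Change<T>(T Previous, T Current);

public sealed record EntityDiff<T>(
    IReadOnlyList<T> Added,
    IReadOnlyList<T> Removed,
    IReadOnlyList<Change<T>> Changed);

public static class SnapshotDiff
{
    // Pure function: no side effects, deterministic output order.
    public static EntityDiff<T> Compute<T>(
        IEnumerable<T> previous, IEnumerable<T> current, Func<T, string> idOf)
        where T : IEquatable<T>
    {
        var prev = previous.ToDictionary(idOf, StringComparer.Ordinal);
        var curr = current.ToDictionary(idOf, StringComparer.Ordinal);

        var added = curr.Where(kv => !prev.ContainsKey(kv.Key))
                        .OrderBy(kv => kv.Key, StringComparer.Ordinal)
                        .Select(kv => kv.Value)
                        .ToList();
        var removed = prev.Where(kv => !curr.ContainsKey(kv.Key))
                          .OrderBy(kv => kv.Key, StringComparer.Ordinal)
                          .Select(kv => kv.Value)
                          .ToList();
        // Present on both sides but not value-equal.
        var changed = curr.Where(kv => prev.TryGetValue(kv.Key, out var old) && !old.Equals(kv.Value))
                          .OrderBy(kv => kv.Key, StringComparer.Ordinal)
                          .Select(kv => new Change<T>(prev[kv.Key], kv.Value))
                          .ToList();

        return new EntityDiff<T>(added, removed, changed);
    }
}
```

Keeping the diff pure is what lets it be unit-tested without a sink or
an ActorSystem; the side-effecting half only walks the resulting lists.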

Phase7Applier (side-effecting):
  Drives an IOpcUaAddressSpaceSink against a plan. Removed equipment/
  alarms get an inactive AlarmState write per id; Added/Removed of
  Equipment or ScriptedAlarm triggers RebuildAddressSpace. Driver-only
  changes correctly skip the rebuild — those flow through DriverHost-
  Actor's spawn-plan in Runtime. Sink exceptions are caught + logged so
  one bad node doesn't abort the apply.

Tests: OpcUaServer 6 -> 20 (+14):
- Phase7PlannerTests x9 (empty-in/empty-out, add/remove/change per
  entity class, mixed changes, deterministic ordering)
- Phase7ApplierTests x5 (empty plan no-op, removal writes inactive
  states + rebuild, added equipment triggers rebuild, driver-only
  skips rebuild, sink fault is non-fatal)

The remaining piece is the EquipmentNodeWalker integration against a
real SDK NodeManager — split as F14b, gated on F10b's SDK builder.

All 6 v2 test suites green: 146 tests passing.
2026-05-26 09:16:08 -04:00
Joseph Doherty a1325299ce feat(runtime): F10 OpcUaPublishActor sink seams + redundancy-driven ServiceLevel
OpcUaPublishActor now routes through pluggable seams instead of just
incrementing a counter:

- IOpcUaAddressSpaceSink (Commons.OpcUa) — WriteValue / WriteAlarmState
  / RebuildAddressSpace. OpcUaQuality enum moved here from the actor's
  nested type so producers don't have to reference the actor itself.
- IServiceLevelPublisher — Publish(byte). NullServiceLevelPublisher
  retains the last level for inspection.
- The actor subscribes to the redundancy-state DPS topic in PreStart
  and maps the local node's NodeRedundancyState to a coarse
  ServiceLevel (Primary+leader=240, Primary=200, Secondary=100,
  Detached=0). This keeps the local SDK's ServiceLevel node honest
  without round-tripping back through the admin-singleton calculator.
- ServiceLevelChanged dedupes identical levels so the SDK doesn't see
  redundant writes.
- Sink + publisher exceptions are caught and logged; the actor never
  crashes its own dispatcher.
- PropsForTests gets optional sink/publisher/localNode params and
  skips the DPS subscribe so unit tests stay on a vanilla TestKit
  cluster.
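
The coarse ServiceLevel mapping and the dedupe step can be sketched as
follows. The enum shape, the isLeader flag and both class names are
assumptions for illustration; only the four level values come from the
list above.

```csharp
using System;

// Hypothetical shape; the real NodeRedundancyState may carry more detail.
public enum NodeRedundancyState { Detached, Secondary, Primary }

public static class ServiceLevelMap
{
    // Primary+leader=240, Primary=200, Secondary=100, Detached=0.
    public static byte For(NodeRedundancyState state, bool isLeader) => state switch
    {
        NodeRedundancyState.Primary when isLeader => (byte)240,
        NodeRedundancyState.Primary => (byte)200,
        NodeRedundancyState.Secondary => (byte)100,
        _ => (byte)0,
    };
}

// Forwards a level only when it differs from the last one forwarded.
public sealed class DedupingPublisher
{
    private readonly Action<byte> _publish;
    private byte? _last;

    public DedupingPublisher(Action<byte> publish) => _publish = publish;

    // Returns true when the level was actually forwarded.
    public bool Publish(byte level)
    {
        if (_last == level) return false;
        _last = level;
        _publish(level);
        return true;
    }
}
```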

Production binding to a real SDK NodeManager + Variable nodes is the
remaining residual — split as F10b. Task 60 still blocked on F10b.

Tests: Runtime 40 -> 46 (+6):
- AttributeValueUpdate routes to sink
- AlarmStateUpdate routes to sink
- RebuildAddressSpace calls sink.Rebuild
- ServiceLevelChanged dedupes
- RedundancyStateChanged for primary-leader publishes 240
- RedundancyStateChanged for secondary publishes 100

All 6 v2 test suites green: 132 tests passing.
2026-05-26 09:10:55 -04:00
Joseph Doherty 14fb2b05ed feat(runtime): F8/F9 engine evaluator seams + DPS fan-out
VirtualTagActor and ScriptedAlarmActor now route through pluggable
evaluator interfaces and fan out to the cluster's live-tail topics
shipped in F15.3:

- IVirtualTagEvaluator + NullVirtualTagEvaluator in Commons.Engines.
  VirtualTagActor calls evaluator on every DependencyValueChanged,
  dedupes unchanged values, forwards EvaluationResult to its parent,
  and publishes ScriptLogEntry Warning to the script-logs DPS topic
  whenever the evaluator fails.

- IScriptedAlarmEvaluator + NullScriptedAlarmEvaluator. ScriptedAlarmActor
  takes an AlarmConfig (id/name/equipment-path/severity/predicate) and
  publishes both an AlarmTransitionEvent (alerts topic) and a
  ScriptLogEntry (script-logs topic) at every transition. Manual
  ConditionMet/Acknowledge/Cleared still flow through the same
  Transition() so callers without engine bindings still drive the
  state machine; the legacy single-string Props() overload routes
  through a default AlarmConfig.

The Null* defaults keep the actors safe when no engine is bound —
unconfigured nodes never spuriously alarm. Production binding to
Core.VirtualTags.VirtualTagEngine and Core.ScriptedAlarms is the
remaining residual (F8b/F9b — split in tasks JSON).

Tests: Runtime 34 -> 40 (+6):
- VirtualTagActorTests x3 (evaluator drives EvaluationResult,
  unchanged-value dedup, failure publishes Warning ScriptLogEntry)
- ScriptedAlarmActorTests x3 (engine threshold drives Activated +
  Cleared on alerts topic, manual Acknowledge attribution).

All 6 v2 test suites green: 126 tests passing.
2026-05-26 09:05:04 -04:00
Joseph Doherty da141497f8 feat(runtime): F7 spawn lifecycle + F20 ShouldStub gate
DriverHostActor.ApplyAndAck now reads the deployment artifact and
reconciles its set of DriverInstanceActor children — spawn the missing,
ApplyDelta to those with changed config, stop the removed/disabled.
The diff lives in pure DriverSpawnPlanner so it can be unit-tested
without an ActorSystem.

Adds IDriverFactory in Core.Abstractions (consumed by Runtime) +
DriverFactoryRegistryAdapter in Core.Hosting that wraps the existing
v1 DriverFactoryRegistry — Runtime stays decoupled from Polly/Serilog,
the Host wires the adapter once driver assemblies have registered.

ShouldStub(type, roles) is now actually called on every spawn — Galaxy
+ Wonderware-Historian boot stubbed on macOS/Linux or whenever the host
carries the dev role. Missing factory ⇒ stub fallback, never a crash.

Tests: 24 → 34 in Runtime (+10):
- DriverSpawnPlannerTests x7 (diff cases, type change ⇒ stop+respawn)
- DeploymentArtifactTests  x5 (empty/malformed/missing fields tolerant)
- DriverHostActorReconcileTests x4 (spawn count, stub fallback,
  ShouldStub gate, second-apply stops the removed)
All 6 v2 test suites green: 120 tests passing.

Closes F20 (ShouldStub wired). F7 marked partial — subscription
publishing + write path still stubbed in DriverInstanceActor itself.
2026-05-26 08:57:16 -04:00
Joseph Doherty 9892ceae9a docs(plans): mark F15.3 complete — F15 fully shipped
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 08:39:47 -04:00
Joseph Doherty 59858129cb feat(adminui): F15.3 closes F15 — live alerts/script-log, CSV import, Monaco editor
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Final F15 batch wires up the SignalR-backed live pages, ports the bulk
equipment importer, and progressively enhances the Script source editor
with Monaco.

Message contracts:
- Commons.Messages.Alerts.AlarmTransitionEvent — fires on every alarm
  state transition; will be published on the `alerts` DPS topic once
  ScriptedAlarmActor (F9) emits it.
- Commons.Messages.Logging.ScriptLogEntry — one log line emitted by a
  hosted script; will be published on the `script-logs` DPS topic once
  VirtualTagActor (F8) and ScriptedAlarmActor (F9) emit it.
  (Folder named "Logging" to dodge .gitignore's "logs/" rule.)

SignalR plumbing:
- AlertHub gains MethodName + bridge actor (AlertSignalRBridge)
- ScriptLogHub introduced; ScriptLogSignalRBridge follows the same
  DPS-subscribe → IHubContext fan-out pattern as FleetStatusSignalRBridge
- WithOtOpcUaSignalRBridges now spawns all three bridges
- MapOtOpcUaHubs maps /hubs/script-log alongside the existing hubs

Pages:
- /alerts                      live alarm tail, 200-row capacity
- /script-log                  live script-log tail with level + script
                               filter, 500-row capacity
- /clusters/{id}/equipment/import — CSV bulk Equipment add with preview
                                    (Name/MachineCode/UnsLineId/Driver +
                                    optional ZTag/SAPID/Manufacturer/Model;
                                    skips rows whose MachineCode already
                                    exists in the fleet)
- ScriptEdit progressively enhanced with Monaco editor via JSInterop —
  the textarea remains Blazor's source of truth and Monaco syncs into it
  on every keystroke so @bind keeps working; falls back gracefully if
  the CDN is unreachable.

MainLayout nav gains a "Live" section (Deployments, Alerts, Alarms
historian) and a "Scripts" link under Scripting. ClusterEquipment
surfaces the new Import CSV button.

Tally: F15 ships ~42 razor pages + 3 SignalR hubs + 3 bridge actors.
Microsoft.AspNetCore.SignalR.Client added (was already in central PM).

All 104 v2 tests remain green.
2026-05-26 08:39:17 -04:00
Joseph Doherty e248e037e7 docs(plans): mark F15 complete — read views + live-edit CRUD
v2-ci / build (push) Failing after 39s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 08:28:13 -04:00
Joseph Doherty ae980aef5d feat(adminui): F15.2 batch 4 — closes live-edit forms (Acl/VirtualTag/ScriptedAlarm/Script)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Final batch of F15.2. After this commit every entity surfaced by the
Phase A-D read views has a matching new/edit/delete form.

- AclEdit.razor                /clusters/{id}/acls/{new|aclId}
  - NodePermissions [Flags] enum surfaced as per-bit checkboxes plus
    one-click bundle buttons (ReadOnly / Operator / Engineer / Admin)
  - ScopeKind select + ScopeId free-text target (null = cluster-wide)
- VirtualTagEdit.razor         /virtual-tags/{new|virtualTagId}
  - Trigger validation: enforces at least one of ChangeTriggered or
    TimerIntervalMs is set
- ScriptedAlarmEdit.razor      /scripted-alarms/{new|scriptedAlarmId}
  - AlarmType select with OPC UA Part 9 subtypes
  - MessageTemplate is a textarea (template tokens are server-resolved)
- ScriptEdit.razor             /scripts/{new|scriptId}
  - SHA-256 hash computed from SourceCode on save (operator never sees
    or edits SourceHash directly)
  - InputTextArea now; Monaco syntax editor is a future enhancement
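
Computing the hash on save might look like the sketch below. UTF-8
encoding and lowercase hex output are assumptions; the commit does not
say how SourceHash is encoded. The class and method names are
hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ScriptHashing
{
    // SHA-256 over the UTF-8 bytes of the source, rendered as lowercase hex.
    public static string ComputeSourceHash(string sourceCode)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(sourceCode));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Deriving the hash server-side on every save keeps it consistent with
SourceCode, which is why the form never exposes it as an editable field.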

List pages (ClusterAcls / VirtualTags / ScriptedAlarms / Scripts) all
gain New + per-row Edit affordances.

Tally: F15.2 shipped CRUD for 12 entities (Cluster, ClusterNode,
UnsArea, UnsLine, Namespace, DriverInstance, Equipment, Tag, NodeAcl,
VirtualTag, ScriptedAlarm, Script).

All 9 integration tests still green.
2026-05-26 08:27:56 -04:00
Joseph Doherty 2662ac08e4 feat(adminui): F15.2 batch 3 — Equipment + Tag CRUD (operator surfaces)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
The two most-edited entities for daily operator workflows. Both follow the
same single-page edit-or-create pattern from batches 1 + 2 with RowVersion
optimistic concurrency.

- EquipmentEdit.razor   /clusters/{id}/equipment/{new|EquipmentId}
  - EquipmentId is system-generated on create (decision #125): EQ-{first
    12 hex chars of a new EquipmentUuid}.
  - UNS line + driver instance selects are scoped to the cluster.
  - All 9 OPC 40010 identification fields surfaced as an optional panel.
  - MachineCode uniqueness checked client-side before EF unique index
    enforces it server-side.
- TagEdit.razor         /clusters/{id}/tags/{new|TagId}
  - Equipment vs FolderPath input switches based on the selected
    driver's namespace kind — Equipment-kind requires EquipmentId,
    SystemPlatform-kind requires FolderPath (decision #110 invariant
    enforced client-side; sp_ValidateDraft re-enforces server-side at
    deploy).
  - DataType select uses the OPC UA built-in primitive type names.
  - TagConfig validated as JSON pre-flight.
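
The EquipmentId rule described under EquipmentEdit (EQ- plus the first
12 hex chars of a new EquipmentUuid) can be sketched as below. Lowercase
hex and the helper names are assumptions, not confirmed by the commit.

```csharp
using System;

public static class EquipmentIds
{
    // Generates a fresh uuid and the id derived from it.
    public static (Guid Uuid, string Id) NewEquipmentId()
    {
        var uuid = Guid.NewGuid();
        return (uuid, FromUuid(uuid));
    }

    // "N" formats the Guid as 32 hex digits with no dashes; keep the first 12.
    public static string FromUuid(Guid uuid) => "EQ-" + uuid.ToString("N")[..12];
}
```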

ClusterEquipment + ClusterTags list pages get New / Edit affordances.

All 9 integration tests still green.
2026-05-26 08:22:51 -04:00
Joseph Doherty 45740578c9 feat(adminui): F15.2 batch 2 — topology entity CRUD
v2-ci / build (push) Failing after 52s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Same single-page edit-or-create pattern as batch 1, applied to the
foundational topology entities. After this batch the whole hierarchy
(cluster → nodes → UNS areas → UNS lines → namespaces → drivers) is
fully editable through the UI.

- ClusterEdit.razor                  /clusters/{id}/edit
  Update + delete for an existing cluster. NodeCount stays coupled to
  RedundancyMode (None→1, Warm/Hot→2). ModifiedBy taken from
  AuthenticationStateProvider.
- NodeEdit.razor                     /clusters/{id}/nodes/{new|nodeId}
  Full ClusterNode CRUD. ApplicationUri uniqueness is enforced by EF
  index; ServiceLevelBase defaults to 200 (primary preference) on
  create; per-node DriverConfigOverridesJson validated as JSON.
- UnsAreaEdit.razor                  /clusters/{id}/uns/areas/{new|id}
- UnsLineEdit.razor                  /clusters/{id}/uns/lines/{new|id}
  UNS structure CRUD; Lines pick their parent Area from a select that
  loads the cluster's areas.

List pages updated:
- ClusterOverview now shows an "Edit cluster" button + a "New node"
  action on the nodes panel + per-row Edit buttons.
- ClusterUns gains New/Edit affordances for both Areas and Lines.

All 9 integration tests still green; no regressions.
2026-05-26 08:18:49 -04:00
Joseph Doherty 5ae67a48ba feat(adminui): F15.2 batch 1 — Namespace + DriverInstance live-edit CRUD
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Pattern proof for the live-edit forms gated by Phases A–D's read views.
Each entity gets a single edit page handling both create (route param
omitted) and update (route param present) modes, with RowVersion-based
optimistic concurrency checked against EF Core's
DbUpdateConcurrencyException.

Pattern:
- @page "/clusters/{id}/<thing>/new"
- @page "/clusters/{id}/<thing>/{rowId}"
- IsNew computed from rowId presence
- EditForm + DataAnnotations validation
- byte[] RowVersion stashed on FormModel; assigned to
  Entry(e).Property(e => e.RowVersion).OriginalValue before SaveChanges
- Delete button (edit mode only) flows through the same RowVersion check
- Concurrency conflict surfaces as an inline error panel; user reloads
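
The RowVersion step of the pattern could be sketched like this. It
assumes EF Core is referenced and that the entity's RowVersion property
is configured as a concurrency token; ConcurrencySave and TrySaveAsync
are hypothetical names, not helpers from the repo.

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class ConcurrencySave
{
    // Returns false on a concurrency conflict so the page can show its
    // inline error panel and ask the user to reload.
    public static async Task<bool> TrySaveAsync(
        DbContext db, object entity, byte[] rowVersionFromForm)
    {
        // Tell EF which version the user actually edited; the UPDATE/DELETE
        // then only matches if the row still carries that version.
        db.Entry(entity).Property("RowVersion").OriginalValue = rowVersionFromForm;
        try
        {
            await db.SaveChangesAsync();
            return true;
        }
        catch (DbUpdateConcurrencyException)
        {
            return false;
        }
    }
}
```

A delete would go through the same call after db.Remove(entity), which
is how the Delete button can share the RowVersion check.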

This batch:
- NamespaceEdit.razor          — small entity, validates the pattern
- DriverEdit.razor             — keystone for everything downstream
                                 (Equipment/Tag/VirtualTag/ScriptedAlarm),
                                 JSON config editor per Q1 with reformat
                                 on save and validation pre-flight
- ClusterNamespaces row gains an Edit button + New action
- ClusterDrivers expanded view gains an Edit button + New action

Equipment/UnsArea/UnsLine/Tag/ACL/VirtualTag/ScriptedAlarm/Script forms
follow this same template in subsequent F15.2 batches.

All 9 integration tests still green; no v2 test regressions.
2026-05-26 08:14:36 -04:00
Joseph Doherty d055cb059e docs(plans): mark F15 partial — Phases A–D shipped
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 08:02:02 -04:00
Joseph Doherty 74161f9460 feat(adminui): F15 Phase D — logic + ops pages
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
v2-ci / build (push) Has been cancelled
- ClusterAudit (/clusters/{id}/audit) — reads ConfigAuditLog with the
  EventId/CorrelationId columns added in F3; shown as a Cluster tab
- VirtualTags (/virtual-tags)            — fleet-wide read view
- ScriptedAlarms (/scripted-alarms)      — fleet-wide read view
- Scripts (/scripts)                     — fleet-wide; expandable code preview
- RoleGrants (/role-grants)              — per Q4, surfaces the fleet-wide
                                           LDAP-group → role mapping from
                                           Authentication:Ldap:GroupToRole
                                           (read-only; reload via host restart)
- Certificates (/certificates)           — own/trusted/issuer/rejected store
                                           contents resolved against
                                           OpcUa:PkiStoreRoot config (F13a)
- Reservations (/reservations)           — ExternalIdReservation table
- AlarmsHistorian (/alarms-historian)    — live HistorianAdapterActor sink
                                           status via the F11 GetStatus query;
                                           5s polling

ScriptLog deferred (needs the F16-deferred ScriptLogHub bridge).
ClusterNav extended with the Audit tab.

Adds an AdminUI → Runtime project reference so the historian status page can
inject IRequiredActor<HistorianAdapterActorKey>. NuGet audit suppression for
the transitive Opc.Ua.Core advisory mirrored from the Runtime project.

All 104 v2 tests still green.
2026-05-26 08:01:23 -04:00
Joseph Doherty 396052a126 feat(adminui): F15 Phase C — config-tab read views (Equipment/UNS/Namespaces/Drivers/Tags/ACLs)
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Per Q3 of the rebuild plan, each v1 ClusterDetail tab becomes a separate
route under /clusters/{id}/<tab>. This batch adds read-only table views
for the six core config entity types; live-edit forms with RowVersion
concurrency land in Phase C.2 once the read-view shape is reviewed.

- ClusterEquipment    /clusters/{id}/equipment   — joins via DriverInstance
                                                   so the cluster scope works
- ClusterUns          /clusters/{id}/uns         — Areas + Lines tables
- ClusterNamespaces   /clusters/{id}/namespaces  — Kind + URI + Enabled chip
- ClusterDrivers      /clusters/{id}/drivers     — collapsed list with JSON
                                                   config expandable per Q1
                                                   (typed editors deferred)
- ClusterTags         /clusters/{id}/tags        — first 200 by name + filter
- ClusterAcls         /clusters/{id}/acls        — LDAP group + scope +
                                                   NodePermissions bits

Shared ClusterNav.razor extracted; ClusterOverview + ClusterRedundancy
updated to use it. _Imports.razor adds Components.Shared so the shared
nav is in scope across pages.
2026-05-26 07:56:39 -04:00
Joseph Doherty fd0cc4dfdb feat(adminui): F15 Phase B — cluster CRUD + Overview/Redundancy routes
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
- ClustersList (/clusters) — table view, row-click opens detail
- NewCluster (/clusters/new) — EditForm with DataAnnotations; redundancy
  mode + node-count coupling enforced client-side (None→1, Warm/Hot→2);
  CreatedBy taken from AuthenticationStateProvider
- ClusterOverview (/clusters/{id}) — cluster details + last-deployment
  badge + node list. Per Q3, the legacy 10-tab monolith is split into
  separate routes; this page hosts the Overview "tab" as its primary slot
- ClusterRedundancy (/clusters/{id}/redundancy) — static ServiceLevelBase
  config view; live ServiceLevel comes via RedundancyStateActor DPS topic
  (deferred to its own follow-up once the SignalR bridge lands)

The other 9 v1 cluster tabs (Equipment, UNS, Namespaces, Drivers, Tags,
ACLs, ScriptedAlarms, Scripts, Audit) land in Phase C/D.
2026-05-26 07:52:41 -04:00
Joseph Doherty 850d6774ea feat(adminui): F15 Phase A — shell + auth + fleet + hosts pages
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Implements Phase A of the F15 rebuild plan: minimum-viable Admin surface
with a working sign-in path and a fleet-state landing page. Decisions Q1–Q5
of docs/v2/AdminUI-rebuild-plan.md were taken as recommended.

- App.razor (moved into AdminUI library from the Host stub; vendored
  Bootstrap from RCL wwwroot — no public CDN, air-gap safe)
- Routes.razor (AuthorizeRouteView enforces page-level [Authorize])
- RedirectToLogin.razor (preserves returnUrl through the auth hop)
- Login.razor (static SSR, posts to /auth/login; Q5 wording about
  generic-vs-specific LDAP errors)
- Account.razor (identity + fleet roles + raw LDAP groups; Q4 — no
  per-cluster grants; fleet-wide LDAP-group → role mapping only)
- Fleet.razor (per-node deployment status: reads NodeDeploymentState
  + unions with IClusterRoleInfo.MembersWithRole("driver") so freshly-
  joined nodes appear as "waiting"; 10s auto-refresh)
- Hosts.razor (Akka cluster topology: members, status, roles, role-
  leader; 5s auto-refresh)

Host's stub App.razor deleted; Program.cs now points at
AdminUI.Components.App via an added using.

All 104 v2 tests remain green.
2026-05-26 07:49:35 -04:00
Joseph Doherty 5c754ecffd docs(v2): F15 UX kickoff — AdminUI rebuild plan
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
47-page legacy inventory mapped to v2 disposition (5 already done, 22 port
as-is, 7 reshape, 5 dropped because live-edit replaces draft/publish, 4
deferred driver-typed editors). Net ~30 active pages to rebuild.

Five open design questions surfaced for review before per-page work starts:
Q1 driver-typed editors (defer vs. ship), Q2 top-level fleet-wide views
(drop vs. keep), Q3 ClusterDetail tabs vs. split routes, Q4 RoleGrants
cluster-scoped vs. LDAP-group fleet-wide, Q5 Login error UX.

Proposed 4-phase sequencing (~5 days total): shell+auth+fleet, cluster
CRUD, config tabs, logic+ops. Each phase independently mergeable.
2026-05-26 07:38:58 -04:00
Joseph Doherty 68c6f36cfe docs(plans): mark F13a partial-complete (36c4751)
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:35:04 -04:00
Joseph Doherty 36c4751571 feat(opcua): F13a — cert auto-creation in OpcUaApplicationHost
Adds OPC UA SDK's CheckApplicationInstanceCertificate call to
OpcUaApplicationHost.StartAsync, removing the v1 friction of needing to
pre-create the PKI directory tree before booting.

- New OpcUaApplicationHostOptions.PkiStoreRoot (defaults to "pki")
- BuildConfigurationAsync now derives own/issuer/trusted/rejected from
  PkiStoreRoot so the cert paths are configurable + consistent
- EnsureApplicationCertificateAsync runs before StandardServer.Start, and
  fails fast with a clear message if the SDK can't produce a valid cert
- 2 new tests: fresh-tree creates a cert, second boot reuses it

Partial slice of follow-up F13. Endpoint-security, user-token validator,
and observability wiring still pending in the F13 follow-up. OpcUaServer
tests: 4 → 6.
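
A minimal sketch of how the four certificate stores could be derived from a single root, assuming each store is a direct subdirectory of PkiStoreRoot. The function name `derive_pki_stores` and the returned mapping are hypothetical; only the default root "pki" and the store names own/issuer/trusted/rejected come from the message above.

```python
from pathlib import Path


def derive_pki_stores(pki_store_root: str = "pki") -> dict[str, Path]:
    """Derive every store path from one root so they stay consistent.

    Assumption: each store lives directly under the root. The real
    layout inside OpcUaApplicationHost may differ.
    """
    root = Path(pki_store_root)
    return {name: root / name for name in ("own", "issuer", "trusted", "rejected")}


stores = derive_pki_stores()
for name, path in stores.items():
    print(f"{name}: {path}")
```

Changing the root moves all four stores together, which is the consistency property the commit describes.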
2026-05-26 07:34:48 -04:00
Joseph Doherty 229282ad8b docs(plans): mark F21 complete (b0a2bb0)
v2-ci / build (push) Failing after 43s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:25:36 -04:00
Joseph Doherty b0a2bb037d test(integration): F21 — docker-compose + env-driven SQL/LDAP harness mode
Adds a real-infra mode for the integration test harness alongside the default
in-memory mode. Brings the previously untested code paths (EF SqlServer
behaviors, real LDAP bind) under env-var control without breaking the
zero-infra default that CI runs.

- docker-compose.yml — minimal SQL 2022 (14331) + OpenLDAP (3894) stack
  (ports chosen to coexist with docker-dev/ on 14330/3893)
- HarnessMode record reads OTOPCUA_HARNESS_USE_SQL=1 / USE_LDAP=1 from env
- SQL mode: per-harness unique DB OtOpcUa_Harness_{guid}, EnsureCreated
  at startup, EnsureDeleted on dispose (best-effort)
- LDAP mode: drops StubLdapAuthService and configures real LdapAuthService
  against the compose'd OpenLDAP via Authentication:Ldap:* config keys
- Microsoft.EntityFrameworkCore.SqlServer added to the test project
- README documents both modes + the macOS no-Docker caveat

Default in-memory mode unchanged — all 9 existing tests still pass.
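
The env-driven mode switch can be sketched as follows. This is an illustration only: the full name `OTOPCUA_HARNESS_USE_LDAP` is inferred from the abbreviated "USE_LDAP=1" above, and the `from_env` and `harness_db_name` helpers are hypothetical stand-ins for the C# HarnessMode record.

```python
import os
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessMode:
    use_sql: bool
    use_ldap: bool

    @classmethod
    def from_env(cls, env=None):
        # Anything other than the literal "1" keeps the zero-infra default.
        env = os.environ if env is None else env
        return cls(
            use_sql=env.get("OTOPCUA_HARNESS_USE_SQL") == "1",
            use_ldap=env.get("OTOPCUA_HARNESS_USE_LDAP") == "1",  # assumed name
        )


def harness_db_name() -> str:
    # Per-harness unique database name; the GUID formatting is an assumption.
    return f"OtOpcUa_Harness_{uuid.uuid4().hex}"
```

With no variables set, `HarnessMode.from_env()` selects the in-memory path for both SQL and LDAP, so CI needs no extra configuration.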
2026-05-26 07:25:16 -04:00
Joseph Doherty ba6e5dd7f9 docs(plans): mark F11 + F22 complete
v2-ci / build (push) Failing after 49s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
F22 → cd5540c (failover scenarios on TwoNodeClusterHarness)
F11 → 6861381 (HistorianAdapterActor → IAlarmHistorianSink bridge)

Branch follow-ups: 11/22 → 13/22 done. Remaining 9 are engine wiring
gated on real drivers/SDKs (F7-F10, F13-F14), Admin UI rebuild (F15),
fold-in to F7 (F20), and SQL/LDAP harness mode (F21).
2026-05-26 07:19:07 -04:00
Joseph Doherty 686138123f feat(runtime): F11 — HistorianAdapterActor wired to IAlarmHistorianSink
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Reshapes the placeholder buffered-counter actor into a thin fire-and-forget
bridge over the existing IAlarmHistorianSink contract. Default sink is
NullAlarmHistorianSink; production deployments override the DI binding to
SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the v1
components in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware*
are reused verbatim — actor is just a mailbox-friendly entry point).

- HistorianAdapterActor.Props(IAlarmHistorianSink?) — null defaults to NullAlarmHistorianSink
- Receive<AlarmHistorianEvent>: fire-and-forget sink.EnqueueAsync
- Receive<GetStatus>: returns sink.GetStatus() (queue depth + drain state)
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers the default sink
- WithOtOpcUaRuntimeActors spawns the actor + registers HistorianAdapterActorKey
- Program.cs calls AddOtOpcUaRuntime when hasDriver

Tests: 2 new (forward-to-sink + GetStatus). Runtime suite 17 → 18.
2026-05-26 07:18:08 -04:00
Joseph Doherty cd5540cb1a test(integration): F22 — failover scenario tests + harness Stop/Restart primitives
Extends TwoNodeClusterHarness with three lifecycle primitives:
- StopNodeBAsync()      — graceful CoordinatedShutdown (Cluster.Leave)
- RestartNodeBAsync()   — rebuild node B on same Akka port + same in-memory DB
- WaitForClusterSizeAsync(n) — converge assertion helper

Adds three failover scenario tests:
- Stopping node B shrinks cluster to 1 Up member
- Restarted node B rejoins on the same Akka port
- Deployment started with B down seals with a single NodeDeploymentState
  (validates ConfigPublishCoordinator.DiscoverDriverNodes snapshots
   membership at dispatch time)

Closes follow-up F22. Integration test count: 6 → 9 (+3).
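
A converge helper like WaitForClusterSizeAsync is typically a poll-until-deadline loop. The sketch below shows that shape in Python; the `get_up_count` callback, the default timeout, and the poll interval are assumptions and do not reflect the harness's actual signature.

```python
import asyncio
import time


async def wait_for_cluster_size(get_up_count, expected, timeout=10.0, poll=0.05):
    """Poll until the number of Up members equals `expected`.

    Raises TimeoutError if the cluster has not converged by the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_up_count()
        if current == expected:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster size {current}, expected {expected}")
        await asyncio.sleep(poll)
```

A test would call it right after stopping node B, passing a callback that counts Up members, and then assert on the state it needs.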
2026-05-26 07:13:14 -04:00
Joseph Doherty 4e6ef648d1 docs(plans): mark F16 complete
v2-ci / build (push) Failing after 3m8s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 07:01:54 -04:00
Joseph Doherty f18c285cca feat(adminui): FleetStatusSignalRBridge — DPS → SignalR forwarding (F16)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
v2-ci / build (push) Has been cancelled
New per-admin-node actor that subscribes to the fleet-status DistributedPubSub
topic + forwards every FleetStatusChanged snapshot to all SignalR clients
connected to FleetStatusHub via IHubContext.

Wired via WithOtOpcUaSignalRBridges (new AkkaConfigurationBuilder extension in
AdminUI.Hubs) — Program.cs calls it inside the if(hasAdmin) block alongside
WithOtOpcUaControlPlaneSingletons.

Per-node subscription rather than cluster-singleton: every admin node forwards
its own snapshots to its own connected clients. Simpler than singleton
coordination + acceptable because the messages are small and SignalR fan-out
is per-node anyway.
2026-05-26 07:01:08 -04:00
Joseph Doherty 7a6b016d9e docs(plans): mark F17 complete
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:58:27 -04:00
Joseph Doherty 8f32b89fb9 feat(adminui): FleetDiagnosticsClient real Akka ActorSelection round-trip (F17)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
- New Commons.Messages.Fleet.GetDiagnostics request record.
- DriverHostActor handles GetDiagnostics in all three states (Steady, Applying,
  Stale); replies with a NodeDiagnosticsSnapshot built from _currentRevision
  + the local NodeId. Drivers list is empty until F7 wires the per-instance
  children.
- FleetDiagnosticsClient now resolves the target via ActorSelection at
  akka.tcp://{system}@{nodeId}/user/driver-host and Asks with a 3s timeout.
  On timeout/peer-down it returns an empty snapshot so the UI degrades
  gracefully rather than throwing.

Two new integration tests in Host.IntegrationTests:
- GetDiagnostics_returns_snapshot_with_target_NodeId verifies the
  cross-node Ask/Reply works.
- GetDiagnostics_after_deploy_reports_current_revision exercises the
  end-to-end path: AdminOps starts a deployment, both DriverHostActors
  apply, then diagnostics reports the new revision on both nodes.

All 98 v2 tests pass (was 96 + 2 new).
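
The degrade-gracefully behavior of the client can be illustrated with a timeout-bounded request that falls back to an empty snapshot. This is a sketch under assumptions: `ask` stands in for the Akka Ask against the remote driver-host actor, and the fields of the empty snapshot are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeDiagnosticsSnapshot:
    node_id: str
    revision: Optional[int] = None  # None means "nothing known" (assumed shape)
    drivers: tuple = ()


async def get_diagnostics(ask, node_id: str, timeout: float = 3.0):
    """Ask the target node; on timeout or peer-down return an empty snapshot."""
    try:
        return await asyncio.wait_for(ask(node_id), timeout)
    except (asyncio.TimeoutError, ConnectionError):
        # The UI renders an empty row instead of surfacing an exception.
        return NodeDiagnosticsSnapshot(node_id=node_id)
```

Returning a value rather than raising keeps the page's rendering path free of per-node error handling, at the cost of not distinguishing "down" from "slow".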
2026-05-26 06:58:11 -04:00
Joseph Doherty 337a691629 docs(plans): mark F3, F4, F5, F12 follow-ups complete
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:55:39 -04:00
Joseph Doherty b06e3ae740 feat(runtime): PeerOpcUaProbeActor real TCP-connect probe (F12)
Replaces the Ok=true stub with a TCP connect to the peer's OPC UA port (4840
default) with a 2s timeout. A successful connect indicates the OPC UA server
process is up + accepting connections — enough for the redundancy calculator
to treat the peer as live. A full secure-channel Hello/Acknowledge handshake
is overkill for what the redundancy calc consumes and would pull in the OPC
UA Client SDK + a PKI setup. Upgrade later if a deeper liveness signal is ever
required.

Probe extracts the host from NodeId by stripping the :port suffix (commit
5cfbe8b encoded host:port into NodeId for cluster-member identity).

Tests: 2 new tests — Ok=true against a live TcpListener on a chosen port,
Ok=false against an unreachable endpoint. All 17 Runtime tests pass (was 16
covering only the message-contract surface).
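
The probe described above reduces to two steps: strip the :port suffix from the NodeId, then attempt a TCP connect with a timeout. A minimal Python sketch follows, with hypothetical function names; it assumes host:port NodeIds without IPv6 literals.

```python
import socket


def host_from_node_id(node_id: str) -> str:
    """Strip a trailing :port from a host:port NodeId."""
    host, _, _ = node_id.rpartition(":")
    return host or node_id  # no colon present: the NodeId is already a host


def probe_opcua(node_id: str, port: int = 4840, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the peer's OPC UA port succeeds."""
    try:
        with socket.create_connection((host_from_node_id(node_id), port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A successful connect only shows that something is listening on the port, which matches the liveness signal the commit says the redundancy calculator needs.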
2026-05-26 06:54:51 -04:00
Joseph Doherty f57f61deac feat(audit): EventId + CorrelationId columns + filtered unique index (F3 + F4)
ConfigAuditLog gains two nullable columns (EventId, CorrelationId) + a filtered
unique index UX_ConfigAuditLog_EventId. EF migration
20260526105027_AddConfigAuditLogEventIdColumns is additive (nullable + filtered
index = legacy rows backfill cleanly).

AuditWriterActor now writes EventId + CorrelationId into the dedicated columns
instead of synthesising a JSON wrapper into DetailsJson. Cross-restart dedup
is now real: a retry of an already-flushed batch hits the unique index and
SaveChanges throws; the existing catch drops the duplicate without losing the
rest of the batch.

WrapDetails helper deleted — F4 (its JSON hardening) becomes moot.

AuditWriterActorTests.Details_wrapper_embeds_eventId_and_correlationId renamed
+ rewritten to assert against the columns. All 29 ControlPlane tests pass,
all 95 v2 tests green.
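
The dedup mechanism can be demonstrated with a filtered unique index in SQLite. This is a stand-in, not the project's code: the real implementation uses EF Core with SaveChanges, whereas this sketch inserts row by row and catches the constraint violation per row. Table and index names follow the message above; the remaining schema is assumed.

```python
import sqlite3


def open_audit_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ConfigAuditLog ("
        "Id INTEGER PRIMARY KEY, EventId TEXT NULL, "
        "CorrelationId TEXT NULL, DetailsJson TEXT)"
    )
    # Filtered index: legacy rows with NULL EventId are exempt from uniqueness.
    db.execute(
        "CREATE UNIQUE INDEX UX_ConfigAuditLog_EventId "
        "ON ConfigAuditLog(EventId) WHERE EventId IS NOT NULL"
    )
    return db


def flush_batch(db: sqlite3.Connection, batch) -> int:
    """Insert each event; skip duplicates without losing the rest of the batch."""
    written = 0
    for event_id, correlation_id, details in batch:
        try:
            with db:  # one transaction per row
                db.execute(
                    "INSERT INTO ConfigAuditLog (EventId, CorrelationId, DetailsJson) "
                    "VALUES (?, ?, ?)",
                    (event_id, correlation_id, details),
                )
            written += 1
        except sqlite3.IntegrityError:
            pass  # already flushed before a restart: drop the duplicate
    return written
```

Because the uniqueness check lives in the database rather than in an in-memory buffer, a retry after a process restart is rejected the same way as one within a single run.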
2026-05-26 06:52:53 -04:00
Joseph Doherty 8e5c8e29f7 docs(plans): mark Task 61 complete
v2-ci / build (push) Failing after 1m21s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
2026-05-26 06:49:06 -04:00
Joseph Doherty 253fb60459 ci: v2 build + unit + integration workflow, nightly E2E (Task 61)
.github/workflows/v2-ci.yml runs on push/PR to v2-akka-fuse + master:
  - build job: dotnet restore + build (Release)
  - unit-tests job: matrix over the 5 v2 test projects (Cluster, ControlPlane,
    Runtime, Security, OpcUaServer) with Category!=E2E
  - integration job: Host.IntegrationTests with Category!=E2E

.github/workflows/v2-e2e.yml runs nightly at 03:00 UTC + workflow_dispatch:
  - Brings up the docker-dev four-node fleet (admin pair + driver pair + SQL
    + LDAP + Traefik)
  - Waits up to 60s for /health/active to return 200
  - Runs Category=E2E only
  - Always tears down with -v

Both workflows pin .NET 10 via actions/setup-dotnet (no global.json so any
10.0 SDK works). Compatible with both GitHub Actions and Gitea Actions
(act_runner). The E2E filter currently matches zero tests because the
tests/Server/ZB.MOM.WW.OtOpcUa.E2ETests project doesn't exist yet — it lands
when F10/F11/F12 wire enough engine for an end-to-end round-trip to be
meaningful.
2026-05-26 06:48:52 -04:00
Joseph Doherty 8ac71db464 docs(plans): mark Tasks 62, 63, 64, 65 complete 2026-05-26 06:46:55 -04:00
Joseph Doherty 7e3b56c27d feat(deploy): Traefik active-leader routing + docker-dev compose (Task 63)
- scripts/install/traefik.yml + traefik-dynamic.yml: Traefik static + dynamic
  config. One :80 entry point, one router on HostRegexp(otopcua.*), one
  service load-balancing admin-a:9000 + admin-b:9000 with /health/active health
  check (interval 5s, timeout 2s, expected 200). Followers return 503 from
  /health/active so Traefik drops them within the next interval after a
  leadership change.

- scripts/install/Install-Traefik.ps1: downloads Traefik for Windows, drops the
  yml configs, registers the OtOpcUaTraefik Windows service via sc.exe with
  restart-on-failure. Companion to Install-Services.ps1.

- docker-dev/{Dockerfile,docker-compose.yml,traefik-dynamic.yml,README.md}:
  Mac-friendly four-node fleet (admin-a + admin-b + driver-a + driver-b) plus
  SQL Server 2022 + OpenLDAP + Traefik. Single OtOpcUa.Host image built once;
  Compose drives OTOPCUA_ROLES + Cluster:* per container to differentiate the
  four hosts. README walks through bring-up + failover smoke + the dev LDAP
  users.

Note: untested on macOS (no local Docker — see docs/v2/dev-environment.md).
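
The active-leader routing rule from the first bullet amounts to a filter over health-check results. The sketch below models only that rule, not Traefik itself: `routable_backends` is a hypothetical name, and the status map stands in for the periodic /health/active checks.

```python
def routable_backends(health: dict[str, int]) -> list[str]:
    """Keep only backends whose last /health/active check returned 200.

    Followers answer 503, so after a leadership change they fall out
    of rotation on the next check interval.
    """
    return [backend for backend, status in health.items() if status == 200]


before = routable_backends({"admin-a:9000": 200, "admin-b:9000": 503})
after = routable_backends({"admin-a:9000": 503, "admin-b:9000": 200})
print(before, after)
```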
2026-05-26 06:46:40 -04:00
Joseph Doherty e40615dad5 feat(install): rewrite Install/Refresh/Uninstall-Services.ps1 for v2 fused Host (Task 62)
- Install-Services.ps1: installs OtOpcUaHost (single fused binary) replacing
  the v1 OtOpcUa + OtOpcUaAdmin pair. Required -Roles param writes OTOPCUA_ROLES
  to the service env so Program.cs decides what to mount (admin / driver / both).
  -HttpPort param (default 9000) writes ASPNETCORE_URLS on admin-role nodes.
  sc.exe restart-on-failure: 5s, 30s, 60s; reset counter after 24h clean run.
  Wonderware historian sidecar install logic preserved from v1.

- Uninstall-Services.ps1: removes OtOpcUaHost + cleans up legacy v1 names
  (OtOpcUa, OtOpcUaAdmin) and the long-retired OtOpcUaGalaxyHost.

- Refresh-Services.ps1: updated service names (OtOpcUa -> OtOpcUaHost), publish
  path (ZB.MOM.WW.OtOpcUa.Server -> ZB.MOM.WW.OtOpcUa.Host), process names
  (OtOpcUa.Server -> OtOpcUa.Host). Switched nssm stop/start calls to
  Stop-Service/Start-Service so the script works whether the underlying
  service was installed via nssm or sc.exe.
2026-05-26 06:44:35 -04:00
Joseph Doherty 1689901c0e docs(v2): Architecture-v2 + Cluster + ControlPlane + Runtime overviews (Task 65)
Four new docs at docs/v2/ giving a single-page tour of each v2 piece:
- Architecture-v2.md: top-level mental model (fused Host + roles + cluster + live-edit)
- Cluster.md: AkkaClusterOptions + IClusterRoleInfo + WithOtOpcUaClusterBootstrap
- ControlPlane.md: 5 admin singletons + DPS topics + deploy flow + failover recovery
- Runtime.md: per-node actor tree + state machines + engine-wiring follow-up map

Each links back to the design doc for depth. Architecture-v2 cross-references
the other three + ServiceHosting + Redundancy + security.
2026-05-26 06:41:48 -04:00
Joseph Doherty 3c3fef911c docs: v2 updates to Redundancy, ServiceHosting, security, README (Task 64)
- Redundancy.md: full rewrite — Akka-leader-driven ServiceLevel replaces
  operator-managed RedundancyRole. Documents the 5-tier ServiceLevelCalculator,
  RedundancyStateActor cluster singleton, and the DPS data flow.

- ServiceHosting.md: full rewrite — single fused OtOpcUa.Host binary with
  OTOPCUA_ROLES env gating. Documents the conditional DI graph and the new
  health endpoints (/health/ready, /health/active, /healthz).

- security.md: v2 banner at top covering path/project renames + new JWT bearer
  + DataProtection persisted to ConfigDb. Body unchanged because the 4-concern
  security model is unchanged in v2; full per-section rewrite waits for F15
  (Admin pages migration) since security.md references many pages that move.

- README.md: platform overview updated to v2 (fused Host + role gating).
2026-05-26 06:38:55 -04:00
Joseph Doherty a8becc9c46 docs(plans): mark Task 59 complete; track F22 failover scenarios 2026-05-26 06:35:03 -04:00
Joseph Doherty 5cfbe8b5dd test(host): deploy happy-path + idempotency integration tests (Task 59)
DeployHappyPathTests exercises the full deploy pipeline on the 2-node harness:
AdminOperationsActor → ConfigPublishCoordinator → DistributedPubSub →
DriverHostActor on both nodes → ApplyAck → coordinator seals. Verifies both
NodeDeploymentState rows reach Applied and Deployment.Status reaches Sealed.

Exposed + fixed two production bugs along the way:

1. Coordinator was publishing DispatchDeployment on the "deployments" topic but
   never subscribed to anything — DriverHostActor ACKs published on the same
   topic could not reach it. Added dedicated "deployment-acks" topic with
   coordinator subscription in PreStart, and DriverHostActor publishes ACKs
   there.

2. NodeId derivation used member.Address.Host only — two cluster members on a
   shared loopback host (test harness, dev VMs) collided to one identity. The
   coordinator's expected-ack set became {1} and the system sealed after only
   half the nodes acked. Switched to host:port everywhere (ClusterRoleInfo +
   coordinator) so loopback nodes stay distinct and production identities are
   harmlessly more specific.

Tests: 95 v2 tests pass (was 93 + 2 deploy tests), 0 skipped.

Failover scenarios (design §8 cases 3-7: node-kill-mid-apply, split-brain,
restart-during-deploy) deferred — they need controlled node-down primitives
on the harness. Tracked as F22 (failover scenario test cases).
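
The second bug is easy to reproduce in isolation. In this sketch the member addresses and ports are made up for illustration; it shows how a host-only identity collapses two loopback members into one expected ack, while host:port keeps them distinct.

```python
def node_id_host_only(address: tuple[str, int]) -> str:
    return address[0]


def node_id_host_port(address: tuple[str, int]) -> str:
    return f"{address[0]}:{address[1]}"


# Two cluster members sharing a loopback host (hypothetical ports).
members = [("127.0.0.1", 4053), ("127.0.0.1", 4054)]

expected_acks_before = {node_id_host_only(m) for m in members}
expected_acks_after = {node_id_host_port(m) for m in members}

# Before the fix the coordinator waits for one ack and seals early.
print(len(expected_acks_before), len(expected_acks_after))
```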
2026-05-26 06:34:36 -04:00
Joseph Doherty 62e3cd6599 docs(plans): mark Task 58 complete; track F21 docker-compose follow-up 2026-05-26 06:27:33 -04:00
Joseph Doherty d6fac2d81d test(host): 2-node integration test harness + consolidate to one ActorSystem (Task 58)
Builds TwoNodeClusterHarness: two in-process Host-equivalent nodes sharing
an in-memory ConfigDb. Forms a 2-member Akka cluster. ClusterFormationTests
proves both nodes see each other as admin+driver role members.

Fixes a real production bug uncovered while wiring the harness — Program.cs
ran two separate ActorSystems (one from AddOtOpcUaCluster.AkkaHostedService
with cluster HOCON, one from Akka.Hosting.AddAkka with bare HOCON). Cluster
singletons landed on the bare ActorSystem and could not actually form a
cluster ("Configuration does not contain `akka.cluster` node").

Consolidation:
- AddOtOpcUaCluster now only binds AkkaClusterOptions + registers IClusterRoleInfo
- New WithOtOpcUaClusterBootstrap pushes embedded HOCON + Remote/Cluster options
  into Akka.Hosting's AkkaConfigurationBuilder
- AkkaHostedService.cs deleted — Akka.Hosting now owns the lifecycle
- Program.cs + harness call WithOtOpcUaClusterBootstrap inside AddAkka

Why not WebApplicationFactory<Program>? Program.cs reads OTOPCUA_ROLES from
process env (shared across in-process WAFs); the harness replays Program.cs's
DI graph from a clean WebApplicationBuilder per node with per-node config
overrides. Same production extensions, isolated config + Kestrel + Akka ports.

Tests: 93 v2 tests pass (was 91 + 2 new cluster formation), 0 skipped.
2026-05-26 06:27:04 -04:00
Joseph Doherty bb353c4d43 docs(plans): mark F1, F2, F6, F18, F19 follow-ups complete
Post-compaction batch of 5 follow-ups landed:
- F19 (09d6676): WithOtOpcUaRuntimeActors extension for driver-role spawn
- F1  (463512d): AuthEndpoints integration tests via TestServer
- F6  (dfc143c): RedundancyStateActor broadcast override + un-skip 2 tests
- F18 (b266f63): User.Identity.Name into Deployments createdBy
- F2  (45a8c79): JwtBearer validation via IPostConfigureOptions
2026-05-26 06:18:31 -04:00
Joseph Doherty 45a8c79ffe refactor(security): JwtBearer validation via IPostConfigureOptions (F2)
Eliminates the services.BuildServiceProvider() captive-provider antipattern
(ASP0000) inside AddJwtBearer. The new ConfigureJwtBearerFromTokenService
resolves JwtTokenService from the real DI container at runtime and stays
in lock-step with JwtTokenService.BuildValidationParameters.

All 27 Security.Tests stay green, including the F1 integration tests that
exercise /auth/token through the real bearer pipeline.
2026-05-26 06:18:00 -04:00
Joseph Doherty b266f63cd7 feat(adminui): thread User.Identity.Name into Deployments createdBy (F18)
Injects AuthenticationStateProvider and reads the current user's identity
name on Deploy click, replacing the "(current user)" placeholder.
Anonymous case falls back to "(anonymous)" — should never hit in practice
since the page requires FleetAdmin/ConfigEditor.
2026-05-26 06:17:53 -04:00
Joseph Doherty dfc143cdeb feat(controlplane): RedundancyStateActor broadcast override + un-skip tests (F6)
Mirrors the publisher-injection pattern from FleetStatusBroadcaster and
PeerOpcUaProbeActor: Props accepts an optional Action<object> override so
tests can use a TestProbe sink instead of bootstrapping DistributedPubSub
(unreliable single-node in TestKit).

Un-skips the two RedundancyStateActor tests deferred under F6.
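
The publisher-injection pattern can be sketched as an optional callable that replaces the real transport in tests. This is a Python analogue of the optional Action<object> parameter, not the actor's actual code; the class and method names are hypothetical.

```python
from typing import Callable, Optional


class RedundancyStatePublisher:
    def __init__(self, publish: Optional[Callable[[object], None]] = None):
        # Tests pass a sink; production falls back to the real transport.
        self._publish = publish if publish is not None else self._publish_to_pubsub

    def _publish_to_pubsub(self, message: object) -> None:
        # Stand-in for the DistributedPubSub publish, unavailable in this sketch.
        raise NotImplementedError("real pub/sub transport is not part of this sketch")

    def on_topology_changed(self, snapshot: object) -> None:
        self._publish(snapshot)


# Test-style usage: capture what would have been broadcast.
sink: list = []
RedundancyStatePublisher(publish=sink.append).on_topology_changed({"leader": "node-a"})
```

The override keeps the unit test focused on what the actor publishes rather than on whether pub/sub has finished bootstrapping.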
2026-05-26 06:16:32 -04:00
Joseph Doherty 463512d1d8 test(security): AuthEndpoints integration tests via TestServer (F1)
7 tests exercise AddOtOpcUaAuth + MapOtOpcUaAuth end-to-end against an
in-memory ConfigDb + stub ILdapAuthService. Covers /auth/login (204/401/503),
/auth/ping (401/200), /auth/token (200+JWT shape), /auth/logout (204+clear-cookie).

Scope is the auth contract — not the fused Host bootstrap (cluster + role
gating belongs in the Task 58 multi-node harness). HostBuilder + TestServer
is used directly instead of WebApplicationFactory<Program> because the
test project has no Program entry point and Host needs Akka cluster up.
2026-05-26 06:15:07 -04:00
Joseph Doherty 09d6676e1f feat(runtime): WithOtOpcUaRuntimeActors extension for driver-role node startup (F19)
Mirrors WithOtOpcUaControlPlaneSingletons for the driver role. Spawns
DriverHostActor + DbHealthProbeActor on the host's ActorSystem and
registers both under marker keys. Host's Program.cs now calls it when
the node carries the driver role, so driver-only and admin+driver
deployments both auto-bootstrap the per-node actors.

Integration test covers the registration round-trip via Microsoft.Extensions.Hosting
+ Akka.Hosting AddAkka.
2026-05-26 06:09:37 -04:00
Joseph Doherty 698709a578 docs(plans): mark Tasks 56+57 complete 2026-05-26 05:39:07 -04:00
Joseph Doherty 76310b8829 chore(cleanup): delete OtOpcUa.Server, OtOpcUa.Admin, and obsolete v1 tests
Task 56: removes the legacy in-process Server + Admin Web project + their test
projects (Server.Tests, Admin.Tests, Admin.E2ETests). The fused OtOpcUa.Host
binary built across Phases 1-9 is now the sole production entry point.

What happened to the 47 legacy Admin Blazor pages: per follow-up F15, the
v1 architecture's draft/publish UX is replaced by v2's live-edit + snapshot-
deploy model, so a 1:1 migration is not meaningful. The mechanical move via
git mv preserves the history; service classes + page bodies that referenced
removed v1 types (ConfigGeneration, RedundancyRole, GenerationId) were
deleted. AdminUI now ships a minimal Home page + the v2 Deployments page.

Per-page rebuild against the v2 surface is tracked as F15. The v2 Deployments
page (Task 52) is the only first-party UI shipping in this PR.

Task 57: solution build green; 84+ tests green across active v2 + legacy
driver test projects.
2026-05-26 05:38:31 -04:00
Joseph Doherty 2b75ce3876 docs(plans): mark Phase 9 tasks 53-55 complete; track F19/F20 follow-ups 2026-05-26 05:23:48 -04:00
Joseph Doherty 8b4de8080b feat(runtime): DEV-STUB mode for Windows-only drivers on non-Windows or dev role 2026-05-26 05:23:02 -04:00
Joseph Doherty fa1d685ccd feat(host): health endpoints + per-environment appsettings layout 2026-05-26 05:23:01 -04:00
Joseph Doherty e2b357f89a feat(host): role-gated Program.cs composes all v2 components 2026-05-26 05:22:59 -04:00
Joseph Doherty eb4280b7eb docs(plans): add F15-F18 follow-ups for Phase 8 deferred scope 2026-05-26 05:19:02 -04:00
Joseph Doherty 8a1f97b27f docs(plans): mark Phase 8 tasks 48-52 complete; track F15-F18 follow-ups 2026-05-26 05:18:37 -04:00
Joseph Doherty f167808a2c feat(adminui): Deployments page with drift indicator and Deploy button 2026-05-26 05:18:00 -04:00
Joseph Doherty b83f099394 feat(adminui): IFleetDiagnosticsClient skeleton (Akka round-trip tracked as F17) 2026-05-26 05:17:59 -04:00
Joseph Doherty f022499e7f feat(adminui): IAdminOperationsClient backed by ClusterSingletonProxy 2026-05-26 05:17:58 -04:00
Joseph Doherty 26d8f2f620 feat(adminui): FleetStatusHub + AlertHub + MapOtOpcUaHubs (broadcaster bridge tracked as F16) 2026-05-26 05:17:56 -04:00
Joseph Doherty 1a067e609c refactor(adminui): MapAdminUI extension + AddAdminUI DI (47-component migration tracked as F15) 2026-05-26 05:17:55 -04:00
Joseph Doherty 5e31449529 docs(plans): mark Phase 7 tasks 46+47 complete; track F13/F14 full-extraction follow-ups 2026-05-26 05:15:18 -04:00
Joseph Doherty b7c117ab31 feat(opcua): pure Phase7Composer + purity tests (side-effects tracked as F14) 2026-05-26 05:14:45 -04:00
Joseph Doherty 2877a883cd feat(opcua): OpcUaApplicationHost facade in OpcUaServer (full extraction tracked as F13) 2026-05-26 05:14:39 -04:00
Joseph Doherty 2e4f1399bb docs(plans): mark Phase 6 tasks 39-45 complete (race-recovered commit) 2026-05-26 05:10:21 -04:00
Joseph Doherty e31547d00e docs(plans): mark Phase 6 tasks 37-45 complete; track F7-F12 engine-wiring follow-ups 2026-05-26 05:09:52 -04:00
Joseph Doherty 28639cb14d feat(runtime): HistorianAdapter + PeerOpcUaProbe + DbHealthProbe actors (engine wiring tracked as F11/F12) 2026-05-26 05:09:06 -04:00
Joseph Doherty e115f13104 feat(runtime): OpcUaPublishActor on synchronized dispatcher (SDK wiring tracked as F10) 2026-05-26 05:09:04 -04:00
Joseph Doherty 95ef533822 feat(runtime): ScriptedAlarmActor state machine (engine wiring tracked as F9) 2026-05-26 05:09:03 -04:00
Joseph Doherty 39729bfe21 feat(runtime): VirtualTagActor skeleton (engine wiring tracked as F8) 2026-05-26 05:09:01 -04:00
Joseph Doherty 64c627f8d6 feat(runtime): DriverInstanceActor state machine with Connecting/Connected/Reconnecting 2026-05-26 05:05:36 -04:00
Joseph Doherty ed130135ca feat(runtime): DriverHostActor state machine with PreStart recovery + DispatchDeployment + stale fallback 2026-05-26 05:02:42 -04:00
Joseph Doherty ea6f972e96 docs(plans): mark Phase 5 tasks 30-36 complete with commit hashes 2026-05-26 04:57:37 -04:00
Joseph Doherty 52bf4b3371 feat(controlplane): WithOtOpcUaControlPlaneSingletons registration extension (admin role) 2026-05-26 04:57:09 -04:00
Joseph Doherty dd122c4ca9 feat(controlplane): FleetStatusBroadcaster push-driven from cluster events + heartbeats 2026-05-26 04:57:07 -04:00
Joseph Doherty f193872891 feat(controlplane): ConfigPublishCoordinator deadline timeout + failover PreStart recovery 2026-05-26 04:57:05 -04:00
Joseph Doherty bad2aef137 docs(plans): track F5 multi-node coordinator test + F6 RedundancyState publisher refactor 2026-05-26 04:53:32 -04:00
Joseph Doherty 6b37f997ad feat(controlplane): RedundancyStateActor with debounced topology publish 2026-05-26 04:53:31 -04:00
Joseph Doherty 62e12dab95 feat(controlplane): ConfigPublishCoordinator happy path with NodeDeploymentState seeding 2026-05-26 04:53:29 -04:00
Joseph Doherty ef683f5073 feat(controlplane): AdminOperationsActor + ConfigComposer + StartDeployment flow 2026-05-26 04:53:28 -04:00
Joseph Doherty 9f61cd5989 test(controlplane): self-join cluster + DistributedPubSub extension in test harness 2026-05-26 04:53:25 -04:00
Joseph Doherty 9582e448d5 docs(plans): track F4 WrapDetails JSON hardening follow-up 2026-05-26 04:46:18 -04:00
Joseph Doherty 1955bc5f4d docs(plans): mark Task 33+35 partial complete; track F3 audit-idempotency follow-up 2026-05-26 04:44:37 -04:00
Joseph Doherty 23f669c376 feat(controlplane): AuditWriterActor with batched in-buffer-dedup insert 2026-05-26 04:44:01 -04:00
Joseph Doherty 14acab5a58 feat(controlplane): ServiceLevelCalculator + ControlPlane.Tests harness 2026-05-26 04:43:59 -04:00
Joseph Doherty 32574b3e4e docs(plans): track JwtBearer DI antipattern as follow-up F2 2026-05-26 04:39:10 -04:00
Joseph Doherty fc22d4f7b6 docs(plans): track AuthEndpoints integration tests as follow-up F1 2026-05-26 04:37:49 -04:00
Joseph Doherty 973a3d1b9a docs(plans): mark Tasks 24-29 complete in tasks.json 2026-05-26 04:36:17 -04:00
Joseph Doherty 38ea0c5086 test(security): cookie+JWT roundtrip, role mapper, LDAP escape/RDN helpers 2026-05-26 04:35:51 -04:00
Joseph Doherty e38f22e3c2 feat(security): CookieAuthenticationStateProvider for Blazor circuit expiry detection 2026-05-26 04:35:50 -04:00
Joseph Doherty 8be84ba27b feat(security): /auth/login, /auth/ping, /auth/token endpoints 2026-05-26 04:35:49 -04:00
Joseph Doherty 207fc6aba9 feat(security): cookie+JWT hybrid auth via AddOtOpcUaAuth 2026-05-26 04:35:48 -04:00
Joseph Doherty 93316e3431 feat(security): JwtTokenService with HS256 + 15-min expiry 2026-05-26 04:35:46 -04:00
Joseph Doherty 567b8cac1d refactor(security): move LdapAuthService into OtOpcUa.Security library 2026-05-26 04:35:42 -04:00
Joseph Doherty f35925b57e docs(plans): mark Tasks 19-23 complete in tasks.json 2026-05-26 04:31:28 -04:00
Joseph Doherty e0b6d5680b test(cluster): HOCON parses, role parser truth table 2026-05-26 04:31:08 -04:00
Joseph Doherty c217c49f69 feat(cluster): ClusterRoleInfo wraps Akka.Cluster for app-facing role queries 2026-05-26 04:31:07 -04:00
Joseph Doherty dfb06368cd feat(cluster): parse OTOPCUA_ROLES env var with validation 2026-05-26 04:31:06 -04:00
Joseph Doherty f184f8ed1b feat(cluster): AkkaHostedService and DI extension 2026-05-26 04:31:05 -04:00
Joseph Doherty 3d0f4dc168 feat(cluster): embed Akka HOCON config matching ScadaLink tuning 2026-05-26 04:31:03 -04:00
Joseph Doherty fdb4ac7051 docs(plans): mark Tasks 15-18 complete in tasks.json 2026-05-26 04:27:41 -04:00
Joseph Doherty 136234e7f2 feat(commons): add cluster/admin/diagnostics client interfaces 2026-05-26 04:27:19 -04:00
Joseph Doherty 5d3a5a40d7 feat(commons): add deploy/admin/audit/redundancy/fleet message contracts 2026-05-26 04:27:18 -04:00
Joseph Doherty fee4a8c008 feat(commons): add correlation/execution/node/deployment/revisionhash types 2026-05-26 04:26:01 -04:00
Joseph Doherty c168c1c9c6 feat(migration): add Migrate-To-V2.ps1 idempotent migration runner 2026-05-26 04:26:01 -04:00
Joseph Doherty 605dbf3dcc feat(configdb): V2HostingAlignment migration consolidating Phase 1a-1e
Phase 1f — the consolidator migration. Closes out the v2 entity-model
rewrite by emitting a single EF migration that captures the cumulative
schema delta from 14a (RowVersion) through 14e (drop generation entities).

Generated: src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/
              20260526081556_V2HostingAlignment.cs           (1562 lines)
              20260526081556_V2HostingAlignment.Designer.cs

Migration shape (per `grep -nE migrationBuilder.\(...)`):

  Drop  12 ForeignKey constraints (one per live-edit entity's GenerationId FK)
  Drop  2  Tables  (ConfigGeneration, ClusterNodeGenerationState)
  Drop  45 Indexes (every UX_*_Generation_* and IX_*_Generation_* across the
                    13 live-edit tables — 1 also dropped the unique-Primary
                    filtered index UX_ClusterNode_Primary_Per_Cluster)
  Drop  13 Columns (12 GenerationId + 1 RedundancyRole)
  Add   12 RowVersion columns (one per live-edit entity)
  Create 4  Tables (Deployment, NodeDeploymentState, ConfigEdit,
                    DataProtectionKeys)
  Create ~45 Indexes (recreated under the new naming pattern
                      UX_<Table>_LogicalId / UX_<Table>_<X> with the
                      GenerationId column stripped from composite keys)
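
A tally like the one above can be reproduced with a small shell sketch. This is an illustration only: it assumes the migration uses the usual `migrationBuilder.<Operation>(` call form, the helper name `count_migration_ops` and the stand-in sample file are hypothetical, and the counts cover both `Up` and `Down` unless the input is restricted to one method first.

```shell
#!/bin/sh
# Hypothetical helper: tally migrationBuilder operations in an EF migration file.
# Prints one "<Operation> <count>" line per distinct operation, sorted by name.
count_migration_ops() {
    # -o prints only the matched text; sort + uniq -c groups identical operations.
    grep -oE 'migrationBuilder\.[A-Za-z]+' "$1" \
        | sed 's/^migrationBuilder\.//' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Demo against a tiny stand-in file (NOT the real V2HostingAlignment migration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
migrationBuilder.DropForeignKey(name: "FK_A", table: "A");
migrationBuilder.DropForeignKey(name: "FK_B", table: "B");
migrationBuilder.DropTable(name: "ConfigGeneration");
migrationBuilder.AddColumn<byte[]>(name: "RowVersion", table: "A");
EOF
count_migration_ops "$sample"
rm -f "$sample"
```

Pointing the helper at the generated `*_V2HostingAlignment.cs` file would give per-operation counts to compare against the shape listed above.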

Notable EF quirks accepted:
  Unique-on-required-column indexes (UX_VirtualTag_LogicalId etc.) ship a
  `filter: "[VirtualTagId] IS NOT NULL"` clause that EF auto-inserts for
  SQL Server. Harmless — the column is C#-side `required` so NULL never
  appears.

Verification:
  dotnet build src/Core/ZB.MOM.WW.OtOpcUa.Configuration          -> 0 errors
  dotnet ef migrations script --idempotent (against placeholder DSN)
                                                                 -> 3259-line
                                                                    .sql produced
                                                                    OK
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests                -> 0 errors

Live `dotnet ef database update` against a scratch SQL Server deferred to
Task 15 (Migrate-To-V2.ps1) — SSH to the docker host needs a key/password I
don't have, and the always-on SQL at 10.100.0.35,14330 uses Integrated
Security (Windows auth, unreachable from this macOS dev). The migration
itself is structurally correct by construction (EF tooling generated it
against the live DbContext model); the live-DB confidence step is the
PowerShell wrapper's job.

SchemaComplianceTests updates:
  - All_expected_tables_exist: removed ConfigGeneration +
    ClusterNodeGenerationState; added Deployment, NodeDeploymentState,
    ConfigEdit, DataProtectionKeys.
  - Filtered_unique_indexes_match_schema_spec: removed entries for
    UX_ClusterNode_Primary_Per_Cluster (Task 14d) and
    UX_ConfigGeneration_Draft_Per_Cluster (Task 14e). Two filtered uniques
    remain (UX_ClusterNodeCredential_Value, UX_ExternalIdReservation_KindValue_Active).
  - Check_constraints_match_schema_spec: added CK_ConfigEdit_FieldsJson_IsJson.

StoredProceduresTests update:
  - Removed RedundancyRole + 'Primary' from the raw INSERT into ClusterNode
    so the DB-backed test runs against the new schema.
2026-05-26 04:18:50 -04:00
Joseph Doherty e00f46d723 refactor(configdb): delete ConfigGeneration + ClusterNodeGenerationState
Phase 1e of the v2 entity-model rewrite. With the FKs gone (Task 14b) and
the apply pipeline replaced (Task 14c), the v1 draft/publish entities have
no remaining v2 consumers.

Deleted entity classes:
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigGeneration.cs
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ClusterNodeGenerationState.cs

Deleted enum classes (no v2 consumers):
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/GenerationStatus.cs
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodeApplyStatus.cs

OtOpcUaConfigDbContext changes:
  - Removed DbSet<ConfigGeneration> ConfigGenerations
  - Removed DbSet<ClusterNodeGenerationState> ClusterNodeGenerationStates
  - Removed ConfigureConfigGeneration(modelBuilder) call + method body
  - Removed ConfigureClusterNodeGenerationState(modelBuilder) call + body
  - Tidied the "v2 deploy-model tables" header comment

Navigation property cleanup:
  - ServerCluster.Generations collection -> removed
  - ClusterNode.GenerationState navigation -> removed

doc-comment cref cleanup (replaced <see cref="X"/> with <c>X</c> for the
deleted types so the C# XML comment compiler doesn't fail with CS1574):
  - Deployment.cs (cref to ConfigGeneration)
  - NodeDeploymentState.cs (cref to ClusterNodeGenerationState)
  - Core/OpcUa/EquipmentNodeWalker.cs (cref to ConfigGeneration in the
    EquipmentNamespaceContent record's doc-comment; while there, removed
    "All four collections are scoped to the same ConfigGeneration" since
    that's no longer true in v2)

Verification:
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration            -> 0 errors
  src/Core/ZB.MOM.WW.OtOpcUa.Core                     -> 0 errors
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests    -> 0 errors
  tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests             -> 0 errors
  whole solution                                       -> 15 errors
    (all in Server/Admin; transitive Server.Tests/Admin.Tests skip per the
    parent's failure, so the per-project count dropped vs Task 14d's 71)
2026-05-26 04:14:55 -04:00
Joseph Doherty 3c915e652e refactor(configdb): drop ClusterNode.RedundancyRole (replaced by Akka leader)
Phase 1d of the v2 entity-model rewrite. The static RedundancyRole column
is replaced by Akka cluster's role-leader-of-"driver" election at runtime
(see RedundancyStateActor + ServiceLevelCalculator in Task 35).

Changes:

  - Removed `public required RedundancyRole RedundancyRole` from
    ClusterNode entity.
  - Removed `e.Property(x => x.RedundancyRole).HasConversion<string>()...`
    mapping from OtOpcUaConfigDbContext.ConfigureClusterNode.
  - Removed the `UX_ClusterNode_Primary_Per_Cluster` filtered unique index
    (filter referenced [RedundancyRole]='Primary').
  - Dropped `using ZB.MOM.WW.OtOpcUa.Configuration.Enums` from ClusterNode.cs
    (no longer needed).
  - Deleted `Enums/RedundancyRole.cs` — the enum is unused in v2-kept code.
  - DraftValidator: dropped the "exactly one Primary per cluster"
    validation block. Comment in place explaining v2 picks primary at
    runtime via Akka.
  - DraftValidatorTests: dropped ValidateClusterTopology_flags_multiple_Primary
    test; reworked BuildNode helper to no longer take a `role` argument.

Untouched (Server + Admin still reference RedundancyRole; accepted broken
per Task 56 policy):

  src/Server/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{ClusterTopologyLoader,
    RedundancyStatePublisher, RedundancyTopology, ServiceLevelCalculator}.cs
  src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/RedundancyMetrics.cs

DB-runtime tests will fail against the new schema (Task 14f's migration
drops the column). The affected tests are updated in Task 14f:

  - SchemaComplianceTests.cs:55 (expected filtered index list)
  - StoredProceduresTests.cs:263 (raw INSERT names the column)

Verification:
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration            -> 0 errors
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests    -> 0 errors
  whole solution                                       -> 71 errors
    (70 from Task 14b in Server/Admin, +1 new Server/Redundancy reference)
2026-05-26 04:11:57 -04:00
Joseph Doherty 1ddf8bb50e refactor(configdb): delete v1 Apply pipeline (replaced by AdminOperationsActor)
Phase 1c of the v2 entity-model rewrite. Deletes the draft/publish lifecycle
machinery that v2 replaces with AdminOperationsActor + ConfigComposer +
DriverInstanceActor.ApplyDelta.

Deleted (6 files):

  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/
    IGenerationApplier.cs   — interface for the apply pipeline
    GenerationApplier.cs    — the v1 applier coordinating per-driver hook-back
    GenerationDiff.cs       — typed wrapper over the sp_ComputeGenerationDiff
                              SQL output
    ApplyCallbacks.cs       — per-driver hook surface invoked by the applier
    ChangeKind.cs           — enum {Added, Modified, Removed, Unchanged}

  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/GenerationApplierTests.cs

The empty Apply/ directory is removed.

Kept (repurposed in Task 39 for stale-config fallback):

  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs
  src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/GenerationSealedCacheTests.cs
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ResilientConfigReaderTests.cs

Naming rename (GenerationSealedCache -> DeploymentArtifactCache) deferred
to Task 39 (DriverHostActor stale-config fallback) where the consumer is
written. The type stays available under its v1 name until then.

IDriver.cs doc-comment: replaced the "Used by IGenerationApplier..." sentence
with "Invoked by the v2 DriverInstanceActor when ApplyDelta reports that only
this driver's config changed in the new deployment."

Server/Admin breakage from Task 14b unchanged (70 errors). Configuration +
Core.Tests + Configuration.Tests stay green.

  src/Core/ZB.MOM.WW.OtOpcUa.Configuration  -> 0 errors
  tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests  -> 0 errors
  whole solution  -> 70 errors (all in Server/Admin)
2026-05-26 04:09:17 -04:00
Joseph Doherty 13d3aeab09 refactor(configdb): drop GenerationId FK from live-edit entities
Phase 1b of the v2 entity-model rewrite. The design's live-edit model means
the 12 v2 live-edit entities no longer carry a generation scope — they're
edited directly via AdminOperationsActor, with RowVersion (added in Task 14a)
providing last-write-wins detection.

Entity changes (12 files):

  Equipment, DriverInstance, Device, Tag, PollGroup, Namespace,
  UnsArea, UnsLine, NodeAcl, Script, VirtualTag, ScriptedAlarm

  - Removed: public long GenerationId
  - Removed: public ConfigGeneration? Generation (navigation)

DbContext changes (OtOpcUaConfigDbContext.cs):

  - Removed 12 HasOne(x => x.Generation).WithMany().HasForeignKey... mappings
  - Rewrote ~36 indexes: dropped the GenerationId column from each composite
    key, renamed UX_<Table>_Generation_<X> -> UX_<Table>_<X> and
    IX_<Table>_Generation_<X> -> IX_<Table>_<X>. Logical IDs become globally
    unique (UX_<Table>_LogicalId on the LogicalId column alone).
  - Removed Namespace's redundant UX_Namespace_Generation_LogicalId_Cluster
    index (subsumed by the new UX_Namespace_LogicalId).

Core.Tests fixtures (4 files):

  Removed "GenerationId = 1," lines from:
    - PermissionTrieBuilderTests.cs (NodeAcl Row factory)
    - PermissionTrieTests.cs (NodeAcl Row factory)
    - TriePermissionEvaluatorTests.cs (NodeAcl Row factory + 2 gen{1,5}Row
      mutations that test stale-generation evaluation; the trie itself still
      carries a generation tag via PermissionTrie.GenerationId, fed in via
      PermissionTrieBuilder.Build's generationId parameter, so the tests
      still exercise the production code path)
    - EquipmentNodeWalkerTests.cs (Area/Line/Eq/Tag/VirtualTag/ScriptedAlarm
      builders)

Expected breakage (accepted per Task 56 policy):

  src/Server/ZB.MOM.WW.OtOpcUa.Server   ~25 errors  (DriverInstanceBootstrapper,
                                                     AuthorizationBootstrap,
                                                     EquipmentNamespaceContentLoader,
                                                     Phase7Composer, ...)
  src/Server/ZB.MOM.WW.OtOpcUa.Admin    ~45 errors  (VirtualTags.razor,
                                                     ScriptedAlarms.razor,
                                                     DriverInstanceService,
                                                     EquipmentService,
                                                     EquipmentImportBatchService,
                                                     UnsService,
                                                     FocasDriverDetailService,
                                                     ...)

Server.Tests, Admin.Tests, Admin.E2ETests also break transitively (they
project-reference Server/Admin). All deleted in Task 56.

Verification:
  dotnet build src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
  dotnet build tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests  -> 0 errors
  dotnet build tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
  dotnet build (whole solution) -> 70 errors, all in Server/Admin
2026-05-26 04:06:25 -04:00
Joseph Doherty 4bb4ad8acb feat(configdb): add RowVersion to live-edit entities
Phase 1a of the v2 entity-model rewrite. Adds:

  public byte[] RowVersion { get; set; } = Array.Empty<byte>();

and the EF Core mapping

  e.Property(x => x.RowVersion).IsRowVersion();

to 12 live-edit entities:

  Equipment, DriverInstance, Device, Tag, PollGroup, Namespace,
  UnsArea, UnsLine, NodeAcl, Script, VirtualTag, ScriptedAlarm

These are the entities that v2 admins will edit directly via
AdminOperationsActor (no draft staging). RowVersion enables
last-write-wins detection when two operators race on the same row.

GenerationId FKs are still in place on these entities (removed in Task 14b);
this commit only adds the rowversion column so the migration in Task 14f can
emit ADD COLUMN before DROP FK as a single atomic step.
2026-05-26 03:58:58 -04:00
Joseph Doherty 990ce343fe docs(plans): split Task 14 into 14a-14f (entity-model rewrite)
The original Task 14 (5-min EF migration that "drops ConfigGeneration") was
under-scoped: the design doc (live-edit model, ~line 208) requires removing
GenerationId from 12 entities (Equipment, DriverInstance, Device, Tag,
PollGroup, Namespace, UnsArea, UnsLine, NodeAcl, Script, VirtualTag,
ScriptedAlarm) and adding RowVersion columns for last-write-wins detection.
That cascades into GenerationApplier / GenerationDiff / GenerationSealedCache
and the legacy Server/Admin CRUD services.

New decomposition (~85 min total, replacing the original 5-min estimate):

  14a  standard   10m  Add RowVersion to live-edit entities
  14b  high-risk  30m  Drop GenerationId FK from those entities
  14c  high-risk  20m  Obsolete GenerationApplier/Diff/SealedCache
  14d  standard   5m   Drop ClusterNode.RedundancyRole
  14e  small      5m   Delete ConfigGeneration + ClusterNodeGenerationState
  14f  high-risk  15m  Consolidator: generate V2HostingAlignment migration

Policy decision (recorded with user): OtOpcUa.Server + OtOpcUa.Admin are
allowed to fail-to-compile between 14b and Task 56 - only the new v2 projects
need to stay green. Task 56 deletes the legacy projects.

Plan markdown: replaces the original Task 14 section with the 6-task
decomposition + a header explaining the rewrite. Task index table at the
bottom of the plan updated.

Tasks JSON: replaces the single Task 14 row with 6 string-id rows
("14a", "14b", ..., "14f"). Task 15 (Migrate-To-V2.ps1) and downstream
consumers re-pointed at "14f".

Verification step in 14f rewritten to use the shared docker host at
10.100.0.35 per CLAUDE.md (Docker is not installed on this Mac dev VM).
2026-05-26 03:55:48 -04:00
Joseph Doherty 8e2c4f2835 feat(configdb): add Deployment, NodeDeploymentState, ConfigEdit, DataProtectionKey entities
Phase 1 entities for the v2 live-edit + snapshot-deploy model:

  Deployment           — immutable artifact snapshot (replaces v1 ConfigGeneration row)
                         Status enum {Dispatching, AwaitingApplyAcks, Sealed,
                         PartiallyFailed, TimedOut}; carries the SHA256 RevisionHash and
                         the SnapshotAndFlatten() ArtifactBlob; RowVersion for optimistic
                         concurrency.
  NodeDeploymentState  — per-(node, deployment) apply progress row owned by
                         DriverHostActor (replaces single-row ClusterNodeGenerationState).
                         Composite key (NodeId, DeploymentId) gives the
                         ConfigPublishCoordinator the full history it needs to
                         reconstruct in-flight state after a failover.
  ConfigEdit           — append-only audit row written by AdminOperationsActor on every
                         mutating op; optional ExecutionId correlates edits inside one
                         admin transaction (e.g. an import batch).
  DataProtectionKey    — ASP.NET DataProtection key ring storage via
                         IDataProtectionKeyContext so every admin-role node decrypts
                         the same cookies without sharing a filesystem.

OtOpcUaConfigDbContext now implements IDataProtectionKeyContext and registers four new
DbSets + four new ConfigureXxx mappings.

Central package bumps (forced by Microsoft.AspNetCore.DataProtection.EntityFrameworkCore
10.0.7's transitive dep):

  Microsoft.EntityFrameworkCore.{,Design,InMemory,SqlServer}  10.0.0 -> 10.0.7
  Microsoft.Extensions.{Configuration.Abstractions,Configuration.Json,Hosting,Hosting.WindowsServices,Http}  10.0.0 -> 10.0.7

EF migration generation + the ConfigGeneration drop + RedundancyRole column removal are
deferred to Task 14 (high-risk, non-parallelizable).
2026-05-26 03:49:59 -04:00
Joseph Doherty 30a2104fa5 feat(scaffold): introduce 8 v2 component projects
Adds the empty project skeletons that subsequent v2 tasks fill in:

  src/Core/ZB.MOM.WW.OtOpcUa.Commons      (types, interfaces, message contracts)
  src/Core/ZB.MOM.WW.OtOpcUa.Cluster      (Akka.Hosting + cluster wiring)
  src/Server/ZB.MOM.WW.OtOpcUa.Security   (cookie+JWT auth, LDAP)
  src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane (admin-role cluster singletons)
  src/Server/ZB.MOM.WW.OtOpcUa.Runtime    (per-node driver actors)
  src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer (OPC UA SDK application host)
  src/Server/ZB.MOM.WW.OtOpcUa.AdminUI    (Razor class library)
  src/Server/ZB.MOM.WW.OtOpcUa.Host       (single fused web binary)

Each project sets TreatWarningsAsErrors=true in its own csproj (per the
Directory.Build.props deviation note in the previous commit). NuGetAuditSuppress
entries cover transitive vulnerability advisories the new strictness surfaces:

  - GHSA-g94r-2vxg-569j (OpenTelemetry.Api 1.9.0 via Akka.Cluster.Hosting/Tools)
  - GHSA-h958-fxgg-g7w3 (Opc.Ua.Core 1.5.374.126 via OpcUaServer)
  - GHSA-37gx-xxp4-5rgx + GHSA-w3x6-4m5h-cxqf (legacy advisories already accepted)

OpcUaServer pins OPCFoundation.NetStandard.Opc.Ua.Configuration to 1.5.374.126
via VersionOverride to match Opc.Ua.Server's transitive Opc.Ua.Core (same
constraint as the legacy Server project).

Runtime does NOT project-reference any concrete Driver.* assemblies; drivers
load reflectively at runtime (Phase 6). Runtime gets the IDriver contract
through Core.Abstractions instead.

Host's Microsoft.Extensions.Hosting.WindowsServices is conditional on the
Windows OS so the project builds on macOS dev machines.

Build verification: dotnet build -> 438 warnings (all pre-existing xUnit1051
in legacy Server.Tests/Admin.Tests), 0 errors. Closes Task 9 (build green
smoke check, no separate commit).
2026-05-26 03:44:56 -04:00
Joseph Doherty 2b811477d1 chore(build): introduce central package management for v2
Adds Directory.Packages.props (ManagePackageVersionsCentrally) and
Directory.Build.props (net10.0/nullable/implicit usings/LangVersion latest).
Strips Version attributes from every csproj PackageReference and consolidates
versions into the central file.

Side fixes (necessary to keep the build green on .NET SDK 10.0.105 on macOS):

- Microsoft.CodeAnalysis.CSharp{,.Workspaces}: 5.3.0 -> 5.0.0. The 5.3.0
  analyzer DLL references compiler 5.3.0.0 and the local SDK ships compiler
  5.0.0.0, producing CS9057 on every project that loaded the Analyzers
  output. Master itself was broken on this machine pre-change.
- Server + Server.Tests pin OPCFoundation.NetStandard.Opc.Ua.{Configuration,
  Client} to 1.5.374.126 via VersionOverride, matching Opc.Ua.Server's
  pin. Mixing 1.5.378.106 Opc.Ua.Core transitively with 1.5.374.126
  Opc.Ua.Server breaks CustomNodeManager2 override signatures
  (CS0115 on LoadPredefinedNodes/Browse/HistoryRead*) and CS7069 in
  the tests. The pin disappears when the legacy Server project is
  deleted in Task 56.
- Client.UI + Client.UI.Tests: NuGetAuditSuppress for
  GHSA-xrw6-gwf8-vvr9 (Tmds.DBus.Protocol 0.20.0 reaches both projects
  transitively from Avalonia.Desktop on Linux/macOS only).

Deviation from the plan: TreatWarningsAsErrors=true is NOT set in
Directory.Build.props because the pre-v2 Admin/Server test projects carry
~240 xUnit1051 analyzer warnings that would fail the build. New v2 projects
opt in via their own csproj; the global flag can return once the legacy
projects are deleted in Task 56.
2026-05-26 03:40:24 -04:00
Joseph Doherty fac32ad69b docs(plans): add v2 implementation plan with 66 bite-sized tasks
Converts the akka-hosting-alignment design into an executable plan:
12 phases covering branch/scaffold, ConfigDb schema, Commons,
Cluster, Security, ControlPlane singletons, Runtime per-node actors,
OpcUaServer extraction, AdminUI migration, Host entry point, cleanup,
integration tests, deploy scripts, and docs. Each task has files,
TDD steps, exact commands, classification, time estimate, and
parallelizable list. Co-located .tasks.json drives executing-plans
resume from any session.
2026-05-26 03:17:29 -04:00
Joseph Doherty ef4a70751c docs(plans): add v2 Akka + fused hosting alignment design
Captures the brainstormed design to align OtOpcUa with ScadaLink:
single role-gated binary, Akka.NET cluster with admin/driver roles,
cluster singletons for control plane, per-node actor hierarchy for
OPC UA runtime, dual-endpoint warm redundancy preserved with
ServiceLevel driven by Akka leader, cookie+JWT auth, Traefik routing,
and ScadaLink-style live-edit + deploy model replacing the
draft/publish ConfigGeneration lifecycle.
2026-05-26 03:04:21 -04:00
Joseph Doherty 866dc03fac style(ui): align admin styling with ScadaLink master conventions
- Move CSS into wwwroot/css/ (theme.css, site.css); sidebar 218 -> 220px
- Add hamburger + Bootstrap collapse for <lg viewports
- Add Components/Shared/ with LoadingSpinner, ToastNotification, StatusBadge
- Replace .page-title with flex + <h4 class="mb-0"> across 20 pages
- Convert NewCluster + IdentificationFields forms to card + h6 subsection pattern
2026-05-26 01:12:57 -04:00
1414 changed files with 70672 additions and 36214 deletions
+76
@@ -0,0 +1,76 @@
# CI for the v2 branch — runs on every push + PR to the v2-akka-fuse / master
# branches. Layered into three jobs:
#   build         dotnet restore + build (fast feedback on compile errors)
#   unit-tests    every v2 unit-test project
#   integration   2-node Host.IntegrationTests harness
#
# Skips E2E (Category=E2E) — that runs nightly via v2-e2e.yml against the full
# four-node docker-dev stack.
#
# Compatible with both GitHub Actions and Gitea Actions (act_runner). The .NET 10
# SDK is pinned via global.json at the repo root; if no global.json exists, the
# setup-dotnet step falls back to dotnet-version below.
name: v2-ci

on:
  push:
    branches: [v2-akka-fuse, master]
  pull_request:
    branches: [v2-akka-fuse, master]
  workflow_dispatch: {}

env:
  DOTNET_NOLOGO: "1"
  DOTNET_CLI_TELEMETRY_OPTOUT: "1"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 10.0.x
      - name: dotnet restore
        run: dotnet restore ZB.MOM.WW.OtOpcUa.slnx
      - name: dotnet build
        run: dotnet build ZB.MOM.WW.OtOpcUa.slnx --no-restore --configuration Release

  unit-tests:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        project:
          - tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests
          - tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests
          - tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests
          - tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests
          - tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 10.0.x
      - name: dotnet test ${{ matrix.project }}
        run: dotnet test ${{ matrix.project }} --configuration Release --filter "Category!=E2E"

  integration:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        project:
          - tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests
          - tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 10.0.x
      - name: dotnet test ${{ matrix.project }}
        run: dotnet test ${{ matrix.project }} --configuration Release --filter "Category!=E2E"
+57
@@ -0,0 +1,57 @@
# Nightly E2E job. Runs against the docker-dev four-node fleet (admin-a +
# admin-b + driver-a + driver-b + SQL + LDAP + Traefik). Trigger:
# - cron at 03:00 UTC daily
# - workflow_dispatch from the Actions UI for on-demand runs
#
# The E2E test project (tests/Server/ZB.MOM.WW.OtOpcUa.E2ETests) does not yet
# exist — it lands when the F-series follow-ups F10/F11/F12 wire enough of the
# SDK/historian/probe so an end-to-end driver round-trip is meaningful. Until
# then this workflow is a green no-op (the `--filter Category=E2E` matches
# zero tests, and `dotnet test` returns 0).
name: v2-e2e

on:
  schedule:
    - cron: "0 3 * * *"
  workflow_dispatch: {}

env:
  DOTNET_NOLOGO: "1"
  DOTNET_CLI_TELEMETRY_OPTOUT: "1"

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 10.0.x
      - name: dotnet restore
        run: dotnet restore ZB.MOM.WW.OtOpcUa.slnx
      - name: Build docker-dev fleet
        run: docker compose -f docker-dev/docker-compose.yml up -d --build
      - name: Wait for cluster
        run: |
          for i in $(seq 1 30); do
            if curl -sf http://localhost/health/active >/dev/null; then
              echo "Admin leader healthy after ${i} attempt(s)"
              exit 0
            fi
            sleep 2
          done
          echo "Timed out waiting for /health/active"
          docker compose -f docker-dev/docker-compose.yml logs --tail=200
          exit 1
      - name: dotnet test (E2E only)
        run: dotnet test --configuration Release --filter "Category=E2E"
      - name: Tear down
        if: always()
        run: docker compose -f docker-dev/docker-compose.yml down -v
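
The "Wait for cluster" step in the workflow above is a bounded retry loop. A minimal sketch of the same idea as a reusable helper follows; the function name `wait_until` and its argument order are hypothetical and not part of the repository, and the probe is passed in as a command so the loop can be exercised without a running fleet.

```shell
#!/bin/sh
# Hypothetical helper: retry a probe command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the probe; its output is discarded, only the exit status matters.
        if "$@" >/dev/null 2>&1; then
            echo "ready after $attempt attempt(s)"
            return 0
        fi
        # Sleep only between attempts, not after the final failure.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    echo "timed out after $max_attempts attempt(s)" >&2
    return 1
}

# Example with the same budget as the workflow step (30 probes, 2 s apart):
#   wait_until 30 2 curl -sf http://localhost/health/active
```

Reporting attempts rather than seconds keeps the message accurate whatever delay is chosen; the caller remains responsible for dumping logs and failing the job when the helper returns non-zero.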
+8
@@ -21,6 +21,8 @@ desktop.ini
# NuGet
packages/
*.nupkg
# … but DO track repo-local feed for mxaccessgw client (not yet on public nuget.org).
!nuget-packages/*.nupkg
# Certificates
*.pfx
@@ -40,3 +42,9 @@ config_cache*.db
# Client CLI/UI runtime scratch (last-connected endpoint cache)
session.dat
# Secrets / local credentials — never commit
sql_login.txt
# OPC UA certificate store (runtime PKI: own/trusted/issued/rejected certs + keys)
src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/
+2
@@ -150,3 +150,5 @@ dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tc
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode"
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- subscribe -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode" -i 500
```
Address pickers in AdminUI support live browse for OpcUaClient and Galaxy drivers — see `docs/plans/2026-05-28-driver-browsers-design.md`.
+18
@@ -0,0 +1,18 @@
<Project>
  <!--
    Defaults inherited by every csproj. Individual projects may override.
    Deviation from the original v2 plan: TreatWarningsAsErrors is NOT set globally because the
    pre-v2 test projects (e.g. Admin.Tests) carry 240+ xUnit1051 analyzer warnings that would
    fail the build. New v2 projects (Commons, Cluster, ControlPlane, Runtime, OpcUaServer, AdminUI,
    Host, Security) MUST opt in to <TreatWarningsAsErrors>true</TreatWarningsAsErrors> in their
    own csproj. Once the legacy Admin/Server projects are deleted (Phase 10, Task 56), this can
    be promoted back to a global default.
  -->
  <PropertyGroup>
    <TargetFramework>net10.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
    <LangVersion>latest</LangVersion>
  </PropertyGroup>
</Project>
+113
@@ -0,0 +1,113 @@
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
<ItemGroup>
<PackageVersion Include="Akka" Version="1.5.62" />
<PackageVersion Include="Akka.Cluster" Version="1.5.62" />
<PackageVersion Include="Akka.Cluster.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Cluster.Tools" Version="1.5.62" />
<PackageVersion Include="Akka.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Remote" Version="1.5.62" />
<PackageVersion Include="Akka.Remote.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Streams" Version="1.5.62" />
<PackageVersion Include="Akka.Streams.TestKit" Version="1.5.62" />
<PackageVersion Include="Akka.TestKit.Xunit2" Version="1.5.62" />
<PackageVersion Include="Avalonia" Version="11.2.7" />
<PackageVersion Include="Avalonia.Controls.DataGrid" Version="11.2.7" />
<PackageVersion Include="Avalonia.Desktop" Version="11.2.7" />
<PackageVersion Include="Avalonia.Diagnostics" Version="11.2.7" />
<PackageVersion Include="Avalonia.Fonts.Inter" Version="11.2.7" />
<PackageVersion Include="Avalonia.Headless" Version="11.2.7" />
<PackageVersion Include="Avalonia.Svg.Skia" Version="11.2.0.2" />
<PackageVersion Include="Avalonia.Themes.Fluent" Version="11.2.7" />
<PackageVersion Include="Beckhoff.TwinCAT.Ads" Version="7.0.172" />
<PackageVersion Include="bunit" Version="2.0.33-preview" />
<PackageVersion Include="CliFx" Version="2.3.6" />
<PackageVersion Include="CommunityToolkit.Mvvm" Version="8.4.0" />
<PackageVersion Include="coverlet.collector" Version="6.0.4" />
<PackageVersion Include="FluentAssertions" Version="8.3.0" />
<PackageVersion Include="Google.Protobuf" Version="3.34.1" />
<PackageVersion Include="Grpc.Core.Api" Version="2.76.0" />
<PackageVersion Include="Grpc.Net.Client" Version="2.76.0" />
<PackageVersion Include="libplctag" Version="1.5.2" />
<PackageVersion Include="LiteDB" Version="5.0.21" />
<PackageVersion Include="MessagePack" Version="2.5.187" />
<PackageVersion Include="Microsoft.AspNetCore.Authorization" Version="10.0.7" />
<PackageVersion Include="Microsoft.AspNetCore.DataProtection" Version="10.0.7" />
<PackageVersion Include="Microsoft.AspNetCore.DataProtection.EntityFrameworkCore" Version="10.0.7" />
<PackageVersion Include="Microsoft.AspNetCore.Mvc.Testing" Version="10.0.0" />
<PackageVersion Include="Microsoft.AspNetCore.SignalR.Client" Version="10.0.0" />
<PackageVersion Include="Microsoft.AspNetCore.SignalR.Core" Version="1.2.0" />
<PackageVersion Include="Microsoft.AspNetCore.TestHost" Version="10.0.7" />
<!--
Roslyn analyzer packages pin to the same major version as the SDK's compiler.
.NET SDK 10.0.105 ships compiler 5.0.0.0. Microsoft.CodeAnalysis.CSharp 5.3.x emits
analyzer DLLs that reference compiler 5.3.0.0 and fail with CS9057 on the local SDK.
Pin to 5.0.0 (matches the compiler the SDK ships) until the SDK rolls to 10.0.110+.
-->
<PackageVersion Include="Microsoft.CodeAnalysis.CSharp" Version="5.0.0" />
<PackageVersion Include="Microsoft.CodeAnalysis.CSharp.Scripting" Version="4.12.0" />
<PackageVersion Include="Microsoft.CodeAnalysis.CSharp.Workspaces" Version="5.0.0" />
<PackageVersion Include="Microsoft.Data.SqlClient" Version="6.1.1" />
<PackageVersion Include="Microsoft.Data.Sqlite" Version="9.0.0" />
<PackageVersion Include="Microsoft.EntityFrameworkCore" Version="10.0.7" />
<PackageVersion Include="Microsoft.EntityFrameworkCore.Design" Version="10.0.7" />
<PackageVersion Include="Microsoft.EntityFrameworkCore.InMemory" Version="10.0.7" />
<PackageVersion Include="Microsoft.EntityFrameworkCore.SqlServer" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Configuration.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Configuration.Json" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.DependencyInjection" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Hosting" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Hosting.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Hosting.WindowsServices" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Http" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Logging" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Options" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.Options.ConfigurationExtensions" Version="10.0.7" />
<PackageVersion Include="Microsoft.IdentityModel.Tokens" Version="8.11.0" />
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />
<PackageVersion Include="Microsoft.Playwright" Version="1.51.0" />
<PackageVersion Include="Moq" Version="4.20.72" />
<PackageVersion Include="Novell.Directory.Ldap.NETStandard" Version="3.6.0" />
<PackageVersion Include="OPCFoundation.NetStandard.Opc.Ua.Client" Version="1.5.378.106" />
<PackageVersion Include="OPCFoundation.NetStandard.Opc.Ua.Configuration" Version="1.5.378.106" />
<PackageVersion Include="OPCFoundation.NetStandard.Opc.Ua.Server" Version="1.5.374.126" />
<PackageVersion Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.15.3-beta.1" />
<PackageVersion Include="OpenTelemetry.Extensions.Hosting" Version="1.15.3" />
<PackageVersion Include="Polly.Core" Version="8.6.6" />
<PackageVersion Include="S7netplus" Version="0.20.0" />
<PackageVersion Include="Serilog" Version="4.3.1" />
<PackageVersion Include="Serilog.AspNetCore" Version="10.0.0" />
<PackageVersion Include="Serilog.Extensions.Hosting" Version="10.0.0" />
<PackageVersion Include="Serilog.Formatting.Compact" Version="3.0.0" />
<PackageVersion Include="Serilog.Settings.Configuration" Version="10.0.0" />
<PackageVersion Include="Serilog.Sinks.Console" Version="6.0.0" />
<PackageVersion Include="Serilog.Sinks.File" Version="7.0.0" />
<PackageVersion Include="Shouldly" Version="4.3.0" />
<PackageVersion Include="System.CommandLine" Version="2.0.5" />
<PackageVersion Include="System.Data.SqlClient" Version="4.9.0" />
<PackageVersion Include="System.IdentityModel.Tokens.Jwt" Version="8.11.0" />
<PackageVersion Include="System.IO.Pipes.AccessControl" Version="5.0.0" />
<PackageVersion Include="System.Memory" Version="4.5.5" />
<PackageVersion Include="System.Threading.Tasks.Extensions" Version="4.5.4" />
<PackageVersion Include="xunit" Version="2.9.2" />
<PackageVersion Include="xunit.runner.visualstudio" Version="3.0.2" />
<PackageVersion Include="xunit.v3" Version="1.1.0" />
<PackageVersion Include="ZB.MOM.WW.Health" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Health.Akka" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Health.EntityFrameworkCore" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Telemetry" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Telemetry.Serilog" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.MxGateway.Client" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.MxGateway.Contracts" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Configuration" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Auth.Abstractions" Version="0.1.1" />
<PackageVersion Include="ZB.MOM.WW.Auth.Ldap" Version="0.1.1" />
<PackageVersion Include="ZB.MOM.WW.Auth.AspNetCore" Version="0.1.1" />
<PackageVersion Include="ZB.MOM.WW.Audit" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Theme" Version="0.2.0" />
</ItemGroup>
</Project>
+28
@@ -0,0 +1,28 @@
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<packageSources>
<clear />
<add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
<add key="local-mxgw" value="./nuget-packages" />
<add key="dohertj2-gitea" value="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" />
</packageSources>
<packageSourceMapping>
<packageSource key="nuget.org">
<package pattern="*" />
</packageSource>
<packageSource key="local-mxgw">
<package pattern="ZB.MOM.WW.MxGateway.*" />
</packageSource>
<packageSource key="dohertj2-gitea">
<package pattern="ZB.MOM.WW.Health" />
<package pattern="ZB.MOM.WW.Health.*" />
<package pattern="ZB.MOM.WW.Telemetry" />
<package pattern="ZB.MOM.WW.Telemetry.*" />
<package pattern="ZB.MOM.WW.Configuration" />
<package pattern="ZB.MOM.WW.Auth" />
<package pattern="ZB.MOM.WW.Auth.*" />
<package pattern="ZB.MOM.WW.Audit" />
<package pattern="ZB.MOM.WW.Theme" />
</packageSource>
</packageSourceMapping>
</configuration>
+30 -5
@@ -2,6 +2,8 @@
<Folder Name="/src/" />
<Folder Name="/src/Core/">
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Cluster/ZB.MOM.WW.OtOpcUa.Cluster.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Commons/ZB.MOM.WW.OtOpcUa.Commons.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Configuration/ZB.MOM.WW.OtOpcUa.Configuration.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Core/ZB.MOM.WW.OtOpcUa.Core.csproj" />
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ZB.MOM.WW.OtOpcUa.Core.Scripting.csproj" />
@@ -10,21 +12,36 @@
<Project Path="src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.csproj" />
</Folder>
<Folder Name="/src/Server/">
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Server/ZB.MOM.WW.OtOpcUa.Server.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Admin/ZB.MOM.WW.OtOpcUa.Admin.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/ZB.MOM.WW.OtOpcUa.AdminUI.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/ZB.MOM.WW.OtOpcUa.ControlPlane.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/ZB.MOM.WW.OtOpcUa.OpcUaServer.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ZB.MOM.WW.OtOpcUa.Runtime.csproj" />
<Project Path="src/Server/ZB.MOM.WW.OtOpcUa.Security/ZB.MOM.WW.OtOpcUa.Security.csproj" />
</Folder>
<Folder Name="/src/Drivers/">
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browser/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browser.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Contracts/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Contracts/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus/ZB.MOM.WW.OtOpcUa.Driver.Modbus.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Contracts/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7/ZB.MOM.WW.OtOpcUa.Driver.S7.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Contracts/ZB.MOM.WW.OtOpcUa.Driver.S7.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip/ZB.MOM.WW.OtOpcUa.Driver.AbCip.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Contracts/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Contracts/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Contracts/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Contracts/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Contracts/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Contracts.csproj" />
<Project Path="src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser.csproj" />
</Folder>
<Folder Name="/src/Drivers/Driver CLIs/">
<Project Path="src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.csproj" />
@@ -46,6 +63,7 @@
<Folder Name="/tests/" />
<Folder Name="/tests/Core/">
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests.csproj" />
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests/ZB.MOM.WW.OtOpcUa.Cluster.Tests.csproj" />
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests.csproj" />
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests/ZB.MOM.WW.OtOpcUa.Core.Tests.csproj" />
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests.csproj" />
@@ -54,12 +72,17 @@
<Project Path="tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests.csproj" />
</Folder>
<Folder Name="/tests/Server/">
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/ZB.MOM.WW.OtOpcUa.Server.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Admin.Tests/ZB.MOM.WW.OtOpcUa.Admin.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Admin.E2ETests/ZB.MOM.WW.OtOpcUa.Admin.E2ETests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/ZB.MOM.WW.OtOpcUa.AdminUI.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ZB.MOM.WW.OtOpcUa.Runtime.Tests.csproj" />
<Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/ZB.MOM.WW.OtOpcUa.Security.Tests.csproj" />
</Folder>
<Folder Name="/tests/Drivers/">
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.csproj" />
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browser.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browser.Tests.csproj" />
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests.csproj" />
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests.csproj" />
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests.csproj" />
@@ -77,6 +100,8 @@
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests.csproj" />
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests.csproj" />
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests.csproj" />
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser.Tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser.Tests.csproj" />
<Project Path="tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser.IntegrationTests.csproj" />
</Folder>
<Folder Name="/tests/Drivers/Driver CLIs/">
<Project Path="tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests.csproj" />
+20
@@ -0,0 +1,20 @@
# Multi-stage build of OtOpcUa.Host targeting linux-x64. Used by docker-dev/docker-compose.yml
# to spin up every host container (admin-*, driver-*, site-a-*, site-b-*) from a single image;
# Compose drives OTOPCUA_ROLES + Cluster:* env per container to differentiate them.
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src
COPY . .
RUN dotnet restore ZB.MOM.WW.OtOpcUa.slnx
RUN dotnet publish src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj \
-c Release -o /app --no-restore
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS runtime
WORKDIR /app
COPY --from=build /app ./
EXPOSE 9000
EXPOSE 4053
EXPOSE 4840
ENTRYPOINT ["dotnet", "OtOpcUa.Host.dll"]
+112
@@ -0,0 +1,112 @@
# docker-dev
Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise and integration smoke tests. Spins up **three isolated Akka clusters** plus SQL Server and Traefik on the same Compose network; LDAP is currently stubbed out (see the `DevStubMode` note under "Shared infrastructure"). All three clusters share the single `OtOpcUa` ConfigDb, with multi-tenancy enforced by per-row `ServerCluster.ClusterId` scoping. Akka.Cluster gossip stays isolated between meshes because their seed-node lists are disjoint, even though they share the same system name `otopcua`.
## Stack
### Shared infrastructure
| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` |
| `traefik` | Routes :80 by Host header / PathPrefix | host `9200` → container `80`, dashboard host `8089` → container `8080` |
Authentication runs in `DevStubMode` — every host container has `Authentication__Ldap__DevStubMode=true` set, so the LDAP service is not part of the dev compose right now (the `bitnami/openldap:2.6` image was retired and the legacy tag crashes mid-setup with exit 68). Any non-empty username/password signs in as `FleetAdmin`. To restore a real LDAP service, drop the env var and add an `openldap`-compatible image back to compose.
### Main cluster — split admin/driver roles
| Service | Role | Ports |
|---|---|---|
| `admin-a` | `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` |
| `admin-b` | `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` |
| `driver-a` | `OTOPCUA_ROLES=driver` | host `4840` → container `4840` |
| `driver-b` | `OTOPCUA_ROLES=driver` | host `4841` → container `4840` |
### Site A cluster — 2-node fused admin+driver
| Service | Role | Ports |
|---|---|---|
| `site-a-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4842` → container `4840` |
| `site-a-2` | `OTOPCUA_ROLES=admin,driver`, joins site-a-1 | host `4843` → container `4840` |
### Site B cluster — 2-node fused admin+driver
| Service | Role | Ports |
|---|---|---|
| `site-b-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4844` → container `4840` |
| `site-b-2` | `OTOPCUA_ROLES=admin,driver`, joins site-b-1 | host `4845` → container `4840` |
All containers bind Akka remoting to port `4053` inside their own network namespace; the `PublicHostname` of each matches its Compose service name. Akka mesh isolation is enforced purely by disjoint seed lists. Configuration-side isolation is enforced by `ServerCluster.ClusterId` — see "Multi-tenancy" below.
## Multi-tenancy
All eight host nodes write to the same `OtOpcUa` ConfigDb. The `ServerCluster` table differentiates the three Akka meshes: each Akka cluster maps to one row, and each `ClusterNode` row's `ClusterId` ties the runtime node back to its owning cluster scope.
A one-shot `cluster-seed` Compose service (image `mcr.microsoft.com/mssql-tools`) waits for SQL and the EF auto-migration to complete, then INSERTs the rows below. The seed is **idempotent** (`IF NOT EXISTS` guards every insert), so re-runs on `docker compose up` are no-ops:
| Akka mesh | `ServerCluster.ClusterId` | `ClusterNode.NodeId` rows |
|---|---|---|
| Main | `MAIN` | `driver-a`, `driver-b` (OPC UA publishers) |
| Site A | `SITE-A` | `site-a-1`, `site-a-2` |
| Site B | `SITE-B` | `site-b-1`, `site-b-2` |
`ClusterNode` is the table for **OPC UA-publishing nodes** (not every Akka cluster member), which is why the main cluster's `admin-a` / `admin-b` don't get rows — they're control-plane-only.
Each `ClusterNode.NodeId` matches the node's `Cluster__PublicHostname` env value (Compose service name) — that's the lookup the runtime uses to resolve its own membership. `ApplicationUri` follows the `urn:OtOpcUa:<NodeId>` convention.
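
The table and the `urn:OtOpcUa:<NodeId>` convention can be sketched in shell. This is only an illustration of the naming scheme: the helper names `cluster_for` and `application_uri` are hypothetical and do not exist in the repository or in the seed script.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the seeded scoping; not part of the repo.

# Maps a ClusterNode.NodeId to the ServerCluster.ClusterId that owns it,
# following the seed table above.
cluster_for() {
  case "$1" in
    driver-a|driver-b) echo "MAIN" ;;
    site-a-1|site-a-2) echo "SITE-A" ;;
    site-b-1|site-b-2) echo "SITE-B" ;;
    *) echo "no ClusterNode row for: $1" >&2; return 1 ;;
  esac
}

# Builds the ApplicationUri from the NodeId (urn:OtOpcUa:<NodeId>).
application_uri() {
  printf 'urn:OtOpcUa:%s\n' "$1"
}

# One line per OPC UA-publishing node: NodeId, ClusterId, ApplicationUri.
for node in driver-a driver-b site-a-1 site-a-2 site-b-1 site-b-2; do
  printf '%s %s %s\n' "$node" "$(cluster_for "$node")" "$(application_uri "$node")"
done
```

Asking for `admin-a` or `admin-b` fails on purpose, mirroring the fact that control-plane-only nodes have no `ClusterNode` row.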
The SQL lives at `seed/seed-clusters.sql`; the wait-and-apply wrapper lives at `seed/entrypoint.sh`. To re-seed manually:
```bash
docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed
```
### Galaxy / MxAccess gateway
The seed also pre-creates a `SystemPlatform` Namespace + a `GalaxyMxGateway` DriverInstance in the MAIN cluster pointing at `http://10.100.0.48:5120`. The API key is resolved from the `GALAXY_MXGW_API_KEY` env var set on every driver-role container in compose; override via `GALAXY_MXGW_API_KEY=... docker compose up -d` to swap keys without editing the compose file.
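
Compose resolves `${GALAXY_MXGW_API_KEY:-...}` with the same default-value rule as shell parameter expansion, so the override behaviour can be previewed in a shell before touching the stack. A minimal sketch: the function name `resolve_key` and the placeholder default `dev-default-key` are illustrative assumptions, not values from the compose file.

```bash
#!/usr/bin/env bash
# Mimics ${VAR:-default} interpolation: the caller's value wins when the
# variable is set and non-empty, otherwise the default is used.
# "dev-default-key" is a placeholder, not the key baked into the compose file.
resolve_key() {
  printf '%s\n' "${GALAXY_MXGW_API_KEY:-dev-default-key}"
}

# Unset: falls back to the default.
( unset GALAXY_MXGW_API_KEY; resolve_key )

# Set for one invocation only: the override is used.
( GALAXY_MXGW_API_KEY="my-override"; resolve_key )

# Set but empty: ":-" treats empty like unset, so the default is used again.
( GALAXY_MXGW_API_KEY=""; resolve_key )
```

Because `:-` treats an empty value like an unset one, exporting an empty `GALAXY_MXGW_API_KEY` does not blank the key; it silently falls back to the default.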
The DriverHost actor doesn't spawn drivers from raw DriverInstance rows on its own — the v2 deploy lifecycle requires a *sealed Deployment* before drivers materialise. After first bring-up, sign in to the Admin UI and click **Deploy current configuration** on `/deployments` to compose the seeded rows into an artifact and dispatch it. The Galaxy driver instance will start its gRPC connection to the gateway on the next deploy ack.
## Bring up
```bash
# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build
# wait ~20 seconds for SQL to come up + all three clusters to form
open http://localhost:9200 # main cluster admin UI (Traefik's :80 entrypoint is published on host port 9200)
open http://site-a.localhost:9200 # site A admin UI
open http://site-b.localhost:9200 # site B admin UI
open http://localhost:8089 # Traefik dashboard
```
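
Instead of waiting a fixed ~20 seconds, a small retry loop can poll until the stack answers. The sketch below is a generic helper, assuming nothing about the stack itself; `wait_for` is a hypothetical name, and the command to poll is supplied by the caller.

```bash
#!/usr/bin/env bash
# Retries a command until it succeeds or the attempt budget runs out.
# Usage: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe; stop as soon as it succeeds.
    "$@" && return 0
    # Sleep only between attempts, not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Harmless demonstration with a probe that always succeeds.
wait_for 3 0 true && echo "ready"
```

Against the running stack this could be used as `wait_for 30 2 curl -fsS "$ADMIN_URL/health/active"`, where `ADMIN_URL` is an assumed variable holding whatever base URL the main cluster's admin UI is reachable on, and `/health/active` is the endpoint mentioned under "Failover smoke".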
On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't.
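
On Linux, the hosts entry can be added idempotently so repeated runs do not duplicate it. This is a sketch under the assumption that a single line for both site hosts is wanted; `ensure_hosts_entry` is a hypothetical helper, and it takes the file path as a parameter so it can be tried on a scratch file before being pointed at `/etc/hosts` (which needs root).

```bash
#!/usr/bin/env bash
# Appends the site hostnames to a hosts file unless an entry already exists.
ensure_hosts_entry() {
  local hosts_file="$1"
  local entry="127.0.0.1 site-a.localhost site-b.localhost"
  # grep -F matches the hostname literally; a missing file counts as "no entry".
  if ! grep -qF "site-a.localhost" "$hosts_file" 2>/dev/null; then
    printf '%s\n' "$entry" >> "$hosts_file"
  fi
}

# Demonstration on a scratch file: the second call is a no-op.
scratch="$(mktemp)"
ensure_hosts_entry "$scratch"
ensure_hosts_entry "$scratch"
cat "$scratch"
rm -f "$scratch"
```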
The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache.
## Auth (dev only)
`Authentication__Ldap__DevStubMode=true` is set on every host container, so any non-empty username/password signs in as a `FleetAdmin` user without contacting an LDAP server. **Do not** ship this configuration to production — set `DevStubMode=false` and wire a real LDAP backend before any non-dev deployment.
## Tear down
```bash
docker compose -f docker-dev/docker-compose.yml down -v
```
The `-v` flag also removes the stack's volumes, including the SQL Server data; omit it to keep ConfigDb state across restarts.
## Failover smoke
1. Watch the Traefik dashboard at `http://localhost:8089`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service.
2. `docker compose -f docker-dev/docker-compose.yml stop admin-a`: `admin-b` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `admin-b` once its `/health/active` returns 200.
3. `docker compose -f docker-dev/docker-compose.yml start admin-a`: `admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it.
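
The stop/start steps above can be scripted. The following is a sketch rather than a supported tool: `run` and `failover_smoke` are hypothetical names, the 15-second pause simply mirrors the "~15 s" figure above, and with `DRY_RUN=1` the commands are only printed so the sequence can be reviewed without touching Docker.

```bash
#!/usr/bin/env bash
COMPOSE_FILE="docker-dev/docker-compose.yml"

# Executes a command, or just prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stops admin-a, waits for admin-b to take over, then brings admin-a back.
failover_smoke() {
  run docker compose -f "$COMPOSE_FILE" stop admin-a
  run sleep 15   # roughly the stable-after window mentioned above
  run docker compose -f "$COMPOSE_FILE" start admin-a
}

# Print the sequence without executing it.
( DRY_RUN=1; failover_smoke )
```

Checking the Traefik dashboard between the two compose commands remains a manual step in this sketch.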
## Notes
- This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings.
- The OPC UA driver endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
- Main: `opc.tcp://localhost:4840` (driver-a), `opc.tcp://localhost:4841` (driver-b)
- Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2)
- Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2)
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.
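
The host ports listed in the notes are consecutive, so the endpoint list can be generated rather than typed. A small sketch, assuming the node order and port range shown above; `list_endpoints` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Prints "<node> <endpoint>" for each OPC UA-publishing node, assigning host
# ports 4840-4845 in the order the nodes are listed in the notes above.
list_endpoints() {
  local port=4840
  local node
  for node in driver-a driver-b site-a-1 site-a-2 site-b-1 site-b-2; do
    printf '%s opc.tcp://localhost:%s\n' "$node" "$port"
    port=$((port + 1))
  done
}

list_endpoints
```

Each printed endpoint can be passed to the client CLI's `-u` option, for example with the `read` subcommand and a node id that exists on the target server.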
+261
@@ -0,0 +1,261 @@
# docker-dev/ — Mac-friendly multi-cluster fleet for v2 development + manual UI exercise.
#
# Stack (3 separate Akka clusters — all share the single `OtOpcUa` ConfigDb):
# sql SQL Server 2022 — hosts the one ConfigDb that all three clusters use
# (LDAP is currently stubbed: every host runs with Authentication__Ldap__DevStubMode=true;
# see the note above the admin-a service.)
#
# Main cluster (existing — split-role admin / driver pair on a single Akka mesh):
# admin-a OtOpcUa.Host with OTOPCUA_ROLES=admin (seed)
# admin-b OtOpcUa.Host with OTOPCUA_ROLES=admin (joins admin-a)
# driver-a OtOpcUa.Host with OTOPCUA_ROLES=driver (joins via admin-a)
# driver-b OtOpcUa.Host with OTOPCUA_ROLES=driver (joins via admin-a)
#
# Site A cluster (2-node fused admin+driver):
# site-a-1, site-a-2 OTOPCUA_ROLES=admin,driver, seed = site-a-1
#
# Site B cluster (2-node fused admin+driver):
# site-b-1, site-b-2 OTOPCUA_ROLES=admin,driver, seed = site-b-1
#
# traefik PathPrefix → main cluster admin-a/admin-b; Host(`site-a.localhost`) →
# site-a-*; Host(`site-b.localhost`) → site-b-*. Add the two site hosts to
# your /etc/hosts (or rely on macOS `.localhost` auto-resolution).
#
# Multi-tenancy: ConfigDb is one schema with a `ServerCluster` table; each Akka cluster
# corresponds to a row in it (ClusterId = "MAIN" / "SITE-A" / "SITE-B"), and each node's
# `ClusterNode.NodeId` points back at the row that owns it. The one-shot `cluster-seed`
# service below inserts the ServerCluster + ClusterNode rows; they can also be managed from
# any cluster's Admin UI via /clusters and /hosts.
#
# Akka mesh isolation: same system name "otopcua" + same remoting port 4053 inside each
# container's own network namespace, but with disjoint seed-node lists — gossip never
# crosses between the three meshes.
#
# Usage:
# docker compose -f docker-dev/docker-compose.yml up -d --build
# open http://localhost # main cluster Blazor admin UI
# open http://site-a.localhost # site A admin UI
# open http://site-b.localhost # site B admin UI
# open http://localhost:8089 # Traefik dashboard (8080 is the sister scadalink stack)
#
# Tear-down: docker compose -f docker-dev/docker-compose.yml down -v
name: otopcua-dev
services:
  sql:
    image: mcr.microsoft.com/mssql/server:2022-latest
    environment:
      ACCEPT_EULA: "Y"
      SA_PASSWORD: "OtOpcUa!Dev123"
      MSSQL_PID: Developer
    ports:
      - "14330:1433"
    healthcheck:
      test: ["CMD-SHELL", "/opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P 'OtOpcUa!Dev123' -No -Q 'SELECT 1' || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 20
  # ── Cluster seed (one-shot) ────────────────────────────────────────────────
  # Waits for SQL + the host containers' EF auto-migration, then INSERTs the
  # three ServerCluster rows and the six ClusterNode rows that scope each Akka
  # mesh inside the shared OtOpcUa ConfigDb. Idempotent — re-runs are no-ops.
  cluster-seed:
    image: mcr.microsoft.com/mssql-tools:latest
    depends_on:
      sql:
        condition: service_healthy
    volumes:
      - ./seed:/seed:ro
    entrypoint: ["/bin/bash", "/seed/entrypoint.sh"]
    restart: "no"
  # OpenLDAP was previously here but the bitnami/openldap:2.6 image was retired
  # (manifest gone) and bitnamilegacy/openldap:2.6 crashes during LDIF setup with
  # exit 68. For the dev compose every host container now runs with
  # Authentication__Ldap__DevStubMode=true, so any non-empty username/password
  # signs in as `FleetAdmin`. Restore a real LDAP service when there's a need
  # for end-to-end LDAP coverage (the host code path is unchanged).
  admin-a: &otopcua-host
    build:
      context: ..
      dockerfile: docker-dev/Dockerfile
    image: otopcua-host:dev
    depends_on:
      sql: { condition: service_healthy }
    environment:
      OTOPCUA_ROLES: "admin"
      ASPNETCORE_URLS: "http://+:9000"
      ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
      Cluster__Hostname: "0.0.0.0"
      Cluster__Port: "4053"
      Cluster__PublicHostname: "admin-a"
      Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
      Cluster__Roles__0: "admin"
      Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
      Security__Jwt__Issuer: "otopcua-dev"
      Security__Jwt__Audience: "otopcua-dev"
      Authentication__Ldap__DevStubMode: "true"
      GALAXY_MXGW_API_KEY: "${GALAXY_MXGW_API_KEY:-mxgw_otopcua2_GI7-tNozYE6cXGUSgEzL3AHDV7bYcYIHdMwKYgyHdX4}"
  admin-b:
    <<: *otopcua-host
    environment:
      OTOPCUA_ROLES: "admin"
      ASPNETCORE_URLS: "http://+:9000"
      ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
      Cluster__Hostname: "0.0.0.0"
      Cluster__Port: "4053"
      Cluster__PublicHostname: "admin-b"
      Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
      Cluster__Roles__0: "admin"
      Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
      Security__Jwt__Issuer: "otopcua-dev"
      Security__Jwt__Audience: "otopcua-dev"
      Authentication__Ldap__DevStubMode: "true"
      GALAXY_MXGW_API_KEY: "${GALAXY_MXGW_API_KEY:-mxgw_otopcua2_GI7-tNozYE6cXGUSgEzL3AHDV7bYcYIHdMwKYgyHdX4}"
  driver-a:
    <<: *otopcua-host
    environment:
      OTOPCUA_ROLES: "driver"
      ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
      Cluster__Hostname: "0.0.0.0"
      Cluster__Port: "4053"
      Cluster__PublicHostname: "driver-a"
      Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
      Cluster__Roles__0: "driver"
      # Resolved at runtime by GalaxyDriver.ResolveApiKey when a DriverInstance's
      # Gateway.ApiKeySecretRef = "env:GALAXY_MXGW_API_KEY".
      GALAXY_MXGW_API_KEY: "${GALAXY_MXGW_API_KEY:-mxgw_otopcua2_GI7-tNozYE6cXGUSgEzL3AHDV7bYcYIHdMwKYgyHdX4}"
    ports:
      - "4840:4840"
  driver-b:
    <<: *otopcua-host
    environment:
      OTOPCUA_ROLES: "driver"
      ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
      Cluster__Hostname: "0.0.0.0"
      Cluster__Port: "4053"
      Cluster__PublicHostname: "driver-b"
      Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
      Cluster__Roles__0: "driver"
      GALAXY_MXGW_API_KEY: "${GALAXY_MXGW_API_KEY:-mxgw_otopcua2_GI7-tNozYE6cXGUSgEzL3AHDV7bYcYIHdMwKYgyHdX4}"
    ports:
      - "4841:4840"
  # ── Site A cluster (2-node fused admin+driver) ──────────────────────────────
  # Shares the OtOpcUa ConfigDb with the main + site-b clusters; multi-tenancy is
  # enforced by ServerCluster.ClusterId rows (configure via /clusters after boot).
  # Akka isolation comes from the disjoint seed list (seed = site-a-1).
  site-a-1:
    <<: *otopcua-host
    environment:
      OTOPCUA_ROLES: "admin,driver"
      ASPNETCORE_URLS: "http://+:9000"
      ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
      Cluster__Hostname: "0.0.0.0"
      Cluster__Port: "4053"
      Cluster__PublicHostname: "site-a-1"
      Cluster__SeedNodes__0: "akka.tcp://otopcua@site-a-1:4053"
      Cluster__Roles__0: "admin"
      Cluster__Roles__1: "driver"
      Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
      Security__Jwt__Issuer: "otopcua-dev"
      Security__Jwt__Audience: "otopcua-dev"
      Authentication__Ldap__DevStubMode: "true"
      GALAXY_MXGW_API_KEY: "${GALAXY_MXGW_API_KEY:-mxgw_otopcua2_GI7-tNozYE6cXGUSgEzL3AHDV7bYcYIHdMwKYgyHdX4}"
    ports:
      - "4842:4840"
  site-a-2:
    <<: *otopcua-host
    depends_on:
      sql: { condition: service_healthy }
      site-a-1: { condition: service_started }
    environment:
      OTOPCUA_ROLES: "admin,driver"
      ASPNETCORE_URLS: "http://+:9000"
      ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
      Cluster__Hostname: "0.0.0.0"
      Cluster__Port: "4053"
      Cluster__PublicHostname: "site-a-2"
      Cluster__SeedNodes__0: "akka.tcp://otopcua@site-a-1:4053"
      Cluster__Roles__0: "admin"
      Cluster__Roles__1: "driver"
      Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
      Security__Jwt__Issuer: "otopcua-dev"
      Security__Jwt__Audience: "otopcua-dev"
      Authentication__Ldap__DevStubMode: "true"
      GALAXY_MXGW_API_KEY: "${GALAXY_MXGW_API_KEY:-mxgw_otopcua2_GI7-tNozYE6cXGUSgEzL3AHDV7bYcYIHdMwKYgyHdX4}"
    ports:
      - "4843:4840"
  # ── Site B cluster (2-node fused admin+driver) ──────────────────────────────
  site-b-1:
    <<: *otopcua-host
    environment:
      OTOPCUA_ROLES: "admin,driver"
      ASPNETCORE_URLS: "http://+:9000"
      ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
      Cluster__Hostname: "0.0.0.0"
      Cluster__Port: "4053"
      Cluster__PublicHostname: "site-b-1"
      Cluster__SeedNodes__0: "akka.tcp://otopcua@site-b-1:4053"
      Cluster__Roles__0: "admin"
      Cluster__Roles__1: "driver"
      Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
      Security__Jwt__Issuer: "otopcua-dev"
      Security__Jwt__Audience: "otopcua-dev"
      Authentication__Ldap__DevStubMode: "true"
      GALAXY_MXGW_API_KEY: "${GALAXY_MXGW_API_KEY:-mxgw_otopcua2_GI7-tNozYE6cXGUSgEzL3AHDV7bYcYIHdMwKYgyHdX4}"
    ports:
      - "4844:4840"
  site-b-2:
    <<: *otopcua-host
    depends_on:
      sql: { condition: service_healthy }
      site-b-1: { condition: service_started }
    environment:
      OTOPCUA_ROLES: "admin,driver"
      ASPNETCORE_URLS: "http://+:9000"
      ConnectionStrings__ConfigDb: "Server=sql,1433;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
      Cluster__Hostname: "0.0.0.0"
      Cluster__Port: "4053"
      Cluster__PublicHostname: "site-b-2"
      Cluster__SeedNodes__0: "akka.tcp://otopcua@site-b-1:4053"
      Cluster__Roles__0: "admin"
      Cluster__Roles__1: "driver"
      Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
      Security__Jwt__Issuer: "otopcua-dev"
      Security__Jwt__Audience: "otopcua-dev"
      Authentication__Ldap__DevStubMode: "true"
      GALAXY_MXGW_API_KEY: "${GALAXY_MXGW_API_KEY:-mxgw_otopcua2_GI7-tNozYE6cXGUSgEzL3AHDV7bYcYIHdMwKYgyHdX4}"
    ports:
      - "4845:4840"
  traefik:
    image: traefik:v3.1
    command:
      - --entrypoints.web.address=:80
      - --providers.file.filename=/etc/traefik/dynamic.yml
      - --providers.file.watch=true
      - --api.insecure=true
    ports:
      - "9200:80" # host port 9200 → traefik :80 entrypoint (80 conflicts with scadabridge-traefik)
      - "8089:8080" # 8080 conflicts with the sister scadalink dev stack
    volumes:
      - ./traefik-dynamic.yml:/etc/traefik/dynamic.yml:ro
    depends_on:
      - admin-a
      - admin-b
      - site-a-1
      - site-a-2
      - site-b-1
- site-b-2
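As a minimal sketch of how the `ports:` mappings above translate into client-facing endpoints: each site service publishes the container's OPC UA port 4840 on its own host port (4842 to 4845). The helper below only encodes that mapping; `opcua_endpoint` and the `localhost` host name are assumptions for illustration and are not part of the repo.

```bash
# Sketch only: maps a site service name to the host-side OPC UA endpoint
# implied by the ports: blocks above. opcua_endpoint and "localhost" are
# hypothetical; adjust the host name to wherever Docker is running.
opcua_endpoint() {
  local service="$1" port
  case "$service" in
    site-a-1) port=4842 ;;
    site-a-2) port=4843 ;;
    site-b-1) port=4844 ;;
    site-b-2) port=4845 ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
  echo "opc.tcp://localhost:${port}"
}

# Print the endpoint an OPC UA client on the Docker host would dial.
for svc in site-a-1 site-a-2 site-b-1 site-b-2; do
  printf '%s -> %s\n' "$svc" "$(opcua_endpoint "$svc")"
done
```

Inside the Compose network the services still reach each other on 4840, which is why the seed rows keep `OpcUaPort` at the container-internal value.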
+48
@@ -0,0 +1,48 @@
#!/usr/bin/env bash
# docker-dev cluster-seed entrypoint. Waits for the OtOpcUa ConfigDb schema to
# be in place, then applies the idempotent row seed.
#
# IMPORTANT: this container does NOT run EF migrations — sqlcmd can't execute
# the V2 migration script cleanly because it contains CREATE PROCEDURE
# statements inside IF NOT EXISTS BEGIN ... END blocks (procs must be the
# first statement in their batch). Migrations are owned by the operator:
#
# dotnet ef database update \
# --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration \
# --startup-project src/Server/ZB.MOM.WW.OtOpcUa.Host
#
# (with ConnectionStrings__ConfigDb pointing at Server=localhost,14330;...).
# Once the schema is in place, restart the cluster-seed container — or just
# `docker compose up -d` and the seed will pick up where it left off thanks to
# the IF NOT EXISTS guards in seed-clusters.sql.
set -euo pipefail
SQLCMD="/opt/mssql-tools/bin/sqlcmd"
SERVER="${SQL_HOST:-sql},1433"
USER="${SQL_USER:-sa}"
PASS="${SQL_PASSWORD:-OtOpcUa!Dev123}"
DB="${SQL_DATABASE:-OtOpcUa}"
run_sql_in() {
local target_db="$1"; shift
# -I forces SET QUOTED_IDENTIFIER ON (needed for filtered indexes if you
# ever extend this script to touch them).
"$SQLCMD" -S "$SERVER" -U "$USER" -P "$PASS" -d "$target_db" -b -h -1 -I "$@"
}
echo "[cluster-seed] waiting for SQL Server to accept connections..."
until run_sql_in master -Q "SELECT 1" >/dev/null 2>&1; do
sleep 2
done
echo "[cluster-seed] SQL Server up."
echo "[cluster-seed] waiting for ${DB} database + dbo.ServerCluster table (operator must run dotnet ef database update)..."
until run_sql_in "$DB" -Q "IF OBJECT_ID('dbo.ServerCluster') IS NULL THROW 50001, 'missing', 1; SELECT 1" >/dev/null 2>&1; do
sleep 3
done
echo "[cluster-seed] schema ready."
echo "[cluster-seed] applying seed-clusters.sql (ServerCluster + ClusterNode rows)..."
run_sql_in "$DB" -i /seed/seed-clusters.sql
echo "[cluster-seed] done."
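The two wait loops above retry forever, which suits a seed container that has to sit idle until the operator runs the migrations. Where a hard failure is preferable, a bounded variant could look like the sketch below. `wait_for`, its argument order, and the attempt limits are hypothetical and not part of the script.

```bash
# Sketch only: a bounded variant of the "until ...; do sleep N; done" loops.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "[cluster-seed] gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# With the script's run_sql_in helper in scope, a call might read:
#   wait_for 60 2 run_sql_in master -Q "SELECT 1"
# Stand-in command so the sketch runs on its own:
wait_for 3 0 true && echo "ready"
```

Under `set -e`, a non-zero return from `wait_for` would stop the entrypoint, so the container exits instead of waiting indefinitely.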
+195
@@ -0,0 +1,195 @@
-- docker-dev cluster seed. Idempotent — safe to re-run on every `docker compose up`.
--
-- Populates:
-- ServerCluster MAIN, SITE-A, SITE-B
-- ClusterNode driver-a, driver-b → MAIN
-- site-a-1, site-a-2 → SITE-A
-- site-b-1, site-b-2 → SITE-B
--
-- ServerCluster.NodeCount + RedundancyMode are coupled by CHECK constraint:
-- NodeCount=1 ⇒ RedundancyMode='None'
-- NodeCount=2 ⇒ RedundancyMode∈('Warm','Hot')
--
-- Each ClusterNode.ApplicationUri MUST be globally unique (UX_ClusterNode_ApplicationUri).
-- Convention: urn:OtOpcUa:<NodeId>.
--
-- Host = Compose service name (resolves inside the otopcua-dev network).
-- OpcUaPort stays at the container-internal 4840; the host-side port mapping is in
-- docker-compose.yml ports: blocks and is irrelevant to ClusterNode rows.
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRANSACTION;
------------------------------------------------------------------------------
-- ServerCluster
------------------------------------------------------------------------------
IF NOT EXISTS (SELECT 1 FROM dbo.ServerCluster WHERE ClusterId = 'MAIN')
INSERT INTO dbo.ServerCluster
(ClusterId, Name, Enterprise, Site, NodeCount, RedundancyMode, Enabled, Notes, CreatedBy)
VALUES
('MAIN', 'Main cluster', 'zb', 'docker-dev',
2, 'Warm', 1,
'docker-dev seed — admin-a/admin-b control-plane, driver-a/driver-b OPC UA publishers.',
'docker-dev-seed');
IF NOT EXISTS (SELECT 1 FROM dbo.ServerCluster WHERE ClusterId = 'SITE-A')
INSERT INTO dbo.ServerCluster
(ClusterId, Name, Enterprise, Site, NodeCount, RedundancyMode, Enabled, Notes, CreatedBy)
VALUES
('SITE-A', 'Site A', 'zb', 'site-a',
2, 'Warm', 1,
'docker-dev seed — 2-node fused admin+driver cluster.',
'docker-dev-seed');
IF NOT EXISTS (SELECT 1 FROM dbo.ServerCluster WHERE ClusterId = 'SITE-B')
INSERT INTO dbo.ServerCluster
(ClusterId, Name, Enterprise, Site, NodeCount, RedundancyMode, Enabled, Notes, CreatedBy)
VALUES
('SITE-B', 'Site B', 'zb', 'site-b',
2, 'Warm', 1,
'docker-dev seed — 2-node fused admin+driver cluster.',
'docker-dev-seed');
------------------------------------------------------------------------------
-- ClusterNode — main cluster OPC UA publishers
--
-- NodeId is "<compose-service>:4053" so it matches what ClusterRoleInfo +
-- ConfigPublishCoordinator derive from Akka.Cluster.Get(system).State.Members
-- (member.Address.Host:Port). NodeDeploymentState.NodeId is FK-bound to
-- ClusterNode.NodeId; mismatched values cause FK 547 on deploy.
------------------------------------------------------------------------------
IF NOT EXISTS (SELECT 1 FROM dbo.ClusterNode WHERE NodeId = 'driver-a:4053')
INSERT INTO dbo.ClusterNode
(NodeId, ClusterId, Host, OpcUaPort, DashboardPort, ApplicationUri, ServiceLevelBase, Enabled, CreatedBy)
VALUES ('driver-a:4053', 'MAIN', 'driver-a', 4840, 8081, 'urn:OtOpcUa:driver-a', 200, 1, 'docker-dev-seed');
IF NOT EXISTS (SELECT 1 FROM dbo.ClusterNode WHERE NodeId = 'driver-b:4053')
INSERT INTO dbo.ClusterNode
(NodeId, ClusterId, Host, OpcUaPort, DashboardPort, ApplicationUri, ServiceLevelBase, Enabled, CreatedBy)
VALUES ('driver-b:4053', 'MAIN', 'driver-b', 4840, 8081, 'urn:OtOpcUa:driver-b', 150, 1, 'docker-dev-seed');
------------------------------------------------------------------------------
-- ClusterNode — site A
------------------------------------------------------------------------------
IF NOT EXISTS (SELECT 1 FROM dbo.ClusterNode WHERE NodeId = 'site-a-1:4053')
INSERT INTO dbo.ClusterNode
(NodeId, ClusterId, Host, OpcUaPort, DashboardPort, ApplicationUri, ServiceLevelBase, Enabled, CreatedBy)
VALUES ('site-a-1:4053', 'SITE-A', 'site-a-1', 4840, 8081, 'urn:OtOpcUa:site-a-1', 200, 1, 'docker-dev-seed');
IF NOT EXISTS (SELECT 1 FROM dbo.ClusterNode WHERE NodeId = 'site-a-2:4053')
INSERT INTO dbo.ClusterNode
(NodeId, ClusterId, Host, OpcUaPort, DashboardPort, ApplicationUri, ServiceLevelBase, Enabled, CreatedBy)
VALUES ('site-a-2:4053', 'SITE-A', 'site-a-2', 4840, 8081, 'urn:OtOpcUa:site-a-2', 150, 1, 'docker-dev-seed');
------------------------------------------------------------------------------
-- ClusterNode — site B
------------------------------------------------------------------------------
IF NOT EXISTS (SELECT 1 FROM dbo.ClusterNode WHERE NodeId = 'site-b-1:4053')
INSERT INTO dbo.ClusterNode
(NodeId, ClusterId, Host, OpcUaPort, DashboardPort, ApplicationUri, ServiceLevelBase, Enabled, CreatedBy)
VALUES ('site-b-1:4053', 'SITE-B', 'site-b-1', 4840, 8081, 'urn:OtOpcUa:site-b-1', 200, 1, 'docker-dev-seed');
IF NOT EXISTS (SELECT 1 FROM dbo.ClusterNode WHERE NodeId = 'site-b-2:4053')
INSERT INTO dbo.ClusterNode
(NodeId, ClusterId, Host, OpcUaPort, DashboardPort, ApplicationUri, ServiceLevelBase, Enabled, CreatedBy)
VALUES ('site-b-2:4053', 'SITE-B', 'site-b-2', 4840, 8081, 'urn:OtOpcUa:site-b-2', 150, 1, 'docker-dev-seed');
------------------------------------------------------------------------------
-- Galaxy MxAccess gateway — MAIN cluster
--
-- Namespace.Kind=SystemPlatform is required for Galaxy/MXAccess data per
-- decision #107; raw equipment drivers use Equipment. DriverInstance points
-- at the external mxaccessgw process. The driver code lives in this repo
-- (.NET 10, cross-platform); only the gateway worker needs Windows.
--
-- ApiKeySecretRef = env:GALAXY_MXGW_API_KEY → resolved at runtime by
-- GalaxyDriver.ResolveApiKey. The env var is set on every driver-role
-- container in docker-compose.yml.
------------------------------------------------------------------------------
IF NOT EXISTS (SELECT 1 FROM dbo.Namespace WHERE NamespaceId = 'MAIN-galaxy')
INSERT INTO dbo.Namespace
(NamespaceRowId, NamespaceId, ClusterId, Kind, NamespaceUri, Enabled, Notes)
VALUES
(NEWID(), 'MAIN-galaxy', 'MAIN', 'SystemPlatform',
'urn:zb:docker-dev:galaxy', 1,
'docker-dev seed — Galaxy / MXAccess namespace served by the MAIN cluster.');
IF NOT EXISTS (SELECT 1 FROM dbo.DriverInstance WHERE DriverInstanceId = 'MAIN-galaxy-mxgw')
INSERT INTO dbo.DriverInstance
(DriverInstanceRowId, DriverInstanceId, ClusterId, NamespaceId, Name, DriverType, Enabled, DriverConfig)
VALUES
(NEWID(), 'MAIN-galaxy-mxgw', 'MAIN', 'MAIN-galaxy',
'MxAccess gateway (10.100.0.48:5120)', 'GalaxyMxGateway', 1,
N'{
"Gateway": {
"Endpoint": "http://10.100.0.48:5120",
"ApiKeySecretRef": "env:GALAXY_MXGW_API_KEY",
"UseTls": false,
"ConnectTimeoutSeconds": 10,
"DefaultCallTimeoutSeconds": 30
},
"MxAccess": {
"ClientName": "OtOpcUa-MAIN-docker-dev",
"PublishingIntervalMs": 1000
},
"Repository": {
"DiscoverPageSize": 5000,
"WatchDeployEvents": true
},
"Reconnect": {
"InitialBackoffMs": 500,
"MaxBackoffMs": 30000,
"ReplayOnSessionLost": true
}
}');
------------------------------------------------------------------------------
-- Galaxy test tags — TestMachine_001.TestAlarm001..003
--
-- SystemPlatform-namespace tags have EquipmentId=NULL and use FolderPath +
-- Name to address the MXAccess item. The Galaxy driver subscribes via the
-- "FolderPath.Name" MXAccess reference form; OPC UA browse path is the
-- equivalent "FolderPath/Name" under the SystemPlatform namespace.
------------------------------------------------------------------------------
IF NOT EXISTS (SELECT 1 FROM dbo.Tag WHERE TagId = 'MAIN-galaxy-TestMachine_001-TestAlarm001')
INSERT INTO dbo.Tag
(TagRowId, TagId, DriverInstanceId, DeviceId, EquipmentId, Name, FolderPath, DataType, AccessLevel, WriteIdempotent, PollGroupId, TagConfig)
VALUES
(NEWID(), 'MAIN-galaxy-TestMachine_001-TestAlarm001', 'MAIN-galaxy-mxgw', NULL, NULL,
'TestAlarm001', 'TestMachine_001', 'Boolean', 0, 0, NULL, N'{}');
IF NOT EXISTS (SELECT 1 FROM dbo.Tag WHERE TagId = 'MAIN-galaxy-TestMachine_001-TestAlarm002')
INSERT INTO dbo.Tag
(TagRowId, TagId, DriverInstanceId, DeviceId, EquipmentId, Name, FolderPath, DataType, AccessLevel, WriteIdempotent, PollGroupId, TagConfig)
VALUES
(NEWID(), 'MAIN-galaxy-TestMachine_001-TestAlarm002', 'MAIN-galaxy-mxgw', NULL, NULL,
'TestAlarm002', 'TestMachine_001', 'Boolean', 0, 0, NULL, N'{}');
IF NOT EXISTS (SELECT 1 FROM dbo.Tag WHERE TagId = 'MAIN-galaxy-TestMachine_001-TestAlarm003')
INSERT INTO dbo.Tag
(TagRowId, TagId, DriverInstanceId, DeviceId, EquipmentId, Name, FolderPath, DataType, AccessLevel, WriteIdempotent, PollGroupId, TagConfig)
VALUES
(NEWID(), 'MAIN-galaxy-TestMachine_001-TestAlarm003', 'MAIN-galaxy-mxgw', NULL, NULL,
'TestAlarm003', 'TestMachine_001', 'Boolean', 0, 0, NULL, N'{}');
COMMIT TRANSACTION;
------------------------------------------------------------------------------
-- Summary (logged by sqlcmd output)
------------------------------------------------------------------------------
SELECT ClusterId, Name, NodeCount, RedundancyMode FROM dbo.ServerCluster ORDER BY ClusterId;
SELECT NodeId, ClusterId, Host, OpcUaPort, ApplicationUri, ServiceLevelBase
FROM dbo.ClusterNode ORDER BY ClusterId, NodeId;
SELECT NamespaceId, ClusterId, Kind, NamespaceUri FROM dbo.Namespace ORDER BY ClusterId, NamespaceId;
SELECT DriverInstanceId, ClusterId, DriverType, NamespaceId, Name
FROM dbo.DriverInstance ORDER BY ClusterId, DriverInstanceId;
SELECT TagId, DriverInstanceId, FolderPath, Name, DataType FROM dbo.Tag ORDER BY DriverInstanceId, FolderPath, Name;
+81
@@ -0,0 +1,81 @@
# docker-dev companion to scripts/install/traefik-dynamic.yml. Routes three
# Akka clusters that share the Compose network:
#
# - Main cluster (default): PathPrefix(`/`) → admin-a / admin-b.
# - Site A cluster: Host(`site-a.localhost`) → site-a-1 / site-a-2.
# - Site B cluster: Host(`site-b.localhost`) → site-b-1 / site-b-2.
#
# Traefik's default router priority is the length of the rule string, so the
# longer Host(...) rules outrank PathPrefix(`/`) for the site hostnames
# automatically; no explicit priority field is needed.
http:
routers:
otopcua-admin:
entryPoints: ["web"]
rule: "PathPrefix(`/`)"
service: otopcua-admin
otopcua-site-a:
entryPoints: ["web"]
rule: "Host(`site-a.localhost`)"
service: otopcua-site-a
otopcua-site-b:
entryPoints: ["web"]
rule: "Host(`site-b.localhost`)"
service: otopcua-site-b
services:
otopcua-admin:
loadBalancer:
# Blazor Server uses SignalR; the WebSocket upgrade must hit the same
# backend that owns the circuit ID. Sticky cookie keeps each session
# pinned to one node so the post-handshake WebSocket doesn't 404.
sticky:
cookie:
name: otopcua_lb
httpOnly: true
sameSite: lax
servers:
- url: "http://admin-a:9000"
- url: "http://admin-b:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s
otopcua-site-a:
loadBalancer:
# Blazor Server uses SignalR; the WebSocket upgrade must hit the same
# backend that owns the circuit ID. Sticky cookie keeps each session
# pinned to one node so the post-handshake WebSocket doesn't 404.
sticky:
cookie:
name: otopcua_lb
httpOnly: true
sameSite: lax
servers:
- url: "http://site-a-1:9000"
- url: "http://site-a-2:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s
otopcua-site-b:
loadBalancer:
# Blazor Server uses SignalR; the WebSocket upgrade must hit the same
# backend that owns the circuit ID. Sticky cookie keeps each session
# pinned to one node so the post-handshake WebSocket doesn't 404.
sticky:
cookie:
name: otopcua_lb
httpOnly: true
sameSite: lax
servers:
- url: "http://site-b-1:9000"
- url: "http://site-b-2:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s
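The routers above rely on Traefik's default ordering: when no `priority` is set, a router's priority equals the length of its rule string, and longer rules are evaluated first. The sketch below just measures the two rule strings to show why the `Host` routers win; `rule_priority` is a hypothetical helper, not a Traefik command.

```bash
# Sketch only: approximates Traefik's default priority (rule string length).
rule_priority() {
  # ${#1} expands to the character count of the first argument.
  echo "${#1}"
}

default_rule='PathPrefix(`/`)'
site_a_rule='Host(`site-a.localhost`)'

echo "default router: $(rule_priority "$default_rule")"
echo "site-a router:  $(rule_priority "$site_a_rule")"
```

If a future rule for the default router grew longer than the `Host` rules, an explicit `priority` on the site routers would be the safer choice.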
+2 -2
@@ -1,6 +1,6 @@
# Address Space
Each driver's browsable subtree is built by streaming nodes from the driver's `ITagDiscovery.DiscoverAsync` implementation into an `IAddressSpaceBuilder`. `GenericDriverNodeManager` (`src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs`) owns the shared orchestration; `DriverNodeManager` (`src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs`) implements `IAddressSpaceBuilder` against the OPC Foundation stack's `CustomNodeManager2`. The same code path serves Galaxy object hierarchies, Modbus PLC registers, AB CIP tags, TwinCAT symbols, FOCAS CNC parameters, and OPC UA Client aggregations — Galaxy is one driver of seven, not the driver.
Each driver's browsable subtree is built by streaming nodes from the driver's `ITagDiscovery.DiscoverAsync` implementation into an `IAddressSpaceBuilder`. `GenericDriverNodeManager` (`src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs`) owns the shared orchestration; in v2 the SDK-driven materialization is handled by `OtOpcUaNodeManager` (`src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs`) fed via `SdkAddressSpaceSink` (`src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/SdkAddressSpaceSink.cs`). The same code path serves Galaxy object hierarchies, Modbus PLC registers, AB CIP tags, TwinCAT symbols, FOCAS CNC parameters, and OPC UA Client aggregations — Galaxy is one driver of seven, not the driver.
## Driver root folder
@@ -66,7 +66,7 @@ Drivers that implement `IRediscoverable` fire `OnRediscoveryNeeded` when their b
## Key source files
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs` — orchestration + `CapturingBuilder`
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs` — OPC UA materialization (`IAddressSpaceBuilder` impl + `NestedBuilder`)
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs`, `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/SdkAddressSpaceSink.cs` — OPC UA materialization (write-only sink fed by the actor system)
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs` — builder contract
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ITagDiscovery.cs` — driver discovery capability
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverAttributeInfo.cs` — per-attribute descriptor
+4 -3
@@ -15,9 +15,10 @@ historical reference.
| **Galaxy sub-attribute fallback** | `IWritable` writes to `$Alarm*` sub-attributes | gateway data subscription → driver `OnDataChange` → `DriverNodeManager` ConditionSink → `AlarmConditionService` |
| **Scripted alarms** | `Phase7EngineComposer` | server-side script evaluator → `Phase7EngineComposer.RouteToHistorianAsync` + `AlarmConditionService` |
All three converge on `AlarmConditionService` (`src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs`),
which owns the OPC UA Part 9 state machine and dispatches transitions
to the OPC UA condition node managers. Driver-native transitions take
All three converge on the alarm-state actor — in v2 the OPC UA Part 9 state
machine lives inside `ScriptedAlarmActor`
(`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs`),
which dispatches transitions to the OPC UA condition node managers. Driver-native transitions take
precedence over sub-attribute synthesis when both arrive for the same
condition — the dedup logic prefers the richer driver-native record
because it carries the full operator + raise-time + category metadata
+3 -2
@@ -28,7 +28,7 @@ Static drivers (Modbus, S7, AB CIP, AB Legacy, FOCAS) do not implement `IRedisco
Tag-set changes authored in the Admin UI (UNS edits, CSV imports, driver-config edits) accumulate in a draft generation and commit via `sp_PublishGeneration`. The delta between the currently-published generation and the proposed next one is computed by `sp_ComputeGenerationDiff`, which drives:
- The **DiffViewer** in Admin (`src/Server/ZB.MOM.WW.OtOpcUa.Admin/Components/Pages/Clusters/DiffViewer.razor`) so operators can preview what will change before clicking Publish.
- The publish-preview surface in the Admin UI (`src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Deployments.razor`, backed by `AdminOperationsClient`) so operators can preview what will change before clicking Publish.
- The 409-on-stale-draft flow (decision #161) — a UNS drag-reorder preview carries a `DraftRevisionToken` so Confirm returns `409 Conflict / refresh-required` if the draft advanced between preview and commit.
After publish, the server's generation applier invokes `IDriver.ReinitializeAsync(driverConfigJson, ct)` on every driver whose `DriverInstance.DriverConfig` row changed in the new generation. Reinitialize is the in-process recovery path for Tier A/B drivers; if it fails the driver is marked `DriverState.Faulted` and its nodes go Bad quality — but the server process stays running. See `docs/v2/driver-stability.md`.
@@ -64,6 +64,7 @@ Subscriptions for unchanged references stay live across rebuilds — their ref-c
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IRediscoverable.cs` — backend-change capability
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs` — discovery orchestration
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriver.cs` — `ReinitializeAsync` contract
- `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/GenerationService.cs` — publish-flow driver
- `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Coordinators/ConfigPublishCoordinator.cs` — publish-flow driver
- `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` — cluster singleton invoked by the Admin UI's `AdminOperationsClient`
- `docs/v2/config-db-schema.md` — `sp_PublishGeneration` + `sp_ComputeGenerationDiff`
- `docs/v2/admin-ui.md` — DiffViewer + draft-revision-token flow
+9 -8
@@ -1,13 +1,13 @@
# OPC UA Server
The OPC UA server component (`src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs`) hosts the OPC UA stack and exposes one browsable subtree per registered driver. The server itself is driver-agnostic — Galaxy/MXAccess, Modbus, S7, AB CIP, AB Legacy, TwinCAT, FOCAS, and OPC UA Client are all plugged in as `IDriver` implementations via the capability interfaces in `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/`.
The OPC UA server component (`src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaSdkServer.cs`) hosts the OPC UA stack and exposes one browsable subtree per registered driver. The server itself is driver-agnostic — Galaxy/MXAccess, Modbus, S7, AB CIP, AB Legacy, TwinCAT, FOCAS, and OPC UA Client are all plugged in as `IDriver` implementations via the capability interfaces in `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/`.
## Composition
`OtOpcUaServer` subclasses the OPC Foundation `StandardServer` and wires:
- A `DriverHost` (`src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs`) which registers drivers and holds the per-instance `IDriver` references.
- One `DriverNodeManager` per registered driver (`src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs`), constructed in `CreateMasterNodeManager`. Each manager owns its own namespace URI (`urn:OtOpcUa:{DriverInstanceId}`) and exposes the driver as a subtree under the standard `Objects` folder.
- One `DriverNodeManager` per registered driver (`src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs`), constructed in `CreateMasterNodeManager`. Each manager owns its own namespace URI (`urn:OtOpcUa:{DriverInstanceId}`) and exposes the driver as a subtree under the standard `Objects` folder.
- A `CapabilityInvoker` (`src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs`) per driver instance, keyed on `(DriverInstanceId, HostName, DriverCapability)` against the shared `DriverResiliencePipelineBuilder`. Every Read/Write/Discovery/Subscribe/HistoryRead/AlarmSubscribe call on the driver flows through this invoker so the Polly pipeline (retry / timeout / breaker / bulkhead) applies. The OTOPCUA0001 Roslyn analyzer enforces the wrapping at compile time.
- An `IUserAuthenticator` (LDAP in production, injected stub in tests) for `UserName` token validation in the `ImpersonateUser` hook.
- Optional `AuthorizationGate` + `NodeScopeResolver` (Phase 6.2) that sit in front of every dispatch call. In lax mode the gate passes through when the identity lacks LDAP groups so existing integration tests keep working; strict mode (`Authorization:StrictMode = true`) denies those cases.
@@ -50,7 +50,7 @@ The host name fed to the invoker comes from `IPerCallHostResolver.ResolveHost(fu
## Redundancy
`Redundancy.Enabled = true` on the `ServerInstance` activates the `RedundancyCoordinator` + `ServiceLevelCalculator` (`src/Server/ZB.MOM.WW.OtOpcUa.Server/Redundancy/`). Standard OPC UA redundancy nodes (`Server/ServerRedundancy/RedundancySupport`, `ServerUriArray`, `Server/ServiceLevel`) are populated on startup; `ServiceLevel` recomputes whenever any driver's `DriverHealth` changes. The apply-lease mechanism prevents two instances from concurrently applying a generation. See `docs/Redundancy.md`.
`Redundancy.Enabled = true` on the `ServerInstance` activates the `RedundancyStateActor` + `ServiceLevelCalculator` (`src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Redundancy/`). Standard OPC UA redundancy nodes (`Server/ServerRedundancy/RedundancySupport`, `ServerUriArray`, `Server/ServiceLevel`) are populated on startup; `ServiceLevel` recomputes whenever any driver's `DriverHealth` changes. The apply-lease mechanism prevents two instances from concurrently applying a generation. See `docs/Redundancy.md`.
## Server class hierarchy
@@ -79,10 +79,11 @@ Certificate stores default to `%LOCALAPPDATA%\OPC Foundation\pki\` (directory-ba
## Key source files
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs` — `StandardServer` subclass + `ImpersonateUser` hook
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs` — per-driver `CustomNodeManager2` + dispatch surface
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs` — programmatic `ApplicationConfiguration` + lifecycle
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaSdkServer.cs` — `StandardServer` subclass
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs` — programmatic `ApplicationConfiguration` + lifecycle + `ImpersonateUser` hook
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` — SDK node manager + write-only address-space sink
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/SdkAddressSpaceSink.cs` — `IOpcUaAddressSpaceSink` adapter the actor system pushes into
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs` — per-driver discovery + dispatch surface
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs` — driver registration
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs` — Polly pipeline entry point
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/` — Phase 6.2 permission trie + evaluator
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs` — stack-to-evaluator bridge
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/` — permission trie + evaluator (`PermissionTrie`, `PermissionTrieCache`, `TriePermissionEvaluator`)
+7 -4
@@ -9,10 +9,13 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess
## Platform overview
- **Core** owns the OPC UA stack, address space, session/security/subscription machinery.
> **v2 (2026-05-26):** the separate `OtOpcUa.Server` + `OtOpcUa.Admin` services fused into a single role-gated `OtOpcUa.Host` binary, joined by an Akka.NET cluster. See [v2 design](plans/2026-05-26-akka-hosting-alignment-design.md) for the architectural decision.
- **Core** owns shared abstractions (driver capability contracts, scripting, virtual tags, alarm historian).
- **Drivers** plug in via capability interfaces in `ZB.MOM.WW.OtOpcUa.Core.Abstractions`: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, `IHistoryProvider`, `IPerCallHostResolver`. Each driver opts into whichever it supports.
- **Server** is the OPC UA endpoint process (net10, AnyCPU). Hosts every driver in-process. The Galaxy driver reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo); it is no longer hosted from this repo.
- **Admin** is the Blazor Server operator UI (net10, x64). Owns the Config DB draft/publish flow, ACL + role-grant authoring, fleet status + `/metrics` scrape endpoint.
- **Host** (`src/Server/ZB.MOM.WW.OtOpcUa.Host`) is the single fused binary (.NET 10, AnyCPU). The `OTOPCUA_ROLES` environment variable decides what to mount: `admin` (Blazor + control-plane singletons), `driver` (OPC UA endpoint + per-node actors), or both. See [ServiceHosting.md](ServiceHosting.md).
- **Cluster + ControlPlane + Runtime + AdminUI + Security** sit between Core and Host. The cluster glues per-node actors into one logical fleet; the control-plane singletons (deploy coordinator, audit writer, redundancy state) live on the admin role-leader. See [Redundancy.md](Redundancy.md).
- The Galaxy driver still reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo).
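
The role gating described above can be mirrored in a few lines of shell. This is only an illustration of the idea: the real host is a .NET binary, and `has_role` is a hypothetical helper, not something the repo ships.

```bash
# Sketch only: checks whether a role appears in the comma-separated
# OTOPCUA_ROLES value, the way the compose files set it ("admin,driver").
has_role() {
  # Wrap both sides in commas so "admin" never matches a longer role name.
  case ",${OTOPCUA_ROLES:-}," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

OTOPCUA_ROLES="admin,driver"
has_role admin  && echo "mount Blazor UI + control-plane singletons"
has_role driver && echo "mount OPC UA endpoint + per-node actors"
```

With `OTOPCUA_ROLES="admin"` only the first line would print, which corresponds to an admin-only node.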
## Where to find what
@@ -56,7 +59,7 @@ For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics
| [security.md](security.md) | Transport security profiles, LDAP auth, ACL trie, role grants, OTOPCUA0001 analyzer |
| [Redundancy.md](Redundancy.md) | `RedundancyCoordinator`, `ServiceLevelCalculator`, apply-lease, Prometheus metrics |
| [Reservations.md](Reservations.md) | Fleet-wide ZTag / SAPID external-ID reservations — publish-time claim, release flow |
| [ServiceHosting.md](ServiceHosting.md) | Two-process deploy (Server + Admin) install/uninstall, plus the optional `OtOpcUaWonderwareHistorian` sidecar |
| [ServiceHosting.md](ServiceHosting.md) | Single fused `OtOpcUa.Host` binary install/uninstall with `OTOPCUA_ROLES` gating, plus the optional `OtOpcUaWonderwareHistorian` sidecar |
| [StatusDashboard.md](StatusDashboard.md) | Pointer — superseded by [v2/admin-ui.md](v2/admin-ui.md) |
### Client tooling
+3 -4
@@ -1,6 +1,6 @@
# Read/Write Operations
`DriverNodeManager` (`src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs`) wires the OPC UA stack's per-variable `OnReadValue` and `OnWriteValue` hooks to each driver's `IReadable` and `IWritable` capabilities. Every dispatch flows through `CapabilityInvoker` so the Polly pipeline (retry / timeout / breaker / bulkhead) applies uniformly across Galaxy, Modbus, S7, AB CIP, AB Legacy, TwinCAT, FOCAS, and OPC UA Client drivers.
`GenericDriverNodeManager` (`src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs`) wires the OPC UA stack's per-variable `OnReadValue` and `OnWriteValue` hooks to each driver's `IReadable` and `IWritable` capabilities. Every dispatch flows through `CapabilityInvoker` so the Polly pipeline (retry / timeout / breaker / bulkhead) applies uniformly across Galaxy, Modbus, S7, AB CIP, AB Legacy, TwinCAT, FOCAS, and OPC UA Client drivers.
## Driver vs virtual dispatch
@@ -60,8 +60,7 @@ Per decision #12, exceptions in the driver's capability call are logged and conv
## Key source files
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs` — `OnReadValue` / `OnWriteValue` hooks
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/WriteAuthzPolicy.cs` — classification-to-role policy
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs` — Phase 6.2 trie gate
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs` — `OnReadValue` / `OnWriteValue` hooks
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/` — permission trie + evaluator (`PermissionTrie`, `PermissionTrieCache`, `TriePermissionEvaluator`) that gates Read/Write/Subscribe per the session's resolved LDAP groups
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs` — `ExecuteAsync` / `ExecuteWriteAsync`
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IReadable.cs`, `IWritable.cs`, `WriteIdempotentAttribute.cs`
+78 -72
@@ -1,103 +1,109 @@
# Redundancy
# Redundancy (v2)
## Overview
OtOpcUa supports OPC UA **non-transparent** warm/hot redundancy. Two (or more) OtOpcUa Server processes run side-by-side, share the same Config DB, the same driver backends (Galaxy ZB, MXAccess runtime, remote PLCs), and advertise the same OPC UA node tree. Each process owns a distinct `ApplicationUri`; OPC UA clients see both endpoints via the standard `ServerUriArray` and pick one based on the `ServiceLevel` that each server publishes.
OtOpcUa supports OPC UA **non-transparent** warm/hot redundancy. Two or more `OtOpcUa.Host` processes run side-by-side, share the same Config DB, and join the same Akka.NET cluster. Each process owns a distinct `ApplicationUri`; OPC UA clients discover both endpoints by reading `Server.ServerArray` (NodeId `i=2254`) on either node and pick one based on the `ServiceLevel` byte that each server publishes.
The redundancy surface lives in `src/Server/ZB.MOM.WW.OtOpcUa.Server/Redundancy/`:
> **Discovery surface.** The `ServerArray` path on the `Server` object is what each node populates with self + peer `ApplicationUri`s — see `OpcUaApplicationHost.PopulateServerArray` and the per-node `PeerApplicationUris` option below. The redundancy-object-type `ServerUriArray` proper (a child of `Server.ServerRedundancy`) remains deferred pending an SDK object-type upgrade; clients should read `Server.ServerArray` for peer discovery today.
| Class | Role |
|---|---|
| `RedundancyCoordinator` | Process-singleton; owns the current `RedundancyTopology` loaded from the `ClusterNode` table. `RefreshAsync` re-reads after `sp_PublishGeneration` so operator role swaps take effect without a process restart. CAS-style swap (`Interlocked.Exchange`) means readers always see a coherent snapshot. |
| `RedundancyTopology` | Immutable `(ClusterId, Self, Peers, ServerUriArray, ValidityFlags)` snapshot. |
| `ApplyLeaseRegistry` | Tracks in-progress `sp_PublishGeneration` apply leases keyed on `(ConfigGenerationId, PublishRequestId)`. `await using` the disposable scope guarantees every exit path (success / exception / cancellation) decrements the lease; a stale-lease watchdog force-closes any lease older than `ApplyMaxDuration` (default 10 minutes) so a crashed publisher can't pin the node at `PrimaryMidApply`. |
| `PeerReachabilityTracker` | Maintains last-known reachability for each peer node over two independent probes — OPC UA ping and HTTP `/healthz`. Both must succeed for `peerReachable = true`. |
| `RecoveryStateManager` | Gates transitions out of the `Recovering*` bands; requires dwell + publish-witness satisfaction before allowing a return to nominal. |
| `ServiceLevelCalculator` | Pure function `(role, selfHealthy, peerUa, peerHttp, applyInProgress, recoveryDwellMet, topologyValid, operatorMaintenance) → byte`. |
| `RedundancyStatePublisher` | Orchestrates inputs into the calculator, pushes the resulting byte to the OPC UA `ServiceLevel` variable via an edge-triggered `OnStateChanged` event, and fires `OnServerUriArrayChanged` when the topology's `ServerUriArray` shifts. |
> **v2 change.** v1's operator-managed `ClusterNode.RedundancyRole` column + `RedundancyCoordinator` / `ApplyLeaseRegistry` / `PeerHttpProbeLoop` are gone. Primary/secondary is now derived from **Akka cluster role-leader** for the `driver` role. The operator no longer writes a role into the DB; cluster topology + health drive ServiceLevel automatically.
## Data model
The runtime pieces live in:
Per-node redundancy state lives in the Config DB `ClusterNode` table (`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ClusterNode.cs`):
| Column | Role |
|---|---|
| `NodeId` | Unique node identity; matches `Node:NodeId` in the server's bootstrap `appsettings.json`. |
| `ClusterId` | Foreign key into `ServerCluster`. |
| `RedundancyRole` | `Primary`, `Secondary`, or `Standalone` (`RedundancyRole` enum in `Configuration/Enums`). |
| `ServiceLevelBase` | Per-node base value used to bias nominal ServiceLevel output. |
| `ApplicationUri` | Unique-per-node OPC UA ApplicationUri advertised in endpoint descriptions. |
`ServerUriArray` is derived from the set of peer `ApplicationUri` values at topology-load time and republished when the topology changes.
## ServiceLevel matrix
`ServiceLevelCalculator` produces one of the following bands (see `ServiceLevelBand` enum in the same file):
| Band | Byte | Meaning |
| Component | Project | Role |
|---|---|---|
| `Maintenance` | 0 | Operator-declared maintenance. |
| `NoData` | 1 | Self-reported unhealthy (`/healthz` fails). |
| `InvalidTopology` | 2 | More than one Primary detected; both nodes self-demote. |
| `RecoveringBackup` | 30 | Backup post-fault, dwell not met. |
| `BackupMidApply` | 50 | Backup inside a publish-apply window. |
| `IsolatedBackup` | 80 | Primary unreachable; Backup says "take over if asked" — does **not** auto-promote (non-transparent model). |
| `AuthoritativeBackup` | 100 | Backup nominal. |
| `RecoveringPrimary` | 180 | Primary post-fault, dwell not met. |
| `PrimaryMidApply` | 200 | Primary inside a publish-apply window. |
| `IsolatedPrimary` | 230 | Primary with unreachable peer, retains authority. |
| `AuthoritativePrimary` | 255 | Primary nominal. |
| `ServiceLevelCalculator` | `OtOpcUa.ControlPlane.Redundancy` | Pure function `(NodeHealthInputs) → byte`. No side effects. |
| `RedundancyStateActor` | `OtOpcUa.ControlPlane.Redundancy` | Admin-role cluster singleton; subscribes to cluster topology events, debounces 250ms, broadcasts `RedundancyStateChanged` on the `redundancy-state` DPS topic. |
| `DbHealthProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; runs `SELECT 1` against ConfigDb every 5s. Read by health endpoint + redundancy calc. |
| `PeerOpcUaProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; pings peer `opc.tcp://peer:4840` (real probe call is staged for follow-up F12). |
| `ClusterRoleInfo` | `OtOpcUa.Cluster` | Live view of cluster membership + role-leader; exposes `IClusterRoleInfo` to the rest of the host. |
The reserved bands (0 Maintenance, 1 NoData, 2 InvalidTopology) take precedence over operational states per OPC UA Part 5 §6.3.34. Operational values occupy 2..255 so spec-compliant clients that treat "<3 = unhealthy" keep working.
## ServiceLevel tiers (Part 5 §6.5)
Standalone nodes (single-instance deployments) report `AuthoritativePrimary` when healthy and `PrimaryMidApply` during publish.
`ServiceLevelCalculator.Compute(NodeHealthInputs)` returns a byte in 0..255 by tier:
## Publish fencing and split-brain prevention
| Tier | Byte | Condition |
|---|---|---|
| Down | 0 | Member status is not `Up` or `Joining` (leaving, removed, exiting). |
| Critically degraded | 100 | ConfigDb unreachable AND data is stale. |
| Stale | 200 | Data stale but ConfigDb reachable. |
| Healthy follower | 240 | DB ok + OPC UA probe ok + not stale. |
| Healthy leader | 250 | Healthy + this node is the `driver` role-leader. |
Any Admin-triggered `sp_PublishGeneration` acquires an apply lease through `ApplyLeaseRegistry.BeginApplyLease`. While the lease is held:
Drivers write their computed byte into the OPC UA `ServiceLevel` Variable on each refresh. Clients with the standard redundancy heuristic ("pick the highest ServiceLevel") therefore prefer the role-leader and fall back to followers on its degradation.
- The calculator reports `PrimaryMidApply` / `BackupMidApply` — clients see the band shift and cut over to the unaffected peer rather than racing against a half-applied generation.
- `RedundancyCoordinator.RefreshAsync` is called at the end of the apply window so the post-publish topology becomes visible exactly once, atomically.
- The watchdog force-closes any lease older than `ApplyMaxDuration`; a stuck publisher therefore cannot strand a node at `PrimaryMidApply`.
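The v2 tier table above can be read as a small decision function. The following is a minimal sketch, not the actual `ServiceLevelCalculator`: the `NodeHealthInputsSketch` record and its field names are hypothetical (the real `NodeHealthInputs` shape is not shown here), and the table leaves two combinations unspecified (ConfigDb unreachable with fresh data, or a failing peer probe with fresh data), which the sketch assumes fall into the 200 tier.

```csharp
using System;

// Hypothetical input shape for illustration; the real NodeHealthInputs may differ.
public sealed record NodeHealthInputsSketch(
    bool MemberUpOrJoining,   // cluster member status is Up or Joining
    bool ConfigDbReachable,   // last ConfigDb probe succeeded
    bool PeerOpcUaProbeOk,    // last OPC UA peer probe succeeded
    bool DataStale,           // served data is considered stale
    bool IsDriverRoleLeader); // this node is the role-leader for "driver"

public static class ServiceLevelSketch
{
    // Pure function: same inputs always give the same byte, no side effects.
    public static byte Compute(NodeHealthInputsSketch i)
    {
        // Down: member is leaving, removed, or exiting.
        if (!i.MemberUpOrJoining) return 0;

        // Stale data: 100 when ConfigDb is also unreachable, otherwise 200.
        if (i.DataStale) return i.ConfigDbReachable ? (byte)200 : (byte)100;

        // ASSUMPTION: the tier table does not cover fresh data with a failing
        // DB or peer probe; this sketch maps both cases to the 200 tier.
        if (!i.ConfigDbReachable || !i.PeerOpcUaProbeOk) return 200;

        // Healthy: the driver role-leader ranks above followers.
        return i.IsDriverRoleLeader ? (byte)250 : (byte)240;
    }
}
```

Under these assumptions a healthy `driver` role-leader yields 250 and a healthy follower 240, so a client that prefers the highest `ServiceLevel` lands on the leader and falls back to a follower when the leader degrades.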
## Data flow
Because role transitions are **operator-driven** (write `RedundancyRole` in the Config DB + publish), the Backup never auto-promotes. An `IsolatedBackup` at 80 is the signal that the operator should intervene; auto-failover is intentionally out of scope for the non-transparent model (decision #154).
```
Cluster topology event ──┐
DB health probe ─────────┤
OPC UA peer probe ───────┤
RedundancyStateActor (admin singleton)
│ debounce 250ms
DPS topic "redundancy-state"
Driver nodes' OpcUaPublishActor
ServiceLevelCalculator → byte
OPC UA ServiceLevel Variable
```
## Metrics
The admin singleton is the cluster's only `RedundancyStateActor`. If the admin leader fails over, the new admin node spins up its replacement, re-subscribes to cluster events, and publishes a fresh snapshot from the current `Cluster.State`. There is no DB-persisted state to recover.
`RedundancyMetrics` in `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/RedundancyMetrics.cs` registers the `ZB.MOM.WW.OtOpcUa.Redundancy` meter on the Admin process. Instruments:
## Configuration
| Name | Kind | Tags | Description |
|---|---|---|---|
| `otopcua.redundancy.role_transition` | Counter<long> | `cluster.id`, `node.id`, `from_role`, `to_role` | Incremented every time `FleetStatusPoller` observes a `RedundancyRole` change on a `ClusterNode` row. |
| `otopcua.redundancy.primary_count` | ObservableGauge<long> | `cluster.id` | Primary-role nodes per cluster — should be exactly 1 in nominal state. |
| `otopcua.redundancy.secondary_count` | ObservableGauge<long> | `cluster.id` | Secondary-role nodes per cluster. |
| `otopcua.redundancy.stale_count` | ObservableGauge<long> | `cluster.id` | Nodes whose `LastSeenAt` exceeded the stale threshold. |
Per-node identity comes from `appsettings.json` + the `OTOPCUA_ROLES` env var:
Admin `Program.cs` wires OpenTelemetry to the Prometheus exporter when `Metrics:Prometheus:Enabled=true` (default), exposing the meter under `/metrics`. The endpoint is intentionally unauthenticated — fleet conventions put it behind a reverse-proxy basic-auth gate if needed.
```json
{
"Cluster": {
"Hostname": "0.0.0.0",
"Port": 4053,
"PublicHostname": "node-a.lan",
"SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
"Roles": ["admin", "driver"]
}
}
```
## Real-time notifications (Admin UI)
```
OTOPCUA_ROLES=admin,driver
```
`FleetStatusPoller` in `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Hubs/` polls the `ClusterNode` table, records role transitions, updates `RedundancyMetrics.SetClusterCounts`, and pushes a `RoleChanged` SignalR event onto `FleetStatusHub` when a transition is observed. `RedundancyTab.razor` subscribes with `_hub.On<RoleChangedMessage>("RoleChanged", …)` so connected Admin sessions see role swaps the moment they happen.
Both nodes share the same `ConfigDb` connection string; `Cluster.PublicHostname` + `Roles` are what makes them distinct in cluster gossip. The first node bootstraps the cluster (its address goes in `SeedNodes`); the second node joins via the same `SeedNodes` list.
## Configuring a redundant pair
There is no longer a `Node:NodeId` setting, no `ClusterNode.RedundancyRole`, no `ServiceLevelBase`. NodeId is derived as `host:port` of the cluster `PublicHostname` (see `ClusterRoleInfo.LocalNode` for the formula).
Redundancy is configured **in the Config DB, not appsettings.json**. The fields that must differ between the two instances:
### Peer URI advertising
| Field | Location | Instance 1 | Instance 2 |
|---|---|---|---|
| `NodeId` | `appsettings.json` `Node:NodeId` (bootstrap) | `node-a` | `node-b` |
| `ClusterNode.ApplicationUri` | Config DB | `urn:node-a:OtOpcUa` | `urn:node-b:OtOpcUa` |
| `ClusterNode.RedundancyRole` | Config DB | `Primary` | `Secondary` |
| `ClusterNode.ServiceLevelBase` | Config DB | typically 255 | typically 100 |
Each node advertises its partner via `OpcUaApplicationHostOptions.PeerApplicationUris` (an `IList<string>`, default empty). `OpcUaApplicationHost.PopulateServerArray` appends each configured peer URI to the SDK's `IServerInternal.ServerUris` string table after server startup, so that `Server.ServerArray` reads served by `OnReadServerArray` return both self + peers. Set this per-node in `appsettings.json`:
Shared between instances: `ClusterId`, Config DB connection string, published generation, cluster-level ACLs, UNS hierarchy, driver instances.
```json
{
"OpcUaServer": {
"PeerApplicationUris": ["urn:node-b:OtOpcUa"]
}
}
```
Role swaps, stand-alone promotions, and base-level adjustments all happen through the Admin UI `RedundancyTab` — the operator edits the `ClusterNode` row in a draft generation and publishes. `RedundancyCoordinator.RefreshAsync` picks up the new topology without a process restart.
Node A lists Node B's `ApplicationUri` and vice-versa. Validated by `DualEndpointTests` in `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/` — boots two `OpcUaApplicationHost` instances on loopback, asserts a real OPCFoundation client `Session` reading `Server.ServerArray` from Node A sees both URIs.
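As a rough illustration of the "self + peers" result that a `Server.ServerArray` read is described as returning, the sketch below builds the array from a node's own URI and its configured peers. It is not the body of `OpcUaApplicationHost.PopulateServerArray`; the `ServerArraySketch` class is hypothetical, and dropping blank or duplicate entries is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ServerArraySketch
{
    // Self URI first, then each configured peer in order.
    // ASSUMPTION: blank entries and duplicates are dropped.
    public static string[] Build(string selfUri, IEnumerable<string> peerApplicationUris) =>
        new[] { selfUri }
            .Concat(peerApplicationUris.Where(u => !string.IsNullOrWhiteSpace(u)))
            .Distinct(StringComparer.Ordinal)
            .ToArray();
}
```

With the configuration shown above, Node A (`urn:node-a:OtOpcUa`) and a single peer entry would produce a two-element array holding both URIs.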
## Split-brain
`akka.conf` configures Akka's split-brain resolver with `active-strategy = keep-oldest`, `stable-after = 15s`, and `failure-detector.threshold = 10.0`. Under a clean partition: the oldest member stays up + the smaller (or younger) side downs itself within ~15 seconds. The `RedundancyStateActor` on the surviving partition re-computes from the post-partition `Cluster.State`.
There is no operator-driven role swap during a partition. Failover is what the cluster does automatically.
## Client-side failover
The OtOpcUa Client CLI at `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md) for the command reference.
The OtOpcUa Client CLI at `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md).
## Depth reference
For the full decision trail and implementation plan — topology invariants, peer-probe cadence, recovery-dwell policy, compliance-script guard against enum-value drift — see `docs/v2/plan.md` §Phase 6.3.
For the full design — message contracts, tiered calculator truth table, recovery semantics — see `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §6.
+6 -4
@@ -111,13 +111,13 @@ Emissions map into `AlarmEventArgs` as `AlarmType = Kind.ToString()`, `SourceNod
## Composition
`Phase7EngineComposer.Compose` (`src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs`) is the single call site that instantiates the engine. It takes the generation's `Script` / `VirtualTag` / `ScriptedAlarm` rows, the shared `CachedTagUpstreamSource`, an `IAlarmStateStore`, and an `IAlarmHistorianSink`, and returns a `Phase7ComposedSources` the caller owns. When `scriptedAlarms.Count > 0`:
`Phase7Composer` (`src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs`) is the single call site that instantiates the engine. It takes the generation's `Script` / `VirtualTag` / `ScriptedAlarm` rows, the shared upstream-tag source, an `IAlarmStateStore`, and an `IAlarmHistorianSink`, and returns the composed sources the caller owns. When `scriptedAlarms.Count > 0`:
1. `ProjectScriptedAlarms` resolves each row's `PredicateScriptId` against the script dictionary and produces a `ScriptedAlarmDefinition` list. Unknown or disabled scripts throw immediately — the DB publish guarantees referential integrity but this is a belt-and-braces check.
2. A `ScriptedAlarmEngine` is constructed with the upstream source, the store, a shared `ScriptLoggerFactory` keyed to `scripts-*.log`, and the root Serilog logger.
3. `alarmEngine.OnEvent` is wired to `RouteToHistorianAsync`, which projects each emission into an `AlarmHistorianEvent` and enqueues it on the sink. Fire-and-forget — the SQLite store-and-forward sink is already non-blocking.
4. `LoadAsync(alarmDefs)` runs synchronously on the startup thread: it compiles every predicate, subscribes to the union of predicate inputs and message-template tokens, seeds the value cache, loads persisted state, re-derives `ActiveState` from a fresh predicate evaluation, and starts the 5s shelving timer. Compile failures are aggregated into one `InvalidOperationException` so operators see every bad predicate in one startup log line rather than one at a time.
5. A `ScriptedAlarmSource` is created for the event stream, and a `ScriptedAlarmReadable` (`src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/ScriptedAlarmReadable.cs`) is created for OPC UA variable reads on the alarm's active-state node (task #245) — unknown alarm ids return `BadNodeIdUnknown` rather than silently reading `false`.
5. A `ScriptedAlarmSource` is created for the event stream; the v2 `ScriptedAlarmActor` (`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs`) owns the active-state surface for OPC UA variable reads on the alarm's active-state node (task #245) — unknown alarm ids return `BadNodeIdUnknown` rather than silently reading `false`.
Both engine and source are added to `Phase7ComposedSources.Disposables`, which `Phase7Composer` disposes on server shutdown.
@@ -132,5 +132,7 @@ Both engine and source are added to `Phase7ComposedSources.Disposables`, which `
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/AlarmTypes.cs` — `AlarmKind` + the four Part 9 enums
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/MessageTemplate.cs` — `{path}` placeholder resolver
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/IAlarmStateStore.cs` — persistence contract + `InMemoryAlarmStateStore` default
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs` — composition, config-row projection, historian routing
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/ScriptedAlarmReadable.cs` — `IReadable` adapter exposing `ActiveState` to OPC UA variable reads
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs` — composition, config-row projection, historian routing
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Applier.cs` — applies the composed Phase 7 plan into the SDK node manager
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs` — actor wrapper owning the alarm state machine and exposing `ActiveState` for OPC UA variable reads
- `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynScriptedAlarmEvaluator.cs` — production Roslyn predicate evaluator
+65 -41
@@ -1,62 +1,86 @@
# Service Hosting
# Service Hosting (v2)
## Overview
A production OtOpcUa deployment runs **two or three processes**, each
with a distinct runtime and install surface:
A production OtOpcUa deployment runs **one binary per node**, plus the optional Wonderware historian sidecar:
| Process | Project | Runtime | Platform | Responsibility |
|---|---|---|---|---|
| **OtOpcUa Server** | `src/Server/ZB.MOM.WW.OtOpcUa.Server` | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every driver in-process (Modbus, S7, AbCip, AbLegacy, TwinCAT, FOCAS, OPC UA Client, Galaxy via mxaccessgw); exposes `/healthz`. |
| **OtOpcUa Admin** | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` | .NET 10 (ASP.NET Core / Blazor Server) | x64 | Operator UI for Config DB editing + fleet status, SignalR hubs (`FleetStatusHub`, `AlertHub`), Prometheus `/metrics`. |
| **OtOpcUa Wonderware Historian** *(optional)* | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` | .NET Framework 4.8 | x86 (32-bit) | Out-of-process sidecar exposing the Wonderware Historian SDK over a named pipe. Required only when `Historian:Wonderware:Enabled=true` in `appsettings.json`. |
| **OtOpcUa Host** | `src/Server/ZB.MOM.WW.OtOpcUa.Host` | .NET 10 | AnyCPU | Single fused binary. `OTOPCUA_ROLES` env decides what to mount: `admin` (Blazor + auth + control-plane singletons), `driver` (OPC UA endpoint + per-driver actors), or both. |
| **OtOpcUa Wonderware Historian** *(optional)* | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` | .NET Framework 4.8 | x86 (32-bit) | Out-of-process sidecar exposing the Wonderware Historian SDK over a named pipe. Required only when `Historian:Wonderware:Enabled=true`. |
Galaxy access uses a separately-installed **mxaccessgw** running out
of a sibling repo (`c:\Users\dohertj2\Desktop\mxaccessgw\`) — see
`docs/v2/Galaxy.ParityRig.md` for setup. The mxaccessgw owns the
MXAccess COM bitness constraint (its worker is x86 net48); nothing
in the OtOpcUa repo carries that constraint anymore. PR 7.2 retired
the legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared`
projects + the `OtOpcUaGalaxyHost` Windows service.
Galaxy access still uses the separately-installed **mxaccessgw** sidecar (see `docs/v2/Galaxy.ParityRig.md`); the gateway owns the MXAccess COM bitness constraint (its worker is x86 net48). Nothing in the OtOpcUa repo carries that constraint anymore.
## OtOpcUa Server
> **v2 change.** v1's separate `OtOpcUa.Server` + `OtOpcUa.Admin` Windows services merged into a single role-gated `OtOpcUa.Host` binary. Two installers became one (with a `-Roles` parameter). The whole DI graph is composed in `OtOpcUa.Host/Program.cs`; per-role wiring is conditional on the env var.
Hosted via `Microsoft.Extensions.Hosting` with `AddWindowsService`
(decision #30 — replaced TopShelf in v2). The host's `Build()`
returns immediately when launched interactively (e.g. `dotnet run`)
but blocks for SCM signals when running as a Windows service.
## Role gating
In-process drivers are registered at startup in `Program.cs`'s
`DriverFactoryRegistry` block; the `DriverInstance` rows in the
central Config DB select which driver factories materialise into
live `IDriver` instances. See `docs/v2/driver-specs.md` for the
per-driver `DriverConfig` JSON shapes.
`Program.cs` reads `OTOPCUA_ROLES`, parses it with `RoleParser`, and conditionally registers services:
## OtOpcUa Admin
| Role present | Wires |
|---|---|
| `admin` | `AddOtOpcUaAuth`, `AddAdminUI`, `AddSignalR`, `AddOtOpcUaAdminClients`, `MapOtOpcUaAuth`, `MapAdminUI<App>`, `MapOtOpcUaHubs`, `WithOtOpcUaControlPlaneSingletons` (5 admin singletons via `Akka.Hosting`) |
| `driver` | `WithOtOpcUaRuntimeActors` (DriverHostActor + DbHealthProbeActor) — and the OPC UA endpoint on port 4840 |
| Either / both | `AddOtOpcUaConfigDb`, `AddOtOpcUaCluster`, `AddOtOpcUaHealth` (`/health/ready`, `/health/active`, `/healthz`) |
Same hosting model; runs the Blazor Server UI + SignalR hubs.
Reads from the same Config DB the Server writes to.
Single-node dev: `OTOPCUA_ROLES=admin,driver`. Production: typically two admin nodes (HA pair) + N driver nodes.
### Per-role configuration overlays
`Program.cs:33-35` builds a role suffix by joining the parsed roles **alphabetically** with `-` and loads `appsettings.{roleSuffix}.json` as an optional overlay on top of base `appsettings.json`. Three overlays ship in `src/Server/ZB.MOM.WW.OtOpcUa.Host/`:
- `appsettings.admin.json` — admin-only nodes
- `appsettings.driver.json` — driver-only nodes
- `appsettings.admin-driver.json` — fused single-node dev / small deployments
All three carry Serilog log-level overrides + `Security:Ldap:DevStubMode = false`. Loading order is **base `appsettings.json` → role overlay (`appsettings.{role}.json`) → environment overlay (`appsettings.{Environment}.json`)** — later layers win. Overlays are optional; the base file boots a node on its own.
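The overlay naming and loading order described above can be sketched as a pure function. This is an illustration under stated assumptions, not the code at `Program.cs:33-35`: `RoleOverlaySketch` is a hypothetical helper (the real `RoleParser` is not shown here), and the lower-casing, trimming, and de-duplication of role names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RoleOverlaySketch
{
    // Parse a comma-separated OTOPCUA_ROLES value into sorted role names.
    // ASSUMPTION: roles are trimmed, lower-cased, and de-duplicated.
    public static string[] ParseRoles(string? raw) =>
        (raw ?? string.Empty)
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Select(r => r.ToLowerInvariant())
            .Distinct()
            .OrderBy(r => r, StringComparer.Ordinal)
            .ToArray();

    // Config files in load order; later entries override earlier ones.
    public static IReadOnlyList<string> ConfigFiles(string? rawRoles, string environment)
    {
        var files = new List<string> { "appsettings.json" };

        var roles = ParseRoles(rawRoles);
        if (roles.Length > 0)
        {
            // Alphabetical join with '-' gives e.g. "admin-driver".
            files.Add($"appsettings.{string.Join('-', roles)}.json");
        }

        files.Add($"appsettings.{environment}.json");
        return files;
    }
}
```

For example, `ConfigFiles("driver,admin", "Production")` lists `appsettings.json`, then `appsettings.admin-driver.json`, then `appsettings.Production.json`, regardless of the order in which the roles were written in the environment variable.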
## Akka cluster
The host joins an Akka.NET cluster bound to the address in `appsettings.json::Cluster`:
```json
{
"Cluster": {
"Hostname": "0.0.0.0",
"Port": 4053,
"PublicHostname": "node-a.lan",
"SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
"Roles": ["admin", "driver"]
}
}
```
- `WithOtOpcUaClusterBootstrap` (in `OtOpcUa.Cluster`) loads the embedded HOCON (split-brain resolver, pinned dispatcher, failure detector tuning) and overlays remote endpoint + cluster options.
- All cluster singletons + per-node actors live on this single ActorSystem — there is no second Akka instance.
See [Redundancy.md](Redundancy.md) for the role-leader + ServiceLevel story.
## Health endpoints
Both admin and driver nodes expose:
| Path | Status meaning |
|---|---|
| `/healthz` | Process alive. |
| `/health/ready` | ConfigDb reachable + cluster member state is `Up`. |
| `/health/active` | Admin-role leader (the node Traefik or an HA LB should route traffic to). |
Used by Traefik for the active-leader-only routing pattern (see [Task 63 traefik docs](v2/Architecture-v2.md) — TODO).
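The three probes in the v2 health-endpoint table can be summarised as a mapping from node state to an HTTP status. The sketch below is illustrative only: `HealthInputsSketch` and `HealthProbeSketch` are hypothetical names, and the 200/503 status codes are an assumption (a common convention for load-balancer health checks), not something this document specifies.

```csharp
using System;

// Hypothetical snapshot of the state each probe depends on.
public sealed record HealthInputsSketch(
    bool ConfigDbReachable,
    bool MemberIsUp,
    bool IsAdminRoleLeader);

public static class HealthProbeSketch
{
    // ASSUMPTION: 200 means the probe passes, 503 means it fails.
    public static int StatusFor(string path, HealthInputsSketch h) => path switch
    {
        // Liveness: answering at all means the process is alive.
        "/healthz" => 200,
        // Readiness: ConfigDb reachable and cluster member state is Up.
        "/health/ready" => h.ConfigDbReachable && h.MemberIsUp ? 200 : 503,
        // Active: only the admin-role leader reports success.
        "/health/active" => h.IsAdminRoleLeader ? 200 : 503,
        _ => 404,
    };
}
```

With a mapping like this, a reverse proxy that health-checks `/health/active` would route to exactly one admin node at a time, while `/health/ready` could pass on every joined node.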
## OtOpcUa Wonderware Historian (optional)
When `Historian:Wonderware:Enabled=true`, the Server speaks to a
sidecar that wraps the Wonderware Historian SDK (which is .NET
Framework only). The pipe IPC contract is in
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/Contracts/`
and the sidecar's pipe handler lives at
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Pipe/`.
Install via the `-InstallWonderwareHistorian` switch on
`scripts/install/Install-Services.ps1`.
Unchanged from v1. Pipe IPC contract lives in `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/Contracts/`; sidecar pipe handler in `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Pipe/`. Install via `scripts/install/Install-Services.ps1 -InstallWonderwareHistorian`.
## Install / Uninstall
- `scripts/install/Install-Services.ps1` — installs `OtOpcUa` and
optionally `OtOpcUaWonderwareHistorian`.
- `scripts/install/Uninstall-Services.ps1` — stops + removes both,
plus `OtOpcUaGalaxyHost` if a pre-7.2 rig still carries it.
- `scripts/install/Install-Services.ps1 -Roles admin,driver` — installs `OtOpcUaHost`. v2 rewrite tracked as plan Task 62.
- `scripts/install/Uninstall-Services.ps1` — stops + removes the host service (and the historian sidecar if installed).
## Logging
Serilog with rolling-daily file sinks. Each service writes to
`%ProgramData%\OtOpcUa\<service>-*.log` plus stdout (NSSM-friendly).
Serilog with rolling-daily file sinks. Each host writes to `logs/otopcua-*.log` plus stdout (NSSM/systemd-friendly). Per-environment log level overrides go in `appsettings.{Environment}.json`.
## Depth reference
For the full host-architecture rationale (why fused vs. split, role-gating tradeoffs, multi-node deployment shapes), see `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §3-4.
+8 -8
@@ -107,13 +107,12 @@ Per [ADR-002](v2/implementation/adr-002-driver-vs-virtual-dispatch.md) Option B,
`ITagUpstreamSource` and `IHistoryWriter` are the two ports the engine requires from its host. Both live in `Core.VirtualTags`. In the Server process:
- **`CachedTagUpstreamSource`** (`src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/CachedTagUpstreamSource.cs`) implements the interface (and the parallel `Core.ScriptedAlarms.ITagUpstreamSource` — identical shape, distinct namespace). A `ConcurrentDictionary<path, DataValueSnapshot>` cache. `Push(path, snapshot)` updates the cache and fans out synchronously to every observer. Reads of never-pushed paths return `BadNodeIdUnknown` quality (`UpstreamNotConfigured = 0x80340000`).
- **`DriverSubscriptionBridge`** (`src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/DriverSubscriptionBridge.cs`) feeds the cache. For each registered `ISubscribable` driver it batches a single `SubscribeAsync` for every fullRef the script graph references, installs an `OnDataChange` handler that translates driver-opaque fullRefs back to UNS paths via a reverse map, and pushes each delta into `CachedTagUpstreamSource`. Unsubscribes on dispose. The bridge suppresses `OTOPCUA0001` (the Roslyn analyzer that requires `ISubscribable` callers to go through `CapabilityInvoker`) on the documented basis that this is a lifecycle wiring, not per-evaluation hot path.
- **Upstream-tag feed.** In v2 the upstream-tag feed is provided by the actor system. `DependencyMuxActor` (`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/VirtualTags/DependencyMuxActor.cs`) multiplexes driver `ISubscribable` subscriptions for every fullRef the script graph references, translating driver-opaque fullRefs back to UNS paths via a reverse map. Deltas land on `VirtualTagActor` (`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/VirtualTags/VirtualTagActor.cs`) as `DependencyValueChanged` messages; the actor's in-memory cache serves the engine's synchronous `GetTag` reads. Reads of never-pushed paths return `BadNodeIdUnknown` quality (`UpstreamNotConfigured = 0x80340000`).
- **`IHistoryWriter`** — no production implementation is currently wired for virtual tags; `VirtualTagEngine` gets `NullHistoryWriter` by default from `Phase7EngineComposer`.
## Composition
`Phase7Composer` (`src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs`) is an `IAsyncDisposable` injected into `OpcUaServerService`:
`Phase7Composer` (`src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs`) projects the published generation into a `Phase7Plan` that `Phase7Applier` applies to the running SDK node manager:
1. `PrepareAsync(generationId, ct)` — called after the bootstrap generation loads and before `OpcUaApplicationHost.StartAsync`. Reads the `Script` / `VirtualTag` / `ScriptedAlarm` rows for that generation from the config DB (`OtOpcUaConfigDbContext`). Empty-config fast path returns `Phase7ComposedSources.Empty`.
2. Constructs a `CachedTagUpstreamSource` + hands it to `Phase7EngineComposer.Compose`.
@@ -145,8 +144,9 @@ Definition reload on config publish: `VirtualTagEngine.Load` is re-entrant — a
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/ITagUpstreamSource.cs` — driver-tag read + subscribe port
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/IHistoryWriter.cs` — historize sink port + `NullHistoryWriter`
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs` — `IReadable` + `ISubscribable` adapter
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/CachedTagUpstreamSource.cs` — production `ITagUpstreamSource`
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/DriverSubscriptionBridge.cs` — driver `ISubscribable` → cache feed
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs` — row projection + engine instantiation
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs` — lifecycle host: load rows, compose, wire bridge
- `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs` — `SelectReadable` + `IsWriteAllowedBySource` dispatch kernel
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/VirtualTags/VirtualTagActor.cs` — actor wrapper that owns per-instance state and the synchronous read cache
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/VirtualTags/DependencyMuxActor.cs` — driver `ISubscribable` → actor feed (replaces the v1 `DriverSubscriptionBridge`)
- `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynVirtualTagEvaluator.cs` — production Roslyn evaluator wired into the actor
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs` — row projection + engine instantiation (`Phase7Plan` composer)
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Applier.cs` — applies the composed plan into the SDK node manager
- `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs` — driver-vs-virtual dispatch kernel
+7 -6
@@ -136,9 +136,10 @@ ConditionType events (non-base `BaseEventType`) is not verified.
## Follow-up candidates
The easiest win here is to **wire the client driver tests against this
repo's own server**. The integration test project
`tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/OpcUaServerIntegrationTests.cs`
already stands up a real OPC UA server on a non-default port with a seeded
repo's own server**. The v2 integration test project
`tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/DualEndpointTests.cs`
(the v2 replacement for the retired v1 `OpcUaServerIntegrationTests`) already
stands up a real OPC UA server on a non-default port with a seeded
FakeDriver. An `OpcUaClientLiveLoopbackTests` that connects the client
driver to that server would give:
@@ -165,6 +166,6 @@ Beyond that:
mocked `Session`
- `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs` — ctor +
session-factory seam tests mock through
- `tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/OpcUaServerIntegrationTests.cs` —
the server-side integration harness a future loopback client test could
piggyback on
- `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/DualEndpointTests.cs` —
the v2 dual-endpoint integration harness a future loopback client test could
piggyback on (v1 `OpcUaServerIntegrationTests.cs` retired with the v1 server project)
@@ -0,0 +1,451 @@
# OtOpcUa v2 — Akka.NET + Fused Hosting Alignment with ScadaLink
**Status:** Design approved, ready for implementation planning
**Date:** 2026-05-26
**Branch:** `v2-akka-fuse`
**Sister project reference:** `~/Desktop/scadalink-design` (ScadaLink)
## 1. Motivation
OtOpcUa today runs as three separate processes (`OtOpcUa.Server` OPC UA host, `OtOpcUa.Admin` Blazor Server web UI, optional `OtOpcUaWonderwareHistorian` Framework sidecar) with manual operator-driven warm-redundancy failover. The sister project ScadaLink — owned by the same developer — solved similar problems with a fused single-binary, role-gated hosting model on top of an Akka.NET cluster.
The motivation for this refactor is twofold:
1. **Consistency.** A single developer (the project owner) moves between OtOpcUa and ScadaLink frequently. Sharing patterns — hosting, auth, actor hierarchy, deployment model — reduces cognitive overhead and makes fixes portable.
2. **Real HA improvements.** Upgrade OtOpcUa's manual operator-driven failover to automatic, Akka-cluster-driven failover with Traefik routing for the web UI. Preserve OPC UA dual-endpoint client-side failover semantics (clients connect to both nodes and pick based on `ServiceLevel`), now driven automatically by Akka cluster leadership.
## 2. Architecture overview
**One binary, role-gated.** `OtOpcUa.Host` (Microsoft.NET.Sdk.Web, .NET 10) replaces `OtOpcUa.Server` and `OtOpcUa.Admin`. Same binary on every node. Role configured via `OTOPCUA_ROLES` environment variable.
**Two Akka roles, single cluster:**
- **`admin`** — hosts Blazor web UI + cluster singletons. Singletons pinned via `ClusterSingletonManagerSettings.WithRole("admin")`. Traefik routes `/` to whichever Admin-role node `/health/active` reports as leader.
- **`driver`** — hosts OPC UA endpoint + per-node `DriverHostActor` hierarchy. Every Driver-role node always serves OPC UA; `ServiceLevel` computed by `RedundancyStateActor` is broadcast back to each Driver node and used to publish to the local OPC UA address space.
Roles are additive: `OTOPCUA_ROLES=admin`, `OTOPCUA_ROLES=driver`, or `OTOPCUA_ROLES=admin,driver`. Small deployments run both roles on both nodes; larger deployments separate them.
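The additive role rule can be sketched in a few lines. This is a language-neutral illustration in Python, not the Host's actual startup code: the function names, the lower-casing, and the rejection of unknown roles are assumptions made for the example.

```python
# Hypothetical sketch of parsing OTOPCUA_ROLES; names and strictness are assumptions.
KNOWN_ROLES = {"admin", "driver", "dev"}


def parse_roles(raw):
    """Split a comma-separated role string into a validated, immutable set."""
    roles = {part.strip().lower() for part in raw.split(",") if part.strip()}
    if not roles:
        raise ValueError("OTOPCUA_ROLES must name at least one role")
    unknown = roles - KNOWN_ROLES
    if unknown:
        raise ValueError(f"unknown role(s): {sorted(unknown)}")
    return frozenset(roles)


def hosts_web_ui(roles):
    """Blazor UI and cluster singletons run only on admin-role nodes."""
    return "admin" in roles


def hosts_opc_ua(roles):
    """The OPC UA endpoint and DriverHostActor tree run only on driver-role nodes."""
    return "driver" in roles
```

A node started with `OTOPCUA_ROLES=admin,driver` would satisfy both predicates, which is the small-deployment case described above.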
**Per-role leadership.** `Cluster.Get(system).State.RoleLeader("driver")` drives OPC UA `ServiceLevel`. `RoleLeader("admin")` drives `/health/active` (Traefik routing). These are independent — admin and driver leadership can land on different nodes if separated.
**Cluster membership.** Both seed nodes; keep-oldest split-brain resolver; `down-if-alone = on`; 15s stable-after; 2s heartbeat / 10s threshold. CoordinatedShutdown for graceful singleton handover. Exact ScadaLink tuning.
**OPC UA dual-endpoint preserved.** Driver-role nodes all bind `opc.tcp://0.0.0.0:4840`. Clients still see N endpoints in `ServerUriArray` and fail over via `ServiceLevel`. OPC UA spec compliance unchanged from today.
**Mac dev:** role `admin,driver,dev`. The `dev` role short-circuits Windows-only driver registration (Galaxy, Wonderware) with explicit `[DEV-STUB]` log lines.
## 3. Project & process restructure
Single solution, ScadaLink-style folder layout. Existing OtOpcUa naming convention (`ZB.MOM.WW.OtOpcUa.*`) preserved.
### New entry point & deletions
| Action | Project |
|---|---|
| **New** | `OtOpcUa.Host`: `Microsoft.NET.Sdk.Web`, single Program.cs, role-gated startup, `AddWindowsService` |
| **Delete** | `OtOpcUa.Server` (content migrates) |
| **Delete** | `OtOpcUa.Admin` (UI moves to library) |
### New libraries
| Project | Owns | ScadaLink analog |
|---|---|---|
| `OtOpcUa.Commons` | Entity POCOs, interfaces, message contracts (`Types/`, `Interfaces/`, `Entities/`, `Messages/`) | `ScadaLink.Commons` |
| `OtOpcUa.ConfigDb` | EF Core `DbContext`, repositories, `IAuditService`, migrations, Data Protection key store | `ScadaLink.ConfigurationDatabase` |
| `OtOpcUa.Cluster` | Akka HOCON, `AkkaHostedService`, split-brain resolver config, role-aware membership helpers, `IClusterRoleInfo` | (split out of ScadaLink Host) |
| `OtOpcUa.Security` | LDAP bind, cookie+JWT hybrid, JWT issuance, role mapping, `/auth/login`, `/auth/ping` endpoints | `ScadaLink.Security` |
| `OtOpcUa.ControlPlane` | Cluster singletons: `ConfigPublishCoordinator`, `AdminOperationsActor`, `AuditWriterActor`, `FleetStatusBroadcaster`, `RedundancyStateActor` | `ScadaLink.ManagementService` |
| `OtOpcUa.Runtime` | Per-node actors: `DriverHostActor`, `DriverInstanceActor`, `VirtualTagActor`, `ScriptedAlarmActor`, `OpcUaPublishActor`, `HistorianAdapterActor`, `PeerOpcUaProbeActor`, `DbHealthProbeActor` | `ScadaLink.SiteRuntime` |
| `OtOpcUa.OpcUaServer` | OPC UA app host, address-space build, `Phase7Composer` extraction | (in ScadaLink.SiteRuntime/DCL) |
| `OtOpcUa.AdminUI` | Blazor components, hubs (`FleetStatusHub`, `AlertHub`, `ScriptLogHub`), auth state provider, `MapAdminUI<TApp>()` | `ScadaLink.CentralUI` |
### Unchanged
- Driver projects (`OtOpcUa.Driver.Galaxy`, `.Modbus`, `.S7`, `.AbCip`, `.AbLegacy`, `.TwinCAT`, `.FOCAS`, `.OpcUaClient`) — still implement `IDriver`, now consumed by `DriverInstanceActor` instead of `DriverInstanceBootstrapper`.
- `OtOpcUa.Driver.Historian.Wonderware` — .NET Framework 4.8 sidecar, named-pipe IPC, wrapped by a `HistorianAdapterActor` in `OtOpcUa.Runtime`.
- `mxaccessgw` sibling repo — unchanged; Galaxy driver still talks gRPC to it.
### Tests
- `tests/OtOpcUa.Cluster.Tests` — split-brain, leadership transitions
- `tests/OtOpcUa.ControlPlane.Tests` — singleton actor unit tests via Akka.TestKit
- `tests/OtOpcUa.Runtime.Tests` — per-node actor tests, driver lifecycle
- `tests/OtOpcUa.Security.Tests` — LDAP, cookie+JWT roundtrip
- `tests/OtOpcUa.Host.IntegrationTests` — 2-node in-process cluster, deployment flow, failover, Mac-safe
- `tests/OtOpcUa.OpcUa.IntegrationTests` — real OPCFoundation client against stubbed Host
- `tests/OtOpcUa.E2E.Tests` — full stack with Traefik (nightly CI)
### Deploy
- `deploy/Install-Services.ps1` — installs one Windows Service per node (`OtOpcUaHost`), passes role via env var. Old script replaced.
- `deploy/traefik/` — Windows Traefik config + service registration for the leader-routed `/health/active`.
- `docker-dev/` (new, optional) — 2-node Mac dev compose with stubbed drivers + LDAP + SQL Server + Traefik.
Solution file: `OtOpcUa.slnx` (matches ScadaLink convention; switch from current `.sln`).
## 4. Actor hierarchy
### Per-node tree
Rooted under `OtOpcUa.Runtime`, one tree per Driver-role node:
```
DriverHostActor (per-node coordinator, started by Host)
├─ DriverInstanceActor (per DriverInstance row)
│ └─ children = pooled or per-subscription work
├─ VirtualTagActor (per VirtualTag row)
├─ ScriptedAlarmActor (per ScriptedAlarm row)
├─ OpcUaPublishActor (per-node bridge to OPCFoundation address space)
├─ HistorianAdapterActor (per-node, wraps Wonderware named-pipe sidecar)
├─ PeerOpcUaProbeActor (per-node, tests peer OPC UA stack health)
└─ DbHealthProbeActor (per-node, cached DB health probe)
```
### Cluster singletons
Pinned to `admin` role via `ClusterSingletonManagerSettings.WithRole("admin")`:
| Actor | Owns | Notes |
|---|---|---|
| `ConfigPublishCoordinator` | The deploy protocol. Writes `Deployment` row, broadcasts `DispatchDeployment(deploymentId)` via `DistributedPubSub` to every `DriverHostActor`, tracks apply ACKs per node. | Replaces `ApplyLeaseRegistry`. Resumes after failover by re-reading ConfigDb state — no Akka.Persistence. |
| `AdminOperationsActor` | All mutating admin ops (CRUD on equipment, drivers, scripts, namespaces, ACLs). Wraps each in an audit envelope. | UI calls via `ClusterSingletonProxy` (in-process when UI is on Admin node). |
| `AuditWriterActor` | Receives `AuditEvent` telemetry from any node, batch-inserts into `ConfigAuditLog`. | Idempotent on `EventId`. |
| `FleetStatusBroadcaster` | Aggregates Akka cluster member events + per-node `DriverHostStatus` heartbeats. Publishes diffs to `IHubContext<FleetStatusHub>` and `IHubContext<AlertHub>`. | Push-driven; replaces today's 5s `FleetStatusPoller`. |
| `RedundancyStateActor` | Subscribes to `ClusterEvent.IMemberEvent` + `ClusterEvent.LeaderChanged` + per-node health probes. Computes `ServiceLevel` byte + `ServerUriArray` per Driver node. Publishes to `DistributedPubSub` topic `redundancy-state`. | Source of truth for OPC UA redundancy. Local `OpcUaPublishActor` subscribes and writes to its OPCFoundation stack. |
### Supervision
| Actor | Strategy |
|---|---|
| `DriverHostActor` | `Resume` |
| `DriverInstanceActor` | `Restart` with backoff (1s → 30s, ×1.5, jitter) |
| `VirtualTagActor` | `Restart` with backoff |
| `ScriptedAlarmActor` | `Restart` with backoff; preserve alarm state via `PreRestart` hook |
| `OpcUaPublishActor` | `Resume` |
| `HistorianAdapterActor` | `Restart` with backoff; SQLite store-and-forward buffers during pipe outage |
| All singletons | `Resume`; resumable state in ConfigDb |
| Script execution actors (short-lived) | `Stop` on failure |
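The `Restart` backoff schedule in the table (1s → 30s, ×1.5, jitter) can be made concrete with a small generator. This is a sketch only: the table does not say how much jitter is applied, so the ±20% spread below is an assumption, and the function name is hypothetical.

```python
import random
from itertools import islice


def backoff_delays(initial=1.0, maximum=30.0, factor=1.5, jitter=0.2, rng=None):
    """Yield restart delays in seconds, growing by `factor` and capped at `maximum`.

    `jitter` is the assumed +/- fraction of the current delay (not given in the table).
    """
    rng = rng or random.Random()
    delay = initial
    while True:
        spread = delay * jitter
        # Jitter is applied around the nominal delay; the cap still holds afterwards.
        yield min(maximum, delay + rng.uniform(-spread, spread))
        delay = min(maximum, delay * factor)


# Nominal schedule with jitter switched off: 1.0, 1.5, 2.25, 3.375, ...
nominal = list(islice(backoff_delays(jitter=0.0), 4))
```

With these parameters the nominal delay reaches the 30s cap on the tenth restart and stays there.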
### State machines
- `DriverInstanceActor` — Become/Stash for `Connecting → Connected → Reconnecting → Failed`. Bad-quality publish on disconnect; transparent re-subscribe on reconnect. Write failures returned synchronously via `Ask` from `OpcUaPublishActor`.
- `ConfigPublishCoordinator`: `Idle → Publishing → AwaitingApplyAcks → Sealed`, with timeout-driven escalation if a node fails to ack within `ApplyMaxDuration` (default 10 min).
- `RedundancyStateActor` — recomputes on every membership event, debounced 250ms to coalesce bursts.
### Communication conventions
- **Tell** for hot-path internal traffic (driver values, alarm state changes, publish broadcasts).
- **Ask** only at system boundaries (UI controller → `AdminOperationsActor`, with explicit timeout + cancellation token).
- **DistributedPubSub** for cluster-wide broadcasts (`DispatchDeployment`, `RedundancyStateChanged`, `FleetStatusChanged`).
- Application-level **correlation IDs** on every request/response message.
- Messages live in `OtOpcUa.Commons.Messages.{Drivers,Deploy,Admin,Audit,Redundancy}` — additive-only evolution.
### Singleton persistence
No Akka.Persistence. Each singleton reads its resumable state from `ConfigDb` on `PreStart` (e.g., `ConfigPublishCoordinator` reads the current in-flight `Deployment` row + per-node `NodeDeploymentState`) and writes on every state transition.
### Mac-dev stubs
`DevNode` role short-circuits driver registration. `DriverInstanceActor` for any Galaxy/Wonderware row enters a `Stubbed` Become state that returns deterministic test values. Logged at INFO with `[DEV-STUB] driver={Name} reason=windows-only`.
## 5. Web hosting, auth, and SignalR
### Kestrel startup gated by `admin` role
`Program.cs` builds `WebApplicationBuilder`, registers all services, but only calls `app.MapBlazor<App>()`, `app.MapHub<...>()`, `app.MapStaticAssets()`, and auth endpoints when `admin ∈ roles`. Driver-only nodes still bind Kestrel for `/healthz` on `:4841` and nothing else.
### Authentication — cookie+JWT hybrid
| Layer | Config |
|---|---|
| Cookie scheme | `OtOpcUa.Auth`, HttpOnly, SameSite=Strict, Secure (prod) / SameAsRequest (dev). Sliding 30-min idle timeout. |
| Embedded JWT | HMAC-SHA256, 15-min expiry, claims = `sub`, `roles`, `nodeAcls`. |
| LDAP bind | `LdapAuthService.AuthenticateAsync(user, pw)` at `/auth/login` POST — preserved from current `OtOpcUa.Admin/Security`. |
| Role mapping | `RoleMapper.MapGroupsToRolesAsync()` — LDAP groups → `FleetAdmin` / `ConfigEditor` / `ReadOnly`. Stays as-is. |
| Token issuance | `/auth/token` returns bearer for external clients (CLI, automation). |
| Circuit expiry probe | `/auth/ping` returns 200/401, polled by `CookieAuthenticationStateProvider` to detect expiry from inside a SignalR circuit. |
| Failure mode | LDAP unreachable → new logins fail, active sessions continue. |
### Data Protection keys
`services.AddDataProtection().PersistKeysToDbContext<OtOpcUaConfigDbContext>().SetApplicationName("OtOpcUa")` — keys live in `ConfigDb` so a circuit started on Admin-node A survives if Traefik fails over to Admin-node B mid-session.
### SignalR hubs
Three existing hubs preserved (`/hubs/fleet`, `/hubs/alerts`, `/hubs/script-log`):
- **Today:** `FleetStatusPoller` polls SQL every 5s.
- **New:** `FleetStatusBroadcaster` singleton receives Akka cluster events + per-node telemetry, pushes diffs via `IHubContext<FleetStatusHub>`. No polling.
- `HubTokenService` bearer-token fallback retired — hubs are circuit-local, cookie auth flows through SignalR natively. External hub consumers use the bearer token from `/auth/token` with a `JwtBearer` authentication scheme declaration on the hub.
### UI → backend wiring
- **Reads:** Blazor components inject scoped repositories from DI and read directly from `ConfigDb`. No change from today.
- **Writes / mutating ops:** Components inject `IAdminOperationsClient` — a thin wrapper around `ClusterSingletonProxy` to `AdminOperationsActor`. Mutations are `Ask` with a 10s timeout + correlation ID. Audit envelope built UI-side, completed singleton-side.
- **Driver diagnostics:** Today's `DriverDiagnosticsClient` HTTP round-trip retires. UI components ask `IFleetDiagnosticsClient` which delegates to `ClusterClientReceptionist`-published actor messages.
### Health endpoints
| Endpoint | Returns | Used by |
|---|---|---|
| `/health/ready` | 200 once Akka member is `Up` + ConfigDb reachable + DataProtection key ring loaded | Service supervisor readiness gate |
| `/health/active` | 200 only on the Admin-role leader; 503 elsewhere | Traefik — routes browser traffic to leader |
| `/healthz` (existing) | 200 when Driver-role actor system is up + at least one driver registered (preserved on `:4841`) | Ops probes, OPC UA monitoring tools |
### Traefik
Windows Service (or external box). One route: `host=otopcua.*` → load-balance to `{admin-node-a:9000, admin-node-b:9000}` with `/health/active` health check, sticky sessions disabled (DataProtection key sharing handles continuity).
### appsettings structure
Mirrors ScadaLink's per-component options pattern: `Cluster:`, `Security:`, `ConfigDb:`, `OpcUa:`, `Drivers:`, `Historian:` sections, bound to options classes owned by their respective component projects.
## 6. Edit + Deploy flow (replaces draft/publish generations)
The single most consequential domain change: **drop the draft/publish `ConfigGeneration` lifecycle**. Edits are live; deploy is a snapshot+push, ScadaLink-style.
### Edit model
- `Equipment`, `Driver`, `DriverInstance`, `Namespace`, `UnsItem`, `Script`, `VirtualTag`, `ScriptedAlarm`, `NodeAcl` are edited **directly** via `AdminOperationsActor`. No draft staging, no `ConfigGeneration` lifecycle. Last-write-wins per row (rowversion column for stale-write detection only).
- Live edits do **not** affect running Driver-role nodes — running stacks reflect the *last-deployed* state. The UI shows a "drift" indicator when live ConfigDb state differs from last sealed deployment.
- Validation runs on edit (semantic checks: driver tag-path validity, script syntax, namespace name uniqueness) — pulled forward from deploy-time to edit-time.
### Deploy model
```
Admin UI "Deploy" → AdminOperationsActor.Ask(StartDeployment)

AdminOperationsActor:
  → snapshot ConfigDb current state
  → ConfigComposer.Flatten() → DeploymentArtifact
  → compute RevisionHash = SHA256(canonical-serialized artifact)
  → write Deployment row (DeploymentId GUID, RevisionHash, CreatedBy, CreatedAtUtc, Status=Dispatching)
  → Ask ConfigPublishCoordinator.DispatchDeployment(deploymentId)

ConfigPublishCoordinator (cluster singleton, admin role):
  → write Deployment.Status = Dispatching
  → DistributedPubSub Publish to "deployments" topic: DispatchDeployment(deploymentId, revisionHash)
  → schedule ApplyDeadline timer (ApplyMaxDuration, default 10 min)

DriverHostActor (per node, subscribed to "deployments"):
  receive DispatchDeployment(deploymentId, revisionHash):
    → if currentDeploymentRevision == revisionHash → ack Applied (idempotent)
    → else:
      → acquire per-node ApplyLock (Become Applying(deploymentId))
      → write NodeDeploymentState row (NodeId, DeploymentId, StartedAtUtc)
      → fetch artifact: read DeploymentArtifact blob from ConfigDb by deploymentId
      → diff against current applied artifact → per-instance ApplyDelta plans
      → dispatch ApplyDelta to DriverInstanceActor / VirtualTagActor / ScriptedAlarmActor children
      → collect per-instance acks (all-or-nothing per node)
      → on full success: write GenerationSealedCache (LiteDb local), update NodeDeploymentState.AppliedAtUtc
      → on any instance Failure: rollback to previous deployment, mark NodeDeploymentState=Failed
      → Tell Coordinator: ApplyAck(deploymentId, nodeId, Applied | Failed(reason))
      → Become Steady

ConfigPublishCoordinator: collect ApplyAcks
  → all Driver nodes Applied → Deployment.Status = Sealed → DistributedPubSub Publish DeploymentSealed
  → any Failed → Deployment.Status = PartiallyFailed → broadcast DeploymentFailed
  → deadline elapsed before all acks → Deployment.Status = TimedOut → broadcast DeploymentTimedOut
```
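The Coordinator's final ack-collection step amounts to a small decision function. The Python sketch below is illustrative only (the real Coordinator would be a C# Akka.NET actor). It assumes that a `Failed` ack takes precedence over a timeout, and that the status stays `Dispatching` while acks are still outstanding; the flow above does not state either point explicitly.

```python
def resolve_deployment_status(expected_nodes, acks, deadline_elapsed):
    """Map collected ApplyAcks to a Deployment.Status value.

    expected_nodes: iterable of Driver node ids that must ack.
    acks: dict of node id -> "Applied" or "Failed" (missing key = no ack yet).
    deadline_elapsed: True once ApplyMaxDuration has passed.
    """
    nodes = list(expected_nodes)
    if any(acks.get(node) == "Failed" for node in nodes):
        return "PartiallyFailed"  # assumed to win over a timeout
    if all(acks.get(node) == "Applied" for node in nodes):
        return "Sealed"
    if deadline_elapsed:
        return "TimedOut"
    return "Dispatching"  # assumed: status unchanged while acks are outstanding
```

For example, with nodes `a` and `b`, a single `Applied` ack from `a` leaves the deployment in `Dispatching` until either `b` acks or the deadline elapses.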
### Per-instance operation lock
All mutating commands (deploy, disable, enable, delete) on a `DriverInstance` go through `DriverInstanceActor`, which serializes them via the actor mailbox — single-threaded by construction.
### Idempotency
- `DeploymentId` + `RevisionHash` together identify a deployment.
- `DriverHostActor` seeing a `DispatchDeployment` whose `RevisionHash` matches current applied state → immediate ack `Applied`, no work. Safe to redeliver.
- `Phase7Composer.ComposeAsync(artifact)` is pure; same artifact → same delta plan.
- `DriverInstanceActor.ApplyDelta(plan)` compares against current state, applies only diffs.
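The idempotency check depends on `RevisionHash` being stable for equal artifacts. A minimal sketch follows, assuming the canonical serialization is key-sorted, whitespace-free JSON. The design only says "canonical-serialized", so the JSON form and both function names are assumptions for illustration.

```python
import hashlib
import json


def revision_hash(artifact):
    """SHA-256 over an assumed canonical form: key-sorted, compact JSON."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_apply(current_revision, incoming_revision):
    """A node does work only when the incoming hash differs from what it has applied."""
    return current_revision != incoming_revision


# Two artifacts with the same content but different key order hash identically.
first = revision_hash({"drivers": [{"id": 1, "kind": "modbus"}], "namespaces": ["plant"]})
second = revision_hash({"namespaces": ["plant"], "drivers": [{"kind": "modbus", "id": 1}]})
```

Because the hash ignores key order, a redelivered `DispatchDeployment` for unchanged content resolves to "no work" regardless of how the artifact was serialized on the way in.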
### Concurrency control
- Last-write-wins on edits (no optimistic concurrency on `Equipment`, `Driver`, `Script`, etc.) — matches ScadaLink template behavior.
- **Optimistic concurrency on `Deployment` and `NodeDeploymentState` rows** (rowversion column) — prevents two concurrent Coordinator instances (during failover) from corrupting state.
### Singleton failover during deploy
1. Old Coordinator wrote `Deployment.Status = Dispatching` + `NodeDeploymentState` rows before broadcast.
2. New Coordinator on takeover queries `Deployment` rows with non-terminal `Status`.
3. For each in-flight deployment, `Ask` every `DriverHostActor` (via cluster-aware actor selection) for current `NodeDeploymentState`.
4. Recompute outstanding-ack set; resume the deadline timer with the remaining time.
5. If apply deadline already passed → mark `Deployment.Status = TimedOut` for any unack'd nodes.
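Steps 4 and 5 reduce to two small computations: which nodes still owe an ack, and how much of the apply window is left. The sketch below assumes the deadline is measured from a stored start timestamp (for example `Deployment.CreatedAtUtc`). The design does not say which timestamp anchors it, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def outstanding_nodes(expected_nodes, node_states):
    """Nodes whose reported NodeDeploymentState is not yet terminal."""
    return {n for n in expected_nodes if node_states.get(n) not in ("Applied", "Failed")}


def remaining_deadline(started_at_utc, now_utc, apply_max_duration=timedelta(minutes=10)):
    """Time left before the apply deadline; zero means it has already passed."""
    remaining = started_at_utc + apply_max_duration - now_utc
    return max(remaining, timedelta(0))


started = datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc)
left = remaining_deadline(started, started + timedelta(minutes=4))
```

A zero result corresponds to step 5: the new Coordinator marks the deployment `TimedOut` instead of rescheduling the timer.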
### Crash recovery on Driver node restart
- `DriverHostActor.PreStart` reads `NodeDeploymentState` for self.
- If row says `Applied` for some `DeploymentId` and matches last sealed cache → Become Steady on that artifact.
- If row says `Applying` (didn't reach Applied) → discard partial state, re-fetch the artifact, replay apply (idempotent).
- If ConfigDb unreachable → fall back to local LiteDb sealed cache, Become `Stale` (drops ServiceLevel via `RedundancyStateActor`). Background reconnect retries every 30s.
### Schema migration from today
| Today | New |
|---|---|
| `ConfigGeneration` (Draft/Published/Sealed lifecycle) | **Dropped** |
| `ClusterNodeGenerationState` | Renamed → `NodeDeploymentState` with `(NodeId, DeploymentId, Status, StartedAtUtc, AppliedAtUtc, RowVersion)` |
| `ClusterNode.RedundancyRole` column | **Dropped** (Akka leader-of-driver-role is source of truth) |
| `ConfigAuditLog` | Kept; deploy events added as new event types |
| (new) `Deployment` | `(DeploymentId, RevisionHash, Status, CreatedBy, CreatedAtUtc, ArtifactBlob varbinary(max), RowVersion)` |
| (new) `ConfigEdit` audit row per Equipment/Driver/Script edit | Live-edit history |
| (new) `DataProtectionKeys` | DataProtection key ring storage |
No more `ApplyLeaseRegistry` table or watchdog actor. Apply state lives in `NodeDeploymentState`; watchdog is a Coordinator-side scheduled message keyed by `DeploymentId`.
### Stale-config fallback
Preserved from today's `GenerationSealedCache`: local LiteDb cache holds last-applied `DeploymentArtifact`. On Host boot with ConfigDb unreachable, `DriverHostActor` boots from cache → Become `Stale` → `RedundancyStateActor` drops `ServiceLevel` for that node.
### Peer probes consolidated
| Today | New |
|---|---|
| `PeerHttpProbeLoop` (HTTP `/healthz`) | Retired — Akka failure detector replaces it |
| `PeerUaProbeLoop` (OPC UA `opc.tcp://peer:4840`) | **Retained** as `PeerOpcUaProbeActor` — tests whether the OPC UA stack itself (not just the process) is up. Feeds `RedundancyStateActor`. |
| `DbHealthCache` (cached DB probe) | Retained as `DbHealthProbeActor` per-node. Feeds `RedundancyStateActor` + `/health/ready`. |
### ServiceLevel computation in `RedundancyStateActor`
```
serviceLevel(node) =
  base 240   if (cluster member Up AND db reachable AND not stale AND opc ua probe ok)
  base 200   if (member Up AND db reachable AND stale)
  base 100   if (member Up AND db unreachable AND stale)
  base   0   if (member Down / Unreachable)
  +10 bonus  if Akka driver-role leader is this node
```
ServiceLevel bands match the existing `RedundancyStatePublisher` so OPC UA client behavior is unchanged from today. The leader-bonus replaces today's operator-managed `RedundancyRole = Primary`.
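The band table translates directly into a pure function, which is also the shape the truth-table tests would exercise. The Python sketch below is an illustration, not the actor's code. Two input combinations are not covered by the table (member Up with DB unreachable but not stale, and member Up with a failing OPC UA probe on an otherwise healthy node), so the sketch raises for them instead of guessing a band. The leader bonus is added literally as written.

```python
def service_level(member_up, db_reachable, stale, opc_probe_ok, is_driver_leader):
    """Compute the OPC UA ServiceLevel byte for one Driver node from the band table."""
    if not member_up:
        base = 0
    elif db_reachable and not stale and opc_probe_ok:
        base = 240
    elif db_reachable and stale:
        base = 200
    elif not db_reachable and stale:
        base = 100
    else:
        # Not specified by the table; surfaced explicitly rather than assumed.
        raise ValueError("combination not covered by the ServiceLevel table")
    bonus = 10 if is_driver_leader else 0
    return min(255, base + bonus)  # ServiceLevel is a single byte
```

A healthy driver-role leader therefore reports 250 and a healthy follower 240, so clients that prefer the higher `ServiceLevel` converge on the leader.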
## 7. Error handling & failure modes
### Akka cluster failure modes
| Scenario | Behavior |
|---|---|
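| Network partition (split-brain) | Keep-oldest resolver keeps the side containing the oldest member and downs the other side once the partition has been stable for 15s (`stable-after`). With `down-if-alone = on`, an oldest node that is isolated on its own downs itself instead. |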
| Admin leader process crash | Failure detector trips after 10s, downs the member, new singleton instance starts on remaining Admin node. Traefik `/health/active` probe fails over within 1 polling interval (~5s). |
| Driver-role node crash | RedundancyStateActor sees member Down → drops that node's ServiceLevel to 0 → OPC UA clients reconnect to surviving node. Both nodes were already running their own copy; no in-cluster recovery needed for that node's work. |
| Both Admin nodes down simultaneously | Web UI unavailable. Driver nodes continue serving OPC UA from last-sealed cache. No new deployments possible until Admin node recovers. |
| All Driver nodes down | OPC UA endpoints unavailable. Clients reconnect when any Driver node returns. ServiceLevel back to 240 once member Up + DB reachable + apply sealed. |
| Singleton handover during deploy | Coordinator state survives in `Deployment` + `NodeDeploymentState` ConfigDb rows. New Coordinator queries DriverHostActors via cluster-aware actor selection. Resume remaining deadline. |
### ConfigDb unavailability
- **At edit time:** AdminUI returns user-visible error. No retries — operator decides.
- **At deploy time:** Coordinator refuses to start dispatch if it can't write the `Deployment` row.
- **At Driver node boot:** Fall back to local LiteDb sealed cache. RedundancyStateActor drops `ServiceLevel`.
- **At singleton failover:** New Coordinator's `PreStart` retries via Polly (5 attempts, exponential backoff). If exhausted → singleton crashes → cluster restarts singleton on next viable Admin node.
### Driver / equipment failures
- Driver connection loss → `DriverInstanceActor` enters `Reconnecting` Become state, publishes bad-quality to OPC UA address space immediately, retries at fixed interval.
- Tag-path-resolution failure → retried periodically.
- Write failure to driver → returned synchronously to caller via `Ask` from `OpcUaPublishActor`.
- Driver process unresponsive (Galaxy gateway down) → `IDriver.HealthCheck` returns degraded → `DriverInstanceActor` reports to `DriverHostActor` → `RedundancyStateActor` factors into ServiceLevel.
### Wonderware historian sidecar
- Named-pipe disconnect → `HistorianAdapterActor` enters `Reconnecting`; alarm history rows buffered to local SQLite store-and-forward.
- Sidecar process crash → no in-cluster recovery (external process); operator restarts via Windows Service control.
### Auth failures
- LDAP unreachable → `/auth/login` returns 503. Active sessions continue with cached claims.
- JWT signature failure (key ring drift) → 401, session terminates. DataProtection keys in ConfigDb prevent this in the happy path.
- Cookie expired (sliding 30-min idle) → `/auth/ping` returns 401 → `CookieAuthenticationStateProvider` triggers UI logout.
### SignalR / circuit drops
- Blazor circuit dropped → `App.razor` reload script reconnects (preserved from today).
- Hub message loss during reconnect → `FleetStatusBroadcaster` resends current state to the reconnecting client on `OnConnectedAsync` (full snapshot, not just diffs).
### OPC UA stack failures
- Address-space corruption → `OpcUaPublishActor` logs ERROR, sends `RebuildAddressSpace` to itself; sequence number bump notifies clients to resubscribe.
- OPC UA listener bind failure (port collision) → Host fails readiness probe, supervisor restarts service.
### Audit invariants
- Audit write failures **never abort** the user-facing action. `AuditWriterActor` buffer overflow → log WARN, drop oldest (with counter metric). The action's success/failure path is authoritative.
- All deploy + edit events carry `ExecutionId` (per-request correlation) so audit rows for one operator action share an ID.
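The drop-oldest overflow rule for `AuditWriterActor` can be sketched with a bounded queue and a drop counter. This is an illustrative Python model. The class name, the capacity, and the `drain` batching method are assumptions, and the real actor would also log at WARN and emit the counter as a metric.

```python
from collections import deque


class AuditBuffer:
    """Bounded in-memory buffer: on overflow the oldest event is dropped and counted."""

    def __init__(self, capacity):
        self._events = deque(maxlen=capacity)
        self.dropped = 0  # would back the "dropped audit events" counter metric

    def add(self, event):
        # A full deque silently evicts its oldest item on append, so count it first.
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
        self._events.append(event)

    def drain(self):
        """Return the buffered events as one batch and clear the buffer."""
        batch = list(self._events)
        self._events.clear()
        return batch
```

`add` never raises on overflow, which matches the invariant that an audit failure must not abort the user-facing action.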
## 8. Testing strategy
Test projects mirror the new layering. Test infrastructure stays Mac-friendly: stubbed Windows-only drivers, ephemeral SQL Server (LocalDB on Windows / `mcr.microsoft.com/mssql/server` container on Mac), `OpenLDAP` container, all spun up via `tests/docker-compose.yml`.
### Layered test pyramid
| Layer | Project | What it covers |
|---|---|---|
| **Unit** | `OtOpcUa.Runtime.Tests` | Per-actor logic via `Akka.TestKit.Xunit2`. `DriverInstanceActor` state-machine transitions, `Phase7Composer` purity, `ScriptedAlarmActor` state machine, `VirtualTagActor` expression eval. Drivers mocked via `IDriver` test doubles. |
| **Unit** | `OtOpcUa.ControlPlane.Tests` | Singleton actor logic. `ConfigPublishCoordinator` happy path + timeout + concurrent ack ordering. `RedundancyStateActor` ServiceLevel computation truth table. `AuditWriterActor` batch flush + idempotency on duplicate `EventId`. |
| **Unit** | `OtOpcUa.Cluster.Tests` | Split-brain resolver config validation, role-aware membership helpers, HOCON parses. |
| **Unit** | `OtOpcUa.Security.Tests` | LDAP role mapping, JWT issuance, cookie+JWT roundtrip, `/auth/ping` expiry semantics. |
| **Integration** | `OtOpcUa.Host.IntegrationTests` | 2-node in-process Akka cluster. Real SQL Server, stubbed drivers. Tests: deploy happy path, deploy timeout, deploy with one node down, singleton failover mid-deploy, ConfigDb outage + stale-config fallback, edit-then-deploy roundtrip, audit row emission. |
| **Integration** | `OtOpcUa.OpcUa.IntegrationTests` | Real OPCFoundation client connects to a running stubbed Host. Asserts: dual endpoint visible, ServerUriArray populated, ServiceLevel reflects leader status, browse + read + write through `OpcUaPublishActor`, write failures returned synchronously. |
| **End-to-end** | `OtOpcUa.E2E.Tests` | Full Host with Traefik in front, two Admin nodes + two Driver nodes (4 processes via Docker). Verifies: web UI login via LDAP, deploy from UI flows to OPC UA stack, kill admin leader → Traefik fails over within 25s, kill driver node → OPC UA clients reconnect with correct ServiceLevel. CI nightly. |
### Failover-specific test cases
1. Kill Admin leader during `Dispatching` phase → new Coordinator resumes, deployment seals.
2. Kill Admin leader during `AwaitingApplyAcks` → new Coordinator queries DriverHostActors, completes ack collection.
3. Kill Driver node during `Applying` → Coordinator marks that node's `NodeDeploymentState=Failed` after deadline; surviving Driver nodes complete their apply.
4. Restart Driver node mid-deploy → on restart, replays apply (idempotent).
5. Akka split-brain (network partition between 2 admin nodes) → keep-oldest picks one surviving side; the other side downs itself once the 15s stable-after window elapses.
6. Both Admin nodes restart simultaneously → deployments in `Dispatching` resume cleanly after cluster reforms.
7. Concurrent edits to the same `DriverInstance` from two UI sessions → last write wins, both audit rows present, no row corruption.
### Deploy idempotency tests
- Replay `DispatchDeployment` with same `DeploymentId/RevisionHash` → no work, ack `Applied`.
- Apply same `DeploymentArtifact` twice in a row → second application is a no-op.
- Crash DriverHostActor mid-apply, restart → resumes from `NodeDeploymentState`, completes idempotently.
### Property tests
- `Phase7Composer.ComposeAsync` is pure: same artifact → same plan, no side effects.
- `RedundancyStateActor` ServiceLevel computation: every combination of (member-state, db-ok, stale, opc-ok, is-leader) produces expected byte.
- Audit envelope generation: every mutating op produces exactly one audit row with stable `ExecutionId` correlation.
### Mac-dev test invariants
- All unit + integration tests run on macOS without Windows-only assemblies.
- Cluster tests use in-process Akka.Remote on 127.0.0.1.
- LDAP tests use `OpenLDAP` container or `Security:Ldap:DevStubMode=true`.
### Retired tests
Anything touching `ConfigGeneration` lifecycle, `ApplyLeaseRegistry`, `PeerHttpProbeLoop`, `FleetStatusPoller`, `RedundancyCoordinator` peer-probe loops, `RedundancyStatePublisher`.
## 9. Risks & open questions
1. **Akka.NET on .NET 10.** Verify Akka.NET 1.5+ targets .NET 10 cleanly.
2. **OPCFoundation SDK threading.** The OPC UA stack runs its own threadpool. `OpcUaPublishActor` must marshal writes via thread-safe wrappers; use a dedicated `synchronized-dispatcher` for actors that touch the OPC UA address space.
3. **Failure detector tuning.** ScadaLink's 2s/10s is tuned for site-to-central RTT. Benchmark before locking. Aggressive tuning + GC pauses → spurious singleton handover.
4. **ServiceLevel = Akka leader removes operator control.** No escape hatch in v1. If a customer needs one later, add a `PinnedPrimary` column to `ClusterNode` and an override path in `RedundancyStateActor`. Out of scope now.
5. **Long-lived v2 branch drift.** Monthly rebase from main, CI runs on v2 from day one.
6. **Schema migration is destructive.** Dropping `ConfigGeneration` + `ClusterNode.RedundancyRole` is one-way. Cutover must run on a quiesced system. Provide a `Migrate-To-V2.ps1` script that backs up ConfigDb, runs EF migrations, validates row counts, prints a summary.
7. **Wonderware + mxaccessgw still external processes.** Both untouched by this refactor. Future actorization would be a second refactor.
8. **Audit row volume.** Edit-heavy install ≈ 5k rows/day. Need monthly partition + 365-day retention same as ScadaLink #23.
## 10. Migration plan
Big-bang on `v2-akka-fuse` branch:
1. Branch `v2-akka-fuse` off `main`.
2. Add new projects: `OtOpcUa.Host`, `.Cluster`, `.Security`, `.ControlPlane`, `.Runtime`, `.ConfigDb`, `.Commons`, `.AdminUI`, `.OpcUaServer`. Convert to `OtOpcUa.slnx`.
3. Move ConfigDb access (EF context, repos, migrations) out of `Server` and `Admin` into `OtOpcUa.ConfigDb`. Add DataProtection key store table.
4. Move LDAP + cookie + JWT out of `Admin/Security` into `OtOpcUa.Security`. Adopt 15-min JWT / 30-min sliding cookie / `/auth/ping`.
5. Build `OtOpcUa.Cluster`: HOCON, `AkkaHostedService`, role-aware membership helpers, split-brain resolver.
6. Build `OtOpcUa.ControlPlane`: `ConfigPublishCoordinator`, `AdminOperationsActor`, `AuditWriterActor`, `FleetStatusBroadcaster`, `RedundancyStateActor`.
7. Build `OtOpcUa.Runtime`: `DriverHostActor`, `DriverInstanceActor`, `VirtualTagActor`, `ScriptedAlarmActor`, `OpcUaPublishActor`, `HistorianAdapterActor`, `PeerOpcUaProbeActor`, `DbHealthProbeActor`.
8. Migrate `Phase7Composer` to `OtOpcUa.OpcUaServer`; make it pure and unit-tested.
9. Move Blazor components from `Admin` into `OtOpcUa.AdminUI` library; replace `DriverDiagnosticsClient` HTTP with in-process actor calls; rewire `FleetStatusHub` / `AlertHub` / `ScriptLogHub` to be fed by `FleetStatusBroadcaster` `IHubContext`.
10. Build `OtOpcUa.Host` `Program.cs`: role-gated startup, health endpoints (`/health/ready`, `/health/active`, `/healthz`), `AddWindowsService`.
11. ConfigDb migration: add `Deployment`, `ConfigEdit`, `DataProtectionKeys` tables; rename `ClusterNodeGenerationState` → `NodeDeploymentState`; drop `ConfigGeneration`; drop `ClusterNode.RedundancyRole`. EF migration + idempotent SQL script + `Migrate-To-V2.ps1`.
12. Delete `OtOpcUa.Server`, `OtOpcUa.Admin`, `DriverInstanceBootstrapper`, `RedundancyCoordinator`, `RedundancyStatePublisher`, `ApplyLeaseRegistry`, `FleetStatusPoller`, `PeerHttpProbeLoop`, `HubTokenService`. Sweep any `*RedundancyRole*` references.
13. Update `deploy/Install-Services.ps1`: single Windows Service per node, role via env var, Traefik service registration.
14. Update docs in `docs/`: rewrite `Redundancy.md`, `ServiceHosting.md`; add `Cluster.md`, `ControlPlane.md`, `Runtime.md`. Add top-level `Architecture-v2.md` summary.
15. CI: add integration test job for the 2-node cluster + OPC UA roundtrip.
16. Tag the last v1 release on `main` for backport-only fixes. Merge `v2-akka-fuse` → `main` when GA.
@@ -0,0 +1,716 @@
# Akka Hosting Alignment — Gap Closeout Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task.
**Goal:** Close the four real/cosmetic gaps identified by the audit of `docs/plans/2026-05-26-akka-hosting-alignment-plan.md` so the v2 implementation matches the plan's literal contract (per-role appsettings overlays, explicit dual-endpoint visibility test, plan-prescribed filenames, removal of empty legacy directories).
**Architecture:** Additive only. No production-runtime semantics change. One small extension to `OpcUaApplicationHost` so the OPC UA server can advertise peer URIs in `Server.ServerArray` — gated on a new option, defaults to old behavior. Everything else is JSON, test code, file moves, and `rm -rf` of stale bin/obj trees.
**Tech Stack:** .NET 10, OPCFoundation .NET Standard SDK (`Opc.Ua.*`), xunit.v3, Shouldly, EF Core 10 (inherited; no schema changes).
**Source plan:** `docs/plans/2026-05-26-akka-hosting-alignment-plan.md`. The audit findings closed by this plan map to Tasks 54, 59, 60, and the post-Task-56 cosmetic cleanup. **Read the source plan's "Conventions for every task" block — those rules still apply here.**
**Branch:** `v2-gap-closeout` off `master`.
---
## Conventions for every task
- **Branch:** Stay on `v2-gap-closeout`. Never commit to `master` while plan is running.
- **Build command:** `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — must be green before commit.
- **Test command:** `dotnet test ZB.MOM.WW.OtOpcUa.slnx --no-build` — relevant new/changed tests must pass.
- **Commit format:** Conventional Commits matching the source plan — `feat(host):`, `test(opcua):`, `chore(cleanup):`, `refactor(test):`, etc.
- **Mac compatibility:** All code must build on macOS. The new dual-endpoint test boots two real OPC UA servers on loopback — works on macOS (no Windows-only APIs needed; PKI is created under a per-test temp dir).
---
## Task 0: Add three role-overlay appsettings files (Task 54 gap)
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 1, Task 5, Task 6
**Files:**
- Create: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.admin.json`
- Create: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.driver.json`
- Create: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.admin-driver.json`
**Background:**
`Program.cs` lines 33-35 load `appsettings.{role-suffix}.json`, where the suffix is the roles joined alphabetically with `'-'`. Today the loader passes `optional: true`, so the host boots without these files, but the source plan (Task 54) called them out as required scaffolding so operators have per-role tunable defaults.
Suffix matrix:
| `OTOPCUA_ROLES` env | Loaded file |
|---|---|
| `admin` | `appsettings.admin.json` |
| `driver` | `appsettings.driver.json` |
| `admin,driver` (any order) | `appsettings.admin-driver.json` (joined alphabetical) |
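The suffix rule in the table can be sketched in shell. This is only an illustration of the join-alphabetically behavior described above, not the actual `Program.cs` code; the `role_suffix` name and the whitespace trimming are assumptions.

```bash
# Hypothetical helper mirroring the suffix rule (not the real loader code).
# Splits a comma-separated role list, trims blanks, sorts alphabetically, joins with '-'.
role_suffix() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
    | LC_ALL=C sort \
    | paste -sd '-' -
}

role_suffix "driver,admin"   # prints: admin-driver
```

Because the roles are sorted before joining, `admin,driver` and `driver,admin` resolve to the same overlay file.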
**Step 1: Create `appsettings.admin.json`**
Admin-only nodes don't bind drivers; tighten Serilog and disable the LDAP dev stub by default.
```json
{
"Serilog": {
"MinimumLevel": {
"Default": "Information",
"Override": {
"Microsoft.AspNetCore": "Warning",
"Akka": "Information"
}
}
},
"Security": {
"Ldap": {
"DevStubMode": false
}
}
}
```
**Step 2: Create `appsettings.driver.json`**
Driver-only nodes have no Admin UI; raise OPC UA verbosity slightly so per-node diagnostics flow to logs.
```json
{
"Serilog": {
"MinimumLevel": {
"Default": "Information",
"Override": {
"Opc.Ua": "Debug",
"Akka": "Information"
}
}
},
"Security": {
"Ldap": {
"DevStubMode": false
}
}
}
```
**Step 3: Create `appsettings.admin-driver.json`**
Combined-role nodes (the docker-dev compose default + the integration test harness) — turn on both surfaces with shared defaults.
```json
{
"Serilog": {
"MinimumLevel": {
"Default": "Information",
"Override": {
"Microsoft.AspNetCore": "Warning",
"Opc.Ua": "Information",
"Akka": "Information"
}
}
},
"Security": {
"Ldap": {
"DevStubMode": false
}
}
}
```
**Step 4: Build green check**
Run: `dotnet build ZB.MOM.WW.OtOpcUa.slnx`
Expected: succeeds. (JSON files do not break the build; this is a smoke check that nothing else regressed.)
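Since the build does not parse these overlays, a malformed file would only surface at host startup. A direct syntax check before committing might look like the sketch below; the `check_json` name is illustrative, and it assumes `python3` is on `PATH` (`json.tool` exits non-zero on malformed JSON).

```bash
# Hypothetical pre-commit check: validate each overlay's JSON syntax directly.
# Assumes python3 is available; nothing here is part of the plan itself.
check_json() {
  local failed=0
  local file
  for file in "$@"; do
    if python3 -m json.tool "$file" > /dev/null 2>&1; then
      echo "ok      $file"
    else
      echo "invalid $file"
      failed=1
    fi
  done
  return "$failed"
}
```

From the repository root, this could be called with the three new overlay paths from Steps 1-3 as arguments.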
**Step 5: Commit**
```bash
git add src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.admin.json \
src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.driver.json \
src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.admin-driver.json
git commit -m "feat(host): add per-role appsettings overlays for admin/driver/admin-driver"
```
---
## Task 1: Extend `OpcUaApplicationHost` with `PeerApplicationUris` + populate `Server.ServerArray`
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 0, Task 5, Task 6
**Files:**
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs` (add option + post-start population)
- Test: `/Users/dohertj2/Desktop/OtOpcUa/tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/OpcUaApplicationHostServerArrayTests.cs`
**Background:**
The source plan's Task 60 promised a test where "real OPCFoundation client → both endpoints visible in ServerUriArray". That requires production code to populate the peer URIs onto each server's `Server.ServerArray` (NodeId i=2254) property. No such code exists in v2 today — this task adds it as an opt-in option so existing single-node tests keep their current behavior. Task 3 then writes the integration test that drives it across two servers.
**Step 1: Write the failing unit test**
Create `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/OpcUaApplicationHostServerArrayTests.cs`:
```csharp
using System.IO;
using System.Net.Sockets;
using System.Net;
using Microsoft.Extensions.Logging.Abstractions;
using Opc.Ua;
using Opc.Ua.Server;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.OpcUaServer;
namespace ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests;
/// <summary>
/// Audit gap closeout — verifies <see cref="OpcUaApplicationHostOptions.PeerApplicationUris"/>
/// is reflected in <c>Server.ServerArray</c> after start. Single-server in-process check; the
/// cross-server visibility check lives in <c>OtOpcUa.OpcUaServer.IntegrationTests</c>.
/// </summary>
public sealed class OpcUaApplicationHostServerArrayTests
{
[Fact]
public async Task ServerArray_contains_local_uri_and_configured_peers_after_start()
{
var pkiRoot = Path.Combine(Path.GetTempPath(), $"otopcua-pki-{Guid.NewGuid():N}");
try
{
var options = new OpcUaApplicationHostOptions
{
ApplicationName = "OtOpcUa.UnitTest",
ApplicationUri = "urn:OtOpcUa.UnitTest.NodeA",
OpcUaPort = AllocateFreePort(),
PublicHostname = "127.0.0.1",
PkiStoreRoot = pkiRoot,
PeerApplicationUris = new[] { "urn:OtOpcUa.UnitTest.NodeB" },
};
var server = new StandardServer();
await using var host = new OpcUaApplicationHost(options, NullLogger<OpcUaApplicationHost>.Instance);
await host.StartAsync(server, CancellationToken.None);
var serverArray = server.CurrentInstance.ServerObject.ServerArray.Value;
serverArray.ShouldNotBeNull();
serverArray.ShouldContain("urn:OtOpcUa.UnitTest.NodeA");
serverArray.ShouldContain("urn:OtOpcUa.UnitTest.NodeB");
}
finally
{
if (Directory.Exists(pkiRoot)) Directory.Delete(pkiRoot, recursive: true);
}
}
private static int AllocateFreePort()
{
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
var port = ((IPEndPoint)listener.LocalEndpoint).Port;
listener.Stop();
return port;
}
}
```
**Step 2: Run the test — confirm it fails**
Run: `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests --filter "FullyQualifiedName~OpcUaApplicationHostServerArrayTests"`
Expected: FAIL with `PeerApplicationUris` not found (compile error) — the option doesn't exist yet.
**Step 3: Add the option**
Edit `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs`. Add to `OpcUaApplicationHostOptions` (after `AutoAcceptUntrustedClientCertificates`, around line 65):
```csharp
/// <summary>
/// Peer server URIs published in <c>Server.ServerArray</c> after start, in addition to
/// the local <see cref="ApplicationUri"/>. Empty by default — set this on warm-redundancy
/// deployments so OPC UA clients can discover the partner endpoint via the standard
/// Server.ServerArray property (NodeId i=2254). Order does not matter; the local URI
/// is always element 0.
/// </summary>
public IList<string> PeerApplicationUris { get; set; } = new List<string>();
```
**Step 4: Populate `Server.ServerArray` after start**
Edit `OpcUaApplicationHost.StartAsync` (around line 100-118). After the `_application.Start(server)` call and before the log line, insert:
```csharp
PopulateServerArray();
```
Then add the private method below `AttachUserAuthenticator`:
```csharp
/// <summary>
/// Writes the union of <see cref="OpcUaApplicationHostOptions.ApplicationUri"/> and
/// <see cref="OpcUaApplicationHostOptions.PeerApplicationUris"/> to the OPC UA standard
/// <c>Server.ServerArray</c> property (NodeId i=2254). Clients in a warm-redundancy
/// deployment discover the partner endpoint by reading this property.
/// </summary>
private void PopulateServerArray()
{
var serverObject = _server?.CurrentInstance?.ServerObject;
if (serverObject is null) return;
var uris = new List<string> { _options.ApplicationUri };
foreach (var peer in _options.PeerApplicationUris)
{
if (!string.IsNullOrWhiteSpace(peer) && !uris.Contains(peer))
uris.Add(peer);
}
serverObject.ServerArray.Value = uris.ToArray();
}
```
**Step 5: Run the test — confirm it passes**
Run: `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests --filter "FullyQualifiedName~OpcUaApplicationHostServerArrayTests"`
Expected: PASS. If `ServerObject.ServerArray.Value` is read-only (some SDK versions guard it), fall back to writing through `ServerArrayNode.Value` via the address-space accessor — but try the direct write first; the SDK exposes it as a settable BaseDataVariableState on `ServerObjectState`.
**Step 6: Run full OpcUaServer.Tests suite to confirm no regression**
Run: `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests`
Expected: all tests pass — `PopulateServerArray` is additive when `PeerApplicationUris` is empty (default), so existing tests don't change behavior.
**Step 7: Commit**
```bash
git add src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs \
tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/OpcUaApplicationHostServerArrayTests.cs
git commit -m "feat(opcua): OpcUaApplicationHost publishes peer URIs in Server.ServerArray"
```
---
## Task 2: Create `OtOpcUa.OpcUaServer.IntegrationTests` project
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 5, Task 6 (file moves elsewhere)
**Depends on:** none (csproj is self-contained)
**Files:**
- Create: `/Users/dohertj2/Desktop/OtOpcUa/tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests.csproj`
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/ZB.MOM.WW.OtOpcUa.slnx` (add the project)
**Background:**
The source plan's Task 60 named this exact project. Audit found ServiceLevel coverage relocated to other test projects but no `OpcUaServer.IntegrationTests` project exists. Creating the project skeleton in its own task keeps Task 3's commit focused on the test code.
**Step 1: Create the csproj**
Mirror the conventions in `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests.csproj`. The integration project needs the `Opc.Ua.Client` package (vs. only `Opc.Ua.Server` in the unit tests) — confirm the version against the existing client CLI's csproj: `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI/ZB.MOM.WW.OtOpcUa.Client.CLI.csproj`.
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
<IsTestProject>true</IsTestProject>
<RootNamespace>ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests</RootNamespace>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="xunit.v3"/>
<PackageReference Include="Shouldly"/>
<PackageReference Include="Microsoft.NET.Test.Sdk"/>
<PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Client"/>
<PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Configuration"/>
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions"/>
<PackageReference Include="xunit.runner.visualstudio">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\..\src\Server\ZB.MOM.WW.OtOpcUa.OpcUaServer\ZB.MOM.WW.OtOpcUa.OpcUaServer.csproj"/>
</ItemGroup>
</Project>
```
If `OPCFoundation.NetStandard.Opc.Ua.Client` isn't in `Directory.Packages.props`, add it there (mirror the existing `OPCFoundation.NetStandard.Opc.Ua.Server` version exactly).
**Step 2: Add project to the solution**
Run: `dotnet sln ZB.MOM.WW.OtOpcUa.slnx add tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests.csproj`
Expected: "Project added to the solution."
**Step 3: Build green check**
Run: `dotnet build ZB.MOM.WW.OtOpcUa.slnx`
Expected: builds. (Empty project, so no test discovery yet — `dotnet test` would say "no tests".)
**Step 4: Commit**
```bash
git add tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/ \
ZB.MOM.WW.OtOpcUa.slnx \
Directory.Packages.props # only if the Opc.Ua.Client version was added there
git commit -m "test(opcua): scaffold OtOpcUa.OpcUaServer.IntegrationTests project"
```
---
## Task 3: `DualEndpointTests` — real OPC UA client reads both URIs from `Server.ServerArray`
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 5, Task 6
**Depends on:** Task 1 (PeerApplicationUris wiring), Task 2 (IT project exists)
**Files:**
- Create: `/Users/dohertj2/Desktop/OtOpcUa/tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/DualEndpointTests.cs`
**Background:**
This is the explicit Task 60 deliverable: a real OPC UA client connects to one server and confirms it can discover the partner via `Server.ServerArray`. Single-server unit-side coverage exists in Task 1; this test exercises the wire path with both servers up.
**Step 1: Write the test**
```csharp
using System.Net;
using System.Net.Sockets;
using Microsoft.Extensions.Logging.Abstractions;
using Opc.Ua;
using Opc.Ua.Client;
using Opc.Ua.Configuration;
using Opc.Ua.Server;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.OpcUaServer;
namespace ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests;
/// <summary>
/// Source plan Task 60 — closes the audit gap. Boots two real <see cref="StandardServer"/>
/// instances on loopback, each configured with the other's <c>ApplicationUri</c> in
/// <see cref="OpcUaApplicationHostOptions.PeerApplicationUris"/>. A real OPC UA client connects
/// to Node A, reads <c>Server.ServerArray</c>, and asserts both URIs are visible — the
/// warm-redundancy discovery contract clients depend on.
/// </summary>
public sealed class DualEndpointTests
{
private const string NodeAUri = "urn:OtOpcUa.DualEndpoint.NodeA";
private const string NodeBUri = "urn:OtOpcUa.DualEndpoint.NodeB";
[Fact]
public async Task Client_reads_both_ApplicationUris_from_NodeA_ServerArray()
{
var pkiRootA = Path.Combine(Path.GetTempPath(), $"otopcua-pki-a-{Guid.NewGuid():N}");
var pkiRootB = Path.Combine(Path.GetTempPath(), $"otopcua-pki-b-{Guid.NewGuid():N}");
var portA = AllocateFreePort();
var portB = AllocateFreePort();
try
{
await using var nodeA = await StartNodeAsync(NodeAUri, portA, pkiRootA, peers: new[] { NodeBUri });
await using var nodeB = await StartNodeAsync(NodeBUri, portB, pkiRootB, peers: new[] { NodeAUri });
var serverArray = await ReadServerArrayAsync($"opc.tcp://127.0.0.1:{portA}/OtOpcUa");
serverArray.ShouldContain(NodeAUri);
serverArray.ShouldContain(NodeBUri);
}
finally
{
if (Directory.Exists(pkiRootA)) Directory.Delete(pkiRootA, recursive: true);
if (Directory.Exists(pkiRootB)) Directory.Delete(pkiRootB, recursive: true);
}
}
private static async Task<OpcUaApplicationHost> StartNodeAsync(
string applicationUri, int port, string pkiRoot, string[] peers)
{
var options = new OpcUaApplicationHostOptions
{
ApplicationName = applicationUri, // unique per node — SDK uses it for cert CN
ApplicationUri = applicationUri,
OpcUaPort = port,
PublicHostname = "127.0.0.1",
PkiStoreRoot = pkiRoot,
EnabledSecurityProfiles = new List<OpcUaSecurityProfile> { OpcUaSecurityProfile.None },
AutoAcceptUntrustedClientCertificates = true,
PeerApplicationUris = peers,
};
var server = new StandardServer();
var host = new OpcUaApplicationHost(options, NullLogger<OpcUaApplicationHost>.Instance);
await host.StartAsync(server, CancellationToken.None);
return host;
}
private static async Task<string[]> ReadServerArrayAsync(string endpointUrl)
{
var appConfig = new ApplicationConfiguration
{
ApplicationName = "OtOpcUa.DualEndpointClient",
ApplicationUri = $"urn:OtOpcUa.DualEndpointClient.{Guid.NewGuid():N}",
ApplicationType = ApplicationType.Client,
SecurityConfiguration = new SecurityConfiguration
{
ApplicationCertificate = new CertificateIdentifier(),
AutoAcceptUntrustedCertificates = true,
},
ClientConfiguration = new ClientConfiguration { DefaultSessionTimeout = 60_000 },
CertificateValidator = new CertificateValidator(),
};
await appConfig.Validate(ApplicationType.Client);
appConfig.CertificateValidator.CertificateValidation += (_, e) => e.Accept = true;
var endpoint = CoreClientUtils.SelectEndpoint(appConfig, endpointUrl, useSecurity: false);
var endpointConfiguration = EndpointConfiguration.Create(appConfig);
var configuredEndpoint = new ConfiguredEndpoint(null, endpoint, endpointConfiguration);
using var session = await Session.Create(
appConfig, configuredEndpoint, updateBeforeConnect: false,
sessionName: "DualEndpointTests", sessionTimeout: 60_000,
identity: new UserIdentity(new AnonymousIdentityToken()),
preferredLocales: null);
var value = session.ReadValue(VariableIds.Server_ServerArray);
return (string[])value.Value;
}
private static int AllocateFreePort()
{
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
var port = ((IPEndPoint)listener.LocalEndpoint).Port;
listener.Stop();
return port;
}
}
```
**Step 2: Run the test**
Run: `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests`
Expected: PASS. Wall-time ~3-5 s (two cert-creation cycles + session handshake).
If the test hangs on the session handshake on first run, the likely cause is the SDK reading the trusted-cert store; confirm that `AutoAcceptUntrustedClientCertificates = true` is set on both server hosts (it already is in `StartNodeAsync` above). If `CoreClientUtils.SelectEndpoint` throws because the SDK version uses a different overload, fall back to constructing the `EndpointDescription` directly with `EndpointUrl = endpointUrl, SecurityMode = MessageSecurityMode.None, SecurityPolicyUri = SecurityPolicies.None` and skipping `SelectEndpoint`.
**Step 3: Commit**
```bash
git add tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/DualEndpointTests.cs
git commit -m "test(opcua): DualEndpointTests — real client reads peer URIs from Server.ServerArray"
```
---
## Task 4: Wire `OtOpcUa.OpcUaServer.IntegrationTests` into v2-ci.yml
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 5, Task 6
**Depends on:** Task 3 (project must exist + have a real test before CI runs it)
**Files:**
- Modify: `/Users/dohertj2/Desktop/OtOpcUa/.github/workflows/v2-ci.yml`
**Step 1: Add the project to the `integration` job**
Either extend the existing `integration` job to run a second `dotnet test` step, or convert it to a matrix. Prefer a matrix for symmetry with `unit-tests`:
Open `.github/workflows/v2-ci.yml`, locate the `integration:` job. Replace it with:
```yaml
integration:
needs: build
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
project:
- tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests
- tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: 10.0.x
- name: dotnet test ${{ matrix.project }}
run: dotnet test ${{ matrix.project }} --configuration Release --filter "Category!=E2E"
```
**Step 2: Build green check**
Run: `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests --configuration Release --filter "Category!=E2E"`
Expected: matches the exact CI command — passes locally so CI will pass too.
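Step 2 exercises only the new project. To mirror the whole `integration` matrix locally, a loop along these lines could work. The `run_integration_matrix` name and the `DOTNET` override (handy for a dry run) are illustrative assumptions; the `dotnet test` arguments are copied from the workflow above.

```bash
# Sketch: run the same command the CI matrix runs, once per project, stopping at the first failure.
run_integration_matrix() {
  local project
  for project in "$@"; do
    # ${DOTNET:-dotnet} lets a dry run substitute `echo` for the real CLI (an assumption, not part of the plan).
    "${DOTNET:-dotnet}" test "$project" --configuration Release --filter "Category!=E2E" || return 1
  done
}
```

Calling it with the two matrix paths (`tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests` and `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests`) covers both entries. Unlike the CI job, which sets `fail-fast: false`, this sketch stops at the first failing project.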
**Step 3: Commit**
```bash
git add .github/workflows/v2-ci.yml
git commit -m "ci(v2): include OpcUaServer.IntegrationTests in integration matrix"
```
---
## Task 5: Rename `FailoverScenarioTests` → `FailoverDuringDeployTests` (Task 59 cosmetic)
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** Task 0, Task 1, Task 2, Task 6 (different files)
**Files:**
- Rename: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/FailoverScenarioTests.cs` → `FailoverDuringDeployTests.cs`
- Modify: class name + namespace-internal references
**Step 1: Rename the file and the class**
```bash
git mv tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/FailoverScenarioTests.cs \
tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/FailoverDuringDeployTests.cs
```
Then edit `FailoverDuringDeployTests.cs` and replace the single class declaration `public sealed class FailoverScenarioTests` with `public sealed class FailoverDuringDeployTests`. Use Edit, not sed: the file declares this class only once (`grep -c "FailoverScenario" tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/FailoverDuringDeployTests.cs` should report ≤ 2).
**Step 2: Sweep for any stale references**
Run: `grep -rln "FailoverScenarioTests" .`
Expected: zero matches after Step 1. If anything appears (e.g., a CI filter, a doc), fix the reference.
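A plain recursive grep from the repository root also walks `bin/`, `obj/`, and `.git/`, where stale build output can still mention the old name. A narrower sweep, as a sketch (the `sweep_refs` name is illustrative; `--exclude-dir` is supported by both GNU grep and the BSD grep shipped with macOS):

```bash
# Sketch: list files that mention an identifier, skipping build output and Git internals.
sweep_refs() {
  # $1 = identifier to look for, $2 = root directory (defaults to the current directory)
  grep -rl --exclude-dir=bin --exclude-dir=obj --exclude-dir=.git -- "$1" "${2:-.}"
}
```

`sweep_refs FailoverScenarioTests` prints nothing and exits non-zero once no source, doc, or CI file references the old class name.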
**Step 3: Build + run test**
Run: `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter "FullyQualifiedName~FailoverDuringDeployTests"`
Expected: same tests pass that previously passed under the old name.
**Step 4: Commit**
```bash
git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/FailoverDuringDeployTests.cs
git commit -m "refactor(test): rename FailoverScenarioTests → FailoverDuringDeployTests for plan parity"
```
---
## Task 6: Delete empty bin/obj-only legacy directories
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** Task 0, Task 1, Task 2, Task 5
**Files:**
- Delete: `src/Server/ZB.MOM.WW.OtOpcUa.Server/`
- Delete: `src/Server/ZB.MOM.WW.OtOpcUa.Admin/`
- Delete: `tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/`
- Delete: `tests/Server/ZB.MOM.WW.OtOpcUa.Admin.Tests/`
- Delete: `tests/Server/ZB.MOM.WW.OtOpcUa.Admin.E2ETests/`
**Background:**
Source plan Task 56 deleted the projects from `ZB.MOM.WW.OtOpcUa.slnx` (confirmed by the audit) but left `bin/`+`obj/` shells on disk. These confuse new contributors and skew directory listings. None of them are referenced anywhere.
**Step 1: Sanity-check that each directory is bin/obj-only**
```bash
for dir in \
src/Server/ZB.MOM.WW.OtOpcUa.Server \
src/Server/ZB.MOM.WW.OtOpcUa.Admin \
tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests \
tests/Server/ZB.MOM.WW.OtOpcUa.Admin.Tests \
tests/Server/ZB.MOM.WW.OtOpcUa.Admin.E2ETests; do
echo "--- $dir ---"
find "$dir" -maxdepth 2 -type f | grep -v "/bin/\|/obj/"
done
```
Expected: every section is empty (no source files leak out). If any source file shows, STOP and surface it — don't delete blindly.
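The loop above stops at `-maxdepth 2` and relies on reading its output by eye. A stricter guard that checks every depth and returns an exit code could be sketched as follows; the `is_build_output_only` name is hypothetical, and treating a missing directory as safe is an assumption.

```bash
# Sketch: succeeds only when a directory holds no regular files outside bin/ and obj/.
is_build_output_only() {
  local leftover
  # A directory that is already gone has nothing to lose.
  [ -d "$1" ] || return 0
  # -prune skips bin/ and obj/ at any depth; any other regular file is printed.
  leftover=$(find "$1" \( -name bin -o -name obj \) -prune -o -type f -print | head -n 1)
  [ -z "$leftover" ]
}
```

Gating the delete on it, for example `is_build_output_only "$dir" && rm -rf "$dir"`, turns the "STOP and surface it" rule into an exit code rather than a visual check.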
**Step 2: Verify slnx doesn't reference them**
Run: `grep -nE 'ZB\.MOM\.WW\.OtOpcUa\.(Server|Admin)(/|\.Tests|\.E2ETests)' ZB.MOM.WW.OtOpcUa.slnx`
Expected: zero matches.
**Step 3: Delete the directories**
```bash
rm -rf src/Server/ZB.MOM.WW.OtOpcUa.Server \
src/Server/ZB.MOM.WW.OtOpcUa.Admin \
tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests \
tests/Server/ZB.MOM.WW.OtOpcUa.Admin.Tests \
tests/Server/ZB.MOM.WW.OtOpcUa.Admin.E2ETests
```
**Step 4: Build green check**
Run: `dotnet build ZB.MOM.WW.OtOpcUa.slnx`
Expected: succeeds (these directories were already out of the solution).
**Step 5: Commit**
```bash
git add -A
git commit -m "chore(cleanup): remove stale bin/obj shells for deleted v1 Server/Admin projects"
```
---
## Task 7: Final build + test green check
**Classification:** trivial
**Estimated implement time:** ~3 min
**Parallelizable with:** none (verification, depends on all prior tasks)
**Step 1: Restore + build**
Run: `dotnet build ZB.MOM.WW.OtOpcUa.slnx`
Expected: 0 errors, 0 warnings (TreatWarningsAsErrors is on across the solution).
**Step 2: Run the full test suite**
Run: `dotnet test ZB.MOM.WW.OtOpcUa.slnx --no-build`
Expected: all tests green. Specifically confirm:
- `OpcUaApplicationHostServerArrayTests` (Task 1) — pass
- `DualEndpointTests` (Task 3) — pass
- `FailoverDuringDeployTests` (Task 5) — same count of tests pass as before the rename
**Step 3: Smoke check the audit assertions**
Run:
```bash
ls src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.*.json
find tests/Server -iname "DualEndpointTests.cs" -o -iname "FailoverDuringDeployTests.cs"
ls -la src/Server/ZB.MOM.WW.OtOpcUa.{Server,Admin} 2>/dev/null
```
Expected:
- 4 appsettings files: `.Development.json`, `.admin.json`, `.admin-driver.json`, `.driver.json` (the `appsettings.*.json` glob does not match the base `appsettings.json`)
- Both renamed/new test files exist
- The `ls -la` call prints nothing: both directories are gone, and `2>/dev/null` hides the resulting errors
**Step 4: No commit unless cleanup turned something up**
If anything failed in Steps 1-3, fix it as a follow-up task — do not paper over with a `--no-verify` commit.
---
## Final verification
After Task 7:
1. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — green
2. `dotnet test ZB.MOM.WW.OtOpcUa.slnx --no-build` — green (incl. 2 new tests)
3. `git log --oneline master..HEAD` — exactly 6 commits, Conventional-Commits style
4. Open PR `v2-gap-closeout` → `master` titled "v2: close audit gaps — appsettings overlays, DualEndpointTests, cleanup"
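The commit-style check in item 3 can be partly automated. The filter below is a sketch: the list of accepted types is an assumption drawn from the prefixes this plan uses, and `non_conventional_subjects` is an illustrative name.

```bash
# Sketch: print every commit subject that does NOT look like "type(scope): message".
non_conventional_subjects() {
  # Reads subjects on stdin. grep exits 1 when it prints nothing;
  # "|| true" keeps a clean branch from looking like a failure.
  grep -Ev '^(feat|fix|test|chore|refactor|ci|docs)(\([a-z0-9-]+\))?: .+' || true
}
```

Piping `git log --format=%s master..HEAD` into it should print nothing on a clean branch, and `git rev-list --count master..HEAD` gives the commit count to compare against the expected number.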
---
## Task index
| # | Title | Class | Time | Parallel with |
|---|---|---|---|---|
| 0 | Per-role appsettings overlays | small | 3m | 1, 5, 6 |
| 1 | OpcUaApplicationHost.PeerApplicationUris + ServerArray | standard | 5m | 0, 5, 6 |
| 2 | OpcUaServer.IntegrationTests project skeleton | small | 4m | 5, 6 |
| 3 | DualEndpointTests | standard | 5m | 5, 6 |
| 4 | CI matrix entry for new IT project | small | 3m | 5, 6 |
| 5 | Rename FailoverScenarioTests → FailoverDuringDeployTests | trivial | 2m | 0, 1, 2, 6 |
| 6 | Delete stale bin/obj-only directories | trivial | 2m | 0, 1, 2, 5 |
| 7 | Final build + test green check | trivial | 3m | none |
**Total estimated subagent time:** ~27 min.
**Dependency graph (non-parallel pairs):**
- Task 3 depends on Task 1 (option must exist) and Task 2 (project must exist)
- Task 4 depends on Task 3 (CI runs the project's tests)
- Task 7 depends on all prior tasks
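For a single executor working serially, the dependency list above can be fed to the POSIX `tsort` utility to get one valid ordering. This is a sketch: the `task_order` name is illustrative, and the edge list simply transcribes the three bullets (each line reads "prerequisite dependent").

```bash
# Sketch: derive one valid serial task order from the dependency bullets.
task_order() {
  tsort <<'EOF'
1 3
2 3
3 4
0 7
1 7
2 7
3 7
4 7
5 7
6 7
EOF
}

task_order
```

The exact order of the independent tasks (0, 5, 6) may differ between `tsort` implementations, but Task 3 always follows Tasks 1 and 2, Task 4 follows Task 3, and Task 7 comes last.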
@@ -0,0 +1,17 @@
{
"planPath": "docs/plans/2026-05-26-akka-hosting-alignment-gaps-closeout.md",
"tasks": [
{"id": 1, "subject": "Task 0: Per-role appsettings overlays", "status": "completed", "commit": "898a477"},
{"id": 2, "subject": "Task 1: OpcUaApplicationHost.PeerApplicationUris + ServerArray population", "status": "completed", "commits": ["70ffd28", "cb936db"]},
{"id": 3, "subject": "Task 2: OpcUaServer.IntegrationTests project skeleton", "status": "completed", "commit": "83eda9e"},
{"id": 4, "subject": "Task 3: DualEndpointTests — real OPC UA client reads both URIs from Server.ServerArray", "status": "completed", "commits": ["dce2528", "a5412c1", "cb936db"], "blockedBy": ["2", "3"]},
{"id": 5, "subject": "Task 4: Wire OpcUaServer.IntegrationTests into v2-ci.yml", "status": "completed", "commit": "e8c4f18", "blockedBy": ["4"]},
{"id": 6, "subject": "Task 5: Rename FailoverScenarioTests → FailoverDuringDeployTests", "status": "completed", "commit": "25ce111"},
{"id": 7, "subject": "Task 6: Delete empty bin/obj-only legacy directories", "status": "completed", "commit": "(no tracked changes — bin/obj only)"},
{"id": 8, "subject": "Task 7: Final build + test green check", "status": "completed", "blockedBy": ["1", "2", "3", "4", "5", "6", "7"]}
],
"lastUpdated": "2026-05-26T00:00:00Z",
"finalReview": "approved",
"branchHead": "e8c4f18",
"branchCommitCount": 8
}
@@ -0,0 +1,101 @@
{
"planPath": "docs/plans/2026-05-26-akka-hosting-alignment-plan.md",
"branch": "v2-akka-fuse",
"designDoc": "docs/plans/2026-05-26-akka-hosting-alignment-design.md",
"lastUpdated": "2026-05-26T00:00:00Z",
"tasks": [
{"id": 0, "subject": "Task 0: Create branch and central package management", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [], "commit": "2b81147"},
{"id": 1, "subject": "Task 1: Create OtOpcUa.Commons project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [2,3,4,5,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 2, "subject": "Task 2: Create OtOpcUa.Cluster project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,3,4,5,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 3, "subject": "Task 3: Create OtOpcUa.Security project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,4,5,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 4, "subject": "Task 4: Create OtOpcUa.ControlPlane project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,3,5,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 5, "subject": "Task 5: Create OtOpcUa.Runtime project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,3,4,6,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 6, "subject": "Task 6: Create OtOpcUa.OpcUaServer project", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,3,4,5,7,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 7, "subject": "Task 7: Create OtOpcUa.AdminUI Razor class library", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [1,2,3,4,5,6,8], "blockedBy": [0], "commit": "30a2104"},
{"id": 8, "subject": "Task 8: Create OtOpcUa.Host Web SDK project", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": [1,2,3,4,5,6,7], "blockedBy": [0], "commit": "30a2104"},
{"id": 9, "subject": "Task 9: Build green smoke check", "status": "completed", "classification": "trivial", "estMinutes": 2, "parallelizableWith": [], "blockedBy": [1,2,3,4,5,6,7,8], "commit": "30a2104"},
{"id": 10, "subject": "Task 10: Add Deployment entity", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [11,12,13], "blockedBy": [9], "commit": "8e2c4f2"},
{"id": 11, "subject": "Task 11: Add NodeDeploymentState entity", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [10,12,13], "blockedBy": [9], "commit": "8e2c4f2"},
{"id": 12, "subject": "Task 12: Add ConfigEdit audit entity", "status": "completed", "classification": "small", "estMinutes": 4, "parallelizableWith": [10,11,13], "blockedBy": [9], "commit": "8e2c4f2"},
{"id": 13, "subject": "Task 13: Add DataProtection keys table", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [10,11,12], "blockedBy": [9], "commit": "8e2c4f2"},
{"id": "14a", "subject": "Task 14a: Add RowVersion to live-edit entities", "status": "completed", "classification": "standard", "estMinutes": 10, "parallelizableWith": [], "blockedBy": [13], "commit": "4bb4ad8"},
{"id": "14b", "subject": "Task 14b: Decouple live-edit entities from ConfigGeneration", "status": "completed", "classification": "high-risk", "estMinutes": 30, "parallelizableWith": [], "blockedBy": ["14a"], "commit": "13d3aea"},
{"id": "14c", "subject": "Task 14c: Obsolete GenerationApplier/Diff/SealedCache", "status": "completed", "classification": "high-risk", "estMinutes": 20, "parallelizableWith": [], "blockedBy": ["14b"], "commit": "1ddf8bb"},
{"id": "14d", "subject": "Task 14d: Drop ClusterNode.RedundancyRole", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": ["14a","14b","14c"], "blockedBy": [13], "commit": "3c915e6"},
{"id": "14e", "subject": "Task 14e: Delete ConfigGeneration + ClusterNodeGenerationState", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": [], "blockedBy": ["14b","14c"], "commit": "e00f46d"},
{"id": "14f", "subject": "Task 14f: V2HostingAlignment EF migration (consolidator)", "status": "completed", "classification": "high-risk", "estMinutes": 15, "parallelizableWith": [], "blockedBy": ["14a","14b","14c","14d","14e"], "commit": "605dbf3"},
{"id": 15, "subject": "Task 15: Migrate-To-V2.ps1 idempotent script", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [16,17,18], "blockedBy": ["14f"], "commit": "c168c1c"},
{"id": 16, "subject": "Task 16: Common types (CorrelationId, ExecutionId, NodeId, ...)", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [17,18], "blockedBy": [9], "commit": "fee4a8c"},
{"id": 17, "subject": "Task 17: Akka message contracts", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [16,18], "blockedBy": [16], "commit": "5d3a5a4"},
{"id": 18, "subject": "Task 18: Common interfaces", "status": "completed", "classification": "small", "estMinutes": 4, "parallelizableWith": [16,17], "blockedBy": [16], "commit": "136234e"},
{"id": 19, "subject": "Task 19: HOCON config", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [20,21,22], "blockedBy": [2], "commit": "3d0f4dc"},
{"id": 20, "subject": "Task 20: AkkaHostedService implementation", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [19,21,22], "blockedBy": [2,18], "commit": "f184f8e"},
{"id": 21, "subject": "Task 21: Role parser from OTOPCUA_ROLES env", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [19,20,22], "blockedBy": [2], "commit": "dfb0636"},
{"id": 22, "subject": "Task 22: ClusterRoleInfo implementation", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [19,20,21], "blockedBy": [18,20], "commit": "c217c49"},
{"id": 23, "subject": "Task 23: Cluster test project + tests", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [19,20,21,22], "commit": "e0b6d56"},
{"id": 24, "subject": "Task 24: Move LdapAuthService into OtOpcUa.Security", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [25], "blockedBy": [3], "commit": "567b8ca"},
{"id": 25, "subject": "Task 25: JwtTokenService", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [24], "blockedBy": [3], "commit": "93316e3"},
{"id": 26, "subject": "Task 26: Cookie+JWT hybrid AddOtOpcUaAuth extension", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [27,28], "blockedBy": [13,24,25], "commit": "207fc6a"},
{"id": 27, "subject": "Task 27: /auth/login, /auth/ping, /auth/token endpoints", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [26,28], "blockedBy": [24,25], "commit": "8be84ba"},
{"id": 28, "subject": "Task 28: CookieAuthenticationStateProvider for Blazor", "status": "completed", "classification": "small", "estMinutes": 4, "parallelizableWith": [26,27], "blockedBy": [25], "commit": "e38f22e"},
{"id": 29, "subject": "Task 29: Security test project + tests", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [24,25,26,27,28], "commit": "38ea0c5"},
{"id": 30, "subject": "Task 30: ConfigPublishCoordinator happy path", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [32,33,34,35], "blockedBy": [4,17,18,10,11], "commit": "62e12da"},
{"id": 31, "subject": "Task 31: Coordinator timeout + failover recovery", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [32,33,34,35], "blockedBy": [30], "commit": "f193872"},
{"id": 32, "subject": "Task 32: AdminOperationsActor + StartDeployment", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [30,31,33,34,35], "blockedBy": [4,17,18,10,12], "commit": "ef683f5"},
{"id": 33, "subject": "Task 33: AuditWriterActor batched idempotent insert", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [30,31,32,34,35], "blockedBy": [4,17], "commit": "23f669c"},
{"id": 34, "subject": "Task 34: FleetStatusBroadcaster", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [30,31,32,33,35], "blockedBy": [4,17], "commit": "dd122c4"},
{"id": 35, "subject": "Task 35: RedundancyStateActor + ServiceLevelCalculator", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [30,31,32,33,34], "blockedBy": [4,17,18], "commit": "6b37f99"},
{"id": 36, "subject": "Task 36: Singleton registration extension (admin role)", "status": "completed", "classification": "standard", "estMinutes": 4, "parallelizableWith": [], "blockedBy": [30,31,32,33,34,35], "commit": "52bf4b3"},
{"id": 37, "subject": "Task 37: DriverHostActor scaffolding + PreStart recovery", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [41,42,43,44], "blockedBy": [5,17,18,11], "commit": "ed13013"},
{"id": 38, "subject": "Task 38: DriverHostActor DispatchDeployment handler", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [41,42,43,44], "blockedBy": [37], "commit": "ed13013"},
{"id": 39, "subject": "Task 39: DriverHostActor stale-config fallback", "status": "completed", "classification": "standard", "estMinutes": 4, "parallelizableWith": [41,42,43,44], "blockedBy": [38], "commit": "ed13013"},
{"id": 40, "subject": "Task 40: Runtime test project bootstrap", "status": "completed", "classification": "small", "estMinutes": 3, "parallelizableWith": [], "blockedBy": [37,38,39], "commit": "ed13013"},
{"id": 41, "subject": "Task 41: DriverInstanceActor state machine", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [42,43,44], "blockedBy": [5,17,40], "commit": "64c627f"},
{"id": 42, "subject": "Task 42: VirtualTagActor", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [41,43,44], "blockedBy": [5,17,40], "commit": "39729bf"},
{"id": 43, "subject": "Task 43: ScriptedAlarmActor", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [41,42,44], "blockedBy": [5,17,40], "commit": "95ef533"},
{"id": 44, "subject": "Task 44: OpcUaPublishActor on synchronized dispatcher", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [41,42,43], "blockedBy": [5,6,17,19,40], "commit": "e115f13"},
{"id": 45, "subject": "Task 45: HistorianAdapter + PeerOpcUaProbe + DbHealthProbe actors", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [37,40], "commit": "28639cb"},
{"id": 46, "subject": "Task 46: Extract OpcUaApplicationHost + Phase7Composer", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [6], "commit": "2877a88"},
{"id": 47, "subject": "Task 47: Phase7Composer purity + property tests", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [48,49,50,51,52], "blockedBy": [46], "commit": "b7c117a"},
{"id": 48, "subject": "Task 48: Move Blazor components into AdminUI library", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [47], "blockedBy": [7], "commit": "1a067e6"},
{"id": 49, "subject": "Task 49: Move SignalR hubs and rewire to FleetStatusBroadcaster", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [50,51,52], "blockedBy": [34,48], "commit": "26d8f2f"},
{"id": 50, "subject": "Task 50: IAdminOperationsClient via ClusterSingletonProxy", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [49,51,52], "blockedBy": [18,32,48], "commit": "f022499"},
{"id": 51, "subject": "Task 51: Replace DriverDiagnosticsClient with IFleetDiagnosticsClient", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [49,50,52], "blockedBy": [18,48], "commit": "b83f099"},
{"id": 52, "subject": "Task 52: Drift indicator + Deploy button UI", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [49,50,51], "blockedBy": [50,48], "commit": "f167808"},
{"id": 53, "subject": "Task 53: Host Program.cs role-gated startup", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [54,55], "blockedBy": [8,15,20,21,22,26,36,40,45,46,48,49], "commit": "e2b357f"},
{"id": 54, "subject": "Task 54: Health endpoints + appsettings layout", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [53,55], "blockedBy": [8,22], "commit": "fa1d685"},
{"id": 55, "subject": "Task 55: Mac dev mode + DEV-STUB drivers", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [53,54], "blockedBy": [41], "commit": "8b4de80"},
{"id": 56, "subject": "Task 56: Delete OtOpcUa.Server + OtOpcUa.Admin projects", "status": "completed", "classification": "high-risk", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [53,54,55], "commit": "76310b8"},
{"id": 57, "subject": "Task 57: Build & test green check", "status": "completed", "classification": "trivial", "estMinutes": 3, "parallelizableWith": [], "blockedBy": [56], "commit": "76310b8"},
{"id": 58, "subject": "Task 58: 2-node integration test harness", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [57], "commit": "d6fac2d", "deviation": "Also consolidated to a single Akka.Hosting ActorSystem — Program.cs ran two competing ActorSystems (custom AkkaHostedService + Akka.Hosting AddAkka). Cluster singletons landed on the bare one. Fixed in this commit; AkkaHostedService.cs deleted. docker-compose.yml (SQL+OpenLDAP for real local runs) deferred — harness uses EF in-memory."},
{"id": 59, "subject": "Task 59: Deploy + failover integration tests", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [60], "blockedBy": [58], "commit": "5cfbe8b", "deviation": "Happy-path + idempotency landed. Failover scenarios (kill-mid-apply, split-brain, restart-during-deploy) deferred as F22 — they need node-down/restart primitives on the harness. Two production bugs fixed in this commit: (1) coordinator missing DPS subscription for ACKs, (2) NodeId collision on shared loopback host."},
{"id": 60, "subject": "Task 60: OPC UA dual-endpoint + ServiceLevel tests", "status": "pending", "classification": "standard", "estMinutes": 5, "parallelizableWith": [59], "blockedBy": [58]},
{"id": 61, "subject": "Task 61: E2E test infrastructure + GitHub Actions CI", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [59,60], "commit": "253fb60", "deviation": "CI workflow files landed but E2E test project (tests/Server/ZB.MOM.WW.OtOpcUa.E2ETests) deferred — it lands when F10/F11/F12 wire enough engine for an end-to-end round-trip to be meaningful. The E2E workflow runs against the docker-dev fleet but its --filter Category=E2E currently matches zero tests."},
{"id": 62, "subject": "Task 62: Rewrite Install-Services.ps1", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [63,64,65], "blockedBy": [53], "commit": "e40615d"},
{"id": 63, "subject": "Task 63: Traefik config + docker-dev compose", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [62,64,65], "blockedBy": [53], "commit": "7e3b56c", "deviation": "Untested on macOS (no local Docker). Compose file should work — exercise + adjust on first run against a real Docker host."},
{"id": 64, "subject": "Task 64: Update existing docs (Redundancy, ServiceHosting, security)", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [62,63,65], "blockedBy": [57], "commit": "3c3fef9", "deviation": "Redundancy.md + ServiceHosting.md full rewrites. security.md v2 banner only — full per-section rewrite waits for F15 (Admin pages migration) since security.md references many pages that will move. README.md platform-overview updated."},
{"id": 65, "subject": "Task 65: New v2 docs (Architecture-v2, Cluster, ControlPlane, Runtime)", "status": "completed", "classification": "standard", "estMinutes": 5, "parallelizableWith": [62,63,64], "blockedBy": [57], "commit": "1689901"},
{"id": "F1", "subject": "Follow-up: AuthEndpoints integration tests against fused Host", "status": "completed", "classification": "small", "estMinutes": 10, "parallelizableWith": ["F2"], "blockedBy": [53], "commit": "463512d", "origin": "Deviation from Task 29 (commit 38ea0c5) — deferred until Task 53 wires AddOtOpcUaAuth/MapOtOpcUaAuth in Program. Add WebApplicationFactory<OtOpcUa.Host.Program> tests for /auth/login (204/401/503), /auth/ping (401/200), /auth/token (200+JWT), /auth/logout (204+cookie clear) using a stub ILdapAuthService.", "deviation": "Used HostBuilder + TestServer directly (Security.Tests/AuthEndpointsIntegrationTests) instead of WebApplicationFactory<Program> — Host needs Akka cluster bootstrap that's out of scope for this contract test. Cluster-mode auth coverage belongs in Task 58."},
{"id": "F2", "subject": "Follow-up: Replace JwtBearer BuildServiceProvider antipattern with IPostConfigureOptions", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": ["F1"], "blockedBy": [], "commit": "45a8c79", "origin": "Deviation from Task 26 (commit 207fc6a) — AddOtOpcUaAuth uses services.BuildServiceProvider().CreateScope() inside .AddJwtBearer lambda (ASP0000). Refactor to IPostConfigureOptions<JwtBearerOptions> so validation parameters resolve lazily from the real request provider."},
{"id": "F3", "subject": "Follow-up: Add EventId unique column to ConfigAuditLog for cross-restart audit idempotency", "status": "completed", "classification": "small", "estMinutes": 15, "parallelizableWith": ["F4"], "blockedBy": [], "commit": "f57f61d", "origin": "Deviation from Task 33 — AuditWriterActor only dedups in-buffer; ConfigAuditLog lacks EventId column so a duplicate AuditEvent that arrives after a flush becomes a duplicate row. Add nullable EventId Guid + filtered unique index, migration, and refactor AuditWriterActor.WrapDetails away."},
{"id": "F4", "subject": "Follow-up: Harden AuditWriterActor.WrapDetails JSON synthesis with System.Text.Json", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": ["F3"], "blockedBy": [], "commit": "f57f61d", "deviation": "Moot — F3 deleted WrapDetails entirely (EventId/CorrelationId now live in dedicated columns).", "origin": "Self-review of Task 33 — WrapDetails uses string concat; malformed caller DetailsJson would produce invalid JSON and trip the CK_ConfigAuditLog_DetailsJson_IsJson constraint, killing the entire flush batch. Discard this task if F3 lands first (F3 removes WrapDetails entirely)."},
{"id": "F5", "subject": "Follow-up: ConfigPublishCoordinator multi-node happy-path test", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "5cfbe8b", "deviation": "Delivered by Task 59 — DeployHappyPathTests.StartDeployment_seals_after_both_nodes_apply exercises the exact 'dispatch to N driver nodes, all ack, seals' flow via the real 2-node TwoNodeClusterHarness rather than a multi-system TestKit. Cleaner because it tests the production code path end-to-end.", "origin": "Self-review of Task 30 — single-ActorSystem TestKit can't simulate the plan's 'dispatch to N driver nodes, all ack, seals' happy path because DiscoverDriverNodes() needs real cluster membership. Add a multi-system test (two ActorSystems joined into one cluster, driver-role on the second)."},
{"id": "F6", "subject": "Follow-up: RedundancyStateActor publisher abstraction so tests don't need DPS bootstrap", "status": "completed", "classification": "small", "estMinutes": 10, "parallelizableWith": [], "blockedBy": [], "commit": "dfc143c", "origin": "Self-review of Task 35 — RedundancyStateActorTests are skipped because single-node DistributedPubSub bootstrap is unreliable in TestKit. Inject an Action<object> broadcast so tests can replace it with a probe; un-skip both tests."},
{"id": "F7", "subject": "Follow-up: DriverInstanceActor full engine wiring (subscriptions, writes, ApplyDelta diff)", "status": "completed", "classification": "standard", "estMinutes": 45, "parallelizableWith": [], "blockedBy": [44], "origin": "Self-review of Task 41 — subscription publishing, ApplyDelta diffing, bad-quality-on-disconnect, write path, and supervisor backoff are stubbed. Wire after OpcUaPublishActor lands.", "shipped": "All three pieces landed: (1) spawn lifecycle in DriverHostActor (DriverSpawnPlanner + IDriverFactory seam) — da14149, (2) ISubscribable wiring + OPC UA status-code → OpcUaQuality severity-bit mapping + DetachSubscription on disconnect/PostStop, (3) IWritable.WriteAsync write path with 5s timeout, status-code bubble-up, and AttributeValuePublished published to parent on every OnDataChange — both shipped in the F7-residual batch. Host DI binding (DriverFactoryBootstrap registers AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT factories) lives in src/Server/ZB.MOM.WW.OtOpcUa.Host/Drivers/."},
{"id": "F8", "subject": "Follow-up: VirtualTagActor engine wiring (compile expression, subscribe deps, publish result)", "status": "partial", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "origin": "Self-review of Task 42 — VirtualTagEngine.Evaluate not called; DependencyValueChanged just buffers.", "shipped": "(1) IVirtualTagEvaluator seam + NullVirtualTagEvaluator default. VirtualTagActor calls evaluator on DependencyValueChanged, dedupes unchanged results, emits EvaluationResult to parent, publishes Warning ScriptLogEntry on failure. (2) DependencyMuxActor in Runtime fans out DriverInstanceActor.AttributeValuePublished from DriverHostActor through to interested VirtualTagActor subscribers. VirtualTagActor takes dependencyRefs + mux ActorRef in Props, registers interest in PreStart, unregisters in PostStop. WithOtOpcUaRuntimeActors spawns the mux + threads it into DriverHostActor. Production binding to Core.VirtualTags.VirtualTagEngine (expression compile + dep extraction) still TODO — split as F8b."},
{"id": "F9", "subject": "Follow-up: ScriptedAlarmActor engine wiring + state persistence", "status": "partial", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "origin": "Self-review of Task 43 — AlarmConditionService not called; PreRestart persistence to ScriptedAlarmState DB not wired; HistorianAdapter rows not emitted.", "shipped": "(1) IScriptedAlarmEvaluator seam + NullScriptedAlarmEvaluator default. ScriptedAlarmActor takes AlarmConfig (id/name/path/severity/predicate), evaluates on DependencyValueChanged, publishes AlarmTransitionEvent + ScriptLogEntry on every transition. (2) IAlarmActorStateStore seam in Commons.Engines + NullAlarmActorStateStore default + EfAlarmActorStateStore production adapter over the ScriptedAlarmState entity. ScriptedAlarmActor PreStart loads + restores; every Transition fires a fire-and-forget save with lastAckUser. Predicate binding to Core.ScriptedAlarms.ScriptedAlarmEngine still TODO — split as F9b."},
{"id": "F10", "subject": "Follow-up: OpcUaPublishActor SDK integration (address-space writes + ServiceLevel + RebuildAddressSpace)", "status": "partial", "classification": "high-risk", "estMinutes": 60, "parallelizableWith": [], "blockedBy": [47], "origin": "Self-review of Task 44 — SDK calls stubbed; counters only. Wire after Phase 7 OpcUaServer extraction.", "shipped": "(1) IOpcUaAddressSpaceSink + IServiceLevelPublisher seams in Commons.OpcUa with Null* defaults. OpcUaPublishActor routes through the sink, dedupes ServiceLevelChanged, subscribes to redundancy-state DPS topic, maps redundancy snapshot to a coarse ServiceLevel (Primary+leader=240, Primary=200, Secondary=100, Detached=0). (2) OtOpcUaNodeManager (CustomNodeManager2) + OtOpcUaSdkServer (StandardServer subclass) + SdkAddressSpaceSink in OpcUaServer — lazy variable creation on first WriteValue, WriteAlarmState shape, RebuildAddressSpace tear-down. Variable updates propagate via ClearChangeMasks so subscribed OPC UA clients see them. Tests boot a real StandardServer + verify sink writes show up in the manager. Production wiring through OpcUaApplicationHost.StartAsync (default server = OtOpcUaSdkServer) + IServiceLevelPublisher SDK binding + #109 OpcUaPublishActor→Phase7Applier integration are the remaining pieces."},
{"id": "F11", "subject": "Follow-up: HistorianAdapterActor named-pipe IPC + SqliteStoreAndForwardSink wiring", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "6861381", "deviationNotes": "Reshaped HistorianAdapterActor around the existing IAlarmHistorianSink abstraction (alarm-event shape, not the original tag-history-row stub). Defaults to NullAlarmHistorianSink; production deployments wire SqliteStoreAndForwardSink + WonderwareHistorianClient via AddOtOpcUaRuntime overrides. Actor now exposes GetStatus returning HistorianSinkStatus for diagnostics. Named-pipe transport implementation lives unchanged in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/WonderwareHistorianClient.cs — the actor is intentionally just a fire-and-forget bridge.", "origin": "Self-review of Task 45 — stub buffers in-memory; named-pipe + SQLite store-and-forward not wired."},
{"id": "F12", "subject": "Follow-up: PeerOpcUaProbeActor real opc.tcp ping (replace Ok=true stub)", "status": "completed", "classification": "small", "estMinutes": 20, "parallelizableWith": [], "blockedBy": [], "commit": "b06e3ae", "deviation": "TCP-connect probe rather than full OPC UA Hello/Acknowledge handshake. Enough for the redundancy calc; deeper liveness signals can layer on later without changing the actor's contract.", "origin": "Self-review of Task 45 — RunProbe always returns Ok=true; replace with OPC UA Client connect."},
{"id": "F13", "subject": "Follow-up: Full OpcUaApplicationHost extraction (security/alarms/history/observability)", "status": "partial", "classification": "high-risk", "estMinutes": 120, "parallelizableWith": [], "blockedBy": [], "commit": "36c4751-partial", "deviationNotes": "F13a (cert auto-creation) shipped in 36c4751. Remaining: endpoint-security wiring (SecurityProfileResolver into ServerConfiguration.SecurityPolicies), LDAP user-token validator (the OPC UA UserNameToken path; HTTP-layer LDAP auth is separate and already in OtOpcUa.Security), scripted-alarm node manager creation, history backend wiring, observability hooks (OpenTelemetry metrics + traces). These are gated by F10's OpcUaPublishActor SDK integration — until F10 lands, nothing instantiates OpcUaApplicationHost so the missing wiring is dead weight.", "origin": "Self-review of Task 46 — facade only boots ApplicationInstance + StandardServer. Legacy 391-line file pulls Server.Security/Alarms/History/Observability. Pull those into thin OpcUaServer interfaces."},
{"id": "F14", "subject": "Follow-up: Migrate side-effecting Phase7Composer (EquipmentNodeWalker, trace logs, node cache)", "status": "partial", "classification": "standard", "estMinutes": 60, "parallelizableWith": [], "blockedBy": [], "origin": "Self-review of Task 47 — pure version covers the projection. Walker + alarm sink registration + cache mutation stay in legacy until split into Phase7Applier.", "shipped": "Phase7Plan + Phase7Planner.Compute (pure diff over EquipmentNodes/DriverInstancePlans/ScriptedAlarmPlans by stable id, with Added/Removed/Changed lists). Phase7Applier consumes plan + IOpcUaAddressSpaceSink: drives RebuildAddressSpace on Equipment/Alarm topology change, writes inactive AlarmState for removed nodes, catches + logs sink faults. Driver-only changes correctly skip the rebuild (DriverHostActor's spawn-plan in Runtime handles those). Walker integration with the real SDK NodeManager is the remaining piece — split as F14b (consumes the existing EquipmentNodeWalker once F10b lands an SDK builder)."},
{"id": "F15", "subject": "Follow-up: Migrate 47 legacy Admin Blazor components into AdminUI library", "status": "completed", "classification": "high-risk", "estMinutes": 180, "commit": "Phase A-D (read views) + F15.2 batches 1-4 (live-edit CRUD) + F15.3 (live alerts/script-log/CSV import/Monaco)", "deviationNotes": "All 4 phases of read-only views shipped: Phase A (shell/auth/fleet/hosts), B (cluster CRUD + Overview/Redundancy), C (Equipment/UNS/Namespaces/Drivers/Tags/ACLs), D (Audit/VirtualTags/ScriptedAlarms/Scripts/RoleGrants/Certificates/Reservations/AlarmsHistorian). Per Q1Q5 of docs/v2/AdminUI-rebuild-plan.md: typed driver editors deferred, top-level VirtualTags/ScriptedAlarms kept (Q2 reversed for cross-cluster discoverability), routes-not-tabs adopted, fleet-wide LDAP→role map only, generic login errors. Live-edit forms (F15.2) and ScriptLog page (depends on F16 ScriptLogHub) are explicit follow-ups.", "parallelizableWith": [], "blockedBy": [], "origin": "Self-review of Task 48 — only MapAdminUI scaffold + 1 new page (Deployments). 47 pages stay in legacy Admin (accepted-broken) until Task 56."},
{"id": "F16", "subject": "Follow-up: Bridge FleetStatusBroadcaster → SignalR hubs (FleetStatusHub / AlertHub / ScriptLogHub)", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "f18c285", "deviation": "FleetStatusHub bridge landed. AlertHub + ScriptLogHub deferred — they need upstream message contracts that aren't defined yet (alerts emerge from F9 ScriptedAlarmActor, script logs from F8 VirtualTagActor).", "origin": "Self-review of Task 49 — hubs are passive Hub subclasses; the bridge from FleetStatusBroadcaster.broadcast → IHubContext is not wired."},
{"id": "F17", "subject": "Follow-up: FleetDiagnosticsClient real Akka ActorSelection round-trip (GetDiagnosticsRequest)", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "8f32b89", "origin": "Self-review of Task 51 — client returns an empty snapshot stub. Add GetDiagnosticsRequest contract + DriverHostActor handler + real Ask/Reply."},
{"id": "F18", "subject": "Follow-up: Thread HttpContext.User.Identity.Name into Deployments page (createdBy)", "status": "completed", "classification": "small", "estMinutes": 5, "parallelizableWith": [], "blockedBy": [], "commit": "b266f63", "origin": "Self-review of Task 52 — Deployments.razor hardcodes createdBy=\"(current user)\"; needs @inject AuthenticationStateProvider."},
{"id": "F19", "subject": "Follow-up: RuntimeStartup extension for driver-role node-actor spawn", "status": "completed", "classification": "standard", "estMinutes": 20, "parallelizableWith": [], "blockedBy": [], "commit": "09d6676", "origin": "Self-review of Task 53 — only admin-role singletons are wired via WithOtOpcUaControlPlaneSingletons. Driver-role nodes need a parallel WithOtOpcUaRuntimeActors that spawns DriverHostActor."},
{"id": "F20", "subject": "Follow-up: Wire DriverInstanceActor.ShouldStub() into DriverHostActor child spawn", "status": "completed", "classification": "small", "estMinutes": 10, "parallelizableWith": ["F7"], "blockedBy": [], "origin": "Self-review of Task 55 — ShouldStub helper exists but nothing calls it. Folds into F7 when DriverHostActor learns to spawn DriverInstanceActor children.", "shipped": "DriverHostActor.SpawnChild now calls DriverInstanceActor.ShouldStub(type, _localRoles) and routes Windows-only driver types to the stub path on non-Windows / dev-role hosts. Verified by DriverHostActorReconcileTests.Galaxy_on_non_windows_is_stubbed_by_ShouldStub_check."},
{"id": "F21", "subject": "Follow-up: docker-compose.yml for Host.IntegrationTests (real SQL Server + OpenLDAP)", "status": "completed", "classification": "standard", "estMinutes": 30, "parallelizableWith": [], "blockedBy": [], "commit": "b0a2bb0", "deviationNotes": "Stack shipped (SQL on 14331, OpenLDAP on 3894). HarnessMode reads OTOPCUA_HARNESS_USE_SQL=1 / USE_LDAP=1 from env; SQL mode uses per-harness unique DB via EnsureCreated. Compose itself not local-validated — DESKTOP-6JL3KKO has no Docker per CLAUDE.md; CI on Linux will exercise the real path. The xunit test-trait split was punted — env vars are simpler and cover the same use case (one suite, two modes, no test-class duplication).", "origin": "Deviation from Task 58 — TwoNodeClusterHarness uses EF InMemoryDatabase + StubLdapAuthService. For Mac-friendly local runs against real SQL constraints + LDAP, add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/docker-compose.yml (SQL Server + OpenLDAP), wire EF SqlServer provider behind an env var (OTOPCUA_HARNESS_USE_SQL=1), and add a test trait so CI can run both modes."},
{"id": "F22", "subject": "Follow-up: failover scenario integration tests (kill-mid-apply, split-brain, restart-during-deploy)", "status": "completed", "classification": "standard", "estMinutes": 60, "parallelizableWith": [], "blockedBy": [], "commit": "cd5540c", "deviationNotes": "Shipped 3 scenarios on the existing 2-node harness: stop-shrinks, restart-rejoins-same-port, deploy-with-one-node-down. Split-brain via simulated partition deferred — Akka.Hosting + xunit don't expose transport-level interference cleanly. The graceful-shutdown + rejoin path is what production actually exercises; ungraceful kill-mid-apply non-deterministic under SBR's 15s stable-after.", "origin": "Deviation from Task 59 — happy-path + idempotency landed but design §8 cases 3-7 need controlled node-down primitives on TwoNodeClusterHarness (StopNodeAsync, RestartNodeAsync, PartitionBetweenAsync). Add those + 5 scenario tests."}
]
}
@@ -0,0 +1,308 @@
# AdminUI — Driver-Specific Pages
**Status:** Design approved, ready for implementation planning
**Date:** 2026-05-28
**Branch:** `master` (work to land on a feature branch)
## 1. Motivation
Today the AdminUI has a single generic `DriverEdit.razor` page (`src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/DriverEdit.razor`, 323 lines) that edits every driver type via a raw JSON `DriverConfig` textarea. The page itself flags this as temporary:
> Per Q1 of the AdminUI rebuild plan, typed driver editors (Modbus, FOCAS) are deferred… lands in a Phase C.2 follow-up.
This design is that follow-up. Goals:
1. Replace the JSON blob with a **typed form per driver type** that exposes every supported configuration option.
2. Add three driver-aware operator capabilities to each page: **Test Connect**, **live runtime status**, and a **driver-specific tag/address picker**.
3. Add **Reconnect / Restart** controls on the status panel for authorized users.
## 2. Scope
All 9 driver types ship typed pages in this work:
```
ModbusTcp, AbCip, AbLegacy, S7, TwinCat, FOCAS,
OpcUaClient, Galaxy, Historian.Wonderware
```
Each typed page exposes the full surface of its driver's options class — the JSON editor is retired; the typed form is the only way to edit driver config from the AdminUI.
## 3. Architecture
### 3.1 Project layout
```
src/Drivers/
  ZB.MOM.WW.OtOpcUa.Driver.<Type>/              (runtime, unchanged behavior)
  ZB.MOM.WW.OtOpcUa.Driver.<Type>.Contracts/    NEW (POCO options + DataAnnotations only)
    <Type>DriverOptions.cs                      (moved from runtime project)
src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/
  Components/Pages/Clusters/Drivers/            NEW folder
    DriverTypePicker.razor                      route: /clusters/{id}/drivers/new
    DriverEditRouter.razor                      route: /clusters/{id}/drivers/{instanceId}
    ModbusDriverPage.razor                      route: /clusters/{id}/drivers/new/modbus
    GalaxyDriverPage.razor                      route: /clusters/{id}/drivers/new/galaxy
    S7DriverPage.razor
    OpcUaClientDriverPage.razor
    AbCipDriverPage.razor
    AbLegacyDriverPage.razor
    TwinCatDriverPage.razor
    FocasDriverPage.razor
    HistorianWonderwareDriverPage.razor
  Components/Shared/Drivers/                    NEW folder
    DriverFormShell.razor                       panel layout + Save/Cancel/Delete
    DriverIdentitySection.razor                 InstanceId, Name, Namespace, Enabled
    DriverResilienceSection.razor               Polly overrides
    DriverStatusPanel.razor                     live status + Reconnect/Restart
    DriverTestConnectButton.razor               per-driver-timeout probe
    DriverTagPicker.razor                       modal shell, hosts per-driver picker body
  Hubs/
    DriverStatusHub.cs                          NEW SignalR hub at /hubs/driverstatus
    DriverStatusSignalRBridge.cs                NEW (mirrors FleetStatusSignalRBridge)
```
### 3.2 Routing
- `/clusters/{ClusterId}/drivers` — existing `ClusterDrivers.razor` list, unchanged.
- `/clusters/{ClusterId}/drivers/new` — new `DriverTypePicker.razor` (operator picks driver type).
- `/clusters/{ClusterId}/drivers/new/{driverType}` — typed new-form for that type.
- `/clusters/{ClusterId}/drivers/{DriverInstanceId}` routes to `DriverEditRouter.razor`, which reads the row's `DriverType` and dispatches to the right `*DriverPage` via `<DynamicComponent>` (no redirect flicker).
### 3.3 Schema source
Each driver's `Options` class moves to a new `Driver.<Type>.Contracts` csproj — POCO + `System.ComponentModel.DataAnnotations` attributes only, no NuGet references, no project references. The runtime driver project adds a `ProjectReference` to its contracts sibling and re-uses the same type (single source of truth, no `TypeForwardedTo` needed if the namespace is preserved). The AdminUI gains 9 `ProjectReference`s — all pure POCO, so no native deps (Galaxy COM, FOCAS native libs, OPC UA stack) leak into the AdminUI publish output.
Attributes used:
- `[Required]`, `[Range(...)]`, `[RegularExpression(...)]` — render as inputs + `<ValidationMessage>` via `DataAnnotationsValidator`.
- `[Display(Name, Description, GroupName)]` — label, help-text under field, panel section.
- `[DataType(DataType.Password)]` — render as `<InputText type="password">` (e.g. mxaccessgw API key).
The `*DriverPage.razor` files **explicitly** bind each field (no runtime reflection). Attributes drive labels/help/validation but not field discovery — this avoids the "metadata silently drifts from rendering" trap.
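As a rough illustration of how the attributes above drive validation, the sketch below annotates a small options POCO and runs the same `System.ComponentModel.DataAnnotations` checks that `DataAnnotationsValidator` relies on. `SampleModbusOptions` and `OptionsValidation` are hypothetical names invented for this example; the real `ModbusDriverOptions` properties may differ.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Validate one well-formed and one malformed instance, as a form submit would.
foreach (var candidate in new[]
{
    new SampleModbusOptions { Host = "10.100.0.35", Port = 5020 },
    new SampleModbusOptions { Host = "", Port = 70000 },
})
{
    var errors = OptionsValidation.Validate(candidate);
    Console.WriteLine(errors.Count == 0 ? "valid" : string.Join("; ", errors));
}

// Hypothetical options POCO: property names are illustrative only.
public sealed class SampleModbusOptions
{
    [Required]
    [Display(Name = "Host", Description = "PLC host name or IP address.", GroupName = "Connection")]
    public string Host { get; init; } = "";

    [Range(1, 65535)]
    [Display(Name = "Port", Description = "TCP port of the Modbus endpoint.", GroupName = "Connection")]
    public int Port { get; init; } = 502;
}

public static class OptionsValidation
{
    // Evaluates every validation attribute on the instance and returns the messages.
    public static IReadOnlyList<string> Validate(object options)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(
            options, new ValidationContext(options), results, validateAllProperties: true);
        return results.ConvertAll(r => r.ErrorMessage ?? "invalid");
    }
}
```

Because the POCO only depends on the base class library, a contracts project built this way needs no package or project references, which is the property the design relies on.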
### 3.4 Persistence
`DriverInstance.DriverConfig` stays a JSON string column (no schema change). On save: typed form-model serialized via `System.Text.Json` against the driver's Options class. On load: row's JSON deserialized into the matching Options class with `JsonSerializerOptions { UnmappedMemberHandling = Skip }` so old/unknown fields are silently dropped on next save. Version skew is bounded by the fact that drivers ship as one host binary.
`RowVersion` optimistic concurrency unchanged from today's `DriverEdit.razor`.
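The load/save behavior described in this section can be sketched with `System.Text.Json` alone. The example assumes .NET 8 or later (where `JsonSerializerOptions.UnmappedMemberHandling` exists); `SampleOptions`, `DriverConfigJson`, and the `LegacyRetryCount` field are hypothetical stand-ins, not names from the codebase.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// A stored row that still carries a member the current Options class no longer declares.
const string stored = """{"Host":"10.100.0.35","Port":5020,"LegacyRetryCount":3}""";
Console.WriteLine(DriverConfigJson.RoundTrip<SampleOptions>(stored));

// Hypothetical options POCO used only for this illustration.
public sealed class SampleOptions
{
    public string Host { get; init; } = "";
    public int Port { get; init; } = 502;
}

public static class DriverConfigJson
{
    private static readonly JsonSerializerOptions Options = new()
    {
        // Skip is already the System.Text.Json default; set explicitly to document intent.
        UnmappedMemberHandling = JsonUnmappedMemberHandling.Skip,
    };

    // Load (deserialize) then save (serialize): unmapped members do not survive the trip.
    public static string RoundTrip<T>(string json) where T : class
    {
        var model = JsonSerializer.Deserialize<T>(json, Options)
                    ?? throw new JsonException("Config JSON was the literal null.");
        return JsonSerializer.Serialize(model, Options);
    }
}
```

The re-serialized string contains only `Host` and `Port`; the unknown member is dropped without an error, which matches the "silently dropped on next save" behavior.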
## 4. Test Connect
### 4.1 Flow
```
[Browser]                 [AdminUI server]               [Cluster]
GalaxyDriverPage          AdminProbeService              AdminOperationsActor
|                         |                              |
|-- click TestConnect --->|                              |
|                         |-- Ask<TestDriverConnect> --->|
|                         |   (driverType, configJson,   |
|                         |    timeoutSecs)              |
|                         |                              |--> spawn transient probe actor
|                         |                              |    (resolves IDriverProbe by
|                         |                              |     driverType via DI)
|                         |                              |<-- ProbeResult (ok, latencyMs)
|                         |<-- TestDriverConnectResult --|
|<-- green / red chip ----|
```
### 4.2 Components
- **`IDriverProbe`**: interface in `Core.Abstractions` (or equivalent). One implementation per driver type, living in the driver's **runtime** project. Reuses the existing `IHostConnectivityProbe` plumbing where present (FOCAS, TwinCAT confirmed). For drivers without one, the probe is a cheap subset of the real connect path: a TCP connect for Modbus/AbCip/S7, session open+close for OpcUaClient, `MxCommand.Ping` for Galaxy. Probes **never write**.
- **`TestDriverConnect` message** in `Commons/Messages/Admin`, carrying `(string driverType, string configJson, TimeSpan timeout)`. Handler in `AdminOperationsActor`: resolves the right probe via keyed DI (`IServiceProvider.GetRequiredKeyedService<IDriverProbe>(driverType)`), deserializes JSON into the matching Options class, calls `probe.RunAsync(options, ct)`. Returns `TestDriverConnectResult(bool ok, string? message, TimeSpan? latency)`.
- **`AdminProbeService`** (AdminUI side) — thin wrapper around the existing AdminOperationsActor bridge. Caller passes timeout; service enforces a 60s hard backstop.
- **`<DriverTestConnectButton>`** — accepts driver type + `Func<string>` to build form JSON on-click. Renders button + inline result chip (auto-clears after 30s). Disabled while in-flight.
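A connect-only probe of the kind described for the TCP-based drivers might look like the sketch below. It is an assumption-laden illustration: `ProbeResult` and `TcpConnectProbe` are hypothetical names, the result shape merely mirrors `TestDriverConnectResult(ok, message, latency)`, and a real `IDriverProbe` implementation would take the driver's Options class rather than a host and port.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

var result = await TcpConnectProbe.RunAsync("127.0.0.1", 5020, TimeSpan.FromSeconds(5));
Console.WriteLine(result);

// Hypothetical result shape, modeled on TestDriverConnectResult(ok, message, latency).
public sealed record ProbeResult(bool Ok, string? Message, TimeSpan? Latency);

public static class TcpConnectProbe
{
    // Opens and immediately closes a TCP connection; never sends application data.
    public static async Task<ProbeResult> RunAsync(
        string host, int port, TimeSpan timeout, CancellationToken ct = default)
    {
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        timeoutCts.CancelAfter(timeout);
        var clock = Stopwatch.StartNew();
        try
        {
            using var client = new TcpClient();
            await client.ConnectAsync(host, port, timeoutCts.Token);
            return new ProbeResult(true, null, clock.Elapsed);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the probe's own deadline fired; caller cancellation still propagates.
            return new ProbeResult(false, $"Probe timed out after {timeout.TotalSeconds:0}s", null);
        }
        catch (SocketException ex)
        {
            return new ProbeResult(false, ex.SocketErrorCode.ToString(), null);
        }
    }
}
```

Returning a result object instead of throwing keeps the three outcomes (success, timeout, driver-side failure) distinguishable for the UI chip.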
### 4.3 Timeout
Each driver's Options class exposes a `ProbeTimeout` (`TimeSpan` or `int Seconds`) with a driver-appropriate default — e.g. Modbus 5s, OpcUaClient 15s, Galaxy 30s. The button reads from the live form (not the persisted row), so an operator can override the timeout per probe attempt. Server-side max = 60s.
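The interaction between the per-driver timeout and the server-side maximum could be expressed as a small clamp. This is only a sketch: `ProbeTimeouts` is a hypothetical helper, and the one-second lower bound is an assumption added for the example (the design only states the 60s maximum).

```csharp
using System;

Console.WriteLine(ProbeTimeouts.Effective(5));    // requested value is within bounds
Console.WriteLine(ProbeTimeouts.Effective(300));  // capped at the server-side maximum

public static class ProbeTimeouts
{
    // 60s comes from the design; the 1s floor is an assumption for this sketch.
    public static readonly TimeSpan Max = TimeSpan.FromSeconds(60);
    public static readonly TimeSpan Min = TimeSpan.FromSeconds(1);

    // Converts the form's seconds value into the timeout the server will actually honor.
    public static TimeSpan Effective(int requestedSeconds)
    {
        var requested = TimeSpan.FromSeconds(requestedSeconds);
        if (requested < Min) return Min;
        return requested > Max ? Max : requested;
    }
}
```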
### 4.4 Safety
- Probe spawns a transient actor with the *form's* config — the live driver actor (using the *persisted* config) is untouched.
- Probe never mutates the live driver or the database.
- Probe inherits the user context via the existing AdminOperationsActor audit-log entry.
## 5. Live Status Panel
### 5.1 Flow
```
DriverActor               DriverStatusSignalRBridge        DriverStatusPanel (browser)
|                         ^                                |
|-- publishes             |-- subscribed in                |
|   DriverHealthChanged   |   OnInitializedAsync           |
|   to event stream       |   with InstanceId filter       |
|                         |                                |
|                         |-- pushes update -------------->|
                          |                                |
                          |                                |-- renders state chip,
                          |                                |   last-success, error count,
                          |                                |   Reconnect/Restart buttons
```
### 5.2 Reused infrastructure
Driver actors already maintain `DriverHealth(state, lastSuccessUtc, lastError)` — confirmed in FOCAS (`FocasDriver.cs`) and TwinCAT. The bridge mirrors the existing `FleetStatusSignalRBridge` + `AlertSignalRBridge` pattern. SignalR hub uses the same cookie-auth as existing hubs.
### 5.3 New components
- **`DriverStatusHub`** — single method `JoinDriver(string driverInstanceId)`, adds connection to a per-instance group and immediately replies with the current snapshot.
- **`DriverStatusSignalRBridge`** — subscribes to per-cluster driver-health event stream, fans out into SignalR groups keyed by `driverInstanceId`. Only running drivers publish; `Enabled=false` instances render "Disabled — not deployed" without subscribing.
- **`<DriverStatusPanel>`** — props `DriverInstanceId`, `Enabled`. Opens hub on init, calls `JoinDriver`, registers `On<StatusSnapshot>("status", ...)`. Renders state chip (`Healthy` / `Connecting` / `Faulted` / `Unknown`) + last-success timestamp ("2s ago") + error count over last 5min + last error message (collapsed, expandable). Disposes hub on dispose.
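Two of the panel's derived values, the relative last-success label and the error count over the last five minutes, are plain computations that can be tested without Razor. The sketch below is one possible shape; `StatusFormat` and `ErrorWindow` are hypothetical names, and the window assumes errors are recorded in chronological order.

```csharp
using System;
using System.Collections.Generic;

var now = DateTimeOffset.UtcNow;
var window = new ErrorWindow(TimeSpan.FromMinutes(5));
window.Record(now.AddMinutes(-7));   // older than the window, evicted on Count
window.Record(now.AddMinutes(-2));
window.Record(now.AddSeconds(-10));
Console.WriteLine($"errors: {window.Count(now)}, last success {StatusFormat.Ago(now.AddSeconds(-2), now)}");

public static class StatusFormat
{
    // Formats an elapsed interval in the compact style used by the panel ("2s ago").
    public static string Ago(DateTimeOffset then, DateTimeOffset now)
    {
        var elapsed = now - then;
        if (elapsed < TimeSpan.Zero) elapsed = TimeSpan.Zero;   // tolerate clock skew
        if (elapsed.TotalSeconds < 60) return $"{(int)elapsed.TotalSeconds}s ago";
        if (elapsed.TotalMinutes < 60) return $"{(int)elapsed.TotalMinutes}m ago";
        return $"{(int)elapsed.TotalHours}h ago";
    }
}

public sealed class ErrorWindow
{
    private readonly Queue<DateTimeOffset> _errors = new();
    private readonly TimeSpan _span;

    public ErrorWindow(TimeSpan span) => _span = span;

    public void Record(DateTimeOffset at) => _errors.Enqueue(at);

    // Drops entries older than the window, then returns how many remain.
    public int Count(DateTimeOffset now)
    {
        while (_errors.Count > 0 && now - _errors.Peek() > _span) _errors.Dequeue();
        return _errors.Count;
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the helpers, keeps both pieces deterministic in the logic-only unit tests.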
### 5.4 Reconnect / Restart controls
Two buttons on the status panel:
- **Reconnect** — driver actor closes + reopens its transport, keeps actor alive. Fast, idempotent. No confirm dialog.
- **Restart** — full actor stop + respawn, loses in-memory state. Slower, can interrupt active subscriptions. Confirm dialog required.
Both:
- Gated by authorization policy `DriverOperator` (mapped to an LDAP group via existing `Authentication.Ldap` config). **Hidden** (not just disabled) for unauthorized users — same approach as other AdminUI gated actions.
- Dispatch `RestartDriver` / `ReconnectDriver` messages through `AdminOperationsActor`, which audit-logs each operation.
- Show spinner + inline "Reconnecting…" chip; panel reflects new state via the SignalR push once health changes.
- Disabled when `Enabled=false` (nothing to restart) and during any in-flight Test Connect on the same page.
### 5.5 Out of scope this PR
History graphs (latency/error rate over time), deep diagnostics (per-tag last values, queue depths), and per-driver bespoke controls beyond Reconnect/Restart — all follow-ups.
### 5.6 Edge cases
- **Driver not yet deployed** (row exists, `Enabled=true`, cluster hasn't picked it up) — panel shows "Awaiting deployment", `DriverHealth.Unknown`.
- **Edit page open while driver is running** — status reflects deployed config, not the form. Banner: "Showing live status for the deployed config — your unsaved changes take effect after Save → next deploy cycle."
- **Test Connect + live status** — probe runs in a transient actor (Section 4), live status reflects the persistent actor. Don't interfere.
## 6. Tag / Address Picker
A picker slot on each page, launched as a modal so the config form stays visible behind it.
### 6.1 Shared shell
`<DriverTagPicker>` — modal chrome + search box + "use this address" action that emits a string back to the parent (e.g. `4x0001` for Modbus, `ns=2;s=Channel.Device.Tag` for OPC UA). Where the picked address lands depends on context: from "create equipment / create tag" flow, pushed into that form; standalone, copy-to-clipboard.
### 6.2 Per-driver bodies — first pass (all static address builders)
| Driver | Picker body |
|---|---|
| Modbus | Register-type dropdown + offset spinner + length → `4x00001-4` |
| AbCip | Tag name + element index, PLC-family hint from form |
| AbLegacy | File type (N/B/F/I/O/S/T/C/R) + file number + element, PLC-family-aware |
| S7 | Area (DB/M/I/Q) + db-number + offset + S7 type → `DB10.DBD20:REAL` |
| TwinCat | ADS variable name (free-text + format hint) |
| FOCAS | Parameter group dropdown + parameter ID; drives FOCAS function-code lookup |
| OpcUaClient | Static helper (NodeId free-text) — **live browse deferred** |
| Galaxy | Static helper (tag_name.AttributeName free-text) — **live browse deferred** |
| Historian.Wonderware | Tag name + retrieval mode + interval |
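Since every first-pass picker body is a static builder, each one reduces to a pure function from form inputs to an address string. The sketch below covers the two formats shown in the table (`4x00001-4` and `DB10.DBD20:REAL`). `AddressBuilders` and `ModbusRegister` are hypothetical names, and the S7 type-to-width mapping (`DBB`/`DBW`/`DBD`) is an assumption based on common S7 addressing, not something the design specifies.

```csharp
using System;

Console.WriteLine(AddressBuilders.Modbus(ModbusRegister.Holding, offset: 10, length: 2)); // 4x00010-2
Console.WriteLine(AddressBuilders.S7DataBlock(db: 10, byteOffset: 20, s7Type: "REAL"));   // DB10.DBD20:REAL

// Enum values double as the conventional Modbus reference prefix digit.
public enum ModbusRegister { Coil = 0, DiscreteInput = 1, InputRegister = 3, Holding = 4 }

public static class AddressBuilders
{
    // Prefix digit + "x" + five-digit offset + "-" + length, e.g. 4x00010-2.
    public static string Modbus(ModbusRegister register, int offset, int length)
    {
        if (offset < 0 || offset > 99999) throw new ArgumentOutOfRangeException(nameof(offset));
        if (length < 1) throw new ArgumentOutOfRangeException(nameof(length));
        return $"{(int)register}x{offset:D5}-{length}";
    }

    // DB<number>.<width><offset>:<type>; the width mapping is an assumption.
    public static string S7DataBlock(int db, int byteOffset, string s7Type)
    {
        var type = s7Type.ToUpperInvariant();
        var width = type switch
        {
            "BYTE" => "DBB",
            "INT" or "WORD" => "DBW",
            "REAL" or "DINT" or "DWORD" => "DBD",
            _ => throw new ArgumentException($"Unsupported S7 type '{s7Type}'.", nameof(s7Type)),
        };
        return $"DB{db}.{width}{byteOffset}:{type}";
    }
}
```

Keeping the builders free of UI state is what would make the later swap to a live browser a component-level change: both only need to emit a string to the picker shell.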
### 6.3 Deferred to follow-up
- **OpcUaClient live browse** — open session against configured endpoint, walk address space, return NodeId. Reuses the existing `Client.CLI` browse path or calls the OPC UA stack inline. Requires endpoint config to be valid (Test Connect first).
- **Galaxy live browse** — calls mxaccessgw's `GalaxyRepository.ListObjects` / `ListAttributes` via gRPC. Returns `tag_name.AttributeName`. Reuses `IGalaxyHierarchySource`.
- **Historian.Wonderware tag list** — pull from historian's tag store.
The picker slot is wired so swapping a static builder for a live browser later is a 1-component swap, not a page rewrite.
## 7. Error Handling
| Failure | Surface |
|---|---|
| Invalid form input | `DataAnnotationsValidator` + per-field `<ValidationMessage>`; Save disabled. |
| `DbUpdateConcurrencyException` | Red banner — "Another user changed this driver instance, reload before re-applying." (matches existing pattern) |
| FK violation (Namespace deleted while edit open) | Catch `DbUpdateException` — "Namespace `<id>` no longer exists in this cluster — pick another or recreate it." |
| Probe — driver-side exception | Probe actor catches, returns `(false, ex.Message, null)`. Red chip with message. Full stack to Serilog with audit context. |
| Probe — timeout | `(false, "Probe timed out after {n}s", null)`. Server-side 60s backstop. |
| Probe — DI lookup fails (unknown driver type) | Defensive — `(false, "No probe registered for driver type '{type}'", null)`. Error-level log. |
| SignalR disconnect | "Reconnecting…" chip + SignalR auto-reconnect. Stale snapshot dimmed after 30s. |
| Reconnect/Restart on stopped driver | "Driver is not running on any node". Button re-enables. |
| Authorization denied | Reconnect/Restart buttons hidden for unauthorized users. |
| Corrupted `DriverConfig` JSON on row load | Yellow banner — "Saved config could not be parsed against the current schema; falling back to defaults. Save will overwrite." Original JSON preserved in banner for copy-paste. |
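The last row of the table (corrupted `DriverConfig` JSON) implies a load path that never throws into the page. A minimal sketch of that fallback, assuming hypothetical `ConfigLoader`, `LoadResult<T>`, and `SampleOptions` types, could look like this:

```csharp
using System;
using System.Text.Json;

var good = ConfigLoader.Load<SampleOptions>("""{"Host":"plc-1","Port":5020}""");
var bad = ConfigLoader.Load<SampleOptions>("{ not json");
Console.WriteLine($"{good.Options.Host} / warning: {good.Warning ?? "none"}");
Console.WriteLine($"{bad.Options.Port} / warning: {bad.Warning ?? "none"}");

// Hypothetical options POCO used only for this illustration.
public sealed class SampleOptions
{
    public string Host { get; init; } = "";
    public int Port { get; init; } = 502;
}

// Warning and OriginalJson are set only when the stored JSON could not be used.
public sealed record LoadResult<T>(T Options, string? Warning, string? OriginalJson);

public static class ConfigLoader
{
    public static LoadResult<T> Load<T>(string? json) where T : class, new()
    {
        try
        {
            var parsed = string.IsNullOrWhiteSpace(json) ? null : JsonSerializer.Deserialize<T>(json);
            if (parsed is not null) return new LoadResult<T>(parsed, null, null);
        }
        catch (JsonException)
        {
            // Fall through to defaults; the original text is preserved for the banner.
        }
        return new LoadResult<T>(
            new T(),
            "Saved config could not be parsed against the current schema; falling back to defaults.",
            json);
    }
}
```

Carrying the original JSON in the result is what lets the banner offer it for copy-paste before a Save overwrites the row.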
## 8. Testing
### 8.1 Unit tests (`tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/` — extend existing project, or create if absent)
- `DriverPageFormSerializationTests` — 9 drivers × round-trip Options ↔ JSON ↔ form ↔ DB row. Asserts no loss for known fields, unknown fields dropped silently.
- `DriverTestConnectButtonTests` — render tests: enabled/disabled states, timeout behavior, result chip.
- `DriverStatusPanelTests` — render snapshots for each `DriverState`, disabled mode, stale-data dim.
- `DriverRestartReconnectAuthorizationTests` — buttons hidden without `DriverOperator` policy.
- Address-builder unit tests per driver — 9 small suites covering canonical address formats.
### 8.2 Integration tests (`tests/Server/.../IntegrationTests/`)
- `DriverTestConnectE2eTests` — Modbus + AbCip + S7 against Docker fixtures (`lmxopcua-fix up modbus` etc.). Green probe vs sim, red probe vs wrong port, timeout vs black-holed IP.
- `DriverReconnectE2eTests` — start a driver, click Reconnect, assert `Connecting → Healthy` transition within N seconds.
- `DriverStatusHubE2eTests` — open hub, force state change, assert push arrives within 1s.
### 8.3 Manual smoke (run before PR ship)
Operator on the dev VM with Docker fixtures available:
1. Pre-flight:
- `lmxopcua-fix up modbus standard` — Modbus sim running on `10.100.0.35:5020`.
- AdminUI deployed and reachable.
- LDAP user has the `DriverOperator` (or `FleetAdmin`) role.
2. Type picker:
- Navigate to `/clusters/<id>/drivers/new`. Verify 9 driver-type cards render.
- Click "ModbusTcp". Verify the typed form opens on `/clusters/<id>/drivers/new/modbustcp`.
3. Test Connect (form-driven, no save):
- Fill in Host=`10.100.0.35`, Port=`5020`, leave defaults otherwise.
- Click "Test Connect". Verify green chip + latency < 100ms.
- Change port to `9999`. Click again. Verify red chip with "ConnectionRefused" or similar.
- Change host to `1.2.3.4`. Click again. Within the probe timeout (default 5s), the chip shows "Probe timed out after 5s".
4. Save + edit:
- Set valid endpoint back. Save. Verify redirect to `/clusters/<id>/drivers`.
- Open the just-saved instance. Verify the typed form pre-populates correctly.
5. Live status panel:
- In a second browser tab, open the same driver's edit page. Confirm the `DriverStatusPanel` renders state + last-update.
- Stop the Modbus sim (`lmxopcua-fix down modbus`). Within ~30s, verify the panel transitions Healthy → Connecting / Faulted (depending on driver state).
- Bring the sim back up (`lmxopcua-fix up modbus standard`). Verify Healthy is restored.
6. Reconnect / Restart:
- Click "Reconnect" on the status panel. Verify a brief "Reconnecting…" chip + a Healthy state push within 5s.
- Click "Restart". Confirm in the dialog. Verify the actor restarts (full state transition).
- Verify both buttons are HIDDEN for an unauthorized user (LDAP user without `DriverOperator` role).
7. Address picker:
- Click "Pick address" on the Modbus page. Verify the modal opens.
- Builder: select Holding + offset=10 + length=2. Verify the chip shows `4x00010-2`. Click "Use this address" — verify it surfaces in the parent page.
- Close the modal. Repeat for one other driver type (e.g. S7) to confirm cross-driver wiring.
8. Other 8 driver types — smoke each page renders:
- Repeat steps 2–4 for each remaining driver type. For Galaxy, the Test Connect uses the mxaccessgw endpoint; for OPC UA, an `opc.tcp://` endpoint.
If any step fails, record the failure mode + Razor / actor log excerpts and reopen for fix before PR ship.
### 8.4 bUnit harness
If the AdminUI tests project doesn't already use bUnit, render tests downgrade to logic-only tests on the `@code { }` block; Razor markup is covered by integration tests. Decision deferred to implementation plan.
## 9. Migration / Sequencing
Incremental — driver-by-driver swap-over. Each step compile-clean and shippable on its own:
1. Land 9 Contracts projects + move Options classes. No UI changes.
2. Land shared section components (`DriverIdentitySection`, `DriverResilienceSection`, `DriverFormShell`). Wire into existing `DriverEdit.razor` first so they're tested in place.
3. Land `DriverTypePicker` + `DriverEditRouter` + `<DynamicComponent>` dispatch.
4. Land driver-specific pages one at a time. After each, route list-page links for that driver type only to the new page; leave others on generic editor.
5. Delete the generic `DriverEdit.razor` + its route once all 9 typed pages exist.
6. Land `DriverStatusHub` + bridge + `<DriverStatusPanel>` (read-only first).
7. Land `<DriverTestConnectButton>` + `IDriverProbe` impls + AdminOperationsActor handler.
8. Land Reconnect/Restart on the status panel with `DriverOperator` policy.
9. Land 9 static address builders inside `<DriverTagPicker>`.
## 10. Out of scope (follow-ups)
- Live tag browse for OpcUaClient + Galaxy (Section 6.3).
- Historian.Wonderware tag list pulled from store.
- Status panel history graphs + per-tag diagnostics (Section 5.5).
- Per-driver bespoke controls beyond Reconnect/Restart.
- bUnit setup if not already present (Section 8.4) — decide during implementation planning.
@@ -0,0 +1,840 @@
# AdminUI Driver-Specific Pages Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Replace `DriverEdit.razor` (generic JSON editor) with typed per-driver pages for all 9 driver types, each with Test Connect, live runtime status panel (Reconnect/Restart), and a driver-specific tag/address picker.
**Architecture:** 9 new `Driver.<Type>.Contracts` csprojs hold the `Options` POCOs (moved from runtime projects). AdminUI gains 9 thin `ProjectReference`s — no native deps leak. 9 typed `*DriverPage.razor` components share `<DriverFormShell>` + section/picker/status/test components in `Components/Shared/Drivers/`. Test Connect routes through `AdminOperationsActor` to per-driver `IDriverProbe` impls. Live status uses an Akka DistributedPubSub bridge → SignalR hub → Blazor panel (same pattern as the existing `FleetStatusSignalRBridge`).
**Tech Stack:** .NET 10 Blazor Server, EF Core (SQL Server), Akka.NET (cluster + DistributedPubSub), SignalR, OPC Foundation OPC UA .NET Standard stack, xUnit + Shouldly.
**Authoritative design:** `docs/plans/2026-05-28-adminui-driver-pages-design.md`. Re-read its sections when a task references them.
---
## Phase 0 — Preconditions
### Task 0.1: Create AdminUI test project (currently absent)
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (every later test task depends on this)
**Files:**
- Create: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/ZB.MOM.WW.OtOpcUa.AdminUI.Tests.csproj`
- Create: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/_PlaceholderTests.cs`
- Modify: `ZB.MOM.WW.OtOpcUa.slnx` (add the new project)
**Step 1:** Create csproj targeting `net10.0`, `<IsPackable>false</IsPackable>`. Package refs: `xunit`, `xunit.runner.visualstudio`, `Microsoft.NET.Test.Sdk`, `Shouldly`. Project ref: `..\..\..\src\Server\ZB.MOM.WW.OtOpcUa.AdminUI\ZB.MOM.WW.OtOpcUa.AdminUI.csproj`. Copy structure from a peer like `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/*.csproj`.
**Step 2:** Add a single `_PlaceholderTests.cs` with one passing fact so the project compiles + the test runner discovers something.
**Step 3:** Add `<Solution><Project Path="tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/ZB.MOM.WW.OtOpcUa.AdminUI.Tests.csproj"/></Solution>` to `ZB.MOM.WW.OtOpcUa.slnx` (match the existing element style).
**Step 4:** Run `dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests` then `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests`. Both succeed.
**Step 5:** Commit. `test(adminui): scaffold AdminUI.Tests project`
**Decision (deferred from design §8.4):** *no bUnit*. All Razor render tests degrade to logic-only tests on `@code { }` blocks. Razor markup is covered by the integration tests in Phase 6/7/8.
---
## Phase 1 — Contracts Projects (Driver Options → POCO-only siblings)
**Pattern (apply to every task in this phase):**
For driver type `<Type>` (folder `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.<Type>/`, options file `<Type>DriverOptions.cs`):
1. Create new csproj `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.<Type>.Contracts/ZB.MOM.WW.OtOpcUa.Driver.<Type>.Contracts.csproj`:
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
</PropertyGroup>
<!-- NO ProjectReference. NO PackageReference. Pure POCO. -->
</Project>
```
2. `git mv` the options file from the runtime project into the contracts project. Preserve namespace (`ZB.MOM.WW.OtOpcUa.Driver.<Type>` → keep the same `namespace ZB.MOM.WW.OtOpcUa.Driver.<Type>` declaration so consumers don't change). If the options file `using`s anything that isn't `System.*` or `System.ComponentModel.DataAnnotations`, strip that dep — most options classes are pure POCO already; if any pulls a runtime-only type, leave a `// TODO: extract <type> too` and capture it in a follow-up task here.
3. Add `<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Driver.<Type>.Contracts\ZB.MOM.WW.OtOpcUa.Driver.<Type>.Contracts.csproj" />` to the runtime project's csproj `<ItemGroup>`.
4. Add the new contracts csproj to `ZB.MOM.WW.OtOpcUa.slnx`.
5. `dotnet build src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.<Type>` → clean. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` → clean.
6. Commit. `refactor(driver-<type>): extract <Type>DriverOptions to .Contracts`
### Task 1.1: Modbus contracts
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Tasks 1.2–1.9 (different folders, different csprojs)
**Files:**
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Contracts/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Contracts.csproj`
- Move: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus/ModbusDriverOptions.cs` → `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Contracts/ModbusDriverOptions.cs`
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus/ZB.MOM.WW.OtOpcUa.Driver.Modbus.csproj` (add ProjectReference)
- Modify: `ZB.MOM.WW.OtOpcUa.slnx`
Follow the Phase 1 pattern above.
### Task 1.2: AbCip contracts
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Tasks 1.1, 1.3–1.9
**Files:**
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Contracts/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Contracts.csproj`
- Move: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDriverOptions.cs` → contracts project
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip/ZB.MOM.WW.OtOpcUa.Driver.AbCip.csproj`, `ZB.MOM.WW.OtOpcUa.slnx`
### Task 1.3: AbLegacy contracts
**Classification:** small · **Estimated implement time:** ~3 min · **Parallelizable with:** 1.1–1.9 (except itself)
**Files:** Create `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Contracts/*`, move `AbLegacyDriverOptions.cs`, update `Driver.AbLegacy.csproj` + slnx.
### Task 1.4: S7 contracts
**Classification:** small · **Estimated implement time:** ~3 min · **Parallelizable with:** 1.1–1.9 (except itself)
**Files:** Create `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Contracts/*`, move `S7DriverOptions.cs`, update `Driver.S7.csproj` + slnx.
### Task 1.5: TwinCAT contracts
**Classification:** small · **Estimated implement time:** ~3 min · **Parallelizable with:** 1.1–1.9 (except itself)
**Files:** Create `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Contracts/*`, move `TwinCATDriverOptions.cs`, update `Driver.TwinCAT.csproj` + slnx.
### Task 1.6: FOCAS contracts
**Classification:** small · **Estimated implement time:** ~3 min · **Parallelizable with:** 1.1–1.9 (except itself)
**Files:** Create `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Contracts/*`, move `FocasDriverOptions.cs`, update `Driver.FOCAS.csproj` + slnx.
### Task 1.7: OpcUaClient contracts
**Classification:** small · **Estimated implement time:** ~3 min · **Parallelizable with:** 1.1–1.9 (except itself)
**Files:** Create `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Contracts/*`, move `OpcUaClientDriverOptions.cs`, update `Driver.OpcUaClient.csproj` + slnx.
### Task 1.8: Galaxy contracts
**Classification:** small · **Estimated implement time:** ~3 min · **Parallelizable with:** 1.1–1.9 (except itself)
**Files:** Create `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Contracts/*`, move `Config/GalaxyDriverOptions.cs` (place at root of contracts project — drop the `Config/` subdir), update `Driver.Galaxy.csproj` + slnx.
### Task 1.9: Wonderware Historian client contracts
**Classification:** small · **Estimated implement time:** ~3 min · **Parallelizable with:** 1.1–1.9 (except itself)
**Files:** Create `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Contracts/*`, move `WonderwareHistorianClientOptions.cs`, update `Driver.Historian.Wonderware.Client.csproj` + slnx.
### Task 1.10: Validate the full solution + add ProbeTimeout property to each Options class
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on 1.1–1.9)
**Files:**
- Modify: each of the 9 `*DriverOptions.cs` files just moved into the contracts projects.
**Step 1:** `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — clean.
**Step 2:** Add a `ProbeTimeoutSeconds` property to each Options class with a driver-appropriate default:
```csharp
/// <summary>Timeout for the AdminUI Test Connect probe. Server-side max = 60s.</summary>
[Display(Name = "Probe timeout (seconds)", Description = "Connection test timeout. Default {n}s.", GroupName = "Diagnostics")]
[Range(1, 60)]
public int ProbeTimeoutSeconds { get; init; } = <default>;
```
Defaults: Modbus 5, AbCip 5, AbLegacy 5, S7 5, TwinCAT 10, FOCAS 10, OpcUaClient 15, Galaxy 30, Wonderware Historian 15.
**Step 3:** `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — clean (any driver runtime code that constructs the Options via positional record syntax may break — fix by using `with { ProbeTimeoutSeconds = N }` or making it a property with default).
**Step 4:** `dotnet test ZB.MOM.WW.OtOpcUa.slnx` — all existing tests still pass.
**Step 5:** Commit. `feat(drivers): expose ProbeTimeoutSeconds on every driver Options class`
---
## Phase 2 — Shared section components
### Task 2.1: DriverFormShell.razor
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 2.2, 2.3
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/DriverFormShell.razor`
**Step 1:** Implement panel chrome (`<section class="panel rise">` + `<div class="panel-head">`) with `Title`, `ChildContent`, `Footer` render fragments. Cancel/Save/Delete buttons via parameters: `OnSave` `EventCallback`, `OnCancel` `EventCallback`, `OnDelete` `EventCallback?` (null hides delete). `Busy` bool drives spinner + disabled. Error banner from `Error` string param.
**Step 2:** Pattern-match the existing `DriverEdit.razor` save bar (lines 116–128) and keep the same visual layout.
**Step 3:** No code-behind logic; pure presentation.
**Step 4:** `dotnet build src/Server/ZB.MOM.WW.OtOpcUa.AdminUI` — clean.
**Step 5:** Commit. `feat(adminui): add DriverFormShell shared component`
### Task 2.2: DriverIdentitySection.razor
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 2.1, 2.3
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/DriverIdentitySection.razor`
**Step 1:** Component renders the identity fields lifted from `DriverEdit.razor` lines 38–88: `DriverInstanceId` (read-only when not new), `Name`, `NamespaceId` (select from `Namespace[]` passed in), `Enabled`. Bind via `IdentityModel` record passed as `@bind-Value`.
**Step 2:** Define `IdentityModel` record in the same file's `@code` block: `public sealed record IdentityModel { ... }`. Properties match the existing `FormModel` Identity fields, with their `[Required]` / `[RegularExpression]` attributes preserved.
**Step 3:** Component takes `IsNew` bool, `Namespaces` list.
**Step 4:** Build clean.
**Step 5:** Commit. `feat(adminui): add DriverIdentitySection shared component`
### Task 2.3: DriverResilienceSection.razor
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 2.1, 2.2
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/DriverResilienceSection.razor`
**Step 1:** For this PR, keep the existing JSON textarea for Polly overrides; typed-form-ifying Polly is out of scope (Section 10 of the design says so implicitly). The component wraps the textarea + help text from `DriverEdit.razor` lines 101–109 in a panel.
**Step 2:** Bind `[Parameter] public string? ResilienceConfig { get; set; }` + `EventCallback<string?> ResilienceConfigChanged`.
**Step 3:** Build clean.
**Step 4:** Commit. `feat(adminui): add DriverResilienceSection shared component`
### Task 2.4: Wire the three new sections into existing DriverEdit.razor
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (depends on 2.1–2.3)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/DriverEdit.razor`
**Step 1:** Replace lines 38–88 with `<DriverIdentitySection @bind-Value="_identityModel" Namespaces="_namespaces" IsNew="IsNew" />`. Wire `_identityModel` to/from `_form` in `OnInitializedAsync` and `SubmitAsync`.
**Step 2:** Replace lines 101–109 with `<DriverResilienceSection @bind-ResilienceConfig="_form.ResilienceConfig" />`.
**Step 3:** Wrap the form in `<DriverFormShell Busy="_busy" Error="_error" OnSave="SubmitAsync" OnCancel="@(...)" OnDelete="@(IsNew ? null : DeleteAsync)">`.
**Step 4:** Smoke test: `dotnet run --project src/Server/ZB.MOM.WW.OtOpcUa.Host` (admin role), open `/clusters/<existing>/drivers/<existing>`, page renders identically to before. (Driver config JSON textarea + identity fields + save bar visually unchanged.)
**Step 5:** Commit. `refactor(adminui): drive DriverEdit.razor through shared section components`
---
## Phase 3 — Router + Type Picker
### Task 3.1: DriverTypePicker.razor
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 3.2
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/Drivers/DriverTypePicker.razor`
**Step 1:** `@page "/clusters/{ClusterId}/drivers/new"` (this *replaces* the same route on the existing `DriverEdit.razor` — order matters; we'll yank the old route in Task 3.3).
**Step 2:** Renders a grid of 9 driver-type cards (`ModbusTcp`, `AbCip`, `AbLegacy`, `S7`, `TwinCat`, `FOCAS`, `OpcUaClient`, `Galaxy`, `Historian.Wonderware`). Each card is a `<a href="/clusters/@ClusterId/drivers/new/<type-slug>">` linking to the typed new-form route. Type slug = lowercase driver-type string (e.g. `modbustcp` → keep human-readable; map slug → DriverType enum-string in a static dictionary in this file).
**Step 3:** Card content: driver type name, one-line description, an icon (text symbol fine — `[M]`, `[7]`, `[OPC]`, etc., no new images this PR).
**Step 4:** `<ClusterNav ClusterId="@ClusterId" ActiveTab="drivers" />` for consistency with peer pages.
**Step 5:** Build clean. Commit. `feat(adminui): add DriverTypePicker landing page`
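The slug-to-driver-type dictionary from Step 2 might be sketched as follows. `DriverTypeSlugs` is a hypothetical helper name; the nine type strings are the ones listed in this task, and the slug rule (lowercase of the type string) follows Step 2.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(DriverTypeSlugs.ToSlug("Historian.Wonderware"));
Console.WriteLine(DriverTypeSlugs.TryResolve("modbustcp", out var driverType) ? driverType : "unknown");

public static class DriverTypeSlugs
{
    // The nine driver-type strings shown on the picker cards.
    private static readonly string[] DriverTypes =
    {
        "ModbusTcp", "AbCip", "AbLegacy", "S7", "TwinCat", "FOCAS",
        "OpcUaClient", "Galaxy", "Historian.Wonderware",
    };

    // slug -> canonical driver-type string; lookups ignore case.
    private static readonly Dictionary<string, string> BySlug = BuildMap();

    public static string ToSlug(string driverType) => driverType.ToLowerInvariant();

    public static bool TryResolve(string slug, out string? driverType) =>
        BySlug.TryGetValue(slug, out driverType);

    private static Dictionary<string, string> BuildMap()
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var type in DriverTypes) map[ToSlug(type)] = type;
        return map;
    }
}
```

Resolving through a fixed map (instead of trusting the URL segment) means an unknown slug can be rejected before any typed page is instantiated.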
### Task 3.2: DriverEditRouter.razor
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 3.1
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/Drivers/DriverEditRouter.razor`
**Step 1:** `@page "/clusters/{ClusterId}/drivers/{DriverInstanceId}"` — same edit route as today's `DriverEdit.razor` (will collide; resolve in Task 3.3).
**Step 2:** In `OnInitializedAsync`: load the `DriverInstance` row, read its `DriverType` string.
**Step 3:** Map `DriverType` → component type via a static dictionary literal `_componentMap`:
```csharp
private static readonly IReadOnlyDictionary<string, Type> _componentMap =
    new Dictionary<string, Type>(StringComparer.OrdinalIgnoreCase)
    {
        ["ModbusTcp"] = typeof(ModbusDriverPage),
        ["AbCip"] = typeof(AbCipDriverPage),
        ["AbLegacy"] = typeof(AbLegacyDriverPage),
        ["S7"] = typeof(S7DriverPage),
        ["TwinCat"] = typeof(TwinCatDriverPage),
        ["Focas"] = typeof(FocasDriverPage),
        ["OpcUaClient"] = typeof(OpcUaClientDriverPage),
        ["Galaxy"] = typeof(GalaxyDriverPage),
        ["Historian.Wonderware"] = typeof(HistorianWonderwareDriverPage),
    };
```
**Step 4:** Render `<DynamicComponent Type="_componentMap[_driverType]" Parameters="_params" />` where `_params = new Dictionary<string, object?> { ["ClusterId"] = ClusterId, ["DriverInstanceId"] = DriverInstanceId }`.
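**Step 5:** Until the typed pages exist (Phase 4), the map is empty, so look the type up with `TryGetValue` rather than the indexer (which throws `KeyNotFoundException` for a missing key) and fall back to a "not yet implemented for type X" notice. Keep route collision deferred until Task 3.3.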
**Step 6:** Build clean. Commit. `feat(adminui): add DriverEditRouter dispatch page`
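Steps 4 and 5 together imply a lookup that tolerates unmapped driver types. A minimal sketch of that dispatch decision, kept free of Blazor types so it can be unit-tested on its own, might look like this; `DriverComponentResolver` and its `fallback` parameter are hypothetical names, not part of the existing code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for DriverEditRouter: picks the component type to hand to <DynamicComponent>.
public static class DriverComponentResolver
{
    // Returns the mapped component type, or the fallback when the
    // driver type is null, blank, or not present in the map.
    public static Type Resolve(
        IReadOnlyDictionary<string, Type> map,
        string? driverType,
        Type fallback)
    {
        if (!string.IsNullOrWhiteSpace(driverType) &&
            map.TryGetValue(driverType, out var component))
        {
            return component;
        }

        return fallback;
    }
}
```

In Task 3.2 the fallback would be whatever renders the "not yet implemented" notice; from Task 3.4 it could be `typeof(DriverEdit)`.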
### Task 3.3: Resolve route collision — delete old new-route, keep old edit-route until Phase 5
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** none (depends on 3.1)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/DriverEdit.razor`
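**Step 1:** Delete line 1 (`@page "/clusters/{ClusterId}/drivers/new"`). Keep line 2 (`@page "/clusters/{ClusterId}/drivers/{DriverInstanceId}"`). Until Task 3.4 unhooks it too, the old generic edit page still owns the edit route. Note that Blazor's router throws an ambiguous-route `InvalidOperationException` at startup when two routable components declare the same template, so if `DriverEditRouter.razor` from Task 3.2 already carries its `@page` directive, comment that directive out until Task 3.4; the router then builds fine but stays unreachable.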
**Step 2:** Build clean.
**Step 3:** Smoke test: `/clusters/<id>/drivers/new` now hits `DriverTypePicker.razor`. `/clusters/<id>/drivers/<existing>` still hits `DriverEdit.razor`.
**Step 4:** Commit. `refactor(adminui): hand /drivers/new to DriverTypePicker`
### Task 3.4: Hand /drivers/{id} from DriverEdit.razor to DriverEditRouter.razor
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** none (depends on 3.3)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/DriverEdit.razor`
**Step 1:** Delete the remaining `@page` directive — file no longer routes. The router from Task 3.2 owns the route. The DriverEdit `.razor` file stays on disk as a referenceable component used as a fallback inside the router until all 9 typed pages land (Phase 4).
**Step 2:** Update `DriverEditRouter.razor` `_componentMap` so any driver type *not yet implemented* falls back to `typeof(DriverEdit)` — passing parameters identically. This keeps every existing driver row editable through whichever editor (typed or generic) is available at the time the row is opened.
**Step 3:** Build clean.
**Step 4:** Smoke test: open `/clusters/<id>/drivers/<existing-modbus-row>`. Router dispatches → falls back to `DriverEdit.razor` (since `ModbusDriverPage` doesn't exist yet) → page renders as before.
**Step 5:** Commit. `refactor(adminui): route /drivers/{id} through DriverEditRouter`
---
## Phase 4 — Typed driver pages (one per driver)
**Pattern (apply to every task in this phase):**
Each `<Type>DriverPage.razor` is a self-contained page with:
1. Route(s):
- `@page "/clusters/{ClusterId}/drivers/new/<type-slug>"` (new path)
- The router (Task 3.2) dispatches the edit case — the page itself does NOT declare the edit route; it accepts `[Parameter] public string? DriverInstanceId { get; set; }` and the router passes it.
2. Wraps everything in `<DriverFormShell>` (Task 2.1).
3. Top: `<DriverIdentitySection>` (Task 2.2).
4. Middle: a `<section class="panel">` per logical group of driver options. Each `<InputText>` / `<InputNumber>` / `<InputSelect>` is *explicitly* bound to a property on a form model (`<Type>FormModel`) inside `@code`. Field labels and help text mirror the `[Display(Name, Description, GroupName)]` attributes on the `Options` class, but are not resolved by reflection at render time. (A static helper `static string Label<T>(Expression<Func<T,object?>> path)` that reads `[Display]` from the expression was considered; it is simpler to hard-code each label in markup and treat `[Display]` as a redundant runtime hint for the API/validator. **Hard-coding labels is the chosen path: keep the page Razor explicit.**)
5. Below: `<DriverTestConnectButton DriverType="<Type>" GetConfigJson="@BuildConfigJson" TimeoutSeconds="@_form.ProbeTimeoutSeconds" />`. The real component lands in Phase 7 (Task 7.5); for Phase 4, ship a stub component that only renders a disabled button with the text "Available after Phase 7".
6. Below: `<DriverStatusPanel DriverInstanceId="@DriverInstanceId" Enabled="@_form.Enabled" />` (only visible in edit mode, not new — stub lands in Phase 6).
7. Below: `<DriverResilienceSection>` (Task 2.3).
8. Tag picker: opens `<DriverTagPicker>` modal with the driver's picker body (Phase 9).
9. Save path: serializes the typed form model to JSON via `JsonSerializer.Serialize(_form.Config, _jsonOpts)`, normalizes (same as today's `NormalizeJson`), upserts the `DriverInstance` row with `RowVersion` opt-concurrency. Match the existing save flow in `DriverEdit.razor:187-257` line-for-line for the upsert mechanics.
10. Load path: if `DriverInstanceId != null`, load row, `JsonSerializer.Deserialize<<Type>Options>(row.DriverConfig, _jsonOpts)` where `_jsonOpts = new() { UnmappedMemberHandling = JsonUnmappedMemberHandling.Skip }`. Wrap into the form model.
11. After save, navigate to `/clusters/{ClusterId}/drivers` (the list page — match existing behavior).
**Per-driver task acceptance:**
- Page compiles, routes resolve.
- For a new instance: typed form save round-trips to DB; row's `DriverType` is set; `DriverConfig` JSON contains every field shown in the form.
- For an edit: page loads existing row; every field populates; save preserves all fields.
- Update `DriverEditRouter.razor` `_componentMap` to point this driver type at the new page.
- `ClusterDrivers.razor` (the list page) needs no change; it already links via the unified edit route.
- Add a unit test in `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/<Type>DriverPageFormSerializationTests.cs` round-tripping `<Type>Options` ↔ JSON ↔ form-model ↔ `<Type>Options`. Use Shouldly.
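The load/save path in items 9 and 10 and the round-trip test above can be illustrated with a small stand-in. `SampleDriverOptions` and `OptionsJson` are hypothetical; a real page would use its own `<Type>Options` class and the existing `NormalizeJson` step, which is not reproduced here.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for a real <Type>Options class.
public sealed class SampleDriverOptions
{
    public string Endpoint { get; set; } = "";
    public int Port { get; set; } = 502;
    public int ProbeTimeoutSeconds { get; set; } = 5;
}

public static class OptionsJson
{
    // Skip (rather than Disallow) means JSON members the class does not
    // know about are ignored on load instead of throwing.
    public static readonly JsonSerializerOptions Opts = new()
    {
        UnmappedMemberHandling = JsonUnmappedMemberHandling.Skip,
    };

    // Load path: DriverConfig JSON -> typed options (defaults when the JSON is "null").
    public static SampleDriverOptions Load(string json) =>
        JsonSerializer.Deserialize<SampleDriverOptions>(json, Opts) ?? new SampleDriverOptions();

    // Save path: typed options -> JSON for the DriverInstance row.
    public static string Save(SampleDriverOptions options) =>
        JsonSerializer.Serialize(options, Opts);
}
```

One consequence worth checking in the per-driver tests: because unmapped members are skipped on load, any legacy field that the typed class does not model is dropped on the next save.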
### Task 4.1: ModbusDriverPage.razor
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** 4.2–4.9
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/Drivers/ModbusDriverPage.razor`
- Create: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/ModbusDriverPageFormSerializationTests.cs`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/Drivers/DriverEditRouter.razor` (`_componentMap`)
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/ZB.MOM.WW.OtOpcUa.AdminUI.csproj` (add `ProjectReference` to `Driver.Modbus.Contracts`)
Follow Phase 4 pattern. Modbus is the largest options class (289 lines); group into panels: **Transport** (endpoint, port, unit ID), **Polling** (interval, batch sizes), **Probe** (probe options + `ProbeTimeoutSeconds`), **Tuning** (timeouts, retries). Read `ModbusDriverOptions.cs` first to enumerate every property.
### Task 4.2: AbCipDriverPage.razor
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** 4.1, 4.3–4.9
**Files:** as 4.1, swap `Modbus` → `AbCip`. Add `ProjectReference` to `Driver.AbCip.Contracts`. Read `AbCipDriverOptions.cs` to enumerate fields. PLC family selector (CompactLogix / ControlLogix) is a key field.
### Task 4.3: AbLegacyDriverPage.razor
**Classification:** standard · **Estimated implement time:** ~5 min · **Parallelizable with:** 4.1, 4.2, 4.4–4.9
**Files:** as 4.1 for AbLegacy. Read `AbLegacyDriverOptions.cs`.
### Task 4.4: S7DriverPage.razor
**Classification:** standard · **Estimated implement time:** ~5 min · **Parallelizable with:** 4.1–4.3, 4.5–4.9
**Files:** as 4.1 for S7. CPU/rack/slot tuple in a Connection panel.
### Task 4.5: TwinCatDriverPage.razor
**Classification:** standard · **Estimated implement time:** ~5 min · **Parallelizable with:** 4.1–4.4, 4.6–4.9
**Files:** as 4.1 for TwinCAT. AMS NetId + port in a Connection panel.
### Task 4.6: FocasDriverPage.razor
**Classification:** standard · **Estimated implement time:** ~5 min · **Parallelizable with:** 4.1–4.5, 4.7–4.9
**Files:** as 4.1 for FOCAS. CNC series + connection params.
### Task 4.7: OpcUaClientDriverPage.razor
**Classification:** standard · **Estimated implement time:** ~5 min · **Parallelizable with:** 4.1–4.6, 4.8, 4.9
**Files:** as 4.1 for OpcUaClient. **Security profile (None / Basic256Sha256-Sign / Basic256Sha256-SignAndEncrypt)** is a dropdown sourced from the same enum the OPC UA Server uses (cross-ref `docs/security.md`). Username/password are `[DataType(DataType.Password)]`.
### Task 4.8: GalaxyDriverPage.razor
**Classification:** standard · **Estimated implement time:** ~5 min · **Parallelizable with:** 4.1–4.7, 4.9
**Files:** as 4.1 for Galaxy. Two panels: **mxaccessgw** (gateway endpoint, API key — password input), **Galaxy** (ClientName, SQL config db connection if exposed in options). API key field is `[DataType(DataType.Password)]`.
### Task 4.9: HistorianWonderwareDriverPage.razor
**Classification:** standard · **Estimated implement time:** ~5 min · **Parallelizable with:** 4.1–4.8
**Files:** as 4.1 for the Wonderware Historian (the page covers the *driver*'s view of historian client options — the `WonderwareHistorianClientOptions` lives in the `.Client.Contracts` project from Task 1.9). The driver type-string in DriverInstance is `Historian.Wonderware`.
---
## Phase 5 — Delete the generic DriverEdit.razor
### Task 5.1: Remove DriverEdit.razor + fallback in router
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (depends on all 4.x tasks)
**Files:**
- Delete: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/DriverEdit.razor`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/Drivers/DriverEditRouter.razor` (remove fallback)
**Step 1:** Confirm `_componentMap` in the router has all 9 driver types. Delete the "fallback to DriverEdit" branch.
**Step 2:** `git rm src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/DriverEdit.razor`.
**Step 3:** Build clean. `dotnet test ZB.MOM.WW.OtOpcUa.slnx` — clean.
**Step 4:** Smoke test all 9 driver types: open the list page, open one existing row of each type, verify the typed page renders. (For types without existing rows on dev DB, create one via the type picker first.)
**Step 5:** Commit. `refactor(adminui): retire generic DriverEdit.razor (typed pages cover all 9 drivers)`
---
## Phase 6 — Live status panel
### Task 6.1: DriverHealthChanged DPS message
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 6.2 (different folders)
**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Drivers/DriverHealthChanged.cs`
**Step 1:** Define record `public sealed record DriverHealthChanged(string ClusterId, string DriverInstanceId, string State, DateTime? LastSuccessUtc, string? LastError, int ErrorCount5Min, DateTime PublishedUtc);`. `State` is the `DriverState` enum string (matches `Healthy` / `Connecting` / `Faulted` / `Unknown` used by `DriverHealth`).
**Step 2:** Add `[MemoryPackable]` if peer messages in this folder use MemoryPack — match the folder's existing pattern. (Look at `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Fleet/FleetStatusChanged.cs` for the canonical pattern.)
**Step 3:** Build clean. Commit. `feat(messages): add DriverHealthChanged DPS contract`
### Task 6.2: Publish DriverHealthChanged from each driver actor
**Classification:** high-risk
**Estimated implement time:** ~5 min (per driver; bundle = ~15 min; **split this into 6.2a–6.2d if the implementer balks**)
**Parallelizable with:** none (touches every driver actor)
**Files:**
- Modify: each `<Type>Driver.cs` — find the place that updates `_health` (e.g. `FocasDriver.cs:565+`, `ModbusDriver.cs:1180+`). After every `Volatile.Write(ref _health, ...)`, also publish to DPS topic `driver-health-{ClusterId}` via the injected `DistributedPubSub.Get(_actorSystem).Mediator`.
- Find pattern: `Volatile.Write(ref _health,` — every occurrence in `src/Drivers/**/*.cs` not under obj/bin.
**Step 1:** Inject the publish callback into each driver. The cleanest hook is `IDriverHealthPublisher` (new interface in `Core.Abstractions`), with the Akka-backed impl living in `Runtime` and DI-registered there. Driver constructors take `IDriverHealthPublisher` (nullable for backward compat in tests).
**Step 2:** After each `_health` write, call `_healthPublisher?.Publish(new DriverHealthChanged(...))`. Pull `ClusterId` + `DriverInstanceId` from the driver's existing identity (every driver already knows its instance ID for telemetry tags).
**Step 3:** Add `ErrorCount5Min` tracking: a sliding-window counter on the driver — bump on every transition into `Faulted`, decay over 5min. Simple impl: a `Queue<DateTime>` guarded by lock; on read, dequeue entries older than 5min and return `.Count`.
**Step 4:** `dotnet build` clean. `dotnet test` clean. Driver unit tests may need a no-op `IDriverHealthPublisher` (provide one in Core.Abstractions: `public sealed class NullDriverHealthPublisher : IDriverHealthPublisher { public void Publish(DriverHealthChanged _) { } }`).
**Step 5:** Commit. `feat(drivers): publish DriverHealthChanged to DPS on every health transition`
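The `ErrorCount5Min` tracking from Step 3 might be sketched as below. `SlidingErrorCounter` is a hypothetical name, and passing the clock in as a parameter is an assumption made purely so the window can be tested without waiting five minutes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sliding-window counter: Queue<DateTime> guarded by a lock, as Step 3 suggests.
public sealed class SlidingErrorCounter
{
    private readonly TimeSpan _window;
    private readonly Queue<DateTime> _faults = new();
    private readonly object _gate = new();

    public SlidingErrorCounter(TimeSpan window) => _window = window;

    // Call on every transition into Faulted.
    public void RecordFault(DateTime utcNow)
    {
        lock (_gate)
        {
            _faults.Enqueue(utcNow);
        }
    }

    // Drops entries older than the window, then returns how many remain.
    public int Count(DateTime utcNow)
    {
        lock (_gate)
        {
            var cutoff = utcNow - _window;
            while (_faults.Count > 0 && _faults.Peek() < cutoff)
            {
                _faults.Dequeue();
            }

            return _faults.Count;
        }
    }
}
```

A driver would hold one instance constructed with `TimeSpan.FromMinutes(5)` and read `Count(DateTime.UtcNow)` when building each `DriverHealthChanged`.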
### Task 6.3: DriverStatusHub
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 6.4
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/DriverStatusHub.cs`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/HubServiceCollectionExtensions.cs` (register hub)
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/HubRouteBuilderExtensions.cs` (`MapHub<DriverStatusHub>("/hubs/driverstatus")`)
**Step 1:** Hub with one method: `Task JoinDriver(string driverInstanceId)` — adds the connection to a group named `driver:{driverInstanceId}`, immediately invokes `Clients.Caller.SendAsync("status", currentSnapshot)`. Inject `IDriverStatusSnapshotStore` (new — Task 6.4) to read the current snapshot.
**Step 2:** Hub method-name constant: `public const string MethodName = "status";`.
**Step 3:** `[Microsoft.AspNetCore.Authorization.Authorize]` on the class (same auth as the existing AdminUI hubs).
**Step 4:** Build clean. Commit. `feat(adminui): add DriverStatusHub`
### Task 6.4: DriverStatusSignalRBridge + snapshot store
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 6.3
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/DriverStatusSignalRBridge.cs`
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/IDriverStatusSnapshotStore.cs`
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/InMemoryDriverStatusSnapshotStore.cs`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Hubs/HubServiceCollectionExtensions.cs` (spawn bridge actor on admin-role startup; DI-register snapshot store as singleton)
**Step 1:** Bridge = Akka `ReceiveActor`. `PreStart` subscribes to DPS topic `driver-health-*` (or one topic + per-cluster filter — pick the same wildcard convention `FleetStatusSignalRBridge` uses). On `DriverHealthChanged msg`: writes to `IDriverStatusSnapshotStore` (latest-snapshot-wins, keyed by instance ID), then `_hub.Clients.Group($"driver:{msg.DriverInstanceId}").SendAsync(DriverStatusHub.MethodName, msg)`.
**Step 2:** `InMemoryDriverStatusSnapshotStore`: `ConcurrentDictionary<string, DriverHealthChanged> _byInstance;` plus `Upsert(msg)` and `TryGet(instanceId, out msg)`.
**Step 3:** Wire bridge in `AddOtOpcUaSignalRBridges` (or equivalent). Singleton snapshot store. Build clean.
**Step 4:** Commit. `feat(adminui): add DriverStatusSignalRBridge + snapshot store`
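A minimal sketch of the snapshot store from Step 2, with the `DriverHealthChanged` record from Task 6.1 repeated so the block stands alone. The interface shape (`Upsert` / `TryGet`) follows the step text; the plain overwrite below is an assumption that "latest" means "last to arrive", with no comparison of `PublishedUtc`.

```csharp
using System;
using System.Collections.Concurrent;

// Record shape copied from Task 6.1 so this sketch compiles in isolation.
public sealed record DriverHealthChanged(
    string ClusterId, string DriverInstanceId, string State,
    DateTime? LastSuccessUtc, string? LastError, int ErrorCount5Min, DateTime PublishedUtc);

public interface IDriverStatusSnapshotStore
{
    void Upsert(DriverHealthChanged msg);
    bool TryGet(string instanceId, out DriverHealthChanged? msg);
}

public sealed class InMemoryDriverStatusSnapshotStore : IDriverStatusSnapshotStore
{
    private readonly ConcurrentDictionary<string, DriverHealthChanged> _byInstance = new();

    // Last write wins, keyed by driver instance ID.
    public void Upsert(DriverHealthChanged msg) => _byInstance[msg.DriverInstanceId] = msg;

    public bool TryGet(string instanceId, out DriverHealthChanged? msg) =>
        _byInstance.TryGetValue(instanceId, out msg);
}
```

If DPS delivery can reorder messages, comparing `PublishedUtc` inside `AddOrUpdate` would be a small extension; whether that is needed depends on the topic's ordering guarantees.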
### Task 6.5: DriverStatusPanel.razor
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (consumes 6.3 + 6.4)
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/DriverStatusPanel.razor`
**Step 1:** Parameters: `DriverInstanceId`, `Enabled`. Inject `NavigationManager` for hub URL building, `IServiceProvider` for hub-connection auth.
**Step 2:** `OnInitializedAsync`: if `Enabled == false`, render "Disabled — not deployed" + skip the hub. Else build a `HubConnection` (`HubConnectionBuilder` → `WithUrl(Nav.ToAbsoluteUri("/hubs/driverstatus"))` → `WithAutomaticReconnect()` → `Build()`), register `On<DriverHealthChanged>("status", ...)`, `StartAsync`, then `InvokeAsync("JoinDriver", DriverInstanceId)`. The handler updates `_snapshot` + `StateHasChanged`.
**Step 3:** Render: state chip (color-mapped: `Healthy` green, `Connecting` yellow, `Faulted` red, `Unknown` gray) + "last success {humanized timestamp}" + `ErrorCount5Min` badge + collapsible "last error" panel showing `LastError` if set. Visual reuses existing `chip` / `panel` styles from sibling pages.
**Step 4:** `_lastSnapshotAt = DateTime.UtcNow` on each push; if `(now - _lastSnapshotAt) > 30s` (timer-driven re-render every 5s), add `dim` class to the whole panel.
**Step 5:** `DisposeAsync`: `await _hub.DisposeAsync()`. Implement `IAsyncDisposable`.
**Step 6:** Wire into all 9 driver pages. In edit mode (i.e. `DriverInstanceId != null`), render `<DriverStatusPanel DriverInstanceId="@DriverInstanceId" Enabled="@_form.Identity.Enabled" />` above the resilience section.
**Step 7:** Build clean. Smoke test: bring up `lmxopcua-fix up modbus`, deploy a Modbus driver pointing at the sim, observe `Healthy` push within seconds. Stop the sim, observe `Faulted` push within the driver's poll interval. Commit. `feat(adminui): live driver status panel on every driver page`
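The rendering rules in Steps 3 and 4 reduce to two small pure functions, which keeps them testable outside the component. `DriverStatusView` is a hypothetical helper; the colour names stand in for whatever `chip` CSS variants the page actually uses.

```csharp
using System;

// Hypothetical view helper for DriverStatusPanel.
public static class DriverStatusView
{
    // Step 4: dim the panel when no push has arrived for more than 30 seconds.
    public static readonly TimeSpan StaleAfter = TimeSpan.FromSeconds(30);

    // Step 3: state string -> chip colour; anything unrecognised is treated as Unknown.
    public static string ChipColor(string? state) => state switch
    {
        "Healthy" => "green",
        "Connecting" => "yellow",
        "Faulted" => "red",
        _ => "gray",
    };

    public static bool IsStale(DateTime nowUtc, DateTime lastSnapshotAtUtc) =>
        nowUtc - lastSnapshotAtUtc > StaleAfter;
}
```

The 5-second re-render timer would call `IsStale(DateTime.UtcNow, _lastSnapshotAt)` and toggle the `dim` class accordingly.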
---
## Phase 7 — Test Connect
### Task 7.1: IDriverProbe interface + TestDriverConnect message
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 7.2
**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverProbe.cs`
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/TestDriverConnect.cs`
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/TestDriverConnectResult.cs`
**Step 1:** `IDriverProbe`:
```csharp
public interface IDriverProbe
{
    /// <summary>Driver-type string this probe handles (matches DriverInstance.DriverType).</summary>
    string DriverType { get; }

    /// <summary>Run a connection probe. Never mutates; never writes.</summary>
    Task<DriverProbeResult> ProbeAsync(string configJson, TimeSpan timeout, CancellationToken ct);
}

public sealed record DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency);
```
**Step 2:** `TestDriverConnect(string DriverType, string ConfigJson, int TimeoutSeconds, Guid CorrelationId)` and `TestDriverConnectResult(bool Ok, string? Message, double? LatencyMs, Guid CorrelationId)`. Match the MemoryPack conventions of peer messages in `Messages/Admin/`.
**Step 3:** Build clean. Commit. `feat(messages,abstractions): add IDriverProbe + TestDriverConnect contract`
### Task 7.2: AdminOperationsActor handler for TestDriverConnect
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 7.1 (separate file; modify late)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs`
**Step 1:** Add ctor param `IEnumerable<IDriverProbe> probes`. Build `_probesByType = probes.ToDictionary(p => p.DriverType, StringComparer.OrdinalIgnoreCase)` in ctor. Update `Props` factory accordingly.
**Step 2:** `ReceiveAsync<TestDriverConnect>(HandleTestDriverConnectAsync)`. Handler:
```csharp
private async Task HandleTestDriverConnectAsync(TestDriverConnect msg)
{
    // Capture Sender before the first await; it is not valid afterwards.
    var replyTo = Sender;
    if (!_probesByType.TryGetValue(msg.DriverType, out var probe))
    {
        replyTo.Tell(new TestDriverConnectResult(false, $"No probe registered for driver type '{msg.DriverType}'", null, msg.CorrelationId));
        return;
    }

    var clampedSec = Math.Clamp(msg.TimeoutSeconds, 1, 60);
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(clampedSec));
    try
    {
        var sw = Stopwatch.StartNew();
        var result = await probe.ProbeAsync(msg.ConfigJson, TimeSpan.FromSeconds(clampedSec), cts.Token);
        replyTo.Tell(new TestDriverConnectResult(result.Ok, result.Message, sw.Elapsed.TotalMilliseconds, msg.CorrelationId));
    }
    catch (OperationCanceledException)
    {
        replyTo.Tell(new TestDriverConnectResult(false, $"Probe timed out after {clampedSec}s", null, msg.CorrelationId));
    }
    catch (Exception ex)
    {
        _log.Error(ex, "Probe for {DriverType} threw", msg.DriverType);
        replyTo.Tell(new TestDriverConnectResult(false, ex.Message, null, msg.CorrelationId));
    }
}
```
**Step 3:** Update wherever `AdminOperationsActor.Props(...)` is called (search the repo) to pass the new `probes` enumerable. Likely in `Runtime` DI registration — register all `IDriverProbe` impls then resolve `IEnumerable<IDriverProbe>` for the singleton.
**Step 4:** Build clean. Commit. `feat(adminops): handle TestDriverConnect via per-driver IDriverProbe`
### Task 7.3: TCP-only probe impls (Modbus, AbCip, AbLegacy, S7)
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 7.4
**Files:**
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus/ModbusDriverProbe.cs`
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDriverProbe.cs`
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriverProbe.cs`
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7/S7DriverProbe.cs`
**Step 1:** Each impl deserializes its own Options class from the configJson, extracts `Endpoint` (host:port — exact field name depends on the Options class), opens a TCP `Socket` with `ConnectAsync(host, port, ct)`, closes immediately. On success: `(true, null, sw.Elapsed)`. On `SocketException`: `(false, ex.SocketErrorCode.ToString(), null)`.
**Step 2:** Register each as `services.AddSingleton<IDriverProbe, ModbusDriverProbe>()` (and peers) in the driver's existing `*FactoryExtensions.cs` `Add*` method.
**Step 3:** Build clean. Commit. `feat(drivers): TCP-only probes for Modbus, AbCip, AbLegacy, S7`
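The shared core of the four TCP-only probes could look roughly like the sketch below. `TcpProbe` is a hypothetical helper; each real probe would still deserialize its own Options class to obtain the host and port. `DriverProbeResult` is repeated from Task 7.1 so the block stands alone. Cancellation is deliberately left to propagate, on the assumption that the actor handler from Task 7.2 turns it into the "timed out" reply.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Record shape copied from Task 7.1.
public sealed record DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency);

// Hypothetical shared helper for the Modbus / AbCip / AbLegacy / S7 probes.
public static class TcpProbe
{
    public static async Task<DriverProbeResult> ConnectAsync(
        string host, int port, TimeSpan timeout, CancellationToken ct)
    {
        // Bound the attempt by both the caller's token and the probe timeout.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);

        using var socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
        var sw = Stopwatch.StartNew();
        try
        {
            await socket.ConnectAsync(host, port, cts.Token);
            return new DriverProbeResult(true, null, sw.Elapsed);
        }
        catch (SocketException ex)
        {
            // e.g. "ConnectionRefused" or "HostUnreachable".
            return new DriverProbeResult(false, ex.SocketErrorCode.ToString(), null);
        }
        // OperationCanceledException is not caught here; the caller reports the timeout.
    }
}
```

The socket is disposed as soon as the method returns, which matches the "connect, then close immediately" behaviour in Step 1.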
### Task 7.4: Specialty probes (FOCAS, TwinCAT, OpcUaClient, Galaxy, Historian.Wonderware)
**Classification:** standard
**Estimated implement time:** ~5 min (per driver; bundle ~15 min; **may need to split into 7.4a–7.4e**)
**Parallelizable with:** Task 7.3
**Files:**
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasDriverProbe.cs` — calls FOCAS `cnc_allclibhndl3` connect + immediate `cnc_freelibhndl`. Reuses the existing `IFocasClient.Connect`.
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriverProbe.cs` — opens an `AmsAddress` and sends an ADS Read State request.
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverProbe.cs` — opens an `OPCFoundation.NetStandard.Opc.Ua.Client.Session` against the configured endpoint with the configured security profile, immediately closes.
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Health/GalaxyDriverProbe.cs` — sends `MxCommand.Ping` to mxaccessgw via the existing gRPC client. (Build on the existing `Health/` folder.)
- Create: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/HistorianWonderwareDriverProbe.cs` — TCP probe to historian endpoint (historian client uses an MX-style transport; cheap path is a TCP connect to the historian's IPC port + close).
**Step 1:** Each impl registers in its driver's `Add*` extension.
**Step 2:** Build clean. `dotnet test` clean. Commit. `feat(drivers): specialty Test Connect probes for FOCAS/TwinCAT/OPCUA/Galaxy/Historian`
### Task 7.5: DriverTestConnectButton.razor
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (consumes 7.2 + 7.3 + 7.4)
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/DriverTestConnectButton.razor`
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Clients/AdminProbeService.cs`
**Step 1:** `AdminProbeService` is a thin wrapper around `IAdminOperationsClient`. Method: `Task<TestDriverConnectResult> TestAsync(string driverType, string configJson, int timeoutSeconds, CancellationToken ct)`. It builds the message, Asks the actor, and applies a 65s `Task.WhenAny` timeout wall as an outer guard (the actor clamps its own timeout to at most 60s, so the wall should only trip if no reply arrives at all). DI-register as scoped.
**Step 2:** Button component params: `DriverType`, `GetConfigJson` (`Func<string>`), `TimeoutSeconds` (`int`). Renders `<button class="btn btn-outline-primary btn-sm">Test Connect</button>` + inline result chip (green tick + latency, or red x + message). Spinner during in-flight. Auto-clears chip after 30s.
**Step 3:** On click: invokes `AdminProbeService.TestAsync(DriverType, GetConfigJson(), TimeoutSeconds, CancellationToken.None)`; no caller-side token is needed because the actor-side timeout already bounds the call.
**Step 4:** Wire into all 9 driver pages by replacing the Phase 4 stub.
**Step 5:** Build clean. Smoke test: open `/clusters/<id>/drivers/new/modbustcp`, type sim endpoint into form, click Test Connect → green. Wrong port → red within 5s. Black-holed IP → "Probe timed out after 5s".
**Step 6:** Commit. `feat(adminui): Test Connect button on every driver page`
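The `Task.WhenAny` outer guard mentioned in Step 1 can be sketched generically. `OuterTimeout.WithWallAsync` is a hypothetical helper, and the `onTimeout` factory stands in for building a failed `TestDriverConnectResult`; the real service would pass the Ask task and a 65-second wall.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: races a task against a wall-clock limit.
public static class OuterTimeout
{
    // Returns the task's result if it finishes first, otherwise the fallback value.
    // The underlying task is not cancelled; it is simply no longer awaited.
    public static async Task<T> WithWallAsync<T>(Task<T> work, TimeSpan wall, Func<T> onTimeout)
    {
        var delay = Task.Delay(wall);
        var first = await Task.WhenAny(work, delay);

        return ReferenceEquals(first, work)
            ? await work          // propagates the result or the original exception
            : onTimeout();
    }
}
```

For brevity the delay timer is left to run out on its own; a production version might cancel it once the work completes.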
---
## Phase 8 — Reconnect / Restart
### Task 8.1: RestartDriver + ReconnectDriver messages + AdminOperationsActor handlers
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 8.2 (separate files)
**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/RestartDriver.cs`
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/ReconnectDriver.cs`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs`
**Step 1:** Messages: `RestartDriver(string ClusterId, string DriverInstanceId, string ActorByUserName, Guid CorrelationId)` + `RestartDriverResult(bool Ok, string? Message, Guid CorrelationId)`. Same shape for `Reconnect*`.
**Step 2:** Handlers in the actor. They locate the running driver actor (the existing `DriverHostActor` hierarchy already addresses driver actors by instance ID — find the existing lookup mechanism in `DriverHostActor.cs` / `Runtime` and reuse it). Reconnect = Tell the driver actor a `Reconnect` internal command; Restart = Tell its supervisor to stop+restart the child.
**Step 3:** Audit-log every call via the existing `ConfigEdits` mechanism — entity type `DriverInstance`, fields `{op: restart|reconnect}`.
**Step 4:** Build clean. Commit. `feat(adminops): Restart/Reconnect driver operations`
### Task 8.2: DriverOperator authorization policy
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 8.1
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Security/` — add a `DriverOperator` policy alongside the existing `WriteOperate` / `WriteTune` policies. Map to LDAP group `ot-driver-operator` (or document the chosen group name in `docs/Security.md`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/` — register the policy with `AddAuthorizationBuilder().AddPolicy("DriverOperator", p => p.RequireRole("ot-driver-operator"))` (or whatever pattern the existing AdminUI policies use).
- Modify: `docs/Security.md` — add a row to the role/policy table.
**Step 1:** Mirror the shape of the most-similar existing policy (probably `WriteOperate`).
**Step 2:** Build clean. Commit. `feat(security): add DriverOperator authorization policy`
### Task 8.3: Wire Reconnect/Restart buttons into DriverStatusPanel
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (depends on 8.1 + 8.2 + 6.5)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/DriverStatusPanel.razor`
**Step 1:** Inject `IAuthorizationService`. In `OnInitializedAsync`, check `AuthorizeAsync(user, null, "DriverOperator")` → set `_canOperate` bool. Render buttons only when `_canOperate && Enabled`.
**Step 2:** Two buttons:
- **Reconnect** — no confirm. Click: spinner on button, set inline "Reconnecting…" chip, invoke `AdminOperationsClient.AskAsync<ReconnectDriverResult>(new ReconnectDriver(...))`. Result chip clears once next `DriverHealthChanged` push arrives.
- **Restart** — confirm dialog "Restart driver `<id>`? This briefly interrupts subscriptions." Same flow otherwise.
**Step 3:** Both buttons disabled (greyed-out) during in-flight ops or during a Test Connect on the same page (publish a simple page-scoped `bool _busyAnything` via the parent driver page → flows to the panel via a parameter).
**Step 4:** Build clean. Smoke test: Reconnect → see `Connecting → Healthy` transition push in the panel. Restart → confirm → see actor restart. Commit. `feat(adminui): Reconnect/Restart on DriverStatusPanel (DriverOperator-gated)`
---
## Phase 9 — Static address pickers
### Task 9.1: DriverTagPicker.razor modal shell
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** 9.2–9.10
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/DriverTagPicker.razor`
**Step 1:** Modal shell. Params: `Visible` (`bool`), `OnClose` (`EventCallback`), `Title` (string, e.g. "Modbus address"), `ChildContent` (`RenderFragment`), `OnPickAddress` (`EventCallback<string>`). Renders a Bootstrap-style `.modal.show` (no JS interop — Razor-managed visibility class). The child fragment is the per-driver picker body.
**Step 2:** Includes a search box + "Use this address" button at the bottom; "Use" calls `OnPickAddress` with the value currently bound in the child.
**Step 3:** Build clean. Commit. `feat(adminui): DriverTagPicker modal shell`
### Task 9.2–9.10: Per-driver static picker bodies (9 tasks)
**Classification:** small
**Estimated implement time:** ~3 min each
**Parallelizable with:** each other (different files)
For each driver, create `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Shared/Drivers/Pickers/<Type>AddressPickerBody.razor`. Per Section 6.2 of the design:
| Task | Driver | Picker body |
|---|---|---|
| 9.2 | Modbus | Register-type dropdown (Coil/DiscreteInput/Holding/Input) + offset spinner + length → renders `4x00001-4`. |
| 9.3 | AbCip | Tag name + element index; CompactLogix/ControlLogix hint from form. |
| 9.4 | AbLegacy | File type (N/B/F/I/O/S/T/C/R) + file number + element. |
| 9.5 | S7 | Area (DB/M/I/Q) + db-number + offset + S7 type → `DB10.DBD20:REAL`. |
| 9.6 | TwinCat | Free-text ADS variable name + format hint. |
| 9.7 | FOCAS | Parameter group dropdown + parameter ID; drives the FOCAS function-code lookup table. |
| 9.8 | OpcUaClient | Free-text NodeId field. (Live browse deferred — Section 10.) |
| 9.9 | Galaxy | Free-text `tag_name.AttributeName` field. (Live browse deferred.) |
| 9.10 | Historian.Wonderware | Tag name + retrieval mode + interval. |
Each task:
**Step 1:** Create the per-driver picker body component.
**Step 2:** Add a small unit test in `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/<Type>AddressBuilderTests.cs` — for the static address builders that compute a string (Modbus, S7, FOCAS), assert input → string output. For free-text bodies (OpcUaClient, Galaxy), test pass-through.
**Step 3:** Wire into the matching `*DriverPage.razor` — add a "Pick address" button that toggles `<DriverTagPicker>` open with this body as its child.
**Step 4:** Build clean. Commit per task. `feat(adminui): <Type> address picker`
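As an illustration of the "static address builder" tests in Step 2, here is a sketch of the S7 data-block case. Only the `DB10.DBD20:REAL` example comes from the table above; `S7AddressBuilder`, the supported type list, and the size-letter mapping (B / W / D, following common Siemens absolute addressing) are assumptions, and bit-level (`BOOL`) and non-DB areas are intentionally left out.

```csharp
using System;

// Hypothetical builder for S7 data-block addresses such as "DB10.DBD20:REAL".
public static class S7AddressBuilder
{
    // Assumed size letters: B = byte, W = 16-bit, D = 32-bit.
    private static string SizeLetter(string upperType) => upperType switch
    {
        "BYTE" => "B",
        "WORD" or "INT" => "W",
        "DWORD" or "DINT" or "REAL" => "D",
        _ => throw new ArgumentException($"Unsupported S7 type '{upperType}'", nameof(upperType)),
    };

    public static string BuildDb(int dbNumber, int byteOffset, string s7Type)
    {
        if (dbNumber < 1) throw new ArgumentOutOfRangeException(nameof(dbNumber));
        if (byteOffset < 0) throw new ArgumentOutOfRangeException(nameof(byteOffset));

        var type = s7Type.Trim().ToUpperInvariant();
        return $"DB{dbNumber}.DB{SizeLetter(type)}{byteOffset}:{type}";
    }
}
```

A matching `S7AddressBuilderTests` case would assert that `BuildDb(10, 20, "REAL")` yields the string shown in the table.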
---
## Phase 10 — End-to-end verification
### Task 10.1: DriverTestConnectE2eTests
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** 10.2, 10.3
**Files:**
- Create: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverTestConnectE2eTests.cs`
**Step 1:** Use the existing Docker fixture pattern from peer tests in this project. Three test methods: Modbus, AbCip, S7 — each starts the corresponding `lmxopcua-fix up <driver>` fixture (the test project's fixture base class handles it) + asserts green probe vs sim, red probe vs wrong port, timeout vs `1.2.3.4:502` (black-holed).
**Step 2:** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter DriverTestConnectE2eTests` — passes.
**Step 3:** Commit. `test(adminui): E2E Test Connect probes against Docker sims`
### Task 10.2: DriverReconnectE2eTests
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** 10.1, 10.3
**Files:**
- Create: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs`
**Step 1:** Start a Modbus driver against the sim, observe `Healthy`, dispatch `ReconnectDriver` via the in-cluster admin ops client, assert `Connecting → Healthy` transitions within 5s.
**Step 2:** Build + run. Commit. `test(adminui): E2E Reconnect operation`
### Task 10.3: DriverStatusHubE2eTests
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** 10.1, 10.2
**Files:**
- Create: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverStatusHubE2eTests.cs`
**Step 1:** Open a SignalR connection to `/hubs/driverstatus`, invoke `JoinDriver`, force a `DriverHealthChanged` via test seam (publish directly to the DPS topic), assert push received within 1s.
**Step 2:** Build + run. Commit. `test(adminui): E2E DriverStatusHub push`
### Task 10.4: Manual smoke checklist (documented, not automated)
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** none
**Files:**
- Modify: `docs/plans/2026-05-28-adminui-driver-pages-design.md` — replace Section 8.3 stub with the actual checklist as run, with timestamps.
**Step 1:** Run the checklist (Section 8.3 of the design). Tick each item.
**Step 2:** Commit. `docs(plans): record AdminUI driver pages smoke-test results`
---
## Out-of-scope (documented follow-ups)
These are NOT part of this plan. Capture as separate work items after merge:
- Live OPC UA browse in OpcUaClient picker.
- Live Galaxy hierarchy browse in Galaxy picker.
- Historian.Wonderware tag list pulled from the historian store.
- DriverStatusPanel history graphs + per-tag diagnostics.
- Per-driver bespoke controls beyond Reconnect/Restart.
- Polly resilience config typed-form (still a JSON textarea this PR).
---
## Cross-cutting verification (run before final PR)
1. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — clean.
2. `dotnet test ZB.MOM.WW.OtOpcUa.slnx` — clean.
3. `lmxopcua-fix up modbus`, then run the manual smoke (10.4).
4. Review `git diff --stat master..` — confirm scope matches plan (no surprise file changes).
5. Confirm `OtOpcUa-docs-issues.md` shows no new XML-doc warnings introduced by the new code (run `commentchecker-aot` on the AdminUI + Drivers/* trees).
@@ -0,0 +1,76 @@
{
"planPath": "docs/plans/2026-05-28-adminui-driver-pages-plan.md",
"designPath": "docs/plans/2026-05-28-adminui-driver-pages-design.md",
"tasks": [
{"id": "0.1", "subject": "Create AdminUI test project + slnx entry + placeholder test", "status": "completed", "commit": "dc12c37"},
{"id": "1.1", "subject": "Driver.Modbus.Contracts — extract ModbusDriverOptions", "status": "completed", "blockedBy": ["0.1"], "commit": "5058a56", "notes": "Has 1 ProjectReference to Modbus.Addressing (sibling zero-dep enum project) — design intent preserved."},
{"id": "1.2", "subject": "Driver.AbCip.Contracts — extract AbCipDriverOptions", "status": "completed", "blockedBy": ["0.1"], "commit": "b474d63", "notes": "AbCipDataType enum moved with Options; extensions split into runtime."},
{"id": "1.3", "subject": "Driver.AbLegacy.Contracts — extract AbLegacyDriverOptions", "status": "completed", "blockedBy": ["0.1"], "commit": "4902295", "notes": "AbLegacyDataType + AbLegacyPlcFamilyProfile also moved; extensions split."},
{"id": "1.4", "subject": "Driver.S7.Contracts — extract S7DriverOptions", "status": "completed", "blockedBy": ["0.1"], "commit": "9f62f2c", "notes": "Parallel S7CpuType enum (7 values) + S7CpuTypeMap in runtime; S7.Cli + 2 tests fixed for type change."},
{"id": "1.5", "subject": "Driver.TwinCAT.Contracts — extract TwinCATDriverOptions", "status": "completed", "blockedBy": ["0.1"], "commit": "a88721c", "notes": "TwinCATDataType enum moved; extensions split."},
{"id": "1.6", "subject": "Driver.FOCAS.Contracts — extract FocasDriverOptions", "status": "completed", "blockedBy": ["0.1"], "commit": "d892ab9", "notes": "FocasCncSeries + FocasDataType enums moved; extensions split."},
{"id": "1.7", "subject": "Driver.OpcUaClient.Contracts — extract OpcUaClientDriverOptions", "status": "completed", "blockedBy": ["0.1"], "commit": "5f0e048", "notes": "All 4 enums self-contained in options file; no NuGet types leaked."},
{"id": "1.8", "subject": "Driver.Galaxy.Contracts — extract GalaxyDriverOptions", "status": "completed", "blockedBy": ["0.1"], "commit": "5ffbc42", "notes": "Moved from Config/ subdir to contracts root; namespace preserved."},
{"id": "1.9", "subject": "Driver.Historian.Wonderware.Client.Contracts — extract options", "status": "completed", "blockedBy": ["0.1"], "commit": "8c0a320", "notes": "Pure record, primitives only."},
{"id": "1.10", "subject": "Add ProbeTimeoutSeconds to all 9 Options classes + slnx validation", "status": "completed", "blockedBy": ["1.1","1.2","1.3","1.4","1.5","1.6","1.7","1.8","1.9"], "commit": "f2f6eeb"},
{"id": "2.1", "subject": "DriverFormShell.razor", "status": "completed", "blockedBy": ["0.1"], "commit": "85af126"},
{"id": "2.2", "subject": "DriverIdentitySection.razor", "status": "completed", "blockedBy": ["0.1"], "commit": "1ff3875", "notes": "Bonus ValidationMessage tags added."},
{"id": "2.3", "subject": "DriverResilienceSection.razor", "status": "completed", "blockedBy": ["0.1"], "commit": "a008530"},
{"id": "2.4", "subject": "Wire shared sections into existing DriverEdit.razor", "status": "completed", "blockedBy": ["2.1","2.2","2.3"], "commit": "a28f4cd", "notes": "Net -74 lines; zero functional regression."},
{"id": "3.1", "subject": "DriverTypePicker.razor (route: /drivers/new)", "status": "completed", "blockedBy": ["2.4"], "commit": "c0ce5d0"},
{"id": "3.2", "subject": "DriverEditRouter.razor with DynamicComponent dispatch","status": "completed", "blockedBy": ["2.4"], "commit": "55e8bf7"},
{"id": "3.3", "subject": "Hand /drivers/new from DriverEdit to DriverTypePicker","status": "completed", "blockedBy": ["3.1"], "commit": "27b3a01", "notes": "Bundled with 3.4 — single commit removed both @page directives."},
{"id": "3.4", "subject": "Hand /drivers/{id} from DriverEdit to DriverEditRouter (fallback to DriverEdit)", "status": "completed", "blockedBy": ["3.2","3.3"], "commit": "27b3a01"},
{"id": "4.0", "subject": "AdminUI csproj references all 9 Driver.*.Contracts", "status": "completed", "blockedBy": ["1.10","3.4"], "commit": "7014c93", "notes": "Inserted as a precondition for parallel 4.1-4.9 implementation."},
{"id": "4.1", "subject": "ModbusDriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "a3073d1"},
{"id": "4.2", "subject": "AbCipDriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "dc21cba"},
{"id": "4.3", "subject": "AbLegacyDriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "059a621"},
{"id": "4.4", "subject": "S7DriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "5cad9b2"},
{"id": "4.5", "subject": "TwinCatDriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "dfbf679"},
{"id": "4.6", "subject": "FocasDriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "8149739"},
{"id": "4.7", "subject": "OpcUaClientDriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "efcc231"},
{"id": "4.8", "subject": "GalaxyDriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "a243cfd"},
{"id": "4.9", "subject": "HistorianWonderwareDriverPage.razor + serialization test", "status": "completed", "blockedBy": ["4.0"], "commit": "2c16062"},
{"id": "4.10","subject": "Wire all 9 typed pages into DriverEditRouter._componentMap", "status": "completed", "blockedBy": ["4.1","4.2","4.3","4.4","4.5","4.6","4.7","4.8","4.9"], "commit": "5f8fa70"},
{"id": "4.11","subject": "Fixup: S7 Tags data-loss + missing FormModel tests (post-review)", "status": "completed", "blockedBy": ["4.10"], "commit": "c4086c2"},
{"id": "5.1", "subject": "Delete DriverEdit.razor + remove fallback in DriverEditRouter", "status": "completed", "blockedBy": ["4.1","4.2","4.3","4.4","4.5","4.6","4.7","4.8","4.9"], "commit": "a971db3"},
{"id": "6.1", "subject": "DriverHealthChanged DPS message contract", "status": "pending", "blockedBy": ["5.1"]},
{"id": "6.2", "subject": "Publish DriverHealthChanged from each driver actor (IDriverHealthPublisher)", "status": "pending", "blockedBy": ["6.1"]},
{"id": "6.3", "subject": "DriverStatusHub", "status": "pending", "blockedBy": ["6.1"]},
{"id": "6.4", "subject": "DriverStatusSignalRBridge + InMemoryDriverStatusSnapshotStore", "status": "pending", "blockedBy": ["6.2","6.3"]},
{"id": "6.5", "subject": "DriverStatusPanel.razor + wire into all 9 driver pages", "status": "pending", "blockedBy": ["6.4"]},
{"id": "7.1", "subject": "IDriverProbe interface + TestDriverConnect messages", "status": "pending", "blockedBy": ["5.1"]},
{"id": "7.2", "subject": "AdminOperationsActor handler for TestDriverConnect", "status": "pending", "blockedBy": ["7.1"]},
{"id": "7.3", "subject": "TCP probes (Modbus, AbCip, AbLegacy, S7)", "status": "pending", "blockedBy": ["7.1"]},
{"id": "7.4", "subject": "Specialty probes (FOCAS, TwinCAT, OPCUA, Galaxy, Historian)", "status": "pending", "blockedBy": ["7.1"]},
{"id": "7.5", "subject": "AdminProbeService + DriverTestConnectButton.razor + wire into pages", "status": "pending", "blockedBy": ["7.2","7.3","7.4"]},
{"id": "8.1", "subject": "RestartDriver + ReconnectDriver messages + AdminOperationsActor handlers", "status": "pending", "blockedBy": ["6.5","7.5"]},
{"id": "8.2", "subject": "DriverOperator authorization policy + docs/Security.md update", "status": "pending", "blockedBy": ["6.5"]},
{"id": "8.3", "subject": "Wire Reconnect/Restart buttons into DriverStatusPanel", "status": "pending", "blockedBy": ["8.1","8.2"]},
{"id": "9.1", "subject": "DriverTagPicker.razor modal shell", "status": "pending", "blockedBy": ["5.1"]},
{"id": "9.2", "subject": "Modbus address picker body + unit test", "status": "pending", "blockedBy": ["9.1"]},
{"id": "9.3", "subject": "AbCip address picker body + unit test", "status": "pending", "blockedBy": ["9.1"]},
{"id": "9.4", "subject": "AbLegacy address picker body + unit test", "status": "pending", "blockedBy": ["9.1"]},
{"id": "9.5", "subject": "S7 address picker body + unit test", "status": "pending", "blockedBy": ["9.1"]},
{"id": "9.6", "subject": "TwinCat address picker body + unit test", "status": "pending", "blockedBy": ["9.1"]},
{"id": "9.7", "subject": "FOCAS address picker body + unit test", "status": "pending", "blockedBy": ["9.1"]},
{"id": "9.8", "subject": "OpcUaClient picker body (free-text NodeId)", "status": "pending", "blockedBy": ["9.1"]},
{"id": "9.9", "subject": "Galaxy picker body (free-text tag_name.AttributeName)", "status": "pending", "blockedBy": ["9.1"]},
{"id": "9.10","subject": "Historian.Wonderware picker body + unit test", "status": "pending", "blockedBy": ["9.1"]},
{"id": "10.1", "subject": "DriverTestConnectE2eTests (Modbus/AbCip/S7 vs Docker sims)", "status": "pending", "blockedBy": ["8.3","9.10"]},
{"id": "10.2", "subject": "DriverReconnectE2eTests", "status": "pending", "blockedBy": ["8.3","9.10"]},
{"id": "10.3", "subject": "DriverStatusHubE2eTests", "status": "pending", "blockedBy": ["8.3","9.10"]},
{"id": "10.4", "subject": "Manual smoke checklist (documented)", "status": "pending", "blockedBy": ["10.1","10.2","10.3"]}
],
"lastUpdated": "2026-05-28"
}
@@ -0,0 +1,313 @@
# Live address browsers for OpcUaClient + Galaxy drivers — design
> **Status:** approved 2026-05-28. Implementation plan to follow via `writing-plans`.
> **Builds on:** PR that shipped driver-specific AdminUI pages (commit `0d3ec46`).
> Both `OpcUaClientAddressPickerBody.razor` and `GalaxyAddressPickerBody.razor` were
> intentionally shipped as static stubs ("enter the string manually") with live
> browse deferred to this follow-up.
**Goal:** Add lazy, ad-hoc browse trees to the OpcUaClient and Galaxy address pickers in the AdminUI, so operators can navigate the remote server's (or galaxy's) hierarchy and pick an address rather than typing it.
**Architecture:** A new `IDriverBrowser` abstraction registered per driver type (parallel to the runtime's `IDriverProbe`), with implementations housed in sibling `*.Browser` projects under `src/Drivers/`. AdminUI owns the live browse sessions in-process via a `BrowseSessionRegistry` singleton with a 2-minute idle TTL and an `IHostedService` reaper. Razor picker bodies talk to a scoped `IBrowserSessionService`; no actor messages on the hot path.
**Tech stack:** .NET 10 / Blazor Server / OPCFoundation.NetStandard.Opc.Ua.Client / `ZB.MOM.WW.MxGateway.Client` (sibling repo, lazy-browse API already shipped).
---
## 1. Architecture
### Abstraction
```csharp
// Commons (shared)
public interface IDriverBrowser {
string DriverType { get; } // "OpcUaClient", "Galaxy", ...
Task<IBrowseSession> OpenAsync(string configJson, CancellationToken ct);
}
public interface IBrowseSession : IAsyncDisposable {
Guid Token { get; }
DateTime LastUsedUtc { get; }
Task<IReadOnlyList<BrowseNode>> RootAsync(CancellationToken ct);
Task<IReadOnlyList<BrowseNode>> ExpandAsync(string nodeId, CancellationToken ct);
Task<IReadOnlyList<AttributeInfo>> AttributesAsync(string nodeId, CancellationToken ct); // empty for OPC UA
}
public sealed record BrowseNode(
string NodeId, // address persisted on commit
string DisplayName,
BrowseNodeKind Kind, // Folder | Leaf
bool HasChildrenHint);
public sealed record AttributeInfo(
string Name, // e.g. "DownloadPath"
string DriverDataType,
bool IsArray,
string SecurityClass); // FreeAccess | Operate | Tune | Configure | ViewOnly
public enum BrowseNodeKind { Folder, Leaf }
```
### Session lifecycle
1. Razor picker body calls `BrowserSessionService.OpenAsync(driverType, formJson)`
2. Service resolves `IDriverBrowser` from DI by driver type, calls `OpenAsync(json)`
3. Returns `IBrowseSession`; service registers it in `BrowseSessionRegistry` under a new `Guid` token
4. Razor stores token, calls `RootAsync(token)` to populate the initial tree
5. Each subsequent expand-click calls `ExpandAsync(token, nodeId)`
6. Picker body's `IAsyncDisposable.DisposeAsync` fires `CloseAsync(token)` on tear-down
7. `BrowseSessionReaper` (`IHostedService`) ticks every 30s, evicts any session where `(UtcNow - LastUsedUtc) > 2 min`, awaits `DisposeAsync`
The session genuinely has no value to other cluster nodes — it's tied to one circuit. Hosting it in-process avoids cross-cluster Ask latency on every folder click.
---
## 2. Components
### New projects
| Path | Purpose |
|---|---|
| `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser/` | OPC UA browser impl + session |
| `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browser/` | Galaxy browser impl + session |
| `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Browser.Tests/` | Unit tests (use opc-plc fixture) |
| `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browser.Tests/` | Unit tests (fake transport) |
Driver-specific browsers live in **sibling** projects so AdminUI doesn't drag the runtime `Driver.*` projects (and their full SDK chains) through a transitive reference.
### New abstractions
| Path | Purpose |
|---|---|
| `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Browsing/IDriverBrowser.cs` | Per-driver factory |
| `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Browsing/IBrowseSession.cs` | Session contract |
| `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Browsing/BrowseNode.cs` | + `BrowseNodeKind` enum + `AttributeInfo` |
### AdminUI plumbing
| Path | Purpose |
|---|---|
| `src/Server/.../AdminUI/Browsing/BrowseSessionRegistry.cs` | Singleton, `ConcurrentDictionary<Guid, IBrowseSession>` |
| `src/Server/.../AdminUI/Browsing/BrowseSessionReaper.cs` | `IHostedService`, 30s tick, 2 min idle TTL |
| `src/Server/.../AdminUI/Browsing/IBrowserSessionService.cs` | Scoped DI service for Razor |
| `src/Server/.../AdminUI/Browsing/BrowserSessionService.cs` | Impl: resolve driver, register session, enforce per-call timeouts |
| `src/Server/.../AdminUI/Components/Shared/Drivers/DriverBrowseTree.razor` | Shared lazy tree component with per-node text filter |
### Modified files
| Path | Change |
|---|---|
| `src/Server/.../Pickers/OpcUaClientAddressPickerBody.razor` | Add Browse button + DriverBrowseTree; keep manual entry |
| `src/Server/.../Pickers/GalaxyAddressPickerBody.razor` | Same shape + side-panel for attribute pick |
| `src/Server/.../AdminUI/Program.cs` | Register `IDriverBrowser` services + registry + reaper |
| `src/Drivers/.../OpcUaClient.Contracts/NamespaceMap.cs` | Extract from runtime `Driver.OpcUaClient` for shared use |
| `ZB.MOM.WW.OtOpcUa.slnx` | Add the four new projects |
---
## 3. Data flow
**Open → tree → pick** (OpcUaClient as worked example; Galaxy identical except attribute side-panel before commit):
```
Razor picker body BrowserSessionService IDriverBrowser Remote
| | | |
click Browse ────────► OpenAsync(driverType, json) ─► OpenAsync(json) ────────► connect + activate session
| ◄──────────────── token (Guid) ◄───── ISession |
| | | |
render tree ─────────► RootAsync(token) ─────────────► session.RootAsync ─────► BrowseAsync(ObjectsFolder)
| ◄──────────────── BrowseNode[] ◄───── refs |
| | | |
click folder ────────► ExpandAsync(token, nodeId) ──► session.ExpandAsync ───► BrowseAsync(nodeId)
| ◄──────────────── BrowseNode[] ◄───── refs |
| | | |
click leaf + commit ─► CloseAsync(token) ─────────► session.DisposeAsync ───► CloseSession
| | | |
```
**Galaxy two-stage attribute pick:** after the user selects an object (Folder) in the tree, the picker body calls `AttributesAsync(token, tagName)` and renders the result as a side-panel. The user picks an attribute; the committed address is `tag_name.AttributeName`.
**Stable address format:**
- OpcUaClient: `nsu=<uri>;<localid>` via `NamespaceMap.ToStableReference` — survives remote namespace-table reorder across restarts
- Galaxy: `tag_name` (the globally unique system name) — already stable by definition
**Per-node text filter:** purely client-side over the already-loaded `node.Children`. No round-trip on filter input.
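
The stable OPC UA address format described above can be sketched as a pair of conversions between index-based and URI-based NodeId strings. This is a minimal sketch under stated assumptions: `NamespaceMapSketch` is a hypothetical stand-in, the real `NamespaceMap` signatures are not shown in this design, and the parsing ignores escaping of `;` inside namespace URIs.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for NamespaceMap; list position == namespace index on the live session.
public sealed class NamespaceMapSketch
{
    private readonly IReadOnlyList<string> _uris;

    public NamespaceMapSketch(IReadOnlyList<string> namespaceUris) => _uris = namespaceUris;

    // Converts "ns=<index>;<localid>" into "nsu=<uri>;<localid>".
    public string ToStableReference(string indexedNodeId)
    {
        // Ids without an "ns=" prefix live in namespace 0 and are already stable.
        if (!indexedNodeId.StartsWith("ns=", StringComparison.Ordinal)) return indexedNodeId;

        int semi = indexedNodeId.IndexOf(';');
        if (semi < 0) throw new FormatException($"Malformed NodeId '{indexedNodeId}'.");

        int index = int.Parse(indexedNodeId.Substring(3, semi - 3), CultureInfo.InvariantCulture);
        if (index < 0 || index >= _uris.Count) throw new ArgumentOutOfRangeException(nameof(indexedNodeId));

        return $"nsu={_uris[index]};{indexedNodeId.Substring(semi + 1)}";
    }

    // Converts "nsu=<uri>;<localid>" back to an index on THIS session's namespace table.
    public bool TryResolve(string stableReference, out string indexedNodeId)
    {
        indexedNodeId = stableReference;
        if (!stableReference.StartsWith("nsu=", StringComparison.Ordinal)) return true;

        int semi = stableReference.IndexOf(';');
        if (semi < 0) return false;

        string uri = stableReference.Substring(4, semi - 4);
        for (int i = 0; i < _uris.Count; i++)
        {
            if (_uris[i] == uri)
            {
                indexedNodeId = $"ns={i};{stableReference.Substring(semi + 1)}";
                return true;
            }
        }
        return false; // URI not present on this server
    }
}
```

The point of the round trip is that a reference stored while the URI sat at one index still resolves after the remote server reorders its namespace table, because only the URI is persisted.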
---
## 4. OpcUaClient browser specifics
### Connection
- Reuses `OpcUaClientDriverOptions` (deserialize with `JsonUnmappedMemberHandling.Skip`)
- Builds a **separate** `ApplicationConfiguration` from the runtime driver — PKI root at `%LocalAppData%/OtOpcUa/adminui-browse-pki/` (separate cert store)
- `ApplicationName = "OtOpcUa AdminUI Browse"`, `ApplicationUri = "urn:OtOpcUa:AdminUI:Browse"`
- Endpoint selection: same `DiscoveryClient.GetEndpointsAsync` → filter `(policy, mode)` as the runtime driver
- One endpoint only (no failover) — interactive use; user retries with different URL on failure
- Bounded by `OpcUaClientDriverOptions.PerEndpointConnectTimeout` (clamped [5, 30]s)
### Namespace map
- `NamespaceMap` class extracted to `OpcUaClient.Contracts` so both runtime and Browser projects share one impl
- Browser builds the map from the live session on open; uses `ToStableReference` for outbound NodeIds; uses `TryResolve` for inbound
### Lazy browse
- One level per click using `Session.BrowseAsync` + `BrowseNextAsync` continuation-point loop
- `BrowseDescriptionCollection` filters to `NodeClass.Object | NodeClass.Variable`, `ResultMask = BrowseName | DisplayName | NodeClass`
- `BrowseNode.HasChildrenHint = (Kind == Folder)` — heuristic; saves a per-node round-trip
- Inside-session calls guarded by `SemaphoreSlim _gate` (same pattern as runtime driver — OPC UA `Session.BrowseAsync` not thread-safe)
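
The gate-plus-continuation pattern in the bullets above can be sketched without committing to SDK signatures. In this sketch the two delegates stand in for the first browse call and the browse-next call; `BrowsePage`, `GatedPager`, and both delegate shapes are hypothetical names chosen for illustration, not the OPC UA SDK's API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical page shape: one batch of results plus an opaque continuation point (null when done).
public sealed record BrowsePage(IReadOnlyList<string> Items, byte[]? ContinuationPoint);

public sealed class GatedPager
{
    // One caller at a time, because the underlying session is assumed not to be thread-safe.
    private readonly SemaphoreSlim _gate = new(1, 1);

    public async Task<IReadOnlyList<string>> BrowseAllAsync(
        Func<CancellationToken, Task<BrowsePage>> first,
        Func<byte[], CancellationToken, Task<BrowsePage>> next,
        CancellationToken ct)
    {
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            var all = new List<string>();
            BrowsePage page = await first(ct).ConfigureAwait(false);
            all.AddRange(page.Items);

            // Keep following continuation points until the server reports none.
            while (page.ContinuationPoint is { Length: > 0 } cp)
            {
                page = await next(cp, ct).ConfigureAwait(false);
                all.AddRange(page.Items);
            }
            return all;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Holding the gate for the whole loop, rather than per page, keeps a second expand from interleaving its own continuation points with the first.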
### Cert handling
- `AutoAcceptCertificates = true` honored with parity to runtime + log warning + per-session unwire on dispose
- `AutoAcceptCertificates = false` + untrusted cert → `OpenAsync` fails with SDK error message in the UI
### Reconnect handling
- None. Browse sessions are short-lived (2 min idle TTL). Keep-alive failure → UI surfaces error chip → user re-clicks Browse.
---
## 5. Galaxy browser specifics
### Connection
- Reuses `GalaxyDriverOptions` (deserialize with `JsonUnmappedMemberHandling.Skip`)
- Opens `MxGatewaySession` with `ClientName = "OtOpcUa-AdminUI-Browse"` — distinct from runtime driver's name so the gateway can attribute load
- Per-call gateway client built via `session.GalaxyRepository(opts.GalaxyName)`
### Lazy browse
- Root: `client.BrowseAsync(new BrowseChildrenOptions(), ct)` → `IReadOnlyList<LazyBrowseNode>`
- Expand: cached `LazyBrowseNode` lookup by `tag_name`, then `node.ExpandAsync(ct)` (gateway client handles paging internally)
- No internal gate — `LazyBrowseNode.ExpandAsync` already has its own lock; gateway client is thread-safe across distinct calls
### Two-stage attribute pick
- Galaxy `BrowseNode.Kind` is always `Folder` — leaves don't exist at tree level
- When the user clicks an object node, picker body calls `AttributesAsync(token, tagName)` and shows the result as a side-panel listing `(Name, DriverDataType, IsArray, SecurityClass)`
- On attribute click, committed address is `$"{tagName}.{attrName}"`
- Backing call: either `BrowseChildrenOptions { IncludeAttributes = true }` filtered to the GobjectId, or a dedicated `GetAttributesAsync(GobjectId, ct)` — to be confirmed during plan write against the gateway client surface
### Filters in v1
- Per-node text filter (client-side) for tree navigation
- Server-side filters (`TagNameGlob`, `AlarmBearingOnly`, `HistorizedOnly`) deferred to a follow-up — easy to add later without breaking the wire (the session is constructed today with `new BrowseChildrenOptions()`)
---
## 6. Error handling, timeouts, TTL
### Failures
- `OpenAsync` → catches `Exception`, logs Info, returns typed `BrowseOpenResult(Ok: false, Message, Token: Empty)`. UI shows red chip with truncated SDK message
- `ExpandAsync` / `AttributesAsync` → same shape per-call. Failed branch shows error chip; rest of tree intact; session stays alive
- `BrowseSessionNotFoundException` when token unknown (session reaped or never existed)
### Timeouts
- Per-call expand/attributes: **20 s** via `CancellationTokenSource.CreateLinkedTokenSource(callerCt)` in `BrowserSessionService`
- Session open: **30 s** ceiling; OPC UA reuses `PerEndpointConnectTimeout` (default 10 s), Galaxy hardcodes 30 s for `MxGatewaySession.OpenAsync`
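
A per-call timeout of this kind is commonly built from a linked token source plus `CancelAfter`. The helper below is a minimal sketch under that assumption. `CallTimeout.RunAsync` is a hypothetical name, and translating the cancellation into `TimeoutException` is one possible choice rather than something this design specifies.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: wraps one browse call in a timeout that still honors the caller's token.
public static class CallTimeout
{
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> call, TimeSpan timeout, CancellationToken callerCt)
    {
        // The linked source cancels when either the caller cancels or the timer fires.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerCt);
        cts.CancelAfter(timeout);
        try
        {
            return await call(cts.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (!callerCt.IsCancellationRequested)
        {
            // Only the timer can have fired here, so report it as a timeout rather than a cancel.
            throw new TimeoutException($"Browse call exceeded {timeout.TotalSeconds:0} s.");
        }
    }
}
```

A caller would pass `TimeSpan.FromSeconds(20)` for expand and attribute calls. Caller-initiated cancellation still surfaces as `OperationCanceledException`, so the UI can tell "user navigated away" apart from "remote hung".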
### TTL & reaping
- `LastUsedUtc` set on every `RootAsync`/`ExpandAsync`/`AttributesAsync`
- Reaper: `IHostedService` with `PeriodicTimer(30s)`. On each tick: snapshot keys; for any session with `(UtcNow - LastUsedUtc) > 120s`: `TryRemove` then `await DisposeAsync` outside the dictionary
- Concurrent `ExpandAsync` racing eviction → caller catches closed-session error → service translates to `BrowseSessionNotFoundException`
- On AdminUI shutdown: `StopAsync` walks the registry once and disposes all sessions
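
One reaper tick, as described in the bullets above, might look like the following. This is a sketch, not the planned implementation: `IIdleTracked`, `ReaperSweep`, and `StubSession` are hypothetical names, and the clock is passed in as a parameter so the sweep can be exercised with virtual time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical minimal session shape: only what the sweep needs.
public interface IIdleTracked : IAsyncDisposable
{
    DateTime LastUsedUtc { get; }
}

public static class ReaperSweep
{
    // Runs one tick: snapshot the registry, evict idle sessions, dispose them outside the dictionary.
    public static async Task<int> SweepAsync(
        ConcurrentDictionary<Guid, IIdleTracked> sessions, DateTime nowUtc, TimeSpan idleTtl)
    {
        int evicted = 0;
        foreach (var pair in sessions.ToArray()) // ToArray takes a point-in-time snapshot
        {
            if (nowUtc - pair.Value.LastUsedUtc <= idleTtl) continue;

            // TryRemove decides the winner if another thread closes the same session concurrently.
            if (!sessions.TryRemove(pair.Key, out var session)) continue;

            await session.DisposeAsync().ConfigureAwait(false);
            evicted++;
        }
        return evicted;
    }
}

// Test double used to exercise the sweep.
public sealed class StubSession : IIdleTracked
{
    public StubSession(DateTime lastUsedUtc) => LastUsedUtc = lastUsedUtc;
    public DateTime LastUsedUtc { get; }
    public bool Disposed { get; private set; }

    public ValueTask DisposeAsync()
    {
        Disposed = true;
        return ValueTask.CompletedTask;
    }
}
```

Removing before disposing means a racing `ExpandAsync` either finds the session and runs against it, or fails the lookup. It never receives a session that is already gone from the registry but not yet disposed.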
### Concurrency
- `BrowseSessionRegistry` = `ConcurrentDictionary<Guid, IBrowseSession>` — no extra lock
- OpcUaClient session serializes browse on `SemaphoreSlim`; Galaxy session relies on its internal locks
### Component dispose
- Razor picker body implements `IAsyncDisposable`
- Fires `CloseAsync(token)` fire-and-forget (no await) so circuit teardown isn't blocked by a gRPC roundtrip
- Reaper is the safety net if dispose doesn't fire
### Logging
- Serilog. Info at open + close, Debug at close-with-reason (`user-close | idle-ttl | shutdown`), Info on failure
- No per-expand logging (noise)
### Audit trail
- None — browse is read-only and doesn't mutate config or driver state (matches probe pattern)
---
## 7. Security & auth
### Role gating
- Browse button gated by existing `DriverOperator` LDAP policy — same as Reconnect/Restart in `DriverStatusPanel`
- Picker bodies check policy in `OnInitializedAsync` via `IAuthorizationService` and `AuthenticationStateProvider`
- Manual entry stays available regardless of role
### Credentials in JSON
- Form JSON posted to `BrowserSessionService.OpenAsync` contains plaintext passwords / API keys — same as the existing `TestDriverConnect` probe
- JSON is deserialized into typed Options → used to build SDK config → both released; no `_lastConfigJson` cached field anywhere in the registry or session impls
- Browse session tokens are `Guid.NewGuid()` and only ever cross the authenticated Blazor circuit
### Cert handling
- `AutoAcceptCertificates = true` honored with log warning + per-session unwire on dispose
- Browse PKI store separate from runtime PKI — browse-time accept doesn't poison the runtime driver's trust store
### Rate limiting
- None. DriverOperator role gating + 2-minute TTL is the budget. A bad actor with DriverOperator already has Reconnect/Restart capability
### Multi-replica AdminUI
- Sticky cookies (already configured via Traefik) pin a user to one replica → `BrowseSessionRegistry` is always co-located with the circuit that created the token
- Failover → token invalid on new replica → UI re-opens gracefully
---
## 8. Testing
### Unit tests — per-driver browsers
- `tests/Drivers/.../OpcUaClient.Browser.Tests/`: against opc-plc at `opc.tcp://10.100.0.35:50000`. `OpcUaClientBrowseSessionTests`, `OpcUaClientDriverBrowserTests` (bad endpoint, auth rejected, bad JSON)
- `tests/Drivers/.../Galaxy.Browser.Tests/`: fake `IGalaxyRepositoryClientTransport` (precedent in gateway-client repo). `GalaxyBrowseSessionTests`, `GalaxyDriverBrowserTests`
### Unit tests — AdminUI plumbing (added to existing `tests/Server/AdminUI.Tests/`)
- `BrowseSessionRegistryTests`: register/get/remove, concurrent registration
- `BrowseSessionReaperTests`: virtual time, idle eviction, non-idle preservation, eviction-vs-in-flight-expand race
- `BrowserSessionServiceTests`: open→root→expand→close, unknown driver type, per-call timeout enforced
### Component tests
- `DriverBrowseTree` lazy-expand contract with fake `IBrowserSessionService`; per-node filter filters DOM but does not call ExpandAsync; click caching
- Picker bodies: Browse button hidden when `!_canOperate`; manual entry still works
### Integration tests (opt-in, fixture-gated)
- `tests/Drivers/.../OpcUaClient.Browser.IntegrationTests/`: end-to-end against opc-plc, 3-level expand + round-trip resolve. Skipped unless `OPCUA_SIM_ENDPOINT` set
- No Galaxy integration suite in v1 (requires wonder-app-vd03; deferred)
### Specific regression tests
- Namespace-stable round-trip: open → browse → take returned NodeId string → `ExpandAsync(string)` → must resolve back to same NodeId
- TTL reaper racing live ExpandAsync: `TryRemove` while expand is in-flight → safe, translates to `BrowseSessionNotFoundException`
### Verification at PR time
- `dotnet build ZB.MOM.WW.OtOpcUa.slnx` clean
- `dotnet test tests/Server/.../AdminUI.Tests/` green (existing 51 + new ~12)
- `dotnet test tests/Drivers/.../OpcUaClient.Browser.Tests/` with `lmxopcua-fix up opcuaclient`
- `dotnet test tests/Drivers/.../Galaxy.Browser.Tests/` (no fixture)
- Manual smoke: run AdminUI, edit an OpcUaClient driver, click Browse against opc-plc, pick a variable, verify the stored NodeId reads cleanly via Client CLI
---
## 9. Implementation sequencing (for plan-writing)
Suggested phase split — each phase shippable + reviewable independently:
1. **Phase 1 — Abstractions.** Add `IDriverBrowser`, `IBrowseSession`, `BrowseNode`, `AttributeInfo`, `BrowseNodeKind` to Commons. Empty build.
2. **Phase 2 — Extract NamespaceMap.** Move from runtime `Driver.OpcUaClient` to `Driver.OpcUaClient.Contracts`; update runtime ref.
3. **Phase 3 — OpcUaClient browser.** New `Driver.OpcUaClient.Browser` project; impl + unit tests against opc-plc.
4. **Phase 4 — Galaxy browser.** New `Driver.Galaxy.Browser` project; impl + unit tests with fake transport. Confirm attribute-fetch API surface on `GalaxyRepositoryClient`.
5. **Phase 5 — AdminUI plumbing.** `BrowseSessionRegistry`, `BrowseSessionReaper`, `BrowserSessionService`, DI wire-up in `Program.cs`. Unit tests.
6. **Phase 6 — Shared `DriverBrowseTree.razor`.** Lazy tree component with per-node filter. Component tests with fake service.
7. **Phase 7 — Wire pickers.** Update `OpcUaClientAddressPickerBody.razor` and `GalaxyAddressPickerBody.razor` to use `DriverBrowseTree` + DriverOperator gating + (Galaxy) attribute side-panel. Manual smoke test.
8. **Phase 8 — Integration test + docs.** Opt-in opc-plc integration suite, design doc cross-references in `docs/`, `CLAUDE.md` (or `docs/security.md`) updates if needed.
---
## Decisions table
| # | Decision | Rationale |
|---|---|---|
| 1 | Ad-hoc browse using form JSON | Mirrors `TestDriverConnect` probe; works for new drafts and existing drivers uniformly |
| 2 | Tree + lazy load both drivers | Galaxy gateway just shipped `LazyBrowseNode.ExpandAsync` — symmetric UX possible |
| 3 | AdminUI-hosted via `IDriverBrowser` factory | Browse is interactive (≥10 calls/session); cross-cluster Ask hop would multiply latency; session has no value to other nodes |
| 4 | Sibling `*.Browser` projects | Keep AdminUI from pulling runtime `Driver.*` projects' SDK chains |
| 5 | `NamespaceMap` to `OpcUaClient.Contracts` | Shared between runtime + browser, no new project needed |
| 6 | Separate browse PKI store | Browse-time cert accept must not poison runtime driver's trust store |
| 7 | Per-node client-side text filter (v1) | Quick UX win; server-side filters deferred |
| 8 | 2 min idle TTL, 30s reaper tick | Matches typical user cadence; bounds resource exposure |
| 9 | 20 s per-call / 30 s open timeouts | Interactive feel; longer hangs almost always mean broken remote |
| 10 | DriverOperator role gating | Live remote connection is operationally privileged; matches Reconnect/Restart precedent |
| 11 | No audit trail | Browse is read-only; matches probe pattern |
| 12 | Galaxy two-stage attribute side-panel | One modal, no extra clicks vs. two-modal flow |
@@ -0,0 +1,24 @@
{
"planPath": "docs/plans/2026-05-28-driver-browsers-plan.md",
"tasks": [
{"id": 1, "subject": "Task 1: Phase 1 — Add IDriverBrowser/IBrowseSession/BrowseNode to Commons", "status": "pending"},
{"id": 2, "subject": "Task 2: Phase 2 — Extract NamespaceMap to OpcUaClient.Contracts", "status": "pending", "blockedBy": [1]},
{"id": 3, "subject": "Task 3: Phase 3a — Scaffold Driver.OpcUaClient.Browser project", "status": "pending", "blockedBy": [2]},
{"id": 4, "subject": "Task 4: Phase 3b — Implement OpcUaClientBrowseSession", "status": "pending", "blockedBy": [3]},
{"id": 5, "subject": "Task 5: Phase 3c — Implement OpcUaClientDriverBrowser factory", "status": "pending", "blockedBy": [4]},
{"id": 6, "subject": "Task 6: Phase 3d — OpcUaClient.Browser tests (opc-plc fixture)", "status": "pending", "blockedBy": [5]},
{"id": 7, "subject": "Task 7: Phase 4a — Scaffold Driver.Galaxy.Browser project", "status": "pending", "blockedBy": [1]},
{"id": 8, "subject": "Task 8: Phase 4b — Implement GalaxyBrowseSession", "status": "pending", "blockedBy": [7]},
{"id": 9, "subject": "Task 9: Phase 4c — Implement GalaxyDriverBrowser factory", "status": "pending", "blockedBy": [8]},
{"id": 10, "subject": "Task 10: Phase 4d — Galaxy.Browser tests (fake transport)", "status": "pending", "blockedBy": [9]},
{"id": 11, "subject": "Task 11: Phase 5a — BrowseSessionRegistry + reaper + service", "status": "pending", "blockedBy": [1]},
{"id": 12, "subject": "Task 12: Phase 5b — Wire DI in AddAdminUI()", "status": "pending", "blockedBy": [5, 9, 11]},
{"id": 13, "subject": "Task 13: Phase 5c — Tests for registry, reaper, service", "status": "pending", "blockedBy": [11]},
{"id": 14, "subject": "Task 14: Phase 6 — Shared DriverBrowseTree.razor", "status": "pending", "blockedBy": [12]},
{"id": 15, "subject": "Task 15: Phase 7a — Wire OpcUaClient picker to browser", "status": "pending", "blockedBy": [14]},
{"id": 16, "subject": "Task 16: Phase 7b — Wire Galaxy picker + attribute side-panel", "status": "pending", "blockedBy": [14]},
{"id": 17, "subject": "Task 17: Phase 8a — opc-plc integration test", "status": "pending", "blockedBy": [6]},
{"id": 18, "subject": "Task 18: Phase 8b — Manual smoke + CLAUDE.md update", "status": "pending", "blockedBy": [13, 15, 16, 17]}
],
"lastUpdated": "2026-05-28T00:00:00Z"
}
@@ -0,0 +1,132 @@
# Design — Complete AdminUI deferred follow-ups
**Date:** 2026-05-29
**Status:** Approved (design); implementation plan to follow
**Author:** Joseph Doherty (with Claude Code)
## Background
The AdminUI carried a family of "deferred / Phase C.2 follow-up" notes. A prior
change stripped the stale *rendered roadmap banners* from the cluster list pages.
Three remaining note groups were investigated to decide what real work they hide:
- **Group 1 — driver-page inline notes** ("list-editor coming in a follow-up
phase" for tags/devices/endpoints; "typed-form-ifying Polly is a follow-up").
→ **Real pending UI work.**
- **Group 2 — RoleGrants** ("UI-driven editing of the mapping is deferred — it
implies a config-reload mechanism that doesn't exist yet"). → **Real work; half
the infra already exists.**
- **Group 3 — source comments** (F15 Razor migration, F16 FleetStatusHub bridge,
"Phase 4" identity section, `TODO(3.3/3.4)` route collision). → **~90% stale**;
the referenced work already shipped (the F16 bridge is wired; the legacy
`DriverEdit.razor` no longer exists). Only the Polly typed form is real, and it
is already counted in Group 1.
### Key facts established during exploration
- **Driver-embedded tag/device lists in `DriverConfig` JSON are the runtime source
of truth.** Driver factories deserialize them and poll exactly those rows; the
canonical `Tag` table is orthogonal (OPC UA browse-tree only, never read by
drivers). So inline editors are meaningful, not redundant — editing them changes
what the driver polls on the next publish/reinitialize.
- **Resilience** already has a strongly-typed model: `DriverResilienceOptions`
(`BulkheadMaxConcurrent`, `BulkheadMaxQueue`, `RecycleIntervalSeconds`,
`CapabilityPolicies: {DriverCapability → (TimeoutSeconds, RetryCount,
BreakerFailureThreshold)}`) with tier A/B/C defaults via `GetTierDefaults(tier)`
and a `DriverResilienceOptionsParser`. The stored JSON is an *override* shape;
null/absent keys fall back to tier defaults.
- **LDAP role map**: the `LdapGroupRoleMapping` entity + migration +
`ILdapGroupRoleMappingService` (CRUD) already exist but are **not wired** into
login. `LdapAuthService` still reads the static appsettings `GroupToRole`
(`Dictionary<string,string>`). `RoleGrants.razor` is read-only.
- **Testing**: no bUnit. Established pattern = test `FromOptions`/`ToOptions`
round-trips (xUnit + Shouldly in `AdminUI.Tests`) and services with in-memory EF
(`Configuration.Tests`).
## Decisions
- **Scope:** full build — all real follow-ups in Groups 1 & 2, plus Group 3
comment cleanup.
- **List-editor UX:** modal-per-row with a shared shell component.
- **LDAP reload semantics:** DB-backed, **live on the user's next sign-in**
(per-login DB query; no restart, no new infra). appsettings `GroupToRole` becomes
a bootstrap **fallback** layer.
- **Roles are GLOBAL.** No cluster-level permissions / no per-cluster enforcement
(explicitly chosen for simplicity, reversing an earlier cluster-scoping answer).
Every `LdapGroupRoleMapping` row is `IsSystemWide=true`, `ClusterId=null`.
## Workstreams
### WS1 — Driver collection editors (modal-per-row + shared shell)
- New generic `CollectionEditor<TRow>` component in `Components/Shared/Drivers/`:
compact read-only table + `[+ Add]` / per-row `Edit` / `Delete`, and a Bootstrap
modal editing a **working copy** of a row (commit on modal-Save, discard on
Cancel). Parameters: `List<TRow> Items` (bound), header fragment, read-only-cells
fragment, modal-body fragment, `NewRow` factory, optional `Validate` delegate.
- Each driver page swaps its read-only `<pre>` for a `CollectionEditor` supplying
its own columns + modal fields. Edits mutate the in-memory `List<T>` already in
the page's `FormModel`; the page's existing **Save** serializes it into
`DriverConfig` — no new persistence path.
- Coverage: tags (Modbus, AbCip, AbLegacy, TwinCAT, S7, FOCAS); devices (AbCip,
AbLegacy, TwinCAT, FOCAS); endpoints (OpcUaClient).
- **Errors/validation:** required fields, duplicate Name within list,
driver-specific address format; delete confirm; list mutates only on valid commit.
- **Testing:** per-driver `NewRow` factories + `Validate` methods unit-tested
directly; existing `*FormSerializationTests` extended for add/remove via the form
model. Modal interaction verified manually via `/run`.
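The working-copy behaviour described above can be sketched without any Blazor at all. This is a minimal illustration, not the planned component: `TagRow`, `RowEditSession<TRow>` and `TagRowRules` are hypothetical names, and the case-insensitive duplicate check is an assumption rather than a documented rule.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type; real driver rows carry driver-specific fields.
public sealed record TagRow(string Name, string Address);

// Hypothetical helper mirroring the modal's Save/Cancel semantics:
// the list is only touched when Commit() passes validation.
public sealed class RowEditSession<TRow>
{
    private readonly IList<TRow> _items;
    private readonly int _index; // -1 means "adding a new row"
    private readonly Func<TRow, IReadOnlyList<TRow>, string?> _validate;

    public TRow WorkingCopy { get; set; }

    public RowEditSession(IList<TRow> items, int index, TRow workingCopy,
        Func<TRow, IReadOnlyList<TRow>, string?> validate)
    {
        _items = items;
        _index = index;
        WorkingCopy = workingCopy;
        _validate = validate;
    }

    // Returns null on success, or a validation message; on error the list is unchanged.
    public string? Commit()
    {
        // Validate against every row except the one being edited.
        var others = new List<TRow>(_items);
        if (_index >= 0) others.RemoveAt(_index);

        var error = _validate(WorkingCopy, others);
        if (error is not null) return error;

        if (_index >= 0) _items[_index] = WorkingCopy;
        else _items.Add(WorkingCopy);
        return null;
    }
}

public static class TagRowRules
{
    // Assumed rules: Name required, no duplicate Name (compared case-insensitively).
    public static string? Validate(TagRow row, IReadOnlyList<TagRow> others)
    {
        if (string.IsNullOrWhiteSpace(row.Name)) return "Name is required.";
        foreach (var other in others)
        {
            if (string.Equals(other.Name, row.Name, StringComparison.OrdinalIgnoreCase))
                return $"Duplicate name '{row.Name}'.";
        }
        return null;
    }
}
```

Cancel needs no code in this shape: dropping the session discards the working copy, and the bound list was never mutated.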
### WS2 — Resilience typed form
- Replace the textarea in `DriverResilienceSection.razor` with a typed form bound to
a new mutable `ResilienceFormModel` (all fields nullable; null = tier default):
bulkhead concurrent/queue, recycle interval, and an 8-capability grid (Read,
Write, Discover, Subscribe, Probe, AlarmSubscribe, AlarmAcknowledge, HistoryRead)
of (timeout / retry / breaker-threshold).
- `FromJson`/`ToJson` emit only non-null overrides (blank → `null`, preserving the
current "null = tier defaults" contract). The section gains a `DriverTier`
parameter; each driver page passes its known tier so `GetTierDefaults(tier)`
renders as placeholders. A collapsible "raw JSON" view remains as escape hatch.
- **Errors:** non-negative / sane-range numeric validation; emitted JSON must
re-parse cleanly through `DriverResilienceOptionsParser`.
- **Testing:** `ResilienceFormModel` round-trip tests in `AdminUI.Tests`:
blank→null, partial-override-preserved, emit→parse-back compatibility.
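One way to get the "emit only non-null overrides" behaviour is to lean on `System.Text.Json`'s null-ignore setting. The sketch below is illustrative only: `ResilienceFormSketch` is a hypothetical, trimmed-down stand-in for `ResilienceFormModel` (no capability grid), and the PascalCase JSON keys are an assumption about the stored shape.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical, trimmed-down form model: null means "use the tier default".
public sealed class ResilienceFormSketch
{
    public int? BulkheadMaxConcurrent { get; set; }
    public int? BulkheadMaxQueue { get; set; }
    public int? RecycleIntervalSeconds { get; set; }

    // Null properties are skipped on write, so only real overrides are emitted.
    private static readonly JsonSerializerOptions Options = new()
    {
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
    };

    public string ToJson() => JsonSerializer.Serialize(this, Options);

    // Blank or null input yields an all-null model, i.e. pure tier defaults.
    public static ResilienceFormSketch FromJson(string? json) =>
        string.IsNullOrWhiteSpace(json)
            ? new ResilienceFormSketch()
            : JsonSerializer.Deserialize<ResilienceFormSketch>(json, Options)
              ?? new ResilienceFormSketch();
}
```

A model with only `BulkheadMaxConcurrent = 4` set serializes to a single-key object, and an untouched model serializes to `{}`, which keeps the "absent key = tier default" contract intact.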
### WS3 — Editable LDAP→role map (DB-backed, global, live on next sign-in)
- `RoleGrants.razor` → full CRUD over `LdapGroupRoleMapping` via the existing
`ILdapGroupRoleMappingService`. **Global only**: `IsSystemWide=true`,
`ClusterId=null`; no cluster UI. Fields: LDAP group, `AdminRole`
(ConfigViewer/ConfigEditor/FleetAdmin), notes. A group may carry several roles
(multiple rows). Edit page gated to **FleetAdmin** (add a minimal FleetAdmin
authorization policy; confirm existing role-policy plumbing during plan-writing).
- Wire the service into `LdapAuthService`: at login → resolve groups →
`GetByGroupsAsync` (indexed) → map roles → **merge appsettings `GroupToRole` as a
fallback layer** (used when no DB row covers a group). Edits take effect on the
user's next sign-in. DB rows authoritative + editable; appsettings entries shown
read-only as "fallback."
- **Errors:** DB unreachable at login → catch, log, fall back to appsettings;
login never blocks. CRUD: no duplicate `(LdapGroup, Role)`; group/role required.
- **Testing:** extend `LdapGroupRoleMappingServiceTests` (in-memory EF) for CRUD +
dedupe; new `RoleMapper` overload `Map(groups, dbRows, fallbackDict)` unit-tested
for merge + fallback precedence + DB-error fallback.
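The merge-with-fallback precedence might look roughly like the following. This is a sketch under stated assumptions: `GroupRoleRow` is a hypothetical stand-in for `LdapGroupRoleMapping`, `RoleMapSketch.Map` is not the real `RoleMapper` signature, a `null` `dbRows` argument models "DB unreachable, caller caught and logged", and DB group names are assumed to match case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a LdapGroupRoleMapping row.
public sealed record GroupRoleRow(string LdapGroup, string Role);

public static class RoleMapSketch
{
    // DB rows win per group; the appsettings dictionary is consulted only
    // for groups that no DB row covers. dbRows == null means the DB lookup failed.
    public static IReadOnlySet<string> Map(
        IEnumerable<string> groups,
        IEnumerable<GroupRoleRow>? dbRows,
        IReadOnlyDictionary<string, string> fallback)
    {
        var byGroup = (dbRows ?? Enumerable.Empty<GroupRoleRow>())
            .ToLookup(r => r.LdapGroup, r => r.Role, StringComparer.OrdinalIgnoreCase);

        var roles = new HashSet<string>(StringComparer.Ordinal);
        foreach (var group in groups)
        {
            if (byGroup.Contains(group))
            {
                // A group may carry several roles (multiple rows).
                roles.UnionWith(byGroup[group]);
            }
            else if (fallback.TryGetValue(group, out var role))
            {
                roles.Add(role);
            }
        }
        return roles;
    }
}
```

With a DB row mapping `admins` to `FleetAdmin` and a fallback entry mapping `admins` to `ConfigEditor`, a user in `admins` gets `FleetAdmin` only; if the DB is unreachable the same user gets `ConfigEditor`.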
### WS4 — Cleanup (runs last, after the features exist)
- **Delete stale comments:** `FleetStatusHub.cs` ("passive channel / until the
bridge lands"), `EndpointRouteBuilderExtensions.cs` (F15), `DriverIdentitySection.razor`
("Phase 4 / generic DriverEdit"), `DriverEditRouter.razor` + `DriverTypePicker.razor`
(`TODO(3.3/3.4)` + the "falls back to legacy DriverEdit" path — verify & clean,
legacy file is gone), and update `DriverResilienceSection.razor`'s comment.
- **Strip rendered notes** now true: per-driver "list-editor coming in a follow-up
phase" notes, the OpcUaClient endpoint note, the resilience "typed-form-ifying
Polly is a follow-up" note, and the RoleGrants "UI-driven editing is deferred" note.
## Cross-cutting
- **No DB schema change:** `LdapGroupRoleMapping` migration already applied;
`DriverConfig`/`ResilienceConfig` columns unchanged.
- **Definition of done:** build clean + `dotnet test` green + a `/run` pass
exercising the modal editors and role-map CRUD.
- **Suggested sequence:** WS1 shared shell + Modbus tags as proof → remaining
drivers → WS2 → WS3 → WS4.
@@ -0,0 +1,26 @@
{
"planPath": "docs/plans/2026-05-29-adminui-followups.md",
"branch": "feat/adminui-followups",
"tasks": [
{"id": 11, "plan": 1, "subject": "Task 1: Generic CollectionEditor<TRow> component", "status": "pending"},
{"id": 12, "plan": 2, "subject": "Task 2: Modbus tag editor (proof) + tests", "status": "pending", "blockedBy": [11]},
{"id": 13, "plan": 3, "subject": "Task 3: AbCip device+tag editors + tests", "status": "pending", "blockedBy": [11]},
{"id": 14, "plan": 4, "subject": "Task 4: AbLegacy device+tag editors + tests", "status": "pending", "blockedBy": [11]},
{"id": 15, "plan": 5, "subject": "Task 5: TwinCAT device+tag editors + tests", "status": "pending", "blockedBy": [11]},
{"id": 16, "plan": 6, "subject": "Task 6: FOCAS device+tag editors + tests", "status": "pending", "blockedBy": [11]},
{"id": 17, "plan": 7, "subject": "Task 7: S7 tag editor + tests", "status": "pending", "blockedBy": [11]},
{"id": 18, "plan": 8, "subject": "Task 8: OpcUaClient endpoint-URL editor + tests", "status": "pending", "blockedBy": [11]},
{"id": 19, "plan": 9, "subject": "Task 9: ResilienceFormModel + tests", "status": "pending"},
{"id": 20, "plan": 10, "subject": "Task 10: Typed resilience form in DriverResilienceSection", "status": "pending", "blockedBy": [19]},
{"id": 21, "plan": 11, "subject": "Task 11: RoleMapper.Merge overload + tests", "status": "pending"},
{"id": 22, "plan": 12, "subject": "Task 12: Register ILdapGroupRoleMappingService in DI", "status": "pending"},
{"id": 23, "plan": 13, "subject": "Task 13: Wire DB merge into AuthEndpoints.LoginAsync", "status": "pending", "blockedBy": [21, 22]},
{"id": 24, "plan": 14, "subject": "Task 14: Add FleetAdmin authorization policy", "status": "pending"},
{"id": 25, "plan": 15, "subject": "Task 15: RoleGrants.razor global CRUD (FleetAdmin-gated)", "status": "pending", "blockedBy": [22, 24]},
{"id": 26, "plan": 16, "subject": "Task 16: LdapGroupRoleMapping service tests (global CRUD)", "status": "pending"},
{"id": 27, "plan": 17, "subject": "Task 17: Delete stale source comments", "status": "pending", "blockedBy": [12, 13, 14, 15, 16, 17, 18, 20, 25]},
{"id": 28, "plan": 18, "subject": "Task 18: Strip now-true rendered notes", "status": "pending", "blockedBy": [12, 13, 14, 15, 16, 17, 18, 25]},
{"id": 29, "plan": 19, "subject": "Task 19: Full verification (build + test + /run)", "status": "pending", "blockedBy": [20, 23, 26, 27, 28]}
],
"lastUpdated": "2026-05-29"
}
@@ -0,0 +1,273 @@
# Auth/login alignment with ScadaBridge — design
> **Status:** approved 2026-05-29. Implementation plan to follow via `writing-plans`.
> **Trigger:** browser hitting `http://localhost:9200/` rendered Chrome's `HTTP_RESPONSE_CODE_FAILURE` page because the cookie scheme's `OnRedirectToLogin` event was overridden to return 401 with no body, and the parallel JwtBearer scheme stamped `WWW-Authenticate: Bearer`. ScadaBridge sets `LoginPath` and lets the framework do its built-in browser-vs-AJAX heuristic; OtOpcUa diverged.
**Goal:** Restore default browser-redirect ergonomics on protected GETs, retire the unused JwtBearer server-side scheme, and externalize cookie config — bringing OtOpcUa's auth structure into parity with ScadaBridge.
**Architecture:** Single Cookie auth scheme. The JWT keeps minting (via `JwtTokenService`) and validating (in `CookieAuthenticationStateProvider`) as the **cookie payload only**; no `AddJwtBearer`, no parallel `Authorization: Bearer` validation. Cookie config (`Name`, `ExpiryMinutes`, `RequireHttpsCookie`) flows through the existing-but-unused `OtOpcUaCookieOptions` via a `Configure<IOptions<OtOpcUaCookieOptions>, ILoggerFactory>` PostConfigure step — same pattern ScadaBridge uses.
**Tech stack:** .NET 10 / ASP.NET Core / `Microsoft.AspNetCore.Authentication.Cookies` only (drop `Microsoft.AspNetCore.Authentication.JwtBearer` from the wiring if its only remaining transitive use disappears with this change).
---
## 1. Architecture
### Schemes
| Before | After |
|---|---|
| Cookie (primary) + JwtBearer (parallel) | Cookie only |
| `FallbackPolicy` lists both schemes | `FallbackPolicy` lists Cookie only |
| `OnRedirectToLogin` overridden to 401 | default behavior: 302 for browsers, 401 for AJAX |
| `OnRedirectToAccessDenied` overridden to 403 | default behavior: 302 to `/Account/AccessDenied` (404s today; matches ScadaBridge) |
### Cookie config — externalized via `OtOpcUaCookieOptions`
```csharp
public sealed class OtOpcUaCookieOptions
{
public const string SectionName = "Security:Cookie";
public string Name { get; set; } = "ZB.MOM.WW.OtOpcUa.Auth";
public int ExpiryMinutes { get; set; } = 30;
public bool RequireHttpsCookie { get; set; } = true;
}
```
Wired into `CookieAuthenticationOptions` via:
```csharp
services.AddOptions<CookieAuthenticationOptions>(CookieAuthenticationDefaults.AuthenticationScheme)
.Configure<IOptions<OtOpcUaCookieOptions>, ILoggerFactory>((cookieOpts, ourOpts, lf) =>
{
cookieOpts.Cookie.Name = ourOpts.Value.Name;
cookieOpts.ExpireTimeSpan = TimeSpan.FromMinutes(ourOpts.Value.ExpiryMinutes);
cookieOpts.SlidingExpiration = true;
cookieOpts.Cookie.SecurePolicy = ourOpts.Value.RequireHttpsCookie
? CookieSecurePolicy.Always
: CookieSecurePolicy.SameAsRequest;
if (!ourOpts.Value.RequireHttpsCookie)
{
lf.CreateLogger("ZB.MOM.WW.OtOpcUa.Security").LogWarning(
"Security:Cookie:RequireHttpsCookie is DISABLED — auth cookie SecurePolicy is SameAsRequest. " +
"Cookie travels in cleartext over plain HTTP. Dev-only.");
}
});
```
### Endpoint surface — unchanged
| Path | Auth | Behavior |
|---|---|---|
| `POST /auth/login` | AllowAnonymous | LDAP auth → SignInAsync(Cookie); JSON callers get 204 / 401 / 503, form posters get 302 + cookie |
| `POST /auth/logout` | RequireAuthorization | SignOutAsync(Cookie) |
| `GET /auth/ping` | AllowAnonymous (handler-returns 200/401) | Polled by Blazor every 60s |
| `POST /auth/token` | RequireAuthorization | Mints JWT for hypothetical external callers (matches ScadaBridge — they keep this even without JwtBearer wired) |
### Cookie rename
Old: `OtOpcUa.Auth`. New: `ZB.MOM.WW.OtOpcUa.Auth`. Effect: all sessions in flight at deploy time are invisible to the new handler → users re-prompt for login on next protected GET. No security impact (the old cookie expires per its own sliding window; nothing reads it).
---
## 2. Components
### Files modified
| File | Change |
|---|---|
| `src/Server/.../Security/CookieOptions.cs` | Add `RequireHttpsCookie`; change `Name` default to `ZB.MOM.WW.OtOpcUa.Auth` |
| `src/Server/.../Security/ServiceCollectionExtensions.cs` | Drop `using JwtBearer`; delete `ConfigureJwtBearerFromTokenService` class; drop `.AddJwtBearer` + its IPostConfigureOptions registration; drop `OnRedirectToLogin` / `OnRedirectToAccessDenied` overrides; add `LoginPath` + `LogoutPath`; add PostConfigure block consuming `OtOpcUaCookieOptions`; remove `JwtBearerDefaults.AuthenticationScheme` from `FallbackPolicy` builder |
| `tests/Server/.../Security.Tests/AuthEndpointsIntegrationTests.cs` | Update the `Set-Cookie` assertion on the login-success test from `OtOpcUa.Auth=` to `ZB.MOM.WW.OtOpcUa.Auth=` |
### Files NOT modified
| File | Why |
|---|---|
| `Endpoints/AuthEndpoints.cs` | Endpoint contracts unchanged |
| `Jwt/JwtTokenService.cs` | Still mints JWT into cookie payload |
| `Blazor/CookieAuthenticationStateProvider.cs` | Still polls `/auth/ping` |
| `Ldap/*` | Untouched |
| Razor login page | POST target unchanged |
| `appsettings*.json` | Defaults are production-safe; no required config edit |
### Tests added
Single new file or appended class in `tests/Server/.../Security.Tests/`:
```csharp
public class AuthChallengeTests : AuthEndpointsTestBase
{
[Fact]
public async Task Root_anonymous_browser_GET_redirects_to_login()
{
var client = NewClient(allowAutoRedirect: false);
client.DefaultRequestHeaders.Accept.ParseAdd("text/html");
var resp = await client.GetAsync("/", Ct);
resp.StatusCode.ShouldBe(HttpStatusCode.Found); // 302
resp.Headers.Location!.ToString().ShouldContain("/login");
resp.Headers.Location.ToString().ShouldContain("ReturnUrl");
}
[Fact]
public async Task Root_anonymous_xhr_GET_returns_401()
{
var client = NewClient(allowAutoRedirect: false);
client.DefaultRequestHeaders.Add("X-Requested-With", "XMLHttpRequest");
var resp = await client.GetAsync("/", Ct);
resp.StatusCode.ShouldBe(HttpStatusCode.Unauthorized);
// Framework still writes a Location header alongside the 401 — AJAX clients ignore it.
}
}
```
**Framework reality vs. earlier hypothesis:** The ASP.NET Core cookie handler's `IsAjaxRequest` heuristic checks ONLY the `X-Requested-With: XMLHttpRequest` header, NOT the `Accept` content type. A request with `Accept: application/json` but no XHR header is classified as a browser → 302. The third test originally proposed (`Root_anonymous_json_GET_returns_401`) was dropped because it tests behavior the framework doesn't have. ScadaBridge accepts the same framework reality (it doesn't override the heuristic either).
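The classification rule described in this note can be modelled in a few lines. This is a simplified stand-in, not the framework's code: `ChallengeSketch` is a hypothetical name, it looks only at request headers, and it returns just the status code the two tests above assert on.

```csharp
using System;
using System.Collections.Generic;

public static class ChallengeSketch
{
    // Simplified model of the cookie challenge: the XHR marker header decides,
    // the Accept header is never consulted.
    public static int ChallengeStatus(IReadOnlyDictionary<string, string> headers)
    {
        var isAjax = headers.TryGetValue("X-Requested-With", out var value)
            && string.Equals(value, "XMLHttpRequest", StringComparison.Ordinal);

        return isAjax ? 401 : 302;
    }
}
```

Under this model a request carrying only `Accept: application/json` still gets 302, which is why the JSON-only test was dropped. Callers should pass a case-insensitive dictionary, since HTTP header names are case-insensitive.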
### Package references
`src/Server/ZB.MOM.WW.OtOpcUa.Security/ZB.MOM.WW.OtOpcUa.Security.csproj`: remove `<PackageReference Include="Microsoft.AspNetCore.Authentication.JwtBearer" />` if grep confirms `JwtTokenService` doesn't itself need it (it uses `Microsoft.IdentityModel.Tokens` for validation parameters, separate package).
---
## 3. Data flow
### Anonymous browser hits `/`
```
Browser → GET /
Accept: text/html
┌──> AuthN: no cookie → unauthenticated
├──> AuthZ FallbackPolicy fails
└──> Cookie HandleChallengeAsync:
- no X-Requested-With: XMLHttpRequest → browser
- 302 Location: /login?ReturnUrl=%2F
Browser → GET /login ← redirect followed; login page renders (AllowAnonymous)
[user submits form]
Browser → POST /auth/login Content-Type: application/x-www-form-urlencoded
─── LoginAsync:
- LDAP authenticate
- SignInAsync(Cookie)
- Set-Cookie: ZB.MOM.WW.OtOpcUa.Auth=...
- 302 Location: / (or ReturnUrl)
Browser → GET / cookie present → AuthZ passes → 200 + Razor render
```
### XHR / fetch hits a protected endpoint without cookie
```
fetch('/api/something') Accept: application/json
X-Requested-With: XMLHttpRequest
┌──> AuthN: no cookie → unauthenticated
├──> AuthZ FallbackPolicy fails
└──> Cookie HandleChallengeAsync:
- X-Requested-With: XMLHttpRequest → AJAX client
- 401 (no body; a Location header is still written and ignored)
```
The cookie handler's built-in `IsAjaxRequest` heuristic is what makes this work — it looks for `X-Requested-With: XMLHttpRequest`. No custom event handler needed. Note: requests with only `Accept: application/json` (no XHR header) are classified as browsers → 302; AJAX callers should set the XHR header to get 401.
### Logout
```
fetch('/auth/logout', POST) cookie present
─── LogoutAsync (RequireAuthorization passes):
- SignOutAsync(Cookie)
- Set-Cookie: ZB.MOM.WW.OtOpcUa.Auth=; expires=...
- 204 (or browser-form: 302 /login)
```
### Old cookie ignored
Browser holds stale `OtOpcUa.Auth` from a session that predates the deploy. Cookie scheme is now configured for `ZB.MOM.WW.OtOpcUa.Auth` — old cookie is invisible. User treated as anonymous → 302 to `/login`. Old cookie sits in jar until its own sliding window expires (max 30 min); no security risk because nothing reads it.
### Blazor `/auth/ping` polling
```
CookieAuthenticationStateProvider → GET /auth/ping every 60s
cookie present → 200
cookie expired/missing → 401
Blazor → invalidates auth state → re-render → root [Authorize] fails
→ Cookie HandleChallengeAsync → 302 /login
```
Unchanged.
---
## 4. Error handling
| Surface | Behavior |
|---|---|
| Unknown `Accept` (`*/*`, missing, JSON) | Framework default: treated as non-AJAX → 302 to `/login`. The cookie handler's `IsAjaxRequest` only looks at `X-Requested-With`, NOT `Accept`. CLI tools that want a 401 should set `X-Requested-With: XMLHttpRequest`. |
| `LoginAsync` bad creds | JSON: `401`. Form: `302 /login?error=…&returnUrl=…`. Handler-returned, unaffected by middleware changes. |
| `LoginAsync` LDAP throws | `503 ServiceUnavailable`. Handler-returned. |
| `LoginAsync` success | JSON: `204`. Form: `302 /` (or `ReturnUrl`). |
| Cookie expires mid-request | Treated as anonymous → 302 to `/login` (browser) or 401 (AJAX). Active users kept alive by `SlidingExpiration = true`. |
| `RequireHttpsCookie = false` over HTTPS | Cookie marked `SecurePolicy = SameAsRequest`. Misconfiguration risk; startup logs Warning every boot so it's audible. No validator-refused boot — default is `true`; dev compose explicitly opts out. |
| Missing `Security:Cookie` section in config | `.Bind()` no-ops; defaults take over (`Name = ZB.MOM.WW.OtOpcUa.Auth`, `ExpiryMinutes = 30`, `RequireHttpsCookie = true`). Production-safe. |
| `[Authorize(Policy="DriverOperator")]` denied for authenticated non-operator | Cookie handler redirects to default `AccessDeniedPath = "/Account/AccessDenied"` which 404s in OtOpcUa. Matches ScadaBridge; rare enough not to be a P0. Follow-up: add a minimal `/access-denied` Razor page. |
---
## 5. Testing
### Existing tests pass unchanged
- `Login_with_invalid_credentials_returns_401` — handler-returned, unaffected
- `Login_when_ldap_throws_returns_503` — handler-returned, unaffected
- `Ping_anonymous_returns_401` — handler-returned, unaffected
- `Ping_after_cookie_login_returns_200` — uses HttpClient cookie container, picks up renamed cookie automatically
- `Login_with_cookie_credentials_returns_204_and_sets_cookie` — needs one assertion update (cookie name)
### Tests added (2 new)
- `Root_anonymous_browser_GET_redirects_to_login` — asserts 302 + `Location` contains `/login` + `ReturnUrl`
- `Root_anonymous_xhr_GET_returns_401`: `X-Requested-With: XMLHttpRequest` → 401 (the framework still writes a `Location` header, which AJAX clients ignore)
(the originally planned `Root_anonymous_json_GET_returns_401` was dropped; see the framework-reality note in Section 2 above)
### Removed/orphaned tests
None expected. The explore phase found no test depending on `ConfigureJwtBearerFromTokenService` or the `WWW-Authenticate: Bearer` response. Grep at plan-write time to confirm.
### Manual smoke (docker-dev stack)
1. `http://localhost:9200/` anonymously → expect 302 to `/login?ReturnUrl=%2F` (was: Chrome error page)
2. Sign in via the form
3. `http://localhost:9200/` authenticated → expect Razor dashboard
4. DevTools → Application → Cookies → confirm `ZB.MOM.WW.OtOpcUa.Auth`
5. `curl -i http://localhost:9200/` → `302 Found`, Location: `/login?ReturnUrl=%2F`
6. `curl -i -H "X-Requested-With: XMLHttpRequest" http://localhost:9200/` → `401 Unauthorized`
### Verification gates at PR time
- `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — zero new errors (pre-existing 12 unchanged)
- `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/` — all green
- `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/` — all green
- Manual Chrome smoke above passes
---
## 6. Sequencing (for plan-writing)
Single-PR feature, but split into reviewable phases:
1. **Phase 1 — Options class.** Extend `OtOpcUaCookieOptions` with `RequireHttpsCookie` and new `Name` default. Tests unaffected.
2. **Phase 2 — Wiring rewrite.** Edit `ServiceCollectionExtensions.cs`: drop JwtBearer, drop event overrides, add `LoginPath`/`LogoutPath`, add PostConfigure consumption of `OtOpcUaCookieOptions`. Update the one existing test assertion. Build + existing Security.Tests green.
3. **Phase 3 — New challenge tests.** Add the 2 new redirect/401 tests.
4. **Phase 4 — Package cleanup.** Remove `Microsoft.AspNetCore.Authentication.JwtBearer` from csproj if grep confirms no remaining consumer.
5. **Phase 5 — Manual smoke + commit.** Restart admin-a/admin-b in docker-dev; verify in Chrome.
---
## Decisions table
| # | Decision | Rationale |
|---|---|---|
| 1 | Drop JwtBearer server-side scheme | No in-repo consumer; brought non-redirect 401 + `WWW-Authenticate: Bearer` to browser GETs |
| 2 | Keep `JwtTokenService` + `/auth/token` | Token-as-cookie-payload is load-bearing for Blazor; `/auth/token` matches ScadaBridge surface |
| 3 | Rename cookie `OtOpcUa.Auth` to `ZB.MOM.WW.OtOpcUa.Auth` | Naming parity with ScadaBridge; one-time forced sign-out acceptable |
| 4 | Externalize via existing `OtOpcUaCookieOptions` + PostConfigure | Mirrors ScadaBridge pattern; fixes pre-existing bug where options class was bound but ignored |
| 5 | Drop both `OnRedirectToLogin` and `OnRedirectToAccessDenied` overrides | Restores framework's browser-vs-AJAX heuristic; ScadaBridge does the same |
| 6 | Set `LoginPath = "/login"`, `LogoutPath = "/auth/logout"` | Required for the framework's default redirect to work |
| 7 | Accept 404 on `/Account/AccessDenied` for v1 | Matches ScadaBridge; rare path; follow-up to add minimal page |
| 8 | Warning-log when `RequireHttpsCookie = false` | Audible misconfig signal; same as ScadaBridge |
@@ -0,0 +1,652 @@
# Auth/login alignment with ScadaBridge — implementation plan
> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` or `superpowers-extended-cc:subagent-driven-development` to implement this plan task-by-task.
**Goal:** Match ScadaBridge's single-Cookie auth pattern: drop the unused JwtBearer parallel scheme, restore the framework's default browser-vs-AJAX challenge heuristic, and externalize cookie config through the existing-but-unused `OtOpcUaCookieOptions`.
**Architecture:** Cookie-only auth. `JwtTokenService` keeps minting JWTs as the cookie payload (Blazor circuit hydration depends on it). Cookie name + idle timeout + HTTPS policy flow through `OtOpcUaCookieOptions` via a `Configure<IOptions<OtOpcUaCookieOptions>, ILoggerFactory>` PostConfigure step. Endpoint surface (`/auth/login`, `/auth/logout`, `/auth/ping`, `/auth/token`) unchanged.
**Tech stack:** .NET 10 / ASP.NET Core / `Microsoft.AspNetCore.Authentication.Cookies` / xUnit v3 + Shouldly / `Microsoft.AspNetCore.TestHost.TestServer`.
**Design doc:** `docs/plans/2026-05-29-auth-alignment-design.md` (commit `bc4fce5`). Each task below cites the design section it implements.
---
## Sequencing
```
Task 1 (Options class)
└─► Task 2 (Wiring rewrite + test assertion update)
├─► Task 3 (3 new challenge tests)
└─► Task 4 (csproj cleanup)
└─► Task 5 (manual smoke + final commit)
```
Tasks 3 and 4 are parallelizable (disjoint files).
---
## Task 1 — Extend `OtOpcUaCookieOptions`
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** none (Task 2 depends on this)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Security/CookieOptions.cs`
**Implements design:** Section 1 (Architecture, "Cookie config — externalized") + Section 2 (Components, file table row 1).
### Step 1: Replace file contents
Current file (12 lines):
```csharp
namespace ZB.MOM.WW.OtOpcUa.Security;
public sealed class OtOpcUaCookieOptions
{
public const string SectionName = "Security:Cookie";
/// <summary>Gets or sets the cookie name.</summary>
public string Name { get; set; } = "OtOpcUa.Auth";
/// <summary>Idle sliding window, in minutes (default 30).</summary>
public int ExpiryMinutes { get; set; } = 30;
}
```
Replace with:
```csharp
namespace ZB.MOM.WW.OtOpcUa.Security;
/// <summary>
/// Auth-cookie configuration bound from <c>Security:Cookie</c>. Consumed by a
/// <c>Configure&lt;IOptions&lt;OtOpcUaCookieOptions&gt;, ILoggerFactory&gt;</c> step inside
/// <c>AddOtOpcUaAuth</c> that copies the values onto <c>CookieAuthenticationOptions</c>.
/// </summary>
public sealed class OtOpcUaCookieOptions
{
/// <summary>Configuration section name (<c>Security:Cookie</c>).</summary>
public const string SectionName = "Security:Cookie";
/// <summary>
/// Auth cookie name. Default uses the <c>ZB.MOM.WW</c> convention; mirrors ScadaBridge's
/// <c>ZB.MOM.WW.ScadaBridge.Auth</c>. Changing this invalidates existing sessions on next
/// deploy.
/// </summary>
public string Name { get; set; } = "ZB.MOM.WW.OtOpcUa.Auth";
/// <summary>Idle sliding-window length in minutes (default 30).</summary>
public int ExpiryMinutes { get; set; } = 30;
/// <summary>
/// Require HTTPS for the auth cookie. Default <c>true</c>: cookie is marked
/// <c>SecurePolicy = Always</c>. Set to <c>false</c> ONLY for local dev stacks running
/// plain HTTP — emits a startup Warning when disabled so the misconfiguration is
/// audible.
/// </summary>
public bool RequireHttpsCookie { get; set; } = true;
}
```
### Step 2: Build
Run:
```bash
cd /Users/dohertj2/Desktop/OtOpcUa
dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Security/
```
Expected: 0 errors, 0 warnings.
### Step 3: Commit
```bash
git -C /Users/dohertj2/Desktop/OtOpcUa add src/Server/ZB.MOM.WW.OtOpcUa.Security/CookieOptions.cs
git -C /Users/dohertj2/Desktop/OtOpcUa commit -m "feat(security): extend OtOpcUaCookieOptions with RequireHttpsCookie + ZB.MOM.WW cookie name default"
```
### Output report
- Lines before / after
- Build clean
- Commit SHA
### Self-review checklist
- [ ] `Name` default is `"ZB.MOM.WW.OtOpcUa.Auth"` (NOT `"OtOpcUa.Auth"`)
- [ ] `RequireHttpsCookie` property added with default `true` and XML doc explaining the dev-only opt-out
- [ ] `ExpiryMinutes` default unchanged at 30
- [ ] `SectionName` constant unchanged
- [ ] Build clean
---
## Task 2 — Rewrite auth wiring in `ServiceCollectionExtensions.cs`
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (Tasks 3 and 4 depend on this)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Security/ServiceCollectionExtensions.cs`
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/AuthEndpointsIntegrationTests.cs:93`
**Implements design:** Section 1 + Section 2 file table rows 2 + 3.
### Step 1: Read current file
```bash
cat /Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Security/ServiceCollectionExtensions.cs
```
Current shape (relevant excerpt):
- `using Microsoft.AspNetCore.Authentication.JwtBearer;` at top
- `internal sealed class ConfigureJwtBearerFromTokenService(JwtTokenService tokenService) : IPostConfigureOptions<JwtBearerOptions>` class (lines ~15-35)
- `.AddCookie(o => { ... })` with `OnRedirectToLogin` / `OnRedirectToAccessDenied` overrides
- `.AddJwtBearer(JwtBearerDefaults.AuthenticationScheme, _ => { })` chained after AddCookie
- `services.AddSingleton<IPostConfigureOptions<JwtBearerOptions>, ConfigureJwtBearerFromTokenService>()` after the AddAuthentication block
- `FallbackPolicy` builder takes both Cookie + JwtBearer schemes
### Step 2: Replace the file with the new shape
The full target file:
```csharp
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.DataProtection;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.OtOpcUa.Configuration;
using ZB.MOM.WW.OtOpcUa.Security.Jwt;
using ZB.MOM.WW.OtOpcUa.Security.Ldap;
namespace ZB.MOM.WW.OtOpcUa.Security;
/// <summary>
/// DI registration for OtOpcUa auth. Single Cookie scheme (the JWT lives inside the
/// cookie as its credential payload); no JwtBearer parallel scheme. Matches ScadaBridge
/// structurally — see <c>docs/plans/2026-05-29-auth-alignment-design.md</c>.
/// </summary>
public static class ServiceCollectionExtensions
{
/// <summary>Wires cookie authentication, DataProtection key persistence to ConfigDb,
/// LDAP services, and the LDAP-backed JwtTokenService. Browser flows redirect to
/// <c>/login</c>; AJAX callers that send <c>X-Requested-With: XMLHttpRequest</c> receive 401
/// (handled by the framework's default challenge heuristic).</summary>
/// <param name="services">The service collection.</param>
/// <param name="configuration">The application configuration root.</param>
public static IServiceCollection AddOtOpcUaAuth(this IServiceCollection services, IConfiguration configuration)
{
services.AddOptions<JwtOptions>().Bind(configuration.GetSection(JwtOptions.SectionName));
services.AddOptions<OtOpcUaCookieOptions>().Bind(configuration.GetSection(OtOpcUaCookieOptions.SectionName));
services.AddOptions<LdapOptions>().Bind(configuration.GetSection(LdapOptions.SectionName));
services.AddSingleton<JwtTokenService>();
// Singleton — LdapAuthService is stateless (creates an LdapConnection per call) and
// must be consumable by the Singleton LdapOpcUaUserAuthenticator on driver-role nodes.
services.AddSingleton<ILdapAuthService, LdapAuthService>();
services.AddDataProtection()
.PersistKeysToDbContext<OtOpcUaConfigDbContext>()
.SetApplicationName("OtOpcUa");
services.AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
.AddCookie(o =>
{
// Static fields only — Name / ExpireTimeSpan / SecurePolicy / SlidingExpiration
// are bound from OtOpcUaCookieOptions in the PostConfigure block below.
o.LoginPath = "/login";
o.LogoutPath = "/auth/logout";
o.Cookie.HttpOnly = true;
o.Cookie.SameSite = SameSiteMode.Strict;
// No OnRedirectToLogin / OnRedirectToAccessDenied overrides — let the framework's
// built-in IsAjaxRequest heuristic do its thing (302 for browsers, 401 for AJAX).
});
// Externalised cookie config — mirrors ScadaBridge's PostConfigure pattern. Fixes a
// pre-existing latent bug where OtOpcUaCookieOptions was bound but ignored.
services.AddOptions<CookieAuthenticationOptions>(CookieAuthenticationDefaults.AuthenticationScheme)
.Configure<IOptions<OtOpcUaCookieOptions>, ILoggerFactory>((cookieOpts, ourOpts, lf) =>
{
var v = ourOpts.Value;
cookieOpts.Cookie.Name = v.Name;
cookieOpts.ExpireTimeSpan = TimeSpan.FromMinutes(v.ExpiryMinutes);
cookieOpts.SlidingExpiration = true;
cookieOpts.Cookie.SecurePolicy = v.RequireHttpsCookie
? CookieSecurePolicy.Always
: CookieSecurePolicy.SameAsRequest;
if (!v.RequireHttpsCookie)
{
lf.CreateLogger("ZB.MOM.WW.OtOpcUa.Security").LogWarning(
"Security:Cookie:RequireHttpsCookie is DISABLED — auth cookie SecurePolicy is " +
"SameAsRequest. The cookie-embedded JWT will travel in cleartext over plain HTTP. " +
"Intended for local dev only — set Security:Cookie:RequireHttpsCookie=true in production.");
}
});
services.AddAuthorization(o =>
{
o.FallbackPolicy = new Microsoft.AspNetCore.Authorization.AuthorizationPolicyBuilder(
CookieAuthenticationDefaults.AuthenticationScheme)
.RequireAuthenticatedUser()
.Build();
// DriverOperator: may issue Reconnect/Restart commands against live driver instances
// from the Admin UI DriverStatusPanel. Map LDAP group → role via GroupToRole in
// appsettings (e.g. "ot-driver-operator": "DriverOperator").
o.AddPolicy("DriverOperator", policy =>
policy.RequireRole("DriverOperator", "FleetAdmin"));
});
return services;
}
}
```
What's gone (vs. the original):
- `using Microsoft.AspNetCore.Authentication.JwtBearer;`
- `ConfigureJwtBearerFromTokenService` internal class entirely
- `.AddJwtBearer(...)` chain after `.AddCookie(...)`
- `services.AddSingleton<IPostConfigureOptions<JwtBearerOptions>, ConfigureJwtBearerFromTokenService>();`
- `OnRedirectToLogin` / `OnRedirectToAccessDenied` event overrides
- Hardcoded `o.Cookie.Name = "OtOpcUa.Auth"`, `o.SlidingExpiration = true`, `o.ExpireTimeSpan = TimeSpan.FromMinutes(30)`, `o.Cookie.SecurePolicy = CookieSecurePolicy.SameAsRequest`
- `JwtBearerDefaults.AuthenticationScheme` from the `FallbackPolicy` builder
What's added:
- `using Microsoft.Extensions.Logging;`
- `o.LoginPath = "/login"`, `o.LogoutPath = "/auth/logout"` inside `.AddCookie(...)`
- The `services.AddOptions<CookieAuthenticationOptions>(...).Configure<...>(...)` PostConfigure block
### Step 3: Update the one existing test assertion
In `tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/AuthEndpointsIntegrationTests.cs` around line 93:
```csharp
// before
response.Headers.GetValues("Set-Cookie").ShouldContain(c => c.StartsWith("OtOpcUa.Auth="));
// after
response.Headers.GetValues("Set-Cookie").ShouldContain(c => c.StartsWith("ZB.MOM.WW.OtOpcUa.Auth="));
```
### Step 4: Build + run security tests
```bash
cd /Users/dohertj2/Desktop/OtOpcUa
dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Security/
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/
```
Expected: build clean; all Security.Tests pass (the existing 5 AuthEndpointsIntegrationTests + JwtTokenServiceTests + LdapHelperTests + RoleMapperTests).
### Step 5: Commit
```bash
git -C /Users/dohertj2/Desktop/OtOpcUa add \
src/Server/ZB.MOM.WW.OtOpcUa.Security/ServiceCollectionExtensions.cs \
tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/AuthEndpointsIntegrationTests.cs
git -C /Users/dohertj2/Desktop/OtOpcUa commit -m "$(cat <<'EOF'
refactor(security): drop JwtBearer parallel scheme, externalize cookie config
Single Cookie auth scheme; framework default challenge restores 302 → /login
for browsers + 401 for AJAX. OtOpcUaCookieOptions now flows through to
CookieAuthenticationOptions via PostConfigure (fixes a latent bug where the
options class was bound but ignored). Cookie name moves to
ZB.MOM.WW.OtOpcUa.Auth; existing sessions get a one-time forced sign-out.
EOF
)"
```
### Output report
- Net LOC change (additions / deletions)
- Build clean
- Test count run / passed
- Commit SHA
- Anything unexpected
### Self-review checklist
- [ ] `using Microsoft.AspNetCore.Authentication.JwtBearer;` removed
- [ ] `ConfigureJwtBearerFromTokenService` class deleted
- [ ] `.AddJwtBearer(...)` call deleted
- [ ] `IPostConfigureOptions<JwtBearerOptions>` singleton registration deleted
- [ ] `OnRedirectToLogin` and `OnRedirectToAccessDenied` overrides deleted
- [ ] `LoginPath = "/login"` and `LogoutPath = "/auth/logout"` added inside `.AddCookie(...)`
- [ ] PostConfigure block added consuming `OtOpcUaCookieOptions`
- [ ] Warning log fires when `RequireHttpsCookie == false`
- [ ] `FallbackPolicy` now takes only `CookieAuthenticationDefaults.AuthenticationScheme`
- [ ] `DriverOperator` policy unchanged
- [ ] Test assertion updated to `ZB.MOM.WW.OtOpcUa.Auth=`
- [ ] `dotnet test tests/Server/.../Security.Tests/` all green
---
## Task 3 — Add browser-vs-AJAX challenge tests
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 4
**Files:**
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/AuthEndpointsIntegrationTests.cs` (append 3 new test methods + 1 helper)
**Implements design:** Section 5 "Tests added" + Section 4 "Auth challenge for unknown content type".
### Context for the implementer
`AuthEndpointsIntegrationTests` is `IAsyncLifetime`-backed and stands up a `TestServer` with `MapOtOpcUaAuth()` mounted (line 66). The `web.UseEndpoints(e => e.MapOtOpcUaAuth())` call wires ONLY the four `/auth/*` endpoints; there is NO root `MapGet("/", ...)` registered. An anonymous GET to `/` therefore matches no endpoint and falls through to a 404. The fallback authorization policy only applies to matched endpoints, so the cookie scheme never gets a chance to challenge.
**The test harness needs a protected root endpoint.** Add one in `InitializeAsync` inside the `web.UseEndpoints(...)` callback. Then the 3 new tests will exercise the cookie scheme's challenge for that protected route.
### Step 1: Modify the test host setup
In `AuthEndpointsIntegrationTests.cs`, change `web.UseEndpoints(...)` (around line 66) from:
```csharp
app.UseEndpoints(e => e.MapOtOpcUaAuth());
```
to:
```csharp
app.UseEndpoints(e =>
{
e.MapOtOpcUaAuth();
// Protected root used by AuthChallengeTests below — exercises the cookie
// scheme's challenge heuristic without depending on the full Razor host.
e.MapGet("/", () => Results.Ok("authenticated")).RequireAuthorization();
});
```
### Step 2: Add the three new test methods
Append at the bottom of the class (before the closing brace), keeping the file's existing summary style and using `TestContext.Current.CancellationToken` via the existing `Ct` property:
```csharp
/// <summary>Anonymous browser GET of a protected route redirects to /login with a ReturnUrl.</summary>
[Fact]
public async Task Root_anonymous_browser_GET_redirects_to_login()
{
var client = NewClientNoRedirect();
var req = new HttpRequestMessage(HttpMethod.Get, "/");
req.Headers.Accept.ParseAdd("text/html");
var resp = await client.SendAsync(req, Ct);
resp.StatusCode.ShouldBe(HttpStatusCode.Found);
resp.Headers.Location.ShouldNotBeNull();
resp.Headers.Location!.OriginalString.ShouldContain("/login");
resp.Headers.Location.OriginalString.ShouldContain("ReturnUrl");
}
/// <summary>Anonymous AJAX GET of a protected route returns 401 with no Location.</summary>
[Fact]
public async Task Root_anonymous_ajax_GET_returns_401()
{
var client = NewClientNoRedirect();
var req = new HttpRequestMessage(HttpMethod.Get, "/");
req.Headers.Add("X-Requested-With", "XMLHttpRequest");
var resp = await client.SendAsync(req, Ct);
resp.StatusCode.ShouldBe(HttpStatusCode.Unauthorized);
resp.Headers.Location.ShouldBeNull();
}
/// <summary>Anonymous JSON GET of a protected route returns 401.</summary>
[Fact]
public async Task Root_anonymous_json_GET_returns_401()
{
var client = NewClientNoRedirect();
var req = new HttpRequestMessage(HttpMethod.Get, "/");
req.Headers.Accept.ParseAdd("application/json");
var resp = await client.SendAsync(req, Ct);
resp.StatusCode.ShouldBe(HttpStatusCode.Unauthorized);
}
```
### Step 3: Add the no-redirect client helper
Right next to the existing `NewClient()` method (line 82):
```csharp
/// <summary>Creates a TestServer-backed HttpClient that does NOT auto-follow redirects.
/// Used by challenge tests so we can assert on the 302 / Location directly.</summary>
private HttpClient NewClientNoRedirect() => new(_server.CreateHandler())
{
BaseAddress = _server.BaseAddress,
};
```
### Step 4: Run the tests
```bash
cd /Users/dohertj2/Desktop/OtOpcUa
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/
```
Expected: existing 5 tests still pass + 3 new tests pass = 8+ total green.
**If `Root_anonymous_browser_GET_redirects_to_login` returns 200 instead of 302**: HttpClient is still auto-following redirects. Two fixes to try in order:
1. Confirm `NewClientNoRedirect` uses `_server.CreateHandler()` (not `CreateClient()`).
2. If it still follows redirects, note that a plain `HttpClientHandler { AllowAutoRedirect = false }` is not a drop-in fix: it performs real network I/O and bypasses TestServer's in-memory transport. The `CreateHandler()` path SHOULD already return a non-redirecting handler; if it doesn't, the implementation may need a `DelegatingHandler` wrapper around it.
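**If `Root_anonymous_browser_GET_redirects_to_login` returns 401 instead of 302**: the cookie scheme is not treating the request as a browser navigation. Inspect Task 2's changes: `OnRedirectToLogin` may not have been fully removed, `LoginPath` may not have been set, or the `Accept` header is being handled unexpectedly. Look at the response body; if it's empty + 401, the JwtBearer scheme or the override is still in play.
**If the AJAX or JSON test fails instead**: check what the framework's default `OnRedirectToLogin` handler keys on before blaming Task 2. The stock `CookieAuthenticationEvents` handler detects AJAX via `X-Requested-With: XMLHttpRequest` only, and its 401 response still carries a `Location` header. If that holds for the target framework version, `Accept: application/json` alone yields a 302, and the `Location.ShouldBeNull()` assertion in the AJAX test fails. Report either outcome under "Anything unexpected".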
### Step 5: Commit
```bash
git -C /Users/dohertj2/Desktop/OtOpcUa add tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/AuthEndpointsIntegrationTests.cs
git -C /Users/dohertj2/Desktop/OtOpcUa commit -m "test(security): add browser-vs-AJAX challenge tests for root path"
```
### Output report
- 3 new tests + 1 helper + modified InitializeAsync
- Build clean
- Test count: existing N + 3 new = N+3 green
- Commit SHA
- Anything unexpected (e.g. redirect-following behavior of `_server.CreateHandler()`)
### Self-review checklist
- [ ] `MapGet("/", ...).RequireAuthorization()` added inside `web.UseEndpoints(...)`
- [ ] `NewClientNoRedirect()` helper added
- [ ] 3 new `[Fact]` methods added with `TestContext.Current.CancellationToken` via the `Ct` property
- [ ] Each test asserts on the exact status + Location header (or absence)
- [ ] All tests green
- [ ] Existing 5 tests still pass
---
## Task 4 — Remove `Microsoft.AspNetCore.Authentication.JwtBearer` package reference
**Classification:** trivial
**Estimated implement time:** ~2 min
**Parallelizable with:** Task 3
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Security/ZB.MOM.WW.OtOpcUa.Security.csproj` (delete one line)
- Verify: `Directory.Packages.props`: leave the `<PackageVersion Include="Microsoft.AspNetCore.Authentication.JwtBearer" ... />` entry in place unless Step 3 shows that no other project consumes it.
**Implements design:** Section 2 "Package references" + Section 6 phase 4.
### Step 1: Confirm no remaining consumer in the Security project
```bash
grep -rn "Microsoft\.AspNetCore\.Authentication\.JwtBearer\|JwtBearer" \
/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Security/ \
--include="*.cs"
```
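Expected: zero matches in code (Task 2 removed all uses). Comment-only hits do not count: the new class summary in `ServiceCollectionExtensions.cs` deliberately says "no JwtBearer parallel scheme" and still matches this pattern. If any non-comment line matches, STOP and report; Task 2 was incomplete.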
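The Step 1 check can also be run so that comment lines are ignored, since the target file's class summary mentions JwtBearer in prose. This is a minimal sketch, not one of the plan's required steps: `jwtbearer_code_hits` is a hypothetical helper name, and it assumes the project only uses `//` and `///` line comments (block comments would need extra handling).

```bash
# Print non-comment lines under a directory that mention JwtBearer.
# Hypothetical helper: not part of the repo.
jwtbearer_code_hits() {
  # grep -rn prints "path:line:content"; drop entries whose content starts with //
  grep -rn --include='*.cs' 'JwtBearer' "$1" | grep -Ev ':[0-9]+:[[:space:]]*//'
}

# Usage:
#   jwtbearer_code_hits src/Server/ZB.MOM.WW.OtOpcUa.Security/
```

An empty result means only comments (or nothing) still mention the scheme; any printed line is a real leftover reference.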
### Step 2: Remove the PackageReference
In `src/Server/ZB.MOM.WW.OtOpcUa.Security/ZB.MOM.WW.OtOpcUa.Security.csproj`, find this line (currently around line 13):
```xml
<PackageReference Include="Microsoft.AspNetCore.Authentication.JwtBearer"/>
```
Delete it. **Keep** these:
```xml
<PackageReference Include="Microsoft.IdentityModel.Tokens"/>
<PackageReference Include="System.IdentityModel.Tokens.Jwt"/>
```
(`JwtTokenService` consumes those for `TokenValidationParameters` + JWT creation respectively — they're not from the JwtBearer authentication package.)
### Step 3: Check whether ANY other project still references the package
```bash
grep -rn "Microsoft\.AspNetCore\.Authentication\.JwtBearer" \
/Users/dohertj2/Desktop/OtOpcUa/src/ /Users/dohertj2/Desktop/OtOpcUa/tests/ \
--include="*.csproj"
```
If zero results: also remove the `<PackageVersion Include="Microsoft.AspNetCore.Authentication.JwtBearer" ...>` line from `Directory.Packages.props` (search for it). If one or more other projects still reference it, leave `Directory.Packages.props` alone.
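The keep-or-remove decision can be expressed as a small function. This is a sketch under the assumption that a literal match of the package name in any `.csproj` counts as a consumer; `decide_package_version` is a hypothetical name, and the actual edit to `Directory.Packages.props` is still done by hand.

```bash
# Print "keep" if any .csproj under the given roots references the package,
# otherwise "remove". Hypothetical helper: not part of the repo.
decide_package_version() {
  pkg='Microsoft.AspNetCore.Authentication.JwtBearer'
  # -F: literal match, -l: list file names only, -r: recurse into the roots
  if grep -rlF --include='*.csproj' "$pkg" "$@" >/dev/null 2>&1; then
    echo keep
  else
    echo remove
  fi
}

# Usage (run after Step 2 has edited Security.csproj):
#   decide_package_version src/ tests/
```

Running it after Step 2 matters: before the PackageReference is deleted, the Security project itself still counts as a consumer and the answer is always `keep`.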
### Step 4: Restore + build
```bash
cd /Users/dohertj2/Desktop/OtOpcUa
dotnet restore src/Server/ZB.MOM.WW.OtOpcUa.Security/
dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Security/
dotnet build ZB.MOM.WW.OtOpcUa.slnx
```
Expected: 0 NEW errors. The known pre-existing 12 errors (OpcUaServer.Tests + Runtime.Tests + AbLegacy.Cli + S7.Cli) remain unchanged.
### Step 5: Commit
```bash
git -C /Users/dohertj2/Desktop/OtOpcUa add \
src/Server/ZB.MOM.WW.OtOpcUa.Security/ZB.MOM.WW.OtOpcUa.Security.csproj \
Directory.Packages.props # only if you also removed it from Directory.Packages.props
git -C /Users/dohertj2/Desktop/OtOpcUa commit -m "chore(security): drop Microsoft.AspNetCore.Authentication.JwtBearer (unused)"
```
If only the csproj changed: omit `Directory.Packages.props` from the add.
### Output report
- Was Directory.Packages.props also touched? Justify based on whether other projects still reference the package.
- Build clean (0 new errors)
- Commit SHA
### Self-review checklist
- [ ] Confirmed zero non-comment `Microsoft.AspNetCore.Authentication.JwtBearer` or `JwtBearer` matches in `src/Server/ZB.MOM.WW.OtOpcUa.Security/**/*.cs` before deletion
- [ ] PackageReference removed from Security.csproj
- [ ] `Microsoft.IdentityModel.Tokens` and `System.IdentityModel.Tokens.Jwt` kept
- [ ] Directory.Packages.props touched ONLY if no other project consumes the package
- [ ] Full solution build adds zero new errors
---
## Task 5 — Manual smoke + final commit
**Classification:** trivial
**Estimated implement time:** ~3 min
**Parallelizable with:** none
**Files:** none (verification + optional cleanup commit)
**Implements design:** Section 5 "Manual smoke" + Section 6 phase 5.
### Step 1: Restart the docker-dev cluster
The admin nodes need to pick up the new cookie-only auth wiring AND the new cookie name. Because the in-cluster admin processes are still running a prior build, force a rebuild + recreate:
```bash
cd /Users/dohertj2/Desktop/OtOpcUa
docker compose -f docker-dev/docker-compose.yml up -d --build admin-a admin-b
```
Wait ~15 s for warm-up. Then:
```bash
docker compose -f docker-dev/docker-compose.yml ps admin-a admin-b
```
Both should show `Up` and `(healthy)` (or `Up` if no healthcheck).
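A fixed 15 s wait can be replaced by polling until the node answers. The following is a sketch only: `wait_until` is a hypothetical helper, and the example probe assumes the admin node serves `/login` anonymously on port 9200, which the plan implies but does not state.

```bash
# Retry a command until it succeeds or the attempt budget runs out,
# sleeping after each failed attempt.
# wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (assumed probe URL):
#   wait_until 30 1 curl -fsS -o /dev/null http://localhost:9200/login
```

The function returns non-zero when the budget is exhausted, so a failed warm-up stops a scripted smoke run instead of producing misleading curl results.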
### Step 2: curl smoke
```bash
# Anonymous browser-shaped GET → 302 to /login with ReturnUrl
curl -i -H "Accept: text/html" http://localhost:9200/ 2>&1 | head -12
# Expected: HTTP/1.1 302 Found, Location: /login?ReturnUrl=%2F
# Anonymous AJAX GET → 401
curl -i -H "X-Requested-With: XMLHttpRequest" http://localhost:9200/ 2>&1 | head -8
# Expected: HTTP/1.1 401 Unauthorized
# Anonymous JSON GET → 401
curl -i -H "Accept: application/json" http://localhost:9200/ 2>&1 | head -8
# Expected: HTTP/1.1 401 Unauthorized
# Login form → 302 with Set-Cookie ZB.MOM.WW.OtOpcUa.Auth
curl -i -X POST -d "username=alice&password=alice" \
-H "Content-Type: application/x-www-form-urlencoded" \
http://localhost:9200/auth/login 2>&1 | head -15
# Expected: HTTP/1.1 302 Found, Set-Cookie: ZB.MOM.WW.OtOpcUa.Auth=... (the test stub user may differ — check docker-compose's GLAuth seed for a valid LDAP creds pair)
```
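The expected statuses in the smoke commands can be checked mechanically rather than by eye. This is a minimal sketch with two hypothetical helpers, `http_status` and `http_location`, that parse the response head printed by `curl -i`; they are not part of the repo.

```bash
# Read a raw HTTP response head (as printed by `curl -si`) on stdin
# and print the status code from the first line.
http_status() {
  awk 'NR == 1 { print $2 }'
}

# Print the value of the Location header, if present.
# Header names are case-insensitive, so compare in lower case.
http_location() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2 }'
}

# Usage:
#   curl -si -H "Accept: text/html" http://localhost:9200/ | http_status
#   curl -si -H "Accept: text/html" http://localhost:9200/ | http_location
```

Comparing the printed status against the value in each `# Expected:` comment turns the four checks into pass/fail results that can go straight into the output report.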
### Step 3: Chrome smoke (via the macbook browser instance from earlier in the session)
1. Open `http://localhost:9200/` — should redirect to `/login?ReturnUrl=%2F` (not Chrome's error page)
2. Sign in via the form
3. DevTools → Application → Cookies → confirm cookie name is `ZB.MOM.WW.OtOpcUa.Auth`
4. Navigate to `http://localhost:9200/` again — should render the AdminUI dashboard
5. Click logout → confirm redirect back to `/login`
### Step 4: Optional CLAUDE.md update
If `CLAUDE.md` mentions the old `OtOpcUa.Auth` cookie name anywhere, update to the new `ZB.MOM.WW.OtOpcUa.Auth`. Run:
```bash
grep -n "OtOpcUa\.Auth" /Users/dohertj2/Desktop/OtOpcUa/CLAUDE.md
```
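If any match is the old name, update it; otherwise skip. Note that this pattern also matches the new `ZB.MOM.WW.OtOpcUa.Auth`, so hits that already carry the full new name need no change.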
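Because the new cookie name ends with the old one, a stricter pattern helps separate stale references from already-updated ones. A sketch, assuming the old name never legitimately appears directly after a `.`; `old_cookie_refs` is a hypothetical helper name.

```bash
# Print lines that mention the old cookie name "OtOpcUa.Auth" on its own,
# i.e. not as the tail of "ZB.MOM.WW.OtOpcUa.Auth" (always preceded by ".").
old_cookie_refs() {
  grep -nE '(^|[^.])OtOpcUa\.Auth' "$1"
}

# Usage:
#   old_cookie_refs CLAUDE.md
```

An empty result after the update confirms that no stale reference remains, which the plain grep cannot show.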
### Step 5: Final commit (only if Step 4 changed CLAUDE.md)
```bash
git -C /Users/dohertj2/Desktop/OtOpcUa add CLAUDE.md
git -C /Users/dohertj2/Desktop/OtOpcUa commit -m "docs: update cookie name reference in CLAUDE.md"
```
### Output report
- All 4 curl smoke checks passed?
- Chrome smoke passed?
- CLAUDE.md changed?
- Final SHA on master (if any docs commit)
- Commit count since this plan started (vs `bc4fce5`)
### Self-review checklist
- [ ] `docker compose up -d --build admin-a admin-b` succeeded
- [ ] All 4 curl smoke checks return expected status codes
- [ ] Chrome smoke shows redirect to `/login`, then dashboard after auth
- [ ] Cookie name in DevTools matches `ZB.MOM.WW.OtOpcUa.Auth`
- [ ] No changes left uncommitted in the working tree
---
## Verification gates (apply at end of every task)
- `dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Security/` — 0 errors
- `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests/` — all green (existing + new)
- `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — no NEW errors beyond the 12 pre-existing
- No untracked files staged accidentally (especially `sql_login.txt`, `pki/`, doc-fix artifacts)
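The last gate can be enforced with a filter over the staged file list. This sketch covers only the two names the plan spells out (`sql_login.txt` and `pki/`); the doc-fix artifacts are not named in the plan, so they are left out rather than guessed. `forbidden_staged` is a hypothetical helper.

```bash
# Read staged paths on stdin and print any that must not be committed.
# Returns non-zero when at least one offending path is found.
forbidden_staged() {
  if grep -E '(^|/)sql_login\.txt$|(^|/)pki/'; then
    return 1
  fi
  return 0
}

# Usage:
#   git diff --cached --name-only | forbidden_staged
```

Run it just before each task's commit step; any printed path should be unstaged with `git restore --staged <path>` before committing.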
---
## Risk hot-spots for reviewers
1. **TestServer's no-redirect HttpClient.** The plan assumes `new HttpClient(_server.CreateHandler()) { BaseAddress = _server.BaseAddress }` does NOT auto-follow redirects. If it does, the `Root_anonymous_browser_GET_redirects_to_login` test fails with 200 instead of 302. Fix path documented in Task 3 Step 4.
2. **Framework default of `Accept: */*` → 302.** Curl's default Accept header is `*/*`, which the framework classifies as browser → 302. Documented behavior, mirrors ScadaBridge; reviewers should not flag the smoke step that uses `Accept: text/html` as redundant — it's the explicit "browser" assertion.
3. **Cookie rename invalidates sessions.** The deploy effectively logs every currently-signed-in user out. Document in commit body; the cluster was just restarted on the new API key anyway, so the timing is opportune.
4. **`Directory.Packages.props` change is conditional.** Don't touch it if other projects still consume the JwtBearer package. Task 4 has explicit grep guard.
5. **`/Account/AccessDenied` 404.** Authenticated users hitting a `DriverOperator`-only route now get a generic 404 page instead of a clean access-denied message. Documented design choice; follow-up to add a Razor page if UX feedback demands it.
@@ -0,0 +1,11 @@
{
"planPath": "docs/plans/2026-05-29-auth-alignment-plan.md",
"tasks": [
{"id": 1, "subject": "Task 1: Extend OtOpcUaCookieOptions", "status": "pending"},
{"id": 2, "subject": "Task 2: Rewrite auth wiring + update cookie-name assertion", "status": "pending", "blockedBy": [1]},
{"id": 3, "subject": "Task 3: Add browser-vs-AJAX challenge tests", "status": "pending", "blockedBy": [2]},
{"id": 4, "subject": "Task 4: Remove JwtBearer package reference", "status": "pending", "blockedBy": [2]},
{"id": 5, "subject": "Task 5: Manual smoke + final commit", "status": "pending", "blockedBy": [3, 4]}
],
"lastUpdated": "2026-05-29T00:00:00Z"
}
+106
@@ -0,0 +1,106 @@
# Alarms D.1 — smoke artifact
> **Status (2026-05-29): alarm-source leg VERIFIED. Historian-write leg still
> pending the Windows sidecar + live AVEVA Historian.**
>
> **Re-confirmed 2026-05-31** against the same gateway (`http://10.100.0.48:5120`):
> the Skip-gated live test passed again, pulling a native `Raise` transition
> (`Galaxy!TestArea.TestMachine_001.TestAlarm001`, raw sev 500 → OPC UA 750/High,
> category `TestArea`, operator comment `Test alarm #1`) through the production
> consumer. Independent re-run, not the original capture.
>
> This is the D.1 deliverable called for by `docs/plans/alarms-worker-wiring-plan.md`
> — captured evidence that a live Galaxy alarm reaches lmxopcua through the native
> gateway path (not the sub-attribute fallback). It supersedes the "A.2 blocked"
> banners in `alarms-over-gateway.md` / `alarms-worker-wiring-plan.md`, which were
> written 2026-04-30 before the gateway's alarm feed was working.
## What was verified
The mxaccessgw gateway **does** serve native MxAccess alarms today, and the lmxopcua
consumer ingests them with full fidelity — **including operator-comment**, the field
the 2026-04-30 plan flagged as "the only v1 regression."
Verified from the macOS dev box against the live gateway at `http://10.100.0.48:5120`
(reachable; `nc -z` succeeds). No acknowledge / no writes were issued — read-only
`StreamAlarms`.
### 1. Gateway boundary — raw `StreamAlarms` (`ZB.MOM.WW.MxGateway.Client`)
A standalone client streamed the active-alarm snapshot: **20 active alarms**, each
carrying native metadata. Sample (one of 20):
```json
{ "alarmFullReference": "Galaxy!TestArea.TestMachine_001.TestAlarm001",
"sourceObjectReference": "TestMachine_001.TestAlarm001",
"alarmTypeName": "DSC", "severity": 500,
"currentState": "ALARM_CONDITION_STATE_ACTIVE", "category": "TestArea",
"lastTransitionTimestamp": "2026-05-24T16:04:10.856Z",
"operatorComment": "Test alarm #1" }
```
Followed by the `SnapshotComplete` marker. `operatorComment`, `category`, `severity`,
`currentState`, and `lastTransitionTimestamp` are all populated.
### 2. lmxopcua consumer — `GatewayGalaxyAlarmFeed` → `GalaxyAlarmTransition`
The Skip-gated live test
`Runtime/GatewayGalaxyAlarmFeedLiveTests.Live_gateway_delivers_native_alarm_transitions_through_the_consumer`
wires the real `MxGatewayClient.StreamAlarmsAsync` into the production consumer seam
and **passes**. Captured output (`D1_SMOKE_OUT`):
```
# consumer transitions observed: 2+
Raise Galaxy!TestArea.TestMachine_001.TestAlarm001 | sev=750(High) raw=500 | cat=TestArea | comment='Test alarm #1' | xitionUtc=2026-05-24T16:04:10.856Z
Raise Galaxy!TestArea.TestMachine_003.TestAlarm001 | sev=750(High) raw=500 | cat=TestArea | comment='Test alarm #1' | xitionUtc=2026-05-07T18:14:00.594Z
```
The consumer preserves `operatorComment` + `category` + transition timestamp and
applies the OPC UA severity-bucket mapping (`MxAccessSeverityMapper`: raw 500 →
OPC UA 750, bucket `High`).
### 3. Full chain to the OPC UA Part 9 surface (code-path verified)
`GalaxyDriver.OnAlarmFeedTransition` maps `GalaxyAlarmTransition` →
`AlarmEventArgs`, carrying `OperatorComment`, `OriginalRaiseTimestampUtc`,
`AlarmCategory`, and the severity bucket onto `IAlarmSource.OnAlarmEvent`.
`AlarmEventArgs` already declares those fields — so the **E.7 contract extension is
done**, not pending. The server's Part-9 condition layer consumes `IAlarmSource`
via `AlarmSurfaceInvoker` → `GenericDriverNodeManager`. Unit coverage:
`GalaxyDriverAlarmSourceTests`, `GatewayGalaxyAlarmFeedTests`.
## How to re-run
```bash
export MXGW_ENDPOINT="http://10.100.0.48:5120"
export GALAXY_MXGW_API_KEY="<dev key from docker-dev/docker-compose.yml>"
export D1_SMOKE_OUT="/tmp/d1-consumer-transitions.txt" # optional capture
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests \
--filter "FullyQualifiedName~GatewayGalaxyAlarmFeedLiveTests"
```
Without the env vars the test `Skip`s, so normal `dotnet test` runs are unaffected.
## Not covered here (still open)
1. **Scripted-alarm historian write-back → AVEVA Historian** (C.1's live leg). The
`SdkAlarmHistorianWriteBackend` (real `HistorianAccess.AddStreamedValue` path) is
implemented and unit-tested, but its `Live_*` write smoke needs the Windows
historian sidecar + a live AVEVA Historian — neither reachable from the macOS dev
box. Capture this leg on the Windows parity rig.
2. **Running-server → OPC UA A&C client round-trip.** This artifact proves the driver
consumer end; it does not exercise a full OtOpcUa server surfacing the condition to
an OPC UA client, because the docker-dev stack stubs the Galaxy driver on Linux
(`DriverInstanceActor.ShouldStub`). Capture on the Windows parity rig (or a Linux
host with `ShouldStub` overridden to point the real driver at the gateway).
## Mechanism — true MxAccess alarm-event support
The gateway delivers these alarms via **true MxAccess alarm-event support** in the
mxaccessgw .NET client — a real alarm-event subscription, **not** the value-driven
sub-attribute fallback. (Confirmed by the gateway maintainer; the client-side stream
check above can only observe the resulting feed, which is why this artifact records the
mechanism here rather than inferring it.) So A.2 is implemented as originally specified:
`MX_EVENT_FAMILY_ON_ALARM_TRANSITION` carries genuine native alarm-event metadata, and
the operator-comment / original-raise-time / category fields are first-class — not
reconstructed from attribute reads.
+33 -16
@@ -9,24 +9,41 @@
> the new RPCs; the sub-attribute fallback path keeps Galaxy alarms
> functional today.
>
> ⚠️ **Worker-side native alarm subscription blocked on a dev-rig
> finding (2026-04-30):** the MXAccess COM Toolkit at
> **UPDATE 2026-05-29 — native alarm feed VERIFIED working; the
> 2026-04-30 "blocked" finding below is superseded.** A live
> `StreamAlarms` check against the gateway at `10.100.0.48:5120`
> returned the active-alarm snapshot (20 alarms) with full native
> metadata — `severity`, `category`, `currentState`,
> `lastTransitionTimestamp`, **and `operatorComment`** (the field the
> note below called "the only v1 regression"). The lmxopcua consumer
> (`GatewayGalaxyAlarmFeed` → `GalaxyAlarmTransition` →
> `AlarmEventArgs` → `IAlarmSource`) ingests it with full fidelity and
> the OPC UA severity-bucket mapping applied — proven by the passing
> Skip-gated live test `GatewayGalaxyAlarmFeedLiveTests`. `AlarmEventArgs`
> already carries operator-comment / original-raise-time / category, so
> **E.7 is done too**. See `docs/plans/alarms-d1-smoke-artifact.md` for
> the captured evidence. The gateway delivers this via **true MxAccess
> alarm-event support** in the mxaccessgw .NET client (a real
> alarm-event subscription — **not** the sub-attribute fallback), so A.2
> is implemented as originally specified. Still open: the scripted-alarm
> → AVEVA Historian write-back live smoke (C.1's `Live_*` leg) and a full
> running-server → OPC UA A&C round-trip — both need the Windows parity rig.
>
> ⚠️ **[SUPERSEDED — kept for history] Worker-side native alarm
> subscription blocked on a dev-rig finding (2026-04-30):** the MXAccess
> COM Toolkit at
> `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`
> exposes no alarm-event family — only `OnDataChange`,
> `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange`.
> exposed no alarm-event family — only `OnDataChange`,
> `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange` — and
> AVEVA's `aaAlarmManagedClient` / `ArchestrAAlarmsAndEvents.SDK`
> assemblies are x64-only and incompatible with the worker's x86
> bitness. **Operator decision needed before
> `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` carries any events:** either
> accept the value-driven sub-attribute path as the production
> architecture (operator-comment fidelity is the only v1 regression)
> or add an x64 alarm-helper sub-process alongside the worker. See
> `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs` in the
> mxaccessgw repo for the architectural notes. Live
> `aahClientManaged` alarm-event write call site
> (`SdkAlarmHistorianWriteBackend` placeholder from PR C.1) and the
> D.1 smoke artifact ship once those decisions resolve. The
> remainder of this document is preserved as the design record.
> assemblies are x64-only vs. the worker's x86 bitness. The operator
> decision (accept the value-driven sub-attribute path, or add an x64
> alarm-helper sub-process) has since been resolved on the gateway side:
> `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` now carries events (verified
> above). The C.1 `SdkAlarmHistorianWriteBackend` is **no longer a
> placeholder** — it writes through the real
> `HistorianAccess.AddStreamedValue` path (only its live-rig write
> smoke remains).
Coordinated epic across two repos:
+27 -10
@@ -1,5 +1,18 @@
# Alarms Worker Wiring Plan
> ✅ **UPDATE 2026-05-29 — the blocker below is RESOLVED on the gateway side; this
> plan is largely complete.** A live `StreamAlarms` check against `10.100.0.48:5120`
> returns the active-alarm snapshot with full native metadata **including
> `operatorComment`**, and the lmxopcua consumer ingests it end-to-end (passing live
> test `GatewayGalaxyAlarmFeedLiveTests`). So **A.2 / A.3 / A.4** are functionally done
> at the gateway boundary (the worker now emits native alarm transitions and the client
> exposes `AcknowledgeAlarm` / `QueryActiveAlarms` RPCs). **C.1** ships real code
> (`SdkAlarmHistorianWriteBackend` → `HistorianAccess.AddStreamedValue`). **D.1**'s
> alarm-source leg is captured in `docs/plans/alarms-d1-smoke-artifact.md`. Only two
> things remain, both needing the Windows parity rig: C.1's live historian-write smoke
> and a full running-server → OPC UA A&C round-trip. The per-item detail below is kept
> as the historical record of the original blocked state.
>
> **Context**: The alarms-over-gateway epic shipped 19 PRs across the
> `lmxopcua` and `mxaccessgw` repos (merged 2026-04-30). Contracts are live;
> the sub-attribute fallback path keeps Galaxy alarms functional today. Four
@@ -16,7 +29,7 @@
---
## Dev-rig finding that blocks everything (2026-04-30)
## Dev-rig finding that blocks everything (2026-04-30) — [SUPERSEDED 2026-05-29]
During PR A.2 work the following was discovered on the dev box:
@@ -318,16 +331,20 @@ fallback as production).
## Summary of blocks
| Item | Blocked by | Estimated effort once unblocked |
|------|-----------|--------------------------------|
| A.2 | Architectural decision (x64 alarm-helper vs. sub-attribute fallback as production) | 2–3 days implementation; 1 day tests |
| A.3 | A.2 delivering WorkerEvent bodies | 1–2 days |
| A.4 | A.2 (active-alarm query needs AlarmClient session) | 1 day |
| C.1 | aahClientManaged SDK access (available on dev box); NOT blocked by A.2 | 1–2 days |
| D.1 | A.2 + A.3 + C.1 all passing on parity rig | 0.5 day (smoke + artifact capture) |
> **Resolved as of 2026-05-29** — see the update banner at the top and
> `docs/plans/alarms-d1-smoke-artifact.md`. Original status table kept for history.
C.1 can proceed in parallel with A.2 / A.3 since the sidecar's `aahClientManaged`
is x64 and does not share the worker bitness constraint.
| Item | Status (2026-05-29) | Original block |
|------|--------------------|----------------|
| A.2 | ✅ **True MxAccess alarm-event support** in the gateway client (real alarm-event subscription, not the sub-attribute fallback); verified via live `StreamAlarms` with operator-comment fidelity | Architectural decision (x64 alarm-helper vs. sub-attribute fallback) |
| A.3 | ✅ Dispatch + `AcknowledgeAlarm` RPC present on the client surface | A.2 delivering WorkerEvent bodies |
| A.4 | ✅ `QueryActiveAlarms` RPC present on the client surface | A.2 (active-alarm query needs AlarmClient session) |
| C.1 | ✅ Code shipped (`AddStreamedValue` path); ⏳ live historian-write smoke needs the Windows rig | aahClientManaged SDK access |
| D.1 | ◑ Alarm-source leg captured (`alarms-d1-smoke-artifact.md`); ⏳ historian-write leg + full server→A&C round-trip need the Windows rig | A.2 + A.3 + C.1 all passing on parity rig |
The gateway delivers operator-comment fidelity through **true MxAccess alarm-event
support** in the mxaccessgw .NET client — a real alarm-event subscription, not the
value-driven sub-attribute path. The sub-attribute fallback is now legacy.
---
@@ -1,5 +1,19 @@
# Security
> **v2 status (2026-05-26).** The four security concerns below are unchanged in v2.
> Paths + project names moved: `OtOpcUa.Server/Security/` → `OtOpcUa.Security/`
> (`Ldap/`, `Jwt/`, `Endpoints/AuthEndpoints.cs`), `OtOpcUa.Admin` is gone (its
> auth + role-grant pages live in `OtOpcUa.AdminUI`), and Admin auth policies
> register in `OtOpcUa.Host/Program.cs` via `AddOtOpcUaAuth` rather than in a
> separate Admin process. The v2 `Security:Jwt` section adds JWT bearer auth
> alongside the existing cookie scheme (`AddJwtBearer` wired via
> `IPostConfigureOptions<JwtBearerOptions>` in `OtOpcUa.Security`). DataProtection
> keys persist to the shared `ConfigDb.DataProtectionKeys` table so cookies
> survive failover between admin-role nodes.
>
> See `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §5 for the v2
> auth + DataProtection rationale.
OtOpcUa has four independent security concerns. This document covers all four:
1. **Transport security** — OPC UA secure channel (signing, encryption, X.509 trust).
@@ -95,7 +109,7 @@ The Server accepts three OPC UA identity-token types:
| Token | Handler | Notes |
|---|---|---|
| Anonymous | `IUserAuthenticator.AuthenticateAsync(username: "", password: "")` | Refused in strict mode unless explicit anonymous grants exist; allowed in lax mode for backward compatibility. |
| UserName/Password | `LdapUserAuthenticator` (`src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapUserAuthenticator.cs`) | LDAP bind + group lookup; resolved `LdapGroups` flow into the session's identity bearer (`ILdapGroupsBearer`). |
| UserName/Password | `LdapOpcUaUserAuthenticator` (`src/Server/ZB.MOM.WW.OtOpcUa.Host/OpcUa/LdapOpcUaUserAuthenticator.cs`, backed by `LdapAuthService` at `src/Server/ZB.MOM.WW.OtOpcUa.Security/Ldap/LdapAuthService.cs`) | LDAP bind + group lookup; resolved `LdapGroups` flow into the session's identity bearer (`ILdapGroupsBearer`). |
| X.509 Certificate | Stack-level acceptance + role mapping via CN | X.509 identity carries `AuthenticatedUser` + read roles; finer-grain authorization happens through the data-plane ACLs. |
### LDAP bind flow (`LdapUserAuthenticator`)
@@ -207,20 +221,16 @@ The three Write tiers map to Galaxy's v1 `SecurityClassification` — `FreeAcces
`NodeScope` carries `(ClusterId, NamespaceId, AreaId, LineId, EquipmentId, TagId)`; any suffix may be null — a tag-level ACL is more specific than an area-level ACL but both contribute via union.
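The union rule can be illustrated with a minimal sketch. The `Permission` flags, the `Grant` record, and the `AclUnion.Effective` helper are hypothetical names for illustration only, not the types used by the real evaluator. The sketch assumes a grant applies whenever its non-null leading scope segments match the target scope.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flag set for illustration; not the real permission names.
[Flags]
public enum Permission { None = 0, Read = 1, Write = 2, AckAlarm = 4 }

// Hypothetical grant: Path holds the non-null leading segments of a scope,
// e.g. ["c1", "ns1", "area1"] for an area-level ACL.
public sealed record Grant(string[] Path, Permission Allowed);

public static class AclUnion
{
    // True when every segment of the grant path matches the target path.
    private static bool IsPrefix(string[] grantPath, string[] target)
    {
        if (grantPath.Length > target.Length) return false;
        for (var i = 0; i < grantPath.Length; i++)
            if (!string.Equals(grantPath[i], target[i], StringComparison.Ordinal))
                return false;
        return true;
    }

    // Applicable grants are OR-ed together: additive-only, no Deny.
    public static Permission Effective(IEnumerable<Grant> grants, string[] target)
    {
        var result = Permission.None;
        foreach (var g in grants)
            if (IsPrefix(g.Path, target))
                result |= g.Allowed;
        return result;
    }
}
```

With an area-level grant of `Read` and a tag-level grant of `Write`, the tag resolves to `Read | Write`, while a sibling tag in the same area resolves to `Read` only.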
### Dispatch gate — `AuthorizationGate`
### Dispatch gate — `IPermissionEvaluator`
`src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs` bridges the OPC UA stack's `ISystemContext.UserIdentity` to the evaluator. `DriverNodeManager` holds exactly one reference to it and calls `IsAllowed(identity, OpcUaOperation.*, NodeScope)` on every Read, Write, HistoryRead, Browse, Subscribe, AckAlarm, Call path. A false return short-circuits the dispatch with `BadUserAccessDenied`.
`IPermissionEvaluator.Authorize(session, operation, scope)` (default impl `TriePermissionEvaluator` at `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/TriePermissionEvaluator.cs`) bridges the OPC UA stack's `ISystemContext.UserIdentity` to the trie. The dispatch path calls it on every Read, Write, HistoryRead, Browse, Subscribe, AckAlarm, Call. A non-allow decision short-circuits the dispatch with `BadUserAccessDenied`.
Key properties:
- **Driver-agnostic.** No driver-level code participates in authorization decisions. Drivers report `SecurityClassification` as metadata on tag discovery; everything else flows through `AuthorizationGate`.
- **Driver-agnostic.** No driver-level code participates in authorization decisions. Drivers report `SecurityClassification` as metadata on tag discovery; everything else flows through the evaluator.
- **Fail-open-during-transition.** `StrictMode = false` (default during ACL rollouts) lets sessions without resolved LDAP groups proceed; flip `Authorization:StrictMode = true` in production once ACLs are populated.
- **Evaluator stays pure.** `TriePermissionEvaluator` has no OPC UA stack dependency — it's tested directly from xUnit.
### Probe-this-permission (Admin UI)
`PermissionProbeService` (`src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/PermissionProbeService.cs`) lets an operator ask "if a user with groups X, Y, Z asked to do operation O on node N, would it succeed?" The answer is rendered in the AclsTab "Probe" dialog — same evaluator, same trie, so the Admin UI answer and the live Server answer cannot disagree.
### Full model
See [`docs/v2/acl-design.md`](v2/acl-design.md) for the complete design: trie invalidation, flag semantics, per-path override rules, and the reasoning behind additive-only (no Deny).
@@ -235,23 +245,16 @@ Per decision #150 control-plane roles are **deliberately independent of data-pla
### Roles
`src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/AdminRoles.cs`:
The `AdminRole` enum (`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/AdminRole.cs`) defines:
| Role | Capabilities |
|---|---|
| `ConfigViewer` | Read-only access to drafts, generations, audit log, fleet status. |
| `ConfigEditor` | ConfigViewer plus draft editing (UNS, equipment, tags, ACLs, driver instances, reservations, CSV imports). Cannot publish. |
| `FleetAdmin` | ConfigEditor plus publish, cluster/node CRUD, credential management, role-grant management. |
| `FleetAdmin` | ConfigEditor plus publish, cluster/node CRUD, credential management, role-grant management. Also satisfies the `DriverOperator` authorization policy. |
| `DriverOperator` | May issue **Reconnect** and **Restart** commands against live driver instances from the Admin UI `DriverStatusPanel`. Gated by the `DriverOperator` named policy in `AddAuthorization` (`src/Server/ZB.MOM.WW.OtOpcUa.Security/ServiceCollectionExtensions.cs`). Map an LDAP group via `GroupToRole`, e.g. `"ot-driver-operator": "DriverOperator"`. |
Policies registered in Admin `Program.cs`:
```csharp
builder.Services.AddAuthorizationBuilder()
.AddPolicy("CanEdit", p => p.RequireRole(AdminRoles.ConfigEditor, AdminRoles.FleetAdmin))
.AddPolicy("CanPublish", p => p.RequireRole(AdminRoles.FleetAdmin));
```
Razor pages and API endpoints gate with `[Authorize(Policy = "CanEdit")]` / `"CanPublish"`; nav-menu sections hide via `<AuthorizeView>`.
In v2 the authentication + authorization stack is wired centrally by `AddOtOpcUaAuth` (`src/Server/ZB.MOM.WW.OtOpcUa.Security/ServiceCollectionExtensions.cs`) and Razor pages gate inline with the role names, e.g. `@attribute [Authorize(Roles = "FleetAdmin,ConfigEditor")]` on `Deployments.razor`. Nav-menu sections hide via `<AuthorizeView>`.
### Role grant source
@@ -0,0 +1,265 @@
# Admin UI rebuild plan (F15)
**Status:** UX kickoff — proposals to react to before any per-page rebuild starts.
**Last updated:** 2026-05-26 on `v2-akka-fuse`.
## Why this isn't a straight port
The v1 Admin UI was built around `ConfigGeneration` draft → publish:
operators edited a **draft** generation, the system computed a **diff** against the
last published one, and a manual **Publish** sealed it. Six full pages
(`DraftEditor`, `DiffViewer`, `DiffSection`, `Generations`, plus the per-tab
"viewing draft N" header) lived to make this workflow legible.
v2 replaces that with **live-edit + snapshot-deploy** (decisions #14a–#14e on this
branch). Edits write directly to live tables guarded by `RowVersion`
concurrency; deploying is a single click that snapshots the current live state
and dispatches it via Akka. Drift between "current live" and "last sealed
deployment" surfaces as a one-line indicator on the
[Deployments](../../src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Deployments.razor)
page.
That collapses **six pages → zero** before we ship a line of new Razor. The
remaining ~41 legacy pages map to ~30 v2 pages once redundant fleet-wide views
fold into their cluster-tab equivalents.
## Inventory: 47 legacy pages → v2 disposition
Source: `git show 76310b8^ -- 'src/Server/ZB.MOM.WW.OtOpcUa.Admin/**/*.razor'`.
### Site shell (5 files) — port
| Legacy | v2 status | Notes |
|---|---|---|
| `App.razor`, `Routes.razor`, `_Imports.razor` | Port | Boilerplate; minor render-mode tweaks |
| `Layout/MainLayout.razor` | ✅ Already in v2 | Done in Task 48 |
| `Components/Pages/Login.razor`, `Account.razor` | Port | Auth endpoints changed (cookie+JWT hybrid, Task 26); login form posts to `/auth/login` now |
### Shared widgets (5 files) — port
| Legacy | v2 status |
|---|---|
| `StatusBadge.razor` | ✅ Already in v2 |
| `LoadingSpinner.razor` | ✅ Already in v2 |
| `ToastNotification.razor` | ✅ Already in v2 |
| `ClusterAuthorizeView.razor`, `RedirectToLogin.razor` | Port — adjust for v2 `IUserAuthenticator` |
### Fleet (1 file) — reshape
| Legacy | v2 strategy |
|---|---|
| `Fleet.razor` | **Reshape.** Drop the v1 "poller reads central DB" data source. v2 reads `NodeDeploymentState` (Applied / Failed / Stale per node) + subscribes to `FleetStatusHub` for live `ServiceLevel` updates (already wired in F16) + queries `IFleetDiagnosticsClient.GetDiagnostics` (F17) for per-node driver health. Single page, similar shape to v1. |
### Cluster CRUD (3 files) — port
| Legacy | v2 strategy |
|---|---|
| `ClustersList.razor` | Port |
| `NewCluster.razor` | Port |
| `ClusterDetail.razor` | **Port — drop draft/publish chrome.** No "New draft" button; no "current published" sidebar. Replace with "Last deployed" badge + a "Deploy" button (already a SignalR-aware widget on the Deployments page; this becomes a cluster-scoped variant). |
### Draft/publish workflow (4 files) — **drop entirely**
| Legacy | v2 strategy |
|---|---|
| `DraftEditor.razor` | **Drop.** No drafts in v2. |
| `DiffViewer.razor` | **Drop.** Drift indicator replaces it on Deployments page. |
| `DiffSection.razor` | **Drop.** |
| `Generations.razor` | **Drop — replaced by `Deployments.razor`** (already shipped in v2 ahead of F15). |
### Cluster tabs (11 files) — port as live-edit forms
Each becomes a live-edit surface: load the entity, bind to a form, save with
`RowVersion` concurrency check (409 on conflict → toast + reload). No "viewing
draft N" header; no per-tab snapshot view.
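A minimal sketch of the save-time check, assuming an in-memory stand-in for a live-edit table. `LiveTable`, `Row`, and `ConcurrencyConflictException` are hypothetical names; the real forms would go through EF Core's row-version handling, which this sketch does not reproduce.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape: a payload plus a monotonically increasing version.
public sealed record Row(string Name, long RowVersion);

public sealed class ConcurrencyConflictException : Exception
{
    public ConcurrencyConflictException(string message) : base(message) { }
}

// Hypothetical in-memory table used only to show the version comparison.
public sealed class LiveTable
{
    private readonly Dictionary<int, Row> _rows = new();

    public void Insert(int id, string name) => _rows[id] = new Row(name, 1);

    public Row Load(int id) => _rows[id];

    // Save succeeds only if the caller still holds the current version.
    public Row Save(int id, string newName, long expectedVersion)
    {
        var current = _rows[id];
        if (current.RowVersion != expectedVersion)
            throw new ConcurrencyConflictException(
                $"Row {id} changed since it was loaded; reload and retry.");

        var updated = new Row(newName, current.RowVersion + 1);
        _rows[id] = updated;
        return updated;
    }
}
```

If two editors load the same row, the first `Save` wins and the second throws; that exception is the point where a form would surface the 409, show the toast, and reload.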
| Legacy tab | v2 strategy |
|---|---|
| `EquipmentTab.razor` | Port — UNS path tree picker stays |
| `UnsTab.razor` | Port — same |
| `NamespacesTab.razor` | Port |
| `DriversTab.razor` | Port — **driver-type-specific editors are a separate question (see below)** |
| `TagsTab.razor` | Port |
| `AclsTab.razor` | Port — wire to v2 LDAP group → role mapping (see RoleGrants question) |
| `RedundancyTab.razor` | Port — surface v2 `ServiceLevel` calc (Task 35) instead of v1 redundancy state machine |
| `ScriptedAlarmsTab.razor` | Port |
| `ScriptsTab.razor` | Port |
| `VirtualTagsTab.razor` | Port |
| `AuditTab.razor` | Port — wire to v2 `ConfigAuditLog` (post-F3 schema: `EventId`, `CorrelationId` columns) |
### Cluster-scoped editors (3 files) — port as reusable inputs
| Legacy | v2 strategy |
|---|---|
| `IdentificationFields.razor` | Port |
| `ImportEquipment.razor` | Port |
| `ScriptEditor.razor` | Port |
### Cross-cluster pages (8 files) — mixed
| Legacy | v2 strategy |
|---|---|
| `Hosts.razor` | Port — reshape to "Akka cluster members" (showing `host:port` NodeIds, roles, redundancy state) |
| `Certificates.razor` | Port — F13a's `PkiStoreRoot` becomes the data source |
| `Reservations.razor` | Port |
| `RoleGrants.razor` | **Reshape.** v1 was cluster-scoped role grants; v2 uses LDAP group → role mapping (see Q4 below) |
| `AlarmsHistorian.razor` | Port — wire to F11's `HistorianAdapterActor.GetStatus` (queue depth + drain state) |
| `ScriptLog.razor` | Port — needs SignalR hub bridge (F16 deferred ScriptLogHub) |
| `ScriptedAlarms.razor` (top-level) | **Possibly drop** (see Q2 below) |
| `VirtualTags.razor` (top-level) | **Possibly drop** (see Q2 below) |
### Driver-typed editors (5 files) — sequencing decision needed
| Legacy | v2 strategy |
|---|---|
| `Drivers/FocasDetail.razor` | Defer — JSON editor in `DriversTab` covers the same config initially |
| `Modbus/ModbusOptionsEditor.razor` | Same |
| `Modbus/ModbusAddressEditor.razor` | Same |
| `Modbus/ModbusAddressPreview.razor` | Same |
| `Modbus/ModbusDiagnostics.razor` | Port — separate from the config editor, this is operational telemetry |
### Account (1 file) — port
| Legacy | v2 strategy |
|---|---|
| `Account.razor` | Port — minor reshape for JWT (token expiry UI, refresh button) |
## Summary by disposition
| Disposition | Count |
|---|---|
| Already in v2 | 5 |
| Port as-is | 22 |
| Port + reshape | 7 |
| **Drop (replaced by live-edit / Deployments page)** | **5** |
| Drop (redundant with cluster tab) | 2 (pending Q2) |
| Defer (driver-typed editors) | 4 |
| **Total active rebuild** | ~30 pages |
## Open design questions
These need answers before per-page sequencing starts. They drive how many
phases the rebuild takes and what gets cut.
### Q1 — Driver-typed editors: ship now or defer?
**Context.** v1 had typed editors for Modbus + FOCAS driver config. They sat
behind a generic JSON editor for the other six driver types. The typed editors
caught operator typos that the JSON editor missed (port ranges, slave-ID
collisions, address-map overlaps).
**Options.**
- **Defer all typed editors.** Ship `DriversTab` with a JSON editor first; add
typed editors per-driver as field requests come in. Saves ~1 day on F15.
- **Port the existing two.** Modbus + FOCAS were already validated against
field use. The other six driver types stay JSON-only.
- **Ship all eight typed editors.** Most work, best UX. ~3 extra days on F15.
**Recommendation:** Defer. The OPC UA dual-endpoint tests + driver
engine wiring (F7-F10) are higher-leverage and need attention first.
### Q2 — Top-level `ScriptedAlarms.razor` and `VirtualTags.razor`: keep or drop?
**Context.** In v1, these were fleet-wide views of every scripted alarm and
virtual tag across every cluster. The cluster tabs let you edit them; the
top-level pages let you find them across clusters.
**Options.**
- **Drop.** Fleet-wide view is rare; cluster scope covers 95% of use.
- **Keep as read-only.** Cross-cluster search + drill-down to the per-cluster tab.
**Recommendation:** Drop, but expose a global search on the top nav that
matches cluster + alarm/tag names if operators ask.
### Q3 — ClusterDetail: 10 tabs or split routes?
**Context.** v1 had 10 nav-tabs inside `ClusterDetail.razor`. Some are very
heavy (Tags can be 10k rows; AuditTab streams). All 10 share render state.
**Options.**
- **Keep tabs.** Familiar; one URL per cluster.
- **Split into routes.** `/clusters/{id}/equipment`, `/clusters/{id}/tags`,
etc. Better deep-linking, better load (one tab's data per page), easier auth
scoping.
**Recommendation:** Split into routes. The v1 monolith was already groaning
under the live-update SignalR fan-in; routes let each surface manage its own
subscription lifecycle.
### Q4 — RoleGrants: cluster-scoped table or LDAP group → role map?
**Context.** v1 had a per-cluster `RoleGrants` table where you mapped users to
cluster-scoped roles (ClusterAdmin, ClusterOperator, etc.). v2 introduced
LDAP-driven auth: LDAP group membership maps to OPC UA permissions
(`ReadOnly`, `WriteOperate`, `WriteTune`, `WriteConfigure`, `AlarmAck`)
fleet-wide.
**Options.**
- **Keep v1 model.** Cluster-scoped grants survive; LDAP just provides the
username.
- **Replace with fleet-wide LDAP-group → role mapping.** v2's `LdapOptions`
already has a `GroupToRole` dictionary; surface that in a single fleet-level
page.
- **Both.** LDAP map for fleet-wide defaults; per-cluster overrides for
scoping.
**Recommendation:** Fleet-wide LDAP-group → role map only. Per-cluster scoping
adds combinatorial complexity that v2's redundancy model doesn't need
(every driver-role node runs every driver in the fleet).
### Q5 — Login UI: backed by `/auth/login` (cookie+JWT hybrid) — what about LDAP error UX?
**Context.** v2's `/auth/login` does an LDAP bind. Failures come back as
specific reasons (invalid creds vs. service-account misconfig vs. server
unreachable). The default behavior is to lump them all into "Login failed."
**Options.**
- **Generic "Login failed."** Safer; doesn't leak whether the username exists.
- **Specific error categories.** Helps operators diagnose deploy issues.
**Recommendation:** Generic for production deployments, specific when
`Authentication:Ldap:AllowInsecureLdap=true` (dev mode signal).
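The recommendation could look roughly like the sketch below. `LdapFailure` and `LoginErrors.Describe` are hypothetical names, and the message strings are placeholders; only the generic-versus-specific split comes from the recommendation above.

```csharp
// Hypothetical failure categories; the real /auth/login result type may differ.
public enum LdapFailure { InvalidCredentials, ServiceAccountMisconfigured, ServerUnreachable }

public static class LoginErrors
{
    // allowInsecureLdap stands in for Authentication:Ldap:AllowInsecureLdap.
    public static string Describe(LdapFailure failure, bool allowInsecureLdap)
    {
        // Production: never reveal which step of the bind failed.
        if (!allowInsecureLdap) return "Login failed.";

        // Dev mode: name the category so deploy issues are diagnosable.
        return failure switch
        {
            LdapFailure.InvalidCredentials => "Login failed: invalid credentials.",
            LdapFailure.ServiceAccountMisconfigured => "Login failed: LDAP service account is misconfigured.",
            LdapFailure.ServerUnreachable => "Login failed: LDAP server is unreachable.",
            _ => "Login failed."
        };
    }
}
```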
## Proposed sequencing (4 phases)
Each phase is independently mergeable. The branch ships when Phase A is in;
Phases B–D can follow as smaller PRs.
### Phase A — Shell + auth + fleet (minimum-viable Admin)
~½–1 day. Ships a working admin surface with no config editing.
- Port `App.razor`, `Routes.razor`, `_Imports.razor`
- Port `Login.razor` (post Q5)
- Port `Account.razor`
- Reshape `Fleet.razor` against v2 data sources
- Port `Hosts.razor` reshape
### Phase B — Cluster CRUD + Overview/Redundancy tabs
~1 day. Adds cluster browse + read-only redundancy view.
- Port `ClustersList`, `NewCluster`, `ClusterDetail` (Overview tab only)
- Port `RedundancyTab` (read-only — surfaces v2 `ServiceLevel`)
- Split into routes if Q3 = split
### Phase C — Config editor tabs
~2 days. The big chunk — the live-edit config surface.
- `EquipmentTab`, `UnsTab`, `NamespacesTab`
- `DriversTab` (JSON-only initially per Q1)
- `TagsTab`
- `AclsTab` post Q4 reshape
- `ImportEquipment`, `IdentificationFields`
### Phase D — Logic + ops pages
~1 day.
- `VirtualTagsTab`, `ScriptedAlarmsTab`, `ScriptsTab`, `ScriptEditor`
- `AuditTab` against new ConfigAuditLog schema
- `RoleGrants` post Q4 reshape
- `Certificates`
- `Reservations`
- `AlarmsHistorian`, `ScriptLog` (depends on F16 ScriptLogHub deferred)
## Out of scope for F15
- Typed driver editors (Q1, deferred unless reversed)
- Top-level fleet-wide ScriptedAlarms / VirtualTags pages (Q2, recommended drop)
- Per-cluster RoleGrants (Q4, recommended drop)
- ScriptLogHub SignalR bridge (F16 deferred — only needed for Phase D's
ScriptLog page; can move to a separate F16-extension follow-up)
@@ -0,0 +1,128 @@
# OtOpcUa v2 Architecture
Single-page tour of the v2 layout. For decision history + tradeoffs, see [`2026-05-26-akka-hosting-alignment-design.md`](../plans/2026-05-26-akka-hosting-alignment-design.md).
## Big picture
```
┌─────────────────────────────────────────────┐
│ OtOpcUa.Host │ (fused binary)
│ │
│ reads OTOPCUA_ROLES env, mounts: │
│ ┌─────────────────────────────────────┐ │
│ │ admin → Blazor + auth + control- │ │
│ │ plane singletons │ │
│ │ driver → OPC UA endpoint + │ │
│ │ per-node actors │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
│ joins
┌─────────────────────────────────────────────┐
│ Akka.NET cluster │
│ (split-brain resolver: keep-oldest, 15s) │
└─────────────────────────────────────────────┘
shared by every node: ┌─────────────────┐
│ ConfigDb (SQL) │ live-edit + Deployment artifacts + audit
└─────────────────┘
```
The v1 setup was two separate Windows services (`OtOpcUa.Server` + `OtOpcUa.Admin`) talking through the DB. v2 collapses them into one binary with role gating, and adds an Akka cluster so admin singletons can drive deploys and the redundancy story is automatic.
## Project layout
```
src/Core/ shared abstractions, no Server deps
ZB.MOM.WW.OtOpcUa.Commons types + Akka message contracts + interfaces
ZB.MOM.WW.OtOpcUa.Cluster HOCON, AkkaClusterOptions, IClusterRoleInfo
ZB.MOM.WW.OtOpcUa.Configuration EF Core DbContext + entities
src/Server/ server-side projects
ZB.MOM.WW.OtOpcUa.Security cookie+JWT auth, LDAP, JwtTokenService
ZB.MOM.WW.OtOpcUa.ControlPlane admin-role cluster singletons
ZB.MOM.WW.OtOpcUa.Runtime driver-role per-node actors
ZB.MOM.WW.OtOpcUa.OpcUaServer OPC UA endpoint facade + Phase7Composer
ZB.MOM.WW.OtOpcUa.AdminUI Blazor Razor class library
ZB.MOM.WW.OtOpcUa.Host fused binary (Program.cs)
```
| Project | Role | Doc |
|---|---|---|
| Cluster | Bootstrap + cluster topology view | [Cluster.md](Cluster.md) |
| ControlPlane | Admin singletons (deploy, audit, fleet, redundancy) | [ControlPlane.md](ControlPlane.md) |
| Runtime | Driver-role actor tree | [Runtime.md](Runtime.md) |
| Security | Cookie+JWT auth, LDAP, /auth/* endpoints | [../security.md](../security.md) |
| OpcUaServer | OPC UA endpoint host + composer | [../OpcUaServer.md](../OpcUaServer.md) |
| Host | Role-gated DI graph + Program.cs | [../ServiceHosting.md](../ServiceHosting.md) |
## Role gating
`Program.cs` reads `OTOPCUA_ROLES` once (per process) and decides what to wire:
```csharp
var roles = RoleParser.Parse(Environment.GetEnvironmentVariable("OTOPCUA_ROLES"));
var hasAdmin = roles.Contains("admin");
var hasDriver = roles.Contains("driver");
builder.Services.AddOtOpcUaConfigDb(builder.Configuration);
builder.Services.AddOtOpcUaCluster(builder.Configuration);
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
ab.WithOtOpcUaClusterBootstrap(sp); // HOCON + remote + cluster options
if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
if (hasAdmin)
{
builder.Services.AddOtOpcUaAuth(builder.Configuration);
builder.Services.AddAdminUI();
// SignalR, AdminOpsClient, etc.
}
builder.Services.AddOtOpcUaHealth();
```
There is a **single** ActorSystem. Cluster singletons + per-node actors share it via the `Akka.Hosting` registry. This was a v2 fix (the initial Phase 9 wiring ran two ActorSystems by mistake; see commit `d6fac2d`).
## Live-edit vs draft/publish
v1 had `ConfigGeneration(Draft|Published)` with every live-edit entity FK'd to a generation. Edits accumulated in a Draft until Publish promoted them.
v2 removes that entirely:
- No `ConfigGeneration` table, no `GenerationId` columns.
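- Every live-edit entity has a `RowVersion` (`IsRowVersion()`) for optimistic concurrency: a save against a stale version is rejected rather than silently overwriting the newer write.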
- Audit goes to `ConfigEdit` (per-row delta) and `ConfigAuditLog` (event-level).
- Deploys snapshot the *current* DB state into an immutable `Deployment.ArtifactBlob` + its `RevisionHash`. That artifact is what driver nodes apply.
See [ControlPlane.md § Deploy flow](ControlPlane.md#deploy-flow) for the end-to-end dispatch + ACK + seal sequence.
## NodeId
Each cluster member has a `NodeId` derived as `{PublicHostname}:{Port}` of the Akka remote endpoint. `ClusterRoleInfo.LocalNode` + `ConfigPublishCoordinator.DiscoverDriverNodes()` use the same formula so they always agree. The port suffix makes loopback test deployments distinguishable (commit `5cfbe8b`); in production the hostname alone is already unique.
## Health endpoints
| Path | Returns 200 when… |
|---|---|
| `/healthz` | Process is alive (no checks). |
| `/health/ready` | DB reachable + this node is `Up` in the cluster. |
| `/health/active` | This node is the admin role-leader (used by Traefik/HA-LB to pin traffic). |
## What lives where (quick map)
| Concern | Project | Entry point |
|---|---|---|
| Read OTOPCUA_ROLES | `Cluster.RoleParser` | static `Parse(string?)` |
| Cluster lifecycle | `Cluster.WithOtOpcUaClusterBootstrap` | extension on `AkkaConfigurationBuilder` |
| Local node identity | `Cluster.IClusterRoleInfo.LocalNode` | DI singleton |
| Admin singletons | `ControlPlane.WithOtOpcUaControlPlaneSingletons` | extension on `AkkaConfigurationBuilder` |
| Driver actors | `Runtime.WithOtOpcUaRuntimeActors` | extension on `AkkaConfigurationBuilder` |
| Auth pipeline | `Security.AddOtOpcUaAuth` + `MapOtOpcUaAuth` | extensions on `IServiceCollection` / `IEndpointRouteBuilder` |
| OPC UA facade | `OpcUaServer.OpcUaApplicationHost` | runtime host, started by driver-role startup |
| Partner-URI advertising | `OpcUaServer.OpcUaApplicationHost.PopulateServerArray` | runs after `_application.Start`, appends `PeerApplicationUris` to the SDK `ServerUris` `StringTable` so `Server.ServerArray` (i=2254) returns self + peers |
| Health endpoints | `Host.Health.AddOtOpcUaHealth` + `MapOtOpcUaHealth` | extensions on `IServiceCollection` / `IEndpointRouteBuilder` |
@@ -0,0 +1,104 @@
# OtOpcUa.Cluster
Akka.NET cluster bootstrap + topology view. Used by every other server-side project to talk to the live cluster.
Path: `src/Core/ZB.MOM.WW.OtOpcUa.Cluster/`
## Public surface
| Type | Role |
|---|---|
| `AkkaClusterOptions` | DI-bound options from `appsettings.json::Cluster`. Hostname/Port/PublicHostname/SeedNodes/Roles. |
| `IClusterRoleInfo` (interface in Commons) | Live view of cluster membership + role-leader topology. Thread-safe + event-raising. |
| `ClusterRoleInfo` | Implementation. Subscribes to `ClusterEvent.IMemberEvent` + `RoleLeaderChanged` + `LeaderChanged`. |
| `HoconLoader.LoadBaseConfig()` | Reads the embedded `Resources/akka.conf`. |
| `RoleParser.Parse(string?)` | Parses `OTOPCUA_ROLES` env var into a deduped `string[]`. |
| `ServiceCollectionExtensions.AddOtOpcUaCluster(configuration)` | Binds options + registers `IClusterRoleInfo` singleton. **Does not** start an ActorSystem. |
| `WithOtOpcUaClusterBootstrap(serviceProvider)` | Extension on `AkkaConfigurationBuilder`. Loads embedded HOCON + applies `WithRemoting(...)` + `WithClustering(...)` from options. |
## Bootstrap flow
```csharp
// Program.cs
builder.Services.AddOtOpcUaCluster(builder.Configuration);
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
ab.WithOtOpcUaClusterBootstrap(sp); // HOCON + remote + cluster
// …singletons + node actors layered on
});
```
Order matters: `AddOtOpcUaCluster` must come before `AddAkka` so the options binding has run by the time the `AddAkka` lambda fires. Inside the lambda, `WithOtOpcUaClusterBootstrap` resolves `IOptions<AkkaClusterOptions>` from `sp` and writes them into the Akka builder.
The single ActorSystem this produces is what every other v2 piece runs on. There is no second Akka instance — that was a Phase 9 bug (commit `d6fac2d` consolidated).
## Embedded HOCON
`src/Core/ZB.MOM.WW.OtOpcUa.Cluster/Resources/akka.conf` contains:
| Setting | Value | Why |
|---|---|---|
| `akka.actor.provider` | `cluster` | Required for `Cluster.Get(system)` to work. |
| `akka.cluster.split-brain-resolver.active-strategy` | `keep-oldest` | On partition, the side that does not contain the oldest member downs itself. |
| `akka.cluster.split-brain-resolver.stable-after` | `15s` | Time before SBR acts. |
| `akka.cluster.failure-detector.threshold` | `10.0` | Higher than default (8.0) for GC-pause tolerance. |
| `opcua-synchronized-dispatcher.type` | `PinnedDispatcher` | Dedicated thread for `OpcUaPublishActor` so SDK calls stay marshalled. |
The Cluster.Tests project verifies these key values stay correct (`HoconLoaderTests`).
## Configuration
```json
{
"Cluster": {
"Hostname": "0.0.0.0",
"Port": 4053,
"PublicHostname": "node-a.lan",
"SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
"Roles": ["admin", "driver"]
}
}
```
- `Hostname`: interface to bind. `0.0.0.0` listens on every interface.
- `Port`: TCP port for cluster gossip. Default 4053.
- `PublicHostname`: address advertised in cluster gossip. Must be reachable by every other node.
- `SeedNodes`: where new nodes go to join. List one (or two) stable nodes. First node bootstraps the cluster from its own address.
- `Roles`: free-form tags Akka gossip propagates. v2 uses `admin` + `driver`; per-role wiring in `Program.cs` reads `OTOPCUA_ROLES` env var, not this list — these two should stay in sync.
Per-role overlay files (`appsettings.admin.json`, `appsettings.driver.json`, `appsettings.admin-driver.json`) layer on top of base `appsettings.json` based on the parsed `OTOPCUA_ROLES` (alphabetical, joined by `-`). See [ServiceHosting.md § Per-role configuration overlays](../ServiceHosting.md#per-role-configuration-overlays).
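As a rough illustration of the overlay naming rule, the sketch below parses a raw `OTOPCUA_ROLES` value and derives the overlay file name. `RoleOverlay`, `ParseRoles`, and `OverlayFileName` are hypothetical helpers, not the real `RoleParser` API. Ordinal (case-sensitive) comparison and returning `null` when no roles are set are both assumptions.

```csharp
using System;
using System.Linq;

public static class RoleOverlay
{
    // Comma-split, trim, drop empty entries, dedup.
    // Case-sensitive dedup is an assumption of this sketch.
    public static string[] ParseRoles(string? raw) =>
        (raw ?? string.Empty)
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Distinct(StringComparer.Ordinal)
            .ToArray();

    // Roles sorted alphabetically and joined by '-', e.g. appsettings.admin-driver.json.
    // Returning null for an empty role set is an assumption.
    public static string? OverlayFileName(string[] roles) =>
        roles.Length == 0
            ? null
            : "appsettings." +
              string.Join("-", roles.OrderBy(r => r, StringComparer.Ordinal)) +
              ".json";
}
```

For example, `OTOPCUA_ROLES=" driver, admin ,driver"` parses to `driver` and `admin`, which sorts and joins to `appsettings.admin-driver.json`.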
## IClusterRoleInfo
Anywhere in the host that needs the local node's identity or a view of who-else-is-in-the-cluster, inject `IClusterRoleInfo`:
```csharp
public sealed class MyService
{
    private readonly IClusterRoleInfo _cluster;

    public MyService(IClusterRoleInfo cluster)
    {
        _cluster = cluster;
        // Log every role-leader handover as it happens.
        _cluster.RoleLeaderChanged += (_, e) =>
            Console.WriteLine($"role={e.Role}: {e.PreviousLeader} → {e.NewLeader}");
    }

    public NodeId Self => _cluster.LocalNode;
    public IReadOnlyList<NodeId> Drivers => _cluster.MembersWithRole("driver");
    public NodeId? AdminLeader => _cluster.RoleLeader("admin");
}
```
`LocalNode` is `{PublicHostname}:{Port}` (the port suffix lets loopback test deployments stay distinct; production hostnames are already unique). `ConfigPublishCoordinator` uses the same `{host}:{port}` formula so the expected-ack set and the driver self-identification agree (commit `5cfbe8b`).
## Lifecycle
Akka.Hosting owns the lifecycle: its `IHostedService` starts the ActorSystem at host start and runs `CoordinatedShutdown` (reason `ClusterLeavingReason`) on host stop. The Cluster project does not register its own `IHostedService` (the v1 `AkkaHostedService` was deleted in commit `d6fac2d`).
## Tests
`tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests/` covers:
- `HoconLoaderTests` — embedded resource loads + key settings parse correctly.
- `RoleParserTests` — comma-split + dedup + trim semantics.
Cross-project integration is in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/` (cluster formation, deploy round-trip).
@@ -0,0 +1,99 @@
# OtOpcUa.ControlPlane
Five admin-role cluster singletons that drive the v2 deploy, audit, fleet, and redundancy stories. Path: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/`.
## Singletons
| Actor | File | Marker key | Role |
|---|---|---|---|
| `ConfigPublishCoordinator` | `Coordinators/ConfigPublishCoordinator.cs` | `ConfigPublishCoordinatorKey` | Dispatches `DispatchDeployment`, collects `ApplyAck`s, seals/fails/times-out. |
| `AdminOperationsActor` | `AdminOperations/AdminOperationsActor.cs` | `AdminOperationsActorKey` | Receives `StartDeployment` from the UI, snapshots ConfigDb via `ConfigComposer`, persists `Deployment` row + `ConfigEdit` marker, tells the coordinator to dispatch. |
| `AuditWriterActor` | `Audit/AuditWriterActor.cs` | `AuditWriterActorKey` | Batched `ConfigAuditLog` writer. Flushes every 500 events or 5 s. In-buffer dedup; cross-restart dedup tracked as F3. |
| `FleetStatusBroadcaster` | `Fleet/FleetStatusBroadcaster.cs` | `FleetStatusBroadcasterKey` | Aggregates per-node `FleetNodeStatus` heartbeats; publishes `FleetStatusChanged` on the `fleet-status` DPS topic (SignalR bridge tracked as F16). |
| `RedundancyStateActor` | `Redundancy/RedundancyStateActor.cs` | `RedundancyStateActorKey` | Cluster-event subscriber; debounces 250 ms; publishes `RedundancyStateChanged` on the `redundancy-state` DPS topic. |
All five register via `WithOtOpcUaControlPlaneSingletons()` (extension on `AkkaConfigurationBuilder`). Each uses `ClusterSingletonOptions { Role = "admin" }` so the singleton runs on the admin role-leader and migrates to the next admin node on failover.
```csharp
// Program.cs (control-plane singletons are wired only on admin-role nodes)
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
ab.WithOtOpcUaClusterBootstrap(sp);
if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
```
Resolve from anywhere via `IRequiredActor<T>` or the `ActorRegistry`:
```csharp
public sealed class AdminOperationsClient(ActorRegistry registry) : IAdminOperationsClient
{
private readonly IActorRef _proxy = registry.Get<AdminOperationsActorKey>();
// ...
}
```
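The `AuditWriterActor` flush policy listed in the singleton table (500 events or 5 s, with in-buffer dedup) can be sketched without Akka as a plain buffer. `AuditBatcher` is a hypothetical class; the sketch assumes events carry a string id, that the age threshold is measured from the oldest buffered event, and that dedup state is discarded on every flush.

```csharp
using System;
using System.Collections.Generic;

public sealed class AuditBatcher
{
    private readonly int _maxEvents;
    private readonly TimeSpan _maxAge;
    private readonly List<string> _buffer = new();
    private readonly HashSet<string> _seen = new();
    private DateTimeOffset _oldest;

    public AuditBatcher(int maxEvents = 500, TimeSpan? maxAge = null)
    {
        _maxEvents = maxEvents;
        _maxAge = maxAge ?? TimeSpan.FromSeconds(5);
    }

    // Buffers the event unless its id is already buffered.
    // Returns a batch when the count threshold is reached, otherwise null.
    public IReadOnlyList<string>? Add(string eventId, DateTimeOffset now)
    {
        if (_seen.Add(eventId))
        {
            if (_buffer.Count == 0) _oldest = now;
            _buffer.Add(eventId);
        }
        return _buffer.Count >= _maxEvents ? Flush() : null;
    }

    // Called periodically; returns a batch when the oldest event is too old.
    public IReadOnlyList<string>? Tick(DateTimeOffset now) =>
        _buffer.Count > 0 && now - _oldest >= _maxAge ? Flush() : null;

    private IReadOnlyList<string> Flush()
    {
        var batch = _buffer.ToArray();
        _buffer.Clear();
        _seen.Clear(); // dedup is in-buffer only
        return batch;
    }
}
```

Because the dedup set is cleared on flush, an event replayed after a flush (or after a restart) would be written again, which matches the note that cross-restart dedup is tracked separately.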
## Deploy flow
```
UI → IAdminOperationsClient.StartDeploymentAsync(createdBy)
│ Ask the AdminOperationsActor singleton proxy
AdminOperationsActor
│ ConfigComposer.SnapshotAndFlattenAsync(db) → ConfigArtifact(blob, revHash)
│ insert Deployment(Dispatching) + ConfigEdit marker
│ Tell coordinator → DispatchDeployment
ConfigPublishCoordinator
│ DiscoverDriverNodes() → expected ACK set (host:port per member)
│ insert NodeDeploymentState(Applying) per driver
│ Publish DispatchDeployment on "deployments" topic
│ Start apply-deadline timer (2 min default)
▼ DistributedPubSub
DriverHostActor (on each driver node — subscribed to "deployments")
│ PreStart subscribed; current state Steady(rev)
│ if currentRev == msg.rev → immediate ApplyAck(Applied) (idempotent)
│ else Become(Applying) → write NodeDeploymentStatus → ApplyAck
▼ via "deployment-acks" topic
ConfigPublishCoordinator (subscribed to "deployment-acks" in PreStart)
│ PersistNodeAck + collect
│ all-Applied → Sealed
│ any-Failed → PartiallyFailed
│ deadline → TimedOut
```
The dedicated `deployment-acks` topic + coordinator subscription was added in commit `5cfbe8b`. Before that, ACKs were published back on `deployments` and the coordinator (not subscribed) silently dropped them — deployments hung at `AwaitingApplyAcks` forever in multi-node tests.
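The outcome rules at the bottom of the flow diagram can be sketched as a small decision function. This is a minimal illustration, not the coordinator's actual code: `AckStatus`, `DeployOutcome`, `Resolve`, and the `node-a:4053` style node ids are hypothetical names, and the sketch assumes a deployment is only resolved once every expected node has reported or the deadline has elapsed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative node ids in host:port form; the port is made up.
var expected = new HashSet<string> { "node-a:4053", "node-b:4053" };
var acks = new Dictionary<string, AckStatus> { ["node-a:4053"] = AckStatus.Applied };

Console.WriteLine(Resolve(expected, acks, deadlineElapsed: false)); // AwaitingApplyAcks
acks["node-b:4053"] = AckStatus.Failed;
Console.WriteLine(Resolve(expected, acks, deadlineElapsed: false)); // PartiallyFailed

// Hypothetical helper: maps the collected ACKs to a deployment outcome.
static DeployOutcome Resolve(
    IReadOnlySet<string> expectedNodes,
    IReadOnlyDictionary<string, AckStatus> received,
    bool deadlineElapsed)
{
    // At least one expected node has not reported yet.
    if (expectedNodes.Any(node => !received.ContainsKey(node)))
        return deadlineElapsed ? DeployOutcome.TimedOut : DeployOutcome.AwaitingApplyAcks;

    // Every expected node reported: any failure taints the whole deployment.
    return expectedNodes.Any(node => received[node] == AckStatus.Failed)
        ? DeployOutcome.PartiallyFailed
        : DeployOutcome.Sealed;
}

enum AckStatus { Applied, Failed }
enum DeployOutcome { AwaitingApplyAcks, Sealed, PartiallyFailed, TimedOut }
```

Keeping the decision free of actor state makes each branch (all applied, any failed, deadline) easy to unit-test without a cluster.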
### Failover recovery
If the admin singleton fails over mid-deploy, the new instance's `PreStart` queries `NodeDeploymentState` for any `Dispatching`/`AwaitingApplyAcks` row, rebuilds `_expectedAcks` + `_acks` from persisted state, and resumes the deadline timer. See `Coordinators/ConfigPublishCoordinator.cs::PreStart`.
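The rebuild step described above might look roughly like the following. It is a sketch under assumptions: `NodeRow`, `NodeStatus`, and `Rebuild` are hypothetical stand-ins for the persisted `NodeDeploymentState` rows and the coordinator's private fields, and rows still marked `Applying` are treated as ACKs that have not arrived yet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical persisted rows for one in-flight deployment.
var rows = new List<NodeRow>
{
    new("node-a:4053", NodeStatus.Applied),
    new("node-b:4053", NodeStatus.Applying),
};

var (expectedAcks, acks) = Rebuild(rows);
Console.WriteLine($"expected={expectedAcks.Count}, acked={acks.Count}"); // expected=2, acked=1

// Rebuilds the in-memory ACK bookkeeping from what was persisted before the failover.
static (HashSet<string> Expected, Dictionary<string, NodeStatus> Acks) Rebuild(IEnumerable<NodeRow> persisted)
{
    var expected = new HashSet<string>();
    var received = new Dictionary<string, NodeStatus>();
    foreach (var row in persisted)
    {
        expected.Add(row.NodeId);               // every persisted row is an expected ACK
        if (row.Status != NodeStatus.Applying)  // terminal rows count as ACKs already received
            received[row.NodeId] = row.Status;
    }
    return (expected, received);
}

enum NodeStatus { Applying, Applied, Failed }
record NodeRow(string NodeId, NodeStatus Status);
```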
## ConfigComposer
Pure function `SnapshotAndFlattenAsync(db) → ConfigArtifact(byte[], string)`:
1. Reads every live-edit table.
2. Serialises to a stable byte[] (deterministic ordering).
3. Computes SHA-256 over the bytes → 64-hex `RevisionHash`.
Same DB state → same artifact + same hash. That's what makes the `NoChanges` outcome work (AdminOperations compares the proposed hash to the last sealed deployment's hash).
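A minimal sketch of the hashing idea, assuming a made-up row shape and separator scheme (the real serialiser is not shown in this document). `ComputeRevisionHash` is a hypothetical helper, and lowercase hex is an assumption; only the deterministic ordering, SHA-256, and the 64-hex-character result come from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for the live-edit tables: (table, key, json) rows.
static string ComputeRevisionHash(IEnumerable<(string Table, string Key, string Json)> rows)
{
    // Deterministic ordering: ordinal sort by table, then key, so read order never matters.
    var ordered = rows
        .OrderBy(r => r.Table, StringComparer.Ordinal)
        .ThenBy(r => r.Key, StringComparer.Ordinal);

    var sb = new StringBuilder();
    foreach (var (table, key, json) in ordered)
        sb.Append(table).Append('\u001f').Append(key).Append('\u001f').Append(json).Append('\n');

    byte[] blob = Encoding.UTF8.GetBytes(sb.ToString());
    byte[] digest = SHA256.HashData(blob);                 // 32 bytes
    return Convert.ToHexString(digest).ToLowerInvariant(); // 64 hex characters
}

var first = new[] { ("Tag", "2", "{}"), ("Driver", "1", "{}") };
var second = new[] { ("Driver", "1", "{}"), ("Tag", "2", "{}") };

Console.WriteLine(ComputeRevisionHash(first) == ComputeRevisionHash(second)); // True
Console.WriteLine(ComputeRevisionHash(first).Length);                         // 64
```

Because the hash depends only on the sorted content, a `NoChanges` check reduces to a string comparison against the last sealed deployment's hash.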
## ServiceLevelCalculator
Pure function exposed at `Redundancy/ServiceLevelCalculator.Compute(NodeHealthInputs)`. Returns the OPC UA `ServiceLevel` byte per the truth table in [Redundancy.md](../Redundancy.md#servicelevel-tiers-part-5-65). No side effects; trivially unit-testable.
## DPS topics
| Topic | Publisher | Subscribers |
|---|---|---|
| `deployments` | ConfigPublishCoordinator | DriverHostActor (per-node) |
| `deployment-acks` | DriverHostActor | ConfigPublishCoordinator |
| `fleet-status` | FleetStatusBroadcaster | (SignalR bridge — F16) |
| `redundancy-state` | RedundancyStateActor | (per-node ServiceLevel calc — F10) |
## Tests
`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/` — 29 tests covering coordinator (happy path, timeout, failover recovery), AdminOps (StartDeployment outcomes), AuditWriter (batching, dedup), FleetStatusBroadcaster (heartbeat staleness), RedundancyStateActor (debounce, snapshot), ConfigComposer (purity), ServiceLevelCalculator (truth table).
Multi-node tests (cross-ActorSystem) are in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/`.
+126
@@ -0,0 +1,126 @@
# OtOpcUa.Runtime
Driver-role actor tree — one set per node. Path: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/`.
## Actor tree
```
DriverHostActor (per node)
│ state machine: Steady ⇄ Applying ⇄ Stale
├──▶ DriverInstanceActor (per configured DriverInstance row)
│ state: Connecting → Connected → Reconnecting (or Stubbed)
├──▶ VirtualTagActor (per VirtualTag row)
│ compiles + evaluates expression, publishes derived value
├──▶ ScriptedAlarmActor (per ScriptedAlarm row)
│ state: Inactive ⇄ Active ⇄ Acknowledged
├──▶ OpcUaPublishActor (per node, pinned dispatcher)
│ marshalled OPC UA SDK writes + RebuildAddressSpace
├──▶ HistorianAdapterActor (per node)
│ pipe IPC to Wonderware historian sidecar
├──▶ PeerOpcUaProbeActor (per peer node)
│ opc.tcp ping → redundancy-state DPS topic
└──▶ DbHealthProbeActor (per node)
cached SELECT 1; consumed by /health/ready + redundancy calc
```
## Public surface
| Type | File |
|---|---|
| `WithOtOpcUaRuntimeActors()` | `ServiceCollectionExtensions.cs` — extension on `AkkaConfigurationBuilder`. Spawns `DriverHostActor` + `DbHealthProbeActor` on the host's ActorSystem. |
| `DriverHostActor` | `Drivers/DriverHostActor.cs` |
| `DriverInstanceActor` | `Drivers/DriverInstanceActor.cs` |
| `VirtualTagActor` | `VirtualTags/VirtualTagActor.cs` |
| `ScriptedAlarmActor` | `ScriptedAlarms/ScriptedAlarmActor.cs` |
| `OpcUaPublishActor` | `OpcUa/OpcUaPublishActor.cs` |
| `HistorianAdapterActor` | `Historian/HistorianAdapterActor.cs` |
| `PeerOpcUaProbeActor` | `Health/PeerOpcUaProbeActor.cs` |
| `DbHealthProbeActor` | `Health/DbHealthProbeActor.cs` |
Marker keys for registry lookup: `DriverHostActorKey`, `DbHealthProbeActorKey`.
## DriverHostActor
Per-node supervisor with three Become states:
| State | Meaning |
|---|---|
| `Steady(rev)` | Caught up. `DispatchDeployment` with `msg.rev == currentRev` → immediate `ApplyAck(Applied)` (idempotent). New rev → `Become(Applying)`. |
| `Applying(id)` | Apply in progress. Further `DispatchDeployment` for in-flight ID → debug-log + ignore. For new ID → defer via `Self.Forward`. |
| `Stale` | ConfigDb unreachable on bootstrap. Periodic `RetryConfigDbConnection` tries to advance to `Steady`. |
`PreStart`:
1. Subscribe to `deployments` DPS topic.
2. Read most-recent `NodeDeploymentState` for this node from ConfigDb.
3. If `Applied` → restore `_currentRevision`, `Become(Steady)`.
4. If `Applying` (orphan from crash) → replay apply (idempotent).
5. If `Failed` → `Become(Steady)` at last known rev.
6. DB unreachable → `Become(Stale)`, start retry timer.
ACK publishing: when no `_coordinatorOverride` is set (production), `SendAck` publishes on the dedicated `deployment-acks` DPS topic which the coordinator subscribes to (commit `5cfbe8b`).
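The `PreStart` steps above amount to a small decision table. The sketch below is illustrative only: `StartupAction`, `NodeStatus`, and `Decide` are hypothetical names, and the case where no `NodeDeploymentState` row exists is left out because the steps do not describe it.

```csharp
using System;

Console.WriteLine(Decide(dbReachable: true, NodeStatus.Applying)); // ReplayApply
Console.WriteLine(Decide(dbReachable: false, NodeStatus.Applied)); // BecomeStaleAndRetry

// Maps "could we read ConfigDb, and what was the last persisted status" to a startup action.
static StartupAction Decide(bool dbReachable, NodeStatus lastPersisted) =>
    (dbReachable, lastPersisted) switch
    {
        (false, _) => StartupAction.BecomeStaleAndRetry,                      // step 6
        (true, NodeStatus.Applied) => StartupAction.BecomeSteady,             // step 3
        (true, NodeStatus.Applying) => StartupAction.ReplayApply,             // step 4
        (true, NodeStatus.Failed) => StartupAction.BecomeSteadyAtLastKnownRev, // step 5
        _ => throw new ArgumentOutOfRangeException(nameof(lastPersisted)),
    };

enum NodeStatus { Applying, Applied, Failed }
enum StartupAction { BecomeSteady, ReplayApply, BecomeSteadyAtLastKnownRev, BecomeStaleAndRetry }
```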
## DriverInstanceActor
Per-driver-instance child. State machine:
- `Connecting` → first attempt to reach the underlying driver
- `Connected` → subscriptions active, reads/writes flow
- `Reconnecting` → temporary disconnect; backoff retry
- `Stubbed` → DEV-STUB mode for Windows-only drivers (Galaxy, Wonderware Historian) on non-Windows or when `roles` contains `dev`
`ShouldStub(driverType, roles)` returns `true` for `"Galaxy" | "Historian.Wonderware"` on non-Windows; the actor goes straight to `Stubbed` and returns deterministic success without touching real hardware. Wiring this into the DriverHost child-spawn path is follow-up F20 (folds into F7).
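As a rough sketch of the stub decision: the version below takes the OS check as an explicit `isWindows` parameter so it can be exercised on any platform, which is an assumption (the documented signature is `ShouldStub(driverType, roles)`). It also folds in the `dev` role condition from the state list above; whether the real method does so is not confirmed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant with the OS check injected for testability.
static bool ShouldStub(string driverType, IEnumerable<string> roles, bool isWindows)
{
    // Only the Windows-only drivers are ever stubbed.
    bool windowsOnly = driverType is "Galaxy" or "Historian.Wonderware";
    if (!windowsOnly) return false;

    // Stub when the platform cannot host the driver, or when running in dev mode.
    return !isWindows || roles.Contains("dev");
}

// Result depends on the host OS, so no expected output is shown.
Console.WriteLine(ShouldStub("Galaxy", new[] { "driver" }, OperatingSystem.IsWindows()));
```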
Engine wiring (subscription publishing, ApplyDelta diff, bad-quality-on-disconnect, write path, supervisor backoff) is stubbed — tracked as F7. Tests exercise message contracts, not engine behaviour.
## VirtualTagActor / ScriptedAlarmActor
Skeleton state machines + message handlers. Engine work:
- `VirtualTagEngine.Evaluate()` not yet called from `VirtualTagActor.DependencyValueChanged` (F8).
- `AlarmConditionService` not yet called from `ScriptedAlarmActor` (F9).
- `ScriptedAlarmState` DB persistence on `PreRestart` not wired (F9).
## OpcUaPublishActor
The only actor on the **pinned dispatcher** (`opcua-synchronized-dispatcher` from `akka.conf`). All OPC UA SDK address-space writes go through it so the SDK's threading model isn't violated.
Message contracts are defined; actual SDK calls are stubbed (counters only). Real address-space writes + `ServiceLevel` Variable updates + `RebuildAddressSpace` after a deploy land in F10 (gated on F13 — full `OpcUaApplicationHost` extraction).
## HistorianAdapterActor, PeerOpcUaProbeActor
Both have message contracts wired. Engine integration deferred:
- `HistorianAdapterActor` — named-pipe IPC to the Wonderware historian sidecar + `SqliteStoreAndForwardSink` (F11).
- `PeerOpcUaProbeActor` — real `opc.tcp://peer:4840` ping (F12). Current stub always returns `Ok=true`.
## DbHealthProbeActor
`Ask<DbHealthStatus>` returns cached state (refreshed every 5 s by an internal `SELECT 1`). Consumed by `/health/ready` and `RedundancyStateActor`.
## Lifecycle wiring
```csharp
// Program.cs (driver role only)
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
ab.WithOtOpcUaClusterBootstrap(sp);
if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
```
`WithOtOpcUaRuntimeActors` resolves `IDbContextFactory<OtOpcUaConfigDbContext>` + `IClusterRoleInfo` from DI, then spawns `DbHealthProbeActor` and `DriverHostActor` as top-level `/user/` actors. Both register marker keys in `ActorRegistry` so the registry lookup works from anywhere.
## Tests
`tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` — 16 tests covering DriverHostActor (Steady ack, Applying transitions, Stale recovery), DriverInstanceActor (state machine, stub mode), VirtualTagActor + ScriptedAlarmActor (message contracts), OpcUaPublishActor (props + message acceptance), DbHealthProbe + PeerOpcUaProbe (probe loop), and the `WithOtOpcUaRuntimeActors` registration round-trip.
End-to-end deploy from admin → driver via the cluster is in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DeployHappyPathTests.cs`.
+3 -3
@@ -36,7 +36,7 @@ Mirror ScadaLink's layout exactly:
```
src/
ZB.MOM.WW.OtOpcUa.Admin/ # Razor Components project (.NET 10)
ZB.MOM.WW.OtOpcUa.AdminUI/ # Razor Components project (.NET 10)
Auth/
AuthEndpoints.cs # /auth/login, /auth/logout, /auth/token
CookieAuthenticationStateProvider.cs # bridges cookie auth to Blazor <AuthorizeView>
@@ -61,10 +61,10 @@ src/
NotAuthorizedView.razor
EndpointExtensions.cs # MapAuthEndpoints + role policies
ServiceCollectionExtensions.cs # AddCentralAdmin
ZB.MOM.WW.OtOpcUa.Admin.Security/ # LDAP + role mapping + JWT (sibling of ScadaLink.Security)
ZB.MOM.WW.OtOpcUa.Security/ # LDAP + role mapping + JWT (sibling of ScadaLink.Security)
```
The `Admin.Security` project carries `LdapAuthService`, `RoleMapper`, `JwtTokenService`, `AuthorizationPolicies`. If it ever makes sense to consolidate with ScadaLink's identical project, lift to a shared internal NuGet — out of scope for v2.0 to keep OtOpcUa decoupled from ScadaLink's release cycle.
The `Security` project carries `LdapAuthService`, `RoleMapper`, `JwtTokenService`, `AuthorizationPolicies`. If it ever makes sense to consolidate with ScadaLink's identical project, lift to a shared internal NuGet — out of scope for v2.0 to keep OtOpcUa decoupled from ScadaLink's release cycle.
## Authentication & Authorization
+1 -1
@@ -65,7 +65,7 @@ Running record of v2 dev services on the Windows dev VM. Updated on every instal
|---------|---------------------|---------|-----------|------------------------|---------------|--------|
| **Central config DB** | Docker container `otopcua-mssql` on the Linux Docker host (image `mcr.microsoft.com/mssql/server:2022-latest`) | 16.0.4250.1 (RTM-CU24-GDR, KB5083252) | `10.100.0.35:14330` → `1433` (container) — port 14330 retained from the previous local-container setup so connection-string ports don't churn | User `sa` / Password `OtOpcUaDev_2026!` | Docker named volume `otopcua-mssql-data` on the Docker host | ✅ Running on Docker host (`/opt/otopcua-mssql/`) since 2026-04-28; carries `project=lmxopcua` label |
| Dev Galaxy (AVEVA System Platform) | Local install on this dev box — full ArchestrA + Historian + OI-Server stack | v1 baseline | Local COM via MXAccess (`C:\Program Files (x86)\ArchestrA\Framework\bin\ArchestrA.MXAccess.dll`); Historian via `aaH*` services; SuiteLink via `slssvc` | Windows Auth | Galaxy repository DB `ZB` on local SQL Server (separate instance from `otopcua-mssql` — legacy v1 Galaxy DB, not related to v2 config DB) | ✅ **Fully available — Phase 2 lift unblocked.** 27 ArchestrA / AVEVA / Wonderware services running incl. `aaBootstrap`, `aaGR` (Galaxy Repository), `aaLogger`, `aaUserValidator`, `aaPim`, `ArchestrADataStore`, `AsbServiceManager`, `AutoBuild_Service`; full Historian set (`aahClientAccessPoint`, `aahGateway`, `aahInSight`, `aahSearchIndexer`, `aahSupervisor`, `InSQLStorage`, `InSQLConfiguration`, `InSQLEventSystem`, `InSQLIndexing`, `InSQLIOServer`, `InSQLManualStorage`, `InSQLSystemDriver`, `HistorianSearch-x64`); `slssvc` (Wonderware SuiteLink); `OI-Gateway` install present at `C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\` (decision #142 AppServer-via-OI-Gateway smoke test now also unblocked) |
| GLAuth (LDAP) | Local install at `C:\publish\glauth\` | v2.4.0 | `localhost:3893` (LDAP) / `3894` (LDAPS, disabled) | Direct-bind `cn={user},dc=lmxopcua,dc=local` per `auth.md`; users `readonly`/`writeop`/`writetune`/`writeconfig`/`alarmack`/`admin`/`serviceaccount` (passwords in `glauth.cfg` as SHA-256) | `C:\publish\glauth\` | ✅ Running (NSSM service `GLAuth`). Phase 1 Admin uses GroupToRole map `ReadOnly→ConfigViewer`, `WriteOperate→ConfigEditor`, `AlarmAck→FleetAdmin`. v2-rebrand to `dc=otopcua,dc=local` is a future cosmetic change |
| GLAuth (LDAP) | Local install at `C:\publish\glauth\` | v2.4.0 | `localhost:3893` (LDAP) / `3894` (LDAPS, disabled) | Direct-bind `cn={user},dc=zb,dc=local` per `auth.md`; users `readonly`/`writeop`/`writetune`/`writeconfig`/`alarmack`/`admin`/`serviceaccount` (passwords in `glauth.cfg` as SHA-256) | `C:\publish\glauth\` | ✅ Running (NSSM service `GLAuth`). Phase 1 Admin uses GroupToRole map `ReadOnly→ConfigViewer`, `WriteOperate→ConfigEditor`, `AlarmAck→FleetAdmin`. Dev base DN unified to `dc=zb,dc=local` (Task 1.6) |
| OPC Foundation reference server | Not yet built | — | `10.100.0.35:62541` (target) | `user1` / `password1` (reference-server defaults) | — | Pending (needed for Phase 5 OPC UA Client driver testing) |
| FOCAS TCP stub | Not yet built | — | `10.100.0.35:8193` (target) | n/a | — | Pending (built in Phase 5; runs on Docker host) |
| Modbus simulator (`otopcua-pymodbus:3.13.0`) | Docker compose at `/opt/otopcua-modbus/` on Docker host | pinned 3.13.0 | `10.100.0.35:5020` | n/a | n/a | Stack staged; bring up with `lmxopcua-fix up modbus <profile>` from this VM |
+2 -2
@@ -104,8 +104,8 @@ Anonymous OPC UA sessions are denied writes against `Operate`-classified tags by
"Enabled": true,
"Server": "localhost",
"Port": 3893,
"SearchBase": "dc=lmxopcua,dc=local",
"ServiceAccountDn": "cn=serviceaccount,dc=lmxopcua,dc=local",
"SearchBase": "dc=zb,dc=local",
"ServiceAccountDn": "cn=serviceaccount,dc=zb,dc=local",
"ServiceAccountPassword": "serviceaccount123",
"GroupToRole": {
"ReadOnly": "ReadOnly",
+4 -4
@@ -96,7 +96,7 @@ Shipped as PR #183 (12 tests in configuration; 13 more in Admin.Tests).
| F.4 — Test harness (modal, synthetic inputs, output + logger display) | **Partial** | `ScriptTestHarnessService.cs` is complete and tested. `ScriptsTab.razor` calls `Harness.RunVirtualTagAsync` with zero-value synthetic inputs derived from the extractor. A full interactive input-form modal was not shipped — the harness zeroes all inputs automatically rather than prompting the operator per-tag. |
| F.5 — Script log viewer (SignalR tail of `scripts-*.log` filtered by `ScriptName`, load-more) | **Not started** | No SignalR stream of the scripts log is wired in the Admin UI. The `AlertHub` / `FleetStatusHub` exist but there is no `ScriptLogHub`. |
| F.6 — `/alarms/historian` diagnostics view | **Done** | `AlarmsHistorian.razor` + `HistorianDiagnosticsService.cs` |
| F.7 — Playwright smoke (author calc tag, verify in equipment tree; author alarm, verify in `AlarmsAndConditions`) | **Not started** | `tests/Server/ZB.MOM.WW.OtOpcUa.Admin.E2ETests/` exists but its `UnsTabDragDropE2ETests.cs` is the only Playwright test; no Phase 7 Admin UI playwright scenario. |
| F.7 — Playwright smoke (author calc tag, verify in equipment tree; author alarm, verify in `AlarmsAndConditions`) | **Not started** | No Phase 7 Playwright/E2E project exists in the repo today; future-work item without an assigned path. |
Shipped as PR #185 (13 Admin service tests; UI completeness is partial — see gaps section).
@@ -190,8 +190,8 @@ The SignalR tail of `scripts-*.log` filtered by `ScriptName` was not implemented
| `Core.VirtualTags` sources | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/` |
| `Core.ScriptedAlarms` sources | `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/` |
| `Core.AlarmHistorian` sources | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/` |
| Server Phase7 composition | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/` |
| Admin services | `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/Script*.cs`, `VirtualTagService.cs`, `HistorianDiagnosticsService.cs` |
| Admin UI pages | `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Components/Pages/Clusters/ScriptsTab.razor`, `AlarmsHistorian.razor` |
| Server Phase7 composition | `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs`, `Phase7Applier.cs`, `Phase7Plan.cs` |
| Admin services (CRUD writes) | `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs` (actor-driven); live state in `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs`, `Runtime/VirtualTags/VirtualTagActor.cs`; Roslyn engines in `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/` — v1 `Admin/Services/Script*.cs`, `VirtualTagService.cs`, `HistorianDiagnosticsService.cs` deleted |
| Admin UI pages | `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Scripts.razor`, `ScriptEdit.razor`, `ScriptedAlarms.razor`, `ScriptedAlarmEdit.razor`, `AlarmsHistorian.razor`, `VirtualTags.razor`, `VirtualTagEdit.razor` |
| Historian sidecar writer | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/WonderwareHistorianClient.cs` |
| EF migrations | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260420231641_AddPhase7ScriptingTables.cs`, `20260420232000_ExtendComputeGenerationDiffWithPhase7.cs` |
+9 -6
@@ -55,6 +55,7 @@ Each row is one manual run; pass criterion in the right column.
| A2 | ServiceLevel updates on peer down | Connect to Primary. Stop Backup (`sc stop OtOpcUa`). Watch `ServiceLevel`. | Transitions 200 → 150 within ~2 s of peer probe timeout |
| A3 | RedundancySupport | Browse to `Server.ServerRedundancy.RedundancySupport`. | Value matches the declared `RedundancyMode` (Warm / Hot / None) |
| A4 | ServerUriArray (non-transparent upgrade) | Requires a redundancy-object-type upgrade follow-up. | When upgrade lands: `ServerUriArray` reports both ApplicationUris, self first |
| A4b | Peer URI visibility via `Server.ServerArray` (i=2254) | Configure each `OpcUaApplicationHost` with the partner's `ApplicationUri` via `OpcUaApplicationHostOptions.PeerApplicationUris`. From any client, Read NodeId `i=2254` (`Server.ServerArray`). | Returned `String[]` includes both self + peer `ApplicationUri`s. Validated by `DualEndpointTests` in `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/` (loopback dual-host with real OPCFoundation client `Session` read). |
| A5 | Mid-apply dip | On Primary trigger a `sp_PublishGeneration` apply. | `ServiceLevel` drops to 75 for the apply duration + dwell |
### Block B — Client failover
@@ -101,7 +102,9 @@ flips A4 from "deferred" to "expected pass").
- **A4 pending**: `Server.ServerRedundancy` on our current SDK build lands as
the base `ServerRedundancyState`, which has no `ServerUriArray` child.
`ServerRedundancyNodeWriter.ApplyServerUriArray` logs-and-skips until the
redundancy-object-type upgrade follow-up lands.
redundancy-object-type upgrade follow-up lands. Cross-reference **A4b**:
peer URIs are visible today via `Server.ServerArray` (i=2254) populated by
`OpcUaApplicationHost.PopulateServerArray`.
- **Recovery dwell default**: `RecoveryStateManager.DwellTime` defaults to 60 s
in `Program.cs`. Adjust via future config knob if B3 takes too long to
observe.
@@ -121,8 +124,8 @@ flips A4 from "deferred" to "expected pass").
redundancy implementations we don't control.
- For the sub-set of scenarios that *can* be automated — the self-loopback
case where our own `otopcua-cli` drives Primary + Backup — the existing
`tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/RedundancyStatePublisherTests` +
`ServiceLevelCalculatorTests` (unit) + `ClusterTopologyLoaderTests`
(integration) already cover the math + data path. The wire-level assertion
that the values actually land on the right OPC UA nodes is covered by
`ServerRedundancyNodeWriterTests`.
`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/RedundancyStateActorTests` +
`ServiceLevelCalculatorTests` (unit) already cover the math + data path.
The wire-level assertion that the peer URIs actually land on the
`Server.ServerArray` node (i=2254) is covered by `DualEndpointTests` in
`tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/`.
+2 -1
@@ -57,7 +57,7 @@ Remaining follow-ups (hardening):
Remaining Phase 6.3 surfaces (hardening, not release-blocking):
- ~~`PeerHttpProbeLoop` + `PeerUaProbeLoop` HostedServices populating `PeerReachabilityTracker` on each tick.~~ **Closed 2026-04-24.** Two-layer probe model shipped: HTTP probe at 2 s / 1 s timeout against `/healthz`; OPC UA probe at 10 s / 5 s timeout via `DiscoveryClient.GetEndpoints`, short-circuiting when HTTP reports the peer unhealthy. Registered on the Server as `AddHostedService<PeerHttpProbeLoop>` + `AddHostedService<PeerUaProbeLoop>`. Publisher now sees accurate `PeerReachability` per peer instead of degrading to `Unknown` → Isolated-Primary band (230).
- OPC UA variable-node wiring: bind `ServiceLevel` Byte + `ServerUriArray` String[] to the publisher's events via `BaseDataVariable.OnReadValue` / direct value push.
- ~~OPC UA variable-node wiring: bind `ServiceLevel` Byte + `ServerUriArray` String[] to the publisher's events via `BaseDataVariable.OnReadValue` / direct value push.~~ **Closed 2026-05-26.** `ServiceLevel` byte binding closed earlier under Path D. Peer-URI half closed via `OpcUaApplicationHost.PopulateServerArray` — populates self + each `PeerApplicationUris` entry into the SDK `IServerInternal.ServerUris` `StringTable`; clients read `Server.ServerArray` (NodeId `i=2254`). Validated by `DualEndpointTests` in `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/`. `ServerUriArray` proper (the redundancy-object-type child) remains deferred pending object-type upgrade.
- ~~`sp_PublishGeneration` wraps its apply in `await using var lease = coordinator.BeginApplyLease(...)` so the `PrimaryMidApply` band (200) fires during actual publishes (task #148 part 2).~~ **Closed 2026-04-24.** The apply loop now lives in `GenerationRefreshHostedService` — polls `sp_GetCurrentGenerationForCluster` every 5s, opens a lease when a new generation is detected, calls `RedundancyCoordinator.RefreshAsync` inside the `await using`, releases the lease on all exit paths. Replaces the previous "topology never refreshes without a process restart" behaviour.
- Client interop matrix — Ignition / Kepware / Aveva OI Gateway (Stream F, task #150). Manual + doc-only.
@@ -118,6 +118,7 @@ v2 GA requires all of the following:
## Change log
- **2026-05-26** — Gap-closeout pass. `OpcUaApplicationHost.PopulateServerArray` populates `Server.ServerArray` (NodeId `i=2254`) with self + `OpcUaApplicationHostOptions.PeerApplicationUris`, giving non-transparent peer URI visibility through the standard discovery surface. New `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/` IT project (`DualEndpointTests`) validates with two real `OpcUaApplicationHost` instances on loopback + a live OPCFoundation client `Session` read. CI `v2-ci.yml` `integration:` job converted to a matrix across `Host.IntegrationTests` + `OpcUaServer.IntegrationTests`. Per-role appsettings overlays shipped (`appsettings.admin.json` / `appsettings.driver.json` / `appsettings.admin-driver.json`) — `Program.cs:33-35` loads by alphabetical-joined role suffix. `FailoverScenarioTests` → `FailoverDuringDeployTests` rename. Stale empty `src/Server/{Server,Admin}` + `tests/Server/{Server.Tests,Admin.Tests,Admin.E2ETests}` directories deleted (no source, absent from `.slnx`).
- **2026-04-24** — Phase 5 driver complement closed (task #120 CLOSED). AB CIP, AB Legacy, TwinCAT, FOCAS all shipped. FOCAS migration: retired the Tier-C split (`Driver.FOCAS.Host` + `Driver.FOCAS.Shared` + `FwlibNative` + shim DLL deleted) in favour of a pure-managed in-process `FocasWireClient` inlined into `Driver.FOCAS`; driver is now read-only against the CNC by design. Integration test matrix grew to cover Browse / Subscribe / IAlarmSource / Probe end-to-end.
- **2026-04-23** — Phase 6.4 audit close-out. IdentificationFolderBuilder + OPC 40010 Identification folder verified against the shipped code.
- **2026-04-20** — Phase 7 plan drafted (`phase-7-scripting-and-alarming.md`, `phase-7-e2e-smoke.md`). Out of scope for v2 GA.
+85 -51
@@ -1,46 +1,63 @@
<#
.SYNOPSIS
Registers the v2 Windows services on a node: OtOpcUa (main server, net10) and
optionally OtOpcUaWonderwareHistorian (Wonderware historian sidecar).
Registers the v2 Windows service on a node: OtOpcUaHost (fused binary, .NET 10)
and optionally OtOpcUaWonderwareHistorian (Wonderware historian sidecar, net48 x86).
.DESCRIPTION
PR 7.2 retired the legacy out-of-process OtOpcUaGalaxyHost service alongside the
GalaxyProxyDriver / GalaxyHost / GalaxyShared projects. Galaxy access now flows
through the in-process GalaxyDriver talking gRPC to a separately-installed
mxaccessgw. The mxaccessgw server runs out of its own repo
(`c:\Users\dohertj2\Desktop\mxaccessgw\`); see
`docs/v2/Galaxy.ParityRig.md` for the gw setup recipe.
v2 consolidates the legacy OtOpcUa + OtOpcUaAdmin services into a single role-gated
OtOpcUaHost binary. The -Roles parameter sets the OTOPCUA_ROLES service env so
Program.cs decides what to mount (admin / driver / both). The Wonderware historian
sidecar logic is unchanged from v1; install it with -InstallWonderwareHistorian.
Galaxy access flows through the mxaccessgw sibling repo (separate service); see
docs/v2/Galaxy.ParityRig.md for the gateway setup.
.PARAMETER InstallRoot
Where the binaries live (typically C:\Program Files\OtOpcUa).
Where the binaries live (typically C:\Program Files\OtOpcUa). The OtOpcUaHost
service runs OtOpcUa.Host.exe from this directory; publish the Host project there
with `dotnet publish -c Release -r win-x64 --self-contained` first.
.PARAMETER ServiceAccount
Service account SID or DOMAIN\name. The OtOpcUa service runs under this account.
Service account SID or DOMAIN\name. The OtOpcUaHost service runs under this account.
.PARAMETER Roles
Comma-separated cluster roles for this node. One of:
- "admin,driver": single-node dev or all-in-one production node
- "admin": admin-only HA pair member (Blazor + control-plane singletons)
- "driver": driver-only node (OPC UA endpoint + per-node actors)
Written to the service env as OTOPCUA_ROLES.
.PARAMETER HttpPort
HTTP port for the AdminUI + auth endpoints. Default 9000. Written as ASPNETCORE_URLS.
Ignored on driver-only nodes (no Blazor surface).
.PARAMETER InstallWonderwareHistorian
Gate the OtOpcUaWonderwareHistorian sidecar install. Off by default; set when
the deployment uses the Wonderware historian for history reads + alarm-event
persistence.
Gate the OtOpcUaWonderwareHistorian sidecar install. Off by default; set when the
deployment uses the Wonderware historian for history reads + alarm-event persistence.
.PARAMETER HistorianSharedSecret
Per-process secret passed to the Historian sidecar via env var. Generated
freshly per install when not supplied.
Per-process secret passed to the historian sidecar via env var. Generated freshly
per install when not supplied.
.EXAMPLE
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' -ServiceAccount 'OTOPCUA\svc-otopcua'
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' `
-ServiceAccount 'OTOPCUA\svc-otopcua' -Roles 'admin,driver'
.EXAMPLE
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' -ServiceAccount 'OTOPCUA\svc-otopcua' `
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' `
-ServiceAccount 'OTOPCUA\svc-otopcua' -Roles 'driver' `
-InstallWonderwareHistorian
#>
[CmdletBinding()]
param(
[Parameter(Mandatory)] [string]$InstallRoot,
[Parameter(Mandatory)] [string]$ServiceAccount,
[Parameter(Mandatory)] [ValidateSet('admin', 'driver', 'admin,driver', 'driver,admin')]
[string]$Roles,
[int]$HttpPort = 9000,
# PR 3.W — Wonderware historian sidecar. Optional; gates the
# OtOpcUaWonderwareHistorian service. Secret + pipe defaults match the server's
# Historian:Wonderware appsettings block.
# Wonderware historian sidecar. Optional; gates the OtOpcUaWonderwareHistorian
# service. Secret + pipe defaults match the server's Historian:Wonderware appsettings.
[switch]$InstallWonderwareHistorian,
[string]$HistorianSharedSecret,
[string]$HistorianPipeName = 'OtOpcUaWonderwareHistorian',
@@ -51,18 +68,19 @@ param(
$ErrorActionPreference = 'Stop'
if (-not (Test-Path "$InstallRoot\OtOpcUa.Server.exe")) {
Write-Error "OtOpcUa.Server.exe not found at $InstallRoot — copy the publish output first"
if (-not (Test-Path "$InstallRoot\OtOpcUa.Host.exe")) {
Write-Error "OtOpcUa.Host.exe not found at $InstallRoot — copy the publish output first"
exit 1
}
# Generate fresh shared secrets per install if not supplied.
function New-SharedSecret {
$bytes = New-Object byte[] 32
[System.Security.Cryptography.RandomNumberGenerator]::Create().GetBytes($bytes)
return [Convert]::ToBase64String($bytes)
}
if ($InstallWonderwareHistorian -and -not $HistorianSharedSecret) { $HistorianSharedSecret = New-SharedSecret }
if ($InstallWonderwareHistorian -and -not $HistorianSharedSecret) {
$HistorianSharedSecret = New-SharedSecret
}
if ($InstallWonderwareHistorian -and -not (Test-Path "$InstallRoot\WonderwareHistorian\OtOpcUa.Driver.Historian.Wonderware.exe")) {
Write-Error "OtOpcUa.Driver.Historian.Wonderware.exe not found at $InstallRoot\WonderwareHistorian — copy the publish output first"
@@ -76,10 +94,7 @@ $sid = if ($ServiceAccount.StartsWith('S-1-')) {
(New-Object System.Security.Principal.NTAccount $ServiceAccount).Translate([System.Security.Principal.SecurityIdentifier]).Value
}
# --- Install OtOpcUaWonderwareHistorian (PR 3.W) — separate sidecar that exposes the
# Wonderware Historian SDK via a named-pipe protocol consumed by the .NET 10 server.
# Optional: only installed when -InstallWonderwareHistorian is supplied. Depends on the
# hard AVEVA services that host the historian SDK runtime path.
# --- OtOpcUaWonderwareHistorian sidecar (optional, unchanged from v1) -------
$historianDepend = $null
if ($InstallWonderwareHistorian) {
$historianEnv = @(
@@ -87,14 +102,10 @@ if ($InstallWonderwareHistorian) {
"OTOPCUA_ALLOWED_SID=$sid"
"OTOPCUA_HISTORIAN_SECRET=$HistorianSharedSecret"
"OTOPCUA_HISTORIAN_ENABLED=true"
# Default-on when the historian sidecar is installed; flip to false for a
# read-only deployment that still loads aahClientManaged for reads but
# rejects WriteAlarmEvents frames.
"OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true"
"OTOPCUA_HISTORIAN_SERVER=$HistorianServer"
"OTOPCUA_HISTORIAN_PORT=$HistorianPort"
) -join "`0"
$historianEnv += "`0`0"
)
Write-Host "Installing OtOpcUaWonderwareHistorian..."
& sc.exe create OtOpcUaWonderwareHistorian binPath= "`"$InstallRoot\WonderwareHistorian\OtOpcUa.Driver.Historian.Wonderware.exe`"" `
@@ -105,36 +116,59 @@ if ($InstallWonderwareHistorian) {
& sc.exe config OtOpcUaWonderwareHistorian start= delayed-auto | Out-Null
$svcKey = "HKLM:\SYSTEM\CurrentControlSet\Services\OtOpcUaWonderwareHistorian"
$envValue = $historianEnv.Split("`0") | Where-Object { $_ -ne '' }
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $envValue
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $historianEnv
& sc.exe failure OtOpcUaWonderwareHistorian reset= 86400 actions= restart/5000/restart/30000/restart/60000 | Out-Null
$historianDepend = 'OtOpcUaWonderwareHistorian'
}
# --- Install OtOpcUa. Galaxy access flows through GalaxyDriver → mxaccessgw (gRPC),
# so OtOpcUa no longer depends on a sibling service for Galaxy connectivity. The
# mxaccessgw is installed separately. When the Wonderware sidecar is installed,
# depend on it for startup ordering.
$otOpcUaDepends = @()
if ($historianDepend) { $otOpcUaDepends += $historianDepend }
# --- OtOpcUaHost (the fused v2 binary) --------------------------------------
$normalisedRoles = ($Roles -split ',' | ForEach-Object { $_.Trim() } | Sort-Object -Unique) -join ','
Write-Host "Installing OtOpcUa..."
$hasAdmin = $normalisedRoles -split ',' -contains 'admin'
$hostEnv = @(
"OTOPCUA_ROLES=$normalisedRoles",
'DOTNET_ENVIRONMENT=Production'
)
if ($hasAdmin) {
$hostEnv += "ASPNETCORE_URLS=http://+:$HttpPort"
}
$hostDepends = @()
if ($historianDepend) { $hostDepends += $historianDepend }
Write-Host "Installing OtOpcUaHost (roles=$normalisedRoles)..."
$createArgs = @(
'create', 'OtOpcUa',
'binPath=', "`"$InstallRoot\OtOpcUa.Server.exe`"",
'DisplayName=', 'OtOpcUa Server',
'create', 'OtOpcUaHost',
'binPath=', "`"$InstallRoot\OtOpcUa.Host.exe`"",
'DisplayName=', "OtOpcUa Host ($normalisedRoles)",
'start=', 'auto',
'obj=', $ServiceAccount
)
if ($otOpcUaDepends.Count -gt 0) {
$createArgs += @('depend=', ($otOpcUaDepends -join '/'))
if ($hostDepends.Count -gt 0) {
$createArgs += @('depend=', ($hostDepends -join '/'))
}
& sc.exe @createArgs | Out-Null
# Env block via registry MultiString (sc.exe doesn't take env directly).
$svcKey = "HKLM:\SYSTEM\CurrentControlSet\Services\OtOpcUaHost"
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $hostEnv
# Restart-on-failure: 5s, 30s, 60s; reset counter after a clean 24h run.
& sc.exe failure OtOpcUaHost reset= 86400 actions= restart/5000/restart/30000/restart/60000 | Out-Null
Write-Host ""
Write-Host "Installed. Start with:"
Write-Host "Installed OtOpcUaHost:"
Write-Host " Roles: $normalisedRoles"
if ($hasAdmin) { Write-Host " HTTP port: $HttpPort" }
Write-Host " Binary: $InstallRoot\OtOpcUa.Host.exe"
Write-Host " Account: $ServiceAccount"
Write-Host ""
Write-Host "Start with:"
if ($InstallWonderwareHistorian) { Write-Host " sc.exe start OtOpcUaWonderwareHistorian" }
Write-Host " sc.exe start OtOpcUa"
Write-Host " sc.exe start OtOpcUaHost"
if ($InstallWonderwareHistorian) {
Write-Host ""
Write-Host "Wonderware historian shared secret (configure into appsettings.json Historian:Wonderware:SharedSecret):"
@@ -142,5 +176,5 @@ if ($InstallWonderwareHistorian) {
}
Write-Host ""
Write-Host "NOTE: Galaxy access flows through mxaccessgw — install + run that separately"
Write-Host " per docs/v2/Galaxy.ParityRig.md. OtOpcUa connects via the Galaxy.Gateway"
Write-Host " section of appsettings.json (default endpoint http://localhost:5120)."
Write-Host " per docs/v2/Galaxy.ParityRig.md. OtOpcUaHost connects via the"
Write-Host " Galaxy.Gateway section of appsettings.json (default http://localhost:5120)."
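As a language-neutral illustration of the role handling in the installer above, the Python sketch below mirrors the normalisation step (split on commas, trim, de-duplicate, sort) and the conditional `ASPNETCORE_URLS` entry. The function names `normalise_roles` and `build_host_env` are hypothetical and do not appear in the script. The sketch also lowercases roles and drops blank entries, which are assumptions made for clarity rather than confirmed behaviour of the PowerShell pipeline.

```python
def normalise_roles(roles: str) -> str:
    """Split a comma-separated role list, trim, de-duplicate, and sort it."""
    # Assumption: roles are compared case-insensitively, so lowercase them here.
    cleaned = {part.strip().lower() for part in roles.split(",")}
    cleaned.discard("")  # assumption: blank entries are ignored
    return ",".join(sorted(cleaned))


def build_host_env(roles: str, http_port: int) -> list[str]:
    """Build the service environment block as a list of NAME=value strings."""
    normalised = normalise_roles(roles)
    env = [f"OTOPCUA_ROLES={normalised}", "DOTNET_ENVIRONMENT=Production"]
    # Only nodes carrying the admin role expose the HTTP endpoint.
    if "admin" in normalised.split(","):
        env.append(f"ASPNETCORE_URLS=http://+:{http_port}")
    return env
```

Calling `build_host_env("driver", 9000)` yields only the two base entries, while any role list containing `admin` also gets the URL binding.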
+68
@@ -0,0 +1,68 @@
<#
.SYNOPSIS
Installs Traefik as a Windows service that routes admin HTTP traffic to whichever
OtOpcUa.Host node holds the admin role-leader (via /health/active).
.DESCRIPTION
Downloads the Traefik Windows binary into $InstallRoot, drops traefik.yml +
traefik-dynamic.yml from this directory next to it, and registers Traefik as a
Windows service via sc.exe with restart-on-failure.
Companion to Install-Services.ps1. Run on the box that fronts the admin HTTP
traffic (typically a separate node from OtOpcUaHost, or co-located on the
primary admin node).
.PARAMETER InstallRoot
Where the Traefik binary + config land. Default 'C:\Program Files\Traefik'.
.PARAMETER TraefikVersion
Traefik version to download. Default 'v3.1.6'.
.EXAMPLE
.\Install-Traefik.ps1 -InstallRoot 'C:\Program Files\Traefik'
#>
[CmdletBinding()]
param(
[string]$InstallRoot = 'C:\Program Files\Traefik',
[string]$TraefikVersion = 'v3.1.6'
)
$ErrorActionPreference = 'Stop'
if (-not (Test-Path $InstallRoot)) {
New-Item -ItemType Directory -Path $InstallRoot | Out-Null
}
$zip = Join-Path $env:TEMP "traefik-$TraefikVersion.zip"
$url = "https://github.com/traefik/traefik/releases/download/$TraefikVersion/traefik_${TraefikVersion}_windows_amd64.zip"
Write-Host "Downloading Traefik $TraefikVersion..."
Invoke-WebRequest -Uri $url -OutFile $zip
Expand-Archive -Path $zip -DestinationPath $InstallRoot -Force
Remove-Item $zip
$scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
Copy-Item -Force (Join-Path $scriptDir 'traefik.yml') $InstallRoot
Copy-Item -Force (Join-Path $scriptDir 'traefik-dynamic.yml') (Join-Path $InstallRoot 'dynamic.yml')
# Traefik reads dynamic.yml from /etc/traefik on Linux; on Windows place it next to the
# binary and point the file provider at it. Edit traefik.yml's `filename:` if you want
# to change the location.
(Get-Content -Raw (Join-Path $InstallRoot 'traefik.yml')) `
-replace '/etc/traefik/dynamic.yml', (Join-Path $InstallRoot 'dynamic.yml').Replace('\', '/') `
| Set-Content (Join-Path $InstallRoot 'traefik.yml')
Write-Host "Installing Traefik Windows service..."
& sc.exe create OtOpcUaTraefik binPath= "`"$InstallRoot\traefik.exe`" --configFile=`"$InstallRoot\traefik.yml`"" `
DisplayName= 'OtOpcUa Traefik (admin HTTP front door)' `
start= auto | Out-Null
& sc.exe failure OtOpcUaTraefik reset= 86400 actions= restart/5000/restart/30000/restart/60000 | Out-Null
Write-Host ""
Write-Host "Installed OtOpcUaTraefik. Edit:"
Write-Host " $InstallRoot\dynamic.yml (router + service definitions)"
Write-Host "Start with:"
Write-Host " sc.exe start OtOpcUaTraefik"
Write-Host ""
Write-Host "Traefik dashboard: http://localhost:8080 (turn off api.insecure in production)"
+11 -11
@@ -43,11 +43,11 @@ function Test-NssmService([string]$Name) {
# Step 1: Stop in reverse dependency order
# ------------------------------------------------------------------------
Step "Stopping services (OtOpcUa OtOpcUaWonderwareHistorian MxAccessGw)"
Step "Stopping services (OtOpcUaHost > OtOpcUaWonderwareHistorian > MxAccessGw)"
foreach ($name in @('OtOpcUa', 'OtOpcUaWonderwareHistorian', 'MxAccessGw')) {
foreach ($name in @('OtOpcUaHost', 'OtOpcUaWonderwareHistorian', 'MxAccessGw')) {
if (Test-NssmService $name) {
Run { nssm stop $name } "stop $name"
Run { Stop-Service $name -Force -ErrorAction SilentlyContinue } "stop $name"
}
else {
Write-Host " ($name not installed; skipping)" -ForegroundColor DarkGray
@@ -56,7 +56,7 @@ foreach ($name in @('OtOpcUa', 'OtOpcUaWonderwareHistorian', 'MxAccessGw')) {
if (-not $WhatIf) {
Start-Sleep -Seconds 3
Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Host, OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
ForEach-Object {
Write-Host " killing residual process $($_.ProcessName) (PID=$($_.Id))" -ForegroundColor DarkYellow
Stop-Process -Id $_.Id -Force -ErrorAction SilentlyContinue
@@ -109,14 +109,14 @@ Run {
# Step 4: Refresh OtOpcUa + Wonderware historian sidecar
# ------------------------------------------------------------------------
Step "Publishing OtOpcUa server + Wonderware historian sidecar from $RepoRoot"
Step "Publishing OtOpcUa.Host + Wonderware historian sidecar from $RepoRoot"
Run {
& dotnet publish "$RepoRoot\src\Server\ZB.MOM.WW.OtOpcUa.Server" `
& dotnet publish "$RepoRoot\src\Server\ZB.MOM.WW.OtOpcUa.Host" `
-c Release -o (Join-Path $PublishRoot "lmxopcua") | Out-Null
& dotnet publish "$RepoRoot\src\Drivers\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
-c Release -o (Join-Path $PublishRoot "lmxopcua\WonderwareHistorian") | Out-Null
} "dotnet publish (Server + sidecar)"
} "dotnet publish (Host + sidecar)"
# ------------------------------------------------------------------------
# Step 5: Service env block — ensure OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED
@@ -143,16 +143,16 @@ if (Test-NssmService 'OtOpcUaWonderwareHistorian') {
# Step 6: Start in forward dependency order
# ------------------------------------------------------------------------
Step "Starting services (MxAccessGw OtOpcUaWonderwareHistorian OtOpcUa)"
Step "Starting services (MxAccessGw > OtOpcUaWonderwareHistorian > OtOpcUaHost)"
foreach ($pair in @(
@{ Name = 'MxAccessGw'; Wait = 4 },
@{ Name = 'OtOpcUaWonderwareHistorian'; Wait = 4 },
@{ Name = 'OtOpcUa'; Wait = 8 }
@{ Name = 'OtOpcUaHost'; Wait = 8 }
)) {
$name = $pair.Name
if (Test-NssmService $name) {
Run { nssm start $name } "start $name"
Run { Start-Service $name } "start $name"
if (-not $WhatIf) { Start-Sleep -Seconds $pair.Wait }
}
else {
@@ -167,7 +167,7 @@ foreach ($pair in @(
Step "Smoke verification"
if (-not $WhatIf) {
foreach ($name in @('MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUa')) {
foreach ($name in @('MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUaHost')) {
if (Test-NssmService $name) {
$status = (Get-Service $name).Status
$color = if ($status -eq 'Running') { 'Green' } else { 'Red' }
+7 -6
@@ -3,16 +3,17 @@
Stops + removes the v2 services. Mirrors Install-Services.ps1.
.DESCRIPTION
PR 7.2 retired the legacy OtOpcUaGalaxyHost service. Galaxy access now flows
through the in-process GalaxyDriver against a separately-installed mxaccessgw.
OtOpcUaGalaxyHost is included in the cleanup loop below so this script safely
removes it from any rig still carrying the legacy service from a pre-7.2
install.
Removes the v2 OtOpcUaHost service plus the optional OtOpcUaWonderwareHistorian
sidecar. Also cleans up legacy service names from prior installs:
- OtOpcUa (v1 server): replaced by OtOpcUaHost in v2
- OtOpcUaAdmin (v1 admin): fused into OtOpcUaHost in v2
- OtOpcUaGalaxyHost (pre-7.2 Galaxy host): long-retired
#>
[CmdletBinding()] param()
$ErrorActionPreference = 'Continue'
foreach ($svc in 'OtOpcUa', 'OtOpcUaWonderwareHistorian', 'OtOpcUaGalaxyHost') {
foreach ($svc in 'OtOpcUaHost', 'OtOpcUaWonderwareHistorian',
'OtOpcUa', 'OtOpcUaAdmin', 'OtOpcUaGalaxyHost') {
if (Get-Service $svc -ErrorAction SilentlyContinue) {
Write-Host "Stopping $svc..."
Stop-Service $svc -Force -ErrorAction SilentlyContinue
+24
@@ -0,0 +1,24 @@
# Dynamic (file-provider) Traefik config for the OtOpcUa admin HTTP routing.
# Picked up by traefik.yml's file provider (with watch: true) so router/service
# edits hot-reload without a Traefik restart.
http:
routers:
otopcua-admin:
entryPoints: ["web"]
rule: "HostRegexp(`otopcua.*`)"
service: otopcua-admin
services:
otopcua-admin:
loadBalancer:
servers:
- url: "http://admin-a:9000"
- url: "http://admin-b:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s
# Default expected status is 2xx. Followers return 503 from
# /health/active so Traefik will drop them from the balancer
# within the next interval after a leadership change.
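The health-check comment above describes a simple selection rule: only servers whose `/health/active` probe returns 2xx stay in the balancer. The Python sketch below models that rule only. It is not Traefik's implementation, and the function name `active_servers` and the shape of the `health` mapping are hypothetical.

```python
def active_servers(health: dict[str, int]) -> list[str]:
    """Return the server URLs whose last /health/active probe returned 2xx.

    `health` maps a server URL to the HTTP status of its most recent probe.
    Followers answer 503, so they fall out of the returned list.
    """
    return [url for url, status in health.items() if 200 <= status < 300]
```

With `admin-a` answering 200 and `admin-b` answering 503, only `admin-a` remains eligible for traffic, which matches the single-leader routing the configuration is aiming for.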
+30
@@ -0,0 +1,30 @@
# Traefik static configuration for the OtOpcUa fleet HTTP front door.
#
# Routes admin-role HTTP traffic (Blazor + auth + SignalR + /auth/*) to whichever
# OtOpcUa.Host node currently holds the admin role-leader. Uses the /health/active
# endpoint as the active-leader signal: a node returns 200 only when it is the
# Akka admin role-leader; followers return 503 and Traefik routes around them.
#
# OPC UA traffic is NOT routed through Traefik — clients connect directly to
# opc.tcp://node:4840 on every driver node and use the standard ServiceLevel
# heuristic for failover.
entryPoints:
web:
address: ":80"
providers:
file:
filename: /etc/traefik/dynamic.yml
watch: true
api:
insecure: true
dashboard: true
log:
level: INFO
format: common
accessLog:
format: common
+59
@@ -0,0 +1,59 @@
<#
.SYNOPSIS
Idempotent migration runner that takes the OtOpcUaConfig database from the v1 schema
(with ConfigGeneration / ClusterNodeGenerationState) to the v2 hosting-aligned schema
(with Deployment / NodeDeploymentState / ConfigEdit / DataProtectionKeys).
.DESCRIPTION
Backs the database up, applies the idempotent EF migration script, then validates that
expected tables exist and legacy tables are gone. Safe to re-run: the EF script itself
is idempotent, and the backup picks a unique filename per invocation.
.PARAMETER ConnectionString
Mandatory. Full ADO.NET connection string with permissions to BACKUP DATABASE and
apply DDL on the target ConfigDb.
.PARAMETER BackupPath
Optional. Full path for the backup file. Defaults to a timestamped path under $env:TEMP.
.EXAMPLE
.\Migrate-To-V2.ps1 -ConnectionString "Server=sql01;Database=OtOpcUaConfig;Trusted_Connection=True;TrustServerCertificate=True"
#>
[CmdletBinding()]
param(
[Parameter(Mandatory)][string] $ConnectionString,
[string] $BackupPath = "$env:TEMP\OtOpcUa-V1-Backup-$(Get-Date -Format yyyyMMddHHmmss).bak"
)
$ErrorActionPreference = 'Stop'
if (-not (Get-Command Invoke-Sqlcmd -ErrorAction SilentlyContinue)) {
throw "Invoke-Sqlcmd not available. Install module: Install-Module SqlServer -Scope CurrentUser"
}
Write-Host "Step 1/4 — Backup ConfigDb to $BackupPath" -ForegroundColor Cyan
Invoke-Sqlcmd -ConnectionString $ConnectionString `
-Query "BACKUP DATABASE [OtOpcUaConfig] TO DISK = '$BackupPath' WITH FORMAT, COMPRESSION"
Write-Host "Step 2/4 — Row counts (before)" -ForegroundColor Cyan
$beforeCounts = Invoke-Sqlcmd -ConnectionString $ConnectionString -InputFile "$PSScriptRoot\count-rows.sql"
$beforeCounts | Format-Table
Write-Host "Step 3/4 — Apply Migrate-To-V2.sql" -ForegroundColor Cyan
Invoke-Sqlcmd -ConnectionString $ConnectionString -InputFile "$PSScriptRoot\Migrate-To-V2.sql" -QueryTimeout 1800
Write-Host "Step 4/4 — Row counts (after) + validation" -ForegroundColor Cyan
$afterCounts = Invoke-Sqlcmd -ConnectionString $ConnectionString -InputFile "$PSScriptRoot\count-rows.sql"
$afterCounts | Format-Table
$tablesNow = (Invoke-Sqlcmd -ConnectionString $ConnectionString `
-Query "SELECT name FROM sys.tables ORDER BY name").name
foreach ($t in 'Deployment','NodeDeploymentState','ConfigEdit','DataProtectionKeys') {
if ($tablesNow -notcontains $t) { throw "Expected v2 table $t missing." }
}
foreach ($t in 'ConfigGeneration','ClusterNodeGenerationState') {
if ($tablesNow -contains $t) { throw "Legacy v1 table $t still present." }
}
Write-Host "Migration complete. Backup at $BackupPath" -ForegroundColor Green
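The post-migration validation in the script above reduces to two set checks: every v2 table must exist and no v1 table may remain. A minimal Python sketch of that logic follows, assuming the table names have already been read from `sys.tables`. The function name `validate_schema` is hypothetical, and unlike the script, which throws on the first problem, this sketch collects every problem into a list.

```python
EXPECTED_V2 = ("Deployment", "NodeDeploymentState", "ConfigEdit", "DataProtectionKeys")
LEGACY_V1 = ("ConfigGeneration", "ClusterNodeGenerationState")


def validate_schema(tables: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the schema looks like v2."""
    # PowerShell's -contains is case-insensitive, so compare lowercased names.
    present = {name.lower() for name in tables}
    problems = []
    for name in EXPECTED_V2:
        if name.lower() not in present:
            problems.append(f"Expected v2 table {name} missing.")
    for name in LEGACY_V1:
        if name.lower() in present:
            problems.append(f"Legacy v1 table {name} still present.")
    return problems
```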
File diff suppressed because it is too large
+26
@@ -0,0 +1,26 @@
-- Per-table row counts for pre/post-migration audit.
-- Covers every table relevant to the v1 -> v2 transition so the operator can confirm
-- live-edit data was preserved and v2 tables came up empty.
SELECT TableName = t.name, [Rows] = SUM(p.[rows])
FROM sys.tables t
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0,1)
WHERE t.name IN (
-- Live-edit configuration (rows must survive)
'ServerCluster','ClusterNode','ClusterNodeCredential',
'Namespace','UnsArea','UnsLine',
'DriverInstance','Device','Equipment','Tag','PollGroup','VirtualTag',
'NodeAcl','ExternalIdReservation',
'Script','ScriptedAlarm','ScriptedAlarmState',
'LdapGroupRoleMapping',
'EquipmentImportBatch','EquipmentImportRow',
-- Status tables (rebuilt at runtime; counts informational)
'DriverHostStatus','DriverInstanceResilienceStatus',
-- Audit (preserved)
'ConfigAuditLog',
-- v2 deploy model (empty pre-migration, populated post)
'Deployment','NodeDeploymentState','ConfigEdit','DataProtectionKeys'
)
GROUP BY t.name
ORDER BY t.name;
GO
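The row-count query above is meant to be run before and after the migration so the two result sets can be compared. A small Python sketch of that comparison follows, assuming each result set has been loaded into a dictionary of table name to row count. The function name `changed_counts` and the dictionary shape are hypothetical.

```python
def changed_counts(before: dict[str, int], after: dict[str, int],
                   must_match: list[str]) -> dict[str, tuple]:
    """Return the tables in `must_match` whose row count differs across the migration.

    A table missing from either snapshot shows up with None on that side.
    """
    return {
        table: (before.get(table), after.get(table))
        for table in must_match
        if before.get(table) != after.get(table)
    }
```

An empty result for the live-edit tables (for example `Tag` or `Equipment`) suggests those rows survived; any entry returned is worth inspecting against the backup.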
@@ -67,11 +67,13 @@ public abstract class CommandBase : ICommand
/// Executes the command-specific workflow against the configured OPC UA endpoint.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <returns>A value task that represents the asynchronous command execution.</returns>
public abstract ValueTask ExecuteAsync(IConsole console);
/// <summary>
/// Creates a <see cref="ConnectionSettings" /> from the common command options.
/// </summary>
/// <returns>A <see cref="ConnectionSettings"/> populated from the current command option values.</returns>
protected ConnectionSettings CreateConnectionSettings()
{
var securityMode = SecurityModeMapper.FromString(Security);
@@ -97,6 +99,7 @@ public abstract class CommandBase : ICommand
/// and returns both the service and the connection info.
/// </summary>
/// <param name="ct">The cancellation token that aborts connection setup for the command.</param>
/// <returns>A tuple of the connected <see cref="IOpcUaClientService"/> and the resulting <see cref="ConnectionInfo"/>.</returns>
protected async Task<(IOpcUaClientService Service, ConnectionInfo Info)> CreateServiceAndConnectAsync(
CancellationToken ct)
{
@@ -42,6 +42,7 @@ public class AlarmsCommand : CommandBase
/// Connects to the server, subscribes to alarm events, and streams operator-facing alarm state changes to the console.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <inheritdoc />
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -36,10 +36,7 @@ public class BrowseCommand : CommandBase
[CommandOption("recursive", 'r', Description = "Browse recursively (uses --depth as max depth)")]
public bool Recursive { get; init; }
/// <summary>
/// Connects to the server and prints a tree view of the requested address-space branch.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <inheritdoc />
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -15,10 +15,7 @@ public class ConnectCommand : CommandBase
{
}
/// <summary>
/// Connects to the server and prints the negotiated endpoint details for operator verification.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <inheritdoc />
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -56,10 +56,7 @@ public class HistoryReadCommand : CommandBase
[CommandOption("interval", Description = "Processing interval in milliseconds for aggregates")]
public double IntervalMs { get; init; } = 3600000;
/// <summary>
/// Connects to the server and prints raw or processed historical values for the requested node.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <inheritdoc />
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -24,10 +24,7 @@ public class ReadCommand : CommandBase
[CommandOption("node", 'n', Description = "Node ID (e.g. ns=2;s=MyNode)", IsRequired = true)]
public string NodeId { get; init; } = default!;
/// <summary>
/// Connects to the server and prints the current value, status, and timestamps for the requested node.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <inheritdoc />
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -15,10 +15,8 @@ public class RedundancyCommand : CommandBase
{
}
/// <summary>
/// Connects to the server and prints redundancy mode, service level, and partner-server identity data.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <summary>Connects to the server and prints redundancy mode, service level, and partner-server identity data.</summary>
/// <inheritdoc />
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -67,11 +67,7 @@ public class SubscribeCommand : CommandBase
[CommandOption("summary-file", Description = "Write summary to this file path on exit (in addition to stdout)")]
public string? SummaryFile { get; init; }
/// <summary>
/// Connects to the server, subscribes to <see cref="NodeId" /> (or its subtree when recursive),
/// streams data-change notifications to the console, and prints a summary when the command exits.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <inheritdoc />
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -35,6 +35,7 @@ public class WriteCommand : CommandBase
/// Connects to the server, converts the supplied value to the node's current data type, and issues the write.
/// </summary>
/// <param name="console">The CLI console used for output and cancellation handling.</param>
/// <inheritdoc />
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -9,9 +9,9 @@
</PropertyGroup>
<ItemGroup>
<PackageReference Include="CliFx" Version="2.3.6"/>
<PackageReference Include="Serilog" Version="4.2.0"/>
<PackageReference Include="Serilog.Sinks.Console" Version="6.0.0"/>
<PackageReference Include="CliFx"/>
<PackageReference Include="Serilog"/>
<PackageReference Include="Serilog.Sinks.Console"/>
</ItemGroup>
<ItemGroup>
@@ -12,6 +12,7 @@ internal sealed class DefaultApplicationConfigurationFactory : IApplicationConfi
{
private static readonly ILogger Logger = Log.ForContext<DefaultApplicationConfigurationFactory>();
/// <inheritdoc />
public async Task<ApplicationConfiguration> CreateAsync(ConnectionSettings settings, CancellationToken ct)
{
// Resolve the canonical PKI path lazily on first use so constructing a
@@ -11,6 +11,7 @@ internal sealed class DefaultEndpointDiscovery : IEndpointDiscovery
{
private static readonly ILogger Logger = Log.ForContext<DefaultEndpointDiscovery>();
/// <inheritdoc />
public EndpointDescription SelectEndpoint(ApplicationConfiguration config, string endpointUrl,
MessageSecurityMode requestedMode)
{
@@ -49,6 +50,7 @@ internal static class EndpointSelector
/// Thrown when no endpoint matches <paramref name="requestedMode"/>; the message lists the
/// security mode + policy combinations the server returned so operators can diagnose mismatches.
/// </exception>
/// <returns>The best matching <see cref="EndpointDescription"/> with its URL rewritten to the requested host.</returns>
public static EndpointDescription SelectBest(
IEnumerable<EndpointDescription> allEndpoints,
string endpointUrl,
@@ -11,6 +11,14 @@ internal sealed class DefaultSessionFactory : ISessionFactory
{
private static readonly ILogger Logger = Log.ForContext<DefaultSessionFactory>();
/// <summary>Creates a new OPC UA session.</summary>
/// <param name="config">The OPC UA application configuration.</param>
/// <param name="endpoint">The endpoint description to connect to.</param>
/// <param name="sessionName">The name for the session.</param>
/// <param name="sessionTimeoutMs">The session timeout in milliseconds.</param>
/// <param name="identity">The user identity for the session.</param>
/// <param name="ct">The cancellation token.</param>
/// <returns>An adapter wrapping the created session.</returns>
public async Task<ISessionAdapter> CreateSessionAsync(
ApplicationConfiguration config,
EndpointDescription endpoint,
@@ -11,5 +11,8 @@ internal interface IApplicationConfigurationFactory
/// <summary>
/// Creates a validated ApplicationConfiguration for the given connection settings.
/// </summary>
/// <param name="settings">The connection settings to configure.</param>
/// <param name="ct">Cancellation token for the operation.</param>
/// <returns>A task that resolves to the validated <see cref="ApplicationConfiguration"/>.</returns>
Task<ApplicationConfiguration> CreateAsync(ConnectionSettings settings, CancellationToken ct = default);
}
@@ -11,6 +11,10 @@ internal interface IEndpointDiscovery
/// Discovers endpoints at the given URL and returns the best match for the requested security mode.
/// Also rewrites the endpoint URL hostname to match the requested URL when they differ.
/// </summary>
/// <param name="config">The OPC UA application configuration.</param>
/// <param name="endpointUrl">The endpoint URL to discover.</param>
/// <param name="requestedMode">The requested message security mode.</param>
/// <returns>The best matching endpoint description for the requested security mode.</returns>
EndpointDescription SelectEndpoint(ApplicationConfiguration config, string endpointUrl,
MessageSecurityMode requestedMode);
}
@@ -58,6 +58,7 @@ internal interface ISessionAdapter : IDisposable
/// </summary>
/// <param name="nodeId">The node whose current runtime value should be read.</param>
/// <param name="ct">The cancellation token that aborts the server read if the client cancels the request.</param>
/// <returns>A task that resolves to the current <see cref="DataValue"/> for the node.</returns>
Task<DataValue> ReadValueAsync(NodeId nodeId, CancellationToken ct = default);
/// <summary>
@@ -66,6 +67,7 @@ internal interface ISessionAdapter : IDisposable
/// <param name="nodeId">The node whose value should be updated.</param>
/// <param name="value">The typed OPC UA data value to write to the server.</param>
/// <param name="ct">The cancellation token that aborts the write if the client cancels the request.</param>
/// <returns>A task that resolves to the OPC UA <see cref="StatusCode"/> for the write operation.</returns>
Task<StatusCode> WriteValueAsync(NodeId nodeId, DataValue value, CancellationToken ct = default);
/// <summary>
@@ -75,6 +77,7 @@ internal interface ISessionAdapter : IDisposable
/// <param name="nodeId">The starting node for the hierarchical browse.</param>
/// <param name="nodeClassMask">The node classes that should be returned to the caller.</param>
/// <param name="ct">The cancellation token that aborts the browse request.</param>
/// <returns>A task that resolves to a tuple of an optional continuation point and the returned references.</returns>
Task<(byte[]? ContinuationPoint, ReferenceDescriptionCollection References)> BrowseAsync(
NodeId nodeId, uint nodeClassMask = 0, CancellationToken ct = default);
@@ -83,6 +86,7 @@ internal interface ISessionAdapter : IDisposable
/// </summary>
/// <param name="continuationPoint">The continuation token returned by a prior browse result page.</param>
/// <param name="ct">The cancellation token that aborts the browse-next request.</param>
/// <returns>A task that resolves to a tuple of an optional next continuation point and the returned references.</returns>
Task<(byte[]? ContinuationPoint, ReferenceDescriptionCollection References)> BrowseNextAsync(
byte[] continuationPoint, CancellationToken ct = default);
@@ -91,6 +95,7 @@ internal interface ISessionAdapter : IDisposable
/// </summary>
/// <param name="nodeId">The node to inspect for child objects or variables.</param>
/// <param name="ct">The cancellation token that aborts the child lookup.</param>
/// <returns>A task that resolves to <see langword="true"/> if the node has at least one child; otherwise <see langword="false"/>.</returns>
Task<bool> HasChildrenAsync(NodeId nodeId, CancellationToken ct = default);
/// <summary>
@@ -101,6 +106,7 @@ internal interface ISessionAdapter : IDisposable
/// <param name="endTime">The inclusive end of the requested history window.</param>
/// <param name="maxValues">The maximum number of raw samples to return to the client.</param>
/// <param name="ct">The cancellation token that aborts the history read.</param>
/// <returns>A task that resolves to the ordered list of raw historical data values.</returns>
Task<IReadOnlyList<DataValue>> HistoryReadRawAsync(NodeId nodeId, DateTime startTime, DateTime endTime,
int maxValues, CancellationToken ct = default);
@@ -113,6 +119,7 @@ internal interface ISessionAdapter : IDisposable
/// <param name="aggregateId">The OPC UA aggregate function to evaluate over the history window.</param>
/// <param name="intervalMs">The processing interval, in milliseconds, for each aggregate bucket.</param>
/// <param name="ct">The cancellation token that aborts the aggregate history read.</param>
/// <returns>A task that resolves to the ordered list of processed aggregate data values.</returns>
Task<IReadOnlyList<DataValue>> HistoryReadAggregateAsync(NodeId nodeId, DateTime startTime, DateTime endTime,
NodeId aggregateId, double intervalMs, CancellationToken ct = default);
@@ -121,6 +128,7 @@ internal interface ISessionAdapter : IDisposable
/// </summary>
/// <param name="publishingIntervalMs">The requested publishing interval for monitored items on the new subscription.</param>
/// <param name="ct">The cancellation token that aborts subscription creation.</param>
/// <returns>A task that resolves to the newly created <see cref="ISubscriptionAdapter"/>.</returns>
Task<ISubscriptionAdapter> CreateSubscriptionAsync(int publishingIntervalMs, CancellationToken ct = default);
/// <summary>
@@ -130,11 +138,13 @@ internal interface ISessionAdapter : IDisposable
/// <param name="methodId">The method node to invoke.</param>
/// <param name="inputArguments">The ordered input arguments supplied to the server method call.</param>
/// <param name="ct">The cancellation token that aborts the method invocation.</param>
/// <returns>A task that resolves to the list of output arguments returned by the method, or <see langword="null"/> if none.</returns>
Task<IList<object>?> CallMethodAsync(NodeId objectId, NodeId methodId, object[] inputArguments, CancellationToken ct = default);
/// <summary>
/// Closes the underlying session gracefully before the adapter is disposed or replaced during failover.
/// </summary>
/// <param name="ct">The cancellation token that aborts the close request.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task CloseAsync(CancellationToken ct = default);
}
@@ -28,6 +28,7 @@ internal interface ISubscriptionAdapter : IDisposable
/// </summary>
/// <param name="clientHandle">The client handle returned when the monitored item was created.</param>
/// <param name="ct">The cancellation token that aborts the monitored-item removal.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task RemoveMonitoredItemAsync(uint clientHandle, CancellationToken ct = default);
/// <summary>
@@ -46,11 +47,13 @@ internal interface ISubscriptionAdapter : IDisposable
/// Requests a condition refresh for this subscription.
/// </summary>
/// <param name="ct">The cancellation token that aborts the condition refresh request.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task ConditionRefreshAsync(CancellationToken ct = default);
/// <summary>
/// Removes all monitored items and deletes the subscription.
/// </summary>
/// <param name="ct">The cancellation token that aborts subscription deletion.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task DeleteAsync(CancellationToken ct = default);
}
@@ -28,6 +28,7 @@ public static class ClientStoragePaths
/// one-shot legacy-folder migration before returning so callers that depend on this
/// path (PKI store, settings file) find their existing state at the canonical name.
/// </summary>
/// <returns>The absolute path to the client's top-level folder under LocalApplicationData.</returns>
public static string GetRoot()
{
var localAppData = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
@@ -37,6 +38,7 @@ public static class ClientStoragePaths
}
/// <summary>Subfolder for the application's PKI store — used by both CLI + UI.</summary>
/// <returns>The absolute path to the PKI store subfolder.</returns>
public static string GetPkiPath() => Path.Combine(GetRoot(), "pki");
/// <summary>
@@ -45,6 +47,7 @@ public static class ClientStoragePaths
/// folder existed + was moved to canonical, false when no migration was needed or
/// canonical was already present.
/// </summary>
/// <returns><see langword="true"/> when the legacy folder was found and moved; <see langword="false"/> when no migration was needed.</returns>
public static bool TryRunLegacyMigration()
{
var localAppData = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
@@ -11,6 +11,8 @@ public static class AggregateTypeMapper
/// <summary>
/// Returns the OPC UA NodeId for the specified aggregate type.
/// </summary>
/// <param name="aggregate">The aggregate type to map to a NodeId.</param>
/// <returns>The OPC UA NodeId for the aggregate function.</returns>
public static NodeId ToNodeId(AggregateType aggregate)
{
return aggregate switch
@@ -8,9 +8,9 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared.Helpers;
/// </summary>
public static class SecurityModeMapper
{
/// <summary>
/// Converts a <see cref="SecurityMode" /> to an OPC UA <see cref="MessageSecurityMode" />.
/// </summary>
/// <summary>Converts a SecurityMode to an OPC UA MessageSecurityMode.</summary>
/// <param name="mode">The security mode to convert.</param>
/// <returns>The corresponding message security mode.</returns>
public static MessageSecurityMode ToMessageSecurityMode(SecurityMode mode)
{
return mode switch
@@ -24,12 +24,14 @@ public interface IOpcUaClientService : IDisposable
/// </summary>
/// <param name="settings">The endpoint, security, and authentication settings used to establish the session.</param>
/// <param name="ct">The cancellation token that aborts the connect workflow.</param>
/// <returns>A <see cref="ConnectionInfo"/> describing the active session after a successful connect.</returns>
Task<ConnectionInfo> ConnectAsync(ConnectionSettings settings, CancellationToken ct = default);
/// <summary>
/// Disconnects from the active OPC UA endpoint and tears down subscriptions owned by the client.
/// </summary>
/// <param name="ct">The cancellation token that aborts disconnect cleanup.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task DisconnectAsync(CancellationToken ct = default);
/// <summary>
@@ -37,6 +39,7 @@ public interface IOpcUaClientService : IDisposable
/// </summary>
/// <param name="nodeId">The node whose value should be retrieved.</param>
/// <param name="ct">The cancellation token that aborts the read request.</param>
/// <returns>The current <see cref="DataValue"/> including value, status code, and timestamps.</returns>
Task<DataValue> ReadValueAsync(NodeId nodeId, CancellationToken ct = default);
/// <summary>
@@ -45,6 +48,7 @@ public interface IOpcUaClientService : IDisposable
/// <param name="nodeId">The node whose value should be updated.</param>
/// <param name="value">The raw value supplied by the CLI or UI workflow.</param>
/// <param name="ct">The cancellation token that aborts the write request.</param>
/// <returns>The OPC UA <see cref="StatusCode"/> returned by the server for the write operation.</returns>
Task<StatusCode> WriteValueAsync(NodeId nodeId, object value, CancellationToken ct = default);
/// <summary>
@@ -52,6 +56,7 @@ public interface IOpcUaClientService : IDisposable
/// </summary>
/// <param name="parentNodeId">The node to browse, or <see cref="ObjectIds.ObjectsFolder"/> when omitted.</param>
/// <param name="ct">The cancellation token that aborts the browse request.</param>
/// <returns>The list of child nodes discovered under the specified parent.</returns>
Task<IReadOnlyList<BrowseResult>> BrowseAsync(NodeId? parentNodeId = null, CancellationToken ct = default);
/// <summary>
@@ -60,6 +65,7 @@ public interface IOpcUaClientService : IDisposable
/// <param name="nodeId">The node whose value changes should be monitored.</param>
/// <param name="intervalMs">The monitored-item sampling and publishing interval in milliseconds.</param>
/// <param name="ct">The cancellation token that aborts subscription creation.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task SubscribeAsync(NodeId nodeId, int intervalMs = 1000, CancellationToken ct = default);
/// <summary>
@@ -67,6 +73,7 @@ public interface IOpcUaClientService : IDisposable
/// </summary>
/// <param name="nodeId">The node whose live-data subscription should be removed.</param>
/// <param name="ct">The cancellation token that aborts the unsubscribe request.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task UnsubscribeAsync(NodeId nodeId, CancellationToken ct = default);
/// <summary>
@@ -75,18 +82,21 @@ public interface IOpcUaClientService : IDisposable
/// <param name="sourceNodeId">The event source to monitor, or the server object when omitted.</param>
/// <param name="intervalMs">The publishing interval in milliseconds for the alarm subscription.</param>
/// <param name="ct">The cancellation token that aborts alarm subscription creation.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task SubscribeAlarmsAsync(NodeId? sourceNodeId = null, int intervalMs = 1000, CancellationToken ct = default);
/// <summary>
/// Removes the active alarm subscription.
/// </summary>
/// <param name="ct">The cancellation token that aborts alarm subscription cleanup.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task UnsubscribeAlarmsAsync(CancellationToken ct = default);
/// <summary>
/// Requests retained alarm conditions again so a client can repopulate its alarm list after reconnecting.
/// </summary>
/// <param name="ct">The cancellation token that aborts the condition refresh request.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task RequestConditionRefreshAsync(CancellationToken ct = default);
/// <summary>
@@ -111,6 +121,7 @@ public interface IOpcUaClientService : IDisposable
/// <param name="endTime">The inclusive end of the requested history range.</param>
/// <param name="maxValues">The maximum number of raw values to return.</param>
/// <param name="ct">The cancellation token that aborts the history read.</param>
/// <returns>The raw historical <see cref="DataValue"/> samples in the requested range.</returns>
Task<IReadOnlyList<DataValue>> HistoryReadRawAsync(NodeId nodeId, DateTime startTime, DateTime endTime,
int maxValues = 1000, CancellationToken ct = default);
@@ -123,6 +134,7 @@ public interface IOpcUaClientService : IDisposable
/// <param name="aggregate">The aggregate function the operator selected for processed history.</param>
/// <param name="intervalMs">The processing interval, in milliseconds, for each aggregate bucket.</param>
/// <param name="ct">The cancellation token that aborts the processed history request.</param>
/// <returns>The processed historical <see cref="DataValue"/> samples computed by the requested aggregate.</returns>
Task<IReadOnlyList<DataValue>> HistoryReadAggregateAsync(NodeId nodeId, DateTime startTime, DateTime endTime,
AggregateType aggregate, double intervalMs = 3600000, CancellationToken ct = default);
@@ -130,6 +142,7 @@ public interface IOpcUaClientService : IDisposable
/// Reads redundancy status data such as redundancy mode, service level, and partner endpoint URIs.
/// </summary>
/// <param name="ct">The cancellation token that aborts redundancy inspection.</param>
/// <returns>A <see cref="RedundancyInfo"/> snapshot containing redundancy mode, service level, and partner endpoint URIs.</returns>
Task<RedundancyInfo> GetRedundancyInfoAsync(CancellationToken ct = default);
/// <summary>
@@ -5,5 +5,7 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared;
/// </summary>
public interface IOpcUaClientServiceFactory
{
/// <summary>Creates a new OPC UA client service instance.</summary>
/// <returns>A new <see cref="IOpcUaClientService"/> instance.</returns>
IOpcUaClientService Create();
}
@@ -5,6 +5,20 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
/// </summary>
public sealed class AlarmEventArgs : EventArgs
{
/// <summary>Initializes a new instance of the <see cref="AlarmEventArgs"/> class.</summary>
/// <param name="sourceName">The name of the source object that raised the alarm.</param>
/// <param name="conditionName">The condition type name.</param>
/// <param name="severity">The alarm severity (0-1000).</param>
/// <param name="message">Human-readable alarm message.</param>
/// <param name="retain">Whether the alarm should be retained in the display.</param>
/// <param name="activeState">Whether the alarm condition is currently active.</param>
/// <param name="ackedState">Whether the alarm has been acknowledged.</param>
/// <param name="time">The time the event occurred.</param>
/// <param name="eventId">The EventId used for alarm acknowledgment.</param>
/// <param name="conditionNodeId">The NodeId of the condition instance.</param>
/// <param name="operatorComment">Operator-supplied comment on acknowledgment transitions.</param>
/// <param name="originalRaiseTimestampUtc">When the alarm originally entered the active state.</param>
/// <param name="alarmCategory">Upstream alarm taxonomy bucket (e.g. Process, Safety, Diagnostics).</param>
public AlarmEventArgs(
string sourceName,
string conditionName,
@@ -5,6 +5,11 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
/// </summary>
public sealed class BrowseResult
{
/// <summary>Initializes a new instance of the BrowseResult class.</summary>
/// <param name="nodeId">The string representation of the node's NodeId.</param>
/// <param name="displayName">The display name of the node.</param>
/// <param name="nodeClass">The node class (e.g., "Object", "Variable", "Method").</param>
/// <param name="hasChildren">Whether the node has child references.</param>
public BrowseResult(string nodeId, string displayName, string nodeClass, bool hasChildren)
{
NodeId = nodeId;
@@ -5,6 +5,13 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
/// </summary>
public sealed class ConnectionInfo
{
/// <summary>Initializes a new instance of the ConnectionInfo with session details.</summary>
/// <param name="endpointUrl">The endpoint URL of the connected server.</param>
/// <param name="serverName">The server application name.</param>
/// <param name="securityMode">The security mode in use.</param>
/// <param name="securityPolicyUri">The security policy URI.</param>
/// <param name="sessionId">The session identifier.</param>
/// <param name="sessionName">The session name.</param>
public ConnectionInfo(
string endpointUrl,
string serverName,
@@ -5,6 +5,10 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
/// </summary>
public sealed class ConnectionStateChangedEventArgs : EventArgs
{
/// <summary>Initializes a new instance of the ConnectionStateChangedEventArgs class.</summary>
/// <param name="oldState">The previous connection state.</param>
/// <param name="newState">The new connection state.</param>
/// <param name="endpointUrl">The endpoint URL associated with the state change.</param>
public ConnectionStateChangedEventArgs(ConnectionState oldState, ConnectionState newState, string endpointUrl)
{
OldState = oldState;
@@ -7,6 +7,9 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
/// </summary>
public sealed class DataChangedEventArgs : EventArgs
{
/// <summary>Initializes a new instance of the DataChangedEventArgs class.</summary>
/// <param name="nodeId">The node ID that changed.</param>
/// <param name="value">The new data value.</param>
public DataChangedEventArgs(string nodeId, DataValue value)
{
NodeId = nodeId;
@@ -5,6 +5,11 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
/// </summary>
public sealed class RedundancyInfo
{
/// <summary>Initializes a new instance of the RedundancyInfo class.</summary>
/// <param name="mode">The redundancy mode (e.g., "None", "Cold", "Warm", "Hot").</param>
/// <param name="serviceLevel">The server's current service level (0-255).</param>
/// <param name="serverUris">URIs of all servers in the redundant set.</param>
/// <param name="applicationUri">The application URI of the connected server.</param>
public RedundancyInfo(string mode, byte serviceLevel, string[] serverUris, string applicationUri)
{
Mode = mode;
@@ -73,13 +73,13 @@ public sealed class OpcUaClientService : IOpcUaClientService
{
}
/// <inheritdoc />
/// <summary>Raised when subscribed node values change.</summary>
public event EventHandler<DataChangedEventArgs>? DataChanged;
/// <inheritdoc />
/// <summary>Raised when an alarm event is received from the server.</summary>
public event EventHandler<AlarmEventArgs>? AlarmEvent;
/// <inheritdoc />
/// <summary>Raised when the connection state changes.</summary>
public event EventHandler<ConnectionStateChangedEventArgs>? ConnectionStateChanged;
/// <inheritdoc />
@@ -5,6 +5,8 @@ namespace ZB.MOM.WW.OtOpcUa.Client.Shared;
/// </summary>
public sealed class OpcUaClientServiceFactory : IOpcUaClientServiceFactory
{
/// <summary>Creates a new OPC UA client service instance with production adapters.</summary>
/// <returns>A new OpcUaClientService instance.</returns>
public IOpcUaClientService Create()
{
return new OpcUaClientService();
@@ -8,8 +8,8 @@
</PropertyGroup>
<ItemGroup>
<PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Client" Version="1.5.378.106"/>
<PackageReference Include="Serilog" Version="4.2.0"/>
<PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Client"/>
<PackageReference Include="Serilog"/>
</ItemGroup>
<ItemGroup>
@@ -10,11 +10,13 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI;
public class App : Application
{
/// <inheritdoc />
public override void Initialize()
{
AvaloniaXamlLoader.Load(this);
}
/// <inheritdoc />
public override void OnFrameworkInitializationCompleted()
{
if (ApplicationLifetime is IClassicDesktopStyleApplicationLifetime desktop)

Some files were not shown because too many files have changed in this diff