Commit Graph

56 Commits

Author SHA1 Message Date
Joseph Doherty 10efcf4517 feat(opcua): write-outcome self-correction — capture prior + compare-and-revert on failure 2026-06-14 01:30:20 -04:00
Joseph Doherty bb5832e900 feat(server): inbound operator-write pipeline — OnWriteValue authz gate + node-write router 2026-06-13 12:35:15 -04:00
Joseph Doherty 9357c001b7 fix(opcuaclient): register the OpcUaClient driver factory (was always stubbed) 2026-06-13 08:20:02 -04:00
Joseph Doherty e2960515cf chore(historian): drop dead pipe package ref + stale pipe strings (review) 2026-06-12 12:02:05 -04:00
Joseph Doherty 72f32045a4 refactor(historian): remove named-pipe transport 2026-06-12 11:51:53 -04:00
Joseph Doherty 7ce7505a36 feat(historian-host): bind TCP host/port/tls config 2026-06-12 11:19:46 -04:00
Joseph Doherty 57355405a6 chore(security): drop dead audit suppressions; patch OpenTelemetry + Tmds.DBus CVEs
All five suppressed advisories are now resolved at baseline/resolved versions,
so every NuGetAuditSuppress is removed repo-wide:
- System.Security.Cryptography.Xml (GHSA-37gx-xxp4-5rgx / GHSA-w3x6-4m5h-cxqf)
  -> fixed by the .NET 10 baseline (10.0.6)
- OPCFoundation Opc.Ua.Core (GHSA-h958-fxgg-g7w3) -> fixed at resolved 1.5.378.106

Two were still live and are now patched via direct security pins:
- OpenTelemetry.Api 1.9.0 -> 1.15.3 (GHSA-g94r-2vxg-569j) pinned in Cluster;
  Runtime/ControlPlane/AdminUI + tests inherit via project reference
- Tmds.DBus.Protocol 0.20.0 -> 0.21.3 (GHSA-xrw6-gwf8-vvr9) pinned in Client.UI

Also correct the Historian sidecar runtime comments (x86 -> x64, matching the
csproj PlatformTarget). Solution audit: 0 vulnerable packages; full build clean.
2026-06-12 09:03:42 -04:00
Joseph Doherty f215982b93 feat(historian): drain/capacity/retention config knobs + startup config-warning validation 2026-06-11 13:04:16 -04:00
Joseph Doherty 943c621371 feat(historian): config-gated SqliteStoreAndForward→Wonderware sink (AddAlarmHistorian) 2026-06-11 11:30:31 -04:00
Joseph Doherty 63289d377c feat(alarms): route inbound Part 9 alarm methods through AlarmAck gate (T18)
Wire the materialised AlarmConditionState method handlers so a client calling
Acknowledge/Confirm/Shelve/AddComment is gated on the AlarmAck data-plane role
and, when allowed, routed back to the scripted-alarm engine via a new
`alarm-commands` DistributedPubSub topic.

- Commons: new AlarmCommand DTO (AlarmId/Operation/User/Comment/UnshelveAtUtc).
- ScriptedAlarmHostActor: add AlarmCommandsTopic const.
- OtOpcUaNodeManager: settable AlarmCommandRouter + wire OnAcknowledge/OnConfirm/
  OnAddComment/OnShelve/OnTimedUnshelve. Each resolves the principal off
  ISessionOperationContext.UserIdentity as RoleCarryingUserIdentity, fails closed
  (BadUserAccessDenied) when the AlarmAck role is absent or no identity, else maps
  + routes an AlarmCommand and returns Good. OnShelve discriminates OneShotShelve/
  TimedShelve/Unshelve from the SDK flags; TimedShelve expiry = UtcNow + ms.
  No Akka/IActorRef handle — only the Action<AlarmCommand> delegate. T20 de-dup
  note left; WriteAlarmCondition untouched.
- OpcUaServer.Security: OpcUaDataPlaneRoles.AlarmAck shared const (the role was a
  bare string everywhere; introduced one symbol for the gate + tests).
- OtOpcUaSdkServer: SetAlarmCommandRouter pass-through.
- Host: boot wiring publishes each command via mediator.Tell(Publish(...)) using a
  lazy ActorSystem accessor (mirrors DpsScriptLogPublisher).
- Tests: 11 new gate + mapping tests (OpcUaServer.Tests 88->99, all green).
2026-06-11 06:05:39 -04:00
Joseph Doherty a27e82c8d1 chore(dev): Security:Auth:DisableLogin default off; enable in docker-dev central nodes 2026-06-11 04:29:47 -04:00
Joseph Doherty fc0d43a3dc refactor(scripted-alarms): retire orphaned ScriptedAlarmActor + F9b evaluator (T11) 2026-06-10 15:22:26 -04:00
Joseph Doherty bd2dd05a0c feat(scripting): evaluators log through root script logger → script-log page (F8) 2026-06-10 12:03:51 -04:00
Joseph Doherty bf86b3def6 fix(scripting): explicit companion logger + disposable ScriptRootLogger (T2 review) 2026-06-10 11:56:51 -04:00
Joseph Doherty 73014258ef feat(scripting): root script logger + DPS publisher wired in Host 2026-06-10 11:50:50 -04:00
Joseph Doherty b54a6ad29f feat(adminui): script-analysis contracts, wrapper seam, endpoints + DI 2026-06-09 14:17:31 -04:00
Joseph Doherty 08d7477860 feat(vtag): passthrough fast-path skips Roslyn for mirror scripts (A) 2026-06-07 15:26:20 -04:00
Joseph Doherty 2676fc17b5 fix(host): pin EF/AspNetCore logs to Warning in appsettings.Development 2026-06-07 11:19:26 -04:00
Joseph Doherty ad7f9e731f feat(admin): headless POST /api/deployments REST endpoint (API-key gated)
v2-ci / build (push) Failing after 50s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
A thin gateway over the admin-operations cluster singleton so CI/scripts can trigger a
deployment without the Blazor button. Forwards to the same IAdminOperationsClient.
StartDeploymentAsync; mounted on admin-role nodes. Auth is a fixed-time X-Api-Key check
against Security:DeployApiKey (orthogonal to the cookie-only web auth); AllowAnonymous so the
auth fallback doesn't 401 it, self-disabling (503) until the key is set. Outcome->status:
202/200/409/422. Unit tests for the key check + outcome mapping; HTTP E2E (real auth + real
deploy via the 2-node harness). Documented in docs/security.md.
2026-06-06 15:54:51 -04:00
Joseph Doherty c3ae458a95 fix(config): let env vars override per-role appsettings overlay + correct dev-compose Ldap section
v2-ci / build (push) Failing after 46s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
The per-role overlay (appsettings.{role}.json) was appended after WebApplicationBuilder's
default sources, so it outranked environment variables — a baked role file could not be
overridden by a deployment env var. In otopcua-dev this meant appsettings.admin.json's
Security:Ldap:DevStubMode=false beat the compose's DevStub override, so every AdminUI login
attempted a real LDAP bind against a non-existent server and failed with 'Unexpected
authentication error'.

- Program.cs: re-append AddEnvironmentVariables() + AddCommandLine(args) after the role
  overlay so deployment overrides keep top precedence (overlay still beats base appsettings).
- docker-dev/docker-compose.yml: the DevStub env var targeted the stale 'Authentication:Ldap'
  section; the code reads 'Security:Ldap'. Corrected the prefix on every host node (+ comment).

Dev AdminUI login now signs in as Administrator via the DevStub bypass.
2026-06-04 11:37:21 -04:00
Joseph Doherty 11de14d12e refactor(adminui): explicit ClaimTypes.Role footer filter; fix stale NavSidebar comment
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-03 03:18:08 -04:00
Joseph Doherty c4f315ec90 fix(auth): OtOpcUa 1.2 review fixes — startup insecure-transport guard + Ldaps in prod overlays, test fidelity, 0.1.1 pin 2026-06-02 01:37:29 -04:00
Joseph Doherty 257caa7bd1 feat(auth): cut OtOpcUa over to ZB.MOM.WW.Auth.Ldap; preserve DevStubMode; route roles via IGroupRoleMapper (Task 1.2/1.4) 2026-06-02 00:55:10 -04:00
Joseph Doherty 2844180865 fix: honor LdapOptions.Enabled at runtime; dedupe ILdapAuthService registration; +SearchBase test, doc fix
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-01 23:03:12 -04:00
Joseph Doherty d3ab2bfbaf fix: bind OtOpcUa LdapOptions from real Security:Ldap section; gate validator on DevStubMode 2026-06-01 22:46:09 -04:00
Joseph Doherty 88e773af36 feat: validate OpcUa host options at startup (route through IOptions + ValidateOnStart) 2026-06-01 18:45:55 -04:00
Joseph Doherty f35ebd7aaf feat: add fail-fast LDAP options validation in OtOpcUa via ZB.MOM.WW.Configuration 2026-06-01 18:32:44 -04:00
Joseph Doherty 7ff7a60ae0 feat(otopcua): config-driven OTLP exporter opt-in (default Prometheus) 2026-06-01 16:40:24 -04:00
Joseph Doherty 60017177cb feat(otopcua): adopt AddZbSerilog (shared enrichers + trace correlation); sinks to config 2026-06-01 15:41:21 -04:00
Joseph Doherty 26bae36f8b feat(otopcua): wire OTel via AddZbTelemetry (shared Resource + std instrumentation) 2026-06-01 15:33:28 -04:00
Joseph Doherty 368390ea9d build(otopcua): reference ZB.MOM.WW.Telemetry packages from Gitea feed 2026-06-01 15:29:46 -04:00
Joseph Doherty 1d729fb0f8 feat: adopt shared ZB.MOM.WW.Health probes (preserve tiers + OtOpcUaCompat policy) 2026-06-01 13:36:28 -04:00
Joseph Doherty 0b99aceacb build: reference ZB.MOM.WW.Health packages from the Gitea feed 2026-06-01 13:30:13 -04:00
Joseph Doherty 61193629b6 fix(adminui): wire Test Connect probes + live panels on admin-only nodes
v2-ci / build (push) Failing after 36s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Both bugs surfaced only on split-role deployments (the MAIN cluster's
admin-only nodes), where the AdminUI runs without the driver role.

- Test Connect returned "No probe registered" for every driver: the
  IDriverProbe set was registered only under the driver role, but the
  admin-operations singleton that consumes it is pinned to admin. Extract
  AddOtOpcUaDriverProbes() (idempotent via TryAddEnumerable) and call it
  in the hasAdmin path too.

- Live driver-status/alerts/script-log panels showed "SignalR error:
  Connection refused": these Blazor Server components opened a HubConnection
  to their own hub via the browser's public URL, which server-side code
  can't reach behind Traefik (host :9200 -> container :9000). Read the
  in-process source directly instead -- DriverStatus via
  IDriverStatusSnapshotStore.SnapshotChanged, Alerts/ScriptLog via a new
  IInProcessBroadcaster<T>. Fleet status was unaffected (reads DB/ActorSystem).

Adds unit tests for probe registration, the snapshot-store event, and the
broadcaster.
2026-05-29 16:38:32 -04:00
Joseph Doherty c19d124e89 feat(drivers): TCP-connect IDriverProbe for all 9 driver types
Cheap-and-fast probe: open TCP socket to the configured endpoint,
close immediately. Surfaces SocketError on failure, latency on
success, "timed out" on caller cancel. Sufficient for the AdminUI
Test Connect "can we reach the host?" question. Richer protocol-
level probes (OPC UA session open, FOCAS handshake, gRPC ping)
are a documented follow-up. Each probe registered as
AddSingleton<IDriverProbe, X> in DriverFactoryBootstrap so they
flow through DI into AdminOperationsActor.

Historian.Wonderware returns a clean "TCP probe not applicable"
result because it communicates over a Windows named pipe, not TCP.
Also adds OpcUaClient + Historian.Wonderware.Client project
references to Host.csproj (both were missing from the driver
ItemGroup).
2026-05-28 10:53:42 -04:00
Joseph Doherty 64e3fbe035 docs: backfill XML documentation across 756 files
v2-ci / build (push) Failing after 1m43s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
2026-05-28 08:10:17 -04:00
Joseph Doherty f9fc7dd2e1 feat(host): wire UseWindowsService so sc.exe-installed service runs cleanly
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
v2-e2e / e2e (push) Failing after 37s
The v2 plan's blessed install path (scripts/install/Install-Services.ps1)
registers the host via `sc.exe create binPath=...OtOpcUa.Host.exe`, but the
binary never called `UseWindowsService`. Without it, the Service Control
Manager waits ~30s for the process to call SetServiceStatus(Running) and
then kills it — the install script's design was incomplete.

Two changes:

- Host.csproj: drop the `IsOSPlatform('Windows')` condition on the
  Microsoft.Extensions.Hosting.WindowsServices package reference so the
  package is always available. The runtime helper used by
  UseWindowsService gates on WindowsServiceHelpers.IsWindowsService()
  internally, so it's a no-op when running as a console app or under
  Linux/macOS — the binary stays cross-platform-buildable.

- Program.cs: call builder.Host.UseWindowsService(options =>
  options.ServiceName = "OtOpcUaHost") immediately after CreateBuilder.
  When the host is launched by SCM, WindowsServiceLifetime takes over
  the IHostLifetime slot and reports START/STOP correctly. When launched
  by `dotnet run` or `OtOpcUa.Host.exe` from a console, it's a no-op.

Verified end-to-end on wonder-app-vd03.zmr.zimmer.com: `sc.exe create`
followed by `sc.exe start OtOpcUaHost` transitions from START_PENDING to
RUNNING; /login + /health/ready + /health/active all return 200; service
survives SSH session close and auto-starts on boot per the AUTO_START
flag set by the installer script.
2026-05-26 17:07:52 -04:00
Joseph Doherty ed1c17bc7b fix(deploy,host): docker-dev bring-up — anon health probes, robust seeder
v2-ci / build (push) Failing after 32s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Two fixes surfaced while bringing up the docker-dev stack end-to-end:

- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
  /health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
  every probe and Traefik marks every backend unhealthy → all three cluster
  routes return 503.

- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
  sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
  IF NOT EXISTS BEGIN ... END blocks (procs must be first in their batch),
  so sqlcmd fails with "Must declare the scalar variable @FromGenerationId".
  EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
  the schema and applies row inserts. Migrations remain the operator's job:
      dotnet ef database update --project src/Core/.../Configuration \
                                --startup-project src/Server/.../Host

Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).

Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.
2026-05-26 14:37:01 -04:00
Joseph Doherty 2915755a7c fix(host,security): wire static assets, DI lifetimes, form login, dev-stub LDAP
Six interlocking fixes surfaced while smoke-testing the fused Host in a browser:

- Host/Program.cs: UseStaticWebAssets() opts into the RCL static-asset pipeline
  in any environment (auto-only in Development), MapStaticAssets().AllowAnonymous()
  exempts CSS/JS from the AddOtOpcUaAuth fallback policy, and
  AddCascadingAuthenticationState() lets <AuthorizeView/> work inside interactive
  components (NavSidebar's session block).
- Security/ServiceCollectionExtensions: ILdapAuthService Scoped → Singleton —
  consumed by the Singleton LdapOpcUaUserAuthenticator on driver-role nodes.
  Crash only surfaced in Development (ValidateOnBuild=true).
- Security/Endpoints/AuthEndpoints: /auth/login now dispatches on Content-Type —
  application/json keeps the original 204/401/503 contract for tests, and
  application/x-www-form-urlencoded (the browser <form>) gets a redirect dance.
  DisableAntiforgery on the login endpoint (it's the entry point, no prior session)
  and AllowAnonymous to override the fallback policy.
- Security/Ldap/LdapOptions + LdapAuthService: real DevStubMode property; when
  true the auth service bypasses the LDAP bind and returns a FleetAdmin role so
  dev/test can navigate the full Admin UI without GLAuth running.
- AdminUI/EndpointRouteBuilderExtensions: doc-comment update about static-asset
  flow (the actual MapStaticAssets call lives in Host/Program.cs).
2026-05-26 13:48:18 -04:00
Joseph Doherty 898a47746d feat(host): add per-role appsettings overlays for admin/driver/admin-driver 2026-05-26 11:19:10 -04:00
Joseph Doherty 05a0596fb1 feat(host): F9b RoslynScriptedAlarmEvaluator + #107 close engine DI
v2-ci / build (push) Failing after 39s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
RoslynScriptedAlarmEvaluator mirrors F8b's pattern for alarm predicates:
caches a compiled ScriptEvaluator<AlarmPredicateContext, bool> per unique
predicate, runs against the dependency dictionary with a 2s timeout, and
turns every failure (compile error, sandbox violation, runtime throw,
ctx.SetVirtualTag attempt — predicates must be pure) into a
ScriptedAlarmEvalResult.Failure. ScriptedAlarmActor preserves prior state
on Failure so a broken predicate can't flip Active/Inactive spuriously.

Program.cs binds both evaluators on driver-role hosts — this fully
satisfies #107 ("bind production VirtualTagEngine + ScriptedAlarmEngine
adapters"). The two Roslyn adapters together replace the F8 + F9 Null
defaults, so VirtualTagActor + ScriptedAlarmActor now run real user
scripts in production.

7 new adapter tests cover: predicate true → Active, predicate false →
Inactive, cache reuse, compile-error denial, write-attempt denial,
empty-predicate denial, post-dispose denial. Host.IntegrationTests now
17/17 green.

Closes #80 + #107. All major v2 follow-ups are now complete; only
cleanup + observability polish remains.
2026-05-26 10:58:04 -04:00
Joseph Doherty 219d10a22d feat(host): F8b RoslynVirtualTagEvaluator — production virtual-tag eval
RoslynVirtualTagEvaluator wraps Core.Scripting.ScriptEvaluator + Core
.VirtualTags.VirtualTagContext into a single-tag IVirtualTagEvaluator
adapter. Caches the compiled ScriptEvaluator per unique expression so
the second-and-onwards Evaluate is an in-process method call against the
dependency dictionary. Compile/sandbox/runtime errors all surface as
VirtualTagEvalResult.Failure rather than propagating exceptions through
the VirtualTagActor message loop.

Single-tag scope: cross-tag ctx.SetVirtualTag writes are dropped + logged
because fan-out between actors is owned by DependencyMuxActor. Cycle
detection + cascade ordering stay in Core.VirtualTags.VirtualTagEngine
where they belong (loaded fleet-wide); this adapter keeps the actor
message handler simple.

Host adds Core.Scripting + Core.VirtualTags project refs, plus a
TargetWarningsAsErrors NU1608 suppression — Microsoft.CodeAnalysis.CSharp
.Scripting 4.12.0 pins Common to 4.12.0 but ASP.NET Core transitively
brings Microsoft.CodeAnalysis.Common 5.0.0; the surface we use is stable
across the drift (verified by Core.Scripting.Tests).

Program.cs binds RoslynVirtualTagEvaluator → IVirtualTagEvaluator on
driver-role hosts, replacing the F8-default NullVirtualTagEvaluator so
VirtualTagActor evaluates real user scripts at runtime.

6 new adapter tests cover: simple expression sums, cache reuse across
calls, compile-error denial, runtime-throw denial, empty-expression
denial, post-dispose denial. Host.IntegrationTests now 10/10 green.

Closes #79. F9b + #107 next.
2026-05-26 10:55:56 -04:00
Joseph Doherty 2697af31d1 feat(opcua,host): #81 ServiceLevel SDK publisher
SdkServiceLevelPublisher writes Server.ServiceLevel through the SDK's
ServerObjectState — the standard OPC UA non-transparent-redundancy signal
clients use to pick a primary. Writes are guarded by DiagnosticsLock so
concurrent SDK diagnostics scans don't fight with our updates.

DeferredServiceLevelPublisher mirrors the DeferredAddressSpaceSink late-
binding pattern: Akka actors resolve IServiceLevelPublisher at construction,
hosted service swaps the SDK publisher in after StandardServer.Start. Host
Program.cs registers DeferredServiceLevelPublisher as the singleton bound
to IServiceLevelPublisher; OtOpcUaServerHostedService gets it injected and
fills it once IServerInternal is available.

Tests boot a real StandardServer on a free port (cross-platform), call
Publish, then verify ServerObject.ServiceLevel.Value reflects the write.
5 new tests; OpcUaServer suite now 45/45 green (was 40, +5).

Closes #81 residual. Unblocks Task 60 (OPC UA dual-endpoint + ServiceLevel
tests).
2026-05-26 10:37:42 -04:00
Joseph Doherty 52997ee164 feat(observability): F13d Prometheus + OpenTelemetry instrumentation
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter
+ ActivitySource so all instrumentation points emit through a single
named surface. Counters cover the hot paths:

  otopcua.deploy.applied               (outcome=ack|reject)
  otopcua.deploy.apply.duration        (s, histogram)
  otopcua.driver.lifecycle             (event=spawn|spawn_stub|stop|fault)
  otopcua.virtualtag.eval              (outcome=ok|fail|skip)
  otopcua.scriptedalarm.transition     (state=activated|acknowledged|cleared)
  otopcua.opcua.sink.write             (kind=value|alarm|rebuild)
  otopcua.redundancy.service_level_change (level=byte)

Plus two ActivitySource spans:

  otopcua.deploy.apply                 wraps DriverHostActor.ApplyAndAck
  otopcua.opcua.address_space_rebuild  wraps OpcUaPublishActor.HandleRebuild

Instruments are no-op until a listener attaches, so tests + dev hosts
pay nothing for unread telemetry.

Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter
+ ActivitySource to OpenTelemetry, attaches a Prometheus exporter) and
MapOtOpcUaMetrics() (mounts /metrics scrape endpoint). Driver-side
internals + ASP.NET request metrics deliberately stay off — the scrape
payload is scoped to OtOpcUa signals only.

Tests use MeterListener + ActivityListener to verify
VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and
RebuildAddressSpace actually emit on the central instruments. Runtime
suite is 72 / 72 green (+3).

Closes #105. Path A (F13b/c/d) complete; next batch options: #85 UNS
folder hierarchy in SDK, or F8b/F9b production engine bindings.
2026-05-26 10:29:40 -04:00
Joseph Doherty 21eac21409 feat(opcua,host): F13c LDAP-bound UserName validator
Adds IOpcUaUserAuthenticator seam in OpcUaServer.Security with a deny-all
NullOpcUaUserAuthenticator default. OpcUaApplicationHost subscribes to
SessionManager.ImpersonateUser after _application.Start so UserName tokens
flow through the authenticator and either attach a UserIdentity to the
session (Allow) or set IdentityValidationError = BadIdentityTokenRejected
(Deny / authenticator exception). Anonymous + X509 tokens fall through to
SDK defaults.

LdapOpcUaUserAuthenticator (Host project) bridges to the same
ILdapAuthService that AddOtOpcUaAuth uses for Admin cookies / JWT, so a
single LDAP source-of-truth governs both Admin control plane and OPC UA
data plane. Program.cs registers LdapOptions + LdapAuthService +
IOpcUaUserAuthenticator on driver-role hosts; admin-only nodes are
unchanged.

OtOpcUaServerHostedService threads the resolved authenticator into
OpcUaApplicationHost so the seam respects Host DI.

10 new tests: 6 in OpcUaServer.Tests cover the pure HandleImpersonation
static method (success / denial / anonymous fallthrough / authenticator-
throw / null-username / Null authenticator); 4 in Host.IntegrationTests
cover the LdapOpcUaUserAuthenticator adapter (LDAP allow → Allow with
roles, LDAP deny → Deny, exception → backend-error denial, display-name
fallback). OpcUaServer suite is 40 / 40 green.

Closes #104. Unblocks Task 60 (dual-endpoint + ServiceLevel tests) once
#81 residual lands.
2026-05-26 10:21:37 -04:00
Joseph Doherty 50787823d3 feat(host,runtime): #108 Host DI bindings — OPC UA server + deferred sink
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Wires the OPC UA SDK into the fused Host's lifecycle on driver-role
nodes + spawns OpcUaPublishActor with the proper sink/publisher/dbFactory/
applier resolution. The full read+write data path is now live in
production: Deploy → DriverHost → OpcUaPublish → SDK NodeManager →
subscribed OPC UA clients.

DeferredAddressSpaceSink (Commons.OpcUa):
  - Thread-safe wrapper IOpcUaAddressSpaceSink that delegates to an
    inner sink swapped in at runtime. Needed because Akka actors
    resolve the sink at construction time, but the production sink
    (SdkAddressSpaceSink wrapping OtOpcUaNodeManager) only exists
    after the SDK StandardServer has started.
  - Defaults to NullOpcUaAddressSpaceSink so calls before swap are
    safe; SetSink(null) reverts (for graceful shutdown).

OtOpcUaServerHostedService (Host.OpcUa):
  - IHostedService that owns the OPC UA SDK lifecycle. Reads
    OpcUaApplicationHostOptions from the 'OpcUa' config section,
    creates an OtOpcUaSdkServer, boots it through OpcUaApplicationHost,
    then swaps a real SdkAddressSpaceSink into the DeferredAddressSpaceSink
    singleton.
  - SDK boot failure is logged + non-fatal — the rest of the host
    (admin UI, driver actors) keeps running. Stop reverts to null sink.

WithOtOpcUaRuntimeActors (Runtime):
  - Now spawns OpcUaPublishActor (new actor) + threads its ActorRef
    into DriverHostActor's Props so successful applies trigger the
    address-space rebuild pipeline.
  - Phase7Applier is constructed here from the resolved sink + a
    logger; OpcUaPublishActor takes both.
  - Prepends the opcua-synchronized-dispatcher HOCON so the extension
    is self-contained — consumers (Host, tests) don't need to redeclare
    the dispatcher block.
  - New OpcUaPublishActorKey + OpcUaPublishActorName for actor-registry
    resolution.
  - AddOtOpcUaRuntime now also TryAddSingleton's NullOpcUaAddressSpaceSink
    + NullServiceLevelPublisher so admin-only nodes (or tests that
    don't bind the Deferred sink) stay safe.

Host.Program.cs (driver-role only):
  - Binds DeferredAddressSpaceSink as singleton + as IOpcUaAddressSpaceSink
  - AddHostedService<OtOpcUaServerHostedService>()

Tests: OpcUaServer 24 -> 28 (+4 DeferredAddressSpaceSink unit tests),
Runtime 69 -> 69 (existing ServiceCollectionExtensionsTests extended
to verify the new mux + publish actor registration).

All 6 v2 test suites green: 177 tests passing.

Closes #108. Engine-wiring is now production-bound end-to-end on
driver-role nodes — Deploy reaches real OPC UA Variable nodes that
subscribed clients see.
2026-05-26 10:02:15 -04:00
Joseph Doherty 3e3f7588bd feat(runtime,host): close F7 — driver subscribe + write paths + Host DI
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Three pieces landed in one batch, closing F7-residual + Host DI #106:

Runtime/DriverInstanceActor:
  - Subscribe / Unsubscribe message contracts; the Connected state
    handles them via IDriver.ISubscribable. On every OnDataChange
    event the actor publishes AttributeValuePublished to its parent
    (DriverHostActor → OpcUaPublishActor). OPC UA StatusCode is
    mapped to the 3-state OpcUaQuality enum via severity bits
    (00=Good, 01=Uncertain, 10/11=Bad).
  - DetachSubscription tears the handler off the driver on
    DisconnectObserved, Unsubscribe, and PostStop so a stale handler
    never pushes to a dead actor.
  - WriteAttribute now dispatches IWritable.WriteAsync (batch of one)
    with a 5s CancellationTokenSource; status-code propagated to
    WriteAttributeResult on non-Good results.

Host:
  - New ProjectReferences to Core + every cross-platform driver
    assembly (AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT).
    Galaxy is net10 (gRPC client to mxaccessgw); the COM-bound net48
    Wonderware Historian driver stays out of the Host's reference
    closure — its .Client gRPC wrapper is what binds for historian
    needs.
  - New DriverFactoryBootstrap.AddOtOpcUaDriverFactories() registers
    a singleton DriverFactoryRegistry, invokes each driver's
    Register(registry, loggerFactory), and binds IDriverFactory to
    DriverFactoryRegistryAdapter. Replaces the F7 NullDriverFactory
    default so deploys actually materialise real IDriver instances
    on driver-role nodes. ShouldStub() still gates per-platform
    behaviour at spawn time.
  - Program.cs wires AddOtOpcUaDriverFactories() before AddAkka so
    the runtime extension can resolve IDriverFactory from DI.

Tests: Runtime 46 -> 52 (+6):
- Write returns success when StatusCode = Good
- Write propagates non-Good status code in failure Reason
- Subscribe forwards OnDataChange to parent as AttributeValuePublished
- Quality translation: Uncertain (0x40...) and Bad (0x80...)
- Subscribe against non-ISubscribable returns failure
- DisconnectObserved detaches handler so late events are dropped

All 6 v2 test suites green: 152 tests passing.

Closes F7. F7-residual sub-tasks #110 (subscribe) and #111 (write)
both shipped. Host DI binding #106 shipped.
2026-05-26 09:28:34 -04:00
Joseph Doherty 850d6774ea feat(adminui): F15 Phase A — shell + auth + fleet + hosts pages
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Implements Phase A of the F15 rebuild plan: minimum-viable Admin surface
with a working sign-in path and a fleet-state landing page. Decisions Q1–Q5
of docs/v2/AdminUI-rebuild-plan.md were taken as recommended.

- App.razor (moved into AdminUI library from the Host stub; vendored
  Bootstrap from RCL wwwroot — no public CDN, air-gap safe)
- Routes.razor (AuthorizeRouteView enforces page-level [Authorize])
- RedirectToLogin.razor (preserves returnUrl through the auth hop)
- Login.razor (static SSR, posts to /auth/login; Q5 wording about
  generic-vs-specific LDAP errors)
- Account.razor (identity + fleet roles + raw LDAP groups; Q4 — no
  per-cluster grants; fleet-wide LDAP-group → role mapping only)
- Fleet.razor (per-node deployment status: reads NodeDeploymentState
  + unions with IClusterRoleInfo.MembersWithRole("driver") so freshly-
  joined nodes appear as "waiting"; 10s auto-refresh)
- Hosts.razor (Akka cluster topology: members, status, roles, role-
  leader; 5s auto-refresh)

Host's stub App.razor deleted; Program.cs now points at
AdminUI.Components.App via an added using.

All 104 v2 tests remain green.
2026-05-26 07:49:35 -04:00
Joseph Doherty 686138123f feat(runtime): F11 — HistorianAdapterActor wired to IAlarmHistorianSink
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
Reshapes the placeholder buffered-counter actor into a thin fire-and-forget
bridge over the existing IAlarmHistorianSink contract. Default sink is
NullAlarmHistorianSink; production deployments override the DI binding to
SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the v1
components in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware*
are reused verbatim — actor is just a mailbox-friendly entry point).

- HistorianAdapterActor.Props(IAlarmHistorianSink?) — null defaults to NullAlarmHistorianSink
- Receive<AlarmHistorianEvent>: fire-and-forget sink.EnqueueAsync
- Receive<GetStatus>: returns sink.GetStatus() (queue depth + drain state)
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers the default sink
- WithOtOpcUaRuntimeActors spawns the actor + registers HistorianAdapterActorKey
- Program.cs calls AddOtOpcUaRuntime when hasDriver

Tests: 2 new (forward-to-sink + GetStatus). Runtime suite 17 → 18.
2026-05-26 07:18:08 -04:00
Joseph Doherty f18c285cca feat(adminui): FleetStatusSignalRBridge — DPS → SignalR forwarding (F16)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
v2-ci / build (push) Has been cancelled
New per-admin-node actor that subscribes to the fleet-status DistributedPubSub
topic + forwards every FleetStatusChanged snapshot to all SignalR clients
connected to FleetStatusHub via IHubContext.

Wired via WithOtOpcUaSignalRBridges (new AkkaConfigurationBuilder extension in
AdminUI.Hubs) — Program.cs calls it inside the if(hasAdmin) block alongside
WithOtOpcUaControlPlaneSingletons.

Per-node subscription rather than cluster-singleton: every admin node forwards
its own snapshots to its own connected clients. Simpler than singleton
coordination + acceptable because the messages are small and SignalR fan-out
is per-node anyway.
2026-05-26 07:01:08 -04:00