A thin gateway over the admin-operations cluster singleton so CI/scripts can trigger a
deployment without the Blazor button. Forwards to the same IAdminOperationsClient.
StartDeploymentAsync; mounted on admin-role nodes. Auth is a fixed-time X-Api-Key check
against Security:DeployApiKey (orthogonal to the cookie-only web auth); AllowAnonymous so the
auth fallback doesn't 401 it, self-disabling (503) until the key is set. Outcome->status:
202/200/409/422. Unit tests for the key check + outcome mapping; HTTP E2E (real auth + real
deploy via the 2-node harness). Documented in docs/security.md.
The per-role overlay (appsettings.{role}.json) was appended after WebApplicationBuilder's
default sources, so it outranked environment variables — a baked role file could not be
overridden by a deployment env var. In otopcua-dev this meant appsettings.admin.json's
Security:Ldap:DevStubMode=false beat the compose's DevStub override, so every AdminUI login
attempted a real LDAP bind against a non-existent server and failed with 'Unexpected
authentication error'.
- Program.cs: re-append AddEnvironmentVariables() + AddCommandLine(args) after the role
overlay so deployment overrides keep top precedence (overlay still beats base appsettings).
- docker-dev/docker-compose.yml: the DevStub env var targeted the stale 'Authentication:Ldap'
section; the code reads 'Security:Ldap'. Corrected the prefix on every host node (+ comment).
Dev AdminUI login now signs in as Administrator via the DevStub bypass.
Both bugs surfaced only on split-role deployments (the MAIN cluster's
admin-only nodes), where the AdminUI runs without the driver role.
- Test Connect returned "No probe registered" for every driver: the
IDriverProbe set was registered only under the driver role, but the
admin-operations singleton that consumes it is pinned to admin. Extract
AddOtOpcUaDriverProbes() (idempotent via TryAddEnumerable) and call it
in the hasAdmin path too.
- Live driver-status/alerts/script-log panels showed "SignalR error:
Connection refused": these Blazor Server components opened a HubConnection
to their own hub via the browser's public URL, which server-side code
can't reach behind Traefik (host :9200 -> container :9000). Read the
in-process source directly instead -- DriverStatus via
IDriverStatusSnapshotStore.SnapshotChanged, Alerts/ScriptLog via a new
IInProcessBroadcaster<T>. Fleet status was unaffected (reads DB/ActorSystem).
Adds unit tests for probe registration, the snapshot-store event, and the
broadcaster.
The v2 plan's blessed install path (scripts/install/Install-Services.ps1)
registers the host via `sc.exe create binPath=...OtOpcUa.Host.exe`, but the
binary never called `UseWindowsService`. Without it, the Service Control
Manager waits ~30s for the process to call SetServiceStatus(Running) and
then kills it — the install script's design was incomplete.
Two changes:
- Host.csproj: drop the `IsOSPlatform('Windows')` condition on the
Microsoft.Extensions.Hosting.WindowsServices package reference so the
package is always available. The runtime helper used by
UseWindowsService gates on WindowsServiceHelpers.IsWindowsService()
internally, so it's a no-op when running as a console app or under
Linux/macOS — the binary stays cross-platform-buildable.
- Program.cs: call builder.Host.UseWindowsService(options =>
options.ServiceName = "OtOpcUaHost") immediately after CreateBuilder.
When the host is launched by SCM, WindowsServiceLifetime takes over
the IHostLifetime slot and reports START/STOP correctly. When launched
by `dotnet run` or `OtOpcUa.Host.exe` from a console, it's a no-op.
Verified end-to-end on wonder-app-vd03.zmr.zimmer.com: `sc.exe create`
followed by `sc.exe start OtOpcUaHost` transitions from START_PENDING to
RUNNING; /login + /health/ready + /health/active all return 200; service
survives SSH session close and auto-starts on boot per the AUTO_START
flag set by the installer script.
Six interlocking fixes surfaced while smoke-testing the fused Host in a browser:
- Host/Program.cs: UseStaticWebAssets() opts into the RCL static-asset pipeline
in any environment (auto-only in Development), MapStaticAssets().AllowAnonymous()
exempts CSS/JS from the AddOtOpcUaAuth fallback policy, and
AddCascadingAuthenticationState() lets <AuthorizeView/> work inside interactive
components (NavSidebar's session block).
- Security/ServiceCollectionExtensions: ILdapAuthService Scoped → Singleton —
consumed by the Singleton LdapOpcUaUserAuthenticator on driver-role nodes.
Crash only surfaced in Development (ValidateOnBuild=true).
- Security/Endpoints/AuthEndpoints: /auth/login now dispatches on Content-Type —
application/json keeps the original 204/401/503 contract for tests, and
application/x-www-form-urlencoded (the browser <form>) gets a redirect dance.
DisableAntiforgery on the login endpoint (it's the entry point, no prior session)
and AllowAnonymous to override the fallback policy.
- Security/Ldap/LdapOptions + LdapAuthService: real DevStubMode property; when
true the auth service bypasses the LDAP bind and returns a FleetAdmin role so
dev/test can navigate the full Admin UI without GLAuth running.
- AdminUI/EndpointRouteBuilderExtensions: doc-comment update about static-asset
flow (the actual MapStaticAssets call lives in Host/Program.cs).
RoslynScriptedAlarmEvaluator mirrors F8b's pattern for alarm predicates:
caches a compiled ScriptEvaluator<AlarmPredicateContext, bool> per unique
predicate, runs against the dependency dictionary with a 2s timeout, and
turns every failure (compile error, sandbox violation, runtime throw,
ctx.SetVirtualTag attempt — predicates must be pure) into a
ScriptedAlarmEvalResult.Failure. ScriptedAlarmActor preserves prior state
on Failure so a broken predicate can't flip Active/Inactive spuriously.
Program.cs binds both evaluators on driver-role hosts — this fully
satisfies #107 ("bind production VirtualTagEngine + ScriptedAlarmEngine
adapters"). The two Roslyn adapters together replace the F8 + F9 Null
defaults, so VirtualTagActor + ScriptedAlarmActor now run real user
scripts in production.
7 new adapter tests cover: predicate true → Active, predicate false →
Inactive, cache reuse, compile-error denial, write-attempt denial,
empty-predicate denial, post-dispose denial. Host.IntegrationTests now
17/17 green.
Closes#80 + #107. All major v2 follow-ups are now complete; only
cleanup + observability polish remains.
RoslynVirtualTagEvaluator wraps Core.Scripting.ScriptEvaluator + Core
.VirtualTags.VirtualTagContext into a single-tag IVirtualTagEvaluator
adapter. Caches the compiled ScriptEvaluator per unique expression so
the second-and-onwards Evaluate is an in-process method call against the
dependency dictionary. Compile/sandbox/runtime errors all surface as
VirtualTagEvalResult.Failure rather than propagating exceptions through
the VirtualTagActor message loop.
Single-tag scope: cross-tag ctx.SetVirtualTag writes are dropped + logged
because fan-out between actors is owned by DependencyMuxActor. Cycle
detection + cascade ordering stay in Core.VirtualTags.VirtualTagEngine
where they belong (loaded fleet-wide); this adapter keeps the actor
message handler simple.
Host adds Core.Scripting + Core.VirtualTags project refs, plus a
TargetWarningsAsErrors NU1608 suppression — Microsoft.CodeAnalysis.CSharp
.Scripting 4.12.0 pins Common to 4.12.0 but ASP.NET Core transitively
brings Microsoft.CodeAnalysis.Common 5.0.0; the surface we use is stable
across the drift (verified by Core.Scripting.Tests).
Program.cs binds RoslynVirtualTagEvaluator → IVirtualTagEvaluator on
driver-role hosts, replacing the F8-default NullVirtualTagEvaluator so
VirtualTagActor evaluates real user scripts at runtime.
6 new adapter tests cover: simple expression sums, cache reuse across
calls, compile-error denial, runtime-throw denial, empty-expression
denial, post-dispose denial. Host.IntegrationTests now 10/10 green.
Closes#79. F9b + #107 next.
SdkServiceLevelPublisher writes Server.ServiceLevel through the SDK's
ServerObjectState — the standard OPC UA non-transparent-redundancy signal
clients use to pick a primary. Writes are guarded by DiagnosticsLock so
concurrent SDK diagnostics scans don't fight with our updates.
DeferredServiceLevelPublisher mirrors the DeferredAddressSpaceSink late-
binding pattern: Akka actors resolve IServiceLevelPublisher at construction,
hosted service swaps the SDK publisher in after StandardServer.Start. Host
Program.cs registers DeferredServiceLevelPublisher as the singleton bound
to IServiceLevelPublisher; OtOpcUaServerHostedService gets it injected and
fills it once IServerInternal is available.
Tests boot a real StandardServer on a free port (cross-platform), call
Publish, then verify ServerObject.ServiceLevel.Value reflects the write.
5 new tests; OpcUaServer suite now 45/45 green (was 40, +5).
Closes#81 residual. Unblocks Task 60 (OPC UA dual-endpoint + ServiceLevel
tests).
OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter
+ ActivitySource so all instrumentation points emit through a single
named surface. Counters cover the hot paths:
otopcua.deploy.applied (outcome=ack|reject)
otopcua.deploy.apply.duration (s, histogram)
otopcua.driver.lifecycle (event=spawn|spawn_stub|stop|fault)
otopcua.virtualtag.eval (outcome=ok|fail|skip)
otopcua.scriptedalarm.transition (state=activated|acknowledged|cleared)
otopcua.opcua.sink.write (kind=value|alarm|rebuild)
otopcua.redundancy.service_level_change (level=byte)
Plus two ActivitySource spans:
otopcua.deploy.apply wraps DriverHostActor.ApplyAndAck
otopcua.opcua.address_space_rebuild wraps OpcUaPublishActor.HandleRebuild
Instruments are no-op until a listener attaches, so tests + dev hosts
pay nothing for unread telemetry.
Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter
+ ActivitySource to OpenTelemetry, attaches a Prometheus exporter) and
MapOtOpcUaMetrics() (mounts /metrics scrape endpoint). Driver-side
internals + ASP.NET request metrics deliberately stay off — the scrape
payload is scoped to OtOpcUa signals only.
Tests use MeterListener + ActivityListener to verify
VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and
RebuildAddressSpace actually emit on the central instruments. Runtime
suite is 72 / 72 green (+3).
Closes#105. Path A (F13b/c/d) complete; next batch options: #85 UNS
folder hierarchy in SDK, or F8b/F9b production engine bindings.
Adds IOpcUaUserAuthenticator seam in OpcUaServer.Security with a deny-all
NullOpcUaUserAuthenticator default. OpcUaApplicationHost subscribes to
SessionManager.ImpersonateUser after _application.Start so UserName tokens
flow through the authenticator and either attach a UserIdentity to the
session (Allow) or set IdentityValidationError = BadIdentityTokenRejected
(Deny / authenticator exception). Anonymous + X509 tokens fall through to
SDK defaults.
LdapOpcUaUserAuthenticator (Host project) bridges to the same
ILdapAuthService that AddOtOpcUaAuth uses for Admin cookies / JWT, so a
single LDAP source-of-truth governs both Admin control plane and OPC UA
data plane. Program.cs registers LdapOptions + LdapAuthService +
IOpcUaUserAuthenticator on driver-role hosts; admin-only nodes are
unchanged.
OtOpcUaServerHostedService threads the resolved authenticator into
OpcUaApplicationHost so the seam respects Host DI.
10 new tests: 6 in OpcUaServer.Tests cover the pure HandleImpersonation
static method (success / denial / anonymous fallthrough / authenticator-
throw / null-username / Null authenticator); 4 in Host.IntegrationTests
cover the LdapOpcUaUserAuthenticator adapter (LDAP allow → Allow with
roles, LDAP deny → Deny, exception → backend-error denial, display-name
fallback). OpcUaServer suite is 40 / 40 green.
Closes#104. Unblocks Task 60 (dual-endpoint + ServiceLevel tests) once
#81 residual lands.
Wires the OPC UA SDK into the fused Host's lifecycle on driver-role
nodes + spawns OpcUaPublishActor with the proper sink/publisher/dbFactory/
applier resolution. The full read+write data path is now live in
production: Deploy → DriverHost → OpcUaPublish → SDK NodeManager →
subscribed OPC UA clients.
DeferredAddressSpaceSink (Commons.OpcUa):
- Thread-safe wrapper IOpcUaAddressSpaceSink that delegates to an
inner sink swapped in at runtime. Needed because Akka actors
resolve the sink at construction time, but the production sink
(SdkAddressSpaceSink wrapping OtOpcUaNodeManager) only exists
after the SDK StandardServer has started.
- Defaults to NullOpcUaAddressSpaceSink so calls before swap are
safe; SetSink(null) reverts (for graceful shutdown).
OtOpcUaServerHostedService (Host.OpcUa):
- IHostedService that owns the OPC UA SDK lifecycle. Reads
OpcUaApplicationHostOptions from the 'OpcUa' config section,
creates an OtOpcUaSdkServer, boots it through OpcUaApplicationHost,
then swaps a real SdkAddressSpaceSink into the DeferredAddressSpaceSink
singleton.
- SDK boot failure is logged + non-fatal — the rest of the host
(admin UI, driver actors) keeps running. Stop reverts to null sink.
WithOtOpcUaRuntimeActors (Runtime):
- Now spawns OpcUaPublishActor (new actor) + threads its ActorRef
into DriverHostActor's Props so successful applies trigger the
address-space rebuild pipeline.
- Phase7Applier is constructed here from the resolved sink + a
logger; OpcUaPublishActor takes both.
- Prepends the opcua-synchronized-dispatcher HOCON so the extension
is self-contained — consumers (Host, tests) don't need to redeclare
the dispatcher block.
- New OpcUaPublishActorKey + OpcUaPublishActorName for actor-registry
resolution.
- AddOtOpcUaRuntime now also TryAddSingleton's NullOpcUaAddressSpaceSink
+ NullServiceLevelPublisher so admin-only nodes (or tests that
don't bind the Deferred sink) stay safe.
Host.Program.cs (driver-role only):
- Binds DeferredAddressSpaceSink as singleton + as IOpcUaAddressSpaceSink
- AddHostedService<OtOpcUaServerHostedService>()
Tests: OpcUaServer 24 -> 28 (+4 DeferredAddressSpaceSink unit tests),
Runtime 69 -> 69 (existing ServiceCollectionExtensionsTests extended
to verify the new mux + publish actor registration).
All 6 v2 test suites green: 177 tests passing.
Closes#108. Engine-wiring is now production-bound end-to-end on
driver-role nodes — Deploy reaches real OPC UA Variable nodes that
subscribed clients see.
Three pieces landed in one batch, closing F7-residual + Host DI #106:
Runtime/DriverInstanceActor:
- Subscribe / Unsubscribe message contracts; the Connected state
handles them via IDriver.ISubscribable. On every OnDataChange
event the actor publishes AttributeValuePublished to its parent
(DriverHostActor → OpcUaPublishActor). OPC UA StatusCode is
mapped to the 3-state OpcUaQuality enum via severity bits
(00=Good, 01=Uncertain, 10/11=Bad).
- DetachSubscription tears the handler off the driver on
DisconnectObserved, Unsubscribe, and PostStop so a stale handler
never pushes to a dead actor.
- WriteAttribute now dispatches IWritable.WriteAsync (batch of one)
with a 5s CancellationTokenSource; status-code propagated to
WriteAttributeResult on non-Good results.
Host:
- New ProjectReferences to Core + every cross-platform driver
assembly (AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT).
Galaxy is net10 (gRPC client to mxaccessgw); the COM-bound net48
Wonderware Historian driver stays out of the Host's reference
closure — its .Client gRPC wrapper is what binds for historian
needs.
- New DriverFactoryBootstrap.AddOtOpcUaDriverFactories() registers
a singleton DriverFactoryRegistry, invokes each driver's
Register(registry, loggerFactory), and binds IDriverFactory to
DriverFactoryRegistryAdapter. Replaces the F7 NullDriverFactory
default so deploys actually materialise real IDriver instances
on driver-role nodes. ShouldStub() still gates per-platform
behaviour at spawn time.
- Program.cs wires AddOtOpcUaDriverFactories() before AddAkka so
the runtime extension can resolve IDriverFactory from DI.
Tests: Runtime 46 -> 52 (+6):
- Write returns success when StatusCode = Good
- Write propagates non-Good status code in failure Reason
- Subscribe forwards OnDataChange to parent as AttributeValuePublished
- Quality translation: Uncertain (0x40...) and Bad (0x80...)
- Subscribe against non-ISubscribable returns failure
- DisconnectObserved detaches handler so late events are dropped
All 6 v2 test suites green: 152 tests passing.
Closes F7. F7-residual sub-tasks #110 (subscribe) and #111 (write)
both shipped. Host DI binding #106 shipped.
Implements Phase A of the F15 rebuild plan: minimum-viable Admin surface
with a working sign-in path and a fleet-state landing page. Decisions Q1–Q5
of docs/v2/AdminUI-rebuild-plan.md were taken as recommended.
- App.razor (moved into AdminUI library from the Host stub; vendored
Bootstrap from RCL wwwroot — no public CDN, air-gap safe)
- Routes.razor (AuthorizeRouteView enforces page-level [Authorize])
- RedirectToLogin.razor (preserves returnUrl through the auth hop)
- Login.razor (static SSR, posts to /auth/login; Q5 wording about
generic-vs-specific LDAP errors)
- Account.razor (identity + fleet roles + raw LDAP groups; Q4 — no
per-cluster grants; fleet-wide LDAP-group → role mapping only)
- Fleet.razor (per-node deployment status: reads NodeDeploymentState
+ unions with IClusterRoleInfo.MembersWithRole("driver") so freshly-
joined nodes appear as "waiting"; 10s auto-refresh)
- Hosts.razor (Akka cluster topology: members, status, roles, role-
leader; 5s auto-refresh)
Host's stub App.razor deleted; Program.cs now points at
AdminUI.Components.App via an added using.
All 104 v2 tests remain green.
Reshapes the placeholder buffered-counter actor into a thin fire-and-forget
bridge over the existing IAlarmHistorianSink contract. Default sink is
NullAlarmHistorianSink; production deployments override the DI binding to
SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the v1
components in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware*
are reused verbatim — actor is just a mailbox-friendly entry point).
- HistorianAdapterActor.Props(IAlarmHistorianSink?) — null defaults to NullAlarmHistorianSink
- Receive<AlarmHistorianEvent>: fire-and-forget sink.EnqueueAsync
- Receive<GetStatus>: returns sink.GetStatus() (queue depth + drain state)
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers the default sink
- WithOtOpcUaRuntimeActors spawns the actor + registers HistorianAdapterActorKey
- Program.cs calls AddOtOpcUaRuntime when hasDriver
Tests: 2 new (forward-to-sink + GetStatus). Runtime suite 17 → 18.
New per-admin-node actor that subscribes to the fleet-status DistributedPubSub
topic + forwards every FleetStatusChanged snapshot to all SignalR clients
connected to FleetStatusHub via IHubContext.
Wired via WithOtOpcUaSignalRBridges (new AkkaConfigurationBuilder extension in
AdminUI.Hubs) — Program.cs calls it inside the if(hasAdmin) block alongside
WithOtOpcUaControlPlaneSingletons.
Per-node subscription rather than cluster-singleton: every admin node forwards
its own snapshots to its own connected clients. Simpler than singleton
coordination + acceptable because the messages are small and SignalR fan-out
is per-node anyway.
Builds TwoNodeClusterHarness: two in-process Host-equivalent nodes sharing
an in-memory ConfigDb. Forms a 2-member Akka cluster. ClusterFormationTests
proves both nodes see each other as admin+driver role members.
Fixes a real production bug uncovered while wiring the harness — Program.cs
ran two separate ActorSystems (one from AddOtOpcUaCluster.AkkaHostedService
with cluster HOCON, one from Akka.Hosting.AddAkka with bare HOCON). Cluster
singletons landed on the bare ActorSystem and could not actually form a
cluster ("Configuration does not contain `akka.cluster` node").
Consolidation:
- AddOtOpcUaCluster now only binds AkkaClusterOptions + registers IClusterRoleInfo
- New WithOtOpcUaClusterBootstrap pushes embedded HOCON + Remote/Cluster options
into Akka.Hosting's AkkaConfigurationBuilder
- AkkaHostedService.cs deleted — Akka.Hosting now owns the lifecycle
- Program.cs + harness call WithOtOpcUaClusterBootstrap inside AddAkka
Why not WebApplicationFactory<Program>? Program.cs reads OTOPCUA_ROLES from
process env (shared across in-process WAFs); the harness replays Program.cs's
DI graph from a clean WebApplicationBuilder per node with per-node config
overrides. Same production extensions, isolated config + Kestrel + Akka ports.
Tests: 93 v2 tests pass (was 91 + 2 new cluster formation), 0 skipped.
Mirrors WithOtOpcUaControlPlaneSingletons for the driver role. Spawns
DriverHostActor + DbHealthProbeActor on the host's ActorSystem and
registers both under marker keys. Host's Program.cs now calls it when
the node carries the driver role, so driver-only and admin+driver
deployments both auto-bootstrap the per-node actors.
Integration test covers the registration round-trip via Microsoft.Extensions.Hosting
+ Akka.Hosting AddAkka.
Adds the empty project skeletons that subsequent v2 tasks fill in:
src/Core/ZB.MOM.WW.OtOpcUa.Commons (types, interfaces, message contracts)
src/Core/ZB.MOM.WW.OtOpcUa.Cluster (Akka.Hosting + cluster wiring)
src/Server/ZB.MOM.WW.OtOpcUa.Security (cookie+JWT auth, LDAP)
src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane (admin-role cluster singletons)
src/Server/ZB.MOM.WW.OtOpcUa.Runtime (per-node driver actors)
src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer (OPC UA SDK application host)
src/Server/ZB.MOM.WW.OtOpcUa.AdminUI (Razor class library)
src/Server/ZB.MOM.WW.OtOpcUa.Host (single fused web binary)
Each project sets TreatWarningsAsErrors=true in its own csproj (per the
Directory.Build.props deviation note in the previous commit). NuGetAuditSuppress
entries cover transitive vulnerability advisories the new strictness surfaces:
- GHSA-g94r-2vxg-569j (OpenTelemetry.Api 1.9.0 via Akka.Cluster.Hosting/Tools)
- GHSA-h958-fxgg-g7w3 (Opc.Ua.Core 1.5.374.126 via OpcUaServer)
- GHSA-37gx-xxp4-5rgx + GHSA-w3x6-4m5h-cxqf (legacy advisories already accepted)
OpcUaServer pins OPCFoundation.NetStandard.Opc.Ua.Configuration to 1.5.374.126
via VersionOverride to match Opc.Ua.Server's transitive Opc.Ua.Core (same
constraint as the legacy Server project).
Runtime does NOT project-reference any concrete Driver.* assemblies; drivers
load reflectively at runtime (Phase 6). Runtime gets the IDriver contract
through Core.Abstractions instead.
Host's Microsoft.Extensions.Hosting.WindowsServices is conditional on the
Windows OS so the project builds on macOS dev machines.
Build verification: dotnet build -> 438 warnings (all pre-existing xUnit1051
in legacy Server.Tests/Admin.Tests), 0 errors. Closes Task 9 (build green
smoke check, no separate commit).