A thin gateway over the admin-operations cluster singleton so CI/scripts can trigger a
deployment without the Blazor button. Forwards to the same
IAdminOperationsClient.StartDeploymentAsync; mounted on admin-role nodes. Auth is a
fixed-time X-Api-Key check against Security:DeployApiKey (orthogonal to the cookie-only
web auth). The endpoint is AllowAnonymous so the auth fallback doesn't 401 it, and it
self-disables (503) until the key is set. Outcome->status: 202/200/409/422. Unit tests
cover the key check + outcome mapping; an HTTP E2E test exercises real auth + a real
deploy via the 2-node harness. Documented in docs/security.md.
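For illustration, the key check and outcome mapping can be sketched as below. This is a Python sketch, not the project's C# code. The outcome names, their pairing with the four status codes, and the 401 returned for a wrong key are assumptions; the source only states the fixed-time comparison, the 503 self-disable, and the set of codes.

```python
import hmac

# Hypothetical outcome names; the source lists only the status codes.
OUTCOME_STATUS = {
    "started": 202,
    "nothing_to_deploy": 200,
    "already_running": 409,
    "validation_failed": 422,
}

def check_api_key(configured_key, presented_key):
    """Return None when the key is accepted, else an HTTP status code."""
    if not configured_key:
        return 503  # endpoint self-disables until a key is configured
    if presented_key is None:
        return 401  # assumed rejection status
    # compare_digest takes time independent of where the inputs differ
    ok = hmac.compare_digest(configured_key.encode(), presented_key.encode())
    return None if ok else 401

def handle_deploy(configured_key, presented_key, start_deployment):
    """Gate the call on the key, then map the outcome to a status code."""
    rejection = check_api_key(configured_key, presented_key)
    if rejection is not None:
        return rejection
    return OUTCOME_STATUS[start_deployment()]
```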
Seed a 1-area/1-line/1-equipment/1-tag Equipment namespace, StartDeployment via the
in-process 2-node harness, and assert the persisted artifact decodes (ParseComposition)
to the equipment signal (FullName from TagConfig) + friendly UNS folder names. Covers the
ConfigComposer -> ArtifactBlob -> ParseComposition.EquipmentTags seam the unit tests only
approximated with hand-built JSON. (OPC UA browse is covered against a real SDK node manager
in Phase7ApplierHierarchyTests; the cluster harness binds the no-op sink.)
Standardize the control-plane admin role VALUES on the canonical six
(ZB.MOM.WW.Auth CanonicalRole). OtOpcUa uses four:
ConfigViewer -> Viewer
ConfigEditor -> Designer
FleetAdmin -> Administrator
DriverOperator -> Operator (appsettings-only string role)
This is a rename, not a permission change: enforcement semantics are
preserved (whoever could deploy/administer/operate before still can).
- AdminRole enum members renamed (persisted as string names via
HasConversion<string>); RoleGrants.razor dropdown default updated.
- EF DATA migration CanonicalizeAdminRoles rewrites existing
LdapGroupRoleMapping.Role rows old->new (Up) and back (Down); schema /
model snapshot byte-identical (no pending model changes).
- Enforcement role STRINGS canonicalized:
* Security policies keep their NAMES ("DriverOperator"/"FleetAdmin")
but require canonical roles: RequireRole("Operator","Administrator")
and RequireRole("Administrator").
* Deployments.razor [Authorize(Roles="Administrator,Designer")].
* DevStub now grants "Administrator"; LdapOptions/doc-comment examples
canonicalized.
- Data-plane authorization (NodePermissions/NodeAcl/IPermissionEvaluator/
TriePermissionEvaluator/UserAuthorizationState) UNTOUCHED.
- New CanonicalAdminRolesTests pins canonical claim values end-to-end and
the real registered policies; existing role-string tests updated.
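The rename table and its reversibility can be sketched as follows. This is a Python illustration of the old->new mapping only, not the EF migration itself; the function names are hypothetical. DriverOperator is included for completeness even though the source notes it is an appsettings-only string role.

```python
# Old -> canonical role names, as listed above.
OLD_TO_NEW = {
    "ConfigViewer": "Viewer",
    "ConfigEditor": "Designer",
    "FleetAdmin": "Administrator",
    "DriverOperator": "Operator",
}
# The Down direction is the exact inverse, so the rename is reversible.
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}

def migrate_up(roles):
    """Rewrite old role names to canonical ones; unknown values pass through."""
    return [OLD_TO_NEW.get(role, role) for role in roles]

def migrate_down(roles):
    """Rewrite canonical role names back to the old ones."""
    return [NEW_TO_OLD.get(role, role) for role in roles]
```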
Both bugs surfaced only on split-role deployments (the MAIN cluster's
admin-only nodes), where the AdminUI runs without the driver role.
- Test Connect returned "No probe registered" for every driver: the
IDriverProbe set was registered only under the driver role, but the
admin-operations singleton that consumes it is pinned to admin. Extract
AddOtOpcUaDriverProbes() (idempotent via TryAddEnumerable) and call it
in the hasAdmin path too.
- Live driver-status/alerts/script-log panels showed "SignalR error:
Connection refused": these Blazor Server components opened a HubConnection
to their own hub via the browser's public URL, which server-side code
can't reach behind Traefik (host :9200 -> container :9000). Read the
in-process source directly instead -- DriverStatus via
IDriverStatusSnapshotStore.SnapshotChanged, Alerts/ScriptLog via a new
IInProcessBroadcaster<T>. Fleet status was unaffected (reads DB/ActorSystem).
Adds unit tests for probe registration, the snapshot-store event, and the
broadcaster.
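The idea behind reading the in-process source directly is a plain publish/subscribe object shared inside the server process, so no network hop is involved. A minimal Python sketch follows; the actual shape of IInProcessBroadcaster<T> is not given in the source, so the method names here are hypothetical.

```python
class InProcessBroadcaster:
    """Fan out published items to subscribers living in the same process."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        """Register a callback and return a function that removes it again."""
        self._handlers.append(handler)

        def unsubscribe():
            if handler in self._handlers:
                self._handlers.remove(handler)

        return unsubscribe

    def publish(self, item):
        # Iterate over a copy so handlers may unsubscribe while being called.
        for handler in list(self._handlers):
            handler(item)
```

A UI component would subscribe when it is initialized and call the returned function when it is disposed.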
- DriverTestConnectE2eTests: 3 scenarios (sim/wrong-port/black-hole)
against the Modbus Docker fixture. Sim + wrong-port skip if fixture
unreachable; black-hole uses ModbusDriverProbe directly (no fixture).
- DriverReconnectE2eTests: message round-trip through AdminOperationsActor
cluster singleton — Ok=true + audit write, without live driver side effect.
- DriverStatusHubE2eTests: bridge-mocked fallback — spawns
DriverStatusSignalRBridge in the harness ActorSystem with a mock
IHubContext, publishes DriverHealthChanged to the driver-health DPS
topic, asserts store upsert + hub SendAsync call.
- DockerFixtureAvailability helper: TCP-connect probe for skip guards.
- Moq 4.20.72 added to central package management for hub mocking.
- Design doc §8.3 replaced with concrete pre-ship operator runbook.
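A TCP-connect probe of the kind DockerFixtureAvailability provides can be sketched as below. This is a Python illustration, not the project's helper; the function name and default timeout are assumptions.

```python
import socket

def is_reachable(host, port, timeout_seconds=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

A test that depends on the fixture would call this first and skip rather than fail when it returns False.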
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
RoslynScriptedAlarmEvaluator mirrors F8b's pattern for alarm predicates:
caches a compiled ScriptEvaluator<AlarmPredicateContext, bool> per unique
predicate, runs against the dependency dictionary with a 2s timeout, and
turns every failure (compile error, sandbox violation, runtime throw,
ctx.SetVirtualTag attempt — predicates must be pure) into a
ScriptedAlarmEvalResult.Failure. ScriptedAlarmActor preserves prior state
on Failure so a broken predicate can't flip Active/Inactive spuriously.
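The failure-handling rule can be sketched as follows: every error becomes a failure result, and a failure leaves the alarm state where it was. This is a Python illustration with hypothetical names; it omits the compile cache, the sandbox, and the 2s timeout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalResult:
    ok: bool
    value: bool = False
    error: str = ""

def evaluate_predicate(predicate, deps):
    """Run the predicate; any exception becomes a failure result."""
    try:
        return EvalResult(ok=True, value=bool(predicate(deps)))
    except Exception as exc:
        return EvalResult(ok=False, error=f"{type(exc).__name__}: {exc}")

def next_alarm_state(prior_active, result):
    """Keep the prior Active/Inactive state whenever evaluation failed."""
    return result.value if result.ok else prior_active
```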
Program.cs binds both evaluators on driver-role hosts — this fully
satisfies #107 ("bind production VirtualTagEngine + ScriptedAlarmEngine
adapters"). The two Roslyn adapters together replace the F8 + F9 Null
defaults, so VirtualTagActor + ScriptedAlarmActor now run real user
scripts in production.
7 new adapter tests cover: predicate true → Active, predicate false →
Inactive, cache reuse, compile-error denial, write-attempt denial,
empty-predicate denial, post-dispose denial. Host.IntegrationTests now
17/17 green.
Closes #80 + #107. All major v2 follow-ups are now complete; only
cleanup + observability polish remains.
RoslynVirtualTagEvaluator wraps Core.Scripting.ScriptEvaluator +
Core.VirtualTags.VirtualTagContext into a single-tag IVirtualTagEvaluator
adapter. Caches the compiled ScriptEvaluator per unique expression so
the second-and-onwards Evaluate is an in-process method call against the
dependency dictionary. Compile/sandbox/runtime errors all surface as
VirtualTagEvalResult.Failure rather than propagating exceptions through
the VirtualTagActor message loop.
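The compile-once, evaluate-many pattern with errors returned as results can be sketched as below. This is a Python illustration using the built-in compile/eval rather than Roslyn; the class and its tuple-shaped results are hypothetical, and the empty builtins dictionary is not a real sandbox.

```python
class CachingEvaluator:
    """Compile each unique expression once and reuse the compiled form."""

    def __init__(self):
        self._cache = {}
        self.compile_count = 0  # exposed so cache reuse can be observed

    def evaluate(self, expression, deps):
        if not expression.strip():
            return ("failure", "empty expression")
        try:
            code = self._cache.get(expression)
            if code is None:
                code = compile(expression, "<virtual-tag>", "eval")
                self.compile_count += 1
                self._cache[expression] = code
            # Dependencies are supplied as local names; builtins are removed.
            return ("ok", eval(code, {"__builtins__": {}}, dict(deps)))
        except Exception as exc:
            # Compile and runtime errors are returned, never raised.
            return ("failure", f"{type(exc).__name__}: {exc}")
```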
Single-tag scope: cross-tag ctx.SetVirtualTag writes are dropped + logged
because fan-out between actors is owned by DependencyMuxActor. Cycle
detection + cascade ordering stay in Core.VirtualTags.VirtualTagEngine
where they belong (loaded fleet-wide); this adapter keeps the actor
message handler simple.
Host adds Core.Scripting + Core.VirtualTags project refs, plus a
TargetWarningsAsErrors NU1608 suppression:
Microsoft.CodeAnalysis.CSharp.Scripting 4.12.0 pins Common to 4.12.0 but
ASP.NET Core transitively brings Microsoft.CodeAnalysis.Common 5.0.0; the
surface we use is stable across the drift (verified by Core.Scripting.Tests).
Program.cs binds RoslynVirtualTagEvaluator → IVirtualTagEvaluator on
driver-role hosts, replacing the F8-default NullVirtualTagEvaluator so
VirtualTagActor evaluates real user scripts at runtime.
6 new adapter tests cover: simple expression sums, cache reuse across
calls, compile-error denial, runtime-throw denial, empty-expression
denial, post-dispose denial. Host.IntegrationTests now 10/10 green.
Closes #79. F9b + #107 next.
Adds IOpcUaUserAuthenticator seam in OpcUaServer.Security with a deny-all
NullOpcUaUserAuthenticator default. OpcUaApplicationHost subscribes to
SessionManager.ImpersonateUser after _application.Start so UserName tokens
flow through the authenticator and either attach a UserIdentity to the
session (Allow) or set IdentityValidationError = BadIdentityTokenRejected
(Deny / authenticator exception). Anonymous + X509 tokens fall through to
SDK defaults.
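The decision flow described above can be sketched as a pure function. This is a Python illustration with hypothetical token and result shapes, not the OPC UA SDK types: UserName tokens go through the authenticator, an authenticator exception counts as a denial, and other token kinds are left to the SDK.

```python
def handle_impersonation(token, authenticate):
    """Decide what to do with an identity token.

    Returns ("accepted", identity), ("rejected", None), or
    ("sdk_default", None) for token kinds this seam does not handle.
    """
    if token.get("kind") != "username":
        return ("sdk_default", None)  # Anonymous and X509 fall through
    try:
        allowed, identity = authenticate(token.get("username"), token.get("password"))
    except Exception:
        return ("rejected", None)  # authenticator failure is treated as Deny
    return ("accepted", identity) if allowed else ("rejected", None)

def deny_all(username, password):
    """Default authenticator: rejects every UserName token."""
    return (False, None)
```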
LdapOpcUaUserAuthenticator (Host project) bridges to the same
ILdapAuthService that AddOtOpcUaAuth uses for Admin cookies / JWT, so a
single LDAP source-of-truth governs both Admin control plane and OPC UA
data plane. Program.cs registers LdapOptions + LdapAuthService +
IOpcUaUserAuthenticator on driver-role hosts; admin-only nodes are
unchanged.
OtOpcUaServerHostedService threads the resolved authenticator into
OpcUaApplicationHost so the seam respects Host DI.
10 new tests: 6 in OpcUaServer.Tests cover the pure HandleImpersonation
static method (success / denial / anonymous fallthrough / authenticator-
throw / null-username / Null authenticator); 4 in Host.IntegrationTests
cover the LdapOpcUaUserAuthenticator adapter (LDAP allow → Allow with
roles, LDAP deny → Deny, exception → backend-error denial, display-name
fallback). OpcUaServer suite is 40 / 40 green.
Closes #104. Unblocks Task 60 (dual-endpoint + ServiceLevel tests) once
#81 residual lands.
Adds a real-infra mode for the integration test harness alongside the default
in-memory mode. Brings previously untested code paths (EF SqlServer
behaviors, real LDAP bind) under test behind env-var switches, without
breaking the zero-infra default that CI runs.
- docker-compose.yml — minimal SQL 2022 (14331) + OpenLDAP (3894) stack
(ports chosen to coexist with docker-dev/ on 14330/3893)
- HarnessMode record reads OTOPCUA_HARNESS_USE_SQL=1 / USE_LDAP=1 from env
- SQL mode: per-harness unique DB OtOpcUa_Harness_{guid}, EnsureCreated
at startup, EnsureDeleted on dispose (best-effort)
- LDAP mode: drops StubLdapAuthService and configures real LdapAuthService
against the compose'd OpenLDAP via Authentication:Ldap:* config keys
- Microsoft.EntityFrameworkCore.SqlServer added to the test project
- README documents both modes + the macOS no-Docker caveat
Default in-memory mode unchanged — all 9 existing tests still pass.
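The SQL-mode switch and the per-harness database name can be sketched as follows. This is a Python illustration with hypothetical function names; the GUID formatting (hex without dashes) is an assumption.

```python
import uuid

def use_sql(env):
    """SQL mode is opt-in: only the exact value "1" enables it."""
    return env.get("OTOPCUA_HARNESS_USE_SQL") == "1"

def harness_database_name():
    """Give every harness instance its own database so runs cannot collide."""
    return f"OtOpcUa_Harness_{uuid.uuid4().hex}"
```

With no variables set, use_sql returns False, which keeps the in-memory default.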
Extends TwoNodeClusterHarness with three lifecycle primitives:
- StopNodeBAsync() — graceful CoordinatedShutdown (Cluster.Leave)
- RestartNodeBAsync() — rebuild node B on same Akka port + same in-memory DB
- WaitForClusterSizeAsync(n) — converge assertion helper
Adds three failover scenario tests:
- Stopping node B shrinks cluster to 1 Up member
- Restarted node B rejoins on the same Akka port
- Deployment started with B down seals with a single NodeDeploymentState
(validates ConfigPublishCoordinator.DiscoverDriverNodes snapshots
membership at dispatch time)
Closes follow-up F22. Integration test count: 6 → 9 (+3).
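A convergence helper in the spirit of WaitForClusterSizeAsync(n) polls the member count until it matches or a deadline passes. The Python sketch below is an illustration only; the callback, timeout, and polling interval are assumptions.

```python
import asyncio
import time

async def wait_for_cluster_size(get_up_member_count, expected,
                                timeout_seconds=10.0, poll_seconds=0.1):
    """Poll until the number of Up members equals `expected`, else time out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if get_up_member_count() == expected:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster did not converge to {expected} member(s)")
        await asyncio.sleep(poll_seconds)
```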
- New Commons.Messages.Fleet.GetDiagnostics request record.
- DriverHostActor handles GetDiagnostics in all three states (Steady, Applying,
Stale); replies with a NodeDiagnosticsSnapshot built from _currentRevision
+ the local NodeId. Drivers list is empty until F7 wires the per-instance
children.
- FleetDiagnosticsClient now resolves the target via ActorSelection at
akka.tcp://{system}@{nodeId}/user/driver-host and Asks with a 3s timeout.
On timeout/peer-down it returns an empty snapshot so the UI degrades
gracefully rather than throwing.
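The degrade-gracefully behaviour of the client can be sketched as an ask with a timeout that falls back to an empty snapshot. This is a Python illustration; the snapshot fields and the `ask` callback are hypothetical stand-ins for the Akka Ask.

```python
import asyncio

def empty_snapshot(node_id):
    """Placeholder returned when the target node cannot be reached."""
    return {"node_id": node_id, "revision": None, "drivers": []}

async def get_diagnostics(node_id, ask, timeout_seconds=3.0):
    """Ask the node for diagnostics; on timeout or peer-down return an empty snapshot."""
    try:
        return await asyncio.wait_for(ask(node_id), timeout=timeout_seconds)
    except (asyncio.TimeoutError, ConnectionError):
        return empty_snapshot(node_id)
```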
Two new integration tests in Host.IntegrationTests:
- GetDiagnostics_returns_snapshot_with_target_NodeId verifies the
cross-node Ask/Reply works.
- GetDiagnostics_after_deploy_reports_current_revision exercises the
end-to-end path: AdminOps starts a deployment, both DriverHostActors
apply, then diagnostics reports the new revision on both nodes.
All 98 v2 tests pass (was 96 + 2 new).
DeployHappyPathTests exercises the full deploy pipeline on the 2-node harness:
AdminOperationsActor → ConfigPublishCoordinator → DistributedPubSub →
DriverHostActor on both nodes → ApplyAck → coordinator seals. Verifies both
NodeDeploymentState rows reach Applied and Deployment.Status reaches Sealed.
Exposed + fixed two production bugs along the way:
1. Coordinator was publishing DispatchDeployment on the "deployments" topic but
never subscribed to anything — DriverHostActor ACKs published on the same
topic could not reach it. Added dedicated "deployment-acks" topic with
coordinator subscription in PreStart, and DriverHostActor publishes ACKs
there.
2. NodeId derivation used member.Address.Host only, so two cluster members on a
shared loopback host (test harness, dev VMs) collapsed to one identity. The
coordinator's expected-ack set shrank to a single entry and the deployment sealed
after only half the nodes acked. Switched to host:port everywhere (ClusterRoleInfo +
coordinator) so loopback nodes stay distinct and production identities are
harmlessly more specific.
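The NodeId collision in bug 2 can be reproduced in a few lines. This is a Python illustration; the ports and function names are hypothetical.

```python
def node_id_host_only(host, port):
    return host  # old derivation: ignores the port

def node_id_host_port(host, port):
    return f"{host}:{port}"  # new derivation: distinct per member

def is_sealed(expected, acked):
    """A deployment seals once every expected node has acked."""
    return expected <= acked  # subset test

# Two members sharing one loopback host, as in the test harness.
members = [("127.0.0.1", 4053), ("127.0.0.1", 4054)]

expected_old = {node_id_host_only(h, p) for h, p in members}
expected_new = {node_id_host_port(h, p) for h, p in members}

# Only the first member has acked so far.
first_ack_old = {node_id_host_only(*members[0])}
first_ack_new = {node_id_host_port(*members[0])}
```

With the old derivation the expected set holds one entry, so the first ack already seals; with host:port it holds two and the deployment waits for both.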
Tests: 95 v2 tests pass (was 93 + 2 deploy tests), 0 skipped.
Failover scenarios (design §8 cases 3-7: node-kill-mid-apply, split-brain,
restart-during-deploy) deferred — they need controlled node-down primitives
on the harness. Tracked as F22 (failover scenario test cases).
Builds TwoNodeClusterHarness: two in-process Host-equivalent nodes sharing
an in-memory ConfigDb. Forms a 2-member Akka cluster. ClusterFormationTests
proves both nodes see each other as admin+driver role members.
Fixes a real production bug uncovered while wiring the harness — Program.cs
ran two separate ActorSystems (one from AddOtOpcUaCluster.AkkaHostedService
with cluster HOCON, one from Akka.Hosting.AddAkka with bare HOCON). Cluster
singletons landed on the bare ActorSystem and could not actually form a
cluster ("Configuration does not contain `akka.cluster` node").
Consolidation:
- AddOtOpcUaCluster now only binds AkkaClusterOptions + registers IClusterRoleInfo
- New WithOtOpcUaClusterBootstrap pushes embedded HOCON + Remote/Cluster options
into Akka.Hosting's AkkaConfigurationBuilder
- AkkaHostedService.cs deleted — Akka.Hosting now owns the lifecycle
- Program.cs + harness call WithOtOpcUaClusterBootstrap inside AddAkka
Why not WebApplicationFactory<Program>? Program.cs reads OTOPCUA_ROLES from
process env (shared across in-process WAFs); the harness replays Program.cs's
DI graph from a clean WebApplicationBuilder per node with per-node config
overrides. Same production extensions, isolated config + Kestrel + Akka ports.
Tests: 93 v2 tests pass (was 91 + 2 new cluster formation), 0 skipped.