Central cluster (2 fused admin+driver nodes) hosts the only UI + deploy
singleton; site clusters (2 driver-only nodes each) join the central mesh
and are logically separated by ClusterId. Each node applies only its own
cluster's drivers + address space on a global deploy. Approved design;
next step is the implementation plan.
Entities -> Phase7Composer.Compose -> MaterialiseHierarchy + MaterialiseEquipmentTags ->
real OtOpcUaNodeManager, asserting the Area/Line/Equipment folders + the equipment-signal
Variable land in a live OPC UA address space (structure-only). Also covers compose-side
EquipmentTags extraction. The cluster-level deploy + network-browse E2E + scadaproj loader
need the docker-dev fixture (not runnable on this dev box) and are tracked as a follow-up.
Two bundle-review fixes + idempotency coverage:
- CRITICAL: the planner ignored EquipmentTags, so an incremental deploy changing only
equipment tags produced an empty plan and HandleRebuild short-circuited before
materialising them. Add TagId to EquipmentTagPlan + Added/Removed/ChangedEquipmentTags
to Phase7Plan (diffed by TagId, in IsEmpty, driving Apply's needsRebuild) — mirroring
the GalaxyTags treatment.
- IMPORTANT: equipment variable NodeId was the raw driver FullName, which collides across
identical machines (e.g. two PLCs both exposing register 40001) — the second variable
was silently dropped. NodeId is now folder-scoped (parent/Name); FullName stays on
EquipmentTagPlan for the later values-routing milestone.
- Task 4: SDK-backed idempotency test (double-apply -> single variable); restart-safety
confirmed (RestoreApplied reuses the same RebuildAddressSpace -> HandleRebuild path).
- Minor: align composer equipment-tag sort with the artifact decoder (coalesce FolderPath).
Equipment folder DisplayName was the colloquial MachineCode; the live rebuild (artifact
ReadEquipmentNode) + composer now use the UNS level-5 Name segment, matching Area/Line
folders + EquipmentNodeWalker. NodeId stays the logical EquipmentId so browse-path
resolution + ACLs are unaffected.
Add Phase7Applier.MaterialiseEquipmentTags — a sink-based pass (Task-0 decision A) that
ensures each EquipmentTagPlan's Variable (NodeId = FullName) under its existing equipment
folder, nesting any FolderPath as a sub-folder. Wire it into OpcUaPublishActor.HandleRebuild
after the Galaxy pass. Variables start BadWaitingForInitialData; never re-creates equipment
folders (decision #4).
Add EquipmentTagPlan + an init-only EquipmentTags member on Phase7CompositionResult
(mirror of GalaxyTags). Populate it compose-side (Tag.EquipmentId != null AND owning
namespace Kind == Equipment) and artifact-decode-side via BuildEquipmentTagPlans, with
FullName extracted from Tag.TagConfig. Init-only member (not a 7th positional param) so
existing convenience constructors + call sites are untouched.
Implements WS-1/WS-2/WS-4 (+ tests) of the materialization scope: carry
Equipment-namespace tags through the composition/artifact, add a sink-based
MaterialiseEquipmentTags pass to the live rebuild, browse folders by friendly
Name (NodeId stays the logical Id). 6 tasks with dependencies; structure-only
(BadWaitingForInitialData leaves) — live values are the next milestone. Resume
via /executing-plans docs/plans/2026-06-06-equipment-namespace-structure-milestone.md
Equipment-kind namespaces materialise only their Area/Line/Equipment folder
skeleton on deploy, not the signals under them: EquipmentNodeWalker is fully
built + unit-tested but has no production call site, the composition/artifact
drops EquipmentId-bound tags, and no value source is wired (OpcUaClient driver
factory missing; VirtualTag ITagUpstreamSource unregistered). Documents the
gaps, workstreams, value-path options, and a structure-only-first sequencing.
Independently re-ran the D.1 alarm-source smoke against the live gateway
(10.100.0.48:5120) to back the native MxAccess alarm-event claim with a
fresh empirical run, not just the original 2026-05-29 capture.
The 2026-04-30 alarm plan banners claimed worker-side native alarm
subscription was blocked on a COM-bitness finding. That's stale: the
mxaccessgw .NET client now has true MxAccess alarm-event support, and a
live StreamAlarms check (+ new Skip-gated GatewayGalaxyAlarmFeedLiveTests
through the lmxopcua consumer) confirms native alarms — operator comment,
category, severity, timestamps — flow end-to-end. Reconcile both plan docs
to reality and add docs/plans/alarms-d1-smoke-artifact.md as the D.1
alarm-source deliverable. Historian-write live smoke + full server->A&C
round-trip remain (Windows parity rig only).
Driver collection editors (modal-per-row shared shell), resilience typed
form, editable DB-backed LDAP->role map (global roles, live on next
sign-in), and stale-comment/note cleanup. Roles intentionally global —
no per-cluster permissions.
ASP.NET Core's cookie-handler IsAjaxRequest heuristic only checks
X-Requested-With (not Accept). Drop the third test (Accept: application/json
was assumed to → 401 but actually → 302) and the Location.ShouldBeNull
assertion on the XHR test (framework still writes Location alongside 401;
clients ignore it). Renamed _ajax_ → _xhr_ for accuracy. Design doc
updated to match.
5 tasks following Section 6 of the approved design (bc4fce5). Tasks 3 and 4
parallelizable. Each task carries Classification + Estimated implement time
+ Parallelizable-with metadata for subagent dispatch.
Removes the JwtBearer parallel scheme + non-redirect 401 challenge that left
browsers staring at Chrome's HTTP_RESPONSE_CODE_FAILURE page on protected
GETs. JWT keeps minting (cookie payload only); cookie config flows through
the existing-but-unused OtOpcUaCookieOptions via PostConfigure (same pattern
ScadaBridge uses).
18-task plan following Section 9 of the approved design. Phases 3 & 4
parallelizable. Each task carries Classification + Estimated implement
time + Parallelizable-with metadata to drive subagent dispatch.
Approved design for the deferred follow-up from PR #f9fc7dd's driver-pages
work. Lazy tree browse via per-driver IDriverBrowser registered in AdminUI
DI, sessions held in-process with TTL reaper. Detailed sequencing for the
writing-plans handoff is in section 9.
Records final commit hashes + notes per task. Persistence file mirrors
the 43-commit branch state so future sessions can resume from the
correct checkpoint via /superpowers-extended-cc:executing-plans.
- DriverTestConnectE2eTests: 3 scenarios (sim/wrong-port/black-hole)
against the Modbus Docker fixture. Sim + wrong-port skip if fixture
unreachable; black-hole uses ModbusDriverProbe directly (no fixture).
- DriverReconnectE2eTests: message round-trip through AdminOperationsActor
cluster singleton — Ok=true + audit write, without live driver side effect.
- DriverStatusHubE2eTests: bridge-mocked fallback — spawns
DriverStatusSignalRBridge in the harness ActorSystem with a mock
IHubContext, publishes DriverHealthChanged to the driver-health DPS
topic, asserts store upsert + hub SendAsync call.
- DockerFixtureAvailability helper: TCP-connect probe for skip guards.
- Moq 4.20.72 added to central package management for hub mocking.
- Design doc §8.3 replaced with concrete pre-ship operator runbook.
Replaces the generic JSON-blob DriverEdit page with typed per-driver
pages (all 9 drivers), Test Connect, live status panel with
Reconnect/Restart, and a per-driver tag/address picker. Live OPC UA +
Galaxy browse explicitly deferred to a follow-up.
OtOpcUaNodeManager + SdkAddressSpaceSink: the v2 IOpcUaAddressSpaceSink
seam now has a production adapter against a real Opc.Ua.Server
CustomNodeManager2. Writes through OpcUaPublishActor's sink materialise
as real OPC UA Variable updates that subscribed clients see via the
standard ClearChangeMasks notification path.
OtOpcUaNodeManager (CustomNodeManager2):
- Owns a ConcurrentDictionary<string, BaseDataVariableState> under a
single namespace (https://zb.com/otopcua/ns) hung off Objects/.
- WriteValue lazy-creates the variable on first write, sets Value +
StatusCode (mapped from OpcUaQuality severity bits) + SourceTimestamp,
then ClearChangeMasks to notify subscribers.
- WriteAlarmState surfaces a [active, acknowledged] pair on a
dedicated node id — full AlarmConditionState/event firing comes
with #85 F14b (EquipmentNodeWalker SDK integration).
- RebuildAddressSpace tears down every registered variable + clears
the dictionary so the next write-pass starts fresh.
- Address-space root folder is materialised in CreateAddressSpace.
SdkAddressSpaceSink: thin IOpcUaAddressSpaceSink → OtOpcUaNodeManager
bridge. Production DI binding (#108) constructs this once the host's
StandardServer has booted.
OtOpcUaSdkServer (StandardServer subclass): overrides
CreateMasterNodeManager to inject OtOpcUaNodeManager via the
MasterNodeManager additionalManagers ctor. NodeManager property
exposes the live instance so OpcUaApplicationHost callers can wrap
it in a sink.
Tests: OpcUaServer 20 -> 24 (+4):
- WriteValue creates + updates variables in the manager
- WriteAlarmState creates a node distinct from value writes
- RebuildAddressSpace clears everything; subsequent writes start fresh
- NullOpcUaAddressSpaceSink no-op sanity
Each test boots a real OpcUaApplicationHost on a free port with the
SDK certificate auto-create flow (F13a) intact — full integration
slice on macOS.
All 6 v2 test suites green: 167 tests passing.
F10 status updated to reflect SDK binding shipped. Residuals:
- #109 OpcUaPublishActor.RebuildAddressSpace → Phase7Applier wiring
- #108 Host DI default to SdkAddressSpaceSink when hasDriver
- #85 F14b EquipmentNodeWalker integration (proper AlarmConditionState
+ folder hierarchy)
- IServiceLevelPublisher SDK binding (writes Server.ServiceLevel node)
End-to-end data path is now wired on the read side: driver subscriptions
fire AttributeValuePublished → DriverHostActor → DependencyMuxActor →
DependencyValueChanged to every interested VirtualTagActor. Previously
the publish hit a dead-letter at the host.
DependencyMuxActor:
- Per-node fan-out router. Maintains tagRef → Set<IActorRef> with a
reverse subscriber → refs index so unregister/replace are O(refs).
- Watches subscribers; Terminated triggers automatic unregister so
dead virtual-tag actors stop receiving publishes.
- Re-register replaces the prior interest set — no stale-ref leaks
on actor restart.
- Drops publishes for refs with no interested subscribers.
VirtualTagActor:
- New Props params: dependencyRefs + mux ActorRef.
- PreStart sends RegisterInterest to the mux; PostStop sends
UnregisterInterest. Default both null so older callers stay quiet.
DriverHostActor:
- New dependencyMux Props param. Steady + Applying states now
receive AttributeValuePublished from their DriverInstance children
and forward to the mux. Null mux is a no-op (dev/Mac).
ServiceCollectionExtensions:
- WithOtOpcUaRuntimeActors spawns DependencyMuxActor before
DriverHostActor and threads its ActorRef into the host's Props.
New DependencyMuxActorKey + DependencyMuxActorName.
Tests: Runtime 57 -> 63 (+6):
- Mux forwards to only subscribers interested in each ref
- Publish for unregistered ref is dropped silently
- Unregister stops forwarding
- Re-register replaces prior interest set
- VirtualTagActor PreStart registration drives end-to-end eval
(uses AwaitAssert to race-safely settle the PreStart Tell)
- DriverHostActor forwards AttributeValuePublished through to mux
All 6 v2 test suites green: 163 tests passing.
F8 (#79) state updated — dep subscribe seam shipped, Core.VirtualTags
production engine binding (compile + ITagUpstreamSource subscribe) is
the residual.
ScriptedAlarmActor now survives actor restart: PreStart loads from
the configured store + restores in-memory state; every Transition()
fires a fire-and-forget save. ActiveState still re-derives from the
evaluator on first tick (Phase 7 decision #14), but Acked state +
lastAckUser persist verbatim so operators don't re-ack across an
outage.
Three pieces:
- IAlarmActorStateStore seam in Commons.Engines, with the
AlarmActorStateSnapshot record (alarmId / state / lastTransitionUtc
/ lastAckUser) and NullAlarmActorStateStore default.
- EfAlarmActorStateStore in Runtime.ScriptedAlarms — production
adapter over the existing ScriptedAlarmState table in ConfigDb.
Maps the actor's 3-state enum to the table's AckedState column
(Active⇒Unacknowledged, Acknowledged⇒Acknowledged, Inactive⇒
Acknowledged). Concurrency conflicts are logged + dropped — the
next transition writes again.
- ScriptedAlarmActor PreStart load (async, piped back as
StateRestored) + Transition save. New Props overload takes the
store; default is NullAlarmActorStateStore so tests stay quiet.
Tests: Runtime 52 -> 57 (+5):
- Transition writes Active then Acknowledged snapshots with
lastAckUser populated
- PreStart with persisted Active state restores so a subsequent
AcknowledgeAlarm fires (not ignored as it would be from Inactive)
- Empty store boots Inactive (AcknowledgeAlarm correctly ignored)
- EfAlarmActorStateStore Save + Load round-trips via in-memory EF
- Load for unknown alarmId returns null
All 6 v2 test suites green: 157 tests passing.
Closes#112. F9 (#80) remaining residual is predicate binding to
Core.ScriptedAlarms.ScriptedAlarmEngine — split as F9b in tasks JSON.
Three pieces landed in one batch, closing F7-residual + Host DI #106:
Runtime/DriverInstanceActor:
- Subscribe / Unsubscribe message contracts; the Connected state
handles them via IDriver.ISubscribable. On every OnDataChange
event the actor publishes AttributeValuePublished to its parent
(DriverHostActor → OpcUaPublishActor). OPC UA StatusCode is
mapped to the 3-state OpcUaQuality enum via severity bits
(00=Good, 01=Uncertain, 10/11=Bad).
- DetachSubscription tears the handler off the driver on
DisconnectObserved, Unsubscribe, and PostStop so a stale handler
never pushes to a dead actor.
- WriteAttribute now dispatches IWritable.WriteAsync (batch of one)
with a 5s CancellationTokenSource; status-code propagated to
WriteAttributeResult on non-Good results.
Host:
- New ProjectReferences to Core + every cross-platform driver
assembly (AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT).
Galaxy is net10 (gRPC client to mxaccessgw); the COM-bound net48
Wonderware Historian driver stays out of the Host's reference
closure — its .Client gRPC wrapper is what binds for historian
needs.
- New DriverFactoryBootstrap.AddOtOpcUaDriverFactories() registers
a singleton DriverFactoryRegistry, invokes each driver's
Register(registry, loggerFactory), and binds IDriverFactory to
DriverFactoryRegistryAdapter. Replaces the F7 NullDriverFactory
default so deploys actually materialise real IDriver instances
on driver-role nodes. ShouldStub() still gates per-platform
behaviour at spawn time.
- Program.cs wires AddOtOpcUaDriverFactories() before AddAkka so
the runtime extension can resolve IDriverFactory from DI.
Tests: Runtime 46 -> 52 (+6):
- Write returns success when StatusCode = Good
- Write propagates non-Good status code in failure Reason
- Subscribe forwards OnDataChange to parent as AttributeValuePublished
- Quality translation: Uncertain (0x40...) and Bad (0x80...)
- Subscribe against non-ISubscribable returns failure
- DisconnectObserved detaches handler so late events are dropped
All 6 v2 test suites green: 152 tests passing.
Closes F7. F7-residual sub-tasks #110 (subscribe) and #111 (write)
both shipped. Host DI binding #106 shipped.
Splits the side-effecting half of Phase7Composer (deferred at Task 47)
into two pieces that mirror DriverHostActor's spawn-plan pattern:
Phase7Plan + Phase7Planner.Compute (pure):
Diff two Phase7CompositionResult snapshots by stable id (EquipmentId,
DriverInstanceId, ScriptedAlarmId). Emits Added/Removed/Changed lists
per entity class. Added/Removed are sorted by id for deterministic
apply order. Changed wraps both Previous and Current projections so
consumers can decide between in-place mutation and tear-down +
rebuild.
Phase7Applier (side-effecting):
Drives an IOpcUaAddressSpaceSink against a plan. Removed equipment/
alarms get an inactive AlarmState write per id; Added/Removed of
Equipment or ScriptedAlarm triggers RebuildAddressSpace. Driver-only
changes correctly skip the rebuild — those flow through DriverHost-
Actor's spawn-plan in Runtime. Sink exceptions are caught + logged so
one bad node doesn't abort the apply.
Tests: OpcUaServer 6 -> 20 (+14):
- Phase7PlannerTests x9 (empty-in/empty-out, add/remove/change per
entity class, mixed changes, deterministic ordering)
- Phase7ApplierTests x5 (empty plan no-op, removal writes inactive
states + rebuild, added equipment triggers rebuild, driver-only
skips rebuild, sink fault is non-fatal)
The remaining piece is the EquipmentNodeWalker integration against a
real SDK NodeManager — split as F14b, gated on F10b's SDK builder.
All 6 v2 test suites green: 146 tests passing.
OpcUaPublishActor now routes through pluggable seams instead of just
incrementing a counter:
- IOpcUaAddressSpaceSink (Commons.OpcUa) — WriteValue / WriteAlarmState
/ RebuildAddressSpace. OpcUaQuality enum moved here from the actor's
nested type so producers don't have to reference the actor itself.
- IServiceLevelPublisher — Publish(byte). NullServiceLevelPublisher
retains the last level for inspection.
- The actor subscribes to the redundancy-state DPS topic in PreStart
and maps the local node's NodeRedundancyState to a coarse
ServiceLevel (Primary+leader=240, Primary=200, Secondary=100,
Detached=0). This keeps the local SDK's ServiceLevel node honest
without round-tripping back through the admin-singleton calculator.
- ServiceLevelChanged dedupes identical levels so the SDK doesn't see
redundant writes.
- Sink + publisher exceptions are caught and logged; the actor never
crashes its own dispatcher.
- PropsForTests gets optional sink/publisher/localNode params and
skips the DPS subscribe so unit tests stay on a vanilla TestKit
cluster.
Production binding to a real SDK NodeManager + Variable nodes is the
remaining residual — split as F10b. Task 60 still blocked on F10b.
Tests: Runtime 40 -> 46 (+6):
- AttributeValueUpdate routes to sink
- AlarmStateUpdate routes to sink
- RebuildAddressSpace calls sink.Rebuild
- ServiceLevelChanged dedupes
- RedundancyStateChanged for primary-leader publishes 240
- RedundancyStateChanged for secondary publishes 100
All 6 v2 test suites green: 132 tests passing.
VirtualTagActor and ScriptedAlarmActor now route through pluggable
evaluator interfaces and fan out to the cluster's live-tail topics
shipped in F15.3:
- IVirtualTagEvaluator + NullVirtualTagEvaluator in Commons.Engines.
VirtualTagActor calls evaluator on every DependencyValueChanged,
dedupes unchanged values, forwards EvaluationResult to its parent,
and publishes ScriptLogEntry Warning to the script-logs DPS topic
whenever the evaluator fails.
- IScriptedAlarmEvaluator + NullScriptedAlarmEvaluator. ScriptedAlarmActor
takes an AlarmConfig (id/name/equipment-path/severity/predicate) and
publishes both an AlarmTransitionEvent (alerts topic) and a
ScriptLogEntry (script-logs topic) at every transition. Manual
ConditionMet/Acknowledge/Cleared still flow through the same
Transition() so callers without engine bindings still drive the
state machine; the legacy single-string Props() overload routes
through a default AlarmConfig.
The Null* defaults keep the actors safe when no engine is bound —
unconfigured nodes never spuriously alarm. Production binding to
Core.VirtualTags.VirtualTagEngine and Core.ScriptedAlarms is the
remaining residual (F8b/F9b — split in tasks JSON).
Tests: Runtime 34 -> 40 (+6):
- VirtualTagActorTests x3 (evaluator drives EvaluationResult,
unchanged-value dedup, failure publishes Warning ScriptLogEntry)
- ScriptedAlarmActorTests x3 (engine threshold drives Activated +
Cleared on alerts topic, manual Acknowledge attribution).
All 6 v2 test suites green: 126 tests passing.
DriverHostActor.ApplyAndAck now reads the deployment artifact and
reconciles its set of DriverInstanceActor children — spawn the missing,
ApplyDelta to those with changed config, stop the removed/disabled.
The diff lives in pure DriverSpawnPlanner so it can be unit-tested
without an ActorSystem.
Adds IDriverFactory in Core.Abstractions (consumed by Runtime) +
DriverFactoryRegistryAdapter in Core.Hosting that wraps the existing
v1 DriverFactoryRegistry — Runtime stays decoupled from Polly/Serilog,
the Host wires the adapter once driver assemblies have registered.
ShouldStub(type, roles) is now actually called on every spawn — Galaxy
+ Wonderware-Historian boot stubbed on macOS/Linux or whenever the host
carries the dev role. Missing factory ⇒ stub fallback, never a crash.
Tests: 24 → 34 in Runtime (+10):
- DriverSpawnPlannerTests x7 (diff cases, type change ⇒ stop+respawn)
- DeploymentArtifactTests x5 (empty/malformed/missing fields tolerant)
- DriverHostActorReconcileTests x4 (spawn count, stub fallback,
ShouldStub gate, second-apply stops the removed)
All 6 v2 test suites green: 120 tests passing.
Closes F20 (ShouldStub wired). F7 marked partial — subscription
publishing + write path still stubbed in DriverInstanceActor itself.
Replaces the Ok=true stub with a TCP connect to the peer's OPC UA port (4840
default) with a 2s timeout. A successful connect indicates the OPC UA server
process is up + accepting connections — enough for the redundancy calculator
to treat the peer as live. A full secure-channel Hello/Acknowledge handshake
is overkill for what the redundancy calc consumes and would pull in the OPC
UA Client SDK + a PKI setup. Upgrade later if a deeper liveness signal is ever
required.
Probe extracts the host from NodeId by stripping the :port suffix (commit
5cfbe8b encoded host:port into NodeId for cluster-member identity).
Tests: 2 new tests — Ok=true against a live TcpListener on a chosen port,
Ok=false against an unreachable endpoint. All 17 Runtime tests pass (was 16
covering only the message-contract surface).