ParseComposition(blob, nodeId, onInconsistency?) detects a kept equipment whose
UNS line belongs to another cluster (a same-cluster-invariant violation that
would orphan the equipment folder) and reports it via an optional callback,
wired to OpcUaPublishActor's logger. Detection-only; the upstream draft
validator remains the authority. Adds two unit tests.
Log a WARNING on startup when IVirtualTagEvaluator is not registered so a DI misconfig on a
driver-role node is visible in logs instead of silently evaluating all VirtualTags to NoChange.
Add a comment in PushDesiredSubscriptions noting that TryRecoverFromStale does not call this
method, so VirtualTags remain empty after a Stale recovery until the next deployment dispatch
(intentional, consistent with driver recovery).
Context.Watch each spawned child; OnChildTerminated evicts it from _children so the
next ApplyVirtualTags (still containing that vtagId) falls through the ContainsKey
guard and re-spawns a fresh VirtualTagActor. Adds a spawn-site Debug log, moves the
TODO about in-place plan mutation to the skip-existing branch where it belongs, and
adds a deterministic TestKit test (Child_is_respawned_after_unexpected_termination)
that kills the first child, drains its UnregisterInterest from the mux probe, re-applies,
and asserts a second distinct RegisterInterest arrives.
Two bundle-review fixes + idempotency coverage:
- CRITICAL: the planner ignored EquipmentTags, so an incremental deploy changing only
equipment tags produced an empty plan and HandleRebuild short-circuited before
materialising them. Add TagId to EquipmentTagPlan + Added/Removed/ChangedEquipmentTags
to Phase7Plan (diffed by TagId, in IsEmpty, driving Apply's needsRebuild) — mirroring
the GalaxyTags treatment.
- IMPORTANT: equipment variable NodeId was the raw driver FullName, which collides across
identical machines (e.g. two PLCs both exposing register 40001) — the second variable
was silently dropped. NodeId is now folder-scoped (parent/Name); FullName stays on
EquipmentTagPlan for the later values-routing milestone.
- Task 4: SDK-backed idempotency test (double-apply -> single variable); restart-safety
confirmed (RestoreApplied reuses the same RebuildAddressSpace -> HandleRebuild path).
- Minor: align composer equipment-tag sort with the artifact decoder (coalesce FolderPath).
Equipment folder DisplayName was the colloquial MachineCode; the live rebuild (artifact
ReadEquipmentNode) + composer now use the UNS level-5 Name segment, matching Area/Line
folders + EquipmentNodeWalker. NodeId stays the logical EquipmentId so browse-path
resolution + ACLs are unaffected.
Add Phase7Applier.MaterialiseEquipmentTags — a sink-based pass (Task-0 decision A) that
ensures each EquipmentTagPlan's Variable (NodeId = FullName) under its existing equipment
folder, nesting any FolderPath as a sub-folder. Wire it into OpcUaPublishActor.HandleRebuild
after the Galaxy pass. Variables start BadWaitingForInitialData; never re-creates equipment
folders (decision #4).
Add EquipmentTagPlan + an init-only EquipmentTags member on Phase7CompositionResult
(mirror of GalaxyTags). Populate it compose-side (Tag.EquipmentId != null AND owning
namespace Kind == Equipment) and artifact-decode-side via BuildEquipmentTagPlans, with
FullName extracted from Tag.TagConfig. Init-only member (not a 7th positional param) so
existing convenience constructors + call sites are untouched.
Two ordering/lifecycle gaps surfaced once tag values began streaming:
1. OpcUaPublishActor.HandleRebuild loaded the latest *Sealed* artifact, but the
rebuild fires at apply time — before this deployment seals — so it materialised
the PREVIOUS revision while SubscribeBulk subscribed to the applied one. The two
disagreed (4 variables materialised vs 396 subscribed) and every config needed
two deploys. RebuildAddressSpace now carries the applied DeploymentId and the
rebuild loads that exact artifact.
2. On restart a node recovered its revision from NodeDeploymentState but left the
driver children + address space empty (and an identical-config redeploy no-ops on
the unchanged revision), so a rebuilt node served nothing until a config change.
Bootstrap now calls RestoreApplied: re-spawn drivers, rebuild from the applied
artifact, re-push SubscribeBulk — no re-ack.
Verified live: recreating the driver nodes auto-restores all 396 galaxy mirror
tags across 40 machines with Good live values, no deploy required.
Materialised SystemPlatform/Galaxy variables previously stayed
BadWaitingForInitialData because nothing told the driver to subscribe
(OpcUaPublishActor TODO 'on a future SubscribeBulk pass') and published
values were only forwarded to the VirtualTag mux, never the OPC UA sink.
DriverHostActor now, after each apply, groups the deployment's galaxy tag
MXAccess refs by driver and sends DriverInstanceActor.SetDesiredSubscriptions;
the actor retains the set and (re)subscribes on every Connected entry, so
values resume after reconnects/redeploys (closes the F8b/#113 gap). Published
values are also forwarded to OpcUaPublishActor as AttributeValueUpdate
(NodeId == galaxy MxAccessRef) so the materialised variable shows live data.
Verified live in docker-dev: galaxy TestMachine_001 tags go Good with a
changing TestChangingInt. +1 unit test.
- Topic-name drift fix: DriverHealthChanged.TopicName and
DriverControlTopic.Name now live on the message contracts in
Commons. AkkaDriverHealthPublisher, DriverStatusSignalRBridge,
DriverHostActor, and AdminOperationsActor all delegate to the
single constant so a rename can't silently desynchronise
publisher and subscriber.
- DriverStatusPanel._opResultClearTimer switched from
System.Timers.Timer to System.Threading.Timer + awaited
DisposeAsync. Prevents an in-flight 8s clear-callback from
invoking StateHasChanged on a component whose hub has already
been released.
- PublishHealthSnapshot deduplicates against the last published
(state, lastSuccess, lastError, errorCount) fingerprint. The
30s heartbeat no longer floods the SignalR layer with identical
Healthy snapshots — newly-joined clients still warm up via the
snapshot store on JoinDriver.
- DriverInstanceSpec carries ClusterId from the deployment artifact;
DriverHostActor threads the real cluster identity into
DriverInstanceActor instead of the local NodeId. Old pre-PR
artifacts without a ClusterId field fall back to the NodeId so
in-flight deployments keep working.
- DriverHostActor.ChildEntry holds the full DriverInstanceSpec
(was only carrying DriverType + LastConfigJson). Restart respawns
preserve RowId, Name, Enabled, ClusterId — no placeholder values.
- Drop the unnecessary _faultLock on DriverInstanceActor — every
read/write site runs inside an Akka message handler which is
single-threaded per actor instance.
- DriverStatusPanel.DisposeAsync awaits Timer.DisposeAsync so an
in-flight 5s tick can't invoke StateHasChanged on a component
whose hub has already been torn down.
- RestartDriver / ReconnectDriver messages + AdminOperationsActor
handlers (broadcast via driver-control DPS topic; audited via
ConfigEdits).
- DriverHostActor subscribes to driver-control; locates the
matching child DriverInstanceActor and stops+respawns it
(Restart) or sends it a ForceReconnect internal message
(Reconnect — re-enters Reconnecting state without full stop).
DriverInstanceSpec constructor call uses named args to handle
the full 6-parameter signature.
- New DriverOperator authorization policy mapped to DriverOperator
or FleetAdmin role; documented in docs/security.md. Map LDAP
group via GroupToRole (e.g. "ot-driver-operator": "DriverOperator").
- DriverStatusPanel renders Reconnect + Restart buttons when the
user holds the DriverOperator policy (hidden otherwise). Restart
requires an in-page Razor confirm block (no JS confirm, keeps
SignalR event loop unblocked). Both buttons show a spinner and
are disabled during in-flight; result chip auto-clears after 8s.
Username sourced from AuthenticationStateProvider.
Reconnect resolves to "ForceReconnect" (re-enter Reconnecting,
not full stop+respawn) — transport drops and retries while actor
and in-memory state are preserved. All DriverInstanceActor states
handle ForceReconnect safely (no-op when already in transition).
- IDriverHealthPublisher in Core.Abstractions + NullDriverHealthPublisher
no-op for tests/dev-stub paths.
- AkkaDriverHealthPublisher in Runtime forwards to the cluster-wide
`driver-health` DPS topic.
- DriverInstanceActor instrumented to publish snapshots on every
observable state change + a periodic 30s heartbeat so the AdminUI
snapshot store warms up for newly-joined SignalR clients.
- Sliding 5-minute Faulted-count tracked per actor via Queue<DateTime>.
- DriverHostActor.SpawnChild threads clusterId (_localNode.Value) and
the health publisher down to every DriverInstanceActor child.
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers
AkkaDriverHealthPublisher as IDriverHealthPublisher singleton.
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
Closes the gap where Tag rows with EquipmentId=NULL + Namespace.Kind=SystemPlatform
(Galaxy hierarchy) existed in ConfigDb but were never surfaced in the OPC UA
address space. Now they materialise as Variable nodes under a folder named for
their FolderPath, browseable through any OPC UA client.
Layers touched:
- IOpcUaAddressSpaceSink: new EnsureVariable(nodeId, parentFolderId, displayName,
dataType) signature on the sink interface, NullSink, DeferredSink, SdkSink.
- OtOpcUaNodeManager.EnsureVariable: creates a BaseDataVariableState parented
under the named folder (or root), initial Value=null +
StatusCode=BadWaitingForInitialData; resolves Tag.DataType strings to the
matching OPC UA built-in NodeId. Idempotent.
- Phase7CompositionResult: new GalaxyTags collection of GalaxyTagPlan records
carrying (TagId, DriverInstanceId, FolderPath, DisplayName, DataType,
MxAccessRef). Constructor overloads keep existing call sites compiling.
- Phase7Composer.Compose: now takes Tag + Namespace inputs, filters for
SystemPlatform-namespace tags with EquipmentId=NULL, emits GalaxyTagPlan
rows with MXAccess ref "FolderPath.Name".
- Phase7Plan: new AddedGalaxyTags / RemovedGalaxyTags / ChangedGalaxyTags
collections + GalaxyTagDelta record; IsEmpty + needsRebuild updated.
- Phase7Planner.Compute: diffs GalaxyTags by TagId via existing DiffById helper.
- DeploymentArtifact.ParseComposition: reads the Tags + Namespaces +
DriverInstances arrays the ConfigComposer already emits, applies the same
SystemPlatform filter, returns the same GalaxyTagPlan list as the composer
so artifact-side and compose-side plans agree.
- Phase7Applier: new MaterialiseGalaxyTags pass that ensures one folder per
distinct FolderPath then one Variable per tag. NodeId for the variable is
"<FolderPath>.<Name>" matching the MXAccess ref so the future Galaxy
SubscribeBulk wiring can address them directly.
- OpcUaPublishActor.RebuildAddressSpace: invokes MaterialiseGalaxyTags after
MaterialiseHierarchy. _lastApplied initialiser updated for the new ctor.
- seed-clusters.sql: pre-existing TestMachine_001.TestAlarm001..003 rows
needed no change — the composer/applier now picks them up automatically.
Verified end-to-end via docker-dev: deploy click → driver-a logs
"Phase7Applier: Galaxy tags materialised (tags=3, folders=1)" → OPC UA Client
CLI browses the three Variable nodes under TestMachine_001 folder. Reads
return BadWaitingForInitialData status (expected — Galaxy driver's
SubscribeBulk wiring to push values into the nodes is the remaining
follow-up).
User confirmed the mxaccessgw client (Galaxy driver) doesn't need Windows
— only the gateway worker has that constraint. This wires the Galaxy
driver into the docker-dev fleet:
- docker-compose.yml: GALAXY_MXGW_API_KEY env var on every host service
(admin nodes harmlessly ignore it; driver-role nodes pick it up when
the seeded DriverInstance resolves ApiKeySecretRef=env:GALAXY_MXGW_API_KEY).
Default value matches the key the operator provided; override via shell
env (GALAXY_MXGW_API_KEY=... docker compose up -d) to rotate without
editing compose.
- seed-clusters.sql: now creates a SystemPlatform Namespace
(MAIN-galaxy, urn:zb:docker-dev:galaxy) plus a GalaxyMxGateway
DriverInstance (MAIN-galaxy-mxgw) in the MAIN cluster pointing at
http://10.100.0.48:5120 with UseTls=false. Idempotent via IF NOT EXISTS.
- DriverInstanceActor.ShouldStub: clarified the doc comment — only the
legacy "Galaxy" type name and "Historian.Wonderware" are Windows-only;
the v2 "GalaxyMxGateway" driver is .NET 10 cross-platform (gRPC to an
external gateway) and is NOT stubbed.
- README: documents the final operator step — sign in, click "Deploy
current configuration" on /deployments to materialise the seeded
Galaxy driver into a running gRPC connection. Raw DriverInstance rows
don't spawn drivers on their own; the v2 lifecycle requires a sealed
Deployment first.
Phase7Composer now carries UnsAreaProjection + UnsLineProjection lists so
the applier can materialise the full UNS topology in the OPC UA address
space. New IOpcUaAddressSpaceSink.EnsureFolder(folderNodeId, parentNodeId,
displayName) seam (no-op default, recorded in tests, forwarded by
DeferredAddressSpaceSink, implemented by SdkAddressSpaceSink). The SDK-
side OtOpcUaNodeManager gains an EnsureFolder API that creates
FolderState nodes with proper parent linkage; RebuildAddressSpace now
clears folders too so re-applies don't accumulate stale topology.
Phase7Applier.MaterialiseHierarchy walks composition.UnsAreas →
composition.UnsLines → composition.EquipmentNodes, calling EnsureFolder
with the correct parent at each level. Idempotent — calling twice with
the same composition is a no-op. OpcUaPublishActor.HandleRebuild invokes
it after Phase7Applier.Apply so OPC UA clients browsing the server now
see Area/Line/Equipment as proper folders rather than flat tag ids.
DeploymentArtifact.ParseComposition reads UnsAreas + UnsLines from the
JSON snapshot the ControlPlane emits, populating the new fields when
present.
Phase7Composer.Compose now accepts UnsAreas + UnsLines; a 3-arg overload
preserves the old signature for legacy callers + existing tests. The
Phase7CompositionResult convenience ctor likewise keeps the planner
tests working without UNS data.
3 new hierarchy tests (pure unit + boot-verify against a real
OtOpcUaSdkServer); OpcUaServer suite is 48/48 green (was 45, +3),
Runtime 74/74 unchanged.
Closes#85.
OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter
+ ActivitySource so all instrumentation points emit through a single
named surface. Counters cover the hot paths:
otopcua.deploy.applied (outcome=ack|reject)
otopcua.deploy.apply.duration (s, histogram)
otopcua.driver.lifecycle (event=spawn|spawn_stub|stop|fault)
otopcua.virtualtag.eval (outcome=ok|fail|skip)
otopcua.scriptedalarm.transition (state=activated|acknowledged|cleared)
otopcua.opcua.sink.write (kind=value|alarm|rebuild)
otopcua.redundancy.service_level_change (level=byte)
Plus two ActivitySource spans:
otopcua.deploy.apply wraps DriverHostActor.ApplyAndAck
otopcua.opcua.address_space_rebuild wraps OpcUaPublishActor.HandleRebuild
Instruments are no-op until a listener attaches, so tests + dev hosts
pay nothing for unread telemetry.
Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter
+ ActivitySource to OpenTelemetry, attaches a Prometheus exporter) and
MapOtOpcUaMetrics() (mounts /metrics scrape endpoint). Driver-side
internals + ASP.NET request metrics deliberately stay off — the scrape
payload is scoped to OtOpcUa signals only.
Tests use MeterListener + ActivityListener to verify
VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and
RebuildAddressSpace actually emit on the central instruments. Runtime
suite is 72 / 72 green (+3).
Closes#105. Path A (F13b/c/d) complete; next batch options: #85 UNS
folder hierarchy in SDK, or F8b/F9b production engine bindings.
Wires the OPC UA SDK into the fused Host's lifecycle on driver-role
nodes + spawns OpcUaPublishActor with the proper sink/publisher/dbFactory/
applier resolution. The full read+write data path is now live in
production: Deploy → DriverHost → OpcUaPublish → SDK NodeManager →
subscribed OPC UA clients.
DeferredAddressSpaceSink (Commons.OpcUa):
- Thread-safe wrapper IOpcUaAddressSpaceSink that delegates to an
inner sink swapped in at runtime. Needed because Akka actors
resolve the sink at construction time, but the production sink
(SdkAddressSpaceSink wrapping OtOpcUaNodeManager) only exists
after the SDK StandardServer has started.
- Defaults to NullOpcUaAddressSpaceSink so calls before swap are
safe; SetSink(null) reverts (for graceful shutdown).
OtOpcUaServerHostedService (Host.OpcUa):
- IHostedService that owns the OPC UA SDK lifecycle. Reads
OpcUaApplicationHostOptions from the 'OpcUa' config section,
creates an OtOpcUaSdkServer, boots it through OpcUaApplicationHost,
then swaps a real SdkAddressSpaceSink into the DeferredAddressSpaceSink
singleton.
- SDK boot failure is logged + non-fatal — the rest of the host
(admin UI, driver actors) keeps running. Stop reverts to null sink.
WithOtOpcUaRuntimeActors (Runtime):
- Now spawns OpcUaPublishActor (new actor) + threads its ActorRef
into DriverHostActor's Props so successful applies trigger the
address-space rebuild pipeline.
- Phase7Applier is constructed here from the resolved sink + a
logger; OpcUaPublishActor takes both.
- Prepends the opcua-synchronized-dispatcher HOCON so the extension
is self-contained — consumers (Host, tests) don't need to redeclare
the dispatcher block.
- New OpcUaPublishActorKey + OpcUaPublishActorName for actor-registry
resolution.
- AddOtOpcUaRuntime now also TryAddSingleton's NullOpcUaAddressSpaceSink
+ NullServiceLevelPublisher so admin-only nodes (or tests that
don't bind the Deferred sink) stay safe.
Host.Program.cs (driver-role only):
- Binds DeferredAddressSpaceSink as singleton + as IOpcUaAddressSpaceSink
- AddHostedService<OtOpcUaServerHostedService>()
Tests: OpcUaServer 24 -> 28 (+4 DeferredAddressSpaceSink unit tests),
Runtime 69 -> 69 (existing ServiceCollectionExtensionsTests extended
to verify the new mux + publish actor registration).
All 6 v2 test suites green: 177 tests passing.
Closes#108. Engine-wiring is now production-bound end-to-end on
driver-role nodes — Deploy reaches real OPC UA Variable nodes that
subscribed clients see.
Closes the loop between F10b (SDK NodeManager) and F14 (Phase7Plan +
Phase7Applier). DriverHostActor's successful apply now triggers a
RebuildAddressSpace on the publish actor, which loads the latest
deployment artifact + walks composer → planner → applier through the
sink. The OPC UA address space tracks the deployed composition.
DeploymentArtifact:
- New ParseComposition(blob) → Phase7CompositionResult that decodes
Equipment + DriverInstance + ScriptedAlarm arrays into the
projection records Phase7Planner consumes. Pascal-case property
names mirror ConfigComposer.SnapshotAndFlattenAsync's output.
- Each entity reader is tolerant: missing-id rows are dropped,
natural-key sort matches Phase7Composer's contract.
OpcUaPublishActor:
- New Props params: dbFactory + applier. When wired, RebuildAddressSpace
does:
1. LoadLatestArtifact (most recent Sealed Deployment.ArtifactBlob)
2. ParseComposition → Phase7CompositionResult
3. Phase7Planner.Compute(lastApplied, next) → Phase7Plan
4. Empty plan ⇒ no-op (deploy of unchanged composition is benign)
5. applier.Apply(plan) drives sink.RebuildAddressSpace +
WriteAlarmState for removed nodes
6. lastApplied = next so the next rebuild diffs forward
- Without dbFactory/applier wiring, falls back to raw
sink.RebuildAddressSpace — the dev/Mac path before #108 binds prod.
DriverHostActor:
- New Props param opcUaPublishActor (IActorRef?). After successful
ApplyAndAck (status Applied, ACK sent), tells the publish actor
RebuildAddressSpace with the same correlation id so the audit trail
threads through. Null publish actor ⇒ no trigger (admin-only nodes).
Tests: Runtime 63 -> 69 (+6):
- ParseComposition reads Equipment/Driver/Alarm sorted by natural key
- ParseComposition returns empty for empty blob
- Rebuild with dbFactory + sealed deployment artifact triggers exactly
one sink.Rebuild call (Equipment topology added)
- Rebuild with no artifact is idempotent no-op
- Second rebuild with same composition is empty-plan no-op
- Rebuild without dbFactory falls back to raw sink.Rebuild (legacy path)
All 6 v2 test suites green: 173 tests passing.
Closes#109. Engine-wiring data flow is now end-to-end through:
Deploy → DriverHostActor.ApplyAndAck → driver spawn + ACK +
RebuildAddressSpace → OpcUaPublishActor → Phase7Applier → SDK
NodeManager → subscribed OPC UA clients see the change.
End-to-end data path is now wired on the read side: driver subscriptions
fire AttributeValuePublished → DriverHostActor → DependencyMuxActor →
DependencyValueChanged to every interested VirtualTagActor. Previously
the publish hit a dead-letter at the host.
DependencyMuxActor:
- Per-node fan-out router. Maintains tagRef → Set<IActorRef> with a
reverse subscriber → refs index so unregister/replace are O(refs).
- Watches subscribers; Terminated triggers automatic unregister so
dead virtual-tag actors stop receiving publishes.
- Re-register replaces the prior interest set — no stale-ref leaks
on actor restart.
- Drops publishes for refs with no interested subscribers.
VirtualTagActor:
- New Props params: dependencyRefs + mux ActorRef.
- PreStart sends RegisterInterest to the mux; PostStop sends
UnregisterInterest. Default both null so older callers stay quiet.
DriverHostActor:
- New dependencyMux Props param. Steady + Applying states now
receive AttributeValuePublished from their DriverInstance children
and forward to the mux. Null mux is a no-op (dev/Mac).
ServiceCollectionExtensions:
- WithOtOpcUaRuntimeActors spawns DependencyMuxActor before
DriverHostActor and threads its ActorRef into the host's Props.
New DependencyMuxActorKey + DependencyMuxActorName.
Tests: Runtime 57 -> 63 (+6):
- Mux forwards to only subscribers interested in each ref
- Publish for unregistered ref is dropped silently
- Unregister stops forwarding
- Re-register replaces prior interest set
- VirtualTagActor PreStart registration drives end-to-end eval
(uses AwaitAssert to race-safely settle the PreStart Tell)
- DriverHostActor forwards AttributeValuePublished through to mux
All 6 v2 test suites green: 163 tests passing.
F8 (#79) state updated — dep subscribe seam shipped, Core.VirtualTags
production engine binding (compile + ITagUpstreamSource subscribe) is
the residual.
ScriptedAlarmActor now survives actor restart: PreStart loads from
the configured store + restores in-memory state; every Transition()
fires a fire-and-forget save. ActiveState still re-derives from the
evaluator on first tick (Phase 7 decision #14), but Acked state +
lastAckUser persist verbatim so operators don't re-ack across an
outage.
Three pieces:
- IAlarmActorStateStore seam in Commons.Engines, with the
AlarmActorStateSnapshot record (alarmId / state / lastTransitionUtc
/ lastAckUser) and NullAlarmActorStateStore default.
- EfAlarmActorStateStore in Runtime.ScriptedAlarms — production
adapter over the existing ScriptedAlarmState table in ConfigDb.
Maps the actor's 3-state enum to the table's AckedState column
(Active⇒Unacknowledged, Acknowledged⇒Acknowledged, Inactive⇒
Acknowledged). Concurrency conflicts are logged + dropped — the
next transition writes again.
- ScriptedAlarmActor PreStart load (async, piped back as
StateRestored) + Transition save. New Props overload takes the
store; default is NullAlarmActorStateStore so tests stay quiet.
Tests: Runtime 52 -> 57 (+5):
- Transition writes Active then Acknowledged snapshots with
lastAckUser populated
- PreStart with persisted Active state restores so a subsequent
AcknowledgeAlarm fires (not ignored as it would be from Inactive)
- Empty store boots Inactive (AcknowledgeAlarm correctly ignored)
- EfAlarmActorStateStore Save + Load round-trips via in-memory EF
- Load for unknown alarmId returns null
All 6 v2 test suites green: 157 tests passing.
Closes#112. F9 (#80) remaining residual is predicate binding to
Core.ScriptedAlarms.ScriptedAlarmEngine — split as F9b in tasks JSON.
Three pieces landed in one batch, closing F7-residual + Host DI #106:
Runtime/DriverInstanceActor:
- Subscribe / Unsubscribe message contracts; the Connected state
handles them via IDriver.ISubscribable. On every OnDataChange
event the actor publishes AttributeValuePublished to its parent
(DriverHostActor → OpcUaPublishActor). OPC UA StatusCode is
mapped to the 3-state OpcUaQuality enum via severity bits
(00=Good, 01=Uncertain, 10/11=Bad).
- DetachSubscription tears the handler off the driver on
DisconnectObserved, Unsubscribe, and PostStop so a stale handler
never pushes to a dead actor.
- WriteAttribute now dispatches IWritable.WriteAsync (batch of one)
with a 5s CancellationTokenSource; status-code propagated to
WriteAttributeResult on non-Good results.
Host:
- New ProjectReferences to Core + every cross-platform driver
assembly (AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT).
Galaxy is net10 (gRPC client to mxaccessgw); the COM-bound net48
Wonderware Historian driver stays out of the Host's reference
closure — its .Client gRPC wrapper is what binds for historian
needs.
- New DriverFactoryBootstrap.AddOtOpcUaDriverFactories() registers
a singleton DriverFactoryRegistry, invokes each driver's
Register(registry, loggerFactory), and binds IDriverFactory to
DriverFactoryRegistryAdapter. Replaces the F7 NullDriverFactory
default so deploys actually materialise real IDriver instances
on driver-role nodes. ShouldStub() still gates per-platform
behaviour at spawn time.
- Program.cs wires AddOtOpcUaDriverFactories() before AddAkka so
the runtime extension can resolve IDriverFactory from DI.
Tests: Runtime 46 -> 52 (+6):
- Write returns success when StatusCode = Good
- Write propagates non-Good status code in failure Reason
- Subscribe forwards OnDataChange to parent as AttributeValuePublished
- Quality translation: Uncertain (0x40...) and Bad (0x80...)
- Subscribe against non-ISubscribable returns failure
- DisconnectObserved detaches handler so late events are dropped
All 6 v2 test suites green: 152 tests passing.
Closes F7. F7-residual sub-tasks #110 (subscribe) and #111 (write)
both shipped. Host DI binding #106 shipped.
OpcUaPublishActor now routes through pluggable seams instead of just
incrementing a counter:
- IOpcUaAddressSpaceSink (Commons.OpcUa) — WriteValue / WriteAlarmState
/ RebuildAddressSpace. OpcUaQuality enum moved here from the actor's
nested type so producers don't have to reference the actor itself.
- IServiceLevelPublisher — Publish(byte). NullServiceLevelPublisher
retains the last level for inspection.
- The actor subscribes to the redundancy-state DPS topic in PreStart
and maps the local node's NodeRedundancyState to a coarse
ServiceLevel (Primary+leader=240, Primary=200, Secondary=100,
Detached=0). This keeps the local SDK's ServiceLevel node honest
without round-tripping back through the admin-singleton calculator.
- ServiceLevelChanged dedupes identical levels so the SDK doesn't see
redundant writes.
- Sink + publisher exceptions are caught and logged; the actor never
crashes its own dispatcher.
- PropsForTests gets optional sink/publisher/localNode params and
skips the DPS subscribe so unit tests stay on a vanilla TestKit
cluster.
Production binding to a real SDK NodeManager + Variable nodes is the
remaining residual — split as F10b. Task 60 still blocked on F10b.
Tests: Runtime 40 -> 46 (+6):
- AttributeValueUpdate routes to sink
- AlarmStateUpdate routes to sink
- RebuildAddressSpace calls sink.Rebuild
- ServiceLevelChanged dedupes
- RedundancyStateChanged for primary-leader publishes 240
- RedundancyStateChanged for secondary publishes 100
All 6 v2 test suites green: 132 tests passing.
VirtualTagActor and ScriptedAlarmActor now route through pluggable
evaluator interfaces and fan out to the cluster's live-tail topics
shipped in F15.3:
- IVirtualTagEvaluator + NullVirtualTagEvaluator in Commons.Engines.
VirtualTagActor calls evaluator on every DependencyValueChanged,
dedupes unchanged values, forwards EvaluationResult to its parent,
and publishes ScriptLogEntry Warning to the script-logs DPS topic
whenever the evaluator fails.
- IScriptedAlarmEvaluator + NullScriptedAlarmEvaluator. ScriptedAlarmActor
takes an AlarmConfig (id/name/equipment-path/severity/predicate) and
publishes both an AlarmTransitionEvent (alerts topic) and a
ScriptLogEntry (script-logs topic) at every transition. Manual
ConditionMet/Acknowledge/Cleared still flow through the same
Transition() so callers without engine bindings still drive the
state machine; the legacy single-string Props() overload routes
through a default AlarmConfig.
The Null* defaults keep the actors safe when no engine is bound —
unconfigured nodes never spuriously alarm. Production binding to
Core.VirtualTags.VirtualTagEngine and Core.ScriptedAlarms is the
remaining residual (F8b/F9b — split in tasks JSON).
Tests: Runtime 34 -> 40 (+6):
- VirtualTagActorTests x3 (evaluator drives EvaluationResult,
unchanged-value dedup, failure publishes Warning ScriptLogEntry)
- ScriptedAlarmActorTests x3 (engine threshold drives Activated +
Cleared on alerts topic, manual Acknowledge attribution).
All 6 v2 test suites green: 126 tests passing.
DriverHostActor.ApplyAndAck now reads the deployment artifact and
reconciles its set of DriverInstanceActor children — spawn the missing,
ApplyDelta to those with changed config, stop the removed/disabled.
The diff lives in pure DriverSpawnPlanner so it can be unit-tested
without an ActorSystem.
Adds IDriverFactory in Core.Abstractions (consumed by Runtime) +
DriverFactoryRegistryAdapter in Core.Hosting that wraps the existing
v1 DriverFactoryRegistry — Runtime stays decoupled from Polly/Serilog,
the Host wires the adapter once driver assemblies have registered.
ShouldStub(type, roles) is now actually called on every spawn — Galaxy
+ Wonderware-Historian boot stubbed on macOS/Linux or whenever the host
carries the dev role. Missing factory ⇒ stub fallback, never a crash.
Tests: 24 → 34 in Runtime (+10):
- DriverSpawnPlannerTests x7 (diff cases, type change ⇒ stop+respawn)
- DeploymentArtifactTests x5 (empty/malformed/missing fields tolerant)
- DriverHostActorReconcileTests x4 (spawn count, stub fallback,
ShouldStub gate, second-apply stops the removed)
All 6 v2 test suites green: 120 tests passing.
Closes F20 (ShouldStub wired). F7 marked partial — subscription
publishing + write path still stubbed in DriverInstanceActor itself.
Reshapes the placeholder buffered-counter actor into a thin fire-and-forget
bridge over the existing IAlarmHistorianSink contract. Default sink is
NullAlarmHistorianSink; production deployments override the DI binding to
SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the v1
components in src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware*
are reused verbatim — actor is just a mailbox-friendly entry point).
- HistorianAdapterActor.Props(IAlarmHistorianSink?) — null defaults to NullAlarmHistorianSink
- Receive<AlarmHistorianEvent>: fire-and-forget sink.EnqueueAsync
- Receive<GetStatus>: returns sink.GetStatus() (queue depth + drain state)
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers the default sink
- WithOtOpcUaRuntimeActors spawns the actor + registers HistorianAdapterActorKey
- Program.cs calls AddOtOpcUaRuntime when hasDriver
Tests: 2 new (forward-to-sink + GetStatus). Runtime suite 17 → 18.
- New Commons.Messages.Fleet.GetDiagnostics request record.
- DriverHostActor handles GetDiagnostics in all three states (Steady, Applying,
Stale); replies with a NodeDiagnosticsSnapshot built from _currentRevision
+ the local NodeId. Drivers list is empty until F7 wires the per-instance
children.
- FleetDiagnosticsClient now resolves the target via ActorSelection at
akka.tcp://{system}@{nodeId}/user/driver-host and Asks with a 3s timeout.
On timeout/peer-down it returns an empty snapshot so the UI degrades
gracefully rather than throwing.
Two new integration tests in Host.IntegrationTests:
- GetDiagnostics_returns_snapshot_with_target_NodeId verifies the
cross-node Ask/Reply works.
- GetDiagnostics_after_deploy_reports_current_revision exercises the
end-to-end path: AdminOps starts a deployment, both DriverHostActors
apply, then diagnostics reports the new revision on both nodes.
All 98 v2 tests pass (was 96 + 2 new).
Replaces the Ok=true stub with a TCP connect to the peer's OPC UA port (4840
default) with a 2s timeout. A successful connect indicates the OPC UA server
process is up + accepting connections — enough for the redundancy calculator
to treat the peer as live. A full secure-channel Hello/Acknowledge handshake
is overkill for what the redundancy calc consumes and would pull in the OPC
UA Client SDK + a PKI setup. Upgrade later if a deeper liveness signal is ever
required.
Probe extracts the host from NodeId by stripping the :port suffix (commit
5cfbe8b encoded host:port into NodeId for cluster-member identity).
Tests: 2 new tests — Ok=true against a live TcpListener on a chosen port,
Ok=false against an unreachable endpoint. All 17 Runtime tests pass (was 16
covering only the message-contract surface).
DeployHappyPathTests exercises the full deploy pipeline on the 2-node harness:
AdminOperationsActor → ConfigPublishCoordinator → DistributedPubSub →
DriverHostActor on both nodes → ApplyAck → coordinator seals. Verifies both
NodeDeploymentState rows reach Applied and Deployment.Status reaches Sealed.
Exposed + fixed two production bugs along the way:
1. Coordinator was publishing DispatchDeployment on the "deployments" topic but
never subscribed to anything — DriverHostActor ACKs published on the same
topic could not reach it. Added dedicated "deployment-acks" topic with
coordinator subscription in PreStart, and DriverHostActor publishes ACKs
there.
2. NodeId derivation used member.Address.Host only — two cluster members on a
shared loopback host (test harness, dev VMs) collided to one identity. The
coordinator's expected-ack set became {1} and the system sealed after only
half the nodes acked. Switched to host:port everywhere (ClusterRoleInfo +
coordinator) so loopback nodes stay distinct and production identities are
harmlessly more specific.
Tests: 95 v2 tests pass (was 93 + 2 deploy tests), 0 skipped.
Failover scenarios (design §8 cases 3-7: node-kill-mid-apply, split-brain,
restart-during-deploy) deferred — they need controlled node-down primitives
on the harness. Tracked as F22 (failover scenario test cases).
Mirrors WithOtOpcUaControlPlaneSingletons for the driver role. Spawns
DriverHostActor + DbHealthProbeActor on the host's ActorSystem and
registers both under marker keys. Host's Program.cs now calls it when
the node carries the driver role, so driver-only and admin+driver
deployments both auto-bootstrap the per-node actors.
Integration test covers the registration round-trip via Microsoft.Extensions.Hosting
+ Akka.Hosting AddAkka.