ScriptedAlarmActor now survives actor restart: PreStart loads from
the configured store + restores in-memory state; every Transition()
fires a fire-and-forget save. ActiveState still re-derives from the
evaluator on first tick (Phase 7 decision #14), but Acked state +
lastAckUser persist verbatim so operators don't re-ack across an
outage.
Three pieces:
- IAlarmActorStateStore seam in Commons.Engines, with the
AlarmActorStateSnapshot record (alarmId / state / lastTransitionUtc
/ lastAckUser) and NullAlarmActorStateStore default.
- EfAlarmActorStateStore in Runtime.ScriptedAlarms — production
adapter over the existing ScriptedAlarmState table in ConfigDb.
Maps the actor's 3-state enum to the table's AckedState column
(Active⇒Unacknowledged, Acknowledged⇒Acknowledged, Inactive⇒
Acknowledged). Concurrency conflicts are logged + dropped — the
next transition writes again.
- ScriptedAlarmActor PreStart load (async, piped back as
StateRestored) + Transition save. New Props overload takes the
store; default is NullAlarmActorStateStore so tests stay quiet.
Tests: Runtime 52 -> 57 (+5):
- Transition writes Active then Acknowledged snapshots with
lastAckUser populated
- PreStart with persisted Active state restores so a subsequent
AcknowledgeAlarm fires (not ignored as it would be from Inactive)
- Empty store boots Inactive (AcknowledgeAlarm correctly ignored)
- EfAlarmActorStateStore Save + Load round-trips via in-memory EF
- Load for unknown alarmId returns null
All 6 v2 test suites green: 157 tests passing.
Closes#112. F9 (#80) remaining residual is predicate binding to
Core.ScriptedAlarms.ScriptedAlarmEngine — split as F9b in tasks JSON.
Three pieces landed in one batch, closing F7-residual + Host DI #106:
Runtime/DriverInstanceActor:
- Subscribe / Unsubscribe message contracts; the Connected state
handles them via IDriver.ISubscribable. On every OnDataChange
event the actor publishes AttributeValuePublished to its parent
(DriverHostActor → OpcUaPublishActor). OPC UA StatusCode is
mapped to the 3-state OpcUaQuality enum via severity bits
(00=Good, 01=Uncertain, 10/11=Bad).
- DetachSubscription tears the handler off the driver on
DisconnectObserved, Unsubscribe, and PostStop so a stale handler
never pushes to a dead actor.
- WriteAttribute now dispatches IWritable.WriteAsync (batch of one)
with a 5s CancellationTokenSource; status-code propagated to
WriteAttributeResult on non-Good results.
Host:
- New ProjectReferences to Core + every cross-platform driver
assembly (AbCip/AbLegacy/FOCAS/Galaxy/Modbus/S7/TwinCAT).
Galaxy is net10 (gRPC client to mxaccessgw); the COM-bound net48
Wonderware Historian driver stays out of the Host's reference
closure — its .Client gRPC wrapper is what binds for historian
needs.
- New DriverFactoryBootstrap.AddOtOpcUaDriverFactories() registers
a singleton DriverFactoryRegistry, invokes each driver's
Register(registry, loggerFactory), and binds IDriverFactory to
DriverFactoryRegistryAdapter. Replaces the F7 NullDriverFactory
default so deploys actually materialise real IDriver instances
on driver-role nodes. ShouldStub() still gates per-platform
behaviour at spawn time.
- Program.cs wires AddOtOpcUaDriverFactories() before AddAkka so
the runtime extension can resolve IDriverFactory from DI.
Tests: Runtime 46 -> 52 (+6):
- Write returns success when StatusCode = Good
- Write propagates non-Good status code in failure Reason
- Subscribe forwards OnDataChange to parent as AttributeValuePublished
- Quality translation: Uncertain (0x40...) and Bad (0x80...)
- Subscribe against non-ISubscribable returns failure
- DisconnectObserved detaches handler so late events are dropped
All 6 v2 test suites green: 152 tests passing.
Closes F7. F7-residual sub-tasks #110 (subscribe) and #111 (write)
both shipped. Host DI binding #106 shipped.
Splits the side-effecting half of Phase7Composer (deferred at Task 47)
into two pieces that mirror DriverHostActor's spawn-plan pattern:
Phase7Plan + Phase7Planner.Compute (pure):
Diff two Phase7CompositionResult snapshots by stable id (EquipmentId,
DriverInstanceId, ScriptedAlarmId). Emits Added/Removed/Changed lists
per entity class. Added/Removed are sorted by id for deterministic
apply order. Changed wraps both Previous and Current projections so
consumers can decide between in-place mutation and tear-down +
rebuild.
Phase7Applier (side-effecting):
Drives an IOpcUaAddressSpaceSink against a plan. Removed equipment/
alarms get an inactive AlarmState write per id; Added/Removed of
Equipment or ScriptedAlarm triggers RebuildAddressSpace. Driver-only
changes correctly skip the rebuild — those flow through DriverHost-
Actor's spawn-plan in Runtime. Sink exceptions are caught + logged so
one bad node doesn't abort the apply.
Tests: OpcUaServer 6 -> 20 (+14):
- Phase7PlannerTests x9 (empty-in/empty-out, add/remove/change per
entity class, mixed changes, deterministic ordering)
- Phase7ApplierTests x5 (empty plan no-op, removal writes inactive
states + rebuild, added equipment triggers rebuild, driver-only
skips rebuild, sink fault is non-fatal)
The remaining piece is the EquipmentNodeWalker integration against a
real SDK NodeManager — split as F14b, gated on F10b's SDK builder.
All 6 v2 test suites green: 146 tests passing.
OpcUaPublishActor now routes through pluggable seams instead of just
incrementing a counter:
- IOpcUaAddressSpaceSink (Commons.OpcUa) — WriteValue / WriteAlarmState
/ RebuildAddressSpace. OpcUaQuality enum moved here from the actor's
nested type so producers don't have to reference the actor itself.
- IServiceLevelPublisher — Publish(byte). NullServiceLevelPublisher
retains the last level for inspection.
- The actor subscribes to the redundancy-state DPS topic in PreStart
and maps the local node's NodeRedundancyState to a coarse
ServiceLevel (Primary+leader=240, Primary=200, Secondary=100,
Detached=0). This keeps the local SDK's ServiceLevel node honest
without round-tripping back through the admin-singleton calculator.
- ServiceLevelChanged dedupes identical levels so the SDK doesn't see
redundant writes.
- Sink + publisher exceptions are caught and logged; the actor never
crashes its own dispatcher.
- PropsForTests gets optional sink/publisher/localNode params and
skips the DPS subscribe so unit tests stay on a vanilla TestKit
cluster.
Production binding to a real SDK NodeManager + Variable nodes is the
remaining residual — split as F10b. Task 60 still blocked on F10b.
Tests: Runtime 40 -> 46 (+6):
- AttributeValueUpdate routes to sink
- AlarmStateUpdate routes to sink
- RebuildAddressSpace calls sink.Rebuild
- ServiceLevelChanged dedupes
- RedundancyStateChanged for primary-leader publishes 240
- RedundancyStateChanged for secondary publishes 100
All 6 v2 test suites green: 132 tests passing.
VirtualTagActor and ScriptedAlarmActor now route through pluggable
evaluator interfaces and fan out to the cluster's live-tail topics
shipped in F15.3:
- IVirtualTagEvaluator + NullVirtualTagEvaluator in Commons.Engines.
VirtualTagActor calls evaluator on every DependencyValueChanged,
dedupes unchanged values, forwards EvaluationResult to its parent,
and publishes ScriptLogEntry Warning to the script-logs DPS topic
whenever the evaluator fails.
- IScriptedAlarmEvaluator + NullScriptedAlarmEvaluator. ScriptedAlarmActor
takes an AlarmConfig (id/name/equipment-path/severity/predicate) and
publishes both an AlarmTransitionEvent (alerts topic) and a
ScriptLogEntry (script-logs topic) at every transition. Manual
ConditionMet/Acknowledge/Cleared still flow through the same
Transition() so callers without engine bindings still drive the
state machine; the legacy single-string Props() overload routes
through a default AlarmConfig.
The Null* defaults keep the actors safe when no engine is bound —
unconfigured nodes never spuriously alarm. Production binding to
Core.VirtualTags.VirtualTagEngine and Core.ScriptedAlarms is the
remaining residual (F8b/F9b — split in tasks JSON).
Tests: Runtime 34 -> 40 (+6):
- VirtualTagActorTests x3 (evaluator drives EvaluationResult,
unchanged-value dedup, failure publishes Warning ScriptLogEntry)
- ScriptedAlarmActorTests x3 (engine threshold drives Activated +
Cleared on alerts topic, manual Acknowledge attribution).
All 6 v2 test suites green: 126 tests passing.
DriverHostActor.ApplyAndAck now reads the deployment artifact and
reconciles its set of DriverInstanceActor children — spawn the missing,
ApplyDelta to those with changed config, stop the removed/disabled.
The diff lives in pure DriverSpawnPlanner so it can be unit-tested
without an ActorSystem.
Adds IDriverFactory in Core.Abstractions (consumed by Runtime) +
DriverFactoryRegistryAdapter in Core.Hosting that wraps the existing
v1 DriverFactoryRegistry — Runtime stays decoupled from Polly/Serilog,
the Host wires the adapter once driver assemblies have registered.
ShouldStub(type, roles) is now actually called on every spawn — Galaxy
+ Wonderware-Historian boot stubbed on macOS/Linux or whenever the host
carries the dev role. Missing factory ⇒ stub fallback, never a crash.
Tests: 24 → 34 in Runtime (+10):
- DriverSpawnPlannerTests x7 (diff cases, type change ⇒ stop+respawn)
- DeploymentArtifactTests x5 (empty/malformed/missing fields tolerant)
- DriverHostActorReconcileTests x4 (spawn count, stub fallback,
ShouldStub gate, second-apply stops the removed)
All 6 v2 test suites green: 120 tests passing.
Closes F20 (ShouldStub wired). F7 marked partial — subscription
publishing + write path still stubbed in DriverInstanceActor itself.
Replaces the Ok=true stub with a TCP connect to the peer's OPC UA port (4840
default) with a 2s timeout. A successful connect indicates the OPC UA server
process is up + accepting connections — enough for the redundancy calculator
to treat the peer as live. A full secure-channel Hello/Acknowledge handshake
is overkill for what the redundancy calc consumes and would pull in the OPC
UA Client SDK + a PKI setup. Upgrade later if a deeper liveness signal is ever
required.
Probe extracts the host from NodeId by stripping the :port suffix (commit
5cfbe8b encoded host:port into NodeId for cluster-member identity).
Tests: 2 new tests — Ok=true against a live TcpListener on a chosen port,
Ok=false against an unreachable endpoint. All 17 Runtime tests pass (was 16
covering only the message-contract surface).
The original Task 14 (5-min EF migration that "drops ConfigGeneration") was
under-scoped: the design doc (live-edit model, ~line 208) requires removing
GenerationId from 13 entities (Equipment, DriverInstance, Device, Tag,
PollGroup, Namespace, UnsArea, UnsLine, NodeAcl, Script, VirtualTag,
ScriptedAlarm) and adding RowVersion columns for last-write-wins detection.
That cascades into GenerationApplier / GenerationDiff / GenerationSealedCache
and the legacy Server/Admin CRUD services.
New decomposition (~85 min total, replacing the original 5-min estimate):
14a standard 10m Add RowVersion to live-edit entities
14b high-risk 30m Drop GenerationId FK from those entities
14c high-risk 20m Obsolete GenerationApplier/Diff/SealedCache
14d standard 5m Drop ClusterNode.RedundancyRole
14e small 5m Delete ConfigGeneration + ClusterNodeGenerationState
14f high-risk 15m Consolidator: generate V2HostingAlignment migration
Policy decision (recorded with user): OtOpcUa.Server + OtOpcUa.Admin are
allowed to fail-to-compile between 14b and Task 56 - only the new v2 projects
need to stay green. Task 56 deletes the legacy projects.
Plan markdown: replaces the original Task 14 section with the 6-task
decomposition + a header explaining the rewrite. Task index table at the
bottom of the plan updated.
Tasks JSON: replaces the single Task 14 row with 6 string-id rows
("14a", "14b", ..., "14f"). Task 15 (Migrate-To-V2.ps1) and downstream
consumers re-pointed at "14f".
Verification step in 14f rewritten to use the shared docker host at
10.100.0.35 per CLAUDE.md (Docker is not installed on this Mac dev VM).