Filter ExternalIdReservations to WHERE ReleasedAt IS NULL so
DraftSnapshot.ActiveReservations matches its documented semantics and
ValidateReservationPreflight cannot emit spurious BadDuplicateExternalIdentifier
errors from already-released rows. Adds a focused unit test seeding one active
and one released reservation and asserting only the active row is returned.
Persist the canonical AuditOutcome and make structured audit rows visible.
- ConfigAuditLog gains a nullable Outcome column, stored as the AuditOutcome
enum member name (nvarchar(16), mirroring how AdminRole is persisted). The
AuditWriterActor flush now writes Outcome = evt.Outcome.ToString(). Nullable so
legacy rows and the bespoke stored-procedure path (no derived outcome) write
NULL.
- Migration 20260602135350_AddConfigAuditLogOutcome: additive nullable column,
no backfill. Up adds the column, Down drops it. Chains after
20260602112419_CanonicalizeAdminRoles; `dotnet ef migrations
has-pending-model-changes` is clean.
- ClusterAudit visibility fix: the page filtered solely on ClusterId, but the
structured AuditWriterActor path stamps NodeId (ClusterId null), so those rows
were invisible. Extracted ClusterAuditQuery.ForClusterAsync (shared by the page
and tests) which ORs in rows whose NodeId belongs to a node in the cluster —
membership resolved from ClusterNode (NodeId -> ClusterId). SP-path
ClusterId-stamped rows still match.
Tests: ControlPlane 45/45 (adds Outcome persistence + Denied-outcome asserts);
new Configuration ClusterAuditQueryTests 3/3 (both-paths visible, other-cluster
excluded, page-size cap); AdminUI 121/121. Configuration Unit suite is green on a
clean run (a pre-existing timing flake in ResilientConfigReaderTests, untouched
here, occasionally fails under parallel load and passes in isolation).
Deep-adopt the shared audit record. Deletes the bespoke 8-field positional
Commons AuditEvent and repoints the writer path at ZB.MOM.WW.Audit.AuditEvent
(0.1.0, feed-mapped via dohertj2-gitea). Adds the package reference to both
Commons and ControlPlane.
- AuditWriterActor now implements IAuditWriter: WriteAsync(evt, ct) is a
best-effort, never-throwing entry point that Self.Tell()s the event onto the
same batching/dedup/flush pipeline and returns Task.CompletedTask. Existing
Receive<AuditEvent> + 500/5s batching + two-layer dedup unchanged.
- Flush mapping updated for the canonical field types: OccurredAtUtc is now
DateTimeOffset (.UtcDateTime into the datetime2 column), SourceNode is string?
(was NodeId.Value), CorrelationId is Guid? (stored null when null). Outcome is
NOT yet persisted (column lands in Task 2.2).
- New AuditOutcomeMapper.FromAction maps the OtOpcUa action vocabulary to the
required canonical Outcome: OpcUaAccessDenied / CrossClusterNamespaceAttempt ->
Denied; config verbs (DraftCreated/Edited, Published, RolledBack, NodeApplied,
ClusterCreated, NodeAdded, CredentialAdded/Disabled, ExternalIdReleased) ->
Success. OtOpcUa emits no Failure events.
The Akka message shape changed, but the structured audit path is dormant (zero
production emit/Tell sites; all live audit flows through the bespoke SP path),
so there is no rolling-deploy wire-compat concern. Tested-not-exercised by
design.
ControlPlane.Tests: 44/44 green (AuditWriterActor suite rewritten to construct
the canonical record + assert the Outcome derivation table + the WriteAsync
best-effort/mailbox-routing contract + null SourceNode/CorrelationId handling).
Standardize the control-plane admin role VALUES on the canonical six
(ZB.MOM.WW.Auth CanonicalRole). OtOpcUa uses four:
ConfigViewer -> Viewer
ConfigEditor -> Designer
FleetAdmin -> Administrator
DriverOperator -> Operator (appsettings-only string role)
This is a rename, not a permission change: enforcement semantics are
preserved (whoever could deploy/administer/operate before still can).
- AdminRole enum members renamed (persisted as string names via
HasConversion<string>); RoleGrants.razor dropdown default updated.
- EF DATA migration CanonicalizeAdminRoles rewrites existing
LdapGroupRoleMapping.Role rows old->new (Up) and back (Down); schema /
model snapshot byte-identical (no pending model changes).
- Enforcement role STRINGS canonicalized:
* Security policies keep their NAMES ("DriverOperator"/"FleetAdmin")
but require canonical roles: RequireRole("Operator","Administrator")
and RequireRole("Administrator").
* Deployments.razor [Authorize(Roles="Administrator,Designer")].
* DevStub now grants "Administrator"; LdapOptions/doc-comment examples
canonicalized.
- Data-plane authorization (NodePermissions/NodeAcl/IPermissionEvaluator/
TriePermissionEvaluator/UserAuthorizationState) UNTOUCHED.
- New CanonicalAdminRolesTests pins canonical claim values end-to-end and
the real registered policies; existing role-string tests updated.
The driver/factory/seed use 'GalaxyMxGateway' (legacy 'Galaxy' was retired),
but the AdminUI editor router, GalaxyDriverPage, address picker, identity
dropdown, the Galaxy browser/probe, and DraftValidator still keyed on 'Galaxy'.
Result: the seeded GalaxyMxGateway driver couldn't be edited ('no editor
registered'), UI-created Galaxy drivers wrote a type with no factory, and a
SystemPlatform-bound GalaxyMxGateway driver failed publish validation.
Align all stragglers to GalaxyMxGateway (+ failing-test-first DraftValidator
coverage). ShouldStub's 'Galaxy' legacy safety-net left intact.
- Topic-name drift fix: DriverHealthChanged.TopicName and
DriverControlTopic.Name now live on the message contracts in
Commons. AkkaDriverHealthPublisher, DriverStatusSignalRBridge,
DriverHostActor, and AdminOperationsActor all delegate to the
single constant so a rename can't silently desynchronise
publisher and subscriber.
- DriverStatusPanel._opResultClearTimer switched from
System.Timers.Timer to System.Threading.Timer + awaited
DisposeAsync. Prevents an in-flight 8s clear-callback from
invoking StateHasChanged on a component whose hub has already
been released.
- PublishHealthSnapshot deduplicates against the last published
(state, lastSuccess, lastError, errorCount) fingerprint. The
30s heartbeat no longer floods the SignalR layer with identical
Healthy snapshots — newly-joined clients still warm up via the
snapshot store on JoinDriver.
- RestartDriver / ReconnectDriver messages + AdminOperationsActor
handlers (broadcast via driver-control DPS topic; audited via
ConfigEdits).
- DriverHostActor subscribes to driver-control; locates the
matching child DriverInstanceActor and stops+respawns it
(Restart) or sends it a ForceReconnect internal message
(Reconnect — re-enters Reconnecting state without full stop).
DriverInstanceSpec constructor call uses named args to handle
the full 6-parameter signature.
- New DriverOperator authorization policy mapped to DriverOperator
or FleetAdmin role; documented in docs/security.md. Map LDAP
group via GroupToRole (e.g. "ot-driver-operator": "DriverOperator").
- DriverStatusPanel renders Reconnect + Restart buttons when the
user holds the DriverOperator policy (hidden otherwise). Restart
requires an in-page Razor confirm block (no JS confirm, keeps
SignalR event loop unblocked). Both buttons show a spinner and
are disabled during in-flight; result chip auto-clears after 8s.
Username sourced from AuthenticationStateProvider.
Reconnect resolves to "ForceReconnect" (re-enter Reconnecting,
not full stop+respawn) — transport drops and retries while actor
and in-memory state are preserved. All DriverInstanceActor states
handle ForceReconnect safely (no-op when already in transition).
- AdminProbeService routes TestDriverConnect through
IAdminOperationsClient with a 65s outer guard (actor side already
clamps to [1,60]).
- Added generic AskAsync<T> to IAdminOperationsClient interface and
AdminOperationsClient impl, delegating straight to the Akka proxy.
- DriverTestConnectButton renders the button + inline result chip,
auto-clears after 30s, disables during in-flight.
- Wired into all 9 typed driver pages directly under the
identity section. Sources timeout from the form's
ProbeTimeoutSeconds; sources config JSON from the form's
current Options (operator can test BEFORE saving).
- IDriverProbe abstraction in Core.Abstractions; one impl per driver
type, resolved by DriverType string. Phase 7.3 + 7.4 add concrete
probes for the 9 supported driver types.
- TestDriverConnect / TestDriverConnectResult messages.
- AdminOperationsActor.HandleTestDriverConnectAsync looks up the probe
by DriverType, runs it with a [1,60]s clamped timeout, and returns
success/latency or failure/message. Probes that throw or time out
surface as soft failures.
- IDriverHealthPublisher in Core.Abstractions + NullDriverHealthPublisher
no-op for tests/dev-stub paths.
- AkkaDriverHealthPublisher in Runtime forwards to the cluster-wide
`driver-health` DPS topic.
- DriverInstanceActor instrumented to publish snapshots on every
observable state change + a periodic 30s heartbeat so the AdminUI
snapshot store warms up for newly-joined SignalR clients.
- Sliding 5-minute Faulted-count tracked per actor via Queue<DateTime>.
- DriverHostActor.SpawnChild threads clusterId (_localNode.Value) and
the health publisher down to every DriverInstanceActor child.
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers
AkkaDriverHealthPublisher as IDriverHealthPublisher singleton.
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
Closes the gap where Tag rows with EquipmentId=NULL + Namespace.Kind=SystemPlatform
(Galaxy hierarchy) existed in ConfigDb but were never surfaced in the OPC UA
address space. Now they materialise as Variable nodes under a folder named for
their FolderPath, browseable through any OPC UA client.
Layers touched:
- IOpcUaAddressSpaceSink: new EnsureVariable(nodeId, parentFolderId, displayName,
dataType) signature on the sink interface, NullSink, DeferredSink, SdkSink.
- OtOpcUaNodeManager.EnsureVariable: creates a BaseDataVariableState parented
under the named folder (or root), initial Value=null +
StatusCode=BadWaitingForInitialData; resolves Tag.DataType strings to the
matching OPC UA built-in NodeId. Idempotent.
- Phase7CompositionResult: new GalaxyTags collection of GalaxyTagPlan records
carrying (TagId, DriverInstanceId, FolderPath, DisplayName, DataType,
MxAccessRef). Constructor overloads keep existing call sites compiling.
- Phase7Composer.Compose: now takes Tag + Namespace inputs, filters for
SystemPlatform-namespace tags with EquipmentId=NULL, emits GalaxyTagPlan
rows with MXAccess ref "FolderPath.Name".
- Phase7Plan: new AddedGalaxyTags / RemovedGalaxyTags / ChangedGalaxyTags
collections + GalaxyTagDelta record; IsEmpty + needsRebuild updated.
- Phase7Planner.Compute: diffs GalaxyTags by TagId via existing DiffById helper.
- DeploymentArtifact.ParseComposition: reads the Tags + Namespaces +
DriverInstances arrays the ConfigComposer already emits, applies the same
SystemPlatform filter, returns the same GalaxyTagPlan list as the composer
so artifact-side and compose-side plans agree.
- Phase7Applier: new MaterialiseGalaxyTags pass that ensures one folder per
distinct FolderPath then one Variable per tag. NodeId for the variable is
"<FolderPath>.<Name>" matching the MXAccess ref so the future Galaxy
SubscribeBulk wiring can address them directly.
- OpcUaPublishActor.RebuildAddressSpace: invokes MaterialiseGalaxyTags after
MaterialiseHierarchy. _lastApplied initialiser updated for the new ctor.
- seed-clusters.sql: pre-existing TestMachine_001.TestAlarm001..003 rows
needed no change — the composer/applier now picks them up automatically.
Verified end-to-end via docker-dev: deploy click → driver-a logs
"Phase7Applier: Galaxy tags materialised (tags=3, folders=1)" → OPC UA Client
CLI browses the three Variable nodes under TestMachine_001 folder. Reads
return BadWaitingForInitialData status (expected — Galaxy driver's
SubscribeBulk wiring to push values into the nodes is the remaining
follow-up).
Phase7Composer now carries UnsAreaProjection + UnsLineProjection lists so
the applier can materialise the full UNS topology in the OPC UA address
space. New IOpcUaAddressSpaceSink.EnsureFolder(folderNodeId, parentNodeId,
displayName) seam (no-op default, recorded in tests, forwarded by
DeferredAddressSpaceSink, implemented by SdkAddressSpaceSink). The SDK-
side OtOpcUaNodeManager gains an EnsureFolder API that creates
FolderState nodes with proper parent linkage; RebuildAddressSpace now
clears folders too so re-applies don't accumulate stale topology.
Phase7Applier.MaterialiseHierarchy walks composition.UnsAreas →
composition.UnsLines → composition.EquipmentNodes, calling EnsureFolder
with the correct parent at each level. Idempotent — calling twice with
the same composition is a no-op. OpcUaPublishActor.HandleRebuild invokes
it after Phase7Applier.Apply so OPC UA clients browsing the server now
see Area/Line/Equipment as proper folders rather than flat tag ids.
DeploymentArtifact.ParseComposition reads UnsAreas + UnsLines from the
JSON snapshot the ControlPlane emits, populating the new fields when
present.
Phase7Composer.Compose now accepts UnsAreas + UnsLines; a 3-arg overload
preserves the old signature for legacy callers + existing tests. The
Phase7CompositionResult convenience ctor likewise keeps the planner
tests working without UNS data.
3 new hierarchy tests (pure unit + boot-verify against a real
OtOpcUaSdkServer); OpcUaServer suite is 48/48 green (was 45, +3),
Runtime 74/74 unchanged.
Closes#85.
SdkServiceLevelPublisher writes Server.ServiceLevel through the SDK's
ServerObjectState — the standard OPC UA non-transparent-redundancy signal
clients use to pick a primary. Writes are guarded by DiagnosticsLock so
concurrent SDK diagnostics scans don't fight with our updates.
DeferredServiceLevelPublisher mirrors the DeferredAddressSpaceSink late-
binding pattern: Akka actors resolve IServiceLevelPublisher at construction,
hosted service swaps the SDK publisher in after StandardServer.Start. Host
Program.cs registers DeferredServiceLevelPublisher as the singleton bound
to IServiceLevelPublisher; OtOpcUaServerHostedService gets it injected and
fills it once IServerInternal is available.
Tests boot a real StandardServer on a free port (cross-platform), call
Publish, then verify ServerObject.ServiceLevel.Value reflects the write.
5 new tests; OpcUaServer suite now 45/45 green (was 40, +5).
Closes#81 residual. Unblocks Task 60 (OPC UA dual-endpoint + ServiceLevel
tests).
OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter
+ ActivitySource so all instrumentation points emit through a single
named surface. Counters cover the hot paths:
otopcua.deploy.applied (outcome=ack|reject)
otopcua.deploy.apply.duration (s, histogram)
otopcua.driver.lifecycle (event=spawn|spawn_stub|stop|fault)
otopcua.virtualtag.eval (outcome=ok|fail|skip)
otopcua.scriptedalarm.transition (state=activated|acknowledged|cleared)
otopcua.opcua.sink.write (kind=value|alarm|rebuild)
otopcua.redundancy.service_level_change (level=byte)
Plus two ActivitySource spans:
otopcua.deploy.apply wraps DriverHostActor.ApplyAndAck
otopcua.opcua.address_space_rebuild wraps OpcUaPublishActor.HandleRebuild
Instruments are no-op until a listener attaches, so tests + dev hosts
pay nothing for unread telemetry.
Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter
+ ActivitySource to OpenTelemetry, attaches a Prometheus exporter) and
MapOtOpcUaMetrics() (mounts /metrics scrape endpoint). Driver-side
internals + ASP.NET request metrics deliberately stay off — the scrape
payload is scoped to OtOpcUa signals only.
Tests use MeterListener + ActivityListener to verify
VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and
RebuildAddressSpace actually emit on the central instruments. Runtime
suite is 72 / 72 green (+3).
Closes#105. Path A (F13b/c/d) complete; next batch options: #85 UNS
folder hierarchy in SDK, or F8b/F9b production engine bindings.
Wires the OPC UA SDK into the fused Host's lifecycle on driver-role
nodes + spawns OpcUaPublishActor with the proper sink/publisher/dbFactory/
applier resolution. The full read+write data path is now live in
production: Deploy → DriverHost → OpcUaPublish → SDK NodeManager →
subscribed OPC UA clients.
DeferredAddressSpaceSink (Commons.OpcUa):
- Thread-safe wrapper IOpcUaAddressSpaceSink that delegates to an
inner sink swapped in at runtime. Needed because Akka actors
resolve the sink at construction time, but the production sink
(SdkAddressSpaceSink wrapping OtOpcUaNodeManager) only exists
after the SDK StandardServer has started.
- Defaults to NullOpcUaAddressSpaceSink so calls before swap are
safe; SetSink(null) reverts (for graceful shutdown).
OtOpcUaServerHostedService (Host.OpcUa):
- IHostedService that owns the OPC UA SDK lifecycle. Reads
OpcUaApplicationHostOptions from the 'OpcUa' config section,
creates an OtOpcUaSdkServer, boots it through OpcUaApplicationHost,
then swaps a real SdkAddressSpaceSink into the DeferredAddressSpaceSink
singleton.
- SDK boot failure is logged + non-fatal — the rest of the host
(admin UI, driver actors) keeps running. Stop reverts to null sink.
WithOtOpcUaRuntimeActors (Runtime):
- Now spawns OpcUaPublishActor (new actor) + threads its ActorRef
into DriverHostActor's Props so successful applies trigger the
address-space rebuild pipeline.
- Phase7Applier is constructed here from the resolved sink + a
logger; OpcUaPublishActor takes both.
- Prepends the opcua-synchronized-dispatcher HOCON so the extension
is self-contained — consumers (Host, tests) don't need to redeclare
the dispatcher block.
- New OpcUaPublishActorKey + OpcUaPublishActorName for actor-registry
resolution.
- AddOtOpcUaRuntime now also TryAddSingleton's NullOpcUaAddressSpaceSink
+ NullServiceLevelPublisher so admin-only nodes (or tests that
don't bind the Deferred sink) stay safe.
Host.Program.cs (driver-role only):
- Binds DeferredAddressSpaceSink as singleton + as IOpcUaAddressSpaceSink
- AddHostedService<OtOpcUaServerHostedService>()
Tests: OpcUaServer 24 -> 28 (+4 DeferredAddressSpaceSink unit tests),
Runtime 69 -> 69 (existing ServiceCollectionExtensionsTests extended
to verify the new mux + publish actor registration).
All 6 v2 test suites green: 177 tests passing.
Closes#108. Engine-wiring is now production-bound end-to-end on
driver-role nodes — Deploy reaches real OPC UA Variable nodes that
subscribed clients see.
ScriptedAlarmActor now survives actor restart: PreStart loads from
the configured store + restores in-memory state; every Transition()
fires a fire-and-forget save. ActiveState still re-derives from the
evaluator on first tick (Phase 7 decision #14), but Acked state +
lastAckUser persist verbatim so operators don't re-ack across an
outage.
Three pieces:
- IAlarmActorStateStore seam in Commons.Engines, with the
AlarmActorStateSnapshot record (alarmId / state / lastTransitionUtc
/ lastAckUser) and NullAlarmActorStateStore default.
- EfAlarmActorStateStore in Runtime.ScriptedAlarms — production
adapter over the existing ScriptedAlarmState table in ConfigDb.
Maps the actor's 3-state enum to the table's AckedState column
(Active⇒Unacknowledged, Acknowledged⇒Acknowledged, Inactive⇒
Acknowledged). Concurrency conflicts are logged + dropped — the
next transition writes again.
- ScriptedAlarmActor PreStart load (async, piped back as
StateRestored) + Transition save. New Props overload takes the
store; default is NullAlarmActorStateStore so tests stay quiet.
Tests: Runtime 52 -> 57 (+5):
- Transition writes Active then Acknowledged snapshots with
lastAckUser populated
- PreStart with persisted Active state restores so a subsequent
AcknowledgeAlarm fires (not ignored as it would be from Inactive)
- Empty store boots Inactive (AcknowledgeAlarm correctly ignored)
- EfAlarmActorStateStore Save + Load round-trips via in-memory EF
- Load for unknown alarmId returns null
All 6 v2 test suites green: 157 tests passing.
Closes#112. F9 (#80) remaining residual is predicate binding to
Core.ScriptedAlarms.ScriptedAlarmEngine — split as F9b in tasks JSON.
OpcUaPublishActor now routes through pluggable seams instead of just
incrementing a counter:
- IOpcUaAddressSpaceSink (Commons.OpcUa) — WriteValue / WriteAlarmState
/ RebuildAddressSpace. OpcUaQuality enum moved here from the actor's
nested type so producers don't have to reference the actor itself.
- IServiceLevelPublisher — Publish(byte). NullServiceLevelPublisher
retains the last level for inspection.
- The actor subscribes to the redundancy-state DPS topic in PreStart
and maps the local node's NodeRedundancyState to a coarse
ServiceLevel (Primary+leader=240, Primary=200, Secondary=100,
Detached=0). This keeps the local SDK's ServiceLevel node honest
without round-tripping back through the admin-singleton calculator.
- ServiceLevelChanged dedupes identical levels so the SDK doesn't see
redundant writes.
- Sink + publisher exceptions are caught and logged; the actor never
crashes its own dispatcher.
- PropsForTests gets optional sink/publisher/localNode params and
skips the DPS subscribe so unit tests stay on a vanilla TestKit
cluster.
Production binding to a real SDK NodeManager + Variable nodes is the
remaining residual — split as F10b. Task 60 still blocked on F10b.
Tests: Runtime 40 -> 46 (+6):
- AttributeValueUpdate routes to sink
- AlarmStateUpdate routes to sink
- RebuildAddressSpace calls sink.Rebuild
- ServiceLevelChanged dedupes
- RedundancyStateChanged for primary-leader publishes 240
- RedundancyStateChanged for secondary publishes 100
All 6 v2 test suites green: 132 tests passing.
VirtualTagActor and ScriptedAlarmActor now route through pluggable
evaluator interfaces and fan out to the cluster's live-tail topics
shipped in F15.3:
- IVirtualTagEvaluator + NullVirtualTagEvaluator in Commons.Engines.
VirtualTagActor calls evaluator on every DependencyValueChanged,
dedupes unchanged values, forwards EvaluationResult to its parent,
and publishes ScriptLogEntry Warning to the script-logs DPS topic
whenever the evaluator fails.
- IScriptedAlarmEvaluator + NullScriptedAlarmEvaluator. ScriptedAlarmActor
takes an AlarmConfig (id/name/equipment-path/severity/predicate) and
publishes both an AlarmTransitionEvent (alerts topic) and a
ScriptLogEntry (script-logs topic) at every transition. Manual
ConditionMet/Acknowledge/Cleared still flow through the same
Transition() so callers without engine bindings still drive the
state machine; the legacy single-string Props() overload routes
through a default AlarmConfig.
The Null* defaults keep the actors safe when no engine is bound —
unconfigured nodes never spuriously alarm. Production binding to
Core.VirtualTags.VirtualTagEngine and Core.ScriptedAlarms is the
remaining residual (F8b/F9b — split in tasks JSON).
Tests: Runtime 34 -> 40 (+6):
- VirtualTagActorTests x3 (evaluator drives EvaluationResult,
unchanged-value dedup, failure publishes Warning ScriptLogEntry)
- ScriptedAlarmActorTests x3 (engine threshold drives Activated +
Cleared on alerts topic, manual Acknowledge attribution).
All 6 v2 test suites green: 126 tests passing.
DriverHostActor.ApplyAndAck now reads the deployment artifact and
reconciles its set of DriverInstanceActor children — spawn the missing,
ApplyDelta to those with changed config, stop the removed/disabled.
The diff lives in pure DriverSpawnPlanner so it can be unit-tested
without an ActorSystem.
Adds IDriverFactory in Core.Abstractions (consumed by Runtime) +
DriverFactoryRegistryAdapter in Core.Hosting that wraps the existing
v1 DriverFactoryRegistry — Runtime stays decoupled from Polly/Serilog,
the Host wires the adapter once driver assemblies have registered.
ShouldStub(type, roles) is now actually called on every spawn — Galaxy
+ Wonderware-Historian boot stubbed on macOS/Linux or whenever the host
carries the dev role. Missing factory ⇒ stub fallback, never a crash.
Tests: 24 → 34 in Runtime (+10):
- DriverSpawnPlannerTests x7 (diff cases, type change ⇒ stop+respawn)
- DeploymentArtifactTests x5 (empty/malformed/missing fields tolerant)
- DriverHostActorReconcileTests x4 (spawn count, stub fallback,
ShouldStub gate, second-apply stops the removed)
All 6 v2 test suites green: 120 tests passing.
Closes F20 (ShouldStub wired). F7 marked partial — subscription
publishing + write path still stubbed in DriverInstanceActor itself.
Final F15 batch wires up the SignalR-backed live pages, ports the bulk
equipment importer, and progressively enhances the Script source editor
with Monaco.
Message contracts:
- Commons.Messages.Alerts.AlarmTransitionEvent — fires on every alarm
state transition; published on the `alerts` DPS topic by future
ScriptedAlarmActor (F9) emits.
- Commons.Messages.Logging.ScriptLogEntry — one log line emitted by a
hosted script; published on the `script-logs` DPS topic by future
VirtualTagActor (F8) + ScriptedAlarmActor (F9) emits.
(Folder named "Logging" to dodge .gitignore's "logs/" rule.)
SignalR plumbing:
- AlertHub gains MethodName + bridge actor (AlertSignalRBridge)
- ScriptLogHub introduced; ScriptLogSignalRBridge follows the same
DPS-subscribe → IHubContext fan-out pattern as FleetStatusSignalRBridge
- WithOtOpcUaSignalRBridges now spawns all three bridges
- MapOtOpcUaHubs maps /hubs/script-log alongside the existing hubs
Pages:
- /alerts live alarm tail, 200-row capacity
- /script-log live script-log tail with level + script
filter, 500-row capacity
- /clusters/{id}/equipment/import — CSV bulk Equipment add with preview
(Name/MachineCode/UnsLineId/Driver +
optional ZTag/SAPID/Manufacturer/Model;
skips rows whose MachineCode already
exists in the fleet)
- ScriptEdit progressively enhanced with Monaco editor via JSInterop —
the textarea remains Blazor's source of truth and Monaco syncs into it
on every keystroke so @bind keeps working; falls back gracefully if
the CDN is unreachable.
MainLayout nav gains a "Live" section (Deployments, Alerts, Alarms
historian) and a "Scripts" link under Scripting. ClusterEquipment
surfaces the new Import CSV button.
Tally: F15 ships ~42 razor pages + 3 SignalR hubs + 3 bridge actors.
Microsoft.AspNetCore.SignalR.Client added (was already in central PM).
All 104 v2 tests remain green.
- New Commons.Messages.Fleet.GetDiagnostics request record.
- DriverHostActor handles GetDiagnostics in all three states (Steady, Applying,
Stale); replies with a NodeDiagnosticsSnapshot built from _currentRevision
+ the local NodeId. Drivers list is empty until F7 wires the per-instance
children.
- FleetDiagnosticsClient now resolves the target via ActorSelection at
akka.tcp://{system}@{nodeId}/user/driver-host and Asks with a 3s timeout.
On timeout/peer-down it returns an empty snapshot so the UI degrades
gracefully rather than throwing.
Two new integration tests in Host.IntegrationTests:
- GetDiagnostics_returns_snapshot_with_target_NodeId verifies the
cross-node Ask/Reply works.
- GetDiagnostics_after_deploy_reports_current_revision exercises the
end-to-end path: AdminOps starts a deployment, both DriverHostActors
apply, then diagnostics reports the new revision on both nodes.
All 98 v2 tests pass (was 96 + 2 new).
ConfigAuditLog gains two nullable columns (EventId, CorrelationId) + a filtered
unique index UX_ConfigAuditLog_EventId. EF migration
20260526105027_AddConfigAuditLogEventIdColumns is additive (nullable + filtered
index = legacy rows backfill cleanly).
AuditWriterActor now writes EventId + CorrelationId into the dedicated columns
instead of synthesising a JSON wrapper into DetailsJson. Cross-restart dedup
is now real: a retry of an already-flushed batch hits the unique index and
SaveChanges throws; the existing catch drops the duplicate without losing the
rest of the batch.
WrapDetails helper deleted — F4 (its JSON hardening) becomes moot.
AuditWriterActorTests.Details_wrapper_embeds_eventId_and_correlationId renamed
+ rewritten to assert against the columns. All 29 ControlPlane tests pass,
all 95 v2 tests green.
DeployHappyPathTests exercises the full deploy pipeline on the 2-node harness:
AdminOperationsActor → ConfigPublishCoordinator → DistributedPubSub →
DriverHostActor on both nodes → ApplyAck → coordinator seals. Verifies both
NodeDeploymentState rows reach Applied and Deployment.Status reaches Sealed.
Exposed + fixed two production bugs along the way:
1. Coordinator was publishing DispatchDeployment on the "deployments" topic but
never subscribed to anything — DriverHostActor ACKs published on the same
topic could not reach it. Added dedicated "deployment-acks" topic with
coordinator subscription in PreStart, and DriverHostActor publishes ACKs
there.
2. NodeId derivation used member.Address.Host only — two cluster members on a
shared loopback host (test harness, dev VMs) collided to one identity. The
coordinator's expected-ack set became {1} and the system sealed after only
half the nodes acked. Switched to host:port everywhere (ClusterRoleInfo +
coordinator) so loopback nodes stay distinct and production identities are
harmlessly more specific.
Tests: 95 v2 tests pass (was 93 + 2 deploy tests), 0 skipped.
Failover scenarios (design §8 cases 3-7: node-kill-mid-apply, split-brain,
restart-during-deploy) deferred — they need controlled node-down primitives
on the harness. Tracked as F22 (failover scenario test cases).
Builds TwoNodeClusterHarness: two in-process Host-equivalent nodes sharing
an in-memory ConfigDb. Forms a 2-member Akka cluster. ClusterFormationTests
proves both nodes see each other as admin+driver role members.
Fixes a real production bug uncovered while wiring the harness — Program.cs
ran two separate ActorSystems (one from AddOtOpcUaCluster.AkkaHostedService
with cluster HOCON, one from Akka.Hosting.AddAkka with bare HOCON). Cluster
singletons landed on the bare ActorSystem and could not actually form a
cluster ("Configuration does not contain `akka.cluster` node").
Consolidation:
- AddOtOpcUaCluster now only binds AkkaClusterOptions + registers IClusterRoleInfo
- New WithOtOpcUaClusterBootstrap pushes embedded HOCON + Remote/Cluster options
into Akka.Hosting's AkkaConfigurationBuilder
- AkkaHostedService.cs deleted — Akka.Hosting now owns the lifecycle
- Program.cs + harness call WithOtOpcUaClusterBootstrap inside AddAkka
Why not WebApplicationFactory<Program>? Program.cs reads OTOPCUA_ROLES from
process env (shared across in-process WAFs); the harness replays Program.cs's
DI graph from a clean WebApplicationBuilder per node with per-node config
overrides. Same production extensions, isolated config + Kestrel + Akka ports.
Tests: 93 v2 tests pass (was 91 + 2 new cluster formation), 0 skipped.
Phase 1f — the consolidator migration. Closes out the v2 entity-model
rewrite by emitting a single EF migration that captures the cumulative
schema delta from 14a (RowVersion) through 14e (drop generation entities).
Generated: src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/
20260526081556_V2HostingAlignment.cs (1562 lines)
20260526081556_V2HostingAlignment.Designer.cs
Migration shape (per `grep -nE migrationBuilder.\(...)`):
Drop 12 ForeignKey constraints (one per live-edit entity's GenerationId FK)
Drop 2 Tables (ConfigGeneration, ClusterNodeGenerationState)
Drop 45 Indexes (every UX_*_Generation_* and IX_*_Generation_* across the
13 live-edit tables — 1 also dropped the unique-Primary
filtered index UX_ClusterNode_Primary_Per_Cluster)
Drop 13 Columns (12 GenerationId + 1 RedundancyRole)
Add 12 RowVersion columns (one per live-edit entity)
Create 4 Tables (Deployment, NodeDeploymentState, ConfigEdit,
DataProtectionKeys)
Create ~45 Indexes (recreated under the new naming pattern
UX_<Table>_LogicalId / UX_<Table>_<X> with the
GenerationId column stripped from composite keys)
Notable EF quirks accepted:
Unique-on-required-column indexes (UX_VirtualTag_LogicalId etc.) ship a
`filter: "[VirtualTagId] IS NOT NULL"` clause that EF auto-inserts for
SQL Server. Harmless — the column is C#-side `required` so NULL never
appears.
Verification:
dotnet build src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
dotnet ef migrations script --idempotent (against placeholder DSN)
-> 3259-line
.sql produced
OK
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
Live `dotnet ef database update` against a scratch SQL Server deferred to
Task 15 (Migrate-To-V2.ps1) — SSH to the docker host needs a key/password I
don't have, and the always-on SQL at 10.100.0.35,14330 uses Integrated
Security (Windows auth, unreachable from this macOS dev). The migration
itself is structurally correct by construction (EF tooling generated it
against the live DbContext model); the live-DB confidence step is the
PowerShell wrapper's job.
SchemaComplianceTests updates:
- All_expected_tables_exist: removed ConfigGeneration +
ClusterNodeGenerationState; added Deployment, NodeDeploymentState,
ConfigEdit, DataProtectionKeys.
- Filtered_unique_indexes_match_schema_spec: removed entries for
UX_ClusterNode_Primary_Per_Cluster (Task 14d) and
UX_ConfigGeneration_Draft_Per_Cluster (Task 14e). Two filtered uniques
remain (UX_ClusterNodeCredential_Value, UX_ExternalIdReservation_KindValue_Active).
- Check_constraints_match_schema_spec: added CK_ConfigEdit_FieldsJson_IsJson.
StoredProceduresTests update:
- Removed RedundancyRole + 'Primary' from the raw INSERT into ClusterNode
so the DB-backed test runs against the new schema.
Phase 1e of the v2 entity-model rewrite. With the FKs gone (Task 14b) and
the apply pipeline replaced (Task 14c), the v1 draft/publish entities have
no remaining v2 consumers.
Deleted entity classes:
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigGeneration.cs
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ClusterNodeGenerationState.cs
Deleted enum classes (no v2 consumers):
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/GenerationStatus.cs
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodeApplyStatus.cs
OtOpcUaConfigDbContext changes:
- Removed DbSet<ConfigGeneration> ConfigGenerations
- Removed DbSet<ClusterNodeGenerationState> ClusterNodeGenerationStates
- Removed ConfigureConfigGeneration(modelBuilder) call + method body
- Removed ConfigureClusterNodeGenerationState(modelBuilder) call + body
- Tidied the "v2 deploy-model tables" header comment
Navigation property cleanup:
- ServerCluster.Generations collection -> removed
- ClusterNode.GenerationState navigation -> removed
doc-comment cref cleanup (replaced <see cref="X"/> with <c>X</c> for the
deleted types so the C# XML comment compiler doesn't fail with CS1574):
- Deployment.cs (cref to ConfigGeneration)
- NodeDeploymentState.cs (cref to ClusterNodeGenerationState)
- Core/OpcUa/EquipmentNodeWalker.cs (cref to ConfigGeneration in the
EquipmentNamespaceContent record's doc-comment; while there, removed
"All four collections are scoped to the same ConfigGeneration" since
that's no longer true in v2)
Verification:
src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
src/Core/ZB.MOM.WW.OtOpcUa.Core -> 0 errors
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests -> 0 errors
whole solution -> 15 errors
(all in Server/Admin; transitive Server.Tests/Admin.Tests skip per the
parent's failure, so the per-project count dropped vs Task 14d's 71)
Phase 1d of the v2 entity-model rewrite. The static RedundancyRole column
is replaced by Akka cluster's role-leader-of-"driver" election at runtime
(see RedundancyStateActor + ServiceLevelCalculator in Task 35).
Changes:
- Removed `public required RedundancyRole RedundancyRole` from
ClusterNode entity.
- Removed `e.Property(x => x.RedundancyRole).HasConversion<string>()...`
mapping from OtOpcUaConfigDbContext.ConfigureClusterNode.
- Removed the `UX_ClusterNode_Primary_Per_Cluster` filtered unique index
(filter referenced [RedundancyRole]='Primary').
- Dropped `using ZB.MOM.WW.OtOpcUa.Configuration.Enums` from ClusterNode.cs
(no longer needed).
- Deleted `Enums/RedundancyRole.cs` — the enum is unused in v2-kept code.
- DraftValidator: dropped the "exactly one Primary per cluster"
validation block. Comment in place explaining v2 picks primary at
runtime via Akka.
- DraftValidatorTests: dropped ValidateClusterTopology_flags_multiple_Primary
test; reworked BuildNode helper to no longer take a `role` argument.
Untouched (Server + Admin still reference RedundancyRole; accepted broken
per Task 56 policy):
src/Server/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{ClusterTopologyLoader,
RedundancyStatePublisher, RedundancyTopology, ServiceLevelCalculator}.cs
src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/RedundancyMetrics.cs
DB-runtime tests will fail against the new schema (Task 14f's migration
drops the column) — to be updated in Task 14f's SchemaComplianceTests
update:
- SchemaComplianceTests.cs:55 (expected filtered index list)
- StoredProceduresTests.cs:263 (raw INSERT names the column)
Verification:
src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
whole solution -> 71 errors
(70 from Task 14b in Server/Admin, +1 new Server/Redundancy reference)
Phase 1c of the v2 entity-model rewrite. Deletes the draft/publish lifecycle
machinery that v2 replaces with AdminOperationsActor + ConfigComposer +
DriverInstanceActor.ApplyDelta.
Deleted (6 files):
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/
IGenerationApplier.cs — interface for the apply pipeline
GenerationApplier.cs — the v1 applier coordinating per-driver hook-back
GenerationDiff.cs — typed wrapper over the sp_ComputeGenerationDiff
SQL output
ApplyCallbacks.cs — per-driver hook surface invoked by the applier
ChangeKind.cs — enum {Added, Modified, Removed, Unchanged}
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/GenerationApplierTests.cs
The empty Apply/ directory is removed.
Kept (repurposed in Task 39 for stale-config fallback):
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs
src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/GenerationSealedCacheTests.cs
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ResilientConfigReaderTests.cs
Naming rename (GenerationSealedCache -> DeploymentArtifactCache) deferred
to Task 39 (DriverHostActor stale-config fallback) where the consumer is
written. The type stays available under its v1 name until then.
IDriver.cs doc-comment: replaced the "Used by IGenerationApplier..." sentence
with "Invoked by the v2 DriverInstanceActor when ApplyDelta reports that only
this driver's config changed in the new deployment."
Server/Admin breakage from Task 14b unchanged (70 errors). Configuration +
Core.Tests + Configuration.Tests stay green.
src/Core/ZB.MOM.WW.OtOpcUa.Configuration -> 0 errors
tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests -> 0 errors
whole solution -> 70 errors (all in Server/Admin)