Commit Graph

35 Commits

Author SHA1 Message Date
Joseph Doherty 6a8020e7e7 feat(adminui): native-alarm HistorizeToAveva opt-out 2026-06-16 16:27:31 -04:00
Joseph Doherty 93d9160dae feat(alarms): DriverHostActor routes native-condition acks to the owning driver [H6d] 2026-06-15 14:46:00 -04:00
Joseph Doherty 4501f12669 feat(vtags): wire IHistoryWriter through DriverHostActor (Null default; durable sink infra-gated) (H5d, stillpending §1) 2026-06-15 10:38:49 -04:00
Joseph Doherty b4af9e7f37 docs(comments): correct 7 stale 'later task/milestone' comments (stillpending §9) 2026-06-15 09:47:08 -04:00
Joseph Doherty c9643f68ba fix(runtime): restart driver no longer throws 'actor name is not unique'
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
HandleRestartDriver stopped + respawned the child within one synchronous
message handler, reusing the base actor name drv-<id>. Context.Stop is async
(the child processes its own stop on its own mailbox), so the old child was
ALWAYS still registered when the respawn ran — Context.ActorOf threw
InvalidActorNameException deterministically on every AdminUI Restart press,
crashing + restarting the host.

Fix: a monotonic _childSpawnGeneration counter (single-threaded actor) feeds a
-g<gen> suffix on every spawned child name, so a respawn can never collide with
the still-terminating predecessor. Children are tracked by the _children dict
(by IActorRef), never by actor path, so the suffix is invisible to callers.
This also closes the same-shaped latent race in the reconcile path (a removed-
then-readded instance, and a driver-type-change ToStop+ToSpawn in one plan).

Regression test RestartDriver_respawns_the_child_without_an_actor_name_collision
(verified: FAILS on the old code with the exact InvalidActorNameException,
PASSES with the fix). Runtime.Tests 238/238 green. Code-reviewed (approved).
2026-06-15 05:41:18 -04:00
Joseph Doherty 7f313df7a6 fix(alarms): subscribe native alarms to un-gate the IAlarmSource feed
Phase B native alarms never fired end-to-end: GalaxyDriver suppresses OnAlarmEvent until
an alarm subscription exists (_alarmSubscriptions.Count > 0), but the runtime only attached
the OnAlarmEvent handler and never called SubscribeAlarmsAsync — so the central feed stayed
gated and no transition reached the Part 9 condition / /alerts. Unit tests passed because
they inject through the IAlarmSource seam directly; the deferred live /run surfaced it.

DriverHostActor computes per-driver alarm refs (alarm-bearing tags' FullNames) and hands them
via SetDesiredSubscriptions; DriverInstanceActor calls SubscribeAlarmsAsync for IAlarmSource
drivers on Connected entry and whenever alarm refs are pushed while Connected (the deploy path),
idempotent via a cached handle reset on detach so reconnect re-subscribes.
2026-06-15 00:42:43 -04:00
Joseph Doherty f9be38430c fix(alarms): route native alarms by ConditionId (dotted FullName), not bare SourceNodeId (integration review)
v2-ci / build (push) Failing after 46s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-14 04:09:01 -04:00
Joseph Doherty 7e86fa7099 fix(alarms): normalise native TransitionKind to canonical EmissionKind vocabulary (review) 2026-06-14 03:58:46 -04:00
Joseph Doherty 8736fcc37c feat(alarms): Primary-gated AlarmTransitionEvent fan-out for native alarms (Phase B WS-5) 2026-06-14 03:48:41 -04:00
Joseph Doherty 4c56a1719b feat(alarms): DriverHostActor routes native alarm transitions to Part 9 conditions (Phase B WS-4c) 2026-06-14 03:34:25 -04:00
Joseph Doherty 4cda275b8d fix(runtime): fast-fail RouteNodeWrite while Stale + micro-opts + raw-blob routing test 2026-06-14 00:16:47 -04:00
Joseph Doherty f8f1027287 feat(runtime): NodeId->driver reverse routing + primary-gated RouteNodeWrite 2026-06-13 11:44:26 -04:00
Joseph Doherty c4435e4fd6 feat(runtime): route driver values to folder-scoped equipment NodeIds (live-value delivery)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-13 06:32:38 -04:00
Joseph Doherty 7d25480fee docs(galaxy): neutralize remaining stale SystemPlatform/alias terminology in comments + a test name
Replace "SystemPlatform mirror tag", "Galaxy alias", and "SystemPlatform-kind" in doc-comments and
test names with neutral accurate wording ("FolderPath-scoped tag", "EquipmentId == null", etc.).
No code, logic, or test bodies changed — comments and one test method name only.
2026-06-12 22:30:50 -04:00
Joseph Doherty 5edea52bd7 docs(galaxy): fix stale SystemPlatform/alias/Galaxy doc comments (review follow-up)
Resolves the code-review notes on 95be607a + the AdminUI bundle: the
EnsureVariable docs (IOpcUaAddressSpaceSink, OtOpcUaNodeManager) and the Tag
entity doc no longer say 'Galaxy / SystemPlatform / alias'; the DriverHostActor
ForwardToMux comment now states the real equipment-tag value-routing gap (the
FullName→NodeId 'live values' milestone) instead of claiming Galaxy values map
straight through.
2026-06-12 22:00:52 -04:00
Joseph Doherty 95be607a07 feat(opcua): remove SystemPlatform-mirror GalaxyTags contract end-to-end (composer+applier+artifact, byte-parity) 2026-06-12 21:45:19 -04:00
Joseph Doherty 06c415598c feat(redundancy): gate scripted-alarm alerts publish on Primary (A1) 2026-06-11 08:44:44 -04:00
Joseph Doherty 5256761368 feat(scripted-alarms): spawn + apply ScriptedAlarmHostActor in DriverHostActor (T10) 2026-06-10 15:17:29 -04:00
Joseph Doherty 2bfe18abcf chore(runtime): warn on missing VirtualTag evaluator; document Stale-recovery VirtualTag behaviour
Log a WARNING on startup when IVirtualTagEvaluator is not registered so a DI misconfig on a
driver-role node is visible in logs instead of silently evaluating all VirtualTags to NoChange.
Add a comment in PushDesiredSubscriptions noting that TryRecoverFromStale does not call this
method, so VirtualTags remain empty after a Stale recovery until the next deployment dispatch
(intentional, consistent with driver recovery).
2026-06-07 05:46:24 -04:00
Joseph Doherty 397f9b783a feat(runtime): spawn+apply VirtualTagHostActor on deploy apply and restore 2026-06-07 05:41:04 -04:00
Joseph Doherty 1b7f995aea feat(runtime): DriverHost spawns + subscribes only its own ClusterId's drivers 2026-06-07 03:19:22 -04:00
Joseph Doherty b1b3f3ff23 fix(runtime): materialise from applied artifact + restore served state on bootstrap
v2-ci / build (push) Failing after 47s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Two ordering/lifecycle gaps surfaced once tag values began streaming:

1. OpcUaPublishActor.HandleRebuild loaded the latest *Sealed* artifact, but the
   rebuild fires at apply time — before this deployment seals — so it materialised
   the PREVIOUS revision while SubscribeBulk subscribed to the applied one. The two
   disagreed (4 variables materialised vs 396 subscribed) and every config needed
   two deploys. RebuildAddressSpace now carries the applied DeploymentId and the
   rebuild loads that exact artifact.

2. On restart a node recovered its revision from NodeDeploymentState but left the
   driver children + address space empty (and an identical-config redeploy no-ops on
   the unchanged revision), so a rebuilt node served nothing until a config change.
   Bootstrap now calls RestoreApplied: re-spawn drivers, rebuild from the applied
   artifact, re-push SubscribeBulk — no re-ack.

Verified live: recreating the driver nodes auto-restores all 396 galaxy mirror
tags across 40 machines with Good live values, no deploy required.
2026-06-06 12:53:38 -04:00
Joseph Doherty c1ce5833e9 feat(runtime): wire driver SubscribeBulk pass so tag values stream
v2-ci / build (push) Failing after 51s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Materialised SystemPlatform/Galaxy variables previously stayed
BadWaitingForInitialData because nothing told the driver to subscribe
(OpcUaPublishActor TODO 'on a future SubscribeBulk pass') and published
values were only forwarded to the VirtualTag mux, never the OPC UA sink.

DriverHostActor now, after each apply, groups the deployment's galaxy tag
MXAccess refs by driver and sends DriverInstanceActor.SetDesiredSubscriptions;
the actor retains the set and (re)subscribes on every Connected entry, so
values resume after reconnects/redeploys (closes the F8b/#113 gap). Published
values are also forwarded to OpcUaPublishActor as AttributeValueUpdate
(NodeId == galaxy MxAccessRef) so the materialised variable shows live data.

Verified live in docker-dev: galaxy TestMachine_001 tags go Good with a
changing TestChangingInt. +1 unit test.
2026-06-06 12:31:55 -04:00
Joseph Doherty 662f3f9f5c refactor(driver-pages): address Phase 6/8 deep-review findings
v2-ci / build (push) Failing after 32s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
- Topic-name drift fix: DriverHealthChanged.TopicName and
  DriverControlTopic.Name now live on the message contracts in
  Commons. AkkaDriverHealthPublisher, DriverStatusSignalRBridge,
  DriverHostActor, and AdminOperationsActor all delegate to the
  single constant so a rename can't silently desynchronise
  publisher and subscriber.
- DriverStatusPanel._opResultClearTimer switched from
  System.Timers.Timer to System.Threading.Timer + awaited
  DisposeAsync. Prevents an in-flight 8s clear-callback from
  invoking StateHasChanged on a component whose hub has already
  been released.
- PublishHealthSnapshot deduplicates against the last published
  (state, lastSuccess, lastError, errorCount) fingerprint. The
  30s heartbeat no longer floods the SignalR layer with identical
  Healthy snapshots — newly-joined clients still warm up via the
  snapshot store on JoinDriver.
2026-05-28 11:52:20 -04:00
Joseph Doherty dcd2509548 refactor(driver-pages): address post-review follow-ups
- DriverInstanceSpec carries ClusterId from the deployment artifact;
  DriverHostActor threads the real cluster identity into
  DriverInstanceActor instead of the local NodeId. Old pre-PR
  artifacts without a ClusterId field fall back to the NodeId so
  in-flight deployments keep working.
- DriverHostActor.ChildEntry holds the full DriverInstanceSpec
  (was only carrying DriverType + LastConfigJson). Restart respawns
  preserve RowId, Name, Enabled, ClusterId — no placeholder values.
- Drop the unnecessary _faultLock on DriverInstanceActor — every
  read/write site runs inside an Akka message handler which is
  single-threaded per actor instance.
- DriverStatusPanel.DisposeAsync awaits Timer.DisposeAsync so an
  in-flight 5s tick can't invoke StateHasChanged on a component
  whose hub has already been torn down.
2026-05-28 11:41:46 -04:00
Joseph Doherty ffcc8d1065 feat(adminui): Reconnect/Restart on DriverStatusPanel (DriverOperator-gated)
- RestartDriver / ReconnectDriver messages + AdminOperationsActor
  handlers (broadcast via driver-control DPS topic; audited via
  ConfigEdits).
- DriverHostActor subscribes to driver-control; locates the
  matching child DriverInstanceActor and stops+respawns it
  (Restart) or sends it a ForceReconnect internal message
  (Reconnect — re-enters Reconnecting state without full stop).
  DriverInstanceSpec constructor call uses named args to handle
  the full 6-parameter signature.
- New DriverOperator authorization policy mapped to DriverOperator
  or FleetAdmin role; documented in docs/security.md. Map LDAP
  group via GroupToRole (e.g. "ot-driver-operator": "DriverOperator").
- DriverStatusPanel renders Reconnect + Restart buttons when the
  user holds the DriverOperator policy (hidden otherwise). Restart
  requires an in-page Razor confirm block (no JS confirm, keeps
  SignalR event loop unblocked). Both buttons show a spinner and
  are disabled during in-flight; result chip auto-clears after 8s.
  Username sourced from AuthenticationStateProvider.

Reconnect resolves to "ForceReconnect" (re-enter Reconnecting,
not full stop+respawn) — transport drops and retries while actor
and in-memory state are preserved. All DriverInstanceActor states
handle ForceReconnect safely (no-op when already in transition).
2026-05-28 11:14:04 -04:00
Joseph Doherty 4203b84d51 feat(runtime): publish DriverHealthChanged via DriverInstanceActor
- IDriverHealthPublisher in Core.Abstractions + NullDriverHealthPublisher
  no-op for tests/dev-stub paths.
- AkkaDriverHealthPublisher in Runtime forwards to the cluster-wide
  `driver-health` DPS topic.
- DriverInstanceActor instrumented to publish snapshots on every
  observable state change + a periodic 30s heartbeat so the AdminUI
  snapshot store warms up for newly-joined SignalR clients.
- Sliding 5-minute Faulted-count tracked per actor via Queue<DateTime>.
- DriverHostActor.SpawnChild threads clusterId (_localNode.Value) and
  the health publisher down to every DriverInstanceActor child.
- ServiceCollectionExtensions.AddOtOpcUaRuntime registers
  AkkaDriverHealthPublisher as IDriverHealthPublisher singleton.
2026-05-28 10:22:44 -04:00
Joseph Doherty 64e3fbe035 docs: backfill XML documentation across 756 files
v2-ci / build (push) Failing after 1m43s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Adds <summary>, <param>, <typeparam>, and <inheritdoc/> tags to public
members surfaced by commentchecker — resolves 5,847 of 5,869 issues
(99.6%) across three /fixdocs passes.
2026-05-28 08:10:17 -04:00
Joseph Doherty 52997ee164 feat(observability): F13d Prometheus + OpenTelemetry instrumentation
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
OtOpcUaTelemetry (Commons/Observability) centralizes the project's Meter
+ ActivitySource so all instrumentation points emit through a single
named surface. Counters cover the hot paths:

  otopcua.deploy.applied               (outcome=ack|reject)
  otopcua.deploy.apply.duration        (s, histogram)
  otopcua.driver.lifecycle             (event=spawn|spawn_stub|stop|fault)
  otopcua.virtualtag.eval              (outcome=ok|fail|skip)
  otopcua.scriptedalarm.transition     (state=activated|acknowledged|cleared)
  otopcua.opcua.sink.write             (kind=value|alarm|rebuild)
  otopcua.redundancy.service_level_change (level=byte)

Plus two ActivitySource spans:

  otopcua.deploy.apply                 wraps DriverHostActor.ApplyAndAck
  otopcua.opcua.address_space_rebuild  wraps OpcUaPublishActor.HandleRebuild

Instruments are no-op until a listener attaches, so tests + dev hosts
pay nothing for unread telemetry.

Host Program.cs gains AddOtOpcUaObservability() (binds the OtOpcUa Meter
+ ActivitySource to OpenTelemetry, attaches a Prometheus exporter) and
MapOtOpcUaMetrics() (mounts /metrics scrape endpoint). Driver-side
internals + ASP.NET request metrics deliberately stay off — the scrape
payload is scoped to OtOpcUa signals only.

Tests use MeterListener + ActivityListener to verify
VirtualTagActor.eval, OpcUaPublishActor.AttributeValueUpdate, and
RebuildAddressSpace actually emit on the central instruments. Runtime
suite is 72 / 72 green (+3).

Closes #105. Path A (F13b/c/d) complete; next batch options: #85 UNS
folder hierarchy in SDK, or F8b/F9b production engine bindings.
2026-05-26 10:29:40 -04:00
Joseph Doherty 7e22e2250c feat(runtime): #109 OpcUaPublishActor — load artifact, compose, plan-diff, apply
v2-ci / build (push) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
Closes the loop between F10b (SDK NodeManager) and F14 (Phase7Plan +
Phase7Applier). DriverHostActor's successful apply now triggers a
RebuildAddressSpace on the publish actor, which loads the latest
deployment artifact + walks composer → planner → applier through the
sink. The OPC UA address space tracks the deployed composition.

DeploymentArtifact:
  - New ParseComposition(blob) → Phase7CompositionResult that decodes
    Equipment + DriverInstance + ScriptedAlarm arrays into the
    projection records Phase7Planner consumes. Pascal-case property
    names mirror ConfigComposer.SnapshotAndFlattenAsync's output.
  - Each entity reader is tolerant: missing-id rows are dropped,
    natural-key sort matches Phase7Composer's contract.

OpcUaPublishActor:
  - New Props params: dbFactory + applier. When wired, RebuildAddressSpace
    does:
      1. LoadLatestArtifact (most recent Sealed Deployment.ArtifactBlob)
      2. ParseComposition → Phase7CompositionResult
      3. Phase7Planner.Compute(lastApplied, next) → Phase7Plan
      4. Empty plan ⇒ no-op (deploy of unchanged composition is benign)
      5. applier.Apply(plan) drives sink.RebuildAddressSpace +
         WriteAlarmState for removed nodes
      6. lastApplied = next so the next rebuild diffs forward
  - Without dbFactory/applier wiring, falls back to raw
    sink.RebuildAddressSpace — the dev/Mac path before #108 binds prod.

DriverHostActor:
  - New Props param opcUaPublishActor (IActorRef?). After successful
    ApplyAndAck (status Applied, ACK sent), tells the publish actor
    RebuildAddressSpace with the same correlation id so the audit trail
    threads through. Null publish actor ⇒ no trigger (admin-only nodes).

Tests: Runtime 63 -> 69 (+6):
- ParseComposition reads Equipment/Driver/Alarm sorted by natural key
- ParseComposition returns empty for empty blob
- Rebuild with dbFactory + sealed deployment artifact triggers exactly
  one sink.Rebuild call (Equipment topology added)
- Rebuild with no artifact is idempotent no-op
- Second rebuild with same composition is empty-plan no-op
- Rebuild without dbFactory falls back to raw sink.Rebuild (legacy path)

All 6 v2 test suites green: 173 tests passing.

Closes #109. Engine-wiring data flow is now end-to-end through:
  Deploy → DriverHostActor.ApplyAndAck → driver spawn + ACK +
    RebuildAddressSpace → OpcUaPublishActor → Phase7Applier → SDK
    NodeManager → subscribed OPC UA clients see the change.
2026-05-26 09:55:11 -04:00
Joseph Doherty 7fa863f6da feat(runtime): #113 DependencyMuxActor — drivers → virtual-tag fan-out
v2-ci / build (push) Failing after 36s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (push) Has been skipped
End-to-end data path is now wired on the read side: driver subscriptions
fire AttributeValuePublished → DriverHostActor → DependencyMuxActor →
DependencyValueChanged to every interested VirtualTagActor. Previously
the publish hit a dead-letter at the host.

DependencyMuxActor:
  - Per-node fan-out router. Maintains tagRef → Set<IActorRef> with a
    reverse subscriber → refs index so unregister/replace are O(refs).
  - Watches subscribers; Terminated triggers automatic unregister so
    dead virtual-tag actors stop receiving publishes.
  - Re-register replaces the prior interest set — no stale-ref leaks
    on actor restart.
  - Drops publishes for refs with no interested subscribers.

VirtualTagActor:
  - New Props params: dependencyRefs + mux ActorRef.
  - PreStart sends RegisterInterest to the mux; PostStop sends
    UnregisterInterest. Default both null so older callers stay quiet.

DriverHostActor:
  - New dependencyMux Props param. Steady + Applying states now
    receive AttributeValuePublished from their DriverInstance children
    and forward to the mux. Null mux is a no-op (dev/Mac).

ServiceCollectionExtensions:
  - WithOtOpcUaRuntimeActors spawns DependencyMuxActor before
    DriverHostActor and threads its ActorRef into the host's Props.
    New DependencyMuxActorKey + DependencyMuxActorName.

Tests: Runtime 57 -> 63 (+6):
- Mux forwards to only subscribers interested in each ref
- Publish for unregistered ref is dropped silently
- Unregister stops forwarding
- Re-register replaces prior interest set
- VirtualTagActor PreStart registration drives end-to-end eval
  (uses AwaitAssert to race-safely settle the PreStart Tell)
- DriverHostActor forwards AttributeValuePublished through to mux

All 6 v2 test suites green: 163 tests passing.

F8 (#79) state updated — dep subscribe seam shipped, Core.VirtualTags
production engine binding (compile + ITagUpstreamSource subscribe) is
the residual.
2026-05-26 09:43:06 -04:00
Joseph Doherty da141497f8 feat(runtime): F7 spawn lifecycle + F20 ShouldStub gate
DriverHostActor.ApplyAndAck now reads the deployment artifact and
reconciles its set of DriverInstanceActor children — spawn the missing,
ApplyDelta to those with changed config, stop the removed/disabled.
The diff lives in pure DriverSpawnPlanner so it can be unit-tested
without an ActorSystem.

Adds IDriverFactory in Core.Abstractions (consumed by Runtime) +
DriverFactoryRegistryAdapter in Core.Hosting that wraps the existing
v1 DriverFactoryRegistry — Runtime stays decoupled from Polly/Serilog,
the Host wires the adapter once driver assemblies have registered.

ShouldStub(type, roles) is now actually called on every spawn — Galaxy
+ Wonderware-Historian boot stubbed on macOS/Linux or whenever the host
carries the dev role. Missing factory ⇒ stub fallback, never a crash.

Tests: 24 → 34 in Runtime (+10):
- DriverSpawnPlannerTests x7 (diff cases, type change ⇒ stop+respawn)
- DeploymentArtifactTests  x5 (empty/malformed/missing fields tolerant)
- DriverHostActorReconcileTests x4 (spawn count, stub fallback,
  ShouldStub gate, second-apply stops the removed)
All 6 v2 test suites green: 120 tests passing.

Closes F20 (ShouldStub wired). F7 marked partial — subscription
publishing + write path still stubbed in DriverInstanceActor itself.
2026-05-26 08:57:16 -04:00
Joseph Doherty 8f32b89fb9 feat(adminui): FleetDiagnosticsClient real Akka ActorSelection round-trip (F17)
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been cancelled
v2-ci / build (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been cancelled
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been cancelled
v2-ci / integration (push) Has been cancelled
- New Commons.Messages.Fleet.GetDiagnostics request record.
- DriverHostActor handles GetDiagnostics in all three states (Steady, Applying,
  Stale); replies with a NodeDiagnosticsSnapshot built from _currentRevision
  + the local NodeId. Drivers list is empty until F7 wires the per-instance
  children.
- FleetDiagnosticsClient now resolves the target via ActorSelection at
  akka.tcp://{system}@{nodeId}/user/driver-host and Asks with a 3s timeout.
  On timeout/peer-down it returns an empty snapshot so the UI degrades
  gracefully rather than throwing.

Two new integration tests in Host.IntegrationTests:
- GetDiagnostics_returns_snapshot_with_target_NodeId verifies the
  cross-node Ask/Reply works.
- GetDiagnostics_after_deploy_reports_current_revision exercises the
  end-to-end path: AdminOps starts a deployment, both DriverHostActors
  apply, then diagnostics reports the new revision on both nodes.

All 98 v2 tests pass (was 96 + 2 new).
2026-05-26 06:58:11 -04:00
Joseph Doherty 5cfbe8b5dd test(host): deploy happy-path + idempotency integration tests (Task 59)
DeployHappyPathTests exercises the full deploy pipeline on the 2-node harness:
AdminOperationsActor → ConfigPublishCoordinator → DistributedPubSub →
DriverHostActor on both nodes → ApplyAck → coordinator seals. Verifies both
NodeDeploymentState rows reach Applied and Deployment.Status reaches Sealed.

Exposed + fixed two production bugs along the way:

1. Coordinator was publishing DispatchDeployment on the "deployments" topic but
   never subscribed to anything — DriverHostActor ACKs published on the same
   topic could not reach it. Added dedicated "deployment-acks" topic with
   coordinator subscription in PreStart, and DriverHostActor publishes ACKs
   there.

2. NodeId derivation used member.Address.Host only — two cluster members on a
   shared loopback host (test harness, dev VMs) collided to one identity. The
   coordinator's expected-ack set became {1} and the system sealed after only
   half the nodes acked. Switched to host:port everywhere (ClusterRoleInfo +
   coordinator) so loopback nodes stay distinct and production identities are
   harmlessly more specific.

Tests: 95 v2 tests pass (was 93 + 2 deploy tests), 0 skipped.

Failover scenarios (design §8 cases 3-7: node-kill-mid-apply, split-brain,
restart-during-deploy) deferred — they need controlled node-down primitives
on the harness. Tracked as F22 (failover scenario test cases).
2026-05-26 06:34:36 -04:00
Joseph Doherty ed130135ca feat(runtime): DriverHostActor state machine with PreStart recovery + DispatchDeployment + stale fallback 2026-05-26 05:02:42 -04:00