Commit Graph

139 Commits

Author SHA1 Message Date
Joseph Doherty 0a747c343d feat(runtime): decode IsArray/ArrayLength byte-parity in DeploymentArtifact 2026-06-16 21:40:22 -04:00
Joseph Doherty 6a8020e7e7 feat(adminui): native-alarm HistorizeToAveva opt-out 2026-06-16 16:27:31 -04:00
Joseph Doherty fcb3801415 fix(historian): dead-letter poison events after maxAttempts (finding 002) 2026-06-16 05:25:43 -04:00
Joseph Doherty 93d9160dae feat(alarms): DriverHostActor routes native-condition acks to the owning driver [H6d] 2026-06-15 14:46:00 -04:00
Joseph Doherty 4af8e65af1 fix(redundancy): PeerProbeSupervisor explicitly ignores co-mingled OpcUaProbeResult (integration review)
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-15 13:40:16 -04:00
Joseph Doherty 4c78dcd358 feat(redundancy): wire dbHealth into OpcUaPublishActor + spawn PeerProbeSupervisor per node 2026-06-15 13:33:34 -04:00
Joseph Doherty 5a064e086d test(redundancy): lock in stale-Terminated guard + clarify OnTerminated (code-review) 2026-06-15 13:29:58 -04:00
Joseph Doherty f41e957e07 feat(redundancy): PeerProbeSupervisor maintains one peer OPC UA probe per driver peer 2026-06-15 13:22:38 -04:00
Joseph Doherty 37b32a5623 feat(redundancy): periodic HealthTick refreshes DB reachability via Ask/PipeTo 2026-06-15 13:15:26 -04:00
Joseph Doherty 5382eea9b5 test(redundancy): cover stale-probe-not-demoted branch + make _probeFreshnessWindow readonly (code-review) 2026-06-15 13:11:01 -04:00
Joseph Doherty cf278035d2 feat(redundancy): OpcUaProbeOk from peer-probes-me with freshness debounce 2026-06-15 13:04:41 -04:00
Joseph Doherty a9ff1a64b2 fix(redundancy): always publish first ServiceLevel (even 0) + log SafeSelfStatus failures (code-review) 2026-06-15 13:00:25 -04:00
Joseph Doherty 3e609a2b19 feat(redundancy): OpcUaPublishActor computes ServiceLevel via calculator (DB+stale+leader; legacy seam) 2026-06-15 12:51:32 -04:00
Joseph Doherty c6a543d1b6 docs(vtags): note rename-respawn transient + write-side-only historize (integration review)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-15 10:50:08 -04:00
Joseph Doherty 4501f12669 feat(vtags): wire IHistoryWriter through DriverHostActor (Null default; durable sink infra-gated) (H5d, stillpending §1) 2026-06-15 10:38:49 -04:00
Joseph Doherty 0c6d4c5491 feat(vtags): forward historized vtag results to IHistoryWriter (H5c, stillpending §1) 2026-06-15 10:26:25 -04:00
Joseph Doherty 9c5a091395 feat(vtags): decode VirtualTag Historize from artifact, byte-parity with composer (H5b, stillpending §1) 2026-06-15 10:17:08 -04:00
Joseph Doherty ebf2f1dd7a fix(vtags): prune _planByVtag on child termination + crash-then-change test (H1b review follow-up) 2026-06-15 10:12:11 -04:00
Joseph Doherty ada01e1af8 fix(vtags): respawn equipment virtualtag child on in-place plan change (H1b, stillpending §1) 2026-06-15 10:05:29 -04:00
Joseph Doherty b4af9e7f37 docs(comments): correct 7 stale 'later task/milestone' comments (stillpending §9) 2026-06-15 09:47:08 -04:00
Joseph Doherty 013882262a fix(galaxy): bound alarm-subscription handles to one (no reconnect leak)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
GalaxyDriver's StreamAlarms feed is session-less and survives an in-place
reconnect, so DriverInstanceActor re-subscribed on every Connected re-entry
(after dropping its own cached handle without an Unsubscribe — sync teardown).
The re-subscribe was additive: _alarmSubscriptions.Add grew the list by one
untracked handle per reconnect cycle — a slow unbounded leak. Functionally
harmless (the gate is Count>0 and OnAlarmFeedTransition only reads [0], firing
once regardless), but it accumulated forever.

Fix: SubscribeAlarmsAsync clears the set before adding, collapsing to a single
live handle (under the existing _alarmHandlersLock, atomic w.r.t. the fan-out
reader). There is exactly one consumer per driver instance (factory-per-actor
lifecycle), so replacing the set with the latest handle is faithful. Chosen
over making the actor's sync DetachAlarmSource call UnsubscribeAlarmsAsync
async/fire-and-forget — disproportionate for a minor leak.

Regression test Re_subscribe_collapses_to_a_single_handle_no_accumulation
(TDD-verified: FAILS without the Clear — releasing the latest handle leaves
the feed open because stale handles remain; PASSES with the fix). Galaxy tests
263 pass / 3 skip; Runtime native-alarm 24 pass. Code-reviewed (approved).
2026-06-15 05:49:07 -04:00
Joseph Doherty c9643f68ba fix(runtime): restart driver no longer throws 'actor name is not unique'
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
HandleRestartDriver stopped + respawned the child within one synchronous
message handler, reusing the base actor name drv-<id>. Context.Stop is async
(the child processes its own stop on its own mailbox), so the old child was
ALWAYS still registered when the respawn ran — Context.ActorOf threw
InvalidActorNameException deterministically on every AdminUI Restart press,
crashing + restarting the host.

Fix: a monotonic _childSpawnGeneration counter (single-threaded actor) feeds a
-g<gen> suffix on every spawned child name, so a respawn can never collide with
the still-terminating predecessor. Children are tracked by the _children dict
(by IActorRef), never by actor path, so the suffix is invisible to callers.
This also closes the same-shaped latent race in the reconcile path (a removed-
then-readded instance, and a driver-type-change ToStop+ToSpawn in one plan).

Regression test RestartDriver_respawns_the_child_without_an_actor_name_collision
(verified: FAILS on the old code with the exact InvalidActorNameException,
PASSES with the fix). Runtime.Tests 238/238 green. Code-reviewed (approved).
2026-06-15 05:41:18 -04:00
Joseph Doherty a833d1b4aa fix(alarms): address code review — accurate reconnect comment + SubscribeAlarms drop handlers
- Correct the misleading DetachAlarmSource comment: a session-less feed (Galaxy) is NOT
  torn down on an in-place reconnect, so re-subscribe is additive (harmless; gate reads [0]).
- Add trace-only SubscribeAlarms drop handlers in Connecting/Reconnecting (symmetry with
  NativeAlarmRaised) so a self-tell overtaken by a queued disconnect doesn't dead-letter.
- Document the deliberate no-unsubscribe-on-empty asymmetry vs the value path.
Behavior-neutral for the un-gate path. Minor handle-accumulation leak tracked as follow-up.
2026-06-15 00:49:19 -04:00
Joseph Doherty 7f313df7a6 fix(alarms): subscribe native alarms to un-gate the IAlarmSource feed
Phase B native alarms never fired end-to-end: GalaxyDriver suppresses OnAlarmEvent until
an alarm subscription exists (_alarmSubscriptions.Count > 0), but the runtime only attached
the OnAlarmEvent handler and never called SubscribeAlarmsAsync — so the central feed stayed
gated and no transition reached the Part 9 condition / /alerts. Unit tests passed because
they inject through the IAlarmSource seam directly; the deferred live /run surfaced it.

DriverHostActor computes per-driver alarm refs (alarm-bearing tags' FullNames) and hands them
via SetDesiredSubscriptions; DriverInstanceActor calls SubscribeAlarmsAsync for IAlarmSource
drivers on Connected entry and whenever alarm refs are pushed while Connected (the deploy path),
idempotent via a cached handle reset on detach so reconnect re-subscribes.
2026-06-15 00:42:43 -04:00
Joseph Doherty 83e1318425 refactor(historian): align ServerHistorianOptions with AlarmHistorian (Port default, Validate list, log context) 2026-06-14 20:23:22 -04:00
Joseph Doherty a6f1f4ef15 feat(historian): AddServerHistorian DI + Host wiring of IHistorianDataSource 2026-06-14 20:17:10 -04:00
Joseph Doherty 50e1141fc2 test(historian): assert Deferred sink forwards historianTagname + doc nits
I1: DeferredAddressSpaceSinkTests.RecordingSink now captures HistorianTagname
per EnsureVariable call (HistorianQueue/HistorianCalls, matching the
Phase7ApplierTests pattern); new test EnsureVariable_forwards_historianTagname_to_inner_sink
asserts the arg is forwarded unchanged through DeferredAddressSpaceSink.

M1: OtOpcUaNodeManager.EnsureVariable doc-comment notes that a changed
historize intent on an already-registered node is silently ignored until
a RebuildAddressSpace (rebuild precondition for Task 3 implementers).

N2: DeploymentArtifact.ExtractTagHistorize doc wording: "The live-edit
side" → "The live-edit composer side".
2026-06-14 19:16:23 -04:00
Joseph Doherty c35c1d3734 refactor(historian): single-parse ExtractTagHistorize + review-nit tests/docs
Stop parsing TagConfig twice per tag on the deploy hot path: Phase7Composer's
equipment-tag Select lambda is now block-bodied (captures isHistorized/historianTagname
once), and DeploymentArtifact.BuildEquipmentTagPlans captures locals before result.Add.
Add wrong-type-historianTagname InlineData to ExtractTagHistorizeTests. Extend the
parity round-trip fixture with a 4th tag (isHistorized:false + JSON-null tagname)
exercising the artifact-side private guard path. Align DeploymentArtifact's
ExtractTagHistorize doc-comment with the composer-side phrasing (ExtractTagFullName /
ExtractTagAlarm cross-reference).
2026-06-14 19:02:02 -04:00
Joseph Doherty 440929c82a feat(historian): carry isHistorized + historianTagname through EquipmentTagPlan (byte-parity) 2026-06-14 18:55:04 -04:00
Joseph Doherty 7ffe8939e7 style(drivers): clarify Reconnecting InitializeFailed no-op is generation-agnostic (code-review nit) 2026-06-14 17:19:34 -04:00
Joseph Doherty 751786ec8c fix(drivers): adopt corrected config via ApplyDelta while (re)connecting (#7)
A DriverInstanceActor stuck Reconnecting/Connecting now adopts a config delivered via ApplyDelta and
re-initialises with it, instead of dead-lettering and retrying the stale config forever. A monotonic
init generation supersedes the in-flight init so the corrected config always wins.
2026-06-14 17:15:28 -04:00
Joseph Doherty f9be38430c fix(alarms): route native alarms by ConditionId (dotted FullName), not bare SourceNodeId (integration review)
v2-ci / build (push) Failing after 46s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-14 04:09:01 -04:00
Joseph Doherty 7e86fa7099 fix(alarms): normalise native TransitionKind to canonical EmissionKind vocabulary (review) 2026-06-14 03:58:46 -04:00
Joseph Doherty 8736fcc37c feat(alarms): Primary-gated AlarmTransitionEvent fan-out for native alarms (Phase B WS-5) 2026-06-14 03:48:41 -04:00
Joseph Doherty 4c56a1719b feat(alarms): DriverHostActor routes native alarm transitions to Part 9 conditions (Phase B WS-4c) 2026-06-14 03:34:25 -04:00
Joseph Doherty 422e5b7db2 refactor(alarms): harden ExtractTagAlarm severity parse (TryGetInt32) + trim projector prior-state (review nits) 2026-06-14 03:27:03 -04:00
Joseph Doherty 25c3bd16ba feat(alarms): DriverInstanceActor forwards native OnAlarmEvent to parent (Phase B WS-4b) 2026-06-14 03:24:24 -04:00
Joseph Doherty c1aeafaaf3 feat(alarms): NativeAlarmProjector maps transitions to condition snapshots (Phase B WS-4a) 2026-06-14 03:16:44 -04:00
Joseph Doherty e1ccd99ea2 feat(alarms): EquipmentTagPlan.Alarm parsed byte-parity from TagConfig (Phase B WS-2) 2026-06-14 03:12:48 -04:00
Joseph Doherty 590e497872 fix(runtime): narrow ActorNodeWriteGateway catch + drop vacuous no-actor assertion 2026-06-14 01:32:34 -04:00
Joseph Doherty 526ddb6a57 feat(runtime): ActorNodeWriteGateway — Asks RouteNodeWrite, returns NodeWriteOutcome 2026-06-14 01:23:43 -04:00
Joseph Doherty 7e405e949b fix(runtime): swallow self SubscriptionFailed too (symmetric to SubscriptionEstablished) 2026-06-14 00:42:31 -04:00
Joseph Doherty 42b4a923fd fix(runtime): fast-fail writes in degraded driver states + swallow self SubscriptionEstablished 2026-06-14 00:34:37 -04:00
Joseph Doherty 4cda275b8d fix(runtime): fast-fail RouteNodeWrite while Stale + micro-opts + raw-blob routing test 2026-06-14 00:16:47 -04:00
Joseph Doherty a23fb2b82e feat(server): equipment-tag node writability from Tag.AccessLevel (parity-safe, no migration) 2026-06-13 11:46:00 -04:00
Joseph Doherty f8f1027287 feat(runtime): NodeId->driver reverse routing + primary-gated RouteNodeWrite 2026-06-13 11:44:26 -04:00
Joseph Doherty c4435e4fd6 feat(runtime): route driver values to folder-scoped equipment NodeIds (live-value delivery)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-13 06:32:38 -04:00
Joseph Doherty 5432d8a021 refactor(opcua): repoint Phase7Applier + VirtualTagHostActor to shared EquipmentNodeIds 2026-06-13 06:31:44 -04:00
Joseph Doherty da1accceff feat(runtime): carry DriverInstanceId on AttributeValuePublished (live-value routing key) 2026-06-13 06:27:52 -04:00
Joseph Doherty 7d25480fee docs(galaxy): neutralize remaining stale SystemPlatform/alias terminology in comments + a test name
Replace "SystemPlatform mirror tag", "Galaxy alias", and "SystemPlatform-kind" in doc-comments and
test names with neutral accurate wording ("FolderPath-scoped tag", "EquipmentId == null", etc.).
No code, logic, or test bodies changed — comments and one test method name only.
2026-06-12 22:30:50 -04:00