Joseph Doherty
93d9160dae
feat(alarms): DriverHostActor routes native-condition acks to the owning driver [H6d]
2026-06-15 14:46:00 -04:00
Joseph Doherty
87dd65b97a
test(alarms): native ack wrong-role deny + tidy NativeAlarmAck doc (code-review)
2026-06-15 14:39:26 -04:00
Joseph Doherty
a6d9de091b
feat(alarms): native condition Acknowledge routes to NativeAlarmAckRouter with principal [H6c]
2026-06-15 14:33:58 -04:00
Joseph Doherty
be6858baa1
fix(alarms): OnEnableDisable native-check via lock-guarded IsNativeAlarmNode + unstale AlarmCommand doc (code-review)
2026-06-15 14:30:17 -04:00
Joseph Doherty
328bd1b9ee
feat(alarms): wire OnEnableDisable over OPC UA (AlarmAck-gated; native→BadNotSupported) [H4]
2026-06-15 14:24:19 -04:00
Joseph Doherty
226587d817
test(alarms): cover isNative rebuild/kind-flip lifecycle + Phase7Applier call-site (code-review)
2026-06-15 14:20:20 -04:00
Joseph Doherty
2423edf232
test(alarms): assert Galaxy ack null-OperatorUser falls back to empty (code-review)
2026-06-15 14:18:57 -04:00
Joseph Doherty
418663b359
feat(alarms): thread isNative through MaterialiseAlarmCondition; node manager tracks native conditions [H6a]
2026-06-15 14:13:30 -04:00
Joseph Doherty
ed941c51da
feat(alarms): AlarmAcknowledgeRequest carries OperatorUser; Galaxy/ScriptedAlarmSource honor it [H6b]
2026-06-15 14:11:40 -04:00
Joseph Doherty
c236263e8d
fix(authz): give HistoryUpdate its own NodePermissions bit (was aliased to HistoryRead) [H2]
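Illustration of the aliasing problem named in the subject above, as a minimal Python sketch rather than the project's code. The bit values and the `allows` helper are assumptions made for the example; only the names NodePermissions, HistoryRead and HistoryUpdate come from the commit subject.

```python
from enum import IntFlag


class AliasedPermissions(IntFlag):
    """Hypothetical 'before' layout: HistoryUpdate shares HistoryRead's bit."""
    HISTORY_READ = 0x04
    HISTORY_UPDATE = 0x04  # same value, so this is only an alias of HISTORY_READ


class NodePermissions(IntFlag):
    """Hypothetical 'after' layout: HistoryUpdate has a bit of its own."""
    HISTORY_READ = 0x04
    HISTORY_UPDATE = 0x08


def allows(granted, required):
    """True when every bit in `required` is present in `granted`."""
    return (granted & required) == required


# With the alias, a principal granted only HistoryRead also passes a HistoryUpdate check.
leaky = allows(AliasedPermissions.HISTORY_READ, AliasedPermissions.HISTORY_UPDATE)

# With a dedicated bit, the same grant no longer implies HistoryUpdate.
strict = allows(NodePermissions.HISTORY_READ, NodePermissions.HISTORY_UPDATE)
```

Under these assumed values, `leaky` is True and `strict` is False, which is the difference a dedicated bit makes.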
2026-06-15 14:09:35 -04:00
Joseph Doherty
6ab3d8630b
docs(alarms): Phase 3 implementation plan + tasks (H4 + H2-bit + H6)
2026-06-15 14:05:00 -04:00
Joseph Doherty
40b883effe
docs(alarms): Phase 3 design — OPC UA standards completeness (H4 Enable/Disable + H2 HistoryUpdate bit + H6 native-ack→AVEVA)
2026-06-15 13:59:28 -04:00
Joseph Doherty
4af8e65af1
fix(redundancy): PeerProbeSupervisor explicitly ignores co-mingled OpcUaProbeResult (integration review)
v2-ci / build (push) Failing after 34s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
2026-06-15 13:40:16 -04:00
Joseph Doherty
4c78dcd358
feat(redundancy): wire dbHealth into OpcUaPublishActor + spawn PeerProbeSupervisor per node
2026-06-15 13:33:34 -04:00
Joseph Doherty
393b746d9b
docs(redundancy): sync component table with the wired calculator + PeerProbeSupervisor
2026-06-15 13:30:55 -04:00
Joseph Doherty
5a064e086d
test(redundancy): lock in stale-Terminated guard + clarify OnTerminated (code-review)
2026-06-15 13:29:58 -04:00
Joseph Doherty
70e6d3d2c0
docs(redundancy): ServiceLevelCalculator is wired into the live publish path
2026-06-15 13:26:34 -04:00
Joseph Doherty
f41e957e07
feat(redundancy): PeerProbeSupervisor maintains one peer OPC UA probe per driver peer
2026-06-15 13:22:38 -04:00
Joseph Doherty
37b32a5623
feat(redundancy): periodic HealthTick refreshes DB reachability via Ask/PipeTo
2026-06-15 13:15:26 -04:00
Joseph Doherty
5382eea9b5
test(redundancy): cover stale-probe-not-demoted branch + make _probeFreshnessWindow readonly (code-review)
2026-06-15 13:11:01 -04:00
Joseph Doherty
cf278035d2
feat(redundancy): OpcUaProbeOk from peer-probes-me with freshness debounce
2026-06-15 13:04:41 -04:00
Joseph Doherty
a9ff1a64b2
fix(redundancy): always publish first ServiceLevel (even 0) + log SafeSelfStatus failures (code-review)
2026-06-15 13:00:25 -04:00
Joseph Doherty
3e609a2b19
feat(redundancy): OpcUaPublishActor computes ServiceLevel via calculator (DB+stale+leader; legacy seam)
2026-06-15 12:51:32 -04:00
Joseph Doherty
ff0f62db38
refactor(redundancy): move ServiceLevelCalculator to Core.Cluster (shared, Runtime-reachable)
2026-06-15 12:45:17 -04:00
Joseph Doherty
7605f4d8fd
docs(redundancy): Phase 2 implementation plan + tasks (H3 ServiceLevel wiring)
2026-06-15 12:41:51 -04:00
Joseph Doherty
0528353315
docs(redundancy): Phase 2 design — health-aware ServiceLevel (H3)
2026-06-15 12:33:09 -04:00
Joseph Doherty
4bd7180e7f
fix(docker-dev): stop seeding retired SystemPlatform namespace
v2-ci / build (push) Failing after 36s
The seed re-inserted a Namespace with Kind='SystemPlatform' (+ a
GalaxyMxGateway driver + 3 mirror tags), but that NamespaceKind member was
removed when Galaxy became Equipment-kind (migration CleanupSystemPlatformNamespaces).
cluster-seed runs after the migrator, so a fresh down -v/up re-introduced a Kind
the current code can't EF-materialize — 500ing /deployments and failing every
publish (ConfigComposer reads db.Namespaces). Remove the obsolete inserts;
author an Equipment-kind Galaxy driver via the UI if a fixture is needed.
2026-06-15 12:17:02 -04:00
Joseph Doherty
907005d2d2
docs(claude): note local docker-dev rig has login disabled — run live /run verification directly, don't wait for sign-in
v2-ci / build (push) Failing after 48s
2026-06-15 11:50:55 -04:00
Joseph Doherty
c6a543d1b6
docs(vtags): note rename-respawn transient + write-side-only historize (integration review)
v2-ci / build (push) Failing after 44s
2026-06-15 10:50:08 -04:00
Joseph Doherty
aaa5d8b851
docs(vtags): document runtime Historize honoring + infra-gated durable sink (Phase 1 H5)
2026-06-15 10:43:29 -04:00
Joseph Doherty
4501f12669
feat(vtags): wire IHistoryWriter through DriverHostActor (Null default; durable sink infra-gated) (H5d, stillpending §1)
2026-06-15 10:38:49 -04:00
Joseph Doherty
2f30c54dc1
test(vtags): thread-safe CapturingHistoryWriter + drop redundant wait (H5c review follow-up)
2026-06-15 10:33:14 -04:00
Joseph Doherty
0c6d4c5491
feat(vtags): forward historized vtag results to IHistoryWriter (H5c, stillpending §1)
2026-06-15 10:26:25 -04:00
Joseph Doherty
83d3b9f7be
test(vtags): planner detects Historize-only toggle as a change + doc nit (H5a review follow-up)
2026-06-15 10:21:31 -04:00
Joseph Doherty
9c5a091395
feat(vtags): decode VirtualTag Historize from artifact, byte-parity with composer (H5b, stillpending §1)
2026-06-15 10:17:08 -04:00
Joseph Doherty
fc8121cbf3
feat(vtags): carry VirtualTag.Historize onto EquipmentVirtualTagPlan (H5a, stillpending §1)
2026-06-15 10:17:05 -04:00
Joseph Doherty
ebf2f1dd7a
fix(vtags): prune _planByVtag on child termination + crash-then-change test (H1b review follow-up)
2026-06-15 10:12:11 -04:00
Joseph Doherty
ada01e1af8
fix(vtags): respawn equipment virtualtag child on in-place plan change (H1b, stillpending §1)
2026-06-15 10:05:29 -04:00
Joseph Doherty
1dc713693a
fix(deploy): count removed equipment tags/vtags in RemovedNodes (H1a review follow-up)
2026-06-15 10:01:37 -04:00
Joseph Doherty
1e95856b00
fix(deploy): rebuild address space on changed-only deploys (H1a, stillpending §1)
2026-06-15 09:57:40 -04:00
Joseph Doherty
50a2fdf32d
chore(plans): mark confirmed-shipped .tasks.json completed so audits don't re-flag (stillpending §7)
2026-06-15 09:52:51 -04:00
Joseph Doherty
a9d267c91a
docs(security,core): correct stale write-outcome doc + note benign DraftSnapshot/LeaderChanged residue (stillpending §9/§3)
2026-06-15 09:48:14 -04:00
Joseph Doherty
b4af9e7f37
docs(comments): correct 7 stale 'later task/milestone' comments (stillpending §9)
2026-06-15 09:47:08 -04:00
Joseph Doherty
68a0f759f0
docs(plans): Phase 0+1 implementation plan for the still-pending backlog
12 tasks (0 branch; 1-3 Phase 0 hygiene; 4-5 H1 changed-only-deploy fix;
6-9 H5 vtag Historize threading + IHistoryWriter seam; 10 docs; 11 verify).
Conservative rebuild-on-change; no EF migration (Historize column + artifact
already carry it); durable AVEVA sink flagged infra-gated.
2026-06-15 09:40:03 -04:00
Joseph Doherty
f64be52796
docs(plans): phased completion design for the still-pending backlog
Roadmap for closing stillpending.md §1-§5 + §7/§9 cleanup in 9 phases
(0 hygiene -> 1 silent-deploy bugs H1/H5 -> 2 ServiceLevel H3 ->
3 OPC UA standards H4/H2-bit/H6 -> 4 driver coverage -> 5 probes ->
6 AdminUI -> 7 Client.UI -> 8 per-cluster scoping). Conservative
rebuild-on-change for H1; plan-and-execute phase-by-phase; no EF
migration; defer-list flagged (Denied/Simulated/Language/InlayHints/
HistoryUpdate-service/Galaxy-gateway-write).
2026-06-15 09:27:06 -04:00
Joseph Doherty
151b7165af
docs(abcip,focas): document RetireAsync one-tick overlap residual + guard Dispose
v2-ci / build (push) Failing after 2m47s
Code-review follow-ups on the poll-loop collapse: (1) RetireAsync is fire-and-
forget and does NOT guarantee zero overlap — the retired loop runs until its
in-flight read+tick finish and it observes cancellation, so a device transition
landing in that one-tick window can fire once on both loops (at most ONE
duplicate raise/clear per reconnect, transient + self-correcting; upstream Part
9 conditions dedupe on ConditionId). Documented in both RetireAsync XML docs so
it isn't mistaken for a zero-overlap guarantee. (2) wrap Cts.Dispose() so the
fire-and-forget task has no theoretical unobserved-exception path.
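Illustration of why a single duplicate raise/clear can be harmless when the consumer keys its state on a condition id. This is a minimal Python sketch under stated assumptions: `ConditionTable` and its `apply` method are hypothetical names, and the commit does not describe how the upstream dedupe is actually implemented.

```python
class ConditionTable:
    """Hypothetical consumer that tracks the active state per condition id."""

    def __init__(self):
        self._active = {}  # condition id -> last applied active flag
        self.events = []   # transitions that were actually emitted

    def apply(self, condition_id, active):
        """Apply a raise (active=True) or clear (active=False); drop repeats."""
        if self._active.get(condition_id, False) == active:
            return False  # same state as before: a duplicate, nothing to emit
        self._active[condition_id] = active
        self.events.append((condition_id, active))
        return True


table = ConditionTable()
first = table.apply("cond-1", True)   # raise seen from the new loop
second = table.apply("cond-1", True)  # same raise seen from the retiring loop
```

Here the second `apply` returns False and `table.events` holds one entry, so the overlap window produces no visible duplicate for a consumer built this way.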
2026-06-15 06:14:44 -04:00
Joseph Doherty
6ba59f9d4d
fix(abcip,focas): collapse alarm projection to a single poll loop (no reconnect leak)
The owning DriverInstanceActor re-subscribes alarms on every Connected
entry (DetachAlarmSource nulls its cached handle on Connected->Reconnecting
without calling UnsubscribeAlarmsAsync), and the driver object + its alarm
projection are reused across every in-place reconnect. Each SubscribeAsync
started a fresh, never-cancelled Task.Run poll loop and added it to _subs,
so N reconnects leaked N concurrent loops all polling the device and all
firing the same raise/clear transitions => duplicate alarm events + CPU/mem
growth.
Mirrors the Galaxy #399 fix (Clear-before-Add), but for live poll loops the
collapse must also CANCEL the superseded loops, not just drop references.
SubscribeAsync now snapshots existing subs under _subsLock, clears _subs,
adds the new sub, starts its loop, then retires each stale sub out-of-band
(RetireAsync: Cancel + await loop + Dispose CTS, fire-and-forget so the new
subscription's return isn't blocked on a poll interval). Snapshot+clear under
the same lock DisposeAsync uses guarantees no double-own / double-dispose.
There is exactly one consumer per driver instance (factory-per-actor), so
retiring all prior subscriptions before starting the new one is faithful.
Regression tests (TDD, fail->pass): subscribe twice then drive one device
raise; assert OnAlarmEvent fires exactly once (was twice with two leaked
loops).
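Illustration of the snapshot, clear, add, start, then retire sequence described above, as a minimal asyncio sketch in Python rather than the project's code. `AlarmProjection`, `subscribe`, `_retire` and `demo` are hypothetical names. asyncio runs this on one thread, so the sketch needs no lock where the commit uses _subsLock.

```python
import asyncio


class AlarmProjection:
    """Hypothetical stand-in for a driver's alarm projection."""

    def __init__(self, poll, interval=0.01):
        self._poll = poll          # callable invoked once per tick
        self._interval = interval
        self._subs = []            # live poll-loop tasks; one after subscribe()
        self._retiring = set()     # strong refs so retire tasks are not collected

    async def _loop(self):
        while True:
            self._poll()
            await asyncio.sleep(self._interval)

    async def _retire(self, task):
        # Cancel the superseded loop and wait until it has really stopped.
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass

    def subscribe(self):
        # Snapshot and clear the existing subscriptions in one step.
        stale, self._subs = self._subs, []
        new = asyncio.create_task(self._loop())
        self._subs.append(new)
        # Retire stale loops out-of-band so subscribe() does not wait on a tick.
        for task in stale:
            retire = asyncio.create_task(self._retire(task))
            self._retiring.add(retire)
            retire.add_done_callback(self._retiring.discard)
        return new


async def demo():
    ticks = []
    proj = AlarmProjection(lambda: ticks.append(1))
    first = proj.subscribe()
    second = proj.subscribe()  # simulates a re-subscribe after reconnect
    await asyncio.sleep(0.05)
    return first.cancelled(), len(proj._subs), second.done()
```

Running `asyncio.run(demo())` under these assumptions leaves the first loop cancelled and exactly one live loop. The superseded loop can still complete one tick before it observes cancellation.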
2026-06-15 06:09:38 -04:00
Joseph Doherty
43b3769a1d
docs(plans): add write-outcome self-correction implementation plan
v2-ci / build (push) Failing after 32s
The plan + task list for the write-outcome self-correction work (B1, already
shipped via master 1d797c1c). Its design-doc counterpart is already committed;
this adds the matching plan artifacts, consistent with the other docs/plans/.
2026-06-15 05:57:15 -04:00
Joseph Doherty
5a70cd7910
chore(deps): bump vendored MxGateway.Client/Contracts 0.1.0 -> 0.1.1
The user published 0.1.1 of the MxGateway client + contracts packages into the
local-mxgw vendored source (nuget-packages/). Bumps Directory.Packages.props to
match and adds the 0.1.1 .nupkg artifacts alongside the existing 0.1.0 ones.
Full solution builds clean against 0.1.1.
2026-06-15 05:57:09 -04:00
Joseph Doherty
013882262a
fix(galaxy): bound alarm-subscription handles to one (no reconnect leak)
v2-ci / build (push) Failing after 44s
GalaxyDriver's StreamAlarms feed is session-less and survives an in-place
reconnect, so DriverInstanceActor re-subscribed on every Connected re-entry
(after dropping its own cached handle without an Unsubscribe — sync teardown).
The re-subscribe was additive: _alarmSubscriptions.Add grew the list by one
untracked handle per reconnect cycle — a slow unbounded leak. Functionally
harmless (the gate is Count>0 and OnAlarmFeedTransition only reads [0], firing
once regardless), but it accumulated forever.
Fix: SubscribeAlarmsAsync clears the set before adding, collapsing to a single
live handle (under the existing _alarmHandlersLock, atomic w.r.t. the fan-out
reader). There is exactly one consumer per driver instance (factory-per-actor
lifecycle), so replacing the set with the latest handle is faithful. Chosen
over making the actor's sync DetachAlarmSource call UnsubscribeAlarmsAsync
async/fire-and-forget — disproportionate for a minor leak.
Regression test Re_subscribe_collapses_to_a_single_handle_no_accumulation
(TDD-verified: FAILS without the Clear — releasing the latest handle leaves
the feed open because stale handles remain; PASSES with the fix). Galaxy tests
263 pass / 3 skip; Runtime native-alarm 24 pass. Code-reviewed (approved).
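Illustration of the Clear-before-Add collapse and of the regression scenario described above, as a minimal Python sketch rather than the project's code. `AlarmFeed`, its methods and the integer handles are hypothetical; the lock stands in for the role the commit gives _alarmHandlersLock.

```python
import threading


class AlarmFeed:
    """Hypothetical feed that stays open while at least one handle is held."""

    def __init__(self, clear_before_add=True):
        self._lock = threading.Lock()
        self._handles = []
        self._next_id = 0
        self._clear_before_add = clear_before_add

    def subscribe(self):
        with self._lock:
            self._next_id += 1
            if self._clear_before_add:
                self._handles.clear()  # collapse to a single live handle
            self._handles.append(self._next_id)
            return self._next_id

    def unsubscribe(self, handle):
        with self._lock:
            if handle in self._handles:
                self._handles.remove(handle)

    @property
    def is_open(self):
        with self._lock:
            return len(self._handles) > 0


def resubscribe_then_release(feed, reconnects=3):
    """Re-subscribe once per simulated reconnect, then release the latest handle."""
    latest = None
    for _ in range(reconnects):
        latest = feed.subscribe()
    feed.unsubscribe(latest)
    return feed.is_open


leaky_still_open = resubscribe_then_release(AlarmFeed(clear_before_add=False))
fixed_still_open = resubscribe_then_release(AlarmFeed(clear_before_add=True))
```

Without the clear, releasing the latest handle leaves the feed open because stale handles remain (`leaky_still_open` is True). With the clear, it closes (`fixed_still_open` is False).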
2026-06-15 05:49:07 -04:00