The seed re-inserted a Namespace with Kind='SystemPlatform' (+ a
GalaxyMxGateway driver + 3 mirror tags), but that NamespaceKind member was
removed when Galaxy became Equipment-kind (migration CleanupSystemPlatformNamespaces).
cluster-seed runs after the migrator, so a fresh down -v/up re-introduced a Kind
the current code can't EF-materialize — 500ing /deployments and failing every
publish (ConfigComposer reads db.Namespaces). Remove the obsolete inserts;
author an Equipment-kind Galaxy driver via the UI if a fixture is needed.
Code-review follow-ups on the poll-loop collapse: (1) RetireAsync is fire-and-
forget and does NOT guarantee zero overlap — the retired loop runs until its
in-flight read+tick finish and it observes cancellation, so a device transition
landing in that one-tick window can fire once on both loops (at most ONE
duplicate raise/clear per reconnect, transient + self-correcting; upstream Part
9 conditions dedupe on ConditionId). Documented in both RetireAsync XML docs so
it isn't mistaken for a zero-overlap guarantee. (2) wrap Cts.Dispose() so the
fire-and-forget task has no theoretical unobserved-exception path.
The owning DriverInstanceActor re-subscribes alarms on every Connected
entry (DetachAlarmSource nulls its cached handle on Connected->Reconnecting
without calling UnsubscribeAlarmsAsync), and the driver object + its alarm
projection are reused across every in-place reconnect. Each SubscribeAsync
started a fresh, never-cancelled Task.Run poll loop and added it to _subs,
so N reconnects leaked N concurrent loops all polling the device and all
firing the same raise/clear transitions => duplicate alarm events + CPU/mem
growth.
Mirrors the Galaxy #399 fix (Clear-before-Add) but for live poll loops the
collapse must also CANCEL the superseded loops, not just drop references.
SubscribeAsync now snapshots existing subs under _subsLock, clears _subs,
adds the new sub, starts its loop, then retires each stale sub out-of-band
(RetireAsync: Cancel + await loop + Dispose CTS, fire-and-forget so the new
subscription's return isn't blocked on a poll interval). Snapshot+clear under
the same lock DisposeAsync uses guarantees no double-own / double-dispose.
There is exactly one consumer per driver instance (factory-per-actor), so
retiring all prior subscriptions before starting the new one is faithful.
Regression tests (TDD, fail->pass): subscribe twice then drive one device
raise; assert OnAlarmEvent fires exactly once (was twice with two leaked
loops).
The plan + task list for the write-outcome self-correction work (B1, already
shipped via master 1d797c1c). Its design-doc counterpart is already committed;
this adds the matching plan artifacts, consistent with the other docs/plans/.
User-published 0.1.1 of the MxGateway client + contracts packages into the
local-mxgw vendored source (nuget-packages/). Bumps Directory.Packages.props to
match and adds the 0.1.1 .nupkg artifacts alongside the existing 0.1.0 ones.
Full solution builds clean against 0.1.1.
GalaxyDriver's StreamAlarms feed is session-less and survives an in-place
reconnect, so DriverInstanceActor re-subscribed on every Connected re-entry
(after dropping its own cached handle without an Unsubscribe — sync teardown).
The re-subscribe was additive: _alarmSubscriptions.Add grew the list by one
untracked handle per reconnect cycle — a slow unbounded leak. Functionally
harmless (the gate is Count>0 and OnAlarmFeedTransition only reads [0], firing
once regardless), but it accumulated forever.
Fix: SubscribeAlarmsAsync clears the set before adding, collapsing to a single
live handle (under the existing _alarmHandlersLock, atomic w.r.t. the fan-out
reader). There is exactly one consumer per driver instance (factory-per-actor
lifecycle), so replacing the set with the latest handle is faithful. Chosen
over making the actor's sync DetachAlarmSource call UnsubscribeAlarmsAsync
async/fire-and-forget — disproportionate for a minor leak.
Regression test Re_subscribe_collapses_to_a_single_handle_no_accumulation
(TDD-verified: FAILS without the Clear — releasing the latest handle leaves
the feed open because stale handles remain; PASSES with the fix). Galaxy tests
263 pass / 3 skip; Runtime native-alarm 24 pass. Code-reviewed (approved).
HandleRestartDriver stopped + respawned the child within one synchronous
message handler, reusing the base actor name drv-<id>. Context.Stop is async
(the child processes its own stop on its own mailbox), so the old child was
ALWAYS still registered when the respawn ran — Context.ActorOf threw
InvalidActorNameException deterministically on every AdminUI Restart press,
crashing + restarting the host.
Fix: a monotonic _childSpawnGeneration counter (single-threaded actor) feeds a
-g<gen> suffix on every spawned child name, so a respawn can never collide with
the still-terminating predecessor. Children are tracked by the _children dict
(by IActorRef), never by actor path, so the suffix is invisible to callers.
This also closes the same-shaped latent race in the reconcile path (a removed-
then-readded instance, and a driver-type-change ToStop+ToSpawn in one plan).
Regression test RestartDriver_respawns_the_child_without_an_actor_name_collision
(verified: FAILS on the old code with the exact InvalidActorNameException,
PASSES with the fix). Runtime.Tests 238/238 green. Code-reviewed (approved).