The seed re-inserted a Namespace with Kind='SystemPlatform' (+ a
GalaxyMxGateway driver + 3 mirror tags), but that NamespaceKind member was
removed when Galaxy became Equipment-kind (migration CleanupSystemPlatformNamespaces).
cluster-seed runs after the migrator, so a fresh down -v/up re-introduced a Kind
value that the current code can't EF-materialize, which 500s /deployments and
fails every publish (ConfigComposer reads db.Namespaces). Remove the obsolete
inserts; author an Equipment-kind Galaxy driver via the UI if a fixture is needed.
Code-review follow-ups on the poll-loop collapse: (1) RetireAsync is fire-and-
forget and does NOT guarantee zero overlap — the retired loop runs until its
in-flight read+tick finish and it observes cancellation, so a device transition
landing in that one-tick window can fire once on both loops (at most ONE
duplicate raise/clear per reconnect, transient + self-correcting; upstream Part
9 conditions dedupe on ConditionId). Documented in both RetireAsync XML docs so
it isn't mistaken for a zero-overlap guarantee. (2) wrap Cts.Dispose() so the
fire-and-forget task has no theoretical unobserved-exception path.
The owning DriverInstanceActor re-subscribes alarms on every Connected
entry (DetachAlarmSource nulls its cached handle on Connected->Reconnecting
without calling UnsubscribeAlarmsAsync), and the driver object + its alarm
projection are reused across every in-place reconnect. Each SubscribeAsync
started a fresh, never-cancelled Task.Run poll loop and added it to _subs,
so N reconnects leaked N concurrent loops all polling the device and all
firing the same raise/clear transitions => duplicate alarm events + CPU/mem
growth.
Mirrors the Galaxy #399 fix (Clear-before-Add), but for live poll loops the
collapse must also CANCEL the superseded loops, not just drop references.
SubscribeAsync now snapshots existing subs under _subsLock, clears _subs,
adds the new sub, starts its loop, then retires each stale sub out-of-band
(RetireAsync: Cancel + await loop + Dispose CTS, fire-and-forget so the new
subscription's return isn't blocked on a poll interval). Snapshot+clear under
the same lock DisposeAsync uses guarantees no double-own / double-dispose.
There is exactly one consumer per driver instance (factory-per-actor), so
retiring all prior subscriptions whenever a new one starts is faithful.
Regression tests (TDD, fail->pass): subscribe twice then drive one device
raise; assert OnAlarmEvent fires exactly once (was twice with two leaked
loops).
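
The collapse described above can be sketched in a few lines of asyncio. This is
a language-neutral illustration in Python, not the project's C# code: the class
AlarmProjection, its fields, and the demo() helper are hypothetical names chosen
for the example, and the device is reduced to a callable returning a boolean.

```python
import asyncio


class AlarmProjection:
    """Hypothetical stand-in for the driver's alarm projection."""

    def __init__(self, read_device, on_alarm, interval=0.01):
        self._read_device = read_device   # callable: True while the alarm is active
        self._on_alarm = on_alarm         # callback receiving "raise" / "clear"
        self._interval = interval
        self._subs = []                   # live poll tasks (one after each subscribe)
        self._subs_lock = asyncio.Lock()
        self._retiring = set()            # keeps fire-and-forget tasks referenced

    async def _poll(self):
        last = False
        while True:
            active = self._read_device()
            if active != last:
                self._on_alarm("raise" if active else "clear")
                last = active
            await asyncio.sleep(self._interval)

    async def _retire(self, task):
        # Cancel, then wait until the loop has actually observed cancellation.
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass

    async def subscribe(self):
        async with self._subs_lock:
            stale = list(self._subs)      # snapshot existing subs
            self._subs.clear()            # drop the references
            task = asyncio.create_task(self._poll())
            self._subs.append(task)       # add and start the new loop
        for old in stale:                 # retire out-of-band, without awaiting
            t = asyncio.create_task(self._retire(old))
            self._retiring.add(t)
            t.add_done_callback(self._retiring.discard)
        return task

    async def dispose(self):
        async with self._subs_lock:       # same lock as subscribe()
            stale = list(self._subs)
            self._subs.clear()
        for task in stale:
            await self._retire(task)


async def demo():
    state = {"active": False}
    events = []
    proj = AlarmProjection(lambda: state["active"], events.append)
    await proj.subscribe()
    await proj.subscribe()                # simulated in-place reconnect
    await asyncio.sleep(0.05)             # give the retired loop time to stop
    state["active"] = True                # one device raise
    await asyncio.sleep(0.05)
    await proj.dispose()
    return events
```

Running asyncio.run(demo()) mirrors the regression test's shape: subscribe
twice, drive one raise, and expect a single event. Because retirement is not
awaited, a transition landing before the old loop observes cancellation could
still be reported by both loops; the sketch avoids that window only by sleeping
before the raise.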
Adds the plan + task list for the write-outcome self-correction work (B1, already
shipped via master 1d797c1c). Its design-doc counterpart is already committed;
these are the matching plan artifacts, consistent with the rest of docs/plans/.
The user published 0.1.1 of the MxGateway client + contracts packages into the
local-mxgw vendored source (nuget-packages/). This bumps Directory.Packages.props
to match and adds the 0.1.1 .nupkg artifacts alongside the existing 0.1.0 ones.
The full solution builds clean against 0.1.1.
GalaxyDriver's StreamAlarms feed is session-less and survives an in-place
reconnect, so DriverInstanceActor re-subscribed on every Connected re-entry
(after dropping its own cached handle without an Unsubscribe — sync teardown).
The re-subscribe was additive: _alarmSubscriptions.Add grew the list by one
untracked handle per reconnect cycle — a slow unbounded leak. Functionally
harmless (the gate is Count>0 and OnAlarmFeedTransition only reads [0], firing
once regardless), but it accumulated forever.
Fix: SubscribeAlarmsAsync clears the set before adding, collapsing to a single
live handle (under the existing _alarmHandlersLock, atomic w.r.t. the fan-out
reader). There is exactly one consumer per driver instance (factory-per-actor
lifecycle), so replacing the set with the latest handle is faithful. This was
chosen over making the actor's sync DetachAlarmSource invoke
UnsubscribeAlarmsAsync as async/fire-and-forget, which would be disproportionate
for a minor leak.
Regression test Re_subscribe_collapses_to_a_single_handle_no_accumulation
(TDD-verified: FAILS without the Clear — releasing the latest handle leaves
the feed open because stale handles remain; PASSES with the fix). Galaxy tests
263 pass / 3 skip; Runtime native-alarm 24 pass. Code-reviewed (approved).
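
A minimal Python sketch of the Clear-before-Add shape, for illustration only.
The AlarmFeed class and its integer handles are hypothetical; the real driver is
C# and its handle type is not shown in this message. The feed counts as open
while any handle remains.

```python
import threading


class AlarmFeed:
    """Hypothetical feed whose 'open' state is gated on having any handle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handles = []
        self._next_id = 0

    def subscribe(self):
        with self._lock:
            # Clear before add: collapse to a single live handle, so a
            # re-subscribe after reconnect replaces instead of accumulating.
            self._handles.clear()
            self._next_id += 1
            self._handles.append(self._next_id)
            return self._next_id

    def unsubscribe(self, handle):
        with self._lock:
            if handle in self._handles:
                self._handles.remove(handle)

    @property
    def handle_count(self):
        with self._lock:
            return len(self._handles)

    @property
    def is_open(self):
        return self.handle_count > 0
```

With the clear() line removed, three subscribe() calls leave three handles, and
releasing only the latest one keeps is_open true, which is the failure the
regression test looks for.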
HandleRestartDriver stopped + respawned the child within one synchronous
message handler, reusing the base actor name drv-<id>. Context.Stop is async
(the child processes its own stop on its own mailbox), so the old child was
ALWAYS still registered when the respawn ran — Context.ActorOf threw
InvalidActorNameException deterministically on every AdminUI Restart press,
crashing + restarting the host.
Fix: a monotonic _childSpawnGeneration counter (single-threaded actor) feeds a
-g<gen> suffix on every spawned child name, so a respawn can never collide with
the still-terminating predecessor. Children are tracked by the _children dict
(by IActorRef), never by actor path, so the suffix is invisible to callers.
This also closes the same-shaped latent race in the reconcile path (a removed-
then-readded instance, and a driver-type-change ToStop+ToSpawn in one plan).
Regression test RestartDriver_respawns_the_child_without_an_actor_name_collision
(verified: FAILS on the old code with the exact InvalidActorNameException,
PASSES with the fix). Runtime.Tests 238/238 green. Code-reviewed (approved).
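
The naming scheme can be illustrated with a small Python model. FakeContext is a
hypothetical stand-in for an actor context that rejects duplicate live names and
whose stop is asynchronous; Supervisor and its members are likewise illustrative
and not the project's C# types.

```python
class FakeContext:
    """Hypothetical actor context: names stay registered until termination."""

    def __init__(self):
        self.live_names = set()

    def actor_of(self, name):
        if name in self.live_names:
            raise ValueError(f"actor name not unique: {name}")
        self.live_names.add(name)
        return name                      # the name doubles as the 'ref' here

    def stop(self, ref):
        # Asynchronous in a real actor system: the name is NOT freed yet.
        pass

    def terminated(self, ref):
        self.live_names.discard(ref)     # called later, once the child is gone


class Supervisor:
    def __init__(self, ctx):
        self._ctx = ctx
        self._child_spawn_generation = 0  # monotonic; single-threaded owner
        self._children = {}               # driver id -> child ref

    def spawn(self, driver_id):
        self._child_spawn_generation += 1
        name = f"drv-{driver_id}-g{self._child_spawn_generation}"
        ref = self._ctx.actor_of(name)
        self._children[driver_id] = ref
        return ref

    def restart(self, driver_id):
        old = self._children.pop(driver_id)
        self._ctx.stop(old)               # old child is still registered here
        return self.spawn(driver_id)      # new name, so no collision
```

Because every spawn gets a fresh suffix, restart() succeeds even though the
predecessor has not yet been reported as terminated.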
C1 (critical): a boundary tie cluster larger than NumValuesPerNode could
silently truncate a resumed read to GoodNoData, permanently dropping the
un-emitted ties — the (timestamp, skip) cursor cannot advance past a single
timestamp the fixed-(start,end,cap) backend keeps re-returning. Now detected
and failed LOUDLY per node with BadHistoryOperationUnsupported + a log naming
the tag/timestamp/cap; documented in Historian.md with the larger-cap remedy.
Regression test Raw_tie_cluster_larger_than_page_fails_loudly_not_silently.
I3: build HistoryData before Save() so a projection failure can never orphan a
stored continuation cursor.
N1 (YAGNI): drop the never-produced HistoryReadKind enum + Processed-only
Aggregate/IntervalTicks fields from HistoryContinuationState, since only Raw
reads are paged.
N3: ComputeResumeCursor guards its documented non-empty precondition.
I1: document InMemoryHistoryContinuationStore's eventual-consistency (test double).
Build clean, 182/182 OpcUaServer tests pass.
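
The C1 condition can be illustrated with a small Python check. This is a sketch
under assumptions, not the shipped detection logic: it assumes the resumed read
starts at the boundary timestamp inclusive, that the backend returns at most cap
samples, and that a full backend page which trims down to nothing is the signal
for an oversized tie cluster. The names TieClusterOverflow and
check_resumed_page are hypothetical.

```python
class TieClusterOverflow(Exception):
    """Raised instead of silently returning an empty (no-data) page."""


def check_resumed_page(raw_ts, boundary, skip, cap):
    """Trim already-returned boundary ties from a resumed backend page.

    raw_ts   -- timestamps the backend returned, starting at `boundary`
    boundary -- timestamp of the last sample on the previous page
    skip     -- how many samples at `boundary` were already returned
    cap      -- the per-read sample cap (0 means "all values")
    """
    head = 0
    while head < len(raw_ts) and head < skip and raw_ts[head] == boundary:
        head += 1
    trimmed = raw_ts[head:]
    if cap > 0 and len(raw_ts) == cap and not trimmed:
        # The backend page was full yet held only ties already emitted, so the
        # cursor cannot move past this timestamp with the current cap.
        raise TieClusterOverflow(
            f"tie cluster at {boundary} exceeds cap {cap}; retry with a larger cap"
        )
    return trimmed
```

A page such as [5, 5, 6] with skip=2 trims to [6] and proceeds normally, while
[5, 5, 5] with skip=3 and cap=3 raises rather than returning an empty result.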
The Wonderware historian backend is single-shot — it returns up to
NumValuesPerNode samples with a null continuation point — so paging is
synthesised server-side, time-based, for the only count-capped arm (Raw):
- A full page (count == NumValuesPerNode, NumValuesPerNode > 0) emits an
  opaque 16-byte continuation point and stores a resume cursor; a short page
  (or NumValuesPerNode == 0 "all values") emits none.
- A resume read takes the stored cursor, reads the next page from the boundary
  forward, and emits a fresh CP only if that page is also full.
- The resume cursor is tie-safe (HistoryPaging.ComputeResumeCursor /
  TrimBoundaryDuplicates): the next page resumes from the boundary timestamp
  INCLUSIVE and drops the head ties already returned, so samples sharing the
  boundary SourceTimestamp are neither duplicated nor skipped.
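
The paging rules above can be sketched in Python as follows. This is an
illustration under assumptions, not the HistoryPaging implementation: the
function names only echo the ones mentioned, the cursor is modelled as a
(boundary timestamp, ties already returned) pair, "full" is judged on the
backend's untrimmed page, and backend_read is a hypothetical in-memory fake of
the single-shot historian.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    ts: int
    value: str


Cursor = Tuple[int, int]  # (boundary timestamp, ties at it already returned)


def compute_resume_cursor(page: List[Sample], prev: Optional[Cursor]) -> Cursor:
    if not page:
        raise ValueError("page must be non-empty")
    boundary = page[-1].ts
    ties = sum(1 for s in page if s.ts == boundary)
    if prev is not None and prev[0] == boundary:
        ties += prev[1]                  # still inside the same tie cluster
    return (boundary, ties)


def trim_boundary_duplicates(raw: List[Sample], cursor: Cursor) -> List[Sample]:
    boundary, skip = cursor
    head = 0
    while head < len(raw) and head < skip and raw[head].ts == boundary:
        head += 1                        # drop head ties already returned
    return raw[head:]


def backend_read(series: List[Sample], start: int, cap: int) -> List[Sample]:
    """Fake single-shot backend: up to cap samples with ts >= start."""
    hits = [s for s in series if s.ts >= start]
    return hits[:cap] if cap > 0 else hits


def read_page(series, start, cap, cursor: Optional[Cursor] = None):
    raw = backend_read(series, cursor[0] if cursor else start, cap)
    page = trim_boundary_duplicates(raw, cursor) if cursor else raw
    full = cap > 0 and len(raw) == cap
    next_cursor = compute_resume_cursor(page, cursor) if full and page else None
    return page, next_cursor


def read_all(series, start, cap):
    out, cursor = [], None
    while True:
        page, cursor = read_page(series, start, cap, cursor)
        out.extend(page)
        if cursor is None:
            return out
```

With ties at timestamps 3 and 5 and a cap of 3, read_all returns every sample
exactly once and in order. The sketch keeps tie clusters smaller than the cap; a
cluster that fills a whole backend page would stall this cursor.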
Continuation points are bound to the OPC UA session via the SDK's
ISession.SaveHistoryContinuationPoint / RestoreHistoryContinuationPoint store
(SessionHistoryContinuationStore), which is capped by
ServerConfiguration.MaxHistoryContinuationPoints (default 100, oldest-evicted)
and disposed on session close. releaseContinuationPoints is honoured via an
override of HistoryReleaseContinuationPoints (the base dispatcher routes
release-only reads there, never to the per-details arms). Resuming from an
unknown / evicted / released point fails with BadContinuationPointInvalid.
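
The store's contract can be sketched in Python as below. This is an illustrative
model, not the SDK or the project's store: ContinuationStore and its methods are
hypothetical names, random 16-byte tokens stand in for the opaque continuation
point, and the sketch assumes that restoring a point consumes it.

```python
import os
from collections import OrderedDict


class BadContinuationPointInvalid(Exception):
    """Stand-in for the OPC UA status code of the same name."""


class ContinuationStore:
    """Hypothetical per-session store with a cap and oldest-first eviction."""

    def __init__(self, max_points=100):
        self._max = max_points
        self._points = OrderedDict()      # token -> resume cursor, oldest first

    def save(self, cursor):
        token = os.urandom(16)            # opaque 16-byte continuation point
        self._points[token] = cursor
        while len(self._points) > self._max:
            self._points.popitem(last=False)   # evict the oldest
        return token

    def restore(self, token):
        # Assumption: a restored point is consumed (one-shot).
        try:
            return self._points.pop(token)
        except KeyError:
            raise BadContinuationPointInvalid(token.hex()) from None

    def release(self, token):
        self._points.pop(token, None)     # releasing an unknown point is a no-op

    def close(self):
        self._points.clear()              # session close drops everything
```

Unknown, evicted, and released tokens all take the same KeyError path in
restore(), which is why the three cases share one status code in this model.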
Processed and AtTime stay single-shot: neither details type carries a client
count cap, so the single-shot backend returns the complete result in one read
and there is no "full page" signal to page on (spec-conformant). Modified-value
history remains out of scope.
The pure paging decisions + CP store contract are unit-tested via HistoryPaging
+ InMemoryHistoryContinuationStore; the full multi-page round trip is driven
end-to-end through the node manager with an in-memory store + a series-backed
fake historian (the in-process harness is session-less).