The 2026-06-25 re-review observed that the gateway server no longer uses the
Contracts-generated Galaxy types (it consumes the wire-identical types from the
ZB.MOM.WW.GalaxyRepository package). That made galaxy_repository.proto look like
dead code. It is not: Go/Rust/Java/Python clients compile it by path, the .NET
client consumes Contracts.Proto.Galaxy.* via project reference, and
clients/proto/proto-inputs.json publishes it. Document this at the csproj entry
and in CLAUDE.md so it is not deleted in a future cleanup.
Resolves Server-059, Tests-041, Server-060 (2026-06-25 re-review).
DashboardSnapshotService memoized the whole Galaxy summary keyed on the cache
Sequence, but the shared library bumps Sequence only on a heavy refresh: the
steady-state tick, the refresh-failure path, and the age-based status getter all
replace the entry via 'previous with { ... }' at the SAME Sequence. The dashboard
therefore froze LastQueriedAt and kept showing Healthy, with no error, for the
entire duration of a Galaxy SQL outage.
Split DashboardGalaxySummaryProjector into ComputeBreakdown (the O(N)
template/category work, the only sequence-bound part) and BuildSummary (cheap volatile
fields copied fresh). ResolveGalaxySummary now memoizes only the breakdown by
Sequence and rebuilds the summary from the current entry each tick. Removed the
redundant DashboardGalaxyProjector wrapper.
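A minimal Python sketch of the split, assuming a cache entry that carries `sequence`, `status`, `last_error`, `last_queried_at` and a tuple of `(template, category)` pairs. `CacheEntry`, `compute_breakdown` and `GalaxySummaryResolver` are hypothetical names, not the gateway's C# types. Only the expensive breakdown is memoized by sequence; the volatile fields are read from the current entry on every call.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    # Hypothetical stand-in for the shared library's hierarchy cache entry.
    sequence: int
    status: str
    last_error: Optional[str]
    last_queried_at: float
    objects: tuple = field(default_factory=tuple)  # (template, category) pairs


def compute_breakdown(entry):
    # O(N) work that depends only on the object set, so it is sequence-bound.
    templates = Counter(template for template, _ in entry.objects)
    categories = Counter(category for _, category in entry.objects)
    return templates, categories


class GalaxySummaryResolver:
    def __init__(self):
        self._cached_sequence = None
        self._cached_breakdown = None
        self.breakdown_computations = 0

    def resolve(self, entry):
        # Memoize only the expensive breakdown, keyed on Sequence.
        if entry.sequence != self._cached_sequence:
            self._cached_breakdown = compute_breakdown(entry)
            self._cached_sequence = entry.sequence
            self.breakdown_computations += 1
        templates, categories = self._cached_breakdown
        # Volatile fields are copied fresh from the current entry every time,
        # so a same-sequence status/error/timestamp change is always visible.
        return {
            "status": entry.status,
            "last_error": entry.last_error,
            "last_queried_at": entry.last_queried_at,
            "top_templates": templates.most_common(),
            "categories": dict(categories),
        }
```

Two entries with the same sequence but a different status reuse the breakdown and still report the new status, which is exactly the regression the tests below pin.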
Tests: same-sequence status/error/timestamp changes are now reflected (the regression);
memoization-hit and sequence-invalidation guards; GatewayApplicationTests asserts
the DI container resolves IGalaxyBrowseScopeProvider to GatewayBrowseScopeProvider
(pins the registration-order invariant over the library's no-op default).
Add GalaxyRepositoryHostWiringTests.BrowseChildren_BrowseSubtreesConstraintThroughHostWiring_FiltersChildren:
constructs the real GatewayRequestIdentityAccessor + GatewayBrowseScopeProvider, passes the
provider as IGalaxyBrowseScopeProvider to the lib GalaxyRepositoryGrpcService, and asserts
two children when unconstrained, then none with BrowseSubtrees=["NonExistent"],
proving the full production authz-filtering chain is correctly wired.
Strengthen DashboardSnapshotServiceTests.GetSnapshot_ProjectsGalaxySummaryFromHierarchyCache:
add Assert.Equal(2, TopTemplates.Count) and Assert.Contains($Area, InstanceCount==1) so the
test guards the complete summary output, not just the $Pump entry.
The package's ZB.MOM.WW.GalaxyRepository namespace shadows the bare
'GalaxyRepository' class reference in this inline test (CS0234). The lib
now owns the identical mapping test, so remove the duplicate (pulls one
file of Task 10's test reconciliation forward to keep Tests compiling).
SQLitePCLRaw.lib.e_sqlite3 2.1.11 (transitive via Microsoft.Data.Sqlite)
carries GHSA-2m69-gcr7-jv3q, surfacing as NU1903 warning-as-error and
breaking the build (already red on main). No patched e_sqlite3 exists yet.
Targeted NuGetAuditSuppress keeps all other transitive packages audited.
Add packageSourceMapping entry for ZB.MOM.WW.GalaxyRepository in nuget.config
and add the PackageReference (Version 0.2.0) to the Server csproj. Also set
NuGetAuditMode=direct to suppress the pre-existing NU1903 transitive vulnerability
in SQLitePCLRaw.lib.e_sqlite3 2.1.11 (no upstream fix available yet).
- Server-057: extend []-suffix normalization to AddItemBulk/AddBufferedItem so bulk-added
array tags bind write-capable handles (authz check, worker bind, and registration kept
consistent); update gateway.md + client READMEs. Tests: AddItemBulk/AddBufferedItem wiring.
- Server-058: assert []-fallback-resolved bare array names are still denied when out of
read/write scope and that MaxWriteClassification is enforced on suffixed array registrations.
- Contracts-023/024/025: round-trip + field-19 descriptor pin for MxSparseArray; document
MxSparseArray in docs/Contracts.md; enumerate it in the protocol-version-3 test summary.
- Tests-040: add wiring tests for the six uncovered sparse-write arms (WriteSecured, Write2,
WriteSecured2, Write2Bulk, WriteSecuredBulk, WriteSecured2Bulk).
dotnet build + targeted tests green (184 passed).
Guard against proto uint32 total_length values that exceed Array.MaxLength
before casting; the previous checked cast threw OverflowException (gRPC
Internal) instead of the intended InvalidArgument. Adds tests for the new
guard and for the null-value ArgumentNullException path, and removes the
checked keyword (redundant after the guard).
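The guard can be illustrated with a small Python sketch. `ARRAY_MAX_LENGTH` mirrors .NET's `Array.MaxLength` (0x7FFFFFC7); `InvalidArgumentError` and `validate_total_length` are hypothetical stand-ins for the RpcException mapping and the C# guard.

```python
ARRAY_MAX_LENGTH = 0x7FFFFFC7  # .NET Array.MaxLength (2,147,483,591)


class InvalidArgumentError(ValueError):
    """Hypothetical stand-in for an RpcException with StatusCode.InvalidArgument."""


def validate_total_length(total_length):
    # A proto uint32 spans 0..0xFFFFFFFF, but no .NET array can be longer than
    # Array.MaxLength, so oversized values are rejected as a client error
    # before any cast or allocation happens.
    if total_length > ARRAY_MAX_LENGTH:
        raise InvalidArgumentError(
            f"total_length {total_length} exceeds the maximum array length {ARRAY_MAX_LENGTH}"
        )
    # Past the guard the value always fits in a signed 32-bit int, which is
    # why a checked cast adds nothing.
    return int(total_length)
```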
Client.Java-040..048, Worker.Tests-034/035/036. Edits applied on the Mac,
which has no JRE and cannot build the x86+MXAccess worker tests; findings are
marked In Progress pending gradle + x86 build verification on windev. Do not
mark Resolved until verified there.
Server-054/055/056, Contracts-020/021/022, Tests-036/038/039,
IntegrationTests-030/031/032 (+033 deferred to live rig),
Client.Dotnet-026/028/029 (+027 won't-fix), Client.Go-030..034,
Client.Python-032..036, Client.Rust-033..038.
Key fix: SessionEventDistributor orphaned a subscriber that registered after
the pump completed but before disposal (Server-056) -> register paths now
complete late registrants under _lifecycleLock; regression test added. The
racy dashboard-mirror gRPC test is now deterministic (Tests-039).
Verified green locally: gateway Tests targeted classes (GatewaySession,
SessionEventDistributor, GatewayOptionsValidator, ProtobufContractRoundTrip,
GatewaySessionDashboardMirror) + dotnet/go/python/rust client suites.
Adds WorkerReadyWaitTimeoutMs to SessionOptions (default 0 = disabled),
validates >= 0 in GatewayOptionsValidator, documents it in
GatewayConfiguration.md, and adds validator + default-value tests.
No wait/poll logic is implemented here (that is Task 8).
The hierarchy query returned deployed objects only (deployed_package_id <> 0), so
areas whose containing area is undeployed were orphaned and hidden from /browse —
on wonder, only the lone deployed root area surfaced. Include category-13 Area
objects regardless of deployment, and in GalaxyHierarchyIndex re-root any object
whose parent is absent from the set (e.g. a deleted container area) so nothing
disappears under a phantom parent id.
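The re-rooting rule can be sketched in Python as below, assuming each object is an `(object_id, parent_id)` pair and that a parent id of 0 marks a top-level object. `build_hierarchy` is a hypothetical helper, not GalaxyHierarchyIndex itself.

```python
def build_hierarchy(objects):
    """objects: iterable of (object_id, parent_id) pairs; parent_id 0 means root.

    Returns (roots, children). Any object whose parent is not in the set is
    re-rooted instead of hanging under a phantom parent id.
    """
    objects = list(objects)  # iterated twice below
    ids = {object_id for object_id, _ in objects}
    roots = []
    children = {}
    for object_id, parent_id in objects:
        if parent_id == 0 or parent_id not in ids:
            roots.append(object_id)
        else:
            children.setdefault(parent_id, []).append(object_id)
    return roots, children
```

For example, an area whose container (id 99) was deleted shows up as a root, and its own children stay attached to it.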
- EventStreamService: remove dead per-item sequence guard in the replay
loop (RegisterWithReplay already returns only events > afterSequence)
and correct the comment that falsely claimed a "per-item constraint
filter" is applied; the event stream has no per-event constraint
filtering today.
- SessionEventDistributor.RegisterWithReplay: set oldestAvailableSequence=0
when gap==false so the implementation matches the documented contract
(OldestAvailableSequence is meaningful only when Gap is true).
Update the two RegisterWithReplay tests that asserted the old non-zero
value in the no-gap path.
- RegisterSubscriber: remove stray blank line at method entry.
- SessionEventDistributorTests: add
RegisterWithReplay_AfterDispose_ThrowsObjectDisposedException to pin
nested-lock disposal behavior.
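A Python sketch of the replay contract described above, assuming the distributor retains events as ascending `(sequence, payload)` pairs and that a gap means the oldest retained event is newer than `after_sequence + 1`. `ReplayResult` and `register_with_replay` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayResult:
    events: list
    gap: bool
    oldest_available_sequence: int  # meaningful only when gap is True, else 0


def register_with_replay(buffer, after_sequence):
    """buffer: retained (sequence, payload) pairs in ascending sequence order."""
    # Only events newer than after_sequence are returned, so the caller's
    # replay loop needs no per-item sequence guard of its own.
    replay = [(seq, payload) for seq, payload in buffer if seq > after_sequence]
    oldest = buffer[0][0] if buffer else None
    # A gap exists when events right after after_sequence were already evicted.
    gap = oldest is not None and oldest > after_sequence + 1
    return ReplayResult(replay, gap, oldest if gap else 0)
```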
- EffectiveSessionConfiguration: add DetachGraceSeconds field; GatewayConfigurationProvider
forwards value.Sessions.DetachGraceSeconds (blocker fix).
- GatewaySession.InvokeAsync and ReadEventsAsync: switch TouchClientActivity calls from
DateTimeOffset.UtcNow to _eventStreaming.TimeProvider.GetUtcNow() so Task 12 fake-clock
control works end-to-end (split-clock fix).
- TOCTOU fix: add TryBeginCloseIfExpired(now, out alreadyClosing) to GatewaySession that
re-checks IsLeaseExpiredCore/IsDetachGraceExpiredCore AND _activeEventSubscriberCount==0
under _syncRoot before transitioning to Closing; CloseExpiredLeasesAsync calls it before
CloseSessionCoreAsync so a reattach that wins the race leaves the session Ready/usable.
- Minors: lease-expiry-takes-precedence comment in CloseExpiredLeasesAsync; TOCTOU comment
block; sweep-cycle latency note added to SessionOptions.DetachGraceSeconds XML doc and to
GatewayConfiguration.md DetachGraceSeconds row.
- New tests: TryBeginCloseIfExpired_ReattachedSubscriberWinsRace_DeclinesClose (GatewaySession),
CloseExpiredLeasesAsync_DoesNotCloseSessionThatReattachedBeforeSweepCloses (SessionManager);
the guard itself relies on the new IsLeaseExpiredCore/IsDetachGraceExpiredCore private helpers.
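The re-check-under-lock pattern can be modeled in Python as follows. `Session`, its fields and `try_begin_close_if_expired` are hypothetical analogues of GatewaySession and TryBeginCloseIfExpired; the close condition follows the description above (lease or detach grace expired, and no active event subscriber).

```python
import threading


class Session:
    """Hypothetical model of the TOCTOU guard: decide and transition under one lock."""

    def __init__(self, lease_expires_at, detach_grace_expires_at=None):
        self._sync_root = threading.Lock()
        self.state = "Ready"
        self.lease_expires_at = lease_expires_at
        self.detach_grace_expires_at = detach_grace_expires_at
        self.active_event_subscriber_count = 0

    def _is_lease_expired(self, now):
        return now >= self.lease_expires_at

    def _is_detach_grace_expired(self, now):
        return (self.detach_grace_expires_at is not None
                and now >= self.detach_grace_expires_at)

    def try_begin_close_if_expired(self, now):
        """Returns (began_closing, already_closing)."""
        with self._sync_root:
            if self.state in ("Closing", "Closed"):
                return False, True
            # Re-check under the lock: a reattach may have landed between the
            # sweep's unlocked expiry scan and this call.
            expired = self._is_lease_expired(now) or self._is_detach_grace_expired(now)
            if not expired or self.active_event_subscriber_count != 0:
                return False, False
            self.state = "Closing"
            return True, False
```

A sweep that saw the session as expired but lost the race to a reattach gets `(False, False)` back and leaves the session Ready.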
Add two comment-only clarifications to mxaccess_gateway.proto (no field/number changes):
1. MxEvent.replay_gap: states the sentinel is ONLY ever set on StreamEvents events
and is ALWAYS unset on DrainEventsReply events, preventing Task 12 from
accidentally emitting it on the drain path and removing any client ambiguity.
2. ReplayGap.oldest_available_sequence: clarifies that the value IS retained and
replayable, and that a client resumes gap-free by setting
after_worker_sequence = oldest_available_sequence - 1 in the next
StreamEventsRequest (receiving events starting at oldest_available_sequence).
Regenerated Generated/MxaccessGateway.cs (comment-only XML-doc change).
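A client-side sketch of the resume rule in Python. `next_after_worker_sequence` is a hypothetical helper; it assumes the client tracks the last worker sequence it processed and receives `oldest_available_sequence` from ReplayGap when a gap is reported.

```python
def next_after_worker_sequence(last_seen_sequence, replay_gap_oldest=None):
    """Choose after_worker_sequence for the next StreamEventsRequest.

    replay_gap_oldest is ReplayGap.oldest_available_sequence when the server
    reported a gap, else None. Returns (after_sequence, lost_range), where
    lost_range is the inclusive (first, last) span of unrecoverable sequences.
    """
    if replay_gap_oldest is None:
        return last_seen_sequence, None
    lost = None
    if replay_gap_oldest > last_seen_sequence + 1:
        lost = (last_seen_sequence + 1, replay_gap_oldest - 1)
    # oldest_available_sequence is retained, so resuming from oldest - 1
    # yields events starting exactly at oldest_available_sequence.
    return replay_gap_oldest - 1, lost
```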
Replace Task.Delay(100) subscriber-attachment races with WaitForSubscriberCountAsync,
a polling gate on GatewaySession.ActiveEventSubscriberCount so Advise and event fan-out
cannot proceed until all subscribers are confirmed registered.
Fix WaitForMessageCountAsync to honor a single CancellationTokenSource deadline across
the poll loop rather than resetting the timeout on each intermediate wakeup.
Add ordering comment in the cancellation test explaining why stream1Task must be awaited
before AllowNextEvent to guarantee sub1 is unregistered before the 2nd event is fanned out.
Assert capException.Status.Detail contains "maximum" in the cap test to distinguish
EventSubscriberLimitReached (AllowMultiple=true cap) from EventSubscriberAlreadyActive
(single-subscriber rejection) — both map to ResourceExhausted.
Extract shared ConfigureCommandReply helper and move FakeWorkerProcess to TestSupport/
so both fake-worker test classes reference one definition.
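The single-deadline polling pattern, sketched in Python with `time.monotonic` in place of a CancellationTokenSource timeout. `wait_for_count` is a hypothetical helper, not the test suite's WaitForMessageCountAsync or WaitForSubscriberCountAsync.

```python
import time


def wait_for_count(get_count, expected, timeout_s=5.0, poll_interval_s=0.01):
    """Poll until get_count() >= expected, honoring ONE overall deadline.

    The deadline is computed once, so intermediate wakeups never extend the
    overall timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        count = get_count()
        if count >= expected:
            return count
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"expected {expected} but saw {count} after {timeout_s}s")
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))
```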
Adds GatewayEndToEndMultiSubscriberTests covering three scenarios
through the real gRPC StreamEvents path with AllowMultipleEventSubscribers=true:
- Fan-out: two concurrent StreamEvents RPCs both receive every event the fake
worker emits, in the same order (WorkerSequence and value match at each index).
- Independent cancellation: cancelling one subscriber's stream leaves the other
receiving subsequent events; the session stays usable.
- Cap enforcement: with MaxEventSubscribersPerSession=2 a third concurrent
StreamEvents is rejected with gRPC ResourceExhausted while the first two
keep streaming.
Extends RecordingServerStreamWriter<T> with WaitForMessageCountAsync to
allow deterministic bounded-timeout awaits for an N-message count without
fixed sleeps.
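A Python model of the three behaviors these tests pin: ordered fan-out, independent unsubscribe, and a per-session cap. `FanOutDistributor` and `ResourceExhaustedError` are hypothetical; the real gateway surfaces the cap as a gRPC ResourceExhausted status.

```python
import queue
import threading


class ResourceExhaustedError(RuntimeError):
    """Stand-in for gRPC StatusCode.ResourceExhausted."""


class FanOutDistributor:
    """Hypothetical fan-out with a per-session subscriber cap."""

    def __init__(self, max_subscribers):
        self._lock = threading.Lock()
        self._max = max_subscribers
        self._subscribers = []

    def subscribe(self):
        with self._lock:
            if len(self._subscribers) >= self._max:
                raise ResourceExhaustedError(
                    f"maximum of {self._max} event subscribers reached"
                )
            q = queue.Queue()
            self._subscribers.append(q)
            return q

    def unsubscribe(self, q):
        # Removing one subscriber leaves the others untouched.
        with self._lock:
            self._subscribers.remove(q)

    def publish(self, event):
        with self._lock:
            targets = list(self._subscribers)
        # Every current subscriber gets every event, in publish order.
        for q in targets:
            q.put(event)
```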
Remove the per-call allowMultipleSubscribers param from AttachEventSubscriber and
derive the mode internally from _eventStreaming.AllowMultipleEventSubscribers — the
same source SessionEventDistributor uses for singleSubscriberMode — so the two can
never structurally diverge. The maxSubscribers cap param is kept because
MaxEventSubscribersPerSession lives in SessionOptions, which the session does not hold
directly (only EventOptions flows through SessionEventStreaming).
Other nits:
- SubscriberCount XML doc clarifies it includes internal subscribers and differs from
GatewaySession.ActiveEventSubscriberCount (external/gRPC only).
- SingleSubscriberMode_LoneExternalOverflow test: add Assert.Equal(1, observedSet) guard
before the value assertion so the test cannot pass vacuously if the handler never fired.
- GatewayOptionsValidator.ValidateSessions: add explanatory code comment documenting why
!AllowMultipleEventSubscribers && MaxEventSubscribersPerSession > 1 is NOT rejected as
a hard error (the default config ships with this combination; the cap is simply unused
in single-subscriber mode, not a behavior bug).
- GatewaySession.DetachEventSubscriber: add Debug.Assert before the clamp so a genuine
double-decrement surfaces in debug builds.
Add MaxEventSubscribersPerSession (value 8) to the Sessions block of the
Configuration Shape JSON example in GatewayConfiguration.md, matching the
appsettings.json default the options table already documents. Assert both
MaxEventSubscribersPerSession (8) and MaxPendingCommandsPerSession (128)
defaults in GatewayOptionsTests.OptionsBinding_UsesDesignDefaults.
Remove the hard-rejection of AllowMultipleEventSubscribers=true in GatewayOptionsValidator
(fan-out is now implemented via SessionEventDistributor). Add MaxEventSubscribersPerSession
(default 8, must be >= 1) to SessionOptions, validate it, expose it in
EffectiveSessionConfiguration / GatewayConfigurationProvider, document it in
GatewayConfiguration.md and appsettings.json. Tests cover the no-error path for
AllowMultipleEventSubscribers=true, the 0/-1 rejection, positive pass, and default pass.
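The validation rule reduces to a small check, sketched in Python. `validate_sessions` and its error text are hypothetical; the sketch encodes only what the change states: values below 1 are rejected, and AllowMultipleEventSubscribers=true no longer is.

```python
def validate_sessions(allow_multiple_event_subscribers, max_event_subscribers_per_session):
    """Return a list of validation errors (empty when the options are valid)."""
    errors = []
    if max_event_subscribers_per_session < 1:
        errors.append("MaxEventSubscribersPerSession must be >= 1.")
    # allow_multiple_event_subscribers is accepted either way: fan-out exists,
    # and with False a cap > 1 is simply unused in single-subscriber mode.
    return errors
```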
(1) GatewaySession.StartDashboardMirror: publish _dashboardMirrorLease and _dashboardMirrorTask
atomically under one _syncRoot section; if the session is already Closing/Closed/Faulted,
dispose the just-created lease and return without starting the mirror task so nothing is orphaned.
(2) WaitUntilAsync test helper: catch OperationCanceledException and call Assert.Fail with the
timeout duration and predicate source text instead of letting the exception propagate raw.
(3) New SessionEventDistributorTests.InternalSubscriberOverflow_HandlerSeesIsOnlySubscriberFalse:
verifies CountExternalSubscribers excludes the internal subscriber, so isOnlySubscriber==false
even when the internal subscriber is the only registered subscriber.
(4) SubscriberOverflowHandler delegate gains isInternal parameter; overflow metric label is
"dashboard-mirror" for internal subscribers and "grpc-event-stream" for external ones.
(5) DashboardEventBroadcaster.Publish: wrap SendAsync Task acquisition in try/catch so a
synchronous throw cannot escape the never-throw Publish interface contract.
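Items (3) and (4) can be sketched in Python as below. `Subscriber`, `count_external_subscribers` and `on_subscriber_overflow` are hypothetical names; the sketch assumes a subscriber counts as "only" when exactly one external subscriber is registered, so the internal dashboard mirror never contributes.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    name: str
    is_internal: bool  # True for the dashboard mirror, False for gRPC streams


def count_external_subscribers(subscribers):
    return sum(1 for s in subscribers if not s.is_internal)


def on_subscriber_overflow(overflowing, subscribers, metrics):
    """Hypothetical overflow handler: label the metric and compute isOnlySubscriber."""
    label = "dashboard-mirror" if overflowing.is_internal else "grpc-event-stream"
    metrics[label] = metrics.get(label, 0) + 1
    # The internal mirror is excluded from the count, so when it is the only
    # registered subscriber the count is 0 and the result is False.
    return count_external_subscribers(subscribers) == 1
```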
- Issue 1: document the isOnlySubscriber snapshot race-safety assumption in
OnSubscriberOverflow; flags the Task 7/8 revisit point explicitly.
- Issue 2: pin StreamDisconnects==1 in the FailFast overflow test so a
regression dropping the StreamDisconnected("Detached") finally call is caught.
- Issue 3: replace plain int/bool? accesses in the SlowSubscriberOverflow test
with Volatile.Read/Write and Interlocked.Increment to close the C# memory
model data race on overflowCalls and observedIsOnlySubscriber.
- Issue 4: add SlowSubscriberOverflow_WithMultipleSubscribers_... distributor
test pinning that isOnlySubscriber==false disables the session-fault path;
includes TODO(Task 8) note for the GatewaySession-level assertion.
- Issue 5: reword SubscriberOverflowHandler XML doc to make explicit that the
handler must NOT complete the subscriber's channel; the distributor owns that.
Issue 1: replace plain bool _disposed in EventSubscriberLease with an
Interlocked.Exchange int (_leaseDisposed) matching the SubscriberLease
pattern in SessionEventDistributor. Concurrent stream-completion +
client-cancellation racing Dispose() now decrements _activeEventSubscriberCount
exactly once, never to -1.
Issue 5: remove the `using` declaration on the subscriber lease in
EventStreamService.StreamEventsAsync; the finally block already disposes it
alongside the reader, so the using was a redundant second dispose on the
same code path.
Issue 2: add an inline comment at the StartAsync().GetAwaiter().GetResult()
call documenting the sync-over-async invariant (StartAsync only schedules work via
Task.Run and completes synchronously; do not make it truly async without changing
this call site).
Issue 10: remove the redundant .WithCancellation(cancellationToken) chained
on ReadEventsAsync(cancellationToken) in MapWorkerEventsAsync; the
[EnumeratorCancellation] token already flows through the direct argument.
Issue 9: add EventSubscriberLease_ConcurrentDispose_DecrementsCountExactlyOnce
to GatewaySessionTests: 200 iterations of 16 concurrent Dispose() calls on the same
lease; asserts the count is exactly 0 after each race and that a subsequent
single-subscriber AttachEventSubscriber succeeds.
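The dispose-once guard from Issue 1, sketched in Python. Python has no `Interlocked.Exchange`, so a lock-protected swap stands in for it; `EventSubscriberLease` and `FakeSession` here are hypothetical models, not the gateway classes.

```python
import threading


class FakeSession:
    def __init__(self):
        self._lock = threading.Lock()
        self.active_event_subscriber_count = 1

    def decrement_subscribers(self):
        with self._lock:
            self.active_event_subscriber_count -= 1


class EventSubscriberLease:
    """Hypothetical dispose-once lease over a session's subscriber count."""

    def __init__(self, session):
        self._session = session
        self._flag_lock = threading.Lock()
        self._lease_disposed = 0

    def _exchange_disposed(self, value):
        # Emulates Interlocked.Exchange(ref _leaseDisposed, value): atomically
        # store the new value and return the previous one.
        with self._flag_lock:
            previous, self._lease_disposed = self._lease_disposed, value
            return previous

    def dispose(self):
        if self._exchange_disposed(1) == 0:
            # Only the first caller sees 0, so the count drops exactly once.
            self._session.decrement_subscribers()
```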