Commit Graph

496 Commits

Author SHA1 Message Date
Joseph Doherty 4ab3bd55e5 fix(server): single-clock poll loop + drop dead sync GetReadyWorkerClient (Task 8 review) 2026-06-16 17:04:18 -04:00
Joseph Doherty 70d2842c16 test(java-cli): add in-process gRPC harness fixture 2026-06-16 16:55:48 -04:00
Joseph Doherty 4966ef3359 feat(server): bounded worker-ready wait in GatewaySession (default off) 2026-06-16 16:48:02 -04:00
Joseph Doherty 0efa7d8cca test(java-cli): cover secured/secured2/bench bulk + close-session 2026-06-16 16:42:28 -04:00
Joseph Doherty ea17528767 feat(server): add MxGateway:Sessions:WorkerReadyWaitTimeoutMs (default off)
Adds WorkerReadyWaitTimeoutMs to SessionOptions (default 0 = disabled),
validates >= 0 in GatewayOptionsValidator, documents it in
GatewayConfiguration.md, and adds validator + default-value tests.
No wait/poll logic is implemented here (that is Task 8).
2026-06-16 16:38:31 -04:00
Joseph Doherty 1cfad83c06 test(java-cli): cover read-bulk/write-bulk/write2-bulk round trips 2026-06-16 16:33:42 -04:00
Joseph Doherty 76ffd5c9a3 docs: implementation plan for stillpending §8 completion (9 tasks) 2026-06-16 16:29:49 -04:00
Joseph Doherty 6030bfa18e docs: design for stillpending §8 completion (Approach C)
Also codify targeted-test-per-task rule in CLAUDE.md Source Update Workflow.
2026-06-16 16:19:49 -04:00
Joseph Doherty 82755a3623 docs(stillpending): reflect session-resilience merge (multi-subscriber resolved, reconnect core) + disable-login flag 2026-06-16 15:53:11 -04:00
Joseph Doherty 121ab7e263 fix(galaxy): include undeployed areas in browse + re-root orphaned objects
The hierarchy query returned deployed objects only (deployed_package_id <> 0), so
areas whose containing area is undeployed were orphaned and hidden from /browse —
on wonder, only the lone deployed root area surfaced. Include category-13 Area
objects regardless of deployment, and in GalaxyHierarchyIndex re-root any object
whose parent is absent from the set (e.g. a deleted container area) so nothing
disappears under a phantom parent id.
2026-06-16 12:49:03 -04:00
Joseph Doherty ca443b1903 test+docs(dashboard): assert hub policy under disable-login; correct warning/doc wording 2026-06-16 09:30:13 -04:00
Joseph Doherty 7a2da4d8b6 chore(plan): mark dashboard disable-login tasks complete 2026-06-16 09:23:50 -04:00
Joseph Doherty a0b21ca225 refactor(dashboard): consistent effective-user in disable-login warning; doc the config read 2026-06-16 09:23:13 -04:00
Joseph Doherty 3690e4c2ca feat(dashboard): swap to auto-login handler when DisableLogin is set 2026-06-16 09:14:27 -04:00
Joseph Doherty 1d652b24c6 refactor(dashboard): normalize auto-login user in ctor; clarify claim-shape doc; add custom-user test 2026-06-16 08:23:14 -04:00
Joseph Doherty ee1423db7a docs: document dashboard DisableLogin / AutoLoginUser dev flag 2026-06-16 08:16:45 -04:00
Joseph Doherty 4993057ed5 feat(dashboard): add auto-login auth handler for DisableLogin mode 2026-06-16 08:14:51 -04:00
Joseph Doherty 073252d7a6 feat(dashboard): add DisableLogin + AutoLoginUser options (default off) 2026-06-16 08:11:10 -04:00
Joseph Doherty a894717319 docs(plan): dashboard disable-login implementation plan + tasks 2026-06-16 07:54:43 -04:00
Joseph Doherty 0856cd4f93 docs: dashboard disable-login design + save session-resilience tasklist snapshot 2026-06-16 07:47:31 -04:00
Joseph Doherty c446bef64f chore(plan): mark session-resilience tasks 1-12 complete 2026-06-16 07:29:17 -04:00
Joseph Doherty c7a7cd1e5e fix(sessions): tidy replay filter/comment; zero OldestAvailableSequence when no gap
- EventStreamService: remove dead per-item sequence guard in the replay
  loop (RegisterWithReplay already returns only events > afterSequence)
  and correct the comment that falsely claimed a "per-item constraint
  filter" is applied; the event stream has no per-event constraint
  filtering today.
- SessionEventDistributor.RegisterWithReplay: set oldestAvailableSequence=0
  when gap==false so the implementation matches the documented contract
  (OldestAvailableSequence is meaningful only when Gap is true).
  Update the two RegisterWithReplay tests that asserted the old non-zero
  value in the no-gap path.
- RegisterSubscriber: remove stray blank line at method entry.
- SessionEventDistributorTests: add RegisterWithReplay_AfterDispose_
  ThrowsObjectDisposedException to pin nested-lock disposal behavior.
2026-06-16 07:28:37 -04:00
Joseph Doherty 36ab8d15f1 feat(sessions): replay-on-reconnect with ReplayGap sentinel 2026-06-16 07:22:19 -04:00
Joseph Doherty 042f5e3d82 fix(sessions): expose DetachGraceSeconds in effective-config; single clock; close reconnect-vs-sweep race
- EffectiveSessionConfiguration: add DetachGraceSeconds field; GatewayConfigurationProvider
  forwards value.Sessions.DetachGraceSeconds (blocker fix).
- GatewaySession.InvokeAsync and ReadEventsAsync: switch TouchClientActivity calls from
  DateTimeOffset.UtcNow to _eventStreaming.TimeProvider.GetUtcNow() so Task 12 fake-clock
  control works end-to-end (split-clock fix).
- TOCTOU fix: add TryBeginCloseIfExpired(now, out alreadyClosing) to GatewaySession that
  re-checks IsLeaseExpiredCore/IsDetachGraceExpiredCore AND _activeEventSubscriberCount==0
  under _syncRoot before transitioning to Closing; CloseExpiredLeasesAsync calls it before
  CloseSessionCoreAsync so a reattach that wins the race leaves the session Ready/usable.
- Minors: lease-expiry-takes-precedence comment in CloseExpiredLeasesAsync; TOCTOU comment
  block; sweep-cycle latency note added to SessionOptions.DetachGraceSeconds XML doc and to
  GatewayConfiguration.md DetachGraceSeconds row.
- New tests: TryBeginCloseIfExpired_ReattachedSubscriberWinsRace_DeclinesClose (GatewaySession),
  CloseExpiredLeasesAsync_DoesNotCloseSessionThatReattachedBeforeSweepCloses (SessionManager),
  plus IsLeaseExpiredCore/IsDetachGraceExpiredCore private helpers used by the guard.
2026-06-16 07:11:59 -04:00
Joseph Doherty db95f8644f feat(sessions): detach-grace retention window for reconnect 2026-06-16 06:15:46 -04:00
Joseph Doherty 85e4334bb7 docs(contracts): clarify ReplayGap drain-exclusion and resume-boundary semantics
Add two comment-only clarifications to mxaccess_gateway.proto (no field/number changes):
1. MxEvent.replay_gap: states the sentinel is ONLY ever set on StreamEvents events
   and is ALWAYS unset on DrainEventsReply events, preventing Task 12 from
   accidentally emitting it on the drain path and removing any client ambiguity.
2. ReplayGap.oldest_available_sequence: clarifies that the value IS retained and
   replayable, and that a client resumes gap-free by setting
   after_worker_sequence = oldest_available_sequence - 1 in the next
   StreamEventsRequest (receiving events starting at oldest_available_sequence).
Regenerated Generated/MxaccessGateway.cs (comment-only XML-doc change).
2026-06-16 05:09:47 -04:00
Joseph Doherty 9beb67c1e9 feat(contracts): add ReplayGap signal to MxEvent for reconnect replay 2026-06-16 05:04:17 -04:00
Joseph Doherty 056bb39a4d test(gateway): deterministic multi-subscriber test sync + cap-rejection specificity
Replace Task.Delay(100) subscriber-attachment races with WaitForSubscriberCountAsync,
a polling gate on GatewaySession.ActiveEventSubscriberCount so Advise and event fan-out
cannot proceed until all subscribers are confirmed registered.

Fix WaitForMessageCountAsync to honor a single CancellationTokenSource deadline across
the poll loop rather than resetting the timeout on each intermediate wakeup.

Add ordering comment in the cancellation test explaining why stream1Task must be awaited
before AllowNextEvent to guarantee sub1 is unregistered before the 2nd event is fanned.

Assert capException.Status.Detail contains "maximum" in the cap test to distinguish
EventSubscriberLimitReached (AllowMultiple=true cap) from EventSubscriberAlreadyActive
(single-subscriber rejection) — both map to ResourceExhausted.

Extract shared ConfigureCommandReply helper and move FakeWorkerProcess to TestSupport/
so both fake-worker test classes reference one definition.
2026-06-15 16:34:12 -04:00
Joseph Doherty 9dd97a27f1 test(gateway): end-to-end multi-subscriber fan-out via fake worker
Adds GatewayEndToEndMultiSubscriberTests covering three scenarios
through the real gRPC StreamEvents path with AllowMultipleEventSubscribers=true:
- Fan-out: two concurrent StreamEvents RPCs both receive every event the fake
  worker emits, in the same order (WorkerSequence matches, values indexed).
- Independent cancellation: cancelling one subscriber's stream leaves the other
  receiving subsequent events; the session stays usable.
- Cap enforcement: with MaxEventSubscribersPerSession=2 a third concurrent
  StreamEvents is rejected with gRPC ResourceExhausted while the first two
  keep streaming.

Extends RecordingServerStreamWriter<T> with WaitForMessageCountAsync to
allow deterministic bounded-timeout awaits for an N-message count without
fixed sleeps.
2026-06-15 16:09:58 -04:00
Joseph Doherty 281e00b300 refactor(sessions): derive subscriber mode from session config; close Task 8 review nits
Remove the per-call allowMultipleSubscribers param from AttachEventSubscriber and
derive the mode internally from _eventStreaming.AllowMultipleEventSubscribers — the
same source SessionEventDistributor uses for singleSubscriberMode — so the two can
never structurally diverge. The maxSubscribers cap param is kept because
MaxEventSubscribersPerSession lives in SessionOptions, which the session does not hold
directly (only EventOptions flows through SessionEventStreaming).

Other nits:
- SubscriberCount XML doc clarifies it includes internal subscribers and differs from
  GatewaySession.ActiveEventSubscriberCount (external/gRPC only).
- SingleSubscriberMode_LoneExternalOverflow test: add Assert.Equal(1, observedSet) guard
  before the value assertion so the test cannot pass vacuously if the handler never fired.
- GatewayOptionsValidator.ValidateSessions: add explanatory code comment documenting why
  !AllowMultipleEventSubscribers && MaxEventSubscribersPerSession > 1 is NOT rejected as
  a hard error (the default config ships with this combination; the cap is simply unused
  in single-subscriber mode, not a behavior bug).
- GatewaySession.DetachEventSubscriber: add Debug.Assert before the clamp so a genuine
  double-decrement surfaces in debug builds.
2026-06-15 15:53:27 -04:00
Joseph Doherty ac42783e36 feat(sessions): multi-subscriber cap enforcement + mode-gated FailFast 2026-06-15 15:32:08 -04:00
Joseph Doherty 7b12eebbd1 docs: add MaxEventSubscribersPerSession to config shape example; assert its default
Add MaxEventSubscribersPerSession (value 8) to the Sessions block of the
Configuration Shape JSON example in GatewayConfiguration.md, matching the
appsettings.json default the options table already documents. Assert both
MaxEventSubscribersPerSession (8) and MaxPendingCommandsPerSession (128)
defaults in GatewayOptionsTests.OptionsBinding_UsesDesignDefaults.
2026-06-15 15:16:44 -04:00
Joseph Doherty bd190ab012 feat(config): allow multiple event subscribers + add MaxEventSubscribersPerSession cap
Remove the hard-rejection of AllowMultipleEventSubscribers=true in GatewayOptionsValidator
(fan-out is now implemented via SessionEventDistributor). Add MaxEventSubscribersPerSession
(default 8, must be >= 1) to SessionOptions, validate it, expose it in
EffectiveSessionConfiguration / GatewayConfigurationProvider, document it in
GatewayConfiguration.md and appsettings.json. Tests cover the no-error path for
AllowMultipleEventSubscribers=true, the 0/-1 rejection, positive pass, and default pass.
2026-06-15 15:13:21 -04:00
Joseph Doherty 2ead9bc200 fix(dashboard): close StartDashboardMirror/DisposeAsync race; internal-overflow test + metric label
(1) GatewaySession.StartDashboardMirror: publish _dashboardMirrorLease and _dashboardMirrorTask
    atomically under one _syncRoot section; if the session is already Closing/Closed/Faulted,
    dispose the just-created lease and return without starting the mirror task so nothing is orphaned.
(2) WaitUntilAsync test helper: catch OperationCanceledException and call Assert.Fail with the
    timeout duration and predicate source text instead of letting the exception propagate raw.
(3) New SessionEventDistributorTests.InternalSubscriberOverflow_HandlerSeesIsOnlySubscriberFalse:
    verifies CountExternalSubscribers excludes the internal subscriber, so isOnlySubscriber==false
    even when the internal subscriber is the only registered subscriber.
(4) SubscriberOverflowHandler delegate gains isInternal parameter; overflow metric label is
    "dashboard-mirror" for internal subscribers and "grpc-event-stream" for external ones.
(5) DashboardEventBroadcaster.Publish: wrap SendAsync Task acquisition in try/catch so a
    synchronous throw cannot escape the never-throw Publish interface contract.
2026-06-15 15:02:36 -04:00
Joseph Doherty 1ea08c3b10 feat(dashboard): mirror events via SessionEventDistributor subscriber (fixes dark feed without gRPC client) 2026-06-15 14:42:32 -04:00
Joseph Doherty 4f43733b96 test(sessions): document overflow race safety + close backpressure coverage gaps
- Issue 1: document the isOnlySubscriber snapshot race-safety assumption in
  OnSubscriberOverflow; flags the Task 7/8 revisit point explicitly.
- Issue 2: pin StreamDisconnects==1 in the FailFast overflow test so a
  regression dropping the StreamDisconnected("Detached") finally call is caught.
- Issue 3: replace plain int/bool? reads in SlowSubscriberOverflow test with
  Volatile.Read/Write + Interlocked.Increment stores to close the C# memory
  model data race on overflowCalls and observedIsOnlySubscriber.
- Issue 4: add SlowSubscriberOverflow_WithMultipleSubscribers_... distributor
  test pinning that isOnlySubscriber==false disables the session-fault path;
  includes TODO(Task 8) note for the GatewaySession-level assertion.
- Issue 5: reword SubscriberOverflowHandler XML doc to make explicit that the
  handler must NOT complete the subscriber's channel; the distributor owns that.
2026-06-15 13:46:37 -04:00
Joseph Doherty 039111ca05 feat(sessions): per-subscriber backpressure isolation in SessionEventDistributor 2026-06-15 13:39:25 -04:00
Joseph Doherty 61627fc5b0 fix(sessions): make EventSubscriberLease dispose atomic; dedupe lease dispose
Issue 1: replace plain bool _disposed in EventSubscriberLease with an
Interlocked.Exchange int (_leaseDisposed) matching the SubscriberLease
pattern in SessionEventDistributor. Concurrent stream-completion +
client-cancellation racing Dispose() now decrements _activeEventSubscriberCount
exactly once, never to -1.

Issue 5: remove the `using` declaration on the subscriber lease in
EventStreamService.StreamEventsAsync; the finally block already disposes it
alongside the reader, so the using was a redundant second dispose on the
same code path.

Issue 2: add an inline comment at the StartAsync().GetAwaiter().GetResult()
call documenting the sync-over-async invariant (StartAsync only schedules via
Task.Run and is synchronous; do not make it truly async without changing
this call site).

Issue 10: remove the redundant .WithCancellation(cancellationToken) chained
on ReadEventsAsync(cancellationToken) in MapWorkerEventsAsync; the
[EnumeratorCancellation] token already flows through the direct argument.

Issue 9: add EventSubscriberLease_ConcurrentDispose_DecrementsCountExactlyOnce
to GatewaySessionTests — 16 concurrent Dispose() calls on the same lease for
200 iterations; asserts count is exactly 0 after each race and a subsequent
single-subscriber AttachEventSubscriber succeeds.
2026-06-15 13:29:27 -04:00
Joseph Doherty 7f1018bac1 feat(sessions): route event streaming through SessionEventDistributor 2026-06-15 13:18:28 -04:00
Joseph Doherty c2c518862f fix(sessions): replay-buffer gap edge cases, effective-config exposure, capacity-0 tests
#2: Replace afterSequence+1<oldestRetained with overflow-safe oldestRetained>0&&afterSequence<oldestRetained-1 to prevent ulong wrap at MaxValue falsely reporting gap=true.
#3: Add ReplayBufferCapacity and ReplayRetentionSeconds to EffectiveEventConfiguration and populate from EventOptions in GatewayConfigurationProvider.
#4: Add four new SessionEventDistributorTests covering capacity=0 gap/no-gap paths and the ulong.MaxValue boundary case.
#5: Update class-level <remarks> to describe the Task 3 replay ring buffer (capacity + age eviction, TryGetReplayFrom) rather than its absence.
#6: Add O(n)-is-acceptable comment at TryGetReplayFrom linear scan.
#8: Narrow no-replay 4-arg ctor to internal; InternalsVisibleTo already covers the test project.
2026-06-15 12:48:11 -04:00
Joseph Doherty e962737d2c feat(sessions): add bounded replay ring buffer to SessionEventDistributor 2026-06-15 12:42:15 -04:00
Joseph Doherty 7773bdebbd fix(sessions): close SessionEventDistributor dispose/register races + add overflow logging 2026-06-15 12:37:39 -04:00
Joseph Doherty c79b292968 feat(sessions): add SessionEventDistributor (pump + per-subscriber fan-out skeleton) 2026-06-15 12:32:13 -04:00
Joseph Doherty a43b2ee6af test(sessions): cover OwnerKeyId service-layer forwarding; doc 11-param ctor
Add LastOwnerKeyId capture to FakeSessionManager and assert it equals
"operator01" in OpenSession_WithValidRequest_ReturnsSessionDetails, closing
the gap where OwnerKeyId threading through the service layer had no test
coverage. Add a <remarks> to the 11-param GatewaySession convenience ctor
documenting that OwnerKeyId is null there and authenticated call sites must
use the 12-param overload.
2026-06-15 12:29:16 -04:00
Joseph Doherty f5479f3ca3 feat(sessions): record OwnerKeyId on session creation
Add a nullable string? OwnerKeyId property to GatewaySession that captures
the API key identifier (KeyId) of the authenticated caller that opened the
session. Wire it through ISessionManager.OpenSessionAsync → SessionManager
→ GatewaySession constructor. The gRPC service passes identityAccessor
.Current?.KeyId; internal callers (GatewayAlarmMonitor, DashboardLiveDataService)
pass null. Covers the positive and null cases with two new TDD-first tests.
2026-06-15 12:24:29 -04:00
Joseph Doherty 00c849e63b docs: session-resilience implementation plan (28 tasks, 5 phases) 2026-06-15 12:15:34 -04:00
Joseph Doherty 3fc6ccad30 docs: session-resilience epic design (fan-out, reconnect, ACL, reattach) 2026-06-15 12:11:46 -04:00
Joseph Doherty 0e4843612b feat(python): add --parent drill-down to galaxy-browse for 5/5 CLI parity
Add --parent-gobject-id (integer) to the galaxy-browse CLI command so the
Python client matches the Go (-parent) and Rust (--parent-gobject-id) CLIs.
When set, drives BrowseChildren paging via browse_children_raw (page size 500,
repeated-token guard) and renders the same JSON node shape (flattened object
fields + hasChildrenHint + empty children array) and indented-text tree as the
root-walk path. --depth is ignored on the parent path with a one-line stderr
warning, matching the Go/Rust behaviour. Tests added in TDD order.
2026-06-15 11:45:44 -04:00
Joseph Doherty a56ce0ddbd docs: refresh stillpending.md for completed work; record residuals (§7/E2)
Mark §1.1 (11 worker commands), §1.2 (audit CorrelationId), and §4 client
CLI/helper parity as Resolved with commit refs; correct §4.4 (dotnet version
already worked). Record open residuals: §1.3 live failover counter, §3.2
multi-sample buffered conversion, §1.4 vendor-stub ack, DrainEvents snapshot
semantics.
2026-06-15 11:35:48 -04:00
Joseph Doherty f7ada90359 test(integration): harden B8 live assertions (ArchestrAUserToId user_id, bootstrap arrival, split guard)
Fix 1 (Important): assert ArchestrAUserToId Ok-path payload carries a non-zero user_id, mirroring the AuthenticateUser pattern.
Fix 2 (Important): assert bootstrapBufferedEvents > 0 before the residual return so the "empty NoData bootstrap arrives" claim is verified, not just assumed.
Fix 3 (Minor): change SplitLiveItemForBuffered guard from lastDot <= 0 to lastDot < 0 so a leading-dot reference ".TestInt" (lastDot==0) is not mis-handled as undotted.
2026-06-15 11:30:15 -04:00