
11 Commits

Author SHA1 Message Date
Joseph Doherty ac42783e36 feat(sessions): multi-subscriber cap enforcement + mode-gated FailFast 2026-06-15 15:32:08 -04:00
Joseph Doherty 2ead9bc200 fix(dashboard): close StartDashboardMirror/DisposeAsync race; internal-overflow test + metric label
(1) GatewaySession.StartDashboardMirror: publish _dashboardMirrorLease and _dashboardMirrorTask
    atomically under one _syncRoot section; if the session is already Closing/Closed/Faulted,
    dispose the just-created lease and return without starting the mirror task so nothing is orphaned.
(2) WaitUntilAsync test helper: catch OperationCanceledException and call Assert.Fail with the
    timeout duration and predicate source text instead of letting the exception propagate raw.
(3) New SessionEventDistributorTests.InternalSubscriberOverflow_HandlerSeesIsOnlySubscriberFalse:
    verifies CountExternalSubscribers excludes the internal subscriber, so isOnlySubscriber==false
    even when the internal subscriber is the only registered subscriber.
(4) SubscriberOverflowHandler delegate gains isInternal parameter; overflow metric label is
    "dashboard-mirror" for internal subscribers and "grpc-event-stream" for external ones.
(5) DashboardEventBroadcaster.Publish: wrap SendAsync Task acquisition in try/catch so a
    synchronous throw cannot escape the never-throw Publish interface contract.
2026-06-15 15:02:36 -04:00
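
Item (1) of the commit above describes publishing the mirror lease and task under a single lock section, and disposing the fresh lease if the session is already shutting down. The following is a minimal sketch of that pattern in Python, not the repository's C# code. The `Lease` and `Session` classes, the `run_mirror` callback, and the choice to start the task after leaving the lock are all assumptions made for illustration.

```python
import threading


class Lease:
    """Hypothetical subscriber lease; records whether it was disposed."""

    def __init__(self):
        self.disposed = False

    def dispose(self):
        self.disposed = True


class Session:
    """Hypothetical stand-in for a session with a dashboard mirror."""

    TERMINAL_STATES = ("Closing", "Closed", "Faulted")

    def __init__(self, state="Ready"):
        self._sync_root = threading.Lock()
        self.state = state
        self.mirror_lease = None
        self.mirror_task = None

    def start_dashboard_mirror(self, run_mirror):
        lease = Lease()  # the lease exists before the state check
        with self._sync_root:
            if self.state in self.TERMINAL_STATES:
                accepted = False
            else:
                # Publish both fields in the same critical section so a
                # concurrent dispose never observes one without the other.
                task = threading.Thread(
                    target=run_mirror, args=(lease,), daemon=True
                )
                self.mirror_lease = lease
                self.mirror_task = task
                accepted = True
        if not accepted:
            lease.dispose()  # nothing was published, so nothing is orphaned
            return lease
        task.start()
        return lease
```

A session constructed with `state="Closed"` returns a disposed lease and leaves both fields as `None`; a session in any other state publishes both fields together.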
Joseph Doherty 1ea08c3b10 feat(dashboard): mirror events via SessionEventDistributor subscriber (fixes dark feed without gRPC client) 2026-06-15 14:42:32 -04:00
Joseph Doherty 039111ca05 feat(sessions): per-subscriber backpressure isolation in SessionEventDistributor 2026-06-15 13:39:25 -04:00
Joseph Doherty 61627fc5b0 fix(sessions): make EventSubscriberLease dispose atomic; dedupe lease dispose
Issue 1: replace plain bool _disposed in EventSubscriberLease with an
Interlocked.Exchange int (_leaseDisposed) matching the SubscriberLease
pattern in SessionEventDistributor. Concurrent stream-completion +
client-cancellation racing Dispose() now decrements _activeEventSubscriberCount
exactly once, never to -1.

Issue 5: remove the `using` declaration on the subscriber lease in
EventStreamService.StreamEventsAsync; the finally block already disposes it
alongside the reader, so the using was a redundant second dispose on the
same code path.

Issue 2: add an inline comment at the StartAsync().GetAwaiter().GetResult()
call documenting the sync-over-async invariant (StartAsync only schedules work
via Task.Run and completes synchronously; do not make it truly async without
changing this call site).

Issue 10: remove the redundant .WithCancellation(cancellationToken) chained
on ReadEventsAsync(cancellationToken) in MapWorkerEventsAsync; the
[EnumeratorCancellation] token already flows through the direct argument.

Issue 9: add EventSubscriberLease_ConcurrentDispose_DecrementsCountExactlyOnce
to GatewaySessionTests — 16 concurrent Dispose() calls on the same lease for
200 iterations; asserts count is exactly 0 after each race and a subsequent
single-subscriber AttachEventSubscriber succeeds.
2026-06-15 13:29:27 -04:00
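
Issues 1 and 9 of the commit above concern an exactly-once dispose under concurrent callers. The sketch below illustrates the idea in Python rather than the repository's C#. It emulates the atomic exchange with a lock-guarded test-and-set, and every name in it (`SubscriberCounter`, `Lease`, `race_dispose`) is hypothetical.

```python
import threading


class SubscriberCounter:
    """Hypothetical stand-in for the session's active-subscriber count."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def add(self, delta):
        with self._lock:
            self.count += delta


class Lease:
    """Lease whose dispose() decrements the counter at most once."""

    def __init__(self, counter):
        self._counter = counter
        self._flag_lock = threading.Lock()
        self._disposed = False
        counter.add(1)

    def dispose(self):
        # Atomic test-and-set: read the old flag and set the new one in one
        # step, playing the role an atomic exchange has in the commit message.
        with self._flag_lock:
            already_disposed = self._disposed
            self._disposed = True
        if already_disposed:
            return
        self._counter.add(-1)


def race_dispose(thread_count=16):
    """Dispose one lease from many threads at once; return the final count."""
    counter = SubscriberCounter()
    lease = Lease(counter)
    barrier = threading.Barrier(thread_count)

    def worker():
        barrier.wait()  # release all threads together to provoke the race
        lease.dispose()

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.count
```

With a plain unguarded boolean, two callers could both see "not disposed" and both decrement, driving the count to -1. The test-and-set ensures only the first caller decrements.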
Joseph Doherty 7f1018bac1 feat(sessions): route event streaming through SessionEventDistributor 2026-06-15 13:18:28 -04:00
Joseph Doherty a43b2ee6af test(sessions): cover OwnerKeyId service-layer forwarding; doc 11-param ctor
Add LastOwnerKeyId capture to FakeSessionManager and assert it equals
"operator01" in OpenSession_WithValidRequest_ReturnsSessionDetails, closing
the gap where OwnerKeyId threading through the service layer had no test
coverage. Add a <remarks> to the 11-param GatewaySession convenience ctor
documenting that OwnerKeyId is null there and authenticated call sites must
use the 12-param overload.
2026-06-15 12:29:16 -04:00
Joseph Doherty f5479f3ca3 feat(sessions): record OwnerKeyId on session creation
Add a nullable string? OwnerKeyId property to GatewaySession that captures
the API key identifier (KeyId) of the authenticated caller that opened the
session. Wire it through ISessionManager.OpenSessionAsync → SessionManager
→ GatewaySession constructor. The gRPC service passes identityAccessor
.Current?.KeyId; internal callers (GatewayAlarmMonitor, DashboardLiveDataService)
pass null. Covers the positive and null cases with two new TDD-first tests.
2026-06-15 12:24:29 -04:00
Joseph Doherty 615b487a77 docs+ui: backfill XML doc comments and finish dashboard layout pass
Adds missing <summary>/<param> XML docs across 99 server, worker, and test
files so CommentChecker reports zero issues (TreatWarningsAsErrors requires a
clean analyzer run). Bundles in WIP dashboard work: NavSection extraction,
MainLayout/site.css/js styling alignment, and DashboardOptions/Auth tweaks.
2026-05-27 14:20:10 -04:00
Joseph Doherty 4d77279e7e Resolve Server-044..050: KillWorker accounting + admin service hardening
Server-044  KillWorkerAsync catch path now calls _metrics.SessionRemoved
            so the open-session gauge does not leak when KillWorker throws.
Server-045  KillWorkerAsync routes through a new
            GatewaySession.KillWorkerWithCloseGateAsync that takes the
            per-session close lock, so concurrent kills count SessionsClosed
            exactly once.
Server-046  CloseSessionCoreAsync's SessionCloseStartedException branch and
            ShutdownAsync's kill fallback both increment SessionsClosed (not
            just the gauge), so the counter and gauge stay consistent.
Server-047  ApiKeysPage.ConfirmPendingAsync holds PendingAction across the
            awaited action and clears it in finally, matching the sessions
            pages.
Server-048  Closed: the 044/045 regression tests cover the previously
            untested kill paths.
Server-049  IDashboardSessionAdminService + DashboardSessionAdminService
            now carry XML docs that pin the Admin gate, missing-session
            return-Fail semantics, and the dashboard-admin-kill reason.
Server-050  CloseSessionAsync and KillWorkerAsync catch unexpected
            exceptions after the SessionManagerException catches and return
            a friendly Fail; OperationCanceledException tied to the caller
            token still propagates.

All resolved on 2026-05-24; 503/503 gateway tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:49:34 -04:00
Joseph Doherty dc9c0c950c rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx
Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.

External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
  MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths

Also fixes two tests that were not rename-related but became visible
while validating the rename:

- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
  gateway service correctly maps to RpcException(Cancelled) per gRPC
  convention was being misclassified as a stream fault. Added a sibling
  catch on RpcException with StatusCode.Cancelled.

- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
  and made it accept either a .git marker OR a .sln/.slnx next to src/
  so the worker-exe walker works in non-git working copies.

clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.

Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
  Tests: 472/472 pass
  Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
  IntegrationTests: 18/18 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:22:23 -04:00
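
The repository-root check described in the last commit (a `.git` marker, or a `.sln`/`.slnx` file next to `src/`) can be sketched as below. This is an illustrative Python version, not the C# test helper itself. The function names mirror the commit message, but their signatures and the upward-walk logic are assumptions.

```python
from pathlib import Path


def is_repository_root(directory: Path) -> bool:
    """Return True if the directory looks like the repository root."""
    # A .git entry may be a directory or, in some checkouts, a file.
    if (directory / ".git").exists():
        return True
    # Otherwise accept a solution file sitting next to a src/ folder.
    has_solution = any(directory.glob("*.sln")) or any(directory.glob("*.slnx"))
    return has_solution and (directory / "src").is_dir()


def resolve_repository_root(start: Path):
    """Walk upward from start; return the first matching directory or None."""
    for candidate in [start, *start.parents]:
        if is_repository_root(candidate):
            return candidate
    return None
```

Accepting the solution-file marker is what lets the walker succeed in a working copy that has no `.git` entry, such as an exported archive.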