Commit Graph

463 Commits

Author SHA1 Message Date
Joseph Doherty 2ead9bc200 fix(dashboard): close StartDashboardMirror/DisposeAsync race; internal-overflow test + metric label
(1) GatewaySession.StartDashboardMirror: publish _dashboardMirrorLease and _dashboardMirrorTask
    atomically under one _syncRoot section; if the session is already Closing/Closed/Faulted,
    dispose the just-created lease and return without starting the mirror task so nothing is orphaned.
(2) WaitUntilAsync test helper: catch OperationCanceledException and call Assert.Fail with the
    timeout duration and predicate source text instead of letting the exception propagate raw.
(3) New SessionEventDistributorTests.InternalSubscriberOverflow_HandlerSeesIsOnlySubscriberFalse:
    verifies CountExternalSubscribers excludes the internal subscriber, so isOnlySubscriber==false
    even when the internal subscriber is the only registered subscriber.
(4) SubscriberOverflowHandler delegate gains isInternal parameter; overflow metric label is
    "dashboard-mirror" for internal subscribers and "grpc-event-stream" for external ones.
(5) DashboardEventBroadcaster.Publish: wrap SendAsync Task acquisition in try/catch so a
    synchronous throw cannot escape the never-throw Publish interface contract.
2026-06-15 15:02:36 -04:00
Joseph Doherty 1ea08c3b10 feat(dashboard): mirror events via SessionEventDistributor subscriber (fixes dark feed without gRPC client) 2026-06-15 14:42:32 -04:00
Joseph Doherty 4f43733b96 test(sessions): document overflow race safety + close backpressure coverage gaps
- Issue 1: document the isOnlySubscriber snapshot race-safety assumption in
  OnSubscriberOverflow; flags the Task 7/8 revisit point explicitly.
- Issue 2: pin StreamDisconnects==1 in the FailFast overflow test so a
  regression dropping the StreamDisconnected("Detached") finally call is caught.
- Issue 3: replace plain int/bool? reads in SlowSubscriberOverflow test with
  Volatile.Read/Write + Interlocked.Increment stores to close the C# memory
  model data race on overflowCalls and observedIsOnlySubscriber.
- Issue 4: add SlowSubscriberOverflow_WithMultipleSubscribers_... distributor
  test pinning that isOnlySubscriber==false disables the session-fault path;
  includes TODO(Task 8) note for the GatewaySession-level assertion.
- Issue 5: reword SubscriberOverflowHandler XML doc to make explicit that the
  handler must NOT complete the subscriber's channel; the distributor owns that.
2026-06-15 13:46:37 -04:00
Joseph Doherty 039111ca05 feat(sessions): per-subscriber backpressure isolation in SessionEventDistributor 2026-06-15 13:39:25 -04:00
Joseph Doherty 61627fc5b0 fix(sessions): make EventSubscriberLease dispose atomic; dedupe lease dispose
Issue 1: replace plain bool _disposed in EventSubscriberLease with an
Interlocked.Exchange int (_leaseDisposed) matching the SubscriberLease
pattern in SessionEventDistributor. Concurrent stream-completion +
client-cancellation racing Dispose() now decrements _activeEventSubscriberCount
exactly once, never to -1.

Issue 5: remove the `using` declaration on the subscriber lease in
EventStreamService.StreamEventsAsync; the finally block already disposes it
alongside the reader, so the using was a redundant second dispose on the
same code path.

Issue 2: add an inline comment at the StartAsync().GetAwaiter().GetResult()
call documenting the sync-over-async invariant (StartAsync only schedules via
Task.Run and is synchronous; do not make it truly async without changing
this call site).

Issue 10: remove the redundant .WithCancellation(cancellationToken) chained
on ReadEventsAsync(cancellationToken) in MapWorkerEventsAsync; the
[EnumeratorCancellation] token already flows through the direct argument.

Issue 9: add EventSubscriberLease_ConcurrentDispose_DecrementsCountExactlyOnce
to GatewaySessionTests — 16 concurrent Dispose() calls on the same lease for
200 iterations; asserts count is exactly 0 after each race and a subsequent
single-subscriber AttachEventSubscriber succeeds.
2026-06-15 13:29:27 -04:00
Joseph Doherty 7f1018bac1 feat(sessions): route event streaming through SessionEventDistributor 2026-06-15 13:18:28 -04:00
Joseph Doherty c2c518862f fix(sessions): replay-buffer gap edge cases, effective-config exposure, capacity-0 tests
#2: Replace afterSequence+1<oldestRetained with overflow-safe oldestRetained>0&&afterSequence<oldestRetained-1 to prevent ulong wrap at MaxValue falsely reporting gap=true.
#3: Add ReplayBufferCapacity and ReplayRetentionSeconds to EffectiveEventConfiguration and populate from EventOptions in GatewayConfigurationProvider.
#4: Add four new SessionEventDistributorTests covering capacity=0 gap/no-gap paths and the ulong.MaxValue boundary case.
#5: Update class-level <remarks> to describe the Task 3 replay ring buffer (capacity + age eviction, TryGetReplayFrom) rather than its absence.
#6: Add O(n)-is-acceptable comment at TryGetReplayFrom linear scan.
#8: Narrow no-replay 4-arg ctor to internal; InternalsVisibleTo already covers the test project.
2026-06-15 12:48:11 -04:00
Joseph Doherty e962737d2c feat(sessions): add bounded replay ring buffer to SessionEventDistributor 2026-06-15 12:42:15 -04:00
Joseph Doherty 7773bdebbd fix(sessions): close SessionEventDistributor dispose/register races + add overflow logging 2026-06-15 12:37:39 -04:00
Joseph Doherty c79b292968 feat(sessions): add SessionEventDistributor (pump + per-subscriber fan-out skeleton) 2026-06-15 12:32:13 -04:00
Joseph Doherty a43b2ee6af test(sessions): cover OwnerKeyId service-layer forwarding; doc 11-param ctor
Add LastOwnerKeyId capture to FakeSessionManager and assert it equals
"operator01" in OpenSession_WithValidRequest_ReturnsSessionDetails, closing
the gap where OwnerKeyId threading through the service layer had no test
coverage. Add a <remarks> to the 11-param GatewaySession convenience ctor
documenting that OwnerKeyId is null there and authenticated call sites must
use the 12-param overload.
2026-06-15 12:29:16 -04:00
Joseph Doherty f5479f3ca3 feat(sessions): record OwnerKeyId on session creation
Add a nullable string? OwnerKeyId property to GatewaySession that captures
the API key identifier (KeyId) of the authenticated caller that opened the
session. Wire it through ISessionManager.OpenSessionAsync → SessionManager
→ GatewaySession constructor. The gRPC service passes identityAccessor
.Current?.KeyId; internal callers (GatewayAlarmMonitor, DashboardLiveDataService)
pass null. Covers the positive and null cases with two new TDD-first tests.
2026-06-15 12:24:29 -04:00
Joseph Doherty 00c849e63b docs: session-resilience implementation plan (28 tasks, 5 phases) 2026-06-15 12:15:34 -04:00
Joseph Doherty 3fc6ccad30 docs: session-resilience epic design (fan-out, reconnect, ACL, reattach) 2026-06-15 12:11:46 -04:00
Joseph Doherty 0e4843612b feat(python): add --parent drill-down to galaxy-browse for 5/5 CLI parity
Add --parent-gobject-id (integer) to the galaxy-browse CLI command so the
Python client matches the Go (-parent) and Rust (--parent-gobject-id) CLIs.
When set, drives BrowseChildren paging via browse_children_raw (page size 500,
repeated-token guard) and renders the same JSON node shape (flattened object
fields + hasChildrenHint + empty children array) and indented-text tree as the
root-walk path. --depth is ignored on the parent path with a one-line stderr
warning, matching the Go/Rust behaviour. Tests added in TDD order.
2026-06-15 11:45:44 -04:00
Joseph Doherty a56ce0ddbd docs: refresh stillpending.md for completed work; record residuals (§7/E2)
Mark §1.1 (11 worker commands), §1.2 (audit CorrelationId), and §4 client
CLI/helper parity as Resolved with commit refs; correct §4.4 (dotnet version
already worked). Record open residuals: §1.3 live failover counter, §3.2
multi-sample buffered conversion, §1.4 vendor-stub ack, DrainEvents snapshot
semantics.
2026-06-15 11:35:48 -04:00
Joseph Doherty f7ada90359 test(integration): harden B8 live assertions (ArchestrAUserToId user_id, bootstrap arrival, split guard)
Fix 1 (Important): assert ArchestrAUserToId Ok-path payload carries a non-zero user_id, mirroring the AuthenticateUser pattern.
Fix 2 (Important): assert bootstrapBufferedEvents > 0 before the residual return so the "empty NoData bootstrap arrives" claim is verified, not just assumed.
Fix 3 (Minor): change SplitLiveItemForBuffered guard from lastDot <= 0 to lastDot < 0 so a leading-dot reference ".TestInt" (lastDot==0) is not mis-handled as undotted.
2026-06-15 11:30:15 -04:00
Joseph Doherty efd99718d7 test(integration): live COM command smoke + buffered capture (B8) 2026-06-15 11:19:12 -04:00
Joseph Doherty b298ca74be fix(java): picocli ParameterException for browse --depth; warn on --parent 0
Replaces the raw IllegalArgumentException thrown by GalaxyBrowseCommand for
--depth < 0 with a CommandLine.ParameterException so picocli surfaces a clean
single-line error instead of an unhandled stack trace. Adds an upper bound of
50 (matching the Python client) so --depth > 50 is also rejected cleanly.

Emits a stderr warning when --parent 0 is supplied explicitly, matching
Go/Rust client behaviour, because gobject id 0 is the server's root-walk
sentinel and passing it via --parent is almost always a mistake.

Adds three new tests: negative depth, depth > 50, and the --parent 0 warning path.
2026-06-15 11:08:07 -04:00
Joseph Doherty 0d5b488c11 feat(java): add ping + galaxy-browse CLI subcommands and galaxy command aliases
- D4: add 'ping' subcommand (MX_COMMAND_KIND_PING / PingCommand{message}),
  accepting --session-id and optional --message (default "ping"); prints the
  worker's echoed diagnostic message.
- D8-java: add 'galaxy-browse' subcommand over browse()/LazyBrowseNode.expand()
  and raw BrowseChildren paging for --parent. JSON node shape matches the
  cross-client surface (flattened object fields + hasChildrenHint + nested
  children array).
- D9-java: make galaxy-test-connection / galaxy-last-deploy the primary names,
  keeping galaxy-test / galaxy-deploy-time as deprecated picocli aliases.
- Tests for ping, galaxy-browse JSON hasChildrenHint key, and alias resolution.
- README updated for the new/renamed subcommands.
2026-06-15 10:58:04 -04:00
Joseph Doherty bb5139fec2 test(gateway): fake worker responds to control commands (A6)
Add RespondToControlCommandAsync to FakeWorkerHarness so scripted fake
workers can auto-reply to the five control command kinds (Ping,
GetSessionState, GetWorkerInfo, DrainEvents, ShutdownWorker) with canned
replies whose shapes match the real WorkerPipeSession helpers.

Add five unit tests in FakeWorkerHarnessTests covering each control
command kind through the WorkerClient→pipe roundtrip, and one gateway
E2E test (GatewayService_WithFakeWorker_ControlCommandsRoundtripThroughGateway)
that exercises Ping, GetWorkerInfo, and DrainEvents through the full
gRPC→SessionManager→WorkerClient→named-pipe path using a scripted
ControlCommandFakeWorkerProcessLauncher.
2026-06-15 10:56:56 -04:00
Joseph Doherty dde9934e60 test(worker): silence CS0649 on reflection-only FakeMxStatus fields 2026-06-15 10:42:59 -04:00
Joseph Doherty 29399325d5 feat(worker): implement 6 MXAccess COM commands in executor
Wire up the previously-unimplemented Suspend, Activate, AuthenticateUser,
ArchestrAUserToId, AddBufferedItem, and SetBufferedUpdateInterval command
kinds in MxAccessCommandExecutor. These are real COM calls and run on the
STA via the executor.

- IMxAccessServer gains the 6 methods; MxAccessComServer routes them to the
  right interface version (Suspend/Activate -> ILMXProxyServer4 out MxStatus,
  AuthenticateUser -> base ILMXProxyServer, ArchestrAUserToId ->
  ILMXProxyServer2, AddBufferedItem/SetBufferedUpdateInterval ->
  ILMXProxyServer5).
- Suspend/Activate surface the native MxStatus, converted to MxStatusProxy
  via the existing MxStatusProxyConverter.
- AuthenticateUser hands the credential straight to MXAccess and never logs
  it; native HResult failures propagate via the dispatcher.
- MxAccessSession gains matching pass-throughs; AddBufferedItem registers
  the item handle in the handle registry.
- Unit tests (fake IMxAccessServer / fake COM object) cover each arm plus a
  password-non-leak assertion; existing IMxAccessServer fakes updated.

No proto changes (all request/reply messages already exist).
2026-06-15 10:41:22 -04:00
Joseph Doherty f94c206489 test(worker): use Register (a real STA command) for STA-dispatch race tests
Ping is now intercepted as a worker control command and answered on the
message-loop thread, so the dispatch/heartbeat/shutdown-race tests must use a
genuine STA-dispatched command kind to keep exercising DispatchAsync.
2026-06-15 10:25:34 -04:00
Joseph Doherty 72e1aca716 test(worker): fix control-command test helpers (CreatePipeSession overload, drop ConfigureAwait) 2026-06-15 10:23:45 -04:00
Joseph Doherty bf72cd8961 feat(worker): implement Ping/GetSessionState/GetWorkerInfo/DrainEvents/ShutdownWorker control commands
Answer the five worker control/lifecycle commands at the WorkerPipeSession
message-loop layer instead of the STA-bound MxAccessCommandExecutor. These
replies are built from process-level state (worker pid, assembly version,
worker lifecycle, the runtime session's event queue) the executor cannot see,
and ShutdownWorker must emit its OK reply before the graceful shutdown joins
the STA thread - dispatching it onto the STA would deadlock.

- Ping: OK reply, echoes message into diagnostic_message.
- GetSessionState: maps WorkerState to proto SessionState.
- GetWorkerInfo: pid, worker version, MXAccess ProgID/CLSID.
- DrainEvents: drains the runtime event queue into DrainEventsReply.
- ShutdownWorker: OK reply, then graceful shutdown, then stops the loop.

Tests added in WorkerPipeSessionTests; FakeRuntimeSession gains a
batch-size drain suppressor so DrainEvents does not race the background
drain loop.
2026-06-15 10:20:51 -04:00
Joseph Doherty 5a7f8ace77 fix(go): use hasChildrenHint JSON key for browse parity; warn on -parent 0
Rename the browse JSON key from hasChildren to hasChildrenHint to match the
Rust and Python CLIs and the library property name (HasChildrenHint). Update
the text-output label to match. Add a one-line stderr warning when -parent 0
is supplied, since 0 is the server root sentinel and omitting -parent is the
intended way to walk from the root.
2026-06-15 10:09:38 -04:00
Joseph Doherty c10faa2ee5 fix(dotnet): use hasChildrenHint JSON key for browse cross-client parity
Rename the anonymous-object member `hasChildren` → `hasChildrenHint` so the
serialized JSON key matches the Rust and Python CLIs and the library property
HasChildrenHint. Also update the text-output suffix to `hasChildrenHint=` for
consistency.
2026-06-15 10:09:36 -04:00
Joseph Doherty 7975b09325 fix(python): bound galaxy-browse --depth; assert no _text leak in JSON
Guard _galaxy_browse against unbounded recursion by rejecting --depth
values outside [0, 50] with a descriptive BadParameter. Add test coverage
for --depth 99 and --depth -1 rejection, and assert _text is never
present in the JSON output from galaxy-browse.
2026-06-15 10:09:30 -04:00
Joseph Doherty d7e2a8b3cf feat(dotnet): add galaxy-browse CLI (§4.6); chore: verify version subcommand (§4.4) 2026-06-15 10:07:24 -04:00
Joseph Doherty 39ec2a3275 feat(python): add galaxy-browse CLI subcommand (§4.6) 2026-06-15 10:00:52 -04:00
Joseph Doherty 8cb416ba30 feat(go): add galaxy-browse CLI subcommand (§4.6) 2026-06-15 10:00:36 -04:00
Joseph Doherty 55526d5e56 fix(gateway): preserve raw client correlation id in denial audit DetailsJson + add wiring test (§1.2) 2026-06-15 09:56:24 -04:00
Joseph Doherty a59fc998e3 fix(python): UTC-normalize galaxy-last-deploy output, add deploy-event collector, help text, test 2026-06-15 09:53:01 -04:00
Joseph Doherty 539e6ef2de fix(rust): warn when browse --depth ignored, extract page-size const, tidy clones
Warn on stderr when --parent-gobject-id and --depth>0 are both supplied
since depth is silently ignored in the single-level parent path. Also
updates the --depth arg doc to document this. Extracts BROWSE_PAGE_SIZE
const (500) with a cross-reference to galaxy.rs instead of a bare literal.
Removes three redundant .clone() calls in BrowseChildrenOptions construction
since the originals are not used after the struct is built.
2026-06-15 09:52:24 -04:00
Joseph Doherty 742ced7970 test(go): assert ping echo in JSON output; comment ping fallback
TestRunPingJSON now verifies the fake gateway's echoed text appears in
the serialised reply body, catching any future wiring regression that
maps PingRaw to the wrong proto field.  runPing gains a one-line comment
explaining why DiagnosticMessage carries the echo, why the kind-string
fallback exists, and why writeCommandOutput is not reused on the
plain-text path.
2026-06-15 09:52:13 -04:00
Joseph Doherty bd46ba1270 fix(test): drop removed logger arg from GalaxyRepositoryGrpcService test call sites; docs: STA phrasing
Remove the trailing NullLogger<GalaxyRepositoryGrpcService>.Instance argument
from all four CreateService/inline constructions in GalaxyRepositoryGrpcServiceTests
and GalaxyFilterInputSafetyTests, matching the now-4-param constructor after the
dead logger parameter was removed in 0032d2d. Also drop the now-unused
Microsoft.Extensions.Logging.Abstractions using from both files.

Rephrase the §5 STA blurb in docs/AlarmClientDiscovery.md: GatewayAlarmMonitor
routes polling *through* the worker's StaRuntime (which owns the STA pump) rather
than owning the pump itself.
2026-06-15 09:52:07 -04:00
Joseph Doherty 0032d2dc44 docs+chore: fix stale prose, project names, remove dead MapSqlException (§7)
- docs/plans/2026-06-14-deferred-followups.md: mark D1 as executed
  (commit 4af24b9; metric emitted at DashboardSnapshotService.cs:198);
  note D2 resolved as no-op; D3-D5 remain pending
- docs/AlarmClientDiscovery.md §5: rewrite STA "production fix needed"
  to past tense — alarms now route through GatewayAlarmMonitor/worker STA
- EventsHub.cs: replace stale "publisher side is a future follow-up"
  comment; DashboardEventBroadcaster is live and DI-registered
- CLAUDE.md: fix all project-name drift (src/MxGateway.* →
  src/ZB.MOM.WW.MxGateway.*; MxGateway.sln → ZB.MOM.WW.MxGateway.slnx;
  clients/dotnet/MxGateway.Client.sln → ZB.MOM.WW.MxGateway.Client.slnx)
- GalaxyRepositoryGrpcService.cs: remove dead MapSqlException method and
  its IDE0051 suppression pragma; drop now-unused ILogger ctor param and
  Microsoft.Data.SqlClient using; build confirmed 0 warnings/errors
2026-06-15 09:43:00 -04:00
Joseph Doherty 8415f35abd feat(gateway): thread ClientCorrelationId into constraint-denial audit (§1.2) 2026-06-15 09:42:40 -04:00
Joseph Doherty 639e36b1bc feat(rust): add browse CLI subcommand (§4.6) 2026-06-15 09:42:16 -04:00
Joseph Doherty 90529dce6e feat(go): add ping CLI subcommand (§4.3)
Add PingRaw to Session (session.go), runPing to the CLI dispatch
(main.go), and three tests covering plain-text echo, JSON output,
and missing-session-id validation (main_test.go). Default message
is "ping"; gateway echo is read from DiagnosticMessage, falling
back to the kind string if absent.
2026-06-15 09:41:40 -04:00
Joseph Doherty a211faefed feat(python): add galaxy-* CLI commands (§4.2) 2026-06-15 09:40:55 -04:00
Joseph Doherty 849f1d2f6d feat(go): add single-shot Write2 session helper (§4.1)
Add Write2/Write2Raw to the Go client Session, mirroring the existing
Write/WriteRaw pair, so all five language clients now expose write2.
Includes three TDD tests covering payload propagation, raw-reply return,
and nil-value rejection.
2026-06-15 09:40:15 -04:00
Joseph Doherty 883557fc8a docs: implementation plan for stillpending.md completion
28 tasks across 5 workstreams (A worker control cmds, B worker COM cmds,
C audit CorrelationId, D client CLI parity, E docs). Zero proto changes;
worker net48/x86 + Java on windev, rest local.
2026-06-15 09:35:50 -04:00
Joseph Doherty 4a00b1bdc1 docs: design for completing stillpending.md actionable items
Covers the 11 worker command kinds (§1.1), audit CorrelationId threading
(§1.2), client CLI/helper parity (§4), and doc hygiene (§7). Key finding:
all 11 commands already have proto/validation/scope/routing, so this is a
worker-executor + COM-wrapper + client-CLI effort with zero contract changes.
2026-06-15 09:32:01 -04:00
Joseph Doherty c7f754c77b fix(client/python): cap setuptools<77 so dist stays metadata 2.2 for Gitea PyPI feed; proprietary via classifier clients/go/v0.1.1 2026-06-15 05:30:22 -04:00
Joseph Doherty 144c293f05 chore(clients): bump all five clients 0.1.0 -> 0.1.1 for release 2026-06-15 05:07:17 -04:00
Joseph Doherty 7c957908f8 docs(code-review): regenerate index — all re-review findings resolved 2026-06-15 03:04:37 -04:00
Joseph Doherty 6659653673 fix(client/java): handle PROVIDER_STATUS alarm-feed arm — CLI build break (Client.Java-039) 2026-06-15 03:01:13 -04:00
Joseph Doherty 75a39f5a8c fix(client/java): correct browseChildrenRaw README; CLI --require-certificate-validation (Client.Java-037,038) 2026-06-15 02:56:15 -04:00