Compare commits

...

33 Commits

Author SHA1 Message Date
Joseph Doherty 6a4833bd32 Regenerate code-reviews index after Medium findings Batch B
Reflects resolution of Tests-003..006, Worker.Tests-003..007,
IntegrationTests-003..006, Client.Python-003/005/009.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:29 -04:00
Joseph Doherty e4fbbb541a Resolve Client.Python-003, -005, -009 code-review findings
Client.Python-003: stream_events_raw and query_active_alarms passed `timeout`
to the stub with no TypeError fallback, unlike _unary. Both now route through
a shared _open_stream helper that strips `timeout` on TypeError.

Client.Python-005: discover_hierarchy buffered the entire Galaxy hierarchy in
memory. Added GalaxyRepositoryClient.iter_hierarchy, a lazy async generator
yielding objects page-by-page; discover_hierarchy is now a thin wrapper that
preserves its list contract. README documents iter_hierarchy.

Client.Python-009: added regression coverage for previously untested paths —
write2/add_item2 request shape, the MAX_BULK_ITEMS boundary, the None-argument
TypeError guards, TLS ca_file reading, and the non-auth map_rpc_error fallthrough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:16 -04:00
Joseph Doherty f13f35bc79 Resolve IntegrationTests-003..006 code-review findings
IntegrationTests-003: the live MXAccess smoke test asserted on the first
streamed event, which a registration/quality bootstrap event could occupy.
The recording writer now waits for the first event matching a predicate
(Family == OnDataChange).

IntegrationTests-004: the cleanup `finally` could throw and mask an original
assertion failure. Shutdown now routes through a helper that logs cleanup
exceptions instead of propagating them.

IntegrationTests-005: added live MXAccess parity tests — a Write round-trip
to an advised item, and an invalid-handle command surfacing the MXAccess
failure without a transport fault.

IntegrationTests-006: added live LDAP failure-path tests — wrong password
(no password leak), unknown username, and server-unreachable.

docs/GatewayTesting.md updated to describe the new cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:11 -04:00
Joseph Doherty 18ce2922e2 Resolve Worker.Tests-003..007 code-review findings
Worker.Tests-003: removed the wall-clock `Elapsed < 2s` assertion from
InvokeAsync_WakesIdlePumpForQueuedCommand; the awaited completion against a
30s idle period already proves the wake event drove dispatch.

Worker.Tests-004: MxAccessStaSession.Dispose now joins the alarm poll task
after cancelling the CTS (consistent with ShutdownGracefullyAsync), and
Dispose_StopsAlarmPollLoop asserts deterministically instead of via Task.Delay.

Worker.Tests-005: undisposed MemoryStream instances across the frame-protocol
and pipe-session tests are now `using` declarations.

Worker.Tests-006: Dispose_StopsAlarmPollLoop now constructs MxAccessStaSession
with `using` so a failed assertion cannot leak the STA poll loop.

Worker.Tests-007: docs/WorkerFrameProtocol.md verification section corrected
to target MxGateway.Worker.Tests / MxGateway.Worker with -p:Platform=x86.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:01 -04:00
Joseph Doherty 5ade3f4f48 Resolve Tests-003, -004, -005, -006 code-review findings
Tests-003: temp auth-DB directories leaked under %TEMP%. Added the
TempDatabaseDirectory IDisposable helper (clears the Sqlite connection pool,
then recursively deletes); SqliteAuthStoreTests and ApiKeyAdminCliRunnerTests
now dispose every directory they create.

Tests-004: added end-to-end coverage composing the real authorization
interceptor in front of the real MxAccessGatewayService, plus scope-resolver
tests confirming an unmapped request type fails closed to the admin scope.

Tests-005: added coverage for a worker faulting mid-command — a pipe
disconnect and a worker fault while an InvokeAsync is in flight both fail the
pending invoke. No product change needed.

Tests-006 (re-triaged): the flaky ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess
is a test race, not a product bug — the kill runs synchronously inside
SetFaulted. Rewrote it to await FakeWorkerProcess exit deterministically, and
replaced fixed Task.Delay timing in the late-reply and heartbeat tests with
FIFO ordering and an injected ManualTimeProvider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:44:55 -04:00
Joseph Doherty 98f9b7792b Regenerate code-reviews index after Medium findings Batch A
Reflects resolution of Server-002/004/005/006, Worker-004..008,
Client.Dotnet-001/002/003, Client.Go-002/003, Client.Java-001..005.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:32:02 -04:00
Joseph Doherty ff41556b9a Resolve Client.Java-001..005 code-review findings
Client.Java-001: redactApiKey echoed the last 4 secret characters. It now
keeps only the non-secret mxgw_<key-id>_ prefix plus ***; non-gateway-shaped
tokens return <redacted>.

Client.Java-002: a close() after a queue-overflow could wipe the enqueued
overflow exception. Terminal transitions are now serialized through a single
guarded terminate() — first terminal condition wins.

Client.Java-003: openSession never read gateway_protocol_version. Both
openSession paths now call ensureGatewayProtocolCompatible, rejecting a
non-zero mismatch and accepting unset (0) for older gateways.

Client.Java-004: register/addItem/addItem2 fell back to a return_value that
silently yields 0 when unset. The fallback is now guarded by hasReturnValue()
and throws on a protocol violation.

Client.Java-005: close() in try-with-resources could mask the body exception
when the CloseSession RPC failed. close() now catches and logs the
close-time failure; closeRaw() still surfaces it for callers that want it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:46 -04:00
Joseph Doherty f88a029ecc Resolve Client.Go-002, -003 code-review findings
Client.Go-002: the Events/EventsAfter compatibility path silently dropped
events when the 16-slot results channel filled — it cancelled the stream and
closed the channel with no error delivered. sendEventResult now evicts an
old buffered event and delivers a terminal EventResult carrying the new
exported ErrEventBufferOverflow before close, so the overflow is observable.

Client.Go-003: parseInt32List panicked on a malformed -item-handles token,
crashing the CLI with a stack trace. It now returns an error that
runUnsubscribeBulk propagates, exiting 2 with a clean message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:36 -04:00
Joseph Doherty 8023eccfa6 Resolve Client.Dotnet-001, -002, -003 code-review findings
Client.Dotnet-001: MapRpcException typed only Unauthenticated and
PermissionDenied; every other gRPC status collapsed to an untyped exception
with the status code discarded. Added a nullable StatusCode to
MxGatewayException, extracted the duplicated mappers into a shared
RpcExceptionMapper that records the code for every status, and documented it.

Client.Dotnet-002: the production retry branch (MxGatewayException wrapping
RpcException) was never exercised. FakeGatewayTransport gained a
MapTransportExceptions mode that runs thrown RpcExceptions through
RpcExceptionMapper exactly as the production transport does.

Client.Dotnet-003: MxGatewaySession.DisposeAsync disposed _closeLock while a
concurrent CloseAsync could be parked in WaitAsync. DisposeAsync now drains
in-flight CloseAsync callers before disposing the semaphore; the client's
_disposed flag is accessed via Interlocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:33 -04:00
Joseph Doherty 54325343bd Resolve Worker-004, -005, -006, -007, -008 code-review findings
Worker-004: post-watchdog-fault heartbeats reported a non-faulted state.
ReportWatchdogFaultIfNeededAsync now sets _state = Faulted before writing
the StaHung fault.

Worker-005 (re-triaged): the cited OnPoll site was removed by Worker-001;
the real silent-failure bug was in MxAccessStaSession.RunAlarmPollLoopAsync,
which caught only graceful-stop exceptions. A failing PollOnce now records a
WorkerFault on the event queue instead of vanishing on a non-awaited task.

Worker-006: RunAsync's finally skipped runtime disposal when shutdown timed
out, leaking the STA thread and COM object. It now always disposes
(MxAccessStaSession.Dispose is idempotent and bounded).

Worker-007 (re-triaged): replaced MxAccessComServer's Type.InvokeMember
reflection fallback with an IMxAccessServer fast path plus typed
ILMXProxyServer* casts; a non-conforming object now fails fast.

Worker-008: alarm consumer STA affinity was unenforced. MxAccessStaSession
records the alarm consumer's STA thread id and asserts every PollOnce runs
on it via a unit-testable guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:23 -04:00
Joseph Doherty 1d9e3afadd Resolve Server-002, -004, -005, -006 code-review findings
Server-002: the gateway never terminated leftover MxGateway.Worker.exe
processes at startup, contradicting gateway.md and CLAUDE.md. Added
IRunningProcessInspector/SystemRunningProcessInspector, OrphanWorkerTerminator,
and OrphanWorkerCleanupHostedService (best-effort, runs before sessions are
accepted); updated gateway.md to describe the implemented behavior.

Server-004: API-key scopes were persisted verbatim with no validation. Added
GatewayScopes.All/IsKnown; the CLI parser and dashboard create path now
reject unknown scope strings.

Server-005: a non-SqlException/InvalidOperationException fault on the initial
Galaxy hierarchy load faulted the BackgroundService. ExecuteAsync now catches
all non-cancellation exceptions on first load and RefreshCoreAsync broadens
its catch so the cache records Stale/Unavailable instead.

Server-006: OpenSessionAsync incremented the open-sessions gauge before
alarm auto-subscribe; an auto-subscribe failure leaked the gauge. The catch
path now calls SessionRemoved() when the gauge was incremented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:10 -04:00
Joseph Doherty 5e795aeeb8 Regenerate code-reviews index after High/Critical resolution batch
Reflects the resolution of Tests-001/002, IntegrationTests-001/002,
Client.Go-001, Worker-001/002/003 and Worker.Tests-001/002.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:08:06 -04:00
Joseph Doherty 1b4dcf32d5 Resolve Worker.Tests-001 and Worker.Tests-002 code-review findings
Worker.Tests-001: StaMessagePump had no direct unit test. Added
Sta/StaMessagePumpTests.cs — 8 STA-thread facts covering WaitForWorkOrMessages
(wake-event signalled before/during the wait, timeout expiry, zero-timeout
fast path, the QS_ALLINPUT posted-message wake path) and PumpPendingMessages
drain counting.

Worker.Tests-002: no test drove a COM event through the integrated
sink -> mapper -> queue path. Added MxAccess/MxAccessBaseEventSinkTests.cs —
5 facts driving OnDataChange, OnWriteComplete, OperationComplete and
OnBufferedDataChange through a real MxAccessBaseEventSink + mapper + queue and
asserting the converted WorkerEvent lands in MxAccessEventQueue. The four COM
event handlers were widened private -> internal and InternalsVisibleTo for
MxGateway.Worker.Tests was added, mirroring MxAccessAlarmEventSink's existing
test seam; no worker behavior changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:07:48 -04:00
Joseph Doherty 53e3973209 Resolve Worker-001, Worker-002, Worker-003 code-review findings
Worker-001: WnWrapAlarmConsumer armed a System.Threading.Timer whose OnPoll
callback ran GetXmlCurrentAlarms2 on a thread-pool thread against the
Apartment-threaded wnwrap COM object, which can deadlock on cross-apartment
marshaling. Removed the pollTimer/pollIntervalMs fields, OnPoll, the
poll-interval constructor parameter, and the timer arm/disposal. Polls are
driven externally by the STA via StaRuntime.InvokeAsync(PollOnce).

Worker-002: RunHeartbeatLoopAsync delayed a full HeartbeatInterval before
the first heartbeat. Restructured so the first beat is sent immediately on
entering the loop and the delay applies only between subsequent beats.

Worker-003: ProcessCommandAsync silently returned without a reply when
_state was not a command-serving state after dispatch. Both drop sites now
log a WorkerCommandResultDropped diagnostic with correlation_id via
IWorkerLogger; _state is now volatile.

Three pre-existing tests that asserted strict frame ordering were updated to
tolerate an interleaved first heartbeat (Worker-002 consequence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:59:46 -04:00
Joseph Doherty e967e85973 Resolve Client.Go-001 code-review finding
MxAccessError.Unwrap returned e.Command directly; on the HRESULT-only path
Command is a nil *CommandError, so Unwrap returned a non-nil error wrapping
a typed nil and errors.As bound a nil *CommandError. Unwrap now returns an
untyped nil when Command is nil. Added errors_test.go regression coverage
for the HRESULT-only and populated-Command paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:46:12 -04:00
Joseph Doherty bc55396334 Resolve IntegrationTests-001 and IntegrationTests-002 code-review findings
IntegrationTests-001: documented the live Galaxy Repository test suite and
its MXGATEWAY_RUN_LIVE_GALAXY_TESTS / MXGATEWAY_LIVE_GALAXY_CONN gating in
docs/GatewayTesting.md.

IntegrationTests-002: documented the live LDAP test suite in
docs/GatewayTesting.md and added a concrete "Provisioning the GwAdmin group"
step to glauth.md so DashboardLdapLiveTests' GwAdmin-membership assumption
is reproducible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:46:09 -04:00
Joseph Doherty b381bfcaf1 Resolve Tests-001 and Tests-002 code-review findings
Tests-001: FakeSessionManager.TryGetSession unconditionally synthesized a
session, so Invoke_WhenSessionMissing_ThrowsNotFound did not actually
verify the missing-session path. Added ResolveOnlySeededSessions/SeedSession
to the fake, rewrote the missing-session test, and added seeded-resolution
and alarm-RPC missing-session coverage.

Tests-002: re-triaged. GalaxyRepository issues only constant SQL; filters
are applied in-memory by GalaxyHierarchyProjector/GalaxyGlobMatcher. Kept
as a valid coverage gap and added GalaxyFilterInputSafetyTests exercising
filter/glob input safety directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:46:02 -04:00
Joseph Doherty 2a635c8522 Add code-reviews/prompt.md orchestration prompt
Reusable prompt for working the code-reviews/ backlog: batches one
subagent per module, TDD per finding, per-module commits, regenerates
the index. Adapted to mxaccessgw toolchains and module layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:18:39 -04:00
Joseph Doherty 9082e504a9 Mark Client.Rust findings resolved
All eleven Client.Rust findings are fixed in 0d8a28d; their Status is
now Resolved with the fixing commit recorded. Adds Client.Rust-012 —
an additional clippy::clone_on_copy violation in galaxy.rs found while
verifying that `cargo clippy -- -D warnings` passes — already Resolved
in the same commit. Regenerates code-reviews/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:10:30 -04:00
Joseph Doherty 0d8a28d2fe Fix all MxGateway.Client.Rust code-review findings
Resolves Client.Rust-001 through Client.Rust-011.

Build/test/clippy gate (Client.Rust-001/002/003):
- options.rs: doc comments on with_max_grpc_message_bytes /
  max_grpc_message_bytes (#![warn(missing_docs)])
- session.rs: rename BulkReplyKind variants to drop the shared `Bulk`
  suffix (clippy::enum_variant_names)
- galaxy.rs: deref instead of clone on Option<Timestamp>
  (clippy::clone_on_copy — an extra violation the gate also hit)
- mxgw-cli: assert version_json against GATEWAY/WORKER_PROTOCOL_VERSION
  constants instead of the stale literal 2
`cargo clippy --workspace --all-targets -- -D warnings` now passes.

Correctness / error handling:
- version.rs: CLIENT_VERSION = env!("CARGO_PKG_VERSION") (Client.Rust-004)
- session.rs: register/add_item/add_item2 handle extractors and
  bulk_results now return Err(Error::MalformedReply) instead of a
  silent 0 / empty vec on a shapeless OK reply (Client.Rust-005/006)
- error.rs: new Error::Unavailable classifies Code::Unavailable /
  ResourceExhausted as transient (Client.Rust-010)
- session.rs: per-call unique correlation ids via an atomic counter
  (Client.Rust-011)

Other:
- value.rs: MxValue/MxArrayValue compute the projection on demand
  instead of caching it, so a wire-only value pays no projection cost
  (Client.Rust-008)
- RustClientDesign.md: correct the crate layout, drop the unused
  `tracing` dependency (Client.Rust-007)
- client_behavior.rs: tests for the bulk-size cap, a mid-stream status
  fault, and the unreadable-CA-file path (Client.Rust-009)

cargo fmt / test --workspace (27 tests) / clippy all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:08:55 -04:00
Joseph Doherty f0a4af62b9 Review the clients/ language clients; mark Server-001/003 resolved
Adds per-module code reviews for the five language clients under
clients/ (Client.Dotnet, Client.Go, Client.Java, Client.Python,
Client.Rust) at commit 3cc53a8 — 53 findings (4 High, 15 Medium,
34 Low; all Open). Extends REVIEW-PROCESS.md so a "module" may also be
a language client under clients/, not only a src/ project.

Marks Server-001 (Critical) and Server-003 (High) Resolved — fixed in
a8aafdf — and regenerates code-reviews/README.md (now 11 modules).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:51:00 -04:00
Joseph Doherty a8aafdf974 Enforce dashboard authorization on all component routes
Fixes code-review findings Server-001 (Critical) and Server-003 (High).

Server-001: the dashboard Razor components were mapped with no
authorization policy, so every dashboard page — including the API Keys
page — was reachable unauthenticated. MapRazorComponents<App>() now
requires DashboardAuthenticationDefaults.AuthorizationPolicy;
unauthenticated requests are challenged by the cookie scheme and
redirected to the login page.

Server-003: DashboardAuthenticator.CreatePrincipal never issued the
'scope' claim that DashboardAuthorizationHandler checks when
Dashboard:RequireAdminScope is enabled, so enforcing the policy would
have denied every LDAP login. CreatePrincipal (reached only after the
required-group check passes) now emits the admin scope claim.

Replaces the GatewayApplicationTests case that asserted dashboard
routes allow anonymous access — it encoded the bug as expected
behavior — with tests that verify component routes require the policy
and the login/logout/denied endpoints allow anonymous.

All 309 MxGateway.Tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:45:29 -04:00
Joseph Doherty 3cc53a8c69 Harden code-review tooling and align REVIEW-PROCESS.md with mxaccessgw
- regen-readme.py: use `python` not the broken `python3` Store alias in
  the generated note and docstring; --check now also fails when a module
  header's "Open findings" count disagrees with finding statuses or a
  finding has an unrecognised Status (find_inconsistencies)
- REVIEW-PROCESS.md: rewritten for mxaccessgw (was describing ScadaLink)
  — MxGateway.* modules, "mxaccessgw conventions" checklist category,
  gateway.md/docs/ design context, `python` command
- scripts/check-code-reviews-readme.ps1: CI/pre-commit wrapper for
  regen-readme.py --check
- code-reviews/test_regen_readme.py: dependency-free parser tests
- code-reviews/README.md: regenerated

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:36:25 -04:00
Joseph Doherty ae164ea34f Add per-module code review tree under code-reviews/
Set up the code review process scaffolding adapted to mxaccessgw and
record a full per-module review of every src/MxGateway.* project at
commit 6c64030.

- code-reviews/_template/findings.md: per-module findings template
- code-reviews/regen-readme.py: generates README.md from findings.md
  files; --check fails if stale
- code-reviews/<Module>/findings.md: reviews for Contracts, Server,
  Worker, Tests, Worker.Tests, IntegrationTests (74 findings:
  1 Critical, 10 High, 23 Medium, 40 Low; all Open)
- code-reviews/README.md: generated cross-module index
- REVIEW-PROCESS.md: review process document

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:28:50 -04:00
Joseph Doherty 6c640306e5 Merge branch 'fix/alarm-sta-wiring'
Close the two alarm production gaps: wire alarmCommandHandlerFactory in
WorkerPipeSession, and drive WnWrapAlarmConsumer.PollOnce from the STA
instead of a threadpool timer.
2026-05-18 06:31:05 -04:00
Joseph Doherty a67a5a4857 fix(worker): wire alarm command handler and STA poll loop (Gap 1 + Gap 2)
Gap 1 — WorkerPipeSession now passes `eq => new AlarmCommandHandler(eq)` as
the alarmCommandHandlerFactory in all three places it constructs
MxAccessStaSession (two convenience constructors and InitializeMxAccessAsync).
Previously the parameterless MxAccessStaSession() set the factory to null,
so every SubscribeAlarms / AcknowledgeAlarm / QueryActiveAlarms command
returned "alarm consumer not configured" in a deployed worker.

  - Added internal `MxAccessStaSession(Func<MxAccessEventQueue, IAlarmCommandHandler>?)`
    constructor that builds all defaults but accepts a factory.
  - Added public `MxAccessStaSession(StaRuntime, factory, eventQueue, alarmFactory?)`
    4-arg overload to complete the constructor chain.

Gap 2 — WnWrapAlarmConsumer now disables its internal threadpool Timer
(pollIntervalMilliseconds=0 in the default constructor). MxAccessStaSession
starts a `RunAlarmPollLoopAsync` background task that sleeps off-STA then
calls `staRuntime.InvokeAsync(() => handler.PollOnce())` at 500ms intervals.
This satisfies the ThreadingModel=Apartment requirement of wwAlarmConsumerClass:
every GetXmlCurrentAlarms2 call now runs on the worker's STA.

  - Added `PollOnce()` to `IMxAccessAlarmConsumer`, `AlarmDispatcher`,
    `IAlarmCommandHandler`, and `AlarmCommandHandler`.
  - Poll loop cancelled and awaited before alarm handler disposal in both
    ShutdownGracefullyAsync and Dispose.

Tests: 4 new tests in MxAccessStaSessionTests verify that
  - SubscribeAlarms reaches the handler when the factory is wired (Gap 1)
  - SubscribeAlarms returns InvalidRequest without a factory (regression guard)
  - PollOnce is called on the STA thread within 3s (Gap 2)
  - The poll loop stops after Dispose (Gap 2 lifecycle)
All fake IMxAccessAlarmConsumer / IAlarmCommandHandler test implementations
updated with no-op PollOnce() to satisfy the new interface member.

Worker tests: 199 passed / 1 pre-existing failure / 4 skipped (was 195/1/4).
Server tests: 308 passed / 0 failures (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 06:30:14 -04:00
Joseph Doherty e00ee61cf0 Place Last Refresh next to Last Deploy on the Galaxy page
Group the two double-width timestamp cards at the start of the metric
row so the deploy/refresh pair reads together, ahead of the count cards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:20:30 -04:00
Joseph Doherty 271bf7edff Give Galaxy timestamp cards double-width boxes
The Last Deploy and Last Refresh metric cards hold full timestamps that
wrapped to three or four lines in a single-width card. Add a Wide option
to MetricCard (grid-column: span 2) and set it on both Galaxy timestamp
cards. Also switch .metric-value to overflow-wrap: break-word so a date
token is never split mid-value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:16:56 -04:00
Joseph Doherty 3397e99783 Document the dashboard API Keys management page
The dashboard's API Keys page (list plus Create/Rotate/Revoke and the
create dialog) had no design-doc coverage even though Authorization.md
already documents the constraint model it exposes. Add an "API keys
page" section to GatewayDashboardDesign.md describing the table columns,
the LDAP-group-gated management actions, the one-time secret reveal, and
audit logging. Cross-link it from the constraint-enforcement section of
Authorization.md and the CLI section of Authentication.md so the two
key-management surfaces reference each other.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:07:35 -04:00
Joseph Doherty f598b3a647 Stop dashboard table cells breaking whole words
overflow-wrap: anywhere let the table layout shrink a column below its
longest word, splitting tokens like "unconstrained" mid-word. Switch to
break-word so a word only breaks when it genuinely cannot fit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:56:55 -04:00
Joseph Doherty 509b0118d4 Keep dashboard button labels on a single line
Bootstrap 5 .btn no longer pins white-space, so the Rotate/Revoke action
buttons broke mid-word in the narrow API Keys actions column. Pin .btn
to white-space: nowrap so a label always reads as one word.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:54:53 -04:00
Joseph Doherty 298836d2f3 Keep dashboard status chips on a single line
Status chips wrapped to two lines in narrow table columns (e.g. the
session State column). Pin .chip to white-space: nowrap so an enumerated
state always reads as one token.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:19:36 -04:00
Joseph Doherty 96bea1d478 Apply technical-light design system to the gateway dashboard
Restyles the Blazor dashboard onto a portable token-based theme so it
reads like an instrument panel: warm-paper background, hairline-ruled
panels, IBM Plex type, monospace tabular numerics, and status carried by
colour chips. Vendors theme.css + IBM Plex fonts, rewrites dashboard.css
as a thin token-driven view layer, and swaps the Bootstrap navbar and
status badges for the design-system app bar and chips.

Also includes pending API-key management, Galaxy hierarchy projection,
and constraint-enforcement work with their tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:31:04 -04:00
159 changed files with 12368 additions and 632 deletions
+140
View File
@@ -0,0 +1,140 @@
# Code Review Process
This document describes how to perform a comprehensive, per-module code review of
the `mxaccessgw` codebase and how to track findings to resolution.
A **module** is one buildable project under `src/` (e.g. `src/MxGateway.Worker`)
or one language client under `clients/` (e.g. `clients/rust`). Each module has
its own folder under `code-reviews/` containing a single `findings.md`.
## 1. Before you start
1. Pick the module to review. Its folder is `code-reviews/<Module>/`:
- For a `src/` project, `<Module>` is the project name with the `MxGateway.`
prefix stripped — `src/MxGateway.Server` is reviewed in `code-reviews/Server/`.
- For a language client, `<Module>` is `Client.<Lang>``clients/rust` is
reviewed in `code-reviews/Client.Rust/`.
2. Identify the design context for the module:
- `gateway.md` — top-level architecture, command/event surface, IPC envelope,
STA thread model, fault handling.
- The relevant component design docs under `docs/` (e.g.
`docs/MxAccessWorkerInstanceDesign.md`, `docs/GatewayProcessDesign.md`,
`docs/Sessions.md`, `docs/Authentication.md`, `docs/GalaxyRepository.md`).
- `docs/DesignDecisions.md` for the v1 design choices.
- The **Repository-Specific Conventions** and **Process / Platform Notes** in
`CLAUDE.md`.
3. Record the exact commit being reviewed: `git rev-parse --short HEAD`. Every
review is a snapshot — a finding only means something relative to a known
commit.
4. Open `code-reviews/<Module>/findings.md` and fill in the header table
(reviewer, date, commit SHA, status).
## 2. Review checklist
Work through **every** category below for the module. A comprehensive review
means the checklist is completed even where it produces no findings — record
"No issues found" for a category rather than leaving it ambiguous.
1. **Correctness & logic bugs** — off-by-one, null handling, incorrect
conditionals, misuse of APIs, broken edge cases.
2. **mxaccessgw conventions** — the rules in `CLAUDE.md` and the style guides
under `docs/style-guides/`: the gateway never instantiates MXAccess COM
directly; all MXAccess COM calls run on the worker's dedicated STA thread and
the STA loop pumps Windows messages; IPC uses one bidirectional named pipe per
worker carrying length-prefixed `WorkerEnvelope` protobuf frames; MXAccess
parity is the contract (don't "fix" surprising MXAccess behaviour, never
synthesize events); one worker and one event subscriber per session; the
gateway terminates orphan workers on startup and does not reattach; C# style
(file-scoped namespaces, `sealed` by default, `Async` suffix, MXAccess-aligned
names); no Blazor UI component libraries; no logging of secrets or full tag
values; generated code is never hand-edited.
3. **Concurrency & thread safety** — shared mutable state, STA affinity, race
conditions, correct use of `async`/`await`, locking, disposal races.
4. **Error handling & resilience** — exception paths, worker crash / reconnect
handling, fail-fast event backpressure, transient vs permanent error
classification, graceful degradation, correct gRPC status codes.
5. **Security** — authentication/authorization checks, API-key scope enforcement,
input validation, SQL injection in the Galaxy Repository RPCs, secret
handling, the dashboard anonymous-localhost bypass, logging of sensitive data.
6. **Performance & resource management**`IDisposable` disposal, pipe / stream
/ COM lifetimes, buffering and back-pressure, unnecessary allocations on hot
paths, N+1 queries.
7. **Design-document adherence** — does the code match `gateway.md`, the relevant
`docs/` component designs, `docs/DesignDecisions.md`, and `CLAUDE.md`? Flag
both code that drifts from the design and design docs that are now stale.
8. **Code organization & conventions** — namespace hierarchy, project layout, the
Options pattern, separation of concerns, additive-only contract evolution.
9. **Testing coverage** — are the module's behaviours covered by tests
(`src/MxGateway.Tests`, `src/MxGateway.Worker.Tests`,
`src/MxGateway.IntegrationTests`)? Note untested critical paths and missing
edge-case tests.
10. **Documentation & comments** — XML doc accuracy, misleading or stale comments,
undocumented non-obvious behaviour.
## 3. Recording findings
Add one entry per finding to the `## Findings` section of the module's
`findings.md`, using the entry format in
[`_template/findings.md`](code-reviews/_template/findings.md).
- **Finding ID** — `<Module>-NNN`, numbered sequentially within the module and
never reused (e.g. `Worker-001`). IDs are permanent even after resolution.
- **Severity:**
- **Critical** — data loss, security breach, crash/deadlock, or outage.
- **High** — incorrect behaviour with significant impact; no safe workaround.
- **Medium** — incorrect or risky behaviour with limited impact or a workaround.
- **Low** — minor issues, style, maintainability, documentation.
- **Category** — one of the 10 checklist categories above.
- **Location** — `file:line` (clickable), or a list of locations.
- **Description** — what is wrong and why it matters.
- **Recommendation** — concrete suggested fix.
After recording findings, update the module header table (status, open-finding
count) and regenerate the base README (step 5).
## 4. Marking an item resolved
Findings are **never deleted** — they are an audit trail. To close one, change
its **Status** and complete the **Resolution** field:
- `Open` — newly recorded, not yet addressed.
- `In Progress` — a fix is actively being worked on.
- `Resolved` — fixed. The Resolution field must state the fixing commit SHA, the
date, and a one-line description of the fix.
- `Won't Fix` — intentionally not fixed. The Resolution field must justify why.
- `Deferred` — valid but postponed. The Resolution field must say what it is
waiting on (e.g. a tracked issue or a later milestone).
`Resolved`, `Won't Fix`, and `Deferred` findings are all considered **closed**.
`Open` and `In Progress` are **pending** and appear in the base README's Pending
Findings table.
## 5. Updating the base README
`code-reviews/README.md` holds the single cross-module view (the Module Status
table and the Pending / Closed Findings tables). It is **generated** from the
per-module `findings.md` files — do not edit it by hand.
After any review or status change, regenerate it:
```
python code-reviews/regen-readme.py
```
`regen-readme.py --check` exits non-zero if `README.md` is stale, if a module
header's `Open findings` count disagrees with its finding statuses, or if a
finding carries an unrecognised Status value. The PowerShell wrapper
`scripts/check-code-reviews-readme.ps1` runs that check and is the intended hook
for CI or a pre-commit step.
> The repo's installed `python` is the real interpreter; the bare `python3`
> alias resolves to the Windows Store stub and fails. Use `python`.
The per-module `findings.md` files are the source of truth; `README.md` is the
aggregated index and must always agree with them — which the script guarantees.
## 6. Re-reviewing a module
Re-reviews append to the same `findings.md`. Update the header to the new commit
and date, continue the finding numbering from the last used ID, and leave prior
findings (including closed ones) in place as history.
@@ -91,6 +91,19 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
/// </summary>
public Queue<Exception> CloseSessionExceptions { get; } = new();
/// <summary>
/// Gets or sets a value indicating whether thrown <see cref="RpcException"/>s are mapped
/// to <see cref="MxGatewayException"/> the way the production gRPC transport does. Lets
/// retry tests exercise the wrapped-exception predicate branch that runs in production.
/// </summary>
public bool MapTransportExceptions { get; set; }
/// <summary>
/// Gets or sets an optional hook awaited inside CloseSessionAsync after the call is
/// recorded; lets tests pause a close mid-flight to observe concurrent dispose.
/// </summary>
public Func<Task>? CloseSessionHook { get; set; }
/// <summary>
/// Gets the queue of exceptions to throw from InvokeAsync.
/// </summary>
@@ -108,7 +121,7 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
OpenSessionCalls.Add((request, callOptions));
if (OpenSessionExceptions.TryDequeue(out Exception? exception))
{
throw exception;
throw Translate(exception, callOptions);
}
return Task.FromResult(OpenSessionReply);
@@ -119,17 +132,23 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
/// </summary>
/// <param name="request">The CloseSessionRequest to process.</param>
/// <param name="callOptions">Call options specifying RPC behavior.</param>
public Task<CloseSessionReply> CloseSessionAsync(
public async Task<CloseSessionReply> CloseSessionAsync(
CloseSessionRequest request,
CallOptions callOptions)
{
CloseSessionCalls.Add((request, callOptions));
if (CloseSessionExceptions.TryDequeue(out Exception? exception))
if (CloseSessionHook is not null)
{
throw exception;
await CloseSessionHook().ConfigureAwait(false);
}
return Task.FromResult(CloseSessionReply);
if (CloseSessionExceptions.TryDequeue(out Exception? exception))
{
throw Translate(exception, callOptions);
}
return CloseSessionReply;
}
/// <summary>
@@ -144,7 +163,7 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
InvokeCalls.Add((request, callOptions));
if (InvokeExceptions.TryDequeue(out Exception? exception))
{
throw exception;
throw Translate(exception, callOptions);
}
return Task.FromResult(_invokeReplies.Dequeue());
@@ -239,4 +258,18 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
{
_activeAlarmSnapshots.Add(snapshot);
}
/// <summary>
/// Maps a queued exception the way the production gRPC transport does when
/// <see cref="MapTransportExceptions"/> is set; otherwise returns it unchanged.
/// </summary>
private Exception Translate(Exception exception, CallOptions callOptions)
{
if (MapTransportExceptions && exception is RpcException rpcException)
{
return RpcExceptionMapper.Map(rpcException, callOptions.CancellationToken);
}
return exception;
}
}
@@ -231,6 +231,52 @@ public sealed class MxGatewayClientSessionTests
Assert.Equal("session-fixture", call.Request.SessionId);
}
/// <summary>
/// Verifies that disposing a session while other callers are concurrently inside
/// <see cref="MxGatewaySession.CloseAsync"/> — one holding the close lock and one
/// parked on it — never throws <see cref="ObjectDisposedException"/> into those
/// callers. The close lock must outlive every pending close.
/// </summary>
[Fact]
public async Task DisposeAsync_DoesNotRaceConcurrentCloseAsync()
{
for (int iteration = 0; iteration < 100; iteration++)
{
FakeGatewayTransport transport = CreateTransport();
using SemaphoreSlim firstCloseEntered = new(0, 1);
using SemaphoreSlim releaseFirstClose = new(0, 1);
// The first CloseAsync to reach the transport parks here while holding the
// session's close lock; later callers queue on the lock behind it.
transport.CloseSessionHook = async () =>
{
firstCloseEntered.Release();
await releaseFirstClose.WaitAsync().ConfigureAwait(false);
transport.CloseSessionHook = null;
};
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
// Holder enters CloseAsync, acquires the lock, and parks in the hook.
Task holder = Task.Run(() => session.CloseAsync());
await firstCloseEntered.WaitAsync();
// Waiter is parked on the close lock behind the holder.
Task waiter = Task.Run(() => session.CloseAsync());
// DisposeAsync runs concurrently; it must wait out both callers before
// disposing the close lock rather than tearing it down underneath them.
Task dispose = session.DisposeAsync().AsTask();
releaseFirstClose.Release();
await holder;
await waiter;
await dispose;
}
}
/// <summary>Verifies that invoke retries safe diagnostic commands on transient RPC failure.</summary>
[Fact]
public async Task InvokeAsync_RetriesSafeDiagnosticCommandOnTransientGrpcFailure()
@@ -255,6 +301,35 @@ public sealed class MxGatewayClientSessionTests
Assert.Equal(2, transport.InvokeCalls.Count);
}
/// <summary>
/// Verifies that the retry pipeline still retries when the transport maps the raw
/// <see cref="RpcException"/> to an <see cref="MxGatewayException"/> before it reaches
/// the retry predicate — the wrapped-exception shape that production always produces.
/// </summary>
[Fact]
public async Task InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException()
{
FakeGatewayTransport transport = CreateTransport();
transport.MapTransportExceptions = true;
transport.InvokeExceptions.Enqueue(CreateTransientRpcException());
transport.AddInvokeReply(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.Ping,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
});
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
await session.InvokeAsync(new MxCommandRequest
{
SessionId = session.SessionId,
Command = new MxCommand { Kind = MxCommandKind.Ping, Ping = new PingCommand() },
});
Assert.Equal(2, transport.InvokeCalls.Count);
}
/// <summary>Verifies that open session does not retry on transient RPC failure.</summary>
[Fact]
public async Task OpenSessionAsync_DoesNotRetryTransientGrpcFailure()
@@ -0,0 +1,76 @@
using Grpc.Core;
namespace MxGateway.Client.Tests;
/// <summary>Tests for the shared gRPC-to-native exception mapping used by the transports.</summary>
public sealed class RpcExceptionMapperTests
{
/// <summary>Verifies that an unauthenticated status maps to the authentication exception.</summary>
[Fact]
public void Map_UnauthenticatedStatus_ProducesAuthenticationException()
{
RpcException rpc = new(new Status(StatusCode.Unauthenticated, "no key"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayAuthenticationException authentication =
Assert.IsType<MxGatewayAuthenticationException>(mapped);
Assert.Equal(StatusCode.Unauthenticated, authentication.StatusCode);
}
/// <summary>Verifies that a permission-denied status maps to the authorization exception.</summary>
[Fact]
public void Map_PermissionDeniedStatus_ProducesAuthorizationException()
{
RpcException rpc = new(new Status(StatusCode.PermissionDenied, "missing scope"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayAuthorizationException authorization =
Assert.IsType<MxGatewayAuthorizationException>(mapped);
Assert.Equal(StatusCode.PermissionDenied, authorization.StatusCode);
}
/// <summary>Verifies that a cancelled status maps to OperationCanceledException.</summary>
[Fact]
public void Map_CancelledStatus_ProducesOperationCanceledException()
{
RpcException rpc = new(new Status(StatusCode.Cancelled, "cancelled"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
Assert.IsType<OperationCanceledException>(mapped);
}
/// <summary>
/// Verifies that non-auth statuses surface the originating gRPC status code on the
/// mapped exception so callers can distinguish transient from permanent failures
/// without reflecting into InnerException.
/// </summary>
[Theory]
[InlineData(StatusCode.NotFound)]
[InlineData(StatusCode.InvalidArgument)]
[InlineData(StatusCode.ResourceExhausted)]
[InlineData(StatusCode.FailedPrecondition)]
[InlineData(StatusCode.Unavailable)]
[InlineData(StatusCode.Internal)]
public void Map_NonAuthStatus_CarriesStatusCodeOnMxGatewayException(StatusCode statusCode)
{
RpcException rpc = new(new Status(statusCode, "boom"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayException gatewayException = Assert.IsType<MxGatewayException>(mapped);
Assert.Equal(statusCode, gatewayException.StatusCode);
Assert.Same(rpc, gatewayException.InnerException);
}
/// <summary>Verifies that an MxGatewayException built without a gRPC status reports a null StatusCode.</summary>
[Fact]
public void StatusCode_IsNull_WhenNoGrpcStatusProvided()
{
MxGatewayException gatewayException = new("plain failure");
Assert.Null(gatewayException.StatusCode);
}
}
@@ -0,0 +1,24 @@
namespace MxGateway.Client;
public sealed record DiscoverHierarchyOptions
{
public int? RootGobjectId { get; init; }
public string? RootTagName { get; init; }
public string? RootContainedPath { get; init; }
public int? MaxDepth { get; init; }
public IReadOnlyList<int> CategoryIds { get; init; } = Array.Empty<int>();
public IReadOnlyList<string> TemplateChainContains { get; init; } = Array.Empty<string>();
public string? TagNameGlob { get; init; }
public bool? IncludeAttributes { get; init; }
public bool AlarmBearingOnly { get; init; }
public bool HistorizedOnly { get; init; }
}
@@ -36,7 +36,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -53,7 +53,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -70,7 +70,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -101,7 +101,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return deployEvent;
@@ -115,28 +115,4 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
{
return WatchDeployEventsAsync(request, callOptions);
}
private static Exception MapRpcException(
RpcException exception,
CancellationToken cancellationToken)
{
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
innerException: exception),
_ => new MxGatewayException(exception.Status.Detail, exception),
};
}
}
@@ -36,7 +36,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -53,7 +53,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -70,7 +70,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -101,7 +101,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return gatewayEvent;
@@ -129,7 +129,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -160,7 +160,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return snapshot;
@@ -174,28 +174,4 @@ internal sealed class GrpcMxGatewayClientTransport(
{
return QueryActiveAlarmsAsync(request, callOptions);
}
private static Exception MapRpcException(
RpcException exception,
CancellationToken cancellationToken)
{
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
innerException: exception),
_ => new MxGatewayException(exception.Status.Detail, exception),
};
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -13,6 +14,7 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
/// <param name="hResult">The HResult code, if available.</param>
/// <param name="statuses">The MXAccess statuses, if available.</param>
/// <param name="innerException">The underlying exception, if any.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayAuthenticationException(
string message,
string? sessionId = null,
@@ -20,7 +22,8 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
ProtocolStatus? protocolStatus = null,
int? hResult = null,
IReadOnlyList<MxStatusProxy>? statuses = null,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(
message,
sessionId,
@@ -28,7 +31,8 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
protocolStatus,
hResult,
statuses ?? [],
innerException)
innerException,
statusCode)
{
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -13,6 +14,7 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
/// <param name="hResult">The HResult code, if available.</param>
/// <param name="statuses">The MXAccess statuses, if available.</param>
/// <param name="innerException">The underlying exception, if any.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayAuthorizationException(
string message,
string? sessionId = null,
@@ -20,7 +22,8 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
ProtocolStatus? protocolStatus = null,
int? hResult = null,
IReadOnlyList<MxStatusProxy>? statuses = null,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(
message,
sessionId,
@@ -28,7 +31,8 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
protocolStatus,
hResult,
statuses ?? [],
innerException)
innerException,
statusCode)
{
}
}
@@ -17,7 +17,7 @@ public sealed class MxGatewayClient : IAsyncDisposable
private readonly GrpcChannel _channel;
private readonly IMxGatewayClientTransport _transport;
private readonly ResiliencePipeline _safeUnaryRetryPipeline;
private bool _disposed;
private int _disposed;
/// <summary>
/// Initializes a new instance of the <see cref="MxGatewayClient"/> with given options and transport.
@@ -229,12 +229,11 @@ public sealed class MxGatewayClient : IAsyncDisposable
/// </summary>
public ValueTask DisposeAsync()
{
if (_disposed)
if (Interlocked.Exchange(ref _disposed, 1) != 0)
{
return ValueTask.CompletedTask;
}
_disposed = true;
_channel?.Dispose();
return ValueTask.CompletedTask;
}
@@ -335,6 +334,6 @@ public sealed class MxGatewayClient : IAsyncDisposable
private void ThrowIfDisposed()
{
ObjectDisposedException.ThrowIf(_disposed, this);
ObjectDisposedException.ThrowIf(Volatile.Read(ref _disposed) != 0, this);
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -28,6 +29,20 @@ public class MxGatewayException : Exception
Statuses = [];
}
/// <summary>
/// Initializes a new instance of the MxGatewayException class carrying the originating
/// gRPC status code so callers can distinguish transient from permanent failures.
/// </summary>
/// <param name="message">Diagnostic message describing the failure.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call.</param>
/// <param name="innerException">Underlying exception that caused this failure.</param>
public MxGatewayException(string message, StatusCode statusCode, Exception? innerException)
: base(message, innerException)
{
StatusCode = statusCode;
Statuses = [];
}
/// <summary>
/// Initializes a new instance of the MxGatewayException class with full diagnostic information.
/// </summary>
@@ -38,6 +53,7 @@ public class MxGatewayException : Exception
/// <param name="hResult">HRESULT code returned by the worker or MXAccess, if available.</param>
/// <param name="statuses">List of MXAccess status codes returned by the operation.</param>
/// <param name="innerException">Underlying exception that caused this failure.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayException(
string message,
string? sessionId,
@@ -45,7 +61,8 @@ public class MxGatewayException : Exception
ProtocolStatus? protocolStatus,
int? hResult,
IReadOnlyList<MxStatusProxy> statuses,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(message, innerException)
{
SessionId = sessionId;
@@ -53,6 +70,7 @@ public class MxGatewayException : Exception
ProtocolStatus = protocolStatus;
HResultCode = hResult;
Statuses = statuses;
StatusCode = statusCode;
}
/// <summary>
@@ -79,4 +97,15 @@ public class MxGatewayException : Exception
/// Gets the list of MXAccess status codes returned by the operation.
/// </summary>
public IReadOnlyList<MxStatusProxy> Statuses { get; }
/// <summary>
/// Gets the gRPC status code reported by the failed call, if the failure originated
/// from a gRPC <see cref="RpcException"/>. <see langword="null"/> when the exception
/// was not produced from a gRPC status (for example, a protocol-level reply failure).
/// Callers can inspect this to distinguish a transient outage
/// (<see cref="Grpc.Core.StatusCode.Unavailable"/>) from a permanent error
/// (<see cref="Grpc.Core.StatusCode.InvalidArgument"/>) without downcasting
/// <see cref="Exception.InnerException"/>.
/// </summary>
public StatusCode? StatusCode { get; }
}
@@ -9,7 +9,10 @@ public sealed class MxGatewaySession : IAsyncDisposable
{
private readonly MxGatewayClient _client;
private readonly SemaphoreSlim _closeLock = new(1, 1);
private readonly object _disposeGate = new();
private CloseSessionReply? _closeReply;
private int _activeCloseCount;
private bool _closeLockDisposed;
/// <summary>
/// Initializes a new session backed by the given MXAccess gateway client.
@@ -46,23 +49,42 @@ public sealed class MxGatewaySession : IAsyncDisposable
return _closeReply;
}
await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
// Register as an in-flight closer under the dispose gate. DisposeAsync waits for
// _activeCloseCount to drain before disposing the close lock, so the semaphore is
// guaranteed to outlive every WaitAsync started here.
lock (_disposeGate)
{
ObjectDisposedException.ThrowIf(_closeLockDisposed, this);
_activeCloseCount++;
}
try
{
if (_closeReply is not null)
await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
if (_closeReply is not null)
{
return _closeReply;
}
_closeReply = await _client.CloseSessionRawAsync(
new CloseSessionRequest { SessionId = SessionId },
cancellationToken)
.ConfigureAwait(false);
return _closeReply;
}
_closeReply = await _client.CloseSessionRawAsync(
new CloseSessionRequest { SessionId = SessionId },
cancellationToken)
.ConfigureAwait(false);
return _closeReply;
finally
{
_closeLock.Release();
}
}
finally
{
_closeLock.Release();
lock (_disposeGate)
{
_activeCloseCount--;
}
}
}
@@ -658,7 +680,32 @@ public sealed class MxGatewaySession : IAsyncDisposable
/// </summary>
public async ValueTask DisposeAsync()
{
lock (_disposeGate)
{
if (_closeLockDisposed)
{
return;
}
}
await CloseAsync().ConfigureAwait(false);
// Wait for every concurrent CloseAsync caller to leave the close lock before
// disposing it; once _closeReply is set those callers return without awaiting.
while (true)
{
lock (_disposeGate)
{
if (_activeCloseCount == 0)
{
_closeLockDisposed = true;
break;
}
}
await Task.Yield();
}
_closeLock.Dispose();
}
@@ -0,0 +1,55 @@
using Grpc.Core;
namespace MxGateway.Client;
/// <summary>
/// Maps low-level <see cref="RpcException"/>s raised by the gRPC stack to the client's
/// native exception hierarchy. Shared by every gateway and Galaxy Repository transport
/// so the gRPC-to-native translation has exactly one implementation.
/// </summary>
internal static class RpcExceptionMapper
{
/// <summary>
/// Translates a <see cref="RpcException"/> into the most specific native exception type.
/// </summary>
/// <param name="exception">The gRPC exception to translate.</param>
/// <param name="cancellationToken">
/// The cancellation token of the originating call; used to distinguish a caller-driven
/// cancellation from a server-side <see cref="StatusCode.Cancelled"/> status.
/// </param>
/// <returns>
/// An <see cref="OperationCanceledException"/> when the call was cancelled, a typed
/// authentication/authorization exception for auth statuses, or an
/// <see cref="MxGatewayException"/> carrying the originating gRPC <see cref="StatusCode"/>.
/// </returns>
public static Exception Map(
RpcException exception,
CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(exception);
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
statusCode: exception.StatusCode,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
statusCode: exception.StatusCode,
innerException: exception),
_ => new MxGatewayException(
exception.Status.Detail,
exception.StatusCode,
exception),
};
}
}
+11
View File
@@ -112,6 +112,17 @@ can keep the full `MxCommandReply`, HRESULT, and status array when MXAccess
itself rejects a command. `MxAccessException.Reply` contains the raw generated
reply.
When a gRPC call itself fails, the transport maps the underlying
`RpcException` to a native exception: `Unauthenticated` becomes
`MxGatewayAuthenticationException`, `PermissionDenied` becomes
`MxGatewayAuthorizationException`, a cancelled call becomes
`OperationCanceledException`, and every other status becomes a base
`MxGatewayException`. `MxGatewayException.StatusCode` carries the originating
gRPC `Grpc.Core.StatusCode` (non-null whenever the failure came from a gRPC
status), so callers can distinguish a transient outage (`Unavailable`) from a
permanent error (`InvalidArgument`, `NotFound`) without downcasting
`InnerException`.
## CLI Usage
The test CLI supports deterministic JSON output for automation:
+7 -1
View File
@@ -79,7 +79,13 @@ client, err := mxgateway.Dial(ctx, mxgateway.Options{
`AddItem`, `AddItem2`, `Advise`, `Write`, `Events`, and `Close`. Prefer
`SubscribeEvents` or `SubscribeEventsAfter` for long-running streams because the
returned subscription owns cancellation and exposes `Close` for deterministic
goroutine cleanup. Raw protobuf messages remain available through the
goroutine cleanup. `Events` and `EventsAfter` are a compatibility shim with a
bounded internal buffer: if the consumer drains too slowly the buffer fills,
the underlying stream is cancelled, and a terminal `EventResult` carrying
`ErrEventBufferOverflow` is delivered as the channel's last item before it
closes — so a slow consumer can distinguish dropped events from a normal
end-of-stream. `SubscribeEvents` blocks instead of dropping, so use it when no
events may be lost. Raw protobuf messages remain available through the
`mxgateway` package aliases and the `Raw` helper methods. Typed errors support
`errors.As` for `GatewayError`, `CommandError`, and `MxAccessError`; command
errors preserve the raw reply.
+9 -4
View File
@@ -331,6 +331,11 @@ func runUnsubscribeBulk(ctx context.Context, args []string, stdout, stderr io.Wr
return errors.New("session-id and item-handles are required")
}
handles, err := parseInt32List(*itemHandles)
if err != nil {
return err
}
client, options, err := dialForCommand(ctx, common)
if err != nil {
return err
@@ -338,7 +343,7 @@ func runUnsubscribeBulk(ctx context.Context, args []string, stdout, stderr io.Wr
defer client.Close()
session := mxgateway.NewSessionForID(client, *sessionID)
results, err := session.UnsubscribeBulk(ctx, int32(*serverHandle), parseInt32List(*itemHandles))
results, err := session.UnsubscribeBulk(ctx, int32(*serverHandle), handles)
return writeBulkOutput(stdout, *jsonOutput, "unsubscribe-bulk", options, results, err)
}
@@ -514,7 +519,7 @@ func parseStringList(value string) []string {
return items
}
func parseInt32List(value string) []int32 {
func parseInt32List(value string) ([]int32, error) {
parts := strings.Split(value, ",")
items := make([]int32, 0, len(parts))
for _, part := range parts {
@@ -524,11 +529,11 @@ func parseInt32List(value string) []int32 {
}
parsed, err := strconv.ParseInt(item, 10, 32)
if err != nil {
panic(err)
return nil, fmt.Errorf("invalid item handle %q: %w", item, err)
}
items = append(items, int32(parsed))
}
return items
return items, nil
}
func bindCommonFlags(flags *flag.FlagSet) *commonOptions {
+29
View File
@@ -56,3 +56,32 @@ func TestParseValueBuildsTypedValue(t *testing.T) {
t.Fatalf("int32 value = %d, want 123", got)
}
}
func TestParseInt32ListParsesValidTokens(t *testing.T) {
items, err := parseInt32List("1, 2 ,3")
if err != nil {
t.Fatalf("parseInt32List() error = %v", err)
}
want := []int32{1, 2, 3}
if len(items) != len(want) {
t.Fatalf("parseInt32List() = %v, want %v", items, want)
}
for i := range want {
if items[i] != want[i] {
t.Fatalf("parseInt32List()[%d] = %d, want %d", i, items[i], want[i])
}
}
}
func TestParseInt32ListReturnsErrorOnMalformedToken(t *testing.T) {
items, err := parseInt32List("1,foo")
if err == nil {
t.Fatalf("parseInt32List() error = nil, want a parse error; items = %v", items)
}
if items != nil {
t.Fatalf("parseInt32List() items = %v, want nil on error", items)
}
if !strings.Contains(err.Error(), "foo") {
t.Fatalf("parseInt32List() error = %q, want it to name the bad token", err.Error())
}
}
+15 -2
View File
@@ -117,7 +117,7 @@ func TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned(t *testing.
fake := &fakeGatewayServer{
streamStarted: make(chan struct{}),
streamDone: make(chan struct{}),
streamEventCount: 64,
streamEventCount: 256,
}
client, cleanup := newBufconnClient(t, fake)
defer cleanup()
@@ -135,12 +135,25 @@ func TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned(t *testing.
t.Fatal("compatibility event stream did not stop after result channel filled")
}
// A slow consumer that abandons the buffer must still receive an explicit
// terminal overflow error before the channel closes, so it can tell
// "events dropped" apart from "stream ended normally".
var sawOverflow bool
for {
select {
case _, ok := <-events:
case result, ok := <-events:
if !ok {
if !sawOverflow {
t.Fatal("compatibility event channel closed without an ErrEventBufferOverflow result")
}
return
}
if result.Err != nil {
if !errors.Is(result.Err, ErrEventBufferOverflow) {
t.Fatalf("terminal result error = %v, want ErrEventBufferOverflow", result.Err)
}
sawOverflow = true
}
case <-time.After(2 * time.Second):
t.Fatal("compatibility event channel did not close")
}
+14 -1
View File
@@ -1,11 +1,20 @@
package mxgateway
import (
"errors"
"fmt"
pb "gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go/internal/generated"
)
// ErrEventBufferOverflow is the terminal error delivered on the compatibility
// event channel returned by Session.Events / Session.EventsAfter when a slow
// consumer lets the bounded result buffer fill. It signals that the stream was
// cancelled and events were dropped, so a consumer can tell an overflow apart
// from a normal end-of-stream. Use Session.SubscribeEvents to block instead of
// dropping.
var ErrEventBufferOverflow = errors.New("mxgateway: event buffer overflow; compatibility stream cancelled and events dropped")
// GatewayError wraps transport-level gRPC failures.
type GatewayError struct {
// Op names the operation that failed (for example "dial" or "invoke").
@@ -85,8 +94,12 @@ func (e *MxAccessError) Error() string {
}
// Unwrap returns the wrapped CommandError, when one is present.
//
// When Command is nil (the HRESULT / MxStatusProxy path) it returns an
// untyped nil rather than a typed-nil *CommandError, so errors.As does not
// bind a nil pointer that a caller would then panic on.
func (e *MxAccessError) Unwrap() error {
if e == nil {
if e == nil || e.Command == nil {
return nil
}
return e.Command
+42
View File
@@ -0,0 +1,42 @@
package mxgateway
import (
"errors"
"testing"
)
// TestMxAccessErrorUnwrapHResultPathNoTypedNilCommandError reproduces
// Client.Go-001: an MxAccessError built via the HRESULT / MxStatusProxy path
// leaves Command nil. Unwrap must not hand back a typed-nil *CommandError,
// because errors.As would then succeed while binding a nil pointer and a
// caller dereferencing it would panic.
func TestMxAccessErrorUnwrapHResultPathNoTypedNilCommandError(t *testing.T) {
hresult := int32(-2147467259) // 0x80004005, a failing HRESULT.
reply := &MxCommandReply{Hresult: &hresult}
err := EnsureMxAccessSuccess("invoke", reply)
if err == nil {
t.Fatal("expected MxAccessError for a failing HRESULT, got nil")
}
var ce *CommandError
if errors.As(err, &ce) {
t.Fatalf("errors.As bound *CommandError from an HRESULT-only MxAccessError (ce=%v); "+
"a caller dereferencing ce.Status would panic", ce)
}
}
// TestMxAccessErrorUnwrapPopulatedCommand confirms the non-nil Command path
// still unwraps to the wrapped *CommandError.
func TestMxAccessErrorUnwrapPopulatedCommand(t *testing.T) {
command := &CommandError{Op: "invoke"}
err := &MxAccessError{Command: command}
var ce *CommandError
if !errors.As(err, &ce) {
t.Fatal("errors.As failed to bind the populated *CommandError")
}
if ce != command {
t.Fatalf("errors.As bound an unexpected *CommandError: got %v want %v", ce, command)
}
}
+25 -1
View File
@@ -490,7 +490,7 @@ func ensureBulkSize(name string, length int) error {
func sendEventResult(
ctx context.Context,
results chan<- EventResult,
results chan EventResult,
result EventResult,
cancelWhenBufferFull bool,
cancel context.CancelFunc,
@@ -502,7 +502,12 @@ func sendEventResult(
case <-ctx.Done():
return false
default:
// The bounded compatibility buffer is full. Cancel the stream and
// deliver an explicit terminal overflow error so a slow consumer
// can tell dropped events apart from a normal end-of-stream,
// rather than seeing the channel close silently.
cancel()
deliverTerminalResult(results, EventResult{Err: ErrEventBufferOverflow})
return false
}
}
@@ -515,6 +520,25 @@ func sendEventResult(
}
}
// deliverTerminalResult places result on a full buffered channel by evicting
// one of the oldest buffered events to make room. The caller closes results
// afterwards, so the terminal result becomes the consumer's last item.
func deliverTerminalResult(results chan EventResult, result EventResult) {
for {
select {
case results <- result:
return
default:
}
select {
case <-results:
default:
// Another receiver drained the channel between the send and
// receive attempts; retry the send.
}
}
}
func (s *Session) invokeCommand(ctx context.Context, command *MxCommand) (*MxCommandReply, error) {
return s.client.Invoke(ctx, &pb.MxCommandRequest{
SessionId: s.ID(),
+12
View File
@@ -62,6 +62,18 @@ underlying protobuf messages. `MxGatewayCommandException` and
`MxAccessException` preserve the raw `MxCommandReply` when the gateway returns a
data-bearing MXAccess failure.
`openSession` verifies the gateway's reported `gateway_protocol_version` against
the version this client was generated for and throws `MxGatewayException` on a
mismatch, so an incompatible client fails fast with a clear message instead of
issuing commands that fail downstream. A gateway that does not populate the
field is accepted unchanged.
`MxGatewaySession` implements `AutoCloseable`. The try-with-resources `close()`
performs a `CloseSession` network RPC but swallows (and logs) any failure of
that RPC so a close-time error never replaces the exception a try-with-resources
body is already propagating. Call `closeRaw()` explicitly when you need to
observe the close result or handle a close-time failure.
`MxEventStream` implements `Iterator<MxEvent>` and `AutoCloseable`. Closing it
cancels the underlying gRPC stream. Canceling or timing out a Java client call
only stops the client from waiting; it does not abort an in-flight MXAccess COM
@@ -62,8 +62,10 @@ final class MxGatewayCliTests {
assertEquals(0, run.exitCode());
assertTrue(run.output().contains("\"command\":\"open-session\""));
assertTrue(run.output().contains("\"sessionId\":\"session-cli\""));
assertTrue(run.output().contains("mxgw***********cret"));
// Only the non-secret mxgw_<key-id>_ prefix survives; the secret is fully masked.
assertTrue(run.output().contains("mxgw_visible_***"));
assertFalse(run.output().contains("visible_secret"));
assertFalse(run.output().contains("cret"));
}
@Test
@@ -21,13 +21,23 @@ import mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest;
* stream cancels the underlying gRPC call. If the queue overflows the call is
* cancelled and a follow-up call to {@link #next()} throws
* {@link MxGatewayException}.
*
* <p><strong>Threading:</strong> the iterator methods ({@link #hasNext()} and
* {@link #next()}) are <em>not</em> thread-safe and must be driven by a single
* consumer thread. {@link #close()} may be called from any thread. Terminal
* state transitions (queue overflow, server completion, and {@code close()})
* are serialised so that the first terminal condition wins deterministically:
* once an overflow exception has been observed it is never silently replaced
* by an end-of-stream marker.
*/
public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
private static final Object END = new Object();
private final BlockingQueue<Object> queue;
private final Object terminalLock = new Object();
private volatile ClientCallStreamObserver<StreamEventsRequest> requestStream;
private volatile boolean closed;
private boolean terminated;
private Object next;
MxEventStream(int capacity) {
@@ -98,7 +108,7 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
if (stream != null) {
stream.cancel("client cancelled event stream", null);
}
offer(END);
terminate(null);
}
private Object take() {
@@ -115,10 +125,7 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
private void offer(Object value) {
Objects.requireNonNull(value, "value");
if (value == END) {
if (!queue.offer(value)) {
queue.clear();
queue.offer(value);
}
terminate(null);
return;
}
if (!queue.offer(value)) {
@@ -126,9 +133,38 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
if (stream != null) {
stream.cancel("client event stream queue overflowed", null);
}
queue.clear();
queue.offer(new MxGatewayException("gateway stream events queue overflowed"));
queue.offer(END);
terminate(new MxGatewayException("gateway stream events queue overflowed"));
}
}
/**
* Drives the single terminal transition. The first caller wins: a later
* end-of-stream or {@code close()} cannot overwrite or discard an overflow
* exception that has already been published to the consumer.
*
* @param fault the fault to surface to the consumer, or {@code null} for a
* clean end-of-stream
*/
private void terminate(MxGatewayException fault) {
synchronized (terminalLock) {
if (terminated) {
return;
}
terminated = true;
if (fault != null) {
// Make room for the fault marker; the consumer only needs the
// terminal signal, queued data events are no longer relevant.
queue.clear();
queue.offer(fault);
queue.offer(END);
return;
}
// Clean end-of-stream: ensure the END marker is delivered even when
// the queue is currently full of undrained data events.
if (!queue.offer(END)) {
queue.clear();
queue.offer(END);
}
}
}
}
@@ -150,6 +150,7 @@ public final class MxGatewayClient implements AutoCloseable {
try {
OpenSessionReply reply = rawBlockingStub().openSession(request);
MxGatewayErrors.ensureProtocolSuccess("open session", reply.getProtocolStatus(), null);
ensureGatewayProtocolCompatible(reply);
return reply;
} catch (RuntimeException error) {
if (error instanceof MxGatewayException) {
@@ -159,6 +160,24 @@ public final class MxGatewayClient implements AutoCloseable {
}
}
/**
* Verifies that the gateway speaks the protocol version this client was
* generated against. A gateway that leaves {@code gateway_protocol_version}
* unset (value {@code 0}, e.g. an older gateway) is accepted unchanged.
*
* @param reply the {@code OpenSessionReply} returned by the gateway
* @throws MxGatewayException if the gateway reports an incompatible protocol version
*/
private static void ensureGatewayProtocolCompatible(OpenSessionReply reply) {
int gatewayVersion = reply.getGatewayProtocolVersion();
int clientVersion = MxGatewayClientVersion.gatewayProtocolVersion();
if (gatewayVersion != 0 && gatewayVersion != clientVersion) {
throw new MxGatewayException("gateway protocol version mismatch: gateway reports "
+ gatewayVersion + " but this client was built for " + clientVersion
+ "; upgrade the client or gateway so the protocol versions match");
}
}
/**
* Invokes {@code OpenSession} asynchronously.
*
@@ -170,6 +189,7 @@ public final class MxGatewayClient implements AutoCloseable {
CompletableFuture<OpenSessionReply> future = toCompletable(rawFutureStub().openSession(request));
return future.thenApply(reply -> {
MxGatewayErrors.ensureProtocolSuccess("open session", reply.getProtocolStatus(), null);
ensureGatewayProtocolCompatible(reply);
return reply;
});
}
@@ -11,25 +11,35 @@ public final class MxGatewaySecrets {
}
/**
* Redacts the body of an API key, leaving only short prefix and suffix
* windows so it remains comparable in logs.
* Redacts the secret portion of an API key, leaving only the non-secret
* key identifier visible so the value remains comparable in logs.
*
* <p>A gateway API key has the form {@code mxgw_<key-id>_<secret>}. Only the
* {@code mxgw_<key-id>_} prefix is non-secret; everything after the second
* underscore is the secret and is masked entirely &mdash; no leading or
* trailing characters of the secret are echoed. Tokens that do not match
* the gateway shape are masked completely as {@code "<redacted>"}.
*
* @param apiKey the API key to redact, may be {@code null} or empty
* @return an empty string for {@code null}/empty input, {@code "<redacted>"}
* for keys eight characters or shorter, or a masked form preserving
* the leading and trailing four characters
* for non-gateway-shaped tokens, or {@code mxgw_<key-id>_***} with the
* secret masked for gateway-shaped keys
*/
public static String redactApiKey(String apiKey) {
if (apiKey == null || apiKey.isEmpty()) {
return "";
}
if (apiKey.length() <= 8) {
return "<redacted>";
// Gateway keys are mxgw_<key-id>_<secret>; keep only the non-secret prefix.
if (apiKey.startsWith("mxgw_")) {
int secretSeparator = apiKey.indexOf('_', "mxgw_".length());
if (secretSeparator >= 0 && secretSeparator < apiKey.length() - 1) {
return apiKey.substring(0, secretSeparator + 1) + "***";
}
}
return apiKey.substring(0, 4)
+ "*".repeat(apiKey.length() - 8)
+ apiKey.substring(apiKey.length() - 4);
// Anything else is treated as wholly secret reveal nothing.
return "<redacted>";
}
/**
@@ -40,6 +40,7 @@ import mxaccess_gateway.v1.MxaccessGateway.WriteCommand;
*/
public final class MxGatewaySession implements AutoCloseable {
private static final SecureRandom RANDOM = new SecureRandom();
private static final System.Logger LOGGER = System.getLogger(MxGatewaySession.class.getName());
private final MxGatewayClient client;
private final OpenSessionReply openReply;
@@ -99,9 +100,26 @@ public final class MxGatewaySession implements AutoCloseable {
return closeReply;
}
/**
* Closes the session as part of try-with-resources.
*
* <p>This performs a {@code CloseSession} network RPC. Unlike
* {@link #closeRaw()}, any failure of that RPC is swallowed (and recorded
* as a suppressed exception when the JVM permits) rather than thrown: a
* close-time transport or protocol failure must not replace the exception
* that a try-with-resources body is already propagating. Callers that need
* to observe the close result should call {@link #closeRaw()} explicitly.
*/
@Override
public void close() {
closeRaw();
try {
closeRaw();
} catch (MxGatewayException error) {
LOGGER.log(
System.Logger.Level.WARNING,
() -> "ignoring close-time failure for session " + sessionId(),
error);
}
}
/**
@@ -116,7 +134,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasRegister()) {
return reply.getRegister().getServerHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway register reply carried neither a register payload nor a return value");
}
/**
@@ -159,7 +181,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasAddItem()) {
return reply.getAddItem().getItemHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway addItem reply carried neither an add-item payload nor a return value");
}
/**
@@ -193,7 +219,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasAddItem2()) {
return reply.getAddItem2().getItemHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway addItem2 reply carried neither an add-item payload nor a return value");
}
/**
@@ -0,0 +1,394 @@
package com.dohertylan.mxgateway.client;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
import io.grpc.ManagedChannel;
import io.grpc.Server;
import io.grpc.inprocess.InProcessChannelBuilder;
import io.grpc.inprocess.InProcessServerBuilder;
import io.grpc.stub.StreamObserver;
import java.time.Duration;
import java.util.UUID;
import mxaccess_gateway.v1.MxAccessGatewayGrpc;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionRequest;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandKind;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandReply;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandRequest;
import mxaccess_gateway.v1.MxaccessGateway.OpenSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.OpenSessionRequest;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatus;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatusCode;
import org.junit.jupiter.api.Test;
/**
* Regression tests for the Medium-severity Client.Java code-review findings
* (Client.Java-001 through Client.Java-005).
*/
final class MxGatewayMediumFindingsTests {
// --- Client.Java-001: redactApiKey must not leak trailing secret chars ---
@Test
void redactApiKeyDoesNotLeakAnyCharacterOfTheSecret() {
// mxgw_<key-id>_<secret> the secret is the segment after the second underscore.
String apiKey = "mxgw_keyid01_supersecretvalue";
String redacted = MxGatewaySecrets.redactApiKey(apiKey);
// None of the secret characters may appear in the redacted output.
assertFalse(redacted.contains("value"), () -> "redacted form leaked secret tail: " + redacted);
assertFalse(redacted.endsWith("alue"), () -> "redacted form leaked trailing secret chars: " + redacted);
assertFalse(redacted.contains("supersecret"), () -> "redacted form leaked secret: " + redacted);
// The non-secret key-id prefix may stay so the value is still comparable in logs.
assertTrue(redacted.startsWith("mxgw_keyid01_"), () -> "redacted form lost key-id prefix: " + redacted);
}
@Test
void redactApiKeyForNonGatewayShapedKeyRevealsNothing() {
String redacted = MxGatewaySecrets.redactApiKey("plain-opaque-token-1234");
assertFalse(redacted.contains("1234"), () -> "redacted form leaked trailing chars: " + redacted);
assertFalse(redacted.contains("plain-opaque-token"), () -> "redacted form leaked body: " + redacted);
}
@Test
void redactApiKeyStillHandlesNullAndShortInput() {
assertEquals("", MxGatewaySecrets.redactApiKey(null));
assertEquals("", MxGatewaySecrets.redactApiKey(""));
assertEquals("<redacted>", MxGatewaySecrets.redactApiKey("short"));
}
// --- Client.Java-002: terminal-state transition must be deterministic ---
@Test
void eventStreamOverflowExceptionSurvivesASubsequentClose() {
// Deterministic reproduction of Client.Java-002: an overflow enqueues the
// overflow exception, then a later close() must NOT discard it. The first
// terminal condition (overflow) must win and stay observable by next().
MxEventStream stream = new MxEventStream(2);
io.grpc.stub.ClientResponseObserver<
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest,
mxaccess_gateway.v1.MxaccessGateway.MxEvent>
observer = stream.observer();
observer.beforeStart(new NoopRequestStream());
// Force a queue overflow on a capacity-2 stream.
for (int i = 0; i < 8; i++) {
observer.onNext(testEvent(i));
}
// A close() arriving after the overflow must not erase the overflow signal.
stream.close();
MxGatewayException error = assertThrows(MxGatewayException.class, () -> {
while (stream.hasNext()) {
stream.next();
}
});
assertTrue(error.getMessage().contains("overflow"), error::getMessage);
}
@Test
void eventStreamConcurrentOverflowAndCloseAlwaysTerminate() throws Exception {
// The terminal-state transition must be serialised: whatever the interleaving
// of overflow and close, hasNext() always reaches a terminal state.
for (int iteration = 0; iteration < 300; iteration++) {
MxEventStream stream = new MxEventStream(2);
io.grpc.stub.ClientResponseObserver<
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest,
mxaccess_gateway.v1.MxaccessGateway.MxEvent>
observer = stream.observer();
observer.beforeStart(new NoopRequestStream());
Thread filler = new Thread(() -> {
for (int i = 0; i < 8; i++) {
observer.onNext(testEvent(i));
}
});
Thread closer = new Thread(stream::close);
filler.start();
closer.start();
filler.join();
closer.join();
try {
while (stream.hasNext()) {
stream.next();
}
} catch (MxGatewayException expected) {
assertTrue(expected.getMessage().contains("overflow"), expected::getMessage);
}
assertFalse(stream.hasNext());
}
}
private static final class NoopRequestStream
extends io.grpc.stub.ClientCallStreamObserver<mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest> {
@Override
public void cancel(String message, Throwable cause) {
}
@Override
public boolean isReady() {
return true;
}
@Override
public void setOnReadyHandler(Runnable onReadyHandler) {
}
@Override
public void request(int count) {
}
@Override
public void setMessageCompression(boolean enable) {
}
@Override
public void disableAutoInboundFlowControl() {
}
@Override
public void onNext(mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest value) {
}
@Override
public void onError(Throwable t) {
}
@Override
public void onCompleted() {
}
}
// --- Client.Java-003: gateway protocol version mismatch must be rejected ---
@Test
void openSessionRejectsIncompatibleGatewayProtocolVersion() throws Exception {
TestService service = new TestService() {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-mismatch")
.setGatewayProtocolVersion(MxGatewayClientVersion.gatewayProtocolVersion() + 1)
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewayException error = assertThrows(
MxGatewayException.class,
() -> harness.client().openSession("junit-session"));
assertTrue(error.getMessage().contains("protocol version"), error::getMessage);
}
}
@Test
void openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion() throws Exception {
TestService matching = new TestService() {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-ok")
.setGatewayProtocolVersion(MxGatewayClientVersion.gatewayProtocolVersion())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(matching)) {
assertEquals("session-ok", harness.client().openSession("junit-session").sessionId());
}
// A gateway that leaves the field unset (0) must not be rejected older gateways
// simply do not populate it.
TestService unset = new TestService();
try (Harness harness = Harness.start(unset)) {
assertEquals("session-java", harness.client().openSession("junit-session").sessionId());
}
}
// --- Client.Java-004: missing typed payload AND missing return_value must throw ---
@Test
void registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
// Reply with neither register payload nor return_value set.
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
MxGatewayException error = assertThrows(
MxGatewayException.class, () -> session.register("c"));
assertTrue(error.getMessage().contains("register"), error::getMessage);
}
}
@Test
void addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertThrows(MxGatewayException.class, () -> session.addItem(1, "Tag"));
assertThrows(MxGatewayException.class, () -> session.addItem2(1, "Tag", "ctx"));
}
}
@Test
void addItemStillHonoursReturnValueFallback() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.setReturnValue(mxaccess_gateway.v1.MxaccessGateway.MxValue.newBuilder()
.setInt32Value(99))
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertEquals(99, session.addItem(1, "Tag"));
}
}
// --- Client.Java-005: close() must not mask the primary try-with-resources error ---
@Test
void closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException() throws Exception {
TestService service = new TestService() {
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onError(io.grpc.Status.UNAVAILABLE
.withDescription("WORKER_UNAVAILABLE")
.asRuntimeException());
}
};
try (Harness harness = Harness.start(service)) {
IllegalStateException bodyError = assertThrows(IllegalStateException.class, () -> {
try (MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s")) {
throw new IllegalStateException("body failure");
}
});
// The body exception must propagate; the close-time RPC failure must not replace it.
assertEquals("body failure", bodyError.getMessage());
}
}
@Test
void closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt() throws Exception {
TestService service = new TestService() {
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onError(io.grpc.Status.UNAVAILABLE
.withDescription("WORKER_UNAVAILABLE")
.asRuntimeException());
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertThrows(MxGatewayException.class, session::closeRaw);
}
}
private static mxaccess_gateway.v1.MxaccessGateway.MxEvent testEvent(int sequence) {
return mxaccess_gateway.v1.MxaccessGateway.MxEvent.newBuilder()
.setWorkerSequence(sequence)
.build();
}
private static ProtocolStatus ok() {
return ProtocolStatus.newBuilder()
.setCode(ProtocolStatusCode.PROTOCOL_STATUS_CODE_OK)
.build();
}
private static class TestService extends MxAccessGatewayGrpc.MxAccessGatewayImplBase {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-java")
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onNext(CloseSessionReply.newBuilder()
.setSessionId(request.getSessionId())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(MxCommandKind.MX_COMMAND_KIND_UNSPECIFIED)
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
}
private record Harness(Server server, ManagedChannel channel, MxGatewayClient client) implements AutoCloseable {
static Harness start(MxAccessGatewayGrpc.MxAccessGatewayImplBase service) throws Exception {
String name = "mxgw-medium-" + UUID.randomUUID();
Server server = InProcessServerBuilder.forName(name)
.directExecutor()
.addService(service)
.build()
.start();
ManagedChannel channel = InProcessChannelBuilder.forName(name).directExecutor().build();
MxGatewayClient client = new MxGatewayClient(
channel,
MxGatewayClientOptions.builder()
.endpoint("in-process")
.apiKey("")
.plaintext(true)
.callTimeout(Duration.ofSeconds(5))
.build());
return new Harness(server, channel, client);
}
@Override
public void close() {
channel.shutdownNow();
server.shutdownNow();
}
}
}
+19
View File
@@ -131,6 +131,25 @@ The methods return native Python types (`bool`, `datetime | None`, and a
into the hierarchy without learning the underlying stub class. The
service requires the `metadata:read` scope on the API key.
`discover_hierarchy` buffers every object (with its full attribute list)
into a single in-memory `list`. For a large Galaxy use `iter_hierarchy`
instead — it is an async generator that fetches one page at a time and
yields objects as they arrive, so peak memory stays bounded by a single
page rather than the whole hierarchy:
```python
async with await GalaxyRepositoryClient.connect(
endpoint="localhost:5000",
api_key="<gateway-api-key>",
plaintext=True,
) as galaxy:
async for obj in galaxy.iter_hierarchy():
print(obj.tag_name, obj.contained_name)
```
Pages are fetched lazily: the next page is only requested once the
caller has consumed every object from the current page.
### Watching deploy events
`GalaxyRepositoryClient.watch_deploy_events` opens a server-streaming
+19 -2
View File
@@ -133,7 +133,7 @@ class GatewayClient:
kwargs: dict[str, Any] = {"metadata": merge_metadata(self.options.api_key, metadata)}
if self.options.stream_timeout is not None:
kwargs["timeout"] = self.options.stream_timeout
call = self.raw_stub.StreamEvents(request, **kwargs)
call = _open_stream(self.raw_stub.StreamEvents, request, kwargs)
return _canceling_iterator(call)
async def acknowledge_alarm(
@@ -169,7 +169,7 @@ class GatewayClient:
kwargs: dict[str, Any] = {"metadata": merge_metadata(self.options.api_key, metadata)}
if self.options.stream_timeout is not None:
kwargs["timeout"] = self.options.stream_timeout
call = self.raw_stub.QueryActiveAlarms(request, **kwargs)
call = _open_stream(self.raw_stub.QueryActiveAlarms, request, kwargs)
return _canceling_active_alarms_iterator(call)
async def _unary(
@@ -201,6 +201,23 @@ class GatewayClient:
raise map_rpc_error(operation, error) from error
def _open_stream(method: Any, request: Any, kwargs: dict[str, Any]) -> Any:
"""Open a server-streaming call, dropping ``timeout`` if the stub rejects it.
Mirrors the fallback in ``_unary`` so an older or fake stub that does not
accept a ``timeout`` keyword argument does not crash when ``stream_timeout``
is configured.
"""
try:
return method(request, **kwargs)
except TypeError as error:
if "timeout" not in kwargs or "unexpected keyword argument 'timeout'" not in str(error):
raise
kwargs.pop("timeout")
return method(request, **kwargs)
async def _canceling_iterator(call: Any) -> AsyncIterator[pb.MxEvent]:
try:
async for event in call:
+23 -5
View File
@@ -114,10 +114,17 @@ class GalaxyRepositoryClient:
return None
return reply.time_of_last_deploy.ToDatetime()
async def discover_hierarchy(self) -> list[galaxy_pb.GalaxyObject]:
"""Return the deployed Galaxy object hierarchy as raw proto messages."""
async def iter_hierarchy(self) -> AsyncIterator[galaxy_pb.GalaxyObject]:
"""Yield the deployed Galaxy object hierarchy one object at a time.
Pages are fetched lazily: a page is only requested once the caller has
consumed every object from the previous page. This keeps peak memory
bounded by a single page (``_DISCOVER_HIERARCHY_PAGE_SIZE`` objects)
rather than the whole Galaxy. Use this for large Galaxies; use
:meth:`discover_hierarchy` when a fully buffered ``list`` is convenient
and the Galaxy is known to be small.
"""
objects: list[galaxy_pb.GalaxyObject] = []
seen_page_tokens: set[str] = set()
page_token = ""
while True:
@@ -129,16 +136,27 @@ class GalaxyRepositoryClient:
page_token=page_token,
),
)
objects.extend(reply.objects)
for obj in reply.objects:
yield obj
page_token = reply.next_page_token
if not page_token:
return objects
return
if page_token in seen_page_tokens:
raise MxGatewayError(
f"galaxy discover hierarchy returned repeated page token {page_token!r}"
)
seen_page_tokens.add(page_token)
async def discover_hierarchy(self) -> list[galaxy_pb.GalaxyObject]:
"""Return the deployed Galaxy object hierarchy as raw proto messages.
This buffers every object (and its full attribute list) into a single
in-memory ``list``. For a large Galaxy prefer :meth:`iter_hierarchy`,
which streams objects page by page without holding the whole hierarchy.
"""
return [obj async for obj in self.iter_hierarchy()]
def watch_deploy_events(
self,
last_seen_deploy_time: datetime | None = None,
+284
View File
@@ -0,0 +1,284 @@
"""Regression tests for Client.Python-009: untested public paths.
Covers `Session.write2`/`add_item2` request construction, the bulk-size limit
guard, the ``None``-argument ``TypeError`` guards, the TLS ``ca_file`` read
path in `create_channel`, the generic `map_rpc_error` fallthrough, and a
happy-path CLI command body driven by a fake stub.
"""
from __future__ import annotations
import json
from datetime import datetime, timezone
from typing import Any
import grpc
import pytest
from click.testing import CliRunner
from mxgateway import ClientOptions, GatewayClient
from mxgateway.errors import MxGatewayTransportError, map_rpc_error
from mxgateway.generated import mxaccess_gateway_pb2 as pb
from mxgateway.options import create_channel
from mxgateway.session import MAX_BULK_ITEMS, Session
class _FakeUnary:
def __init__(self, replies: list[Any]) -> None:
self.replies = list(replies)
self.requests: list[Any] = []
self.metadata: tuple[tuple[str, str], ...] | None = None
async def __call__(
self,
request: Any,
*,
metadata: tuple[tuple[str, str], ...],
) -> Any:
self.requests.append(request)
self.metadata = metadata
return self.replies.pop(0)
class _FakeGatewayStub:
def __init__(self) -> None:
self.open_session = _FakeUnary(
[
pb.OpenSessionReply(
session_id="session-1",
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
),
],
)
self.invoke = _FakeUnary([])
self.OpenSession = self.open_session
self.Invoke = self.invoke
def _ok_reply(kind: int, **fields: Any) -> pb.MxCommandReply:
return pb.MxCommandReply(
session_id="session-1",
kind=kind,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
**fields,
)
# --- write2 / add_item2 request construction -------------------------------
@pytest.mark.asyncio
async def test_add_item2_sends_item_context_and_returns_handle() -> None:
stub = _FakeGatewayStub()
stub.invoke.replies = [
_ok_reply(pb.MX_COMMAND_KIND_ADD_ITEM2, add_item2=pb.AddItem2Reply(item_handle=77)),
]
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
item_handle = await session.add_item2(12, "Object.Attribute", "ctx-A")
assert item_handle == 77
command = stub.invoke.requests[0].command
assert command.kind == pb.MX_COMMAND_KIND_ADD_ITEM2
assert command.add_item2.server_handle == 12
assert command.add_item2.item_definition == "Object.Attribute"
assert command.add_item2.item_context == "ctx-A"
@pytest.mark.asyncio
async def test_write2_sends_value_and_timestamp_value() -> None:
stub = _FakeGatewayStub()
stub.invoke.replies = [_ok_reply(pb.MX_COMMAND_KIND_WRITE2)]
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
when = datetime(2025, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
await session.write2(12, 34, 123, when, user_id=5)
command = stub.invoke.requests[0].command
assert command.kind == pb.MX_COMMAND_KIND_WRITE2
assert command.write2.server_handle == 12
assert command.write2.item_handle == 34
assert command.write2.user_id == 5
# The integer value is carried as the int32 field of the MxValue oneof.
assert command.write2.value.WhichOneof("kind") == "int32_value"
assert command.write2.value.int32_value == 123
# The timestamp value carries the datetime via the timestamp_value oneof.
assert command.write2.timestamp_value.WhichOneof("kind") == "timestamp_value"
assert command.write2.timestamp_value.timestamp_value.ToDatetime(
tzinfo=timezone.utc,
) == when
# --- bulk-size limit + None-argument guards --------------------------------
@pytest.mark.asyncio
async def test_subscribe_bulk_rejects_oversized_request() -> None:
stub = _FakeGatewayStub()
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
oversized = [f"Tag_{i}" for i in range(MAX_BULK_ITEMS + 1)]
with pytest.raises(ValueError, match=str(MAX_BULK_ITEMS)):
await session.subscribe_bulk(12, oversized)
# No RPC should have been issued for a rejected request.
assert stub.invoke.requests == []
@pytest.mark.asyncio
async def test_advise_item_bulk_rejects_none_argument() -> None:
stub = _FakeGatewayStub()
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
with pytest.raises(TypeError, match="item_handles is required"):
await session.advise_item_bulk(12, None) # type: ignore[arg-type]
@pytest.mark.asyncio
async def test_add_item_bulk_at_limit_is_allowed() -> None:
stub = _FakeGatewayStub()
stub.invoke.replies = [
_ok_reply(
pb.MX_COMMAND_KIND_ADD_ITEM_BULK,
add_item_bulk=pb.BulkSubscribeReply(results=[]),
),
]
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
at_limit = [f"Tag_{i}" for i in range(MAX_BULK_ITEMS)]
results = await session.add_item_bulk(12, at_limit)
assert results == []
assert len(stub.invoke.requests) == 1
assert len(stub.invoke.requests[0].command.add_item_bulk.tag_addresses) == MAX_BULK_ITEMS
# --- TLS ca_file read path -------------------------------------------------
@pytest.mark.asyncio
async def test_create_channel_reads_ca_file(tmp_path: Any) -> None:
ca_path = tmp_path / "ca.pem"
ca_path.write_bytes(b"-----BEGIN CERTIFICATE-----\nfake\n-----END CERTIFICATE-----\n")
channel = create_channel(
ClientOptions(
endpoint="mxgateway.example.local:5001",
ca_file=str(ca_path),
server_name_override="mxgateway.example.local",
),
)
# A secure channel object is returned without raising; the ca_file was read.
assert channel is not None
await channel.close()
def test_create_channel_missing_ca_file_raises() -> None:
with pytest.raises(FileNotFoundError):
create_channel(
ClientOptions(
endpoint="mxgateway.example.local:5001",
ca_file="C:/does/not/exist/ca.pem",
),
)
# --- map_rpc_error generic fallthrough -------------------------------------
class _FakeRpcError(grpc.RpcError):
def __init__(self, code: grpc.StatusCode, details: str) -> None:
self._code = code
self._details = details
def code(self) -> grpc.StatusCode:
return self._code
def details(self) -> str:
return self._details
def test_map_rpc_error_generic_branch_returns_transport_error() -> None:
error = _FakeRpcError(grpc.StatusCode.UNAVAILABLE, "connection refused")
mapped = map_rpc_error("invoke", error)
assert type(mapped) is MxGatewayTransportError
assert "invoke failed: connection refused" in str(mapped)
def test_map_rpc_error_handles_error_without_code() -> None:
mapped = map_rpc_error("invoke", grpc.RpcError())
assert type(mapped) is MxGatewayTransportError
assert "invoke failed:" in str(mapped)
# --- happy-path CLI command body -------------------------------------------
def test_cli_register_happy_path_emits_server_handle(monkeypatch: Any) -> None:
"""Drive the `register` CLI command end to end against a fake stub."""
from mxgateway_cli import commands
invoke = _FakeUnary(
[
_ok_reply(
pb.MX_COMMAND_KIND_REGISTER,
register=pb.RegisterReply(server_handle=99),
),
],
)
class _Stub:
def __init__(self) -> None:
self.Invoke = invoke
async def _fake_connect(kwargs: dict[str, Any]) -> GatewayClient:
return await GatewayClient.connect(
ClientOptions(endpoint=kwargs["endpoint"], plaintext=True),
stub=_Stub(),
)
monkeypatch.setattr(commands, "_connect", _fake_connect)
runner = CliRunner()
result = runner.invoke(
commands.main,
[
"register",
"--endpoint",
"localhost:5000",
"--session-id",
"session-1",
"--client-name",
"pytest-client",
"--json",
],
)
assert result.exit_code == 0, result.output
assert json.loads(result.output) == {"serverHandle": 99}
assert invoke.requests[0].command.register.client_name == "pytest-client"
@@ -0,0 +1,127 @@
"""Regression tests for Client.Python-005: streaming hierarchy iteration.
`GalaxyRepositoryClient.iter_hierarchy` yields objects page by page instead of
buffering the entire Galaxy hierarchy in memory, and `discover_hierarchy`
remains a convenience wrapper built on top of it.
"""
from __future__ import annotations
from typing import Any
import pytest
from mxgateway import ClientOptions, GalaxyRepositoryClient
from mxgateway.generated import galaxy_repository_pb2 as galaxy_pb
class _FakeUnary:
def __init__(self, replies: list[Any]) -> None:
self.replies = list(replies)
self.requests: list[Any] = []
self.metadata: tuple[tuple[str, str], ...] | None = None
async def __call__(
self,
request: Any,
*,
metadata: tuple[tuple[str, str], ...],
timeout: float | None = None,
) -> Any:
self.requests.append(request)
self.metadata = metadata
return self.replies.pop(0)
class _FakeGalaxyStub:
def __init__(self, discover_replies: list[Any]) -> None:
self.DiscoverHierarchy = _FakeUnary(discover_replies)
def _two_page_replies() -> list[galaxy_pb.DiscoverHierarchyReply]:
return [
galaxy_pb.DiscoverHierarchyReply(
next_page_token="page-2",
total_object_count=3,
objects=[
galaxy_pb.GalaxyObject(gobject_id=1, tag_name="Area_001", is_area=True),
galaxy_pb.GalaxyObject(gobject_id=2, tag_name="Pump_001"),
],
),
galaxy_pb.DiscoverHierarchyReply(
total_object_count=3,
objects=[
galaxy_pb.GalaxyObject(gobject_id=3, tag_name="Pump_002"),
],
),
]
@pytest.mark.asyncio
async def test_iter_hierarchy_yields_objects_across_pages() -> None:
stub = _FakeGalaxyStub(_two_page_replies())
client = await GalaxyRepositoryClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
tags = [obj.tag_name async for obj in client.iter_hierarchy()]
assert tags == ["Area_001", "Pump_001", "Pump_002"]
assert len(stub.DiscoverHierarchy.requests) == 2
assert stub.DiscoverHierarchy.requests[0].page_token == ""
assert stub.DiscoverHierarchy.requests[1].page_token == "page-2"
@pytest.mark.asyncio
async def test_iter_hierarchy_is_lazy_and_does_not_prefetch_next_page() -> None:
"""Pulling only the first object must not have requested the second page."""
stub = _FakeGalaxyStub(_two_page_replies())
client = await GalaxyRepositoryClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
iterator = client.iter_hierarchy()
first = await iterator.__anext__()
assert first.tag_name == "Area_001"
# Only the first page should have been fetched so far.
assert len(stub.DiscoverHierarchy.requests) == 1
await iterator.aclose()
@pytest.mark.asyncio
async def test_iter_hierarchy_rejects_repeated_page_token() -> None:
stub = _FakeGalaxyStub(
[
galaxy_pb.DiscoverHierarchyReply(next_page_token="7:1"),
galaxy_pb.DiscoverHierarchyReply(next_page_token="7:1"),
],
)
client = await GalaxyRepositoryClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
with pytest.raises(Exception, match="repeated page token"):
async for _ in client.iter_hierarchy():
pass
@pytest.mark.asyncio
async def test_discover_hierarchy_still_returns_full_list() -> None:
"""The convenience wrapper must keep returning a buffered list."""
stub = _FakeGalaxyStub(_two_page_replies())
client = await GalaxyRepositoryClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
objects = await client.discover_hierarchy()
assert isinstance(objects, list)
assert [obj.tag_name for obj in objects] == ["Area_001", "Pump_001", "Pump_002"]
@@ -0,0 +1,132 @@
"""Regression tests for Client.Python-003: stream timeout-kwarg fallback.
`stream_events_raw` and `query_active_alarms` must tolerate a fake/older stub
that does not accept a ``timeout`` keyword argument, matching the fallback
already present in `galaxy.watch_deploy_events` and the unary `_unary` helper.
"""
from __future__ import annotations
from typing import Any
import pytest
from mxgateway import ClientOptions, GatewayClient
from mxgateway.generated import mxaccess_gateway_pb2 as pb
class _NoTimeoutStream:
"""Sync-callable unary-stream fake that rejects a ``timeout`` kwarg."""
def __init__(self, replies: list[Any]) -> None:
self._replies = list(replies)
self.requests: list[Any] = []
self.metadata: tuple[tuple[str, str], ...] | None = None
self.cancelled = False
def __call__(
self,
request: Any,
*,
metadata: tuple[tuple[str, str], ...],
) -> "_NoTimeoutStream":
self.requests.append(request)
self.metadata = metadata
return self
def __aiter__(self) -> "_NoTimeoutStream":
return self
async def __anext__(self) -> Any:
if not self._replies:
raise StopAsyncIteration
return self._replies.pop(0)
def cancel(self) -> None:
self.cancelled = True
class _NoTimeoutStubStreamEvents:
def __init__(self, stream: _NoTimeoutStream) -> None:
self.StreamEvents = stream
class _NoTimeoutStubQueryAlarms:
def __init__(self, stream: _NoTimeoutStream) -> None:
self.QueryActiveAlarms = stream
@pytest.mark.asyncio
async def test_stream_events_raw_falls_back_when_stub_rejects_timeout() -> None:
stream = _NoTimeoutStream(
[pb.MxEvent(session_id="session-1", worker_sequence=1)],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True, stream_timeout=5.0),
stub=_NoTimeoutStubStreamEvents(stream),
)
received = [
event
async for event in client.stream_events_raw(
pb.StreamEventsRequest(session_id="session-1"),
)
]
assert len(received) == 1
assert received[0].worker_sequence == 1
@pytest.mark.asyncio
async def test_query_active_alarms_falls_back_when_stub_rejects_timeout() -> None:
stream = _NoTimeoutStream(
[pb.ActiveAlarmSnapshot(alarm_full_reference="Tank01.Level.HiHi")],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True, stream_timeout=5.0),
stub=_NoTimeoutStubQueryAlarms(stream),
)
received = [
snapshot
async for snapshot in client.query_active_alarms(
pb.QueryActiveAlarmsRequest(session_id="session-1"),
)
]
assert len(received) == 1
assert received[0].alarm_full_reference == "Tank01.Level.HiHi"
@pytest.mark.asyncio
async def test_stream_events_raw_still_passes_timeout_to_capable_stub() -> None:
"""A stub that accepts ``timeout`` must still receive the configured value."""
captured: dict[str, Any] = {}
class _CapableStream(_NoTimeoutStream):
def __call__( # type: ignore[override]
self,
request: Any,
*,
metadata: tuple[tuple[str, str], ...],
timeout: float | None = None,
) -> "_CapableStream":
captured["timeout"] = timeout
return super().__call__(request, metadata=metadata)
stream = _CapableStream([pb.MxEvent(session_id="session-1", worker_sequence=9)])
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True, stream_timeout=7.5),
stub=_NoTimeoutStubStreamEvents(stream),
)
received = [
event
async for event in client.stream_events_raw(
pb.StreamEventsRequest(session_id="session-1"),
)
]
assert len(received) == 1
assert captured["timeout"] == 7.5
+19 -14
View File
@@ -11,28 +11,34 @@ generated contract inputs.
## Crate Layout
Recommended layout:
Actual layout — the `mxgateway-client` library crate is the workspace root,
with the `mxgw` test CLI as a workspace member:
```text
clients/rust/
clients/rust/ # `mxgateway-client` library crate (workspace root)
Cargo.toml
build.rs
src/
lib.rs
client.rs
session.rs
galaxy.rs
options.rs
auth.rs
value.rs
version.rs
error.rs
generated.rs
crates/
mxgateway-client/
src/lib.rs
src/client.rs
src/session.rs
src/options.rs
src/auth.rs
src/value.rs
src/error.rs
src/generated/
mxgw-cli/
mxgw-cli/ # `mxgw` test CLI (workspace member)
Cargo.toml
src/main.rs
tests/
client_behavior.rs
proto_fixtures.rs
```
Expected dependencies:
Dependencies:
- `tonic`
- `prost`
@@ -43,7 +49,6 @@ Expected dependencies:
- `clap`
- `serde`
- `serde_json`
- `tracing`
## Library API
+8 -2
View File
@@ -1048,8 +1048,14 @@ mod tests {
fn version_json_output_has_protocol_versions() {
let value = super::version_json();
assert_eq!(value["gatewayProtocolVersion"], 2);
assert_eq!(value["workerProtocolVersion"], 1);
assert_eq!(
value["gatewayProtocolVersion"],
super::GATEWAY_PROTOCOL_VERSION
);
assert_eq!(
value["workerProtocolVersion"],
super::WORKER_PROTOCOL_VERSION
);
}
#[test]
+3 -1
View File
@@ -219,7 +219,9 @@ impl GatewayClient {
request: AcknowledgeAlarmRequest,
) -> Result<AcknowledgeAlarmReply, Error> {
let mut client = self.inner.clone();
let response = client.acknowledge_alarm(self.unary_request(request)).await?;
let response = client
.acknowledge_alarm(self.unary_request(request))
.await?;
let reply = response.into_inner();
ensure_protocol_success("acknowledge alarm", reply.protocol_status.as_ref())?;
Ok(reply)
+28 -4
View File
@@ -1,10 +1,10 @@
//! Error types surfaced by the Rust client.
//!
//! [`Error`] is the umbrella enum returned by every async wrapper. It
//! classifies `tonic::Status` codes (auth, timeout, cancellation) and folds
//! gateway protocol failures and command-level rejections into structured
//! variants. Credentials embedded in status messages are scrubbed before the
//! message reaches a caller.
//! classifies `tonic::Status` codes (auth, timeout, cancellation, transient
//! unavailability) and folds gateway protocol failures and command-level
//! rejections into structured variants. Credentials embedded in status
//! messages are scrubbed before the message reaches a caller.
use thiserror::Error as ThisError;
use tonic::Code;
@@ -85,6 +85,17 @@ pub enum Error {
status: Box<tonic::Status>,
},
/// Server returned `Unavailable` or `ResourceExhausted` — a transient
/// failure (gateway restart, overload) that a caller may reasonably retry.
#[error("gateway temporarily unavailable: {message}")]
Unavailable {
/// Redacted server-supplied detail message.
message: String,
/// Original `tonic::Status`.
#[source]
status: Box<tonic::Status>,
},
/// Any other `tonic::Status` that did not match a more specific variant.
#[error("gateway status error: {0}")]
Status(Box<tonic::Status>),
@@ -106,6 +117,15 @@ pub enum Error {
/// Detail message from the server.
message: String,
},
/// The gateway returned an OK reply whose payload did not carry the data
/// the command contract requires (for example, an `AddItem` reply with no
/// item handle and no `return_value`).
#[error("malformed gateway reply: {detail}")]
MalformedReply {
/// Human-readable description of what the reply was missing.
detail: String,
},
}
/// Wrapper around an [`MxCommandReply`] whose `protocol_status` reported a
@@ -174,6 +194,10 @@ impl From<tonic::Status> for Error {
message,
status: Box::new(status),
},
Code::Unavailable | Code::ResourceExhausted => Self::Unavailable {
message,
status: Box::new(status),
},
_ => Self::Status(Box::new(status)),
}
}
+1 -1
View File
@@ -279,7 +279,7 @@ mod tests {
_request: Request<GetLastDeployTimeRequest>,
) -> Result<Response<GetLastDeployTimeReply>, Status> {
let present = *self.state.present.lock().unwrap();
let time = self.state.last_deploy.lock().unwrap().clone();
let time = *self.state.last_deploy.lock().unwrap();
Ok(Response::new(GetLastDeployTimeReply {
present,
time_of_last_deploy: time,
+3
View File
@@ -95,6 +95,8 @@ impl ClientOptions {
self
}
/// Maximum encoded/decoded gRPC message size, in bytes, the transport
/// will accept. Defaults to 16 MiB.
pub fn with_max_grpc_message_bytes(mut self, max_grpc_message_bytes: usize) -> Self {
self.max_grpc_message_bytes = max_grpc_message_bytes;
self
@@ -140,6 +142,7 @@ impl ClientOptions {
self.stream_timeout
}
/// Configured maximum gRPC message size in bytes.
pub fn max_grpc_message_bytes(&self) -> usize {
self.max_grpc_message_bytes
}
+68 -44
View File
@@ -8,6 +8,8 @@
//! Bulk commands enforce a 1000-item cap before contacting the worker, in
//! line with the gateway's documented `MAX_BULK_ITEMS`.
use std::sync::atomic::{AtomicU64, Ordering};
use crate::client::{EventStream, GatewayClient};
use crate::error::{ensure_protocol_success, Error};
use crate::generated::mxaccess_gateway::v1::mx_command::Payload;
@@ -23,6 +25,16 @@ use crate::value::MxValue;
const MAX_BULK_ITEMS: usize = 1_000;
/// Process-wide monotonic counter that keeps client correlation ids unique.
static CORRELATION_SEQUENCE: AtomicU64 = AtomicU64::new(0);
/// Build a unique `client_correlation_id` for a request so concurrent or
/// repeated calls of the same command kind can be told apart in gateway logs.
fn next_correlation_id(label: &str) -> String {
let sequence = CORRELATION_SEQUENCE.fetch_add(1, Ordering::Relaxed);
format!("rust-client-{label}-{sequence}")
}
/// Handle to an opened gateway session.
///
/// `Session` carries the gateway-issued session id and a cloned
@@ -76,7 +88,7 @@ impl Session {
.client
.close_session_raw(CloseSessionRequest {
session_id: self.id.clone(),
client_correlation_id: "rust-client-close-session".to_owned(),
client_correlation_id: next_correlation_id("close-session"),
})
.await?;
ensure_protocol_success("close session", reply.protocol_status.as_ref())?;
@@ -99,7 +111,7 @@ impl Session {
)
.await?;
Ok(register_server_handle(&reply))
register_server_handle(&reply)
}
/// Run MXAccess `AddItem` against `server_handle` and return the
@@ -120,7 +132,7 @@ impl Session {
)
.await?;
Ok(add_item_handle(&reply))
add_item_handle(&reply)
}
/// Run MXAccess `AddItem2` (item with a caller-supplied context string)
@@ -146,7 +158,7 @@ impl Session {
)
.await?;
Ok(add_item2_handle(&reply))
add_item2_handle(&reply)
}
/// Run MXAccess `RemoveItem` for the given handle pair.
@@ -226,7 +238,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::AddItemBulk))
bulk_results(reply, BulkReplyKind::AddItem)
}
/// Bulk variant of [`Session::advise`].
@@ -250,7 +262,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::AdviseItemBulk))
bulk_results(reply, BulkReplyKind::AdviseItem)
}
/// Bulk variant of [`Session::remove_item`].
@@ -274,7 +286,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::RemoveItemBulk))
bulk_results(reply, BulkReplyKind::RemoveItem)
}
/// Bulk variant of [`Session::un_advise`].
@@ -298,7 +310,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::UnAdviseItemBulk))
bulk_results(reply, BulkReplyKind::UnAdviseItem)
}
/// Bulk `Subscribe` (atomic add-and-advise) for a list of tag addresses.
@@ -322,7 +334,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::SubscribeBulk))
bulk_results(reply, BulkReplyKind::Subscribe)
}
/// Bulk `Unsubscribe` (atomic un-advise-and-remove) for a list of
@@ -347,7 +359,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::UnsubscribeBulk))
bulk_results(reply, BulkReplyKind::Unsubscribe)
}
/// Run MXAccess `Write` (single-value, no caller-supplied timestamp).
@@ -466,7 +478,7 @@ impl Session {
fn command_request(&self, kind: MxCommandKind, payload: Payload) -> MxCommandRequest {
MxCommandRequest {
session_id: self.id.clone(),
client_correlation_id: format!("rust-client-{}", kind.as_str_name()),
client_correlation_id: next_correlation_id(kind.as_str_name()),
command: Some(MxCommand {
kind: kind as i32,
payload: Some(payload),
@@ -486,71 +498,83 @@ fn ensure_bulk_size(name: &'static str, len: usize) -> Result<(), Error> {
}
}
fn register_server_handle(reply: &MxCommandReply) -> i32 {
fn register_server_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::Register(register)) => register.server_handle,
Some(mx_command_reply::Payload::Register(register)) => Ok(register.server_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "Register reply carried neither a register payload nor an \
int32 return value"
.to_owned(),
}),
}
}
fn add_item_handle(reply: &MxCommandReply) -> i32 {
fn add_item_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::AddItem(add_item)) => add_item.item_handle,
Some(mx_command_reply::Payload::AddItem(add_item)) => Ok(add_item.item_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "AddItem reply carried neither an add_item payload nor an \
int32 return value"
.to_owned(),
}),
}
}
fn add_item2_handle(reply: &MxCommandReply) -> i32 {
fn add_item2_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::AddItem2(add_item)) => add_item.item_handle,
Some(mx_command_reply::Payload::AddItem2(add_item)) => Ok(add_item.item_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "AddItem2 reply carried neither an add_item2 payload nor an \
int32 return value"
.to_owned(),
}),
}
}
enum BulkReplyKind {
AddItemBulk,
AdviseItemBulk,
RemoveItemBulk,
UnAdviseItemBulk,
SubscribeBulk,
UnsubscribeBulk,
AddItem,
AdviseItem,
RemoveItem,
UnAdviseItem,
Subscribe,
Unsubscribe,
}
fn bulk_results(reply: MxCommandReply, kind: BulkReplyKind) -> Vec<SubscribeResult> {
fn bulk_results(reply: MxCommandReply, kind: BulkReplyKind) -> Result<Vec<SubscribeResult>, Error> {
match (reply.payload, kind) {
(Some(mx_command_reply::Payload::AddItemBulk(reply)), BulkReplyKind::AddItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::AddItemBulk(reply)), BulkReplyKind::AddItem) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::AdviseItemBulk(reply)), BulkReplyKind::AdviseItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::AdviseItemBulk(reply)), BulkReplyKind::AdviseItem) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::RemoveItemBulk(reply)), BulkReplyKind::RemoveItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::RemoveItemBulk(reply)), BulkReplyKind::RemoveItem) => {
Ok(reply.results)
}
(
Some(mx_command_reply::Payload::UnAdviseItemBulk(reply)),
BulkReplyKind::UnAdviseItemBulk,
) => reply.results,
(Some(mx_command_reply::Payload::SubscribeBulk(reply)), BulkReplyKind::SubscribeBulk) => {
reply.results
(Some(mx_command_reply::Payload::UnAdviseItemBulk(reply)), BulkReplyKind::UnAdviseItem) => {
Ok(reply.results)
}
(
Some(mx_command_reply::Payload::UnsubscribeBulk(reply)),
BulkReplyKind::UnsubscribeBulk,
) => reply.results,
_ => Vec::new(),
(Some(mx_command_reply::Payload::SubscribeBulk(reply)), BulkReplyKind::Subscribe) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::UnsubscribeBulk(reply)), BulkReplyKind::Unsubscribe) => {
Ok(reply.results)
}
_ => Err(Error::MalformedReply {
detail: "bulk command reply did not carry the expected bulk result payload".to_owned(),
}),
}
}
+17 -16
View File
@@ -25,15 +25,13 @@ use crate::generated::mxaccess_gateway::v1::{
#[derive(Clone, Debug, PartialEq)]
pub struct MxValue {
raw: ProtoMxValue,
projection: MxValueProjection,
}
impl MxValue {
/// Wrap a protobuf [`ProtoMxValue`] and compute its
/// [`MxValueProjection`].
/// Wrap a protobuf [`ProtoMxValue`]. The typed [`MxValueProjection`] is
/// computed on demand by [`MxValue::projection`].
pub fn from_proto(raw: ProtoMxValue) -> Self {
let projection = MxValueProjection::from_proto(&raw);
Self { raw, projection }
Self { raw }
}
/// Build a boolean `MxValue` (`MxDataType::Boolean`, `VT_BOOL`).
@@ -102,9 +100,13 @@ impl MxValue {
&self.raw
}
/// Borrow the typed projection.
pub fn projection(&self) -> &MxValueProjection {
&self.projection
/// Compute the typed projection of this value.
///
/// The projection is derived from the raw message on each call rather than
/// cached, so a value built only to be sent over the wire never pays the
/// projection's allocation cost.
pub fn projection(&self) -> MxValueProjection {
MxValueProjection::from_proto(&self.raw)
}
/// Consume the wrapper and return the underlying protobuf message.
@@ -183,15 +185,13 @@ impl MxValueProjection {
#[derive(Clone, Debug, PartialEq)]
pub struct MxArrayValue {
raw: MxArray,
projection: MxArrayProjection,
}
impl MxArrayValue {
/// Wrap a protobuf [`MxArray`] and compute its
/// [`MxArrayProjection`].
/// Wrap a protobuf [`MxArray`]. The typed [`MxArrayProjection`] is
/// computed on demand by [`MxArrayValue::projection`].
pub fn from_proto(raw: MxArray) -> Self {
let projection = MxArrayProjection::from_proto(&raw);
Self { raw, projection }
Self { raw }
}
/// Build a one-dimensional string array (`VT_ARRAY|VT_BSTR`).
@@ -210,9 +210,10 @@ impl MxArrayValue {
&self.raw
}
/// Borrow the typed projection of the array's elements.
pub fn projection(&self) -> &MxArrayProjection {
&self.projection
/// Compute the typed projection of the array's elements, derived from the
/// raw message on each call rather than cached.
pub fn projection(&self) -> MxArrayProjection {
MxArrayProjection::from_proto(&self.raw)
}
}
+3 -2
View File
@@ -3,8 +3,9 @@
//! The protocol versions track the values the gateway and worker negotiate on
//! `OpenSession` and let test harnesses cross-check the wire contract.
/// Semantic version of this Rust client crate. Mirrors `Cargo.toml`.
pub const CLIENT_VERSION: &str = "0.1.0-dev";
/// Semantic version of this Rust client crate, taken from `Cargo.toml` at
/// compile time so the two cannot drift.
pub const CLIENT_VERSION: &str = env!("CARGO_PKG_VERSION");
/// Public gateway gRPC protocol version this client targets.
pub const GATEWAY_PROTOCOL_VERSION: u32 = 3;
+73 -2
View File
@@ -203,7 +203,7 @@ fn value_conversion_fixtures_keep_typed_projection_and_raw_metadata() {
});
assert_eq!(
int64_value.projection(),
&MxValueProjection::Int64(9_223_372_036_854_770_000)
MxValueProjection::Int64(9_223_372_036_854_770_000)
);
let raw_case = case_by_id(cases, "raw-fallback.variant");
@@ -220,7 +220,7 @@ fn value_conversion_fixtures_keep_typed_projection_and_raw_metadata() {
});
assert_eq!(
raw_value.projection(),
&MxValueProjection::Raw(vec![1, 2, 3, 4, 5])
MxValueProjection::Raw(vec![1, 2, 3, 4, 5])
);
assert_eq!(raw_value.raw().raw_data_type, 32767);
assert!(raw_value.raw().raw_diagnostic.contains("No lossless"));
@@ -272,11 +272,76 @@ fn command_error_display_keeps_raw_reply_accessible() {
assert!(error.to_string().contains("MxaccessFailure"));
}
#[tokio::test]
async fn add_item_bulk_rejects_input_above_the_thousand_item_cap() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let oversized: Vec<String> = (0..1001).map(|index| format!("Tag{index}")).collect();
let error = session.add_item_bulk(12, oversized).await.unwrap_err();
assert!(
matches!(&error, Error::InvalidArgument { name, .. } if name.as_str() == "tag_addresses"),
"expected InvalidArgument for tag_addresses, got {error:?}"
);
}
#[tokio::test]
async fn event_stream_surfaces_a_mid_stream_status_fault() {
let state = Arc::new(FakeState::default());
state.emit_stream_fault.store(true, Ordering::SeqCst);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let mut stream = client
.stream_events(StreamEventsRequest {
session_id: "session-fixture".to_owned(),
after_worker_sequence: 0,
})
.await
.unwrap();
assert_eq!(stream.next().await.unwrap().unwrap().worker_sequence, 1);
assert_eq!(stream.next().await.unwrap().unwrap().worker_sequence, 2);
let fault = stream.next().await.unwrap().unwrap_err();
assert!(
matches!(fault, Error::Unavailable { .. }),
"expected Error::Unavailable, got {fault:?}"
);
}
#[tokio::test]
async fn connect_with_unreadable_ca_file_reports_invalid_endpoint() {
let options = ClientOptions::new("https://127.0.0.1:65000")
.with_plaintext(false)
.with_ca_file("definitely-not-a-real-ca-file.pem");
// GatewayClient is not Debug, so unwrap_err is unavailable here.
let error = match GatewayClient::connect(options).await {
Ok(_) => panic!("connect should fail when the CA file cannot be read"),
Err(error) => error,
};
assert!(
matches!(error, Error::InvalidEndpoint { .. }),
"expected Error::InvalidEndpoint, got {error:?}"
);
}
#[derive(Default)]
struct FakeState {
authorization: Mutex<Option<String>>,
last_command_kind: Mutex<Option<i32>>,
stream_dropped: Arc<AtomicBool>,
emit_stream_fault: AtomicBool,
}
#[derive(Clone)]
@@ -376,6 +441,12 @@ impl MxAccessGateway for FakeGateway {
let (sender, receiver) = mpsc::channel(4);
sender.send(Ok(event(1))).await.unwrap();
sender.send(Ok(event(2))).await.unwrap();
if self.state.emit_stream_fault.load(Ordering::SeqCst) {
sender
.send(Err(Status::unavailable("worker dropped the session")))
.await
.unwrap();
}
Ok(Response::new(DropAwareStream {
inner: ReceiverStream::new(receiver),
+147
View File
@@ -0,0 +1,147 @@
# Code Review — Client.Dotnet
| Field | Value |
|---|---|
| Module | `clients/dotnet` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 5 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Minor: handle-selector fallback `?? reply.ReturnValue.Int32Value` can mask a missing typed reply (Client.Dotnet-005); CLI redactor misses env-var keys (Client.Dotnet-008). |
| 2 | mxaccessgw conventions | Good — consumes the shared contracts project, no forked proto, `authorization: Bearer` metadata correct, parity preserved via split `EnsureProtocolSuccess`/`EnsureMxAccessSuccess`. |
| 3 | Concurrency & thread safety | Issue found: `_disposed` flags unsynchronized; `MxGatewaySession.DisposeAsync` can race a concurrent `CloseAsync` (Client.Dotnet-003). |
| 4 | Error handling & resilience | Issues found: gRPC-to-native mapping collapses non-auth statuses into one untyped exception (Client.Dotnet-001); shared retry/timeout budget (Client.Dotnet-004). |
| 5 | Security | Good — API key never logged by the library, CLI redacts keys, TLS custom-root validation correct. |
| 6 | Performance & resource management | No issues found — channels and streaming calls disposed correctly. |
| 7 | Design-document adherence | No issues found — matches `ClientLibrariesDesign.md`. |
| 8 | Code organization & conventions | Issue found: undocumented public members (Client.Dotnet-006). |
| 9 | Testing coverage | Issue found: the production retry path is never exercised (Client.Dotnet-002). |
| 10 | Documentation & comments | Issue found: doc misstates the unary timeout retry budget as per-call (Client.Dotnet-004, Client.Dotnet-007). |
## Findings
### Client.Dotnet-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` |
| Status | Resolved |
**Description:** `MapRpcException` only produces typed exceptions for `Unauthenticated` and `PermissionDenied`. Every other gRPC status — `NotFound`, `InvalidArgument`, `ResourceExhausted`, `FailedPrecondition`, `Unavailable`, `Internal` — collapses into the base `MxGatewayException` with no surfaced `StatusCode`. Callers cannot programmatically distinguish a transient outage from a permanent bad-argument error without reflecting into `InnerException` and downcasting to `RpcException`.
**Recommendation:** Carry the gRPC `StatusCode` on `MxGatewayException` (e.g. a `StatusCode` property) and/or add typed subclasses for at least `NotFound`, `InvalidArgument`, and `Unavailable`. Populate it from `exception.StatusCode` in `MapRpcException`.
**Resolution:** (2026-05-18) Confirmed against source: both transports had a duplicated private `MapRpcException` that only typed two statuses and discarded the gRPC code for the rest. Added a nullable `StatusCode` property (`Grpc.Core.StatusCode?`) to `MxGatewayException` plus constructors that carry it, threaded it through `MxGatewayAuthenticationException`/`MxGatewayAuthorizationException`, and extracted the two duplicated mappers into a single shared internal `RpcExceptionMapper` (`RpcExceptionMapper.cs`) that populates `StatusCode` from `exception.StatusCode` for every status. Callers can now distinguish transient from permanent failures without downcasting `InnerException`. Documented in `clients/dotnet/README.md`. Regression test: `RpcExceptionMapperTests` (8 cases incl. the `[Theory]` over `NotFound`/`InvalidArgument`/`ResourceExhausted`/`FailedPrecondition`/`Unavailable`/`Internal`).
### Client.Dotnet-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` |
| Status | Resolved |
**Description:** The retry predicate `MxGatewayClientRetryPolicy.IsTransientGrpcFailure` handles two shapes: a raw `RpcException` and an `MxGatewayException { InnerException: RpcException }`. In production the transport always maps `RpcException``MxGatewayException` before it reaches the retry pipeline, so only the wrapped-`MxGatewayException` branch ever runs in production. But `FakeGatewayTransport` throws the raw `RpcException` and never maps it, so every retry test exercises only the raw-`RpcException` branch — the branch that never occurs in production. The production retry behaviour is effectively untested.
**Recommendation:** Add a fake/transport mode that maps `RpcException` to `MxGatewayException` the way `GrpcMxGatewayClientTransport` does (or add tests that enqueue a pre-wrapped `MxGatewayException`), so the actually-used predicate branch is covered.
**Resolution:** (2026-05-18) Confirmed against source: `FakeGatewayTransport` threw queued exceptions verbatim, so the existing retry tests only ever hit the raw-`RpcException` predicate branch. Added a `MapTransportExceptions` flag to `FakeGatewayTransport` that, when set, runs thrown `RpcException`s through the same shared `RpcExceptionMapper` the production gRPC transport uses, producing the wrapped `MxGatewayException` shape. Added regression test `MxGatewayClientSessionTests.InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException`, which exercises the previously-untested production predicate branch. Verified red: removing the `MxGatewayException { InnerException: RpcException }` case from `IsTransientGrpcFailure` fails the new test while the pre-existing raw-`RpcException` test still passes.
### Client.Dotnet-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Status | Resolved |
**Description:** `DisposeAsync` calls `CloseAsync()` (no token) then unconditionally `_closeLock.Dispose()`. If another thread is concurrently awaiting `CloseAsync(token)` — legal, since the type exposes public async methods and no single-threaded contract — disposing the `SemaphoreSlim` while a `WaitAsync` is pending throws `ObjectDisposedException` into that caller. The `_disposed` flags in both clients are also plain unsynchronised `bool` reads/writes; `ThrowIfDisposed` racing `DisposeAsync` can observe a stale value.
**Recommendation:** Either document `MxGatewaySession`/`MxGatewayClient` as not thread-safe for concurrent dispose, or guard `_disposed` with `Interlocked`/`volatile` and avoid disposing `_closeLock` until all in-flight `CloseAsync` calls complete.
**Resolution:** (2026-05-18) Confirmed against source: `MxGatewaySession.DisposeAsync` disposed `_closeLock` unconditionally, racing concurrent `CloseAsync` callers; `MxGatewayClient._disposed` was a plain `bool`. Fixed `MxGatewaySession` by tracking in-flight `CloseAsync` callers with an `_activeCloseCount` guarded by a dedicated `_disposeGate` lock and a `_closeLockDisposed` flag: `CloseAsync` registers under the gate (and throws `ObjectDisposedException` if disposal already won) before awaiting `_closeLock.WaitAsync`, and `DisposeAsync` drains `_activeCloseCount` to zero before disposing the semaphore, so the close lock provably outlives every pending `WaitAsync`. Fixed `MxGatewayClient` by changing `_disposed` to an `int` accessed via `Interlocked.Exchange`/`Volatile.Read`. Regression test `MxGatewayClientSessionTests.DisposeAsync_DoesNotRaceConcurrentCloseAsync` runs 100 iterations with one close holding the lock and one parked behind it while `DisposeAsync` runs concurrently; verified red against the original `DisposeAsync` (fails with `ObjectDisposedException`), green after the fix.
### Client.Dotnet-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294`, `clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403` |
| Status | Open |
**Description:** `ExecuteSafeUnaryAsync` wraps the whole Polly retry pipeline in a single linked CTS cancelled after `Options.DefaultCallTimeout`, while `CreateCallOptions` also stamps each individual call with a `DefaultCallTimeout` gRPC deadline. The retry pipeline therefore shares one `DefaultCallTimeout` budget across the initial attempt plus all retries plus backoff delays. The README/XML docs describe `DefaultCallTimeout` as a per-call timeout, which misrepresents this. `DeadlineExceeded` is also classified as transient, so an attempt that exhausts the shared budget is retried only to immediately fail again.
**Recommendation:** Decide whether `DefaultCallTimeout` is per-attempt or per-operation and make code and docs consistent — e.g. a separate per-attempt deadline and a distinct overall-operation timeout. Reconsider retrying on `DeadlineExceeded` when the deadline was client-imposed.
**Resolution:** _(open)_
### Client.Dotnet-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175` |
| Status | Open |
**Description:** `RegisterAsync`/`AddItemAsync`/`AddItem2Async` return `reply.<Typed>?.ServerHandle ?? reply.ReturnValue.Int32Value`. After `EnsureMxAccessSuccess()` passes, a missing typed payload silently falls back to `ReturnValue.Int32Value`, which for a reply carrying no return value is `0`. A caller then uses `0` as a `ServerHandle`/`ItemHandle`, producing a confusing downstream invalid-handle failure rather than a clear "gateway reply missing payload" error.
**Recommendation:** If the typed sub-message is the contract for these commands, treat its absence on an otherwise-successful reply as an error (throw a descriptive `MxGatewayException`) rather than falling through to `ReturnValue.Int32Value`.
**Resolution:** _(open)_
### Client.Dotnet-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50`, `clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14` |
| Status | Open |
**Description:** `MxGatewayClientOptions.MaxGrpcMessageBytes` and the two `const`s in `MxGatewayClientContractInfo` are public members with no XML doc comments, inconsistent with every other public member in the assembly and with the repo's documented C# style emphasis on a documented public surface.
**Recommendation:** Add `<summary>` doc comments to `MaxGrpcMessageBytes`, `GatewayProtocolVersion`, and `WorkerProtocolVersion`.
**Resolution:** _(open)_
### Client.Dotnet-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192` |
| Status | Open |
**Description:** The `AcknowledgeAlarmAsync` XML comment states the gateway authenticates against an `invoke:alarm-ack` scope, but `CLAUDE.md` documents the scope set without any `invoke:alarm-ack` sub-scope. The comment may describe an intended finer-grained scope that does not exist, misleading integrators about what API key they need.
**Recommendation:** Reconcile the comment with the actual server-side scope check, or update the scope documentation if sub-scopes were genuinely added; keep client doc and gateway auth model in sync.
**Resolution:** _(open)_
### Client.Dotnet-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17` |
| Status | Open |
**Description:** The CLI redactor only removes the API key string when it was supplied via `--api-key`; `RunCoreAsync` passes `arguments.GetOptional("api-key")` to `Redact`. When the key comes from an environment variable (`--api-key-env`, the documented default path), `apiKey` is `null` and no redaction occurs. If a gRPC/transport error message ever echoes the bearer token, it would be printed unredacted.
**Recommendation:** Resolve the effective API key (same logic as `ResolveApiKey`) before redacting, so the env-var-sourced key is also stripped from error output.
**Resolution:** _(open)_
+177
View File
@@ -0,0 +1,177 @@
# Code Review — Client.Go
| Field | Value |
|---|---|
| Module | `clients/go` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 7 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: a typed-nil `Unwrap`/`errors.As` trap (Client.Go-001), a CLI `panic` on malformed input (Client.Go-003), empty-string correlation id on rand failure (Client.Go-007). |
| 2 | mxaccessgw conventions | Generally good; two test files fail `gofmt`, breaking the documented workflow (Client.Go-004). |
| 3 | Concurrency & thread safety | No issues found — stream goroutines and cancellation are sound. |
| 4 | Error handling & resilience | Issues found: the compatibility event path silently drops events (Client.Go-002); no transient/permanent classification (Client.Go-006). |
| 5 | Security | No issues found — TLS by default with a TLS 1.2 floor, API key redaction, no secret logging. |
| 6 | Performance & resource management | No issues found — connections/streams closed via deferred `Close`/`cancel`. |
| 7 | Design-document adherence | Issues found: deprecated `grpc.DialContext`+`WithBlock` usage and a missing error taxonomy (Client.Go-005, Client.Go-006). |
| 8 | Code organization & conventions | Issue found: duplication between `Client` and `GalaxyClient` (Client.Go-009). |
| 9 | Testing coverage | Issue found: TLS path, `callContext` deadline logic, and `NativeValue`/`NativeArray` edges untested (Client.Go-008). |
| 10 | Documentation & comments | Issue found: a stale `WithBlock` dial-cancellation claim (Client.Go-010). |
## Findings
### Client.Go-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` |
| Status | Resolved |
**Description:** `MxAccessError.Unwrap` returns `e.Command` directly. `EnsureMxAccessSuccess` constructs `&MxAccessError{Reply: reply}` with `Command` left nil (the HRESULT / failing-`MxStatusProxy` path). When `Command` is a nil `*CommandError`, `Unwrap()` returns a non-nil `error` interface wrapping a nil pointer. Consequently `errors.As(err, &ce)` for `*CommandError` returns `true` while setting `ce` to nil — a caller writing the idiomatic `if errors.As(err, &commandErr) { use commandErr.Status }` nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-`Command` path.
**Recommendation:** Make `Unwrap` return an untyped nil when `Command` is nil: `if e == nil || e.Command == nil { return nil }; return e.Command`. Add a test for the HRESULT-only `MxAccessError` asserting `errors.As(err, &ce)` is `false`.
**Resolution:** Resolved 2026-05-18: `MxAccessError.Unwrap` now returns an untyped nil when `Command` is nil, so `errors.As` no longer binds a typed-nil `*CommandError`; added `errors_test.go` regression coverage for the HRESULT-only and populated-`Command` paths.
### Client.Go-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/go/mxgateway/session.go:440-516` |
| Status | Resolved |
**Description:** For the `Events`/`EventsAfter` compatibility API (`cancelWhenResultBufferFull == true`), when the 16-slot `results` channel is full `sendEventResult` cancels and returns `false`; the goroutine returns and `close(results)` runs — the consumer sees the channel close with **no `EventResult{Err: ...}` ever delivered**. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.
**Recommendation:** Before cancelling on a full buffer, deliver a terminal `EventResult` carrying an explicit error (e.g. `ErrEventBufferOverflow`). Document the behaviour on `Session.Events`; steer callers to `SubscribeEvents` (which blocks instead of dropping).
**Resolution:** Resolved 2026-05-18: confirmed against source — on a full bounded buffer the compatibility path cancelled and closed `results` with no terminal result. Added the exported sentinel `ErrEventBufferOverflow` (`errors.go`); `sendEventResult` now, on a full buffer, cancels the stream then calls the new `deliverTerminalResult` helper, which evicts one of the oldest buffered events to make room and places `EventResult{Err: ErrEventBufferOverflow}` so it becomes the consumer's last item before the channel closes. The previously lossy regression test (`TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned`) was re-pointed to assert the terminal `ErrEventBufferOverflow` result is delivered. `clients/go/README.md` now documents the bounded-buffer/overflow behaviour and steers no-loss callers to `SubscribeEvents`.
### Client.Go-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Status | Resolved |
**Description:** `parseInt32List` calls `panic(err)` when an `item-handles` token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like `-item-handles 1,foo` crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in `main.go`.
**Recommendation:** Change `parseInt32List` to return `([]int32, error)` and have `runUnsubscribeBulk` propagate the error, matching `parseValue`'s pattern.
**Resolution:** Resolved 2026-05-18: confirmed against source — `parseInt32List` called `panic(err)` on a malformed token. It now returns `([]int32, error)`, wrapping the bad token (`invalid item handle %q: %w`); `runUnsubscribeBulk` parses item handles before dialing and returns the error, so a typo flows through `runWithIO` to `os.Exit(2)` like other validation paths. Regression tests `TestParseInt32ListParsesValidTokens` and `TestParseInt32ListReturnsErrorOnMalformedToken` added to `cmd/mxgw-go/main_test.go`.
### Client.Go-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` |
| Status | Open |
**Description:** `gofmt -l` flags `alarms_test.go` and `galaxy_test.go` for misaligned struct-literal field padding. The Go client README lists `gofmt` as part of the workflow and the repo enforces style; unformatted committed code breaks `gofmt`-gated checks and CI.
**Recommendation:** Run `gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go`.
**Resolution:** _(open)_
### Client.Go-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` |
| Status | Open |
**Description:** The client uses `grpc.DialContext` with `grpc.WithBlock()`. In current grpc-go both are deprecated in favour of `grpc.NewClient` (lazy connection). `WithBlock` also changes failure semantics: a transient gateway-unavailable at dial time becomes a hard `Dial` error rather than a connection that recovers when the gateway comes up, working against the design doc's resilience intent.
**Recommendation:** Migrate to `grpc.NewClient`; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by `DialTimeout`, and update the doc comment.
**Resolution:** _(open)_
### Client.Go-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/go/mxgateway/errors.go:9-130` |
| Status | Open |
**Description:** `docs/ClientLibrariesDesign.md` recommends a high-level error taxonomy (`TransportError`, `AuthenticationError`, `TimeoutError`, etc.). The Go client collapses all transport/gRPC failures into a single `GatewayError` with no way to classify transient (`Unavailable`, `DeadlineExceeded`) vs permanent (`Unauthenticated`, `InvalidArgument`) without manually unwrapping and calling `status.Code`.
**Recommendation:** Add a helper (e.g. `IsTransient(err) bool`) or expose the gRPC `codes.Code` on `GatewayError`, so retry/timeout/auth handling can be written without re-parsing the wrapped error.
**Resolution:** _(open)_
### Client.Go-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/go/mxgateway/session.go:526-532` |
| Status | Open |
**Description:** `newCorrelationID` returns an empty string when `crypto/rand.Read` fails, silently producing an `MxCommandRequest` with no correlation id. `rand.Read` failure is rare, but the failure mode (untraceable command, no error surfaced) is worse than failing loud, and the empty-id path is untested.
**Recommendation:** Either propagate the error up through `invokeCommand`, or fall back to a time/counter-based id rather than an empty string.
**Resolution:** _(open)_
### Client.Go-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/go/mxgateway/` (test files) |
| Status | Open |
**Description:** Several critical paths are untested: TLS credential resolution in `resolveTransportCredentials` (only the `Plaintext` path is exercised); the `callContext` deadline-shortening logic (`client.go:198-204`) including the negative-timeout disable case; and `NativeValue`/`NativeArray` for the array, raw-bytes, null, and unsupported-kind branches.
**Recommendation:** Add unit tests for `resolveTransportCredentials` precedence, `callContext` deadline arithmetic, and `NativeValue`/`NativeArray` round-trips for every kind.
**Resolution:** _(open)_
### Client.Go-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205` |
| Status | Open |
**Description:** `DialGalaxy`/`Dial` and `GalaxyClient.callContext`/`Client.callContext` are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift.
**Recommendation:** Extract a shared unexported `dial(ctx, opts)` and a free `callContext(opts, ctx)` function, and have both client constructors call them.
**Resolution:** _(open)_
### Client.Go-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/go/mxgateway/client.go:39-40` |
| Status | Open |
**Description:** The `Dial` doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated `WithBlock` behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work.
**Recommendation:** Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that `DialTimeout` bounds the initial connect attempt.
**Resolution:** _(open)_
+207
View File
@@ -0,0 +1,207 @@
# Code Review — Client.Java
| Field | Value |
|---|---|
| Module | `clients/java` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 7 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: `register`/`addItem` silently fall back to `getReturnValue()` masking missing payloads (Client.Java-004); fragile `resolved()` mutation pattern (Client.Java-012). |
| 2 | mxaccessgw conventions | Largely adheres; the gateway protocol-version handshake is never verified despite the contract field existing (Client.Java-003). |
| 3 | Concurrency & thread safety | Issue found: `MxEventStream.next` is a plain field and terminal-state transitions race (Client.Java-002). |
| 4 | Error handling & resilience | Issues found: `close()` can mask the primary exception (Client.Java-005); async/sync error surfaces inconsistent (Client.Java-008). |
| 5 | Security | Issue found: API-key redaction leaks the trailing 4 secret characters (Client.Java-001). |
| 6 | Performance & resource management | Issues found: `close()` does not await termination (Client.Java-006); no stream flow control (Client.Java-011). |
| 7 | Design-document adherence | Matches `JavaClientDesign.md` closely; the protocol-version check is undocumented-missing (Client.Java-003). |
| 8 | Code organization & conventions | Issue found: ~80 duplicated lines across the two clients (Client.Java-009). |
| 9 | Testing coverage | Issue found: alarm RPCs, TLS setup, async streams, and queue overflow untested (Client.Java-007). |
| 10 | Documentation & comments | Issue found: README/Javadoc assert undocumented scope names (Client.Java-010). |
## Findings
### Client.Java-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Status | Resolved |
**Description:** `redactApiKey` preserves the leading and trailing four characters of the key. A gateway API key has the form `mxgw_<key-id>_<secret>`; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (`CommonOptions.redactedJsonMap`), and `MxGatewayClientOptions.toString()`. CLAUDE.md states API keys must never reach logs.
**Recommendation:** Redact the secret entirely. Show only a stable non-secret prefix (e.g. the `mxgw_<key-id>_` portion) and mask everything after it, or emit a fixed `mxgw_***` form. Do not echo any trailing characters of the secret.
**Resolution:** (2026-05-18) Confirmed against source: the old `substring(0,4) + stars + substring(len-4)` echoed the last four secret characters. `redactApiKey` now masks the secret entirely: for gateway-shaped keys it returns the non-secret `mxgw_<key-id>_` prefix followed by `***` (locating the secret separator as the first `_` after `mxgw_`); any non-gateway-shaped token returns `<redacted>`. No leading/trailing secret characters are ever emitted. The pre-existing `MxGatewayCliTests.openSessionJsonRedactsApiKey` assertion that hardcoded the leaky `mxgw***********cret` form was corrected to assert the masked `mxgw_visible_***` form. Regression tests: `MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret`, `redactApiKeyForNonGatewayShapedKeyRevealsNothing`, `redactApiKeyStillHandlesNullAndShortInput`.
### Client.Java-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Status | Resolved |
**Description:** The `next` field is a plain (non-volatile) instance field, and `MxEventStream` exposes no thread-confinement guarantee. More concretely, a queue-overflow `offer()` and a `close()` `offer(END)` can interleave so the overflow exception is enqueued after `END` and never observed — the contract that "next() throws after overflow" is not guaranteed once `close()` has been called.
**Recommendation:** Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.
**Resolution:** (2026-05-18) Confirmed against source: the old `offer()` END-branch did `queue.clear(); queue.offer(END)` when full, so a `close()` arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single `terminate(MxGatewayException)` method guarded by a `terminated` flag and a `terminalLock`; the first terminal condition wins and a later `close()`/`END` cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while `close()` is safe from any thread. Regression tests: `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose` (deterministic) and `eventStreamConcurrentOverflowAndCloseAlwaysTerminate` (300-iteration race stress).
### Client.Java-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
| Status | Resolved |
**Description:** `OpenSessionReply` carries `gateway_protocol_version` (proto field 8), and `MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION` exists so the client can reject incompatible generated-code inputs. The client never reads `reply.getGatewayProtocolVersion()` nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.
**Recommendation:** In `openSession`/`openSessionRaw`, compare `reply.getGatewayProtocolVersion()` with `MxGatewayClientVersion.gatewayProtocolVersion()` and throw a typed `MxGatewayException` on mismatch.
**Resolution:** (2026-05-18) Confirmed against source: neither `openSessionRaw` nor `openSessionAsync` read `getGatewayProtocolVersion()`. Added a private `ensureGatewayProtocolCompatible` helper, called from both `openSessionRaw` and `openSessionAsync`, that throws `MxGatewayException` with a clear mismatch message when the gateway reports a non-zero version differing from `MxGatewayClientVersion.gatewayProtocolVersion()`. A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. `clients/java/README.md` documents the new fail-fast check. Regression tests: `MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion` and `openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion`.
### Client.Java-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
| Status | Resolved |
**Description:** `register`, `addItem`, and `addItem2` check `reply.hasRegister()`/`hasAddItem()` and otherwise fall back to `reply.getReturnValue().getInt32Value()`. If the gateway returns a reply with neither the typed payload nor a `return_value` set, the method silently returns `0` — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.
**Recommendation:** If the expected typed payload is absent and no `return_value` is present, throw `MxGatewayException` (protocol violation) instead of returning `0`.
**Resolution:** (2026-05-18) Confirmed against source: all three methods returned `reply.getReturnValue().getInt32Value()` (which yields `0` for an unset message field) when the typed payload was absent. Each method now guards the fallback with `reply.hasReturnValue()` and throws `MxGatewayException` describing the protocol violation when neither the typed payload nor a `return_value` is present. The legitimate `return_value` fallback is preserved. Regression tests: `MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, `addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, and `addItemStillHonoursReturnValueFallback`.
### Client.Java-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
| Status | Resolved |
**Description:** `close()` delegates to `closeRaw()`, which performs a network RPC. When `MxGatewaySession` is used in try-with-resources and the body throws, a failure inside `closeSession` (e.g. `WORKER_UNAVAILABLE`) throws from `close()` and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing `close()`.
**Recommendation:** Either make `close()` swallow/log close-time failures (keeping `closeRaw()` for callers who want the result), or document clearly that `close()` performs a network call that can throw.
**Resolution:** (2026-05-18) Confirmed against source: `close()` called `closeRaw()` directly, so a `CloseSession` RPC failure propagated out of try-with-resources and replaced the body exception. `close()` now catches `MxGatewayException` from `closeRaw()` and logs it at WARNING via `System.Logger` instead of rethrowing, so a close-time failure never masks the body exception. `closeRaw()` is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use `closeRaw()` for explicit close handling are documented in `clients/java/README.md` and the `close()` Javadoc. Regression tests: `MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException` and `closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt`.
### Client.Java-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` |
| Status | Open |
**Description:** `close()` (the `AutoCloseable` method invoked by try-with-resources) calls only `ownedChannel.shutdown()` and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. `closeAndAwaitTermination()` does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.
**Recommendation:** Have `close()` await termination for a bounded time and `shutdownNow()` on timeout (the logic already in `closeAndAwaitTermination()`), or document that try-with-resources callers should call `closeAndAwaitTermination()`.
**Resolution:** _(open)_
### Client.Java-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` |
| Status | Open |
**Description:** The alarm surface — `acknowledgeAlarm`/`acknowledgeAlarmAsync`/`queryActiveAlarms` and `MxGatewayActiveAlarmsSubscription` — has zero test coverage. TLS channel construction, the async `streamEventsAsync` path, `MxGatewayEventSubscription` pre-start cancellation, and `MxEventStream` queue overflow are likewise untested. `JavaClientDesign.md` explicitly lists async stream-observer cancellation and status/error mapping as required tests.
**Recommendation:** Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.
**Resolution:** _(open)_
### Client.Java-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` |
| Status | Open |
**Description:** `acknowledgeAlarmAsync` and `openSessionAsync` apply `ensureProtocolSuccess` inside `thenApply`. If that validator throws a non-`MxGatewayException` `RuntimeException` it is wrapped by `CompletionException` with no `fromGrpc` normalisation, unlike the synchronous paths which normalise via `try/catch`. The async and sync error surfaces are therefore inconsistent.
**Recommendation:** Wrap the `thenApply` body so any non-`MxGatewayException` is routed through `MxGatewayErrors.fromGrpc`, matching the synchronous methods.
**Resolution:** _(open)_
### Client.Java-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413` |
| Status | Open |
**Description:** `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable` are duplicated nearly verbatim across `MxGatewayClient` and `GalaxyRepositoryClient` (~80 lines). A fix to one will not propagate to the other.
**Recommendation:** Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.
**Resolution:** _(open)_
### Client.Java-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272`, `clients/java/README.md:76` |
| Status | Open |
**Description:** The `acknowledgeAlarm` Javadoc states the gateway authenticates against an `invoke:alarm-ack` scope, and the README states the Galaxy Repository requires a `metadata:read` scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.
**Recommendation:** Reconcile the scope names with `src/MxGateway.Server/Security/` and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.
**Resolution:** _(open)_
### Client.Java-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63` |
| Status | Open |
**Description:** The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.
**Recommendation:** Confirm fail-fast is intended (it appears to be); if so, document it on `MxEventStream` so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.
**Resolution:** _(open)_
### Client.Java-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674` |
| Status | Open |
**Description:** `CommonOptions.resolved()` mutates `this` (`resolvedApiKey`, `resolvedTimeout`) and returns `this`, but `toClientOptions()` and `redactedJsonMap()` read those mutated fields. If `redactedJsonMap()` is ever called before `resolved()`, it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.
**Recommendation:** Make `resolved()` return an immutable resolved value object, or compute `resolvedApiKey`/`resolvedTimeout` lazily in their getters so call ordering cannot produce stale output.
**Resolution:** _(open)_
+207
View File
@@ -0,0 +1,207 @@
# Code Review — Client.Python
| Field | Value |
|---|---|
| Module | `clients/python` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 9 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: dead `closed` variable (Client.Python-004); float/bytes value-mapping assumptions (Client.Python-008). |
| 2 | mxaccessgw conventions | Largely adheres; one missing export and a `*_raw` MXAccess-failure documentation gap (Client.Python-002, Client.Python-012). |
| 3 | Concurrency & thread safety | Issue found: `close()` idempotency claim does not hold under concurrent close (Client.Python-006). |
| 4 | Error handling & resilience | Issues found: inconsistent timeout-kwarg fallback (Client.Python-003); `success == 0` default-value hazard (Client.Python-011); inconsistent cancel helpers (Client.Python-007). |
| 5 | Security | No issues found — API keys redacted in repr and CLI output, TLS supported, no secret logging. |
| 6 | Performance & resource management | Issue found: `discover_hierarchy` buffers the whole hierarchy in memory (Client.Python-005). |
| 7 | Design-document adherence | Matches the design docs closely; minor CLI doc drift (Client.Python-001). |
| 8 | Code organization & conventions | Issues found: `MxGatewayCommandError` omitted from `__all__` (Client.Python-002); fragile circular-import workaround (Client.Python-010). |
| 9 | Testing coverage | Issue found: `write2`, `add_item2`, bulk-size limits, TLS `ca_file`, and CLI command bodies untested (Client.Python-009). |
| 10 | Documentation & comments | Issue found: stale "scaffold" package description (Client.Python-001). |
## Findings
### Client.Python-001
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/python/pyproject.toml:8,25`, `clients/python/src/mxgateway_cli/commands.py:25` |
| Status | Open |
**Description:** The package `description` in `pyproject.toml` still says "Async Python client *scaffold*" even though the client is fully implemented. Stale "scaffold" wording misrepresents maturity to anyone reading PyPI metadata. (The `mxgw-py` console-script name is itself consistent between `pyproject.toml` and the README.)
**Recommendation:** Update the `pyproject.toml` description to drop "scaffold"; keep README CLI examples in sync with the actual `mxgw-py` entry point.
**Resolution:** _(open)_
### Client.Python-002
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/python/src/mxgateway/__init__.py:27` |
| Status | Open |
**Description:** `MxGatewayCommandError` is imported into `__init__.py` and is a documented public exception, but it is missing from `__all__`. It is the parent of `MxAccessError` and a meaningful catch target, so omitting it from the public surface is inconsistent — `from mxgateway import *` will not expose it and tooling that respects `__all__` treats it as private.
**Recommendation:** Add `"MxGatewayCommandError"` to the `__all__` list.
**Resolution:** _(open)_
### Client.Python-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/client.py:125-137,155-173` |
| Status | Resolved |
**Description:** `stream_events_raw` and `query_active_alarms` call the stub directly with a `timeout` kwarg when `stream_timeout` is set, with no `TypeError` fallback. `galaxy.py:watch_deploy_events` and `_unary` *do* have a fallback that strips `timeout` if the callable rejects it. This asymmetry means a fake/older stub that does not accept `timeout` crashes for gateway streams but not Galaxy streams. It is only masked today because `stream_timeout` defaults to `None`.
**Recommendation:** Apply the same `try/except TypeError` timeout-fallback pattern to `stream_events_raw` and `query_active_alarms`, or remove the fallback everywhere and standardise on a single behaviour.
**Resolution:** 2026-05-18 — Confirmed: both stream methods in `client.py` called the stub with `timeout` unconditionally and had no `TypeError` fallback, unlike `_unary` and `galaxy.watch_deploy_events`. Added a shared `_open_stream` helper in `client.py` that opens a server-streaming call and strips the `timeout` kwarg when the stub raises `TypeError: ... unexpected keyword argument 'timeout'`, then routed both `stream_events_raw` and `query_active_alarms` through it. Regression tests in `tests/test_stream_timeout_fallback.py` (`test_stream_events_raw_falls_back_when_stub_rejects_timeout`, `test_query_active_alarms_falls_back_when_stub_rejects_timeout`, `test_stream_events_raw_still_passes_timeout_to_capable_stub`) failed before the fix and pass after. No public behaviour change for real gRPC stubs, so no README update needed.
### Client.Python-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/mxgateway_cli/commands.py:386,402-404` |
| Status | Open |
**Description:** In `_smoke`, the local variable `closed` is set to `False` and never reassigned; the `finally` block's `if not closed:` is therefore always true. This is dead/misleading code suggesting a removed early-close path.
**Recommendation:** Remove the `closed` variable and the `if not closed:` guard; call `await session.close()` directly in the `finally` block (or use `async with session:`).
**Resolution:** _(open)_
### Client.Python-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `clients/python/src/mxgateway/galaxy.py:117-140` |
| Status | Resolved |
**Description:** `discover_hierarchy` pages through the entire Galaxy object hierarchy and accumulates every `GalaxyObject` (each carrying its full attribute list) into a single in-memory `list` before returning. For a large Galaxy this is a very large allocation with no streaming alternative and no caller-side bound.
**Recommendation:** Offer an async-generator variant (e.g. `iter_hierarchy()`) that yields objects/pages as they arrive, keeping `discover_hierarchy()` as a convenience wrapper. At minimum document the memory characteristic.
**Resolution:** 2026-05-18 — Confirmed: `discover_hierarchy` buffered the entire hierarchy with no streaming alternative. Added `GalaxyRepositoryClient.iter_hierarchy`, an async generator that fetches one `DiscoverHierarchyRequest` page at a time and yields each `GalaxyObject` as it arrives, so peak memory is bounded by a single page (`_DISCOVER_HIERARCHY_PAGE_SIZE`). Pages are fetched lazily — the next page is only requested after the current page is fully consumed. `discover_hierarchy` is now a thin convenience wrapper (`[obj async for obj in self.iter_hierarchy()]`) that preserves its `list[GalaxyObject]` contract, including the repeated-page-token guard. Regression tests in `tests/test_galaxy_iter_hierarchy.py` (`test_iter_hierarchy_yields_objects_across_pages`, `test_iter_hierarchy_is_lazy_and_does_not_prefetch_next_page`, `test_iter_hierarchy_rejects_repeated_page_token`, `test_discover_hierarchy_still_returns_full_list`) failed before the fix and pass after. `clients/python/README.md` updated with the `iter_hierarchy` usage and memory guidance since this adds a new public method.
### Client.Python-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `clients/python/src/mxgateway/client.py:74-82`, `clients/python/src/mxgateway/galaxy.py:85-93`, `clients/python/src/mxgateway/session.py:38-55` |
| Status | Open |
**Description:** `close()` on the clients and `Session.close()` use a plain `self._closed` check-then-set with an `await` between, with no lock. If two coroutines call `close()` concurrently both can pass the guard before either sets it, causing a double `channel.close()` / double `CloseSession` RPC. Single-task usage is the documented contract, so impact is low, but the idempotency guarantee asserted in docstrings only holds for sequential calls.
**Recommendation:** Set `self._closed = True` before the `await`, or guard with an `asyncio.Lock`, so the idempotency claim holds under concurrent close.
**Resolution:** _(open)_
### Client.Python-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/client.py:204-213` |
| Status | Open |
**Description:** `_canceling_iterator` (gateway event stream) does not catch `asyncio.CancelledError` to invoke `call.cancel()` explicitly — it relies on the `finally` block. `galaxy.py:_canceling_iterator` *does* explicitly catch `CancelledError`, cancel, and re-raise. The two are functionally equivalent today, but the inconsistency between near-identical helpers invites future divergence.
**Recommendation:** Make the two `_canceling_iterator` helpers identical, ideally by factoring a single shared helper.
**Resolution:** _(open)_
### Client.Python-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/mxgateway/values.py:62-67,83-88` |
| Status | Open |
**Description:** `to_mx_value` maps any Python `float` to `VT_R8`/`MX_DATA_TYPE_DOUBLE` with no handling for `nan`/`inf`, which are serialised and forwarded to MXAccess which may reject or mis-handle them. `bytes` is mapped to `VT_RECORD`/`MX_DATA_TYPE_UNKNOWN`, a questionable default. The `data_type` keyword exists but `Session.write` never forwards it.
**Recommendation:** Document the float/bytes mapping assumptions, optionally validate finiteness, and consider plumbing the `data_type` keyword through `Session.write`/`write2`.
**Resolution:** _(open)_
### Client.Python-009
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/python/tests/` |
| Status | Resolved |
**Description:** Several non-trivial public paths are untested: `Session.write2`/`add_item2` request construction; the bulk-size limit `_ensure_bulk_size`/`MAX_BULK_ITEMS` guard; the `None`-argument `TypeError` guards in bulk methods; the TLS `ca_file` read path in `create_channel`; most CLI command bodies; and `map_rpc_error`'s default (non-auth) branch.
**Recommendation:** Add tests for `write2`/`add_item2` request shape, the bulk-size `ValueError`, the `ca_file` TLS branch, the generic `map_rpc_error` fallthrough, and at least one happy-path CLI command using a fake stub.
**Resolution:** 2026-05-18 — Confirmed coverage gap against the existing `tests/` files. Added `tests/test_coverage_gaps.py` covering every path the finding lists: `test_add_item2_sends_item_context_and_returns_handle` and `test_write2_sends_value_and_timestamp_value` (request shape + `MxValue` oneof), `test_subscribe_bulk_rejects_oversized_request` and `test_add_item_bulk_at_limit_is_allowed` (the `MAX_BULK_ITEMS` `_ensure_bulk_size` boundary), `test_advise_item_bulk_rejects_none_argument` (the `None`-argument `TypeError` guard), `test_create_channel_reads_ca_file` and `test_create_channel_missing_ca_file_raises` (the TLS `ca_file` read path), `test_map_rpc_error_generic_branch_returns_transport_error` and `test_map_rpc_error_handles_error_without_code` (the non-auth `map_rpc_error` fallthrough and the no-`code` path), and `test_cli_register_happy_path_emits_server_handle` (a happy-path CLI command body driven end to end through `CliRunner` with a fake stub via a monkeypatched `_connect`). All 10 new tests pass. No source change required — this is a pure coverage finding.
### Client.Python-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/python/src/mxgateway/session.py:404`, `clients/python/src/mxgateway_cli/commands.py:422-425` |
| Status | Open |
**Description:** `session.py` ends with a module-level late import `from .client import GatewayClient # noqa: E402` purely to satisfy a string type hint, and `commands.py:_session` does a function-local import. Both work around a circular dependency that `from __future__ import annotations` (already in effect) makes unnecessary. `_session` also lacks a return type annotation.
**Recommendation:** Drop the runtime late import in `session.py` and use a `TYPE_CHECKING`-guarded import for the hint; add the `-> Session` return annotation to `commands.py:_session`.
**Resolution:** _(open)_
### Client.Python-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/errors.py:122-148` |
| Status | Open |
**Description:** `ensure_mxaccess_success` raises `MxAccessError` if any `mx_status.success == 0`. This treats `success == 0` as the failure sentinel, but `0` is also the proto3 scalar default for an unset `MxStatusProxy`. If the gateway ever returns a reply with an unpopulated status entry (e.g. a partially-filled bulk result), the client raises `MxAccessError` even though no real failure occurred.
**Recommendation:** Confirm against the proto/gateway contract whether `success` is guaranteed populated for every `statuses` entry; if not, key the failure decision on an explicit failure field rather than the `success == 0` default.
**Resolution:** _(open)_
### Client.Python-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/python/src/mxgateway/client.py:84-108`, `clients/python/src/mxgateway/session.py:57-77` |
| Status | Open |
**Description:** `Session.invoke_raw` does not run `ensure_mxaccess_success` while `Session.invoke` does, so a caller using `invoke_raw` for parity tests gets a reply where an MXAccess HRESULT failure is silently embedded with no exception. This is by design but under-documented — the README's "preserve raw replies" sentence does not state that `*_raw` methods skip MXAccess-failure detection entirely.
**Recommendation:** Document explicitly (README + docstring) that `*_raw` methods surface MXAccess HRESULT/status failures only inside the reply and do not raise `MxAccessError`, so parity-test callers know to inspect `protocol_status`/`hresult`/`statuses` themselves.
**Resolution:** _(open)_
+207
View File
@@ -0,0 +1,207 @@
# Code Review — Client.Rust
| Field | Value |
|---|---|
| Module | `clients/rust` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: a stale unit test fails the suite (Client.Rust-003); handle extractors silently return 0 on a shapeless OK reply (Client.Rust-005). |
| 2 | mxaccessgw conventions | `cargo clippy --workspace --all-targets -- -D warnings` fails (Client.Rust-001, Client.Rust-002, Client.Rust-012), violating a CLAUDE.md hard requirement; hard-coded correlation ids (Client.Rust-011). |
| 3 | Concurrency & thread safety | No issues found — clients are cheaply cloneable, streams are `Send`, drop-cancels-call is verified. |
| 4 | Error handling & resilience | Issues found: empty-vec on shapeless bulk reply (Client.Rust-006); no transient/permanent classification (Client.Rust-010). |
| 5 | Security | No issues found — API keys redacted in `Debug`/`Display`, status messages scrubbed, TLS handled correctly. |
| 6 | Performance & resource management | Issue found: value/array projections clone every element, doubling array memory (Client.Rust-008). |
| 7 | Design-document adherence | Issue found: `RustClientDesign.md` documents a stale crate layout and an unused `tracing` dependency (Client.Rust-007). |
| 8 | Code organization & conventions | Issue found: `BulkReplyKind` trips a clippy lint; undocumented public methods (Client.Rust-001, Client.Rust-002). |
| 9 | Testing coverage | Issue found: TLS setup, mid-stream fault propagation, and the bulk-size cap untested (Client.Rust-009). |
| 10 | Documentation & comments | Issue found: the version-constant doc comment is wrong (Client.Rust-004). |
## Findings
### Client.Rust-001
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/options.rs:98,143` |
| Status | Resolved |
**Description:** `with_max_grpc_message_bytes` and `max_grpc_message_bytes` have no `///` doc comments. The crate sets `#![warn(missing_docs)]` and CLAUDE.md mandates that `cargo clippy --workspace --all-targets -- -D warnings` pass. Under `-D warnings` these become hard errors, so clippy fails to compile the crate — breaking the documented build/test workflow for the module.
**Recommendation:** Add doc comments to both methods, e.g. `/// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).`
**Resolution:** Resolved in `0d8a28d` (2026-05-18): doc comments added to both methods.
### Client.Rust-002
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:522` |
| Status | Resolved |
**Description:** The `BulkReplyKind` enum's variants (`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`, `SubscribeBulk`, `UnsubscribeBulk`) all share the `Bulk` suffix, tripping `clippy::enum_variant_names`. Under `-D warnings` this is a compile error, so `cargo clippy --workspace --all-targets -- -D warnings` fails — a violation of the CLAUDE.md requirement that clippy pass cleanly.
**Recommendation:** Rename the variants to drop the common suffix (e.g. `AddItem`, `AdviseItem`, …) or add a narrowly-scoped `#[allow(clippy::enum_variant_names)]` with a reason comment.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): variants renamed to `AddItem`/`AdviseItem`/`RemoveItem`/`UnAdviseItem`/`Subscribe`/`Unsubscribe`, which no longer share a common suffix.
### Client.Rust-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Status | Resolved |
**Description:** The unit test `version_json_output_has_protocol_versions` asserts `value["gatewayProtocolVersion"] == 2`, but `GATEWAY_PROTOCOL_VERSION` is `3` (version.rs:10), matching the authoritative server constant `GatewayContractInfo.GatewayProtocolVersion = 3`. The test fails, so `cargo test --workspace` (the documented test step) does not pass — the test was not updated when the protocol version was bumped.
**Recommendation:** Update the assertion to `3`, or better, assert against `GATEWAY_PROTOCOL_VERSION` so it cannot drift again.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the test now asserts against the `GATEWAY_PROTOCOL_VERSION` / `WORKER_PROTOCOL_VERSION` constants, so it cannot drift again.
### Client.Rust-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/src/version.rs:7` |
| Status | Resolved |
**Description:** `CLIENT_VERSION` is `"0.1.0-dev"` and its doc comment claims "Mirrors `Cargo.toml`", but `Cargo.toml` declares `version = "0.1.0"` (no `-dev` suffix). The comment is misleading and the value is not actually kept in sync with the manifest.
**Recommendation:** Either set `CLIENT_VERSION` from the build via `env!("CARGO_PKG_VERSION")`, or correct the constant to `"0.1.0"` and drop the "Mirrors Cargo.toml" claim.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `CLIENT_VERSION` is now `env!("CARGO_PKG_VERSION")`, taken from `Cargo.toml` at compile time so the two cannot drift.
### Client.Rust-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/src/session.rs:489-520` |
| Status | Resolved |
**Description:** `register_server_handle`, `add_item_handle`, and `add_item2_handle` fall through to `reply.return_value … .unwrap_or_default()`, returning `0` when the reply carries neither the expected typed payload nor an `Int32` `return_value`. Because `Session::invoke` has already confirmed `protocol_status == Ok`, a malformed-but-OK reply silently yields handle `0`, which the caller then uses as a real handle against the worker.
**Recommendation:** Return `Err(Error::ProtocolStatus { … })` (or a dedicated `Error::MalformedReply`) when an OK reply lacks an extractable handle, instead of defaulting to `0`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the three handle extractors now return `Result<i32, Error>` and yield the new `Error::MalformedReply` when an OK reply carries no usable handle.
### Client.Rust-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/src/session.rs:531-555` |
| Status | Resolved |
**Description:** `bulk_results` returns `Vec::new()` for any `(payload, kind)` combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of `subscribe_bulk`/`add_item_bulk` then sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply".
**Recommendation:** Treat a missing/mismatched bulk payload on an OK reply as an error rather than an empty vector, or document the empty-vec fallback explicitly and log it.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `bulk_results` now returns `Result<Vec<SubscribeResult>, Error>` and yields `Error::MalformedReply` on a mismatched or absent bulk payload.
### Client.Rust-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:14-55` |
| Status | Resolved |
**Description:** `RustClientDesign.md` is stale relative to the implemented code. It documents a nested `crates/mxgateway-client/` layout (the real crate root is `clients/rust/` with a flat `src/`), and lists `tracing` among "Expected dependencies", but `tracing` appears in no `Cargo.toml`. CLAUDE.md requires docs to change with the source.
**Recommendation:** Update `RustClientDesign.md` to the actual flat layout and remove `tracing` from the dependency list (or add `tracing` if structured logging is genuinely intended).
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the "Crate Layout" section now shows the actual flat layout (`mxgateway-client` as the workspace-root crate, `mxgw-cli` as a member) and the unused `tracing` entry was removed from the dependency list.
### Client.Rust-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/rust/src/value.rs:161-261` |
| Status | Resolved |
**Description:** `MxValueProjection::from_proto` and `MxArrayProjection::from_proto` deep-clone every element out of the wire message while `MxValue`/`MxArrayValue` also retain the original `raw` message. Every `MxValue` therefore holds two copies of its payload, wasteful for large string arrays or raw blobs arriving on the event stream.
**Recommendation:** Compute the projection lazily on demand, or have the projection borrow from `raw`, so array/raw payloads are not duplicated for every wrapped value.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `MxValue` and `MxArrayValue` no longer cache a `projection` field — `projection()` computes the typed view on demand from `raw`. A value built only to be sent over the wire now holds a single copy of its payload and pays no projection cost.
### Client.Rust-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` |
| Status | Resolved |
**Description:** Several critical paths are untested: TLS channel setup (`with_plaintext(false)` / CA-file loading), mid-stream `tonic::Status` fault propagation through `EventStream`/`DeployEventStream` (tests only send `Ok` items), and the bulk-size cap (`ensure_bulk_size` rejecting >1000 items).
**Recommendation:** Add tests that (a) feed an `Err(Status)` into the event/deploy streams and assert it surfaces as the mapped `Error`, (b) assert `add_item_bulk` with 1001 items returns `Error::InvalidArgument`, and (c) exercise the CA-file/`InvalidEndpoint` error path.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): added `add_item_bulk_rejects_input_above_the_thousand_item_cap`, `event_stream_surfaces_a_mid_stream_status_fault` (the fake gateway now optionally emits a mid-stream `Status::unavailable`), and `connect_with_unreadable_ca_file_reports_invalid_endpoint`.
### Client.Rust-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` |
| Status | Resolved |
**Description:** The client applies only a per-call deadline via `Request::set_timeout` and has no retry, reconnect, or transient-vs-permanent classification. A transient `Unavailable` (e.g. a gateway restart) maps to the catch-all `Error::Status` and is indistinguishable from a permanent failure. This is an acceptable v1 stance but is undocumented.
**Recommendation:** Either add a documented `Error::Unavailable` variant classifying `Code::Unavailable`/`Code::ResourceExhausted`, or explicitly document in the README that the client performs no retries and that transient failures arrive as `Error::Status`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): added the `Error::Unavailable` variant; `From<tonic::Status>` maps `Code::Unavailable` and `Code::ResourceExhausted` to it, so callers can classify transient failures without unwrapping the raw status.
### Client.Rust-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:469` |
| Status | Resolved |
**Description:** `command_request` hard-codes `client_correlation_id` as `format!("rust-client-{}", kind.as_str_name())`. Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request/reply pair in gateway logs or among concurrent in-flight calls. MXAccess parity diagnostics rely on correlation ids being unique per call.
**Recommendation:** Append a per-call unique suffix (monotonic counter or UUID) to the correlation id, or expose a way for the caller to supply one.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): correlation ids are built by `next_correlation_id`, which appends a process-wide atomic sequence number; `Session::close` uses it too.
### Client.Rust-012
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/galaxy.rs:282` |
| Status | Resolved |
**Description:** Found while verifying the fix for Client.Rust-001/002: `cargo clippy --workspace --all-targets -- -D warnings` reported a third violation the original review missed. The `get_last_deploy_time` test fake calls `.clone()` on a `MutexGuard<Option<prost_types::Timestamp>>`, and `Option<Timestamp>` is `Copy` (`clippy::clone_on_copy`). Under `-D warnings` this is a compile error, so clippy still did not pass after Client.Rust-001/002 alone.
**Recommendation:** Dereference instead of cloning: `*self.state.last_deploy.lock().unwrap()`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): replaced `.clone()` with a deref. `cargo clippy --workspace --all-targets -- -D warnings` now passes cleanly.
+147
View File
@@ -0,0 +1,147 @@
# Code Review — Contracts
| Field | Value |
|---|---|
| Module | `src/MxGateway.Contracts` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 8 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No functional bugs; one missing reply-payload case for the by-name ack command and an `int32`-typed `success` flag that reads like a bool (Contracts-002, Contracts-006). |
| 2 | mxaccessgw conventions | Additive-only evolution honored (no renumbered/removed tags), MXAccess-aligned naming consistent, generated code untouched; no `reserved` statements declared as a guardrail (Contracts-005). |
| 3 | Concurrency & thread safety | N/A — pure contract definitions plus a static const class with no shared mutable state. |
| 4 | Error handling & resilience | HRESULT / `MxStatusProxy` / `ProtocolStatus` carriers are complete; the worker-side by-name alarm ack has no dedicated reply payload (Contracts-002). |
| 5 | Security | Credential-sensitive fields are clearly commented; no secrets forced into loggable shapes. No issues found. |
| 6 | Performance & resource management | `DiscoverHierarchy` is paged; alarm-snapshot streams are server-streamed; no bloat issues. No issues found. |
| 7 | Design-document adherence | `.proto` files match design intent but `docs/Grpc.md` is stale (Contracts-001); worker vs public alarm-status shapes unreconciled in docs (Contracts-008). |
| 8 | Code organization & conventions | Package/file layout correct; `mxaccess_worker.proto` Protobuf item missing `ProtoRoot` (Contracts-003); stale class summary (Contracts-004). |
| 9 | Testing coverage | Gateway/worker/alarm round-trips covered; Galaxy Repository protos and raw `MxArray` paths untested (Contracts-007). |
| 10 | Documentation & comments | Proto comments accurate and domain-rich; one stale class summary (Contracts-004). |
## Findings
### Contracts-001
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) |
| Status | Open |
**Description:** `mxaccess_gateway.proto` now declares six RPCs on `MxAccessGateway` (`OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, `QueryActiveAlarms`). `docs/Grpc.md` still describes "the four `MxAccessGateway` RPCs" in its type table and omits `AcknowledgeAlarm`/`QueryActiveAlarms` from the Validation Rules table. CLAUDE.md requires docs to change in the same commit as the contract; the alarm RPC commits left this doc stale and misleading about the public surface.
**Recommendation:** Update `docs/Grpc.md` to enumerate all six RPCs and add `AcknowledgeAlarm`/`QueryActiveAlarms` to the type/handler and validation tables, or explicitly cross-reference `AlarmClientDiscovery.md`.
**Resolution:** _(open)_
### Contracts-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:384-385`, `:95` |
| Status | Open |
**Description:** `MxCommandKind` includes `MX_COMMAND_KIND_ACKNOWLEDGE_ALARM_BY_NAME = 29` and `MxCommand.payload` carries `AcknowledgeAlarmByNameCommand acknowledge_alarm_by_name_command = 38`, but `MxCommandReply.payload` has only `acknowledge_alarm = 34` and `query_active_alarms = 35` — there is no by-name reply case. The by-name ack must reuse `AcknowledgeAlarmReplyPayload` or rely on the top-level `hresult`. The command/reply payload asymmetry is undocumented and easy to dispatch incorrectly.
**Recommendation:** Either add an explicit comment to `MxCommandReply` stating that by-name ack reuses the `acknowledge_alarm` payload case, or add a dedicated payload case for symmetry, and document the chosen contract in `docs/Contracts.md` / `AlarmClientDiscovery.md`.
**Resolution:** _(open)_
### Contracts-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` |
| Status | Open |
**Description:** The `<Protobuf>` item for `mxaccess_worker.proto` omits `ProtoRoot="Protos"`, while the items for `mxaccess_gateway.proto` (line 9) and `galaxy_repository.proto` (line 11) both set it. `mxaccess_worker.proto` does `import "mxaccess_gateway.proto"`, which resolves only because Grpc.Tools adds the importing file's own directory to the proto path. The inconsistency is fragile — tooling changes to ProtoRoot handling could break import resolution.
**Recommendation:** Add `ProtoRoot="Protos"` to the `mxaccess_worker.proto` `<Protobuf>` item so all three entries are consistent.
**Resolution:** _(open)_
### Contracts-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` |
| Status | Open |
**Description:** The XML summary says the class exposes version metadata "before generated protobuf contracts are introduced." Generated protobuf contracts have long been introduced and are consumed across the solution. The comment is stale; the class now holds the authoritative `GatewayProtocolVersion`/`WorkerProtocolVersion` advertised in `OpenSessionReply` and used to validate `WorkerEnvelope` framing.
**Recommendation:** Reword the summary to describe the current purpose — version constants advertised in `OpenSessionReply` and used to validate `WorkerEnvelope` protocol framing.
**Resolution:** _(open)_
### Contracts-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto`, `src/MxGateway.Contracts/Protos/mxaccess_worker.proto` |
| Status | Open |
**Description:** The ProtobufStyleGuide mandates reserving removed field numbers / enum values. Evolution to date has been purely additive, so this is not a current violation — but none of the `.proto` files contain any `reserved` declarations, leaving no in-file guardrail for the first removal. This is a latent maintainability gap.
**Recommendation:** When any field or enum value is eventually removed, add a `reserved` range/name in the same change. Consider a short comment block in each message documenting the policy so future editors apply `reserved` rather than reusing tags.
**Resolution:** _(open)_
### Contracts-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:647` |
| Status | Open |
**Description:** `MxStatusProxy.success` is declared `int32 success = 1` with no comment. The name reads like a boolean flag but the type is a 32-bit integer (mirroring MXAccess `MXSTATUS_PROXY`, which stores a numeric success/HResult-like value). Without a comment a client author can reasonably misinterpret the field (treat non-1 as failure, or expect only 0/1).
**Recommendation:** Add a comment clarifying the semantic — what range of values it carries and how 0 vs non-zero map to MXAccess status — per the style guide rule to comment fields carrying raw MXAccess status detail.
**Resolution:** _(open)_
### Contracts-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| Status | Open |
**Description:** `ProtobufContractRoundTripTests` covers gateway command/reply/event, alarm transition, alarm ack request/reply, active-alarm snapshot, and the worker envelope. It has no coverage for: (a) any `galaxy_repository.proto` message (`DiscoverHierarchy*`, `GalaxyObject`, `GalaxyAttribute`, `DeployEvent`, the `root` oneof, wrapper-typed fields); (b) `BulkSubscribeReply`/`SubscribeResult` and the bulk command kinds; (c) `MxValue`/`MxArray` `raw_value`/`RawArray` (`bytes`) paths and the `WorkerFault`/`WorkerHeartbeat` IPC bodies.
**Recommendation:** Add round-trip tests for the Galaxy Repository messages (including the `root` oneof and proto wrapper fields), the bulk-subscribe reply, and the remaining `WorkerEnvelope` body cases.
**Resolution:** _(open)_
### Contracts-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:451-459`, `:627-636` |
| Status | Open |
**Description:** The worker-side `AcknowledgeAlarmReplyPayload` carries the alarm-ack outcome as `int32 native_status`, while the public `AcknowledgeAlarmReply` carries it as `MxStatusProxy status` plus `optional int32 hresult`. The comment explains the worker echoes `native_status` into `AcknowledgeAlarmReply.hresult`, but the two outcome shapes (raw `int32` vs structured `MxStatusProxy`) are not reconciled in `docs/Contracts.md` / `AlarmClientDiscovery.md`. A reader cannot tell whether `MxStatusProxy status` is always populated or only on COM-layer failure.
**Recommendation:** Document in `docs/Contracts.md` (or `AlarmClientDiscovery.md`) how the worker `native_status` maps onto the public reply's `status`/`hresult` pair so client authors know which field is authoritative.
**Resolution:** _(open)_
+177
View File
@@ -0,0 +1,177 @@
# Code Review — IntegrationTests
| Field | Value |
|---|---|
| Module | `src/MxGateway.IntegrationTests` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 4 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (`WaitForFirstMessageAsync` ignores cancellation). |
| 2 | mxaccessgw conventions | Live tests correctly gated and skip (not fail) when prerequisites are absent; `LiveGalaxyRepositoryFactAttribute` undocumented in the opt-in matrix. |
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no `[Collection]`/parallelism guard for shared MXAccess/ZB/GLAuth). |
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup `WaitAsync` can mask the original failure). |
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (`GwAdmin` LDAP prerequisite undocumented). |
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (`TestServerCallContext` mislabelled "Mock"). |
## Findings
### IntegrationTests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
| Status | Resolved |
**Description:** The Galaxy Repository live test suite and its gating env var `MXGATEWAY_RUN_LIVE_GALAXY_TESTS` (plus connection-string override `MXGATEWAY_LIVE_GALAXY_CONN`) are completely absent from `docs/GatewayTesting.md`. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
**Recommendation:** Add a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1`, `MXGATEWAY_LIVE_GALAXY_CONN`, the `ZB` database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
**Resolution:** Resolved 2026-05-18: Added a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS`, `MXGATEWAY_LIVE_GALAXY_CONN`, the deployed-`ZB` prerequisite, and the covered `GalaxyRepository` RPCs.
### IntegrationTests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
| Status | Resolved |
**Description:** `DashboardLdapLiveTests` builds the authenticator with `new GatewayOptions()`, so it relies on `LdapOptions.RequiredGroup` defaulting to `GwAdmin` and asserts the `admin` user is a member of a `GwAdmin` LDAP group. `glauth.md` does not list `GwAdmin` as a provisioned group — it lists `admin` only in the five role groups and describes `GwAdmin` as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
**Recommendation:** Either document the required `GwAdmin` GLAuth provisioning step in `glauth.md` and `GatewayTesting.md`, or have the test set `RequiredGroup` to a baseline group `glauth.md` guarantees `admin` belongs to (e.g. `WriteOperate`).
**Resolution:** Resolved 2026-05-18: Took the documentation fix — promoted the `glauth.md` "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants `GwAdmin` to `admin`, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to `docs/GatewayTesting.md` calling out `GwAdmin` as a hard prerequisite. Alternative considered: weaken the test to a baseline group (`WriteOperate`) — rejected because `GwAdmin` is the real default `LdapOptions.RequiredGroup` and the test should exercise it.
### IntegrationTests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
| Status | Resolved |
**Description:** The test asserts only on the first `MxEvent` recorded by `RecordingServerStreamWriter`. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected `OnDataChange` (e.g. a registration-state or bad-quality bootstrap event). Because `WaitForFirstMessageAsync` returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.
**Recommendation:** Filter for the first event with `Family == OnDataChange` (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
**Resolution:** Resolved 2026-05-18: Confirmed against source — `WaitForFirstMessageAsync` completed a `TaskCompletionSource` on the very first `WriteAsync`. Replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages, skips earlier non-matching events, and blocks on a `SemaphoreSlim` until a matching one arrives or the timeout elapses (throwing a `TimeoutException` that reports the scanned count). `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now waits for the first `Family == OnDataChange` event. Live execution was not possible in this environment (no MXAccess COM); verified by build.
### IntegrationTests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
| Status | Resolved |
**Description:** In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutException` from inside `finally`, which replaces/masks any original assertion failure from the `try` block. The diagnostic value of the real failure is lost.
**Recommendation:** Wrap the `streamTask.WaitAsync` (and ideally `WaitForProcessesAsync`) in a try/catch that logs the cleanup exception via `output.WriteLine` instead of letting it propagate.
**Resolution:** Resolved 2026-05-18: Confirmed — the `finally` block awaited `streamTask.WaitAsync` and `WaitForProcessesAsync` with no exception handling. Extracted a shared `ShutDownAsync` helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via `output.WriteLine` instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.
### IntegrationTests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
| Status | Resolved |
**Description:** The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (`WriteSecured` ordering, `OperationComplete` semantics, invalid-handle exceptions). None of `Write`, `WriteSecured`, `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
**Recommendation:** Add live coverage for at least a `Write` round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
**Resolution:** Resolved 2026-05-18: Added two `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`. `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` registers/adds/advises then issues a `Write` of an integer value, asserting the command round-trips with `ProtocolStatusCode.Ok` and `MxCommandKind.Write`. `GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault` issues `AddItem` against `int.MaxValue` as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the `ShutDownAsync` cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the `Write` round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.
### IntegrationTests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
| Status | Resolved |
**Description:** LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which `DashboardAuthenticator` has distinct branches for (the `LdapException` catch, the `candidate is null` branch). The negative test only proves group-membership denial, not credential rejection.
**Recommendation:** Add a live test for `admin` with a wrong password asserting `Succeeded == false` and that the password is not leaked into `FailureMessage`, and a test for an unknown username.
**Resolution:** Resolved 2026-05-18: Added three `[LiveLdapFact]`-gated tests to `DashboardLdapLiveTests`. `AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword` exercises the `LdapException` catch via a rejected candidate bind and asserts the wrong password never reaches `FailureMessage`. `AuthenticateAsync_UnknownUsername_Fails` exercises the `candidate is null` branch. `AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing` builds the authenticator with `LdapOptions.Port = 1` (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic `catch (Exception)` branch. All three are gated by the existing `MXGATEWAY_RUN_LIVE_LDAP_TESTS` opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.
### IntegrationTests-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
| Status | Open |
**Description:** The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No `[Collection]` annotation or `DisableTestParallelization` is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
**Recommendation:** Place the live test classes in a shared `[Collection]`, or set `[assembly: CollectionBehavior(DisableTestParallelization = true)]` for this opt-in project, so live external resources are accessed serially.
**Resolution:** _(open)_
### IntegrationTests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
| Status | Open |
**Description:** Three near-identical fact attributes each re-implement the same "compare env var to `1` with `StringComparison.Ordinal`, set `Skip` otherwise" pattern. `LiveMxAccessFactAttribute` delegates to `IntegrationTestEnvironment` while the other two inline the logic, so the project has two divergent styles for the same concern.
**Recommendation:** Extract a shared helper (e.g. `IntegrationTestEnvironment.IsEnabled(string variableName)`) and have all three attributes call it.
**Resolution:** _(open)_
### IntegrationTests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
| Status | Open |
**Description:** `TestServerCallContext` is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
**Recommendation:** Reword the summary to "test stub" / "minimal `ServerCallContext` implementation for in-process gRPC calls."
**Resolution:** _(open)_
### IntegrationTests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` |
| Status | Open |
**Description:** `WaitForFirstMessageAsync` accepts only a `timeout` and never observes a `CancellationToken`. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s `WaitAsync` timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare `TimeoutException`.
**Recommendation:** Accept a `CancellationToken` (linked to `TestServerCallContext`'s token), pass it to `firstMessage.Task.WaitAsync(timeout, token)`, and on timeout emit the recorded `Messages` count via `output.WriteLine` before throwing.
**Resolution:** _(open)_
+165
View File
@@ -0,0 +1,165 @@
# Code Reviews
<!-- GENERATED FILE - do not edit by hand. Regenerate with: python code-reviews/regen-readme.py -->
Cross-module code review index for the `mxaccessgw` codebase. The review process is defined in [../REVIEW-PROCESS.md](../REVIEW-PROCESS.md).
Each module's `findings.md` is the source of truth; this file is generated from them by `regen-readme.py` and must not be edited by hand.
## Module status
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 5 | 8 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 7 | 10 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 7 | 12 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 9 | 12 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 0 | 12 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 8 | 8 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 4 | 10 |
| [Server](Server/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 8 | 14 |
| [Tests](Tests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 6 | 12 |
| [Worker](Worker/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 7 | 15 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 8 | 15 |
## Pending findings
Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Contracts-002 | Medium | Error handling & resilience | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:384-385`, `:95` | `MxCommandKind` includes `MX_COMMAND_KIND_ACKNOWLEDGE_ALARM_BY_NAME = 29` and `MxCommand.payload` carries `AcknowledgeAlarmByNameCommand acknowledge_alarm_by_name_command = 38`, but `MxCommandReply.payload` has only `acknowledge_alarm = 34… |
| Client.Dotnet-004 | Low | Error handling & resilience | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294`, `clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403` | `ExecuteSafeUnaryAsync` wraps the whole Polly retry pipeline in a single linked CTS cancelled after `Options.DefaultCallTimeout`, while `CreateCallOptions` also stamps each individual call with a `DefaultCallTimeout` gRPC deadline. The ret… |
| Client.Dotnet-005 | Low | Correctness & logic bugs | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175` | `RegisterAsync`/`AddItemAsync`/`AddItem2Async` return `reply.<Typed>?.ServerHandle ?? reply.ReturnValue.Int32Value`. After `EnsureMxAccessSuccess()` passes, a missing typed payload silently falls back to `ReturnValue.Int32Value`, which for… |
| Client.Dotnet-006 | Low | Code organization & conventions | `clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50`, `clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14` | `MxGatewayClientOptions.MaxGrpcMessageBytes` and the two `const`s in `MxGatewayClientContractInfo` are public members with no XML doc comments, inconsistent with every other public member in the assembly and with the repo's documented C# s… |
| Client.Dotnet-007 | Low | Documentation & comments | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192` | The `AcknowledgeAlarmAsync` XML comment states the gateway authenticates against an `invoke:alarm-ack` scope, but `CLAUDE.md` documents the scope set without any `invoke:alarm-ack` sub-scope. The comment may describe an intended finer-grai… |
| Client.Dotnet-008 | Low | Correctness & logic bugs | `clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17` | The CLI redactor only removes the API key string when it was supplied via `--api-key`; `RunCoreAsync` passes `arguments.GetOptional("api-key")` to `Redact`. When the key comes from an environment variable (`--api-key-env`, the documented d… |
| Client.Go-004 | Low | mxaccessgw conventions | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` | `gofmt -l` flags `alarms_test.go` and `galaxy_test.go` for misaligned struct-literal field padding. The Go client README lists `gofmt` as part of the workflow and the repo enforces style; unformatted committed code breaks `gofmt`-gated che… |
| Client.Go-005 | Low | Design-document adherence | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` | The client uses `grpc.DialContext` with `grpc.WithBlock()`. In current grpc-go both are deprecated in favour of `grpc.NewClient` (lazy connection). `WithBlock` also changes failure semantics: a transient gateway-unavailable at dial time be… |
| Client.Go-006 | Low | Error handling & resilience | `clients/go/mxgateway/errors.go:9-130` | `docs/ClientLibrariesDesign.md` recommends a high-level error taxonomy (`TransportError`, `AuthenticationError`, `TimeoutError`, etc.). The Go client collapses all transport/gRPC failures into a single `GatewayError` with no way to classif… |
| Client.Go-007 | Low | Correctness & logic bugs | `clients/go/mxgateway/session.go:526-532` | `newCorrelationID` returns an empty string when `crypto/rand.Read` fails, silently producing an `MxCommandRequest` with no correlation id. `rand.Read` failure is rare, but the failure mode (untraceable command, no error surfaced) is worse… |
| Client.Go-008 | Low | Testing coverage | `clients/go/mxgateway/` (test files) | Several critical paths are untested: TLS credential resolution in `resolveTransportCredentials` (only the `Plaintext` path is exercised); the `callContext` deadline-shortening logic (`client.go:198-204`) including the negative-timeout disa… |
| Client.Go-009 | Low | Code organization & conventions | `clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205` | `DialGalaxy`/`Dial` and `GalaxyClient.callContext`/`Client.callContext` are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migrati… |
| Client.Go-010 | Low | Documentation & comments | `clients/go/mxgateway/client.go:39-40` | The `Dial` doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated `WithBlock` behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancel… |
| Client.Java-006 | Low | Performance & resource management | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` | `close()` (the `AutoCloseable` method invoked by try-with-resources) calls only `ownedChannel.shutdown()` and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the call… |
| Client.Java-007 | Low | Testing coverage | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` | The alarm surface — `acknowledgeAlarm`/`acknowledgeAlarmAsync`/`queryActiveAlarms` and `MxGatewayActiveAlarmsSubscription` — has zero test coverage. TLS channel construction, the async `streamEventsAsync` path, `MxGatewayEventSubscription`… |
| Client.Java-008 | Low | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` | `acknowledgeAlarmAsync` and `openSessionAsync` apply `ensureProtocolSuccess` inside `thenApply`. If that validator throws a non-`MxGatewayException` `RuntimeException` it is wrapped by `CompletionException` with no `fromGrpc` normalisation… |
| Client.Java-009 | Low | Code organization & conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413` | `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable` are duplicated nearly verbatim across `MxGatewayClient` and `GalaxyRepositoryClient` (~80 lines). A fix to one will not propagate to the other. |
| Client.Java-010 | Low | Documentation & comments | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272`, `clients/java/README.md:76` | The `acknowledgeAlarm` Javadoc states the gateway authenticates against an `invoke:alarm-ack` scope, and the README states the Galaxy Repository requires a `metadata:read` scope. CLAUDE.md's documented scope set names neither — the Javadoc… |
| Client.Java-011 | Low | Performance & resource management | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63` | The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an im… |
| Client.Java-012 | Low | Correctness & logic bugs | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674` | `CommonOptions.resolved()` mutates `this` (`resolvedApiKey`, `resolvedTimeout`) and returns `this`, but `toClientOptions()` and `redactedJsonMap()` read those mutated fields. If `redactedJsonMap()` is ever called before `resolved()`, it si… |
| Client.Python-001 | Low | Documentation & comments | `clients/python/pyproject.toml:8,25`, `clients/python/src/mxgateway_cli/commands.py:25` | The package `description` in `pyproject.toml` still says "Async Python client *scaffold*" even though the client is fully implemented. Stale "scaffold" wording misrepresents maturity to anyone reading PyPI metadata. (The `mxgw-py` console-… |
| Client.Python-002 | Low | Code organization & conventions | `clients/python/src/mxgateway/__init__.py:27` | `MxGatewayCommandError` is imported into `__init__.py` and is a documented public exception, but it is missing from `__all__`. It is the parent of `MxAccessError` and a meaningful catch target, so omitting it from the public surface is inc… |
| Client.Python-004 | Low | Correctness & logic bugs | `clients/python/src/mxgateway_cli/commands.py:386,402-404` | In `_smoke`, the local variable `closed` is set to `False` and never reassigned; the `finally` block's `if not closed:` is therefore always true. This is dead/misleading code suggesting a removed early-close path. |
| Client.Python-006 | Low | Concurrency & thread safety | `clients/python/src/mxgateway/client.py:74-82`, `clients/python/src/mxgateway/galaxy.py:85-93`, `clients/python/src/mxgateway/session.py:38-55` | `close()` on the clients and `Session.close()` use a plain `self._closed` check-then-set with an `await` between, with no lock. If two coroutines call `close()` concurrently both can pass the guard before either sets it, causing a double `… |
| Client.Python-007 | Low | Error handling & resilience | `clients/python/src/mxgateway/client.py:204-213` | `_canceling_iterator` (gateway event stream) does not catch `asyncio.CancelledError` to invoke `call.cancel()` explicitly — it relies on the `finally` block. `galaxy.py:_canceling_iterator` *does* explicitly catch `CancelledError`, cancel,… |
| Client.Python-008 | Low | Correctness & logic bugs | `clients/python/src/mxgateway/values.py:62-67,83-88` | `to_mx_value` maps any Python `float` to `VT_R8`/`MX_DATA_TYPE_DOUBLE` with no handling for `nan`/`inf`, which are serialised and forwarded to MXAccess which may reject or mis-handle them. `bytes` is mapped to `VT_RECORD`/`MX_DATA_TYPE_UNK… |
| Client.Python-010 | Low | Code organization & conventions | `clients/python/src/mxgateway/session.py:404`, `clients/python/src/mxgateway_cli/commands.py:422-425` | `session.py` ends with a module-level late import `from .client import GatewayClient # noqa: E402` purely to satisfy a string type hint, and `commands.py:_session` does a function-local import. Both work around a circular dependency that `… |
| Client.Python-011 | Low | Error handling & resilience | `clients/python/src/mxgateway/errors.py:122-148` | `ensure_mxaccess_success` raises `MxAccessError` if any `mx_status.success == 0`. This treats `success == 0` as the failure sentinel, but `0` is also the proto3 scalar default for an unset `MxStatusProxy`. If the gateway ever returns a rep… |
| Client.Python-012 | Low | mxaccessgw conventions | `clients/python/src/mxgateway/client.py:84-108`, `clients/python/src/mxgateway/session.py:57-77` | `Session.invoke_raw` does not run `ensure_mxaccess_success` while `Session.invoke` does, so a caller using `invoke_raw` for parity tests gets a reply where an MXAccess HRESULT failure is silently embedded with no exception. This is by desi… |
| Contracts-001 | Low | Design-document adherence | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) | `mxaccess_gateway.proto` now declares six RPCs on `MxAccessGateway` (`OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, `QueryActiveAlarms`). `docs/Grpc.md` still describes "the four `MxAccessGateway` RPCs" in its… |
| Contracts-003 | Low | Code organization & conventions | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` | The `<Protobuf>` item for `mxaccess_worker.proto` omits `ProtoRoot="Protos"`, while the items for `mxaccess_gateway.proto` (line 9) and `galaxy_repository.proto` (line 11) both set it. `mxaccess_worker.proto` does `import "mxaccess_gateway… |
| Contracts-004 | Low | Documentation & comments | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` | The XML summary says the class exposes version metadata "before generated protobuf contracts are introduced." Generated protobuf contracts have long been introduced and are consumed across the solution. The comment is stale; the class now… |
| Contracts-005 | Low | mxaccessgw conventions | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto`, `src/MxGateway.Contracts/Protos/mxaccess_worker.proto` | The ProtobufStyleGuide mandates reserving removed field numbers / enum values. Evolution to date has been purely additive, so this is not a current violation — but none of the `.proto` files contain any `reserved` declarations, leaving no… |
| Contracts-006 | Low | Correctness & logic bugs | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:647` | `MxStatusProxy.success` is declared `int32 success = 1` with no comment. The name reads like a boolean flag but the type is a 32-bit integer (mirroring MXAccess `MXSTATUS_PROXY`, which stores a numeric success/HResult-like value). Without… |
| Contracts-007 | Low | Testing coverage | `src/MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` | `ProtobufContractRoundTripTests` covers gateway command/reply/event, alarm transition, alarm ack request/reply, active-alarm snapshot, and the worker envelope. It has no coverage for: (a) any `galaxy_repository.proto` message (`DiscoverHie… |
| Contracts-008 | Low | Design-document adherence | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:451-459`, `:627-636` | The worker-side `AcknowledgeAlarmReplyPayload` carries the alarm-ack outcome as `int32 native_status`, while the public `AcknowledgeAlarmReply` carries it as `MxStatusProxy status` plus `optional int32 hresult`. The comment explains the wo… |
| IntegrationTests-007 | Low | Concurrency & thread safety | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` | The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No `[Collection]` annotation or `DisableTestParallelization` is dec… |
| IntegrationTests-008 | Low | Code organization & conventions | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` | Three near-identical fact attributes each re-implement the same "compare env var to `1` with `StringComparison.Ordinal`, set `Skip` otherwise" pattern. `LiveMxAccessFactAttribute` delegates to `IntegrationTestEnvironment` while the other t… |
| IntegrationTests-009 | Low | Documentation & comments | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` | `TestServerCallContext` is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it… |
| IntegrationTests-010 | Low | Correctness & logic bugs | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` | `WaitForFirstMessageAsync` accepts only a `timeout` and never observes a `CancellationToken`. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s `WaitAsy… |
| Server-007 | Low | Performance & resource management | `src/MxGateway.Server/Galaxy/GalaxyHierarchyProjector.cs:55-70` | `Project` always iterates the full `entry.Index.ObjectViews` collection and re-applies all filters to skip `offset` matched items before collecting a page. Paging through a large Galaxy hierarchy is therefore O(total) per page and O(total²… |
| Server-008 | Low | Performance & resource management | `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:111-134,160-189` | `WatchDeployEvents` calls `ResolveBrowseSubtrees()` on every streamed event, and `MapDeployEvent` re-runs `GalaxyHierarchyProjector.Project` over the entire cached hierarchy (and `Sum`s attribute counts) for every event of every constraine… |
| Server-009 | Low | Error handling & resilience | `src/MxGateway.Server/Security/Authentication/AuthSqliteConnectionFactory.cs:15-32` | Each auth-store operation opens a fresh `SqliteConnection` with no busy timeout, no WAL journal mode, and default journaling. `MarkKeyUsedAsync` runs on every authenticated request and `SqliteApiKeyAuditStore` appends on every denial; unde… |
| Server-010 | Low | Security | `src/MxGateway.Server/Security/Authentication/SqliteApiKeyAdminStore.cs:91-114`, `src/MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:168-172` | `RotateAsync` sets `revoked_utc = NULL`, so rotating a previously revoked key silently reactivates it. This is documented intentional behavior in `docs/Authentication.md:167`, but the dashboard renders the "Rotate" button unconditionally —… |
| Server-011 | Low | Code organization & conventions | `src/MxGateway.Server/Sessions/WorkerAlarmRpcDispatcher.cs:1-46` | `WorkerAlarmRpcDispatcher` deviates from the module's conventions: it fully-qualifies `System.Guid`, `System.ArgumentNullException`, and `System.Threading` types inline instead of relying on `using` directives, and uses an explicit constru… |
| Server-012 | Low | Documentation & comments | `CLAUDE.md` (Authentication section and `apikey create` example) | CLAUDE.md describes scopes as `session`, `invoke`, `event`, `metadata`, `admin` and shows `apikey create --scopes session,invoke,event,metadata,admin`. The actual canonical scope strings (used by `GatewayScopes`, `GatewayGrpcScopeResolver`… |
| Server-013 | Low | Testing coverage | `src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs`, `src/MxGateway.Tests/Gateway/GatewayApplicationTests.cs` | `DashboardAuthorizationHandler` is unit-tested in isolation, but no test exercises the dashboard routes end-to-end to confirm the policy is actually enforced — which is why Server-001 (policy registered but never wired) went uncaught. Ther… |
| Server-014 | Low | Documentation & comments | `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:162-171,191-198,206-214,229-237` | The XML `<remarks>` and inline comments on `AcknowledgeAlarm` and `QueryActiveAlarms` describe the alarm path as not yet wired and say `NotWiredAlarmRpcDispatcher` is the default ("Clients calling this method today receive an OK reply with… |
| Tests-007 | Low | Code organization & conventions | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:682`, `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:324`, `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:460`, `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs:233` | A near-identical `TestServerCallContext` implementation is copy-pasted into at least four test files (and `AllowAllConstraintEnforcer` / `TestServerStreamWriter` / `RecordingStreamWriter` into several). Duplication risks the copies driftin… |
| Tests-008 | Low | mxaccessgw conventions | `src/MxGateway.Tests/Gateway/Sessions/WorkerAlarmRpcDispatcherTests.cs:1-9`, `src/MxGateway.Tests/Gateway/Sessions/NotWiredAlarmRpcDispatcherTests.cs:1-3`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerAlarmAutoSubscribeTests.cs:1` | The alarm test files diverge from the project's C# style and the rest of the suite: snake_case test method names instead of the PascalCase `Method_Condition_Result` pattern; redundant explicit `using System;`/`System.Threading;` imports de… |
| Tests-009 | Low | Documentation & comments | `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:36-37,99,365` | Several XML `<summary>` comments are copy-paste mismatches: the comment above `OpenSessionAsync_SetsInitialDefaultLease` describes correlation-ID generation; the comment above `GatewaySessionSubscribeBulkAsync_ForwardsOneBulkCommand…` desc… |
| Tests-010 | Low | Security | `src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs:26-36` | The anonymous-localhost bypass is tested only for the success case (`allowAnonymousLocalhost: true` + loopback succeeds) and the remote-unauthenticated denial. There is no test for the security-critical negatives: anonymous + loopback when… |
| Tests-011 | Low | Correctness & logic bugs | `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:233-301` | `GatewayEndToEndFakeWorkerSmokeTests` correctly stores and awaits `launcher.WorkerTask`, but `SessionWorkerClientFactoryFakeWorkerTests` uses `_ = RunWorkerAsync(...)` with no stored task (lines 152, 184, 220). An unhandled exception in th… |
| Tests-012 | Low | Concurrency & thread safety | `src/MxGateway.Tests/Gateway/Workers/Fakes/FakeWorkerHarness.cs:62`, `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:472` | Pipe names are uniquified per test with a GUID (good), but xUnit runs test classes in parallel by default and there is no `xunit.runner.json` or collection configuration. Tests that build a full `WebApplication` bind ephemeral ports (`--ur… |
| Worker-009 | Low | Performance & resource management | `src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:31,49`, `src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:57-58` | Every frame read allocates a fresh 4-byte length buffer and a payload `byte[]`; every write allocates `ToByteArray()` plus a 4-byte prefix. On the hot event-drain path (batches of up to 128 `WorkerEvent` frames every 25 ms) this produces s… |
| Worker-010 | Low | Correctness & logic bugs | `src/MxGateway.Worker/Conversion/VariantConverter.cs:204-226` | `ConvertInt64Scalar` is reached for `TypeCode.UInt32` and `TypeCode.Int64`. For a `uint` with `expectedDataType == MxDataType.Time`, the value is treated as a Windows `FILETIME` via `DateTime.FromFileTimeUtc(longValue)`; a 32-bit FILETIME… |
| Worker-011 | Low | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeClient.cs:169-171` | `retryAttempts` is computed as `(connectTimeout / min(connectTimeout, attemptTimeout)) - 1`. With defaults (30000 / 2000) this yields 14 retries, but each retry also incurs Polly exponential backoff. The overall `connectDeadline` (`CancelA… |
| Worker-012 | Low | Documentation & comments | `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs:44-55`, `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:38-43`, `src/MxGateway.Worker/MxAccess/MxAccessEventMapper.cs:106-112` | Multiple comments describe the alarm path as not-yet-wired future work ("PR A.2 — COM-side subscription scaffold … the worker advertises no alarm subscription", "the worker bootstrap will gain a thin 'run-on-STA' wrapper as part of A.3").… |
| Worker-013 | Low | Testing coverage | `src/MxGateway.Worker/Sta/StaMessagePump.cs` | `StaMessagePump` — the heart of COM event delivery (`MsgWaitForMultipleObjectsEx` + `PeekMessage`/`DispatchMessage`) — has no direct unit tests. `StaRuntimeTests` exercises it indirectly for command wake-up but never verifies that a posted… |
| Worker-014 | Low | Code organization & conventions | `src/MxGateway.Worker/MxAccess/AlarmCommandHandler.cs:33`, `:202` | The file declares two public types — the `AlarmCommandHandler` class and the `IAlarmCommandHandler` interface. The C# style guide and the rest of the module follow one-public-type-per-file (e.g. interfaces in their own `I*.cs` files like `… |
| Worker-015 | Low | Correctness & logic bugs | `src/MxGateway.Worker/MxAccess/MxAccessEventQueue.cs:115-145` | On overflow, `Enqueue` records the overflow fault and throws `MxAccessEventQueueOverflowException`; `MxAccessBaseEventSink.EnqueueEvent` catches it and calls `RecordFault` again. `RecordFault` is a no-op when a fault already exists, so the… |
| Worker.Tests-008 | Low | Documentation & comments | `src/MxGateway.Worker.Tests/Conversion/VariantConverterTests.cs:175-182` | `Redactor_WithCredentialBearingValueFields_RedactsBeforeLogging` lives in `VariantConverterTests` but asserts on `WorkerLogRedactor.RedactValue`, which has nothing to do with `VariantConverter`. It is also a near-duplicate of coverage in `… |
| Worker.Tests-009 | Low | Code organization & conventions | `src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs`, `AlarmDispatcherTests.cs`, `AlarmCommandExecutorTests.cs`, `AlarmRecordTransitionMapperTests.cs`, `WnWrapAlarmConsumerXmlTests.cs` | The alarm-related test files use `snake_case` method names while the rest of the project uses the `Method_State_Result` PascalCase convention. `docs/style-guides/CSharpStyleGuide.md` and the surrounding code establish PascalCase as the pro… |
| Worker.Tests-010 | Low | Correctness & logic bugs | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:230-258` | `StartAsync_WithoutAlarmCommandHandlerFactory_SubscribeAlarmsReturnsInvalidRequest` asserts `Assert.Contains("alarm", reply.DiagnosticMessage, StringComparison.OrdinalIgnoreCase)`. The XML doc claims it verifies the diagnostic says "alarm… |
| Worker.Tests-011 | Low | Documentation & comments | `src/MxGateway.Worker.Tests/Sta/StaCommandDispatcherTests.cs:92-112` | `DispatchAsync_WhenCanceledAfterExecutionStarts_StillReturnsLateReply` is named and documented as if it proves cancellation arrived after execution began. The test does `Started.Wait(...)` then `cancellation.Cancel()`, which proves executi… |
| Worker.Tests-012 | Low | Testing coverage | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs` | `docs/WorkerFrameProtocol.md` states the reader "rejects zero-length payloads and payloads larger than the configured maximum (default 16 MiB) before allocating the payload buffer." `WorkerFrameProtocolTests` covers malformed-length, wrong… |
| Worker.Tests-013 | Low | Concurrency & thread safety | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:539-546` | `ThrowIfCompletedAsync` does an unconditional `await Task.Delay(TimeSpan.FromMilliseconds(100))` then checks `task.IsCompleted`. This adds a fixed 100 ms to the test and only catches a `RunAsync` that fails within that arbitrary window; a… |
| Worker.Tests-014 | Low | Code organization & conventions | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeClientTests.cs:194`, `WorkerPipeSessionTests.cs:622`, `Sta/StaCommandDispatcherTests.cs:348`, `MxAccess/MxAccessStaSessionTests.cs:334`, `MxAccess/MxAccessCommandExecutorTests.cs:1124` | `FakeRuntimeSession`, `NoopComApartmentInitializer`, `NoopEventSink`/`NullEventSink`, and the `CreateFrame`/`WriteUInt32LittleEndian` helpers are re-implemented independently in multiple test files. The two `FakeRuntimeSession` implementat… |
| Worker.Tests-015 | Low | Testing coverage | `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventQueueTests.cs` | `MxAccessEventQueueTests` covers monotonic sequencing, drain, capacity overflow, and first-fault-wins, but does not cover `Drain` with `maxEvents: 0` (drain-all) — a branch `FakeRuntimeSession.DrainEvents` even special-cases — nor draining… |
## Closed findings
Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| ID | Severity | Status | Category | Location |
|---|---|---|---|---|
| Server-001 | Critical | Resolved | Security | `src/MxGateway.Server/GatewayApplication.cs:147-149`, `src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58`, `src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15` |
| Client.Go-001 | High | Resolved | Correctness & logic bugs | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` |
| Client.Rust-001 | High | Resolved | mxaccessgw conventions | `clients/rust/src/options.rs:98,143` |
| Client.Rust-002 | High | Resolved | mxaccessgw conventions | `clients/rust/src/session.rs:522` |
| Client.Rust-003 | High | Resolved | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Client.Rust-012 | High | Resolved | mxaccessgw conventions | `clients/rust/src/galaxy.rs:282` |
| IntegrationTests-001 | High | Resolved | Design-document adherence | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
| IntegrationTests-002 | High | Resolved | Design-document adherence | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
| Server-003 | High | Resolved | Security | `src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59`, `src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258` |
| Tests-001 | High | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:483-489` |
| Tests-002 | High | Resolved | Security | `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:198-210` |
| Worker-001 | High | Resolved | Concurrency & thread safety | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:204-207` |
| Worker-002 | High | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:545-549` |
| Worker-003 | High | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:399-403`, `:416-419` |
| Worker.Tests-001 | High | Resolved | Testing coverage | `src/MxGateway.Worker.Tests/Sta/` (no `StaMessagePumpTests.cs`) |
| Worker.Tests-002 | High | Resolved | Testing coverage | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs`, `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs` |
| Client.Dotnet-001 | Medium | Resolved | Error handling & resilience | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` |
| Client.Dotnet-002 | Medium | Resolved | Testing coverage | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` |
| Client.Dotnet-003 | Medium | Resolved | Concurrency & thread safety | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Client.Go-002 | Medium | Resolved | Error handling & resilience | `clients/go/mxgateway/session.go:440-516` |
| Client.Go-003 | Medium | Resolved | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Client.Java-001 | Medium | Resolved | Security | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Client.Java-002 | Medium | Resolved | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Client.Java-003 | Medium | Resolved | mxaccessgw conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
| Client.Java-004 | Medium | Resolved | Correctness & logic bugs | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
| Client.Java-005 | Medium | Resolved | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
| Client.Python-003 | Medium | Resolved | Error handling & resilience | `clients/python/src/mxgateway/client.py:125-137,155-173` |
| Client.Python-005 | Medium | Resolved | Performance & resource management | `clients/python/src/mxgateway/galaxy.py:117-140` |
| Client.Python-009 | Medium | Resolved | Testing coverage | `clients/python/tests/` |
| Client.Rust-005 | Medium | Resolved | Correctness & logic bugs | `clients/rust/src/session.rs:489-520` |
| Client.Rust-006 | Medium | Resolved | Error handling & resilience | `clients/rust/src/session.rs:531-555` |
| IntegrationTests-003 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
| IntegrationTests-004 | Medium | Resolved | Error handling & resilience | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
| IntegrationTests-005 | Medium | Resolved | Testing coverage | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
| IntegrationTests-006 | Medium | Resolved | Testing coverage | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
| Server-002 | Medium | Resolved | Design-document adherence | `src/MxGateway.Server/Program.cs:24`, `src/MxGateway.Server/GatewayApplication.cs` |
| Server-004 | Medium | Resolved | Code organization & conventions | `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233`, `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77`, `src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67` |
| Server-005 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184` |
| Server-006 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Server/Sessions/SessionManager.cs:84-114` |
| Tests-003 | Medium | Resolved | Performance & resource management | `src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176`, `src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258` |
| Tests-004 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs` |
| Tests-005 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` |
| Tests-006 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:76`, `src/MxGateway.Tests/Gateway/Workers/FakeWorkerHarnessTests.cs:122` |
| Worker-004 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` |
| Worker-005 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258` (production alarm poll loop) |
| Worker-006 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` |
| Worker-007 | Medium | Resolved | mxaccessgw conventions | `src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150` |
| Worker-008 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249`, `:429-447` |
| Worker.Tests-003 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Worker.Tests/Sta/StaRuntimeTests.cs:46-48` |
| Worker.Tests-004 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:281-329` |
| Worker.Tests-005 | Medium | Resolved | Performance & resource management | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105`, `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31` |
| Worker.Tests-006 | Medium | Resolved | Performance & resource management | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:282,305,315,323` |
| Worker.Tests-007 | Medium | Resolved | Design-document adherence | `docs/WorkerFrameProtocol.md:38-49` |
| Client.Rust-004 | Low | Resolved | Documentation & comments | `clients/rust/src/version.rs:7` |
| Client.Rust-007 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:14-55` |
| Client.Rust-008 | Low | Resolved | Performance & resource management | `clients/rust/src/value.rs:161-261` |
| Client.Rust-009 | Low | Resolved | Testing coverage | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` |
| Client.Rust-010 | Low | Resolved | Error handling & resilience | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` |
| Client.Rust-011 | Low | Resolved | mxaccessgw conventions | `clients/rust/src/session.rs:469` |
+237
View File
@@ -0,0 +1,237 @@
# Code Review — Server
| Field | Value |
|---|---|
| Module | `src/MxGateway.Server` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 8 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Server-006 (metrics open-session leak on alarm auto-subscribe failure), Server-010 (rotate reactivates revoked keys). |
| 2 | mxaccessgw conventions | Issues found: Server-002 (orphan-worker termination on startup not implemented), Server-011 (style deviation in `WorkerAlarmRpcDispatcher`). |
| 3 | Concurrency & thread safety | No issues found — locking is correct; inconsistent-but-safe discipline in `GatewayMetrics` noted only. |
| 4 | Error handling & resilience | Issues found: Server-005 (Galaxy first-load can fault the host BackgroundService), Server-009 (SQLite has no busy-timeout/WAL under concurrent writes). |
| 5 | Security | Issues found: Server-001 (Critical: dashboard authorization never enforced on any route), Server-003 (LDAP dashboard users denied for lack of a scope claim), Server-010. |
| 6 | Performance & resource management | Issues found: Server-007 (DiscoverHierarchy paging is O(total) per page), Server-008 (WatchDeployEvents re-projects whole hierarchy per event). |
| 7 | Design-document adherence | Issues found: Server-002 (orphan workers), Server-012 (CLAUDE.md scope names stale vs code/docs). |
| 8 | Code organization & conventions | Issues found: Server-011 (style), Server-004 (CLI accepts unvalidated scope strings). |
| 9 | Testing coverage | Issues found: Server-013 (no dashboard route-level authorization test; `WorkerExecutableValidator`, `GalaxyGlobMatcher`, projector paging untested). |
| 10 | Documentation & comments | Issues found: Server-014 (stale "not yet wired" alarm comments), Server-012. |
## Findings
### Server-001
| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | `src/MxGateway.Server/GatewayApplication.cs:147-149`, `src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58`, `src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15` |
| Status | Resolved |
**Description:** The dashboard authorization policy (`DashboardAuthenticationDefaults.AuthorizationPolicy`), `DashboardAuthorizationRequirement`, and `DashboardAuthorizationHandler` are registered in DI but never applied to any endpoint. `MapRazorComponents<App>()` has no `.RequireAuthorization(...)`, the `<Router>` in `Routes.razor` uses plain `RouteView` (not `AuthorizeRouteView`), and no dashboard page carries `[Authorize]` — a module-wide grep finds zero `RequireAuthorization`/`[Authorize]`/`AuthorizeRouteView` usages. Every dashboard page (Sessions, Workers, Events, Galaxy, Settings, and the API Keys list exposing key IDs, scopes, and constraints) is reachable by any unauthenticated remote client regardless of `Dashboard:AllowAnonymousLocalhost` or `Dashboard:RequireAdminScope`. Only the API-key mutation operations remain protected, via the separate `DashboardApiKeyManagementService.CanManage` check.
**Recommendation:** Apply the policy at the route level — `endpoints.MapRazorComponents<App>().AddInteractiveServerRenderMode().RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy)` — and/or switch `Routes.razor` to `AuthorizeRouteView` with a `[Authorize]` fallback policy plus a `NotAuthorized` redirect to the login page. Add an integration test that GETs a dashboard page anonymously and asserts 302-to-login / 401.
**Resolution:** Resolved in `a8aafdf` (2026-05-18): `MapRazorComponents<App>()` now calls `.RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy)`, so an unauthenticated request to any dashboard component route is challenged by the cookie scheme and redirected to the login page. `GatewayApplicationTests` gained `ComponentRoutesRequireAuthorization` (component routes carry the policy) and `AuthEndpointsAllowAnonymousAccess`, replacing the prior test that asserted the insecure behavior.
### Server-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/MxGateway.Server/Program.cs:24`, `src/MxGateway.Server/GatewayApplication.cs` |
| Status | Resolved |
**Description:** `gateway.md:583` and CLAUDE.md state the first version "terminates orphaned workers on startup." No code in MxGateway.Server enumerates or kills leftover `MxGateway.Worker.exe` processes at startup — a grep for `orphan`/`reattach`/`terminate` finds nothing. After an unclean gateway crash, x86 worker processes (each holding an MXAccess COM instance) leak and survive indefinitely, and a restarted gateway does not reclaim or kill them.
**Recommendation:** Add a startup hosted service that finds and kills stale worker processes (by executable path / a well-known argument or environment marker) before the server accepts sessions, or update the design docs if reattachment/cleanup is deliberately deferred.
**Resolution:** Resolved 2026-05-18. Confirmed against source: no code path enumerated or killed leftover workers. Added `IRunningProcessInspector` / `SystemRunningProcessInspector` (a testable seam over `Process.GetProcessesByName`/`Kill`), `OrphanWorkerTerminator` (kills processes matched by the configured worker executable path, or by image name when the x64 gateway cannot introspect the x86 worker's `MainModule`, skipping the current process and tolerating per-process kill failures), and `OrphanWorkerCleanupHostedService` (best-effort `IHostedService`). The hosted service is registered in `AddWorkerProcessLauncher` ahead of `AddGatewaySessions` so cleanup runs before the server accepts sessions. `gateway.md` updated to describe the implemented behavior. Regression tests: `OrphanWorkerTerminatorTests` (`KillsWorkerProcessesMatchingConfiguredExecutablePath`, `KillsImageNameMatchWhenExecutablePathUnreadable`, `DoesNotKillUnrelatedProcessSharingImageName`, `DoesNotKillCurrentProcess`, `ContinuesWhenOneKillThrows`).
### Server-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59`, `src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258` |
| Status | Resolved |
**Description:** When `Dashboard:RequireAdminScope` is true (the default) and the request is not loopback, `DashboardAuthorizationHandler` succeeds only if `HasAdminScope` finds a claim of type `"scope"` with value `"admin"`. But `DashboardAuthenticator.CreatePrincipal` issues only `NameIdentifier`, `Name`, and `LdapGroupClaimType` claims — never a `scope`/`admin` claim. So a correctly LDAP-authenticated user who passed the required-group check is still denied dashboard access on any non-loopback connection. The bug is currently masked by the missing route-level enforcement (Server-001) and by `AllowAnonymousLocalhost`; fixing Server-001 would make the dashboard unusable for all real LDAP logins.
**Recommendation:** Either have `DashboardAuthenticator.CreatePrincipal` add a `scope=admin` claim when the user is in the required group, or change `DashboardAuthorizationHandler.HasAdminScope` to evaluate LDAP group membership (reuse `IsMemberOfRequiredGroup` against the `LdapGroupClaimType` claims, as `DashboardApiKeyAuthorization.CanManage` already does).
**Resolution:** Resolved in `a8aafdf` (2026-05-18): `DashboardAuthenticator.CreatePrincipal` — reached only after the required-group check passes — now emits the `scope=admin` claim that `DashboardAuthorizationHandler` checks, so group-validated LDAP users pass `RequireAdminScope` once route-level authorization (Server-001) is enforced.
### Server-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233`, `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77`, `src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67` |
| Status | Resolved |
**Description:** `ParseScopes` accepts any comma-separated strings and `CreateKeyAsync` persists them verbatim; neither the CLI nor the dashboard create path validates scopes against `GatewayScopes`. A typo or non-canonical name (e.g. CLAUDE.md's example `--scopes session,invoke,event,metadata,admin`, which does not match the resolver's `session:open`/`invoke:read`/etc.) silently creates a key whose scope strings the authorization resolver never checks for — the key is unusable for those RPCs with no error at creation time.
**Recommendation:** Validate every requested scope against the `GatewayScopes` catalog at create time in both the CLI parser/runner and `DashboardApiKeyManagementService.ValidateCreateRequest`, rejecting unknown scope strings.
**Resolution:** Resolved 2026-05-18. Confirmed against source: `ParseScopes` split unvalidated strings into the create command and `ValidateCreateRequest` checked only key id and display name. Added `GatewayScopes.All` (the canonical scope catalog) and `GatewayScopes.IsKnown(string)`. `ApiKeyAdminCommandLineParser.Parse` now runs `ValidateScopes` for create-key commands and fails the parse listing the unknown scope(s) and valid set; `DashboardApiKeyManagementService.ValidateCreateRequest` rejects requests carrying any non-canonical scope. Revoke/rotate paths are unaffected (no scope input). Regression tests: `ApiKeyAdminCommandLineParserTests.Parse_CreateKeyCommand_RejectsUnknownScope`, `Parse_CreateKeyCommand_AcceptsAllCanonicalScopes`, and `DashboardApiKeyManagementServiceTests.CreateAsync_UnknownScope_DoesNotCallStore`.
### Server-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184` |
| Status | Resolved |
**Description:** `GalaxyHierarchyCache.RefreshCoreAsync` only catches `SqlException` and `InvalidOperationException`. The initial `cache.RefreshAsync` call in `GalaxyHierarchyRefreshService.ExecuteAsync` is wrapped only for `OperationCanceledException`. A transient non-`SqlException` failure on the first refresh (e.g. a `Win32Exception`/`TimeoutException` from connection establishment, or another `DbException` subtype) escapes both layers, faults the `BackgroundService`, and — with default host behavior — stops the whole gateway. The periodic-tick loop does catch general exceptions, so only the first load is exposed.
**Recommendation:** Broaden the `catch` in `RefreshCoreAsync` to all non-cancellation exceptions (record `Unavailable`/`Stale` and still complete `_firstLoad`), or wrap the initial `RefreshAsync` in `GalaxyHierarchyRefreshService` with the same general `catch` the tick loop uses.
**Resolution:** Resolved 2026-05-18. Confirmed against source: the initial `RefreshAsync` in `ExecuteAsync` was guarded only for `OperationCanceledException`, and `RefreshCoreAsync` filtered its catch to `SqlException or InvalidOperationException`. Both recommended layers applied: `GalaxyHierarchyRefreshService.ExecuteAsync` now catches every non-cancellation exception on the initial load (logs a warning; the periodic tick retries), and `GalaxyHierarchyCache.RefreshCoreAsync` broadens its catch to all non-cancellation exceptions so the cache still records `Stale`/`Unavailable` and completes `_firstLoad`. The now-unused `Microsoft.Data.SqlClient` using was removed. Regression test: `GalaxyHierarchyRefreshServiceTests.ExecuteAsync_WhenFirstRefreshThrowsNonCancellationException_DoesNotFaultBackgroundService`.
### Server-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Server/Sessions/SessionManager.cs:84-114` |
| Status | Resolved |
**Description:** In `OpenSessionAsync`, `_metrics.SessionOpened()` (line 89) increments the `_openSessions` gauge before `TryAutoSubscribeAlarmsAsync` runs. If auto-subscribe throws (which it does when `Alarms.RequireSubscribeOnOpen` is true and the worker rejects the subscription), the `catch` block disposes and removes the session and records `_metrics.Fault(...)` but never calls `SessionClosed`/`SessionRemoved`. The `mxgateway.sessions.open` gauge permanently over-counts by one for every such failed open.
**Recommendation:** In the `catch` block, when the session had reached the point where `SessionOpened()` was recorded, also call `_metrics.SessionRemoved()` — or move the `SessionOpened()` call to after auto-subscribe succeeds.
**Resolution:** Resolved 2026-05-18. Confirmed against source: the `catch` block in `OpenSessionAsync` recorded `Fault(...)` and removed the session but never decremented the open-session gauge after `SessionOpened()` had run. Added a `sessionOpenedRecorded` flag set immediately after `_metrics.SessionOpened()`; the `catch` block now calls `_metrics.SessionRemoved()` when that flag is set, restoring the gauge for a post-`SessionOpened()` failure (e.g. an auto-subscribe rejection with `RequireSubscribeOnOpen=true`). Regression test: `SessionManagerAlarmAutoSubscribeTests.OpenSessionAsync_DoesNotLeakOpenSessionGauge_WhenAutoSubscribeFailsWithRequireOn`.
### Server-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/MxGateway.Server/Galaxy/GalaxyHierarchyProjector.cs:55-70` |
| Status | Open |
**Description:** `Project` always iterates the full `entry.Index.ObjectViews` collection and re-applies all filters to skip `offset` matched items before collecting a page. Paging through a large Galaxy hierarchy is therefore O(total) per page and O(total²/pageSize) end-to-end. The cache is in-memory so impact is bounded, but for large galaxies repeated `DiscoverHierarchy` pagination wastes CPU.
**Recommendation:** Precompute and cache the filtered, ordered view list per `(filterSignature, sequence)` so subsequent pages are an O(pageSize) slice; the existing filter signature already keys page tokens.
**Resolution:** _(open)_
### Server-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:111-134,160-189` |
| Status | Open |
**Description:** `WatchDeployEvents` calls `ResolveBrowseSubtrees()` on every streamed event, and `MapDeployEvent` re-runs `GalaxyHierarchyProjector.Project` over the entire cached hierarchy (and `Sum`s attribute counts) for every event of every constrained subscriber. `GalaxyGlobMatcher.IsMatch` also rebuilds the glob regex on each call. With many constrained subscribers and frequent deploys this is avoidable work.
**Recommendation:** Hoist `ResolveBrowseSubtrees()` out of the loop; compute scoped object/attribute counts once per deploy sequence and cache by `(sequence, browseSubtrees)`; cache compiled glob `Regex` instances in `GalaxyGlobMatcher`.
**Resolution:** _(open)_
### Server-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Server/Security/Authentication/AuthSqliteConnectionFactory.cs:15-32` |
| Status | Open |
**Description:** Each auth-store operation opens a fresh `SqliteConnection` with no busy timeout, no WAL journal mode, and default journaling. `MarkKeyUsedAsync` runs on every authenticated request and `SqliteApiKeyAuditStore` appends on every denial; under concurrent load these writers can collide and surface `SQLITE_BUSY` as a hard failure on the request path.
**Recommendation:** Set `Pooling`, a non-zero `DefaultTimeout`/`busy_timeout`, and enable WAL (`PRAGMA journal_mode=WAL`) once at startup so concurrent readers/writers degrade gracefully.
**Resolution:** _(open)_
### Server-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `src/MxGateway.Server/Security/Authentication/SqliteApiKeyAdminStore.cs:91-114`, `src/MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:168-172` |
| Status | Open |
**Description:** `RotateAsync` sets `revoked_utc = NULL`, so rotating a previously revoked key silently reactivates it. This is documented intentional behavior in `docs/Authentication.md:167`, but the dashboard renders the "Rotate" button unconditionally — including for keys whose status badge says "Revoked" — so an operator can un-revoke a deliberately disabled key without an explicit warning.
**Recommendation:** Either hide/disable the Rotate action for revoked keys in `ApiKeysPage.razor`, require an explicit confirmation, or have `RotateAsync` preserve `revoked_utc` and add a separate explicit "reactivate" operation.
**Resolution:** _(open)_
### Server-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Server/Sessions/WorkerAlarmRpcDispatcher.cs:1-46` |
| Status | Open |
**Description:** `WorkerAlarmRpcDispatcher` deviates from the module's conventions: it fully-qualifies `System.Guid`, `System.ArgumentNullException`, and `System.Threading` types inline instead of relying on `using` directives, and uses an explicit constructor with `this.`-qualified field assignment while the rest of the module (e.g. `ConstraintEnforcer`, `MxAccessGatewayService`, `GalaxyRepositoryGrpcService`) uses primary constructors. `docs/style-guides/CSharpStyleGuide.md` is authoritative for gateway code.
**Recommendation:** Add the needed `using` directives, drop the inline fully-qualified names, and convert to a primary constructor for consistency.
**Resolution:** _(open)_
### Server-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `CLAUDE.md` (Authentication section and `apikey create` example) |
| Status | Open |
**Description:** CLAUDE.md describes scopes as `session`, `invoke`, `event`, `metadata`, `admin` and shows `apikey create --scopes session,invoke,event,metadata,admin`. The actual canonical scope strings (used by `GatewayScopes`, `GatewayGrpcScopeResolver`, and `docs/Authorization.md`) are `session:open`, `session:close`, `invoke:read`, `invoke:write`, `invoke:secure`, `events:read`, `metadata:read`, `admin`. A key created per the CLAUDE.md example carries scopes the resolver never matches.
**Recommendation:** Update CLAUDE.md's scope list and the `apikey` example to the canonical `*:*` scope strings, per CLAUDE.md's own rule that docs change with the code.
**Resolution:** _(open)_
### Server-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs`, `src/MxGateway.Tests/Gateway/GatewayApplicationTests.cs` |
| Status | Open |
**Description:** `DashboardAuthorizationHandler` is unit-tested in isolation, but no test exercises the dashboard routes end-to-end to confirm the policy is actually enforced — which is why Server-001 (policy registered but never wired) went uncaught. There are also no tests for `WorkerExecutableValidator` (PE-header architecture parsing), `GalaxyGlobMatcher` (anchoring/escaping/empty-glob fail-open), or `GalaxyHierarchyProjector` pagination/page-token behavior.
**Recommendation:** Add a `WebApplicationFactory` integration test that requests a dashboard page unauthenticated and asserts the redirect/401, plus unit tests for `WorkerExecutableValidator`, `GalaxyGlobMatcher`, and projector paging.
**Resolution:** _(open)_
### Server-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:162-171,191-198,206-214,229-237` |
| Status | Open |
**Description:** The XML `<remarks>` and inline comments on `AcknowledgeAlarm` and `QueryActiveAlarms` describe the alarm path as not yet wired and say `NotWiredAlarmRpcDispatcher` is the default ("Clients calling this method today receive an OK reply with a 'worker alarm path not yet wired' diagnostic", "an empty stream until PR A.2"). In fact `SessionServiceCollectionExtensions.AddGatewaySessions` registers `WorkerAlarmRpcDispatcher` as `IAlarmRpcDispatcher`, so DI always injects the production dispatcher; `NotWiredAlarmRpcDispatcher` is only the null fallback. The comments are stale and misleading.
**Recommendation:** Update the `AcknowledgeAlarm`/`QueryActiveAlarms` remarks to reflect that `WorkerAlarmRpcDispatcher` is the wired default, and describe its actual GUID-vs-`Provider!Group.Tag` handling.
**Resolution:** _(open)_
+211
View File
@@ -0,0 +1,211 @@
# Code Review — Tests
| Field | Value |
|---|---|
| Module | `src/MxGateway.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 6 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: Tests-001 (`FakeSessionManager.TryGetSession` always returns true), Tests-011 (unobserved worker task). |
| 2 | mxaccessgw conventions | FakeWorkerHarness used per docs; no real secrets; minor style drift in three alarm-test files (Tests-008). |
| 3 | Concurrency & thread safety | Issues found: Tests-006 (`Task.Delay`-based timing), Tests-012 (no parallelism guard for `WebApplication` tests). |
| 4 | Error handling & resilience | Strong — timeouts, faults, overflow, kill paths, protocol violations all exercised. No issues found. |
| 5 | Security | Issues found: Tests-002 (no SQL-injection coverage of Galaxy RPCs), Tests-010 (anonymous-localhost negative cases untested). |
| 6 | Performance & resource management | Issue found: Tests-003 (temp DB/worker directories never cleaned up). |
| 7 | Design-document adherence | Tests match `docs/GatewayTesting.md`; no drift found. No issues found. |
| 8 | Code organization & conventions | Issue found: Tests-007 (`TestServerCallContext` copy-pasted into 4+ files). |
| 9 | Testing coverage | Issues found: Tests-001, Tests-004 (no end-to-end interceptor+service test), Tests-005 (no worker-crash-mid-command coverage), Tests-002. |
| 10 | Documentation & comments | Issue found: Tests-009 (stale/mismatched XML `<summary>` comments). |
## Findings
### Tests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:483-489` |
| Status | Resolved |
**Description:** `FakeSessionManager.TryGetSession` unconditionally returns `true` and synthesizes a session for any id. As a result, `Invoke_WhenSessionMissing_ThrowsNotFound` (line 52) only passes because `InvokeException` is pre-seeded — it does not verify that the gateway service maps a genuinely missing session to `NotFound`. No test exercises the real gateway path where `TryGetSession` returns `false` (for `StreamEvents`, `CloseSession`, alarm RPCs). A regression dropping the missing-session check would not be caught.
**Recommendation:** Make `FakeSessionManager.TryGetSession` return `false` for unknown ids (return only seeded sessions), then assert `NotFound`/`InvalidArgument` is produced by the service's own lookup logic rather than an injected exception.
**Resolution:** Resolved 2026-05-18: confirmed root cause — added `ResolveOnlySeededSessions`/`SeedSession` to `FakeSessionManager` so `TryGetSession` returns `false` for unseeded ids, rewrote `Invoke_WhenSessionMissing_ThrowsNotFound` to drop the injected `InvokeException` and exercise the service's own `ResolveSession` lookup (asserts `InvokeCount == 0`), and added `Invoke_WhenSessionSeeded_ResolvesAndInvokes`, `AcknowledgeAlarm_WhenSessionMissing_ThrowsNotFound`, and `QueryActiveAlarms_WhenSessionMissing_ThrowsNotFound`.
### Tests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:198-210` |
| Status | Resolved |
**Description:** The Galaxy Repository RPCs browse a SQL Server database (`ZB`). Every test injects a `StubGalaxyHierarchyCache`, so actual SQL query construction, parameterization, and filter/glob translation are never exercised. No test demonstrates that `TagNameGlob`, `RootTagName`, `AlarmFilterPrefix`, etc. are passed as parameters rather than concatenated into SQL. SQL-injection resistance of the Galaxy layer has zero coverage.
**Recommendation:** Add tests for the `GalaxyRepository` query-building layer (against SQLite or an in-memory abstraction, or by asserting parameter objects), covering glob/prefix inputs containing `'`, `%`, `_`, and `;`. At minimum add a unit test over the SQL `LIKE`-pattern escaping helper.
**Re-triage note:** The finding's premise is partly misframed. `GalaxyRepository` issues only four *constant* SQL statements (`HierarchySql`, `AttributesSql`, `SELECT 1`, `SELECT time_of_last_deploy FROM galaxy`) — no `DiscoverHierarchyRequest` field is ever concatenated into SQL, so there is no dynamic SQL-injection surface and no `LIKE`-escaping helper to test. `AlarmFilterPrefix` belongs to the worker alarm path, not the Galaxy SQL layer. All filters (`TagNameGlob`, `RootTagName`, template-chain, category, contained-path) are applied **in memory** by `GalaxyHierarchyProjector`/`GalaxyGlobMatcher` against the cached snapshot. The genuine, testable concern — that adversarial filter strings are treated as opaque literals (no wildcard behaviour, no ReDoS, no exceptions) — remains valid and was previously uncovered. Severity left at High: an unsafe in-memory filter would still be a real security gap.
**Resolution:** Resolved 2026-05-18: added `src/MxGateway.Tests/Galaxy/GalaxyFilterInputSafetyTests.cs` (10 test methods, mostly `[Theory]` over adversarial inputs `'`, `' OR '1'='1`, `'; DROP TABLE gobject;--`, `%`, `_`, `100%_off`, `[abc]`, `Pump'001`) covering `GalaxyGlobMatcher` literal-treatment / `LIKE`-wildcard / pathological-input (ReDoS) behaviour and `GalaxyHierarchyProjector` + `DiscoverHierarchy` RPC handling of adversarial `TagNameGlob`, `RootTagName`, and `TemplateChainContains`. No product bug found — the in-memory filter layer treats all metacharacters as literals; the passing tests resolve the coverage gap.
### Tests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176`, `src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258` |
| Status | Resolved |
**Description:** `CreateTempDatabasePath` creates a fresh directory under `%TEMP%\mxgateway-auth-tests\<guid>` (and `...-cli-tests`) for every test but nothing ever deletes it. `WorkerProcessLauncherTests.TestDirectory` correctly implements `IDisposable` and cleans up; these two do not. SQLite connection pooling can also keep the `.db` handle open after the test. Over many CI runs this leaks temp files and open handles.
**Recommendation:** Wrap the temp directory in an `IDisposable`/`IAsyncDisposable` helper (as `WorkerProcessLauncherTests` does) and call `SqliteConnection.ClearAllPools()` before deletion, or use `Microsoft.Data.Sqlite` in-memory mode where a real file is not needed.
**Resolution:** Resolved 2026-05-18: confirmed root cause — both `CreateTempDatabasePath` helpers created `%TEMP%` directories with no cleanup, and `Microsoft.Data.Sqlite` pools connections by default so the `.db` handle outlives the test. Added a shared `TempDatabaseDirectory` (`src/MxGateway.Tests/Security/Authentication/TempDatabaseDirectory.cs`) `IDisposable` helper that calls `SqliteConnection.ClearAllPools()` and recursively deletes its directory. `SqliteAuthStoreTests` and `ApiKeyAdminCliRunnerTests` now implement `IDisposable`, track every directory created via `CreateTempDatabasePath`, and dispose them after each test. All affected tests still pass.
### Tests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs` |
| Status | Resolved |
**Description:** The authorization interceptor and `MxAccessGatewayService` are each tested in isolation, but no test composes the interceptor in front of the real service to confirm scope enforcement gates real RPCs end-to-end. A wiring mistake — interceptor not registered, or a new RPC added without a scope mapping in `GatewayGrpcScopeResolver` — would pass every existing test. `GatewayGrpcScopeResolverTests` also only checks an enumerated allow-list; it never asserts an unmapped request type fails closed.
**Recommendation:** Add an end-to-end test that runs `OpenSession`/`Invoke` through the interceptor+service composition with insufficient scope and asserts `PermissionDenied`; add a `GatewayGrpcScopeResolver` test asserting an unknown/unmapped request type throws or denies rather than returning a permissive default.
**Resolution:** Resolved 2026-05-18: confirmed the coverage gap. Added three interceptor+service composition tests to `GatewayGrpcAuthorizationInterceptorTests` that run the real `GatewayGrpcAuthorizationInterceptor` continuation into a real `MxAccessGatewayService`: `InterceptorComposedWithService_OpenSessionMissingScope_DeniesBeforeServiceRuns` (asserts `PermissionDenied` and `OpenSessionCount == 0`), `InterceptorComposedWithService_OpenSessionWithScope_RunsServiceWithIdentity` (service runs and observes the interceptor-pushed identity), and `InterceptorComposedWithService_InvokeWriteCommandWithReadScope_DeniesBeforeServiceRuns` (a `Write` command with only `invoke:read` is denied). Added two `GatewayGrpcScopeResolverTests`: `ResolveRequiredScope_UnmappedRequestType_FailsClosedToAdminScope` confirms an unmapped request type resolves to the most-restrictive `Admin` scope (the resolver's `_ => GatewayScopes.Admin` default already fails closed — no product bug), and `ResolveRequiredScope_UnknownInvokeCommandKind_ReturnsInvokeReadScope` confirms an unknown command kind does not silently grant write/admin access.
### Tests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` |
| Status | Resolved |
**Description:** Worker-crash handling is only tested as a clean terminal exception from `ReadEventsAsync` or a pre-set `ShutdownException`. There is no test for a worker that faults mid-command — an `InvokeAsync` in flight when the pipe/worker dies — which is a core fault-handling path of the two-process design. `WorkerClientTests` covers pipe-disconnect faulting the read loop, but not the interaction where a pending `InvokeAsync` task observes the fault and surfaces a meaningful error code.
**Recommendation:** Add a `WorkerClient`/`SessionManager` test that disposes the worker pipe (or emits a `WorkerFault`) while an `InvokeAsync` is pending, and assert the invoke task fails with a `WorkerClientException`/`SessionManagerException` carrying the worker-faulted error code.
**Resolution:** Resolved 2026-05-18: confirmed the coverage gap and confirmed the product path already handles it correctly (`WorkerClient.ReadLoopAsync``SetFaulted``CompletePendingCommands(fault)` fails every pending command with the fault exception). Added two `WorkerClientTests`: `InvokeAsync_WhenPipeDisconnectsMidCommand_FailsPendingInvokeWithPipeDisconnected` (worker reads the command then disposes its pipe side; the pending invoke task fails with `WorkerClientErrorCode.PipeDisconnected`) and `InvokeAsync_WhenWorkerFaultsMidCommand_FailsPendingInvokeWithWorkerFaulted` (worker emits a `WorkerFault` envelope while the invoke is pending; the task fails with `WorkerClientErrorCode.WorkerFaulted`). Both also assert the client transitions to `Faulted`. No product change needed.
### Tests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:76`, `src/MxGateway.Tests/Gateway/Workers/FakeWorkerHarnessTests.cs:122` |
| Status | Resolved |
**Description:** Several tests rely on fixed `Task.Delay` values: `WorkerClientTests.InvokeAsync_WithLateReply…` waits a hard-coded 50 ms after writing a late reply before issuing the second command, and the heartbeat tests use a 20 ms delay to make timestamps strictly increase. On a slow CI agent the 50 ms delay can be insufficient, and `DateTimeOffset.UtcNow` resolution can make the 20 ms heartbeat-advance assertion flaky.
**Recommendation:** Replace fixed delays with the existing `WaitUntilAsync` condition polling, and inject a controllable `TimeProvider` for heartbeat-timestamp comparisons instead of relying on wall-clock advance.
**Re-triage note:** The brief flagged `ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess` as "a real `WorkerClient` fault→kill bug". On inspection it is **not a product bug** — it is a test race. `WorkerClient.SetFaulted` publishes the `Faulted` state under lock *before* calling `KillOwnedProcess`, so the old test's `WaitUntilAsync(() => client.State == Faulted)` could return between those two statements and observe `process.KillCount == 0`. The kill itself always runs synchronously inside `SetFaulted`, and `ShutdownAsync`/`DisposeAsync` re-issue an idempotent kill, so no real consumer relies on "state==Faulted implies process dead". The fix is therefore a test-quality fix (correctly Medium / Concurrency), not a product fix.
**Resolution:** Resolved 2026-05-18: (1) Made `ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess` deterministic — it now `await`s `FakeWorkerProcess.WaitForExitAsync` (the `TaskCompletionSource` completed inside `Kill()`), which completes exactly when the kill runs, eliminating the state-polling race; verified by running it five times in isolation (5/5 pass). (2) Removed the fixed 50 ms `Task.Delay` from `InvokeAsync_WithLateReply_IgnoresLateReplyAndKeepsClientReady` — the stale reply and the second reply are now sent in pipe (FIFO) order, so the read loop discards the stale reply before the second reply with no timing window. (3) Replaced the 20 ms `Task.Delay` heartbeat-advance hacks in `WorkerClientTests.ReadLoop_WhenHeartbeatArrives_UpdatesLastHeartbeatAndWorkerProcess` and `FakeWorkerHarnessTests.SendHeartbeatAsync_UpdatesClientHeartbeatState` with an injected `ManualTimeProvider` advanced by a fixed `TimeSpan`; both tests now assert the exact post-advance timestamp instead of `>` against wall-clock drift.
### Tests-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:682`, `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:324`, `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:460`, `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs:233` |
| Status | Open |
**Description:** A near-identical `TestServerCallContext` implementation is copy-pasted into at least four test files (and `AllowAllConstraintEnforcer` / `TestServerStreamWriter` / `RecordingStreamWriter` into several). Duplication risks the copies drifting and bloats each file.
**Recommendation:** Extract a shared `TestServerCallContext`, `RecordingServerStreamWriter<T>`, and `AllowAllConstraintEnforcer` into a common test-support folder/namespace.
**Resolution:** _(open)_
### Tests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Tests/Gateway/Sessions/WorkerAlarmRpcDispatcherTests.cs:1-9`, `src/MxGateway.Tests/Gateway/Sessions/NotWiredAlarmRpcDispatcherTests.cs:1-3`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerAlarmAutoSubscribeTests.cs:1` |
| Status | Open |
**Description:** The alarm test files diverge from the project's C# style and the rest of the suite: snake_case test method names instead of the PascalCase `Method_Condition_Result` pattern; redundant explicit `using System;`/`System.Threading;` imports despite implicit global usings; and explicit-type `new` instead of target-typed `new()` used elsewhere. There is also a typo in fixture data (`"wnwrap subscribe failed"`).
**Recommendation:** Rename the alarm tests to the house `Method_Condition_Result` convention, drop redundant `System.*` usings, align `new` usage, and fix the `wnwrap` typo.
**Resolution:** _(open)_
### Tests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:36-37,99,365` |
| Status | Open |
**Description:** Several XML `<summary>` comments are copy-paste mismatches: the comment above `OpenSessionAsync_SetsInitialDefaultLease` describes correlation-ID generation; the comment above `GatewaySessionSubscribeBulkAsync_ForwardsOneBulkCommand…` describes lease refresh; the comment above `CloseExpiredLeasesAsync_DoesNotCloseActiveEventSubscriber` describes shutdown closing all sessions. Misleading test docs hinder triage.
**Recommendation:** Correct the `<summary>` text to match each test's actual behavior, or remove the redundant comments since the test names already describe the behavior.
**Resolution:** _(open)_
### Tests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs:26-36` |
| Status | Open |
**Description:** The anonymous-localhost bypass is tested only for the success case (`allowAnonymousLocalhost: true` + loopback succeeds) and the remote-unauthenticated denial. There is no test for the security-critical negatives: anonymous + loopback when `AllowAnonymousLocalhost` is `false` must be denied, and anonymous + non-loopback when the flag is `true` must still be denied (the bypass is scoped strictly to loopback). Those are the misconfiguration cases that would expose the dashboard.
**Recommendation:** Add tests: anonymous + loopback + `allowAnonymousLocalhost: false` → not succeeded; anonymous + non-loopback + `allowAnonymousLocalhost: true` → not succeeded.
**Resolution:** _(open)_
### Tests-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:233-301` |
| Status | Open |
**Description:** `GatewayEndToEndFakeWorkerSmokeTests` correctly stores and awaits `launcher.WorkerTask`, but `SessionWorkerClientFactoryFakeWorkerTests` uses `_ = RunWorkerAsync(...)` with no stored task (lines 152, 184, 220). An unhandled exception in the scripted worker becomes an unobserved `TaskException` that can surface as a process-level failure in an unrelated later test rather than failing the owning test.
**Recommendation:** Store the worker task and either await it during disposal or attach a continuation that fails the test on fault, mirroring `GatewayEndToEndFakeWorkerSmokeTests`.
**Resolution:** _(open)_
### Tests-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Tests/Gateway/Workers/Fakes/FakeWorkerHarness.cs:62`, `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:472` |
| Status | Open |
**Description:** Pipe names are uniquified per test with a GUID (good), but xUnit runs test classes in parallel by default and there is no `xunit.runner.json` or collection configuration. Tests that build a full `WebApplication` bind ephemeral ports (`--urls=http://127.0.0.1:0`, fine) but spin up DI containers and hosted services concurrently. Currently safe, but a future test binding a fixed port would silently collide.
**Recommendation:** Add an `xunit.runner.json` or a collection grouping the `WebApplication`-building tests, and keep the `:0` ephemeral-port convention explicit so future tests do not introduce a fixed-port collision.
**Resolution:** _(open)_
+252
View File
@@ -0,0 +1,252 @@
# Code Review — Worker.Tests
| Field | Value |
|---|---|
| Module | `src/MxGateway.Worker.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 8 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Worker.Tests-010 (weak substring assertion), Worker.Tests-011 (test name overstates what it proves). |
| 2 | mxaccessgw conventions | Tests respect STA-affinity and the WorkerEnvelope frame protocol; naming-convention drift only (Worker.Tests-009). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-003/004/013 (wall-clock and fixed-delay timing assertions). |
| 4 | Error handling & resilience | COMException/HResult, pipe-never-appears, malformed frames, shutdown-during-command, watchdog all covered; queue branch gap (Worker.Tests-015). |
| 5 | Security | No real secrets; redaction explicitly tested. No issues found. |
| 6 | Performance & resource management | Issues found: Worker.Tests-005 (`MemoryStream` not disposed), Worker.Tests-006 (`MxAccessStaSession` leak on assertion failure). |
| 7 | Design-document adherence | Tests match `docs/Worker*.md`; `docs/WorkerFrameProtocol.md` is stale (Worker.Tests-007). |
| 8 | Code organization & conventions | Issues found: Worker.Tests-009 (two naming conventions), Worker.Tests-014 (duplicated test doubles). |
| 9 | Testing coverage | Issues found: Worker.Tests-001 (`StaMessagePump` untested), Worker.Tests-002 (COM-event delivery untested), Worker.Tests-012 (frame-validation gaps). |
| 10 | Documentation & comments | Issues found: Worker.Tests-008 (misplaced redaction test), Worker.Tests-011 (misleading test name). |
## Findings
### Worker.Tests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Sta/` (no `StaMessagePumpTests.cs`) |
| Status | Resolved |
**Description:** `StaMessagePump` — whose entire reason for existing is pumping Windows messages so MXAccess COM event sink calls deliver onto the STA — has no direct unit test. `WaitForWorkOrMessages` (timeout conversion, the `MsgWaitForMultipleObjectsEx` failure path) and `PumpPendingMessages` (drain count) are exercised only indirectly via `StaRuntime`, which never asserts the pump returns/throws correctly. The `MsgWaitFailed` error branch and `ToTimeoutMilliseconds` edge cases (`InfiniteTimeSpan`, `<= Zero`, `>= uint.MaxValue`) are completely uncovered.
**Recommendation:** Add `StaMessagePumpTests` that post a Windows message to the STA thread and assert `PumpPendingMessages` returns the expected count; cover `WaitForWorkOrMessages` waking on a signaled event vs timeout; cover `ToTimeoutMilliseconds` boundaries through an internals-visible seam.
**Resolution:** 2026-05-18 — Added `src/MxGateway.Worker.Tests/Sta/StaMessagePumpTests.cs` (8 `[Fact]` tests, run on dedicated STA threads). Covers `WaitForWorkOrMessages` null-argument validation, returning immediately when the wake event is pre-signalled, waking when the event is signalled mid-wait, returning on timeout when never signalled, the `TimeSpan.Zero` (`<= Zero`) conversion branch, and waking on a `WM_NULL` Windows message posted to the STA thread (the `QS_ALLINPUT` path). `PumpPendingMessages` is covered for both an empty queue (returns 0) and three posted messages (returns 3). Boundary noted in the file: the `MsgWaitFailed` branch is not exercised because forcing `MsgWaitForMultipleObjectsEx` to fail needs a deliberately invalid native handle, which is unsafe to construct in-process; `ToTimeoutMilliseconds` is `private static` and is covered indirectly through wait-latency assertions rather than reflection.
### Worker.Tests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs`, `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs` |
| Status | Resolved |
**Description:** No test verifies that a COM event raised on the STA thread is converted to protobuf and lands in the `MxAccessEventQueue`. `MxAccessEventMapperTests` exercises the mapper directly with hand-built fakes, and `AlarmDispatcherTests` covers the alarm sink, but the non-alarm COM-event path (`MxAccessBaseEventSink`/`MxAccessComServer` event handlers → `MxAccessEventMapper` → queue, triggered by an actual sink callback) is never end-to-end tested. Given the worker's core purpose is to convert COM events to protobuf, this is a significant gap.
**Recommendation:** Add a test that invokes the base event sink's data-change handler (via an internal seam or a fake COM event source) and asserts a converted `WorkerEvent` with correct family/sequence appears in the queue.
**Resolution:** 2026-05-18 — Added `src/MxGateway.Worker.Tests/MxAccess/MxAccessBaseEventSinkTests.cs` (5 `[Fact]` tests). The four `MxAccessBaseEventSink` COM event handlers (`OnDataChange`, `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange`) — the exact delegate targets the MXAccess COM runtime invokes — were widened from `private` to `internal` (with XML-doc notes that this is a unit-test seam), and `[assembly: InternalsVisibleTo("MxGateway.Worker.Tests")]` was added to `MxGateway.Worker.csproj`. The tests construct a real `MxAccessBaseEventSink` over a real `MxAccessEventMapper` and `MxAccessEventQueue`, invoke each handler with COM-style arguments, and assert a correctly-converted protobuf `WorkerEvent` (family, body case, server/item handle, value, quality, source timestamp, monotonic `WorkerSequence`) lands in the queue. Boundary noted in the file: the COM `+=` wire-up in `Attach`/`Detach` casts to the sealed `LMXProxyServerClass` RCW and cannot run without a live MXAccess COM object, so it is not exercised; invoking the handlers directly reproduces an STA-thread COM callback and exercises the genuine conversion + enqueue path.
### Worker.Tests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/Sta/StaRuntimeTests.cs:46-48` |
| Status | Resolved |
**Description:** `InvokeAsync_WakesIdlePumpForQueuedCommand` asserts `stopwatch.Elapsed < TimeSpan.FromSeconds(2)` — a wall-clock assertion that on a loaded CI agent can exceed 2s, producing a false failure. The test also does not actually prove the wake event (vs the 50 ms idle pump) caused the dispatch.
**Recommendation:** Remove the wall-clock assertion (the awaited result already proves the command ran), or raise the budget substantially with a comment that it is a coarse smoke check.
**Resolution:** 2026-05-18 — Removed the `Stopwatch` and the `stopwatch.Elapsed < TimeSpan.FromSeconds(2)` wall-clock assertion from `InvokeAsync_WakesIdlePumpForQueuedCommand`. The test already constructs the `StaRuntime` with a 30-second idle pump period, so the awaited `InvokeAsync` completing at all proves the command wake event — not the idle pump tick — drove the dispatch; no timing budget is needed. The XML-doc comment now states this explicitly. The now-unused `using System.Diagnostics;` was removed (`TreatWarningsAsErrors`).
### Worker.Tests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:281-329` |
| Status | Resolved |
**Description:** `StartAsync_WithAlarmCommandHandlerFactory_PollOnceCalledViaSta` and `Dispose_StopsAlarmPollLoop` use poll-until loops, and `Dispose_StopsAlarmPollLoop` additionally does `await Task.Delay(1000)` then asserts `PollCount` is unchanged. The 1s "no further polls" window is a timing race: a poll scheduled just before disposal could increment the counter afterward, and a slow agent could simply not run a poll in the window even without correct stop logic.
**Recommendation:** Make the poll loop deterministically observable — expose a "poll loop stopped" signal or have `Dispose` join the poll task — then assert on that rather than on elapsed-time silence.
**Resolution:** 2026-05-18 — `MxAccessStaSession.Dispose` now joins the alarm poll task (`pollTaskToJoin.Wait(TimeSpan.FromSeconds(5))`) after cancelling the poll CTS, instead of setting `alarmPollTask = null` and discarding it. Once `Dispose` returns, the poll loop has provably exited and no `PollOnce` call can still be in flight. `Dispose_StopsAlarmPollLoop` was rewritten to drop the `await Task.Delay(1000)` "no further polls" window: it now captures `PollCount` immediately after `Dispose()` returns and re-asserts equality after a bare `await Task.Yield()` — a deterministic frozen-count check rather than an elapsed-time race. The success-direction poll-until loop in `PollOnceCalledViaSta` was left as-is: waiting for an event to *occur* is sound; only waiting for an event to *not* occur is the race, and that pattern is now eliminated. Note: `ShutdownGracefullyAsync` already joined the poll task, so this change makes `Dispose` consistent with the graceful path.
### Worker.Tests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105`, `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31` |
| Status | Resolved |
**Description:** `MemoryStream` instances are created and never disposed across the frame-protocol and pipe-session tests (`MemoryStream stream = new();` with no `using`). Disposal is cheap so impact is low, but it is inconsistent with the rest of the suite (which carefully `using`s `CancellationTokenSource`, `StaRuntime`, `PipePair`). `WorkerFrameWriter`/`WorkerFrameReader` are also constructed without disposal.
**Recommendation:** Wrap `MemoryStream` (and reader/writer if they are `IDisposable`) in `using` declarations for consistency.
**Resolution:** 2026-05-18 — All six `MemoryStream` test-body declarations in `WorkerFrameProtocolTests.cs` and the five `inbound`/`outbound` `MemoryStream` declarations in the `WorkerPipeSessionTests.cs` handshake tests were converted to `using` declarations, matching how the rest of the suite handles `CancellationTokenSource`/`StaRuntime`/`PipePair`. Re-triage of the parenthetical: `WorkerFrameWriter` and `WorkerFrameReader` are **not** `IDisposable` (`sealed class` with no `IDisposable` and no `Dispose` member — verified in `src/MxGateway.Worker/Ipc/`), so the finding's "reader/writer if they are `IDisposable`" suggestion does not apply and no change was made there. The shared `MemoryStream` instances inside the `WorkerPipeSessionTests` harness/helper classes (`ReadWrittenFrames` parameter, the `PipePair`/harness fields) are out of the cited line scope and were left untouched.
### Worker.Tests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:282,305,315,323` |
| Status | Resolved |
**Description:** `Dispose_StopsAlarmPollLoop` constructs `MxAccessStaSession session` without `using` (unlike every sibling test) and relies on an explicit `session.Dispose()`. If an assertion between `StartAsync` and `Dispose()` throws, the session — its STA thread and poll loop — leaks for the rest of the run. The `StaRuntime` is `using`d so the thread is eventually reclaimed, but the alarm poll loop and handler are not.
**Recommendation:** Use `using MxAccessStaSession session = ...` and drop the manual `Dispose()`, or wrap the body in try/finally.
**Resolution:** 2026-05-18 — `Dispose_StopsAlarmPollLoop` now declares its `MxAccessStaSession` with a `using` declaration. The manual `session.Dispose()` is kept because the test's purpose is to observe poll behaviour across disposal — but `MxAccessStaSession.Dispose` is idempotent (guarded by the `disposed` field), so the explicit mid-test call and the `using`-scope call do not conflict. An assertion thrown anywhere in the body now still tears the session (STA poll loop + alarm handler) down. The cited line numbers in the finding were imprecise — they straddle `PollOnceCalledViaSta` and `Dispose_StopsAlarmPollLoop` — but the described root cause (one `MxAccessStaSession` constructed without `using`) was singular and is the one in `Dispose_StopsAlarmPollLoop`; the sibling tests `PollOnceCalledViaSta` and `RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue` already used `using` and needed no change.
### Worker.Tests-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `docs/WorkerFrameProtocol.md:38-49` |
| Status | Resolved |
**Description:** `docs/WorkerFrameProtocol.md` instructs running `dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests` and states the frame protocol "is part of `MxGateway.Server`". The frame protocol actually lives in `MxGateway.Worker.Ipc` and is tested by `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs`. The doc's verification command points at the wrong project and build, so anyone following it after changing the worker frame protocol will not run the relevant tests.
**Recommendation:** Update `docs/WorkerFrameProtocol.md` to reference `src/MxGateway.Worker.Tests` and the x86 worker build (`-p:Platform=x86`).
**Resolution:** 2026-05-18 — Rewrote the `## Verification` section of `docs/WorkerFrameProtocol.md`. The test command now targets `src/MxGateway.Worker.Tests/MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter WorkerFrameProtocolTests`; the build command now targets `src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86`. The prose now states the frame protocol lives in `MxGateway.Worker.Ipc` (naming `WorkerFrameReader`/`WorkerFrameWriter`/`WorkerFrameProtocolOptions` and the `WorkerFrameProtocolTests.cs` test file) and notes the worker is an x86 process. Verified against the source: the frame-protocol types are confirmed under `src/MxGateway.Worker/Ipc/` and the tests under `src/MxGateway.Worker.Tests/Ipc/`, so the original doc was wrong on both project and component. Fenced code blocks were also relabelled `powershell` (the build/test commands are run from PowerShell on this Windows dev box).
### Worker.Tests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Conversion/VariantConverterTests.cs:175-182` |
| Status | Open |
**Description:** `Redactor_WithCredentialBearingValueFields_RedactsBeforeLogging` lives in `VariantConverterTests` but asserts on `WorkerLogRedactor.RedactValue`, which has nothing to do with `VariantConverter`. It is also a near-duplicate of coverage in `WorkerLogRedactorTests`. Placing redaction coverage inside the variant-converter class is misleading.
**Recommendation:** Move this test into `Bootstrap/WorkerLogRedactorTests.cs` (which already exists and tests `RedactFields`).
**Resolution:** _(open)_
### Worker.Tests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs`, `AlarmDispatcherTests.cs`, `AlarmCommandExecutorTests.cs`, `AlarmRecordTransitionMapperTests.cs`, `WnWrapAlarmConsumerXmlTests.cs` |
| Status | Open |
**Description:** The alarm-related test files use `snake_case` method names while the rest of the project uses the `Method_State_Result` PascalCase convention. `docs/style-guides/CSharpStyleGuide.md` and the surrounding code establish PascalCase as the project convention; the alarm files diverge.
**Recommendation:** Rename alarm-test methods to the `Method_Scenario_Expectation` PascalCase form for one consistent convention.
**Resolution:** _(open)_
### Worker.Tests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:230-258` |
| Status | Open |
**Description:** `StartAsync_WithoutAlarmCommandHandlerFactory_SubscribeAlarmsReturnsInvalidRequest` asserts `Assert.Contains("alarm", reply.DiagnosticMessage, StringComparison.OrdinalIgnoreCase)`. The XML doc claims it verifies the diagnostic says "alarm consumer not configured", but the assertion only checks the substring "alarm" — which would also match an unrelated message like "invalid alarm GUID". The assertion is weaker than the documented intent.
**Recommendation:** Assert the full diagnostic phrase so the test fails if the diagnostic regresses to a misleading message.
**Resolution:** _(open)_
### Worker.Tests-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Sta/StaCommandDispatcherTests.cs:92-112` |
| Status | Open |
**Description:** `DispatchAsync_WhenCanceledAfterExecutionStarts_StillReturnsLateReply` is named and documented as if it proves cancellation arrived after execution began. The test does `Started.Wait(...)` then `cancellation.Cancel()`, which proves execution started, but because the executor is already running on the STA the cancellation is inherently a no-op — the test cannot distinguish "cancel was observed and ignored" from "cancel was never checked". The name overstates what is proven.
**Recommendation:** Either tighten the test (assert the dispatcher's cancel path was reached and declined) or rename/comment it to "cancellation cannot abort an in-flight STA command", matching `gateway.md`'s stated behavior.
**Resolution:** _(open)_
### Worker.Tests-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs` |
| Status | Open |
**Description:** `docs/WorkerFrameProtocol.md` states the reader "rejects zero-length payloads and payloads larger than the configured maximum (default 16 MiB) before allocating the payload buffer." `WorkerFrameProtocolTests` covers malformed-length, wrong protocol version, wrong session, and malformed payload, but has no test for the zero-length-payload rejection or the oversized-frame rejection — both explicit security-relevant input-validation paths.
**Recommendation:** Add tests feeding a frame with `payload_length == 0` and one with `payload_length` above the configured maximum, asserting the corresponding `WorkerFrameProtocolErrorCode`.
**Resolution:** _(open)_
### Worker.Tests-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:539-546` |
| Status | Open |
**Description:** `ThrowIfCompletedAsync` does an unconditional `await Task.Delay(TimeSpan.FromMilliseconds(100))` then checks `task.IsCompleted`. This adds a fixed 100 ms to the test and only catches a `RunAsync` that fails within that arbitrary window; a session that faults after 100 ms slips past undetected.
**Recommendation:** Replace with a deterministic race: `await Task.WhenAny(runTask, <first-expected-frame-read>)` and assert the run task did not win.
**Resolution:** _(open)_
### Worker.Tests-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeClientTests.cs:194`, `WorkerPipeSessionTests.cs:622`, `Sta/StaCommandDispatcherTests.cs:348`, `MxAccess/MxAccessStaSessionTests.cs:334`, `MxAccess/MxAccessCommandExecutorTests.cs:1124` |
| Status | Open |
**Description:** `FakeRuntimeSession`, `NoopComApartmentInitializer`, `NoopEventSink`/`NullEventSink`, and the `CreateFrame`/`WriteUInt32LittleEndian` helpers are re-implemented independently in multiple test files. The two `FakeRuntimeSession` implementations have already diverged (one supports `BlockDispatch`/event enqueue, one does not), and `NoopComApartmentInitializer` is defined four times.
**Recommendation:** Extract shared test doubles (`NoopComApartmentInitializer`, frame helpers, a single configurable `FakeRuntimeSession`) into a `TestSupport` folder/namespace consumed by all test classes.
**Resolution:** _(open)_
### Worker.Tests-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventQueueTests.cs` |
| Status | Open |
**Description:** `MxAccessEventQueueTests` covers monotonic sequencing, drain, capacity overflow, and first-fault-wins, but does not cover `Drain` with `maxEvents: 0` (drain-all) — a branch `FakeRuntimeSession.DrainEvents` even special-cases — nor draining an empty queue, nor enqueue after a manual `RecordFault`. These are minor branches but the overflow/fault interaction is the worker's backpressure contract.
**Recommendation:** Add a `Drain(0)` drain-all test and an empty-queue drain test.
**Resolution:** _(open)_
+256
View File
@@ -0,0 +1,256 @@
# Code Review — Worker
| Field | Value |
|---|---|
| Module | `src/MxGateway.Worker` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 7 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: heartbeat loop sleeps before first beat (Worker-002), `ProcessCommandAsync` state race drops replies (Worker-003), watchdog/heartbeat state inconsistency (Worker-004), double-dispose path (Worker-006), plus Worker-010/011/015. |
| 2 | mxaccessgw conventions | Issue found: Worker-007 (reflection-based COM invocation bypasses the typed interface contract). |
| 3 | Concurrency & thread safety | Issues found: Worker-001 (`WnWrapAlarmConsumer` timer fires COM off the STA), Worker-008 (consumer factory STA-affinity not enforced). |
| 4 | Error handling & resilience | Issue found: Worker-005 (`OnPoll` silently swallows all poll failures). |
| 5 | Security | No secret logging (redaction applied); inbound frame validation reasonable. No issues found. |
| 6 | Performance & resource management | Issue found: Worker-009 (per-frame `byte[]` allocations on the hot event path). COM release is correct. |
| 7 | Design-document adherence | Code matches `WorkerSta.md`/`WorkerFrameProtocol.md`; stale alarm-path docs (Worker-012). |
| 8 | Code organization & conventions | Issue found: Worker-014 (`AlarmCommandHandler.cs` declares two public types in one file). |
| 9 | Testing coverage | Issue found: Worker-013 (`StaMessagePump` has no direct tests; poll-loop lifecycle untested). |
| 10 | Documentation & comments | Issue found: Worker-012 (stale "future PR / A.3" comments now describe shipped code). |
## Findings
### Worker-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:204-207` |
| Status | Resolved |
**Description:** When constructed with `pollIntervalMilliseconds > 0`, `Subscribe` starts a `System.Threading.Timer` whose `OnPoll` callback runs `PollOnce()` — which calls `wwAlarmConsumerClass.GetXmlCurrentAlarms2` — on a thread-pool thread. The wnwrap CLSID is registered `ThreadingModel=Apartment`; calling its methods off the owning STA violates the hard rule that all COM calls happen on the dedicated STA thread, and can deadlock on cross-apartment marshaling when the STA is not pumping. The production path (default constructor, interval 0) is safe, but the public 3-arg constructor leaves this footgun callable, and tests/live-smoke use it.
**Recommendation:** Remove the internal `Timer` entirely (production already drives `PollOnce` from the STA), or document and gate it so it can only be used from an STA thread. At minimum, make the timer-driven mode unreachable from any production wiring.
**Resolution:** 2026-05-18 — Removed the off-STA timer infrastructure from `WnWrapAlarmConsumer`: the `Timer? pollTimer` and `pollIntervalMs` fields, the `DefaultPollIntervalMilliseconds` constant, the `OnPoll` callback, the timer-arming arm in `Subscribe`, and the timer disposal block in `Dispose`. The `pollIntervalMilliseconds` parameter is gone from both public constructors (the test-seam ctor is now 2-arg: `wwAlarmConsumerClass` + `maxAlarmsPerFetch`), so the off-STA footgun is structurally unreachable. `PollOnce()` remains the public STA-driven entry point. The stale "poll … on a timer below" comment was corrected. Verified by the regression tests `WnWrapAlarmConsumer_has_no_internal_timer_field` and `WnWrapAlarmConsumer_exposes_no_poll_interval_constructor_parameter`; the `AlarmsLiveSmokeTests` call site was updated to the 2-arg constructor.
### Worker-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:545-549` |
| Status | Resolved |
**Description:** `RunHeartbeatLoopAsync` calls `await Task.Delay(_sessionOptions.HeartbeatInterval, ...)` before sending the first heartbeat. The gateway therefore receives no heartbeat for the first full interval (default 5s) after the worker reaches `Ready`. If the gateway's liveness watchdog expects a heartbeat sooner, a healthy worker can be misclassified as hung at startup.
**Recommendation:** Send an initial heartbeat immediately on entering the loop, or move the `Task.Delay` to the end of the loop body.
**Resolution:** 2026-05-18 — Restructured `RunHeartbeatLoopAsync` so the `Task.Delay(HeartbeatInterval)` is applied between beats only, not before the first. A `firstBeat` guard skips the delay on the initial iteration, so the gateway sees a heartbeat as soon as the worker is `Ready`; cancellation behavior is preserved (the loop still observes the token and the delay still throws on cancellation). Verified by the regression test `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop`. Three pre-existing tests (`WorkerPipeClientTests.RunAsync_ConnectsToPipeAndCompletesHandshake`, `WorkerPipeClientTests.RunAsync_RetriesUntilPipeServerAppears`, `WorkerPipeSessionTests.RunAsync_WhenCommandThrowsAfterShutdown_DropsLateFaultAndWritesShutdownAck`) assumed strict frame ordering and were updated to skip the now-interleaved first heartbeat while still asserting the same shutdown-ack behavior.
### Worker-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:399-403`, `:416-419` |
| Status | Resolved |
**Description:** `ProcessCommandAsync` checks `_state` after `DispatchAsync` completes and silently `return`s without writing a `WorkerCommandReply` (or fault) when `_state` is not `Ready`/`ExecutingCommand`. `_state` is a plain field mutated from multiple tasks (heartbeat loop, event-drain loop, shutdown). A command that completes successfully while `_state` has transitioned will have its reply dropped with no diagnostic, and the gateway's correlation-id wait then hangs until its own timeout. The `_state` read is also not synchronized.
**Recommendation:** Always attempt to write the reply/fault for an in-flight command, or explicitly reject in-flight commands with a `Canceled`/`WorkerUnavailable` reply during state transitions. Make `_state` access thread-safe (volatile or locked).
**Resolution:** 2026-05-18 — Both silent-drop `return` sites in `ProcessCommandAsync` (the post-`DispatchAsync` success path and the exception path) now call a new `LogCommandResultDropped` helper before returning. The helper logs an Information event named `WorkerCommandResultDropped` via the session's `IWorkerLogger`, carrying the command's `correlation_id` plus `command_method` and `worker_state`, so a stuck gateway correlation-id wait is now traceable. The `_state` field was made `volatile` (`WorkerState` is an int-backed protobuf enum, so volatile is valid) so cross-thread reads observe the latest value without tearing; this is a low-risk, non-behavioral change and did not destabilize any test. Verified by the regression test `RunAsync_WhenReplyIsDroppedAfterShutdown_LogsDiagnostic`.
### Worker-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` |
| Status | Resolved |
**Description:** After `ReportWatchdogFaultIfNeededAsync` sends an `StaHung` fault, the heartbeat loop continues sending normal heartbeats with `State` derived from `_state`, which the watchdog path never sets to `Faulted`. The heartbeat then keeps reporting a non-faulted state that contradicts the fault just sent.
**Recommendation:** Set `_state = WorkerState.Faulted` (thread-safely) when the watchdog fault fires so heartbeat state and fault stay consistent.
**Resolution:** 2026-05-18 — `ReportWatchdogFaultIfNeededAsync` now sets `_state = WorkerState.Faulted` immediately after `_watchdogFaultSent = true` and before the `StaHung` fault is written, so the next heartbeat reports `Faulted` instead of contradicting the fault. `_state` is already `volatile` (Worker-003), so the cross-thread write from the heartbeat loop is observed correctly by the heartbeat's own `CreateHeartbeat` read; no further locking is required. Verified by the regression test `WorkerPipeSessionTests.RunAsync_AfterWatchdogFault_HeartbeatReportsFaultedState`, which uses a stale-activity snapshot with an empty current-command correlation id so the heartbeat `State` is derived from `_state` rather than forced to `ExecutingCommand`.
### Worker-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258` (production alarm poll loop) |
| Status | Resolved |
**Description:** `OnPoll` catches every exception from `PollOnce()` and discards it (`_ = ex;`). The production poll path (`MxAccessStaSession.RunAlarmPollLoopAsync``AlarmCommandHandler.PollOnce``AlarmDispatcher.PollOnce``consumer.PollOnce()`) has no fault recording either. A permanently failing alarm provider (e.g. `GetXmlCurrentAlarms2` returning `E_FAIL`, malformed XML throwing in `XmlDocument.LoadXml`) is therefore completely silent — no fault on the event queue, no log.
**Recommendation:** Route poll failures to `MxAccessEventQueue.RecordFault` (or a logger) so a broken alarm subscription becomes observable. Update the now-stale comment.
**Re-triage:** The cited location `WnWrapAlarmConsumer.cs:297-313` and the `OnPoll` callback no longer exist as of this branch — Worker-001 removed the off-STA `Timer` and its `OnPoll` callback entirely. The substantive concern still held, however: the **production** poll path in `MxAccessStaSession.RunAlarmPollLoopAsync` caught only `OperationCanceledException`, `ObjectDisposedException`, and `InvalidOperationException`. A genuine poll failure (`COMException` from `GetXmlCurrentAlarms2`, a malformed-XML `XmlException`) escaped uncaught, faulted the never-awaited `Task.Run` poll task, and was silently lost — exactly the silent-failure the finding describes. The finding was re-pointed at the live location and fixed there rather than at the removed `OnPoll`.
**Resolution:** 2026-05-18 — `RunAlarmPollLoopAsync` gained a trailing `catch (Exception exception)` arm after the three graceful-stop catches. A real alarm-poll failure is now converted to a `WorkerFault` (category `MxaccessEventConversionFailed`, carrying the exception type and, for a `COMException`, its `HResult`) by the new `CreateAlarmPollFault` helper and recorded on the session's `MxAccessEventQueue` via `RecordFault`. The worker's event-drain loop drains that fault and forwards it to the gateway, so a broken alarm subscription is now observable on the IPC fault path instead of vanishing. The poll loop still stops after the failure (the subscription is dead). No new proto enum value was added — `MxaccessEventConversionFailed` is the closest existing alarm-path category, avoiding a contracts regeneration across all clients. Verified by the regression test `MxAccessStaSessionTests.RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue`.
### Worker-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` |
| Status | Resolved |
**Description:** `RunAsync`'s `finally` calls `_runtimeSession?.Dispose()` unless `_shutdownTimedOut`. On the normal path `ShutdownGracefullyAsync` already disposed the STA runtime, so re-entering `Dispose()` is a harmless no-op only because `ShutdownGracefullyAsync` reached its end and set `disposed = true`. If `ShutdownGracefullyAsync` throws `TimeoutException` after partial teardown with `_shutdownTimedOut` set, the session is never disposed at all — the `finally` skips it — leaking the STA thread and COM object, leaving cleanup to rely solely on process exit.
**Recommendation:** Make the dispose decision explicit and confirm process exit always follows a timed-out shutdown; otherwise dispose defensively. At minimum document why disposal is deliberately skipped on timeout.
**Resolution:** 2026-05-18 — `RunAsync`'s `finally` now always calls `_runtimeSession?.Dispose()`; the `if (!_shutdownTimedOut)` guard and the `_shutdownTimedOut` field (which had become write-only) were removed. `MxAccessStaSession.Dispose` is idempotent (`if (disposed) return`) and bounded — each STA join is capped with `Wait(TimeSpan.FromSeconds(2))` — so re-entering it on the normal path (where `ShutdownGracefullyAsync` already disposed the runtime) is a harmless no-op, while on the timed-out path it is now the only thing that reclaims the STA thread and releases the MXAccess COM object. The previous behaviour leaked both on a shutdown timeout and relied solely on process exit. A code comment in the `finally` block documents the reasoning. Verified by the regression test `WorkerPipeSessionTests.RunAsync_WhenShutdownTimesOut_StillDisposesRuntimeSession`, which forces a `TimeoutException` from `ShutdownGracefullyAsync` and asserts the runtime session is disposed before `RunAsync` rethrows.
### Worker-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150` |
| Status | Resolved |
**Description:** `Invoke` uses late-bound `Type.InvokeMember` reflection as a fallback when the COM object does not cast to `ILMXProxyServer*`. In production the object is always `LMXProxyServerClass`, so the reflection path exists only for test doubles — it is dead/untested code on the production path and obscures the interface contract. `params object[] arguments` also boxes value-type handles on every call.
**Recommendation:** Drop the reflection fallback and require the COM object to implement the interface (tests can supply a typed fake), or clearly mark the fallback as test-only.
**Re-triage:** The finding's claim that the reflection path is "dead/untested code" is partly inaccurate — it was in fact the path exercised by the entire `MxAccessCommandExecutorTests` suite, whose `FakeMxAccessComObject` did not implement any typed interface. So the reflection fallback was test-only but *not* untested. The convention concern (bypassing the typed interface contract, boxing value-type handles) is valid, so the fix follows the recommendation's first option.
**Resolution:** 2026-05-18 — The late-bound `Type.InvokeMember` reflection fallback and its `params object[]`-boxing `Invoke` helper were removed from `MxAccessComServer`. Each adapter method now takes one of two typed paths: an `is IMxAccessServer` fast path (test fakes implement `IMxAccessServer` directly) and the production path that casts to the typed `ILMXProxyServer` / `ILMXProxyServer3` / `ILMXProxyServer4` COM interfaces via new `AsProxyServer*` helpers. A COM object implementing neither now fails fast with a clear `InvalidOperationException` naming the missing interface, instead of an opaque late-bound call. The test seam was migrated accordingly: `MxAccessCommandExecutorTests.FakeMxAccessComObject` now declares `: IMxAccessServer` (its method signatures already matched the interface exactly, so no behavioural change). Verified by the new `MxAccessComServerTests` (typed-server routing, untyped-object rejection, original-exception propagation — no more `TargetInvocationException` wrapping) plus the unchanged, still-passing `MxAccessCommandExecutorTests` suite which now exercises the typed `IMxAccessServer` path.
### Worker-008
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249`, `:429-447` |
| Status | Resolved |
**Description:** `RunAlarmPollLoopAsync` correctly marshals `handler.PollOnce()` onto the STA via `staRuntime.InvokeAsync`, and the cancel/await/dispose ordering in `ShutdownGracefullyAsync` is sound. However, nothing enforces that the `consumerFactory` and all `IMxAccessAlarmConsumer` calls run on the STA thread; a future caller could break STA affinity silently.
**Recommendation:** Add an assertion or documented invariant that the consumer factory and all `IMxAccessAlarmConsumer` calls run on the STA thread, mirroring the existing `MxAccessSession.CreationThreadId` pattern.
**Resolution:** 2026-05-18 — `MxAccessStaSession` now records the STA thread id (`alarmConsumerThreadId`) at the point the alarm-command-handler factory is invoked — which already runs inside `staRuntime.InvokeAsync` during `StartAsync`, mirroring the `MxAccessSession.CreationThreadId` capture. `RunAlarmPollLoopAsync`'s marshalled poll lambda now calls `EnsureOnAlarmConsumerThread()` before `handler.PollOnce()`, asserting the poll runs on the recorded STA thread. The check is delegated to a new `internal static` guard `AssertOnAlarmConsumerThread(int? expected, int actual)` that throws a descriptive `InvalidOperationException` on an affinity violation and is a no-op when the consumer thread is unrecorded (no alarm handler configured). Making the guard `static` and `internal` keeps it directly unit-testable. The STA-affinity invariant is documented in the guard's XML doc. Verified by the regression tests `MxAccessStaSessionTests.AssertOnAlarmConsumerThread_WhenOffOwningThread_Throws` and `AssertOnAlarmConsumerThread_OnOwningThreadOrUnset_DoesNotThrow`.
### Worker-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:31,49`, `src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:57-58` |
| Status | Open |
**Description:** Every frame read allocates a fresh 4-byte length buffer and a payload `byte[]`; every write allocates `ToByteArray()` plus a 4-byte prefix. On the hot event-drain path (batches of up to 128 `WorkerEvent` frames every 25 ms) this produces steady gen-0 garbage. `WorkerFrameWriter` also effectively serializes twice (`CalculateSize()` then `ToByteArray()`).
**Recommendation:** Reuse a pooled buffer / `ArrayPool<byte>` for the length prefix and payload, and write directly into a pooled buffer using `CodedOutputStream`. Low priority unless event throughput is high.
**Resolution:** _(open)_
### Worker-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Conversion/VariantConverter.cs:204-226` |
| Status | Open |
**Description:** `ConvertInt64Scalar` is reached for `TypeCode.UInt32` and `TypeCode.Int64`. For a `uint` with `expectedDataType == MxDataType.Time`, the value is treated as a Windows `FILETIME` via `DateTime.FromFileTimeUtc(longValue)`; a 32-bit FILETIME is never a valid full FILETIME, so this silently produces a near-epoch timestamp rather than a raw/diagnostic value. Unlikely in practice but a silent misconversion.
**Recommendation:** Only apply the `MxDataType.Time` FILETIME projection for 64-bit source types; for `uint` fall through to integer or raw.
**Resolution:** _(open)_
### Worker-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeClient.cs:169-171` |
| Status | Open |
**Description:** `retryAttempts` is computed as `(connectTimeout / min(connectTimeout, attemptTimeout)) - 1`. With defaults (30000 / 2000) this yields 14 retries, but each retry also incurs Polly exponential backoff. The overall `connectDeadline` (`CancelAfter(connectTimeout)`) is the real bound, so the computed attempt count can be larger or smaller than the time budget allows, and the formula is opaque.
**Recommendation:** Drive retries purely off the `connectDeadline` token (Polly stops when cancelled) and drop the fragile attempt-count arithmetic, or add a comment explaining the intent.
**Resolution:** _(open)_
### Worker-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs:44-55`, `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:38-43`, `src/MxGateway.Worker/MxAccess/MxAccessEventMapper.cs:106-112` |
| Status | Open |
**Description:** Multiple comments describe the alarm path as not-yet-wired future work ("PR A.2 — COM-side subscription scaffold … the worker advertises no alarm subscription", "the worker bootstrap will gain a thin 'run-on-STA' wrapper as part of A.3"). As of commit 6c64030 the alarm command handler, STA poll loop, and `SubscribeAlarms`/`AcknowledgeAlarm`/`QueryActiveAlarms` are all wired. These comments are stale and misleading.
**Recommendation:** Update the XML docs/comments to describe the shipped behavior; remove the "future PR" framing.
**Resolution:** _(open)_
### Worker-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker/Sta/StaMessagePump.cs` |
| Status | Open |
**Description:** `StaMessagePump` — the heart of COM event delivery (`MsgWaitForMultipleObjectsEx` + `PeekMessage`/`DispatchMessage`) — has no direct unit tests. `StaRuntimeTests` exercises it indirectly for command wake-up but never verifies that a posted Windows message actually wakes the wait and is dispatched, nor that `PumpPendingMessages` returns a correct count. The alarm poll-loop lifecycle in `MxAccessStaSession` (start/cancel/await on shutdown) also has no test. These are the most failure-sensitive paths in the module.
**Recommendation:** Add tests that post a message to the STA thread and assert it is pumped, and tests covering alarm poll-loop start/stop and shutdown ordering.
**Resolution:** _(open)_
### Worker-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker/MxAccess/AlarmCommandHandler.cs:33`, `:202` |
| Status | Open |
**Description:** The file declares two public types — the `AlarmCommandHandler` class and the `IAlarmCommandHandler` interface. The C# style guide and the rest of the module follow one-public-type-per-file (e.g. interfaces in their own `I*.cs` files like `IMxAccessAlarmConsumer.cs`).
**Recommendation:** Move `IAlarmCommandHandler` to its own `IAlarmCommandHandler.cs` for consistency.
**Resolution:** _(open)_
### Worker-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessEventQueue.cs:115-145` |
| Status | Open |
**Description:** On overflow, `Enqueue` records the overflow fault and throws `MxAccessEventQueueOverflowException`; `MxAccessBaseEventSink.EnqueueEvent` catches it and calls `RecordFault` again. `RecordFault` is a no-op when a fault already exists, so the second call is harmless — but the intent is muddled, and there is no test asserting the dropped-event behavior. This is acceptable per the fail-fast design but undocumented at the call site.
**Recommendation:** Add a brief comment in `EnqueueEvent` clarifying that an overflow exception is expected and already self-records its fault, so the catch is intentionally a near no-op.
**Resolution:** _(open)_
+53
View File
@@ -0,0 +1,53 @@
# Code Review — &lt;Module&gt;
<!-- Template for a per-module findings file. Copy to code-reviews/<Module>/findings.md.
See ../../REVIEW-PROCESS.md for the full process. The base README.md is generated
from these files by regen-readme.py — do not edit README.md by hand. -->
| Field | Value |
|---|---|
| Module | `src/MxGateway.<Module>` |
| Reviewer | <name> |
| Review date | <YYYY-MM-DD> |
| Commit reviewed | `<short-sha>` |
| Status | Not started |
| Open findings | 0 |
## Checklist coverage
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | _pending_ |
| 2 | mxaccessgw conventions | _pending_ |
| 3 | Concurrency & thread safety | _pending_ |
| 4 | Error handling & resilience | _pending_ |
| 5 | Security | _pending_ |
| 6 | Performance & resource management | _pending_ |
| 7 | Design-document adherence | _pending_ |
| 8 | Code organization & conventions | _pending_ |
| 9 | Testing coverage | _pending_ |
| 10 | Documentation & comments | _pending_ |
## Findings
<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
never reused. Findings are never deleted — close them by changing Status and
completing Resolution. -->
### <Module>-001
| Field | Value |
|---|---|
| Severity | Critical / High / Medium / Low |
| Category | one of the 10 checklist categories |
| Location | `path/to/File.cs:NN` |
| Status | Open / In Progress / Resolved / Won't Fix / Deferred |
**Description:** What is wrong and why it matters.
**Recommendation:** Concrete suggested fix.
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
+76
View File
@@ -0,0 +1,76 @@
# Prompt — resolve open code-review findings
Reusable orchestration prompt for clearing the `code-reviews/` backlog. Paste it
to a fresh agent when you want the remaining findings worked through.
---
Resolve all open code-review findings (every severity), following the same
workflow already used to resolve the Critical dashboard finding and the
Client.Rust module (see git commits `a8aafdf`, `0d8a28d`, `9082e50`).
## Setup
- Read `code-reviews/README.md` for the open findings and `REVIEW-PROCESS.md`
for the workflow. Group the open findings by module.
- A module is one folder under `code-reviews/` — a `src/MxGateway.*` project or
a `clients/` language client. The module→source mapping and the per-module
build/test commands are in `CLAUDE.md` (the "Source Update Workflow" table and
the per-client commands).
## Dispatch — one general-purpose subagent per module, in batches of ~5 modules
Each subagent, for every open finding in its assigned module, must:
- Verify the finding's root cause against the actual source. Do NOT trust the
finding text — if it is wrong or misclassified, re-triage it (correct the
severity/description in that module's `findings.md`) instead of forcing a fix.
- Use real TDD: write the regression test FIRST and run it to confirm it fails,
THEN implement the root-cause fix, THEN confirm it passes. (Do not use
`git stash` — parallel agents would race on the shared stash stack.)
- Run that module's full build and test suite with the module-appropriate
toolchain and confirm it is green:
- `src/MxGateway.*` .NET projects — `dotnet build` + `dotnet test` for the
project; the Worker must build x86 (`-p:Platform=x86`).
- `clients/dotnet``dotnet build clients/dotnet/MxGateway.Client.sln` and its tests.
- `clients/go``gofmt`, `go build ./...`, `go test ./...`.
- `clients/rust``cargo fmt`, `cargo test --workspace`,
`cargo clippy --workspace --all-targets -- -D warnings`.
- `clients/python``python -m pytest`.
- `clients/java``gradle test`.
- A regression test for a gateway-server finding belongs in `src/MxGateway.Tests`;
for a worker finding, in `src/MxGateway.Worker.Tests`. Adding a test there is
permitted even though it is a different module's source tree.
- Update only that module's `code-reviews/<Module>/findings.md`: set each
resolved finding's Status to `Resolved` with a Resolution note describing the
fix (the orchestrator appends the fixing commit SHA), and update the header
"Open findings" count.
- CONSTRAINTS: edit only the source and test files needed for the assigned
module's findings, plus that module's own `findings.md`. Do NOT edit
`code-reviews/README.md`. Do NOT commit. Do NOT touch another module's
`findings.md`.
- Report a summary: each finding — root-cause confirmation, the fix, test names,
and any re-triage.
Batch so that no two subagents in the same batch write to the same test project
— e.g. do not run the `Server` and `Contracts` agents together, since both add
regression tests under `src/MxGateway.Tests`.
## After each batch returns (orchestrator does this — keep your own context lean)
- Build and test every component the batch touched, using the `CLAUDE.md`
commands; confirm clean. For any .NET change, `dotnet build src/MxGateway.sln`.
- Commit per module — one commit per module, message referencing the finding
IDs. Record the fixing commit SHA in each finding's Resolution.
- Regenerate the index: `python code-reviews/regen-readme.py`, then
`python code-reviews/regen-readme.py --check` to confirm it is consistent;
stage `code-reviews/README.md`. (Use `python` — the bare `python3` alias on
this box resolves to the Windows Store stub and fails.) You may stage
`README.md` with each module's commit, or commit it once per batch after the
script runs.
- Push.
## Continue
Continue batch by batch until all findings are Resolved or re-triaged. If a
finding needs a design decision, skip it and surface it rather than guessing.
+236
View File
@@ -0,0 +1,236 @@
#!/usr/bin/env python3
"""Regenerate code-reviews/README.md from the per-module findings.md files.
The per-module findings.md files are the source of truth. This script aggregates
them into the single cross-module README.md (module status + pending/closed
finding tables).
Usage:
python code-reviews/regen-readme.py # rewrite README.md
python code-reviews/regen-readme.py --check # exit 1 if stale or inconsistent
`--check` fails when README.md is out of date OR when a module's header
`Open findings` count disagrees with its finding statuses, or a finding
carries an unrecognised Status value.
"""
from __future__ import annotations
import re
import sys
from pathlib import Path
ROOT = Path(__file__).resolve().parent
README = ROOT / "README.md"
PENDING_STATUSES = {"Open", "In Progress"}
KNOWN_STATUSES = {"Open", "In Progress", "Resolved", "Won't Fix", "Deferred"}
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
GENERATED_NOTE = (
"<!-- GENERATED FILE - do not edit by hand. "
"Regenerate with: python code-reviews/regen-readme.py -->"
)
def cell(value: str) -> str:
"""Escape a value for safe inclusion in a markdown table cell."""
return value.replace("|", "\\|").strip()
def summarize(value: str, limit: int = 240) -> str:
"""Trim a long description to a single-cell-friendly summary."""
value = value.strip()
if len(value) <= limit:
return value
return value[: limit - 1].rstrip() + ""
def first_table(text: str) -> dict[str, str]:
"""Parse the first contiguous block of '| key | value |' rows into a dict."""
rows: dict[str, str] = {}
started = False
for line in text.splitlines():
stripped = line.strip()
if stripped.startswith("|"):
started = True
cells = [c.strip() for c in stripped.strip("|").split("|")]
if len(cells) >= 2:
key, value = cells[0], cells[1]
if key and not set(key) <= {"-", ":"} and key != "Field":
rows[key] = value
elif started:
break
return rows
def parse_module(findings_path: Path) -> dict:
"""Parse one module's findings.md into its header and finding list."""
text = findings_path.read_text(encoding="utf-8")
module = findings_path.parent.name
parts = re.split(r"^##\s+Findings\s*$", text, maxsplit=1, flags=re.M)
header = first_table(parts[0])
findings: list[dict] = []
if len(parts) > 1:
for chunk in re.split(r"^###\s+", parts[1], flags=re.M)[1:]:
fid = chunk.splitlines()[0].strip()
tbl = first_table(chunk)
desc_m = re.search(
r"\*\*Description:\*\*\s*(.*?)(?=\n\*\*|\Z)", chunk, re.S
)
desc = re.sub(r"\s+", " ", desc_m.group(1)).strip() if desc_m else ""
findings.append(
{
"id": fid,
"severity": tbl.get("Severity", ""),
"category": tbl.get("Category", ""),
"location": tbl.get("Location", ""),
"status": tbl.get("Status", ""),
"description": desc,
}
)
return {"module": module, "header": header, "findings": findings}
def build_readme(modules: list[dict]) -> str:
modules = sorted(modules, key=lambda m: m["module"])
all_findings = [
dict(f, module=m["module"]) for m in modules for f in m["findings"]
]
pending = [f for f in all_findings if f["status"] in PENDING_STATUSES]
closed = [
f
for f in all_findings
if f["status"] and f["status"] not in PENDING_STATUSES
]
def sev_key(f: dict) -> tuple:
return (SEVERITY_ORDER.get(f["severity"], 9), f["id"])
pending.sort(key=sev_key)
closed.sort(key=sev_key)
out: list[str] = [
"# Code Reviews",
"",
GENERATED_NOTE,
"",
"Cross-module code review index for the `mxaccessgw` codebase. The review "
"process is defined in [../REVIEW-PROCESS.md](../REVIEW-PROCESS.md).",
"",
"Each module's `findings.md` is the source of truth; this file is generated "
"from them by `regen-readme.py` and must not be edited by hand.",
"",
"## Module status",
"",
"| Module | Reviewer | Date | Commit | Status | Open | Total |",
"|---|---|---|---|---|---|---|",
]
for m in modules:
h = m["header"]
open_n = sum(
1 for f in m["findings"] if f["status"] in PENDING_STATUSES
)
out.append(
f"| [{m['module']}]({m['module']}/findings.md) "
f"| {cell(h.get('Reviewer', ''))} "
f"| {cell(h.get('Review date', ''))} "
f"| {cell(h.get('Commit reviewed', ''))} "
f"| {cell(h.get('Status', ''))} "
f"| {open_n} | {len(m['findings'])} |"
)
out += ["", "## Pending findings", ""]
out.append(
"Findings with status `Open` or `In Progress`, ordered by severity."
)
out.append("")
if pending:
out.append("| ID | Severity | Category | Location | Description |")
out.append("|---|---|---|---|---|")
for f in pending:
out.append(
f"| {cell(f['id'])} | {cell(f['severity'])} "
f"| {cell(f['category'])} | {cell(f['location'])} "
f"| {cell(summarize(f['description']))} |"
)
else:
out.append("_No pending findings._")
out += ["", "## Closed findings", ""]
out.append("Findings with status `Resolved`, `Won't Fix`, or `Deferred`.")
out.append("")
if closed:
out.append("| ID | Severity | Status | Category | Location |")
out.append("|---|---|---|---|---|")
for f in closed:
out.append(
f"| {cell(f['id'])} | {cell(f['severity'])} "
f"| {cell(f['status'])} | {cell(f['category'])} "
f"| {cell(f['location'])} |"
)
else:
out.append("_No closed findings._")
return "\n".join(out) + "\n"
def find_inconsistencies(modules: list[dict]) -> list[str]:
"""Return human-readable problems in the per-module findings.md files.
Checks that each module header's `Open findings` count agrees with its
finding statuses, and that every finding carries a known Status value.
"""
issues: list[str] = []
for m in modules:
open_n = sum(
1 for f in m["findings"] if f["status"] in PENDING_STATUSES
)
declared = m["header"].get("Open findings", "").strip()
if declared != str(open_n):
issues.append(
f"{m['module']}: header 'Open findings' = '{declared}' but "
f"{open_n} finding(s) are Open/In Progress"
)
for f in m["findings"]:
if f["status"] not in KNOWN_STATUSES:
issues.append(
f"{m['module']}: finding {f['id']} has unrecognised "
f"Status '{f['status']}'"
)
return issues
def main(argv: list[str]) -> int:
check = "--check" in argv[1:]
module_dirs = sorted(
d
for d in ROOT.iterdir()
if d.is_dir() and d.name != "_template" and (d / "findings.md").is_file()
)
modules = [parse_module(d / "findings.md") for d in module_dirs]
content = build_readme(modules)
issues = find_inconsistencies(modules)
if check:
stale = (
README.read_text(encoding="utf-8") if README.exists() else ""
) != content
for issue in issues:
print(f"inconsistent: {issue}", file=sys.stderr)
if stale:
print(
"code-reviews/README.md is stale - run regen-readme.py",
file=sys.stderr,
)
if stale or issues:
return 1
print("code-reviews/README.md is up to date and consistent.")
return 0
for issue in issues:
print(f"warning: {issue}", file=sys.stderr)
README.write_text(content, encoding="utf-8", newline="\n")
print(f"Wrote {README} ({len(modules)} modules).")
return 0
if __name__ == "__main__":
raise SystemExit(main(sys.argv))
+158
View File
@@ -0,0 +1,158 @@
#!/usr/bin/env python3
"""Tests for regen-readme.py.
Dependency-free: run with `python code-reviews/test_regen_readme.py`.
Exits 0 if all tests pass, 1 otherwise.
"""
from __future__ import annotations
import importlib.util
import tempfile
import traceback
from pathlib import Path
HERE = Path(__file__).resolve().parent
# regen-readme.py is not an importable module name (hyphen), so load it by path.
_spec = importlib.util.spec_from_file_location("regen_readme", HERE / "regen-readme.py")
regen = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(regen)
FIXTURE = """# Code Review — Demo
| Field | Value |
|---|---|
| Module | `src/Demo` |
| Reviewer | Tester |
| Review date | 2026-05-18 |
| Commit reviewed | `abc1234` |
| Status | Reviewed |
| Open findings | 1 |
## Findings
### Demo-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/Demo/File.cs:10` |
| Status | Open |
**Description:** A first problem that matters.
**Recommendation:** Fix it.
**Resolution:** _(open)_
### Demo-002
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Demo/File.cs:20` |
| Status | Resolved |
**Description:** A second, minor problem.
**Recommendation:** Tidy it.
**Resolution:** Fixed in def5678 on 2026-05-18.
"""
def _parse_fixture() -> dict:
"""Write FIXTURE to a temp Demo/findings.md and parse it."""
with tempfile.TemporaryDirectory() as tmp:
path = Path(tmp) / "Demo" / "findings.md"
path.parent.mkdir()
path.write_text(FIXTURE, encoding="utf-8")
return regen.parse_module(path)
def test_first_table_skips_separator_and_field_header():
table = regen.first_table("| Field | Value |\n|---|---|\n| Severity | High |\n")
assert table == {"Severity": "High"}, table
def test_parse_module_header():
m = _parse_fixture()
assert m["module"] == "Demo", m["module"]
assert m["header"]["Reviewer"] == "Tester"
assert m["header"]["Status"] == "Reviewed"
assert m["header"]["Open findings"] == "1"
def test_parse_module_findings():
m = _parse_fixture()
assert len(m["findings"]) == 2, len(m["findings"])
first = m["findings"][0]
assert first["id"] == "Demo-001"
assert first["severity"] == "High"
assert first["category"] == "Security"
assert first["location"] == "`src/Demo/File.cs:10`"
assert first["status"] == "Open"
assert first["description"] == "A first problem that matters."
assert m["findings"][1]["status"] == "Resolved"
def test_build_readme_splits_pending_and_closed():
readme = regen.build_readme([_parse_fixture()])
assert "## Pending findings" in readme
assert "## Closed findings" in readme
pending, closed = readme.split("## Closed findings", 1)
assert "Demo-001" in pending # Open -> pending
assert "Demo-001" not in closed
assert "Demo-002" in closed # Resolved -> closed
assert "_No pending findings._" not in pending
def test_find_inconsistencies_clean_fixture():
assert regen.find_inconsistencies([_parse_fixture()]) == []
def test_find_inconsistencies_detects_wrong_open_count():
m = _parse_fixture()
m["header"]["Open findings"] = "7"
issues = regen.find_inconsistencies([m])
assert len(issues) == 1 and "Open findings" in issues[0], issues
def test_find_inconsistencies_detects_unknown_status():
m = _parse_fixture()
m["findings"][0]["status"] = "Bogus"
issues = regen.find_inconsistencies([m])
# Wrong status also shifts the open count, so expect the status issue present.
assert any("unrecognised Status" in i for i in issues), issues
def test_summarize_truncates_long_text():
long = "x" * 500
out = regen.summarize(long)
assert len(out) <= 240 and out.endswith(""), len(out)
assert regen.summarize("short") == "short"
def main() -> int:
tests = sorted(
(name, fn)
for name, fn in globals().items()
if name.startswith("test_") and callable(fn)
)
failed = 0
for name, fn in tests:
try:
fn()
print(f"PASS {name}")
except Exception: # noqa: BLE001 - test runner reports all failures
failed += 1
print(f"FAIL {name}")
traceback.print_exc()
print(f"\n{len(tests) - failed}/{len(tests)} passed.")
return 1 if failed else 0
if __name__ == "__main__":
raise SystemExit(main())
+5
View File
@@ -223,6 +223,10 @@ constraints remain fully unconstrained after migration.
Key ids are restricted by the parser to ASCII letters, digits, periods, and hyphens so they remain safe to embed in the token format and in URL paths used by administrative tooling.
The CLI is not the only management surface: the dashboard API Keys page
creates, rotates, and revokes keys through the same `IApiKeyAdminStore`. See
[Gateway Dashboard Design](./GatewayDashboardDesign.md#api-keys-page).
## Scope Serialization
Scopes are persisted as a single TEXT column rather than a join table because the set is small, never queried by membership at the database level, and changes atomically with the owning row. `ApiKeyScopeSerializer.Serialize` writes a JSON array sorted with `StringComparer.Ordinal` so equivalent scope sets produce byte-identical column values, which makes audit diffing and database comparisons deterministic:
@@ -276,4 +280,5 @@ Singletons are safe because each operation opens its own short-lived `SqliteConn
- [Gateway Configuration](./GatewayConfiguration.md)
- [Authorization](./Authorization.md)
- [Gateway Dashboard Design](./GatewayDashboardDesign.md)
- [Diagnostics](./Diagnostics.md)
+7
View File
@@ -161,6 +161,12 @@ Glob matching is anchored, case-insensitive, and supports `*` and `?`.
Subtree and tag glob lists are alternatives: matching either list allows that
scope dimension. Empty lists mean unconstrained for that dimension.
Constraints are set when a key is created — through the `apikey create-key`
flags (see [Authentication](./Authentication.md)) or the dashboard API Keys
page create dialog (see
[Gateway Dashboard Design](./GatewayDashboardDesign.md#api-keys-page)). The
dashboard API Keys page also renders each key's effective constraints.
The service checks read constraints for `AddItem`, `AddItem2`, `AddItemBulk`,
`SubscribeBulk`, and `AdviseItemBulk`. It checks write constraints for
`Write`, `Write2`, `WriteSecured`, and `WriteSecured2`. Successful item
@@ -252,6 +258,7 @@ Singleton lifetimes are appropriate because none of the three classes hold per-r
## Related Documentation
- [Authentication](./Authentication.md)
- [Gateway Dashboard Design](./GatewayDashboardDesign.md)
- [Grpc](./Grpc.md)
- [GatewayConfiguration](./GatewayConfiguration.md)
- [Galaxy Repository Browse](./GalaxyRepository.md)
+51
View File
@@ -49,6 +49,7 @@ Endpoint layout:
/dashboard/workers
/dashboard/events
/dashboard/galaxy
/dashboard/apikeys
/dashboard/settings
/dashboard/_blazor
```
@@ -83,6 +84,7 @@ MxGateway.Server
SessionDetailsPage.razor
WorkersPage.razor
EventsPage.razor
ApiKeysPage.razor
SettingsPage.razor
Shared/
MetricCard.razor
@@ -91,6 +93,9 @@ MxGateway.Server
DashboardSnapshotService.cs
DashboardAuthorizationHandler.cs
DashboardAuthenticator.cs
DashboardApiKeyAuthorization.cs
DashboardApiKeyManagementService.cs
DashboardApiKeySummary.cs
DashboardSnapshot.cs
DashboardSessionSummary.cs
DashboardWorkerSummary.cs
@@ -249,6 +254,52 @@ Show aggregate event diagnostics:
Do not display full tag values by default. If value display is later added, make
it opt-in and redacted.
### API keys page
`/dashboard/apikeys` lists the gateway's API keys and, for authorized
operators, manages them. It reads key metadata through the same
`IApiKeyAdminStore` the `apikey` CLI uses, so the dashboard and the CLI act
on one source of truth.
The table shows one row per key:
- key id,
- status (`Active` or `Revoked`),
- display name,
- scopes,
- constraints (rendered as `unconstrained` when none are set),
- created timestamp,
- last-used timestamp.
Key secrets are never listed. Only the peppered hash is stored, and the page
never reconstructs a key. See [Authorization](./Authorization.md#constraint-enforcement)
for what each constraint means and how it is enforced on the gRPC path.
#### Management actions
Create, Rotate, and Revoke controls render only when the signed-in user is
authorized. `DashboardApiKeyAuthorization.CanManage` requires an authenticated
principal that is a member of the LDAP `MxGateway:Ldap:RequiredGroup` — the
same group the dashboard login enforces. An anonymous localhost viewer can read
the table but sees no action controls.
- **Create** opens a dialog for the key id, display name, scope checkboxes
(the `GatewayScopes` catalog), and the optional constraint fields: read and
write subtrees, read and write tag globs, browse subtrees, max write
classification, and the read-alarm-only / read-historized-only flags.
- **Rotate** issues a new secret for an existing key id and invalidates the
old one.
- **Revoke** marks a key revoked; a revoked key cannot be un-revoked.
Create and Rotate return the assembled `mxgw_<keyId>_<secret>` token **once**,
in a one-time banner. It is never shown again, so the operator must copy it
immediately. This mirrors the `apikey create-key` / `rotate-key` CLI.
Every management action appends an `api_key_audit` entry
(`dashboard-create-key`, `dashboard-rotate-key`, `dashboard-revoke-key`) with
the key id and the caller's remote address. Secrets and pepper values are never
logged.
### Settings page
Show read-only effective configuration:
+74 -3
View File
@@ -44,9 +44,22 @@ skipped unless `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1` is set because it creates
the installed MXAccess COM object and depends on live provider state.
The live smoke opens a gateway session, launches the x86 worker, runs
`Register`, `AddItem`, and `Advise`, waits a bounded time for one
`OnDataChange`, and closes the session in a `finally` block so the worker gets a
graceful shutdown request even when a command or event assertion fails.
`Register`, `AddItem`, and `Advise`, waits a bounded time for the first
`OnDataChange` event (skipping any earlier bootstrap/registration-state event),
and closes the session in a `finally` block so the worker gets a graceful
shutdown request even when a command or event assertion fails. Cleanup failures
in that `finally` block are logged rather than thrown, so a real assertion
failure is never masked by a shutdown timeout.
`WorkerLiveMxAccessSmokeTests` additionally covers two MXAccess parity paths the
fake-worker tests cannot validate:
- a `Write` round-trip against an advised item, and
- an `AddItem` against an invalid server handle, asserting the MXAccess failure
surfaces in the command reply without faulting the gateway transport.
All three tests are gated by the same `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1`
opt-in variable.
Build the worker before running the smoke:
@@ -74,6 +87,64 @@ The test output includes session id, worker process id, command status,
HRESULT/status diagnostics, event sequence and handles, close status, and worker
stdout/stderr lines emitted during the run.
## Live Galaxy Repository
`GalaxyRepositoryLiveTests` in `src/MxGateway.IntegrationTests/Galaxy/` exercises
`GalaxyRepository` directly against the `ZB` Galaxy Repository SQL database. It is
skipped unless `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1` is set because it depends on a
reachable SQL Server instance and deployed Galaxy state — fake-worker tests cannot
cover the SQL browse RPCs.
The suite covers `TestConnectionAsync`, `GetLastDeployTimeAsync`,
`GetHierarchyAsync`, and `GetAttributesAsync`. `GetHierarchyAsync` and
`GetAttributesAsync` assert a non-empty result, so the connected `ZB` database
must contain a deployed Galaxy, not just an empty schema.
Run the Galaxy live tests explicitly:
```bash
$env:MXGATEWAY_RUN_LIVE_GALAXY_TESTS = "1"
dotnet test src/MxGateway.IntegrationTests/MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~GalaxyRepositoryLiveTests
```
Optional live Galaxy variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `MXGATEWAY_LIVE_GALAXY_CONN` | `Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;` | Galaxy Repository connection string. Set this when the `ZB` database is on a non-default instance or needs SQL authentication. |
The default connection string targets `ZB` on `localhost` with Windows
authentication, which matches the Galaxy Repository conventions in CLAUDE.md.
## Live LDAP
`DashboardLdapLiveTests` in `src/MxGateway.IntegrationTests/` exercises
`DashboardAuthenticator` against the live GLAuth directory. It is skipped unless
`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1` is set because it binds against the GLAuth
service described in `glauth.md`.
The suite builds the authenticator with a default `GatewayOptions`, so
`LdapOptions.RequiredGroup` keeps its `GwAdmin` default. `GwAdmin` is the
gateway-specific dashboard-admin role and is **not** part of the five baseline
GLAuth role groups — it must be provisioned before the LDAP live tests pass.
`AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (rather than skips) when
GLAuth has only the baseline groups, so this is a hard prerequisite beyond "LDAP
is up." See the "Adding a gw-specific group" section of `glauth.md` for the
provisioning step that adds `GwAdmin` and grants it to `admin`.
The suite covers both the success path and the `DashboardAuthenticator` failure
branches: `admin` in `GwAdmin` succeeds; `readonly` is denied for missing group;
`admin` with a wrong password is rejected by the candidate bind without leaking
the password into `FailureMessage`; an unknown username yields no candidate; and
an unreachable LDAP server is absorbed into a failed result rather than throwing.
Run the LDAP live tests explicitly:
```bash
$env:MXGATEWAY_RUN_LIVE_LDAP_TESTS = "1"
dotnet test src/MxGateway.IntegrationTests/MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~DashboardLdapLiveTests
```
## Client E2E Scripts
`scripts/discover-testmachine-tags.ps1` queries the ZB Galaxy Repository for the
+11 -6
View File
@@ -35,17 +35,22 @@ oversized frames, protocol version mismatches, and session mismatches.
## Verification
The frame protocol lives in `MxGateway.Worker.Ipc` (`WorkerFrameReader`,
`WorkerFrameWriter`, `WorkerFrameProtocolOptions`) and is covered by
`src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs`. The worker is an
x86 process, so build and test it with `-p:Platform=x86`.
Run the focused tests after changing the frame protocol:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests
```powershell
dotnet test src/MxGateway.Worker.Tests/MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter WorkerFrameProtocolTests
```
Run the gateway build because the frame protocol is part of
`MxGateway.Server`:
Run the x86 worker build because the frame protocol is part of
`MxGateway.Worker`:
```bash
dotnet build src/MxGateway.Server/MxGateway.Server.csproj
```powershell
dotnet build src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86
```
## Related Documentation
+5 -2
View File
@@ -579,8 +579,11 @@ Policy:
- command exceptions return structured command fault with HRESULT if known,
- stale sessions are closed by lease timeout,
- stuck workers are killed by process id,
- gateway restart should not attempt to reattach old workers unless explicitly
designed; first version should terminate orphaned workers on startup.
- gateway restart does not reattach old workers; `OrphanWorkerCleanupHostedService`
runs `OrphanWorkerTerminator` once on startup — before the server accepts
sessions — to kill leftover `MxGateway.Worker.exe` processes (matched by the
configured worker executable path, or by image name when the x64 gateway cannot
introspect the x86 worker's module) left behind by a previous unclean run.
Because each client owns one worker, a crash or leak affects only that session.
+280
View File
@@ -0,0 +1,280 @@
# GLAuth — LDAP authn reference for mxaccessgw
GLAuth is a lightweight LDAP server installed on this dev box at
`C:\publish\glauth\` and run as a Windows service via NSSM. It already
backs the LmxOpcUa OPC UA server's UserName-token authn and the LmxOpcUa
Admin UI's cookie login; this doc captures everything mxaccessgw needs
to consume the same directory so a single set of dev credentials covers
both stacks.
The authoritative copy of LmxOpcUa's reference lives at
`C:\publish\glauth\auth.md`. This doc is a redistilled view tailored to
mxaccessgw — what users + groups are already provisioned, how to bind
against them, and what's needed to add a gw-specific role.
## Connection details
| Setting | Value |
|---|---|
| Protocol | LDAP (unencrypted) |
| Host | `localhost` |
| Port | `3893` |
| LDAPS | disabled in dev (set `[ldaps]` block to enable) |
| Base DN | `dc=lmxopcua,dc=local` |
| Bind DN format | `cn={username},dc=lmxopcua,dc=local` |
| Group OU | `ou=<groupname>,ou=groups,dc=lmxopcua,dc=local` |
| Failed-bind throttle | 3 fails → 10-minute IP lockout (per `[behaviors]`) |
## Pre-existing groups (LmxOpcUa role taxonomy)
These map cleanly onto MxAccess capability boundaries — mxaccessgw
should reuse them rather than define parallel groups so an operator with
LmxOpcUa write rights doesn't need a second account for the gw.
| Group | GID | DN | LmxOpcUa meaning | Suggested mxgw mapping |
|---|---|---|---|---|
| ReadOnly | 5501 | `ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` | Browse + read OPC UA nodes | `Browse` + `Subscribe` (read paths only) |
| WriteOperate | 5502 | `ou=WriteOperate,ou=groups,dc=lmxopcua,dc=local` | Write FreeAccess / Operate attrs | `Write` (plain) |
| WriteTune | 5504 | `ou=WriteTune,ou=groups,dc=lmxopcua,dc=local` | Write Tune attrs | `WriteSecured` (Tune only) |
| WriteConfigure | 5505 | `ou=WriteConfigure,ou=groups,dc=lmxopcua,dc=local` | Write Configure attrs | `WriteSecured` (Configure) |
| AlarmAck | 5503 | `ou=AlarmAck,ou=groups,dc=lmxopcua,dc=local` | Acknowledge alarms | gw alarm-ack RPC, when added |
**A user can be in multiple groups** — `othergroups = [...]` in the
config is a list. `admin` is the canonical example (in every role
group below).
## Pre-provisioned users
| Username | Password | UID | Primary group | Other groups | Capabilities |
|---|---|---|---|---|---|
| `readonly` | `readonly123` | 5001 | ReadOnly | — | Browse, read |
| `writeop` | `writeop123` | 5002 | WriteOperate | — | + plain Write |
| `writetune` | `writetune123` | 5005 | WriteTune | — | + WriteSecured (Tune) |
| `writeconfig` | `writeconfig123` | 5006 | WriteConfigure | — | + WriteSecured (Configure) |
| `alarmack` | `alarmack123` | 5003 | AlarmAck | — | Alarm acknowledgment |
| `admin` | `admin123` | 5004 | ReadOnly | WriteOperate, AlarmAck, WriteTune, WriteConfigure | All roles |
| `serviceaccount` | `serviceaccount123` | 5999 | ReadOnly | — | LDAP search capability (for bind-then-search) |
For mxaccessgw dev, `admin` covers every gw-side capability test;
`readonly` is the right "negative" case for proving Browse-OK /
Write-denied.
The gateway dashboard adds one role beyond this LmxOpcUa taxonomy:
`GwAdmin`. `LdapOptions.RequiredGroup` defaults to `GwAdmin`, so the
dashboard login and `DashboardLdapLiveTests` require `admin` to be a
member of a `GwAdmin` group. `GwAdmin` is **not** in the baseline
GLAuth config — it must be provisioned before dashboard authn or the
LDAP live tests work. See [Provisioning the GwAdmin
group](#provisioning-the-gwadmin-group) below.
## Two bind patterns
### 1. Direct bind (simplest)
```
DN: cn=admin,dc=lmxopcua,dc=local
Password: admin123
```
Construct the DN from the username; bind. Works on GLAuth because
`backend.nameformat = "cn"` and `groupformat = "ou"` are set in the
config. **Doesn't translate to Active Directory** — AD users are keyed
by `sAMAccountName`, not `cn`. Use this only for dev convenience.
### 2. Bind-then-search (production-grade)
```
1. Bind as the service account (cn=serviceaccount,dc=lmxopcua,dc=local
/ serviceaccount123).
2. Search under dc=lmxopcua,dc=local with filter
(uid=<entered-username>) — or any attribute the deployment
identifies users by. GLAuth populates uid + cn.
3. Read the returned entry's DN + memberOf list (groups).
4. Bind again as the discovered DN with the entered password. If that
succeeds, authn passes; the memberOf values become the role set.
```
The second bind is the actual password check — the search is just a DN
discovery. This is the AD-friendly path: AD's
`tokenGroups` / `LDAP_MATCHING_RULE_IN_CHAIN` flatten nested groups, but
that's an enhancement, not required for first-pass dev.
LmxOpcUa's `Server/Security/LdapUserAuthenticator.cs` ships a working
implementation of this pattern using `Novell.Directory.Ldap.NETStandard`
v3.6.0 — copy the bind-then-search loop from there if mxaccessgw wants
to avoid re-deriving the LDAP escape-string handling.
## Suggested mxgw configuration shape
A YAML/JSON section for mxaccessgw that mirrors LmxOpcUa's `LdapOptions`
record:
```yaml
ldap:
enabled: true
server: localhost
port: 3893
useTls: false
allowInsecureLdap: true # dev only
searchBase: "dc=lmxopcua,dc=local"
serviceAccountDn: "cn=serviceaccount,dc=lmxopcua,dc=local"
serviceAccountPassword: "serviceaccount123"
userNameAttribute: "uid" # GLAuth populates this; AD uses sAMAccountName
displayNameAttribute: "cn"
groupAttribute: "memberOf"
groupToRole:
ReadOnly: "Browse"
WriteOperate: "Write"
WriteTune: "WriteSecured"
WriteConfigure: "WriteSecured"
AlarmAck: "AlarmAck"
```
`groupAttribute` returns full DNs like
`ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` — the authenticator
should strip the leading `ou=` (or `cn=` against AD) RDN value and
look that up in `groupToRole`.
## Provisioning the GwAdmin group
`GwAdmin` is the gateway-specific dashboard-admin role. It is the
default `LdapOptions.RequiredGroup`, so the dashboard cookie login and
`DashboardLdapLiveTests` (`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1`) reject
`admin` until a `GwAdmin` group exists and `admin` is a member.
GLAuth's baseline config ships only the five LmxOpcUa role groups, so
`GwAdmin` must be added to GLAuth rather than run from a separate LDAP
server:
1. Edit `C:\publish\glauth\glauth.cfg`
2. Append the group:
```toml
[[groups]]
name = "GwAdmin"
gidnumber = 5510 # pick the next free GID
```
3. Add `5510` to `admin`'s `othergroups` list so `admin` resolves the
`GwAdmin` role. Add it to any other user that needs dashboard-admin
rights. Or create a dedicated user:
```toml
[[users]]
name = "gwadmin"
givenname = "Gateway"
sn = "Admin"
mail = "gwadmin@lmxopcua.local"
uidnumber = 5010
primarygroup = 5510
passsha256 = "<sha256 of the password — see below>"
```
4. `nssm restart GLAuth`
After the restart, `admin`'s `memberOf` includes
`ou=GwAdmin,ou=groups,dc=lmxopcua,dc=local`, which the authenticator
strips to `GwAdmin` and matches against `RequiredGroup`. The same
pattern applies to any future permission that doesn't fit the existing
five roles.
Generate `passsha256` from a plaintext password:
```powershell
# Windows / PowerShell
$bytes = [System.Text.Encoding]::UTF8.GetBytes("yourpassword")
$hash = [System.Security.Cryptography.SHA256]::Create().ComputeHash($bytes)
-join ($hash | ForEach-Object { $_.ToString("x2") })
```
```bash
# WSL / git-bash
echo -n "yourpassword" | openssl dgst -sha256
```
## Quick verification
From mxaccessgw's dev box, prove the directory is reachable:
```powershell
# Plain bind via PowerShell + System.DirectoryServices.Protocols
$ldap = New-Object System.DirectoryServices.Protocols.LdapConnection("localhost:3893")
$ldap.AuthType = [System.DirectoryServices.Protocols.AuthType]::Basic
$ldap.SessionOptions.ProtocolVersion = 3
$ldap.SessionOptions.SecureSocketLayer = $false
$cred = New-Object System.Net.NetworkCredential("cn=admin,dc=lmxopcua,dc=local","admin123")
$ldap.Bind($cred)
"Bind OK"
```
Or via `ldapsearch` if you have OpenLDAP CLI tools:
```bash
ldapsearch -x -H ldap://localhost:3893 \
-D "cn=admin,dc=lmxopcua,dc=local" -w admin123 \
-b "dc=lmxopcua,dc=local" "(uid=admin)"
```
The response should list `admin`'s entry with `memberOf` populated for
all five role groups — plus `GwAdmin` once the gateway-specific group
is provisioned.
## Service management
```powershell
# Status / start / stop / restart
nssm status GLAuth
nssm start GLAuth
nssm stop GLAuth
nssm restart GLAuth
# Inspect what NSSM was told to launch
nssm get GLAuth Parameters
```
Logs:
| File | Purpose |
|---|---|
| `C:\publish\glauth\logs\stdout.log` | Bind events, search responses |
| `C:\publish\glauth\logs\stderr.log` | Startup errors, config parse failures |
After editing `glauth.cfg`, always tail `stderr.log` after the restart
to catch a fat-fingered TOML before it bites at first bind:
```powershell
nssm restart GLAuth
Get-Content C:\publish\glauth\logs\stderr.log -Tail 20 -Wait
```
## Active Directory migration cheat-sheet
LmxOpcUa's `LdapOptions` xml-doc captures the AD overrides; same set
applies to mxaccessgw verbatim. Keys that change:
| Field | GLAuth dev value | AD production value |
|---|---|---|
| `Server` | `localhost` | a domain controller FQDN, or the domain itself |
| `Port` | `3893` | `636` (LDAPS) — AD increasingly rejects plain bind under LDAP-signing enforcement |
| `UseTls` | `false` | `true` |
| `AllowInsecureLdap` | `true` | `false` |
| `SearchBase` | `dc=lmxopcua,dc=local` | `DC=corp,DC=example,DC=com` |
| `ServiceAccountDn` | `cn=serviceaccount,dc=lmxopcua,dc=local` | `CN=MxGwSvc,OU=Service Accounts,DC=corp,...` |
| `UserNameAttribute` | `uid` | `sAMAccountName` (or `userPrincipalName`) |
| `GroupAttribute` | `memberOf` (unchanged) | `memberOf` (unchanged) |
`memberOf` returns full DNs; the authenticator strips the leading
`CN=` value and uses it as the lookup key in `groupToRole`. Nested
groups are **not** auto-expanded; either flatten in the directory or
add a `tokenGroups` query as an enhancement.
## Security notes for production
- **Plaintext passwords in `glauth.cfg` are dev-only.** The config is
unencrypted on disk; anyone with read access to `C:\publish\glauth\`
can SHA256-rainbow-table the entries. Treat the dev creds as
throwaway. Production LDAP is Active Directory.
- The 3-fail / 10-minute lockout is per source IP, not per user — a
shared NAT can lock out a whole office. Tunable in `[behaviors]`.
- LDAPS isn't enabled in dev; binding sends passwords cleartext on the
wire. Fine for `localhost`, never expose port 3893 off-box without
enabling TLS first.
+20
View File
@@ -0,0 +1,20 @@
# Verifies code-reviews/README.md is regenerated from, and consistent with, the
# per-module findings.md files. Intended as a CI / pre-commit gate.
#
# Exits non-zero when README.md is stale, when a module header's "Open findings"
# count disagrees with its finding statuses, or when a finding carries an
# unrecognised Status value. See REVIEW-PROCESS.md section 5.
[CmdletBinding()]
param()
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..")
$script = Join-Path $repoRoot "code-reviews/regen-readme.py"
# The bare `python3` alias on this platform resolves to the Windows Store stub;
# `python` is the real interpreter.
& python $script --check
exit $LASTEXITCODE
@@ -0,0 +1,115 @@
using System.Security.Claims;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using MxGateway.Server.Configuration;
using MxGateway.Server.Dashboard;
namespace MxGateway.IntegrationTests;
public sealed class DashboardLdapLiveTests
{
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_AdminInGwAdminGroup_Succeeds()
{
DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"admin",
"admin123",
CancellationToken.None);
Assert.True(result.Succeeded);
Assert.NotNull(result.Principal);
Assert.Equal("admin", result.Principal.FindFirst(ClaimTypes.NameIdentifier)?.Value);
Assert.Contains(result.Principal.Claims, claim =>
claim.Type == DashboardAuthenticationDefaults.LdapGroupClaimType
&& claim.Value.Contains("GwAdmin", StringComparison.OrdinalIgnoreCase));
}
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_ReadOnlyUserMissingGwAdminGroup_Fails()
{
DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"readonly",
"readonly123",
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Null(result.Principal);
Assert.DoesNotContain("readonly123", result.FailureMessage, StringComparison.Ordinal);
}
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword()
{
// Exercises the LdapException branch: the user exists and the service
// account search succeeds, but the candidate bind is rejected.
const string wrongPassword = "definitely-not-the-admin-password";
DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"admin",
wrongPassword,
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Null(result.Principal);
Assert.DoesNotContain(wrongPassword, result.FailureMessage, StringComparison.Ordinal);
}
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_UnknownUsername_Fails()
{
// Exercises the `candidate is null` branch: the service-account search
// returns no entry, so no candidate bind is attempted.
DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"no-such-user-9f3c1",
"irrelevant-password",
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Null(result.Principal);
}
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing()
{
// Exercises the connect-failure path: a closed loopback port produces a
// connection error that DashboardAuthenticator must absorb into a Fail
// result rather than propagating an exception to the dashboard.
DashboardAuthenticator authenticator = new(
Options.Create(new GatewayOptions
{
Ldap = new LdapOptions
{
// 1 is a reserved port number that no LDAP server listens on.
Port = 1,
},
}),
NullLogger<DashboardAuthenticator>.Instance);
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"admin",
"admin123",
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Null(result.Principal);
}
private static DashboardAuthenticator CreateAuthenticator()
{
return new DashboardAuthenticator(
Options.Create(new GatewayOptions()),
NullLogger<DashboardAuthenticator>.Instance);
}
}
@@ -0,0 +1,20 @@
namespace MxGateway.IntegrationTests;
public sealed class LiveLdapFactAttribute : FactAttribute
{
public const string EnableVariableName = "MXGATEWAY_RUN_LIVE_LDAP_TESTS";
public LiveLdapFactAttribute()
{
if (!Enabled)
{
Skip = $"Set {EnableVariableName}=1 to run live LDAP tests.";
}
}
public static bool Enabled =>
string.Equals(
Environment.GetEnvironmentVariable(EnableVariableName),
"1",
StringComparison.Ordinal);
}
@@ -86,8 +86,15 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
LogReply("Advise", adviseReply);
Assert.Equal(ProtocolStatusCode.Ok, adviseReply.ProtocolStatus.Code);
// A live MXAccess provider can deliver an initial registration-state
// or bad-quality bootstrap event before the OnDataChange the worker
// is contracted to emit. Match on the family rather than trusting
// whatever event arrives first so a genuine ordering defect cannot
// pass spuriously or leave a later wrong event unverified.
MxEvent dataChange = await eventWriter
.WaitForFirstMessageAsync(IntegrationTestEnvironment.LiveMxAccessEventTimeout)
.WaitForMessageAsync(
candidate => candidate.Family == MxEventFamily.OnDataChange,
IntegrationTestEnvironment.LiveMxAccessEventTimeout)
.ConfigureAwait(false);
LogEvent(dataChange);
@@ -98,22 +105,184 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
}
finally
{
try
{
if (!string.IsNullOrWhiteSpace(sessionId))
{
await CloseSessionAsync(fixture, sessionId).ConfigureAwait(false);
}
await ShutDownAsync(fixture, processFactory, sessionId, streamTask).ConfigureAwait(false);
}
}
if (streamTask is not null)
/// <summary>
/// Verifies that a Write command round-trips through live MXAccess against an advised item.
/// </summary>
[LiveMxAccessFact]
[Trait("Category", "LiveMxAccess")]
public async Task GatewaySession_WithLiveWorker_WritesValueToAdvisedItem()
{
string workerExecutablePath = IntegrationTestEnvironment.ResolveLiveMxAccessWorkerExecutablePath();
Assert.True(
File.Exists(workerExecutablePath),
$"Live MXAccess worker executable was not found at {workerExecutablePath}. Build the worker or set {IntegrationTestEnvironment.LiveMxAccessWorkerExecutableVariableName}.");
TestWorkerProcessFactory processFactory = new(output);
await using GatewayServiceFixture fixture = new(workerExecutablePath, processFactory, output);
string? sessionId = null;
Task? streamTask = null;
try
{
OpenSessionReply openReply = await fixture.Service.OpenSession(
new OpenSessionRequest
{
await streamTask.WaitAsync(StreamShutdownTimeout).ConfigureAwait(false);
}
}
finally
ClientSessionName = "live-mxaccess-write",
ClientCorrelationId = "live-open-write",
CommandTimeout = Duration.FromTimeSpan(CommandTimeout),
},
new TestServerCallContext()).ConfigureAwait(false);
sessionId = openReply.SessionId;
Assert.Equal(ProtocolStatusCode.Ok, openReply.ProtocolStatus.Code);
RecordingServerStreamWriter<MxEvent> eventWriter = new();
streamTask = fixture.Service.StreamEvents(
new StreamEventsRequest { SessionId = sessionId },
eventWriter,
new TestServerCallContext());
MxCommandReply registerReply = await fixture.Service.Invoke(
CreateRegisterRequest(sessionId),
new TestServerCallContext()).ConfigureAwait(false);
LogReply("Register", registerReply);
Assert.Equal(ProtocolStatusCode.Ok, registerReply.ProtocolStatus.Code);
MxCommandReply addItemReply = await fixture.Service.Invoke(
CreateAddItemRequest(sessionId, registerReply.Register.ServerHandle),
new TestServerCallContext()).ConfigureAwait(false);
LogReply("AddItem", addItemReply);
Assert.Equal(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code);
Assert.True(addItemReply.AddItem.ItemHandle > 0);
MxCommandReply adviseReply = await fixture.Service.Invoke(
CreateAdviseRequest(
sessionId,
registerReply.Register.ServerHandle,
addItemReply.AddItem.ItemHandle),
new TestServerCallContext()).ConfigureAwait(false);
LogReply("Advise", adviseReply);
Assert.Equal(ProtocolStatusCode.Ok, adviseReply.ProtocolStatus.Code);
MxCommandReply writeReply = await fixture.Service.Invoke(
CreateWriteRequest(
sessionId,
registerReply.Register.ServerHandle,
addItemReply.AddItem.ItemHandle),
new TestServerCallContext()).ConfigureAwait(false);
LogReply("Write", writeReply);
// The gateway must always report a protocol-level status. MXAccess
// parity details (a write rejection, a secured-item failure) belong
// in hresult / statuses, not in a transport failure — the command
// itself completed its round-trip to the worker and back.
Assert.Equal(ProtocolStatusCode.Ok, writeReply.ProtocolStatus.Code);
Assert.Equal(MxCommandKind.Write, writeReply.Kind);
}
finally
{
await ShutDownAsync(fixture, processFactory, sessionId, streamTask).ConfigureAwait(false);
}
}
/// <summary>
/// Verifies that an AddItem against an invalid server handle surfaces the MXAccess failure
/// without faulting the gateway transport, exercising the invalid-handle parity path.
/// </summary>
[LiveMxAccessFact]
[Trait("Category", "LiveMxAccess")]
public async Task GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault()
{
string workerExecutablePath = IntegrationTestEnvironment.ResolveLiveMxAccessWorkerExecutablePath();
Assert.True(
File.Exists(workerExecutablePath),
$"Live MXAccess worker executable was not found at {workerExecutablePath}. Build the worker or set {IntegrationTestEnvironment.LiveMxAccessWorkerExecutableVariableName}.");
TestWorkerProcessFactory processFactory = new(output);
await using GatewayServiceFixture fixture = new(workerExecutablePath, processFactory, output);
string? sessionId = null;
try
{
OpenSessionReply openReply = await fixture.Service.OpenSession(
new OpenSessionRequest
{
ClientSessionName = "live-mxaccess-invalid-handle",
ClientCorrelationId = "live-open-invalid",
CommandTimeout = Duration.FromTimeSpan(CommandTimeout),
},
new TestServerCallContext()).ConfigureAwait(false);
sessionId = openReply.SessionId;
Assert.Equal(ProtocolStatusCode.Ok, openReply.ProtocolStatus.Code);
// Deliberately skip Register: server handle 0x7FFFFFFF was never
// issued by MXAccess. The worker must invoke COM and relay the
// invalid-handle failure rather than the gateway short-circuiting.
MxCommandReply addItemReply = await fixture.Service.Invoke(
CreateAddItemRequest(sessionId, serverHandle: int.MaxValue),
new TestServerCallContext()).ConfigureAwait(false);
LogReply("AddItem(invalid-handle)", addItemReply);
// MXAccess parity: an invalid handle is an MXAccess-level failure.
// The command still completed its worker round-trip, so the gateway
// protocol status is Ok and the failure shows up in hresult / the
// status proxies — it must not be reported as a transport fault.
Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code);
Assert.True(
addItemReply.AddItem is null || addItemReply.AddItem.ItemHandle <= 0,
"Invalid-handle AddItem must not yield a usable item handle.");
}
finally
{
await ShutDownAsync(fixture, processFactory, sessionId, streamTask: null).ConfigureAwait(false);
}
}
/// <summary>
/// Closes the session and drains the event stream / worker processes without letting a
/// cleanup timeout mask the original failure from the test body.
/// </summary>
private async Task ShutDownAsync(
GatewayServiceFixture fixture,
TestWorkerProcessFactory processFactory,
string? sessionId,
Task? streamTask)
{
try
{
if (!string.IsNullOrWhiteSpace(sessionId))
{
await processFactory.WaitForProcessesAsync(StreamShutdownTimeout).ConfigureAwait(false);
await CloseSessionAsync(fixture, sessionId).ConfigureAwait(false);
}
if (streamTask is not null)
{
await streamTask.WaitAsync(StreamShutdownTimeout).ConfigureAwait(false);
}
}
catch (Exception ex)
{
// Cleanup runs in a finally block. A TimeoutException (or a faulted
// StreamEvents task) here would otherwise replace any assertion
// failure raised in the try block. Log it and let the original
// failure surface.
output.WriteLine($"Cleanup error during session/stream shutdown: {ex}");
}
try
{
await processFactory.WaitForProcessesAsync(StreamShutdownTimeout).ConfigureAwait(false);
}
catch (Exception ex)
{
output.WriteLine($"Cleanup error while waiting for worker processes to exit: {ex}");
}
}
@@ -175,6 +344,32 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
};
}
private static MxCommandRequest CreateWriteRequest(
string sessionId,
int serverHandle,
int itemHandle)
{
return new MxCommandRequest
{
SessionId = sessionId,
ClientCorrelationId = "live-write",
Command = new MxCommand
{
Kind = MxCommandKind.Write,
Write = new WriteCommand
{
ServerHandle = serverHandle,
ItemHandle = itemHandle,
Value = new MxValue
{
DataType = MxDataType.Integer,
Int32Value = 1,
},
},
},
};
}
private async Task CloseSessionAsync(
GatewayServiceFixture fixture,
string sessionId)
@@ -321,8 +516,8 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
private sealed class RecordingServerStreamWriter<T> : IServerStreamWriter<T>
{
private readonly object syncRoot = new();
private readonly TaskCompletionSource<T> firstMessage = new(TaskCreationOptions.RunContinuationsAsynchronously);
private readonly List<T> messages = [];
private readonly SemaphoreSlim messageArrived = new(0);
/// <summary>
/// All messages that have been written to the stream.
@@ -344,7 +539,7 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
public WriteOptions? WriteOptions { get; set; }
/// <summary>
/// Records the message and completes the first-message task.
/// Records the message and signals any pending waiter.
/// </summary>
/// <param name="message">The message to write.</param>
public Task WriteAsync(T message)
@@ -354,18 +549,51 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
messages.Add(message);
}
firstMessage.TrySetResult(message);
messageArrived.Release();
return Task.CompletedTask;
}
/// <summary>
/// Waits for the first message up to the specified timeout.
/// Waits for the first recorded message that satisfies <paramref name="predicate"/>,
/// up to the specified timeout. Earlier non-matching messages (for example a
/// registration-state bootstrap event) are skipped rather than treated as the result.
/// </summary>
/// <param name="timeout">The maximum time to wait.</param>
/// <returns>The first message written to the stream.</returns>
public async Task<T> WaitForFirstMessageAsync(TimeSpan timeout)
/// <param name="predicate">Filter the awaited message must satisfy.</param>
/// <param name="timeout">The maximum total time to wait.</param>
/// <returns>The first message that satisfies the predicate.</returns>
public async Task<T> WaitForMessageAsync(
Func<T, bool> predicate,
TimeSpan timeout)
{
return await firstMessage.Task.WaitAsync(timeout).ConfigureAwait(false);
using CancellationTokenSource timeoutCancellation = new(timeout);
int scanned = 0;
while (true)
{
T[] snapshot;
lock (syncRoot)
{
snapshot = messages.ToArray();
}
for (; scanned < snapshot.Length; scanned++)
{
if (predicate(snapshot[scanned]))
{
return snapshot[scanned];
}
}
try
{
await messageArrived.WaitAsync(timeoutCancellation.Token).ConfigureAwait(false);
}
catch (OperationCanceledException) when (timeoutCancellation.IsCancellationRequested)
{
throw new TimeoutException(
$"No stream message satisfied the predicate within {timeout}. Recorded {scanned} message(s).");
}
}
}
}
@@ -0,0 +1,15 @@
namespace MxGateway.Server.Configuration;
public sealed record EffectiveLdapConfiguration(
bool Enabled,
string Server,
int Port,
bool UseTls,
bool AllowInsecureLdap,
string SearchBase,
string ServiceAccountDn,
string ServiceAccountPassword,
string UserNameAttribute,
string DisplayNameAttribute,
string GroupAttribute,
string RequiredGroup);
@@ -0,0 +1,28 @@
namespace MxGateway.Server.Configuration;
public sealed class LdapOptions
{
public bool Enabled { get; init; } = true;
public string Server { get; init; } = "localhost";
public int Port { get; init; } = 3893;
public bool UseTls { get; init; }
public bool AllowInsecureLdap { get; init; } = true;
public string SearchBase { get; init; } = "dc=lmxopcua,dc=local";
public string ServiceAccountDn { get; init; } = "cn=serviceaccount,dc=lmxopcua,dc=local";
public string ServiceAccountPassword { get; init; } = "serviceaccount123";
public string UserNameAttribute { get; init; } = "cn";
public string DisplayNameAttribute { get; init; } = "cn";
public string GroupAttribute { get; init; } = "memberOf";
public string RequiredGroup { get; init; } = "GwAdmin";
}
@@ -7,6 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1" />
<base href="@DashboardBaseHref" />
<link rel="stylesheet" href="/lib/bootstrap/css/bootstrap.min.css" />
<link rel="stylesheet" href="/css/theme.css" />
<link rel="stylesheet" href="/css/dashboard.css" />
<HeadOutlet @rendermode="InteractiveServer" />
</head>
@@ -2,55 +2,34 @@
@inject IOptions<GatewayOptions> GatewayOptions
<div class="dashboard-shell">
<nav class="navbar navbar-expand-lg bg-body border-bottom dashboard-navbar">
<div class="container-fluid">
<a class="navbar-brand" href="">MXAccess Gateway</a>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#dashboardNav"
aria-controls="dashboardNav" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="dashboardNav">
<ul class="navbar-nav me-auto mb-2 mb-lg-0">
<li class="nav-item">
<NavLink class="nav-link" href="" Match="NavLinkMatch.All">Overview</NavLink>
</li>
<li class="nav-item">
<NavLink class="nav-link" href="sessions">Sessions</NavLink>
</li>
<li class="nav-item">
<NavLink class="nav-link" href="workers">Workers</NavLink>
</li>
<li class="nav-item">
<NavLink class="nav-link" href="events">Events</NavLink>
</li>
<li class="nav-item">
<NavLink class="nav-link" href="galaxy">Galaxy</NavLink>
</li>
<li class="nav-item">
<NavLink class="nav-link" href="apikeys">API Keys</NavLink>
</li>
<li class="nav-item">
<NavLink class="nav-link" href="settings">Settings</NavLink>
</li>
</ul>
<AuthorizeView>
<Authorized Context="authState">
<div class="d-flex align-items-center gap-2">
<span class="navbar-text">@authState.User.Identity?.Name</span>
<form method="post" action="@DashboardPath("/logout")">
<AntiforgeryToken />
<button class="btn btn-outline-secondary btn-sm" type="submit">Sign out</button>
</form>
</div>
</Authorized>
<NotAuthorized>
<a class="btn btn-outline-secondary btn-sm" href="@DashboardPath("/login")">Sign in</a>
</NotAuthorized>
</AuthorizeView>
</div>
</div>
</nav>
<main class="container-fluid dashboard-content">
<header class="app-bar">
<a class="brand" href=""><span class="mark">&#9646;</span> MXAccess Gateway</a>
<nav class="app-nav">
<NavLink href="" Match="NavLinkMatch.All">Overview</NavLink>
<NavLink href="sessions">Sessions</NavLink>
<NavLink href="workers">Workers</NavLink>
<NavLink href="events">Events</NavLink>
<NavLink href="galaxy">Galaxy</NavLink>
<NavLink href="apikeys">API Keys</NavLink>
<NavLink href="settings">Settings</NavLink>
</nav>
<span class="spacer"></span>
<AuthorizeView>
<Authorized Context="authState">
<div class="app-user">
<span class="meta">@authState.User.Identity?.Name</span>
<form method="post" action="@DashboardPath("/logout")">
<AntiforgeryToken />
<button class="btn btn-outline-secondary btn-sm" type="submit">Sign out</button>
</form>
</div>
</Authorized>
<NotAuthorized>
<a class="btn btn-outline-secondary btn-sm" href="@DashboardPath("/login")">Sign in</a>
</NotAuthorized>
</AuthorizeView>
</header>
<main class="page">
@Body
</main>
</div>
@@ -0,0 +1,459 @@
@page "/apikeys"
@page "/dashboard/apikeys"
@inherits DashboardPageBase
@inject AuthenticationStateProvider AuthenticationStateProvider
@inject IDashboardApiKeyManagementService ApiKeyManagementService
<PageTitle>Dashboard API Keys</PageTitle>
@if (Snapshot is null)
{
<div class="empty-state">Loading API keys.</div>
}
else
{
<div class="dashboard-page-header">
<div>
<h1>API Keys</h1>
<div class="text-secondary">@Snapshot.ApiKeys.Count key rows</div>
</div>
@if (CanManageApiKeys)
{
<button type="button" class="btn btn-primary" @onclick="OpenCreateDialog">
Create API Key
</button>
}
</div>
@if (CanManageApiKeys)
{
@if (!string.IsNullOrWhiteSpace(ResultMessage))
{
<div class="alert @(LastOperationSucceeded ? "alert-success" : "alert-danger")" role="alert">
@ResultMessage
@if (!string.IsNullOrWhiteSpace(LastGeneratedApiKey))
{
<div class="mt-2">
<code class="one-time-secret">@LastGeneratedApiKey</code>
</div>
}
</div>
}
@if (IsCreateDialogOpen)
{
<div class="modal-backdrop fade show"></div>
<div class="modal fade show api-key-create-modal" role="dialog" aria-modal="true" aria-labelledby="createApiKeyTitle">
<div class="modal-dialog modal-xl modal-dialog-scrollable">
<div class="modal-content">
<EditForm Model="@CreateModel" OnSubmit="@CreateApiKeyAsync">
<div class="modal-header">
<h2 class="modal-title h5" id="createApiKeyTitle">Create API Key</h2>
<button type="button" class="btn-close" aria-label="Close" @onclick="CloseCreateDialog"></button>
</div>
<div class="modal-body">
<div class="api-key-management-grid">
<div class="mb-3">
<label for="keyId" class="form-label">Key ID</label>
<input id="keyId" class="form-control" @bind="CreateModel.KeyId" @bind:event="oninput" />
</div>
<div class="mb-3">
<label for="displayName" class="form-label">Display Name</label>
<input id="displayName" class="form-control" @bind="CreateModel.DisplayName" @bind:event="oninput" />
</div>
</div>
<fieldset class="mb-3">
<legend class="form-label">Scopes</legend>
<div class="scope-grid">
@foreach (string scope in AvailableScopes)
{
<label class="form-check">
<input class="form-check-input" type="checkbox"
checked="@IsScopeSelected(scope)"
@onchange="eventArgs => SetScope(scope, eventArgs)" />
<span class="form-check-label">@scope</span>
</label>
}
</div>
</fieldset>
<div class="api-key-management-grid">
<div class="mb-3">
<label for="readSubtrees" class="form-label">Read subtrees</label>
<textarea id="readSubtrees" class="form-control" rows="2" @bind="CreateModel.ReadSubtrees" @bind:event="oninput"></textarea>
</div>
<div class="mb-3">
<label for="writeSubtrees" class="form-label">Write subtrees</label>
<textarea id="writeSubtrees" class="form-control" rows="2" @bind="CreateModel.WriteSubtrees" @bind:event="oninput"></textarea>
</div>
<div class="mb-3">
<label for="readTagGlobs" class="form-label">Read tag globs</label>
<textarea id="readTagGlobs" class="form-control" rows="2" @bind="CreateModel.ReadTagGlobs" @bind:event="oninput"></textarea>
</div>
<div class="mb-3">
<label for="writeTagGlobs" class="form-label">Write tag globs</label>
<textarea id="writeTagGlobs" class="form-control" rows="2" @bind="CreateModel.WriteTagGlobs" @bind:event="oninput"></textarea>
</div>
<div class="mb-3">
<label for="browseSubtrees" class="form-label">Browse subtrees</label>
<textarea id="browseSubtrees" class="form-control" rows="2" @bind="CreateModel.BrowseSubtrees" @bind:event="oninput"></textarea>
</div>
<div class="mb-3">
<label for="maxWriteClassification" class="form-label">Max write classification</label>
<input id="maxWriteClassification" class="form-control" @bind="CreateModel.MaxWriteClassification" @bind:event="oninput" />
</div>
</div>
<div class="d-flex flex-wrap gap-3">
<label class="form-check">
<InputCheckbox class="form-check-input" @bind-Value="CreateModel.ReadAlarmOnly" />
<span class="form-check-label">Read alarm only</span>
</label>
<label class="form-check">
<InputCheckbox class="form-check-input" @bind-Value="CreateModel.ReadHistorizedOnly" />
<span class="form-check-label">Read historized only</span>
</label>
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-outline-secondary" disabled="@IsBusy" @onclick="CloseCreateDialog">
Cancel
</button>
<button type="submit" class="btn btn-primary" disabled="@IsBusy">Create Key</button>
</div>
</EditForm>
</div>
</div>
</div>
}
}
<section class="dashboard-section">
@if (Snapshot.ApiKeys.Count == 0)
{
<div class="empty-state">No API keys are available for display.</div>
}
else
{
<div class="table-responsive">
<table class="table table-sm align-middle dashboard-table">
<thead>
<tr>
<th scope="col">Key</th>
<th scope="col">Status</th>
<th scope="col">Display Name</th>
<th scope="col">Scopes</th>
<th scope="col">Constraints</th>
<th scope="col">Created</th>
<th scope="col">Last Used</th>
@if (CanManageApiKeys)
{
<th scope="col">Actions</th>
}
</tr>
</thead>
<tbody>
@foreach (DashboardApiKeySummary key in Snapshot.ApiKeys)
{
<tr>
<td><code>@key.KeyId</code></td>
<td><StatusBadge Text="@(key.RevokedUtc is null ? "Active" : "Revoked")" /></td>
<td>@DashboardDisplay.Text(key.DisplayName)</td>
<td>@DashboardDisplay.Text(string.Join(", ", key.Scopes.Order(StringComparer.Ordinal)))</td>
<td>@DashboardDisplay.Text(ConstraintText(key.Constraints))</td>
<td>@DashboardDisplay.DateTime(key.CreatedUtc)</td>
<td>@DashboardDisplay.DateTime(key.LastUsedUtc)</td>
@if (CanManageApiKeys)
{
<td>
<div class="btn-group btn-group-sm" role="group" aria-label="API key actions">
<button type="button" class="btn btn-outline-secondary"
disabled="@IsBusy"
@onclick="() => RotateApiKeyAsync(key.KeyId)">
Rotate
</button>
@if (key.RevokedUtc is null)
{
<button type="button" class="btn btn-outline-danger"
disabled="@IsBusy"
@onclick="() => RevokeApiKeyAsync(key.KeyId)">
Revoke
</button>
}
</div>
</td>
}
</tr>
}
</tbody>
</table>
</div>
}
</section>
}
@code {
private static readonly string[] AvailableScopes =
[
GatewayScopes.SessionOpen,
GatewayScopes.SessionClose,
GatewayScopes.InvokeRead,
GatewayScopes.InvokeWrite,
GatewayScopes.InvokeSecure,
GatewayScopes.EventsRead,
GatewayScopes.MetadataRead,
GatewayScopes.Admin
];
private ApiKeyCreateModel CreateModel { get; } = new();
private bool CanManageApiKeys { get; set; }
private bool IsBusy { get; set; }
private bool IsCreateDialogOpen { get; set; }
private string? ResultMessage { get; set; }
private bool LastOperationSucceeded { get; set; }
private string? LastGeneratedApiKey { get; set; }
protected override async Task OnInitializedAsync()
{
AuthenticationState authenticationState = await AuthenticationStateProvider.GetAuthenticationStateAsync()
.ConfigureAwait(false);
CanManageApiKeys = ApiKeyManagementService.CanManage(authenticationState.User);
}
private async Task CreateApiKeyAsync()
{
if (IsBusy)
{
return;
}
if (!TryBuildCreateRequest(out DashboardApiKeyManagementRequest? request, out string? validationMessage))
{
SetResult(DashboardApiKeyManagementResult.Fail(validationMessage ?? "API key request is invalid."));
return;
}
await RunManagementActionAsync(user => ApiKeyManagementService.CreateAsync(
user,
request,
CancellationToken.None))
.ConfigureAwait(false);
}
private async Task RevokeApiKeyAsync(string keyId)
{
await RunManagementActionAsync(user => ApiKeyManagementService.RevokeAsync(
user,
keyId,
CancellationToken.None))
.ConfigureAwait(false);
}
private async Task RotateApiKeyAsync(string keyId)
{
await RunManagementActionAsync(user => ApiKeyManagementService.RotateAsync(
user,
keyId,
CancellationToken.None))
.ConfigureAwait(false);
}
private async Task RunManagementActionAsync(
Func<System.Security.Claims.ClaimsPrincipal, Task<DashboardApiKeyManagementResult>> action)
{
if (IsBusy)
{
return;
}
IsBusy = true;
try
{
AuthenticationState authenticationState = await AuthenticationStateProvider.GetAuthenticationStateAsync()
.ConfigureAwait(false);
CanManageApiKeys = ApiKeyManagementService.CanManage(authenticationState.User);
DashboardApiKeyManagementResult result = await action(authenticationState.User).ConfigureAwait(false);
SetResult(result);
if (result.Succeeded && result.ApiKey is not null)
{
CreateModel.Reset();
IsCreateDialogOpen = false;
}
}
finally
{
IsBusy = false;
}
}
private void SetResult(DashboardApiKeyManagementResult result)
{
LastOperationSucceeded = result.Succeeded;
ResultMessage = result.Message;
LastGeneratedApiKey = result.ApiKey;
}
private void OpenCreateDialog()
{
IsCreateDialogOpen = true;
}
private void CloseCreateDialog()
{
if (!IsBusy)
{
IsCreateDialogOpen = false;
}
}
private bool TryBuildCreateRequest(
[System.Diagnostics.CodeAnalysis.NotNullWhen(true)] out DashboardApiKeyManagementRequest? request,
out string? validationMessage)
{
request = null;
validationMessage = null;
if (!string.IsNullOrWhiteSpace(CreateModel.MaxWriteClassification)
&& !int.TryParse(
CreateModel.MaxWriteClassification,
System.Globalization.NumberStyles.Integer,
System.Globalization.CultureInfo.InvariantCulture,
out int _))
{
validationMessage = "Max write classification must be an integer.";
return false;
}
int? maxWriteClassification = string.IsNullOrWhiteSpace(CreateModel.MaxWriteClassification)
? null
: int.Parse(
CreateModel.MaxWriteClassification,
System.Globalization.NumberStyles.Integer,
System.Globalization.CultureInfo.InvariantCulture);
request = new DashboardApiKeyManagementRequest(
KeyId: CreateModel.KeyId,
DisplayName: CreateModel.DisplayName,
Scopes: CreateModel.SelectedScopes,
Constraints: new MxGateway.Server.Security.Authentication.ApiKeyConstraints(
ReadSubtrees: ParseList(CreateModel.ReadSubtrees),
WriteSubtrees: ParseList(CreateModel.WriteSubtrees),
ReadTagGlobs: ParseList(CreateModel.ReadTagGlobs),
WriteTagGlobs: ParseList(CreateModel.WriteTagGlobs),
MaxWriteClassification: maxWriteClassification,
BrowseSubtrees: ParseList(CreateModel.BrowseSubtrees),
ReadAlarmOnly: CreateModel.ReadAlarmOnly,
ReadHistorizedOnly: CreateModel.ReadHistorizedOnly));
return true;
}
private bool IsScopeSelected(string scope)
{
return CreateModel.SelectedScopes.Contains(scope);
}
private void SetScope(string scope, ChangeEventArgs eventArgs)
{
bool selected = eventArgs.Value is bool value && value;
if (selected)
{
CreateModel.SelectedScopes.Add(scope);
}
else
{
CreateModel.SelectedScopes.Remove(scope);
}
}
private static string ConstraintText(MxGateway.Server.Security.Authentication.ApiKeyConstraints constraints)
{
if (constraints.IsEmpty)
{
return "unconstrained";
}
List<string> parts = [];
AddList(parts, "read_subtrees", constraints.ReadSubtrees);
AddList(parts, "write_subtrees", constraints.WriteSubtrees);
AddList(parts, "read_tag_globs", constraints.ReadTagGlobs);
AddList(parts, "write_tag_globs", constraints.WriteTagGlobs);
AddList(parts, "browse_subtrees", constraints.BrowseSubtrees);
if (constraints.MaxWriteClassification is { } max)
{
parts.Add($"max_write_classification={max}");
}
if (constraints.ReadAlarmOnly)
{
parts.Add("read_alarm_only");
}
if (constraints.ReadHistorizedOnly)
{
parts.Add("read_historized_only");
}
return string.Join("; ", parts);
}
private static void AddList(List<string> parts, string name, IReadOnlyList<string> values)
{
if (values.Count > 0)
{
parts.Add($"{name}=[{string.Join(", ", values)}]");
}
}
private static IReadOnlyList<string> ParseList(string? value)
{
return (value ?? string.Empty)
.Split([',', ';', '\r', '\n'], StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
.Where(item => !string.IsNullOrWhiteSpace(item))
.ToArray();
}
private sealed class ApiKeyCreateModel
{
public string KeyId { get; set; } = string.Empty;
public string DisplayName { get; set; } = string.Empty;
public HashSet<string> SelectedScopes { get; } = new(StringComparer.Ordinal);
public string ReadSubtrees { get; set; } = string.Empty;
public string WriteSubtrees { get; set; } = string.Empty;
public string ReadTagGlobs { get; set; } = string.Empty;
public string WriteTagGlobs { get; set; } = string.Empty;
public string BrowseSubtrees { get; set; } = string.Empty;
public string MaxWriteClassification { get; set; } = string.Empty;
public bool ReadAlarmOnly { get; set; }
public bool ReadHistorizedOnly { get; set; }
public void Reset()
{
KeyId = string.Empty;
DisplayName = string.Empty;
SelectedScopes.Clear();
ReadSubtrees = string.Empty;
WriteSubtrees = string.Empty;
ReadTagGlobs = string.Empty;
WriteTagGlobs = string.Empty;
BrowseSubtrees = string.Empty;
MaxWriteClassification = string.Empty;
ReadAlarmOnly = false;
ReadHistorizedOnly = false;
}
}
}
@@ -19,12 +19,12 @@ else
</div>
<section class="metric-grid">
<MetricCard Label="Last Deploy" Value="@DashboardDisplay.DateTime(Snapshot.Galaxy.LastDeployTime)" Detail="@DeployAge()" />
<MetricCard Label="Last Deploy" Value="@DashboardDisplay.DateTime(Snapshot.Galaxy.LastDeployTime)" Detail="@DeployAge()" Wide="true" />
<MetricCard Label="Last Refresh" Value="@DashboardDisplay.DateTime(Snapshot.Galaxy.LastSuccessAt)" Detail="@LastAttemptDetail()" Wide="true" />
<MetricCard Label="Objects" Value="@DashboardDisplay.Count(Snapshot.Galaxy.ObjectCount)" Detail="@($"{Snapshot.Galaxy.AreaCount:N0} areas")" />
<MetricCard Label="Attributes" Value="@DashboardDisplay.Count(Snapshot.Galaxy.AttributeCount)" Detail="dynamic, deployed" />
<MetricCard Label="Historized" Value="@DashboardDisplay.Count(Snapshot.Galaxy.HistorizedAttributeCount)" />
<MetricCard Label="Alarms" Value="@DashboardDisplay.Count(Snapshot.Galaxy.AlarmAttributeCount)" />
<MetricCard Label="Last Refresh" Value="@DashboardDisplay.DateTime(Snapshot.Galaxy.LastSuccessAt)" Detail="@LastAttemptDetail()" />
</section>
@if (Snapshot.Galaxy.Status == DashboardGalaxyStatus.Unknown)
@@ -1,4 +1,4 @@
<div class="card metric-card h-100">
<div class="card metric-card h-100@(Wide ? " metric-card-wide" : string.Empty)">
<div class="card-body">
<div class="metric-label">@Label</div>
<div class="metric-value">@Value</div>
@@ -18,4 +18,8 @@
[Parameter]
public string? Detail { get; set; }
/// <summary>Spans the card across two grid columns for long values such as timestamps.</summary>
[Parameter]
public bool Wide { get; set; }
}
@@ -1,4 +1,4 @@
<span class="badge @CssClass">@Text</span>
<span class="chip @CssClass">@Text</span>
@code {
[Parameter]
@@ -6,12 +6,11 @@
private string CssClass => Text switch
{
"Ready" or "Healthy" => "text-bg-success",
"Creating" or "StartingWorker" or "WaitingForPipe" or "InitializingWorker" or "Closing" => "text-bg-info",
"Closed" => "text-bg-secondary",
"Stale" => "text-bg-warning",
"Faulted" or "Unavailable" => "text-bg-danger",
"Unknown" => "text-bg-light text-dark border",
_ => "text-bg-light text-dark border"
"Ready" or "Healthy" or "Active" => "chip-ok",
"Creating" or "StartingWorker" or "WaitingForPipe" or "InitializingWorker" or "Closing" => "chip-warn",
"Stale" or "Degraded" => "chip-warn",
"Faulted" or "Unavailable" => "chip-bad",
"Closed" or "Revoked" or "Unknown" => "chip-idle",
_ => "chip-idle"
};
}
@@ -0,0 +1,22 @@
using System.Security.Claims;
using Microsoft.Extensions.Options;
using MxGateway.Server.Configuration;
namespace MxGateway.Server.Dashboard;
public sealed class DashboardApiKeyAuthorization(IOptions<GatewayOptions> options)
{
public bool CanManage(ClaimsPrincipal user)
{
if (user.Identity?.IsAuthenticated != true)
{
return false;
}
string requiredGroup = options.Value.Ldap.RequiredGroup;
IEnumerable<string> groups = user.FindAll(DashboardAuthenticationDefaults.LdapGroupClaimType)
.Select(claim => claim.Value);
return DashboardAuthenticator.IsMemberOfRequiredGroup(groups, requiredGroup);
}
}
@@ -0,0 +1,9 @@
using MxGateway.Server.Security.Authentication;
namespace MxGateway.Server.Dashboard;
public sealed record DashboardApiKeyManagementRequest(
string KeyId,
string DisplayName,
IReadOnlySet<string> Scopes,
ApiKeyConstraints Constraints);
@@ -0,0 +1,17 @@
namespace MxGateway.Server.Dashboard;
public sealed record DashboardApiKeyManagementResult(
bool Succeeded,
string Message,
string? ApiKey)
{
public static DashboardApiKeyManagementResult Success(string message, string? apiKey = null)
{
return new DashboardApiKeyManagementResult(true, message, apiKey);
}
public static DashboardApiKeyManagementResult Fail(string message)
{
return new DashboardApiKeyManagementResult(false, message, null);
}
}
@@ -0,0 +1,205 @@
using System.Security.Claims;
using Microsoft.Data.Sqlite;
using MxGateway.Server.Security.Authentication;
using MxGateway.Server.Security.Authorization;
namespace MxGateway.Server.Dashboard;
public sealed class DashboardApiKeyManagementService(
DashboardApiKeyAuthorization authorization,
IApiKeyAdminStore adminStore,
IApiKeyAuditStore auditStore,
IApiKeySecretHasher hasher,
IHttpContextAccessor httpContextAccessor) : IDashboardApiKeyManagementService
{
private const string UnauthorizedMessage = "Sign in with an authorized LDAP account to manage API keys.";
public bool CanManage(ClaimsPrincipal user)
{
return authorization.CanManage(user);
}
public async Task<DashboardApiKeyManagementResult> CreateAsync(
ClaimsPrincipal user,
DashboardApiKeyManagementRequest request,
CancellationToken cancellationToken)
{
if (!CanManage(user))
{
return DashboardApiKeyManagementResult.Fail(UnauthorizedMessage);
}
string? validation = ValidateCreateRequest(request);
if (validation is not null)
{
return DashboardApiKeyManagementResult.Fail(validation);
}
string keyId = request.KeyId.Trim();
string secret = ApiKeySecretGenerator.Generate();
string apiKey = FormatApiKey(keyId, secret);
try
{
await adminStore.CreateAsync(
new ApiKeyCreateRequest(
KeyId: keyId,
KeyPrefix: $"mxgw_{keyId}",
SecretHash: hasher.HashSecret(secret),
DisplayName: request.DisplayName.Trim(),
Scopes: request.Scopes,
Constraints: request.Constraints,
CreatedUtc: DateTimeOffset.UtcNow),
cancellationToken)
.ConfigureAwait(false);
await AppendAuditAsync(keyId, "dashboard-create-key", null, cancellationToken).ConfigureAwait(false);
return DashboardApiKeyManagementResult.Success("API key created. Copy the key now; it will not be shown again.", apiKey);
}
catch (ApiKeyPepperUnavailableException)
{
return DashboardApiKeyManagementResult.Fail("API key pepper is not configured.");
}
catch (SqliteException exception) when (exception.SqliteErrorCode == 19)
{
return DashboardApiKeyManagementResult.Fail("An API key with that id already exists.");
}
}
public async Task<DashboardApiKeyManagementResult> RevokeAsync(
ClaimsPrincipal user,
string keyId,
CancellationToken cancellationToken)
{
if (!CanManage(user))
{
return DashboardApiKeyManagementResult.Fail(UnauthorizedMessage);
}
string? validation = ValidateKeyId(keyId);
if (validation is not null)
{
return DashboardApiKeyManagementResult.Fail(validation);
}
string normalizedKeyId = keyId.Trim();
bool revoked = await adminStore
.RevokeAsync(normalizedKeyId, DateTimeOffset.UtcNow, cancellationToken)
.ConfigureAwait(false);
await AppendAuditAsync(
normalizedKeyId,
"dashboard-revoke-key",
revoked ? "revoked" : "not-found-or-already-revoked",
cancellationToken)
.ConfigureAwait(false);
return revoked
? DashboardApiKeyManagementResult.Success("API key revoked.")
: DashboardApiKeyManagementResult.Fail("API key was not found or is already revoked.");
}
public async Task<DashboardApiKeyManagementResult> RotateAsync(
ClaimsPrincipal user,
string keyId,
CancellationToken cancellationToken)
{
if (!CanManage(user))
{
return DashboardApiKeyManagementResult.Fail(UnauthorizedMessage);
}
string? validation = ValidateKeyId(keyId);
if (validation is not null)
{
return DashboardApiKeyManagementResult.Fail(validation);
}
string normalizedKeyId = keyId.Trim();
string secret = ApiKeySecretGenerator.Generate();
string apiKey = FormatApiKey(normalizedKeyId, secret);
try
{
bool rotated = await adminStore
.RotateAsync(normalizedKeyId, hasher.HashSecret(secret), DateTimeOffset.UtcNow, cancellationToken)
.ConfigureAwait(false);
await AppendAuditAsync(
normalizedKeyId,
"dashboard-rotate-key",
rotated ? "rotated" : "not-found",
cancellationToken)
.ConfigureAwait(false);
return rotated
? DashboardApiKeyManagementResult.Success("API key rotated. Copy the key now; it will not be shown again.", apiKey)
: DashboardApiKeyManagementResult.Fail("API key was not found.");
}
catch (ApiKeyPepperUnavailableException)
{
return DashboardApiKeyManagementResult.Fail("API key pepper is not configured.");
}
}
private async Task AppendAuditAsync(
string? keyId,
string eventType,
string? details,
CancellationToken cancellationToken)
{
await auditStore.AppendAsync(
new ApiKeyAuditEntry(
KeyId: keyId,
EventType: eventType,
RemoteAddress: httpContextAccessor.HttpContext?.Connection.RemoteIpAddress?.ToString(),
Details: details),
cancellationToken)
.ConfigureAwait(false);
}
private static string? ValidateCreateRequest(DashboardApiKeyManagementRequest request)
{
string? keyIdValidation = ValidateKeyId(request.KeyId);
if (keyIdValidation is not null)
{
return keyIdValidation;
}
if (string.IsNullOrWhiteSpace(request.DisplayName))
{
return "Display name is required.";
}
string[] unknownScopes = request.Scopes
.Where(scope => !GatewayScopes.IsKnown(scope))
.ToArray();
if (unknownScopes.Length > 0)
{
return $"Unknown scope(s): {string.Join(", ", unknownScopes)}. "
+ $"Valid scopes are: {string.Join(", ", GatewayScopes.All)}.";
}
return null;
}
private static string? ValidateKeyId(string keyId)
{
if (string.IsNullOrWhiteSpace(keyId))
{
return "API key id is required.";
}
return keyId.Trim().All(character =>
char.IsAsciiLetterOrDigit(character)
|| character is '.' or '-')
? null
: "API key id may contain only letters, numbers, periods, and hyphens.";
}
private static string FormatApiKey(string keyId, string secret)
{
return $"mxgw_{keyId}_{secret}";
}
}
@@ -0,0 +1,12 @@
using MxGateway.Server.Security.Authentication;
namespace MxGateway.Server.Dashboard;
public sealed record DashboardApiKeySummary(
string KeyId,
string DisplayName,
IReadOnlySet<string> Scopes,
ApiKeyConstraints Constraints,
DateTimeOffset CreatedUtc,
DateTimeOffset? LastUsedUtc,
DateTimeOffset? RevokedUtc);
@@ -2,6 +2,7 @@ using System.Security.Claims;
using System.Text;
using Microsoft.Extensions.Options;
using MxGateway.Server.Configuration;
using MxGateway.Server.Security.Authorization;
using Novell.Directory.Ldap;
namespace MxGateway.Server.Dashboard;
@@ -238,10 +239,16 @@ public sealed class DashboardAuthenticator(
string displayName,
IEnumerable<string> groups)
{
// CreatePrincipal is reached only after IsMemberOfRequiredGroup passed,
// so the authenticated user is authorized for the dashboard. Emit the
// admin scope claim that DashboardAuthorizationHandler checks when
// Dashboard:RequireAdminScope is enabled — without it, every LDAP login
// would be denied once route-level authorization is enforced.
List<Claim> claims =
[
new Claim(ClaimTypes.NameIdentifier, username),
new Claim(ClaimTypes.Name, displayName)
new Claim(ClaimTypes.Name, displayName),
new Claim(DashboardAuthenticationDefaults.ScopeClaimType, GatewayScopes.Admin)
];
claims.AddRange(groups.Select(group => new Claim(
@@ -0,0 +1,28 @@
using Microsoft.Data.SqlClient;
namespace MxGateway.Server.Dashboard;
public static class DashboardConnectionStringDisplay
{
public static string GalaxyRepositoryConnectionString(string connectionString)
{
try
{
SqlConnectionStringBuilder builder = new(connectionString);
SqlConnectionStringBuilder display = new()
{
DataSource = builder.DataSource,
InitialCatalog = builder.InitialCatalog,
IntegratedSecurity = builder.IntegratedSecurity,
Encrypt = builder.Encrypt,
TrustServerCertificate = builder.TrustServerCertificate,
};
return display.ConnectionString;
}
catch (ArgumentException)
{
return "[invalid connection string]";
}
}
}
@@ -52,8 +52,13 @@ public static class DashboardEndpointRouteBuilderExtensions
.AllowAnonymous()
.WithName("DashboardAccessDenied");
// Every dashboard Razor component requires an authorized session. The
// login/logout/denied endpoints above opt out via AllowAnonymous(); an
// unauthenticated request to a component route is challenged by the
// cookie scheme and redirected to the login page.
dashboard.MapRazorComponents<App>()
.AddInteractiveServerRenderMode();
.AddInteractiveServerRenderMode()
.RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy);
return endpoints;
}
@@ -169,11 +174,17 @@ public static class DashboardEndpointRouteBuilderExtensions
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>{HtmlEncoder.Default.Encode(title)}</title>
<link rel="stylesheet" href="/lib/bootstrap/css/bootstrap.min.css" />
<link rel="stylesheet" href="/css/theme.css" />
<link rel="stylesheet" href="/css/dashboard.css" />
</head>
<body class="dashboard-body">
<main class="container py-5">
<h1 class="h3 mb-4">{HtmlEncoder.Default.Encode(title)}</h1>
<header class="app-bar">
<span class="brand"><span class="mark">&#9646;</span> MXAccess Gateway</span>
</header>
<main class="page">
<div class="dashboard-page-header">
<h1>{HtmlEncoder.Default.Encode(title)}</h1>
</div>
{body}
</main>
<script src="/lib/bootstrap/js/bootstrap.bundle.min.js"></script>
@@ -0,0 +1,23 @@
using System.Security.Claims;
namespace MxGateway.Server.Dashboard;
public interface IDashboardApiKeyManagementService
{
bool CanManage(ClaimsPrincipal user);
Task<DashboardApiKeyManagementResult> CreateAsync(
ClaimsPrincipal user,
DashboardApiKeyManagementRequest request,
CancellationToken cancellationToken);
Task<DashboardApiKeyManagementResult> RevokeAsync(
ClaimsPrincipal user,
string keyId,
CancellationToken cancellationToken);
Task<DashboardApiKeyManagementResult> RotateAsync(
ClaimsPrincipal user,
string keyId,
CancellationToken cancellationToken);
}
@@ -0,0 +1,44 @@
using System.Text;
using System.Text.RegularExpressions;
namespace MxGateway.Server.Galaxy;
public static class GalaxyGlobMatcher
{
public static bool IsMatch(string value, string glob)
{
if (string.IsNullOrWhiteSpace(glob))
{
return true;
}
return Regex.IsMatch(
value ?? string.Empty,
BuildRegex(glob),
RegexOptions.CultureInvariant | RegexOptions.IgnoreCase,
TimeSpan.FromMilliseconds(100));
}
private static string BuildRegex(string glob)
{
StringBuilder builder = new("^", glob.Length + 2);
foreach (char character in glob)
{
switch (character)
{
case '*':
builder.Append(".*");
break;
case '?':
builder.Append('.');
break;
default:
builder.Append(Regex.Escape(character.ToString()));
break;
}
}
builder.Append('$');
return builder.ToString();
}
}
@@ -1,5 +1,4 @@
using Google.Protobuf.WellKnownTypes;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;
using MxGateway.Contracts.Proto.Galaxy;
using MxGateway.Server.Dashboard;
@@ -181,8 +180,13 @@ public sealed class GalaxyHierarchyCache : IGalaxyHierarchyCache
{
throw;
}
catch (Exception exception) when (exception is SqlException or InvalidOperationException)
catch (Exception exception)
{
// Catch every non-cancellation failure — not just SqlException /
// InvalidOperationException. A TimeoutException or Win32Exception
// from connection establishment, or another DbException subtype,
// must still degrade gracefully to Stale/Unavailable and complete
// _firstLoad rather than escape and fault the refresh BackgroundService.
_logger?.LogWarning(exception, "Galaxy hierarchy cache refresh failed.");
GalaxyHierarchyCacheEntry failed = previous with
{
@@ -0,0 +1,106 @@
using MxGateway.Contracts.Proto.Galaxy;
namespace MxGateway.Server.Galaxy;
public sealed class GalaxyHierarchyIndex
{
private GalaxyHierarchyIndex(
IReadOnlyList<GalaxyObjectView> objectViews,
IReadOnlyDictionary<int, GalaxyObjectView> objectViewsById,
IReadOnlyDictionary<string, GalaxyTagLookup> tagsByAddress)
{
ObjectViews = objectViews;
ObjectViewsById = objectViewsById;
TagsByAddress = tagsByAddress;
}
public static GalaxyHierarchyIndex Empty { get; } = new(
Array.Empty<GalaxyObjectView>(),
new Dictionary<int, GalaxyObjectView>(),
new Dictionary<string, GalaxyTagLookup>(StringComparer.OrdinalIgnoreCase));
public IReadOnlyList<GalaxyObjectView> ObjectViews { get; }
public IReadOnlyDictionary<int, GalaxyObjectView> ObjectViewsById { get; }
public IReadOnlyDictionary<string, GalaxyTagLookup> TagsByAddress { get; }
public static GalaxyHierarchyIndex Build(IReadOnlyList<GalaxyObject> objects)
{
if (objects.Count == 0)
{
return Empty;
}
Dictionary<int, GalaxyObject> objectsById = new();
foreach (GalaxyObject obj in objects)
{
objectsById.TryAdd(obj.GobjectId, obj);
}
List<GalaxyObjectView> views = new(objects.Count);
Dictionary<int, GalaxyObjectView> viewsById = new();
Dictionary<string, GalaxyTagLookup> tagsByAddress = new(StringComparer.OrdinalIgnoreCase);
foreach (GalaxyObject obj in objects)
{
string path = BuildContainedPath(obj, objectsById);
int depth = string.IsNullOrWhiteSpace(path) ? 0 : path.Count(character => character == '/');
GalaxyObjectView view = new(obj, path, depth);
views.Add(view);
viewsById.TryAdd(obj.GobjectId, view);
if (!string.IsNullOrWhiteSpace(obj.TagName))
{
tagsByAddress.TryAdd(obj.TagName, new GalaxyTagLookup(obj, Attribute: null, path));
}
foreach (GalaxyAttribute attribute in obj.Attributes)
{
if (!string.IsNullOrWhiteSpace(attribute.FullTagReference))
{
tagsByAddress.TryAdd(attribute.FullTagReference, new GalaxyTagLookup(obj, attribute, path));
}
}
}
return new GalaxyHierarchyIndex(
views,
viewsById,
tagsByAddress);
}
private static string BuildContainedPath(
GalaxyObject obj,
IReadOnlyDictionary<int, GalaxyObject> objectsById)
{
Stack<string> names = new();
HashSet<int> seen = [];
GalaxyObject? current = obj;
while (current is not null && seen.Add(current.GobjectId))
{
names.Push(ResolvePathSegment(current));
current = current.ParentGobjectId != 0
&& objectsById.TryGetValue(current.ParentGobjectId, out GalaxyObject? parent)
? parent
: null;
}
return string.Join('/', names.Where(name => !string.IsNullOrWhiteSpace(name)));
}
private static string ResolvePathSegment(GalaxyObject obj)
{
if (!string.IsNullOrWhiteSpace(obj.ContainedName))
{
return obj.ContainedName;
}
if (!string.IsNullOrWhiteSpace(obj.BrowseName))
{
return obj.BrowseName;
}
return obj.TagName;
}
}
@@ -0,0 +1,246 @@
using System.Security.Cryptography;
using System.Text;
using Grpc.Core;
using MxGateway.Contracts.Proto.Galaxy;
namespace MxGateway.Server.Galaxy;
public static class GalaxyHierarchyProjector
{
public static GalaxyHierarchyQueryResult Project(
GalaxyHierarchyCacheEntry entry,
DiscoverHierarchyRequest request,
IReadOnlyList<string>? browseSubtreeGlobs = null)
{
return Project(
entry,
request,
browseSubtreeGlobs,
offset: 0,
pageSize: int.MaxValue);
}
public static GalaxyHierarchyQueryResult Project(
GalaxyHierarchyCacheEntry entry,
DiscoverHierarchyRequest request,
IReadOnlyList<string>? browseSubtreeGlobs,
int offset,
int pageSize)
{
ArgumentNullException.ThrowIfNull(entry);
ArgumentNullException.ThrowIfNull(request);
if (offset < 0)
{
throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be greater than or equal to zero.");
}
if (pageSize <= 0)
{
throw new ArgumentOutOfRangeException(nameof(pageSize), pageSize, "Page size must be greater than zero.");
}
IReadOnlyList<GalaxyObjectView> views = entry.Index.ObjectViews;
GalaxyObjectView? root = ResolveRoot(request, views);
int? maxDepth = request.MaxDepth;
if (maxDepth < 0)
{
throw new RpcException(new Status(
StatusCode.InvalidArgument,
"DiscoverHierarchy max_depth must be greater than or equal to zero when provided."));
}
List<GalaxyObject> page = [];
int matchedCount = 0;
bool includeAttributes = IncludeAttributes(request);
foreach (GalaxyObjectView view in views)
{
if (!MatchesRoot(view, root, maxDepth)
|| !MatchesBrowseSubtrees(view, browseSubtreeGlobs)
|| !MatchesFilters(view.Object, request))
{
continue;
}
if (matchedCount >= offset && page.Count < pageSize)
{
page.Add(CloneObject(view.Object, includeAttributes));
}
matchedCount++;
}
return new GalaxyHierarchyQueryResult(
page,
matchedCount,
ComputeFilterSignature(request, browseSubtreeGlobs));
}
public static GalaxyObject? FindObjectForTag(
GalaxyHierarchyCacheEntry entry,
string tagAddress)
{
if (string.IsNullOrWhiteSpace(tagAddress))
{
return null;
}
return entry.Index.TagsByAddress.TryGetValue(tagAddress, out GalaxyTagLookup? lookup)
? lookup.Object
: null;
}
public static GalaxyAttribute? FindAttributeForTag(
GalaxyHierarchyCacheEntry entry,
string tagAddress)
{
if (string.IsNullOrWhiteSpace(tagAddress))
{
return null;
}
return entry.Index.TagsByAddress.TryGetValue(tagAddress, out GalaxyTagLookup? lookup)
? lookup.Attribute
: null;
}
public static string GetContainedPath(
GalaxyHierarchyCacheEntry entry,
int gobjectId)
{
return entry.Index.ObjectViewsById.TryGetValue(gobjectId, out GalaxyObjectView? view)
? view.ContainedPath
: string.Empty;
}
private static GalaxyObjectView? ResolveRoot(
DiscoverHierarchyRequest request,
IReadOnlyList<GalaxyObjectView> views)
{
GalaxyObjectView? root = request.RootCase switch
{
DiscoverHierarchyRequest.RootOneofCase.None => null,
DiscoverHierarchyRequest.RootOneofCase.RootGobjectId => views.FirstOrDefault(
view => view.Object.GobjectId == request.RootGobjectId),
DiscoverHierarchyRequest.RootOneofCase.RootTagName => views.FirstOrDefault(
view => string.Equals(view.Object.TagName, request.RootTagName, StringComparison.OrdinalIgnoreCase)),
DiscoverHierarchyRequest.RootOneofCase.RootContainedPath => views.FirstOrDefault(
view => string.Equals(view.ContainedPath, request.RootContainedPath, StringComparison.OrdinalIgnoreCase)),
_ => null,
};
if (request.RootCase != DiscoverHierarchyRequest.RootOneofCase.None && root is null)
{
throw new RpcException(new Status(StatusCode.NotFound, "DiscoverHierarchy root was not found."));
}
return root;
}
private static bool MatchesRoot(
GalaxyObjectView view,
GalaxyObjectView? root,
int? maxDepth)
{
if (root is null)
{
return true;
}
bool isRoot = view.Object.GobjectId == root.Object.GobjectId;
bool isDescendant = view.ContainedPath.StartsWith(root.ContainedPath + "/", StringComparison.OrdinalIgnoreCase);
if (!isRoot && !isDescendant)
{
return false;
}
return maxDepth is null || view.Depth - root.Depth <= maxDepth.Value;
}
private static bool MatchesBrowseSubtrees(
GalaxyObjectView view,
IReadOnlyList<string>? browseSubtreeGlobs)
{
return browseSubtreeGlobs is null
|| browseSubtreeGlobs.Count == 0
|| browseSubtreeGlobs.Any(glob => GalaxyGlobMatcher.IsMatch(view.ContainedPath, glob));
}
private static bool MatchesFilters(
GalaxyObject obj,
DiscoverHierarchyRequest request)
{
if (request.CategoryIds.Count > 0 && !request.CategoryIds.Contains(obj.CategoryId))
{
return false;
}
foreach (string templateFilter in request.TemplateChainContains)
{
if (!obj.TemplateChain.Any(template => template.Contains(templateFilter, StringComparison.OrdinalIgnoreCase)))
{
return false;
}
}
if (!string.IsNullOrWhiteSpace(request.TagNameGlob)
&& !GalaxyGlobMatcher.IsMatch(obj.TagName, request.TagNameGlob))
{
return false;
}
if (request.AlarmBearingOnly && !obj.Attributes.Any(attribute => attribute.IsAlarm))
{
return false;
}
if (request.HistorizedOnly && !obj.Attributes.Any(attribute => attribute.IsHistorized))
{
return false;
}
return true;
}
private static bool IncludeAttributes(DiscoverHierarchyRequest request)
{
return !request.HasIncludeAttributes || request.IncludeAttributes;
}
private static GalaxyObject CloneObject(GalaxyObject source, bool includeAttributes)
{
GalaxyObject clone = source.Clone();
if (!includeAttributes)
{
clone.Attributes.Clear();
}
return clone;
}
public static string ComputeFilterSignature(
DiscoverHierarchyRequest request,
IReadOnlyList<string>? browseSubtreeGlobs)
{
StringBuilder builder = new();
builder.Append("root=").Append(request.RootCase).Append('|');
builder.Append(request.RootCase switch
{
DiscoverHierarchyRequest.RootOneofCase.RootGobjectId => request.RootGobjectId.ToString(
System.Globalization.CultureInfo.InvariantCulture),
DiscoverHierarchyRequest.RootOneofCase.RootTagName => request.RootTagName,
DiscoverHierarchyRequest.RootOneofCase.RootContainedPath => request.RootContainedPath,
_ => string.Empty,
});
builder.Append("|max=").Append(request.MaxDepth?.ToString(System.Globalization.CultureInfo.InvariantCulture) ?? "");
builder.Append("|cat=").AppendJoin(',', request.CategoryIds.Order());
builder.Append("|tpl=").AppendJoin(',', request.TemplateChainContains.Order(StringComparer.OrdinalIgnoreCase));
builder.Append("|glob=").Append(request.TagNameGlob);
builder.Append("|attrs=").Append(request.HasIncludeAttributes ? request.IncludeAttributes.ToString() : "unset");
builder.Append("|alarm=").Append(request.AlarmBearingOnly);
builder.Append("|hist=").Append(request.HistorizedOnly);
builder.Append("|browse=").AppendJoin(',', (browseSubtreeGlobs ?? Array.Empty<string>()).Order(StringComparer.OrdinalIgnoreCase));
byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
return Convert.ToHexString(hash, 0, 12);
}
}
@@ -0,0 +1,8 @@
using MxGateway.Contracts.Proto.Galaxy;
namespace MxGateway.Server.Galaxy;
public sealed record GalaxyHierarchyQueryResult(
IReadOnlyList<GalaxyObject> Objects,
int TotalObjectCount,
string FilterSignature);
@@ -26,6 +26,15 @@ public sealed class GalaxyHierarchyRefreshService(
{
return;
}
catch (Exception exception)
{
// A transient first-load failure (e.g. a TimeoutException or
// Win32Exception from connection establishment, or a DbException
// subtype the cache does not catch) must not fault this
// BackgroundService and stop the whole gateway. The cache records
// its own Unavailable/Stale status; the periodic tick below retries.
logger.LogWarning(exception, "Initial Galaxy hierarchy cache load failed; will retry on the refresh interval.");
}
using PeriodicTimer timer = new(interval, _timeProvider);
try
@@ -0,0 +1,8 @@
using MxGateway.Contracts.Proto.Galaxy;
namespace MxGateway.Server.Galaxy;
public sealed record GalaxyObjectView(
GalaxyObject Object,
string ContainedPath,
int Depth);
@@ -0,0 +1,8 @@
using MxGateway.Contracts.Proto.Galaxy;
namespace MxGateway.Server.Galaxy;
public sealed record GalaxyTagLookup(
GalaxyObject Object,
GalaxyAttribute? Attribute,
string ContainedPath);
@@ -0,0 +1,4 @@
using System.Runtime.CompilerServices;
[assembly: InternalsVisibleTo("MxGateway.Tests")]
[assembly: InternalsVisibleTo("MxGateway.IntegrationTests")]
@@ -1,3 +1,5 @@
using MxGateway.Server.Security.Authorization;
namespace MxGateway.Server.Security.Authentication;
public static class ApiKeyAdminCommandLineParser
@@ -95,6 +97,12 @@ public static class ApiKeyAdminCommandLineParser
return ApiKeyAdminParseResult.Fail(validationError);
}
string? scopeError = ValidateScopes(kind, scopes);
if (scopeError is not null)
{
return ApiKeyAdminParseResult.Fail(scopeError);
}
return ApiKeyAdminParseResult.Success(new ApiKeyAdminCommand(
Kind: kind,
Json: json,
@@ -152,6 +160,23 @@ public static class ApiKeyAdminCommandLineParser
return null;
}
private static string? ValidateScopes(ApiKeyAdminCommandKind kind, IReadOnlySet<string> scopes)
{
if (kind != ApiKeyAdminCommandKind.CreateKey)
{
return null;
}
string[] unknown = scopes.Where(scope => !GatewayScopes.IsKnown(scope)).ToArray();
if (unknown.Length == 0)
{
return null;
}
return $"Unknown scope(s): {string.Join(", ", unknown)}. "
+ $"Valid scopes are: {string.Join(", ", GatewayScopes.All)}.";
}
private static string KindName(ApiKeyAdminCommandKind kind)
{
return kind switch
@@ -0,0 +1,28 @@
using System.Text.Json;
namespace MxGateway.Server.Security.Authentication;
public static class ApiKeyConstraintSerializer
{
private static readonly JsonSerializerOptions JsonOptions = new()
{
PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower,
WriteIndented = false,
};
public static string? Serialize(ApiKeyConstraints constraints)
{
ArgumentNullException.ThrowIfNull(constraints);
return constraints.IsEmpty ? null : JsonSerializer.Serialize(constraints, JsonOptions);
}
public static ApiKeyConstraints Deserialize(string? json)
{
if (string.IsNullOrWhiteSpace(json))
{
return ApiKeyConstraints.Empty;
}
return JsonSerializer.Deserialize<ApiKeyConstraints>(json, JsonOptions) ?? ApiKeyConstraints.Empty;
}
}
@@ -0,0 +1,43 @@
namespace MxGateway.Server.Security.Authentication;
public sealed record ApiKeyConstraints(
IReadOnlyList<string> ReadSubtrees,
IReadOnlyList<string> WriteSubtrees,
IReadOnlyList<string> ReadTagGlobs,
IReadOnlyList<string> WriteTagGlobs,
int? MaxWriteClassification,
IReadOnlyList<string> BrowseSubtrees,
bool ReadAlarmOnly,
bool ReadHistorizedOnly)
{
public static ApiKeyConstraints Empty { get; } = new(
ReadSubtrees: Array.Empty<string>(),
WriteSubtrees: Array.Empty<string>(),
ReadTagGlobs: Array.Empty<string>(),
WriteTagGlobs: Array.Empty<string>(),
MaxWriteClassification: null,
BrowseSubtrees: Array.Empty<string>(),
ReadAlarmOnly: false,
ReadHistorizedOnly: false);
public bool IsEmpty =>
ReadSubtrees.Count == 0
&& WriteSubtrees.Count == 0
&& ReadTagGlobs.Count == 0
&& WriteTagGlobs.Count == 0
&& MaxWriteClassification is null
&& BrowseSubtrees.Count == 0
&& !ReadAlarmOnly
&& !ReadHistorizedOnly;
public bool HasReadConstraints =>
ReadSubtrees.Count > 0
|| ReadTagGlobs.Count > 0
|| ReadAlarmOnly
|| ReadHistorizedOnly;
public bool HasWriteConstraints =>
WriteSubtrees.Count > 0
|| WriteTagGlobs.Count > 0
|| MaxWriteClassification is not null;
}

Some files were not shown because too many files have changed in this diff Show More