Commit Graph

21 Commits

Author SHA1 Message Date
Joseph Doherty e4fbbb541a Resolve Client.Python-003, -005, -009 code-review findings
Client.Python-003: stream_events_raw and query_active_alarms passed `timeout`
to the stub with no TypeError fallback, unlike _unary. Both now route through
a shared _open_stream helper that strips `timeout` on TypeError.

Client.Python-005: discover_hierarchy buffered the entire Galaxy hierarchy in
memory. Added GalaxyRepositoryClient.iter_hierarchy, a lazy async generator
yielding objects page-by-page; discover_hierarchy is now a thin wrapper that
preserves its list contract. README documents iter_hierarchy.

Client.Python-009: added regression coverage for previously untested paths —
write2/add_item2 request shape, the MAX_BULK_ITEMS boundary, the None-argument
TypeError guards, TLS ca_file reading, and the non-auth map_rpc_error fallthrough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:16 -04:00
Joseph Doherty f13f35bc79 Resolve IntegrationTests-003..006 code-review findings
IntegrationTests-003: the live MXAccess smoke test asserted on the first
streamed event, which a registration/quality bootstrap event could occupy.
The recording writer now waits for the first event matching a predicate
(Family == OnDataChange).

IntegrationTests-004: the cleanup `finally` could throw and mask an original
assertion failure. Shutdown now routes through a helper that logs cleanup
exceptions instead of propagating them.

IntegrationTests-005: added live MXAccess parity tests — a Write round-trip
to an advised item, and an invalid-handle command surfacing the MXAccess
failure without a transport fault.

IntegrationTests-006: added live LDAP failure-path tests — wrong password
(no password leak), unknown username, and server-unreachable.

docs/GatewayTesting.md updated to describe the new cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:11 -04:00
Joseph Doherty 18ce2922e2 Resolve Worker.Tests-003..007 code-review findings
Worker.Tests-003: removed the wall-clock `Elapsed < 2s` assertion from
InvokeAsync_WakesIdlePumpForQueuedCommand; the awaited completion against a
30s idle period already proves the wake event drove dispatch.

Worker.Tests-004: MxAccessStaSession.Dispose now joins the alarm poll task
after cancelling the CTS (consistent with ShutdownGracefullyAsync), and
Dispose_StopsAlarmPollLoop asserts deterministically instead of via Task.Delay.

Worker.Tests-005: undisposed MemoryStream instances across the frame-protocol
and pipe-session tests are now `using` declarations.

Worker.Tests-006: Dispose_StopsAlarmPollLoop now constructs MxAccessStaSession
with `using` so a failed assertion cannot leak the STA poll loop.

Worker.Tests-007: docs/WorkerFrameProtocol.md verification section corrected
to target MxGateway.Worker.Tests / MxGateway.Worker with -p:Platform=x86.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:01 -04:00
Joseph Doherty 5ade3f4f48 Resolve Tests-003, -004, -005, -006 code-review findings
Tests-003: temp auth-DB directories leaked under %TEMP%. Added the
TempDatabaseDirectory IDisposable helper (clears the Sqlite connection pool,
then recursively deletes); SqliteAuthStoreTests and ApiKeyAdminCliRunnerTests
now dispose every directory they create.

Tests-004: added end-to-end coverage composing the real authorization
interceptor in front of the real MxAccessGatewayService, plus scope-resolver
tests confirming an unmapped request type fails closed to the admin scope.

Tests-005: added coverage for a worker faulting mid-command — a pipe
disconnect and a worker fault while an InvokeAsync is in flight both fail the
pending invoke. No product change needed.

Tests-006 (re-triaged): the flaky ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess
is a test race, not a product bug — the kill runs synchronously inside
SetFaulted. Rewrote it to await FakeWorkerProcess exit deterministically, and
replaced fixed Task.Delay timing in the late-reply and heartbeat tests with
FIFO ordering and an injected ManualTimeProvider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:44:55 -04:00
Joseph Doherty 98f9b7792b Regenerate code-reviews index after Medium findings Batch A
Reflects resolution of Server-002/004/005/006, Worker-004..008,
Client.Dotnet-001/002/003, Client.Go-002/003, Client.Java-001..005.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:32:02 -04:00
Joseph Doherty ff41556b9a Resolve Client.Java-001..005 code-review findings
Client.Java-001: redactApiKey echoed the last 4 secret characters. It now
keeps only the non-secret mxgw_<key-id>_ prefix plus ***; non-gateway-shaped
tokens return <redacted>.

Client.Java-002: a close() after a queue-overflow could wipe the enqueued
overflow exception. Terminal transitions are now serialized through a single
guarded terminate() — first terminal condition wins.

Client.Java-003: openSession never read gateway_protocol_version. Both
openSession paths now call ensureGatewayProtocolCompatible, rejecting a
non-zero mismatch and accepting unset (0) for older gateways.

Client.Java-004: register/addItem/addItem2 fell back to a return_value that
silently yields 0 when unset. The fallback is now guarded by hasReturnValue()
and throws on a protocol violation.

Client.Java-005: close() in try-with-resources could mask the body exception
when the CloseSession RPC failed. close() now catches and logs the
close-time failure; closeRaw() still surfaces it for callers that want it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:46 -04:00
Joseph Doherty f88a029ecc Resolve Client.Go-002, -003 code-review findings
Client.Go-002: the Events/EventsAfter compatibility path silently dropped
events when the 16-slot results channel filled — it cancelled the stream and
closed the channel with no error delivered. sendEventResult now evicts an
old buffered event and delivers a terminal EventResult carrying the new
exported ErrEventBufferOverflow before close, so the overflow is observable.

Client.Go-003: parseInt32List panicked on a malformed -item-handles token,
crashing the CLI with a stack trace. It now returns an error that
runUnsubscribeBulk propagates, exiting 2 with a clean message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:36 -04:00
Joseph Doherty 8023eccfa6 Resolve Client.Dotnet-001, -002, -003 code-review findings
Client.Dotnet-001: MapRpcException typed only Unauthenticated and
PermissionDenied; every other gRPC status collapsed to an untyped exception
with the status code discarded. Added a nullable StatusCode to
MxGatewayException, extracted the duplicated mappers into a shared
RpcExceptionMapper that records the code for every status, and documented it.

Client.Dotnet-002: the production retry branch (MxGatewayException wrapping
RpcException) was never exercised. FakeGatewayTransport gained a
MapTransportExceptions mode that runs thrown RpcExceptions through
RpcExceptionMapper exactly as the production transport does.

Client.Dotnet-003: MxGatewaySession.DisposeAsync disposed _closeLock while a
concurrent CloseAsync could be parked in WaitAsync. DisposeAsync now drains
in-flight CloseAsync callers before disposing the semaphore; the client's
_disposed flag is accessed via Interlocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:33 -04:00
Joseph Doherty 54325343bd Resolve Worker-004, -005, -006, -007, -008 code-review findings
Worker-004: post-watchdog-fault heartbeats reported a non-faulted state.
ReportWatchdogFaultIfNeededAsync now sets _state = Faulted before writing
the StaHung fault.

Worker-005 (re-triaged): the cited OnPoll site was removed by Worker-001;
the real silent-failure bug was in MxAccessStaSession.RunAlarmPollLoopAsync,
which caught only graceful-stop exceptions. A failing PollOnce now records a
WorkerFault on the event queue instead of vanishing on a non-awaited task.

Worker-006: RunAsync's finally skipped runtime disposal when shutdown timed
out, leaking the STA thread and COM object. It now always disposes
(MxAccessStaSession.Dispose is idempotent and bounded).

Worker-007 (re-triaged): replaced MxAccessComServer's Type.InvokeMember
reflection fallback with an IMxAccessServer fast path plus typed
ILMXProxyServer* casts; a non-conforming object now fails fast.

Worker-008: alarm consumer STA affinity was unenforced. MxAccessStaSession
records the alarm consumer's STA thread id and asserts every PollOnce runs
on it via a unit-testable guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:23 -04:00
Joseph Doherty 1d9e3afadd Resolve Server-002, -004, -005, -006 code-review findings
Server-002: the gateway never terminated leftover MxGateway.Worker.exe
processes at startup, contradicting gateway.md and CLAUDE.md. Added
IRunningProcessInspector/SystemRunningProcessInspector, OrphanWorkerTerminator,
and OrphanWorkerCleanupHostedService (best-effort, runs before sessions are
accepted); updated gateway.md to describe the implemented behavior.

Server-004: API-key scopes were persisted verbatim with no validation. Added
GatewayScopes.All/IsKnown; the CLI parser and dashboard create path now
reject unknown scope strings.

Server-005: a non-SqlException/InvalidOperationException fault on the initial
Galaxy hierarchy load faulted the BackgroundService. ExecuteAsync now catches
all non-cancellation exceptions on first load and RefreshCoreAsync broadens
its catch so the cache records Stale/Unavailable instead.

Server-006: OpenSessionAsync incremented the open-sessions gauge before
alarm auto-subscribe; an auto-subscribe failure leaked the gauge. The catch
path now calls SessionRemoved() when the gauge was incremented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:10 -04:00
Joseph Doherty 5e795aeeb8 Regenerate code-reviews index after High/Critical resolution batch
Reflects the resolution of Tests-001/002, IntegrationTests-001/002,
Client.Go-001, Worker-001/002/003 and Worker.Tests-001/002.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:08:06 -04:00
Joseph Doherty 1b4dcf32d5 Resolve Worker.Tests-001 and Worker.Tests-002 code-review findings
Worker.Tests-001: StaMessagePump had no direct unit test. Added
Sta/StaMessagePumpTests.cs — 8 STA-thread facts covering WaitForWorkOrMessages
(wake-event signalled before/during the wait, timeout expiry, zero-timeout
fast path, the QS_ALLINPUT posted-message wake path) and PumpPendingMessages
drain counting.

Worker.Tests-002: no test drove a COM event through the integrated
sink -> mapper -> queue path. Added MxAccess/MxAccessBaseEventSinkTests.cs —
5 facts driving OnDataChange, OnWriteComplete, OperationComplete and
OnBufferedDataChange through a real MxAccessBaseEventSink + mapper + queue and
asserting the converted WorkerEvent lands in MxAccessEventQueue. The four COM
event handlers were widened private -> internal and InternalsVisibleTo for
MxGateway.Worker.Tests was added, mirroring MxAccessAlarmEventSink's existing
test seam; no worker behavior changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:07:48 -04:00
Joseph Doherty 53e3973209 Resolve Worker-001, Worker-002, Worker-003 code-review findings
Worker-001: WnWrapAlarmConsumer armed a System.Threading.Timer whose OnPoll
callback ran GetXmlCurrentAlarms2 on a thread-pool thread against the
Apartment-threaded wnwrap COM object, which can deadlock on cross-apartment
marshaling. Removed the pollTimer/pollIntervalMs fields, OnPoll, the
poll-interval constructor parameter, and the timer arm/disposal. Polls are
driven externally by the STA via StaRuntime.InvokeAsync(PollOnce).

Worker-002: RunHeartbeatLoopAsync delayed a full HeartbeatInterval before
the first heartbeat. Restructured so the first beat is sent immediately on
entering the loop and the delay applies only between subsequent beats.

Worker-003: ProcessCommandAsync silently returned without a reply when
_state was not a command-serving state after dispatch. Both drop sites now
log a WorkerCommandResultDropped diagnostic with correlation_id via
IWorkerLogger; _state is now volatile.

Three pre-existing tests that asserted strict frame ordering were updated to
tolerate an interleaved first heartbeat (Worker-002 consequence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:59:46 -04:00
Joseph Doherty e967e85973 Resolve Client.Go-001 code-review finding
MxAccessError.Unwrap returned e.Command directly; on the HRESULT-only path
Command is a nil *CommandError, so Unwrap returned a non-nil error wrapping
a typed nil and errors.As bound a nil *CommandError. Unwrap now returns an
untyped nil when Command is nil. Added errors_test.go regression coverage
for the HRESULT-only and populated-Command paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:46:12 -04:00
Joseph Doherty bc55396334 Resolve IntegrationTests-001 and IntegrationTests-002 code-review findings
IntegrationTests-001: documented the live Galaxy Repository test suite and
its MXGATEWAY_RUN_LIVE_GALAXY_TESTS / MXGATEWAY_LIVE_GALAXY_CONN gating in
docs/GatewayTesting.md.

IntegrationTests-002: documented the live LDAP test suite in
docs/GatewayTesting.md and added a concrete "Provisioning the GwAdmin group"
step to glauth.md so DashboardLdapLiveTests' GwAdmin-membership assumption
is reproducible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:46:09 -04:00
Joseph Doherty b381bfcaf1 Resolve Tests-001 and Tests-002 code-review findings
Tests-001: FakeSessionManager.TryGetSession unconditionally synthesized a
session, so Invoke_WhenSessionMissing_ThrowsNotFound did not actually
verify the missing-session path. Added ResolveOnlySeededSessions/SeedSession
to the fake, rewrote the missing-session test, and added seeded-resolution
and alarm-RPC missing-session coverage.

Tests-002: re-triaged. GalaxyRepository issues only constant SQL; filters
are applied in-memory by GalaxyHierarchyProjector/GalaxyGlobMatcher. Kept
as a valid coverage gap and added GalaxyFilterInputSafetyTests exercising
filter/glob input safety directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:46:02 -04:00
Joseph Doherty 2a635c8522 Add code-reviews/prompt.md orchestration prompt
Reusable prompt for working the code-reviews/ backlog: batches one
subagent per module, TDD per finding, per-module commits, regenerates
the index. Adapted to mxaccessgw toolchains and module layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:18:39 -04:00
Joseph Doherty 9082e504a9 Mark Client.Rust findings resolved
All eleven Client.Rust findings are fixed in 0d8a28d; their Status is
now Resolved with the fixing commit recorded. Adds Client.Rust-012 —
an additional clippy::clone_on_copy violation in galaxy.rs found while
verifying that `cargo clippy -- -D warnings` passes — already Resolved
in the same commit. Regenerates code-reviews/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:10:30 -04:00
Joseph Doherty f0a4af62b9 Review the clients/ language clients; mark Server-001/003 resolved
Adds per-module code reviews for the five language clients under
clients/ (Client.Dotnet, Client.Go, Client.Java, Client.Python,
Client.Rust) at commit 3cc53a8 — 53 findings (4 High, 15 Medium,
34 Low; all Open). Extends REVIEW-PROCESS.md so a "module" may also be
a language client under clients/, not only a src/ project.

Marks Server-001 (Critical) and Server-003 (High) Resolved — fixed in
a8aafdf — and regenerates code-reviews/README.md (now 11 modules).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:51:00 -04:00
Joseph Doherty 3cc53a8c69 Harden code-review tooling and align REVIEW-PROCESS.md with mxaccessgw
- regen-readme.py: use `python` not the broken `python3` Store alias in
  the generated note and docstring; --check now also fails when a module
  header's "Open findings" count disagrees with finding statuses or a
  finding has an unrecognised Status (find_inconsistencies)
- REVIEW-PROCESS.md: rewritten for mxaccessgw (was describing ScadaLink)
  — MxGateway.* modules, "mxaccessgw conventions" checklist category,
  gateway.md/docs/ design context, `python` command
- scripts/check-code-reviews-readme.ps1: CI/pre-commit wrapper for
  regen-readme.py --check
- code-reviews/test_regen_readme.py: dependency-free parser tests
- code-reviews/README.md: regenerated

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:36:25 -04:00
Joseph Doherty ae164ea34f Add per-module code review tree under code-reviews/
Set up the code review process scaffolding adapted to mxaccessgw and
record a full per-module review of every src/MxGateway.* project at
commit 6c64030.

- code-reviews/_template/findings.md: per-module findings template
- code-reviews/regen-readme.py: generates README.md from findings.md
  files; --check fails if stale
- code-reviews/<Module>/findings.md: reviews for Contracts, Server,
  Worker, Tests, Worker.Tests, IntegrationTests (74 findings:
  1 Critical, 10 High, 23 Medium, 40 Low; all Open)
- code-reviews/README.md: generated cross-module index
- REVIEW-PROCESS.md: review process document

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:28:50 -04:00