Compare commits

...

6 Commits

Author SHA1 Message Date
Joseph Doherty 98f9b7792b Regenerate code-reviews index after Medium findings Batch A
Reflects resolution of Server-002/004/005/006, Worker-004..008,
Client.Dotnet-001/002/003, Client.Go-002/003, Client.Java-001..005.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:32:02 -04:00
Joseph Doherty ff41556b9a Resolve Client.Java-001..005 code-review findings
Client.Java-001: redactApiKey echoed the last 4 secret characters. It now
keeps only the non-secret mxgw_<key-id>_ prefix plus ***; non-gateway-shaped
tokens return <redacted>.

Client.Java-002: a close() after a queue-overflow could wipe the enqueued
overflow exception. Terminal transitions are now serialized through a single
guarded terminate() — first terminal condition wins.

Client.Java-003: openSession never read gateway_protocol_version. Both
openSession paths now call ensureGatewayProtocolCompatible, rejecting a
non-zero mismatch and accepting unset (0) for older gateways.

Client.Java-004: register/addItem/addItem2 fell back to a return_value that
silently yields 0 when unset. The fallback is now guarded by hasReturnValue()
and throws on a protocol violation.

Client.Java-005: close() in try-with-resources could mask the body exception
when the CloseSession RPC failed. close() now catches and logs the
close-time failure; closeRaw() still surfaces it for callers that want it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:46 -04:00
Joseph Doherty f88a029ecc Resolve Client.Go-002, -003 code-review findings
Client.Go-002: the Events/EventsAfter compatibility path silently dropped
events when the 16-slot results channel filled — it cancelled the stream and
closed the channel with no error delivered. sendEventResult now evicts an
old buffered event and delivers a terminal EventResult carrying the new
exported ErrEventBufferOverflow before close, so the overflow is observable.

Client.Go-003: parseInt32List panicked on a malformed -item-handles token,
crashing the CLI with a stack trace. It now returns an error that
runUnsubscribeBulk propagates, exiting 2 with a clean message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:36 -04:00
Joseph Doherty 8023eccfa6 Resolve Client.Dotnet-001, -002, -003 code-review findings
Client.Dotnet-001: MapRpcException typed only Unauthenticated and
PermissionDenied; every other gRPC status collapsed to an untyped exception
with the status code discarded. Added a nullable StatusCode to
MxGatewayException, extracted the duplicated mappers into a shared
RpcExceptionMapper that records the code for every status, and documented it.

Client.Dotnet-002: the production retry branch (MxGatewayException wrapping
RpcException) was never exercised. FakeGatewayTransport gained a
MapTransportExceptions mode that runs thrown RpcExceptions through
RpcExceptionMapper exactly as the production transport does.

Client.Dotnet-003: MxGatewaySession.DisposeAsync disposed _closeLock while a
concurrent CloseAsync could be parked in WaitAsync. DisposeAsync now drains
in-flight CloseAsync callers before disposing the semaphore; the client's
_disposed flag is accessed via Interlocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:33 -04:00
Joseph Doherty 54325343bd Resolve Worker-004, -005, -006, -007, -008 code-review findings
Worker-004: post-watchdog-fault heartbeats reported a non-faulted state.
ReportWatchdogFaultIfNeededAsync now sets _state = Faulted before writing
the StaHung fault.

Worker-005 (re-triaged): the cited OnPoll site was removed by Worker-001;
the real silent-failure bug was in MxAccessStaSession.RunAlarmPollLoopAsync,
which caught only graceful-stop exceptions. A failing PollOnce now records a
WorkerFault on the event queue instead of vanishing on a non-awaited task.

Worker-006: RunAsync's finally skipped runtime disposal when shutdown timed
out, leaking the STA thread and COM object. It now always disposes
(MxAccessStaSession.Dispose is idempotent and bounded).

Worker-007 (re-triaged): replaced MxAccessComServer's Type.InvokeMember
reflection fallback with an IMxAccessServer fast path plus typed
ILMXProxyServer* casts; a non-conforming object now fails fast.

Worker-008: alarm consumer STA affinity was unenforced. MxAccessStaSession
records the alarm consumer's STA thread id and asserts every PollOnce runs
on it via a unit-testable guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:23 -04:00
Joseph Doherty 1d9e3afadd Resolve Server-002, -004, -005, -006 code-review findings
Server-002: the gateway never terminated leftover MxGateway.Worker.exe
processes at startup, contradicting gateway.md and CLAUDE.md. Added
IRunningProcessInspector/SystemRunningProcessInspector, OrphanWorkerTerminator,
and OrphanWorkerCleanupHostedService (best-effort, runs before sessions are
accepted); updated gateway.md to describe the implemented behavior.

Server-004: API-key scopes were persisted verbatim with no validation. Added
GatewayScopes.All/IsKnown; the CLI parser and dashboard create path now
reject unknown scope strings.

Server-005: a non-SqlException/InvalidOperationException fault on the initial
Galaxy hierarchy load faulted the BackgroundService. ExecuteAsync now catches
all non-cancellation exceptions on first load and RefreshCoreAsync broadens
its catch so the cache records Stale/Unavailable instead.

Server-006: OpenSessionAsync incremented the open-sessions gauge before
alarm auto-subscribe; an auto-subscribe failure leaked the gauge. The catch
path now calls SessionRemoved() when the gauge was incremented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:10 -04:00
55 changed files with 2229 additions and 242 deletions
@@ -91,6 +91,19 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
/// </summary>
public Queue<Exception> CloseSessionExceptions { get; } = new();
/// <summary>
/// Gets or sets a value indicating whether thrown <see cref="RpcException"/>s are mapped
/// to <see cref="MxGatewayException"/> the way the production gRPC transport does. Lets
/// retry tests exercise the wrapped-exception predicate branch that runs in production.
/// </summary>
public bool MapTransportExceptions { get; set; }
/// <summary>
/// Gets or sets an optional hook awaited inside CloseSessionAsync after the call is
/// recorded; lets tests pause a close mid-flight to observe concurrent dispose.
/// </summary>
public Func<Task>? CloseSessionHook { get; set; }
/// <summary>
/// Gets the queue of exceptions to throw from InvokeAsync.
/// </summary>
@@ -108,7 +121,7 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
OpenSessionCalls.Add((request, callOptions));
if (OpenSessionExceptions.TryDequeue(out Exception? exception))
{
throw exception;
throw Translate(exception, callOptions);
}
return Task.FromResult(OpenSessionReply);
@@ -119,17 +132,23 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
/// </summary>
/// <param name="request">The CloseSessionRequest to process.</param>
/// <param name="callOptions">Call options specifying RPC behavior.</param>
public Task<CloseSessionReply> CloseSessionAsync(
public async Task<CloseSessionReply> CloseSessionAsync(
CloseSessionRequest request,
CallOptions callOptions)
{
CloseSessionCalls.Add((request, callOptions));
if (CloseSessionExceptions.TryDequeue(out Exception? exception))
if (CloseSessionHook is not null)
{
throw exception;
await CloseSessionHook().ConfigureAwait(false);
}
return Task.FromResult(CloseSessionReply);
if (CloseSessionExceptions.TryDequeue(out Exception? exception))
{
throw Translate(exception, callOptions);
}
return CloseSessionReply;
}
/// <summary>
@@ -144,7 +163,7 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
InvokeCalls.Add((request, callOptions));
if (InvokeExceptions.TryDequeue(out Exception? exception))
{
throw exception;
throw Translate(exception, callOptions);
}
return Task.FromResult(_invokeReplies.Dequeue());
@@ -239,4 +258,18 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
{
_activeAlarmSnapshots.Add(snapshot);
}
/// <summary>
/// Maps a queued exception the way the production gRPC transport does when
/// <see cref="MapTransportExceptions"/> is set; otherwise returns it unchanged.
/// </summary>
private Exception Translate(Exception exception, CallOptions callOptions)
{
if (MapTransportExceptions && exception is RpcException rpcException)
{
return RpcExceptionMapper.Map(rpcException, callOptions.CancellationToken);
}
return exception;
}
}
@@ -231,6 +231,52 @@ public sealed class MxGatewayClientSessionTests
Assert.Equal("session-fixture", call.Request.SessionId);
}
/// <summary>
/// Verifies that disposing a session while other callers are concurrently inside
/// <see cref="MxGatewaySession.CloseAsync"/> — one holding the close lock and one
/// parked on it — never throws <see cref="ObjectDisposedException"/> into those
/// callers. The close lock must outlive every pending close.
/// </summary>
[Fact]
public async Task DisposeAsync_DoesNotRaceConcurrentCloseAsync()
{
for (int iteration = 0; iteration < 100; iteration++)
{
FakeGatewayTransport transport = CreateTransport();
using SemaphoreSlim firstCloseEntered = new(0, 1);
using SemaphoreSlim releaseFirstClose = new(0, 1);
// The first CloseAsync to reach the transport parks here while holding the
// session's close lock; later callers queue on the lock behind it.
transport.CloseSessionHook = async () =>
{
firstCloseEntered.Release();
await releaseFirstClose.WaitAsync().ConfigureAwait(false);
transport.CloseSessionHook = null;
};
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
// Holder enters CloseAsync, acquires the lock, and parks in the hook.
Task holder = Task.Run(() => session.CloseAsync());
await firstCloseEntered.WaitAsync();
// Waiter is parked on the close lock behind the holder.
Task waiter = Task.Run(() => session.CloseAsync());
// DisposeAsync runs concurrently; it must wait out both callers before
// disposing the close lock rather than tearing it down underneath them.
Task dispose = session.DisposeAsync().AsTask();
releaseFirstClose.Release();
await holder;
await waiter;
await dispose;
}
}
/// <summary>Verifies that invoke retries safe diagnostic commands on transient RPC failure.</summary>
[Fact]
public async Task InvokeAsync_RetriesSafeDiagnosticCommandOnTransientGrpcFailure()
@@ -255,6 +301,35 @@ public sealed class MxGatewayClientSessionTests
Assert.Equal(2, transport.InvokeCalls.Count);
}
/// <summary>
/// Verifies that the retry pipeline still retries when the transport maps the raw
/// <see cref="RpcException"/> to an <see cref="MxGatewayException"/> before it reaches
/// the retry predicate — the wrapped-exception shape that production always produces.
/// </summary>
[Fact]
public async Task InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException()
{
FakeGatewayTransport transport = CreateTransport();
transport.MapTransportExceptions = true;
transport.InvokeExceptions.Enqueue(CreateTransientRpcException());
transport.AddInvokeReply(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.Ping,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
});
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
await session.InvokeAsync(new MxCommandRequest
{
SessionId = session.SessionId,
Command = new MxCommand { Kind = MxCommandKind.Ping, Ping = new PingCommand() },
});
Assert.Equal(2, transport.InvokeCalls.Count);
}
/// <summary>Verifies that open session does not retry on transient RPC failure.</summary>
[Fact]
public async Task OpenSessionAsync_DoesNotRetryTransientGrpcFailure()
@@ -0,0 +1,76 @@
using Grpc.Core;
namespace MxGateway.Client.Tests;
/// <summary>Tests for the shared gRPC-to-native exception mapping used by the transports.</summary>
public sealed class RpcExceptionMapperTests
{
/// <summary>Verifies that an unauthenticated status maps to the authentication exception.</summary>
[Fact]
public void Map_UnauthenticatedStatus_ProducesAuthenticationException()
{
RpcException rpc = new(new Status(StatusCode.Unauthenticated, "no key"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayAuthenticationException authentication =
Assert.IsType<MxGatewayAuthenticationException>(mapped);
Assert.Equal(StatusCode.Unauthenticated, authentication.StatusCode);
}
/// <summary>Verifies that a permission-denied status maps to the authorization exception.</summary>
[Fact]
public void Map_PermissionDeniedStatus_ProducesAuthorizationException()
{
RpcException rpc = new(new Status(StatusCode.PermissionDenied, "missing scope"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayAuthorizationException authorization =
Assert.IsType<MxGatewayAuthorizationException>(mapped);
Assert.Equal(StatusCode.PermissionDenied, authorization.StatusCode);
}
/// <summary>Verifies that a cancelled status maps to OperationCanceledException.</summary>
[Fact]
public void Map_CancelledStatus_ProducesOperationCanceledException()
{
RpcException rpc = new(new Status(StatusCode.Cancelled, "cancelled"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
Assert.IsType<OperationCanceledException>(mapped);
}
/// <summary>
/// Verifies that non-auth statuses surface the originating gRPC status code on the
/// mapped exception so callers can distinguish transient from permanent failures
/// without reflecting into InnerException.
/// </summary>
[Theory]
[InlineData(StatusCode.NotFound)]
[InlineData(StatusCode.InvalidArgument)]
[InlineData(StatusCode.ResourceExhausted)]
[InlineData(StatusCode.FailedPrecondition)]
[InlineData(StatusCode.Unavailable)]
[InlineData(StatusCode.Internal)]
public void Map_NonAuthStatus_CarriesStatusCodeOnMxGatewayException(StatusCode statusCode)
{
RpcException rpc = new(new Status(statusCode, "boom"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayException gatewayException = Assert.IsType<MxGatewayException>(mapped);
Assert.Equal(statusCode, gatewayException.StatusCode);
Assert.Same(rpc, gatewayException.InnerException);
}
/// <summary>Verifies that an MxGatewayException built without a gRPC status reports a null StatusCode.</summary>
[Fact]
public void StatusCode_IsNull_WhenNoGrpcStatusProvided()
{
MxGatewayException gatewayException = new("plain failure");
Assert.Null(gatewayException.StatusCode);
}
}
@@ -36,7 +36,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -53,7 +53,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -70,7 +70,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -101,7 +101,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return deployEvent;
@@ -115,28 +115,4 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
{
return WatchDeployEventsAsync(request, callOptions);
}
private static Exception MapRpcException(
RpcException exception,
CancellationToken cancellationToken)
{
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
innerException: exception),
_ => new MxGatewayException(exception.Status.Detail, exception),
};
}
}
@@ -36,7 +36,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -53,7 +53,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -70,7 +70,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -101,7 +101,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return gatewayEvent;
@@ -129,7 +129,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -160,7 +160,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return snapshot;
@@ -174,28 +174,4 @@ internal sealed class GrpcMxGatewayClientTransport(
{
return QueryActiveAlarmsAsync(request, callOptions);
}
private static Exception MapRpcException(
RpcException exception,
CancellationToken cancellationToken)
{
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
innerException: exception),
_ => new MxGatewayException(exception.Status.Detail, exception),
};
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -13,6 +14,7 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
/// <param name="hResult">The HResult code, if available.</param>
/// <param name="statuses">The MXAccess statuses, if available.</param>
/// <param name="innerException">The underlying exception, if any.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayAuthenticationException(
string message,
string? sessionId = null,
@@ -20,7 +22,8 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
ProtocolStatus? protocolStatus = null,
int? hResult = null,
IReadOnlyList<MxStatusProxy>? statuses = null,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(
message,
sessionId,
@@ -28,7 +31,8 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
protocolStatus,
hResult,
statuses ?? [],
innerException)
innerException,
statusCode)
{
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -13,6 +14,7 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
/// <param name="hResult">The HResult code, if available.</param>
/// <param name="statuses">The MXAccess statuses, if available.</param>
/// <param name="innerException">The underlying exception, if any.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayAuthorizationException(
string message,
string? sessionId = null,
@@ -20,7 +22,8 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
ProtocolStatus? protocolStatus = null,
int? hResult = null,
IReadOnlyList<MxStatusProxy>? statuses = null,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(
message,
sessionId,
@@ -28,7 +31,8 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
protocolStatus,
hResult,
statuses ?? [],
innerException)
innerException,
statusCode)
{
}
}
@@ -17,7 +17,7 @@ public sealed class MxGatewayClient : IAsyncDisposable
private readonly GrpcChannel _channel;
private readonly IMxGatewayClientTransport _transport;
private readonly ResiliencePipeline _safeUnaryRetryPipeline;
private bool _disposed;
private int _disposed;
/// <summary>
/// Initializes a new instance of the <see cref="MxGatewayClient"/> with given options and transport.
@@ -229,12 +229,11 @@ public sealed class MxGatewayClient : IAsyncDisposable
/// </summary>
public ValueTask DisposeAsync()
{
if (_disposed)
if (Interlocked.Exchange(ref _disposed, 1) != 0)
{
return ValueTask.CompletedTask;
}
_disposed = true;
_channel?.Dispose();
return ValueTask.CompletedTask;
}
@@ -335,6 +334,6 @@ public sealed class MxGatewayClient : IAsyncDisposable
private void ThrowIfDisposed()
{
ObjectDisposedException.ThrowIf(_disposed, this);
ObjectDisposedException.ThrowIf(Volatile.Read(ref _disposed) != 0, this);
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -28,6 +29,20 @@ public class MxGatewayException : Exception
Statuses = [];
}
/// <summary>
/// Initializes a new instance of the MxGatewayException class carrying the originating
/// gRPC status code so callers can distinguish transient from permanent failures.
/// </summary>
/// <param name="message">Diagnostic message describing the failure.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call.</param>
/// <param name="innerException">Underlying exception that caused this failure.</param>
public MxGatewayException(string message, StatusCode statusCode, Exception? innerException)
: base(message, innerException)
{
StatusCode = statusCode;
Statuses = [];
}
/// <summary>
/// Initializes a new instance of the MxGatewayException class with full diagnostic information.
/// </summary>
@@ -38,6 +53,7 @@ public class MxGatewayException : Exception
/// <param name="hResult">HRESULT code returned by the worker or MXAccess, if available.</param>
/// <param name="statuses">List of MXAccess status codes returned by the operation.</param>
/// <param name="innerException">Underlying exception that caused this failure.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayException(
string message,
string? sessionId,
@@ -45,7 +61,8 @@ public class MxGatewayException : Exception
ProtocolStatus? protocolStatus,
int? hResult,
IReadOnlyList<MxStatusProxy> statuses,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(message, innerException)
{
SessionId = sessionId;
@@ -53,6 +70,7 @@ public class MxGatewayException : Exception
ProtocolStatus = protocolStatus;
HResultCode = hResult;
Statuses = statuses;
StatusCode = statusCode;
}
/// <summary>
@@ -79,4 +97,15 @@ public class MxGatewayException : Exception
/// Gets the list of MXAccess status codes returned by the operation.
/// </summary>
public IReadOnlyList<MxStatusProxy> Statuses { get; }
/// <summary>
/// Gets the gRPC status code reported by the failed call, if the failure originated
/// from a gRPC <see cref="RpcException"/>. <see langword="null"/> when the exception
/// was not produced from a gRPC status (for example, a protocol-level reply failure).
/// Callers can inspect this to distinguish a transient outage
/// (<see cref="Grpc.Core.StatusCode.Unavailable"/>) from a permanent error
/// (<see cref="Grpc.Core.StatusCode.InvalidArgument"/>) without downcasting
/// <see cref="Exception.InnerException"/>.
/// </summary>
public StatusCode? StatusCode { get; }
}
@@ -9,7 +9,10 @@ public sealed class MxGatewaySession : IAsyncDisposable
{
private readonly MxGatewayClient _client;
private readonly SemaphoreSlim _closeLock = new(1, 1);
private readonly object _disposeGate = new();
private CloseSessionReply? _closeReply;
private int _activeCloseCount;
private bool _closeLockDisposed;
/// <summary>
/// Initializes a new session backed by the given MXAccess gateway client.
@@ -46,23 +49,42 @@ public sealed class MxGatewaySession : IAsyncDisposable
return _closeReply;
}
await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
// Register as an in-flight closer under the dispose gate. DisposeAsync waits for
// _activeCloseCount to drain before disposing the close lock, so the semaphore is
// guaranteed to outlive every WaitAsync started here.
lock (_disposeGate)
{
ObjectDisposedException.ThrowIf(_closeLockDisposed, this);
_activeCloseCount++;
}
try
{
if (_closeReply is not null)
await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
if (_closeReply is not null)
{
return _closeReply;
}
_closeReply = await _client.CloseSessionRawAsync(
new CloseSessionRequest { SessionId = SessionId },
cancellationToken)
.ConfigureAwait(false);
return _closeReply;
}
_closeReply = await _client.CloseSessionRawAsync(
new CloseSessionRequest { SessionId = SessionId },
cancellationToken)
.ConfigureAwait(false);
return _closeReply;
finally
{
_closeLock.Release();
}
}
finally
{
_closeLock.Release();
lock (_disposeGate)
{
_activeCloseCount--;
}
}
}
@@ -658,7 +680,32 @@ public sealed class MxGatewaySession : IAsyncDisposable
/// </summary>
public async ValueTask DisposeAsync()
{
lock (_disposeGate)
{
if (_closeLockDisposed)
{
return;
}
}
await CloseAsync().ConfigureAwait(false);
// Wait for every concurrent CloseAsync caller to leave the close lock before
// disposing it; once _closeReply is set those callers return without awaiting.
while (true)
{
lock (_disposeGate)
{
if (_activeCloseCount == 0)
{
_closeLockDisposed = true;
break;
}
}
await Task.Yield();
}
_closeLock.Dispose();
}
@@ -0,0 +1,55 @@
using Grpc.Core;
namespace MxGateway.Client;
/// <summary>
/// Maps low-level <see cref="RpcException"/>s raised by the gRPC stack to the client's
/// native exception hierarchy. Shared by every gateway and Galaxy Repository transport
/// so the gRPC-to-native translation has exactly one implementation.
/// </summary>
internal static class RpcExceptionMapper
{
/// <summary>
/// Translates a <see cref="RpcException"/> into the most specific native exception type.
/// </summary>
/// <param name="exception">The gRPC exception to translate.</param>
/// <param name="cancellationToken">
/// The cancellation token of the originating call; used to distinguish a caller-driven
/// cancellation from a server-side <see cref="StatusCode.Cancelled"/> status.
/// </param>
/// <returns>
/// An <see cref="OperationCanceledException"/> when the call was cancelled, a typed
/// authentication/authorization exception for auth statuses, or an
/// <see cref="MxGatewayException"/> carrying the originating gRPC <see cref="StatusCode"/>.
/// </returns>
public static Exception Map(
RpcException exception,
CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(exception);
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
statusCode: exception.StatusCode,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
statusCode: exception.StatusCode,
innerException: exception),
_ => new MxGatewayException(
exception.Status.Detail,
exception.StatusCode,
exception),
};
}
}
+11
View File
@@ -112,6 +112,17 @@ can keep the full `MxCommandReply`, HRESULT, and status array when MXAccess
itself rejects a command. `MxAccessException.Reply` contains the raw generated
reply.
When a gRPC call itself fails, the transport maps the underlying
`RpcException` to a native exception: `Unauthenticated` becomes
`MxGatewayAuthenticationException`, `PermissionDenied` becomes
`MxGatewayAuthorizationException`, a cancelled call becomes
`OperationCanceledException`, and every other status becomes a base
`MxGatewayException`. `MxGatewayException.StatusCode` carries the originating
gRPC `Grpc.Core.StatusCode` (non-null whenever the failure came from a gRPC
status), so callers can distinguish a transient outage (`Unavailable`) from a
permanent error (`InvalidArgument`, `NotFound`) without downcasting
`InnerException`.
## CLI Usage
The test CLI supports deterministic JSON output for automation:
+7 -1
View File
@@ -79,7 +79,13 @@ client, err := mxgateway.Dial(ctx, mxgateway.Options{
`AddItem`, `AddItem2`, `Advise`, `Write`, `Events`, and `Close`. Prefer
`SubscribeEvents` or `SubscribeEventsAfter` for long-running streams because the
returned subscription owns cancellation and exposes `Close` for deterministic
goroutine cleanup. Raw protobuf messages remain available through the
goroutine cleanup. `Events` and `EventsAfter` are a compatibility shim with a
bounded internal buffer: if the consumer drains too slowly the buffer fills,
the underlying stream is cancelled, and a terminal `EventResult` carrying
`ErrEventBufferOverflow` is delivered as the channel's last item before it
closes — so a slow consumer can distinguish dropped events from a normal
end-of-stream. `SubscribeEvents` blocks instead of dropping, so use it when no
events may be lost. Raw protobuf messages remain available through the
`mxgateway` package aliases and the `Raw` helper methods. Typed errors support
`errors.As` for `GatewayError`, `CommandError`, and `MxAccessError`; command
errors preserve the raw reply.
+9 -4
View File
@@ -331,6 +331,11 @@ func runUnsubscribeBulk(ctx context.Context, args []string, stdout, stderr io.Wr
return errors.New("session-id and item-handles are required")
}
handles, err := parseInt32List(*itemHandles)
if err != nil {
return err
}
client, options, err := dialForCommand(ctx, common)
if err != nil {
return err
@@ -338,7 +343,7 @@ func runUnsubscribeBulk(ctx context.Context, args []string, stdout, stderr io.Wr
defer client.Close()
session := mxgateway.NewSessionForID(client, *sessionID)
results, err := session.UnsubscribeBulk(ctx, int32(*serverHandle), parseInt32List(*itemHandles))
results, err := session.UnsubscribeBulk(ctx, int32(*serverHandle), handles)
return writeBulkOutput(stdout, *jsonOutput, "unsubscribe-bulk", options, results, err)
}
@@ -514,7 +519,7 @@ func parseStringList(value string) []string {
return items
}
func parseInt32List(value string) []int32 {
func parseInt32List(value string) ([]int32, error) {
parts := strings.Split(value, ",")
items := make([]int32, 0, len(parts))
for _, part := range parts {
@@ -524,11 +529,11 @@ func parseInt32List(value string) []int32 {
}
parsed, err := strconv.ParseInt(item, 10, 32)
if err != nil {
panic(err)
return nil, fmt.Errorf("invalid item handle %q: %w", item, err)
}
items = append(items, int32(parsed))
}
return items
return items, nil
}
func bindCommonFlags(flags *flag.FlagSet) *commonOptions {
+29
View File
@@ -56,3 +56,32 @@ func TestParseValueBuildsTypedValue(t *testing.T) {
t.Fatalf("int32 value = %d, want 123", got)
}
}
func TestParseInt32ListParsesValidTokens(t *testing.T) {
items, err := parseInt32List("1, 2 ,3")
if err != nil {
t.Fatalf("parseInt32List() error = %v", err)
}
want := []int32{1, 2, 3}
if len(items) != len(want) {
t.Fatalf("parseInt32List() = %v, want %v", items, want)
}
for i := range want {
if items[i] != want[i] {
t.Fatalf("parseInt32List()[%d] = %d, want %d", i, items[i], want[i])
}
}
}
func TestParseInt32ListReturnsErrorOnMalformedToken(t *testing.T) {
items, err := parseInt32List("1,foo")
if err == nil {
t.Fatalf("parseInt32List() error = nil, want a parse error; items = %v", items)
}
if items != nil {
t.Fatalf("parseInt32List() items = %v, want nil on error", items)
}
if !strings.Contains(err.Error(), "foo") {
t.Fatalf("parseInt32List() error = %q, want it to name the bad token", err.Error())
}
}
+15 -2
View File
@@ -117,7 +117,7 @@ func TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned(t *testing.
fake := &fakeGatewayServer{
streamStarted: make(chan struct{}),
streamDone: make(chan struct{}),
streamEventCount: 64,
streamEventCount: 256,
}
client, cleanup := newBufconnClient(t, fake)
defer cleanup()
@@ -135,12 +135,25 @@ func TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned(t *testing.
t.Fatal("compatibility event stream did not stop after result channel filled")
}
// A slow consumer that abandons the buffer must still receive an explicit
// terminal overflow error before the channel closes, so it can tell
// "events dropped" apart from "stream ended normally".
var sawOverflow bool
for {
select {
case _, ok := <-events:
case result, ok := <-events:
if !ok {
if !sawOverflow {
t.Fatal("compatibility event channel closed without an ErrEventBufferOverflow result")
}
return
}
if result.Err != nil {
if !errors.Is(result.Err, ErrEventBufferOverflow) {
t.Fatalf("terminal result error = %v, want ErrEventBufferOverflow", result.Err)
}
sawOverflow = true
}
case <-time.After(2 * time.Second):
t.Fatal("compatibility event channel did not close")
}
+9
View File
@@ -1,11 +1,20 @@
package mxgateway
import (
"errors"
"fmt"
pb "gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go/internal/generated"
)
// ErrEventBufferOverflow is the terminal error delivered on the compatibility
// event channel returned by Session.Events / Session.EventsAfter when a slow
// consumer lets the bounded result buffer fill. It signals that the stream was
// cancelled and events were dropped, so a consumer can tell an overflow apart
// from a normal end-of-stream. Use Session.SubscribeEvents to block instead of
// dropping.
var ErrEventBufferOverflow = errors.New("mxgateway: event buffer overflow; compatibility stream cancelled and events dropped")
// GatewayError wraps transport-level gRPC failures.
type GatewayError struct {
// Op names the operation that failed (for example "dial" or "invoke").
+25 -1
View File
@@ -490,7 +490,7 @@ func ensureBulkSize(name string, length int) error {
func sendEventResult(
ctx context.Context,
results chan<- EventResult,
results chan EventResult,
result EventResult,
cancelWhenBufferFull bool,
cancel context.CancelFunc,
@@ -502,7 +502,12 @@ func sendEventResult(
case <-ctx.Done():
return false
default:
// The bounded compatibility buffer is full. Cancel the stream and
// deliver an explicit terminal overflow error so a slow consumer
// can tell dropped events apart from a normal end-of-stream,
// rather than seeing the channel close silently.
cancel()
deliverTerminalResult(results, EventResult{Err: ErrEventBufferOverflow})
return false
}
}
@@ -515,6 +520,25 @@ func sendEventResult(
}
}
// deliverTerminalResult places result on a full buffered channel by evicting
// one of the oldest buffered events to make room. The caller closes results
// afterwards, so the terminal result becomes the consumer's last item.
func deliverTerminalResult(results chan EventResult, result EventResult) {
for {
select {
case results <- result:
return
default:
}
select {
case <-results:
default:
// Another receiver drained the channel between the send and
// receive attempts; retry the send.
}
}
}
func (s *Session) invokeCommand(ctx context.Context, command *MxCommand) (*MxCommandReply, error) {
return s.client.Invoke(ctx, &pb.MxCommandRequest{
SessionId: s.ID(),
+12
View File
@@ -62,6 +62,18 @@ underlying protobuf messages. `MxGatewayCommandException` and
`MxAccessException` preserve the raw `MxCommandReply` when the gateway returns a
data-bearing MXAccess failure.
`openSession` verifies the gateway's reported `gateway_protocol_version` against
the version this client was generated for and throws `MxGatewayException` on a
mismatch, so an incompatible client fails fast with a clear message instead of
issuing commands that fail downstream. A gateway that does not populate the
field is accepted unchanged.
`MxGatewaySession` implements `AutoCloseable`. The try-with-resources `close()`
performs a `CloseSession` network RPC but swallows (and logs) any failure of
that RPC so a close-time error never replaces the exception a try-with-resources
body is already propagating. Call `closeRaw()` explicitly when you need to
observe the close result or handle a close-time failure.
`MxEventStream` implements `Iterator<MxEvent>` and `AutoCloseable`. Closing it
cancels the underlying gRPC stream. Canceling or timing out a Java client call
only stops the client from waiting; it does not abort an in-flight MXAccess COM
@@ -62,8 +62,10 @@ final class MxGatewayCliTests {
assertEquals(0, run.exitCode());
assertTrue(run.output().contains("\"command\":\"open-session\""));
assertTrue(run.output().contains("\"sessionId\":\"session-cli\""));
assertTrue(run.output().contains("mxgw***********cret"));
// Only the non-secret mxgw_<key-id>_ prefix survives; the secret is fully masked.
assertTrue(run.output().contains("mxgw_visible_***"));
assertFalse(run.output().contains("visible_secret"));
assertFalse(run.output().contains("cret"));
}
@Test
@@ -21,13 +21,23 @@ import mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest;
* stream cancels the underlying gRPC call. If the queue overflows the call is
* cancelled and a follow-up call to {@link #next()} throws
* {@link MxGatewayException}.
*
* <p><strong>Threading:</strong> the iterator methods ({@link #hasNext()} and
* {@link #next()}) are <em>not</em> thread-safe and must be driven by a single
* consumer thread. {@link #close()} may be called from any thread. Terminal
* state transitions (queue overflow, server completion, and {@code close()})
* are serialised so that the first terminal condition wins deterministically:
* once an overflow exception has been observed it is never silently replaced
* by an end-of-stream marker.
*/
public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
private static final Object END = new Object();
private final BlockingQueue<Object> queue;
private final Object terminalLock = new Object();
private volatile ClientCallStreamObserver<StreamEventsRequest> requestStream;
private volatile boolean closed;
private boolean terminated;
private Object next;
MxEventStream(int capacity) {
@@ -98,7 +108,7 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
if (stream != null) {
stream.cancel("client cancelled event stream", null);
}
offer(END);
terminate(null);
}
private Object take() {
@@ -115,10 +125,7 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
private void offer(Object value) {
Objects.requireNonNull(value, "value");
if (value == END) {
if (!queue.offer(value)) {
queue.clear();
queue.offer(value);
}
terminate(null);
return;
}
if (!queue.offer(value)) {
@@ -126,9 +133,38 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
if (stream != null) {
stream.cancel("client event stream queue overflowed", null);
}
queue.clear();
queue.offer(new MxGatewayException("gateway stream events queue overflowed"));
queue.offer(END);
terminate(new MxGatewayException("gateway stream events queue overflowed"));
}
}
/**
* Drives the single terminal transition. The first caller wins: a later
* end-of-stream or {@code close()} cannot overwrite or discard an overflow
* exception that has already been published to the consumer.
*
* @param fault the fault to surface to the consumer, or {@code null} for a
* clean end-of-stream
*/
private void terminate(MxGatewayException fault) {
synchronized (terminalLock) {
if (terminated) {
return;
}
terminated = true;
if (fault != null) {
// Make room for the fault marker; the consumer only needs the
// terminal signal, queued data events are no longer relevant.
queue.clear();
queue.offer(fault);
queue.offer(END);
return;
}
// Clean end-of-stream: ensure the END marker is delivered even when
// the queue is currently full of undrained data events.
if (!queue.offer(END)) {
queue.clear();
queue.offer(END);
}
}
}
}
@@ -150,6 +150,7 @@ public final class MxGatewayClient implements AutoCloseable {
try {
OpenSessionReply reply = rawBlockingStub().openSession(request);
MxGatewayErrors.ensureProtocolSuccess("open session", reply.getProtocolStatus(), null);
ensureGatewayProtocolCompatible(reply);
return reply;
} catch (RuntimeException error) {
if (error instanceof MxGatewayException) {
@@ -159,6 +160,24 @@ public final class MxGatewayClient implements AutoCloseable {
}
}
/**
* Verifies that the gateway speaks the protocol version this client was
* generated against. A gateway that leaves {@code gateway_protocol_version}
* unset (value {@code 0}, e.g. an older gateway) is accepted unchanged.
*
* @param reply the {@code OpenSessionReply} returned by the gateway
* @throws MxGatewayException if the gateway reports an incompatible protocol version
*/
private static void ensureGatewayProtocolCompatible(OpenSessionReply reply) {
int gatewayVersion = reply.getGatewayProtocolVersion();
int clientVersion = MxGatewayClientVersion.gatewayProtocolVersion();
if (gatewayVersion != 0 && gatewayVersion != clientVersion) {
throw new MxGatewayException("gateway protocol version mismatch: gateway reports "
+ gatewayVersion + " but this client was built for " + clientVersion
+ "; upgrade the client or gateway so the protocol versions match");
}
}
/**
* Invokes {@code OpenSession} asynchronously.
*
@@ -170,6 +189,7 @@ public final class MxGatewayClient implements AutoCloseable {
CompletableFuture<OpenSessionReply> future = toCompletable(rawFutureStub().openSession(request));
return future.thenApply(reply -> {
MxGatewayErrors.ensureProtocolSuccess("open session", reply.getProtocolStatus(), null);
ensureGatewayProtocolCompatible(reply);
return reply;
});
}
@@ -11,25 +11,35 @@ public final class MxGatewaySecrets {
}
/**
* Redacts the body of an API key, leaving only short prefix and suffix
* windows so it remains comparable in logs.
* Redacts the secret portion of an API key, leaving only the non-secret
* key identifier visible so the value remains comparable in logs.
*
* <p>A gateway API key has the form {@code mxgw_<key-id>_<secret>}. Only the
* {@code mxgw_<key-id>_} prefix is non-secret; everything after the second
* underscore is the secret and is masked entirely &mdash; no leading or
* trailing characters of the secret are echoed. Tokens that do not match
* the gateway shape are masked completely as {@code "<redacted>"}.
*
* @param apiKey the API key to redact, may be {@code null} or empty
* @return an empty string for {@code null}/empty input, {@code "<redacted>"}
* for keys eight characters or shorter, or a masked form preserving
* the leading and trailing four characters
* for non-gateway-shaped tokens, or {@code mxgw_<key-id>_***} with the
* secret masked for gateway-shaped keys
*/
public static String redactApiKey(String apiKey) {
if (apiKey == null || apiKey.isEmpty()) {
return "";
}
if (apiKey.length() <= 8) {
return "<redacted>";
// Gateway keys are mxgw_<key-id>_<secret>; keep only the non-secret prefix.
if (apiKey.startsWith("mxgw_")) {
int secretSeparator = apiKey.indexOf('_', "mxgw_".length());
if (secretSeparator >= 0 && secretSeparator < apiKey.length() - 1) {
return apiKey.substring(0, secretSeparator + 1) + "***";
}
}
return apiKey.substring(0, 4)
+ "*".repeat(apiKey.length() - 8)
+ apiKey.substring(apiKey.length() - 4);
// Anything else is treated as wholly secret reveal nothing.
return "<redacted>";
}
/**
@@ -40,6 +40,7 @@ import mxaccess_gateway.v1.MxaccessGateway.WriteCommand;
*/
public final class MxGatewaySession implements AutoCloseable {
private static final SecureRandom RANDOM = new SecureRandom();
private static final System.Logger LOGGER = System.getLogger(MxGatewaySession.class.getName());
private final MxGatewayClient client;
private final OpenSessionReply openReply;
@@ -99,9 +100,26 @@ public final class MxGatewaySession implements AutoCloseable {
return closeReply;
}
/**
* Closes the session as part of try-with-resources.
*
* <p>This performs a {@code CloseSession} network RPC. Unlike
* {@link #closeRaw()}, any failure of that RPC is swallowed (and recorded
* as a suppressed exception when the JVM permits) rather than thrown: a
* close-time transport or protocol failure must not replace the exception
* that a try-with-resources body is already propagating. Callers that need
* to observe the close result should call {@link #closeRaw()} explicitly.
*/
@Override
public void close() {
closeRaw();
try {
closeRaw();
} catch (MxGatewayException error) {
LOGGER.log(
System.Logger.Level.WARNING,
() -> "ignoring close-time failure for session " + sessionId(),
error);
}
}
/**
@@ -116,7 +134,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasRegister()) {
return reply.getRegister().getServerHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway register reply carried neither a register payload nor a return value");
}
/**
@@ -159,7 +181,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasAddItem()) {
return reply.getAddItem().getItemHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway addItem reply carried neither an add-item payload nor a return value");
}
/**
@@ -193,7 +219,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasAddItem2()) {
return reply.getAddItem2().getItemHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway addItem2 reply carried neither an add-item payload nor a return value");
}
/**
@@ -0,0 +1,394 @@
package com.dohertylan.mxgateway.client;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
import io.grpc.ManagedChannel;
import io.grpc.Server;
import io.grpc.inprocess.InProcessChannelBuilder;
import io.grpc.inprocess.InProcessServerBuilder;
import io.grpc.stub.StreamObserver;
import java.time.Duration;
import java.util.UUID;
import mxaccess_gateway.v1.MxAccessGatewayGrpc;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionRequest;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandKind;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandReply;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandRequest;
import mxaccess_gateway.v1.MxaccessGateway.OpenSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.OpenSessionRequest;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatus;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatusCode;
import org.junit.jupiter.api.Test;
/**
* Regression tests for the Medium-severity Client.Java code-review findings
* (Client.Java-001 through Client.Java-005).
*/
final class MxGatewayMediumFindingsTests {
// --- Client.Java-001: redactApiKey must not leak trailing secret chars ---
@Test
void redactApiKeyDoesNotLeakAnyCharacterOfTheSecret() {
// mxgw_<key-id>_<secret> the secret is the segment after the second underscore.
String apiKey = "mxgw_keyid01_supersecretvalue";
String redacted = MxGatewaySecrets.redactApiKey(apiKey);
// None of the secret characters may appear in the redacted output.
assertFalse(redacted.contains("value"), () -> "redacted form leaked secret tail: " + redacted);
assertFalse(redacted.endsWith("alue"), () -> "redacted form leaked trailing secret chars: " + redacted);
assertFalse(redacted.contains("supersecret"), () -> "redacted form leaked secret: " + redacted);
// The non-secret key-id prefix may stay so the value is still comparable in logs.
assertTrue(redacted.startsWith("mxgw_keyid01_"), () -> "redacted form lost key-id prefix: " + redacted);
}
@Test
void redactApiKeyForNonGatewayShapedKeyRevealsNothing() {
String redacted = MxGatewaySecrets.redactApiKey("plain-opaque-token-1234");
assertFalse(redacted.contains("1234"), () -> "redacted form leaked trailing chars: " + redacted);
assertFalse(redacted.contains("plain-opaque-token"), () -> "redacted form leaked body: " + redacted);
}
@Test
void redactApiKeyStillHandlesNullAndShortInput() {
assertEquals("", MxGatewaySecrets.redactApiKey(null));
assertEquals("", MxGatewaySecrets.redactApiKey(""));
assertEquals("<redacted>", MxGatewaySecrets.redactApiKey("short"));
}
// --- Client.Java-002: terminal-state transition must be deterministic ---
@Test
void eventStreamOverflowExceptionSurvivesASubsequentClose() {
// Deterministic reproduction of Client.Java-002: an overflow enqueues the
// overflow exception, then a later close() must NOT discard it. The first
// terminal condition (overflow) must win and stay observable by next().
MxEventStream stream = new MxEventStream(2);
io.grpc.stub.ClientResponseObserver<
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest,
mxaccess_gateway.v1.MxaccessGateway.MxEvent>
observer = stream.observer();
observer.beforeStart(new NoopRequestStream());
// Force a queue overflow on a capacity-2 stream.
for (int i = 0; i < 8; i++) {
observer.onNext(testEvent(i));
}
// A close() arriving after the overflow must not erase the overflow signal.
stream.close();
MxGatewayException error = assertThrows(MxGatewayException.class, () -> {
while (stream.hasNext()) {
stream.next();
}
});
assertTrue(error.getMessage().contains("overflow"), error::getMessage);
}
@Test
void eventStreamConcurrentOverflowAndCloseAlwaysTerminate() throws Exception {
// The terminal-state transition must be serialised: whatever the interleaving
// of overflow and close, hasNext() always reaches a terminal state.
for (int iteration = 0; iteration < 300; iteration++) {
MxEventStream stream = new MxEventStream(2);
io.grpc.stub.ClientResponseObserver<
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest,
mxaccess_gateway.v1.MxaccessGateway.MxEvent>
observer = stream.observer();
observer.beforeStart(new NoopRequestStream());
Thread filler = new Thread(() -> {
for (int i = 0; i < 8; i++) {
observer.onNext(testEvent(i));
}
});
Thread closer = new Thread(stream::close);
filler.start();
closer.start();
filler.join();
closer.join();
try {
while (stream.hasNext()) {
stream.next();
}
} catch (MxGatewayException expected) {
assertTrue(expected.getMessage().contains("overflow"), expected::getMessage);
}
assertFalse(stream.hasNext());
}
}
private static final class NoopRequestStream
extends io.grpc.stub.ClientCallStreamObserver<mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest> {
@Override
public void cancel(String message, Throwable cause) {
}
@Override
public boolean isReady() {
return true;
}
@Override
public void setOnReadyHandler(Runnable onReadyHandler) {
}
@Override
public void request(int count) {
}
@Override
public void setMessageCompression(boolean enable) {
}
@Override
public void disableAutoInboundFlowControl() {
}
@Override
public void onNext(mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest value) {
}
@Override
public void onError(Throwable t) {
}
@Override
public void onCompleted() {
}
}
// --- Client.Java-003: gateway protocol version mismatch must be rejected ---
@Test
void openSessionRejectsIncompatibleGatewayProtocolVersion() throws Exception {
TestService service = new TestService() {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-mismatch")
.setGatewayProtocolVersion(MxGatewayClientVersion.gatewayProtocolVersion() + 1)
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewayException error = assertThrows(
MxGatewayException.class,
() -> harness.client().openSession("junit-session"));
assertTrue(error.getMessage().contains("protocol version"), error::getMessage);
}
}
@Test
void openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion() throws Exception {
TestService matching = new TestService() {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-ok")
.setGatewayProtocolVersion(MxGatewayClientVersion.gatewayProtocolVersion())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(matching)) {
assertEquals("session-ok", harness.client().openSession("junit-session").sessionId());
}
// A gateway that leaves the field unset (0) must not be rejected older gateways
// simply do not populate it.
TestService unset = new TestService();
try (Harness harness = Harness.start(unset)) {
assertEquals("session-java", harness.client().openSession("junit-session").sessionId());
}
}
// --- Client.Java-004: missing typed payload AND missing return_value must throw ---
@Test
void registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
// Reply with neither register payload nor return_value set.
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
MxGatewayException error = assertThrows(
MxGatewayException.class, () -> session.register("c"));
assertTrue(error.getMessage().contains("register"), error::getMessage);
}
}
@Test
void addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertThrows(MxGatewayException.class, () -> session.addItem(1, "Tag"));
assertThrows(MxGatewayException.class, () -> session.addItem2(1, "Tag", "ctx"));
}
}
@Test
void addItemStillHonoursReturnValueFallback() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.setReturnValue(mxaccess_gateway.v1.MxaccessGateway.MxValue.newBuilder()
.setInt32Value(99))
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertEquals(99, session.addItem(1, "Tag"));
}
}
// --- Client.Java-005: close() must not mask the primary try-with-resources error ---
@Test
void closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException() throws Exception {
TestService service = new TestService() {
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onError(io.grpc.Status.UNAVAILABLE
.withDescription("WORKER_UNAVAILABLE")
.asRuntimeException());
}
};
try (Harness harness = Harness.start(service)) {
IllegalStateException bodyError = assertThrows(IllegalStateException.class, () -> {
try (MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s")) {
throw new IllegalStateException("body failure");
}
});
// The body exception must propagate; the close-time RPC failure must not replace it.
assertEquals("body failure", bodyError.getMessage());
}
}
@Test
void closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt() throws Exception {
TestService service = new TestService() {
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onError(io.grpc.Status.UNAVAILABLE
.withDescription("WORKER_UNAVAILABLE")
.asRuntimeException());
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertThrows(MxGatewayException.class, session::closeRaw);
}
}
private static mxaccess_gateway.v1.MxaccessGateway.MxEvent testEvent(int sequence) {
return mxaccess_gateway.v1.MxaccessGateway.MxEvent.newBuilder()
.setWorkerSequence(sequence)
.build();
}
private static ProtocolStatus ok() {
return ProtocolStatus.newBuilder()
.setCode(ProtocolStatusCode.PROTOCOL_STATUS_CODE_OK)
.build();
}
private static class TestService extends MxAccessGatewayGrpc.MxAccessGatewayImplBase {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-java")
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onNext(CloseSessionReply.newBuilder()
.setSessionId(request.getSessionId())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(MxCommandKind.MX_COMMAND_KIND_UNSPECIFIED)
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
}
private record Harness(Server server, ManagedChannel channel, MxGatewayClient client) implements AutoCloseable {
static Harness start(MxAccessGatewayGrpc.MxAccessGatewayImplBase service) throws Exception {
String name = "mxgw-medium-" + UUID.randomUUID();
Server server = InProcessServerBuilder.forName(name)
.directExecutor()
.addService(service)
.build()
.start();
ManagedChannel channel = InProcessChannelBuilder.forName(name).directExecutor().build();
MxGatewayClient client = new MxGatewayClient(
channel,
MxGatewayClientOptions.builder()
.endpoint("in-process")
.apiKey("")
.plaintext(true)
.callTimeout(Duration.ofSeconds(5))
.build());
return new Harness(server, channel, client);
}
@Override
public void close() {
channel.shutdownNow();
server.shutdownNow();
}
}
}
+7 -7
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 8 |
| Open findings | 5 |
## Checklist coverage
@@ -33,13 +33,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` |
| Status | Open |
| Status | Resolved |
**Description:** `MapRpcException` only produces typed exceptions for `Unauthenticated` and `PermissionDenied`. Every other gRPC status — `NotFound`, `InvalidArgument`, `ResourceExhausted`, `FailedPrecondition`, `Unavailable`, `Internal` — collapses into the base `MxGatewayException` with no surfaced `StatusCode`. Callers cannot programmatically distinguish a transient outage from a permanent bad-argument error without reflecting into `InnerException` and downcasting to `RpcException`.
**Recommendation:** Carry the gRPC `StatusCode` on `MxGatewayException` (e.g. a `StatusCode` property) and/or add typed subclasses for at least `NotFound`, `InvalidArgument`, and `Unavailable`. Populate it from `exception.StatusCode` in `MapRpcException`.
**Resolution:** _(open)_
**Resolution:** (2026-05-18) Confirmed against source: both transports had a duplicated private `MapRpcException` that only typed two statuses and discarded the gRPC code for the rest. Added a nullable `StatusCode` property (`Grpc.Core.StatusCode?`) to `MxGatewayException` plus constructors that carry it, threaded it through `MxGatewayAuthenticationException`/`MxGatewayAuthorizationException`, and extracted the two duplicated mappers into a single shared internal `RpcExceptionMapper` (`RpcExceptionMapper.cs`) that populates `StatusCode` from `exception.StatusCode` for every status. Callers can now distinguish transient from permanent failures without downcasting `InnerException`. Documented in `clients/dotnet/README.md`. Regression test: `RpcExceptionMapperTests` (8 cases incl. the `[Theory]` over `NotFound`/`InvalidArgument`/`ResourceExhausted`/`FailedPrecondition`/`Unavailable`/`Internal`).
### Client.Dotnet-002
@@ -48,13 +48,13 @@
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` |
| Status | Open |
| Status | Resolved |
**Description:** The retry predicate `MxGatewayClientRetryPolicy.IsTransientGrpcFailure` handles two shapes: a raw `RpcException` and an `MxGatewayException { InnerException: RpcException }`. In production the transport always maps `RpcException``MxGatewayException` before it reaches the retry pipeline, so only the wrapped-`MxGatewayException` branch ever runs in production. But `FakeGatewayTransport` throws the raw `RpcException` and never maps it, so every retry test exercises only the raw-`RpcException` branch — the branch that never occurs in production. The production retry behaviour is effectively untested.
**Recommendation:** Add a fake/transport mode that maps `RpcException` to `MxGatewayException` the way `GrpcMxGatewayClientTransport` does (or add tests that enqueue a pre-wrapped `MxGatewayException`), so the actually-used predicate branch is covered.
**Resolution:** _(open)_
**Resolution:** (2026-05-18) Confirmed against source: `FakeGatewayTransport` threw queued exceptions verbatim, so the existing retry tests only ever hit the raw-`RpcException` predicate branch. Added a `MapTransportExceptions` flag to `FakeGatewayTransport` that, when set, runs thrown `RpcException`s through the same shared `RpcExceptionMapper` the production gRPC transport uses, producing the wrapped `MxGatewayException` shape. Added regression test `MxGatewayClientSessionTests.InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException`, which exercises the previously-untested production predicate branch. Verified red: removing the `MxGatewayException { InnerException: RpcException }` case from `IsTransientGrpcFailure` fails the new test while the pre-existing raw-`RpcException` test still passes.
### Client.Dotnet-003
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Status | Open |
| Status | Resolved |
**Description:** `DisposeAsync` calls `CloseAsync()` (no token) then unconditionally `_closeLock.Dispose()`. If another thread is concurrently awaiting `CloseAsync(token)` — legal, since the type exposes public async methods and no single-threaded contract — disposing the `SemaphoreSlim` while a `WaitAsync` is pending throws `ObjectDisposedException` into that caller. The `_disposed` flags in both clients are also plain unsynchronised `bool` reads/writes; `ThrowIfDisposed` racing `DisposeAsync` can observe a stale value.
**Recommendation:** Either document `MxGatewaySession`/`MxGatewayClient` as not thread-safe for concurrent dispose, or guard `_disposed` with `Interlocked`/`volatile` and avoid disposing `_closeLock` until all in-flight `CloseAsync` calls complete.
**Resolution:** _(open)_
**Resolution:** (2026-05-18) Confirmed against source: `MxGatewaySession.DisposeAsync` disposed `_closeLock` unconditionally, racing concurrent `CloseAsync` callers; `MxGatewayClient._disposed` was a plain `bool`. Fixed `MxGatewaySession` by tracking in-flight `CloseAsync` callers with an `_activeCloseCount` guarded by a dedicated `_disposeGate` lock and a `_closeLockDisposed` flag: `CloseAsync` registers under the gate (and throws `ObjectDisposedException` if disposal already won) before awaiting `_closeLock.WaitAsync`, and `DisposeAsync` drains `_activeCloseCount` to zero before disposing the semaphore, so the close lock provably outlives every pending `WaitAsync`. Fixed `MxGatewayClient` by changing `_disposed` to an `int` accessed via `Interlocked.Exchange`/`Volatile.Read`. Regression test `MxGatewayClientSessionTests.DisposeAsync_DoesNotRaceConcurrentCloseAsync` runs 100 iterations with one close holding the lock and one parked behind it while `DisposeAsync` runs concurrently; verified red against the original `DisposeAsync` (fails with `ObjectDisposedException`), green after the fix.
### Client.Dotnet-004
+5 -5
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 9 |
| Open findings | 7 |
## Checklist coverage
@@ -48,13 +48,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/go/mxgateway/session.go:440-516` |
| Status | Open |
| Status | Resolved |
**Description:** For the `Events`/`EventsAfter` compatibility API (`cancelWhenResultBufferFull == true`), when the 16-slot `results` channel is full `sendEventResult` cancels and returns `false`; the goroutine returns and `close(results)` runs — the consumer sees the channel close with **no `EventResult{Err: ...}` ever delivered**. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.
**Recommendation:** Before cancelling on a full buffer, deliver a terminal `EventResult` carrying an explicit error (e.g. `ErrEventBufferOverflow`). Document the behaviour on `Session.Events`; steer callers to `SubscribeEvents` (which blocks instead of dropping).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-18: confirmed against source — on a full bounded buffer the compatibility path cancelled and closed `results` with no terminal result. Added the exported sentinel `ErrEventBufferOverflow` (`errors.go`); `sendEventResult` now, on a full buffer, cancels the stream then calls the new `deliverTerminalResult` helper, which evicts one of the oldest buffered events to make room and places `EventResult{Err: ErrEventBufferOverflow}` so it becomes the consumer's last item before the channel closes. The previously lossy regression test (`TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned`) was re-pointed to assert the terminal `ErrEventBufferOverflow` result is delivered. `clients/go/README.md` now documents the bounded-buffer/overflow behaviour and steers no-loss callers to `SubscribeEvents`.
### Client.Go-003
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Status | Open |
| Status | Resolved |
**Description:** `parseInt32List` calls `panic(err)` when an `item-handles` token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like `-item-handles 1,foo` crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in `main.go`.
**Recommendation:** Change `parseInt32List` to return `([]int32, error)` and have `runUnsubscribeBulk` propagate the error, matching `parseValue`'s pattern.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-18: confirmed against source — `parseInt32List` called `panic(err)` on a malformed token. It now returns `([]int32, error)`, wrapping the bad token (`invalid item handle %q: %w`); `runUnsubscribeBulk` parses item handles before dialing and returns the error, so a typo flows through `runWithIO` to `os.Exit(2)` like other validation paths. Regression tests `TestParseInt32ListParsesValidTokens` and `TestParseInt32ListReturnsErrorOnMalformedToken` added to `cmd/mxgw-go/main_test.go`.
### Client.Go-004
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 12 |
| Open findings | 7 |
## Checklist coverage
@@ -33,13 +33,13 @@
| Severity | Medium |
| Category | Security |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Status | Open |
| Status | Resolved |
**Description:** `redactApiKey` preserves the leading and trailing four characters of the key. A gateway API key has the form `mxgw_<key-id>_<secret>`; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (`CommonOptions.redactedJsonMap`), and `MxGatewayClientOptions.toString()`. CLAUDE.md states API keys must never reach logs.
**Recommendation:** Redact the secret entirely. Show only a stable non-secret prefix (e.g. the `mxgw_<key-id>_` portion) and mask everything after it, or emit a fixed `mxgw_***` form. Do not echo any trailing characters of the secret.
**Resolution:** _(open)_
**Resolution:** (2026-05-18) Confirmed against source: the old `substring(0,4) + stars + substring(len-4)` echoed the last four secret characters. `redactApiKey` now masks the secret entirely: for gateway-shaped keys it returns the non-secret `mxgw_<key-id>_` prefix followed by `***` (locating the secret separator as the first `_` after `mxgw_`); any non-gateway-shaped token returns `<redacted>`. No leading/trailing secret characters are ever emitted. The pre-existing `MxGatewayCliTests.openSessionJsonRedactsApiKey` assertion that hardcoded the leaky `mxgw***********cret` form was corrected to assert the masked `mxgw_visible_***` form. Regression tests: `MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret`, `redactApiKeyForNonGatewayShapedKeyRevealsNothing`, `redactApiKeyStillHandlesNullAndShortInput`.
### Client.Java-002
@@ -48,13 +48,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Status | Open |
| Status | Resolved |
**Description:** The `next` field is a plain (non-volatile) instance field, and `MxEventStream` exposes no thread-confinement guarantee. More concretely, a queue-overflow `offer()` and a `close()` `offer(END)` can interleave so the overflow exception is enqueued after `END` and never observed — the contract that "next() throws after overflow" is not guaranteed once `close()` has been called.
**Recommendation:** Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.
**Resolution:** _(open)_
**Resolution:** (2026-05-18) Confirmed against source: the old `offer()` END-branch did `queue.clear(); queue.offer(END)` when full, so a `close()` arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single `terminate(MxGatewayException)` method guarded by a `terminated` flag and a `terminalLock`; the first terminal condition wins and a later `close()`/`END` cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while `close()` is safe from any thread. Regression tests: `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose` (deterministic) and `eventStreamConcurrentOverflowAndCloseAlwaysTerminate` (300-iteration race stress).
### Client.Java-003
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
| Status | Open |
| Status | Resolved |
**Description:** `OpenSessionReply` carries `gateway_protocol_version` (proto field 8), and `MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION` exists so the client can reject incompatible generated-code inputs. The client never reads `reply.getGatewayProtocolVersion()` nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.
**Recommendation:** In `openSession`/`openSessionRaw`, compare `reply.getGatewayProtocolVersion()` with `MxGatewayClientVersion.gatewayProtocolVersion()` and throw a typed `MxGatewayException` on mismatch.
**Resolution:** _(open)_
**Resolution:** (2026-05-18) Confirmed against source: neither `openSessionRaw` nor `openSessionAsync` read `getGatewayProtocolVersion()`. Added a private `ensureGatewayProtocolCompatible` helper, called from both `openSessionRaw` and `openSessionAsync`, that throws `MxGatewayException` with a clear mismatch message when the gateway reports a non-zero version differing from `MxGatewayClientVersion.gatewayProtocolVersion()`. A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. `clients/java/README.md` documents the new fail-fast check. Regression tests: `MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion` and `openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion`.
### Client.Java-004
@@ -78,13 +78,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
| Status | Open |
| Status | Resolved |
**Description:** `register`, `addItem`, and `addItem2` check `reply.hasRegister()`/`hasAddItem()` and otherwise fall back to `reply.getReturnValue().getInt32Value()`. If the gateway returns a reply with neither the typed payload nor a `return_value` set, the method silently returns `0` — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.
**Recommendation:** If the expected typed payload is absent and no `return_value` is present, throw `MxGatewayException` (protocol violation) instead of returning `0`.
**Resolution:** _(open)_
**Resolution:** (2026-05-18) Confirmed against source: all three methods returned `reply.getReturnValue().getInt32Value()` (which yields `0` for an unset message field) when the typed payload was absent. Each method now guards the fallback with `reply.hasReturnValue()` and throws `MxGatewayException` describing the protocol violation when neither the typed payload nor a `return_value` is present. The legitimate `return_value` fallback is preserved. Regression tests: `MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, `addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, and `addItemStillHonoursReturnValueFallback`.
### Client.Java-005
@@ -93,13 +93,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
| Status | Open |
| Status | Resolved |
**Description:** `close()` delegates to `closeRaw()`, which performs a network RPC. When `MxGatewaySession` is used in try-with-resources and the body throws, a failure inside `closeSession` (e.g. `WORKER_UNAVAILABLE`) throws from `close()` and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing `close()`.
**Recommendation:** Either make `close()` swallow/log close-time failures (keeping `closeRaw()` for callers who want the result), or document clearly that `close()` performs a network call that can throw.
**Resolution:** _(open)_
**Resolution:** (2026-05-18) Confirmed against source: `close()` called `closeRaw()` directly, so a `CloseSession` RPC failure propagated out of try-with-resources and replaced the body exception. `close()` now catches `MxGatewayException` from `closeRaw()` and logs it at WARNING via `System.Logger` instead of rethrowing, so a close-time failure never masks the body exception. `closeRaw()` is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use `closeRaw()` for explicit close handling are documented in `clients/java/README.md` and the `close()` Javadoc. Regression tests: `MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException` and `closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt`.
### Client.Java-006
+24 -24
View File
@@ -10,16 +10,16 @@ Each module's `findings.md` is the source of truth; this file is generated from
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 8 | 8 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 9 | 10 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 12 | 12 |
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 5 | 8 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 7 | 10 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 7 | 12 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 12 | 12 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 0 | 12 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 8 | 8 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 8 | 10 |
| [Server](Server/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 12 | 14 |
| [Server](Server/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 8 | 14 |
| [Tests](Tests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 10 | 12 |
| [Worker](Worker/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 12 | 15 |
| [Worker](Worker/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 7 | 15 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 13 | 15 |
## Pending findings
@@ -28,16 +28,6 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Client.Dotnet-001 | Medium | Error handling & resilience | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` | `MapRpcException` only produces typed exceptions for `Unauthenticated` and `PermissionDenied`. Every other gRPC status — `NotFound`, `InvalidArgument`, `ResourceExhausted`, `FailedPrecondition`, `Unavailable`, `Internal` — collapses into t… |
| Client.Dotnet-002 | Medium | Testing coverage | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` | The retry predicate `MxGatewayClientRetryPolicy.IsTransientGrpcFailure` handles two shapes: a raw `RpcException` and an `MxGatewayException { InnerException: RpcException }`. In production the transport always maps `RpcException` → `MxGate… |
| Client.Dotnet-003 | Medium | Concurrency & thread safety | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` | `DisposeAsync` calls `CloseAsync()` (no token) then unconditionally `_closeLock.Dispose()`. If another thread is concurrently awaiting `CloseAsync(token)` — legal, since the type exposes public async methods and no single-threaded contract… |
| Client.Go-002 | Medium | Error handling & resilience | `clients/go/mxgateway/session.go:440-516` | For the `Events`/`EventsAfter` compatibility API (`cancelWhenResultBufferFull == true`), when the 16-slot `results` channel is full `sendEventResult` cancels and returns `false`; the goroutine returns and `close(results)` runs — the consum… |
| Client.Go-003 | Medium | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:517-532` | `parseInt32List` calls `panic(err)` when an `item-handles` token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like `-item-handles 1,foo` crashes the process with an unrecovered panic and stack trace instead… |
| Client.Java-001 | Medium | Security | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` | `redactApiKey` preserves the leading and trailing four characters of the key. A gateway API key has the form `mxgw_<key-id>_<secret>`; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the… |
| Client.Java-002 | Medium | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` | The `next` field is a plain (non-volatile) instance field, and `MxEventStream` exposes no thread-confinement guarantee. More concretely, a queue-overflow `offer()` and a `close()` `offer(END)` can interleave so the overflow exception is en… |
| Client.Java-003 | Medium | mxaccessgw conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` | `OpenSessionReply` carries `gateway_protocol_version` (proto field 8), and `MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION` exists so the client can reject incompatible generated-code inputs. The client never reads `reply.getGatewayProtoc… |
| Client.Java-004 | Medium | Correctness & logic bugs | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` | `register`, `addItem`, and `addItem2` check `reply.hasRegister()`/`hasAddItem()` and otherwise fall back to `reply.getReturnValue().getInt32Value()`. If the gateway returns a reply with neither the typed payload nor a `return_value` set, t… |
| Client.Java-005 | Medium | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` | `close()` delegates to `closeRaw()`, which performs a network RPC. When `MxGatewaySession` is used in try-with-resources and the body throws, a failure inside `closeSession` (e.g. `WORKER_UNAVAILABLE`) throws from `close()` and replaces th… |
| Client.Python-003 | Medium | Error handling & resilience | `clients/python/src/mxgateway/client.py:125-137,155-173` | `stream_events_raw` and `query_active_alarms` call the stub directly with a `timeout` kwarg when `stream_timeout` is set, with no `TypeError` fallback. `galaxy.py:watch_deploy_events` and `_unary` *do* have a fallback that strips `timeout`… |
| Client.Python-005 | Medium | Performance & resource management | `clients/python/src/mxgateway/galaxy.py:117-140` | `discover_hierarchy` pages through the entire Galaxy object hierarchy and accumulates every `GalaxyObject` (each carrying its full attribute list) into a single in-memory `list` before returning. For a large Galaxy this is a very large all… |
| Client.Python-009 | Medium | Testing coverage | `clients/python/tests/` | Several non-trivial public paths are untested: `Session.write2`/`add_item2` request construction; the bulk-size limit `_ensure_bulk_size`/`MAX_BULK_ITEMS` guard; the `None`-argument `TypeError` guards in bulk methods; the TLS `ca_file` rea… |
@@ -46,19 +36,10 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| IntegrationTests-004 | Medium | Error handling & resilience | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` | In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutExcep… |
| IntegrationTests-005 | Medium | Testing coverage | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` | The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (`WriteSecured` ordering, `OperationComplete` sem… |
| IntegrationTests-006 | Medium | Testing coverage | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` | LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which `DashboardAuthenticat… |
| Server-002 | Medium | Design-document adherence | `src/MxGateway.Server/Program.cs:24`, `src/MxGateway.Server/GatewayApplication.cs` | `gateway.md:583` and CLAUDE.md state the first version "terminates orphaned workers on startup." No code in MxGateway.Server enumerates or kills leftover `MxGateway.Worker.exe` processes at startup — a grep for `orphan`/`reattach`/`termina… |
| Server-004 | Medium | Code organization & conventions | `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233`, `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77`, `src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67` | `ParseScopes` accepts any comma-separated strings and `CreateKeyAsync` persists them verbatim; neither the CLI nor the dashboard create path validates scopes against `GatewayScopes`. A typo or non-canonical name (e.g. CLAUDE.md's example `… |
| Server-005 | Medium | Error handling & resilience | `src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184` | `GalaxyHierarchyCache.RefreshCoreAsync` only catches `SqlException` and `InvalidOperationException`. The initial `cache.RefreshAsync` call in `GalaxyHierarchyRefreshService.ExecuteAsync` is wrapped only for `OperationCanceledException`. A… |
| Server-006 | Medium | Correctness & logic bugs | `src/MxGateway.Server/Sessions/SessionManager.cs:84-114` | In `OpenSessionAsync`, `_metrics.SessionOpened()` (line 89) increments the `_openSessions` gauge before `TryAutoSubscribeAlarmsAsync` runs. If auto-subscribe throws (which it does when `Alarms.RequireSubscribeOnOpen` is true and the worker… |
| Tests-003 | Medium | Performance & resource management | `src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176`, `src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258` | `CreateTempDatabasePath` creates a fresh directory under `%TEMP%\mxgateway-auth-tests\<guid>` (and `...-cli-tests`) for every test but nothing ever deletes it. `WorkerProcessLauncherTests.TestDirectory` correctly implements `IDisposable` a… |
| Tests-004 | Medium | Testing coverage | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs` | The authorization interceptor and `MxAccessGatewayService` are each tested in isolation, but no test composes the interceptor in front of the real service to confirm scope enforcement gates real RPCs end-to-end. A wiring mistake — intercep… |
| Tests-005 | Medium | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` | Worker-crash handling is only tested as a clean terminal exception from `ReadEventsAsync` or a pre-set `ShutdownException`. There is no test for a worker that faults mid-command — an `InvokeAsync` in flight when the pipe/worker dies — whic… |
| Tests-006 | Medium | Concurrency & thread safety | `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:76`, `src/MxGateway.Tests/Gateway/Workers/FakeWorkerHarnessTests.cs:122` | Several tests rely on fixed `Task.Delay` values: `WorkerClientTests.InvokeAsync_WithLateReply…` waits a hard-coded 50 ms after writing a late reply before issuing the second command, and the heartbeat tests use a 20 ms delay to make timest… |
| Worker-004 | Medium | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` | After `ReportWatchdogFaultIfNeededAsync` sends an `StaHung` fault, the heartbeat loop continues sending normal heartbeats with `State` derived from `_state`, which the watchdog path never sets to `Faulted`. The heartbeat then keeps reporti… |
| Worker-005 | Medium | Error handling & resilience | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:297-313` | `OnPoll` catches every exception from `PollOnce()` and discards it (`_ = ex;`). The production poll path (`MxAccessStaSession.RunAlarmPollLoopAsync``AlarmCommandHandler.PollOnce``AlarmDispatcher.PollOnce``consumer.PollOnce()`) has… |
| Worker-006 | Medium | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` | `RunAsync`'s `finally` calls `_runtimeSession?.Dispose()` unless `_shutdownTimedOut`. On the normal path `ShutdownGracefullyAsync` already disposed the STA runtime, so re-entering `Dispose()` is a harmless no-op only because `ShutdownGrace… |
| Worker-007 | Medium | mxaccessgw conventions | `src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150` | `Invoke` uses late-bound `Type.InvokeMember` reflection as a fallback when the COM object does not cast to `ILMXProxyServer*`. In production the object is always `LMXProxyServerClass`, so the reflection path exists only for test doubles —… |
| Worker-008 | Medium | Concurrency & thread safety | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249`, `:429-447` | `RunAlarmPollLoopAsync` correctly marshals `handler.PollOnce()` onto the STA via `staRuntime.InvokeAsync`, and the cancel/await/dispose ordering in `ShutdownGracefullyAsync` is sound. However, nothing enforces that the `consumerFactory` an… |
| Worker.Tests-003 | Medium | Concurrency & thread safety | `src/MxGateway.Worker.Tests/Sta/StaRuntimeTests.cs:46-48` | `InvokeAsync_WakesIdlePumpForQueuedCommand` asserts `stopwatch.Elapsed < TimeSpan.FromSeconds(2)` — a wall-clock assertion that on a loaded CI agent can exceed 2s, producing a false failure. The test also does not actually prove the wake e… |
| Worker.Tests-004 | Medium | Concurrency & thread safety | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:281-329` | `StartAsync_WithAlarmCommandHandlerFactory_PollOnceCalledViaSta` and `Dispose_StopsAlarmPollLoop` use poll-until loops, and `Dispose_StopsAlarmPollLoop` additionally does `await Task.Delay(1000)` then asserts `PollCount` is unchanged. The… |
| Worker.Tests-005 | Medium | Performance & resource management | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105`, `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31` | `MemoryStream` instances are created and never disposed across the frame-protocol and pipe-session tests (`MemoryStream stream = new();` with no `using`). Disposal is cheap so impact is low, but it is inconsistent with the rest of the suit… |
@@ -155,8 +136,27 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Worker-003 | High | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:399-403`, `:416-419` |
| Worker.Tests-001 | High | Resolved | Testing coverage | `src/MxGateway.Worker.Tests/Sta/` (no `StaMessagePumpTests.cs`) |
| Worker.Tests-002 | High | Resolved | Testing coverage | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs`, `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs` |
| Client.Dotnet-001 | Medium | Resolved | Error handling & resilience | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` |
| Client.Dotnet-002 | Medium | Resolved | Testing coverage | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` |
| Client.Dotnet-003 | Medium | Resolved | Concurrency & thread safety | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Client.Go-002 | Medium | Resolved | Error handling & resilience | `clients/go/mxgateway/session.go:440-516` |
| Client.Go-003 | Medium | Resolved | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Client.Java-001 | Medium | Resolved | Security | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Client.Java-002 | Medium | Resolved | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Client.Java-003 | Medium | Resolved | mxaccessgw conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
| Client.Java-004 | Medium | Resolved | Correctness & logic bugs | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
| Client.Java-005 | Medium | Resolved | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
| Client.Rust-005 | Medium | Resolved | Correctness & logic bugs | `clients/rust/src/session.rs:489-520` |
| Client.Rust-006 | Medium | Resolved | Error handling & resilience | `clients/rust/src/session.rs:531-555` |
| Server-002 | Medium | Resolved | Design-document adherence | `src/MxGateway.Server/Program.cs:24`, `src/MxGateway.Server/GatewayApplication.cs` |
| Server-004 | Medium | Resolved | Code organization & conventions | `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233`, `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77`, `src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67` |
| Server-005 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184` |
| Server-006 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Server/Sessions/SessionManager.cs:84-114` |
| Worker-004 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` |
| Worker-005 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258` (production alarm poll loop) |
| Worker-006 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` |
| Worker-007 | Medium | Resolved | mxaccessgw conventions | `src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150` |
| Worker-008 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249`, `:429-447` |
| Client.Rust-004 | Low | Resolved | Documentation & comments | `clients/rust/src/version.rs:7` |
| Client.Rust-007 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:14-55` |
| Client.Rust-008 | Low | Resolved | Performance & resource management | `clients/rust/src/value.rs:161-261` |
+9 -9
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 12 |
| Open findings | 8 |
## Checklist coverage
@@ -48,13 +48,13 @@
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/MxGateway.Server/Program.cs:24`, `src/MxGateway.Server/GatewayApplication.cs` |
| Status | Open |
| Status | Resolved |
**Description:** `gateway.md:583` and CLAUDE.md state the first version "terminates orphaned workers on startup." No code in MxGateway.Server enumerates or kills leftover `MxGateway.Worker.exe` processes at startup — a grep for `orphan`/`reattach`/`terminate` finds nothing. After an unclean gateway crash, x86 worker processes (each holding an MXAccess COM instance) leak and survive indefinitely, and a restarted gateway does not reclaim or kill them.
**Recommendation:** Add a startup hosted service that finds and kills stale worker processes (by executable path / a well-known argument or environment marker) before the server accepts sessions, or update the design docs if reattachment/cleanup is deliberately deferred.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-18. Confirmed against source: no code path enumerated or killed leftover workers. Added `IRunningProcessInspector` / `SystemRunningProcessInspector` (a testable seam over `Process.GetProcessesByName`/`Kill`), `OrphanWorkerTerminator` (kills processes matched by the configured worker executable path, or by image name when the x64 gateway cannot introspect the x86 worker's `MainModule`, skipping the current process and tolerating per-process kill failures), and `OrphanWorkerCleanupHostedService` (best-effort `IHostedService`). The hosted service is registered in `AddWorkerProcessLauncher` ahead of `AddGatewaySessions` so cleanup runs before the server accepts sessions. `gateway.md` updated to describe the implemented behavior. Regression tests: `OrphanWorkerTerminatorTests` (`KillsWorkerProcessesMatchingConfiguredExecutablePath`, `KillsImageNameMatchWhenExecutablePathUnreadable`, `DoesNotKillUnrelatedProcessSharingImageName`, `DoesNotKillCurrentProcess`, `ContinuesWhenOneKillThrows`).
### Server-003
@@ -78,13 +78,13 @@
| Severity | Medium |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233`, `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77`, `src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67` |
| Status | Open |
| Status | Resolved |
**Description:** `ParseScopes` accepts any comma-separated strings and `CreateKeyAsync` persists them verbatim; neither the CLI nor the dashboard create path validates scopes against `GatewayScopes`. A typo or non-canonical name (e.g. CLAUDE.md's example `--scopes session,invoke,event,metadata,admin`, which does not match the resolver's `session:open`/`invoke:read`/etc.) silently creates a key whose scope strings the authorization resolver never checks for — the key is unusable for those RPCs with no error at creation time.
**Recommendation:** Validate every requested scope against the `GatewayScopes` catalog at create time in both the CLI parser/runner and `DashboardApiKeyManagementService.ValidateCreateRequest`, rejecting unknown scope strings.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-18. Confirmed against source: `ParseScopes` split unvalidated strings into the create command and `ValidateCreateRequest` checked only key id and display name. Added `GatewayScopes.All` (the canonical scope catalog) and `GatewayScopes.IsKnown(string)`. `ApiKeyAdminCommandLineParser.Parse` now runs `ValidateScopes` for create-key commands and fails the parse listing the unknown scope(s) and valid set; `DashboardApiKeyManagementService.ValidateCreateRequest` rejects requests carrying any non-canonical scope. Revoke/rotate paths are unaffected (no scope input). Regression tests: `ApiKeyAdminCommandLineParserTests.Parse_CreateKeyCommand_RejectsUnknownScope`, `Parse_CreateKeyCommand_AcceptsAllCanonicalScopes`, and `DashboardApiKeyManagementServiceTests.CreateAsync_UnknownScope_DoesNotCallStore`.
### Server-005
@@ -93,13 +93,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184` |
| Status | Open |
| Status | Resolved |
**Description:** `GalaxyHierarchyCache.RefreshCoreAsync` only catches `SqlException` and `InvalidOperationException`. The initial `cache.RefreshAsync` call in `GalaxyHierarchyRefreshService.ExecuteAsync` is wrapped only for `OperationCanceledException`. A transient non-`SqlException` failure on the first refresh (e.g. a `Win32Exception`/`TimeoutException` from connection establishment, or another `DbException` subtype) escapes both layers, faults the `BackgroundService`, and — with default host behavior — stops the whole gateway. The periodic-tick loop does catch general exceptions, so only the first load is exposed.
**Recommendation:** Broaden the `catch` in `RefreshCoreAsync` to all non-cancellation exceptions (record `Unavailable`/`Stale` and still complete `_firstLoad`), or wrap the initial `RefreshAsync` in `GalaxyHierarchyRefreshService` with the same general `catch` the tick loop uses.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-18. Confirmed against source: the initial `RefreshAsync` in `ExecuteAsync` was guarded only for `OperationCanceledException`, and `RefreshCoreAsync` filtered its catch to `SqlException or InvalidOperationException`. Both recommended layers applied: `GalaxyHierarchyRefreshService.ExecuteAsync` now catches every non-cancellation exception on the initial load (logs a warning; the periodic tick retries), and `GalaxyHierarchyCache.RefreshCoreAsync` broadens its catch to all non-cancellation exceptions so the cache still records `Stale`/`Unavailable` and completes `_firstLoad`. The now-unused `Microsoft.Data.SqlClient` using was removed. Regression test: `GalaxyHierarchyRefreshServiceTests.ExecuteAsync_WhenFirstRefreshThrowsNonCancellationException_DoesNotFaultBackgroundService`.
### Server-006
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Server/Sessions/SessionManager.cs:84-114` |
| Status | Open |
| Status | Resolved |
**Description:** In `OpenSessionAsync`, `_metrics.SessionOpened()` (line 89) increments the `_openSessions` gauge before `TryAutoSubscribeAlarmsAsync` runs. If auto-subscribe throws (which it does when `Alarms.RequireSubscribeOnOpen` is true and the worker rejects the subscription), the `catch` block disposes and removes the session and records `_metrics.Fault(...)` but never calls `SessionClosed`/`SessionRemoved`. The `mxgateway.sessions.open` gauge permanently over-counts by one for every such failed open.
**Recommendation:** In the `catch` block, when the session had reached the point where `SessionOpened()` was recorded, also call `_metrics.SessionRemoved()` — or move the `SessionOpened()` call to after auto-subscribe succeeds.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-18. Confirmed against source: the `catch` block in `OpenSessionAsync` recorded `Fault(...)` and removed the session but never decremented the open-session gauge after `SessionOpened()` had run. Added a `sessionOpenedRecorded` flag set immediately after `_metrics.SessionOpened()`; the `catch` block now calls `_metrics.SessionRemoved()` when that flag is set, restoring the gauge for a post-`SessionOpened()` failure (e.g. an auto-subscribe rejection with `RequireSubscribeOnOpen=true`). Regression test: `SessionManagerAlarmAutoSubscribeTests.OpenSessionAsync_DoesNotLeakOpenSessionGauge_WhenAutoSubscribeFailsWithRequireOn`.
### Server-007
+16 -12
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 12 |
| Open findings | 7 |
## Checklist coverage
@@ -78,13 +78,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` |
| Status | Open |
| Status | Resolved |
**Description:** After `ReportWatchdogFaultIfNeededAsync` sends an `StaHung` fault, the heartbeat loop continues sending normal heartbeats with `State` derived from `_state`, which the watchdog path never sets to `Faulted`. The heartbeat then keeps reporting a non-faulted state that contradicts the fault just sent.
**Recommendation:** Set `_state = WorkerState.Faulted` (thread-safely) when the watchdog fault fires so heartbeat state and fault stay consistent.
**Resolution:** _(open)_
**Resolution:** 2026-05-18 — `ReportWatchdogFaultIfNeededAsync` now sets `_state = WorkerState.Faulted` immediately after `_watchdogFaultSent = true` and before the `StaHung` fault is written, so the next heartbeat reports `Faulted` instead of contradicting the fault. `_state` is already `volatile` (Worker-003), so the cross-thread write from the heartbeat loop is observed correctly by the heartbeat's own `CreateHeartbeat` read; no further locking is required. Verified by the regression test `WorkerPipeSessionTests.RunAsync_AfterWatchdogFault_HeartbeatReportsFaultedState`, which uses a stale-activity snapshot with an empty current-command correlation id so the heartbeat `State` is derived from `_state` rather than forced to `ExecutingCommand`.
### Worker-005
@@ -92,14 +92,16 @@
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:297-313` |
| Status | Open |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258` (production alarm poll loop) |
| Status | Resolved |
**Description:** `OnPoll` catches every exception from `PollOnce()` and discards it (`_ = ex;`). The production poll path (`MxAccessStaSession.RunAlarmPollLoopAsync``AlarmCommandHandler.PollOnce``AlarmDispatcher.PollOnce``consumer.PollOnce()`) has no fault recording either. A permanently failing alarm provider (e.g. `GetXmlCurrentAlarms2` returning `E_FAIL`, malformed XML throwing in `XmlDocument.LoadXml`) is therefore completely silent — no fault on the event queue, no log.
**Recommendation:** Route poll failures to `MxAccessEventQueue.RecordFault` (or a logger) so a broken alarm subscription becomes observable. Update the now-stale comment.
**Resolution:** _(open)_
**Re-triage:** The cited location `WnWrapAlarmConsumer.cs:297-313` and the `OnPoll` callback no longer exist as of this branch — Worker-001 removed the off-STA `Timer` and its `OnPoll` callback entirely. The substantive concern still held, however: the **production** poll path in `MxAccessStaSession.RunAlarmPollLoopAsync` caught only `OperationCanceledException`, `ObjectDisposedException`, and `InvalidOperationException`. A genuine poll failure (`COMException` from `GetXmlCurrentAlarms2`, a malformed-XML `XmlException`) escaped uncaught, faulted the never-awaited `Task.Run` poll task, and was silently lost — exactly the silent-failure the finding describes. The finding was re-pointed at the live location and fixed there rather than at the removed `OnPoll`.
**Resolution:** 2026-05-18 — `RunAlarmPollLoopAsync` gained a trailing `catch (Exception exception)` arm after the three graceful-stop catches. A real alarm-poll failure is now converted to a `WorkerFault` (category `MxaccessEventConversionFailed`, carrying the exception type and, for a `COMException`, its `HResult`) by the new `CreateAlarmPollFault` helper and recorded on the session's `MxAccessEventQueue` via `RecordFault`. The worker's event-drain loop drains that fault and forwards it to the gateway, so a broken alarm subscription is now observable on the IPC fault path instead of vanishing. The poll loop still stops after the failure (the subscription is dead). No new proto enum value was added — `MxaccessEventConversionFailed` is the closest existing alarm-path category, avoiding a contracts regeneration across all clients. Verified by the regression test `MxAccessStaSessionTests.RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue`.
### Worker-006
@@ -108,13 +110,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` |
| Status | Open |
| Status | Resolved |
**Description:** `RunAsync`'s `finally` calls `_runtimeSession?.Dispose()` unless `_shutdownTimedOut`. On the normal path `ShutdownGracefullyAsync` already disposed the STA runtime, so re-entering `Dispose()` is a harmless no-op only because `ShutdownGracefullyAsync` reached its end and set `disposed = true`. If `ShutdownGracefullyAsync` throws `TimeoutException` after partial teardown with `_shutdownTimedOut` set, the session is never disposed at all — the `finally` skips it — leaking the STA thread and COM object, leaving cleanup to rely solely on process exit.
**Recommendation:** Make the dispose decision explicit and confirm process exit always follows a timed-out shutdown; otherwise dispose defensively. At minimum document why disposal is deliberately skipped on timeout.
**Resolution:** _(open)_
**Resolution:** 2026-05-18 — `RunAsync`'s `finally` now always calls `_runtimeSession?.Dispose()`; the `if (!_shutdownTimedOut)` guard and the `_shutdownTimedOut` field (which had become write-only) were removed. `MxAccessStaSession.Dispose` is idempotent (`if (disposed) return`) and bounded — each STA join is capped with `Wait(TimeSpan.FromSeconds(2))` — so re-entering it on the normal path (where `ShutdownGracefullyAsync` already disposed the runtime) is a harmless no-op, while on the timed-out path it is now the only thing that reclaims the STA thread and releases the MXAccess COM object. The previous behaviour leaked both on a shutdown timeout and relied solely on process exit. A code comment in the `finally` block documents the reasoning. Verified by the regression test `WorkerPipeSessionTests.RunAsync_WhenShutdownTimesOut_StillDisposesRuntimeSession`, which forces a `TimeoutException` from `ShutdownGracefullyAsync` and asserts the runtime session is disposed before `RunAsync` rethrows.
### Worker-007
@@ -123,13 +125,15 @@
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150` |
| Status | Open |
| Status | Resolved |
**Description:** `Invoke` uses late-bound `Type.InvokeMember` reflection as a fallback when the COM object does not cast to `ILMXProxyServer*`. In production the object is always `LMXProxyServerClass`, so the reflection path exists only for test doubles — it is dead/untested code on the production path and obscures the interface contract. `params object[] arguments` also boxes value-type handles on every call.
**Recommendation:** Drop the reflection fallback and require the COM object to implement the interface (tests can supply a typed fake), or clearly mark the fallback as test-only.
**Resolution:** _(open)_
**Re-triage:** The finding's claim that the reflection path is "dead/untested code" is partly inaccurate — it was in fact the path exercised by the entire `MxAccessCommandExecutorTests` suite, whose `FakeMxAccessComObject` did not implement any typed interface. So the reflection fallback was test-only but *not* untested. The convention concern (bypassing the typed interface contract, boxing value-type handles) is valid, so the fix follows the recommendation's first option.
**Resolution:** 2026-05-18 — The late-bound `Type.InvokeMember` reflection fallback and its `params object[]`-boxing `Invoke` helper were removed from `MxAccessComServer`. Each adapter method now takes one of two typed paths: an `is IMxAccessServer` fast path (test fakes implement `IMxAccessServer` directly) and the production path that casts to the typed `ILMXProxyServer` / `ILMXProxyServer3` / `ILMXProxyServer4` COM interfaces via new `AsProxyServer*` helpers. A COM object implementing neither now fails fast with a clear `InvalidOperationException` naming the missing interface, instead of an opaque late-bound call. The test seam was migrated accordingly: `MxAccessCommandExecutorTests.FakeMxAccessComObject` now declares `: IMxAccessServer` (its method signatures already matched the interface exactly, so no behavioural change). Verified by the new `MxAccessComServerTests` (typed-server routing, untyped-object rejection, original-exception propagation — no more `TargetInvocationException` wrapping) plus the unchanged, still-passing `MxAccessCommandExecutorTests` suite which now exercises the typed `IMxAccessServer` path.
### Worker-008
@@ -138,13 +142,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249`, `:429-447` |
| Status | Open |
| Status | Resolved |
**Description:** `RunAlarmPollLoopAsync` correctly marshals `handler.PollOnce()` onto the STA via `staRuntime.InvokeAsync`, and the cancel/await/dispose ordering in `ShutdownGracefullyAsync` is sound. However, nothing enforces that the `consumerFactory` and all `IMxAccessAlarmConsumer` calls run on the STA thread; a future caller could break STA affinity silently.
**Recommendation:** Add an assertion or documented invariant that the consumer factory and all `IMxAccessAlarmConsumer` calls run on the STA thread, mirroring the existing `MxAccessSession.CreationThreadId` pattern.
**Resolution:** _(open)_
**Resolution:** 2026-05-18 — `MxAccessStaSession` now records the STA thread id (`alarmConsumerThreadId`) at the point the alarm-command-handler factory is invoked — which already runs inside `staRuntime.InvokeAsync` during `StartAsync`, mirroring the `MxAccessSession.CreationThreadId` capture. `RunAlarmPollLoopAsync`'s marshalled poll lambda now calls `EnsureOnAlarmConsumerThread()` before `handler.PollOnce()`, asserting the poll runs on the recorded STA thread. The check is delegated to a new `internal static` guard `AssertOnAlarmConsumerThread(int? expected, int actual)` that throws a descriptive `InvalidOperationException` on an affinity violation and is a no-op when the consumer thread is unrecorded (no alarm handler configured). Making the guard `static` and `internal` keeps it directly unit-testable. The STA-affinity invariant is documented in the guard's XML doc. Verified by the regression tests `MxAccessStaSessionTests.AssertOnAlarmConsumerThread_WhenOffOwningThread_Throws` and `AssertOnAlarmConsumerThread_OnOwningThreadOrUnset_DoesNotThrow`.
### Worker-009
+5 -2
View File
@@ -579,8 +579,11 @@ Policy:
- command exceptions return structured command fault with HRESULT if known,
- stale sessions are closed by lease timeout,
- stuck workers are killed by process id,
- gateway restart should not attempt to reattach old workers unless explicitly
designed; first version should terminate orphaned workers on startup.
- gateway restart does not reattach old workers; `OrphanWorkerCleanupHostedService`
runs `OrphanWorkerTerminator` once on startup — before the server accepts
sessions — to kill leftover `MxGateway.Worker.exe` processes (matched by the
configured worker executable path, or by image name when the x64 gateway cannot
introspect the x86 worker's module) left behind by a previous unclean run.
Because each client owns one worker, a crash or leak affects only that session.
@@ -1,6 +1,7 @@
using System.Security.Claims;
using Microsoft.Data.Sqlite;
using MxGateway.Server.Security.Authentication;
using MxGateway.Server.Security.Authorization;
namespace MxGateway.Server.Dashboard;
@@ -171,6 +172,15 @@ public sealed class DashboardApiKeyManagementService(
return "Display name is required.";
}
string[] unknownScopes = request.Scopes
.Where(scope => !GatewayScopes.IsKnown(scope))
.ToArray();
if (unknownScopes.Length > 0)
{
return $"Unknown scope(s): {string.Join(", ", unknownScopes)}. "
+ $"Valid scopes are: {string.Join(", ", GatewayScopes.All)}.";
}
return null;
}
@@ -1,5 +1,4 @@
using Google.Protobuf.WellKnownTypes;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;
using MxGateway.Contracts.Proto.Galaxy;
using MxGateway.Server.Dashboard;
@@ -181,8 +180,13 @@ public sealed class GalaxyHierarchyCache : IGalaxyHierarchyCache
{
throw;
}
catch (Exception exception) when (exception is SqlException or InvalidOperationException)
catch (Exception exception)
{
// Catch every non-cancellation failure — not just SqlException /
// InvalidOperationException. A TimeoutException or Win32Exception
// from connection establishment, or another DbException subtype,
// must still degrade gracefully to Stale/Unavailable and complete
// _firstLoad rather than escape and fault the refresh BackgroundService.
_logger?.LogWarning(exception, "Galaxy hierarchy cache refresh failed.");
GalaxyHierarchyCacheEntry failed = previous with
{
@@ -26,6 +26,15 @@ public sealed class GalaxyHierarchyRefreshService(
{
return;
}
catch (Exception exception)
{
// A transient first-load failure (e.g. a TimeoutException or
// Win32Exception from connection establishment, or a DbException
// subtype the cache does not catch) must not fault this
// BackgroundService and stop the whole gateway. The cache records
// its own Unavailable/Stale status; the periodic tick below retries.
logger.LogWarning(exception, "Initial Galaxy hierarchy cache load failed; will retry on the refresh interval.");
}
using PeriodicTimer timer = new(interval, _timeProvider);
try
@@ -1,3 +1,5 @@
using MxGateway.Server.Security.Authorization;
namespace MxGateway.Server.Security.Authentication;
public static class ApiKeyAdminCommandLineParser
@@ -95,6 +97,12 @@ public static class ApiKeyAdminCommandLineParser
return ApiKeyAdminParseResult.Fail(validationError);
}
string? scopeError = ValidateScopes(kind, scopes);
if (scopeError is not null)
{
return ApiKeyAdminParseResult.Fail(scopeError);
}
return ApiKeyAdminParseResult.Success(new ApiKeyAdminCommand(
Kind: kind,
Json: json,
@@ -152,6 +160,23 @@ public static class ApiKeyAdminCommandLineParser
return null;
}
private static string? ValidateScopes(ApiKeyAdminCommandKind kind, IReadOnlySet<string> scopes)
{
if (kind != ApiKeyAdminCommandKind.CreateKey)
{
return null;
}
string[] unknown = scopes.Where(scope => !GatewayScopes.IsKnown(scope)).ToArray();
if (unknown.Length == 0)
{
return null;
}
return $"Unknown scope(s): {string.Join(", ", unknown)}. "
+ $"Valid scopes are: {string.Join(", ", GatewayScopes.All)}.";
}
private static string KindName(ApiKeyAdminCommandKind kind)
{
return kind switch
@@ -10,4 +10,28 @@ public static class GatewayScopes
public const string EventsRead = "events:read";
public const string MetadataRead = "metadata:read";
public const string Admin = "admin";
/// <summary>
/// The complete catalog of canonical scope strings the gateway authorization
/// resolver recognizes. Key-creation paths (CLI and dashboard) validate requested
/// scopes against this set so a typo or non-canonical name cannot persist a key
/// whose scope strings the resolver never matches.
/// </summary>
public static readonly IReadOnlySet<string> All = new HashSet<string>(
[
SessionOpen,
SessionClose,
InvokeRead,
InvokeWrite,
InvokeSecure,
EventsRead,
MetadataRead,
Admin,
],
System.StringComparer.Ordinal);
/// <summary>Determines whether the supplied scope string is a recognized canonical scope.</summary>
/// <param name="scope">Scope string to check.</param>
/// <returns><see langword="true"/> when the scope is canonical; otherwise <see langword="false"/>.</returns>
public static bool IsKnown(string scope) => All.Contains(scope);
}
@@ -68,6 +68,7 @@ public sealed class SessionManager : ISessionManager
EnsureSessionCapacity();
GatewaySession? session = null;
bool sessionOpenedRecorded = false;
try
{
session = CreateSession(request, clientIdentity);
@@ -86,6 +87,7 @@ public sealed class SessionManager : ISessionManager
session.AttachWorkerClient(workerClient);
session.MarkReady();
_metrics.SessionOpened();
sessionOpenedRecorded = true;
await TryAutoSubscribeAlarmsAsync(session, cancellationToken).ConfigureAwait(false);
@@ -100,6 +102,14 @@ public sealed class SessionManager : ISessionManager
await session.DisposeAsync().ConfigureAwait(false);
}
// If SessionOpened() already incremented the open-session gauge,
// a failure after that point (e.g. auto-subscribe rejection) must
// decrement it again so mxgateway.sessions.open does not leak.
if (sessionOpenedRecorded)
{
_metrics.SessionRemoved();
}
ReleaseSessionSlot();
_metrics.Fault(SessionManagerErrorCode.OpenFailed.ToString());
_logger.LogWarning(
@@ -0,0 +1,29 @@
namespace MxGateway.Server.Workers;
/// <summary>
/// Abstraction over OS process enumeration and termination. Exists so the
/// orphan-worker cleanup logic can be unit-tested without spawning real
/// processes.
/// </summary>
public interface IRunningProcessInspector
{
/// <summary>
/// Enumerates currently running processes whose image name (without the
/// <c>.exe</c> extension) matches <paramref name="processName"/>.
/// </summary>
/// <param name="processName">Process image name to match, without extension.</param>
/// <returns>The matching running processes.</returns>
IReadOnlyList<RunningProcessInfo> GetProcessesByName(string processName);
/// <summary>Forcibly terminates the process with the given identifier.</summary>
/// <param name="processId">Identifier of the process to terminate.</param>
void Kill(int processId);
}
/// <summary>Identifying information for a running process candidate.</summary>
/// <param name="ProcessId">Operating-system process identifier.</param>
/// <param name="ExecutablePath">
/// Fully-qualified path to the process main module, or <see langword="null"/>
/// when it could not be read (e.g. access denied).
/// </param>
public sealed record RunningProcessInfo(int ProcessId, string? ExecutablePath);
@@ -0,0 +1,30 @@
namespace MxGateway.Server.Workers;
/// <summary>
/// Hosted service that terminates leftover MXAccess worker processes once on
/// gateway startup, before the server begins accepting sessions.
/// </summary>
public sealed class OrphanWorkerCleanupHostedService(
OrphanWorkerTerminator terminator,
ILogger<OrphanWorkerCleanupHostedService> logger) : IHostedService
{
/// <inheritdoc />
public Task StartAsync(CancellationToken cancellationToken)
{
try
{
terminator.TerminateOrphans();
}
catch (Exception exception)
{
// Orphan cleanup is best-effort; a failure here must not prevent
// the gateway from starting.
logger.LogWarning(exception, "Orphan worker cleanup failed on startup.");
}
return Task.CompletedTask;
}
/// <inheritdoc />
public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}
@@ -0,0 +1,138 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using MxGateway.Server.Configuration;
using MxGateway.Server.Metrics;
namespace MxGateway.Server.Workers;
/// <summary>
/// Terminates leftover MXAccess worker processes on gateway startup.
/// <para>
/// Per <c>gateway.md</c> ("first version should terminate orphaned workers
/// on startup") and CLAUDE.md, a gateway restart does not reattach old
/// workers. After an unclean gateway crash, x86 worker processes — each
/// holding an MXAccess COM instance on an STA — survive indefinitely. This
/// terminator finds those processes by executable name/path and kills them
/// before the restarted gateway accepts sessions.
/// </para>
/// </summary>
public sealed class OrphanWorkerTerminator
{
private readonly IRunningProcessInspector _inspector;
private readonly GatewayMetrics _metrics;
private readonly WorkerOptions _workerOptions;
private readonly ILogger<OrphanWorkerTerminator> _logger;
/// <summary>Initializes a new instance of the <see cref="OrphanWorkerTerminator"/> class.</summary>
/// <param name="gatewayOptions">Gateway configuration options.</param>
/// <param name="inspector">Running-process inspector.</param>
/// <param name="metrics">Gateway metrics collector.</param>
/// <param name="logger">Optional logger for diagnostic output.</param>
public OrphanWorkerTerminator(
IOptions<GatewayOptions> gatewayOptions,
IRunningProcessInspector inspector,
GatewayMetrics metrics,
ILogger<OrphanWorkerTerminator>? logger = null)
{
ArgumentNullException.ThrowIfNull(gatewayOptions);
_inspector = inspector ?? throw new ArgumentNullException(nameof(inspector));
_metrics = metrics ?? throw new ArgumentNullException(nameof(metrics));
_workerOptions = gatewayOptions.Value.Worker;
_logger = logger ?? NullLogger<OrphanWorkerTerminator>.Instance;
}
/// <summary>
/// Finds and kills every leftover worker process. Safe to call once at
/// startup before any session-owned worker is launched.
/// </summary>
/// <returns>The number of orphan worker processes that were terminated.</returns>
public int TerminateOrphans()
{
string? configuredPath = ResolveConfiguredExecutablePath();
string processName = ResolveProcessName(configuredPath);
int currentProcessId = Environment.ProcessId;
int terminated = 0;
foreach (RunningProcessInfo candidate in _inspector.GetProcessesByName(processName))
{
if (candidate.ProcessId == currentProcessId)
{
continue;
}
if (!IsOrphanWorker(candidate, configuredPath))
{
continue;
}
try
{
_inspector.Kill(candidate.ProcessId);
_metrics.WorkerKilled("OrphanStartupCleanup");
terminated++;
_logger.LogWarning(
"Terminated orphan worker process {ProcessId} ({ExecutablePath}) left over from a previous gateway run.",
candidate.ProcessId,
candidate.ExecutablePath ?? processName);
}
catch (Exception exception)
{
// The process may have already exited, or be inaccessible.
// A failure to kill one orphan must not block gateway startup.
_logger.LogWarning(
exception,
"Failed to terminate orphan worker process {ProcessId}.",
candidate.ProcessId);
}
}
if (terminated > 0)
{
_logger.LogInformation("Terminated {Count} orphan worker process(es) on startup.", terminated);
}
return terminated;
}
private static bool IsOrphanWorker(RunningProcessInfo candidate, string? configuredPath)
{
// When the executable path is readable, require an exact match against
// the configured worker path so unrelated processes that merely share
// the image name are never killed.
if (candidate.ExecutablePath is { } path)
{
return configuredPath is not null
&& string.Equals(path, configuredPath, StringComparison.OrdinalIgnoreCase);
}
// A null path means the x64 gateway could not introspect the module —
// the expected case for the x86 worker. Image-name match is the only
// signal available; treat it as an orphan.
return true;
}
private string? ResolveConfiguredExecutablePath()
{
try
{
return Path.GetFullPath(_workerOptions.ExecutablePath);
}
catch (Exception exception) when (exception is ArgumentException
or NotSupportedException
or PathTooLongException)
{
_logger.LogWarning(
exception,
"Configured worker executable path '{ExecutablePath}' is not a valid filesystem path; "
+ "orphan cleanup will match by image name only.",
_workerOptions.ExecutablePath);
return null;
}
}
private static string ResolveProcessName(string? configuredPath)
{
string source = configuredPath ?? "MxGateway.Worker.exe";
return Path.GetFileNameWithoutExtension(source);
}
}
@@ -0,0 +1,55 @@
using System.Diagnostics;
namespace MxGateway.Server.Workers;
/// <summary>
/// <see cref="IRunningProcessInspector"/> backed by <see cref="Process"/>.
/// </summary>
public sealed class SystemRunningProcessInspector : IRunningProcessInspector
{
/// <inheritdoc />
public IReadOnlyList<RunningProcessInfo> GetProcessesByName(string processName)
{
List<RunningProcessInfo> results = [];
Process[] processes = Process.GetProcessesByName(processName);
try
{
foreach (Process process in processes)
{
results.Add(new RunningProcessInfo(process.Id, TryGetExecutablePath(process)));
}
}
finally
{
foreach (Process process in processes)
{
process.Dispose();
}
}
return results;
}
/// <inheritdoc />
public void Kill(int processId)
{
using Process process = Process.GetProcessById(processId);
process.Kill(entireProcessTree: true);
}
private static string? TryGetExecutablePath(Process process)
{
try
{
return process.MainModule?.FileName;
}
catch (Exception exception) when (exception is InvalidOperationException
or System.ComponentModel.Win32Exception
or NotSupportedException)
{
// Access to the main module can be denied (e.g. a 64-bit gateway
// querying a 32-bit worker, or a process owned by another user).
return null;
}
}
}
@@ -11,6 +11,13 @@ public static class WorkerServiceCollectionExtensions
services.AddSingleton<IWorkerStartupProbe, WorkerProcessStartedProbe>();
services.AddSingleton<IWorkerProcessLauncher, WorkerProcessLauncher>();
// Terminate workers leaked by a previous unclean gateway run before the
// server accepts sessions. Registered ahead of AddGatewaySessions so the
// cleanup hosted service starts before the session subsystem.
services.AddSingleton<IRunningProcessInspector, SystemRunningProcessInspector>();
services.AddSingleton<OrphanWorkerTerminator>();
services.AddHostedService<OrphanWorkerCleanupHostedService>();
return services;
}
}
@@ -0,0 +1,64 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using MxGateway.Server.Galaxy;
namespace MxGateway.Tests.Galaxy;
/// <summary>
/// Server-005 regression: the initial <c>RefreshAsync</c> call in
/// <see cref="GalaxyHierarchyRefreshService"/> must not let a transient,
/// non-cancellation first-load failure (e.g. a <see cref="TimeoutException"/>
/// or <see cref="System.ComponentModel.Win32Exception"/> from connection
/// establishment) escape and fault the host <c>BackgroundService</c>.
/// </summary>
public sealed class GalaxyHierarchyRefreshServiceTests
{
[Fact]
public async Task ExecuteAsync_WhenFirstRefreshThrowsNonCancellationException_DoesNotFaultBackgroundService()
{
ThrowingCache cache = new(new TimeoutException("connection establishment timed out"));
GalaxyHierarchyRefreshService service = CreateService(cache);
using CancellationTokenSource cts = new();
await service.StartAsync(cts.Token);
await cts.CancelAsync();
// The background loop must have stopped cleanly: ExecuteTask completes
// (RanToCompletion or Canceled) rather than faulting on the first refresh.
Task? executeTask = service.ExecuteTask;
Assert.NotNull(executeTask);
await executeTask;
Assert.False(executeTask.IsFaulted);
Assert.Equal(1, cache.RefreshCallCount);
await service.StopAsync(CancellationToken.None);
}
private static GalaxyHierarchyRefreshService CreateService(IGalaxyHierarchyCache cache)
{
GalaxyRepositoryOptions options = new()
{
DashboardRefreshIntervalSeconds = 3600,
};
return new GalaxyHierarchyRefreshService(
cache,
Options.Create(options),
NullLogger<GalaxyHierarchyRefreshService>.Instance);
}
private sealed class ThrowingCache(Exception toThrow) : IGalaxyHierarchyCache
{
public int RefreshCallCount { get; private set; }
public GalaxyHierarchyCacheEntry Current => GalaxyHierarchyCacheEntry.Empty;
public Task RefreshAsync(CancellationToken cancellationToken)
{
RefreshCallCount++;
throw toThrow;
}
public Task WaitForFirstLoadAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}
}
@@ -112,6 +112,33 @@ public sealed class DashboardApiKeyManagementServiceTests
&& entry.Details == "rotated");
}
/// <summary>
/// Server-004 regression: the dashboard create path must reject a request
/// carrying a non-canonical scope string rather than persisting a key whose
/// scope the authorization resolver never matches.
/// </summary>
[Fact]
public async Task CreateAsync_UnknownScope_DoesNotCallStore()
{
FakeApiKeyAdminStore adminStore = new();
DashboardApiKeyManagementService service = CreateService(adminStore);
DashboardApiKeyManagementRequest request = CreateRequest() with
{
Scopes = new HashSet<string>(
[GatewayScopes.SessionOpen, "invoke", "metadata"],
StringComparer.Ordinal),
};
DashboardApiKeyManagementResult result = await service.CreateAsync(
CreateAuthorizedUser(),
request,
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Equal(0, adminStore.CreateCount);
}
private static DashboardApiKeyManagementService CreateService(
FakeApiKeyAdminStore? adminStore = null,
FakeApiKeyAuditStore? auditStore = null,
@@ -125,6 +125,44 @@ public sealed class SessionManagerAlarmAutoSubscribeTests
CreateOpenRequest(), "client-1", CancellationToken.None));
}
/// <summary>
/// Server-006 regression: when auto-subscribe throws after
/// <c>SessionOpened()</c> incremented the open-session gauge, the failed
/// open must not leave <c>mxgateway.sessions.open</c> over-counted.
/// </summary>
[Fact]
public async Task OpenSessionAsync_DoesNotLeakOpenSessionGauge_WhenAutoSubscribeFailsWithRequireOn()
{
AlarmAutoSubscribeWorkerClient worker = new()
{
SubscribeAlarmsReplyFactory = _ => new MxCommandReply
{
Kind = MxCommandKind.SubscribeAlarms,
ProtocolStatus = new ProtocolStatus
{
Code = ProtocolStatusCode.MxaccessFailure,
Message = "wnwrap subscribe failed",
},
},
};
using GatewayMetrics metrics = new();
SessionManager manager = NewManager(
worker,
alarms: new AlarmsOptions
{
Enabled = true,
SubscriptionExpression = @"\\HOST\Galaxy!Area1",
RequireSubscribeOnOpen = true,
},
metrics: metrics);
await Assert.ThrowsAsync<SessionManagerException>(
async () => await manager.OpenSessionAsync(
CreateOpenRequest(), "client-1", CancellationToken.None));
Assert.Equal(0, metrics.GetSnapshot().OpenSessions);
}
[Fact]
public async Task OpenSessionAsync_Throws_WhenEnabledButNoExpressionAndRequireOn()
{
@@ -161,7 +199,8 @@ public sealed class SessionManagerAlarmAutoSubscribeTests
private static SessionManager NewManager(
AlarmAutoSubscribeWorkerClient worker,
AlarmsOptions alarms)
AlarmsOptions alarms,
GatewayMetrics? metrics = null)
{
FakeSessionWorkerClientFactory factory = new(worker);
GatewayOptions options = new GatewayOptions
@@ -183,7 +222,7 @@ public sealed class SessionManagerAlarmAutoSubscribeTests
new SessionRegistry(),
factory,
Options.Create(options),
new GatewayMetrics());
metrics ?? new GatewayMetrics());
}
private static SessionOpenRequest CreateOpenRequest()
@@ -0,0 +1,137 @@
using Microsoft.Extensions.Options;
using MxGateway.Server.Configuration;
using MxGateway.Server.Metrics;
using MxGateway.Server.Workers;
namespace MxGateway.Tests.Gateway.Workers;
/// <summary>
/// Server-002 regression: per <c>gateway.md</c> the gateway must terminate
/// orphaned worker processes on startup. These tests pin that the terminator
/// kills leftover workers (matched by executable path, or by image name when
/// the path is unreadable) without touching unrelated processes or itself.
/// </summary>
public sealed class OrphanWorkerTerminatorTests
{
private const string WorkerExecutablePath = @"C:\app\src\MxGateway.Worker\bin\x86\Release\MxGateway.Worker.exe";
[Fact]
public void TerminateOrphans_KillsWorkerProcessesMatchingConfiguredExecutablePath()
{
FakeProcessInspector inspector = new(
[
new RunningProcessInfo(101, WorkerExecutablePath),
new RunningProcessInfo(102, WorkerExecutablePath),
]);
OrphanWorkerTerminator terminator = CreateTerminator(inspector);
int killed = terminator.TerminateOrphans();
Assert.Equal(2, killed);
Assert.Equal([101, 102], inspector.KilledProcessIds.Order());
}
[Fact]
public void TerminateOrphans_KillsImageNameMatchWhenExecutablePathUnreadable()
{
// The x64 gateway cannot introspect the x86 worker's main module, so the
// path comes back null. Image-name match is the only signal — and it is
// exactly the orphan worker case, so the process must still be killed.
FakeProcessInspector inspector = new(
[
new RunningProcessInfo(201, ExecutablePath: null),
]);
OrphanWorkerTerminator terminator = CreateTerminator(inspector);
int killed = terminator.TerminateOrphans();
Assert.Equal(1, killed);
Assert.Equal([201], inspector.KilledProcessIds);
}
[Fact]
public void TerminateOrphans_DoesNotKillUnrelatedProcessSharingImageName()
{
// A process with the same image name but a different executable path is
// not our worker and must be left alone.
FakeProcessInspector inspector = new(
[
new RunningProcessInfo(301, @"C:\other\place\MxGateway.Worker.exe"),
]);
OrphanWorkerTerminator terminator = CreateTerminator(inspector);
int killed = terminator.TerminateOrphans();
Assert.Equal(0, killed);
Assert.Empty(inspector.KilledProcessIds);
}
[Fact]
public void TerminateOrphans_DoesNotKillCurrentProcess()
{
FakeProcessInspector inspector = new(
[
new RunningProcessInfo(Environment.ProcessId, WorkerExecutablePath),
]);
OrphanWorkerTerminator terminator = CreateTerminator(inspector);
int killed = terminator.TerminateOrphans();
Assert.Equal(0, killed);
Assert.Empty(inspector.KilledProcessIds);
}
[Fact]
public void TerminateOrphans_ContinuesWhenOneKillThrows()
{
FakeProcessInspector inspector = new(
[
new RunningProcessInfo(401, WorkerExecutablePath),
new RunningProcessInfo(402, WorkerExecutablePath),
])
{
ThrowOnKillProcessId = 401,
};
OrphanWorkerTerminator terminator = CreateTerminator(inspector);
int killed = terminator.TerminateOrphans();
Assert.Equal(1, killed);
Assert.Contains(402, inspector.KilledProcessIds);
}
private static OrphanWorkerTerminator CreateTerminator(IRunningProcessInspector inspector)
{
GatewayOptions options = new()
{
Worker = new WorkerOptions
{
ExecutablePath = WorkerExecutablePath,
},
};
return new OrphanWorkerTerminator(
Options.Create(options),
inspector,
new GatewayMetrics());
}
private sealed class FakeProcessInspector(IReadOnlyList<RunningProcessInfo> processes)
: IRunningProcessInspector
{
public List<int> KilledProcessIds { get; } = [];
public int? ThrowOnKillProcessId { get; init; }
public IReadOnlyList<RunningProcessInfo> GetProcessesByName(string processName) => processes;
public void Kill(int processId)
{
if (ThrowOnKillProcessId == processId)
{
throw new InvalidOperationException("Process has already exited.");
}
KilledProcessIds.Add(processId);
}
}
}
@@ -52,6 +52,56 @@ public sealed class ApiKeyAdminCommandLineParserTests
Assert.Contains("events:read", result.Command.Scopes);
}
/// <summary>
/// Server-004 regression: a create-key command with a non-canonical scope
/// string (e.g. CLAUDE.md's stale <c>invoke</c> instead of <c>invoke:read</c>)
/// must be rejected at parse time rather than silently persisting an
/// unusable scope the authorization resolver never matches.
/// </summary>
[Fact]
public void Parse_CreateKeyCommand_RejectsUnknownScope()
{
ApiKeyAdminParseResult result = ApiKeyAdminCommandLineParser.Parse(
[
"apikey",
"create-key",
"--key-id",
"operator01",
"--display-name",
"Operator",
"--scopes",
"session:open,invoke,metadata",
]);
Assert.True(result.IsApiKeyCommand);
Assert.Null(result.Command);
Assert.NotNull(result.Error);
Assert.Contains("invoke", result.Error, StringComparison.Ordinal);
Assert.Contains("metadata", result.Error, StringComparison.Ordinal);
}
/// <summary>Verifies a create-key command with only canonical scopes parses successfully.</summary>
[Fact]
public void Parse_CreateKeyCommand_AcceptsAllCanonicalScopes()
{
ApiKeyAdminParseResult result = ApiKeyAdminCommandLineParser.Parse(
[
"apikey",
"create-key",
"--key-id",
"operator01",
"--display-name",
"Operator",
"--scopes",
"session:open,session:close,invoke:read,invoke:write,invoke:secure,events:read,metadata:read,admin",
]);
Assert.True(result.IsApiKeyCommand);
Assert.Null(result.Error);
Assert.NotNull(result.Command);
Assert.Equal(8, result.Command.Scopes.Count);
}
/// <summary>
/// Verifies create key without display name returns error.
/// </summary>
@@ -313,6 +313,121 @@ public sealed class WorkerPipeSessionTests
await SendShutdownAndWaitAsync(pipePair, runTask, cancellation.Token);
}
/// <summary>
/// Worker-004 regression: once the watchdog reports an StaHung fault,
/// subsequent heartbeats must report <see cref="WorkerState.Faulted"/>
/// rather than a non-faulted state that contradicts the fault. The
/// snapshot uses an empty current-command correlation id so the
/// heartbeat State is derived from the session state, not forced to
/// ExecutingCommand.
/// </summary>
[Fact]
public async Task RunAsync_AfterWatchdogFault_HeartbeatReportsFaultedState()
{
using CancellationTokenSource cancellation = new(TimeSpan.FromSeconds(10));
using PipePair pipePair = await PipePair.CreateAsync(cancellation.Token);
FakeRuntimeSession runtime = new();
runtime.SetSnapshot(new WorkerRuntimeHeartbeatSnapshot(
DateTimeOffset.UtcNow - TimeSpan.FromSeconds(5),
pendingCommandCount: 0,
outboundEventQueueDepth: 0,
lastEventSequence: 0,
currentCommandCorrelationId: string.Empty));
WorkerPipeSession session = CreatePipeSession(
pipePair.WorkerStream,
runtime,
new WorkerPipeSessionOptions
{
HeartbeatInterval = TimeSpan.FromMilliseconds(20),
HeartbeatGrace = TimeSpan.FromMilliseconds(50),
});
Task runTask = session.RunAsync(cancellation.Token);
await CompleteGatewayHandshakeAsync(pipePair, cancellation.Token);
WorkerEnvelope fault = await ReadUntilAsync(
pipePair.GatewayReader,
WorkerEnvelope.BodyOneofCase.WorkerFault,
cancellation.Token);
Assert.Equal(WorkerFaultCategory.StaHung, fault.WorkerFault.Category);
// The next heartbeat after the fault must agree with it.
WorkerEnvelope heartbeat = await ReadUntilAsync(
pipePair.GatewayReader,
WorkerEnvelope.BodyOneofCase.WorkerHeartbeat,
cancellation.Token);
Assert.Equal(WorkerState.Faulted, heartbeat.WorkerHeartbeat.State);
await SendShutdownAndWaitAsync(pipePair, runTask, cancellation.Token);
}
/// <summary>
/// Worker-006 regression: when graceful shutdown times out, RunAsync
/// must still dispose the runtime session in its finally block.
/// Skipping disposal on the timed-out path leaked the STA thread and
/// the MXAccess COM object.
/// </summary>
[Fact]
public async Task RunAsync_WhenShutdownTimesOut_StillDisposesRuntimeSession()
{
using CancellationTokenSource cancellation = new(TimeSpan.FromSeconds(10));
using PipePair pipePair = await PipePair.CreateAsync(cancellation.Token);
FakeRuntimeSession runtime = new()
{
ThrowTimeoutOnShutdown = true,
};
WorkerPipeSession session = CreatePipeSession(
pipePair.WorkerStream,
runtime,
new WorkerPipeSessionOptions
{
HeartbeatInterval = TimeSpan.FromSeconds(1),
HeartbeatGrace = TimeSpan.FromSeconds(30),
});
Task runTask = session.RunAsync(cancellation.Token);
await CompleteGatewayHandshakeAsync(pipePair, cancellation.Token);
await pipePair.GatewayWriter
.WriteAsync(CreateShutdownEnvelope(), cancellation.Token);
// Drain the gateway-side pipe (heartbeats + the shutdown-timeout
// fault) so the worker's writes never block on a full pipe buffer.
Task drainTask = DrainReaderUntilFaultedAsync(pipePair.GatewayReader, cancellation.Token);
// RunAsync must rethrow the TimeoutException and still reach its
// finally block, which disposes the runtime session.
await Assert.ThrowsAsync<TimeoutException>(async () => await runTask);
Assert.True(
runtime.Disposed,
"RunAsync must dispose the runtime session even when shutdown times out.");
await drainTask;
}
private static async Task DrainReaderUntilFaultedAsync(
WorkerFrameReader reader,
CancellationToken cancellationToken)
{
try
{
while (!cancellationToken.IsCancellationRequested)
{
WorkerEnvelope envelope = await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
if (envelope.BodyCase == WorkerEnvelope.BodyOneofCase.WorkerFault
&& envelope.WorkerFault.Category == WorkerFaultCategory.ShutdownTimeout)
{
return;
}
}
}
catch (Exception exception) when (
exception is OperationCanceledException
|| exception is IOException
|| exception is WorkerFrameProtocolException)
{
// The worker pipe closed once RunAsync completed — expected.
}
}
/// <summary>Verifies that shutdown drops late replies and sends shutdown ack.</summary>
[Fact]
public async Task RunAsync_WhenShutdownArrivesDuringCommand_DropsLateReplyAndWritesShutdownAck()
@@ -813,6 +928,12 @@ public sealed class WorkerPipeSessionTests
/// <summary>Gets or sets whether to throw an exception after dispatch is released.</summary>
public bool ThrowAfterDispatchReleased { get; set; }
/// <summary>Gets or sets whether ShutdownGracefullyAsync throws a TimeoutException.</summary>
public bool ThrowTimeoutOnShutdown { get; set; }
/// <summary>Gets a value indicating whether Dispose was called.</summary>
public bool Disposed { get; private set; }
/// <summary>Starts the worker session with the given session ID and process ID.</summary>
/// <param name="sessionId">The session identifier.</param>
/// <param name="workerProcessId">The worker process ID.</param>
@@ -939,6 +1060,12 @@ public sealed class WorkerPipeSessionTests
CancellationToken cancellationToken = default)
{
releaseDispatch.Set();
if (ThrowTimeoutOnShutdown)
{
return Task.FromException<MxAccessShutdownResult>(
new TimeoutException("Simulated graceful shutdown timeout."));
}
return Task.FromResult(new MxAccessShutdownResult(Array.Empty<MxAccessShutdownFailure>()));
}
@@ -971,6 +1098,7 @@ public sealed class WorkerPipeSessionTests
/// <summary>Disposes resources.</summary>
public void Dispose()
{
Disposed = true;
releaseDispatch.Set();
releaseDispatch.Dispose();
DispatchStarted.Dispose();
@@ -0,0 +1,138 @@
using System;
using System.Collections.Generic;
using MxGateway.Worker.MxAccess;
namespace MxGateway.Worker.Tests.MxAccess;
/// <summary>
/// Worker-007 regression tests for <see cref="MxAccessComServer"/>. The
/// adapter no longer falls back to late-bound <c>Type.InvokeMember</c>
/// reflection: a COM object must implement either the typed
/// <c>ILMXProxyServer</c> COM interface family (production) or
/// <see cref="IMxAccessServer"/> directly (test fakes).
/// </summary>
public sealed class MxAccessComServerTests
{
/// <summary>
/// A COM object implementing <see cref="IMxAccessServer"/> is routed
/// through the typed interface — no reflection — preserving arguments
/// and return values.
/// </summary>
[Fact]
public void Methods_WithTypedServer_RouteThroughTypedInterface()
{
RecordingMxAccessServer typed = new(registerHandle: 77);
MxAccessComServer adapter = new(typed);
int serverHandle = adapter.Register("client-a");
adapter.Advise(serverHandle, itemHandle: 9);
adapter.Unregister(serverHandle);
Assert.Equal(77, serverHandle);
Assert.Equal("client-a", typed.RegisteredClientName);
Assert.Equal(new[] { "Register:client-a", "Advise:77:9", "Unregister:77" }, typed.Calls);
}
/// <summary>
/// A COM object that implements neither the typed COM interface family
/// nor <see cref="IMxAccessServer"/> fails fast with a clear
/// <see cref="InvalidOperationException"/> instead of a late-bound
/// reflection call.
/// </summary>
[Fact]
public void Methods_WithUntypedObject_ThrowInvalidOperation()
{
MxAccessComServer adapter = new(new object());
InvalidOperationException exception =
Assert.Throws<InvalidOperationException>(() => adapter.Register("client"));
Assert.Contains("does not implement", exception.Message, StringComparison.Ordinal);
Assert.Contains(nameof(IMxAccessServer), exception.Message, StringComparison.Ordinal);
}
/// <summary>
/// Exceptions thrown by the typed server propagate unchanged — no
/// <c>TargetInvocationException</c> wrapping (reflection is gone).
/// </summary>
[Fact]
public void Methods_WhenTypedServerThrows_PropagateOriginalException()
{
RecordingMxAccessServer typed = new(registerHandle: 1)
{
ThrowOnRegister = new InvalidOperationException("register failed"),
};
MxAccessComServer adapter = new(typed);
InvalidOperationException exception =
Assert.Throws<InvalidOperationException>(() => adapter.Register("client"));
Assert.Equal("register failed", exception.Message);
}
private sealed class RecordingMxAccessServer : IMxAccessServer
{
private readonly int registerHandle;
private readonly List<string> calls = new();
public RecordingMxAccessServer(int registerHandle)
{
this.registerHandle = registerHandle;
}
public string? RegisteredClientName { get; private set; }
public Exception? ThrowOnRegister { get; set; }
public IReadOnlyList<string> Calls => calls.ToArray();
public int Register(string clientName)
{
calls.Add($"Register:{clientName}");
RegisteredClientName = clientName;
if (ThrowOnRegister is not null)
{
throw ThrowOnRegister;
}
return registerHandle;
}
public void Unregister(int serverHandle)
{
calls.Add($"Unregister:{serverHandle}");
}
public int AddItem(int serverHandle, string itemDefinition)
{
calls.Add($"AddItem:{serverHandle}:{itemDefinition}");
return 0;
}
public int AddItem2(int serverHandle, string itemDefinition, string itemContext)
{
calls.Add($"AddItem2:{serverHandle}:{itemDefinition}:{itemContext}");
return 0;
}
public void RemoveItem(int serverHandle, int itemHandle)
{
calls.Add($"RemoveItem:{serverHandle}:{itemHandle}");
}
public void Advise(int serverHandle, int itemHandle)
{
calls.Add($"Advise:{serverHandle}:{itemHandle}");
}
public void UnAdvise(int serverHandle, int itemHandle)
{
calls.Add($"UnAdvise:{serverHandle}:{itemHandle}");
}
public void AdviseSupervisory(int serverHandle, int itemHandle)
{
calls.Add($"AdviseSupervisory:{serverHandle}:{itemHandle}");
}
}
}
@@ -816,7 +816,7 @@ public sealed class MxAccessCommandExecutorTests
TimeSpan.FromMilliseconds(25));
}
private sealed class FakeMxAccessComObject
private sealed class FakeMxAccessComObject : IMxAccessServer
{
private readonly int registerHandle;
private readonly int addItemHandle;
@@ -328,6 +328,77 @@ public sealed class MxAccessStaSessionTests
Assert.Equal(pollCountAtDispose, handler.PollCount);
}
/// <summary>
/// Worker-005 regression: when the alarm poll loop's PollOnce throws a
/// real failure (e.g. a COMException from GetXmlCurrentAlarms2), the
/// failure must be recorded as a fault on the event queue so a broken
/// alarm subscription becomes observable on the IPC fault path instead
/// of silently faulting the never-awaited poll task.
/// </summary>
[Fact]
public async Task RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue()
{
FakeAlarmCommandHandler handler = new()
{
PollException = new System.Runtime.InteropServices.COMException(
"GetXmlCurrentAlarms2 failed.", unchecked((int)0x80004005)),
};
FakeMxAccessComObjectFactory factory = new();
FakeMxAccessEventSink eventSink = new();
using StaRuntime runtime = CreateRuntime();
MxAccessEventQueue eventQueue = new();
using MxAccessStaSession session = new(
runtime,
factory,
eventSink,
eventQueue,
_eq => handler);
await session.StartAsync("session-1", workerProcessId: 1);
// Wait up to 5s for the poll loop to fault the queue.
using CancellationTokenSource timeout = new CancellationTokenSource(TimeSpan.FromSeconds(5));
while (!eventQueue.IsFaulted && !timeout.IsCancellationRequested)
{
await Task.Delay(50, CancellationToken.None);
}
Assert.True(eventQueue.IsFaulted, "Expected the alarm poll failure to fault the event queue.");
WorkerFault? fault = session.DrainFault();
Assert.NotNull(fault);
Assert.Equal(WorkerFaultCategory.MxaccessEventConversionFailed, fault!.Category);
Assert.Contains("alarm poll failed", fault.DiagnosticMessage, StringComparison.OrdinalIgnoreCase);
Assert.Equal(typeof(System.Runtime.InteropServices.COMException).FullName, fault.ExceptionType);
}
/// <summary>
/// Worker-008 regression: the STA-affinity guard throws when an
/// IMxAccessAlarmConsumer call is attempted off the thread that created
/// the consumer, mirroring the MxAccessSession.CreationThreadId invariant.
/// </summary>
[Fact]
public void AssertOnAlarmConsumerThread_WhenOffOwningThread_Throws()
{
const int owningThread = 7;
const int otherThread = 99;
InvalidOperationException exception = Assert.Throws<InvalidOperationException>(
() => MxAccessStaSession.AssertOnAlarmConsumerThread(owningThread, otherThread));
Assert.Contains("off its owning STA thread", exception.Message, StringComparison.Ordinal);
}
/// <summary>
/// Worker-008: the STA-affinity guard is a no-op on the owning thread and
/// when no alarm consumer is configured (expected thread id null).
/// </summary>
[Fact]
public void AssertOnAlarmConsumerThread_OnOwningThreadOrUnset_DoesNotThrow()
{
MxAccessStaSession.AssertOnAlarmConsumerThread(expectedThreadId: 42, actualThreadId: 42);
MxAccessStaSession.AssertOnAlarmConsumerThread(expectedThreadId: null, actualThreadId: 123);
}
/// <summary>
/// Noop STA COM apartment initializer for testing.
/// </summary>
@@ -360,6 +431,9 @@ public sealed class MxAccessStaSessionTests
public bool IsSubscribed { get; private set; }
public string? LastSubscription { get; private set; }
/// <summary>Exception thrown by PollOnce; null to succeed.</summary>
public Exception? PollException { get; set; }
public int PollCount
{
get { lock (gate) return pollCount; }
@@ -400,6 +474,11 @@ public sealed class MxAccessStaSessionTests
pollCount++;
lastPollThreadId = Thread.CurrentThread.ManagedThreadId;
}
if (PollException is not null)
{
throw PollException;
}
}
public void Dispose() { }
+14 -7
View File
@@ -36,7 +36,6 @@ public sealed class WorkerPipeSession
private volatile WorkerState _state = WorkerState.Starting;
private bool _acceptingCommands = true;
private bool _watchdogFaultSent;
private bool _shutdownTimedOut;
/// <summary>Initializes a new worker pipe session over the provided stream.</summary>
/// <param name="stream">Network stream for reading and writing frames.</param>
@@ -119,11 +118,14 @@ public sealed class WorkerPipeSession
}
finally
{
if (!_shutdownTimedOut)
{
_runtimeSession?.Dispose();
}
// Always dispose the runtime session, including after a
// shutdown timeout. MxAccessStaSession.Dispose is idempotent and
// bounded (each STA join is capped at 2s), so re-entering it on
// the normal path is a harmless no-op, while on the timed-out
// path it is the only thing that reclaims the STA thread and
// releases the MXAccess COM object — skipping it leaked both and
// left cleanup to rely solely on process exit.
_runtimeSession?.Dispose();
_runtimeSession = null;
_state = WorkerState.Stopped;
}
@@ -480,7 +482,6 @@ public sealed class WorkerPipeSession
}
catch (TimeoutException exception)
{
_shutdownTimedOut = true;
_state = WorkerState.Faulted;
await TryWriteFaultAsync(CreateShutdownTimeoutFault(exception), cancellationToken).ConfigureAwait(false);
throw;
@@ -615,6 +616,12 @@ public sealed class WorkerPipeSession
}
_watchdogFaultSent = true;
// The STA is hung — move the session to Faulted before the next
// heartbeat so the heartbeat's reported State stays consistent with
// the StaHung fault just sent. Without this the heartbeat loop keeps
// advertising a non-faulted state that contradicts the fault.
_state = WorkerState.Faulted;
await TryWriteFaultAsync(
CreateFault(
WorkerFaultCategory.StaHung,
@@ -1,13 +1,22 @@
using System;
using System.Reflection;
using System.Runtime.ExceptionServices;
using ArchestrA.MxAccess;
namespace MxGateway.Worker.MxAccess;
/// <summary>
/// Adapter exposing MXAccess COM object methods through the IMxAccessServer interface.
/// Adapter exposing MXAccess COM object methods through the <see cref="IMxAccessServer"/>
/// interface.
/// </summary>
/// <remarks>
/// The supplied object must implement the typed MXAccess COM interface contract.
/// In production it is the <c>LMXProxyServerClass</c> RCW, which implements
/// <see cref="ILMXProxyServer"/> / <see cref="ILMXProxyServer3"/> /
/// <see cref="ILMXProxyServer4"/>. Tests substitute a typed fake that
/// implements <see cref="IMxAccessServer"/> directly. The earlier late-bound
/// <c>Type.InvokeMember</c> reflection fallback was removed: it bypassed the
/// typed interface contract, boxed value-type handles on every call, and only
/// ever served test doubles — a typed fake is the supported test seam now.
/// </remarks>
public sealed class MxAccessComServer : IMxAccessServer
{
private readonly object mxAccessComObject;
@@ -15,7 +24,11 @@ public sealed class MxAccessComServer : IMxAccessServer
/// <summary>
/// Initializes the adapter with the MXAccess COM object.
/// </summary>
/// <param name="mxAccessComObject">MXAccess COM object instance.</param>
/// <param name="mxAccessComObject">
/// MXAccess COM object instance. Must implement either the typed
/// <see cref="ILMXProxyServer"/> COM interface family (production) or
/// <see cref="IMxAccessServer"/> directly (test fakes).
/// </param>
public MxAccessComServer(object mxAccessComObject)
{
this.mxAccessComObject = mxAccessComObject ?? throw new ArgumentNullException(nameof(mxAccessComObject));
@@ -24,24 +37,24 @@ public sealed class MxAccessComServer : IMxAccessServer
/// <inheritdoc />
public int Register(string clientName)
{
if (mxAccessComObject is ILMXProxyServer mxAccessServer)
if (mxAccessComObject is IMxAccessServer typedFake)
{
return mxAccessServer.Register(clientName);
return typedFake.Register(clientName);
}
return (int)Invoke(nameof(Register), clientName);
return AsProxyServer().Register(clientName);
}
/// <inheritdoc />
public void Unregister(int serverHandle)
{
if (mxAccessComObject is ILMXProxyServer mxAccessServer)
if (mxAccessComObject is IMxAccessServer typedFake)
{
mxAccessServer.Unregister(serverHandle);
typedFake.Unregister(serverHandle);
return;
}
Invoke(nameof(Unregister), serverHandle);
AsProxyServer().Unregister(serverHandle);
}
/// <inheritdoc />
@@ -49,12 +62,12 @@ public sealed class MxAccessComServer : IMxAccessServer
int serverHandle,
string itemDefinition)
{
if (mxAccessComObject is ILMXProxyServer mxAccessServer)
if (mxAccessComObject is IMxAccessServer typedFake)
{
return mxAccessServer.AddItem(serverHandle, itemDefinition);
return typedFake.AddItem(serverHandle, itemDefinition);
}
return (int)Invoke(nameof(AddItem), serverHandle, itemDefinition);
return AsProxyServer().AddItem(serverHandle, itemDefinition);
}
/// <inheritdoc />
@@ -63,12 +76,12 @@ public sealed class MxAccessComServer : IMxAccessServer
string itemDefinition,
string itemContext)
{
if (mxAccessComObject is ILMXProxyServer3 mxAccessServer)
if (mxAccessComObject is IMxAccessServer typedFake)
{
return mxAccessServer.AddItem2(serverHandle, itemDefinition, itemContext);
return typedFake.AddItem2(serverHandle, itemDefinition, itemContext);
}
return (int)Invoke(nameof(AddItem2), serverHandle, itemDefinition, itemContext);
return AsProxyServer3().AddItem2(serverHandle, itemDefinition, itemContext);
}
/// <inheritdoc />
@@ -76,13 +89,13 @@ public sealed class MxAccessComServer : IMxAccessServer
int serverHandle,
int itemHandle)
{
if (mxAccessComObject is ILMXProxyServer mxAccessServer)
if (mxAccessComObject is IMxAccessServer typedFake)
{
mxAccessServer.RemoveItem(serverHandle, itemHandle);
typedFake.RemoveItem(serverHandle, itemHandle);
return;
}
Invoke(nameof(RemoveItem), serverHandle, itemHandle);
AsProxyServer().RemoveItem(serverHandle, itemHandle);
}
/// <inheritdoc />
@@ -90,13 +103,13 @@ public sealed class MxAccessComServer : IMxAccessServer
int serverHandle,
int itemHandle)
{
if (mxAccessComObject is ILMXProxyServer mxAccessServer)
if (mxAccessComObject is IMxAccessServer typedFake)
{
mxAccessServer.Advise(serverHandle, itemHandle);
typedFake.Advise(serverHandle, itemHandle);
return;
}
Invoke(nameof(Advise), serverHandle, itemHandle);
AsProxyServer().Advise(serverHandle, itemHandle);
}
/// <inheritdoc />
@@ -104,13 +117,13 @@ public sealed class MxAccessComServer : IMxAccessServer
int serverHandle,
int itemHandle)
{
if (mxAccessComObject is ILMXProxyServer mxAccessServer)
if (mxAccessComObject is IMxAccessServer typedFake)
{
mxAccessServer.UnAdvise(serverHandle, itemHandle);
typedFake.UnAdvise(serverHandle, itemHandle);
return;
}
Invoke(nameof(UnAdvise), serverHandle, itemHandle);
AsProxyServer().UnAdvise(serverHandle, itemHandle);
}
/// <inheritdoc />
@@ -118,34 +131,36 @@ public sealed class MxAccessComServer : IMxAccessServer
int serverHandle,
int itemHandle)
{
if (mxAccessComObject is ILMXProxyServer4 mxAccessServer)
if (mxAccessComObject is IMxAccessServer typedFake)
{
mxAccessServer.AdviseSupervisory(serverHandle, itemHandle);
typedFake.AdviseSupervisory(serverHandle, itemHandle);
return;
}
Invoke(nameof(AdviseSupervisory), serverHandle, itemHandle);
AsProxyServer4().AdviseSupervisory(serverHandle, itemHandle);
}
private object Invoke(
string methodName,
params object[] arguments)
private ILMXProxyServer AsProxyServer()
{
try
{
return mxAccessComObject
.GetType()
.InvokeMember(
methodName,
BindingFlags.Instance | BindingFlags.Public | BindingFlags.InvokeMethod,
binder: null,
target: mxAccessComObject,
args: arguments);
}
catch (TargetInvocationException exception) when (exception.InnerException is not null)
{
ExceptionDispatchInfo.Capture(exception.InnerException).Throw();
throw;
}
return mxAccessComObject as ILMXProxyServer
?? throw new InvalidOperationException(
$"MXAccess COM object of type '{mxAccessComObject.GetType().FullName}' does not implement "
+ $"{nameof(ILMXProxyServer)} or {nameof(IMxAccessServer)}.");
}
private ILMXProxyServer3 AsProxyServer3()
{
return mxAccessComObject as ILMXProxyServer3
?? throw new InvalidOperationException(
$"MXAccess COM object of type '{mxAccessComObject.GetType().FullName}' does not implement "
+ $"{nameof(ILMXProxyServer3)} or {nameof(IMxAccessServer)}.");
}
private ILMXProxyServer4 AsProxyServer4()
{
return mxAccessComObject as ILMXProxyServer4
?? throw new InvalidOperationException(
$"MXAccess COM object of type '{mxAccessComObject.GetType().FullName}' does not implement "
+ $"{nameof(ILMXProxyServer4)} or {nameof(IMxAccessServer)}.");
}
}
@@ -23,6 +23,7 @@ public sealed class MxAccessStaSession : IWorkerRuntimeSession
private IAlarmCommandHandler? alarmCommandHandler;
private CancellationTokenSource? alarmPollCts;
private Task? alarmPollTask;
private int? alarmConsumerThreadId;
private bool disposed;
/// <summary>
@@ -180,6 +181,14 @@ public sealed class MxAccessStaSession : IWorkerRuntimeSession
session = MxAccessSession.Create(factory, eventSink, sessionId);
if (alarmCommandHandlerFactory is not null)
{
// STA-affinity invariant: the alarm consumer factory and
// every IMxAccessAlarmConsumer call must run on the STA
// thread, because the production wnwrap consumer holds an
// Apartment-threaded COM object. The factory runs here
// inside staRuntime.InvokeAsync, so this records the STA
// thread id; RunAlarmPollLoopAsync then asserts each
// PollOnce executes on the same thread.
alarmConsumerThreadId = Environment.CurrentManagedThreadId;
alarmCommandHandler = alarmCommandHandlerFactory(eventQueue);
}
commandDispatcher = new StaCommandDispatcher(
@@ -227,7 +236,11 @@ public sealed class MxAccessStaSession : IWorkerRuntimeSession
try
{
await staRuntime.InvokeAsync(
() => handler.PollOnce(),
() =>
{
EnsureOnAlarmConsumerThread();
handler.PollOnce();
},
cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException)
@@ -244,10 +257,77 @@ public sealed class MxAccessStaSession : IWorkerRuntimeSession
// STA runtime shutting down — stop the loop gracefully.
return;
}
catch (Exception exception)
{
// A real alarm-poll failure (COMException from
// GetXmlCurrentAlarms2, malformed-XML parse failure, etc.).
// Record it as a fault on the event queue so a broken
// alarm subscription becomes observable on the IPC fault
// path instead of silently faulting this never-awaited
// task. The loop then stops — the subscription is dead.
eventQueue.RecordFault(CreateAlarmPollFault(exception));
return;
}
}
}, CancellationToken.None);
}
private void EnsureOnAlarmConsumerThread()
{
AssertOnAlarmConsumerThread(alarmConsumerThreadId, Environment.CurrentManagedThreadId);
}
/// <summary>
/// Enforces the STA-affinity invariant for the alarm consumer: every
/// <see cref="IMxAccessAlarmConsumer"/> call (and the consumer factory)
/// must run on the same thread the consumer was created on (the worker's
/// STA). Throws <see cref="InvalidOperationException"/> when a caller
/// breaks affinity — a programming error that would otherwise risk a
/// cross-apartment COM deadlock in the production wnwrap consumer, since
/// its CLSID is registered <c>ThreadingModel=Apartment</c>. The check is
/// a no-op until the consumer thread has been recorded (no alarm handler
/// configured, or session not yet started).
/// </summary>
/// <param name="expectedThreadId">
/// The managed thread id the alarm consumer was created on, or
/// <c>null</c> if no alarm consumer is configured.
/// </param>
/// <param name="actualThreadId">The current managed thread id.</param>
internal static void AssertOnAlarmConsumerThread(int? expectedThreadId, int actualThreadId)
{
if (expectedThreadId is not null && actualThreadId != expectedThreadId.Value)
{
throw new InvalidOperationException(
$"Alarm consumer accessed off its owning STA thread. Expected thread {expectedThreadId.Value}, "
+ $"actual {actualThreadId}. All IMxAccessAlarmConsumer calls must run on the STA that "
+ "created the consumer.");
}
}
private static WorkerFault CreateAlarmPollFault(Exception exception)
{
string message =
$"MXAccess alarm poll failed: {exception.Message}";
WorkerFault fault = new()
{
Category = WorkerFaultCategory.MxaccessEventConversionFailed,
ExceptionType = exception.GetType().FullName ?? string.Empty,
DiagnosticMessage = message,
ProtocolStatus = new ProtocolStatus
{
Code = ProtocolStatusCode.WorkerUnavailable,
Message = message,
},
};
if (exception is System.Runtime.InteropServices.COMException comException)
{
fault.Hresult = comException.HResult;
}
return fault;
}
/// <summary>
/// Dispatches a command to the STA thread for execution asynchronously.
/// </summary>