Alarm pipeline wiring + full code-review backlog resolution #119

Open
dohertj2 wants to merge 51 commits from docs/alarm-client-wm-app-finding into main
233 changed files with 34281 additions and 2467 deletions
+2 -2
@@ -32,7 +32,7 @@ dotnet test src/MxGateway.Worker.Tests/MxGateway.Worker.Tests.csproj -p:Platform
dotnet run --project src/MxGateway.Server/MxGateway.Server.csproj
# API-key admin CLI (same exe, "apikey" subcommand)
dotnet run --project src/MxGateway.Server/MxGateway.Server.csproj -- apikey create --display-name "dev" --scopes session,invoke,event,metadata,admin
dotnet run --project src/MxGateway.Server/MxGateway.Server.csproj -- apikey create --display-name "dev" --scopes session:open,session:close,invoke:read,invoke:write,invoke:secure,events:read,metadata:read,admin
```
Single test by name (xUnit `--filter`):
@@ -114,7 +114,7 @@ External analysis sources referenced by design docs:
## Authentication
Gateway gRPC clients authenticate with an API key in metadata: `authorization: Bearer mxgw_<key-id>_<secret>`. Keys are stored hashed (with a peppered SHA) in a gateway-owned SQLite DB (default `C:\ProgramData\MxGateway\gateway-auth.db`). Scopes (`session`, `invoke`, `event`, `metadata`, `admin`) gate specific RPCs; missing → `Unauthenticated`, insufficient → `PermissionDenied`. The `apikey` subcommand on the server exe manages keys; see `src/MxGateway.Server/Security/Authentication/`.
Gateway gRPC clients authenticate with an API key in metadata: `authorization: Bearer mxgw_<key-id>_<secret>`. Keys are stored hashed (with a peppered SHA) in a gateway-owned SQLite DB (default `C:\ProgramData\MxGateway\gateway-auth.db`). Scopes (`session:open`, `session:close`, `invoke:read`, `invoke:write`, `invoke:secure`, `events:read`, `metadata:read`, `admin`) gate specific RPCs; missing → `Unauthenticated`, insufficient → `PermissionDenied`. The `apikey` subcommand on the server exe manages keys; see `src/MxGateway.Server/Security/Authentication/`.
Dashboard auth uses the same verifier but exchanges the API key for an HTTP-only secure cookie at `/dashboard/login`. `Dashboard:AllowAnonymousLocalhost` bypasses cookie auth on loopback when explicitly enabled.
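As a rough illustration of the gating rule described above, the sketch below maps a request to one of the two failure statuses. It reads "missing" as a missing or malformed bearer key, matches scope names literally, and skips the hash lookup entirely. The function names and the returned strings are hypothetical and are not taken from the gateway code.

```python
from typing import Iterable, Optional

KEY_PREFIX = "mxgw_"  # prefix taken from the documented key shape mxgw_<key-id>_<secret>


def extract_api_key(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'authorization: Bearer mxgw_...' value, else None."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    # Assumption: the scheme is matched case-sensitively; the real verifier may differ.
    if scheme != "Bearer" or not token.startswith(KEY_PREFIX):
        return None
    return token


def authorize(authorization: Optional[str],
              granted_scopes: Iterable[str],
              required_scope: str) -> str:
    """Hypothetical gate: no usable key -> Unauthenticated, scope absent -> PermissionDenied."""
    if extract_api_key(authorization) is None:
        return "Unauthenticated"
    # Assumption: scopes are compared literally; whether `admin` implies others is not modelled.
    if required_scope not in set(granted_scopes):
        return "PermissionDenied"
    return "OK"
```

For example, a key holding only `invoke:read` would be refused with `PermissionDenied` when the RPC requires `invoke:write`.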
+140
@@ -0,0 +1,140 @@
# Code Review Process
This document describes how to perform a comprehensive, per-module code review of
the `mxaccessgw` codebase and how to track findings to resolution.
A **module** is one buildable project under `src/` (e.g. `src/MxGateway.Worker`)
or one language client under `clients/` (e.g. `clients/rust`). Each module has
its own folder under `code-reviews/` containing a single `findings.md`.
## 1. Before you start
1. Pick the module to review. Its folder is `code-reviews/<Module>/`:
- For a `src/` project, `<Module>` is the project name with the `MxGateway.`
prefix stripped — `src/MxGateway.Server` is reviewed in `code-reviews/Server/`.
- For a language client, `<Module>` is `Client.<Lang>`, so `clients/rust` is
reviewed in `code-reviews/Client.Rust/`.
2. Identify the design context for the module:
- `gateway.md` — top-level architecture, command/event surface, IPC envelope,
STA thread model, fault handling.
- The relevant component design docs under `docs/` (e.g.
`docs/MxAccessWorkerInstanceDesign.md`, `docs/GatewayProcessDesign.md`,
`docs/Sessions.md`, `docs/Authentication.md`, `docs/GalaxyRepository.md`).
- `docs/DesignDecisions.md` for the v1 design choices.
- The **Repository-Specific Conventions** and **Process / Platform Notes** in
`CLAUDE.md`.
3. Record the exact commit being reviewed: `git rev-parse --short HEAD`. Every
review is a snapshot — a finding only means something relative to a known
commit.
4. Open `code-reviews/<Module>/findings.md` and fill in the header table
(reviewer, date, commit SHA, status).
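The folder-naming rule in step 1 can be sketched as a small helper. This is only an illustration: `review_folder` is a hypothetical name, and capitalising the client directory (`rust` to `Rust`) is an assumption generalised from the single example given above.

```python
from pathlib import PurePosixPath


def review_folder(module_path: str) -> str:
    """Map a module path such as 'src/MxGateway.Server' to its code-reviews folder."""
    parts = PurePosixPath(module_path).parts
    if len(parts) != 2:
        raise ValueError(f"expected '<root>/<name>', got {module_path!r}")
    root, name = parts
    if root == "src":
        # Strip the 'MxGateway.' prefix from the project name.
        module = name.removeprefix("MxGateway.")
    elif root == "clients":
        # Assumption: <Lang> is the client directory name with its first letter capitalised.
        module = "Client." + name.capitalize()
    else:
        raise ValueError(f"not a reviewable module root: {root!r}")
    return f"code-reviews/{module}/"
```

`str.removeprefix` needs Python 3.9 or later.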
## 2. Review checklist
Work through **every** category below for the module. A comprehensive review
means the checklist is completed even where it produces no findings — record
"No issues found" for a category rather than leaving it ambiguous.
1. **Correctness & logic bugs** — off-by-one, null handling, incorrect
conditionals, misuse of APIs, broken edge cases.
2. **mxaccessgw conventions** — the rules in `CLAUDE.md` and the style guides
under `docs/style-guides/`: the gateway never instantiates MXAccess COM
directly; all MXAccess COM calls run on the worker's dedicated STA thread and
the STA loop pumps Windows messages; IPC uses one bidirectional named pipe per
worker carrying length-prefixed `WorkerEnvelope` protobuf frames; MXAccess
parity is the contract (don't "fix" surprising MXAccess behaviour, never
synthesize events); one worker and one event subscriber per session; the
gateway terminates orphan workers on startup and does not reattach; C# style
(file-scoped namespaces, `sealed` by default, `Async` suffix, MXAccess-aligned
names); no Blazor UI component libraries; no logging of secrets or full tag
values; generated code is never hand-edited.
3. **Concurrency & thread safety** — shared mutable state, STA affinity, race
conditions, correct use of `async`/`await`, locking, disposal races.
4. **Error handling & resilience** — exception paths, worker crash / reconnect
handling, fail-fast event backpressure, transient vs permanent error
classification, graceful degradation, correct gRPC status codes.
5. **Security** — authentication/authorization checks, API-key scope enforcement,
input validation, SQL injection in the Galaxy Repository RPCs, secret
handling, the dashboard anonymous-localhost bypass, logging of sensitive data.
6. **Performance & resource management**: `IDisposable` disposal, pipe / stream
/ COM lifetimes, buffering and back-pressure, unnecessary allocations on hot
paths, N+1 queries.
7. **Design-document adherence** — does the code match `gateway.md`, the relevant
`docs/` component designs, `docs/DesignDecisions.md`, and `CLAUDE.md`? Flag
both code that drifts from the design and design docs that are now stale.
8. **Code organization & conventions** — namespace hierarchy, project layout, the
Options pattern, separation of concerns, additive-only contract evolution.
9. **Testing coverage** — are the module's behaviours covered by tests
(`src/MxGateway.Tests`, `src/MxGateway.Worker.Tests`,
`src/MxGateway.IntegrationTests`)? Note untested critical paths and missing
edge-case tests.
10. **Documentation & comments** — XML doc accuracy, misleading or stale comments,
undocumented non-obvious behaviour.
## 3. Recording findings
Add one entry per finding to the `## Findings` section of the module's
`findings.md`, using the entry format in
[`_template/findings.md`](code-reviews/_template/findings.md).
- **Finding ID** — `<Module>-NNN`, numbered sequentially within the module and
never reused (e.g. `Worker-001`). IDs are permanent even after resolution.
- **Severity:**
- **Critical** — data loss, security breach, crash/deadlock, or outage.
- **High** — incorrect behaviour with significant impact; no safe workaround.
- **Medium** — incorrect or risky behaviour with limited impact or a workaround.
- **Low** — minor issues, style, maintainability, documentation.
- **Category** — one of the 10 checklist categories above.
- **Location** — `file:line` (clickable), or a list of locations.
- **Description** — what is wrong and why it matters.
- **Recommendation** — concrete suggested fix.
After recording findings, update the module header table (status, open-finding
count) and regenerate the base README (step 5).
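One way to honour the "sequential and never reused" rule for Finding IDs is to number from the highest ID ever issued rather than from the first gap. The helper below is a sketch under that reading. `next_finding_id` and the three-digit zero padding are assumptions based on the `Worker-001` example, not part of the repository tooling.

```python
import re

# <Module>-NNN, where <Module> may contain dots (e.g. Client.Rust).
FINDING_ID = re.compile(r"^(?P<module>[A-Za-z][A-Za-z.]*)-(?P<number>\d{3,})$")


def next_finding_id(module: str, existing_ids: list[str]) -> str:
    """Return the next ID for `module`, continuing after the highest number already used."""
    highest = 0
    for finding_id in existing_ids:
        match = FINDING_ID.match(finding_id)
        if match and match.group("module") == module:
            highest = max(highest, int(match.group("number")))
    # Using the maximum (not the first free slot) keeps retired numbers retired.
    return f"{module}-{highest + 1:03d}"
```

Pass every ID that has ever appeared in the module's `findings.md`, including closed findings, so a resolved `Worker-002` is not handed out again.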
## 4. Marking an item resolved
Findings are **never deleted** — they are an audit trail. To close one, change
its **Status** and complete the **Resolution** field:
- `Open` — newly recorded, not yet addressed.
- `In Progress` — a fix is actively being worked on.
- `Resolved` — fixed. The Resolution field must state the fixing commit SHA, the
date, and a one-line description of the fix.
- `Won't Fix` — intentionally not fixed. The Resolution field must justify why.
- `Deferred` — valid but postponed. The Resolution field must say what it is
waiting on (e.g. a tracked issue or a later milestone).
`Resolved`, `Won't Fix`, and `Deferred` findings are all considered **closed**.
`Open` and `In Progress` are **pending** and appear in the base README's Pending
Findings table.
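The status vocabulary above splits cleanly into two groups, which can be sketched as follows. The function names are hypothetical. The status strings and the closed/pending grouping come from the list above.

```python
PENDING_STATUSES = {"Open", "In Progress"}
CLOSED_STATUSES = {"Resolved", "Won't Fix", "Deferred"}


def classify_status(status: str) -> str:
    """Return 'pending' or 'closed' for a recognised Status, else raise ValueError."""
    if status in PENDING_STATUSES:
        return "pending"
    if status in CLOSED_STATUSES:
        return "closed"
    raise ValueError(f"unrecognised Status value: {status!r}")


def resolution_required(status: str) -> bool:
    """Closed statuses must carry a completed Resolution field."""
    return classify_status(status) == "closed"
```

A finding whose status classifies as "pending" is one that belongs in the base README's Pending Findings table.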
## 5. Updating the base README
`code-reviews/README.md` holds the single cross-module view (the Module Status
table and the Pending / Closed Findings tables). It is **generated** from the
per-module `findings.md` files — do not edit it by hand.
After any review or status change, regenerate it:
```
python code-reviews/regen-readme.py
```
`regen-readme.py --check` exits non-zero if `README.md` is stale, if a module
header's `Open findings` count disagrees with its finding statuses, or if a
finding carries an unrecognised Status value. The PowerShell wrapper
`scripts/check-code-reviews-readme.ps1` runs that check and is the intended hook
for CI or a pre-commit step.
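The two per-module conditions that `--check` is described as enforcing could look roughly like the sketch below. It is not the contents of `regen-readme.py`. `check_module` is a hypothetical name, and counting both `Open` and `In Progress` towards the header's `Open findings` figure is an assumption. The README staleness check is not modelled.

```python
RECOGNISED_STATUSES = {"Open", "In Progress", "Resolved", "Won't Fix", "Deferred"}
PENDING_STATUSES = {"Open", "In Progress"}  # assumption: both count as "open" in the header


def check_module(module: str, header_open_count: int, statuses: list[str]) -> list[str]:
    """Return a list of problems for one module; an empty list means the module is consistent."""
    problems = []
    for status in statuses:
        if status not in RECOGNISED_STATUSES:
            problems.append(f"{module}: unrecognised Status value {status!r}")
    pending = sum(1 for status in statuses if status in PENDING_STATUSES)
    if pending != header_open_count:
        problems.append(
            f"{module}: header says {header_open_count} open findings, "
            f"statuses give {pending}"
        )
    return problems
```

A caller would exit non-zero whenever any module returns a non-empty list, which is the behaviour a CI or pre-commit hook needs.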
> The repo's installed `python` is the real interpreter; the bare `python3`
> alias resolves to the Windows Store stub and fails. Use `python`.
The per-module `findings.md` files are the source of truth; `README.md` is the
aggregated index and must always agree with them, which `regen-readme.py --check` verifies.
## 6. Re-reviewing a module
Re-reviews append to the same `findings.md`. Update the header to the new commit
and date, continue the finding numbering from the last used ID, and leave prior
findings (including closed ones) in place as history.
@@ -122,7 +122,10 @@ public static class MxGatewayClientCli
}
catch (Exception exception) when (exception is not OperationCanceledException)
{
string? apiKey = arguments.GetOptional("api-key");
// Redact the effective API key — whether it came from --api-key or from
// the (documented default) --api-key-env environment variable — so a
// transport error message that echoes the bearer token is never printed.
string? apiKey = TryResolveApiKey(arguments);
string message = MxGatewayCliSecretRedactor.Redact(exception.Message, apiKey);
if (arguments.HasFlag("json"))
@@ -167,6 +170,27 @@ public static class MxGatewayClientCli
}
private static string ResolveApiKey(CliArguments arguments)
{
string? apiKey = TryResolveApiKey(arguments);
if (!string.IsNullOrWhiteSpace(apiKey))
{
return apiKey;
}
string apiKeyEnvironmentName = arguments.GetOptional("api-key-env")
?? "MXGATEWAY_API_KEY";
throw new ArgumentException(
$"Gateway API key is required. Pass --api-key or set {apiKeyEnvironmentName}.");
}
/// <summary>
/// Resolves the effective API key from <c>--api-key</c> or, failing that, the
/// environment variable named by <c>--api-key-env</c> (default
/// <c>MXGATEWAY_API_KEY</c>). Returns <see langword="null"/> when no key is
/// configured; used for redaction where a missing key must not throw.
/// </summary>
private static string? TryResolveApiKey(CliArguments arguments)
{
string? apiKey = arguments.GetOptional("api-key");
if (!string.IsNullOrWhiteSpace(apiKey))
@@ -177,14 +201,7 @@ public static class MxGatewayClientCli
string apiKeyEnvironmentName = arguments.GetOptional("api-key-env")
?? "MXGATEWAY_API_KEY";
apiKey = Environment.GetEnvironmentVariable(apiKeyEnvironmentName);
if (!string.IsNullOrWhiteSpace(apiKey))
{
return apiKey;
}
throw new ArgumentException(
$"Gateway API key is required. Pass --api-key or set {apiKeyEnvironmentName}.");
return Environment.GetEnvironmentVariable(apiKeyEnvironmentName);
}
private static CancellationTokenSource CreateCancellation(CliArguments arguments, string command)
@@ -91,6 +91,19 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
/// </summary>
public Queue<Exception> CloseSessionExceptions { get; } = new();
/// <summary>
/// Gets or sets a value indicating whether thrown <see cref="RpcException"/>s are mapped
/// to <see cref="MxGatewayException"/> the way the production gRPC transport does. Lets
/// retry tests exercise the wrapped-exception predicate branch that runs in production.
/// </summary>
public bool MapTransportExceptions { get; set; }
/// <summary>
/// Gets or sets an optional hook awaited inside CloseSessionAsync after the call is
/// recorded; lets tests pause a close mid-flight to observe concurrent dispose.
/// </summary>
public Func<Task>? CloseSessionHook { get; set; }
/// <summary>
/// Gets the queue of exceptions to throw from InvokeAsync.
/// </summary>
@@ -108,7 +121,7 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
OpenSessionCalls.Add((request, callOptions));
if (OpenSessionExceptions.TryDequeue(out Exception? exception))
{
throw exception;
throw Translate(exception, callOptions);
}
return Task.FromResult(OpenSessionReply);
@@ -119,17 +132,23 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
/// </summary>
/// <param name="request">The CloseSessionRequest to process.</param>
/// <param name="callOptions">Call options specifying RPC behavior.</param>
public Task<CloseSessionReply> CloseSessionAsync(
public async Task<CloseSessionReply> CloseSessionAsync(
CloseSessionRequest request,
CallOptions callOptions)
{
CloseSessionCalls.Add((request, callOptions));
if (CloseSessionExceptions.TryDequeue(out Exception? exception))
if (CloseSessionHook is not null)
{
throw exception;
await CloseSessionHook().ConfigureAwait(false);
}
return Task.FromResult(CloseSessionReply);
if (CloseSessionExceptions.TryDequeue(out Exception? exception))
{
throw Translate(exception, callOptions);
}
return CloseSessionReply;
}
/// <summary>
@@ -144,7 +163,7 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
InvokeCalls.Add((request, callOptions));
if (InvokeExceptions.TryDequeue(out Exception? exception))
{
throw exception;
throw Translate(exception, callOptions);
}
return Task.FromResult(_invokeReplies.Dequeue());
@@ -239,4 +258,18 @@ internal sealed class FakeGatewayTransport(MxGatewayClientOptions options) : IMx
{
_activeAlarmSnapshots.Add(snapshot);
}
/// <summary>
/// Maps a queued exception the way the production gRPC transport does when
/// <see cref="MapTransportExceptions"/> is set; otherwise returns it unchanged.
/// </summary>
private Exception Translate(Exception exception, CallOptions callOptions)
{
if (MapTransportExceptions && exception is RpcException rpcException)
{
return RpcExceptionMapper.Map(rpcException, callOptions.CancellationToken);
}
return exception;
}
}
@@ -106,6 +106,43 @@ public sealed class MxGatewayClientCliTests
Assert.Contains("[redacted]", error.ToString());
}
/// <summary>
/// Verifies that error output redacts the API key even when it was sourced from
/// the <c>--api-key-env</c> environment variable rather than passed via
/// <c>--api-key</c> — the documented default credential path.
/// </summary>
[Fact]
public async Task RunAsync_ErrorOutput_RedactsApiKey_WhenSourcedFromEnvironmentVariable()
{
const string environmentVariableName = "MXGATEWAY_TEST_API_KEY_REDACT";
using var output = new StringWriter();
using var error = new StringWriter();
Environment.SetEnvironmentVariable(environmentVariableName, "env-secret-api-key");
try
{
int exitCode = await MxGatewayClientCli.RunAsync(
[
"open-session",
"--endpoint",
"http://localhost:5000",
"--api-key-env",
environmentVariableName,
],
output,
error,
_ => throw new InvalidOperationException("boom env-secret-api-key"));
Assert.Equal(1, exitCode);
Assert.DoesNotContain("env-secret-api-key", error.ToString());
Assert.Contains("[redacted]", error.ToString());
}
finally
{
Environment.SetEnvironmentVariable(environmentVariableName, null);
}
}
/// <summary>Verifies that stream-events with max-events limit stops output in non-JSON format.</summary>
[Fact]
public async Task RunAsync_StreamEvents_WithMaxEventsStopsNonJsonOutput()
@@ -231,6 +231,52 @@ public sealed class MxGatewayClientSessionTests
Assert.Equal("session-fixture", call.Request.SessionId);
}
/// <summary>
/// Verifies that disposing a session while other callers are concurrently inside
/// <see cref="MxGatewaySession.CloseAsync"/> — one holding the close lock and one
/// parked on it — never throws <see cref="ObjectDisposedException"/> into those
/// callers. The close lock must outlive every pending close.
/// </summary>
[Fact]
public async Task DisposeAsync_DoesNotRaceConcurrentCloseAsync()
{
for (int iteration = 0; iteration < 100; iteration++)
{
FakeGatewayTransport transport = CreateTransport();
using SemaphoreSlim firstCloseEntered = new(0, 1);
using SemaphoreSlim releaseFirstClose = new(0, 1);
// The first CloseAsync to reach the transport parks here while holding the
// session's close lock; later callers queue on the lock behind it.
transport.CloseSessionHook = async () =>
{
firstCloseEntered.Release();
await releaseFirstClose.WaitAsync().ConfigureAwait(false);
transport.CloseSessionHook = null;
};
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
// Holder enters CloseAsync, acquires the lock, and parks in the hook.
Task holder = Task.Run(() => session.CloseAsync());
await firstCloseEntered.WaitAsync();
// Waiter is parked on the close lock behind the holder.
Task waiter = Task.Run(() => session.CloseAsync());
// DisposeAsync runs concurrently; it must wait out both callers before
// disposing the close lock rather than tearing it down underneath them.
Task dispose = session.DisposeAsync().AsTask();
releaseFirstClose.Release();
await holder;
await waiter;
await dispose;
}
}
/// <summary>Verifies that invoke retries safe diagnostic commands on transient RPC failure.</summary>
[Fact]
public async Task InvokeAsync_RetriesSafeDiagnosticCommandOnTransientGrpcFailure()
@@ -255,6 +301,35 @@ public sealed class MxGatewayClientSessionTests
Assert.Equal(2, transport.InvokeCalls.Count);
}
/// <summary>
/// Verifies that the retry pipeline still retries when the transport maps the raw
/// <see cref="RpcException"/> to an <see cref="MxGatewayException"/> before it reaches
/// the retry predicate — the wrapped-exception shape that production always produces.
/// </summary>
[Fact]
public async Task InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException()
{
FakeGatewayTransport transport = CreateTransport();
transport.MapTransportExceptions = true;
transport.InvokeExceptions.Enqueue(CreateTransientRpcException());
transport.AddInvokeReply(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.Ping,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
});
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
await session.InvokeAsync(new MxCommandRequest
{
SessionId = session.SessionId,
Command = new MxCommand { Kind = MxCommandKind.Ping, Ping = new PingCommand() },
});
Assert.Equal(2, transport.InvokeCalls.Count);
}
/// <summary>Verifies that open session does not retry on transient RPC failure.</summary>
[Fact]
public async Task OpenSessionAsync_DoesNotRetryTransientGrpcFailure()
@@ -303,6 +378,84 @@ public sealed class MxGatewayClientSessionTests
Assert.Equal(cancellation.Token, Assert.Single(transport.InvokeCalls).CallOptions.CancellationToken);
}
/// <summary>
/// Verifies that a client-imposed <see cref="StatusCode.DeadlineExceeded"/> is not
/// retried. The deadline budget is shared across the whole safe-unary operation, so
/// an immediate retry would only fail again — the call must surface the failure.
/// </summary>
[Fact]
public async Task InvokeAsync_DoesNotRetrySafeDiagnosticCommand_OnDeadlineExceeded()
{
FakeGatewayTransport transport = CreateTransport();
transport.InvokeExceptions.Enqueue(
new RpcException(new Status(StatusCode.DeadlineExceeded, "deadline exceeded")));
transport.AddInvokeReply(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.Ping,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
});
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
await Assert.ThrowsAsync<RpcException>(async () => await session.InvokeAsync(
new MxCommandRequest
{
SessionId = session.SessionId,
Command = new MxCommand { Kind = MxCommandKind.Ping, Ping = new PingCommand() },
}));
Assert.Single(transport.InvokeCalls);
}
/// <summary>
/// Verifies that a successful register reply missing the typed <c>register</c>
/// payload throws a descriptive <see cref="MxGatewayException"/> rather than
/// silently returning a zero server handle.
/// </summary>
[Fact]
public async Task RegisterAsync_Throws_WhenSuccessfulReplyMissingPayload()
{
FakeGatewayTransport transport = CreateTransport();
transport.AddInvokeReply(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.Register,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
});
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
MxGatewayException exception = await Assert.ThrowsAsync<MxGatewayException>(
async () => await session.RegisterAsync("client-name"));
Assert.Contains("register", exception.Message, StringComparison.Ordinal);
}
/// <summary>
/// Verifies that a successful add-item reply missing the typed <c>add_item</c>
/// payload throws a descriptive <see cref="MxGatewayException"/> rather than
/// silently returning a zero item handle.
/// </summary>
[Fact]
public async Task AddItemAsync_Throws_WhenSuccessfulReplyMissingPayload()
{
FakeGatewayTransport transport = CreateTransport();
transport.AddInvokeReply(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.AddItem,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
});
await using MxGatewayClient client = CreateClient(transport);
MxGatewaySession session = await client.OpenSessionAsync();
MxGatewayException exception = await Assert.ThrowsAsync<MxGatewayException>(
async () => await session.AddItemAsync(1, "Area.Pump.Speed"));
Assert.Contains("add_item", exception.Message, StringComparison.Ordinal);
}
private static MxGatewayClient CreateClient(FakeGatewayTransport transport)
{
return new MxGatewayClient(transport.Options, transport);
@@ -0,0 +1,76 @@
using Grpc.Core;
namespace MxGateway.Client.Tests;
/// <summary>Tests for the shared gRPC-to-native exception mapping used by the transports.</summary>
public sealed class RpcExceptionMapperTests
{
/// <summary>Verifies that an unauthenticated status maps to the authentication exception.</summary>
[Fact]
public void Map_UnauthenticatedStatus_ProducesAuthenticationException()
{
RpcException rpc = new(new Status(StatusCode.Unauthenticated, "no key"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayAuthenticationException authentication =
Assert.IsType<MxGatewayAuthenticationException>(mapped);
Assert.Equal(StatusCode.Unauthenticated, authentication.StatusCode);
}
/// <summary>Verifies that a permission-denied status maps to the authorization exception.</summary>
[Fact]
public void Map_PermissionDeniedStatus_ProducesAuthorizationException()
{
RpcException rpc = new(new Status(StatusCode.PermissionDenied, "missing scope"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayAuthorizationException authorization =
Assert.IsType<MxGatewayAuthorizationException>(mapped);
Assert.Equal(StatusCode.PermissionDenied, authorization.StatusCode);
}
/// <summary>Verifies that a cancelled status maps to OperationCanceledException.</summary>
[Fact]
public void Map_CancelledStatus_ProducesOperationCanceledException()
{
RpcException rpc = new(new Status(StatusCode.Cancelled, "cancelled"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
Assert.IsType<OperationCanceledException>(mapped);
}
/// <summary>
/// Verifies that non-auth statuses surface the originating gRPC status code on the
/// mapped exception so callers can distinguish transient from permanent failures
/// without reflecting into InnerException.
/// </summary>
[Theory]
[InlineData(StatusCode.NotFound)]
[InlineData(StatusCode.InvalidArgument)]
[InlineData(StatusCode.ResourceExhausted)]
[InlineData(StatusCode.FailedPrecondition)]
[InlineData(StatusCode.Unavailable)]
[InlineData(StatusCode.Internal)]
public void Map_NonAuthStatus_CarriesStatusCodeOnMxGatewayException(StatusCode statusCode)
{
RpcException rpc = new(new Status(statusCode, "boom"));
Exception mapped = RpcExceptionMapper.Map(rpc, CancellationToken.None);
MxGatewayException gatewayException = Assert.IsType<MxGatewayException>(mapped);
Assert.Equal(statusCode, gatewayException.StatusCode);
Assert.Same(rpc, gatewayException.InnerException);
}
/// <summary>Verifies that an MxGatewayException built without a gRPC status reports a null StatusCode.</summary>
[Fact]
public void StatusCode_IsNull_WhenNoGrpcStatusProvided()
{
MxGatewayException gatewayException = new("plain failure");
Assert.Null(gatewayException.StatusCode);
}
}
@@ -0,0 +1,24 @@
namespace MxGateway.Client;
public sealed record DiscoverHierarchyOptions
{
public int? RootGobjectId { get; init; }
public string? RootTagName { get; init; }
public string? RootContainedPath { get; init; }
public int? MaxDepth { get; init; }
public IReadOnlyList<int> CategoryIds { get; init; } = Array.Empty<int>();
public IReadOnlyList<string> TemplateChainContains { get; init; } = Array.Empty<string>();
public string? TagNameGlob { get; init; }
public bool? IncludeAttributes { get; init; }
public bool AlarmBearingOnly { get; init; }
public bool HistorizedOnly { get; init; }
}
@@ -36,7 +36,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -53,7 +53,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -70,7 +70,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -101,7 +101,7 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return deployEvent;
@@ -115,28 +115,4 @@ internal sealed class GrpcGalaxyRepositoryClientTransport(
{
return WatchDeployEventsAsync(request, callOptions);
}
private static Exception MapRpcException(
RpcException exception,
CancellationToken cancellationToken)
{
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
innerException: exception),
_ => new MxGatewayException(exception.Status.Detail, exception),
};
}
}
@@ -36,7 +36,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -53,7 +53,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -70,7 +70,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -101,7 +101,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return gatewayEvent;
@@ -129,7 +129,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, callOptions.CancellationToken);
throw RpcExceptionMapper.Map(exception, callOptions.CancellationToken);
}
}
@@ -160,7 +160,7 @@ internal sealed class GrpcMxGatewayClientTransport(
}
catch (RpcException exception)
{
throw MapRpcException(exception, effectiveCancellationToken);
throw RpcExceptionMapper.Map(exception, effectiveCancellationToken);
}
yield return snapshot;
@@ -174,28 +174,4 @@ internal sealed class GrpcMxGatewayClientTransport(
{
return QueryActiveAlarmsAsync(request, callOptions);
}
private static Exception MapRpcException(
RpcException exception,
CancellationToken cancellationToken)
{
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
innerException: exception),
_ => new MxGatewayException(exception.Status.Detail, exception),
};
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -13,6 +14,7 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
/// <param name="hResult">The HResult code, if available.</param>
/// <param name="statuses">The MXAccess statuses, if available.</param>
/// <param name="innerException">The underlying exception, if any.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayAuthenticationException(
string message,
string? sessionId = null,
@@ -20,7 +22,8 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
ProtocolStatus? protocolStatus = null,
int? hResult = null,
IReadOnlyList<MxStatusProxy>? statuses = null,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(
message,
sessionId,
@@ -28,7 +31,8 @@ public sealed class MxGatewayAuthenticationException : MxGatewayException
protocolStatus,
hResult,
statuses ?? [],
innerException)
innerException,
statusCode)
{
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -13,6 +14,7 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
/// <param name="hResult">The HResult code, if available.</param>
/// <param name="statuses">The MXAccess statuses, if available.</param>
/// <param name="innerException">The underlying exception, if any.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayAuthorizationException(
string message,
string? sessionId = null,
@@ -20,7 +22,8 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
ProtocolStatus? protocolStatus = null,
int? hResult = null,
IReadOnlyList<MxStatusProxy>? statuses = null,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(
message,
sessionId,
@@ -28,7 +31,8 @@ public sealed class MxGatewayAuthorizationException : MxGatewayException
protocolStatus,
hResult,
statuses ?? [],
innerException)
innerException,
statusCode)
{
}
}
@@ -17,7 +17,7 @@ public sealed class MxGatewayClient : IAsyncDisposable
private readonly GrpcChannel _channel;
private readonly IMxGatewayClientTransport _transport;
private readonly ResiliencePipeline _safeUnaryRetryPipeline;
private bool _disposed;
private int _disposed;
/// <summary>
/// Initializes a new instance of the <see cref="MxGatewayClient"/> with given options and transport.
@@ -184,9 +184,10 @@ public sealed class MxGatewayClient : IAsyncDisposable
/// <summary>
/// Acknowledges an active MXAccess alarm condition through the gateway. The
/// gateway authenticates the request against the API key's <c>invoke:alarm-ack</c>
/// scope and forwards the acknowledge to the worker's MXAccess session;
/// the resulting <see cref="MxStatusProxy"/> is returned in the reply.
/// gateway authorizes <see cref="AcknowledgeAlarmRequest"/> against the API
/// key's <c>admin</c> scope (there is no finer-grained alarm-ack sub-scope)
/// and forwards the acknowledge to the worker's MXAccess session; the
/// resulting <see cref="MxStatusProxy"/> is returned in the reply.
/// </summary>
/// <param name="request">The acknowledge request — alarm reference, comment, operator user.</param>
/// <param name="cancellationToken">Cancellation token for the operation.</param>
@@ -229,12 +230,11 @@ public sealed class MxGatewayClient : IAsyncDisposable
/// </summary>
public ValueTask DisposeAsync()
{
if (_disposed)
if (Interlocked.Exchange(ref _disposed, 1) != 0)
{
return ValueTask.CompletedTask;
}
_disposed = true;
_channel?.Dispose();
return ValueTask.CompletedTask;
}
@@ -335,6 +335,6 @@ public sealed class MxGatewayClient : IAsyncDisposable
private void ThrowIfDisposed()
{
ObjectDisposedException.ThrowIf(_disposed, this);
ObjectDisposedException.ThrowIf(Volatile.Read(ref _disposed) != 0, this);
}
}
@@ -7,9 +7,19 @@ namespace MxGateway.Client;
/// </summary>
public static class MxGatewayClientContractInfo
{
/// <summary>
/// Gets the gateway gRPC protocol version compiled into this client package.
/// A client and gateway are wire-compatible only when this value matches the
/// gateway's advertised gateway protocol version.
/// </summary>
public const uint GatewayProtocolVersion =
GatewayContractInfo.GatewayProtocolVersion;
/// <summary>
/// Gets the worker frame protocol version compiled into this client package.
/// Exposed for diagnostics so callers can report the worker protocol the
/// shared contracts were generated against.
/// </summary>
public const uint WorkerProtocolVersion =
GatewayContractInfo.WorkerProtocolVersion;
}
@@ -38,7 +38,12 @@ public sealed class MxGatewayClientOptions
public TimeSpan ConnectTimeout { get; init; } = TimeSpan.FromSeconds(10);
/// <summary>
/// Gets the default timeout for unary gRPC calls.
/// Gets the timeout budget for a unary gRPC operation. This is both the gRPC
/// deadline stamped on each individual attempt and the overall budget for the
/// whole safe-unary operation: for retryable calls the initial attempt, every
/// retry, and the backoff delays between them all share this single budget.
/// It is therefore an upper bound on the total wall-clock time a safe-unary
/// call can take, not a fresh per-retry allowance.
/// </summary>
public TimeSpan DefaultCallTimeout { get; init; } = TimeSpan.FromSeconds(30);
@@ -47,6 +52,11 @@ public sealed class MxGatewayClientOptions
/// </summary>
public TimeSpan? StreamTimeout { get; init; }
/// <summary>
/// Gets the maximum size, in bytes, of a single gRPC message the client will
/// send or receive. Applied to both the send and receive limits of the
/// underlying channel. Defaults to 16 MiB.
/// </summary>
public int MaxGrpcMessageBytes { get; init; } = 16 * 1024 * 1024;
/// <summary>
@@ -61,8 +61,13 @@ internal static class MxGatewayClientRetryPolicy
private static bool IsTransientStatus(StatusCode statusCode)
{
// DeadlineExceeded is intentionally NOT treated as transient. The deadline
// on every unary call is client-imposed (CreateCallOptions stamps the
// DefaultCallTimeout budget), and that same budget is shared across the
// initial attempt plus all retries plus backoff. A DeadlineExceeded means
// the shared budget is exhausted, so an immediate retry would only fail
// again — burning the remaining budget on a call that cannot succeed.
return statusCode is StatusCode.Unavailable
or StatusCode.DeadlineExceeded
or StatusCode.ResourceExhausted;
}
}
@@ -1,3 +1,4 @@
using Grpc.Core;
using MxGateway.Contracts.Proto;
namespace MxGateway.Client;
@@ -28,6 +29,20 @@ public class MxGatewayException : Exception
Statuses = [];
}
/// <summary>
/// Initializes a new instance of the MxGatewayException class carrying the originating
/// gRPC status code so callers can distinguish transient from permanent failures.
/// </summary>
/// <param name="message">Diagnostic message describing the failure.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call.</param>
/// <param name="innerException">Underlying exception that caused this failure.</param>
public MxGatewayException(string message, StatusCode statusCode, Exception? innerException)
: base(message, innerException)
{
StatusCode = statusCode;
Statuses = [];
}
/// <summary>
/// Initializes a new instance of the MxGatewayException class with full diagnostic information.
/// </summary>
@@ -38,6 +53,7 @@ public class MxGatewayException : Exception
/// <param name="hResult">HRESULT code returned by the worker or MXAccess, if available.</param>
/// <param name="statuses">List of MXAccess status codes returned by the operation.</param>
/// <param name="innerException">Underlying exception that caused this failure.</param>
/// <param name="statusCode">The gRPC status code reported by the failed call, if available.</param>
public MxGatewayException(
string message,
string? sessionId,
@@ -45,7 +61,8 @@ public class MxGatewayException : Exception
ProtocolStatus? protocolStatus,
int? hResult,
IReadOnlyList<MxStatusProxy> statuses,
Exception? innerException = null)
Exception? innerException = null,
StatusCode? statusCode = null)
: base(message, innerException)
{
SessionId = sessionId;
@@ -53,6 +70,7 @@ public class MxGatewayException : Exception
ProtocolStatus = protocolStatus;
HResultCode = hResult;
Statuses = statuses;
StatusCode = statusCode;
}
/// <summary>
@@ -79,4 +97,15 @@ public class MxGatewayException : Exception
/// Gets the list of MXAccess status codes returned by the operation.
/// </summary>
public IReadOnlyList<MxStatusProxy> Statuses { get; }
/// <summary>
/// Gets the gRPC status code reported by the failed call, if the failure originated
/// from a gRPC <see cref="RpcException"/>. <see langword="null"/> when the exception
/// was not produced from a gRPC status (for example, a protocol-level reply failure).
/// Callers can inspect this to distinguish a transient outage
/// (<see cref="Grpc.Core.StatusCode.Unavailable"/>) from a permanent error
/// (<see cref="Grpc.Core.StatusCode.InvalidArgument"/>) without downcasting
/// <see cref="Exception.InnerException"/>.
/// </summary>
public StatusCode? StatusCode { get; }
}
@@ -9,7 +9,10 @@ public sealed class MxGatewaySession : IAsyncDisposable
{
private readonly MxGatewayClient _client;
private readonly SemaphoreSlim _closeLock = new(1, 1);
private readonly object _disposeGate = new();
private CloseSessionReply? _closeReply;
private int _activeCloseCount;
private bool _closeLockDisposed;
/// <summary>
/// Initializes a new session backed by the given MXAccess gateway client.
@@ -46,23 +49,42 @@ public sealed class MxGatewaySession : IAsyncDisposable
return _closeReply;
}
await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
// Register as an in-flight closer under the dispose gate. DisposeAsync waits for
// _activeCloseCount to drain before disposing the close lock, so the semaphore is
// guaranteed to outlive every WaitAsync started here.
lock (_disposeGate)
{
ObjectDisposedException.ThrowIf(_closeLockDisposed, this);
_activeCloseCount++;
}
try
{
if (_closeReply is not null)
await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
if (_closeReply is not null)
{
return _closeReply;
}
_closeReply = await _client.CloseSessionRawAsync(
new CloseSessionRequest { SessionId = SessionId },
cancellationToken)
.ConfigureAwait(false);
return _closeReply;
}
_closeReply = await _client.CloseSessionRawAsync(
new CloseSessionRequest { SessionId = SessionId },
cancellationToken)
.ConfigureAwait(false);
return _closeReply;
finally
{
_closeLock.Release();
}
}
finally
{
_closeLock.Release();
lock (_disposeGate)
{
_activeCloseCount--;
}
}
}
@@ -79,7 +101,8 @@ public sealed class MxGatewaySession : IAsyncDisposable
MxCommandReply reply = await RegisterRawAsync(clientName, cancellationToken)
.ConfigureAwait(false);
reply.EnsureProtocolSuccess().EnsureMxAccessSuccess();
return reply.Register?.ServerHandle ?? reply.ReturnValue.Int32Value;
return reply.Register?.ServerHandle
?? throw CreateMissingPayloadException(reply, "register");
}
/// <summary>
@@ -121,7 +144,8 @@ public sealed class MxGatewaySession : IAsyncDisposable
cancellationToken)
.ConfigureAwait(false);
reply.EnsureProtocolSuccess().EnsureMxAccessSuccess();
return reply.AddItem?.ItemHandle ?? reply.ReturnValue.Int32Value;
return reply.AddItem?.ItemHandle
?? throw CreateMissingPayloadException(reply, "add_item");
}
/// <summary>
@@ -172,7 +196,8 @@ public sealed class MxGatewaySession : IAsyncDisposable
cancellationToken)
.ConfigureAwait(false);
reply.EnsureProtocolSuccess().EnsureMxAccessSuccess();
return reply.AddItem2?.ItemHandle ?? reply.ReturnValue.Int32Value;
return reply.AddItem2?.ItemHandle
?? throw CreateMissingPayloadException(reply, "add_item2");
}
/// <summary>
@@ -658,7 +683,32 @@ public sealed class MxGatewaySession : IAsyncDisposable
/// </summary>
public async ValueTask DisposeAsync()
{
lock (_disposeGate)
{
if (_closeLockDisposed)
{
return;
}
}
await CloseAsync().ConfigureAwait(false);
// Wait for every concurrent CloseAsync caller to leave the close lock before
// disposing it; once _closeReply is set those callers return without awaiting.
while (true)
{
lock (_disposeGate)
{
if (_activeCloseCount == 0)
{
_closeLockDisposed = true;
break;
}
}
await Task.Yield();
}
_closeLock.Dispose();
}
@@ -676,4 +726,21 @@ public sealed class MxGatewaySession : IAsyncDisposable
cancellationToken);
}
/// <summary>
/// Builds the exception thrown when a command reply passed protocol and
/// MXAccess success checks but is missing the typed handle-bearing payload
/// the command contract requires. Surfacing this as a clear error avoids
/// silently handing a zero handle to the caller (it would otherwise fall
/// through to <see cref="MxCommandReply.ReturnValue"/>, which is 0 when the
/// reply carries no return value).
/// </summary>
private static MxGatewayException CreateMissingPayloadException(
MxCommandReply reply,
string expectedPayload)
{
return new MxGatewayException(
$"Gateway reply for command kind={reply.Kind} reported success but is missing "
+ $"the required '{expectedPayload}' payload; cannot resolve a handle. "
+ $"session={reply.SessionId}; correlation={reply.CorrelationId}");
}
}
@@ -0,0 +1,55 @@
using Grpc.Core;
namespace MxGateway.Client;
/// <summary>
/// Maps low-level <see cref="RpcException"/>s raised by the gRPC stack to the client's
/// native exception hierarchy. Shared by every gateway and Galaxy Repository transport
/// so the gRPC-to-native translation has exactly one implementation.
/// </summary>
internal static class RpcExceptionMapper
{
/// <summary>
/// Translates a <see cref="RpcException"/> into the most specific native exception type.
/// </summary>
/// <param name="exception">The gRPC exception to translate.</param>
/// <param name="cancellationToken">
/// The cancellation token of the originating call; used to distinguish a caller-driven
/// cancellation from a server-side <see cref="StatusCode.Cancelled"/> status.
/// </param>
/// <returns>
/// An <see cref="OperationCanceledException"/> when the call was cancelled, a typed
/// authentication/authorization exception for auth statuses, or an
/// <see cref="MxGatewayException"/> carrying the originating gRPC <see cref="StatusCode"/>.
/// </returns>
public static Exception Map(
RpcException exception,
CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(exception);
if (cancellationToken.IsCancellationRequested || exception.StatusCode == StatusCode.Cancelled)
{
return new OperationCanceledException(
exception.Status.Detail,
exception,
cancellationToken);
}
return exception.StatusCode switch
{
StatusCode.Unauthenticated => new MxGatewayAuthenticationException(
exception.Status.Detail,
statusCode: exception.StatusCode,
innerException: exception),
StatusCode.PermissionDenied => new MxGatewayAuthorizationException(
exception.Status.Detail,
statusCode: exception.StatusCode,
innerException: exception),
_ => new MxGatewayException(
exception.Status.Detail,
exception.StatusCode,
exception),
};
}
}
+13 -1
@@ -112,6 +112,17 @@ can keep the full `MxCommandReply`, HRESULT, and status array when MXAccess
itself rejects a command. `MxAccessException.Reply` contains the raw generated
reply.
When a gRPC call itself fails, the transport maps the underlying
`RpcException` to a native exception: `Unauthenticated` becomes
`MxGatewayAuthenticationException`, `PermissionDenied` becomes
`MxGatewayAuthorizationException`, a cancelled call becomes
`OperationCanceledException`, and every other status becomes a base
`MxGatewayException`. `MxGatewayException.StatusCode` carries the originating
gRPC `Grpc.Core.StatusCode` (non-null whenever the failure came from a gRPC
status), so callers can distinguish a transient outage (`Unavailable`) from a
permanent error (`InvalidArgument`, `NotFound`) without downcasting
`InnerException`.
## CLI Usage
The test CLI supports deterministic JSON output for automation:
@@ -131,7 +142,8 @@ dotnet run --project clients/dotnet/MxGateway.Client.Cli -- smoke --endpoint htt
`smoke` opens a session, registers a client, adds one item, advises it,
optionally writes a value when `--type` and `--value` are supplied, reads a
bounded event stream, and closes the session in a `finally` block. CLI error
output redacts API keys supplied through `--api-key`.
output redacts the effective API key, whether it was supplied through
`--api-key` or resolved from the `--api-key-env` environment variable.
## Galaxy Repository Browse
+20 -1
@@ -79,11 +79,30 @@ client, err := mxgateway.Dial(ctx, mxgateway.Options{
`AddItem`, `AddItem2`, `Advise`, `Write`, `Events`, and `Close`. Prefer
`SubscribeEvents` or `SubscribeEventsAfter` for long-running streams because the
returned subscription owns cancellation and exposes `Close` for deterministic
goroutine cleanup. Raw protobuf messages remain available through the
goroutine cleanup. `Events` and `EventsAfter` are a compatibility shim with a
bounded internal buffer: if the consumer drains too slowly the buffer fills,
the underlying stream is cancelled, and a terminal `EventResult` carrying
`ErrEventBufferOverflow` is delivered as the channel's last item before it
closes — so a slow consumer can distinguish dropped events from a normal
end-of-stream. `SubscribeEvents` blocks instead of dropping, so use it when no
events may be lost. Raw protobuf messages remain available through the
`mxgateway` package aliases and the `Raw` helper methods. Typed errors support
`errors.As` for `GatewayError`, `CommandError`, and `MxAccessError`; command
errors preserve the raw reply.
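The overflow behavior of the compatibility shim can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: `eventResult`, `errBufferOverflow`, `forwardBounded`, and `drain` are hypothetical stand-ins for `EventResult`, `ErrEventBufferOverflow`, and the internal forwarder, and the reserved extra buffer slot is one assumed way to guarantee that the terminal item can always be queued.

```go
package main

import (
	"errors"
	"fmt"
)

// errBufferOverflow is a hypothetical stand-in for ErrEventBufferOverflow.
var errBufferOverflow = errors.New("event buffer overflow")

// eventResult is a hypothetical stand-in for EventResult: a value or a
// terminal error.
type eventResult struct {
	Value int
	Err   error
}

// forwardBounded copies events from src into a channel that holds at most
// size undelivered events. One extra slot is reserved so the terminal
// overflow result can always be queued without blocking. cancel is invoked
// on overflow and stands in for cancelling the underlying stream.
func forwardBounded(src <-chan int, size int, cancel func()) <-chan eventResult {
	out := make(chan eventResult, size+1)
	go func() {
		defer close(out)
		for v := range src {
			if len(out) >= size {
				cancel()
				out <- eventResult{Err: errBufferOverflow}
				return
			}
			out <- eventResult{Value: v}
		}
	}()
	return out
}

// drain reads out until it closes and returns the values plus any terminal error.
func drain(out <-chan eventResult) ([]int, error) {
	var values []int
	var err error
	for r := range out {
		if r.Err != nil {
			err = r.Err
			continue
		}
		values = append(values, r.Value)
	}
	return values, err
}

func main() {
	src := make(chan int, 5)
	for i := 1; i <= 5; i++ {
		src <- i
	}
	close(src)

	cancelled := make(chan struct{})
	out := forwardBounded(src, 2, func() { close(cancelled) })

	// Simulate a stalled consumer: read nothing until the forwarder gives up.
	<-cancelled
	values, err := drain(out)
	fmt.Println(values, errors.Is(err, errBufferOverflow)) // prints "[1 2] true"
}
```

A consumer that sees the terminal error knows events were dropped and can resubscribe, whereas a channel that closes without an error item signals a normal end of stream.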
`Dial` and `DialGalaxy` create the connection lazily (`grpc.NewClient`), so a
gateway that is briefly unavailable does not permanently fail the connection:
the channel recovers once the gateway comes up. To keep fail-fast behavior,
both then run a readiness probe and return a `*GatewayError` if the gateway
cannot be reached within the probe window. The window is `DialTimeout` when it
is set, otherwise the context's existing deadline, otherwise a 10s default.
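The precedence between `DialTimeout`, the caller's context deadline, and the default can be sketched as a pure function. This is a simplified model rather than the package's code: `probeBound` is a hypothetical helper, the 10s value of `defaultDialTimeout` is taken from the text above, and the real probe works on a `context.Context` instead of plain durations.

```go
package main

import (
	"fmt"
	"time"
)

// defaultDialTimeout mirrors the 10s default described above (assumed value).
const defaultDialTimeout = 10 * time.Second

// probeBound reports how long a readiness probe may wait. ctxRemaining is the
// time left until the caller's context deadline and is only meaningful when
// hasDeadline is true.
func probeBound(dialTimeout, ctxRemaining time.Duration, hasDeadline bool) time.Duration {
	switch {
	case dialTimeout > 0 && hasDeadline && ctxRemaining < dialTimeout:
		// A derived timeout can never outlive the parent context's deadline.
		return ctxRemaining
	case dialTimeout > 0:
		return dialTimeout
	case hasDeadline:
		// No explicit DialTimeout: the caller's deadline is the only bound.
		return ctxRemaining
	default:
		return defaultDialTimeout
	}
}

func main() {
	fmt.Println(probeBound(0, 0, false))                      // 10s
	fmt.Println(probeBound(3*time.Second, 0, false))          // 3s
	fmt.Println(probeBound(3*time.Second, time.Second, true)) // 1s
	fmt.Println(probeBound(0, 30*time.Second, true))          // 30s
}
```

The last case is worth noting: when `DialTimeout` is unset and the caller supplies a deadline longer than the default, the probe in this model waits for the caller's deadline, not for 10s.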
For retry, timeout, and auth handling, `GatewayError.Code()` exposes the
wrapped gRPC `codes.Code`, and `mxgateway.IsTransient(err)` reports whether a
failure (`Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, `Aborted`)
may succeed on retry — so callers do not have to unwrap the error and call
`status.Code` themselves.
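A caller-side retry loop built on such a predicate might look like the sketch below. It is standard-library only and makes several assumptions: `isTransient` is a local placeholder for `mxgateway.IsTransient`, `errUnavailable` and `errInvalid` are hypothetical errors standing in for gateway failures, and `retryTransient` is an illustrative helper that is not part of the client package.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical errors standing in for a transient and a permanent failure.
var (
	errUnavailable = errors.New("unavailable")
	errInvalid     = errors.New("invalid argument")
)

// isTransient is a placeholder for mxgateway.IsTransient.
func isTransient(err error) bool {
	return errors.Is(err, errUnavailable)
}

// retryTransient calls op up to maxAttempts times. It returns as soon as op
// succeeds or fails with a non-transient error, and sleeps backoff*attempt
// between transient failures. It reports how many attempts were made.
func retryTransient(maxAttempts int, backoff time.Duration, op func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if err == nil || !isTransient(err) {
			return attempt, err
		}
		if attempt < maxAttempts {
			time.Sleep(backoff * time.Duration(attempt))
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	attempts, err := retryTransient(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errUnavailable // fails twice, then succeeds
		}
		return nil
	})
	fmt.Println(attempts, err) // prints "3 <nil>"

	attempts, err = retryTransient(5, 10*time.Millisecond, func() error {
		return errInvalid // permanent: no retry
	})
	fmt.Println(attempts, err) // prints "1 invalid argument"
}
```

In real code the sleep would normally be replaced by a wait that also observes context cancellation, so a cancelled caller does not sit through the remaining backoff.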
## Galaxy Repository browse
The `GalaxyRepository` service (proto package `galaxy_repository.v1`) is a
+9 -4
@@ -331,6 +331,11 @@ func runUnsubscribeBulk(ctx context.Context, args []string, stdout, stderr io.Wr
return errors.New("session-id and item-handles are required")
}
handles, err := parseInt32List(*itemHandles)
if err != nil {
return err
}
client, options, err := dialForCommand(ctx, common)
if err != nil {
return err
@@ -338,7 +343,7 @@ func runUnsubscribeBulk(ctx context.Context, args []string, stdout, stderr io.Wr
defer client.Close()
session := mxgateway.NewSessionForID(client, *sessionID)
results, err := session.UnsubscribeBulk(ctx, int32(*serverHandle), parseInt32List(*itemHandles))
results, err := session.UnsubscribeBulk(ctx, int32(*serverHandle), handles)
return writeBulkOutput(stdout, *jsonOutput, "unsubscribe-bulk", options, results, err)
}
@@ -514,7 +519,7 @@ func parseStringList(value string) []string {
return items
}
func parseInt32List(value string) []int32 {
func parseInt32List(value string) ([]int32, error) {
parts := strings.Split(value, ",")
items := make([]int32, 0, len(parts))
for _, part := range parts {
@@ -524,11 +529,11 @@ func parseInt32List(value string) []int32 {
}
parsed, err := strconv.ParseInt(item, 10, 32)
if err != nil {
panic(err)
return nil, fmt.Errorf("invalid item handle %q: %w", item, err)
}
items = append(items, int32(parsed))
}
return items
return items, nil
}
func bindCommonFlags(flags *flag.FlagSet) *commonOptions {
+29
@@ -56,3 +56,32 @@ func TestParseValueBuildsTypedValue(t *testing.T) {
t.Fatalf("int32 value = %d, want 123", got)
}
}
func TestParseInt32ListParsesValidTokens(t *testing.T) {
items, err := parseInt32List("1, 2 ,3")
if err != nil {
t.Fatalf("parseInt32List() error = %v", err)
}
want := []int32{1, 2, 3}
if len(items) != len(want) {
t.Fatalf("parseInt32List() = %v, want %v", items, want)
}
for i := range want {
if items[i] != want[i] {
t.Fatalf("parseInt32List()[%d] = %d, want %d", i, items[i], want[i])
}
}
}
func TestParseInt32ListReturnsErrorOnMalformedToken(t *testing.T) {
items, err := parseInt32List("1,foo")
if err == nil {
t.Fatalf("parseInt32List() error = nil, want a parse error; items = %v", items)
}
if items != nil {
t.Fatalf("parseInt32List() items = %v, want nil on error", items)
}
if !strings.Contains(err.Error(), "foo") {
t.Fatalf("parseInt32List() error = %q, want it to name the bad token", err.Error())
}
}
+5 -3
@@ -150,8 +150,8 @@ func TestQueryActiveAlarmsPassesFilterPrefix(t *testing.T) {
defer cleanup()
stream, err := client.QueryActiveAlarms(context.Background(), &pb.QueryActiveAlarmsRequest{
SessionId: "session-1",
AlarmFilterPrefix: "Tank01.",
SessionId: "session-1",
AlarmFilterPrefix: "Tank01.",
})
if err != nil {
t.Fatalf("QueryActiveAlarms() error = %v", err)
@@ -221,8 +221,10 @@ func newBufconnClientWithAlarms(t *testing.T, fake *fakeGatewayWithAlarms) (*Cli
dialer := func(ctx context.Context, _ string) (net.Conn, error) {
return listener.DialContext(ctx)
}
// grpc.NewClient defaults to the dns scheme; use passthrough so the
// bufconn fake target reaches the context dialer unresolved.
client, err := Dial(context.Background(), Options{
Endpoint: "bufnet",
Endpoint: "passthrough:///bufnet",
APIKey: "test-api-key",
Plaintext: true,
DialOptions: []grpc.DialOption{grpc.WithContextDialer(dialer)},
+68 -15
@@ -19,6 +19,7 @@ import (
pb "gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go/internal/generated"
"google.golang.org/grpc"
"google.golang.org/grpc/connectivity"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/protobuf/types/known/durationpb"
@@ -36,22 +37,36 @@ type Client struct {
opts Options
}
// Dial opens a gRPC connection to the gateway and configures auth metadata,
// transport security, and blocking dial cancellation from ctx.
// Dial opens a gRPC connection to the gateway and configures auth metadata
// and transport security.
//
// The connection is created lazily with grpc.NewClient: the channel is not
// established until the first RPC (or the readiness probe below) needs it, so
// a gateway that is briefly unavailable at Dial time no longer turns into a
// hard error — the connection recovers when the gateway comes up. To preserve
// fail-fast behavior, Dial then runs an explicit readiness probe bounded by
// DialTimeout when positive, else by ctx's deadline, else by 10s: it triggers the
// initial connect and waits for the channel to reach Ready, returning a
// *GatewayError if the gateway cannot be reached in that window. Cancelling
// ctx aborts the probe.
func Dial(ctx context.Context, opts Options) (*Client, error) {
conn, err := dial(ctx, opts)
if err != nil {
return nil, err
}
return NewClient(conn, opts), nil
}
// dial builds the shared gRPC connection used by both Client and GalaxyClient:
// it resolves transport credentials, assembles dial options, creates a lazy
// connection with grpc.NewClient, and runs the DialTimeout-bounded readiness
// probe so callers still fail fast when the gateway is unreachable.
func dial(ctx context.Context, opts Options) (*grpc.ClientConn, error) {
if opts.Endpoint == "" {
return nil, errors.New("mxgateway: endpoint is required")
}
dialCtx := ctx
cancel := func() {}
if opts.DialTimeout > 0 {
dialCtx, cancel = context.WithTimeout(ctx, opts.DialTimeout)
} else if _, ok := ctx.Deadline(); !ok {
dialCtx, cancel = context.WithTimeout(ctx, defaultDialTimeout)
}
defer cancel()
transportCredentials, err := resolveTransportCredentials(opts)
if err != nil {
return nil, err
@@ -61,16 +76,46 @@ func Dial(ctx context.Context, opts Options) (*Client, error) {
grpc.WithTransportCredentials(transportCredentials),
grpc.WithUnaryInterceptor(unaryAuthInterceptor(opts.APIKey)),
grpc.WithStreamInterceptor(streamAuthInterceptor(opts.APIKey)),
grpc.WithBlock(),
}
dialOptions = append(dialOptions, opts.DialOptions...)
conn, err := grpc.DialContext(dialCtx, opts.Endpoint, dialOptions...)
conn, err := grpc.NewClient(opts.Endpoint, dialOptions...)
if err != nil {
return nil, &GatewayError{Op: "dial", Err: err}
}
return NewClient(conn, opts), nil
if err := waitForReady(ctx, conn, opts.DialTimeout); err != nil {
_ = conn.Close()
return nil, &GatewayError{Op: "dial", Err: err}
}
return conn, nil
}
// waitForReady triggers the initial connect on conn and blocks until the
// channel reaches connectivity.Ready, the timeout elapses, or ctx is
// cancelled. The wait is bounded by dialTimeout when positive, otherwise by
// ctx's existing deadline, otherwise by defaultDialTimeout.
func waitForReady(ctx context.Context, conn *grpc.ClientConn, dialTimeout time.Duration) error {
probeCtx := ctx
cancel := func() {}
if dialTimeout > 0 {
probeCtx, cancel = context.WithTimeout(ctx, dialTimeout)
} else if _, ok := ctx.Deadline(); !ok {
probeCtx, cancel = context.WithTimeout(ctx, defaultDialTimeout)
}
defer cancel()
conn.Connect()
for {
state := conn.GetState()
if state == connectivity.Ready {
return nil
}
if !conn.WaitForStateChange(probeCtx, state) {
return probeCtx.Err()
}
}
}
// NewClient wraps an existing gRPC connection. The caller owns closing conn
@@ -188,7 +233,15 @@ func (c *Client) Close() error {
}
func (c *Client) callContext(ctx context.Context) (context.Context, context.CancelFunc) {
timeout := c.opts.CallTimeout
return callContext(ctx, c.opts.CallTimeout)
}
// callContext derives a per-RPC context from ctx, applying callTimeout: zero
// uses defaultCallTimeout, a negative value disables the bound entirely, and a
// caller-supplied deadline that is already sooner than the derived timeout is
// kept as-is rather than being lengthened.
func callContext(ctx context.Context, callTimeout time.Duration) (context.Context, context.CancelFunc) {
timeout := callTimeout
if timeout == 0 {
timeout = defaultCallTimeout
}
+19 -3
@@ -117,7 +117,7 @@ func TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned(t *testing.
fake := &fakeGatewayServer{
streamStarted: make(chan struct{}),
streamDone: make(chan struct{}),
streamEventCount: 64,
streamEventCount: 256,
}
client, cleanup := newBufconnClient(t, fake)
defer cleanup()
@@ -135,12 +135,25 @@ func TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned(t *testing.
t.Fatal("compatibility event stream did not stop after result channel filled")
}
// A slow consumer that abandons the buffer must still receive an explicit
// terminal overflow error before the channel closes, so it can tell
// "events dropped" apart from "stream ended normally".
var sawOverflow bool
for {
select {
case _, ok := <-events:
case result, ok := <-events:
if !ok {
if !sawOverflow {
t.Fatal("compatibility event channel closed without an ErrEventBufferOverflow result")
}
return
}
if result.Err != nil {
if !errors.Is(result.Err, ErrEventBufferOverflow) {
t.Fatalf("terminal result error = %v, want ErrEventBufferOverflow", result.Err)
}
sawOverflow = true
}
case <-time.After(2 * time.Second):
t.Fatal("compatibility event channel did not close")
}
@@ -279,8 +292,11 @@ func newBufconnClient(t *testing.T, fake *fakeGatewayServer) (*Client, func()) {
dialer := func(ctx context.Context, _ string) (net.Conn, error) {
return listener.DialContext(ctx)
}
// grpc.NewClient defaults the target scheme to dns; the bufconn fake name
// is not DNS-resolvable, so use the passthrough scheme to hand the target
// straight to the context dialer.
client, err := Dial(context.Background(), Options{
Endpoint: "bufnet",
Endpoint: "passthrough:///bufnet",
APIKey: "test-api-key",
Plaintext: true,
DialOptions: []grpc.DialOption{
+401
@@ -0,0 +1,401 @@
package mxgateway
import (
"context"
"crypto/tls"
"errors"
"net"
"reflect"
"strings"
"testing"
"time"
pb "gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go/internal/generated"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/grpc/status"
"google.golang.org/protobuf/types/known/timestamppb"
)
// --- Client.Go-008: resolveTransportCredentials precedence -----------------
// TestResolveTransportCredentialsPrecedence covers every branch of
// resolveTransportCredentials, which previously only had the Plaintext path
// exercised.
func TestResolveTransportCredentialsPrecedence(t *testing.T) {
custom := insecure.NewCredentials()
t.Run("TransportCredentialsWins", func(t *testing.T) {
creds, err := resolveTransportCredentials(Options{
TransportCredentials: custom,
Plaintext: true, // must be ignored
})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if creds != custom {
t.Fatal("expected the explicit TransportCredentials to be returned as-is")
}
})
t.Run("Plaintext", func(t *testing.T) {
creds, err := resolveTransportCredentials(Options{Plaintext: true})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if got := creds.Info().SecurityProtocol; got != "insecure" {
t.Fatalf("expected insecure credentials, got security protocol %q", got)
}
})
t.Run("CACertFileMissingErrors", func(t *testing.T) {
_, err := resolveTransportCredentials(Options{CACertFile: "does-not-exist.pem"})
if err == nil {
t.Fatal("expected an error for a missing CA cert file")
}
})
t.Run("TLSConfigWithServerNameOverride", func(t *testing.T) {
creds, err := resolveTransportCredentials(Options{
TLSConfig: &tls.Config{MinVersion: tls.VersionTLS13},
ServerNameOverride: "gateway.internal",
})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if got := creds.Info().ServerName; got != "gateway.internal" {
t.Fatalf("expected ServerName override to be applied, got %q", got)
}
})
t.Run("DefaultTLSFloor", func(t *testing.T) {
creds, err := resolveTransportCredentials(Options{ServerNameOverride: "host"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if got := creds.Info().SecurityProtocol; got != "tls" {
t.Fatalf("expected the default TLS credentials, got %q", got)
}
})
}
// TestResolveTransportCredentialsDoesNotMutateTLSConfig confirms the supplied
// TLSConfig is cloned, not mutated, when ServerNameOverride is applied.
func TestResolveTransportCredentialsDoesNotMutateTLSConfig(t *testing.T) {
cfg := &tls.Config{MinVersion: tls.VersionTLS12}
if _, err := resolveTransportCredentials(Options{
TLSConfig: cfg,
ServerNameOverride: "override",
}); err != nil {
t.Fatalf("unexpected error: %v", err)
}
if cfg.ServerName != "" {
t.Fatalf("resolveTransportCredentials mutated the caller's TLSConfig (ServerName=%q)", cfg.ServerName)
}
}
// --- Client.Go-008: callContext deadline arithmetic ------------------------
// TestCallContextDeadlineArithmetic covers the shared callContext deadline
// logic, including the negative-timeout disable case and the
// caller-deadline-is-sooner case.
func TestCallContextDeadlineArithmetic(t *testing.T) {
t.Run("ZeroUsesDefault", func(t *testing.T) {
ctx, cancel := callContext(context.Background(), 0)
defer cancel()
deadline, ok := ctx.Deadline()
if !ok {
t.Fatal("expected a deadline for the default timeout")
}
remaining := time.Until(deadline)
if remaining <= 0 || remaining > defaultCallTimeout+time.Second {
t.Fatalf("default deadline out of range: %v", remaining)
}
})
t.Run("NegativeDisablesBound", func(t *testing.T) {
base := context.Background()
ctx, cancel := callContext(base, -1)
defer cancel()
if _, ok := ctx.Deadline(); ok {
t.Fatal("a negative timeout must disable the deadline entirely")
}
if ctx != base {
t.Fatal("a negative timeout must return the caller context unchanged")
}
})
t.Run("PositiveAppliesTimeout", func(t *testing.T) {
ctx, cancel := callContext(context.Background(), 5*time.Second)
defer cancel()
deadline, ok := ctx.Deadline()
if !ok {
t.Fatal("expected a deadline")
}
remaining := time.Until(deadline)
if remaining <= 0 || remaining > 5*time.Second+time.Second {
t.Fatalf("deadline out of range: %v", remaining)
}
})
t.Run("CallerDeadlineSoonerIsKept", func(t *testing.T) {
base, baseCancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer baseCancel()
ctx, cancel := callContext(base, 30*time.Second)
defer cancel()
if ctx != base {
t.Fatal("a caller deadline sooner than the timeout must be kept as-is")
}
})
t.Run("CallerDeadlineLaterIsShortened", func(t *testing.T) {
base, baseCancel := context.WithTimeout(context.Background(), time.Hour)
defer baseCancel()
ctx, cancel := callContext(base, time.Second)
defer cancel()
deadline, ok := ctx.Deadline()
if !ok {
t.Fatal("expected a deadline")
}
if remaining := time.Until(deadline); remaining > 2*time.Second {
t.Fatalf("expected the shorter timeout to win, got %v remaining", remaining)
}
})
}
// --- Client.Go-008: NativeValue / NativeArray edge branches ----------------
// TestNativeValueEdgeKinds covers the array, raw-bytes, null, and
// nil-input branches of NativeValue.
func TestNativeValueEdgeKinds(t *testing.T) {
t.Run("NilInput", func(t *testing.T) {
got, err := NativeValue(nil)
if err != nil || got != nil {
t.Fatalf("NativeValue(nil) = (%v, %v), want (nil, nil)", got, err)
}
})
t.Run("ExplicitNull", func(t *testing.T) {
got, err := NativeValue(&pb.MxValue{IsNull: true})
if err != nil || got != nil {
t.Fatalf("NativeValue(null) = (%v, %v), want (nil, nil)", got, err)
}
})
t.Run("RawBytes", func(t *testing.T) {
raw := []byte{0x01, 0x02, 0x03}
got, err := NativeValue(&pb.MxValue{Kind: &pb.MxValue_RawValue{RawValue: raw}})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
gotBytes, ok := got.([]byte)
if !ok || !reflect.DeepEqual(gotBytes, raw) {
t.Fatalf("NativeValue raw = %v, want %v", got, raw)
}
// The result must be a copy, not aliasing the protobuf field.
gotBytes[0] = 0xFF
if raw[0] != 0x01 {
t.Fatal("NativeValue raw result aliases the protobuf backing array")
}
})
t.Run("ArrayValue", func(t *testing.T) {
value := &pb.MxValue{Kind: &pb.MxValue_ArrayValue{
ArrayValue: &pb.MxArray{Values: &pb.MxArray_Int32Values{
Int32Values: &pb.Int32Array{Values: []int32{7, 8}},
}},
}}
got, err := NativeValue(value)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if !reflect.DeepEqual(got, []int32{7, 8}) {
t.Fatalf("NativeValue array = %v, want [7 8]", got)
}
})
}
// TestNativeArrayEdgeKinds covers the nil, raw-bytes, timestamp-with-nil, and
// unsupported-kind branches of NativeArray.
func TestNativeArrayEdgeKinds(t *testing.T) {
t.Run("NilInput", func(t *testing.T) {
got, err := NativeArray(nil)
if err != nil || got != nil {
t.Fatalf("NativeArray(nil) = (%v, %v), want (nil, nil)", got, err)
}
})
t.Run("RawValues", func(t *testing.T) {
got, err := NativeArray(&pb.MxArray{Values: &pb.MxArray_RawValues{
RawValues: &pb.RawArray{Values: [][]byte{{0x0A}, {0x0B}}},
}})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
want := [][]byte{{0x0A}, {0x0B}}
if !reflect.DeepEqual(got, want) {
t.Fatalf("NativeArray raw = %v, want %v", got, want)
}
})
t.Run("TimestampWithNilEntry", func(t *testing.T) {
got, err := NativeArray(&pb.MxArray{Values: &pb.MxArray_TimestampValues{
TimestampValues: &pb.TimestampArray{Values: []*timestamppb.Timestamp{nil}},
}})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
times, ok := got.([]time.Time)
if !ok || len(times) != 1 || !times[0].IsZero() {
t.Fatalf("NativeArray timestamp-with-nil = %v, want [zero-time]", got)
}
})
t.Run("UnsupportedKind", func(t *testing.T) {
// An MxArray with no oneof set hits the default branch.
_, err := NativeArray(&pb.MxArray{})
if err == nil {
t.Fatal("expected an error for an MxArray with no values set")
}
if !strings.Contains(err.Error(), "unsupported array value kind") {
t.Fatalf("unexpected error text: %v", err)
}
})
}
// TestNativeValueUnsupportedKind covers the default branch of NativeValue.
func TestNativeValueUnsupportedKind(t *testing.T) {
// An MxValue with no oneof Kind set and IsNull false hits the default.
_, err := NativeValue(&pb.MxValue{})
if err == nil {
t.Fatal("expected an error for an MxValue with no kind set")
}
if !strings.Contains(err.Error(), "unsupported value kind") {
t.Fatalf("unexpected error text: %v", err)
}
}
// --- Client.Go-005: dial migration -----------------------------------------
// TestDialFailsFastWhenGatewayUnreachable confirms that after the migration to
// grpc.NewClient the DialTimeout-bounded readiness probe still fails fast (and
// wraps the failure in *GatewayError) when the gateway cannot be reached.
func TestDialFailsFastWhenGatewayUnreachable(t *testing.T) {
dialer := func(ctx context.Context, _ string) (net.Conn, error) {
return nil, errors.New("connection refused")
}
start := time.Now()
client, err := Dial(context.Background(), Options{
Endpoint: "passthrough:///unreachable",
APIKey: "k",
Plaintext: true,
DialTimeout: 500 * time.Millisecond,
DialOptions: []grpc.DialOption{grpc.WithContextDialer(dialer)},
})
elapsed := time.Since(start)
if err == nil {
client.Close()
t.Fatal("expected Dial to fail for an unreachable gateway")
}
var gwErr *GatewayError
if !errors.As(err, &gwErr) || gwErr.Op != "dial" {
t.Fatalf("expected a *GatewayError with Op=dial, got %#v", err)
}
if elapsed > 5*time.Second {
t.Fatalf("Dial did not honor DialTimeout: took %v", elapsed)
}
}
// TestDialReadinessProbeReachesReady confirms the readiness probe succeeds
// against a live (bufconn) gateway, i.e. the lazy grpc.NewClient connection is
// driven to Ready before Dial returns.
func TestDialReadinessProbeReachesReady(t *testing.T) {
client, cleanup := newBufconnClient(t, &fakeGatewayServer{
openReply: &pb.OpenSessionReply{},
})
defer cleanup()
if client == nil {
t.Fatal("expected a connected client")
}
}
// --- Client.Go-006: error taxonomy ----------------------------------------
// TestGatewayErrorCode confirms GatewayError.Code surfaces the wrapped gRPC
// status code without the caller unwrapping it.
func TestGatewayErrorCode(t *testing.T) {
var nilErr *GatewayError
if got := nilErr.Code(); got != codes.OK {
t.Fatalf("nil GatewayError.Code() = %v, want OK", got)
}
gwErr := &GatewayError{Op: "invoke", Err: status.Error(codes.Unavailable, "down")}
if got := gwErr.Code(); got != codes.Unavailable {
t.Fatalf("GatewayError.Code() = %v, want Unavailable", got)
}
plain := &GatewayError{Op: "dial", Err: errors.New("boom")}
if got := plain.Code(); got != codes.Unknown {
t.Fatalf("GatewayError.Code() for a non-status error = %v, want Unknown", got)
}
}
// TestIsTransient verifies the transient/permanent classification including
// the unwrap-through-GatewayError path.
func TestIsTransient(t *testing.T) {
tests := []struct {
name string
err error
want bool
}{
{name: "nil", err: nil, want: false},
{name: "unavailable wrapped", err: &GatewayError{Op: "invoke", Err: status.Error(codes.Unavailable, "x")}, want: true},
{name: "deadline wrapped", err: &GatewayError{Op: "invoke", Err: status.Error(codes.DeadlineExceeded, "x")}, want: true},
{name: "resource exhausted", err: &GatewayError{Err: status.Error(codes.ResourceExhausted, "x")}, want: true},
{name: "unauthenticated permanent", err: &GatewayError{Err: status.Error(codes.Unauthenticated, "x")}, want: false},
{name: "invalid argument permanent", err: &GatewayError{Err: status.Error(codes.InvalidArgument, "x")}, want: false},
{name: "bare status unavailable", err: status.Error(codes.Unavailable, "x"), want: true},
{name: "plain error", err: errors.New("nope"), want: false},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := IsTransient(tt.err); got != tt.want {
t.Fatalf("IsTransient(%v) = %v, want %v", tt.err, got, tt.want)
}
})
}
}
// --- Client.Go-007: correlation id fallback --------------------------------
// TestNewCorrelationIDUsesRandEntropy confirms the happy path yields a
// 32-hex-character id.
func TestNewCorrelationIDUsesRandEntropy(t *testing.T) {
id := newCorrelationID()
if len(id) != 32 {
t.Fatalf("expected a 32-char hex id, got %q (len %d)", id, len(id))
}
}
// TestNewCorrelationIDFallsBackOnRandFailure reproduces Client.Go-007: when
// crypto/rand fails, newCorrelationID must not return an empty string but a
// unique, non-empty fallback id so the command stays traceable.
func TestNewCorrelationIDFallsBackOnRandFailure(t *testing.T) {
original := randRead
randRead = func([]byte) (int, error) { return 0, errors.New("entropy unavailable") }
defer func() { randRead = original }()
first := newCorrelationID()
second := newCorrelationID()
if first == "" || second == "" {
t.Fatal("newCorrelationID returned an empty id on rand failure")
}
if !strings.HasPrefix(first, "fallback-") {
t.Fatalf("expected a fallback- prefixed id, got %q", first)
}
if first == second {
t.Fatalf("fallback correlation ids must be unique, got %q twice", first)
}
}
+55 -1
@@ -1,11 +1,22 @@
package mxgateway
import (
"errors"
"fmt"
pb "gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go/internal/generated"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)
// ErrEventBufferOverflow is the terminal error delivered on the compatibility
// event channel returned by Session.Events / Session.EventsAfter when a slow
// consumer lets the bounded result buffer fill. It signals that the stream was
// cancelled and events were dropped, so a consumer can tell an overflow apart
// from a normal end-of-stream. Use Session.SubscribeEvents to block instead of
// dropping.
var ErrEventBufferOverflow = errors.New("mxgateway: event buffer overflow; compatibility stream cancelled and events dropped")
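A consumer of the compatibility channel could tell an overflow apart from a normal end-of-stream roughly as follows. This is a minimal, standard-library-only sketch: `eventResult`, `errBufferOverflow`, and `drain` are hypothetical stand-ins for `EventResult`, `ErrEventBufferOverflow`, and the caller's own loop, not the package's API.

```go
package main

import (
	"errors"
	"fmt"
)

// errBufferOverflow is a hypothetical stand-in for ErrEventBufferOverflow.
var errBufferOverflow = errors.New("event buffer overflow")

// eventResult is a hypothetical stand-in for EventResult.
type eventResult struct {
	Seq int
	Err error
}

// drain consumes results until the channel closes. It reports how many
// events arrived and whether the stream ended because of an overflow.
func drain(results <-chan eventResult) (events int, overflowed bool) {
	for r := range results {
		if errors.Is(r.Err, errBufferOverflow) {
			// Terminal overflow marker: events were dropped upstream.
			return events, true
		}
		if r.Err == nil {
			events++
		}
	}
	// The channel closed without an overflow marker: normal end-of-stream.
	return events, false
}

func main() {
	results := make(chan eventResult, 3)
	results <- eventResult{Seq: 1}
	results <- eventResult{Seq: 2}
	results <- eventResult{Err: errBufferOverflow}
	close(results)

	events, overflowed := drain(results)
	fmt.Println(events, overflowed) // prints "2 true"
}
```

On an overflow the caller would typically resubscribe from its last seen cursor, or switch to the blocking subscription API mentioned in the comment above.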
// GatewayError wraps transport-level gRPC failures.
type GatewayError struct {
// Op names the operation that failed (for example "dial" or "invoke").
@@ -33,6 +44,45 @@ func (e *GatewayError) Unwrap() error {
return e.Err
}
// Code returns the gRPC status code of the wrapped transport error. It returns
// codes.OK when the error is nil and codes.Unknown when the wrapped error does
// not carry a gRPC status. Callers can use it to write retry, timeout, and
// auth handling without manually unwrapping and re-parsing the error.
func (e *GatewayError) Code() codes.Code {
if e == nil || e.Err == nil {
return codes.OK
}
return status.Code(e.Err)
}
// IsTransient reports whether err is a transport failure that may succeed on
// retry — for example a gateway that is briefly Unavailable or a call that
// hit a DeadlineExceeded. Permanent failures (Unauthenticated, PermissionDenied,
// InvalidArgument, NotFound, and similar) return false. It unwraps through
// *GatewayError and any other error chain carrying a gRPC status, so callers
// do not need to call status.Code themselves.
func IsTransient(err error) bool {
if err == nil {
return false
}
switch transientCode(err) {
case codes.Unavailable, codes.DeadlineExceeded, codes.ResourceExhausted, codes.Aborted:
return true
default:
return false
}
}
// transientCode extracts a gRPC status code from err, preferring a wrapped
// *GatewayError's Code and otherwise falling back to status.Code on the chain.
func transientCode(err error) codes.Code {
var gatewayErr *GatewayError
if errors.As(err, &gatewayErr) {
return gatewayErr.Code()
}
return status.Code(err)
}
// CommandError reports a non-OK gateway protocol status and keeps the raw
// command reply when one exists.
type CommandError struct {
@@ -85,8 +135,12 @@ func (e *MxAccessError) Error() string {
}
// Unwrap returns the wrapped CommandError, when one is present.
//
// When Command is nil (the HRESULT / MxStatusProxy path) it returns an
// untyped nil rather than a typed-nil *CommandError, so errors.As does not
// bind a nil pointer that a caller would then panic on.
func (e *MxAccessError) Unwrap() error {
if e == nil {
if e == nil || e.Command == nil {
return nil
}
return e.Command
+42
@@ -0,0 +1,42 @@
package mxgateway
import (
"errors"
"testing"
)
// TestMxAccessErrorUnwrapHResultPathNoTypedNilCommandError reproduces
// Client.Go-001: an MxAccessError built via the HRESULT / MxStatusProxy path
// leaves Command nil. Unwrap must not hand back a typed-nil *CommandError,
// because errors.As would then succeed while binding a nil pointer and a
// caller dereferencing it would panic.
func TestMxAccessErrorUnwrapHResultPathNoTypedNilCommandError(t *testing.T) {
hresult := int32(-2147467259) // 0x80004005, a failing HRESULT.
reply := &MxCommandReply{Hresult: &hresult}
err := EnsureMxAccessSuccess("invoke", reply)
if err == nil {
t.Fatal("expected MxAccessError for a failing HRESULT, got nil")
}
var ce *CommandError
if errors.As(err, &ce) {
t.Fatalf("errors.As bound *CommandError from an HRESULT-only MxAccessError (ce=%v); "+
"a caller dereferencing ce.Status would panic", ce)
}
}
// TestMxAccessErrorUnwrapPopulatedCommand confirms the non-nil Command path
// still unwraps to the wrapped *CommandError.
func TestMxAccessErrorUnwrapPopulatedCommand(t *testing.T) {
command := &CommandError{Op: "invoke"}
err := &MxAccessError{Command: command}
var ce *CommandError
if !errors.As(err, &ce) {
t.Fatal("errors.As failed to bind the populated *CommandError")
}
if ce != command {
t.Fatalf("errors.As bound an unexpected *CommandError: got %v want %v", ce, command)
}
}
+3 -43
@@ -2,7 +2,6 @@ package mxgateway
import (
"context"
"errors"
"io"
"time"
@@ -56,39 +55,13 @@ type GalaxyClient struct {
// DialGalaxy opens a gRPC connection to the gateway for the Galaxy Repository
// service. It applies the same authentication metadata, transport security,
// and dial-timeout behavior as Dial.
// lazy connection, and DialTimeout-bounded readiness probe as Dial.
func DialGalaxy(ctx context.Context, opts Options) (*GalaxyClient, error) {
if opts.Endpoint == "" {
return nil, errors.New("mxgateway: endpoint is required")
}
dialCtx := ctx
cancel := func() {}
if opts.DialTimeout > 0 {
dialCtx, cancel = context.WithTimeout(ctx, opts.DialTimeout)
} else if _, ok := ctx.Deadline(); !ok {
dialCtx, cancel = context.WithTimeout(ctx, defaultDialTimeout)
}
defer cancel()
transportCredentials, err := resolveTransportCredentials(opts)
conn, err := dial(ctx, opts)
if err != nil {
return nil, err
}
dialOptions := []grpc.DialOption{
grpc.WithTransportCredentials(transportCredentials),
grpc.WithUnaryInterceptor(unaryAuthInterceptor(opts.APIKey)),
grpc.WithStreamInterceptor(streamAuthInterceptor(opts.APIKey)),
grpc.WithBlock(),
}
dialOptions = append(dialOptions, opts.DialOptions...)
conn, err := grpc.DialContext(dialCtx, opts.Endpoint, dialOptions...)
if err != nil {
return nil, &GatewayError{Op: "dial", Err: err}
}
return NewGalaxyClient(conn, opts), nil
}
@@ -239,18 +212,5 @@ func (c *GalaxyClient) Close() error {
}
func (c *GalaxyClient) callContext(ctx context.Context) (context.Context, context.CancelFunc) {
timeout := c.opts.CallTimeout
if timeout == 0 {
timeout = defaultCallTimeout
}
if timeout < 0 {
return ctx, func() {}
}
if deadline, ok := ctx.Deadline(); ok {
timeoutDeadline := time.Now().Add(timeout)
if deadline.Before(timeoutDeadline) {
return ctx, func() {}
}
}
return context.WithTimeout(ctx, timeout)
return callContext(ctx, c.opts.CallTimeout)
}
+5 -3
@@ -55,8 +55,8 @@ func TestGalaxyGetLastDeployTimeReturnsTimestampWhenPresent(t *testing.T) {
want := time.Date(2026, 4, 28, 12, 34, 56, 0, time.UTC)
fake := &fakeGalaxyServer{
deployReply: &pb.GetLastDeployTimeReply{
Present: true,
TimeOfLastDeploy: timestamppb.New(want),
Present: true,
TimeOfLastDeploy: timestamppb.New(want),
},
}
client, cleanup := newGalaxyBufconnClient(t, fake)
@@ -348,8 +348,10 @@ func newGalaxyBufconnClient(t *testing.T, fake *fakeGalaxyServer) (*GalaxyClient
dialer := func(ctx context.Context, _ string) (net.Conn, error) {
return listener.DialContext(ctx)
}
// grpc.NewClient defaults to the dns scheme; use passthrough so the
// bufconn fake target reaches the context dialer unresolved.
client, err := DialGalaxy(context.Background(), Options{
Endpoint: "bufnet",
Endpoint: "passthrough:///bufnet",
APIKey: "test-api-key",
Plaintext: true,
DialOptions: []grpc.DialOption{
+44 -3
@@ -8,6 +8,8 @@ import (
"fmt"
"io"
"sync"
"sync/atomic"
"time"
pb "gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go/internal/generated"
"google.golang.org/grpc/codes"
@@ -490,7 +492,7 @@ func ensureBulkSize(name string, length int) error {
func sendEventResult(
ctx context.Context,
results chan<- EventResult,
results chan EventResult,
result EventResult,
cancelWhenBufferFull bool,
cancel context.CancelFunc,
@@ -502,7 +504,12 @@ func sendEventResult(
case <-ctx.Done():
return false
default:
// The bounded compatibility buffer is full. Cancel the stream and
// deliver an explicit terminal overflow error so a slow consumer
// can tell dropped events apart from a normal end-of-stream,
// rather than seeing the channel close silently.
cancel()
deliverTerminalResult(results, EventResult{Err: ErrEventBufferOverflow})
return false
}
}
@@ -515,6 +522,25 @@ func sendEventResult(
}
}
// deliverTerminalResult places result on a full buffered channel by evicting
// the oldest buffered events, one at a time, until the send succeeds. The
// caller closes results afterwards, so the terminal result becomes the
// consumer's last item.
func deliverTerminalResult(results chan EventResult, result EventResult) {
for {
select {
case results <- result:
return
default:
}
select {
case <-results:
default:
// Another receiver drained the channel between the send and
// receive attempts; retry the send.
}
}
}
func (s *Session) invokeCommand(ctx context.Context, command *MxCommand) (*MxCommandReply, error) {
return s.client.Invoke(ctx, &pb.MxCommandRequest{
SessionId: s.ID(),
@@ -523,10 +549,25 @@ func (s *Session) invokeCommand(ctx context.Context, command *MxCommand) (*MxCom
})
}
// correlationIDCounter backs the deterministic fallback id used when
// crypto/rand is unavailable, so every command still carries a unique,
// traceable correlation id.
var correlationIDCounter atomic.Uint64
// randRead is the entropy source for newCorrelationID. It is a package
// variable solely so tests can simulate a crypto/rand failure.
var randRead = rand.Read
// newCorrelationID returns a unique correlation id for an MxCommandRequest.
// It prefers 16 bytes of crypto/rand entropy; if rand.Read fails (rare) it
// falls back to a "fallback-" prefixed id built from the current time and a
// process-wide monotonic counter rather than returning an empty string, which
// would leave the command untraceable in gateway logs.
func newCorrelationID() string {
var buffer [16]byte
if _, err := rand.Read(buffer[:]); err != nil {
return ""
if _, err := randRead(buffer[:]); err != nil {
return fmt.Sprintf("fallback-%x-%x",
time.Now().UnixNano(), correlationIDCounter.Add(1))
}
return hex.EncodeToString(buffer[:])
}
+28 -1
@@ -62,10 +62,37 @@ underlying protobuf messages. `MxGatewayCommandException` and
`MxAccessException` preserve the raw `MxCommandReply` when the gateway returns a
data-bearing MXAccess failure.
`openSession` verifies the gateway's reported `gateway_protocol_version` against
the version this client was generated for and throws `MxGatewayException` on a
mismatch, so an incompatible client fails fast with a clear message instead of
issuing commands that fail downstream. A gateway that does not populate the
field is accepted unchanged.
`MxGatewaySession` implements `AutoCloseable`. The try-with-resources `close()`
performs a `CloseSession` network RPC but swallows (and logs) any failure of
that RPC so a close-time error never replaces the exception a try-with-resources
body is already propagating. Call `closeRaw()` explicitly when you need to
observe the close result or handle a close-time failure.
`MxGatewayClient` and `GalaxyRepositoryClient` implement `AutoCloseable`. For a
client that owns its channel (built with `connect`), the try-with-resources
`close()` shuts the channel down and waits up to the configured connect timeout
for termination, forcibly shutting it down on timeout, so in-flight calls and
Netty event-loop threads are not left running after the block exits. If the
calling thread is interrupted while waiting, the channel is forcibly shut down
and the interrupt flag is restored. `closeAndAwaitTermination()` does the same
but throws `InterruptedException` for callers that want a checked,
blocking-aware shutdown. `close()` is a no-op for a caller-managed channel.
`MxEventStream` implements `Iterator<MxEvent>` and `AutoCloseable`. Closing it
cancels the underlying gRPC stream. Canceling or timing out a Java client call
only stops the client from waiting; it does not abort an in-flight MXAccess COM
call on the worker STA.
call on the worker STA. The event stream uses gRPC's default auto-inbound flow
control with a fixed 16-element buffer and no client-side flow control: this is
the gateway's documented fail-fast event-backpressure model, so a consumer that
stalls long enough to fill the buffer triggers an overflow that cancels the
subscription and surfaces an `MxGatewayException` from the next `next()` call.
Drain events promptly and be prepared to resubscribe with a resume cursor.
## Galaxy Repository Browse
@@ -661,33 +661,60 @@ public final class MxGatewayCli implements Callable<Integer> {
@Option(names = "--timeout", defaultValue = "30s", description = "Per-call timeout.")
String timeout;
private String resolvedApiKey = "";
private Duration resolvedTimeout = Duration.ofSeconds(30);
/**
* Returns this options object unchanged.
*
* <p>Retained as a no-op for call sites that read more naturally as
* {@code common.resolved()}. Resolution of the API key and timeout is
* computed lazily on demand by {@link #resolvedApiKey()} and
* {@link #resolvedTimeout()}, so {@link #toClientOptions()} and
* {@link #redactedJsonMap()} produce correct output regardless of
* whether this method was ever called.
*
* @return this options object
*/
CommonOptions resolved() {
resolvedApiKey = apiKey == null || apiKey.isBlank() ? System.getenv(apiKeyEnv) : apiKey;
if (resolvedApiKey == null) {
resolvedApiKey = "";
}
resolvedTimeout = parseDuration(timeout);
return this;
}
/**
* Resolves the effective API key: the explicit {@code --api-key} value
* when non-blank, otherwise the value of the {@code --api-key-env}
* environment variable, otherwise an empty string. Computed on each
* call so there is no stale cached state.
*
* @return the resolved API key, never {@code null}
*/
String resolvedApiKey() {
String resolved = apiKey == null || apiKey.isBlank() ? System.getenv(apiKeyEnv) : apiKey;
return resolved == null ? "" : resolved;
}
/**
* Resolves the effective per-call timeout from the {@code --timeout}
* option. Computed on each call so there is no stale cached state.
*
* @return the resolved call timeout
*/
Duration resolvedTimeout() {
return parseDuration(timeout);
}
MxGatewayClientOptions toClientOptions() {
return MxGatewayClientOptions.builder()
.endpoint(endpoint)
.apiKey(resolvedApiKey)
.apiKey(resolvedApiKey())
.plaintext(plaintext)
.caCertificatePath(caFile)
.serverNameOverride(serverNameOverride)
.callTimeout(resolvedTimeout)
.callTimeout(resolvedTimeout())
.build();
}
Map<String, Object> redactedJsonMap() {
Map<String, Object> values = new LinkedHashMap<>();
values.put("endpoint", endpoint);
values.put("apiKey", MxGatewaySecrets.redactApiKey(resolvedApiKey));
values.put("apiKey", MxGatewaySecrets.redactApiKey(resolvedApiKey()));
values.put("apiKeyEnv", apiKeyEnv);
values.put("plaintext", plaintext);
values.put("caFile", caFile == null ? "" : caFile.toString());
@@ -62,8 +62,10 @@ final class MxGatewayCliTests {
assertEquals(0, run.exitCode());
assertTrue(run.output().contains("\"command\":\"open-session\""));
assertTrue(run.output().contains("\"sessionId\":\"session-cli\""));
assertTrue(run.output().contains("mxgw***********cret"));
// Only the non-secret mxgw_<key-id>_ prefix survives; the secret is fully masked.
assertTrue(run.output().contains("mxgw_visible_***"));
assertFalse(run.output().contains("visible_secret"));
assertFalse(run.output().contains("cret"));
}
@Test
@@ -1,8 +1,5 @@
package com.dohertylan.mxgateway.client;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;
import galaxy_repository.v1.GalaxyRepositoryGrpc;
import galaxy_repository.v1.GalaxyRepositoryOuterClass.DeployEvent;
import galaxy_repository.v1.GalaxyRepositoryOuterClass.DiscoverHierarchyReply;
@@ -17,8 +14,6 @@ import com.google.protobuf.Timestamp;
import io.grpc.Channel;
import io.grpc.ClientInterceptors;
import io.grpc.ManagedChannel;
import io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;
import io.grpc.stub.StreamObserver;
import java.time.Instant;
import java.util.Iterator;
@@ -27,7 +22,6 @@ import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import javax.net.ssl.SSLException;
/**
* Thin wrapper around the generated {@link GalaxyRepositoryGrpc} stubs that
@@ -78,7 +72,8 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
* @return a connected client
*/
public static GalaxyRepositoryClient connect(MxGatewayClientOptions options) {
return new GalaxyRepositoryClient(createChannel(options), options);
return new GalaxyRepositoryClient(
MxGatewayChannels.createChannel(options, "failed to configure galaxy repository TLS"), options);
}
/**
@@ -87,7 +82,7 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
* @return the blocking stub
*/
public GalaxyRepositoryGrpc.GalaxyRepositoryBlockingStub rawBlockingStub() {
return withDeadline(blockingStub);
return MxGatewayChannels.withDeadline(blockingStub, options);
}
/**
@@ -96,7 +91,7 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
* @return the future stub
*/
public GalaxyRepositoryGrpc.GalaxyRepositoryFutureStub rawFutureStub() {
return withDeadline(futureStub);
return MxGatewayChannels.withDeadline(futureStub, options);
}
/**
@@ -133,7 +128,9 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
* exceptionally with {@link MxGatewayException} on failure
*/
public CompletableFuture<Boolean> testConnectionAsync() {
return toCompletable(rawFutureStub().testConnection(TestConnectionRequest.getDefaultInstance()))
return MxGatewayChannels.toCompletable(
rawFutureStub().testConnection(TestConnectionRequest.getDefaultInstance()),
"galaxy test connection")
.thenApply(TestConnectionReply::getOk);
}
@@ -165,8 +162,11 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
* completed exceptionally with {@link MxGatewayException} on failure
*/
public CompletableFuture<Optional<Instant>> getLastDeployTimeAsync() {
return toCompletable(rawFutureStub().getLastDeployTime(GetLastDeployTimeRequest.getDefaultInstance()))
.thenApply(GalaxyRepositoryClient::mapDeployTime);
return MxGatewayChannels.toCompletable(
rawFutureStub().getLastDeployTime(GetLastDeployTimeRequest.getDefaultInstance()),
"galaxy get last deploy time")
.thenApply(MxGatewayChannels.normalisingValidator(
"galaxy get last deploy time", GalaxyRepositoryClient::mapDeployTime));
}
/**
@@ -224,7 +224,8 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
*/
public DeployEventStream watchDeployEvents(Instant lastSeenDeployTime) {
DeployEventStream stream = new DeployEventStream(16);
withStreamDeadline(rawAsyncStub()).watchDeployEvents(buildWatchRequest(lastSeenDeployTime), stream.observer());
MxGatewayChannels.withStreamDeadline(rawAsyncStub(), options)
.watchDeployEvents(buildWatchRequest(lastSeenDeployTime), stream.observer());
return stream;
}
@@ -253,7 +254,7 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
Instant lastSeenDeployTime, StreamObserver<DeployEvent> observer) {
Objects.requireNonNull(observer, "observer");
DeployEventSubscription subscription = new DeployEventSubscription();
withStreamDeadline(rawAsyncStub())
MxGatewayChannels.withStreamDeadline(rawAsyncStub(), options)
.watchDeployEvents(buildWatchRequest(lastSeenDeployTime), subscription.wrap(observer));
return subscription;
}
@@ -269,17 +270,31 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
return builder.build();
}
private <T extends io.grpc.stub.AbstractStub<T>> T withStreamDeadline(T stub) {
if (options.streamTimeout() == null || options.streamTimeout().isNegative()) {
return stub;
}
return stub.withDeadlineAfter(options.streamTimeout().toNanos(), TimeUnit.NANOSECONDS);
}
/**
* Shuts the owned channel down and awaits termination so try-with-resources
* callers do not leave in-flight calls or Netty event-loop threads running
* after the block exits.
*
* <p>Waits up to the configured connect timeout for graceful termination
* and forcibly shuts the channel down on timeout. If the calling thread is
* interrupted while waiting, the channel is forcibly shut down and the
* thread's interrupt flag is restored. No-op for clients that do not own
* their channel. For an explicitly checked, blocking-aware shutdown call
* {@link #closeAndAwaitTermination()}.
*/
@Override
public void close() {
if (ownedChannel != null) {
ownedChannel.shutdown();
if (ownedChannel == null) {
return;
}
ownedChannel.shutdown();
try {
if (!ownedChannel.awaitTermination(options.connectTimeout().toMillis(), TimeUnit.MILLISECONDS)) {
ownedChannel.shutdownNow();
}
} catch (InterruptedException error) {
ownedChannel.shutdownNow();
Thread.currentThread().interrupt();
}
}
@@ -307,86 +322,26 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
return Optional.of(Instant.ofEpochSecond(ts.getSeconds(), ts.getNanos()));
}
private static ManagedChannel createChannel(MxGatewayClientOptions options) {
NettyChannelBuilder builder = NettyChannelBuilder.forTarget(options.endpoint())
.maxInboundMessageSize(options.maxGrpcMessageBytes());
if (!options.connectTimeout().isNegative()) {
builder.withOption(
io.grpc.netty.shaded.io.netty.channel.ChannelOption.CONNECT_TIMEOUT_MILLIS,
Math.toIntExact(options.connectTimeout().toMillis()));
}
if (options.plaintext()) {
builder.usePlaintext();
} else if (options.caCertificatePath() != null) {
try {
builder.sslContext(GrpcSslContexts.forClient()
.trustManager(options.caCertificatePath().toFile())
.build());
} catch (SSLException error) {
throw new MxGatewayException("failed to configure galaxy repository TLS", error);
}
} else {
builder.useTransportSecurity();
}
if (!options.serverNameOverride().isBlank()) {
builder.overrideAuthority(options.serverNameOverride());
}
return builder.build();
}
private <T extends io.grpc.stub.AbstractStub<T>> T withDeadline(T stub) {
if (options.callTimeout().isNegative()) {
return stub;
}
return stub.withDeadlineAfter(options.callTimeout().toNanos(), TimeUnit.NANOSECONDS);
}
private CompletableFuture<List<GalaxyObject>> discoverHierarchyPageAsync(
String pageToken, java.util.ArrayList<GalaxyObject> objects, java.util.HashSet<String> seenPageTokens) {
DiscoverHierarchyRequest request = DiscoverHierarchyRequest.newBuilder()
.setPageSize(DISCOVER_HIERARCHY_PAGE_SIZE)
.setPageToken(pageToken)
.build();
return toCompletable(rawFutureStub().discoverHierarchy(request)).thenCompose(reply -> {
objects.addAll(reply.getObjectsList());
if (reply.getNextPageToken().isBlank()) {
return CompletableFuture.completedFuture(objects);
}
if (!seenPageTokens.add(reply.getNextPageToken())) {
CompletableFuture<List<GalaxyObject>> failed = new CompletableFuture<>();
failed.completeExceptionally(new MxGatewayException(
"galaxy discover hierarchy returned repeated page token: " + reply.getNextPageToken()));
return failed;
}
return discoverHierarchyPageAsync(reply.getNextPageToken(), objects, seenPageTokens);
});
}
private static <T> CompletableFuture<T> toCompletable(com.google.common.util.concurrent.ListenableFuture<T> source) {
CompletableFuture<T> target = new CompletableFuture<>();
Futures.addCallback(
source,
new FutureCallback<>() {
@Override
public void onSuccess(T result) {
target.complete(result);
return MxGatewayChannels.toCompletable(rawFutureStub().discoverHierarchy(request), "galaxy discover hierarchy")
.thenCompose(reply -> {
objects.addAll(reply.getObjectsList());
if (reply.getNextPageToken().isBlank()) {
return CompletableFuture.completedFuture(objects);
}
@Override
public void onFailure(Throwable error) {
if (error instanceof RuntimeException runtimeException) {
target.completeExceptionally(MxGatewayErrors.fromGrpc("galaxy async call", runtimeException));
return;
}
target.completeExceptionally(error);
if (!seenPageTokens.add(reply.getNextPageToken())) {
CompletableFuture<List<GalaxyObject>> failed = new CompletableFuture<>();
failed.completeExceptionally(new MxGatewayException(
"galaxy discover hierarchy returned repeated page token: "
+ reply.getNextPageToken()));
return failed;
}
},
MoreExecutors.directExecutor());
target.whenComplete((ignoredResult, ignoredError) -> {
if (target.isCancelled()) {
source.cancel(true);
}
});
return target;
return discoverHierarchyPageAsync(reply.getNextPageToken(), objects, seenPageTokens);
});
}
}
@@ -21,13 +21,35 @@ import mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest;
* stream cancels the underlying gRPC call. If the queue overflows the call is
* cancelled and a follow-up call to {@link #next()} throws
* {@link MxGatewayException}.
*
* <p><strong>Backpressure (fail-fast):</strong> this adaptor relies on gRPC's
* default auto-inbound flow control: the async stub auto-requests messages, so
* the gateway can push events faster than the consumer drains the bounded
* 16-element buffer. There is intentionally <em>no</em> real client flow
* control: a consumer that stalls long enough to let the buffer fill triggers
* an immediate overflow that cancels the subscription and surfaces an
* {@link MxGatewayException} on the next {@link #next()} call. This matches the
* gateway's documented fail-fast event-backpressure design: a slow consumer
* loses its subscription rather than silently dropping events. Consumers that
* cannot keep up must drain {@link #next()} promptly (e.g. hand events to their
* own larger queue) and be prepared to resubscribe with a resume cursor.
*
* <p><strong>Threading:</strong> the iterator methods ({@link #hasNext()} and
* {@link #next()}) are <em>not</em> thread-safe and must be driven by a single
* consumer thread. {@link #close()} may be called from any thread. Terminal
* state transitions (queue overflow, server completion, and {@code close()})
* are serialised so that the first terminal condition wins deterministically:
* once an overflow exception has been observed it is never silently replaced
* by an end-of-stream marker.
*/
public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
private static final Object END = new Object();
private final BlockingQueue<Object> queue;
private final Object terminalLock = new Object();
private volatile ClientCallStreamObserver<StreamEventsRequest> requestStream;
private volatile boolean closed;
private boolean terminated;
private Object next;
MxEventStream(int capacity) {
@@ -98,7 +120,7 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
if (stream != null) {
stream.cancel("client cancelled event stream", null);
}
offer(END);
terminate(null);
}
private Object take() {
@@ -115,10 +137,7 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
private void offer(Object value) {
Objects.requireNonNull(value, "value");
if (value == END) {
if (!queue.offer(value)) {
queue.clear();
queue.offer(value);
}
terminate(null);
return;
}
if (!queue.offer(value)) {
@@ -126,9 +145,38 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
if (stream != null) {
stream.cancel("client event stream queue overflowed", null);
}
queue.clear();
queue.offer(new MxGatewayException("gateway stream events queue overflowed"));
queue.offer(END);
terminate(new MxGatewayException("gateway stream events queue overflowed"));
}
}
/**
* Drives the single terminal transition. The first caller wins: a later
* end-of-stream or {@code close()} cannot overwrite or discard an overflow
* exception that has already been published to the consumer.
*
* @param fault the fault to surface to the consumer, or {@code null} for a
* clean end-of-stream
*/
private void terminate(MxGatewayException fault) {
synchronized (terminalLock) {
if (terminated) {
return;
}
terminated = true;
if (fault != null) {
// Make room for the fault marker; the consumer only needs the
// terminal signal, so queued data events are no longer relevant.
queue.clear();
queue.offer(fault);
queue.offer(END);
return;
}
// Clean end-of-stream: ensure the END marker is delivered even when
// the queue is currently full of undrained data events.
if (!queue.offer(END)) {
queue.clear();
queue.offer(END);
}
}
}
}
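Review note: the "first terminal condition wins" behaviour that `terminate` introduces can be sketched in isolation with JDK types only. This is a minimal sketch, not the client's implementation: `TerminalQueueSketch`, `offerEvent`, and `poll` are hypothetical names, and an `IllegalStateException` stands in for `MxGatewayException`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for the queue handling in MxEventStream. */
final class TerminalQueueSketch {
    static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Object terminalLock = new Object();
    private boolean terminated;

    TerminalQueueSketch(int capacity) {
        // Assumption: at least two slots, so a fault marker and END both fit.
        this.queue = new ArrayBlockingQueue<>(Math.max(2, capacity));
    }

    /** Producer side: enqueue a data event, or fail fast when the buffer is full. */
    void offerEvent(String event) {
        if (!queue.offer(event)) {
            terminate(new IllegalStateException("queue overflowed"));
        }
    }

    /** Single terminal transition: the first caller wins, later calls are ignored. */
    void terminate(RuntimeException fault) {
        synchronized (terminalLock) {
            if (terminated) {
                return;
            }
            terminated = true;
            if (fault != null) {
                // Drop undrained data so the fault and END markers always fit.
                queue.clear();
                queue.offer(fault);
                queue.offer(END);
                return;
            }
            // Clean end-of-stream: deliver END even if the queue is full.
            if (!queue.offer(END)) {
                queue.clear();
                queue.offer(END);
            }
        }
    }

    /** Consumer side: next queued item without blocking, or null when empty. */
    Object poll() {
        return queue.poll();
    }
}
```

With a capacity of 2, a third `offerEvent` overflows; a later `terminate(null)` (the equivalent of `close()` or server completion) is then a no-op, so the consumer still sees the fault followed by `END`.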
@@ -0,0 +1,164 @@
package com.dohertylan.mxgateway.client;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;
import io.grpc.ManagedChannel;
import io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;
import io.grpc.stub.AbstractStub;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import javax.net.ssl.SSLException;
/**
* Shared channel-builder and future-adaptor helpers used by both
* {@link MxGatewayClient} and {@link GalaxyRepositoryClient}.
*
* <p>Extracted so transport construction, per-call deadlines, and the
* {@link ListenableFuture}-to-{@link CompletableFuture} bridge live in one
* place instead of being duplicated verbatim across the two clients.
*/
final class MxGatewayChannels {
private MxGatewayChannels() {
}
/**
* Builds a Netty managed channel from the supplied options, applying the
* connect timeout, message-size limit, and the configured transport
* security mode (plaintext, custom CA trust, or system trust).
*
* @param options the client options carrying endpoint and transport config
* @param tlsErrorPrefix a human-readable prefix for the {@link MxGatewayException}
* thrown when a custom CA certificate cannot be loaded
* @return a new managed channel; the caller owns its lifecycle
*/
static ManagedChannel createChannel(MxGatewayClientOptions options, String tlsErrorPrefix) {
NettyChannelBuilder builder = NettyChannelBuilder.forTarget(options.endpoint())
.maxInboundMessageSize(options.maxGrpcMessageBytes());
if (!options.connectTimeout().isNegative()) {
builder.withOption(
io.grpc.netty.shaded.io.netty.channel.ChannelOption.CONNECT_TIMEOUT_MILLIS,
Math.toIntExact(options.connectTimeout().toMillis()));
}
if (options.plaintext()) {
builder.usePlaintext();
} else if (options.caCertificatePath() != null) {
try {
builder.sslContext(GrpcSslContexts.forClient()
.trustManager(options.caCertificatePath().toFile())
.build());
} catch (SSLException | RuntimeException error) {
// SSLException covers handshake-context failures; RuntimeException
// (IllegalArgumentException wrapping CertificateException) covers a
// missing or unreadable CA file. Either way callers see one typed
// failure instead of a raw, unwrapped exception leaking out.
throw new MxGatewayException(tlsErrorPrefix, error);
}
} else {
builder.useTransportSecurity();
}
if (!options.serverNameOverride().isBlank()) {
builder.overrideAuthority(options.serverNameOverride());
}
return builder.build();
}
/**
* Applies the configured per-call deadline to a unary stub.
*
* @param stub the stub to decorate
* @param options the client options carrying the call timeout
* @param <T> the concrete stub type
* @return the stub with the call deadline applied, or the stub unchanged
* when the call timeout is negative (disabled)
*/
static <T extends AbstractStub<T>> T withDeadline(T stub, MxGatewayClientOptions options) {
if (options.callTimeout().isNegative()) {
return stub;
}
return stub.withDeadlineAfter(options.callTimeout().toNanos(), TimeUnit.NANOSECONDS);
}
/**
* Applies the configured streaming deadline to a streaming stub.
*
* @param stub the stub to decorate
* @param options the client options carrying the stream timeout
* @param <T> the concrete stub type
* @return the stub with the stream deadline applied, or the stub unchanged
* when the stream timeout is unset or negative (disabled)
*/
static <T extends AbstractStub<T>> T withStreamDeadline(T stub, MxGatewayClientOptions options) {
if (options.streamTimeout() == null || options.streamTimeout().isNegative()) {
return stub;
}
return stub.withDeadlineAfter(options.streamTimeout().toNanos(), TimeUnit.NANOSECONDS);
}
/**
* Bridges a Guava {@link ListenableFuture} to a {@link CompletableFuture},
* normalising any failure through {@link MxGatewayErrors#fromGrpc} so the
* async error surface matches the synchronous methods. Cancelling the
* returned future cancels the source RPC.
*
* @param source the gRPC future-stub result
* @param operation the operation name used in normalised error messages
* @param <T> the reply type
* @return a completable future mirroring the source
*/
static <T> CompletableFuture<T> toCompletable(ListenableFuture<T> source, String operation) {
CompletableFuture<T> target = new CompletableFuture<>();
Futures.addCallback(
source,
new FutureCallback<>() {
@Override
public void onSuccess(T result) {
target.complete(result);
}
@Override
public void onFailure(Throwable error) {
if (error instanceof RuntimeException runtimeException) {
target.completeExceptionally(MxGatewayErrors.fromGrpc(operation, runtimeException));
return;
}
target.completeExceptionally(error);
}
},
MoreExecutors.directExecutor());
target.whenComplete((ignoredResult, ignoredError) -> {
if (target.isCancelled()) {
source.cancel(true);
}
});
return target;
}
/**
* Adapts a reply-validating function for use inside {@code thenApply} so
* any non-{@link MxGatewayException} {@link RuntimeException} it raises is
* routed through {@link MxGatewayErrors#fromGrpc}. This keeps the async
* error surface consistent with the synchronous methods, which normalise
* failures with a {@code try/catch}.
*
* @param operation the operation name used in normalised error messages
* @param validator the validating/transforming function applied to the reply
* @param <T> the reply type
* @param <R> the result type
* @return a function suitable for {@link CompletableFuture#thenApply}
*/
static <T, R> Function<T, R> normalisingValidator(String operation, Function<T, R> validator) {
return reply -> {
try {
return validator.apply(reply);
} catch (MxGatewayException error) {
throw error;
} catch (RuntimeException error) {
throw MxGatewayErrors.fromGrpc(operation, error);
}
};
}
}
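Review note: the effect of `normalisingValidator` inside `thenApply` can be illustrated without gRPC or Guava. The sketch below is an assumption-laden stand-in: `GatewayError` is a hypothetical replacement for `MxGatewayException`, and wrapping with a message prefix stands in for `MxGatewayErrors.fromGrpc`, whose real mapping is not shown in this diff.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

final class NormalisingValidatorSketch {
    /** Hypothetical typed error standing in for MxGatewayException. */
    static final class GatewayError extends RuntimeException {
        GatewayError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Wraps a validator so untyped runtime failures become GatewayError. */
    static <T, R> Function<T, R> normalising(String operation, Function<T, R> validator) {
        return reply -> {
            try {
                return validator.apply(reply);
            } catch (GatewayError error) {
                // Already typed: pass through unchanged.
                throw error;
            } catch (RuntimeException error) {
                // Assumption: simple wrapping in place of MxGatewayErrors.fromGrpc.
                throw new GatewayError(operation + " failed: " + error.getMessage(), error);
            }
        };
    }

    /** Returns the cause a failed future reports through join(), or null on success. */
    static Throwable failureCause(CompletableFuture<?> future) {
        try {
            future.join();
            return null;
        } catch (CompletionException error) {
            return error.getCause();
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> failed = CompletableFuture.completedFuture("not-a-number")
                .thenApply(NormalisingValidatorSketch.<String, Integer>normalising("parse", Integer::parseInt));
        // The NumberFormatException is reported as the typed GatewayError.
        System.out.println(failureCause(failed).getClass().getSimpleName());
    }
}
```

Without the wrapper, the same pipeline would surface the raw `NumberFormatException` as the cause, which is the inconsistency between the sync and async paths that this helper is meant to close.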
@@ -1,19 +1,13 @@
package com.dohertylan.mxgateway.client;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;
import com.google.protobuf.Duration;
import io.grpc.Channel;
import io.grpc.ClientInterceptors;
import io.grpc.ManagedChannel;
import io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;
import io.grpc.stub.StreamObserver;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import javax.net.ssl.SSLException;
import mxaccess_gateway.v1.MxAccessGatewayGrpc;
import mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply;
import mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest;
@@ -79,7 +73,8 @@ public final class MxGatewayClient implements AutoCloseable {
* @return a connected client
*/
public static MxGatewayClient connect(MxGatewayClientOptions options) {
return new MxGatewayClient(createChannel(options), options);
return new MxGatewayClient(
MxGatewayChannels.createChannel(options, "failed to configure gateway TLS"), options);
}
/**
@@ -88,7 +83,7 @@ public final class MxGatewayClient implements AutoCloseable {
* @return the blocking stub
*/
public MxAccessGatewayGrpc.MxAccessGatewayBlockingStub rawBlockingStub() {
return withDeadline(blockingStub);
return MxGatewayChannels.withDeadline(blockingStub, options);
}
/**
@@ -97,7 +92,7 @@ public final class MxGatewayClient implements AutoCloseable {
* @return the future stub
*/
public MxAccessGatewayGrpc.MxAccessGatewayFutureStub rawFutureStub() {
return withDeadline(futureStub);
return MxGatewayChannels.withDeadline(futureStub, options);
}
/**
@@ -150,6 +145,7 @@ public final class MxGatewayClient implements AutoCloseable {
try {
OpenSessionReply reply = rawBlockingStub().openSession(request);
MxGatewayErrors.ensureProtocolSuccess("open session", reply.getProtocolStatus(), null);
ensureGatewayProtocolCompatible(reply);
return reply;
} catch (RuntimeException error) {
if (error instanceof MxGatewayException) {
@@ -159,6 +155,24 @@ public final class MxGatewayClient implements AutoCloseable {
}
}
/**
* Verifies that the gateway speaks the protocol version this client was
* generated against. A gateway that leaves {@code gateway_protocol_version}
* unset (value {@code 0}, e.g. an older gateway) is accepted unchanged.
*
* @param reply the {@code OpenSessionReply} returned by the gateway
* @throws MxGatewayException if the gateway reports an incompatible protocol version
*/
private static void ensureGatewayProtocolCompatible(OpenSessionReply reply) {
int gatewayVersion = reply.getGatewayProtocolVersion();
int clientVersion = MxGatewayClientVersion.gatewayProtocolVersion();
if (gatewayVersion != 0 && gatewayVersion != clientVersion) {
throw new MxGatewayException("gateway protocol version mismatch: gateway reports "
+ gatewayVersion + " but this client was built for " + clientVersion
+ "; upgrade the client or gateway so the protocol versions match");
}
}
/**
* Invokes {@code OpenSession} asynchronously.
*
@@ -167,11 +181,13 @@ public final class MxGatewayClient implements AutoCloseable {
* with {@link MxGatewayException} on failure
*/
public CompletableFuture<OpenSessionReply> openSessionAsync(OpenSessionRequest request) {
CompletableFuture<OpenSessionReply> future = toCompletable(rawFutureStub().openSession(request));
return future.thenApply(reply -> {
CompletableFuture<OpenSessionReply> future =
MxGatewayChannels.toCompletable(rawFutureStub().openSession(request), "open session");
return future.thenApply(MxGatewayChannels.normalisingValidator("open session", reply -> {
MxGatewayErrors.ensureProtocolSuccess("open session", reply.getProtocolStatus(), null);
ensureGatewayProtocolCompatible(reply);
return reply;
});
}));
}
/**
@@ -206,12 +222,13 @@ public final class MxGatewayClient implements AutoCloseable {
* on failure
*/
public CompletableFuture<MxCommandReply> invokeAsync(MxCommandRequest request) {
CompletableFuture<MxCommandReply> future = toCompletable(rawFutureStub().invoke(request));
return future.thenApply(reply -> {
CompletableFuture<MxCommandReply> future =
MxGatewayChannels.toCompletable(rawFutureStub().invoke(request), "invoke");
return future.thenApply(MxGatewayChannels.normalisingValidator("invoke", reply -> {
MxGatewayErrors.ensureProtocolSuccess("invoke", reply.getProtocolStatus(), reply);
MxGatewayErrors.ensureMxAccessSuccess("invoke", reply);
return reply;
});
}));
}
/**
@@ -244,7 +261,7 @@ public final class MxGatewayClient implements AutoCloseable {
*/
public MxEventStream streamEvents(StreamEventsRequest request) {
MxEventStream stream = new MxEventStream(16);
withStreamDeadline(rawAsyncStub()).streamEvents(request, stream.observer());
MxGatewayChannels.withStreamDeadline(rawAsyncStub(), options).streamEvents(request, stream.observer());
return stream;
}
@@ -259,15 +276,17 @@ public final class MxGatewayClient implements AutoCloseable {
public MxGatewayEventSubscription streamEventsAsync(
StreamEventsRequest request, StreamObserver<MxEvent> observer) {
MxGatewayEventSubscription subscription = new MxGatewayEventSubscription();
withStreamDeadline(rawAsyncStub()).streamEvents(request, subscription.wrap(observer));
MxGatewayChannels.withStreamDeadline(rawAsyncStub(), options)
.streamEvents(request, subscription.wrap(observer));
return subscription;
}
/**
* Acknowledges an active MXAccess alarm condition through the gateway.
*
* <p>The gateway authenticates the request against the API key's
* {@code invoke:alarm-ack} scope and forwards the acknowledge to the
* <p>The gateway authorizes this request against the API key's
* {@code admin} scope (the gateway scope resolver maps alarm RPCs to the
* default {@code admin} scope) and forwards the acknowledge to the
* worker's MXAccess session; the resulting native MxStatus is returned
* in the reply. Acks are idempotent at the MxAccess layer.
*
@@ -296,11 +315,12 @@ public final class MxGatewayClient implements AutoCloseable {
* with {@link MxGatewayException} on failure
*/
public CompletableFuture<AcknowledgeAlarmReply> acknowledgeAlarmAsync(AcknowledgeAlarmRequest request) {
CompletableFuture<AcknowledgeAlarmReply> future = toCompletable(rawFutureStub().acknowledgeAlarm(request));
return future.thenApply(reply -> {
CompletableFuture<AcknowledgeAlarmReply> future =
MxGatewayChannels.toCompletable(rawFutureStub().acknowledgeAlarm(request), "acknowledge alarm");
return future.thenApply(MxGatewayChannels.normalisingValidator("acknowledge alarm", reply -> {
MxGatewayErrors.ensureProtocolSuccess("acknowledge alarm", reply.getProtocolStatus(), null);
return reply;
});
}));
}
/**
@@ -316,14 +336,36 @@ public final class MxGatewayClient implements AutoCloseable {
public MxGatewayActiveAlarmsSubscription queryActiveAlarms(
QueryActiveAlarmsRequest request, StreamObserver<ActiveAlarmSnapshot> observer) {
MxGatewayActiveAlarmsSubscription subscription = new MxGatewayActiveAlarmsSubscription();
withStreamDeadline(rawAsyncStub()).queryActiveAlarms(request, subscription.wrap(observer));
MxGatewayChannels.withStreamDeadline(rawAsyncStub(), options)
.queryActiveAlarms(request, subscription.wrap(observer));
return subscription;
}
/**
* Shuts the owned channel down and awaits termination so try-with-resources
* callers do not leave in-flight calls or Netty event-loop threads running
* after the block exits.
*
* <p>Waits up to the configured connect timeout for graceful termination
* and forcibly shuts the channel down on timeout. If the calling thread is
* interrupted while waiting, the channel is forcibly shut down and the
* thread's interrupt flag is restored. No-op for clients that do not own
* their channel. For an explicitly checked, blocking-aware shutdown, call
* {@link #closeAndAwaitTermination()}.
*/
@Override
public void close() {
if (ownedChannel != null) {
ownedChannel.shutdown();
if (ownedChannel == null) {
return;
}
ownedChannel.shutdown();
try {
if (!ownedChannel.awaitTermination(options.connectTimeout().toMillis(), TimeUnit.MILLISECONDS)) {
ownedChannel.shutdownNow();
}
} catch (InterruptedException error) {
ownedChannel.shutdownNow();
Thread.currentThread().interrupt();
}
}
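Review note: the shutdown, bounded wait, then forced shutdown sequence in the new `close()` is the same shape as the well-known `ExecutorService` idiom. The sketch below uses an `ExecutorService` purely as a JDK stand-in for `ManagedChannel` so it runs without gRPC; `shutdownAndAwait` is a hypothetical helper name, not part of the client.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class GracefulShutdownSketch {
    /**
     * Requests an orderly shutdown and waits up to the timeout.
     * Returns true on graceful termination, false when shutdown was forced.
     */
    static boolean shutdownAndAwait(ExecutorService service, long timeoutMillis) {
        service.shutdown();
        try {
            if (service.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            // Timed out: stop waiting politely and cancel in-flight work.
            service.shutdownNow();
            return false;
        } catch (InterruptedException error) {
            // Interrupted while waiting: force shutdown and restore the flag.
            service.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService idle = Executors.newSingleThreadExecutor();
        System.out.println(shutdownAndAwait(idle, 1_000));
    }
}
```

Restoring the interrupt flag matters for callers that run `close()` on a pooled thread: swallowing the interrupt would hide a cancellation request from the code further up the stack.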
@@ -343,75 +385,6 @@ public final class MxGatewayClient implements AutoCloseable {
}
}
private static ManagedChannel createChannel(MxGatewayClientOptions options) {
NettyChannelBuilder builder = NettyChannelBuilder.forTarget(options.endpoint())
.maxInboundMessageSize(options.maxGrpcMessageBytes());
if (!options.connectTimeout().isNegative()) {
builder.withOption(
io.grpc.netty.shaded.io.netty.channel.ChannelOption.CONNECT_TIMEOUT_MILLIS,
Math.toIntExact(options.connectTimeout().toMillis()));
}
if (options.plaintext()) {
builder.usePlaintext();
} else if (options.caCertificatePath() != null) {
try {
builder.sslContext(GrpcSslContexts.forClient()
.trustManager(options.caCertificatePath().toFile())
.build());
} catch (SSLException error) {
throw new MxGatewayException("failed to configure gateway TLS", error);
}
} else {
builder.useTransportSecurity();
}
if (!options.serverNameOverride().isBlank()) {
builder.overrideAuthority(options.serverNameOverride());
}
return builder.build();
}
private <T extends io.grpc.stub.AbstractStub<T>> T withDeadline(T stub) {
if (options.callTimeout().isNegative()) {
return stub;
}
return stub.withDeadlineAfter(options.callTimeout().toNanos(), TimeUnit.NANOSECONDS);
}
private <T extends io.grpc.stub.AbstractStub<T>> T withStreamDeadline(T stub) {
if (options.streamTimeout() == null || options.streamTimeout().isNegative()) {
return stub;
}
return stub.withDeadlineAfter(options.streamTimeout().toNanos(), TimeUnit.NANOSECONDS);
}
private static <T> CompletableFuture<T> toCompletable(com.google.common.util.concurrent.ListenableFuture<T> source) {
CompletableFuture<T> target = new CompletableFuture<>();
Futures.addCallback(
source,
new FutureCallback<>() {
@Override
public void onSuccess(T result) {
target.complete(result);
}
@Override
public void onFailure(Throwable error) {
if (error instanceof RuntimeException runtimeException) {
target.completeExceptionally(MxGatewayErrors.fromGrpc("async call", runtimeException));
return;
}
target.completeExceptionally(error);
}
},
MoreExecutors.directExecutor());
target.whenComplete((ignoredResult, ignoredError) -> {
if (target.isCancelled()) {
source.cancel(true);
}
});
return target;
}
static ProtocolStatusCode okStatusCode() {
return ProtocolStatusCode.PROTOCOL_STATUS_CODE_OK;
}
@@ -11,25 +11,35 @@ public final class MxGatewaySecrets {
}
/**
* Redacts the body of an API key, leaving only short prefix and suffix
* windows so it remains comparable in logs.
* Redacts the secret portion of an API key, leaving only the non-secret
* key identifier visible so the value remains comparable in logs.
*
* <p>A gateway API key has the form {@code mxgw_<key-id>_<secret>}. Only the
* {@code mxgw_<key-id>_} prefix is non-secret; everything after the second
* underscore is the secret and is masked entirely &mdash; no leading or
* trailing characters of the secret are echoed. Tokens that do not match
* the gateway shape are masked completely as {@code "<redacted>"}.
*
* @param apiKey the API key to redact, may be {@code null} or empty
* @return an empty string for {@code null}/empty input, {@code "<redacted>"}
* for keys eight characters or shorter, or a masked form preserving
* the leading and trailing four characters
* for non-gateway-shaped tokens, or {@code mxgw_<key-id>_***} with the
* secret masked for gateway-shaped keys
*/
public static String redactApiKey(String apiKey) {
if (apiKey == null || apiKey.isEmpty()) {
return "";
}
if (apiKey.length() <= 8) {
return "<redacted>";
// Gateway keys are mxgw_<key-id>_<secret>; keep only the non-secret prefix.
if (apiKey.startsWith("mxgw_")) {
int secretSeparator = apiKey.indexOf('_', "mxgw_".length());
if (secretSeparator >= 0 && secretSeparator < apiKey.length() - 1) {
return apiKey.substring(0, secretSeparator + 1) + "***";
}
}
return apiKey.substring(0, 4)
+ "*".repeat(apiKey.length() - 8)
+ apiKey.substring(apiKey.length() - 4);
// Anything else is treated as wholly secret: reveal nothing.
return "<redacted>";
}
/**
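Review note: the new redaction rules are easiest to check against concrete inputs. The sketch below copies the added `redactApiKey` logic into a standalone class (`RedactSketch` is a hypothetical name used only for this illustration) and runs it on a few made-up keys.

```java
final class RedactSketch {
    /** Copy of the redaction logic added in this diff, for illustration only. */
    static String redactApiKey(String apiKey) {
        if (apiKey == null || apiKey.isEmpty()) {
            return "";
        }
        // Gateway keys are mxgw_<key-id>_<secret>; keep only the non-secret prefix.
        if (apiKey.startsWith("mxgw_")) {
            int secretSeparator = apiKey.indexOf('_', "mxgw_".length());
            if (secretSeparator >= 0 && secretSeparator < apiKey.length() - 1) {
                return apiKey.substring(0, secretSeparator + 1) + "***";
            }
        }
        // Anything else is treated as wholly secret.
        return "<redacted>";
    }

    public static void main(String[] args) {
        System.out.println(redactApiKey("mxgw_abc123_s3cr3t")); // mxgw_abc123_***
        System.out.println(redactApiKey("mxgw_abc123_"));       // <redacted> (empty secret)
        System.out.println(redactApiKey("mxgw_abc123"));        // <redacted> (no second separator)
        System.out.println(redactApiKey("some-other-token"));   // <redacted>
        System.out.println(redactApiKey(null).isEmpty());       // true
    }
}
```

One consequence worth noting: the secret is taken to start after the first underscore that follows the `mxgw_` prefix, so this logic assumes key identifiers never contain an underscore themselves.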
@@ -40,6 +40,7 @@ import mxaccess_gateway.v1.MxaccessGateway.WriteCommand;
*/
public final class MxGatewaySession implements AutoCloseable {
private static final SecureRandom RANDOM = new SecureRandom();
private static final System.Logger LOGGER = System.getLogger(MxGatewaySession.class.getName());
private final MxGatewayClient client;
private final OpenSessionReply openReply;
@@ -99,9 +100,26 @@ public final class MxGatewaySession implements AutoCloseable {
return closeReply;
}
/**
* Closes the session as part of try-with-resources.
*
* <p>This performs a {@code CloseSession} network RPC. Unlike
* {@link #closeRaw()}, any {@link MxGatewayException} raised by that RPC is
* caught and logged at {@code WARNING} level rather than thrown: a
* close-time transport or protocol failure must not replace the exception
* that a try-with-resources body is already propagating. Callers that need
* to observe the close result should call {@link #closeRaw()} explicitly.
*/
@Override
public void close() {
closeRaw();
try {
closeRaw();
} catch (MxGatewayException error) {
LOGGER.log(
System.Logger.Level.WARNING,
() -> "ignoring close-time failure for session " + sessionId(),
error);
}
}
/**
@@ -116,7 +134,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasRegister()) {
return reply.getRegister().getServerHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway register reply carried neither a register payload nor a return value");
}
/**
@@ -159,7 +181,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasAddItem()) {
return reply.getAddItem().getItemHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway addItem reply carried neither an add-item payload nor a return value");
}
/**
@@ -193,7 +219,11 @@ public final class MxGatewaySession implements AutoCloseable {
if (reply.hasAddItem2()) {
return reply.getAddItem2().getItemHandle();
}
return reply.getReturnValue().getInt32Value();
if (reply.hasReturnValue()) {
return reply.getReturnValue().getInt32Value();
}
throw new MxGatewayException(
"gateway addItem2 reply carried neither an add-item payload nor a return value");
}
/**
@@ -0,0 +1,503 @@
package com.dohertylan.mxgateway.client;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
import io.grpc.ManagedChannel;
import io.grpc.Server;
import io.grpc.Status;
import io.grpc.inprocess.InProcessChannelBuilder;
import io.grpc.inprocess.InProcessServerBuilder;
import io.grpc.stub.ClientCallStreamObserver;
import io.grpc.stub.ClientResponseObserver;
import io.grpc.stub.StreamObserver;
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import mxaccess_gateway.v1.MxAccessGatewayGrpc;
import mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply;
import mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest;
import mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot;
import mxaccess_gateway.v1.MxaccessGateway.AlarmConditionState;
import mxaccess_gateway.v1.MxaccessGateway.MxEvent;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatus;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatusCode;
import mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest;
import mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest;
import org.junit.jupiter.api.Test;
/**
* Regression tests for the Low-severity Client.Java code-review findings
* (Client.Java-006 through Client.Java-012). Covers the alarm RPC surface,
* async streaming/subscription cancellation, queue overflow, and TLS-config
* construction that Client.Java-007 reports as untested.
*/
final class MxGatewayLowFindingsTests {
// --- Client.Java-007: AcknowledgeAlarm RPC coverage ---
@Test
void acknowledgeAlarmReturnsReplyAndSendsAuthMetadata() throws Exception {
AtomicReference<String> authorization = new AtomicReference<>();
AtomicReference<AcknowledgeAlarmRequest> seen = new AtomicReference<>();
TestService service = new TestService() {
@Override
public void acknowledgeAlarm(
AcknowledgeAlarmRequest request, StreamObserver<AcknowledgeAlarmReply> responseObserver) {
seen.set(request);
responseObserver.onNext(AcknowledgeAlarmReply.newBuilder()
.setSessionId(request.getSessionId())
.setProtocolStatus(ok())
.setDiagnosticMessage("acked")
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service, "mxgw_keyid_secret", authorization)) {
AcknowledgeAlarmReply reply = harness.client().acknowledgeAlarm(AcknowledgeAlarmRequest.newBuilder()
.setSessionId("s-1")
.setAlarmFullReference("Area1.Pump.PV.HiHi")
.setComment("operator note")
.build());
assertEquals("acked", reply.getDiagnosticMessage());
assertEquals("Area1.Pump.PV.HiHi", seen.get().getAlarmFullReference());
assertEquals("Bearer mxgw_keyid_secret", authorization.get());
}
}
@Test
void acknowledgeAlarmThrowsTypedExceptionOnProtocolFailure() throws Exception {
TestService service = new TestService() {
@Override
public void acknowledgeAlarm(
AcknowledgeAlarmRequest request, StreamObserver<AcknowledgeAlarmReply> responseObserver) {
responseObserver.onNext(AcknowledgeAlarmReply.newBuilder()
.setSessionId(request.getSessionId())
.setProtocolStatus(ProtocolStatus.newBuilder()
.setCode(ProtocolStatusCode.PROTOCOL_STATUS_CODE_SESSION_NOT_FOUND))
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
assertThrows(
MxGatewayException.class,
() -> harness.client().acknowledgeAlarm(AcknowledgeAlarmRequest.newBuilder()
.setSessionId("missing")
.build()));
}
}
@Test
void acknowledgeAlarmAsyncCompletesWithReply() throws Exception {
TestService service = new TestService() {
@Override
public void acknowledgeAlarm(
AcknowledgeAlarmRequest request, StreamObserver<AcknowledgeAlarmReply> responseObserver) {
responseObserver.onNext(AcknowledgeAlarmReply.newBuilder()
.setSessionId(request.getSessionId())
.setProtocolStatus(ok())
.setDiagnosticMessage("async-acked")
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
CompletableFuture<AcknowledgeAlarmReply> future = harness.client()
.acknowledgeAlarmAsync(AcknowledgeAlarmRequest.newBuilder().setSessionId("s-2").build());
assertEquals("async-acked", future.get(5, TimeUnit.SECONDS).getDiagnosticMessage());
}
}
@Test
void acknowledgeAlarmAsyncFailsExceptionallyWithTypedException() throws Exception {
TestService service = new TestService() {
@Override
public void acknowledgeAlarm(
AcknowledgeAlarmRequest request, StreamObserver<AcknowledgeAlarmReply> responseObserver) {
responseObserver.onError(Status.UNAVAILABLE.withDescription("worker down").asRuntimeException());
}
};
try (Harness harness = Harness.start(service)) {
CompletableFuture<AcknowledgeAlarmReply> future = harness.client()
.acknowledgeAlarmAsync(AcknowledgeAlarmRequest.newBuilder().setSessionId("s-3").build());
ExecutionException error = assertThrows(
ExecutionException.class, () -> future.get(5, TimeUnit.SECONDS));
assertTrue(error.getCause() instanceof MxGatewayException, () -> String.valueOf(error.getCause()));
}
}
// --- Client.Java-007: QueryActiveAlarms RPC + subscription coverage ---
@Test
void queryActiveAlarmsDeliversSnapshotsToObserver() throws Exception {
ActiveAlarmSnapshot snapshot = ActiveAlarmSnapshot.newBuilder()
.setAlarmFullReference("Area1.Tank.Level.Hi")
.setSeverity(800)
.setCurrentState(AlarmConditionState.ALARM_CONDITION_STATE_ACTIVE)
.build();
TestService service = new TestService() {
@Override
public void queryActiveAlarms(
QueryActiveAlarmsRequest request, StreamObserver<ActiveAlarmSnapshot> responseObserver) {
responseObserver.onNext(snapshot);
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
List<ActiveAlarmSnapshot> received = new ArrayList<>();
CountDownLatch done = new CountDownLatch(1);
harness.client().queryActiveAlarms(
QueryActiveAlarmsRequest.newBuilder().setSessionId("s-4").build(),
new StreamObserver<>() {
@Override
public void onNext(ActiveAlarmSnapshot value) {
received.add(value);
}
@Override
public void onError(Throwable t) {
done.countDown();
}
@Override
public void onCompleted() {
done.countDown();
}
});
assertTrue(done.await(5, TimeUnit.SECONDS), "stream should complete");
assertEquals(1, received.size());
assertEquals("Area1.Tank.Level.Hi", received.get(0).getAlarmFullReference());
}
}
@Test
void activeAlarmsSubscriptionCancelBeforeBeforeStartCancelsStream() {
MxGatewayActiveAlarmsSubscription subscription = new MxGatewayActiveAlarmsSubscription();
ClientResponseObserver<QueryActiveAlarmsRequest, ActiveAlarmSnapshot> observer =
subscription.wrap(new StreamObserver<>() {
@Override
public void onNext(ActiveAlarmSnapshot value) {
}
@Override
public void onError(Throwable t) {
}
@Override
public void onCompleted() {
}
});
RecordingActiveAlarmsRequestStream requestStream = new RecordingActiveAlarmsRequestStream();
subscription.cancel();
observer.beforeStart(requestStream);
assertTrue(requestStream.cancelled);
assertEquals("client cancelled active-alarms query", requestStream.cancelMessage);
}
// --- Client.Java-007: async streamEvents + subscription cancellation ---
@Test
void streamEventsAsyncDeliversEventsToObserver() throws Exception {
MxEvent event = MxEvent.newBuilder().setWorkerSequence(7).build();
TestService service = new TestService() {
@Override
public void streamEvents(StreamEventsRequest request, StreamObserver<MxEvent> responseObserver) {
responseObserver.onNext(event);
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
List<MxEvent> received = new ArrayList<>();
CountDownLatch done = new CountDownLatch(1);
harness.client().streamEventsAsync(
StreamEventsRequest.newBuilder().setSessionId("s-5").build(),
new StreamObserver<>() {
@Override
public void onNext(MxEvent value) {
received.add(value);
}
@Override
public void onError(Throwable t) {
done.countDown();
}
@Override
public void onCompleted() {
done.countDown();
}
});
assertTrue(done.await(5, TimeUnit.SECONDS), "stream should complete");
assertEquals(1, received.size());
assertEquals(7, received.get(0).getWorkerSequence());
}
}
@Test
void eventSubscriptionCancelBeforeBeforeStartCancelsStream() {
MxGatewayEventSubscription subscription = new MxGatewayEventSubscription();
ClientResponseObserver<StreamEventsRequest, MxEvent> observer =
subscription.wrap(new StreamObserver<>() {
@Override
public void onNext(MxEvent value) {
}
@Override
public void onError(Throwable t) {
}
@Override
public void onCompleted() {
}
});
RecordingEventsRequestStream requestStream = new RecordingEventsRequestStream();
subscription.cancel();
observer.beforeStart(requestStream);
assertTrue(requestStream.cancelled);
assertEquals("client cancelled event stream", requestStream.cancelMessage);
}
// --- Client.Java-007 / Client.Java-011: MxEventStream queue overflow ---
@Test
void eventStreamQueueOverflowSurfacesExceptionFromNext() {
MxEventStream stream = new MxEventStream(2);
ClientResponseObserver<StreamEventsRequest, MxEvent> observer = stream.observer();
RecordingEventsRequestStream requestStream = new RecordingEventsRequestStream();
observer.beforeStart(requestStream);
// Push far more events than the capacity-2 buffer can hold without draining.
for (int i = 0; i < 16; i++) {
observer.onNext(MxEvent.newBuilder().setWorkerSequence(i).build());
}
// Overflow must cancel the gRPC call and surface as MxGatewayException.
assertTrue(requestStream.cancelled, "overflow should cancel the underlying call");
MxGatewayException error = assertThrows(MxGatewayException.class, () -> {
while (stream.hasNext()) {
stream.next();
}
});
assertTrue(error.getMessage().contains("overflow"), error::getMessage);
}
// --- Client.Java-007: TLS channel construction ---
@Test
void connectWithMissingCaCertificateThrowsTypedTlsException() {
MxGatewayClientOptions options = MxGatewayClientOptions.builder()
.endpoint("localhost:5001")
.apiKey("mxgw_id_secret")
.plaintext(false)
.caCertificatePath(Path.of("does-not-exist-" + UUID.randomUUID() + ".pem"))
.build();
MxGatewayException error = assertThrows(MxGatewayException.class, () -> MxGatewayClient.connect(options));
assertTrue(error.getMessage().contains("TLS"), error::getMessage);
MxGatewayException galaxyError =
assertThrows(MxGatewayException.class, () -> GalaxyRepositoryClient.connect(options));
assertTrue(galaxyError.getMessage().contains("TLS"), galaxyError::getMessage);
}
@Test
void connectWithSystemTrustBuildsTlsChannelWithoutError() {
// No CA path and plaintext=false exercises the useTransportSecurity() branch.
MxGatewayClientOptions options = MxGatewayClientOptions.builder()
.endpoint("localhost:5001")
.apiKey("mxgw_id_secret")
.plaintext(false)
.build();
try (MxGatewayClient client = MxGatewayClient.connect(options)) {
assertNotNull(client);
}
try (GalaxyRepositoryClient galaxy = GalaxyRepositoryClient.connect(options)) {
assertNotNull(galaxy);
}
}
// --- Client.Java-008: async error surface is normalised ---
@Test
void openSessionAsyncNormalisesNonGatewayRuntimeExceptionFromValidator() {
// ensureGatewayProtocolCompatible already throws MxGatewayException; this verifies
// the normalisingValidator wrapper routes a stray RuntimeException through fromGrpc.
CompletableFuture<String> source = new CompletableFuture<>();
CompletableFuture<String> wrapped =
source.thenApply(MxGatewayChannels.normalisingValidator("open session", reply -> {
throw new IllegalStateException("malformed reply");
}));
source.complete("payload");
CompletionException error = assertThrows(CompletionException.class, wrapped::join);
assertTrue(error.getCause() instanceof MxGatewayException, () -> String.valueOf(error.getCause()));
}
private static ProtocolStatus ok() {
return ProtocolStatus.newBuilder()
.setCode(ProtocolStatusCode.PROTOCOL_STATUS_CODE_OK)
.build();
}
private static class TestService extends MxAccessGatewayGrpc.MxAccessGatewayImplBase {
}
private record Harness(Server server, ManagedChannel channel, MxGatewayClient client) implements AutoCloseable {
static Harness start(MxAccessGatewayGrpc.MxAccessGatewayImplBase service) throws Exception {
return start(service, "", new AtomicReference<>());
}
static Harness start(
MxAccessGatewayGrpc.MxAccessGatewayImplBase service,
String apiKey,
AtomicReference<String> authorization)
throws Exception {
String name = "mxgw-low-" + UUID.randomUUID();
io.grpc.ServerInterceptor interceptor = new io.grpc.ServerInterceptor() {
@Override
public <ReqT, RespT> io.grpc.ServerCall.Listener<ReqT> interceptCall(
io.grpc.ServerCall<ReqT, RespT> call,
io.grpc.Metadata headers,
io.grpc.ServerCallHandler<ReqT, RespT> next) {
authorization.set(headers.get(MxGatewayAuthInterceptor.AUTHORIZATION_HEADER));
return next.startCall(call, headers);
}
};
Server server = InProcessServerBuilder.forName(name)
.directExecutor()
.addService(io.grpc.ServerInterceptors.intercept(service, interceptor))
.build()
.start();
ManagedChannel channel = InProcessChannelBuilder.forName(name).directExecutor().build();
MxGatewayClient client = new MxGatewayClient(
channel,
MxGatewayClientOptions.builder()
.endpoint("in-process")
.apiKey(apiKey)
.plaintext(true)
.callTimeout(Duration.ofSeconds(5))
.streamTimeout(Duration.ofSeconds(5))
.build());
return new Harness(server, channel, client);
}
@Override
public void close() {
channel.shutdownNow();
server.shutdownNow();
}
}
private static final class RecordingEventsRequestStream
extends ClientCallStreamObserver<StreamEventsRequest> {
private boolean cancelled;
private String cancelMessage;
@Override
public void cancel(String message, Throwable cause) {
cancelled = true;
cancelMessage = message;
}
@Override
public boolean isReady() {
return true;
}
@Override
public void setOnReadyHandler(Runnable onReadyHandler) {
}
@Override
public void request(int count) {
}
@Override
public void setMessageCompression(boolean enable) {
}
@Override
public void disableAutoInboundFlowControl() {
}
@Override
public void onNext(StreamEventsRequest value) {
}
@Override
public void onError(Throwable t) {
}
@Override
public void onCompleted() {
}
}
private static final class RecordingActiveAlarmsRequestStream
extends ClientCallStreamObserver<QueryActiveAlarmsRequest> {
private boolean cancelled;
private String cancelMessage;
@Override
public void cancel(String message, Throwable cause) {
cancelled = true;
cancelMessage = message;
}
@Override
public boolean isReady() {
return true;
}
@Override
public void setOnReadyHandler(Runnable onReadyHandler) {
}
@Override
public void request(int count) {
}
@Override
public void setMessageCompression(boolean enable) {
}
@Override
public void disableAutoInboundFlowControl() {
}
@Override
public void onNext(QueryActiveAlarmsRequest value) {
}
@Override
public void onError(Throwable t) {
}
@Override
public void onCompleted() {
}
}
}
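Review note on the two cancel-before-`beforeStart` tests in this file: the behaviour they pin down can be modelled without gRPC. The sketch below is a minimal illustration under stated assumptions, not the actual `MxGatewayEventSubscription` or `MxGatewayActiveAlarmsSubscription` code. `DeferredCancelSketch`, its nested `Call` interface, and `RecordingCall` are hypothetical names; `Call` stands in for `ClientCallStreamObserver`, of which only `cancel` matters here.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model: remembers a cancel() issued before the call exists
// and replays it exactly once when the call starts.
final class DeferredCancelSketch {
    // Stand-in for the gRPC request stream (assumption, not the real type).
    interface Call {
        void cancel(String message);
    }

    // Test double that records cancellations, like the Recording*RequestStream classes.
    static final class RecordingCall implements Call {
        boolean cancelled;
        int cancelCount;
        String message;

        @Override
        public void cancel(String message) {
            this.cancelled = true;
            this.cancelCount++;
            this.message = message;
        }
    }

    private final String cancelMessage;
    private final AtomicBoolean cancelRequested = new AtomicBoolean();
    private final AtomicBoolean delivered = new AtomicBoolean();
    private final AtomicReference<Call> call = new AtomicReference<>();

    DeferredCancelSketch(String cancelMessage) {
        this.cancelMessage = cancelMessage;
    }

    // May be called before or after beforeStart(); safe to call repeatedly.
    void cancel() {
        cancelRequested.set(true);
        deliverIfReady();
    }

    // Mirrors ClientResponseObserver.beforeStart: the call becomes available here.
    void beforeStart(Call startedCall) {
        call.set(startedCall);
        deliverIfReady();
    }

    // Whichever of cancel()/beforeStart() runs second sees both pieces of state;
    // the compare-and-set guarantees the cancel is forwarded only once.
    private void deliverIfReady() {
        Call current = call.get();
        if (current != null && cancelRequested.get() && delivered.compareAndSet(false, true)) {
            current.cancel(cancelMessage);
        }
    }

    public static void main(String[] args) {
        RecordingCall recorded = new RecordingCall();
        DeferredCancelSketch subscription = new DeferredCancelSketch("client cancelled event stream");
        subscription.cancel();              // no call yet: only the intent is recorded
        subscription.beforeStart(recorded); // the pending cancel is delivered here
        System.out.println(recorded.cancelled + " / " + recorded.message);
    }
}
```

Both entry points write their own state first and then check the other side, so the cancel is not lost whichever order they run in. Whether the real subscription classes use the same mechanism is not visible in this diff.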
@@ -0,0 +1,394 @@
package com.dohertylan.mxgateway.client;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
import io.grpc.ManagedChannel;
import io.grpc.Server;
import io.grpc.inprocess.InProcessChannelBuilder;
import io.grpc.inprocess.InProcessServerBuilder;
import io.grpc.stub.StreamObserver;
import java.time.Duration;
import java.util.UUID;
import mxaccess_gateway.v1.MxAccessGatewayGrpc;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionRequest;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandKind;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandReply;
import mxaccess_gateway.v1.MxaccessGateway.MxCommandRequest;
import mxaccess_gateway.v1.MxaccessGateway.OpenSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.OpenSessionRequest;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatus;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatusCode;
import org.junit.jupiter.api.Test;
/**
* Regression tests for the Medium-severity Client.Java code-review findings
* (Client.Java-001 through Client.Java-005).
*/
final class MxGatewayMediumFindingsTests {
// --- Client.Java-001: redactApiKey must not leak trailing secret chars ---
@Test
void redactApiKeyDoesNotLeakAnyCharacterOfTheSecret() {
// Key shape is mxgw_<key-id>_<secret>; the secret is the segment after the second underscore.
String apiKey = "mxgw_keyid01_supersecretvalue";
String redacted = MxGatewaySecrets.redactApiKey(apiKey);
// None of the secret characters may appear in the redacted output.
assertFalse(redacted.contains("value"), () -> "redacted form leaked secret tail: " + redacted);
assertFalse(redacted.endsWith("alue"), () -> "redacted form leaked trailing secret chars: " + redacted);
assertFalse(redacted.contains("supersecret"), () -> "redacted form leaked secret: " + redacted);
// The non-secret key-id prefix may stay so the value is still comparable in logs.
assertTrue(redacted.startsWith("mxgw_keyid01_"), () -> "redacted form lost key-id prefix: " + redacted);
}
@Test
void redactApiKeyForNonGatewayShapedKeyRevealsNothing() {
String redacted = MxGatewaySecrets.redactApiKey("plain-opaque-token-1234");
assertFalse(redacted.contains("1234"), () -> "redacted form leaked trailing chars: " + redacted);
assertFalse(redacted.contains("plain-opaque-token"), () -> "redacted form leaked body: " + redacted);
}
@Test
void redactApiKeyStillHandlesNullAndShortInput() {
assertEquals("", MxGatewaySecrets.redactApiKey(null));
assertEquals("", MxGatewaySecrets.redactApiKey(""));
assertEquals("<redacted>", MxGatewaySecrets.redactApiKey("short"));
}
// --- Client.Java-002: terminal-state transition must be deterministic ---
@Test
void eventStreamOverflowExceptionSurvivesASubsequentClose() {
// Deterministic reproduction of Client.Java-002: an overflow enqueues the
// overflow exception, then a later close() must NOT discard it. The first
// terminal condition (overflow) must win and stay observable by next().
MxEventStream stream = new MxEventStream(2);
io.grpc.stub.ClientResponseObserver<
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest,
mxaccess_gateway.v1.MxaccessGateway.MxEvent>
observer = stream.observer();
observer.beforeStart(new NoopRequestStream());
// Force a queue overflow on a capacity-2 stream.
for (int i = 0; i < 8; i++) {
observer.onNext(testEvent(i));
}
// A close() arriving after the overflow must not erase the overflow signal.
stream.close();
MxGatewayException error = assertThrows(MxGatewayException.class, () -> {
while (stream.hasNext()) {
stream.next();
}
});
assertTrue(error.getMessage().contains("overflow"), error::getMessage);
}
@Test
void eventStreamConcurrentOverflowAndCloseAlwaysTerminate() throws Exception {
// The terminal-state transition must be serialised: whatever the interleaving
// of overflow and close, hasNext() always reaches a terminal state.
for (int iteration = 0; iteration < 300; iteration++) {
MxEventStream stream = new MxEventStream(2);
io.grpc.stub.ClientResponseObserver<
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest,
mxaccess_gateway.v1.MxaccessGateway.MxEvent>
observer = stream.observer();
observer.beforeStart(new NoopRequestStream());
Thread filler = new Thread(() -> {
for (int i = 0; i < 8; i++) {
observer.onNext(testEvent(i));
}
});
Thread closer = new Thread(stream::close);
filler.start();
closer.start();
filler.join();
closer.join();
try {
while (stream.hasNext()) {
stream.next();
}
} catch (MxGatewayException expected) {
assertTrue(expected.getMessage().contains("overflow"), expected::getMessage);
}
assertFalse(stream.hasNext());
}
}
private static final class NoopRequestStream
extends io.grpc.stub.ClientCallStreamObserver<mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest> {
@Override
public void cancel(String message, Throwable cause) {
}
@Override
public boolean isReady() {
return true;
}
@Override
public void setOnReadyHandler(Runnable onReadyHandler) {
}
@Override
public void request(int count) {
}
@Override
public void setMessageCompression(boolean enable) {
}
@Override
public void disableAutoInboundFlowControl() {
}
@Override
public void onNext(mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest value) {
}
@Override
public void onError(Throwable t) {
}
@Override
public void onCompleted() {
}
}
// --- Client.Java-003: gateway protocol version mismatch must be rejected ---
@Test
void openSessionRejectsIncompatibleGatewayProtocolVersion() throws Exception {
TestService service = new TestService() {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-mismatch")
.setGatewayProtocolVersion(MxGatewayClientVersion.gatewayProtocolVersion() + 1)
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewayException error = assertThrows(
MxGatewayException.class,
() -> harness.client().openSession("junit-session"));
assertTrue(error.getMessage().contains("protocol version"), error::getMessage);
}
}
@Test
void openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion() throws Exception {
TestService matching = new TestService() {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-ok")
.setGatewayProtocolVersion(MxGatewayClientVersion.gatewayProtocolVersion())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(matching)) {
assertEquals("session-ok", harness.client().openSession("junit-session").sessionId());
}
// A gateway that leaves the field unset (0) must not be rejected; older gateways
// simply do not populate it.
TestService unset = new TestService();
try (Harness harness = Harness.start(unset)) {
assertEquals("session-java", harness.client().openSession("junit-session").sessionId());
}
}
// --- Client.Java-004: missing typed payload AND missing return_value must throw ---
@Test
void registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
// Reply with neither register payload nor return_value set.
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
MxGatewayException error = assertThrows(
MxGatewayException.class, () -> session.register("c"));
assertTrue(error.getMessage().contains("register"), error::getMessage);
}
}
@Test
void addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertThrows(MxGatewayException.class, () -> session.addItem(1, "Tag"));
assertThrows(MxGatewayException.class, () -> session.addItem2(1, "Tag", "ctx"));
}
}
@Test
void addItemStillHonoursReturnValueFallback() throws Exception {
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(request.getCommand().getKind())
.setProtocolStatus(ok())
.setReturnValue(mxaccess_gateway.v1.MxaccessGateway.MxValue.newBuilder()
.setInt32Value(99))
.build());
responseObserver.onCompleted();
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertEquals(99, session.addItem(1, "Tag"));
}
}
// --- Client.Java-005: close() must not mask the primary try-with-resources error ---
@Test
void closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException() throws Exception {
TestService service = new TestService() {
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onError(io.grpc.Status.UNAVAILABLE
.withDescription("WORKER_UNAVAILABLE")
.asRuntimeException());
}
};
try (Harness harness = Harness.start(service)) {
IllegalStateException bodyError = assertThrows(IllegalStateException.class, () -> {
try (MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s")) {
throw new IllegalStateException("body failure");
}
});
// The body exception must propagate; the close-time RPC failure must not replace it.
assertEquals("body failure", bodyError.getMessage());
}
}
@Test
void closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt() throws Exception {
TestService service = new TestService() {
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onError(io.grpc.Status.UNAVAILABLE
.withDescription("WORKER_UNAVAILABLE")
.asRuntimeException());
}
};
try (Harness harness = Harness.start(service)) {
MxGatewaySession session = MxGatewaySession.forSessionId(harness.client(), "s");
assertThrows(MxGatewayException.class, session::closeRaw);
}
}
private static mxaccess_gateway.v1.MxaccessGateway.MxEvent testEvent(int sequence) {
return mxaccess_gateway.v1.MxaccessGateway.MxEvent.newBuilder()
.setWorkerSequence(sequence)
.build();
}
private static ProtocolStatus ok() {
return ProtocolStatus.newBuilder()
.setCode(ProtocolStatusCode.PROTOCOL_STATUS_CODE_OK)
.build();
}
private static class TestService extends MxAccessGatewayGrpc.MxAccessGatewayImplBase {
@Override
public void openSession(OpenSessionRequest request, StreamObserver<OpenSessionReply> responseObserver) {
responseObserver.onNext(OpenSessionReply.newBuilder()
.setSessionId("session-java")
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
@Override
public void closeSession(CloseSessionRequest request, StreamObserver<CloseSessionReply> responseObserver) {
responseObserver.onNext(CloseSessionReply.newBuilder()
.setSessionId(request.getSessionId())
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
responseObserver.onNext(MxCommandReply.newBuilder()
.setSessionId(request.getSessionId())
.setKind(MxCommandKind.MX_COMMAND_KIND_UNSPECIFIED)
.setProtocolStatus(ok())
.build());
responseObserver.onCompleted();
}
}
private record Harness(Server server, ManagedChannel channel, MxGatewayClient client) implements AutoCloseable {
static Harness start(MxAccessGatewayGrpc.MxAccessGatewayImplBase service) throws Exception {
String name = "mxgw-medium-" + UUID.randomUUID();
Server server = InProcessServerBuilder.forName(name)
.directExecutor()
.addService(service)
.build()
.start();
ManagedChannel channel = InProcessChannelBuilder.forName(name).directExecutor().build();
MxGatewayClient client = new MxGatewayClient(
channel,
MxGatewayClientOptions.builder()
.endpoint("in-process")
.apiKey("")
.plaintext(true)
.callTimeout(Duration.ofSeconds(5))
.build());
return new Harness(server, channel, client);
}
@Override
public void close() {
channel.shutdownNow();
server.shutdownNow();
}
}
}
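Review note on Client.Java-001: the three `redactApiKey` tests in this file amount to a small specification. The sketch below is one way to satisfy it, written from the test expectations only; it is not the `MxGatewaySecrets` implementation, and `ApiKeyRedactionSketch` is a hypothetical name. It assumes the key id itself contains no underscore, so the first underscore after the `mxgw_` prefix is the one that separates the key id from the secret.

```java
// Hypothetical redaction helper derived from the test expectations above.
final class ApiKeyRedactionSketch {
    private static final String PREFIX = "mxgw_";
    private static final String MASK = "<redacted>";

    static String redact(String apiKey) {
        // Null and empty input carry nothing worth masking.
        if (apiKey == null || apiKey.isEmpty()) {
            return "";
        }
        if (apiKey.startsWith(PREFIX)) {
            // Underscore that separates the key id from the secret
            // (assumes the key id has no underscore of its own).
            int separator = apiKey.indexOf('_', PREFIX.length());
            // Require a non-empty key id between the prefix and the separator.
            if (separator > PREFIX.length()) {
                // Keep "mxgw_<key-id>_" for log correlation; drop every secret character.
                return apiKey.substring(0, separator + 1) + MASK;
            }
        }
        // Any other shape is treated as fully secret.
        return MASK;
    }

    public static void main(String[] args) {
        System.out.println(redact("mxgw_keyid01_supersecretvalue"));
        System.out.println(redact("plain-opaque-token-1234"));
    }
}
```

The output length does not depend on the secret length, so the redacted form also avoids hinting at how long the secret is.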
@@ -139,6 +139,68 @@ public final class MxAccessGatewayGrpc {
return getStreamEventsMethod;
}
private static volatile io.grpc.MethodDescriptor<mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest,
mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply> getAcknowledgeAlarmMethod;
@io.grpc.stub.annotations.RpcMethod(
fullMethodName = SERVICE_NAME + '/' + "AcknowledgeAlarm",
requestType = mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest.class,
responseType = mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply.class,
methodType = io.grpc.MethodDescriptor.MethodType.UNARY)
public static io.grpc.MethodDescriptor<mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest,
mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply> getAcknowledgeAlarmMethod() {
io.grpc.MethodDescriptor<mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest, mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply> getAcknowledgeAlarmMethod;
if ((getAcknowledgeAlarmMethod = MxAccessGatewayGrpc.getAcknowledgeAlarmMethod) == null) {
synchronized (MxAccessGatewayGrpc.class) {
if ((getAcknowledgeAlarmMethod = MxAccessGatewayGrpc.getAcknowledgeAlarmMethod) == null) {
MxAccessGatewayGrpc.getAcknowledgeAlarmMethod = getAcknowledgeAlarmMethod =
io.grpc.MethodDescriptor.<mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest, mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply>newBuilder()
.setType(io.grpc.MethodDescriptor.MethodType.UNARY)
.setFullMethodName(generateFullMethodName(SERVICE_NAME, "AcknowledgeAlarm"))
.setSampledToLocalTracing(true)
.setRequestMarshaller(io.grpc.protobuf.ProtoUtils.marshaller(
mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest.getDefaultInstance()))
.setResponseMarshaller(io.grpc.protobuf.ProtoUtils.marshaller(
mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply.getDefaultInstance()))
.setSchemaDescriptor(new MxAccessGatewayMethodDescriptorSupplier("AcknowledgeAlarm"))
.build();
}
}
}
return getAcknowledgeAlarmMethod;
}
private static volatile io.grpc.MethodDescriptor<mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest,
mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot> getQueryActiveAlarmsMethod;
@io.grpc.stub.annotations.RpcMethod(
fullMethodName = SERVICE_NAME + '/' + "QueryActiveAlarms",
requestType = mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest.class,
responseType = mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot.class,
methodType = io.grpc.MethodDescriptor.MethodType.SERVER_STREAMING)
public static io.grpc.MethodDescriptor<mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest,
mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot> getQueryActiveAlarmsMethod() {
io.grpc.MethodDescriptor<mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest, mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot> getQueryActiveAlarmsMethod;
if ((getQueryActiveAlarmsMethod = MxAccessGatewayGrpc.getQueryActiveAlarmsMethod) == null) {
synchronized (MxAccessGatewayGrpc.class) {
if ((getQueryActiveAlarmsMethod = MxAccessGatewayGrpc.getQueryActiveAlarmsMethod) == null) {
MxAccessGatewayGrpc.getQueryActiveAlarmsMethod = getQueryActiveAlarmsMethod =
io.grpc.MethodDescriptor.<mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest, mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot>newBuilder()
.setType(io.grpc.MethodDescriptor.MethodType.SERVER_STREAMING)
.setFullMethodName(generateFullMethodName(SERVICE_NAME, "QueryActiveAlarms"))
.setSampledToLocalTracing(true)
.setRequestMarshaller(io.grpc.protobuf.ProtoUtils.marshaller(
mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest.getDefaultInstance()))
.setResponseMarshaller(io.grpc.protobuf.ProtoUtils.marshaller(
mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot.getDefaultInstance()))
.setSchemaDescriptor(new MxAccessGatewayMethodDescriptorSupplier("QueryActiveAlarms"))
.build();
}
}
}
return getQueryActiveAlarmsMethod;
}
/**
* Creates a new async stub that supports all call types for the service
*/
@@ -232,6 +294,20 @@ public final class MxAccessGatewayGrpc {
io.grpc.stub.StreamObserver<mxaccess_gateway.v1.MxaccessGateway.MxEvent> responseObserver) {
io.grpc.stub.ServerCalls.asyncUnimplementedUnaryCall(getStreamEventsMethod(), responseObserver);
}
/**
*/
default void acknowledgeAlarm(mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest request,
io.grpc.stub.StreamObserver<mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply> responseObserver) {
io.grpc.stub.ServerCalls.asyncUnimplementedUnaryCall(getAcknowledgeAlarmMethod(), responseObserver);
}
/**
*/
default void queryActiveAlarms(mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest request,
io.grpc.stub.StreamObserver<mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot> responseObserver) {
io.grpc.stub.ServerCalls.asyncUnimplementedUnaryCall(getQueryActiveAlarmsMethod(), responseObserver);
}
}
/**
@@ -298,6 +374,22 @@ public final class MxAccessGatewayGrpc {
io.grpc.stub.ClientCalls.asyncServerStreamingCall(
getChannel().newCall(getStreamEventsMethod(), getCallOptions()), request, responseObserver);
}
/**
*/
public void acknowledgeAlarm(mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest request,
io.grpc.stub.StreamObserver<mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply> responseObserver) {
io.grpc.stub.ClientCalls.asyncUnaryCall(
getChannel().newCall(getAcknowledgeAlarmMethod(), getCallOptions()), request, responseObserver);
}
/**
*/
public void queryActiveAlarms(mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest request,
io.grpc.stub.StreamObserver<mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot> responseObserver) {
io.grpc.stub.ClientCalls.asyncServerStreamingCall(
getChannel().newCall(getQueryActiveAlarmsMethod(), getCallOptions()), request, responseObserver);
}
}
/**
@@ -348,6 +440,22 @@ public final class MxAccessGatewayGrpc {
return io.grpc.stub.ClientCalls.blockingV2ServerStreamingCall(
getChannel(), getStreamEventsMethod(), getCallOptions(), request);
}
/**
*/
public mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply acknowledgeAlarm(mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest request) throws io.grpc.StatusException {
return io.grpc.stub.ClientCalls.blockingV2UnaryCall(
getChannel(), getAcknowledgeAlarmMethod(), getCallOptions(), request);
}
/**
*/
@io.grpc.ExperimentalApi("https://github.com/grpc/grpc-java/issues/10918")
public io.grpc.stub.BlockingClientCall<?, mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot>
queryActiveAlarms(mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest request) {
return io.grpc.stub.ClientCalls.blockingV2ServerStreamingCall(
getChannel(), getQueryActiveAlarmsMethod(), getCallOptions(), request);
}
}
/**
@@ -397,6 +505,21 @@ public final class MxAccessGatewayGrpc {
return io.grpc.stub.ClientCalls.blockingServerStreamingCall(
getChannel(), getStreamEventsMethod(), getCallOptions(), request);
}
/**
*/
public mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply acknowledgeAlarm(mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest request) {
return io.grpc.stub.ClientCalls.blockingUnaryCall(
getChannel(), getAcknowledgeAlarmMethod(), getCallOptions(), request);
}
/**
*/
public java.util.Iterator<mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot> queryActiveAlarms(
mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest request) {
return io.grpc.stub.ClientCalls.blockingServerStreamingCall(
getChannel(), getQueryActiveAlarmsMethod(), getCallOptions(), request);
}
}
/**
@@ -441,12 +564,22 @@ public final class MxAccessGatewayGrpc {
return io.grpc.stub.ClientCalls.futureUnaryCall(
getChannel().newCall(getInvokeMethod(), getCallOptions()), request);
}
/**
*/
public com.google.common.util.concurrent.ListenableFuture<mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply> acknowledgeAlarm(
mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest request) {
return io.grpc.stub.ClientCalls.futureUnaryCall(
getChannel().newCall(getAcknowledgeAlarmMethod(), getCallOptions()), request);
}
}
private static final int METHODID_OPEN_SESSION = 0;
private static final int METHODID_CLOSE_SESSION = 1;
private static final int METHODID_INVOKE = 2;
private static final int METHODID_STREAM_EVENTS = 3;
private static final int METHODID_ACKNOWLEDGE_ALARM = 4;
private static final int METHODID_QUERY_ACTIVE_ALARMS = 5;
private static final class MethodHandlers<Req, Resp> implements
io.grpc.stub.ServerCalls.UnaryMethod<Req, Resp>,
@@ -481,6 +614,14 @@ public final class MxAccessGatewayGrpc {
serviceImpl.streamEvents((mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest) request,
(io.grpc.stub.StreamObserver<mxaccess_gateway.v1.MxaccessGateway.MxEvent>) responseObserver);
break;
case METHODID_ACKNOWLEDGE_ALARM:
serviceImpl.acknowledgeAlarm((mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest) request,
(io.grpc.stub.StreamObserver<mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply>) responseObserver);
break;
case METHODID_QUERY_ACTIVE_ALARMS:
serviceImpl.queryActiveAlarms((mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest) request,
(io.grpc.stub.StreamObserver<mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot>) responseObserver);
break;
default:
throw new AssertionError();
}
@@ -527,6 +668,20 @@ public final class MxAccessGatewayGrpc {
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest,
mxaccess_gateway.v1.MxaccessGateway.MxEvent>(
service, METHODID_STREAM_EVENTS)))
.addMethod(
getAcknowledgeAlarmMethod(),
io.grpc.stub.ServerCalls.asyncUnaryCall(
new MethodHandlers<
mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest,
mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply>(
service, METHODID_ACKNOWLEDGE_ALARM)))
.addMethod(
getQueryActiveAlarmsMethod(),
io.grpc.stub.ServerCalls.asyncServerStreamingCall(
new MethodHandlers<
mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest,
mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot>(
service, METHODID_QUERY_ACTIVE_ALARMS)))
.build();
}
@@ -579,6 +734,8 @@ public final class MxAccessGatewayGrpc {
.addMethod(getCloseSessionMethod())
.addMethod(getInvokeMethod())
.addMethod(getStreamEventsMethod())
.addMethod(getAcknowledgeAlarmMethod())
.addMethod(getQueryActiveAlarmsMethod())
.build();
}
}
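Review note on the generated getters: `getAcknowledgeAlarmMethod()` and `getQueryActiveAlarmsMethod()` build their descriptors lazily using double-checked locking on a `volatile` static field. For readers unfamiliar with the idiom, here is a minimal generic sketch of the same pattern. `LazyInitSketch` is a hypothetical name, and the `Supplier` stands in for the `MethodDescriptor` builder chain; the sketch assumes the supplier never returns `null`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical generic form of the lazy-initialisation idiom used by the generated getters.
final class LazyInitSketch<T> {
    private final Supplier<T> factory;
    // volatile so that a fully constructed value is visible to other threads.
    private volatile T value;

    LazyInitSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = value;               // fast path: one volatile read, no lock
        if (local == null) {
            synchronized (this) {
                local = value;         // re-check: another thread may have won the race
                if (local == null) {
                    value = local = factory.get();
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        LazyInitSketch<String> lazy =
                new LazyInitSketch<>(() -> "descriptor-" + builds.incrementAndGet());
        lazy.get();
        lazy.get();
        System.out.println("factory invocations: " + builds.get());
    }
}
```

The local variable mirrors the generated code's shadowing local and keeps the already-initialised path to a single volatile read.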
@@ -1750,7 +1750,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
* <code>.google.protobuf.Timestamp time_of_last_deploy = 2;</code>
*/
private com.google.protobuf.SingleFieldBuilder<
com.google.protobuf.Timestamp, com.google.protobuf.Timestamp.Builder, com.google.protobuf.TimestampOrBuilder>
com.google.protobuf.Timestamp, com.google.protobuf.Timestamp.Builder, com.google.protobuf.TimestampOrBuilder>
internalGetTimeOfLastDeployFieldBuilder() {
if (timeOfLastDeployBuilder_ == null) {
timeOfLastDeployBuilder_ = new com.google.protobuf.SingleFieldBuilder<
@@ -2175,7 +2175,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
pageToken_ = s;
@@ -2195,7 +2195,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getPageTokenBytes() {
java.lang.Object ref = pageToken_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
pageToken_ = b;
@@ -2246,7 +2246,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
if (rootCase_ == 4) {
@@ -2266,7 +2266,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
ref = root_;
}
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
if (rootCase_ == 4) {
@@ -2298,7 +2298,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
if (rootCase_ == 5) {
@@ -2318,7 +2318,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
ref = root_;
}
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
if (rootCase_ == 5) {
@@ -2483,7 +2483,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
tagNameGlob_ = s;
@@ -2503,7 +2503,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getTagNameGlobBytes() {
java.lang.Object ref = tagNameGlob_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
tagNameGlob_ = b;
@@ -3328,7 +3328,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getPageTokenBytes() {
java.lang.Object ref = pageToken_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
pageToken_ = b;
@@ -3471,7 +3471,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
ref = root_;
}
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
if (rootCase_ == 4) {
@@ -3564,7 +3564,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
ref = root_;
}
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
if (rootCase_ == 5) {
@@ -3768,7 +3768,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
* <code>.google.protobuf.Int32Value max_depth = 6;</code>
*/
private com.google.protobuf.SingleFieldBuilder<
com.google.protobuf.Int32Value, com.google.protobuf.Int32Value.Builder, com.google.protobuf.Int32ValueOrBuilder>
com.google.protobuf.Int32Value, com.google.protobuf.Int32Value.Builder, com.google.protobuf.Int32ValueOrBuilder>
internalGetMaxDepthFieldBuilder() {
if (maxDepthBuilder_ == null) {
maxDepthBuilder_ = new com.google.protobuf.SingleFieldBuilder<
@@ -4073,7 +4073,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getTagNameGlobBytes() {
java.lang.Object ref = tagNameGlob_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
tagNameGlob_ = b;
@@ -4334,7 +4334,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
/**
* <code>repeated .galaxy_repository.v1.GalaxyObject objects = 1;</code>
*/
java.util.List<galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObject>
java.util.List<galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObject>
getObjectsList();
/**
* <code>repeated .galaxy_repository.v1.GalaxyObject objects = 1;</code>
@@ -4347,7 +4347,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
/**
* <code>repeated .galaxy_repository.v1.GalaxyObject objects = 1;</code>
*/
java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObjectOrBuilder>
java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObjectOrBuilder>
getObjectsOrBuilderList();
/**
* <code>repeated .galaxy_repository.v1.GalaxyObject objects = 1;</code>
@@ -4438,7 +4438,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
* <code>repeated .galaxy_repository.v1.GalaxyObject objects = 1;</code>
*/
@java.lang.Override
public java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObjectOrBuilder>
public java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObjectOrBuilder>
getObjectsOrBuilderList() {
return objects_;
}
@@ -4482,7 +4482,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
nextPageToken_ = s;
@@ -4502,7 +4502,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getNextPageTokenBytes() {
java.lang.Object ref = nextPageToken_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
nextPageToken_ = b;
@@ -4834,7 +4834,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
objectsBuilder_ = null;
objects_ = other.objects_;
bitField0_ = (bitField0_ & ~0x00000001);
objectsBuilder_ =
objectsBuilder_ =
com.google.protobuf.GeneratedMessage.alwaysUseFieldBuilders ?
internalGetObjectsFieldBuilder() : null;
} else {
@@ -5111,7 +5111,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
/**
* <code>repeated .galaxy_repository.v1.GalaxyObject objects = 1;</code>
*/
public java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObjectOrBuilder>
public java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObjectOrBuilder>
getObjectsOrBuilderList() {
if (objectsBuilder_ != null) {
return objectsBuilder_.getMessageOrBuilderList();
@@ -5137,12 +5137,12 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
/**
* <code>repeated .galaxy_repository.v1.GalaxyObject objects = 1;</code>
*/
public java.util.List<galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObject.Builder>
public java.util.List<galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObject.Builder>
getObjectsBuilderList() {
return internalGetObjectsFieldBuilder().getBuilderList();
}
private com.google.protobuf.RepeatedFieldBuilder<
galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObject, galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObject.Builder, galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObjectOrBuilder>
galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObject, galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObject.Builder, galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyObjectOrBuilder>
internalGetObjectsFieldBuilder() {
if (objectsBuilder_ == null) {
objectsBuilder_ = new com.google.protobuf.RepeatedFieldBuilder<
@@ -5189,7 +5189,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getNextPageTokenBytes() {
java.lang.Object ref = nextPageToken_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
nextPageToken_ = b;
@@ -5924,7 +5924,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
* <code>.google.protobuf.Timestamp last_seen_deploy_time = 1;</code>
*/
private com.google.protobuf.SingleFieldBuilder<
com.google.protobuf.Timestamp, com.google.protobuf.Timestamp.Builder, com.google.protobuf.TimestampOrBuilder>
com.google.protobuf.Timestamp, com.google.protobuf.Timestamp.Builder, com.google.protobuf.TimestampOrBuilder>
internalGetLastSeenDeployTimeFieldBuilder() {
if (lastSeenDeployTimeBuilder_ == null) {
lastSeenDeployTimeBuilder_ = new com.google.protobuf.SingleFieldBuilder<
@@ -6871,7 +6871,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
* <code>.google.protobuf.Timestamp observed_at = 2;</code>
*/
private com.google.protobuf.SingleFieldBuilder<
com.google.protobuf.Timestamp, com.google.protobuf.Timestamp.Builder, com.google.protobuf.TimestampOrBuilder>
com.google.protobuf.Timestamp, com.google.protobuf.Timestamp.Builder, com.google.protobuf.TimestampOrBuilder>
internalGetObservedAtFieldBuilder() {
if (observedAtBuilder_ == null) {
observedAtBuilder_ = new com.google.protobuf.SingleFieldBuilder<
@@ -7028,7 +7028,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
* <code>.google.protobuf.Timestamp time_of_last_deploy = 3;</code>
*/
private com.google.protobuf.SingleFieldBuilder<
com.google.protobuf.Timestamp, com.google.protobuf.Timestamp.Builder, com.google.protobuf.TimestampOrBuilder>
com.google.protobuf.Timestamp, com.google.protobuf.Timestamp.Builder, com.google.protobuf.TimestampOrBuilder>
internalGetTimeOfLastDeployFieldBuilder() {
if (timeOfLastDeployBuilder_ == null) {
timeOfLastDeployBuilder_ = new com.google.protobuf.SingleFieldBuilder<
@@ -7286,7 +7286,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
/**
* <code>repeated .galaxy_repository.v1.GalaxyAttribute attributes = 10;</code>
*/
java.util.List<galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttribute>
java.util.List<galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttribute>
getAttributesList();
/**
* <code>repeated .galaxy_repository.v1.GalaxyAttribute attributes = 10;</code>
@@ -7299,7 +7299,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
/**
* <code>repeated .galaxy_repository.v1.GalaxyAttribute attributes = 10;</code>
*/
java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttributeOrBuilder>
java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttributeOrBuilder>
getAttributesOrBuilderList();
/**
* <code>repeated .galaxy_repository.v1.GalaxyAttribute attributes = 10;</code>
@@ -7374,7 +7374,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
tagName_ = s;
@@ -7390,7 +7390,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getTagNameBytes() {
java.lang.Object ref = tagName_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
tagName_ = b;
@@ -7413,7 +7413,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
containedName_ = s;
@@ -7429,7 +7429,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getContainedNameBytes() {
java.lang.Object ref = containedName_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
containedName_ = b;
@@ -7452,7 +7452,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
browseName_ = s;
@@ -7468,7 +7468,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getBrowseNameBytes() {
java.lang.Object ref = browseName_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
browseName_ = b;
@@ -7573,7 +7573,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
* <code>repeated .galaxy_repository.v1.GalaxyAttribute attributes = 10;</code>
*/
@java.lang.Override
public java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttributeOrBuilder>
public java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttributeOrBuilder>
getAttributesOrBuilderList() {
return attributes_;
}
@@ -8059,7 +8059,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
attributesBuilder_ = null;
attributes_ = other.attributes_;
bitField0_ = (bitField0_ & ~0x00000200);
attributesBuilder_ =
attributesBuilder_ =
com.google.protobuf.GeneratedMessage.alwaysUseFieldBuilders ?
internalGetAttributesFieldBuilder() : null;
} else {
@@ -8226,7 +8226,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getTagNameBytes() {
java.lang.Object ref = tagName_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
tagName_ = b;
@@ -8298,7 +8298,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getContainedNameBytes() {
java.lang.Object ref = containedName_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
containedName_ = b;
@@ -8370,7 +8370,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getBrowseNameBytes() {
java.lang.Object ref = browseName_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
browseName_ = b;
@@ -8851,7 +8851,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
/**
* <code>repeated .galaxy_repository.v1.GalaxyAttribute attributes = 10;</code>
*/
public java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttributeOrBuilder>
public java.util.List<? extends galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttributeOrBuilder>
getAttributesOrBuilderList() {
if (attributesBuilder_ != null) {
return attributesBuilder_.getMessageOrBuilderList();
@@ -8877,12 +8877,12 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
/**
* <code>repeated .galaxy_repository.v1.GalaxyAttribute attributes = 10;</code>
*/
public java.util.List<galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttribute.Builder>
public java.util.List<galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttribute.Builder>
getAttributesBuilderList() {
return internalGetAttributesFieldBuilder().getBuilderList();
}
private com.google.protobuf.RepeatedFieldBuilder<
galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttribute, galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttribute.Builder, galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttributeOrBuilder>
galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttribute, galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttribute.Builder, galaxy_repository.v1.GalaxyRepositoryOuterClass.GalaxyAttributeOrBuilder>
internalGetAttributesFieldBuilder() {
if (attributesBuilder_ == null) {
attributesBuilder_ = new com.google.protobuf.RepeatedFieldBuilder<
@@ -9088,7 +9088,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
attributeName_ = s;
@@ -9104,7 +9104,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getAttributeNameBytes() {
java.lang.Object ref = attributeName_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
attributeName_ = b;
@@ -9127,7 +9127,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
fullTagReference_ = s;
@@ -9143,7 +9143,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getFullTagReferenceBytes() {
java.lang.Object ref = fullTagReference_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
fullTagReference_ = b;
@@ -9177,7 +9177,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
if (ref instanceof java.lang.String) {
return (java.lang.String) ref;
} else {
com.google.protobuf.ByteString bs =
com.google.protobuf.ByteString bs =
(com.google.protobuf.ByteString) ref;
java.lang.String s = bs.toStringUtf8();
dataTypeName_ = s;
@@ -9193,7 +9193,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getDataTypeNameBytes() {
java.lang.Object ref = dataTypeName_;
if (ref instanceof java.lang.String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
dataTypeName_ = b;
@@ -9835,7 +9835,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getAttributeNameBytes() {
java.lang.Object ref = attributeName_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
attributeName_ = b;
@@ -9907,7 +9907,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getFullTagReferenceBytes() {
java.lang.Object ref = fullTagReference_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
fullTagReference_ = b;
@@ -10011,7 +10011,7 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
getDataTypeNameBytes() {
java.lang.Object ref = dataTypeName_;
if (ref instanceof String) {
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString b =
com.google.protobuf.ByteString.copyFromUtf8(
(java.lang.String) ref);
dataTypeName_ = b;
@@ -10335,52 +10335,52 @@ public final class GalaxyRepositoryOuterClass extends com.google.protobuf.Genera
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_TestConnectionRequest_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_TestConnectionRequest_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_TestConnectionReply_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_TestConnectionReply_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_GetLastDeployTimeRequest_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_GetLastDeployTimeRequest_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_GetLastDeployTimeReply_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_GetLastDeployTimeReply_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_DiscoverHierarchyRequest_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_DiscoverHierarchyRequest_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_DiscoverHierarchyReply_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_DiscoverHierarchyReply_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_WatchDeployEventsRequest_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_WatchDeployEventsRequest_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_DeployEvent_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_DeployEvent_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_GalaxyObject_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_GalaxyObject_fieldAccessorTable;
private static final com.google.protobuf.Descriptors.Descriptor
internal_static_galaxy_repository_v1_GalaxyAttribute_descriptor;
private static final
private static final
com.google.protobuf.GeneratedMessage.FieldAccessorTable
internal_static_galaxy_repository_v1_GalaxyAttribute_fieldAccessorTable;
File diff suppressed because it is too large
+35
@@ -95,6 +95,22 @@ async with await GatewayClient.connect(
events available for parity tests. `Session` helpers call the method-specific
MXAccess commands and preserve raw replies on typed command exceptions.
`*_raw` methods (`GatewayClient.invoke_raw`, `Session.invoke_raw`) surface
gateway protocol failures by raising the typed `MxGateway*` exceptions, but
they deliberately do **not** run MXAccess-failure detection: an MXAccess
HRESULT or `MxStatusProxy` status failure is left embedded in the returned
reply and no `MxAccessError` is raised. `Session.invoke` adds that check on
top. Parity-test callers using `invoke_raw` must inspect the reply's
`protocol_status`, `hresult`, and `statuses` themselves. The non-raw `Session`
helpers (`register`, `add_item`, `write`, the bulk methods, etc.) run the
check and raise `MxAccessError`.
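A parity-test caller using `invoke_raw` might run its own inspection along these lines. This is a minimal sketch, not the client's implementation: `describe_raw_failures` is a hypothetical helper, the `SimpleNamespace` objects stand in for real `MxCommandReply` messages, and treating a negative `hresult` as failure is the usual COM convention, assumed here. The sketch also ignores the status category, which a fuller check would consult.

```python
from types import SimpleNamespace


def describe_raw_failures(reply):
    """Collect human-readable MXAccess failure notes from a raw reply.

    Hypothetical helper: `reply` is duck-typed with an optional integer
    `hresult` and a `statuses` list whose items carry a `success` flag.
    """
    problems = []
    hresult = getattr(reply, "hresult", None)
    # Assumption: COM convention, where a negative HRESULT signals failure.
    if hresult is not None and hresult < 0:
        problems.append(f"hresult=0x{hresult & 0xFFFFFFFF:08X}")
    for index, status in enumerate(reply.statuses):
        if status.success == 0:
            problems.append(f"statuses[{index}] reports failure")
    return problems


# Stand-in replies; a real caller would pass the result of invoke_raw.
ok_reply = SimpleNamespace(hresult=0, statuses=[SimpleNamespace(success=1)])
bad_reply = SimpleNamespace(
    hresult=-2147467259,  # 0x80004005 (E_FAIL) as a signed 32-bit value
    statuses=[SimpleNamespace(success=1), SimpleNamespace(success=0)],
)
```

Returning a list instead of raising keeps the raw reply available to the parity test, which is the reason to call `invoke_raw` in the first place.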
Value conversion (`to_mx_value`, used by `Session.write`/`write2` and the
bulk helpers) rejects non-finite floats — `nan`, `inf`, and `-inf` raise
`ValueError` rather than being forwarded to MXAccess, which has no defined
wire representation for them. Python `bytes` values are an opaque
`VT_RECORD` pass-through that MXAccess does not interpret.
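The non-finite check can be pictured as a small guard in front of the conversion. The sketch below is an illustration only: `reject_non_finite` is a hypothetical name and is not the body of `to_mx_value`; it relies on nothing beyond the standard library's `math.isfinite`.

```python
import math


def reject_non_finite(value):
    """Return `value` unchanged, or raise ValueError for nan/inf/-inf floats."""
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError(f"non-finite float {value!r} cannot be sent to MXAccess")
    return value
```

Non-float values such as `int`, `str`, and `bytes` pass through this guard untouched, so it only narrows the float case.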
Canceling a Python task cancels the client-side gRPC call or stream wait. It
does not abort an in-flight MXAccess COM call inside the worker process.
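The client-side half of that statement can be seen with plain `asyncio`. In this sketch `fake_event_stream` is a hypothetical stand-in for a gateway stream; it shows only that cancelling the task ends the local wait, and it does not model the worker process at all.

```python
import asyncio


async def fake_event_stream():
    """Hypothetical stand-in for a gateway event stream that never ends."""
    while True:
        await asyncio.sleep(0.01)
        yield "event"


async def consume(received):
    # Collect events until the surrounding task is cancelled.
    async for event in fake_event_stream():
        received.append(event)


async def cancel_after_first_events():
    received = []
    task = asyncio.create_task(consume(received))
    await asyncio.sleep(0.05)  # let a few events arrive
    task.cancel()  # stops the client-side wait only
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled(), len(received)


cancelled, count = asyncio.run(cancel_after_first_events())
```

Any work already handed to the remote side is outside the reach of `task.cancel()`, which is why callers should not treat cancellation as a rollback.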
@@ -131,6 +147,25 @@ The methods return native Python types (`bool`, `datetime | None`, and a
into the hierarchy without learning the underlying stub class. The
service requires the `metadata:read` scope on the API key.
`discover_hierarchy` buffers every object (with its full attribute list)
into a single in-memory `list`. For a large Galaxy use `iter_hierarchy`
instead — it is an async generator that fetches one page at a time and
yields objects as they arrive, so peak memory stays bounded by a single
page rather than the whole hierarchy:
```python
async with await GalaxyRepositoryClient.connect(
endpoint="localhost:5000",
api_key="<gateway-api-key>",
plaintext=True,
) as galaxy:
async for obj in galaxy.iter_hierarchy():
print(obj.tag_name, obj.contained_name)
```
Pages are fetched lazily: the next page is only requested once the
caller has consumed every object from the current page.
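The lazy-paging behavior can be sketched without a gateway. Everything below is hypothetical scaffolding (`PAGES`, `fetch_page`, `iter_pages`); it mirrors the shape described above, where the next page is requested only after the current page is exhausted, and records each fetch so the ordering is visible.

```python
import asyncio

# token -> (objects on that page, next page token); "" marks the first and last page
PAGES = {"": (["a", "b"], "p2"), "p2": (["c"], "")}
fetch_log = []


async def fetch_page(page_token):
    """Hypothetical page fetch; records each request so laziness is visible."""
    fetch_log.append(page_token)
    return PAGES[page_token]


async def iter_pages():
    page_token = ""
    while True:
        objects, page_token = await fetch_page(page_token)
        for obj in objects:
            yield obj
        if not page_token:
            return


async def main():
    seen = []
    async for obj in iter_pages():
        # Pair each object with the number of pages fetched so far.
        seen.append((obj, len(fetch_log)))
    return seen


result = asyncio.run(main())
```

The second page is fetched only when the loop asks for the object after `"b"`, so a caller that stops early never pays for later pages.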
### Watching deploy events
`GalaxyRepositoryClient.watch_deploy_events` opens a server-streaming
+1 -1
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "mxaccess-gateway-client"
version = "0.1.0"
description = "Async Python client scaffold for MXAccess Gateway."
description = "Async Python client for MXAccess Gateway."
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
+56 -23
@@ -72,14 +72,20 @@ class GatewayClient:
await self.close()
async def close(self) -> None:
"""Close the owned gRPC channel."""
"""Close the owned gRPC channel.
Idempotent, including under concurrent calls: ``_closed`` is set
before the ``await`` so a second coroutine entering ``close()``
while the first is still awaiting the channel close returns
immediately instead of issuing a second ``channel.close()``.
"""
if self._closed:
return
self._closed = True
if self._channel is not None:
await self._channel.close()
self._closed = True
async def open_session(
self,
@@ -117,7 +123,15 @@ class GatewayClient:
return reply
async def invoke_raw(self, request: pb.MxCommandRequest) -> pb.MxCommandReply:
"""Send an `Invoke` RPC and return the raw reply."""
"""Send an `Invoke` RPC and return the raw reply.
Enforces gateway protocol success only. MXAccess HRESULT/status
failures are left embedded in the reply and do not raise
`MxAccessError`; parity-test callers must inspect the reply's
`protocol_status`, `hresult`, and `statuses` themselves. Use
`Session.invoke` for the variant that also raises on MXAccess
failure.
"""
reply = await self._unary("invoke", self.raw_stub.Invoke, request)
ensure_protocol_success("invoke", reply.protocol_status, reply)
return reply
@@ -133,8 +147,8 @@ class GatewayClient:
kwargs: dict[str, Any] = {"metadata": merge_metadata(self.options.api_key, metadata)}
if self.options.stream_timeout is not None:
kwargs["timeout"] = self.options.stream_timeout
call = self.raw_stub.StreamEvents(request, **kwargs)
return _canceling_iterator(call)
call = _open_stream(self.raw_stub.StreamEvents, request, kwargs)
return _canceling_iterator(call, "stream events")
async def acknowledge_alarm(
self,
@@ -169,8 +183,8 @@ class GatewayClient:
kwargs: dict[str, Any] = {"metadata": merge_metadata(self.options.api_key, metadata)}
if self.options.stream_timeout is not None:
kwargs["timeout"] = self.options.stream_timeout
call = self.raw_stub.QueryActiveAlarms(request, **kwargs)
return _canceling_active_alarms_iterator(call)
call = _open_stream(self.raw_stub.QueryActiveAlarms, request, kwargs)
return _canceling_iterator(call, "query active alarms")
async def _unary(
self,
@@ -201,24 +215,43 @@ class GatewayClient:
raise map_rpc_error(operation, error) from error
async def _canceling_iterator(call: Any) -> AsyncIterator[pb.MxEvent]:
def _open_stream(method: Any, request: Any, kwargs: dict[str, Any]) -> Any:
"""Open a server-streaming call, dropping ``timeout`` if the stub rejects it.
Mirrors the fallback in ``_unary`` so an older or fake stub that does not
accept a ``timeout`` keyword argument does not crash when ``stream_timeout``
is configured.
"""
try:
async for event in call:
yield event
return method(request, **kwargs)
except TypeError as error:
if "timeout" not in kwargs or "unexpected keyword argument 'timeout'" not in str(error):
raise
kwargs.pop("timeout")
return method(request, **kwargs)
async def _canceling_iterator(call: Any, operation: str) -> AsyncIterator[Any]:
"""Yield from a server-streaming call and cancel it when iteration stops.
Explicitly catches :class:`asyncio.CancelledError` to cancel the
underlying call before re-raising, then repeats the cancel in the
``finally`` block so the call is also cancelled on a clean break or an
``aclose()``. ``galaxy._canceling_iterator`` delegates here so the
gateway and Galaxy stream helpers stay identical.
"""
try:
async for item in call:
yield item
except asyncio.CancelledError:
cancel = getattr(call, "cancel", None)
if cancel is not None:
cancel()
raise
except grpc.RpcError as error:
raise map_rpc_error("stream events", error) from error
finally:
cancel = getattr(call, "cancel", None)
if cancel is not None:
cancel()
async def _canceling_active_alarms_iterator(call: Any) -> AsyncIterator[pb.ActiveAlarmSnapshot]:
try:
async for snapshot in call:
yield snapshot
except grpc.RpcError as error:
raise map_rpc_error("query active alarms", error) from error
raise map_rpc_error(operation, error) from error
finally:
cancel = getattr(call, "cancel", None)
if cancel is not None:
+23 -1
@@ -138,7 +138,7 @@ def ensure_mxaccess_success(operation: str, reply: pb.MxCommandReply) -> pb.MxCo
)
for mx_status in reply.statuses:
if mx_status.success == 0:
if _is_mxaccess_status_failure(mx_status):
raise MxAccessError(
_mxaccess_message(operation, reply),
protocol_status=status,
@@ -148,6 +148,28 @@ def ensure_mxaccess_success(operation: str, reply: pb.MxCommandReply) -> pb.MxCo
return reply
def _is_mxaccess_status_failure(mx_status: pb.MxStatusProxy) -> bool:
"""Return ``True`` only for a populated MXAccess status reporting failure.
MXAccess uses ``success == 0`` as the failure flag, but ``0`` is also the
proto3 scalar default. The gateway emits placeholder ``MxStatusProxy``
entries with ``success`` unset for null ``MXSTATUS_PROXY`` COM entries
(see ``MxStatusProxyConverter.ConvertMany``); such an entry has
``category`` of ``UNSPECIFIED`` or ``UNKNOWN``. Treating it as a failure
would raise ``MxAccessError`` for a reply that carries no real failure,
so failure is keyed on ``success == 0`` together with a populated,
non-OK status category.
"""
if mx_status.success != 0:
return False
return mx_status.category not in (
pb.MX_STATUS_CATEGORY_UNSPECIFIED,
pb.MX_STATUS_CATEGORY_UNKNOWN,
pb.MX_STATUS_CATEGORY_OK,
)
def _mxaccess_message(operation: str, reply: pb.MxCommandReply) -> str:
status_text = reply.protocol_status.message or "MXAccess command failed"
hresult = reply.hresult if reply.HasField("hresult") else None
+33 -25
@@ -18,6 +18,7 @@ import grpc
from google.protobuf.timestamp_pb2 import Timestamp
from .auth import merge_metadata
from .client import _canceling_iterator
from .errors import MxGatewayError, map_rpc_error
from .generated import galaxy_repository_pb2 as galaxy_pb
from .generated import galaxy_repository_pb2_grpc as galaxy_pb_grpc
@@ -83,14 +84,20 @@ class GalaxyRepositoryClient:
await self.close()
async def close(self) -> None:
"""Close the owned gRPC channel."""
"""Close the owned gRPC channel.
Idempotent, including under concurrent calls: ``_closed`` is set
before the ``await`` so a second coroutine entering ``close()``
while the first is still awaiting the channel close returns
immediately instead of issuing a second ``channel.close()``.
"""
if self._closed:
return
self._closed = True
if self._channel is not None:
await self._channel.close()
self._closed = True
async def test_connection(self) -> bool:
"""Return ``True`` when the gateway can reach the Galaxy Repository DB."""
@@ -114,10 +121,17 @@ class GalaxyRepositoryClient:
return None
return reply.time_of_last_deploy.ToDatetime()
async def discover_hierarchy(self) -> list[galaxy_pb.GalaxyObject]:
"""Return the deployed Galaxy object hierarchy as raw proto messages."""
async def iter_hierarchy(self) -> AsyncIterator[galaxy_pb.GalaxyObject]:
"""Yield the deployed Galaxy object hierarchy one object at a time.
Pages are fetched lazily: a page is only requested once the caller has
consumed every object from the previous page. This keeps peak memory
bounded by a single page (``_DISCOVER_HIERARCHY_PAGE_SIZE`` objects)
rather than the whole Galaxy. Use this for large Galaxies; use
:meth:`discover_hierarchy` when a fully buffered ``list`` is convenient
and the Galaxy is known to be small.
"""
objects: list[galaxy_pb.GalaxyObject] = []
seen_page_tokens: set[str] = set()
page_token = ""
while True:
@@ -129,16 +143,27 @@ class GalaxyRepositoryClient:
page_token=page_token,
),
)
objects.extend(reply.objects)
for obj in reply.objects:
yield obj
page_token = reply.next_page_token
if not page_token:
return objects
return
if page_token in seen_page_tokens:
raise MxGatewayError(
f"galaxy discover hierarchy returned repeated page token {page_token!r}"
)
seen_page_tokens.add(page_token)
async def discover_hierarchy(self) -> list[galaxy_pb.GalaxyObject]:
"""Return the deployed Galaxy object hierarchy as raw proto messages.
This buffers every object (and its full attribute list) into a single
in-memory ``list``. For a large Galaxy prefer :meth:`iter_hierarchy`,
which streams objects page by page without holding the whole hierarchy.
"""
return [obj async for obj in self.iter_hierarchy()]
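The lazy paging loop can be sketched independently of the Galaxy stub. The example below is a simplified illustration, assuming a hypothetical `fetch` callable that returns `(items, next_page_token)`; it is not the client's actual RPC signature. It keeps the same two properties: the next page is requested only after the current page is fully consumed, and a repeated token raises instead of looping forever.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable

# Hypothetical page shape: (items, next_page_token).
Page = tuple[list[str], str]


async def iter_pages(fetch: Callable[[str], Awaitable[Page]]) -> AsyncIterator[str]:
    seen_tokens: set[str] = set()
    token = ""
    while True:
        items, token = await fetch(token)
        for item in items:
            yield item
        if not token:
            return
        if token in seen_tokens:
            raise RuntimeError(f"repeated page token {token!r}")
        seen_tokens.add(token)


async def demo() -> tuple[list[str], list[str]]:
    pages: dict[str, Page] = {
        "": (["Area_001", "Pump_001"], "page-2"),
        "page-2": (["Pump_002"], ""),
    }
    requested: list[str] = []

    async def fetch(token: str) -> Page:
        requested.append(token)
        return pages[token]

    items = [item async for item in iter_pages(fetch)]
    return items, requested
```

A buffered wrapper in the style of `discover_hierarchy` is then a one-line list comprehension over the generator, as in the diff above.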
def watch_deploy_events(
self,
last_seen_deploy_time: datetime | None = None,
@@ -171,7 +196,7 @@ class GalaxyRepositoryClient:
kwargs.pop("timeout")
call = self.raw_stub.WatchDeployEvents(request, **kwargs)
return _canceling_iterator(call)
return _canceling_iterator(call, "watch deploy events")
async def _unary(
self,
@@ -200,20 +225,3 @@ class GalaxyRepositoryClient:
raise
except grpc.RpcError as error:
raise map_rpc_error(operation, error) from error
async def _canceling_iterator(call: Any) -> AsyncIterator[galaxy_pb.DeployEvent]:
try:
async for event in call:
yield event
except asyncio.CancelledError:
cancel = getattr(call, "cancel", None)
if cancel is not None:
cancel()
raise
except grpc.RpcError as error:
raise map_rpc_error("watch deploy events", error) from error
finally:
cancel = getattr(call, "cancel", None)
if cancel is not None:
cancel()
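The removed module-local helper is replaced by a shared one that takes the operation label as a parameter. The sketch below approximates that shape under stated assumptions: `FakeCall` is a hypothetical stream object, and `ValueError`/`RuntimeError` stand in for `grpc.RpcError` and `map_rpc_error`, whose real behaviour is not reproduced here.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any


class FakeCall:
    """Hypothetical stand-in for a gRPC streaming call object."""

    def __init__(self, items: list[int]) -> None:
        self._items = list(items)
        self.cancelled = False

    def __aiter__(self) -> "FakeCall":
        return self

    async def __anext__(self) -> int:
        if not self._items:
            raise StopAsyncIteration
        return self._items.pop(0)

    def cancel(self) -> None:
        self.cancelled = True


async def canceling_iterator(call: Any, operation: str) -> AsyncIterator[Any]:
    try:
        async for item in call:
            yield item
    except ValueError as error:
        # Stand-in for the grpc.RpcError branch; the label comes from the caller.
        raise RuntimeError(f"{operation} failed: {error}") from error
    finally:
        # Always release the underlying call, whether the stream ended,
        # failed, or the consumer stopped early.
        cancel = getattr(call, "cancel", None)
        if cancel is not None:
            cancel()


async def demo() -> tuple[list[int], bool]:
    call = FakeCall([1, 2])
    items = [item async for item in canceling_iterator(call, "watch deploy events")]
    return items, call.cancelled
```

Passing the label in, rather than hard-coding it, is what lets one helper serve several streaming RPCs while keeping the error messages specific.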
+22 -8
@@ -3,11 +3,15 @@
from __future__ import annotations
from collections.abc import AsyncIterator, Sequence
from typing import TYPE_CHECKING
from .errors import ensure_mxaccess_success
from .generated import mxaccess_gateway_pb2 as pb
from .values import MxValueInput, to_mx_value
if TYPE_CHECKING:
from .client import GatewayClient
MAX_BULK_ITEMS = 1000
@@ -36,7 +40,13 @@ class Session:
await self.close()
async def close(self, *, client_correlation_id: str = "") -> pb.CloseSessionReply:
"""Close the gateway session. Repeated calls return a local closed reply."""
"""Close the gateway session. Repeated calls return a local closed reply.
Idempotent, including under concurrent calls: ``_closed`` is set
before the ``CloseSession`` RPC is awaited so a second coroutine
entering ``close()`` while the first RPC is in flight returns the
local closed reply instead of issuing a second ``CloseSession``.
"""
if self._closed:
return pb.CloseSessionReply(
@@ -44,15 +54,14 @@ class Session:
final_state=pb.SESSION_STATE_CLOSED,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
)
self._closed = True
reply = await self.client.close_session_raw(
return await self.client.close_session_raw(
pb.CloseSessionRequest(
session_id=self.session_id,
client_correlation_id=client_correlation_id,
),
)
self._closed = True
return reply
async def invoke(self, command: pb.MxCommand, *, correlation_id: str = "") -> pb.MxCommandReply:
"""Invoke a raw command and enforce gateway and MXAccess success."""
@@ -66,7 +75,15 @@ class Session:
*,
correlation_id: str = "",
) -> pb.MxCommandReply:
"""Invoke a raw command and preserve the raw reply."""
"""Invoke a raw command and preserve the raw reply.
Enforces gateway protocol success only; unlike :meth:`invoke`, it
does not run MXAccess-failure detection. An MXAccess HRESULT or
``MxStatusProxy`` status failure is left embedded in the returned
reply and no ``MxAccessError`` is raised. Parity-test callers must
inspect ``protocol_status``, ``hresult``, and ``statuses`` on the
reply themselves.
"""
return await self.client.invoke_raw(
pb.MxCommandRequest(
@@ -399,6 +416,3 @@ class Session:
def _ensure_bulk_size(name: str, count: int) -> None:
if count > MAX_BULK_ITEMS:
raise ValueError(f"{name} bulk commands are limited to {MAX_BULK_ITEMS} item(s)")
from .client import GatewayClient # noqa: E402
+26 -1
@@ -1,7 +1,20 @@
"""MXAccess value conversion helpers."""
"""MXAccess value conversion helpers.
Value-mapping assumptions (see ``to_mx_value``):
* A Python ``float`` maps to ``VT_R8`` / ``MX_DATA_TYPE_DOUBLE``. Only finite
values are accepted; ``nan``, ``inf`` and ``-inf`` raise ``ValueError``
rather than being forwarded to MXAccess, which has no defined wire
representation for non-finite doubles.
* A Python ``bytes`` value maps to ``VT_RECORD`` / ``MX_DATA_TYPE_UNKNOWN``
and is carried in ``raw_value``. This is an opaque pass-through: MXAccess
does not interpret the bytes. Pass ``data_type`` explicitly when a concrete
MXAccess type is required.
"""
from __future__ import annotations
import math
from collections.abc import Sequence
from dataclasses import dataclass
from datetime import datetime, timezone
@@ -60,6 +73,7 @@ def to_mx_value(value: MxValueInput, *, data_type: str | None = None) -> pb.MxVa
)
if isinstance(value, float):
_ensure_finite(value)
return pb.MxValue(
data_type=_data_type(data_type, pb.MX_DATA_TYPE_DOUBLE),
variant_type="VT_R8",
@@ -177,6 +191,8 @@ def _sequence_to_mx_value(
return pb.MxValue(data_type=pb.MX_DATA_TYPE_INTEGER, array_value=array)
if all(isinstance(item, float) for item in sequence):
for item in sequence:
_ensure_finite(item)
array = pb.MxArray(
element_data_type=pb.MX_DATA_TYPE_DOUBLE,
variant_type="VT_ARRAY|VT_R8",
@@ -232,3 +248,12 @@ def _data_type(name: str | None, default: int) -> int:
if name is None:
return default
return pb.MxDataType.Value(name)
def _ensure_finite(value: float) -> None:
"""Reject non-finite doubles, which MXAccess cannot represent on the wire."""
if not math.isfinite(value):
raise ValueError(
f"MxValue double inputs must be finite; got {value!r}",
)
+3 -8
@@ -18,6 +18,7 @@ from mxgateway.client import GatewayClient
from mxgateway.errors import MxGatewayError
from mxgateway.generated import mxaccess_gateway_pb2 as pb
from mxgateway.options import ClientOptions
from mxgateway.session import Session
from mxgateway.values import MxValueInput
MAX_AGGREGATE_EVENTS = 10_000
@@ -383,8 +384,7 @@ async def _write2(**kwargs: Any) -> dict[str, Any]:
async def _smoke(**kwargs: Any) -> dict[str, Any]:
async with await _connect(kwargs) as client:
session = await client.open_session(client_session_name=kwargs["client_name"])
closed = False
try:
async with session:
server_handle = await session.register(kwargs["client_name"])
item_handle = await session.add_item(server_handle, kwargs["item"])
await session.advise(server_handle, item_handle)
@@ -399,9 +399,6 @@ async def _smoke(**kwargs: Any) -> dict[str, Any]:
"itemHandle": item_handle,
"events": [_message_dict(event) for event in events],
}
finally:
if not closed:
await session.close()
async def _connect(kwargs: dict[str, Any]) -> GatewayClient:
@@ -419,9 +416,7 @@ async def _connect(kwargs: dict[str, Any]) -> GatewayClient:
)
def _session(client: GatewayClient, session_id: str):
from mxgateway.session import Session
def _session(client: GatewayClient, session_id: str) -> Session:
return Session(client=client, session_id=session_id)
+284
@@ -0,0 +1,284 @@
"""Regression tests for Client.Python-009: untested public paths.
Covers `Session.write2`/`add_item2` request construction, the bulk-size limit
guard, the ``None``-argument ``TypeError`` guards, the TLS ``ca_file`` read
path in `create_channel`, the generic `map_rpc_error` fallthrough, and a
happy-path CLI command body driven by a fake stub.
"""
from __future__ import annotations
import json
from datetime import datetime, timezone
from typing import Any
import grpc
import pytest
from click.testing import CliRunner
from mxgateway import ClientOptions, GatewayClient
from mxgateway.errors import MxGatewayTransportError, map_rpc_error
from mxgateway.generated import mxaccess_gateway_pb2 as pb
from mxgateway.options import create_channel
from mxgateway.session import MAX_BULK_ITEMS, Session
class _FakeUnary:
def __init__(self, replies: list[Any]) -> None:
self.replies = list(replies)
self.requests: list[Any] = []
self.metadata: tuple[tuple[str, str], ...] | None = None
async def __call__(
self,
request: Any,
*,
metadata: tuple[tuple[str, str], ...],
) -> Any:
self.requests.append(request)
self.metadata = metadata
return self.replies.pop(0)
class _FakeGatewayStub:
def __init__(self) -> None:
self.open_session = _FakeUnary(
[
pb.OpenSessionReply(
session_id="session-1",
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
),
],
)
self.invoke = _FakeUnary([])
self.OpenSession = self.open_session
self.Invoke = self.invoke
def _ok_reply(kind: int, **fields: Any) -> pb.MxCommandReply:
return pb.MxCommandReply(
session_id="session-1",
kind=kind,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
**fields,
)
# --- write2 / add_item2 request construction -------------------------------
@pytest.mark.asyncio
async def test_add_item2_sends_item_context_and_returns_handle() -> None:
stub = _FakeGatewayStub()
stub.invoke.replies = [
_ok_reply(pb.MX_COMMAND_KIND_ADD_ITEM2, add_item2=pb.AddItem2Reply(item_handle=77)),
]
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
item_handle = await session.add_item2(12, "Object.Attribute", "ctx-A")
assert item_handle == 77
command = stub.invoke.requests[0].command
assert command.kind == pb.MX_COMMAND_KIND_ADD_ITEM2
assert command.add_item2.server_handle == 12
assert command.add_item2.item_definition == "Object.Attribute"
assert command.add_item2.item_context == "ctx-A"
@pytest.mark.asyncio
async def test_write2_sends_value_and_timestamp_value() -> None:
stub = _FakeGatewayStub()
stub.invoke.replies = [_ok_reply(pb.MX_COMMAND_KIND_WRITE2)]
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
when = datetime(2025, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
await session.write2(12, 34, 123, when, user_id=5)
command = stub.invoke.requests[0].command
assert command.kind == pb.MX_COMMAND_KIND_WRITE2
assert command.write2.server_handle == 12
assert command.write2.item_handle == 34
assert command.write2.user_id == 5
# The integer value is carried as the int32 field of the MxValue oneof.
assert command.write2.value.WhichOneof("kind") == "int32_value"
assert command.write2.value.int32_value == 123
# The timestamp value carries the datetime via the timestamp_value oneof.
assert command.write2.timestamp_value.WhichOneof("kind") == "timestamp_value"
assert command.write2.timestamp_value.timestamp_value.ToDatetime(
tzinfo=timezone.utc,
) == when
# --- bulk-size limit + None-argument guards --------------------------------
@pytest.mark.asyncio
async def test_subscribe_bulk_rejects_oversized_request() -> None:
stub = _FakeGatewayStub()
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
oversized = [f"Tag_{i}" for i in range(MAX_BULK_ITEMS + 1)]
with pytest.raises(ValueError, match=str(MAX_BULK_ITEMS)):
await session.subscribe_bulk(12, oversized)
# No RPC should have been issued for a rejected request.
assert stub.invoke.requests == []
@pytest.mark.asyncio
async def test_advise_item_bulk_rejects_none_argument() -> None:
stub = _FakeGatewayStub()
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
with pytest.raises(TypeError, match="item_handles is required"):
await session.advise_item_bulk(12, None) # type: ignore[arg-type]
@pytest.mark.asyncio
async def test_add_item_bulk_at_limit_is_allowed() -> None:
stub = _FakeGatewayStub()
stub.invoke.replies = [
_ok_reply(
pb.MX_COMMAND_KIND_ADD_ITEM_BULK,
add_item_bulk=pb.BulkSubscribeReply(results=[]),
),
]
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
session = await client.open_session()
at_limit = [f"Tag_{i}" for i in range(MAX_BULK_ITEMS)]
results = await session.add_item_bulk(12, at_limit)
assert results == []
assert len(stub.invoke.requests) == 1
assert len(stub.invoke.requests[0].command.add_item_bulk.tag_addresses) == MAX_BULK_ITEMS
# --- TLS ca_file read path -------------------------------------------------
@pytest.mark.asyncio
async def test_create_channel_reads_ca_file(tmp_path: Any) -> None:
ca_path = tmp_path / "ca.pem"
ca_path.write_bytes(b"-----BEGIN CERTIFICATE-----\nfake\n-----END CERTIFICATE-----\n")
channel = create_channel(
ClientOptions(
endpoint="mxgateway.example.local:5001",
ca_file=str(ca_path),
server_name_override="mxgateway.example.local",
),
)
# A secure channel object is returned without raising; the ca_file was read.
assert channel is not None
await channel.close()
def test_create_channel_missing_ca_file_raises() -> None:
with pytest.raises(FileNotFoundError):
create_channel(
ClientOptions(
endpoint="mxgateway.example.local:5001",
ca_file="C:/does/not/exist/ca.pem",
),
)
# --- map_rpc_error generic fallthrough -------------------------------------
class _FakeRpcError(grpc.RpcError):
def __init__(self, code: grpc.StatusCode, details: str) -> None:
self._code = code
self._details = details
def code(self) -> grpc.StatusCode:
return self._code
def details(self) -> str:
return self._details
def test_map_rpc_error_generic_branch_returns_transport_error() -> None:
error = _FakeRpcError(grpc.StatusCode.UNAVAILABLE, "connection refused")
mapped = map_rpc_error("invoke", error)
assert type(mapped) is MxGatewayTransportError
assert "invoke failed: connection refused" in str(mapped)
def test_map_rpc_error_handles_error_without_code() -> None:
mapped = map_rpc_error("invoke", grpc.RpcError())
assert type(mapped) is MxGatewayTransportError
assert "invoke failed:" in str(mapped)
# --- happy-path CLI command body -------------------------------------------
def test_cli_register_happy_path_emits_server_handle(monkeypatch: Any) -> None:
"""Drive the `register` CLI command end to end against a fake stub."""
from mxgateway_cli import commands
invoke = _FakeUnary(
[
_ok_reply(
pb.MX_COMMAND_KIND_REGISTER,
register=pb.RegisterReply(server_handle=99),
),
],
)
class _Stub:
def __init__(self) -> None:
self.Invoke = invoke
async def _fake_connect(kwargs: dict[str, Any]) -> GatewayClient:
return await GatewayClient.connect(
ClientOptions(endpoint=kwargs["endpoint"], plaintext=True),
stub=_Stub(),
)
monkeypatch.setattr(commands, "_connect", _fake_connect)
runner = CliRunner()
result = runner.invoke(
commands.main,
[
"register",
"--endpoint",
"localhost:5000",
"--session-id",
"session-1",
"--client-name",
"pytest-client",
"--json",
],
)
assert result.exit_code == 0, result.output
assert json.loads(result.output) == {"serverHandle": 99}
assert invoke.requests[0].command.register.client_name == "pytest-client"
@@ -0,0 +1,127 @@
"""Regression tests for Client.Python-005: streaming hierarchy iteration.
`GalaxyRepositoryClient.iter_hierarchy` yields objects page by page instead of
buffering the entire Galaxy hierarchy in memory, and `discover_hierarchy`
remains a convenience wrapper built on top of it.
"""
from __future__ import annotations
from typing import Any
import pytest
from mxgateway import ClientOptions, GalaxyRepositoryClient
from mxgateway.generated import galaxy_repository_pb2 as galaxy_pb
class _FakeUnary:
def __init__(self, replies: list[Any]) -> None:
self.replies = list(replies)
self.requests: list[Any] = []
self.metadata: tuple[tuple[str, str], ...] | None = None
async def __call__(
self,
request: Any,
*,
metadata: tuple[tuple[str, str], ...],
timeout: float | None = None,
) -> Any:
self.requests.append(request)
self.metadata = metadata
return self.replies.pop(0)
class _FakeGalaxyStub:
def __init__(self, discover_replies: list[Any]) -> None:
self.DiscoverHierarchy = _FakeUnary(discover_replies)
def _two_page_replies() -> list[galaxy_pb.DiscoverHierarchyReply]:
return [
galaxy_pb.DiscoverHierarchyReply(
next_page_token="page-2",
total_object_count=3,
objects=[
galaxy_pb.GalaxyObject(gobject_id=1, tag_name="Area_001", is_area=True),
galaxy_pb.GalaxyObject(gobject_id=2, tag_name="Pump_001"),
],
),
galaxy_pb.DiscoverHierarchyReply(
total_object_count=3,
objects=[
galaxy_pb.GalaxyObject(gobject_id=3, tag_name="Pump_002"),
],
),
]
@pytest.mark.asyncio
async def test_iter_hierarchy_yields_objects_across_pages() -> None:
stub = _FakeGalaxyStub(_two_page_replies())
client = await GalaxyRepositoryClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
tags = [obj.tag_name async for obj in client.iter_hierarchy()]
assert tags == ["Area_001", "Pump_001", "Pump_002"]
assert len(stub.DiscoverHierarchy.requests) == 2
assert stub.DiscoverHierarchy.requests[0].page_token == ""
assert stub.DiscoverHierarchy.requests[1].page_token == "page-2"
@pytest.mark.asyncio
async def test_iter_hierarchy_is_lazy_and_does_not_prefetch_next_page() -> None:
"""Pulling only the first object must not have requested the second page."""
stub = _FakeGalaxyStub(_two_page_replies())
client = await GalaxyRepositoryClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
iterator = client.iter_hierarchy()
first = await iterator.__anext__()
assert first.tag_name == "Area_001"
# Only the first page should have been fetched so far.
assert len(stub.DiscoverHierarchy.requests) == 1
await iterator.aclose()
@pytest.mark.asyncio
async def test_iter_hierarchy_rejects_repeated_page_token() -> None:
stub = _FakeGalaxyStub(
[
galaxy_pb.DiscoverHierarchyReply(next_page_token="7:1"),
galaxy_pb.DiscoverHierarchyReply(next_page_token="7:1"),
],
)
client = await GalaxyRepositoryClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
with pytest.raises(Exception, match="repeated page token"):
async for _ in client.iter_hierarchy():
pass
@pytest.mark.asyncio
async def test_discover_hierarchy_still_returns_full_list() -> None:
"""The convenience wrapper must keep returning a buffered list."""
stub = _FakeGalaxyStub(_two_page_replies())
client = await GalaxyRepositoryClient.connect(
ClientOptions(endpoint="fake", plaintext=True),
stub=stub,
)
objects = await client.discover_hierarchy()
assert isinstance(objects, list)
assert [obj.tag_name for obj in objects] == ["Area_001", "Pump_001", "Pump_002"]
@@ -0,0 +1,228 @@
"""Regression tests for Client.Python low-severity code-review findings.
Covers Client.Python-006 (concurrent-close idempotency),
Client.Python-007 (shared cancelling stream helper),
Client.Python-008 (non-finite float / bytes value mapping), and
Client.Python-011 (`success == 0` proto3-default ambiguity).
"""
from __future__ import annotations
import asyncio
import math
from typing import Any
import pytest
from mxgateway import ClientOptions, GalaxyRepositoryClient, GatewayClient
from mxgateway.errors import ensure_mxaccess_success, MxAccessError
from mxgateway.generated import mxaccess_gateway_pb2 as pb
from mxgateway.values import to_mx_value
# --- Client.Python-006: concurrent close() is idempotent -------------------
class CountingChannel:
"""A fake gRPC channel that records and stalls on close()."""
def __init__(self) -> None:
self.close_calls = 0
self._gate = asyncio.Event()
async def close(self) -> None:
self.close_calls += 1
# Yield control so a second concurrent close() can interleave at the
# exact point a check-then-set guard would have left the window open.
await self._gate.wait()
@pytest.mark.asyncio
async def test_gateway_client_concurrent_close_closes_channel_once() -> None:
channel = CountingChannel()
client = GatewayClient(
options=ClientOptions(endpoint="fake", plaintext=True),
stub=object(),
channel=channel, # type: ignore[arg-type]
)
first = asyncio.create_task(client.close())
second = asyncio.create_task(client.close())
await asyncio.sleep(0) # let both coroutines pass the guard if racy
channel._gate.set()
await asyncio.gather(first, second)
assert channel.close_calls == 1
@pytest.mark.asyncio
async def test_galaxy_client_concurrent_close_closes_channel_once() -> None:
channel = CountingChannel()
client = GalaxyRepositoryClient(
options=ClientOptions(endpoint="fake", plaintext=True),
stub=object(),
channel=channel, # type: ignore[arg-type]
)
first = asyncio.create_task(client.close())
second = asyncio.create_task(client.close())
await asyncio.sleep(0)
channel._gate.set()
await asyncio.gather(first, second)
assert channel.close_calls == 1
@pytest.mark.asyncio
async def test_session_concurrent_close_sends_one_close_session_rpc() -> None:
gate = asyncio.Event()
rpc_calls = 0
class StallingClient:
async def close_session_raw(self, request: Any) -> pb.CloseSessionReply:
nonlocal rpc_calls
rpc_calls += 1
await gate.wait()
return pb.CloseSessionReply(
session_id=request.session_id,
final_state=pb.SESSION_STATE_CLOSED,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
)
from mxgateway.session import Session
session = Session(client=StallingClient(), session_id="session-1") # type: ignore[arg-type]
first = asyncio.create_task(session.close())
second = asyncio.create_task(session.close())
await asyncio.sleep(0)
gate.set()
await asyncio.gather(first, second)
assert rpc_calls == 1
# --- Client.Python-007: shared cancelling stream helper --------------------
@pytest.mark.asyncio
async def test_gateway_stream_iterator_cancels_call_on_task_cancellation() -> None:
"""A cancelled gateway stream iterator must explicitly cancel the call."""
class CancellableStream:
def __init__(self) -> None:
self.cancelled = False
def __aiter__(self) -> "CancellableStream":
return self
async def __anext__(self) -> pb.MxEvent:
await asyncio.Event().wait() # blocks until cancelled
raise AssertionError("unreachable")
def cancel(self) -> None:
self.cancelled = True
from mxgateway.client import _canceling_iterator
stream = CancellableStream()
iterator = _canceling_iterator(stream, "stream events")
task = asyncio.create_task(anext(iterator))
await asyncio.sleep(0)
task.cancel()
with pytest.raises(asyncio.CancelledError):
await task
# aclose() unwinds the generator's finally block.
await iterator.aclose()
assert stream.cancelled
# --- Client.Python-008: non-finite float and bytes value mapping -----------
def test_to_mx_value_rejects_nan() -> None:
with pytest.raises(ValueError, match="finite"):
to_mx_value(float("nan"))
def test_to_mx_value_rejects_positive_infinity() -> None:
with pytest.raises(ValueError, match="finite"):
to_mx_value(float("inf"))
def test_to_mx_value_rejects_negative_infinity() -> None:
with pytest.raises(ValueError, match="finite"):
to_mx_value(float("-inf"))
def test_to_mx_value_accepts_finite_float() -> None:
assert to_mx_value(3.5).double_value == 3.5
def test_to_mx_value_rejects_non_finite_float_in_sequence() -> None:
with pytest.raises(ValueError, match="finite"):
to_mx_value([1.0, math.inf])
# --- Client.Python-011: success == 0 proto3-default ambiguity --------------
def test_ensure_mxaccess_success_ignores_unpopulated_status_entry() -> None:
"""A status entry left at proto3 defaults is not a real MXAccess failure.
The gateway emits such a placeholder for a null MXSTATUS_PROXY COM entry
(``MxStatusProxyConverter.ConvertMany``): ``success`` stays 0 but the
entry carries no failure category. It must not raise ``MxAccessError``.
"""
reply = pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_SUBSCRIBE_BULK,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
statuses=[
pb.MxStatusProxy(), # all-default: success == 0, category UNSPECIFIED
pb.MxStatusProxy( # the gateway's null-entry placeholder
category=pb.MX_STATUS_CATEGORY_UNKNOWN,
detected_by=pb.MX_STATUS_SOURCE_UNKNOWN,
),
],
)
assert ensure_mxaccess_success("subscribe bulk", reply) is reply
def test_ensure_mxaccess_success_raises_on_populated_failure_status() -> None:
"""A populated failure status (success == 0 with a failure category) raises."""
reply = pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_WRITE,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
statuses=[
pb.MxStatusProxy(
success=0,
category=pb.MX_STATUS_CATEGORY_COMMUNICATION_ERROR,
),
],
)
with pytest.raises(MxAccessError):
ensure_mxaccess_success("write", reply)
def test_ensure_mxaccess_success_passes_when_status_reports_success() -> None:
reply = pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_WRITE,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
statuses=[
pb.MxStatusProxy(success=1, category=pb.MX_STATUS_CATEGORY_OK),
],
)
assert ensure_mxaccess_success("write", reply) is reply
@@ -0,0 +1,132 @@
"""Regression tests for Client.Python-003: stream timeout-kwarg fallback.
`stream_events_raw` and `query_active_alarms` must tolerate a fake/older stub
that does not accept a ``timeout`` keyword argument, matching the fallback
already present in `galaxy.watch_deploy_events` and the unary `_unary` helper.
"""
from __future__ import annotations
from typing import Any
import pytest
from mxgateway import ClientOptions, GatewayClient
from mxgateway.generated import mxaccess_gateway_pb2 as pb
class _NoTimeoutStream:
"""Sync-callable unary-stream fake that rejects a ``timeout`` kwarg."""
def __init__(self, replies: list[Any]) -> None:
self._replies = list(replies)
self.requests: list[Any] = []
self.metadata: tuple[tuple[str, str], ...] | None = None
self.cancelled = False
def __call__(
self,
request: Any,
*,
metadata: tuple[tuple[str, str], ...],
) -> "_NoTimeoutStream":
self.requests.append(request)
self.metadata = metadata
return self
def __aiter__(self) -> "_NoTimeoutStream":
return self
async def __anext__(self) -> Any:
if not self._replies:
raise StopAsyncIteration
return self._replies.pop(0)
def cancel(self) -> None:
self.cancelled = True
class _NoTimeoutStubStreamEvents:
def __init__(self, stream: _NoTimeoutStream) -> None:
self.StreamEvents = stream
class _NoTimeoutStubQueryAlarms:
def __init__(self, stream: _NoTimeoutStream) -> None:
self.QueryActiveAlarms = stream
@pytest.mark.asyncio
async def test_stream_events_raw_falls_back_when_stub_rejects_timeout() -> None:
stream = _NoTimeoutStream(
[pb.MxEvent(session_id="session-1", worker_sequence=1)],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True, stream_timeout=5.0),
stub=_NoTimeoutStubStreamEvents(stream),
)
received = [
event
async for event in client.stream_events_raw(
pb.StreamEventsRequest(session_id="session-1"),
)
]
assert len(received) == 1
assert received[0].worker_sequence == 1
@pytest.mark.asyncio
async def test_query_active_alarms_falls_back_when_stub_rejects_timeout() -> None:
stream = _NoTimeoutStream(
[pb.ActiveAlarmSnapshot(alarm_full_reference="Tank01.Level.HiHi")],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True, stream_timeout=5.0),
stub=_NoTimeoutStubQueryAlarms(stream),
)
received = [
snapshot
async for snapshot in client.query_active_alarms(
pb.QueryActiveAlarmsRequest(session_id="session-1"),
)
]
assert len(received) == 1
assert received[0].alarm_full_reference == "Tank01.Level.HiHi"
@pytest.mark.asyncio
async def test_stream_events_raw_still_passes_timeout_to_capable_stub() -> None:
"""A stub that accepts ``timeout`` must still receive the configured value."""
captured: dict[str, Any] = {}
class _CapableStream(_NoTimeoutStream):
def __call__( # type: ignore[override]
self,
request: Any,
*,
metadata: tuple[tuple[str, str], ...],
timeout: float | None = None,
) -> "_CapableStream":
captured["timeout"] = timeout
return super().__call__(request, metadata=metadata)
stream = _CapableStream([pb.MxEvent(session_id="session-1", worker_sequence=9)])
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", plaintext=True, stream_timeout=7.5),
stub=_NoTimeoutStubStreamEvents(stream),
)
received = [
event
async for event in client.stream_events_raw(
pb.StreamEventsRequest(session_id="session-1"),
)
]
assert len(received) == 1
assert captured["timeout"] == 7.5
+19 -14
@@ -11,28 +11,34 @@ generated contract inputs.
## Crate Layout
Recommended layout:
Actual layout — the `mxgateway-client` library crate is the workspace root,
with the `mxgw` test CLI as a workspace member:
```text
clients/rust/
clients/rust/ # `mxgateway-client` library crate (workspace root)
Cargo.toml
build.rs
src/
lib.rs
client.rs
session.rs
galaxy.rs
options.rs
auth.rs
value.rs
version.rs
error.rs
generated.rs
crates/
mxgateway-client/
src/lib.rs
src/client.rs
src/session.rs
src/options.rs
src/auth.rs
src/value.rs
src/error.rs
src/generated/
mxgw-cli/
mxgw-cli/ # `mxgw` test CLI (workspace member)
Cargo.toml
src/main.rs
tests/
client_behavior.rs
proto_fixtures.rs
```
Expected dependencies:
Dependencies:
- `tonic`
- `prost`
@@ -43,7 +49,6 @@ Expected dependencies:
- `clap`
- `serde`
- `serde_json`
- `tracing`
## Library API
+8 -2
@@ -1048,8 +1048,14 @@ mod tests {
fn version_json_output_has_protocol_versions() {
let value = super::version_json();
assert_eq!(value["gatewayProtocolVersion"], 2);
assert_eq!(value["workerProtocolVersion"], 1);
assert_eq!(
value["gatewayProtocolVersion"],
super::GATEWAY_PROTOCOL_VERSION
);
assert_eq!(
value["workerProtocolVersion"],
super::WORKER_PROTOCOL_VERSION
);
}
#[test]
+3 -1
@@ -219,7 +219,9 @@ impl GatewayClient {
request: AcknowledgeAlarmRequest,
) -> Result<AcknowledgeAlarmReply, Error> {
let mut client = self.inner.clone();
let response = client.acknowledge_alarm(self.unary_request(request)).await?;
let response = client
.acknowledge_alarm(self.unary_request(request))
.await?;
let reply = response.into_inner();
ensure_protocol_success("acknowledge alarm", reply.protocol_status.as_ref())?;
Ok(reply)
+28 -4
@@ -1,10 +1,10 @@
//! Error types surfaced by the Rust client.
//!
//! [`Error`] is the umbrella enum returned by every async wrapper. It
//! classifies `tonic::Status` codes (auth, timeout, cancellation) and folds
//! gateway protocol failures and command-level rejections into structured
//! variants. Credentials embedded in status messages are scrubbed before the
//! message reaches a caller.
//! classifies `tonic::Status` codes (auth, timeout, cancellation, transient
//! unavailability) and folds gateway protocol failures and command-level
//! rejections into structured variants. Credentials embedded in status
//! messages are scrubbed before the message reaches a caller.
use thiserror::Error as ThisError;
use tonic::Code;
@@ -85,6 +85,17 @@ pub enum Error {
status: Box<tonic::Status>,
},
/// Server returned `Unavailable` or `ResourceExhausted` — a transient
/// failure (gateway restart, overload) that a caller may reasonably retry.
#[error("gateway temporarily unavailable: {message}")]
Unavailable {
/// Redacted server-supplied detail message.
message: String,
/// Original `tonic::Status`.
#[source]
status: Box<tonic::Status>,
},
/// Any other `tonic::Status` that did not match a more specific variant.
#[error("gateway status error: {0}")]
Status(Box<tonic::Status>),
@@ -106,6 +117,15 @@ pub enum Error {
/// Detail message from the server.
message: String,
},
/// The gateway returned an OK reply whose payload did not carry the data
/// the command contract requires (for example, an `AddItem` reply with no
/// item handle and no `return_value`).
#[error("malformed gateway reply: {detail}")]
MalformedReply {
/// Human-readable description of what the reply was missing.
detail: String,
},
}
/// Wrapper around an [`MxCommandReply`] whose `protocol_status` reported a
@@ -174,6 +194,10 @@ impl From<tonic::Status> for Error {
message,
status: Box::new(status),
},
Code::Unavailable | Code::ResourceExhausted => Self::Unavailable {
message,
status: Box::new(status),
},
_ => Self::Status(Box::new(status)),
}
}
+1 -1
@@ -279,7 +279,7 @@ mod tests {
_request: Request<GetLastDeployTimeRequest>,
) -> Result<Response<GetLastDeployTimeReply>, Status> {
let present = *self.state.present.lock().unwrap();
let time = self.state.last_deploy.lock().unwrap().clone();
let time = *self.state.last_deploy.lock().unwrap();
Ok(Response::new(GetLastDeployTimeReply {
present,
time_of_last_deploy: time,
+3
@@ -95,6 +95,8 @@ impl ClientOptions {
self
}
/// Maximum encoded/decoded gRPC message size, in bytes, the transport
/// will accept. Defaults to 16 MiB.
pub fn with_max_grpc_message_bytes(mut self, max_grpc_message_bytes: usize) -> Self {
self.max_grpc_message_bytes = max_grpc_message_bytes;
self
@@ -140,6 +142,7 @@ impl ClientOptions {
self.stream_timeout
}
/// Configured maximum gRPC message size in bytes.
pub fn max_grpc_message_bytes(&self) -> usize {
self.max_grpc_message_bytes
}
+68 -44
@@ -8,6 +8,8 @@
//! Bulk commands enforce a 1000-item cap before contacting the worker, in
//! line with the gateway's documented `MAX_BULK_ITEMS`.
use std::sync::atomic::{AtomicU64, Ordering};
use crate::client::{EventStream, GatewayClient};
use crate::error::{ensure_protocol_success, Error};
use crate::generated::mxaccess_gateway::v1::mx_command::Payload;
@@ -23,6 +25,16 @@ use crate::value::MxValue;
const MAX_BULK_ITEMS: usize = 1_000;
/// Process-wide monotonic counter that keeps client correlation ids unique.
static CORRELATION_SEQUENCE: AtomicU64 = AtomicU64::new(0);
/// Build a unique `client_correlation_id` for a request so concurrent or
/// repeated calls of the same command kind can be told apart in gateway logs.
fn next_correlation_id(label: &str) -> String {
let sequence = CORRELATION_SEQUENCE.fetch_add(1, Ordering::Relaxed);
format!("rust-client-{label}-{sequence}")
}
/// Handle to an opened gateway session.
///
/// `Session` carries the gateway-issued session id and a cloned
@@ -76,7 +88,7 @@ impl Session {
.client
.close_session_raw(CloseSessionRequest {
session_id: self.id.clone(),
client_correlation_id: "rust-client-close-session".to_owned(),
client_correlation_id: next_correlation_id("close-session"),
})
.await?;
ensure_protocol_success("close session", reply.protocol_status.as_ref())?;
@@ -99,7 +111,7 @@ impl Session {
)
.await?;
Ok(register_server_handle(&reply))
register_server_handle(&reply)
}
/// Run MXAccess `AddItem` against `server_handle` and return the
@@ -120,7 +132,7 @@ impl Session {
)
.await?;
Ok(add_item_handle(&reply))
add_item_handle(&reply)
}
/// Run MXAccess `AddItem2` (item with a caller-supplied context string)
@@ -146,7 +158,7 @@ impl Session {
)
.await?;
Ok(add_item2_handle(&reply))
add_item2_handle(&reply)
}
/// Run MXAccess `RemoveItem` for the given handle pair.
@@ -226,7 +238,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::AddItemBulk))
bulk_results(reply, BulkReplyKind::AddItem)
}
/// Bulk variant of [`Session::advise`].
@@ -250,7 +262,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::AdviseItemBulk))
bulk_results(reply, BulkReplyKind::AdviseItem)
}
/// Bulk variant of [`Session::remove_item`].
@@ -274,7 +286,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::RemoveItemBulk))
bulk_results(reply, BulkReplyKind::RemoveItem)
}
/// Bulk variant of [`Session::un_advise`].
@@ -298,7 +310,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::UnAdviseItemBulk))
bulk_results(reply, BulkReplyKind::UnAdviseItem)
}
/// Bulk `Subscribe` (atomic add-and-advise) for a list of tag addresses.
@@ -322,7 +334,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::SubscribeBulk))
bulk_results(reply, BulkReplyKind::Subscribe)
}
/// Bulk `Unsubscribe` (atomic un-advise-and-remove) for a list of
@@ -347,7 +359,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::UnsubscribeBulk))
bulk_results(reply, BulkReplyKind::Unsubscribe)
}
/// Run MXAccess `Write` (single-value, no caller-supplied timestamp).
@@ -466,7 +478,7 @@ impl Session {
fn command_request(&self, kind: MxCommandKind, payload: Payload) -> MxCommandRequest {
MxCommandRequest {
session_id: self.id.clone(),
client_correlation_id: format!("rust-client-{}", kind.as_str_name()),
client_correlation_id: next_correlation_id(kind.as_str_name()),
command: Some(MxCommand {
kind: kind as i32,
payload: Some(payload),
@@ -486,71 +498,83 @@ fn ensure_bulk_size(name: &'static str, len: usize) -> Result<(), Error> {
}
}
fn register_server_handle(reply: &MxCommandReply) -> i32 {
fn register_server_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::Register(register)) => register.server_handle,
Some(mx_command_reply::Payload::Register(register)) => Ok(register.server_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "Register reply carried neither a register payload nor an \
int32 return value"
.to_owned(),
}),
}
}
fn add_item_handle(reply: &MxCommandReply) -> i32 {
fn add_item_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::AddItem(add_item)) => add_item.item_handle,
Some(mx_command_reply::Payload::AddItem(add_item)) => Ok(add_item.item_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "AddItem reply carried neither an add_item payload nor an \
int32 return value"
.to_owned(),
}),
}
}
fn add_item2_handle(reply: &MxCommandReply) -> i32 {
fn add_item2_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::AddItem2(add_item)) => add_item.item_handle,
Some(mx_command_reply::Payload::AddItem2(add_item)) => Ok(add_item.item_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "AddItem2 reply carried neither an add_item2 payload nor an \
int32 return value"
.to_owned(),
}),
}
}
enum BulkReplyKind {
AddItemBulk,
AdviseItemBulk,
RemoveItemBulk,
UnAdviseItemBulk,
SubscribeBulk,
UnsubscribeBulk,
AddItem,
AdviseItem,
RemoveItem,
UnAdviseItem,
Subscribe,
Unsubscribe,
}
fn bulk_results(reply: MxCommandReply, kind: BulkReplyKind) -> Vec<SubscribeResult> {
fn bulk_results(reply: MxCommandReply, kind: BulkReplyKind) -> Result<Vec<SubscribeResult>, Error> {
match (reply.payload, kind) {
(Some(mx_command_reply::Payload::AddItemBulk(reply)), BulkReplyKind::AddItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::AddItemBulk(reply)), BulkReplyKind::AddItem) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::AdviseItemBulk(reply)), BulkReplyKind::AdviseItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::AdviseItemBulk(reply)), BulkReplyKind::AdviseItem) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::RemoveItemBulk(reply)), BulkReplyKind::RemoveItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::RemoveItemBulk(reply)), BulkReplyKind::RemoveItem) => {
Ok(reply.results)
}
(
Some(mx_command_reply::Payload::UnAdviseItemBulk(reply)),
BulkReplyKind::UnAdviseItemBulk,
) => reply.results,
(Some(mx_command_reply::Payload::SubscribeBulk(reply)), BulkReplyKind::SubscribeBulk) => {
reply.results
(Some(mx_command_reply::Payload::UnAdviseItemBulk(reply)), BulkReplyKind::UnAdviseItem) => {
Ok(reply.results)
}
(
Some(mx_command_reply::Payload::UnsubscribeBulk(reply)),
BulkReplyKind::UnsubscribeBulk,
) => reply.results,
_ => Vec::new(),
(Some(mx_command_reply::Payload::SubscribeBulk(reply)), BulkReplyKind::Subscribe) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::UnsubscribeBulk(reply)), BulkReplyKind::Unsubscribe) => {
Ok(reply.results)
}
_ => Err(Error::MalformedReply {
detail: "bulk command reply did not carry the expected bulk result payload".to_owned(),
}),
}
}
+17 -16
@@ -25,15 +25,13 @@ use crate::generated::mxaccess_gateway::v1::{
#[derive(Clone, Debug, PartialEq)]
pub struct MxValue {
raw: ProtoMxValue,
projection: MxValueProjection,
}
impl MxValue {
/// Wrap a protobuf [`ProtoMxValue`] and compute its
/// [`MxValueProjection`].
/// Wrap a protobuf [`ProtoMxValue`]. The typed [`MxValueProjection`] is
/// computed on demand by [`MxValue::projection`].
pub fn from_proto(raw: ProtoMxValue) -> Self {
let projection = MxValueProjection::from_proto(&raw);
Self { raw, projection }
Self { raw }
}
/// Build a boolean `MxValue` (`MxDataType::Boolean`, `VT_BOOL`).
@@ -102,9 +100,13 @@ impl MxValue {
&self.raw
}
/// Borrow the typed projection.
pub fn projection(&self) -> &MxValueProjection {
&self.projection
/// Compute the typed projection of this value.
///
/// The projection is derived from the raw message on each call rather than
/// cached, so a value built only to be sent over the wire never pays the
/// projection's allocation cost.
pub fn projection(&self) -> MxValueProjection {
MxValueProjection::from_proto(&self.raw)
}
/// Consume the wrapper and return the underlying protobuf message.
@@ -183,15 +185,13 @@ impl MxValueProjection {
#[derive(Clone, Debug, PartialEq)]
pub struct MxArrayValue {
raw: MxArray,
projection: MxArrayProjection,
}
impl MxArrayValue {
/// Wrap a protobuf [`MxArray`] and compute its
/// [`MxArrayProjection`].
/// Wrap a protobuf [`MxArray`]. The typed [`MxArrayProjection`] is
/// computed on demand by [`MxArrayValue::projection`].
pub fn from_proto(raw: MxArray) -> Self {
let projection = MxArrayProjection::from_proto(&raw);
Self { raw, projection }
Self { raw }
}
/// Build a one-dimensional string array (`VT_ARRAY|VT_BSTR`).
@@ -210,9 +210,10 @@ impl MxArrayValue {
&self.raw
}
/// Borrow the typed projection of the array's elements.
pub fn projection(&self) -> &MxArrayProjection {
&self.projection
/// Compute the typed projection of the array's elements, derived from the
/// raw message on each call rather than cached.
pub fn projection(&self) -> MxArrayProjection {
MxArrayProjection::from_proto(&self.raw)
}
}
+3 -2
@@ -3,8 +3,9 @@
//! The protocol versions track the values the gateway and worker negotiate on
//! `OpenSession` and let test harnesses cross-check the wire contract.
/// Semantic version of this Rust client crate. Mirrors `Cargo.toml`.
pub const CLIENT_VERSION: &str = "0.1.0-dev";
/// Semantic version of this Rust client crate, taken from `Cargo.toml` at
/// compile time so the two cannot drift.
pub const CLIENT_VERSION: &str = env!("CARGO_PKG_VERSION");
/// Public gateway gRPC protocol version this client targets.
pub const GATEWAY_PROTOCOL_VERSION: u32 = 3;
+73 -2
@@ -203,7 +203,7 @@ fn value_conversion_fixtures_keep_typed_projection_and_raw_metadata() {
});
assert_eq!(
int64_value.projection(),
&MxValueProjection::Int64(9_223_372_036_854_770_000)
MxValueProjection::Int64(9_223_372_036_854_770_000)
);
let raw_case = case_by_id(cases, "raw-fallback.variant");
@@ -220,7 +220,7 @@ fn value_conversion_fixtures_keep_typed_projection_and_raw_metadata() {
});
assert_eq!(
raw_value.projection(),
&MxValueProjection::Raw(vec![1, 2, 3, 4, 5])
MxValueProjection::Raw(vec![1, 2, 3, 4, 5])
);
assert_eq!(raw_value.raw().raw_data_type, 32767);
assert!(raw_value.raw().raw_diagnostic.contains("No lossless"));
@@ -272,11 +272,76 @@ fn command_error_display_keeps_raw_reply_accessible() {
assert!(error.to_string().contains("MxaccessFailure"));
}
#[tokio::test]
async fn add_item_bulk_rejects_input_above_the_thousand_item_cap() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let oversized: Vec<String> = (0..1001).map(|index| format!("Tag{index}")).collect();
let error = session.add_item_bulk(12, oversized).await.unwrap_err();
assert!(
matches!(&error, Error::InvalidArgument { name, .. } if name.as_str() == "tag_addresses"),
"expected InvalidArgument for tag_addresses, got {error:?}"
);
}
#[tokio::test]
async fn event_stream_surfaces_a_mid_stream_status_fault() {
let state = Arc::new(FakeState::default());
state.emit_stream_fault.store(true, Ordering::SeqCst);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let mut stream = client
.stream_events(StreamEventsRequest {
session_id: "session-fixture".to_owned(),
after_worker_sequence: 0,
})
.await
.unwrap();
assert_eq!(stream.next().await.unwrap().unwrap().worker_sequence, 1);
assert_eq!(stream.next().await.unwrap().unwrap().worker_sequence, 2);
let fault = stream.next().await.unwrap().unwrap_err();
assert!(
matches!(fault, Error::Unavailable { .. }),
"expected Error::Unavailable, got {fault:?}"
);
}
#[tokio::test]
async fn connect_with_unreadable_ca_file_reports_invalid_endpoint() {
let options = ClientOptions::new("https://127.0.0.1:65000")
.with_plaintext(false)
.with_ca_file("definitely-not-a-real-ca-file.pem");
// GatewayClient is not Debug, so unwrap_err is unavailable here.
let error = match GatewayClient::connect(options).await {
Ok(_) => panic!("connect should fail when the CA file cannot be read"),
Err(error) => error,
};
assert!(
matches!(error, Error::InvalidEndpoint { .. }),
"expected Error::InvalidEndpoint, got {error:?}"
);
}
#[derive(Default)]
struct FakeState {
authorization: Mutex<Option<String>>,
last_command_kind: Mutex<Option<i32>>,
stream_dropped: Arc<AtomicBool>,
emit_stream_fault: AtomicBool,
}
#[derive(Clone)]
@@ -376,6 +441,12 @@ impl MxAccessGateway for FakeGateway {
let (sender, receiver) = mpsc::channel(4);
sender.send(Ok(event(1))).await.unwrap();
sender.send(Ok(event(2))).await.unwrap();
if self.state.emit_stream_fault.load(Ordering::SeqCst) {
sender
.send(Err(Status::unavailable("worker dropped the session")))
.await
.unwrap();
}
Ok(Response::new(DropAwareStream {
inner: ReceiverStream::new(receiver),
+147
@@ -0,0 +1,147 @@
# Code Review — Client.Dotnet
| Field | Value |
|---|---|
| Module | `clients/dotnet` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Minor: handle-selector fallback `?? reply.ReturnValue.Int32Value` can mask a missing typed reply (Client.Dotnet-005); CLI redactor misses env-var keys (Client.Dotnet-008). |
| 2 | mxaccessgw conventions | Good — consumes the shared contracts project, no forked proto, `authorization: Bearer` metadata correct, parity preserved via split `EnsureProtocolSuccess`/`EnsureMxAccessSuccess`. |
| 3 | Concurrency & thread safety | Issue found: `_disposed` flags unsynchronized; `MxGatewaySession.DisposeAsync` can race a concurrent `CloseAsync` (Client.Dotnet-003). |
| 4 | Error handling & resilience | Issues found: gRPC-to-native mapping collapses non-auth statuses into one untyped exception (Client.Dotnet-001); shared retry/timeout budget (Client.Dotnet-004). |
| 5 | Security | Good — API key never logged by the library, CLI redacts keys, TLS custom-root validation correct. |
| 6 | Performance & resource management | No issues found — channels and streaming calls disposed correctly. |
| 7 | Design-document adherence | No issues found — matches `ClientLibrariesDesign.md`. |
| 8 | Code organization & conventions | Issue found: undocumented public members (Client.Dotnet-006). |
| 9 | Testing coverage | Issue found: the production retry path is never exercised (Client.Dotnet-002). |
| 10 | Documentation & comments | Issues found: docs misstate the unary timeout budget as per-call (Client.Dotnet-004); the `AcknowledgeAlarmAsync` comment names a nonexistent `invoke:alarm-ack` scope (Client.Dotnet-007). |
## Findings
### Client.Dotnet-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` |
| Status | Resolved |
**Description:** `MapRpcException` only produces typed exceptions for `Unauthenticated` and `PermissionDenied`. Every other gRPC status — `NotFound`, `InvalidArgument`, `ResourceExhausted`, `FailedPrecondition`, `Unavailable`, `Internal` — collapses into the base `MxGatewayException` with no surfaced `StatusCode`. Callers cannot programmatically distinguish a transient outage from a permanent bad-argument error without reflecting into `InnerException` and downcasting to `RpcException`.
**Recommendation:** Carry the gRPC `StatusCode` on `MxGatewayException` (e.g. a `StatusCode` property) and/or add typed subclasses for at least `NotFound`, `InvalidArgument`, and `Unavailable`. Populate it from `exception.StatusCode` in `MapRpcException`.
**Resolution:** (2026-05-18) Confirmed against source: both transports had a duplicated private `MapRpcException` that only typed two statuses and discarded the gRPC code for the rest. Added a nullable `StatusCode` property (`Grpc.Core.StatusCode?`) to `MxGatewayException` plus constructors that carry it, threaded it through `MxGatewayAuthenticationException`/`MxGatewayAuthorizationException`, and extracted the two duplicated mappers into a single shared internal `RpcExceptionMapper` (`RpcExceptionMapper.cs`) that populates `StatusCode` from `exception.StatusCode` for every status. Callers can now distinguish transient from permanent failures without downcasting `InnerException`. Documented in `clients/dotnet/README.md`. Regression test: `RpcExceptionMapperTests` (8 cases incl. the `[Theory]` over `NotFound`/`InvalidArgument`/`ResourceExhausted`/`FailedPrecondition`/`Unavailable`/`Internal`).
### Client.Dotnet-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` |
| Status | Resolved |
**Description:** The retry predicate `MxGatewayClientRetryPolicy.IsTransientGrpcFailure` handles two shapes: a raw `RpcException` and an `MxGatewayException { InnerException: RpcException }`. In production the transport always maps `RpcException` to `MxGatewayException` before it reaches the retry pipeline, so only the wrapped-`MxGatewayException` branch ever runs. `FakeGatewayTransport`, however, throws the raw `RpcException` and never maps it, so every retry test exercises only the raw-`RpcException` branch, the one that never occurs in production. The production retry behaviour is effectively untested.
**Recommendation:** Add a fake/transport mode that maps `RpcException` to `MxGatewayException` the way `GrpcMxGatewayClientTransport` does (or add tests that enqueue a pre-wrapped `MxGatewayException`), so the actually-used predicate branch is covered.
**Resolution:** (2026-05-18) Confirmed against source: `FakeGatewayTransport` threw queued exceptions verbatim, so the existing retry tests only ever hit the raw-`RpcException` predicate branch. Added a `MapTransportExceptions` flag to `FakeGatewayTransport` that, when set, runs thrown `RpcException`s through the same shared `RpcExceptionMapper` the production gRPC transport uses, producing the wrapped `MxGatewayException` shape. Added regression test `MxGatewayClientSessionTests.InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException`, which exercises the previously-untested production predicate branch. Verified red: removing the `MxGatewayException { InnerException: RpcException }` case from `IsTransientGrpcFailure` fails the new test while the pre-existing raw-`RpcException` test still passes.
### Client.Dotnet-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Status | Resolved |
**Description:** `DisposeAsync` calls `CloseAsync()` (no token) then unconditionally `_closeLock.Dispose()`. If another thread is concurrently awaiting `CloseAsync(token)` — legal, since the type exposes public async methods and no single-threaded contract — disposing the `SemaphoreSlim` while a `WaitAsync` is pending throws `ObjectDisposedException` into that caller. The `_disposed` flags in both clients are also plain unsynchronised `bool` reads/writes; `ThrowIfDisposed` racing `DisposeAsync` can observe a stale value.
**Recommendation:** Either document `MxGatewaySession`/`MxGatewayClient` as not thread-safe for concurrent dispose, or guard `_disposed` with `Interlocked`/`volatile` and avoid disposing `_closeLock` until all in-flight `CloseAsync` calls complete.
**Resolution:** (2026-05-18) Confirmed against source: `MxGatewaySession.DisposeAsync` disposed `_closeLock` unconditionally, racing concurrent `CloseAsync` callers; `MxGatewayClient._disposed` was a plain `bool`. Fixed `MxGatewaySession` by tracking in-flight `CloseAsync` callers with an `_activeCloseCount` guarded by a dedicated `_disposeGate` lock and a `_closeLockDisposed` flag: `CloseAsync` registers under the gate (and throws `ObjectDisposedException` if disposal already won) before awaiting `_closeLock.WaitAsync`, and `DisposeAsync` drains `_activeCloseCount` to zero before disposing the semaphore, so the close lock provably outlives every pending `WaitAsync`. Fixed `MxGatewayClient` by changing `_disposed` to an `int` accessed via `Interlocked.Exchange`/`Volatile.Read`. Regression test `MxGatewayClientSessionTests.DisposeAsync_DoesNotRaceConcurrentCloseAsync` runs 100 iterations with one close holding the lock and one parked behind it while `DisposeAsync` runs concurrently; verified red against the original `DisposeAsync` (fails with `ObjectDisposedException`), green after the fix.
### Client.Dotnet-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294`, `clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403` |
| Status | Resolved |
**Description:** `ExecuteSafeUnaryAsync` wraps the whole Polly retry pipeline in a single linked CTS cancelled after `Options.DefaultCallTimeout`, while `CreateCallOptions` also stamps each individual call with a `DefaultCallTimeout` gRPC deadline. The retry pipeline therefore shares one `DefaultCallTimeout` budget across the initial attempt plus all retries plus backoff delays. The README/XML docs describe `DefaultCallTimeout` as a per-call timeout, which misrepresents this. `DeadlineExceeded` is also classified as transient, so an attempt that exhausts the shared budget is retried only to immediately fail again.
**Recommendation:** Decide whether `DefaultCallTimeout` is per-attempt or per-operation and make code and docs consistent — e.g. a separate per-attempt deadline and a distinct overall-operation timeout. Reconsider retrying on `DeadlineExceeded` when the deadline was client-imposed.
**Resolution:** (2026-05-18) Confirmed against source: the shared linked-CTS budget plus per-call deadline both use `DefaultCallTimeout`, and `IsTransientStatus` listed `DeadlineExceeded`. Resolved as a per-operation budget (the simpler, non-breaking choice): the `DefaultCallTimeout` XML doc in `MxGatewayClientOptions.cs` now states it is both the per-attempt gRPC deadline and the overall budget shared across the initial attempt, every retry, and the backoff delays — an upper bound on total wall-clock time, not a fresh per-retry allowance. Removed `DeadlineExceeded` from `MxGatewayClientRetryPolicy.IsTransientStatus`: every unary deadline is client-imposed (`CreateCallOptions` stamps the shared budget), so a `DeadlineExceeded` means the budget is exhausted and an immediate retry can only fail again. Regression test `MxGatewayClientSessionTests.InvokeAsync_DoesNotRetrySafeDiagnosticCommand_OnDeadlineExceeded` asserts the safe diagnostic command (`Ping`) is attempted exactly once and the failure surfaces; verified red against the original transient set (the call retried and succeeded).
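The shared-budget semantics described in this resolution can be sketched in a language-neutral way. The Go snippet below is only an illustration, not the .NET client's code: `callWithBudget`, `errTransient`, `flakyOp`, `deadlineOp`, and `demoBudget` are hypothetical names, and backoff delays are omitted. It shows one deadline covering every attempt, and a deadline error being surfaced without a retry.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTransient marks failures that are worth retrying (hypothetical).
var errTransient = errors.New("transient failure")

// demoBudget is the single budget shared by all attempts in this sketch.
const demoBudget = time.Second

// callWithBudget runs op up to maxAttempts times under one deadline that
// covers every attempt. Only errTransient is retried; a deadline error is
// returned as-is, because a retry would run under the same exhausted budget.
func callWithBudget(budget time.Duration, maxAttempts int, op func(context.Context) error) (attempts int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	for attempts < maxAttempts {
		attempts++
		err = op(ctx)
		if err == nil || !errors.Is(err, errTransient) {
			return attempts, err
		}
		if ctx.Err() != nil {
			// The shared budget ran out while handling a transient failure.
			return attempts, ctx.Err()
		}
	}
	return attempts, err
}

// flakyOp fails with errTransient the given number of times, then succeeds.
func flakyOp(failures int) func(context.Context) error {
	return func(context.Context) error {
		if failures > 0 {
			failures--
			return errTransient
		}
		return nil
	}
}

// deadlineOp simulates an attempt that exhausted its deadline.
func deadlineOp(context.Context) error { return context.DeadlineExceeded }

func main() {
	n, err := callWithBudget(demoBudget, 5, flakyOp(2))
	fmt.Println(n, err) // three attempts, no error
	n, err = callWithBudget(demoBudget, 5, deadlineOp)
	fmt.Println(n, err) // one attempt, deadline error surfaced
}
```

A design that wanted a fresh allowance per retry would instead derive a new deadline inside the loop; the sketch deliberately does not, to mirror the per-operation choice recorded above.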
### Client.Dotnet-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175` |
| Status | Resolved |
**Description:** `RegisterAsync`/`AddItemAsync`/`AddItem2Async` return `reply.<Typed>?.ServerHandle ?? reply.ReturnValue.Int32Value`. After `EnsureMxAccessSuccess()` passes, a missing typed payload silently falls back to `ReturnValue.Int32Value`, which for a reply carrying no return value is `0`. A caller then uses `0` as a `ServerHandle`/`ItemHandle`, producing a confusing downstream invalid-handle failure rather than a clear "gateway reply missing payload" error.
**Recommendation:** If the typed sub-message is the contract for these commands, treat its absence on an otherwise-successful reply as an error (throw a descriptive `MxGatewayException`) rather than falling through to `ReturnValue.Int32Value`.
**Resolution:** (2026-05-18) Confirmed against source and `mxaccess_gateway.proto`: `register`/`add_item`/`add_item2` are members of the `MxCommandReply.payload` oneof, so the typed accessor is `null` whenever the worker did not set that case — and the fallback returned `ReturnValue.Int32Value` (0 for a reply with no return value). The typed sub-message is the contract for these handle-returning commands, so its absence on an otherwise-successful reply is now an error: `RegisterAsync`/`AddItemAsync`/`AddItem2Async` throw via a new private `MxGatewaySession.CreateMissingPayloadException` helper that builds a descriptive `MxGatewayException` naming the missing payload, kind, session, and correlation id. Regression tests `MxGatewayClientSessionTests.RegisterAsync_Throws_WhenSuccessfulReplyMissingPayload` and `AddItemAsync_Throws_WhenSuccessfulReplyMissingPayload` enqueue an `Ok` reply with no typed payload and assert the descriptive throw; verified red against the original fallback (returned `0` instead of throwing).
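The difference between the original fallback and the fixed behaviour is easiest to see in a small sketch. The Go snippet below is an illustration under stated assumptions, not the client's code: `reply`, `registerPayload`, `serverHandle`, and `errMissingPayload` are hypothetical stand-ins for the generated reply type and the new helper.

```go
package main

import (
	"errors"
	"fmt"
)

// registerPayload stands in for the typed sub-message (hypothetical).
type registerPayload struct{ ServerHandle int32 }

// reply stands in for a successful command reply (hypothetical). Register is
// nil when the worker did not set that oneof case; ReturnValue is zero when
// no return value was carried.
type reply struct {
	Register    *registerPayload
	ReturnValue int32
}

var errMissingPayload = errors.New("gateway reply missing register payload")

// serverHandle treats an absent typed payload as an error instead of falling
// back to ReturnValue, which would silently yield handle 0.
func serverHandle(r reply) (int32, error) {
	if r.Register == nil {
		return 0, errMissingPayload
	}
	return r.Register.ServerHandle, nil
}

func main() {
	handle, err := serverHandle(reply{Register: &registerPayload{ServerHandle: 7}})
	fmt.Println(handle, err)

	_, err = serverHandle(reply{}) // successful status, but no payload
	fmt.Println(err)
}
```

The caller now fails at the point where the contract was broken, rather than later with an invalid-handle error on handle `0`.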
### Client.Dotnet-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50`, `clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14` |
| Status | Resolved |
**Description:** `MxGatewayClientOptions.MaxGrpcMessageBytes` and the two `const`s in `MxGatewayClientContractInfo` are public members with no XML doc comments, inconsistent with every other public member in the assembly and with the repo's documented C# style emphasis on a documented public surface.
**Recommendation:** Add `<summary>` doc comments to `MaxGrpcMessageBytes`, `GatewayProtocolVersion`, and `WorkerProtocolVersion`.
**Resolution:** (2026-05-18) Confirmed: all three public members lacked XML docs while every other public member in the assembly is documented. Added `<summary>` comments to `MxGatewayClientOptions.MaxGrpcMessageBytes` (describing the 16 MiB default applied to both send and receive limits), and to `MxGatewayClientContractInfo.GatewayProtocolVersion` and `WorkerProtocolVersion` (describing their wire-compatibility / diagnostics purpose). Pure documentation change — no test needed; build remains warning-clean.
### Client.Dotnet-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192` |
| Status | Resolved |
**Description:** The `AcknowledgeAlarmAsync` XML comment states the gateway authenticates against an `invoke:alarm-ack` scope, but `CLAUDE.md` documents the scope set without any `invoke:alarm-ack` sub-scope. The comment may describe an intended finer-grained scope that does not exist, misleading integrators about what API key they need.
**Recommendation:** Reconcile the comment with the actual server-side scope check, or update the scope documentation if sub-scopes were genuinely added; keep client doc and gateway auth model in sync.
**Resolution:** (2026-05-18) Confirmed against the server-side authorization model: `GatewayGrpcScopeResolver.ResolveRequiredScope` has no arm for `AcknowledgeAlarmRequest`, so it falls to the `_ => GatewayScopes.Admin` default — the RPC actually requires the `admin` scope. No `invoke:alarm-ack` sub-scope exists anywhere in `GatewayScopes`. The client XML comment on `AcknowledgeAlarmAsync` was wrong, not the docs. Corrected the comment to state the gateway authorizes `AcknowledgeAlarmRequest` against the API key's `admin` scope and that there is no finer-grained alarm-ack sub-scope. Pure documentation change — no test needed.
### Client.Dotnet-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17` |
| Status | Resolved |
**Description:** The CLI redactor only removes the API key string when it was supplied via `--api-key`; `RunCoreAsync` passes `arguments.GetOptional("api-key")` to `Redact`. When the key comes from an environment variable (`--api-key-env`, the documented default path), `apiKey` is `null` and no redaction occurs. If a gRPC/transport error message ever echoes the bearer token, it would be printed unredacted.
**Recommendation:** Resolve the effective API key (same logic as `ResolveApiKey`) before redacting, so the env-var-sourced key is also stripped from error output.
**Resolution:** (2026-05-18) Confirmed against source: `MxGatewayClientCli.RunCoreAsync`'s catch block redacted only `arguments.GetOptional("api-key")`, so an env-var-sourced key (`--api-key-env`, default `MXGATEWAY_API_KEY`) was never stripped. Note `MxGatewayCliSecretRedactor` itself is correct — the defect was the caller passing the wrong value. Extracted a non-throwing `TryResolveApiKey` helper (used by both the existing `ResolveApiKey` and the catch block) that resolves `--api-key` then the `--api-key-env` environment variable; the catch block now redacts that effective key. Updated `clients/dotnet/README.md` (`smoke` paragraph) to state the CLI redacts the effective key whether from `--api-key` or `--api-key-env`. Regression test `MxGatewayClientCliTests.RunAsync_ErrorOutput_RedactsApiKey_WhenSourcedFromEnvironmentVariable` sets a test env var, forces a transport error echoing the key, and asserts the key is absent and `[redacted]` is present; verified red against the original `GetOptional("api-key")`-only redaction (key printed unredacted).
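The "resolve the effective key, then redact" order can be sketched as follows. This Go snippet is illustrative only and does not reproduce the CLI: `resolveAPIKey`, `redact`, and the `EXAMPLE_API_KEY` variable name are assumptions, and the environment lookup is passed in as a function so the sketch stays easy to exercise.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveAPIKey returns the explicit flag value when set, otherwise the value
// of the named environment variable. It never fails; "" means no key.
func resolveAPIKey(flagValue, envName string, lookup func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	return lookup(envName)
}

// redact replaces every occurrence of key in message with "[redacted]".
// An empty key leaves the message untouched.
func redact(message, key string) string {
	if key == "" {
		return message
	}
	return strings.ReplaceAll(message, key, "[redacted]")
}

func main() {
	// Redact with the effective key, not only the flag value.
	key := resolveAPIKey("", "EXAMPLE_API_KEY", os.Getenv)
	fmt.Println(redact("rpc failed: token rejected", key))
}
```

Passing only the flag value to `redact`, as the original catch block did, corresponds to calling `redact(message, "")` whenever the key came from the environment, which is why nothing was stripped.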
+177
@@ -0,0 +1,177 @@
# Code Review — Client.Go
| Field | Value |
|---|---|
| Module | `clients/go` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: a typed-nil `Unwrap`/`errors.As` trap (Client.Go-001), a CLI `panic` on malformed input (Client.Go-003), empty-string correlation id on rand failure (Client.Go-007). |
| 2 | mxaccessgw conventions | Generally good; two test files fail `gofmt`, breaking the documented workflow (Client.Go-004). |
| 3 | Concurrency & thread safety | No issues found — stream goroutines and cancellation are sound. |
| 4 | Error handling & resilience | Issues found: the compatibility event path silently drops events (Client.Go-002); no transient/permanent classification (Client.Go-006). |
| 5 | Security | No issues found — TLS by default with a TLS 1.2 floor, API key redaction, no secret logging. |
| 6 | Performance & resource management | No issues found — connections/streams closed via deferred `Close`/`cancel`. |
| 7 | Design-document adherence | Issues found: deprecated `grpc.DialContext`+`WithBlock` usage and a missing error taxonomy (Client.Go-005, Client.Go-006). |
| 8 | Code organization & conventions | Issue found: duplication between `Client` and `GalaxyClient` (Client.Go-009). |
| 9 | Testing coverage | Issue found: TLS path, `callContext` deadline logic, and `NativeValue`/`NativeArray` edges untested (Client.Go-008). |
| 10 | Documentation & comments | Issue found: a stale `WithBlock` dial-cancellation claim (Client.Go-010). |
## Findings
### Client.Go-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` |
| Status | Resolved |
**Description:** `MxAccessError.Unwrap` returns `e.Command` directly. `EnsureMxAccessSuccess` constructs `&MxAccessError{Reply: reply}` with `Command` left nil (the HRESULT / failing-`MxStatusProxy` path). When `Command` is a nil `*CommandError`, `Unwrap()` returns a non-nil `error` interface wrapping a nil pointer. Consequently `errors.As(err, &ce)` for `*CommandError` returns `true` while setting `ce` to nil — a caller writing the idiomatic `if errors.As(err, &commandErr) { use commandErr.Status }` nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-`Command` path.
**Recommendation:** Make `Unwrap` return an untyped nil when `Command` is nil: `if e == nil || e.Command == nil { return nil }; return e.Command`. Add a test for the HRESULT-only `MxAccessError` asserting `errors.As(err, &ce)` is `false`.
**Resolution:** Resolved 2026-05-18: `MxAccessError.Unwrap` now returns an untyped nil when `Command` is nil, so `errors.As` no longer binds a typed-nil `*CommandError`; added `errors_test.go` regression coverage for the HRESULT-only and populated-`Command` paths.
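The typed-nil trap can be reproduced outside the client. The following is a minimal sketch, not the client's actual code: `CommandError`, `buggyError`, `fixedError`, and `probe` are hypothetical stand-ins that only mirror the shape described in this finding.

```go
package main

import (
	"errors"
	"fmt"
)

// CommandError is a hypothetical stand-in for the client's protocol-level error.
type CommandError struct{ Status string }

func (e *CommandError) Error() string { return "command failed: " + e.Status }

// buggyError mirrors the original shape: Unwrap returns the pointer field directly.
type buggyError struct{ Command *CommandError }

func (e *buggyError) Error() string { return "mxaccess failure" }

// Unwrap turns a nil *CommandError into a non-nil error interface value.
func (e *buggyError) Unwrap() error { return e.Command }

// fixedError applies the recommended guard.
type fixedError struct{ Command *CommandError }

func (e *fixedError) Error() string { return "mxaccess failure" }

// Unwrap returns an untyped nil when there is no wrapped command error.
func (e *fixedError) Unwrap() error {
	if e == nil || e.Command == nil {
		return nil
	}
	return e.Command
}

// probe reports whether errors.As matched and whether the bound target is nil.
func probe(err error) (matched, targetIsNil bool) {
	var ce *CommandError
	matched = errors.As(err, &ce)
	return matched, ce == nil
}

func main() {
	m, n := probe(&buggyError{})
	fmt.Println("buggy, nil Command:", m, n)
	m, n = probe(&fixedError{})
	fmt.Println("fixed, nil Command:", m, n)
	m, n = probe(&fixedError{Command: &CommandError{Status: "denied"}})
	fmt.Println("fixed, populated Command:", m, n)
}
```

With the buggy `Unwrap`, `errors.As` matches and binds a nil pointer, so any field access on the target panics. With the guard, the chain ends cleanly and `errors.As` reports no match.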
### Client.Go-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/go/mxgateway/session.go:440-516` |
| Status | Resolved |
**Description:** For the `Events`/`EventsAfter` compatibility API (`cancelWhenResultBufferFull == true`), when the 16-slot `results` channel is full `sendEventResult` cancels and returns `false`; the goroutine returns and `close(results)` runs — the consumer sees the channel close with **no `EventResult{Err: ...}` ever delivered**. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.
**Recommendation:** Before cancelling on a full buffer, deliver a terminal `EventResult` carrying an explicit error (e.g. `ErrEventBufferOverflow`). Document the behaviour on `Session.Events`; steer callers to `SubscribeEvents` (which blocks instead of dropping).
**Resolution:** Resolved 2026-05-18: confirmed against source — on a full bounded buffer the compatibility path cancelled and closed `results` with no terminal result. Added the exported sentinel `ErrEventBufferOverflow` (`errors.go`); `sendEventResult` now, on a full buffer, cancels the stream then calls the new `deliverTerminalResult` helper, which evicts one of the oldest buffered events to make room and places `EventResult{Err: ErrEventBufferOverflow}` so it becomes the consumer's last item before the channel closes. The previously lossy regression test (`TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned`) was re-pointed to assert the terminal `ErrEventBufferOverflow` result is delivered. `clients/go/README.md` now documents the bounded-buffer/overflow behaviour and steers no-loss callers to `SubscribeEvents`.
### Client.Go-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Status | Resolved |
**Description:** `parseInt32List` calls `panic(err)` when an `item-handles` token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like `-item-handles 1,foo` crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in `main.go`.
**Recommendation:** Change `parseInt32List` to return `([]int32, error)` and have `runUnsubscribeBulk` propagate the error, matching `parseValue`'s pattern.
**Resolution:** Resolved 2026-05-18: confirmed against source — `parseInt32List` called `panic(err)` on a malformed token. It now returns `([]int32, error)`, wrapping the bad token (`invalid item handle %q: %w`); `runUnsubscribeBulk` parses item handles before dialing and returns the error, so a typo flows through `runWithIO` to `os.Exit(2)` like other validation paths. Regression tests `TestParseInt32ListParsesValidTokens` and `TestParseInt32ListReturnsErrorOnMalformedToken` added to `cmd/mxgw-go/main_test.go`.
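A minimal sketch of the error-returning shape follows. The signature and the `invalid item handle %q: %w` wrapping come from the resolution above; the comma splitting and whitespace trimming are assumptions about the CLI's input format, not a copy of `main.go`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseInt32List parses a comma-separated list of int32 item handles.
// It returns an error naming the offending token instead of panicking.
func parseInt32List(raw string) ([]int32, error) {
	tokens := strings.Split(raw, ",")
	handles := make([]int32, 0, len(tokens))
	for _, token := range tokens {
		token = strings.TrimSpace(token) // assumption: tolerate "1, 2"
		value, err := strconv.ParseInt(token, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid item handle %q: %w", token, err)
		}
		handles = append(handles, int32(value))
	}
	return handles, nil
}

func main() {
	handles, err := parseInt32List("1,2,3")
	fmt.Println(handles, err)

	_, err = parseInt32List("1,foo")
	fmt.Println(err)
}
```

Because the error wraps the `strconv` failure with `%w`, a caller can still inspect it with `errors.Is(err, strconv.ErrSyntax)` or `errors.Is(err, strconv.ErrRange)` before mapping it to an exit code.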
### Client.Go-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` |
| Status | Resolved |
**Description:** `gofmt -l` flags `alarms_test.go` and `galaxy_test.go` for misaligned struct-literal field padding. The Go client README lists `gofmt` as part of the workflow and the repo enforces style; unformatted committed code breaks `gofmt`-gated checks and CI.
**Recommendation:** Run `gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go`.
**Resolution:** Resolved 2026-05-18: confirmed `gofmt -l .` flagged both files for misaligned struct-literal padding. Ran `gofmt -w` on `mxgateway/alarms_test.go` and `mxgateway/galaxy_test.go`; `gofmt -l .` is now clean for the whole module.
### Client.Go-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` |
| Status | Resolved |
**Description:** The client uses `grpc.DialContext` with `grpc.WithBlock()`. In current grpc-go both are deprecated in favour of `grpc.NewClient` (lazy connection). `WithBlock` also changes failure semantics: a transient gateway-unavailable at dial time becomes a hard `Dial` error rather than a connection that recovers when the gateway comes up, working against the design doc's resilience intent.
**Recommendation:** Migrate to `grpc.NewClient`; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by `DialTimeout`, and update the doc comment.
**Resolution:** Resolved 2026-05-18: confirmed `Dial`/`DialGalaxy` used the deprecated `grpc.DialContext` + `grpc.WithBlock` pair. Migrated both to the shared `dial(ctx, opts)` helper, which now builds a lazy connection with `grpc.NewClient` and runs an explicit `waitForReady` readiness probe (`Connect` + `WaitForStateChange` until `connectivity.Ready`) bounded by `DialTimeout` — preserving fail-fast behavior while letting an otherwise lazy connection recover when the gateway is briefly down. Note: `grpc.NewClient` defaults the target scheme to `dns`, so the bufconn test harnesses (`client_session_test.go`, `alarms_test.go`, `galaxy_test.go`) were updated to use `passthrough:///bufnet` so the fake target reaches the context dialer. New tests `TestDialFailsFastWhenGatewayUnreachable` and `TestDialReadinessProbeReachesReady` cover the probe; `go vet` reports no deprecation. `clients/go/README.md` documents the lazy-connect + readiness-probe semantics.
### Client.Go-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/go/mxgateway/errors.go:9-130` |
| Status | Resolved |
**Description:** `docs/ClientLibrariesDesign.md` recommends a high-level error taxonomy (`TransportError`, `AuthenticationError`, `TimeoutError`, etc.). The Go client collapses all transport/gRPC failures into a single `GatewayError` with no way to classify transient (`Unavailable`, `DeadlineExceeded`) vs permanent (`Unauthenticated`, `InvalidArgument`) without manually unwrapping and calling `status.Code`.
**Recommendation:** Add a helper (e.g. `IsTransient(err) bool`) or expose the gRPC `codes.Code` on `GatewayError`, so retry/timeout/auth handling can be written without re-parsing the wrapped error.
**Resolution:** Resolved 2026-05-18: implemented the recommended classification surface in `errors.go` rather than a full parallel type hierarchy (the existing `GatewayError`/`CommandError`/`MxAccessError` chain already separates transport from protocol from MXAccess failures). Added `GatewayError.Code()` (returns the wrapped gRPC `codes.Code`, `OK` for nil, `Unknown` for a non-status error) and the free function `IsTransient(err error) bool`, which unwraps through `*GatewayError` and any gRPC-status chain and reports `true` for `Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, and `Aborted`. Tests `TestGatewayErrorCode` and `TestIsTransient` cover the matrix; `clients/go/README.md` documents both for retry/timeout/auth handling.
### Client.Go-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/go/mxgateway/session.go:526-532` |
| Status | Resolved |
**Description:** `newCorrelationID` returns an empty string when `crypto/rand.Read` fails, silently producing an `MxCommandRequest` with no correlation id. `rand.Read` failure is rare, but the failure mode (untraceable command, no error surfaced) is worse than failing loud, and the empty-id path is untested.
**Recommendation:** Either propagate the error up through `invokeCommand`, or fall back to a time/counter-based id rather than an empty string.
**Resolution:** Resolved 2026-05-18: confirmed `newCorrelationID` returned `""` on a `rand.Read` failure. It now falls back to a non-empty `"fallback-<unixnano>-<counter>"` id built from `time.Now().UnixNano()` and a process-wide `atomic.Uint64` monotonic counter, so every command stays traceable even without entropy. The `crypto/rand` call was routed through a `randRead` package variable so the failure path is testable; `TestNewCorrelationIDFallsBackOnRandFailure` simulates a `rand.Read` failure and asserts the fallback id is non-empty, `fallback-` prefixed, and unique, and `TestNewCorrelationIDUsesRandEntropy` pins the happy path.
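The fallback can be sketched with the standard library alone. The `randRead` package variable, the `fallback-<unixnano>-<counter>` format, and the `atomic.Uint64` counter come from the resolution above; the 16-byte hex encoding of the happy path and the `fallbackIDs` helper are assumptions made for illustration.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// randRead is a seam over crypto/rand.Read so a test can force a failure.
var randRead = rand.Read

// fallbackCounter keeps fallback ids unique even within the same nanosecond.
var fallbackCounter atomic.Uint64

// newCorrelationID returns a random id, or a non-empty fallback id when the
// entropy source fails. The 16-byte hex form is an assumption for this sketch.
func newCorrelationID() string {
	var buf [16]byte
	if _, err := randRead(buf[:]); err != nil {
		return fmt.Sprintf("fallback-%d-%d", time.Now().UnixNano(), fallbackCounter.Add(1))
	}
	return hex.EncodeToString(buf[:])
}

// fallbackIDs simulates an entropy failure and returns two consecutive ids.
func fallbackIDs() (string, string) {
	original := randRead
	defer func() { randRead = original }()
	randRead = func([]byte) (int, error) { return 0, errors.New("entropy unavailable") }
	return newCorrelationID(), newCorrelationID()
}

func main() {
	fmt.Println(newCorrelationID())
	first, second := fallbackIDs()
	fmt.Println(first, second)
}
```

The counter matters because two commands issued in the same nanosecond would otherwise share an id on platforms with a coarse clock.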
### Client.Go-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/go/mxgateway/` (test files) |
| Status | Resolved |
**Description:** Several critical paths are untested: TLS credential resolution in `resolveTransportCredentials` (only the `Plaintext` path is exercised); the `callContext` deadline-shortening logic (`client.go:198-204`) including the negative-timeout disable case; and `NativeValue`/`NativeArray` for the array, raw-bytes, null, and unsupported-kind branches.
**Recommendation:** Add unit tests for `resolveTransportCredentials` precedence, `callContext` deadline arithmetic, and `NativeValue`/`NativeArray` round-trips for every kind.
**Resolution:** Resolved 2026-05-18: added `clients/go/mxgateway/coverage_test.go`. `TestResolveTransportCredentialsPrecedence` exercises every branch (explicit `TransportCredentials`, `Plaintext`, missing `CACertFile` error, `TLSConfig` + `ServerNameOverride`, default TLS floor) and `TestResolveTransportCredentialsDoesNotMutateTLSConfig` confirms the supplied `*tls.Config` is cloned. `TestCallContextDeadlineArithmetic` covers zero/default, negative-disable, positive timeout, caller-deadline-sooner-kept, and caller-deadline-later-shortened. `TestNativeValueEdgeKinds`, `TestNativeArrayEdgeKinds`, and `TestNativeValueUnsupportedKind` cover the null, raw-bytes (including the no-alias copy), array, timestamp-with-nil, and unsupported-kind branches.
### Client.Go-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205` |
| Status | Resolved |
**Description:** `DialGalaxy`/`Dial` and `GalaxyClient.callContext`/`Client.callContext` are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift.
**Recommendation:** Extract a shared unexported `dial(ctx, opts)` and a free `callContext(opts, ctx)` function, and have both client constructors call them.
**Resolution:** Resolved 2026-05-18: extracted the shared unexported `dial(ctx, opts) (*grpc.ClientConn, error)` (credential resolution, dial-option assembly, `grpc.NewClient`, readiness probe) and the free `callContext(ctx, callTimeout) (context.Context, context.CancelFunc)` into `client.go`. `Dial`/`DialGalaxy` and both `(*Client).callContext`/`(*GalaxyClient).callContext` methods now delegate to them; the duplicated dial and deadline code in `galaxy.go` was removed (its now-unused `errors` import dropped). This was done together with the Client.Go-005 migration so the `grpc.NewClient` change lives in exactly one place.
### Client.Go-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/go/mxgateway/client.go:39-40` |
| Status | Resolved |
**Description:** The `Dial` doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated `WithBlock` behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work.
**Recommendation:** Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that `DialTimeout` bounds the initial connect attempt.
**Resolution:** Resolved 2026-05-18: alongside the Client.Go-005 migration, the `Dial` doc comment was rewritten to describe the lazy `grpc.NewClient` connection, the `DialTimeout`-bounded (default 10s, or ctx deadline when sooner) readiness probe, that a briefly-unavailable gateway recovers instead of producing a hard error, and that cancelling `ctx` aborts the probe. `DialGalaxy` and the new `dial`/`waitForReady`/`callContext` helpers carry matching doc comments.
+207
@@ -0,0 +1,207 @@
# Code Review — Client.Java
| Field | Value |
|---|---|
| Module | `clients/java` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: `register`/`addItem` silently fall back to `getReturnValue()` masking missing payloads (Client.Java-004); fragile `resolved()` mutation pattern (Client.Java-012). |
| 2 | mxaccessgw conventions | Largely adheres; the gateway protocol-version handshake is never verified despite the contract field existing (Client.Java-003). |
| 3 | Concurrency & thread safety | Issue found: `MxEventStream.next` is a plain field and terminal-state transitions race (Client.Java-002). |
| 4 | Error handling & resilience | Issues found: `close()` can mask the primary exception (Client.Java-005); async/sync error surfaces inconsistent (Client.Java-008). |
| 5 | Security | Issue found: API-key redaction leaks the trailing 4 secret characters (Client.Java-001). |
| 6 | Performance & resource management | Issues found: `close()` does not await termination (Client.Java-006); no stream flow control (Client.Java-011). |
| 7 | Design-document adherence | Matches `JavaClientDesign.md` closely; the protocol-version check is missing, and the omission is undocumented (Client.Java-003). |
| 8 | Code organization & conventions | Issue found: ~80 duplicated lines across the two clients (Client.Java-009). |
| 9 | Testing coverage | Issue found: alarm RPCs, TLS setup, async streams, and queue overflow untested (Client.Java-007). |
| 10 | Documentation & comments | Issue found: README/Javadoc assert undocumented scope names (Client.Java-010). |
## Findings
### Client.Java-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Status | Resolved |
**Description:** `redactApiKey` preserves the leading and trailing four characters of the key. A gateway API key has the form `mxgw_<key-id>_<secret>`; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (`CommonOptions.redactedJsonMap`), and `MxGatewayClientOptions.toString()`. CLAUDE.md states API keys must never reach logs.
**Recommendation:** Redact the secret entirely. Show only a stable non-secret prefix (e.g. the `mxgw_<key-id>_` portion) and mask everything after it, or emit a fixed `mxgw_***` form. Do not echo any trailing characters of the secret.
**Resolution:** (2026-05-18) Confirmed against source: the old `substring(0,4) + stars + substring(len-4)` echoed the last four secret characters. `redactApiKey` now masks the secret entirely: for gateway-shaped keys it returns the non-secret `mxgw_<key-id>_` prefix followed by `***` (locating the secret separator as the first `_` after `mxgw_`); any non-gateway-shaped token returns `<redacted>`. No leading/trailing secret characters are ever emitted. The pre-existing `MxGatewayCliTests.openSessionJsonRedactsApiKey` assertion that hardcoded the leaky `mxgw***********cret` form was corrected to assert the masked `mxgw_visible_***` form. Regression tests: `MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret`, `redactApiKeyForNonGatewayShapedKeyRevealsNothing`, `redactApiKeyStillHandlesNullAndShortInput`.
### Client.Java-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Status | Resolved |
**Description:** The `next` field is a plain (non-volatile) instance field, and `MxEventStream` exposes no thread-confinement guarantee. More concretely, a queue-overflow `offer()` and a `close()` `offer(END)` can interleave so the overflow exception is enqueued after `END` and never observed — the contract that "next() throws after overflow" is not guaranteed once `close()` has been called.
**Recommendation:** Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.
**Resolution:** (2026-05-18) Confirmed against source: the old `offer()` END-branch did `queue.clear(); queue.offer(END)` when full, so a `close()` arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single `terminate(MxGatewayException)` method guarded by a `terminated` flag and a `terminalLock`; the first terminal condition wins and a later `close()`/`END` cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while `close()` is safe from any thread. Regression tests: `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose` (deterministic) and `eventStreamConcurrentOverflowAndCloseAlwaysTerminate` (300-iteration race stress).
### Client.Java-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
| Status | Resolved |
**Description:** `OpenSessionReply` carries `gateway_protocol_version` (proto field 8), and `MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION` exists so the client can reject incompatible generated-code inputs. The client never reads `reply.getGatewayProtocolVersion()` nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.
**Recommendation:** In `openSession`/`openSessionRaw`, compare `reply.getGatewayProtocolVersion()` with `MxGatewayClientVersion.gatewayProtocolVersion()` and throw a typed `MxGatewayException` on mismatch.
**Resolution:** (2026-05-18) Confirmed against source: neither `openSessionRaw` nor `openSessionAsync` read `getGatewayProtocolVersion()`. Added a private `ensureGatewayProtocolCompatible` helper, called from both `openSessionRaw` and `openSessionAsync`, that throws `MxGatewayException` with a clear mismatch message when the gateway reports a non-zero version differing from `MxGatewayClientVersion.gatewayProtocolVersion()`. A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. `clients/java/README.md` documents the new fail-fast check. Regression tests: `MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion` and `openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion`.
### Client.Java-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
| Status | Resolved |
**Description:** `register`, `addItem`, and `addItem2` check `reply.hasRegister()`/`hasAddItem()` and otherwise fall back to `reply.getReturnValue().getInt32Value()`. If the gateway returns a reply with neither the typed payload nor a `return_value` set, the method silently returns `0` — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.
**Recommendation:** If the expected typed payload is absent and no `return_value` is present, throw `MxGatewayException` (protocol violation) instead of returning `0`.
**Resolution:** (2026-05-18) Confirmed against source: all three methods returned `reply.getReturnValue().getInt32Value()` (which yields `0` for an unset message field) when the typed payload was absent. Each method now guards the fallback with `reply.hasReturnValue()` and throws `MxGatewayException` describing the protocol violation when neither the typed payload nor a `return_value` is present. The legitimate `return_value` fallback is preserved. Regression tests: `MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, `addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, and `addItemStillHonoursReturnValueFallback`.
### Client.Java-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
| Status | Resolved |
**Description:** `close()` delegates to `closeRaw()`, which performs a network RPC, so a failure inside `closeSession` (e.g. `WORKER_UNAVAILABLE`) throws from `close()`. When `MxGatewaySession` is used in try-with-resources and the body completes normally, that close-time failure becomes the statement's propagated exception. When the body also throws, Java keeps the body exception primary and attaches the close failure as a suppressed exception, but a caller that closes the session from a hand-written `finally` block loses the body exception entirely. This is a known footgun for I/O-performing `close()` methods.
**Recommendation:** Either make `close()` swallow/log close-time failures (keeping `closeRaw()` for callers who want the result), or document clearly that `close()` performs a network call that can throw.
**Resolution:** (2026-05-18) Confirmed against source: `close()` called `closeRaw()` directly, so a `CloseSession` RPC failure propagated out of `close()` and could surface in place of, or attached to, the caller's own exception. `close()` now catches `MxGatewayException` from `closeRaw()` and logs it at WARNING via `System.Logger` instead of rethrowing, so a close-time failure never masks the body exception. `closeRaw()` is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use `closeRaw()` for explicit close handling are documented in `clients/java/README.md` and the `close()` Javadoc. Regression tests: `MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException` and `closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt`.
### Client.Java-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` |
| Status | Resolved |
**Description:** `close()` (the `AutoCloseable` method invoked by try-with-resources) calls only `ownedChannel.shutdown()` and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. `closeAndAwaitTermination()` does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.
**Recommendation:** Have `close()` await termination for a bounded time and `shutdownNow()` on timeout (the logic already in `closeAndAwaitTermination()`), or document that try-with-resources callers should call `closeAndAwaitTermination()`.
**Resolution:** (2026-05-18) Confirmed against source: both `MxGatewayClient.close()` and `GalaxyRepositoryClient.close()` called only `ownedChannel.shutdown()`. `close()` in both clients now performs the bounded-wait logic previously only in `closeAndAwaitTermination()`: it shuts the channel down, waits up to the configured connect timeout for graceful termination, and calls `shutdownNow()` on timeout. Because `close()` cannot throw a checked exception, an `InterruptedException` while awaiting is handled by forcibly shutting the channel down and restoring the thread interrupt flag. `closeAndAwaitTermination()` is retained unchanged for callers who want the checked, blocking-aware variant. `clients/java/README.md` documents the new try-with-resources `close()` semantics.
### Client.Java-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` |
| Status | Resolved |
**Description:** The alarm surface — `acknowledgeAlarm`/`acknowledgeAlarmAsync`/`queryActiveAlarms` and `MxGatewayActiveAlarmsSubscription` — has zero test coverage. TLS channel construction, the async `streamEventsAsync` path, `MxGatewayEventSubscription` pre-start cancellation, and `MxEventStream` queue overflow are likewise untested. `JavaClientDesign.md` explicitly lists async stream-observer cancellation and status/error mapping as required tests.
**Recommendation:** Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.
**Resolution:** (2026-05-18) Confirmed against source: no test referenced `acknowledgeAlarm`, `queryActiveAlarms`, `streamEventsAsync`, TLS construction, or `MxEventStream` overflow. Added `MxGatewayLowFindingsTests` (12 tests) covering: `acknowledgeAlarm`/`acknowledgeAlarmAsync` (success, typed protocol-failure, async transport-failure normalisation), `queryActiveAlarms` observer delivery, `MxGatewayActiveAlarmsSubscription` and `MxGatewayEventSubscription` pre-start cancellation, `streamEventsAsync` observer delivery, `MxEventStream` queue overflow surfacing `MxGatewayException`, TLS channel construction (missing CA file rejected with a typed exception, system-trust path builds cleanly), and the Client.Java-008 async-validator normalisation. While writing the TLS test a latent bug was found: a missing/unreadable CA file makes `GrpcSslContexts` throw `IllegalArgumentException` (not `SSLException`), which the old `catch (SSLException)` let escape unwrapped — the catch in the shared channel builder was broadened to also wrap `RuntimeException` so callers always see one typed `MxGatewayException`.
### Client.Java-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` |
| Status | Resolved |
**Description:** `acknowledgeAlarmAsync` and `openSessionAsync` apply `ensureProtocolSuccess` inside `thenApply`. If that validator throws a non-`MxGatewayException` `RuntimeException` it is wrapped by `CompletionException` with no `fromGrpc` normalisation, unlike the synchronous paths which normalise via `try/catch`. The async and sync error surfaces are therefore inconsistent.
**Recommendation:** Wrap the `thenApply` body so any non-`MxGatewayException` is routed through `MxGatewayErrors.fromGrpc`, matching the synchronous methods.
**Resolution:** (2026-05-18) Confirmed against source: the `thenApply` validators in `openSessionAsync`, `invokeAsync`, and `acknowledgeAlarmAsync` were not normalised. In practice the client's own validators (`ensureProtocolSuccess`, `ensureMxAccessSuccess`, `ensureGatewayProtocolCompatible`) only ever throw `MxGatewayException`, but a stray non-`MxGatewayException` `RuntimeException` (e.g. an NPE from a malformed reply) would surface raw inside `CompletionException`. Added `MxGatewayChannels.normalisingValidator(operation, fn)`: it rethrows `MxGatewayException` unchanged and routes any other `RuntimeException` through `MxGatewayErrors.fromGrpc`, matching the synchronous `try/catch` paths. All three async `thenApply` sites now use it. Regression test: `MxGatewayLowFindingsTests.openSessionAsyncNormalisesNonGatewayRuntimeExceptionFromValidator`.
### Client.Java-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413` |
| Status | Resolved |
**Description:** `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable` are duplicated nearly verbatim across `MxGatewayClient` and `GalaxyRepositoryClient` (~80 lines). A fix to one will not propagate to the other.
**Recommendation:** Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.
**Resolution:** (2026-05-18) Confirmed against source: the four helpers were duplicated near-verbatim. Added a package-private `MxGatewayChannels` utility class holding `createChannel(options, tlsErrorPrefix)`, `withDeadline(stub, options)`, `withStreamDeadline(stub, options)`, `toCompletable(future, operation)`, and the new `normalisingValidator` helper (Client.Java-008). Both `MxGatewayClient` and `GalaxyRepositoryClient` now delegate to it and their private copies were deleted, so a future fix lives in one place. Behavior is unchanged except the operation-name carried into `MxGatewayErrors.fromGrpc` is now the specific RPC name instead of the generic `"async call"`/`"galaxy async call"`. Verified by the full existing async test suite plus the new `MxGatewayLowFindingsTests`.
### Client.Java-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272`, `clients/java/README.md:76` |
| Status | Resolved |
**Description:** The `acknowledgeAlarm` Javadoc states the gateway authenticates against an `invoke:alarm-ack` scope, and the README states the Galaxy Repository requires a `metadata:read` scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.
**Recommendation:** Reconcile the scope names with `src/MxGateway.Server/Security/` and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.
**Resolution:** (2026-05-18) Partially re-triaged. Verified against `src/MxGateway.Server/Security/Authorization/GatewayScopes.cs` and `GatewayGrpcScopeResolver.cs`: the canonical scope catalog is `session:open`, `session:close`, `invoke:read`, `invoke:write`, `invoke:secure`, `events:read`, `metadata:read`, `admin`. (a) The README's `metadata:read` for the Galaxy Repository is **correct**: `TestConnectionRequest`/`GetLastDeployTimeRequest`/`DiscoverHierarchyRequest`/`WatchDeployEventsRequest` all resolve to `GatewayScopes.MetadataRead`; no change needed. CLAUDE.md's prose lists only coarse scope groups, but the canonical resolver does define `metadata:read`. (b) The `acknowledgeAlarm` Javadoc's `invoke:alarm-ack` is **wrong**: no such scope exists. `AcknowledgeAlarmRequest` and `QueryActiveAlarmsRequest` are not special-cased in `GatewayGrpcScopeResolver`, so they fall through the `_ => GatewayScopes.Admin` default and require the `admin` scope. The Javadoc was corrected to state the `admin` scope; `queryActiveAlarms` did not assert a scope and was left unchanged. The README does not mention alarms, so no README change was required.
### Client.Java-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63` |
| Status | Resolved |
**Description:** The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.
**Recommendation:** Confirm fail-fast is intended (it appears to be); if so, document it on `MxEventStream` so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.
**Resolution:** (2026-05-18) Confirmed fail-fast is intended — CLAUDE.md ("fail-fast event backpressure") and `docs/DesignDecisions.md` make a slow consumer losing its subscription a deliberate v1 design choice, so this is documentation-only, not a behavior bug. Added an explicit "Backpressure (fail-fast)" section to the `MxEventStream` class Javadoc explaining that the adaptor uses gRPC auto-inbound flow control with a fixed 16-element buffer and no client flow control, that a consumer stall long enough to fill the buffer triggers an overflow that cancels the subscription and surfaces an `MxGatewayException`, and that consumers must drain promptly and be ready to resubscribe with a resume cursor. `clients/java/README.md` carries the same caveat. The queue capacity was intentionally left non-configurable to keep the v1 surface aligned with the gateway design; overflow behavior is covered by `MxGatewayLowFindingsTests.eventStreamQueueOverflowSurfacesExceptionFromNext`.
### Client.Java-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674` |
| Status | Resolved |
**Description:** `CommonOptions.resolved()` mutates `this` (`resolvedApiKey`, `resolvedTimeout`) and returns `this`, but `toClientOptions()` and `redactedJsonMap()` read those mutated fields. If `redactedJsonMap()` is ever called before `resolved()`, it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.
**Recommendation:** Make `resolved()` return an immutable resolved value object, or compute `resolvedApiKey`/`resolvedTimeout` lazily in their getters so call ordering cannot produce stale output.
**Resolution:** (2026-05-18) Confirmed against source: `resolved()` populated the `resolvedApiKey`/`resolvedTimeout` mutable fields and `toClientOptions()`/`redactedJsonMap()` read them, so calling either before `resolved()` emitted stale empty/30s defaults. The two mutable fields were removed and replaced with side-effect-free accessor methods `resolvedApiKey()` and `resolvedTimeout()` that compute their value on each call (API key from `--api-key` or the `--api-key-env` variable; timeout via `parseDuration`). `toClientOptions()` and `redactedJsonMap()` now call those accessors directly, so call ordering can no longer produce stale output. `resolved()` is retained as a no-op returning `this` purely for call-site readability (`common.resolved()`), with its Javadoc updated to state resolution is now lazy. Pure-refactor with no runtime-behavior change for the existing call order, so no new test was added; covered by the existing `MxGatewayCliTests` JSON-redaction and option-parsing tests.
+207
@@ -0,0 +1,207 @@
# Code Review — Client.Python
| Field | Value |
|---|---|
| Module | `clients/python` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: dead `closed` variable (Client.Python-004); float/bytes value-mapping assumptions (Client.Python-008). |
| 2 | mxaccessgw conventions | Largely adheres; one missing export and a `*_raw` MXAccess-failure documentation gap (Client.Python-002, Client.Python-012). |
| 3 | Concurrency & thread safety | Issue found: `close()` idempotency claim does not hold under concurrent close (Client.Python-006). |
| 4 | Error handling & resilience | Issues found: inconsistent timeout-kwarg fallback (Client.Python-003); `success == 0` default-value hazard (Client.Python-011); inconsistent cancel helpers (Client.Python-007). |
| 5 | Security | No issues found — API keys redacted in repr and CLI output, TLS supported, no secret logging. |
| 6 | Performance & resource management | Issue found: `discover_hierarchy` buffers the whole hierarchy in memory (Client.Python-005). |
| 7 | Design-document adherence | Matches the design docs closely; minor CLI doc drift (Client.Python-001). |
| 8 | Code organization & conventions | Issues found: `MxGatewayCommandError` omitted from `__all__` (Client.Python-002); fragile circular-import workaround (Client.Python-010). |
| 9 | Testing coverage | Issue found: `write2`, `add_item2`, bulk-size limits, TLS `ca_file`, and CLI command bodies untested (Client.Python-009). |
| 10 | Documentation & comments | Issue found: stale "scaffold" package description (Client.Python-001). |
## Findings
### Client.Python-001
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/python/pyproject.toml:8,25`, `clients/python/src/mxgateway_cli/commands.py:25` |
| Status | Resolved |
**Description:** The package `description` in `pyproject.toml` still says "Async Python client *scaffold*" even though the client is fully implemented. Stale "scaffold" wording misrepresents maturity to anyone reading PyPI metadata. (The `mxgw-py` console-script name is itself consistent between `pyproject.toml` and the README.)
**Recommendation:** Update the `pyproject.toml` description to drop "scaffold"; keep README CLI examples in sync with the actual `mxgw-py` entry point.
**Resolution:** 2026-05-18 — Confirmed: `pyproject.toml:8` `description` read "Async Python client scaffold for MXAccess Gateway." Changed to "Async Python client for MXAccess Gateway." The `mxgw-py` console-script name was already consistent with the README, so no README change was needed. Pure metadata fix — no test required.
### Client.Python-002
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/python/src/mxgateway/__init__.py:27` |
| Status | Resolved |
**Description:** `MxGatewayCommandError` is imported into `__init__.py` and is a documented public exception, but it is missing from `__all__`. It is the parent of `MxAccessError` and a meaningful catch target, so omitting it from the public surface is inconsistent — `from mxgateway import *` will not expose it and tooling that respects `__all__` treats it as private.
**Recommendation:** Add `"MxGatewayCommandError"` to the `__all__` list.
**Resolution:** 2026-05-18 — Re-triaged: this finding is stale against the reviewed source. `clients/python/src/mxgateway/__init__.py` already imports `MxGatewayCommandError` (line 16) **and** lists `"MxGatewayCommandError"` in `__all__` (line 38). `from mxgateway import *` exposes it correctly. Verified at runtime (`'MxGatewayCommandError' in mxgateway.__all__` is `True`). No source change required — the defect described no longer exists.
### Client.Python-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/client.py:125-137,155-173` |
| Status | Resolved |
**Description:** `stream_events_raw` and `query_active_alarms` call the stub directly with a `timeout` kwarg when `stream_timeout` is set, with no `TypeError` fallback. `galaxy.py:watch_deploy_events` and `_unary` *do* have a fallback that strips `timeout` if the callable rejects it. This asymmetry means a fake/older stub that does not accept `timeout` crashes for gateway streams but not Galaxy streams. It is only masked today because `stream_timeout` defaults to `None`.
**Recommendation:** Apply the same `try/except TypeError` timeout-fallback pattern to `stream_events_raw` and `query_active_alarms`, or remove the fallback everywhere and standardise on a single behaviour.
**Resolution:** 2026-05-18 — Confirmed: both stream methods in `client.py` called the stub with `timeout` unconditionally and had no `TypeError` fallback, unlike `_unary` and `galaxy.watch_deploy_events`. Added a shared `_open_stream` helper in `client.py` that opens a server-streaming call and strips the `timeout` kwarg when the stub raises `TypeError: ... unexpected keyword argument 'timeout'`, then routed both `stream_events_raw` and `query_active_alarms` through it. Regression tests in `tests/test_stream_timeout_fallback.py` (`test_stream_events_raw_falls_back_when_stub_rejects_timeout`, `test_query_active_alarms_falls_back_when_stub_rejects_timeout`, `test_stream_events_raw_still_passes_timeout_to_capable_stub`) failed before the fix and pass after. No public behaviour change for real gRPC stubs, so no README update needed.
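The fallback described in this resolution can be sketched roughly as follows. This is a minimal illustration, not the code in `client.py`: the name `open_stream` and the two stub functions are hypothetical stand-ins, and the real `_open_stream` helper may differ in signature and in how it matches the error message.

```python
from __future__ import annotations

from typing import Any, Callable


def open_stream(method: Callable[..., Any], request: Any, *, timeout: float | None = None) -> Any:
    """Open a streaming call, retrying without ``timeout`` if the stub rejects it."""
    if timeout is None:
        return method(request)
    try:
        return method(request, timeout=timeout)
    except TypeError as exc:
        # Only swallow the specific "unexpected keyword argument" failure;
        # any other TypeError is a genuine bug and must propagate.
        if "unexpected keyword argument 'timeout'" not in str(exc):
            raise
        return method(request)


def legacy_stub(request: Any) -> tuple:
    """Hypothetical fake/older stub that does not accept ``timeout``."""
    return ("legacy", request)


def modern_stub(request: Any, timeout: float | None = None) -> tuple:
    """Hypothetical stub that accepts ``timeout`` like a real gRPC callable."""
    return ("modern", request, timeout)
```

Matching on the message text keeps the fallback narrow: a stub that accepts `timeout` but raises `TypeError` for an unrelated reason still fails loudly instead of being retried silently.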
### Client.Python-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/mxgateway_cli/commands.py:386,402-404` |
| Status | Resolved |
**Description:** In `_smoke`, the local variable `closed` is set to `False` and never reassigned; the `finally` block's `if not closed:` is therefore always true. This is dead/misleading code suggesting a removed early-close path.
**Recommendation:** Remove the `closed` variable and the `if not closed:` guard; call `await session.close()` directly in the `finally` block (or use `async with session:`).
**Resolution:** (2026-05-18) Confirmed: `closed = False` was set and never reassigned, making `if not closed:` dead code. Replaced the `try/finally` with `async with session:` so the session is closed via the documented async context manager; `Session` already implements `__aexit__`, which calls `close()`. Behaviour is unchanged (the session is still closed on every exit path). No test was needed for the dead-code removal, which is exercised by the existing CLI smoke test.
### Client.Python-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `clients/python/src/mxgateway/galaxy.py:117-140` |
| Status | Resolved |
**Description:** `discover_hierarchy` pages through the entire Galaxy object hierarchy and accumulates every `GalaxyObject` (each carrying its full attribute list) into a single in-memory `list` before returning. For a large Galaxy this is a very large allocation with no streaming alternative and no caller-side bound.
**Recommendation:** Offer an async-generator variant (e.g. `iter_hierarchy()`) that yields objects/pages as they arrive, keeping `discover_hierarchy()` as a convenience wrapper. At minimum document the memory characteristic.
**Resolution:** 2026-05-18 — Confirmed: `discover_hierarchy` buffered the entire hierarchy with no streaming alternative. Added `GalaxyRepositoryClient.iter_hierarchy`, an async generator that fetches one `DiscoverHierarchyRequest` page at a time and yields each `GalaxyObject` as it arrives, so peak memory is bounded by a single page (`_DISCOVER_HIERARCHY_PAGE_SIZE`). Pages are fetched lazily — the next page is only requested after the current page is fully consumed. `discover_hierarchy` is now a thin convenience wrapper (`[obj async for obj in self.iter_hierarchy()]`) that preserves its `list[GalaxyObject]` contract, including the repeated-page-token guard. Regression tests in `tests/test_galaxy_iter_hierarchy.py` (`test_iter_hierarchy_yields_objects_across_pages`, `test_iter_hierarchy_is_lazy_and_does_not_prefetch_next_page`, `test_iter_hierarchy_rejects_repeated_page_token`, `test_discover_hierarchy_still_returns_full_list`) failed before the fix and pass after. `clients/python/README.md` updated with the `iter_hierarchy` usage and memory guidance since this adds a new public method.
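A lazy, page-at-a-time generator of the kind described here might look like the sketch below. It is an assumption-laden illustration: `fetch_page` is a hypothetical callable standing in for the `DiscoverHierarchyRequest` RPC, objects are plain strings rather than `GalaxyObject` messages, and an empty token is assumed to mean "last page".

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Hypothetical page shape: (objects, next_page_token); "" marks the last page.
Page = tuple[list[str], str]


async def iter_hierarchy(fetch_page: Callable[[str], Awaitable[Page]]) -> AsyncIterator[str]:
    """Yield objects one page at a time; the next page is fetched only on demand."""
    seen_tokens: set[str] = set()
    token = ""
    while True:
        objects, next_token = await fetch_page(token)
        for obj in objects:
            yield obj
        if not next_token:
            return
        # Guard against a server that keeps handing back the same token.
        if next_token in seen_tokens:
            raise RuntimeError(f"repeated page token {next_token!r}")
        seen_tokens.add(next_token)
        token = next_token


async def discover_hierarchy(fetch_page: Callable[[str], Awaitable[Page]]) -> list[str]:
    """Convenience wrapper that buffers the whole hierarchy into a list."""
    return [obj async for obj in iter_hierarchy(fetch_page)]
```

Because the generator only awaits `fetch_page` after the consumer has drained the current page, a caller that stops early never pays for the pages it did not read.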
### Client.Python-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `clients/python/src/mxgateway/client.py:74-82`, `clients/python/src/mxgateway/galaxy.py:85-93`, `clients/python/src/mxgateway/session.py:38-55` |
| Status | Resolved |
**Description:** `close()` on the clients and `Session.close()` use a plain `self._closed` check-then-set with an `await` between, with no lock. If two coroutines call `close()` concurrently both can pass the guard before either sets it, causing a double `channel.close()` / double `CloseSession` RPC. Single-task usage is the documented contract, so impact is low, but the idempotency guarantee asserted in docstrings only holds for sequential calls.
**Recommendation:** Set `self._closed = True` before the `await`, or guard with an `asyncio.Lock`, so the idempotency claim holds under concurrent close.
**Resolution:** 2026-05-18 — Confirmed the check-then-set window. Fixed `GatewayClient.close`, `GalaxyRepositoryClient.close`, and `Session.close` to set `self._closed = True` *before* the `await` (channel close / `CloseSession` RPC). A second coroutine entering `close()` while the first is still awaiting now hits the early-return guard and does not issue a second `channel.close()` / `CloseSession`. Docstrings updated to state the idempotency holds under concurrent calls. TDD: regression tests in `tests/test_low_severity_findings.py` (`test_gateway_client_concurrent_close_closes_channel_once`, `test_galaxy_client_concurrent_close_closes_channel_once`, `test_session_concurrent_close_sends_one_close_session_rpc`) — each uses a fake channel/client that stalls inside `close`/`close_session_raw` so two concurrent `close()` calls interleave at the exact race window; they failed before the fix and pass after.
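The ordering fix can be illustrated with a small self-contained sketch. `FakeChannel` and `Client` are hypothetical stand-ins for the real channel and client classes; the only point shown is that the flag is set before the first `await`.

```python
import asyncio


class FakeChannel:
    """Hypothetical channel that yields inside close() so callers can interleave."""

    def __init__(self) -> None:
        self.close_calls = 0

    async def close(self) -> None:
        self.close_calls += 1
        await asyncio.sleep(0)  # give a second close() a chance to run


class Client:
    def __init__(self, channel: FakeChannel) -> None:
        self._channel = channel
        self._closed = False

    async def close(self) -> None:
        if self._closed:
            return
        # Set the flag BEFORE awaiting: asyncio only switches tasks at an
        # await, so no other coroutine can pass the guard after this line.
        self._closed = True
        await self._channel.close()


async def close_twice_concurrently() -> int:
    channel = FakeChannel()
    client = Client(channel)
    await asyncio.gather(client.close(), client.close())
    return channel.close_calls
```

With the original check-then-await-then-set order, both coroutines would pass the guard and `close_calls` would reach 2. No lock is needed here because the guard and the assignment run without an intervening suspension point.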
### Client.Python-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/client.py:204-213` |
| Status | Resolved |
**Description:** `_canceling_iterator` (gateway event stream) does not catch `asyncio.CancelledError` to invoke `call.cancel()` explicitly — it relies on the `finally` block. `galaxy.py:_canceling_iterator` *does* explicitly catch `CancelledError`, cancel, and re-raise. The two are functionally equivalent today, but the inconsistency between near-identical helpers invites future divergence.
**Recommendation:** Make the two `_canceling_iterator` helpers identical, ideally by factoring a single shared helper.
**Resolution:** 2026-05-18 — Confirmed the divergence. Factored a single shared helper: `client._canceling_iterator(call, operation)` now takes the `map_rpc_error` operation string as a parameter, explicitly catches `asyncio.CancelledError` (cancels the call, re-raises) and `grpc.RpcError`, and repeats the cancel in `finally`. This replaces both the gateway `_canceling_iterator` and the gateway `_canceling_active_alarms_iterator`; `galaxy.py` now imports and delegates to the same helper instead of defining its own, so the gateway and Galaxy stream helpers are byte-for-byte identical. TDD: `tests/test_low_severity_findings.py::test_gateway_stream_iterator_cancels_call_on_task_cancellation` drives a cancellable fake stream and asserts the gateway iterator cancels the underlying call on task cancellation. All existing stream-cancellation tests still pass.
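The shape of the shared helper is roughly as sketched below. This is a simplified illustration: it omits the `grpc.RpcError` mapping and the `operation` parameter mentioned above so that it runs without gRPC installed, and `StallingCall` is a hypothetical fake stream.

```python
import asyncio
from typing import Any, AsyncIterator


async def canceling_iterator(call: Any) -> AsyncIterator[Any]:
    """Yield messages from ``call`` and cancel it whenever iteration ends."""
    try:
        async for message in call:
            yield message
    except asyncio.CancelledError:
        # Task cancellation: cancel the RPC explicitly, then propagate.
        call.cancel()
        raise
    finally:
        # Runs on every exit path, including normal exhaustion; repeating
        # the cancel is assumed to be harmless (it is for this fake).
        call.cancel()


class StallingCall:
    """Hypothetical fake stream that never produces a message."""

    def __init__(self) -> None:
        self.cancel_count = 0

    def cancel(self) -> None:
        self.cancel_count += 1

    def __aiter__(self) -> "StallingCall":
        return self

    async def __anext__(self) -> Any:
        await asyncio.Event().wait()  # blocks until the task is cancelled


async def cancel_while_streaming() -> int:
    call = StallingCall()

    async def consume() -> None:
        async for _ in canceling_iterator(call):
            pass

    task = asyncio.create_task(consume())
    await asyncio.sleep(0)  # let the consumer start and block on the stream
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return call.cancel_count
```

Keeping one helper and passing the operation name in as a parameter, as the resolution describes, removes the opportunity for the gateway and Galaxy variants to drift apart again.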
### Client.Python-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/mxgateway/values.py:62-67,83-88` |
| Status | Resolved |
**Description:** `to_mx_value` maps any Python `float` to `VT_R8`/`MX_DATA_TYPE_DOUBLE` with no handling for `nan`/`inf`; these values are serialised and forwarded to MXAccess, which may reject or mis-handle them. `bytes` is mapped to `VT_RECORD`/`MX_DATA_TYPE_UNKNOWN`, a questionable default. The `data_type` keyword exists but `Session.write` never forwards it.
**Recommendation:** Document the float/bytes mapping assumptions, optionally validate finiteness, and consider plumbing the `data_type` keyword through `Session.write`/`write2`.
**Resolution:** 2026-05-18 — Confirmed the non-finite-float hazard. Added an `_ensure_finite` guard in `values.py`: `to_mx_value` now raises `ValueError` for `nan`/`inf`/`-inf`, both for a scalar `float` and for a non-finite element inside a float sequence — MXAccess has no defined wire representation for non-finite doubles, so rejecting client-side is the correct fail-fast. The `float`/`bytes` mapping assumptions (finite-only doubles; `bytes` as an opaque `VT_RECORD` pass-through) are now documented in the `values.py` module docstring and `clients/python/README.md`. Plumbing `data_type` through `Session.write`/`write2` was deliberately *not* done: it is a larger public-API surface change the finding only marks as "consider", and the documented MXAccess-parity convention is type-by-Python-value; the `data_type` keyword stays available on `to_mx_value` for callers that build the `MxValue` directly. TDD: `tests/test_low_severity_findings.py` adds `test_to_mx_value_rejects_nan`, `test_to_mx_value_rejects_positive_infinity`, `test_to_mx_value_rejects_negative_infinity`, `test_to_mx_value_rejects_non_finite_float_in_sequence`, and `test_to_mx_value_accepts_finite_float`. README updated since `to_mx_value` (used by `Session.write`/`write2`) now rejects an input it previously accepted.
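A finiteness guard along these lines could be written as follows. The function name and error messages are illustrative assumptions; the real `_ensure_finite` in `values.py` may differ.

```python
import math
from collections.abc import Sequence
from typing import Any


def ensure_finite(value: Any) -> None:
    """Raise ValueError for nan/inf/-inf floats, scalar or inside a sequence."""
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError(f"non-finite float is not supported: {value!r}")
        return
    # str and bytes are sequences too, but they never contain floats.
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes, bytearray)):
        for index, element in enumerate(value):
            if isinstance(element, float) and not math.isfinite(element):
                raise ValueError(
                    f"non-finite float at index {index} is not supported: {element!r}"
                )
```

Called at the top of the value-mapping function, a guard like this rejects the value before any wire message is built, so the failure surfaces at the call site rather than as an opaque MXAccess error.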
### Client.Python-009
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/python/tests/` |
| Status | Resolved |
**Description:** Several non-trivial public paths are untested: `Session.write2`/`add_item2` request construction; the bulk-size limit `_ensure_bulk_size`/`MAX_BULK_ITEMS` guard; the `None`-argument `TypeError` guards in bulk methods; the TLS `ca_file` read path in `create_channel`; most CLI command bodies; and `map_rpc_error`'s default (non-auth) branch.
**Recommendation:** Add tests for `write2`/`add_item2` request shape, the bulk-size `ValueError`, the `ca_file` TLS branch, the generic `map_rpc_error` fallthrough, and at least one happy-path CLI command using a fake stub.
**Resolution:** 2026-05-18 — Confirmed coverage gap against the existing `tests/` files. Added `tests/test_coverage_gaps.py` covering every path the finding lists: `test_add_item2_sends_item_context_and_returns_handle` and `test_write2_sends_value_and_timestamp_value` (request shape + `MxValue` oneof), `test_subscribe_bulk_rejects_oversized_request` and `test_add_item_bulk_at_limit_is_allowed` (the `MAX_BULK_ITEMS` `_ensure_bulk_size` boundary), `test_advise_item_bulk_rejects_none_argument` (the `None`-argument `TypeError` guard), `test_create_channel_reads_ca_file` and `test_create_channel_missing_ca_file_raises` (the TLS `ca_file` read path), `test_map_rpc_error_generic_branch_returns_transport_error` and `test_map_rpc_error_handles_error_without_code` (the non-auth `map_rpc_error` fallthrough and the no-`code` path), and `test_cli_register_happy_path_emits_server_handle` (a happy-path CLI command body driven end to end through `CliRunner` with a fake stub via a monkeypatched `_connect`). All 10 new tests pass. No source change required — this is a pure coverage finding.
### Client.Python-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/python/src/mxgateway/session.py:404`, `clients/python/src/mxgateway_cli/commands.py:422-425` |
| Status | Resolved |
**Description:** `session.py` ends with a module-level late import `from .client import GatewayClient # noqa: E402` purely to satisfy a string type hint, and `commands.py:_session` does a function-local import. Both work around a circular dependency that `from __future__ import annotations` (already in effect) makes unnecessary. `_session` also lacks a return type annotation.
**Recommendation:** Drop the runtime late import in `session.py` and use a `TYPE_CHECKING`-guarded import for the hint; add the `-> Session` return annotation to `commands.py:_session`.
**Resolution:** 2026-05-18 — Confirmed: with `from __future__ import annotations` in effect all annotations are strings, so the runtime late import was unnecessary. Removed the trailing `from .client import GatewayClient # noqa: E402` in `session.py` and replaced it with a top-of-file `if TYPE_CHECKING:` import that satisfies the `GatewayClient` hint without a runtime dependency (no import cycle: `client.py` does not import `session` at module scope). In `commands.py`, hoisted the function-local `from mxgateway.session import Session` to a module-level import and added the `-> Session` return annotation to `_session`. Verified `import mxgateway` and `import mxgateway_cli.commands` succeed with no circular-import error. Pure refactor — covered by the existing import and CLI tests; no new test needed.
### Client.Python-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/errors.py:122-148` |
| Status | Resolved |
**Description:** `ensure_mxaccess_success` raises `MxAccessError` if any `mx_status.success == 0`. This treats `success == 0` as the failure sentinel, but `0` is also the proto3 scalar default for an unset `MxStatusProxy`. If the gateway ever returns a reply with an unpopulated status entry (e.g. a partially-filled bulk result), the client raises `MxAccessError` even though no real failure occurred.
**Recommendation:** Confirm against the proto/gateway contract whether `success` is guaranteed populated for every `statuses` entry; if not, key the failure decision on an explicit failure field rather than the `success == 0` default.
**Resolution:** 2026-05-18 — Confirmed against the gateway contract: `success` is **not** guaranteed populated for every `statuses` entry. `src/MxGateway.Worker/Conversion/MxStatusProxyConverter.cs::ConvertMany` emits a placeholder `MxStatusProxy` for a null `MXSTATUS_PROXY` COM array entry, setting `Category`/`DetectedBy` to `Unknown` but **leaving `Success` at its proto3 default of 0**. A fully-default proto entry likewise has `success == 0`. Under the old client logic either placeholder would falsely raise `MxAccessError`. Fixed `ensure_mxaccess_success` to key the per-status failure decision on a new `_is_mxaccess_status_failure` helper that requires `success == 0` **and** a populated, non-OK `category` — a status with `category` of `MX_STATUS_CATEGORY_UNSPECIFIED` (default proto) or `MX_STATUS_CATEGORY_UNKNOWN` (the null-entry placeholder) is treated as unpopulated and ignored. `MX_STATUS_CATEGORY_OK` is also excluded so a genuine success entry never raises. Real failures (categories `WARNING` and the error categories, raw value ≥ 2) still raise as before — the existing `write.mxaccess-failure` fixture (`SECURITY_ERROR`/`OPERATIONAL_ERROR` statuses) and the `MXACCESS_FAILURE` protocol-status path are unaffected. TDD: `tests/test_low_severity_findings.py` adds `test_ensure_mxaccess_success_ignores_unpopulated_status_entry` (default + null-placeholder entries, no raise), `test_ensure_mxaccess_success_raises_on_populated_failure_status` (populated `COMMUNICATION_ERROR`, raises), and `test_ensure_mxaccess_success_passes_when_status_reports_success`. No public-behaviour change for genuine replies, so no README update.
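The revised failure predicate can be sketched as below. The `Category` enum and `Status` dataclass are hypothetical stand-ins for the generated proto types, and the numeric values are placeholders apart from `UNSPECIFIED = 0` (the proto3 default); only the decision logic follows the resolution text.

```python
from dataclasses import dataclass
from enum import IntEnum


class Category(IntEnum):
    """Hypothetical mirror of the status category enum; values are placeholders."""

    UNSPECIFIED = 0  # proto3 default for an unset field
    OK = 1
    WARNING = 2
    COMMUNICATION_ERROR = 3
    UNKNOWN = 99  # stands in for the null-entry placeholder category


@dataclass
class Status:
    """Hypothetical stand-in for MxStatusProxy with proto3-style defaults."""

    success: int = 0
    category: Category = Category.UNSPECIFIED


# Categories that must never be read as a failure, whatever `success` says.
_NOT_A_FAILURE = frozenset({Category.UNSPECIFIED, Category.UNKNOWN, Category.OK})


def is_status_failure(status: Status) -> bool:
    """A failure needs success == 0 AND a populated, non-OK category."""
    return status.success == 0 and status.category not in _NOT_A_FAILURE
```

Requiring both conditions means a fully default entry and the worker's null-entry placeholder are ignored, while an entry that carries a real error category still raises.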
### Client.Python-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/python/src/mxgateway/client.py:84-108`, `clients/python/src/mxgateway/session.py:57-77` |
| Status | Won't Fix |
**Description:** `Session.invoke_raw` does not run `ensure_mxaccess_success` while `Session.invoke` does, so a caller using `invoke_raw` for parity tests gets a reply where an MXAccess HRESULT failure is silently embedded with no exception. This is by design but under-documented — the README's "preserve raw replies" sentence does not state that `*_raw` methods skip MXAccess-failure detection entirely.
**Recommendation:** Document explicitly (README + docstring) that `*_raw` methods surface MXAccess HRESULT/status failures only inside the reply and do not raise `MxAccessError`, so parity-test callers know to inspect `protocol_status`/`hresult`/`statuses` themselves.
**Resolution:** 2026-05-18 — Won't Fix (no behaviour change). Confirmed this is intentional, correct parity behaviour: the `*_raw` methods exist precisely so parity-test callers can inspect an unmodified gateway reply, including embedded MXAccess HRESULT/status failures, without an exception masking them. Changing `invoke_raw` to raise `MxAccessError` would defeat its purpose and duplicate `Session.invoke`. The finding's only actionable point is the documentation gap, which has been addressed: `clients/python/README.md` now states explicitly that `*_raw` methods enforce gateway protocol success only and do **not** run MXAccess-failure detection, and the docstrings of `GatewayClient.invoke_raw` and `Session.invoke_raw` say the same and point callers to inspect `protocol_status`/`hresult`/`statuses` (and to `Session.invoke` for the checked variant). No code/test change — the runtime contract is unchanged and correct.
+207
@@ -0,0 +1,207 @@
# Code Review — Client.Rust
| Field | Value |
|---|---|
| Module | `clients/rust` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: a stale unit test fails the suite (Client.Rust-003); handle extractors silently return 0 on a shapeless OK reply (Client.Rust-005). |
| 2 | mxaccessgw conventions | `cargo clippy --workspace --all-targets -- -D warnings` fails (Client.Rust-001, Client.Rust-002, Client.Rust-012), violating a CLAUDE.md hard requirement; hard-coded correlation ids (Client.Rust-011). |
| 3 | Concurrency & thread safety | No issues found — clients are cheaply cloneable, streams are `Send`, drop-cancels-call is verified. |
| 4 | Error handling & resilience | Issues found: empty-vec on shapeless bulk reply (Client.Rust-006); no transient/permanent classification (Client.Rust-010). |
| 5 | Security | No issues found — API keys redacted in `Debug`/`Display`, status messages scrubbed, TLS handled correctly. |
| 6 | Performance & resource management | Issue found: value/array projections clone every element, doubling array memory (Client.Rust-008). |
| 7 | Design-document adherence | Issue found: `RustClientDesign.md` documents a stale crate layout and an unused `tracing` dependency (Client.Rust-007). |
| 8 | Code organization & conventions | Issues found: `BulkReplyKind` trips a clippy lint (Client.Rust-002); undocumented public methods (Client.Rust-001). |
| 9 | Testing coverage | Issue found: TLS setup, mid-stream fault propagation, and the bulk-size cap untested (Client.Rust-009). |
| 10 | Documentation & comments | Issue found: the version-constant doc comment is wrong (Client.Rust-004). |
## Findings
### Client.Rust-001
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/options.rs:98,143` |
| Status | Resolved |
**Description:** `with_max_grpc_message_bytes` and `max_grpc_message_bytes` have no `///` doc comments. The crate sets `#![warn(missing_docs)]` and CLAUDE.md mandates that `cargo clippy --workspace --all-targets -- -D warnings` pass. Under `-D warnings` these become hard errors, so clippy fails to compile the crate — breaking the documented build/test workflow for the module.
**Recommendation:** Add doc comments to both methods, e.g. `/// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).`
**Resolution:** Resolved in `0d8a28d` (2026-05-18): doc comments added to both methods.
### Client.Rust-002
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:522` |
| Status | Resolved |
**Description:** The `BulkReplyKind` enum's variants (`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`, `SubscribeBulk`, `UnsubscribeBulk`) all share the `Bulk` suffix, tripping `clippy::enum_variant_names`. Under `-D warnings` this is a compile error, so `cargo clippy --workspace --all-targets -- -D warnings` fails — a violation of the CLAUDE.md requirement that clippy pass cleanly.
**Recommendation:** Rename the variants to drop the common suffix (e.g. `AddItem`, `AdviseItem`, …) or add a narrowly-scoped `#[allow(clippy::enum_variant_names)]` with a reason comment.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): variants renamed to `AddItem`/`AdviseItem`/`RemoveItem`/`UnAdviseItem`/`Subscribe`/`Unsubscribe`, which no longer share a common suffix.
### Client.Rust-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Status | Resolved |
**Description:** The unit test `version_json_output_has_protocol_versions` asserts `value["gatewayProtocolVersion"] == 2`, but `GATEWAY_PROTOCOL_VERSION` is `3` (version.rs:10), matching the authoritative server constant `GatewayContractInfo.GatewayProtocolVersion = 3`. The test fails, so `cargo test --workspace` (the documented test step) does not pass — the test was not updated when the protocol version was bumped.
**Recommendation:** Update the assertion to `3`, or better, assert against `GATEWAY_PROTOCOL_VERSION` so it cannot drift again.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the test now asserts against the `GATEWAY_PROTOCOL_VERSION` / `WORKER_PROTOCOL_VERSION` constants, so it cannot drift again.
### Client.Rust-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/src/version.rs:7` |
| Status | Resolved |
**Description:** `CLIENT_VERSION` is `"0.1.0-dev"` and its doc comment claims "Mirrors `Cargo.toml`", but `Cargo.toml` declares `version = "0.1.0"` (no `-dev` suffix). The comment is misleading and the value is not actually kept in sync with the manifest.
**Recommendation:** Either set `CLIENT_VERSION` from the build via `env!("CARGO_PKG_VERSION")`, or correct the constant to `"0.1.0"` and drop the "Mirrors Cargo.toml" claim.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `CLIENT_VERSION` is now `env!("CARGO_PKG_VERSION")`, taken from `Cargo.toml` at compile time so the two cannot drift.
### Client.Rust-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/src/session.rs:489-520` |
| Status | Resolved |
**Description:** `register_server_handle`, `add_item_handle`, and `add_item2_handle` fall through to `reply.return_value … .unwrap_or_default()`, returning `0` when the reply carries neither the expected typed payload nor an `Int32` `return_value`. Because `Session::invoke` has already confirmed `protocol_status == Ok`, a malformed-but-OK reply silently yields handle `0`, which the caller then uses as a real handle against the worker.
**Recommendation:** Return `Err(Error::ProtocolStatus { … })` (or a dedicated `Error::MalformedReply`) when an OK reply lacks an extractable handle, instead of defaulting to `0`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the three handle extractors now return `Result<i32, Error>` and yield the new `Error::MalformedReply` when an OK reply carries no usable handle.
### Client.Rust-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/src/session.rs:531-555` |
| Status | Resolved |
**Description:** `bulk_results` returns `Vec::new()` for any `(payload, kind)` combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of `subscribe_bulk`/`add_item_bulk` then sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply".
**Recommendation:** Treat a missing/mismatched bulk payload on an OK reply as an error rather than an empty vector, or document the empty-vec fallback explicitly and log it.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `bulk_results` now returns `Result<Vec<SubscribeResult>, Error>` and yields `Error::MalformedReply` on a mismatched or absent bulk payload.
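The distinction the finding asks for ("zero items processed" versus "shapeless reply") can be illustrated as follows. This is a sketch only: `Payload`, `Kind`, and the fields of `SubscribeResult` are hypothetical simplifications of the generated protobuf types.

```rust
// Hypothetical, simplified result entry.
#[derive(Debug, Clone, PartialEq)]
struct SubscribeResult {
    item_handle: i32,
    ok: bool,
}

// Hypothetical command kinds that produce bulk replies.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    SubscribeBulk,
    AddItemBulk,
}

// Hypothetical reply payload cases.
enum Payload {
    BulkSubscribe(Vec<SubscribeResult>),
    BulkAddItem(Vec<SubscribeResult>),
    Other,
}

#[derive(Debug, PartialEq)]
enum Error {
    MalformedReply { kind: Kind },
}

// Only a payload that matches the command kind yields results; any other
// combination is an error, never an empty vector.
fn bulk_results(payload: Option<Payload>, kind: Kind) -> Result<Vec<SubscribeResult>, Error> {
    match (payload, kind) {
        (Some(Payload::BulkSubscribe(results)), Kind::SubscribeBulk) => Ok(results),
        (Some(Payload::BulkAddItem(results)), Kind::AddItemBulk) => Ok(results),
        _ => Err(Error::MalformedReply { kind }),
    }
}
```

With this shape, `Ok(vec![])` can only mean the gateway genuinely processed zero items.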
### Client.Rust-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:14-55` |
| Status | Resolved |
**Description:** `RustClientDesign.md` is stale relative to the implemented code. It documents a nested `crates/mxgateway-client/` layout (the real crate root is `clients/rust/` with a flat `src/`), and lists `tracing` among "Expected dependencies", but `tracing` appears in no `Cargo.toml`. CLAUDE.md requires docs to change with the source.
**Recommendation:** Update `RustClientDesign.md` to the actual flat layout and remove `tracing` from the dependency list (or add `tracing` if structured logging is genuinely intended).
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the "Crate Layout" section now shows the actual flat layout (`mxgateway-client` as the workspace-root crate, `mxgw-cli` as a member) and the unused `tracing` entry was removed from the dependency list.
### Client.Rust-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/rust/src/value.rs:161-261` |
| Status | Resolved |
**Description:** `MxValueProjection::from_proto` and `MxArrayProjection::from_proto` deep-clone every element out of the wire message while `MxValue`/`MxArrayValue` also retain the original `raw` message. Every `MxValue` therefore holds two copies of its payload, which is wasteful for large string arrays or raw blobs arriving on the event stream.
**Recommendation:** Compute the projection lazily on demand, or have the projection borrow from `raw`, so array/raw payloads are not duplicated for every wrapped value.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `MxValue` and `MxArrayValue` no longer cache a `projection` field — `projection()` computes the typed view on demand from `raw`. A value built only to be sent over the wire now holds a single copy of its payload and pays no projection cost.
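One way to compute a projection on demand is to have it borrow from `raw`. The sketch below assumes hypothetical `RawValue` and `Projection` enums in place of the generated wire message and the client's real projection types. Whether the actual `projection()` borrows or clones is not stated in the finding.

```rust
// Hypothetical stand-in for the generated wire message.
#[derive(Debug, Clone, PartialEq)]
enum RawValue {
    Int32(i32),
    Text(String),
    Strings(Vec<String>),
}

// Typed view that borrows from the raw message instead of copying it.
#[derive(Debug, PartialEq)]
enum Projection<'a> {
    Int32(i32),
    Text(&'a str),
    Strings(&'a [String]),
}

// The wrapper stores only the raw message; no cached projection field.
struct MxValue {
    raw: RawValue,
}

impl MxValue {
    fn from_raw(raw: RawValue) -> Self {
        Self { raw }
    }

    // Computed on each call; large payloads are borrowed, not cloned.
    fn projection(&self) -> Projection<'_> {
        match &self.raw {
            RawValue::Int32(v) => Projection::Int32(*v),
            RawValue::Text(s) => Projection::Text(s.as_str()),
            RawValue::Strings(items) => Projection::Strings(items.as_slice()),
        }
    }
}
```

The trade-off in this sketch is that repeated `projection()` calls repeat the (cheap) match, in exchange for a value that is only sent over the wire never paying for a typed view.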
### Client.Rust-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` |
| Status | Resolved |
**Description:** Several critical paths are untested: TLS channel setup (`with_plaintext(false)` / CA-file loading), mid-stream `tonic::Status` fault propagation through `EventStream`/`DeployEventStream` (tests only send `Ok` items), and the bulk-size cap (`ensure_bulk_size` rejecting >1000 items).
**Recommendation:** Add tests that (a) feed an `Err(Status)` into the event/deploy streams and assert it surfaces as the mapped `Error`, (b) assert `add_item_bulk` with 1001 items returns `Error::InvalidArgument`, and (c) exercise the CA-file/`InvalidEndpoint` error path.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): added `add_item_bulk_rejects_input_above_the_thousand_item_cap`, `event_stream_surfaces_a_mid_stream_status_fault` (the fake gateway now optionally emits a mid-stream `Status::unavailable`), and `connect_with_unreadable_ca_file_reports_invalid_endpoint`.
### Client.Rust-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` |
| Status | Resolved |
**Description:** The client applies only a per-call deadline via `Request::set_timeout` and has no retry, reconnect, or transient-vs-permanent classification. A transient `Unavailable` (e.g. a gateway restart) maps to the catch-all `Error::Status` and is indistinguishable from a permanent failure. This is an acceptable v1 stance but is undocumented.
**Recommendation:** Either add a documented `Error::Unavailable` variant classifying `Code::Unavailable`/`Code::ResourceExhausted`, or explicitly document in the README that the client performs no retries and that transient failures arrive as `Error::Status`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): added the `Error::Unavailable` variant; `From<tonic::Status>` maps `Code::Unavailable` and `Code::ResourceExhausted` to it, so callers can classify transient failures without unwrapping the raw status.
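The classification can be sketched without the gRPC stack. Here `Code` and `Status` are hypothetical stand-ins for `tonic::Code` and `tonic::Status`, and `is_transient` is an illustrative helper that the finding does not mention.

```rust
// Hypothetical stand-in for the subset of gRPC status codes used here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Code {
    Unavailable,
    ResourceExhausted,
    PermissionDenied,
}

// Hypothetical stand-in for a gRPC status.
#[derive(Debug, Clone, PartialEq)]
struct Status {
    code: Code,
    message: String,
}

#[derive(Debug, PartialEq)]
enum Error {
    // Transient failure: the caller may reasonably retry.
    Unavailable(Status),
    // Catch-all for every other status.
    Status(Status),
}

impl From<Status> for Error {
    fn from(status: Status) -> Self {
        match status.code {
            Code::Unavailable | Code::ResourceExhausted => Error::Unavailable(status),
            _ => Error::Status(status),
        }
    }
}

impl Error {
    // Illustrative helper (an assumption, not part of the reviewed client).
    fn is_transient(&self) -> bool {
        matches!(self, Error::Unavailable(_))
    }
}
```

A caller that wants its own retry policy could then match on `Error::Unavailable` directly. Per the finding, the client itself performs no retries.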
### Client.Rust-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:469` |
| Status | Resolved |
**Description:** `command_request` hard-codes `client_correlation_id` as `format!("rust-client-{}", kind.as_str_name())`. Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request/reply pair in gateway logs or among concurrent in-flight calls. MXAccess parity diagnostics rely on correlation ids being unique per call.
**Recommendation:** Append a per-call unique suffix (monotonic counter or UUID) to the correlation id, or expose a way for the caller to supply one.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): correlation ids are built by `next_correlation_id`, which appends a process-wide atomic sequence number; `Session::close` uses it too.
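A process-wide atomic sequence is enough to make each id unique within a process. The sketch below assumes the `rust-client-<kind>-<n>` format and the `kind_name` parameter; the real `next_correlation_id` signature and id layout are not shown in the finding.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter shared by every session and every thread.
static NEXT_SEQUENCE: AtomicU64 = AtomicU64::new(1);

// Builds a correlation id that is unique per call within this process.
// The id format is an assumption made for illustration.
fn next_correlation_id(kind_name: &str) -> String {
    // Relaxed ordering suffices: only uniqueness of the returned number
    // matters, not ordering relative to other memory operations.
    let sequence = NEXT_SEQUENCE.fetch_add(1, Ordering::Relaxed);
    format!("rust-client-{}-{}", kind_name, sequence)
}
```

Two concurrent calls of the same command kind now produce different ids, so a request/reply pair can be matched in gateway logs.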
### Client.Rust-012
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/galaxy.rs:282` |
| Status | Resolved |
**Description:** Found while verifying the fix for Client.Rust-001/002: `cargo clippy --workspace --all-targets -- -D warnings` reported a third violation the original review missed. The `get_last_deploy_time` test fake calls `.clone()` on a `MutexGuard<Option<prost_types::Timestamp>>`, and `Option<Timestamp>` is `Copy` (`clippy::clone_on_copy`). Under `-D warnings` this is a compile error, so clippy still did not pass after Client.Rust-001/002 alone.
**Recommendation:** Dereference instead of cloning: `*self.state.last_deploy.lock().unwrap()`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): replaced `.clone()` with a deref. `cargo clippy --workspace --all-targets -- -D warnings` now passes cleanly.
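The deref fix can be reproduced in isolation. In this sketch `Timestamp` is a hypothetical `Copy` struct standing in for `prost_types::Timestamp`, and `FakeState` with its `last_deploy` accessor is a simplified stand-in for the test fake's state.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for `prost_types::Timestamp`; it is `Copy`,
// which makes `Option<Timestamp>` `Copy` as well.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Timestamp {
    seconds: i64,
    nanos: i32,
}

// Simplified stand-in for the fake gateway's shared state.
struct FakeState {
    last_deploy: Mutex<Option<Timestamp>>,
}

fn last_deploy(state: &FakeState) -> Option<Timestamp> {
    // Dereferencing the guard copies the value out. Calling `.clone()`
    // here instead is what `clippy::clone_on_copy` flags.
    *state.last_deploy.lock().unwrap()
}
```

The lock guard is a temporary that is released when the function returns, and the stored value stays in place for later reads.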
+147
@@ -0,0 +1,147 @@
# Code Review — Contracts
| Field | Value |
|---|---|
| Module | `src/MxGateway.Contracts` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No functional bugs; one missing reply-payload case for the by-name ack command and an `int32`-typed `success` flag that reads like a bool (Contracts-002, Contracts-006). |
| 2 | mxaccessgw conventions | Additive-only evolution honored (no renumbered/removed tags), MXAccess-aligned naming consistent, generated code untouched; no `reserved` statements declared as a guardrail (Contracts-005). |
| 3 | Concurrency & thread safety | N/A — pure contract definitions plus a static const class with no shared mutable state. |
| 4 | Error handling & resilience | HRESULT / `MxStatusProxy` / `ProtocolStatus` carriers are complete; the worker-side by-name alarm ack has no dedicated reply payload (Contracts-002). |
| 5 | Security | Credential-sensitive fields are clearly commented; no secrets forced into loggable shapes. No issues found. |
| 6 | Performance & resource management | `DiscoverHierarchy` is paged; alarm-snapshot streams are server-streamed; no bloat issues. No issues found. |
| 7 | Design-document adherence | `.proto` files match design intent but `docs/Grpc.md` is stale (Contracts-001); worker vs public alarm-status shapes unreconciled in docs (Contracts-008). |
| 8 | Code organization & conventions | Package/file layout correct; stale class summary (Contracts-004). Contracts-003 (`mxaccess_worker.proto` Protobuf item missing `ProtoRoot`) was re-triaged as not-a-defect — the attribute is already present. |
| 9 | Testing coverage | Gateway/worker/alarm round-trips covered; Galaxy Repository protos and raw `MxArray` paths untested (Contracts-007). |
| 10 | Documentation & comments | Proto comments accurate and domain-rich; one stale class summary (Contracts-004). |
## Findings
### Contracts-001
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) |
| Status | Resolved |
**Description:** `mxaccess_gateway.proto` now declares six RPCs on `MxAccessGateway` (`OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, `QueryActiveAlarms`). `docs/Grpc.md` still describes "the four `MxAccessGateway` RPCs" in its type table and omits `AcknowledgeAlarm`/`QueryActiveAlarms` from the Validation Rules table. CLAUDE.md requires docs to change in the same commit as the contract; the alarm RPC commits left this doc stale and misleading about the public surface.
**Recommendation:** Update `docs/Grpc.md` to enumerate all six RPCs and add `AcknowledgeAlarm`/`QueryActiveAlarms` to the type/handler and validation tables, or explicitly cross-reference `AlarmClientDiscovery.md`.
**Resolution:** _(2026-05-18)_ Confirmed against `mxaccess_gateway.proto` — six RPCs declared, doc said "four". Updated `docs/Grpc.md`: the collaborator table now says "six `MxAccessGateway` RPCs", the RPC Handlers intro enumerates all six, added dedicated `AcknowledgeAlarm` and `QueryActiveAlarms` handler subsections (noting the alarm surface routes through `IAlarmRpcDispatcher` and is validated inline rather than via `MxAccessGrpcRequestValidator`, with a cross-reference to `AlarmClientDiscovery.md`), and added both alarm RPCs to the Validation Rules table.
### Contracts-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:384-385`, `:95` |
| Status | Resolved |
**Description:** `MxCommandKind` includes `MX_COMMAND_KIND_ACKNOWLEDGE_ALARM_BY_NAME = 29` and `MxCommand.payload` carries `AcknowledgeAlarmByNameCommand acknowledge_alarm_by_name_command = 38`, but `MxCommandReply.payload` has only `acknowledge_alarm = 34` and `query_active_alarms = 35` — there is no by-name reply case. The by-name ack must reuse `AcknowledgeAlarmReplyPayload` or rely on the top-level `hresult`. The command/reply payload asymmetry is undocumented and easy to dispatch incorrectly.
**Recommendation:** Either add an explicit comment to `MxCommandReply` stating that by-name ack reuses the `acknowledge_alarm` payload case, or add a dedicated payload case for symmetry, and document the chosen contract in `docs/Contracts.md` / `AlarmClientDiscovery.md`.
**Resolution:** _(2026-05-18)_ Verified against both the `.proto` and the dispatch code. The asymmetry is intentional and the code is correct: the worker's `MxAccessCommandExecutor.ExecuteAcknowledgeAlarmByName` builds `reply.AcknowledgeAlarm = new AcknowledgeAlarmReplyPayload { NativeStatus = rc }` — deliberately reusing the `acknowledge_alarm` payload case — and the gateway's `WorkerAlarmRpcDispatcher.AcknowledgeAsync` only reads the top-level `hresult`/`protocol_status`, so both ack arms work. The gap was documentation only. Took the finding's preferred option (a) — comment-only, no wire-format or generated-type change: added explicit comments to the `acknowledge_alarm` reply-payload case and to the `AcknowledgeAlarmReplyPayload` message in `mxaccess_gateway.proto` stating both ack kinds reuse this case and consumers must dispatch on `MxCommandReply.kind`, and documented the contract in `docs/AlarmClientDiscovery.md` section 4. Added regression test `ProtobufContractRoundTripTests.MxCommandReply_AcknowledgeAlarmByName_ReusesAcknowledgeAlarmPayloadCase` pinning the by-name-ack → `acknowledge_alarm` reuse and asserting no by-name-specific reply oneof case exists.
### Contracts-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` |
| Status | Won't Fix |
**Description:** The `<Protobuf>` item for `mxaccess_worker.proto` omits `ProtoRoot="Protos"`, while the items for `mxaccess_gateway.proto` (line 9) and `galaxy_repository.proto` (line 11) both set it. `mxaccess_worker.proto` does `import "mxaccess_gateway.proto"`, which resolves only because Grpc.Tools adds the importing file's own directory to the proto path. The inconsistency is fragile — tooling changes to ProtoRoot handling could break import resolution.
**Recommendation:** Add `ProtoRoot="Protos"` to the `mxaccess_worker.proto` `<Protobuf>` item so all three entries are consistent.
**Resolution:** _(2026-05-18)_ Re-triaged as not-a-defect: the finding's premise is factually wrong. Line 10 of `MxGateway.Contracts.csproj` already carries `ProtoRoot="Protos"` — all three `<Protobuf>` items are already consistent. `git show 6c64030:src/MxGateway.Contracts/MxGateway.Contracts.csproj` (the reviewed commit) confirms the attribute was present at review time too; the csproj has not been touched since `133c830`. No code change made. Status set to Won't Fix because there is nothing to fix.
### Contracts-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` |
| Status | Resolved |
**Description:** The XML summary says the class exposes version metadata "before generated protobuf contracts are introduced." Generated protobuf contracts have long been introduced and are consumed across the solution. The comment is stale; the class now holds the authoritative `GatewayProtocolVersion`/`WorkerProtocolVersion` advertised in `OpenSessionReply` and used to validate `WorkerEnvelope` framing.
**Recommendation:** Reword the summary to describe the current purpose — version constants advertised in `OpenSessionReply` and used to validate `WorkerEnvelope` protocol framing.
**Resolution:** _(2026-05-18)_ Confirmed stale — the class is consumed by `GatewayApplication`/`OpenSessionReply` and `WorkerEnvelope` framing checks across the solution. Reworded the XML summary on `GatewayContractInfo` to describe the actual current purpose: `GatewayProtocolVersion` is advertised to clients in `OpenSessionReply`, and `WorkerProtocolVersion` validates `WorkerEnvelope` protocol framing on the gateway↔worker pipe.
### Contracts-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto`, `src/MxGateway.Contracts/Protos/mxaccess_worker.proto` |
| Status | Resolved |
**Description:** The ProtobufStyleGuide mandates reserving removed field numbers / enum values. Evolution to date has been purely additive, so this is not a current violation — but none of the `.proto` files contain any `reserved` declarations, leaving no in-file guardrail for the first removal. This is a latent maintainability gap.
**Recommendation:** When any field or enum value is eventually removed, add a `reserved` range/name in the same change. Consider a short comment block in each message documenting the policy so future editors apply `reserved` rather than reusing tags.
**Resolution:** _(2026-05-18)_ Confirmed: no field or enum value has ever been removed, so adding `reserved` ranges now would be incorrect (there are no retired tags to reserve, and inventing ranges for never-used numbers would itself violate the contract). Took the finding's least-invasive option — added a short wire-compatibility policy comment block at the top of all three `.proto` files (`mxaccess_gateway.proto`, `mxaccess_worker.proto`, `galaxy_repository.proto`) stating the additive-only rule and instructing future editors to add a `reserved` range + name in the same change as any removal. Comment-only, no wire-format or generated-type change. The `reserved` declarations themselves remain correctly deferred to the first actual removal.
### Contracts-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:647` |
| Status | Resolved |
**Description:** `MxStatusProxy.success` is declared `int32 success = 1` with no comment. The name reads like a boolean flag but the type is a 32-bit integer (mirroring MXAccess `MXSTATUS_PROXY`, which stores a numeric success/HResult-like value). Without a comment a client author can reasonably misinterpret the field (treat non-1 as failure, or expect only 0/1).
**Recommendation:** Add a comment clarifying the semantic — what range of values it carries and how 0 vs non-zero map to MXAccess status — per the style guide rule to comment fields carrying raw MXAccess status detail.
**Resolution:** _(2026-05-18)_ Confirmed: `int32 success = 1` had no comment. Cross-checked against the worker `MxStatusProxyConverter`, which reads the COM struct's `success` field verbatim (a 16-bit signed value) without reinterpretation, and against the MXAccess analysis (`MXAccess-Public-API.md`: `MxStatus`/`MXSTATUS_PROXY` are identical structs with a `short success` member). Added a field comment to `MxStatusProxy.success` stating it mirrors the COM struct's numeric `success` member (NOT a boolean), is carried verbatim for diagnostics, and that clients should branch on `category` (`MX_STATUS_CATEGORY_OK` marks success) — deliberately avoiding an over-specified 0-vs-1 claim, since the gateway never maps `success` to an outcome and `category` is the authoritative field. Comment-only change.
### Contracts-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| Status | Resolved |
**Description:** `ProtobufContractRoundTripTests` covers gateway command/reply/event, alarm transition, alarm ack request/reply, active-alarm snapshot, and the worker envelope. It has no coverage for: (a) any `galaxy_repository.proto` message (`DiscoverHierarchy*`, `GalaxyObject`, `GalaxyAttribute`, `DeployEvent`, the `root` oneof, wrapper-typed fields); (b) `BulkSubscribeReply`/`SubscribeResult` and the bulk command kinds; (c) `MxValue`/`MxArray` `raw_value`/`RawArray` (`bytes`) paths and the `WorkerFault`/`WorkerHeartbeat` IPC bodies.
**Recommendation:** Add round-trip tests for the Galaxy Repository messages (including the `root` oneof and proto wrapper fields), the bulk-subscribe reply, and the remaining `WorkerEnvelope` body cases.
**Resolution:** _(2026-05-18)_ Confirmed the listed gaps and added round-trip tests to `ProtobufContractRoundTripTests` covering all three areas: (a) Galaxy Repository — `GalaxyRepositoryDescriptor_ContainsBrowseServiceMethods`, `DiscoverHierarchyRequest_RoundTripsRootOneofAndWrapperFields` (a `[Theory]` exercising all three `root` oneof arms plus the `Int32Value` wrapper `max_depth`), `DiscoverHierarchyReply_RoundTripsObjectAndAttributeGraph`, `DeployEvent_RoundTripsTimestampAndCounters`, `GalaxyConnectionReplies_RoundTrip`; (b) `BulkSubscribeReply_RoundTripsSubscribeResults` and `MxCommandReply_RoundTripsBulkSubscribePayload` (bulk-subscribe command kind + payload case); (c) `MxValue_RoundTripsRawValueBytesPayload`, `MxArray_RoundTripsRawArrayPayload`, `WorkerEnvelope_RoundTripsWorkerFaultBody`, `WorkerEnvelope_RoundTripsWorkerHeartbeatBody`. All new tests pass; the full `ProtobufContractRoundTripTests` class is 27 tests green.
### Contracts-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:451-459`, `:627-636` |
| Status | Resolved |
**Description:** The worker-side `AcknowledgeAlarmReplyPayload` carries the alarm-ack outcome as `int32 native_status`, while the public `AcknowledgeAlarmReply` carries it as `MxStatusProxy status` plus `optional int32 hresult`. The comment explains the worker echoes `native_status` into `AcknowledgeAlarmReply.hresult`, but the two outcome shapes (raw `int32` vs structured `MxStatusProxy`) are not reconciled in `docs/Contracts.md` / `AlarmClientDiscovery.md`. A reader cannot tell whether `MxStatusProxy status` is always populated or only on COM-layer failure.
**Recommendation:** Document in `docs/Contracts.md` (or `AlarmClientDiscovery.md`) how the worker `native_status` maps onto the public reply's `status`/`hresult` pair so client authors know which field is authoritative.
**Resolution:** _(2026-05-18)_ Verified against `WorkerAlarmRpcDispatcher.AcknowledgeAsync`. The asymmetry is larger than the finding implies: the dispatcher copies the worker `MxCommandReply.hresult` into `AcknowledgeAlarmReply.hresult` but **never** assigns `AcknowledgeAlarmReply.status` — the `MxStatusProxy status` field is left UNSET on every reply. The proto comment on `status` ("Native MxAccess status describing the outcome of the ack") was therefore actively misleading. Fixed: (1) reworded the `mxaccess_gateway.proto` comments on `AcknowledgeAlarmReply.hresult` (now identifies it as the authoritative native-return-code field) and `AcknowledgeAlarmReply.status` (now states it is reserved/unset and clients must not depend on it); (2) extended `docs/AlarmClientDiscovery.md` section 4 with a "Worker `native_status` → public `AcknowledgeAlarmReply` mapping" subsection spelling out that `hresult` is authoritative (`0` = success) and `status` is always unset, and that clients should branch on `protocol_status` then `hresult`, never `status`.
+179
@@ -0,0 +1,179 @@
# Code Review — IntegrationTests
| Field | Value |
|---|---|
| Module | `src/MxGateway.IntegrationTests` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (`WaitForMessageAsync` ignores cancellation). |
| 2 | mxaccessgw conventions | Live tests are correctly gated and skip (not fail) when prerequisites are absent; `LiveGalaxyRepositoryFactAttribute` is undocumented in the opt-in matrix (IntegrationTests-001). |
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no `[Collection]`/parallelism guard for shared MXAccess/ZB/GLAuth). |
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup `WaitAsync` can mask the original failure). |
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (`GwAdmin` LDAP prerequisite undocumented). |
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (`TestServerCallContext` mislabelled "Mock"). |
## Findings
### IntegrationTests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
| Status | Resolved |
**Description:** The Galaxy Repository live test suite and its gating env var `MXGATEWAY_RUN_LIVE_GALAXY_TESTS` (plus connection-string override `MXGATEWAY_LIVE_GALAXY_CONN`) are completely absent from `docs/GatewayTesting.md`. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
**Recommendation:** Add a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1`, `MXGATEWAY_LIVE_GALAXY_CONN`, the `ZB` database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
**Resolution:** Resolved 2026-05-18: Added a "Live Galaxy Repository" section to `docs/GatewayTesting.md` documenting `MXGATEWAY_RUN_LIVE_GALAXY_TESTS`, `MXGATEWAY_LIVE_GALAXY_CONN`, the deployed-`ZB` prerequisite, and the covered `GalaxyRepository` RPCs.
### IntegrationTests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
| Status | Resolved |
**Description:** `DashboardLdapLiveTests` builds the authenticator with `new GatewayOptions()`, so it relies on `LdapOptions.RequiredGroup` defaulting to `GwAdmin` and asserts the `admin` user is a member of a `GwAdmin` LDAP group. `glauth.md` does not list `GwAdmin` as a provisioned group — it lists `admin` only in the five role groups and describes `GwAdmin` as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
**Recommendation:** Either document the required `GwAdmin` GLAuth provisioning step in `glauth.md` and `GatewayTesting.md`, or have the test set `RequiredGroup` to a baseline group `glauth.md` guarantees `admin` belongs to (e.g. `WriteOperate`).
**Resolution:** Resolved 2026-05-18: Took the documentation fix — promoted the `glauth.md` "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants `GwAdmin` to `admin`, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to `docs/GatewayTesting.md` calling out `GwAdmin` as a hard prerequisite. Alternative considered: weaken the test to a baseline group (`WriteOperate`) — rejected because `GwAdmin` is the real default `LdapOptions.RequiredGroup` and the test should exercise it.
### IntegrationTests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
| Status | Resolved |
**Description:** The test asserts only on the first `MxEvent` recorded by `RecordingServerStreamWriter`. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected `OnDataChange` (e.g. a registration-state or bad-quality bootstrap event). Because `WaitForFirstMessageAsync` returns whatever arrives first, the test can fail spuriously on a benign bootstrap event, and a genuine ordering/family defect in any later event goes unverified.
**Recommendation:** Filter for the first event with `Family == OnDataChange` (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
**Resolution:** Resolved 2026-05-18: Confirmed against source — `WaitForFirstMessageAsync` completed a `TaskCompletionSource` on the very first `WriteAsync`. Replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages, skips earlier non-matching events, and blocks on a `SemaphoreSlim` until a matching one arrives or the timeout elapses (throwing a `TimeoutException` that reports the scanned count). `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now waits for the first `Family == OnDataChange` event. Live execution was not possible in this environment (no MXAccess COM); verified by build.
### IntegrationTests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
| Status | Resolved |
**Description:** In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutException` from inside `finally`, which replaces/masks any original assertion failure from the `try` block. The diagnostic value of the real failure is lost.
**Recommendation:** Wrap the `streamTask.WaitAsync` (and ideally `WaitForProcessesAsync`) in a try/catch that logs the cleanup exception via `output.WriteLine` instead of letting it propagate.
**Resolution:** Resolved 2026-05-18: Confirmed — the `finally` block awaited `streamTask.WaitAsync` and `WaitForProcessesAsync` with no exception handling. Extracted a shared `ShutDownAsync` helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via `output.WriteLine` instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.
### IntegrationTests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
| Status | Resolved |
**Description:** The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (`WriteSecured` ordering, `OperationComplete` semantics, invalid-handle exceptions). None of `Write`, `WriteSecured`, `Unadvise`, `RemoveItem`, `Unregister`, `OperationComplete`, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
**Recommendation:** Add live coverage for at least a `Write` round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
**Resolution:** Resolved 2026-05-18: Added two `[LiveMxAccessFact]`-gated tests to `WorkerLiveMxAccessSmokeTests`. `GatewaySession_WithLiveWorker_WritesValueToAdvisedItem` registers/adds/advises then issues a `Write` of an integer value, asserting the command round-trips with `ProtocolStatusCode.Ok` and `MxCommandKind.Write`. `GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault` issues `AddItem` against `int.MaxValue` as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the `ShutDownAsync` cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the `Write` round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.
### IntegrationTests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
| Status | Resolved |
**Description:** LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which `DashboardAuthenticator` has distinct branches for (the `LdapException` catch, the `candidate is null` branch). The negative test only proves group-membership denial, not credential rejection.
**Recommendation:** Add a live test for `admin` with a wrong password asserting `Succeeded == false` and that the password is not leaked into `FailureMessage`, and a test for an unknown username.
**Resolution:** Resolved 2026-05-18: Added three `[LiveLdapFact]`-gated tests to `DashboardLdapLiveTests`. `AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword` exercises the `LdapException` catch via a rejected candidate bind and asserts the wrong password never reaches `FailureMessage`. `AuthenticateAsync_UnknownUsername_Fails` exercises the `candidate is null` branch. `AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing` builds the authenticator with `LdapOptions.Port = 1` (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown, which covers the generic `catch (Exception)` branch. All three sit behind the existing `MXGATEWAY_RUN_LIVE_LDAP_TESTS` gate, so they run only when live LDAP tests are explicitly enabled. Live execution was not possible in this environment (no live LDAP); verified by build.
### IntegrationTests-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
| Status | Resolved |
**Description:** The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No `[Collection]` annotation or `DisableTestParallelization` is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
**Recommendation:** Place the live test classes in a shared `[Collection]`, or set `[assembly: CollectionBehavior(DisableTestParallelization = true)]` for this opt-in project, so live external resources are accessed serially.
**Resolution:** Resolved 2026-05-18: Confirmed — no `[Collection]` or assembly-level `CollectionBehavior` existed. Added `LiveResourcesCollection.cs` with a `[CollectionDefinition(Name, DisableParallelization = true)]` and applied `[Collection(LiveResourcesCollection.Name)]` to `WorkerLiveMxAccessSmokeTests`, `GalaxyRepositoryLiveTests`, and `DashboardLdapLiveTests`. A named collection (rather than an assembly-wide `DisableTestParallelization`) was chosen so the live classes serialize against each other and within themselves while non-live tests (`IntegrationTestEnvironmentTests`) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).
### IntegrationTests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
| Status | Resolved |
**Description:** Three near-identical fact attributes each re-implement the same "compare env var to `1` with `StringComparison.Ordinal`, set `Skip` otherwise" pattern. `LiveMxAccessFactAttribute` delegates to `IntegrationTestEnvironment` while the other two inline the logic, so the project has two divergent styles for the same concern.
**Recommendation:** Extract a shared helper (e.g. `IntegrationTestEnvironment.IsEnabled(string variableName)`) and have all three attributes call it.
**Resolution:** Resolved 2026-05-18: Confirmed — `LiveLdapFactAttribute.Enabled` and `LiveGalaxyRepositoryFactAttribute.Enabled` each inlined the ordinal `== "1"` comparison while `LiveMxAccessFactAttribute` delegated to `IntegrationTestEnvironment`. Added `IntegrationTestEnvironment.IsEnabled(string variableName)` as the single implementation; `LiveMxAccessTestsEnabled`, `LiveLdapFactAttribute.Enabled`, and `LiveGalaxyRepositoryFactAttribute.Enabled` now all call it. Verified by build.
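The consolidated gate is small enough to picture in a few lines. The following is a Python analogue only, not the C# `IntegrationTestEnvironment.IsEnabled` implementation; the names `is_enabled` and `skip_reason` and the optional `environ` parameter are assumptions made for illustration.

```python
import os


def is_enabled(variable_name, environ=None):
    """Return True only when the variable is set to exactly "1".

    Mirrors the ordinal comparison described above: no trimming,
    no case folding, and no acceptance of "true" or "yes".
    """
    env = os.environ if environ is None else environ
    return env.get(variable_name) == "1"


def skip_reason(variable_name, environ=None):
    """Return None when the live test may run, otherwise a skip message."""
    if is_enabled(variable_name, environ):
        return None
    return f"Set {variable_name}=1 to run this live test."
```

With one implementation, every live-test attribute answers the question the same way: `skip_reason("MXGATEWAY_RUN_LIVE_LDAP_TESTS")` yields a skip message unless the variable is exactly `1`.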
### IntegrationTests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
| Status | Resolved |
**Description:** `TestServerCallContext` is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
**Recommendation:** Reword the summary to "test stub" / "minimal `ServerCallContext` implementation for in-process gRPC calls."
**Resolution:** Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal `ServerCallContext` stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.
### IntegrationTests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` |
| Status | Resolved |
**Description:** `WaitForFirstMessageAsync` accepts only a `timeout` and never observes a `CancellationToken`. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s `WaitAsync` timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare `TimeoutException`.
**Recommendation:** Accept a `CancellationToken` (linked to `TestServerCallContext`'s token), pass it to `firstMessage.Task.WaitAsync(timeout, token)`, and on timeout emit the recorded `Messages` count via `output.WriteLine` before throwing.
**Re-triage:** The named method `WaitForFirstMessageAsync` no longer exists — IntegrationTests-003's resolution renamed/replaced it with `RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout)`, which scans recorded messages and blocks on a `SemaphoreSlim`. The underlying defect still held: that replacement method also took only a `timeout` and never observed a `CancellationToken`. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's `firstMessage.Task.WaitAsync` detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.
**Resolution:** Resolved 2026-05-18: Added an optional `CancellationToken` parameter to `WaitForMessageAsync`, linked with the existing timeout source via `CancellationTokenSource.CreateLinkedTokenSource`, so a per-test cancellation aborts the wait promptly. `GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses` now creates a `CancellationTokenSource`, passes its token into the `StreamEvents` `TestServerCallContext` and into `WaitForMessageAsync`, so the stream call and the wait share one cancellation source. On timeout the method already throws a `TimeoutException` whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate `output.WriteLine`). Verified by build; live tests not executed.
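The wait semantics described in this resolution (a predicate scan, a timeout that reports how many messages were recorded, and an optional caller-supplied cancellation) can be sketched as a Python `asyncio` analogue. This is not the C# `RecordingServerStreamWriter`; the class name, the `WaitCancelled` exception, and the use of an `asyncio.Event` as a stand-in for a `CancellationToken` are all assumptions for illustration.

```python
import asyncio


class WaitCancelled(Exception):
    """Hypothetical stand-in for a cancellation raised by the caller's token."""


class RecordingStreamWriter:
    """Records written messages and lets a test wait for a matching one."""

    def __init__(self):
        self.messages = []
        self._changed = asyncio.Condition()

    async def write(self, message):
        async with self._changed:
            self.messages.append(message)
            self._changed.notify_all()

    async def _scan(self, predicate):
        # Check recorded messages, then block until another one arrives.
        scanned = 0
        async with self._changed:
            while True:
                for message in self.messages[scanned:]:
                    if predicate(message):
                        return message
                scanned = len(self.messages)
                await self._changed.wait()

    async def wait_for_message(self, predicate, timeout, cancel=None):
        """Return the first matching message, or raise on timeout/cancel."""
        scan_task = asyncio.create_task(self._scan(predicate))
        waiters = {scan_task}
        cancel_task = None
        if cancel is not None:
            cancel_task = asyncio.create_task(cancel.wait())
            waiters.add(cancel_task)
        try:
            done, _ = await asyncio.wait(
                waiters, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            if scan_task in done:
                return scan_task.result()
            if cancel_task is not None and cancel_task in done:
                raise WaitCancelled("wait cancelled by the test")
            # The count goes into the failure message itself.
            raise TimeoutError(
                f"No matching message within {timeout}s; "
                f"{len(self.messages)} message(s) recorded."
            )
        finally:
            for task in waiters:
                task.cancel()
```

Putting the recorded count in the exception message, rather than in a separate log line, keeps the diagnostic attached to the test failure.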
+164
@@ -0,0 +1,164 @@
# Code Reviews
<!-- GENERATED FILE - do not edit by hand. Regenerate with: python code-reviews/regen-readme.py -->
Cross-module code review index for the `mxaccessgw` codebase. The review process is defined in [../REVIEW-PROCESS.md](../REVIEW-PROCESS.md).
Each module's `findings.md` is the source of truth; this file is generated from them by `regen-readme.py` and must not be edited by hand.
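As a rough illustration of that generation step, the sketch below parses the per-finding `Field`/`Value` tables out of a `findings.md` body and splits them into pending and closed lists ordered by severity. It is not the actual `regen-readme.py`; the function names, the regular expressions, and the tie-break on finding ID are assumptions.

```python
import re

SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
PENDING = {"Open", "In Progress"}


def parse_findings(markdown):
    """Collect one dict per '### <ID>' heading from its Field/Value table."""
    findings, current = [], None
    for line in markdown.splitlines():
        heading = re.match(r"^###\s+(\S+)\s*$", line)
        if heading:
            current = {"ID": heading.group(1)}
            findings.append(current)
            continue
        row = re.match(r"^\|\s*([^|]+?)\s*\|\s*(.+?)\s*\|\s*$", line)
        # Skip the table header and its separator row.
        if current is not None and row and row.group(1) not in ("Field", "---"):
            current.setdefault(row.group(1), row.group(2))
    return findings


def split_and_sort(findings):
    """Return (pending, closed), each ordered by severity and then by ID."""
    def key(finding):
        rank = SEVERITY_ORDER.get(finding.get("Severity"), len(SEVERITY_ORDER))
        return (rank, finding["ID"])

    pending = sorted((f for f in findings if f.get("Status") in PENDING), key=key)
    closed = sorted((f for f in findings if f.get("Status") not in PENDING), key=key)
    return pending, closed
```

Sorting on `(severity rank, ID)` happens to reproduce the ordering visible in the tables below because the IDs are zero-padded.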
## Module status
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 0 | 8 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 0 | 10 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 0 | 12 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 0 | 12 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 0 | 12 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 0 | 8 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 0 | 10 |
| [Server](Server/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 0 | 14 |
| [Tests](Tests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 0 | 12 |
| [Worker](Worker/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 0 | 15 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 0 | 15 |
## Pending findings
Findings with status `Open` or `In Progress`, ordered by severity.
_No pending findings._
## Closed findings
Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| ID | Severity | Status | Category | Location |
|---|---|---|---|---|
| Server-001 | Critical | Resolved | Security | `src/MxGateway.Server/GatewayApplication.cs:147-149`, `src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58`, `src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15` |
| Client.Go-001 | High | Resolved | Correctness & logic bugs | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` |
| Client.Rust-001 | High | Resolved | mxaccessgw conventions | `clients/rust/src/options.rs:98,143` |
| Client.Rust-002 | High | Resolved | mxaccessgw conventions | `clients/rust/src/session.rs:522` |
| Client.Rust-003 | High | Resolved | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Client.Rust-012 | High | Resolved | mxaccessgw conventions | `clients/rust/src/galaxy.rs:282` |
| IntegrationTests-001 | High | Resolved | Design-document adherence | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
| IntegrationTests-002 | High | Resolved | Design-document adherence | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
| Server-003 | High | Resolved | Security | `src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59`, `src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258` |
| Tests-001 | High | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:483-489` |
| Tests-002 | High | Resolved | Security | `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:198-210` |
| Worker-001 | High | Resolved | Concurrency & thread safety | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:204-207` |
| Worker-002 | High | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:545-549` |
| Worker-003 | High | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:399-403`, `:416-419` |
| Worker.Tests-001 | High | Resolved | Testing coverage | `src/MxGateway.Worker.Tests/Sta/` (no `StaMessagePumpTests.cs`) |
| Worker.Tests-002 | High | Resolved | Testing coverage | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs`, `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs` |
| Client.Dotnet-001 | Medium | Resolved | Error handling & resilience | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` |
| Client.Dotnet-002 | Medium | Resolved | Testing coverage | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` |
| Client.Dotnet-003 | Medium | Resolved | Concurrency & thread safety | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Client.Go-002 | Medium | Resolved | Error handling & resilience | `clients/go/mxgateway/session.go:440-516` |
| Client.Go-003 | Medium | Resolved | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Client.Java-001 | Medium | Resolved | Security | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Client.Java-002 | Medium | Resolved | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Client.Java-003 | Medium | Resolved | mxaccessgw conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
| Client.Java-004 | Medium | Resolved | Correctness & logic bugs | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
| Client.Java-005 | Medium | Resolved | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
| Client.Python-003 | Medium | Resolved | Error handling & resilience | `clients/python/src/mxgateway/client.py:125-137,155-173` |
| Client.Python-005 | Medium | Resolved | Performance & resource management | `clients/python/src/mxgateway/galaxy.py:117-140` |
| Client.Python-009 | Medium | Resolved | Testing coverage | `clients/python/tests/` |
| Client.Rust-005 | Medium | Resolved | Correctness & logic bugs | `clients/rust/src/session.rs:489-520` |
| Client.Rust-006 | Medium | Resolved | Error handling & resilience | `clients/rust/src/session.rs:531-555` |
| Contracts-002 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:384-385`, `:95` |
| IntegrationTests-003 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
| IntegrationTests-004 | Medium | Resolved | Error handling & resilience | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` |
| IntegrationTests-005 | Medium | Resolved | Testing coverage | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` |
| IntegrationTests-006 | Medium | Resolved | Testing coverage | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs` |
| Server-002 | Medium | Resolved | Design-document adherence | `src/MxGateway.Server/Program.cs:24`, `src/MxGateway.Server/GatewayApplication.cs` |
| Server-004 | Medium | Resolved | Code organization & conventions | `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233`, `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77`, `src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67` |
| Server-005 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184` |
| Server-006 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Server/Sessions/SessionManager.cs:84-114` |
| Tests-003 | Medium | Resolved | Performance & resource management | `src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176`, `src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258` |
| Tests-004 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs` |
| Tests-005 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` |
| Tests-006 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:76`, `src/MxGateway.Tests/Gateway/Workers/FakeWorkerHarnessTests.cs:122` |
| Worker-004 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` |
| Worker-005 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258` (production alarm poll loop) |
| Worker-006 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` |
| Worker-007 | Medium | Resolved | mxaccessgw conventions | `src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150` |
| Worker-008 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249`, `:429-447` |
| Worker.Tests-003 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Worker.Tests/Sta/StaRuntimeTests.cs:46-48` |
| Worker.Tests-004 | Medium | Resolved | Concurrency & thread safety | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:281-329` |
| Worker.Tests-005 | Medium | Resolved | Performance & resource management | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105`, `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31` |
| Worker.Tests-006 | Medium | Resolved | Performance & resource management | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:282,305,315,323` |
| Worker.Tests-007 | Medium | Resolved | Design-document adherence | `docs/WorkerFrameProtocol.md:38-49` |
| Client.Dotnet-004 | Low | Resolved | Error handling & resilience | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294`, `clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403` |
| Client.Dotnet-005 | Low | Resolved | Correctness & logic bugs | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175` |
| Client.Dotnet-006 | Low | Resolved | Code organization & conventions | `clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50`, `clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14` |
| Client.Dotnet-007 | Low | Resolved | Documentation & comments | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192` |
| Client.Dotnet-008 | Low | Resolved | Correctness & logic bugs | `clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17` |
| Client.Go-004 | Low | Resolved | mxaccessgw conventions | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` |
| Client.Go-005 | Low | Resolved | Design-document adherence | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` |
| Client.Go-006 | Low | Resolved | Error handling & resilience | `clients/go/mxgateway/errors.go:9-130` |
| Client.Go-007 | Low | Resolved | Correctness & logic bugs | `clients/go/mxgateway/session.go:526-532` |
| Client.Go-008 | Low | Resolved | Testing coverage | `clients/go/mxgateway/` (test files) |
| Client.Go-009 | Low | Resolved | Code organization & conventions | `clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205` |
| Client.Go-010 | Low | Resolved | Documentation & comments | `clients/go/mxgateway/client.go:39-40` |
| Client.Java-006 | Low | Resolved | Performance & resource management | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` |
| Client.Java-007 | Low | Resolved | Testing coverage | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` |
| Client.Java-008 | Low | Resolved | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` |
| Client.Java-009 | Low | Resolved | Code organization & conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413` |
| Client.Java-010 | Low | Resolved | Documentation & comments | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272`, `clients/java/README.md:76` |
| Client.Java-011 | Low | Resolved | Performance & resource management | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63` |
| Client.Java-012 | Low | Resolved | Correctness & logic bugs | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674` |
| Client.Python-001 | Low | Resolved | Documentation & comments | `clients/python/pyproject.toml:8,25`, `clients/python/src/mxgateway_cli/commands.py:25` |
| Client.Python-002 | Low | Resolved | Code organization & conventions | `clients/python/src/mxgateway/__init__.py:27` |
| Client.Python-004 | Low | Resolved | Correctness & logic bugs | `clients/python/src/mxgateway_cli/commands.py:386,402-404` |
| Client.Python-006 | Low | Resolved | Concurrency & thread safety | `clients/python/src/mxgateway/client.py:74-82`, `clients/python/src/mxgateway/galaxy.py:85-93`, `clients/python/src/mxgateway/session.py:38-55` |
| Client.Python-007 | Low | Resolved | Error handling & resilience | `clients/python/src/mxgateway/client.py:204-213` |
| Client.Python-008 | Low | Resolved | Correctness & logic bugs | `clients/python/src/mxgateway/values.py:62-67,83-88` |
| Client.Python-010 | Low | Resolved | Code organization & conventions | `clients/python/src/mxgateway/session.py:404`, `clients/python/src/mxgateway_cli/commands.py:422-425` |
| Client.Python-011 | Low | Resolved | Error handling & resilience | `clients/python/src/mxgateway/errors.py:122-148` |
| Client.Python-012 | Low | Won't Fix | mxaccessgw conventions | `clients/python/src/mxgateway/client.py:84-108`, `clients/python/src/mxgateway/session.py:57-77` |
| Client.Rust-004 | Low | Resolved | Documentation & comments | `clients/rust/src/version.rs:7` |
| Client.Rust-007 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:14-55` |
| Client.Rust-008 | Low | Resolved | Performance & resource management | `clients/rust/src/value.rs:161-261` |
| Client.Rust-009 | Low | Resolved | Testing coverage | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` |
| Client.Rust-010 | Low | Resolved | Error handling & resilience | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` |
| Client.Rust-011 | Low | Resolved | mxaccessgw conventions | `clients/rust/src/session.rs:469` |
| Contracts-001 | Low | Resolved | Design-document adherence | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) |
| Contracts-003 | Low | Won't Fix | Code organization & conventions | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` |
| Contracts-004 | Low | Resolved | Documentation & comments | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` |
| Contracts-005 | Low | Resolved | mxaccessgw conventions | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto`, `src/MxGateway.Contracts/Protos/mxaccess_worker.proto` |
| Contracts-006 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:647` |
| Contracts-007 | Low | Resolved | Testing coverage | `src/MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| Contracts-008 | Low | Resolved | Design-document adherence | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:451-459`, `:627-636` |
| IntegrationTests-007 | Low | Resolved | Concurrency & thread safety | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
| IntegrationTests-008 | Low | Resolved | Code organization & conventions | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
| IntegrationTests-009 | Low | Resolved | Documentation & comments | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
| IntegrationTests-010 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369` |
| Server-007 | Low | Resolved | Performance & resource management | `src/MxGateway.Server/Galaxy/GalaxyHierarchyProjector.cs:55-70` |
| Server-008 | Low | Resolved | Performance & resource management | `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:111-134,160-189` |
| Server-009 | Low | Resolved | Error handling & resilience | `src/MxGateway.Server/Security/Authentication/AuthSqliteConnectionFactory.cs:15-32` |
| Server-010 | Low | Resolved | Security | `src/MxGateway.Server/Security/Authentication/SqliteApiKeyAdminStore.cs:91-114`, `src/MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:168-172` |
| Server-011 | Low | Resolved | Code organization & conventions | `src/MxGateway.Server/Sessions/WorkerAlarmRpcDispatcher.cs:1-46` |
| Server-012 | Low | Resolved | Documentation & comments | `CLAUDE.md` (Authentication section and `apikey create` example) |
| Server-013 | Low | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs`, `src/MxGateway.Tests/Gateway/GatewayApplicationTests.cs` |
| Server-014 | Low | Resolved | Documentation & comments | `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:162-171,191-198,206-214,229-237` |
| Tests-007 | Low | Resolved | Code organization & conventions | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:682`, `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:324`, `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:460`, `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs:233` |
| Tests-008 | Low | Resolved | mxaccessgw conventions | `src/MxGateway.Tests/Gateway/Sessions/WorkerAlarmRpcDispatcherTests.cs:1-9`, `src/MxGateway.Tests/Gateway/Sessions/NotWiredAlarmRpcDispatcherTests.cs:1-3`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerAlarmAutoSubscribeTests.cs:1` |
| Tests-009 | Low | Resolved | Documentation & comments | `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:36-37,99,365` |
| Tests-010 | Low | Resolved | Security | `src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs:26-36` |
| Tests-011 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:233-301` |
| Tests-012 | Low | Resolved | Concurrency & thread safety | `src/MxGateway.Tests/Gateway/Workers/Fakes/FakeWorkerHarness.cs:62`, `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:472` |
| Worker-009 | Low | Resolved | Performance & resource management | `src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:31,49`, `src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:57-58` |
| Worker-010 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Conversion/VariantConverter.cs:204-226` |
| Worker-011 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeClient.cs:169-171` |
| Worker-012 | Low | Resolved | Documentation & comments | `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs:44-55`, `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:38-43`, `src/MxGateway.Worker/MxAccess/MxAccessEventMapper.cs:106-112` |
| Worker-013 | Low | Resolved | Testing coverage | `src/MxGateway.Worker/Sta/StaMessagePump.cs` |
| Worker-014 | Low | Resolved | Code organization & conventions | `src/MxGateway.Worker/MxAccess/AlarmCommandHandler.cs:33`, `:202` |
| Worker-015 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/MxAccess/MxAccessEventQueue.cs:115-145` |
| Worker.Tests-008 | Low | Resolved | Documentation & comments | `src/MxGateway.Worker.Tests/Conversion/VariantConverterTests.cs:175-182` |
| Worker.Tests-009 | Low | Resolved | Code organization & conventions | `src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs`, `AlarmDispatcherTests.cs`, `AlarmCommandExecutorTests.cs`, `AlarmRecordTransitionMapperTests.cs`, `WnWrapAlarmConsumerXmlTests.cs` |
| Worker.Tests-010 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:230-258` |
| Worker.Tests-011 | Low | Resolved | Documentation & comments | `src/MxGateway.Worker.Tests/Sta/StaCommandDispatcherTests.cs:92-112` |
| Worker.Tests-012 | Low | Resolved | Testing coverage | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs` |
| Worker.Tests-013 | Low | Resolved | Concurrency & thread safety | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:539-546` |
| Worker.Tests-014 | Low | Resolved | Code organization & conventions | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeClientTests.cs:194`, `WorkerPipeSessionTests.cs:622`, `Sta/StaCommandDispatcherTests.cs:348`, `MxAccess/MxAccessStaSessionTests.cs:334`, `MxAccess/MxAccessCommandExecutorTests.cs:1124` |
| Worker.Tests-015 | Low | Resolved | Testing coverage | `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventQueueTests.cs` |
+237
@@ -0,0 +1,237 @@
# Code Review — Server
| Field | Value |
|---|---|
| Module | `src/MxGateway.Server` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Server-006 (metrics open-session leak on alarm auto-subscribe failure), Server-010 (rotate reactivates revoked keys). |
| 2 | mxaccessgw conventions | Issues found: Server-002 (orphan-worker termination on startup not implemented), Server-011 (style deviation in `WorkerAlarmRpcDispatcher`). |
| 3 | Concurrency & thread safety | No issues found — locking is correct; inconsistent-but-safe discipline in `GatewayMetrics` noted only. |
| 4 | Error handling & resilience | Issues found: Server-005 (Galaxy first-load can fault the host BackgroundService), Server-009 (SQLite has no busy-timeout/WAL under concurrent writes). |
| 5 | Security | Issues found: Server-001 (Critical: dashboard authorization never enforced on any route), Server-003 (LDAP dashboard users denied for lack of a scope claim), Server-010. |
| 6 | Performance & resource management | Issues found: Server-007 (DiscoverHierarchy paging is O(total) per page), Server-008 (WatchDeployEvents re-projects whole hierarchy per event). |
| 7 | Design-document adherence | Issues found: Server-002 (orphan workers), Server-012 (CLAUDE.md scope names stale vs code/docs). |
| 8 | Code organization & conventions | Issues found: Server-011 (style), Server-004 (CLI accepts unvalidated scope strings). |
| 9 | Testing coverage | Issues found: Server-013 (no dashboard route-level authorization test; `WorkerExecutableValidator`, `GalaxyGlobMatcher`, projector paging untested). |
| 10 | Documentation & comments | Issues found: Server-014 (stale "not yet wired" alarm comments), Server-012. |
## Findings
### Server-001
| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | `src/MxGateway.Server/GatewayApplication.cs:147-149`, `src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58`, `src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15` |
| Status | Resolved |
**Description:** The dashboard authorization policy (`DashboardAuthenticationDefaults.AuthorizationPolicy`), `DashboardAuthorizationRequirement`, and `DashboardAuthorizationHandler` are registered in DI but never applied to any endpoint. `MapRazorComponents<App>()` has no `.RequireAuthorization(...)`, the `<Router>` in `Routes.razor` uses plain `RouteView` (not `AuthorizeRouteView`), and no dashboard page carries `[Authorize]` — a module-wide grep finds zero `RequireAuthorization`/`[Authorize]`/`AuthorizeRouteView` usages. Every dashboard page (Sessions, Workers, Events, Galaxy, Settings, and the API Keys list exposing key IDs, scopes, and constraints) is reachable by any unauthenticated remote client regardless of `Dashboard:AllowAnonymousLocalhost` or `Dashboard:RequireAdminScope`. Only the API-key mutation operations remain protected, via the separate `DashboardApiKeyManagementService.CanManage` check.
**Recommendation:** Apply the policy at the route level (`endpoints.MapRazorComponents<App>().AddInteractiveServerRenderMode().RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy)`), and/or switch `Routes.razor` to `AuthorizeRouteView` with an `[Authorize]` fallback policy plus a `NotAuthorized` redirect to the login page. Add an integration test that GETs a dashboard page anonymously and asserts either a 302 redirect to the login page or a 401.
**Resolution:** Resolved in `a8aafdf` (2026-05-18): `MapRazorComponents<App>()` now calls `.RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy)`, so an unauthenticated request to any dashboard component route is challenged by the cookie scheme and redirected to the login page. `GatewayApplicationTests` gained `ComponentRoutesRequireAuthorization` (component routes carry the policy) and `AuthEndpointsAllowAnonymousAccess`, replacing the prior test that asserted the insecure behavior.
### Server-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/MxGateway.Server/Program.cs:24`, `src/MxGateway.Server/GatewayApplication.cs` |
| Status | Resolved |
**Description:** `gateway.md:583` and CLAUDE.md state the first version "terminates orphaned workers on startup." No code in MxGateway.Server enumerates or kills leftover `MxGateway.Worker.exe` processes at startup — a grep for `orphan`/`reattach`/`terminate` finds nothing. After an unclean gateway crash, x86 worker processes (each holding an MXAccess COM instance) leak and survive indefinitely, and a restarted gateway does not reclaim or kill them.
**Recommendation:** Add a startup hosted service that finds and kills stale worker processes (by executable path / a well-known argument or environment marker) before the server accepts sessions, or update the design docs if reattachment/cleanup is deliberately deferred.
**Resolution:** Resolved 2026-05-18. Confirmed against source: no code path enumerated or killed leftover workers. Added `IRunningProcessInspector` / `SystemRunningProcessInspector` (a testable seam over `Process.GetProcessesByName`/`Kill`), `OrphanWorkerTerminator` (kills processes matched by the configured worker executable path, or by image name when the x64 gateway cannot introspect the x86 worker's `MainModule`, skipping the current process and tolerating per-process kill failures), and `OrphanWorkerCleanupHostedService` (best-effort `IHostedService`). The hosted service is registered in `AddWorkerProcessLauncher` ahead of `AddGatewaySessions` so cleanup runs before the server accepts sessions. `gateway.md` updated to describe the implemented behavior. Regression tests: `OrphanWorkerTerminatorTests` (`KillsWorkerProcessesMatchingConfiguredExecutablePath`, `KillsImageNameMatchWhenExecutablePathUnreadable`, `DoesNotKillUnrelatedProcessSharingImageName`, `DoesNotKillCurrentProcess`, `ContinuesWhenOneKillThrows`).
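The matching rule described above (exact executable path when readable, image name only when the path cannot be read, never the current process, and tolerance of individual kill failures) can be sketched as follows. This is a minimal illustration, not the project's `OrphanWorkerTerminator`: `ProcessSnapshot`, `OrphanWorkerSelector`, and the case-insensitive comparison are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical snapshot of a running process; ExecutablePath is null when the
// gateway cannot read the process's main module (e.g. x64 inspecting x86).
public sealed record ProcessSnapshot(int Id, string ImageName, string? ExecutablePath);

public static class OrphanWorkerSelector
{
    // Selects the processes a startup cleanup pass would terminate.
    public static IReadOnlyList<ProcessSnapshot> SelectOrphans(
        IEnumerable<ProcessSnapshot> running,
        string workerExecutablePath,
        string workerImageName,
        int currentProcessId)
    {
        return running
            .Where(p => p.Id != currentProcessId) // never kill ourselves
            .Where(p => p.ExecutablePath is null
                // Path unreadable: fall back to the image name.
                ? string.Equals(p.ImageName, workerImageName, StringComparison.OrdinalIgnoreCase)
                // Path readable: require an exact path match (assumed case-insensitive).
                : string.Equals(p.ExecutablePath, workerExecutablePath, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    // Best-effort kill: one failing process must not stop the rest.
    public static int KillAll(IEnumerable<ProcessSnapshot> orphans, Action<ProcessSnapshot> kill)
    {
        int killed = 0;
        foreach (ProcessSnapshot orphan in orphans)
        {
            try
            {
                kill(orphan);
                killed++;
            }
            catch (Exception)
            {
                // Swallowed deliberately; a real service would log here.
            }
        }

        return killed;
    }
}
```

Keeping selection separate from the kill delegate is what makes the rule testable without spawning real processes, which is the same motivation the resolution gives for the `IRunningProcessInspector` seam.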
### Server-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59`, `src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258` |
| Status | Resolved |
**Description:** When `Dashboard:RequireAdminScope` is true (the default) and the request is not loopback, `DashboardAuthorizationHandler` succeeds only if `HasAdminScope` finds a claim of type `"scope"` with value `"admin"`. But `DashboardAuthenticator.CreatePrincipal` issues only `NameIdentifier`, `Name`, and `LdapGroupClaimType` claims — never a `scope`/`admin` claim. So a correctly LDAP-authenticated user who passed the required-group check is still denied dashboard access on any non-loopback connection. The bug is currently masked by the missing route-level enforcement (Server-001) and by `AllowAnonymousLocalhost`; fixing Server-001 would make the dashboard unusable for all real LDAP logins.
**Recommendation:** Either have `DashboardAuthenticator.CreatePrincipal` add a `scope=admin` claim when the user is in the required group, or change `DashboardAuthorizationHandler.HasAdminScope` to evaluate LDAP group membership (reuse `IsMemberOfRequiredGroup` against the `LdapGroupClaimType` claims, as `DashboardApiKeyAuthorization.CanManage` already does).
**Resolution:** Resolved in `a8aafdf` (2026-05-18): `DashboardAuthenticator.CreatePrincipal` — reached only after the required-group check passes — now emits the `scope=admin` claim that `DashboardAuthorizationHandler` checks, so group-validated LDAP users pass `RequireAdminScope` once route-level authorization (Server-001) is enforced.
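To make the claim mismatch concrete, the sketch below builds a principal the way the resolution describes and checks it the way the handler does. It uses only `System.Security.Claims`; the class name, the `"ldap_group"` claim type value, and the `"DashboardCookie"` authentication type are illustrative assumptions, not values taken from the codebase.

```csharp
using System.Collections.Generic;
using System.Security.Claims;

public static class DashboardPrincipalSketch
{
    public const string ScopeClaimType = "scope";
    public const string AdminScope = "admin";
    public const string LdapGroupClaimType = "ldap_group"; // assumed value

    // Called only after the required-group check has passed, so it is safe
    // to grant the admin scope here.
    public static ClaimsPrincipal CreatePrincipal(
        string userId, string displayName, IEnumerable<string> groups)
    {
        var claims = new List<Claim>
        {
            new Claim(ClaimTypes.NameIdentifier, userId),
            new Claim(ClaimTypes.Name, displayName),
            new Claim(ScopeClaimType, AdminScope), // the claim that was missing
        };

        foreach (string group in groups)
        {
            claims.Add(new Claim(LdapGroupClaimType, group));
        }

        return new ClaimsPrincipal(new ClaimsIdentity(claims, "DashboardCookie"));
    }

    // Mirrors the handler-side check described in the finding.
    public static bool HasAdminScope(ClaimsPrincipal user) =>
        user.HasClaim(ScopeClaimType, AdminScope);
}
```

Without the `scope` claim line, `HasAdminScope` returns `false` for every LDAP login, which is the denial the finding describes.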
### Server-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233`, `src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77`, `src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67` |
| Status | Resolved |
**Description:** `ParseScopes` accepts any comma-separated strings and `CreateKeyAsync` persists them verbatim; neither the CLI nor the dashboard create path validates scopes against `GatewayScopes`. A typo or non-canonical name (e.g. CLAUDE.md's example `--scopes session,invoke,event,metadata,admin`, which does not match the resolver's `session:open`/`invoke:read`/etc.) silently creates a key whose scope strings the authorization resolver never checks for — the key is unusable for those RPCs with no error at creation time.
**Recommendation:** Validate every requested scope against the `GatewayScopes` catalog at create time in both the CLI parser/runner and `DashboardApiKeyManagementService.ValidateCreateRequest`, rejecting unknown scope strings.
**Resolution:** Resolved 2026-05-18. Confirmed against source: `ParseScopes` split unvalidated strings into the create command and `ValidateCreateRequest` checked only key id and display name. Added `GatewayScopes.All` (the canonical scope catalog) and `GatewayScopes.IsKnown(string)`. `ApiKeyAdminCommandLineParser.Parse` now runs `ValidateScopes` for create-key commands and fails the parse listing the unknown scope(s) and valid set; `DashboardApiKeyManagementService.ValidateCreateRequest` rejects requests carrying any non-canonical scope. Revoke/rotate paths are unaffected (no scope input). Regression tests: `ApiKeyAdminCommandLineParserTests.Parse_CreateKeyCommand_RejectsUnknownScope`, `Parse_CreateKeyCommand_AcceptsAllCanonicalScopes`, and `DashboardApiKeyManagementServiceTests.CreateAsync_UnknownScope_DoesNotCallStore`.
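A create-time check of this kind might look like the sketch below. The scope strings are the canonical ones listed in this review; the class name, the `FindUnknown` helper, and the case-sensitive (ordinal) comparison are assumptions for illustration rather than the actual `GatewayScopes` implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GatewayScopesSketch
{
    // Canonical scope catalog as listed in the review.
    public static readonly IReadOnlySet<string> All = new HashSet<string>(StringComparer.Ordinal)
    {
        "session:open", "session:close",
        "invoke:read", "invoke:write", "invoke:secure",
        "events:read", "metadata:read", "admin",
    };

    public static bool IsKnown(string scope) => All.Contains(scope);

    // Returns every requested scope that is not in the catalog, so the caller
    // can reject the request and list the offenders in the error message.
    public static IReadOnlyList<string> FindUnknown(string commaSeparatedScopes) =>
        commaSeparatedScopes
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Where(scope => !IsKnown(scope))
            .ToList();
}
```

For the stale CLAUDE.md example, `FindUnknown("session,invoke,event,metadata,admin")` reports `session`, `invoke`, `event`, and `metadata` as unknown, while `admin` passes.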
### Server-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184` |
| Status | Resolved |
**Description:** `GalaxyHierarchyCache.RefreshCoreAsync` only catches `SqlException` and `InvalidOperationException`. The initial `cache.RefreshAsync` call in `GalaxyHierarchyRefreshService.ExecuteAsync` is wrapped only for `OperationCanceledException`. A transient non-`SqlException` failure on the first refresh (e.g. a `Win32Exception`/`TimeoutException` from connection establishment, or another `DbException` subtype) escapes both layers, faults the `BackgroundService`, and — with default host behavior — stops the whole gateway. The periodic-tick loop does catch general exceptions, so only the first load is exposed.
**Recommendation:** Broaden the `catch` in `RefreshCoreAsync` to all non-cancellation exceptions (record `Unavailable`/`Stale` and still complete `_firstLoad`), or wrap the initial `RefreshAsync` in `GalaxyHierarchyRefreshService` with the same general `catch` the tick loop uses.
**Resolution:** Resolved 2026-05-18. Confirmed against source: the initial `RefreshAsync` in `ExecuteAsync` was guarded only for `OperationCanceledException`, and `RefreshCoreAsync` filtered its catch to `SqlException or InvalidOperationException`. Both recommended layers applied: `GalaxyHierarchyRefreshService.ExecuteAsync` now catches every non-cancellation exception on the initial load (logs a warning; the periodic tick retries), and `GalaxyHierarchyCache.RefreshCoreAsync` broadens its catch to all non-cancellation exceptions so the cache still records `Stale`/`Unavailable` and completes `_firstLoad`. The now-unused `Microsoft.Data.SqlClient` using was removed. Regression test: `GalaxyHierarchyRefreshServiceTests.ExecuteAsync_WhenFirstRefreshThrowsNonCancellationException_DoesNotFaultBackgroundService`.
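The "catch everything except cancellation" guard can be expressed with an exception filter. The following is a minimal sketch under the assumption that the refresh and the logger are passed in as delegates; `FirstLoadGuard` and `TryInitialRefreshAsync` are hypothetical names, not members of the reviewed services.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FirstLoadGuard
{
    // Runs the initial refresh. Any non-cancellation failure is logged and
    // reported as 'false' so the background loop can retry on its next tick;
    // cancellation still propagates so host shutdown stays prompt.
    public static async Task<bool> TryInitialRefreshAsync(
        Func<CancellationToken, Task> refresh,
        Action<Exception> logWarning,
        CancellationToken cancellationToken)
    {
        try
        {
            await refresh(cancellationToken).ConfigureAwait(false);
            return true;
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            logWarning(ex);
            return false;
        }
    }
}
```

The filter keeps `OperationCanceledException` (and its `TaskCanceledException` subtype) out of the handler entirely, so shutdown behavior is unchanged while `TimeoutException`, `Win32Exception`, and other `DbException` subtypes no longer fault the service.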
### Server-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Server/Sessions/SessionManager.cs:84-114` |
| Status | Resolved |
**Description:** In `OpenSessionAsync`, `_metrics.SessionOpened()` (line 89) increments the `_openSessions` gauge before `TryAutoSubscribeAlarmsAsync` runs. If auto-subscribe throws (which it does when `Alarms.RequireSubscribeOnOpen` is true and the worker rejects the subscription), the `catch` block disposes and removes the session and records `_metrics.Fault(...)` but never calls `SessionClosed`/`SessionRemoved`. The `mxgateway.sessions.open` gauge permanently over-counts by one for every such failed open.
**Recommendation:** In the `catch` block, when the session had reached the point where `SessionOpened()` was recorded, also call `_metrics.SessionRemoved()` — or move the `SessionOpened()` call to after auto-subscribe succeeds.
**Resolution:** Resolved 2026-05-18. Confirmed against source: the `catch` block in `OpenSessionAsync` recorded `Fault(...)` and removed the session but never decremented the open-session gauge after `SessionOpened()` had run. Added a `sessionOpenedRecorded` flag set immediately after `_metrics.SessionOpened()`; the `catch` block now calls `_metrics.SessionRemoved()` when that flag is set, restoring the gauge for a post-`SessionOpened()` failure (e.g. an auto-subscribe rejection with `RequireSubscribeOnOpen=true`). Regression test: `SessionManagerAlarmAutoSubscribeTests.OpenSessionAsync_DoesNotLeakOpenSessionGauge_WhenAutoSubscribeFailsWithRequireOn`.
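The flag-based compensation reads roughly as below. This is a reduced sketch: `SessionGauge` and `OpenSessionSketch` are hypothetical stand-ins for the gateway's metrics and `SessionManager`, and the auto-subscribe step is modeled as a delegate.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the open-sessions gauge.
public sealed class SessionGauge
{
    private int _open;

    public int Open => Volatile.Read(ref _open);

    public void SessionOpened() => Interlocked.Increment(ref _open);

    public void SessionRemoved() => Interlocked.Decrement(ref _open);
}

public static class OpenSessionSketch
{
    public static async Task OpenAsync(SessionGauge metrics, Func<Task> autoSubscribe)
    {
        bool sessionOpenedRecorded = false;
        try
        {
            metrics.SessionOpened();
            sessionOpenedRecorded = true;

            // May throw, e.g. when the worker rejects the subscription.
            await autoSubscribe().ConfigureAwait(false);
        }
        catch
        {
            // Undo the gauge increment only if it actually happened.
            if (sessionOpenedRecorded)
            {
                metrics.SessionRemoved();
            }

            throw;
        }
    }
}
```

The flag matters because failures that occur before `SessionOpened()` must not decrement the gauge; an unconditional `SessionRemoved()` in the `catch` would under-count instead of over-count.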
### Server-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/MxGateway.Server/Galaxy/GalaxyHierarchyProjector.cs:55-70` |
| Status | Resolved |
**Description:** `Project` always iterates the full `entry.Index.ObjectViews` collection and re-applies all filters to skip `offset` matched items before collecting a page. Paging through a large Galaxy hierarchy is therefore O(total) per page and O(total²/pageSize) end-to-end. The cache is in-memory so impact is bounded, but for large galaxies repeated `DiscoverHierarchy` pagination wastes CPU.
**Recommendation:** Precompute and cache the filtered, ordered view list per `(filterSignature, sequence)` so subsequent pages are an O(pageSize) slice; the existing filter signature already keys page tokens.
**Resolution:** Resolved 2026-05-18. Confirmed against source: `Project` re-scanned and re-filtered the whole `ObjectViews` list on every page. Added a `ConditionalWeakTable<GalaxyHierarchyCacheEntry, ConcurrentDictionary<string, IReadOnlyList<GalaxyObjectView>>>` memo in `GalaxyHierarchyProjector`: the first projection of a given filter signature builds the filtered, ordered view list; subsequent pages take an O(pageSize) slice via index arithmetic. The memo is keyed on the immutable cache-entry instance, so when the cache publishes a new entry the stale memo becomes unreachable and is reclaimed with it — no explicit invalidation. `ResolveRoot` still runs before the memo lookup so a missing root surfaces `NotFound` consistently. Regression tests: `GalaxyHierarchyProjectorTests` (`Project_PagedAcrossEntireHierarchy_ReturnsEveryObjectExactlyOnce`, `Project_DistinctFiltersOnSameEntry_DoNotShareMemoizedViewList`, `Project_SameFilterRepeated_ReturnsIdenticalTotals`, `Project_DistinctCacheEntries_ProjectAgainstTheirOwnData`); existing `GalaxyRepositoryGrpcServiceTests` paging tests continue to pass unchanged.
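The memoization shape described in the resolution can be illustrated with a reduced model. Here `HierarchySnapshot` stands in for the immutable cache entry and plain strings stand in for object views; both, along with `PagedProjector`, are assumptions for the example and not the project's types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

// Hypothetical immutable snapshot, standing in for the cache entry.
public sealed class HierarchySnapshot
{
    public HierarchySnapshot(IReadOnlyList<string> tagNames) => TagNames = tagNames;

    public IReadOnlyList<string> TagNames { get; }
}

public static class PagedProjector
{
    // Keyed weakly on the snapshot instance: when a new snapshot is published
    // and the old one becomes unreachable, its memo is collected with it.
    private static readonly ConditionalWeakTable<HierarchySnapshot,
        ConcurrentDictionary<string, IReadOnlyList<string>>> Memo = new();

    public static IReadOnlyList<string> Page(
        HierarchySnapshot entry,
        string filterSignature,
        Func<string, bool> filter,
        int offset,
        int pageSize)
    {
        ConcurrentDictionary<string, IReadOnlyList<string>> perEntry = Memo.GetOrCreateValue(entry);

        // The full scan happens once per (snapshot, filter signature).
        IReadOnlyList<string> filtered = perEntry.GetOrAdd(
            filterSignature,
            _ => entry.TagNames.Where(filter).ToList());

        // Later pages are an index-based slice of the memoized list.
        var page = new List<string>();
        for (int i = Math.Max(offset, 0); i < filtered.Count && page.Count < pageSize; i++)
        {
            page.Add(filtered[i]);
        }

        return page;
    }
}
```

One caveat with this design: the filter signature must fully determine the filter, because a second call with the same signature reuses the memoized list and ignores the predicate it was given.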
### Server-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:111-134,160-189` |
| Status | Resolved |
**Description:** `WatchDeployEvents` calls `ResolveBrowseSubtrees()` on every streamed event, and `MapDeployEvent` re-runs `GalaxyHierarchyProjector.Project` over the entire cached hierarchy (and `Sum`s attribute counts) for every event of every constrained subscriber. `GalaxyGlobMatcher.IsMatch` also rebuilds the glob regex on each call. With many constrained subscribers and frequent deploys this is avoidable work.
**Recommendation:** Hoist `ResolveBrowseSubtrees()` out of the loop; compute scoped object/attribute counts once per deploy sequence and cache by `(sequence, browseSubtrees)`; cache compiled glob `Regex` instances in `GalaxyGlobMatcher`.
**Resolution:** Resolved 2026-05-18. Confirmed against source. Three changes: (1) `WatchDeployEvents` now resolves `ResolveBrowseSubtrees()` once before the streaming loop — the caller's identity and constraints are fixed for the stream lifetime, so per-event resolution was pure waste. (2) `GalaxyGlobMatcher` now caches compiled `Regex` instances in a `ConcurrentDictionary` keyed by glob pattern (with `RegexOptions.Compiled`), so the same handful of globs are translated once instead of on every `IsMatch` call. (3) The per-event `MapDeployEvent` re-projection is no longer a separate hot path: with finding Server-007 resolved, `GalaxyHierarchyProjector.Project` memoizes the filtered view list per `(cache entry, filter signature)`, so the scoped-count projection in `MapDeployEvent` for a constrained subscriber is O(matched-slice) after the first event of a given deploy sequence rather than a full re-scan — this subsumes the recommendation's `(sequence, browseSubtrees)` cache (the memo is keyed on the per-sequence cache-entry instance and the browse-subtree-bearing filter signature). Regression tests: `GalaxyFilterInputSafetyTests.GlobMatcher_RepeatedAndInterleavedPatterns_StayCorrect` (glob cache correctness); existing `WatchDeployEvents` and `GalaxyFilterInputSafetyTests` coverage continues to pass.
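A compiled-regex cache for glob patterns could be sketched as follows. The translation rules shown (`*` as any run of characters, `?` as a single character, everything else literal, case-sensitive, one-second match timeout) are assumptions for illustration; the actual `GalaxyGlobMatcher` semantics are not spelled out in this review.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

public static class GlobMatcherSketch
{
    // One compiled Regex per distinct glob pattern.
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new(StringComparer.Ordinal);

    public static int CachedPatternCount => Cache.Count;

    public static bool IsMatch(string glob, string candidate)
    {
        Regex regex = Cache.GetOrAdd(glob, static pattern => Compile(pattern));
        return regex.IsMatch(candidate);
    }

    private static Regex Compile(string glob)
    {
        // Escape everything first so regex and SQL metacharacters stay
        // literal, then re-enable only the two glob wildcards.
        string body = Regex.Escape(glob)
            .Replace("\\*", ".*")
            .Replace("\\?", ".");

        // \A and \z anchor the whole input; the timeout bounds pathological cases.
        return new Regex(
            "\\A" + body + "\\z",
            RegexOptions.Compiled | RegexOptions.CultureInvariant,
            TimeSpan.FromSeconds(1));
    }
}
```

A cache like this grows with the number of distinct patterns, so it suits a small, repeating set of globs; if patterns were fully caller-controlled and unbounded, a size cap would be worth considering.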
### Server-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Server/Security/Authentication/AuthSqliteConnectionFactory.cs:15-32` |
| Status | Resolved |
**Description:** Each auth-store operation opens a fresh `SqliteConnection` with no busy timeout and with SQLite's default rollback journal rather than WAL. `MarkKeyUsedAsync` runs on every authenticated request and `SqliteApiKeyAuditStore` appends on every denial; under concurrent load these writers can collide and surface `SQLITE_BUSY` as a hard failure on the request path.
**Recommendation:** Enable `Pooling`, set a non-zero `DefaultTimeout`/`busy_timeout`, and enable WAL (`PRAGMA journal_mode=WAL`) once at startup so concurrent readers/writers degrade gracefully.
**Resolution:** Resolved 2026-05-18. Confirmed against source: the connection string set only `DataSource` and `Mode`. `AuthSqliteConnectionFactory.CreateConnection` now also sets `Pooling = true` and a non-zero `DefaultTimeout`. A new `OpenConnectionAsync(CancellationToken)` opens the connection and applies `PRAGMA journal_mode=WAL` and `PRAGMA busy_timeout` (5 s); WAL is a persistent database-level setting so re-applying it per connection is a cheap no-op, while `busy_timeout` is per-connection state. All nine auth-store call sites (`SqliteApiKeyAdminStore`, `SqliteApiKeyAuditStore`, `SqliteApiKeyStore`, `SqliteAuthStoreMigrator`) were switched from `CreateConnection()` + `OpenAsync()` to `OpenConnectionAsync()`. `docs/Authentication.md` updated to describe the WAL/busy-timeout behavior. Regression test: `SqliteAuthStoreTests.OpenConnectionAsync_EnablesWalJournalModeAndBusyTimeout`.
### Server-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `src/MxGateway.Server/Security/Authentication/SqliteApiKeyAdminStore.cs:91-114`, `src/MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:168-172` |
| Status | Resolved |
**Description:** `RotateAsync` sets `revoked_utc = NULL`, so rotating a previously revoked key silently reactivates it. This is documented intentional behavior in `docs/Authentication.md:167`, but the dashboard renders the "Rotate" button unconditionally — including for keys whose status badge says "Revoked" — so an operator can un-revoke a deliberately disabled key without an explicit warning.
**Recommendation:** Either hide/disable the Rotate action for revoked keys in `ApiKeysPage.razor`, require an explicit confirmation, or have `RotateAsync` preserve `revoked_utc` and add a separate explicit "reactivate" operation.
**Resolution:** Resolved 2026-05-18. Confirmed against source: `ApiKeysPage.razor` rendered the Rotate button unconditionally while Revoke was already gated on `key.RevokedUtc is null`. Took the lowest-risk recommended option — the dashboard now renders the Rotate (and Revoke) actions only for keys whose status is `Active`; a revoked key shows a "No actions" placeholder, so an operator cannot un-revoke a deliberately disabled key as a side effect of a rotation. `RotateAsync`'s store-level behavior is unchanged (rotation by `key_id` still clears `revoked_utc`, which the CLI relies on); `docs/Authentication.md` updated to document both the store behavior and the dashboard restriction. No automated test added: the change is pure conditional Razor rendering and the test project has no bUnit component-rendering harness; the underlying `DashboardApiKeyManagementService` is already unit-tested.
### Server-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Server/Sessions/WorkerAlarmRpcDispatcher.cs:1-46` |
| Status | Resolved |
**Description:** `WorkerAlarmRpcDispatcher` deviates from the module's conventions: it fully-qualifies `System.Guid`, `System.ArgumentNullException`, and `System.Threading` types inline instead of relying on `using` directives, and uses an explicit constructor with `this.`-qualified field assignment while the rest of the module (e.g. `ConstraintEnforcer`, `MxAccessGatewayService`, `GalaxyRepositoryGrpcService`) uses primary constructors. `docs/style-guides/CSharpStyleGuide.md` is authoritative for gateway code.
**Recommendation:** Add the needed `using` directives, drop the inline fully-qualified names, and convert to a primary constructor for consistency.
**Resolution:** Resolved 2026-05-18. Confirmed against source. Converted `WorkerAlarmRpcDispatcher` to a primary constructor with the standard `?? throw new ArgumentNullException(...)` field-initializer guard; dropped the inline `System.Guid` / `System.ArgumentNullException` qualifications (using implicit `using System;`); removed redundant `using System.Collections.Generic;` / `System.Threading` / `System.Threading.Tasks;` directives (covered by `ImplicitUsings`); replaced the two `if (... is null) throw new System.ArgumentNullException(...)` checks with `ArgumentNullException.ThrowIfNull`. The stale class-level `<summary>`/`<remarks>` ("Replaces NotWiredAlarmRpcDispatcher once ... wired in", "partially wired", "returns an Unimplemented diagnostic") were corrected to describe the actual GUID-vs-`Provider!Group.Tag` handling — overlapping with Server-014. No behavior change, so no new test; existing `WorkerAlarmRpcDispatcherTests` continue to pass and the project builds warning-free under `TreatWarningsAsErrors`.
### Server-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `CLAUDE.md` (Authentication section and `apikey create` example) |
| Status | Resolved |
**Description:** CLAUDE.md describes scopes as `session`, `invoke`, `event`, `metadata`, `admin` and shows `apikey create --scopes session,invoke,event,metadata,admin`. The actual canonical scope strings (used by `GatewayScopes`, `GatewayGrpcScopeResolver`, and `docs/Authorization.md`) are `session:open`, `session:close`, `invoke:read`, `invoke:write`, `invoke:secure`, `events:read`, `metadata:read`, `admin`. A key created per the CLAUDE.md example carries scopes the resolver never matches.
**Recommendation:** Update CLAUDE.md's scope list and the `apikey` example to the canonical `*:*` scope strings, per CLAUDE.md's own rule that docs change with the code.
**Resolution:** Resolved 2026-05-18. Confirmed against `GatewayScopes` (`session:open`, `session:close`, `invoke:read`, `invoke:write`, `invoke:secure`, `events:read`, `metadata:read`, `admin`). CLAUDE.md's Build/Test/Run `apikey create` example and the Authentication-section scope list were both updated to the canonical `*:*` strings. (Note: since finding Server-004 was resolved, the old example would now be actively rejected at create time rather than silently creating an unusable key, making the doc correction load-bearing.) Pure documentation change; no test.
### Server-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs`, `src/MxGateway.Tests/Gateway/GatewayApplicationTests.cs` |
| Status | Resolved |
**Description:** `DashboardAuthorizationHandler` is unit-tested in isolation, but no test exercises the dashboard routes end-to-end to confirm the policy is actually enforced — which is why Server-001 (policy registered but never wired) went uncaught. There are also no tests for `WorkerExecutableValidator` (PE-header architecture parsing), `GalaxyGlobMatcher` (anchoring/escaping/empty-glob fail-open), or `GalaxyHierarchyProjector` pagination/page-token behavior.
**Recommendation:** Add a `WebApplicationFactory` integration test that requests a dashboard page unauthenticated and asserts the redirect/401, plus unit tests for `WorkerExecutableValidator`, `GalaxyGlobMatcher`, and projector paging.
**Resolution:** Resolved 2026-05-18. Re-triaged against the current test suite: three of the four named gaps were already closed. (1) The dashboard route-level enforcement test exists — `GatewayApplicationTests.Build_WhenDashboardEnabled_ComponentRoutesRequireAuthorization` (and `..._AuthEndpointsAllowAnonymousAccess`), added when Server-001 was fixed. (2) `GalaxyGlobMatcher` anchoring/escaping/empty-glob behavior is covered by `GalaxyFilterInputSafetyTests` (`GlobMatcher_TreatsSqlMetacharactersAsLiterals`, `GlobMatcher_DoesNotTreatLikeWildcardsAsWildcards`, `GlobMatcher_WithPathologicalInput_DoesNotHang`), now extended with `GlobMatcher_RepeatedAndInterleavedPatterns_StayCorrect`. (3) Projector pagination/page-token behavior is covered end-to-end by `GalaxyRepositoryGrpcServiceTests` and now directly by the new `GalaxyHierarchyProjectorTests`. The one genuine remaining gap — `WorkerExecutableValidator` PE-header parsing — was closed with the new `WorkerExecutableValidatorTests` (7 cases: matching/mismatched x86 and x64, missing `MZ` header, file too small, missing `PE` signature), exercising the validator against synthesized minimal PE fixtures.
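For readers unfamiliar with what "synthesized minimal PE fixtures" involves, the sketch below reads the machine field from a PE image and builds a tiny in-memory fixture. The layout facts are standard PE/COFF (`MZ` at offset 0, the PE header offset at `0x3C`, the `PE\0\0` signature, then a 16-bit machine field: `0x014C` for x86, `0x8664` for x64). The type and method names are hypothetical and do not reflect the real `WorkerExecutableValidator` API.

```csharp
using System;
using System.Buffers.Binary;

public enum PeMachine { Unknown, X86, X64 }

public static class PeHeaderSketch
{
    // Returns false for anything that is not a recognizable x86/x64 PE image.
    public static bool TryReadMachine(ReadOnlySpan<byte> image, out PeMachine machine)
    {
        machine = PeMachine.Unknown;

        // DOS header: at least 0x40 bytes, starting with "MZ".
        if (image.Length < 0x40 || image[0] != (byte)'M' || image[1] != (byte)'Z')
        {
            return false;
        }

        // e_lfanew: file offset of the PE signature.
        int peOffset = BinaryPrimitives.ReadInt32LittleEndian(image.Slice(0x3C, 4));
        if (peOffset < 0 || peOffset > image.Length - 6)
        {
            return false; // signature (4) + machine (2) would not fit
        }

        if (image[peOffset] != (byte)'P' || image[peOffset + 1] != (byte)'E'
            || image[peOffset + 2] != 0 || image[peOffset + 3] != 0)
        {
            return false;
        }

        ushort raw = BinaryPrimitives.ReadUInt16LittleEndian(image.Slice(peOffset + 4, 2));
        machine = raw switch
        {
            0x014C => PeMachine.X86,
            0x8664 => PeMachine.X64,
            _ => PeMachine.Unknown,
        };
        return machine != PeMachine.Unknown;
    }

    // Builds the smallest buffer the reader above accepts (test fixture only).
    public static byte[] BuildMinimalImage(ushort machine)
    {
        var image = new byte[0x48];
        image[0] = (byte)'M';
        image[1] = (byte)'Z';
        BinaryPrimitives.WriteInt32LittleEndian(image.AsSpan(0x3C, 4), 0x40);
        image[0x40] = (byte)'P';
        image[0x41] = (byte)'E';
        BinaryPrimitives.WriteUInt16LittleEndian(image.AsSpan(0x44, 2), machine);
        return image;
    }
}
```

Synthesizing the bytes in the test avoids checking real executables into the repository and makes each negative case (no `MZ`, truncated file, missing `PE` signature) a one-line mutation of the fixture.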
### Server-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:162-171,191-198,206-214,229-237` |
| Status | Resolved |
**Description:** The XML `<remarks>` and inline comments on `AcknowledgeAlarm` and `QueryActiveAlarms` describe the alarm path as not yet wired and say `NotWiredAlarmRpcDispatcher` is the default ("Clients calling this method today receive an OK reply with a 'worker alarm path not yet wired' diagnostic", "an empty stream until PR A.2"). In fact `SessionServiceCollectionExtensions.AddGatewaySessions` registers `WorkerAlarmRpcDispatcher` as `IAlarmRpcDispatcher`, so DI always injects the production dispatcher; `NotWiredAlarmRpcDispatcher` is only the null fallback. The comments are stale and misleading.
**Recommendation:** Update the `AcknowledgeAlarm`/`QueryActiveAlarms` remarks to reflect that `WorkerAlarmRpcDispatcher` is the wired default, and describe its actual GUID-vs-`Provider!Group.Tag` handling.
**Resolution:** Resolved 2026-05-18. Confirmed against source: `SessionServiceCollectionExtensions` registers `WorkerAlarmRpcDispatcher` as `IAlarmRpcDispatcher`, so the "not yet wired" / "empty stream until PR A.2" / "PR A.6/A.7 follow-up" prose in the `AcknowledgeAlarm` and `QueryActiveAlarms` `<remarks>` and inline comments was stale. Rewrote both `<remarks>` blocks and both inline comments to state that DI binds the production `WorkerAlarmRpcDispatcher`, that it routes over the worker pipe IPC, and that `AcknowledgeAlarm` handles a canonical-GUID reference (→ `AcknowledgeAlarmCommand`) and a `Provider!Group.Tag` reference (→ `AcknowledgeAlarmByNameCommand`), with `NotWiredAlarmRpcDispatcher` being only the null fallback. The matching stale `WorkerAlarmRpcDispatcher` class-level XML doc was corrected as part of Server-011. Pure documentation/comment change; no test.
@@ -0,0 +1,213 @@
# Code Review — Tests
| Field | Value |
|---|---|
| Module | `src/MxGateway.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Tests-001 (`FakeSessionManager.TryGetSession` always returns true), Tests-011 (unobserved worker task). |
| 2 | mxaccessgw conventions | FakeWorkerHarness used per docs; no real secrets; minor style drift in three alarm-test files (Tests-008). |
| 3 | Concurrency & thread safety | Issues found: Tests-006 (`Task.Delay`-based timing), Tests-012 (no parallelism guard for `WebApplication` tests). |
| 4 | Error handling & resilience | Strong — timeouts, faults, overflow, kill paths, protocol violations all exercised. No issues found. |
| 5 | Security | Issues found: Tests-002 (no SQL-injection coverage of Galaxy RPCs), Tests-010 (anonymous-localhost negative cases untested). |
| 6 | Performance & resource management | Issue found: Tests-003 (temp DB/worker directories never cleaned up). |
| 7 | Design-document adherence | Tests match `docs/GatewayTesting.md`; no drift found. No issues found. |
| 8 | Code organization & conventions | Issue found: Tests-007 (`TestServerCallContext` copy-pasted into 4+ files). |
| 9 | Testing coverage | Issues found: Tests-001, Tests-004 (no end-to-end interceptor+service test), Tests-005 (no worker-crash-mid-command coverage), Tests-002. |
| 10 | Documentation & comments | Issue found: Tests-009 (stale/mismatched XML `<summary>` comments). |
## Findings
### Tests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:483-489` |
| Status | Resolved |
**Description:** `FakeSessionManager.TryGetSession` unconditionally returns `true` and synthesizes a session for any id. As a result, `Invoke_WhenSessionMissing_ThrowsNotFound` (line 52) only passes because `InvokeException` is pre-seeded — it does not verify that the gateway service maps a genuinely missing session to `NotFound`. No test exercises the real gateway path where `TryGetSession` returns `false` (for `StreamEvents`, `CloseSession`, alarm RPCs). A regression dropping the missing-session check would not be caught.
**Recommendation:** Make `FakeSessionManager.TryGetSession` return `false` for unknown ids (return only seeded sessions), then assert `NotFound`/`InvalidArgument` is produced by the service's own lookup logic rather than an injected exception.
**Resolution:** Resolved 2026-05-18: confirmed root cause — added `ResolveOnlySeededSessions`/`SeedSession` to `FakeSessionManager` so `TryGetSession` returns `false` for unseeded ids, rewrote `Invoke_WhenSessionMissing_ThrowsNotFound` to drop the injected `InvokeException` and exercise the service's own `ResolveSession` lookup (asserts `InvokeCount == 0`), and added `Invoke_WhenSessionSeeded_ResolvesAndInvokes`, `AcknowledgeAlarm_WhenSessionMissing_ThrowsNotFound`, and `QueryActiveAlarms_WhenSessionMissing_ThrowsNotFound`.
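The difference between a fake that synthesizes sessions and one that resolves only seeded sessions is small in code but decisive for the test. A minimal sketch follows, with hypothetical `FakeSession` and `SeededSessionManager` types rather than the project's `FakeSessionManager`.

```csharp
using System;
using System.Collections.Generic;

public sealed record FakeSession(string Id);

// Fake that knows only the sessions a test explicitly seeds.
public sealed class SeededSessionManager
{
    private readonly Dictionary<string, FakeSession> _sessions = new(StringComparer.Ordinal);

    public int LookupCount { get; private set; }

    public void SeedSession(string id) => _sessions[id] = new FakeSession(id);

    // Returns false for unknown ids, so the code under test has to take its
    // own missing-session branch instead of relying on an injected exception.
    public bool TryGetSession(string id, out FakeSession? session)
    {
        LookupCount++;
        return _sessions.TryGetValue(id, out session);
    }
}
```

A test can then call the service with an unseeded id and assert on the status the service itself produces, plus a zero invoke count, which is the shape the rewritten `Invoke_WhenSessionMissing_ThrowsNotFound` takes.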
### Tests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:198-210` |
| Status | Resolved |
**Description:** The Galaxy Repository RPCs browse a SQL Server database (`ZB`). Every test injects a `StubGalaxyHierarchyCache`, so actual SQL query construction, parameterization, and filter/glob translation are never exercised. No test demonstrates that `TagNameGlob`, `RootTagName`, `AlarmFilterPrefix`, etc. are passed as parameters rather than concatenated into SQL. SQL-injection resistance of the Galaxy layer has zero coverage.
**Recommendation:** Add tests for the `GalaxyRepository` query-building layer (against SQLite or an in-memory abstraction, or by asserting parameter objects), covering glob/prefix inputs containing `'`, `%`, `_`, and `;`. At minimum add a unit test over the SQL `LIKE`-pattern escaping helper.
**Re-triage note:** The finding's premise is partly misframed. `GalaxyRepository` issues only four *constant* SQL statements (`HierarchySql`, `AttributesSql`, `SELECT 1`, `SELECT time_of_last_deploy FROM galaxy`); no `DiscoverHierarchyRequest` field is ever concatenated into SQL, so there is no dynamic SQL-injection surface and no `LIKE`-escaping helper to test. `AlarmFilterPrefix` belongs to the worker alarm path, not the Galaxy SQL layer. All filters (`TagNameGlob`, `RootTagName`, template-chain, category, contained-path) are applied **in memory** by `GalaxyHierarchyProjector`/`GalaxyGlobMatcher` against the cached snapshot. The genuine, testable concern, that adversarial filter strings are treated as opaque literals (no wildcard behavior, no ReDoS, no exceptions), remains valid and was previously uncovered. Severity left at High: an unsafe in-memory filter would still be a real security gap.
**Resolution:** Resolved 2026-05-18: added `src/MxGateway.Tests/Galaxy/GalaxyFilterInputSafetyTests.cs` (10 test methods, mostly `[Theory]` over adversarial inputs `'`, `' OR '1'='1`, `'; DROP TABLE gobject;--`, `%`, `_`, `100%_off`, `[abc]`, `Pump'001`) covering `GalaxyGlobMatcher` literal-treatment / `LIKE`-wildcard / pathological-input (ReDoS) behavior and `GalaxyHierarchyProjector` + `DiscoverHierarchy` RPC handling of adversarial `TagNameGlob`, `RootTagName`, and `TemplateChainContains`. No product bug found: the in-memory filter layer treats all metacharacters as literals, and the passing tests resolve the coverage gap.
### Tests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176`, `src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258` |
| Status | Resolved |
**Description:** `CreateTempDatabasePath` creates a fresh directory under `%TEMP%\mxgateway-auth-tests\<guid>` (and `...-cli-tests`) for every test but nothing ever deletes it. `WorkerProcessLauncherTests.TestDirectory` correctly implements `IDisposable` and cleans up; these two do not. SQLite connection pooling can also keep the `.db` handle open after the test. Over many CI runs this leaks temp files and open handles.
**Recommendation:** Wrap the temp directory in an `IDisposable`/`IAsyncDisposable` helper (as `WorkerProcessLauncherTests` does) and call `SqliteConnection.ClearAllPools()` before deletion, or use `Microsoft.Data.Sqlite` in-memory mode where a real file is not needed.
**Resolution:** Resolved 2026-05-18: confirmed root cause — both `CreateTempDatabasePath` helpers created `%TEMP%` directories with no cleanup, and `Microsoft.Data.Sqlite` pools connections by default so the `.db` handle outlives the test. Added a shared `TempDatabaseDirectory` (`src/MxGateway.Tests/Security/Authentication/TempDatabaseDirectory.cs`) `IDisposable` helper that calls `SqliteConnection.ClearAllPools()` and recursively deletes its directory. `SqliteAuthStoreTests` and `ApiKeyAdminCliRunnerTests` now implement `IDisposable`, track every directory created via `CreateTempDatabasePath`, and dispose them after each test. All affected tests still pass.
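A disposable temp-directory helper of the kind described might look like the sketch below. It covers only the directory lifecycle using `System.IO`; the real `TempDatabaseDirectory` additionally calls `SqliteConnection.ClearAllPools()` before deleting, which is omitted here to keep the example free of package dependencies. The class and member names are illustrative assumptions.

```csharp
using System;
using System.IO;

public sealed class TempDirectorySketch : IDisposable
{
    public TempDirectorySketch(string prefix)
    {
        DirectoryPath = Path.Combine(Path.GetTempPath(), prefix, Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(DirectoryPath);
    }

    public string DirectoryPath { get; }

    public string DatabasePath => Path.Combine(DirectoryPath, "test.db");

    public void Dispose()
    {
        // With a pooled SQLite provider, release pooled handles first
        // (SqliteConnection.ClearAllPools() in the reviewed helper);
        // otherwise the delete below can fail on a locked .db file.
        try
        {
            if (Directory.Exists(DirectoryPath))
            {
                Directory.Delete(DirectoryPath, recursive: true);
            }
        }
        catch (IOException)
        {
            // Best effort: a leaked temp file should not fail the test run.
        }
        catch (UnauthorizedAccessException)
        {
            // Same rationale as above.
        }
    }
}
```

Used with a `using` declaration (`using var temp = new TempDirectorySketch("mxgateway-auth-tests");`), the directory is removed when the test method exits, including on assertion failure.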
### Tests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs` |
| Status | Resolved |
**Description:** The authorization interceptor and `MxAccessGatewayService` are each tested in isolation, but no test composes the interceptor in front of the real service to confirm scope enforcement gates real RPCs end-to-end. A wiring mistake — interceptor not registered, or a new RPC added without a scope mapping in `GatewayGrpcScopeResolver` — would pass every existing test. `GatewayGrpcScopeResolverTests` also only checks an enumerated allow-list; it never asserts an unmapped request type fails closed.
**Recommendation:** Add an end-to-end test that runs `OpenSession`/`Invoke` through the interceptor+service composition with insufficient scope and asserts `PermissionDenied`; add a `GatewayGrpcScopeResolver` test asserting an unknown/unmapped request type throws or denies rather than returning a permissive default.
**Resolution:** Resolved 2026-05-18: confirmed the coverage gap. Added three interceptor+service composition tests to `GatewayGrpcAuthorizationInterceptorTests` that run the real `GatewayGrpcAuthorizationInterceptor` continuation into a real `MxAccessGatewayService`: `InterceptorComposedWithService_OpenSessionMissingScope_DeniesBeforeServiceRuns` (asserts `PermissionDenied` and `OpenSessionCount == 0`), `InterceptorComposedWithService_OpenSessionWithScope_RunsServiceWithIdentity` (service runs and observes the interceptor-pushed identity), and `InterceptorComposedWithService_InvokeWriteCommandWithReadScope_DeniesBeforeServiceRuns` (a `Write` command with only `invoke:read` is denied). Added two `GatewayGrpcScopeResolverTests`: `ResolveRequiredScope_UnmappedRequestType_FailsClosedToAdminScope` confirms an unmapped request type resolves to the most-restrictive `Admin` scope (the resolver's `_ => GatewayScopes.Admin` default already fails closed — no product bug), and `ResolveRequiredScope_UnknownInvokeCommandKind_ReturnsInvokeReadScope` confirms an unknown command kind does not silently grant write/admin access.
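The fail-closed default described above can be sketched as follows. This is a minimal illustration, not the gateway's `GatewayGrpcScopeResolver`: the request records, the `ScopeResolverSketch` class, and its constants are hypothetical stand-ins, and only the scope strings and the `_ => Admin` fallback shape come from the finding.

```csharp
using System;

// Hypothetical request types for illustration only; the real gRPC request
// messages are generated from the gateway's .proto files.
public sealed record OpenSessionRequest;
public sealed record CloseSessionRequest;
public sealed record BrandNewRequest; // deliberately has no mapping below

public static class ScopeResolverSketch
{
    public const string SessionOpen = "session:open";
    public const string SessionClose = "session:close";
    public const string Admin = "admin";

    // Known request types map to their scope. Anything else falls through to
    // the most restrictive scope, so a forgotten mapping denies by default
    // instead of granting access.
    public static string ResolveRequiredScope(object request) => request switch
    {
        null => throw new ArgumentNullException(nameof(request)),
        OpenSessionRequest => SessionOpen,
        CloseSessionRequest => SessionClose,
        _ => Admin,
    };
}
```

Under these assumptions, a caller holding only `session:open` is denied for `BrandNewRequest`, because the resolver asks for `admin` until someone adds an explicit mapping.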
### Tests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` |
| Status | Resolved |
**Description:** Worker-crash handling is only tested as a clean terminal exception from `ReadEventsAsync` or a pre-set `ShutdownException`. There is no test for a worker that faults mid-command — an `InvokeAsync` in flight when the pipe/worker dies — which is a core fault-handling path of the two-process design. `WorkerClientTests` covers pipe-disconnect faulting the read loop, but not the interaction where a pending `InvokeAsync` task observes the fault and surfaces a meaningful error code.
**Recommendation:** Add a `WorkerClient`/`SessionManager` test that disposes the worker pipe (or emits a `WorkerFault`) while an `InvokeAsync` is pending, and assert the invoke task fails with a `WorkerClientException`/`SessionManagerException` carrying the worker-faulted error code.
**Resolution:** Resolved 2026-05-18: confirmed the coverage gap and confirmed the product path already handles it correctly (`WorkerClient.ReadLoopAsync` → `SetFaulted` → `CompletePendingCommands(fault)` fails every pending command with the fault exception). Added two `WorkerClientTests`: `InvokeAsync_WhenPipeDisconnectsMidCommand_FailsPendingInvokeWithPipeDisconnected` (worker reads the command then disposes its pipe side; the pending invoke task fails with `WorkerClientErrorCode.PipeDisconnected`) and `InvokeAsync_WhenWorkerFaultsMidCommand_FailsPendingInvokeWithWorkerFaulted` (worker emits a `WorkerFault` envelope while the invoke is pending; the task fails with `WorkerClientErrorCode.WorkerFaulted`). Both also assert the client transitions to `Faulted`. No product change needed.
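The "fail every pending command on fault" behaviour that these tests pin down can be sketched as below. This is a simplified model under stated assumptions: `PendingCommandTable`, its method names, and the `string` reply type are hypothetical and are not the `WorkerClient` implementation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical, simplified model of a client that tracks in-flight commands
// by correlation id.
public sealed class PendingCommandTable
{
    private readonly ConcurrentDictionary<long, TaskCompletionSource<string>> pending = new();

    // Registers a command and returns the task the caller awaits for its reply.
    public Task<string> Register(long correlationId)
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        if (!pending.TryAdd(correlationId, tcs))
        {
            throw new InvalidOperationException($"Duplicate correlation id {correlationId}.");
        }

        return tcs.Task;
    }

    // Completes one command with its reply; returns false for a late or unknown reply.
    public bool TryComplete(long correlationId, string reply) =>
        pending.TryRemove(correlationId, out var tcs) && tcs.TrySetResult(reply);

    // Called when the read loop faults: every in-flight command fails with the
    // same exception instead of hanging until its own timeout.
    public void FailAll(Exception fault)
    {
        foreach (long id in pending.Keys) // Keys returns a snapshot, so removal is safe
        {
            if (pending.TryRemove(id, out var tcs))
            {
                tcs.TrySetException(fault);
            }
        }
    }
}
```

A test in the style of the two added above would register a command, call `FailAll` with a fault exception, and assert that awaiting the returned task throws that same exception.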
### Tests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:76`, `src/MxGateway.Tests/Gateway/Workers/FakeWorkerHarnessTests.cs:122` |
| Status | Resolved |
**Description:** Several tests rely on fixed `Task.Delay` values: `WorkerClientTests.InvokeAsync_WithLateReply…` waits a hard-coded 50 ms after writing a late reply before issuing the second command, and the heartbeat tests use a 20 ms delay to make timestamps strictly increase. On a slow CI agent the 50 ms delay can be insufficient, and `DateTimeOffset.UtcNow` resolution can make the 20 ms heartbeat-advance assertion flaky.
**Recommendation:** Replace fixed delays with the existing `WaitUntilAsync` condition polling, and inject a controllable `TimeProvider` for heartbeat-timestamp comparisons instead of relying on wall-clock advance.
**Re-triage note:** The brief flagged `ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess` as "a real `WorkerClient` fault→kill bug". On inspection it is **not a product bug** — it is a test race. `WorkerClient.SetFaulted` publishes the `Faulted` state under lock *before* calling `KillOwnedProcess`, so the old test's `WaitUntilAsync(() => client.State == Faulted)` could return between those two statements and observe `process.KillCount == 0`. The kill itself always runs synchronously inside `SetFaulted`, and `ShutdownAsync`/`DisposeAsync` re-issue an idempotent kill, so no real consumer relies on "state==Faulted implies process dead". The fix is therefore a test-quality fix (correctly Medium / Concurrency), not a product fix.
**Resolution:** Resolved 2026-05-18: (1) Made `ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess` deterministic — it now `await`s `FakeWorkerProcess.WaitForExitAsync` (the `TaskCompletionSource` completed inside `Kill()`), which completes exactly when the kill runs, eliminating the state-polling race; verified by running it five times in isolation (5/5 pass). (2) Removed the fixed 50 ms `Task.Delay` from `InvokeAsync_WithLateReply_IgnoresLateReplyAndKeepsClientReady` — the stale reply and the second reply are now sent in pipe (FIFO) order, so the read loop discards the stale reply before the second reply with no timing window. (3) Replaced the 20 ms `Task.Delay` heartbeat-advance hacks in `WorkerClientTests.ReadLoop_WhenHeartbeatArrives_UpdatesLastHeartbeatAndWorkerProcess` and `FakeWorkerHarnessTests.SendHeartbeatAsync_UpdatesClientHeartbeatState` with an injected `ManualTimeProvider` advanced by a fixed `TimeSpan`; both tests now assert the exact post-advance timestamp instead of `>` against wall-clock drift.
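A manually advanced clock of the kind described in item (3) might look like the sketch below. The suite's actual `ManualTimeProvider` is not shown in this review, so this version and the `HeartbeatTracker` consumer are hypothetical; the only real API relied on is `System.TimeProvider` (available from .NET 8) and its virtual `GetUtcNow()`.

```csharp
using System;

// Hypothetical test double: a TimeProvider whose clock moves only when told to.
// Only GetUtcNow is overridden; timers created through this provider would
// still use the base implementation.
public sealed class ManualTimeProvider : TimeProvider
{
    private readonly object gate = new();
    private DateTimeOffset utcNow;

    public ManualTimeProvider(DateTimeOffset start) => utcNow = start;

    public override DateTimeOffset GetUtcNow()
    {
        lock (gate)
        {
            return utcNow;
        }
    }

    // Moves the clock forward by an exact, test-chosen amount.
    public void Advance(TimeSpan delta)
    {
        if (delta < TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(delta));
        }

        lock (gate)
        {
            utcNow += delta;
        }
    }
}

// Hypothetical consumer standing in for the heartbeat bookkeeping under test.
public sealed class HeartbeatTracker
{
    private readonly TimeProvider time;

    public HeartbeatTracker(TimeProvider time) => this.time = time;

    public DateTimeOffset? LastHeartbeat { get; private set; }

    public void OnHeartbeat() => LastHeartbeat = time.GetUtcNow();
}
```

Because the clock advances only through `Advance`, a test can assert the exact timestamp after a second heartbeat (start plus the advanced `TimeSpan`) rather than a `>` comparison that depends on wall-clock resolution.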
### Tests-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:682`, `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:324`, `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:460`, `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs:233` |
| Status | Resolved |
**Description:** A near-identical `TestServerCallContext` implementation is copy-pasted into at least four test files (and `AllowAllConstraintEnforcer` / `TestServerStreamWriter` / `RecordingStreamWriter` into several). Duplication risks the copies drifting and bloats each file.
**Recommendation:** Extract a shared `TestServerCallContext`, `RecordingServerStreamWriter<T>`, and `AllowAllConstraintEnforcer` into a common test-support folder/namespace.
**Resolution:** Resolved 2026-05-18: confirmed five duplicated copies (the brief's four plus a fifth in `Galaxy/GalaxyFilterInputSafetyTests.cs`). Added a shared `MxGateway.Tests.TestSupport` namespace under `src/MxGateway.Tests/TestSupport/`: `TestServerCallContext.cs` (single class with an optional `Metadata? requestHeaders` constructor parameter that subsumes both the no-arg and headers-bearing variants), `RecordingServerStreamWriter.cs` (thread-safe writer with `Messages` and `WaitForFirstMessageAsync`, replacing `TestServerStreamWriter`/`RecordingStreamWriter`/`RecordingServerStreamWriter`), and `AllowAllConstraintEnforcer.cs`. Deleted all five `TestServerCallContext` copies, both `AllowAllConstraintEnforcer` copies, and the three stream-writer copies; updated the five test files to `using MxGateway.Tests.TestSupport;` and renamed `.Items` call sites to `.Messages`. Removed the now-unused `Grpc.Core` using from `GatewayEndToEndFakeWorkerSmokeTests.cs`. Build clean (0 warnings) and suite green.
### Tests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Tests/Gateway/Sessions/WorkerAlarmRpcDispatcherTests.cs:1-9`, `src/MxGateway.Tests/Gateway/Sessions/NotWiredAlarmRpcDispatcherTests.cs:1-3`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerAlarmAutoSubscribeTests.cs:1` |
| Status | Resolved |
**Description:** The alarm test files diverge from the project's C# style and the rest of the suite: snake_case test method names instead of the PascalCase `Method_Condition_Result` pattern; redundant explicit `using System;`/`System.Threading;` imports despite implicit global usings; and explicit-type `new` instead of target-typed `new()` used elsewhere. There is also a typo in fixture data (`"wnwrap subscribe failed"`).
**Recommendation:** Rename the alarm tests to the house `Method_Condition_Result` convention, drop redundant `System.*` usings, align `new` usage, and fix the `wnwrap` typo.
**Re-triage note:** Two of the finding's claims are incorrect. (1) `"wnwrap subscribe failed"` is **not a typo**: `WnWrap` is the real name of the worker's `WnWrapAlarmConsumer` MXAccess component (`src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs`); the fixture string deliberately references it, so it was left unchanged. (2) `SessionManagerAlarmAutoSubscribeTests.cs` already uses PascalCase `Method_Condition_Result` names and target-typed `new()`, and its lone `using System.Runtime.CompilerServices;` is **required** for `[EnumeratorCancellation]` (that namespace is not a global using), so it is not redundant. That file needed no change. The genuine style drift was confined to `WorkerAlarmRpcDispatcherTests.cs` and `NotWiredAlarmRpcDispatcherTests.cs`.
**Resolution:** Resolved 2026-05-18: renamed all ten `WorkerAlarmRpcDispatcherTests` methods and both `NotWiredAlarmRpcDispatcherTests` methods from snake_case to the house `Method_Condition_Result` PascalCase convention; dropped the redundant `System`/`System.Collections.Generic`/`System.Linq`/`System.Threading`/`System.Threading.Tasks` usings from `WorkerAlarmRpcDispatcherTests.cs` and `System.Threading`/`System.Threading.Tasks` from `NotWiredAlarmRpcDispatcherTests.cs` (all are implicit global usings), keeping the required `System.Runtime.CompilerServices`; converted explicit-type `new SessionRegistry()`/`new WorkerAlarmRpcDispatcher(...)`/`new FakeAlarmWorkerClient`/`new List<...>()`/`new GatewaySession(...)` to target-typed `new()`; and replaced the fully-qualified `System.StringComparison` with `StringComparison`. See the re-triage note for the two claims not actioned. Suite green.
### Tests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:36-37,99,365` |
| Status | Resolved |
**Description:** Several XML `<summary>` comments are copy-paste mismatches: the comment above `OpenSessionAsync_SetsInitialDefaultLease` describes correlation-ID generation; the comment above `GatewaySessionSubscribeBulkAsync_ForwardsOneBulkCommand…` describes lease refresh; the comment above `CloseExpiredLeasesAsync_DoesNotCloseActiveEventSubscriber` describes shutdown closing all sessions. Misleading test docs hinder triage.
**Recommendation:** Correct the `<summary>` text to match each test's actual behavior, or remove the redundant comments since the test names already describe the behavior.
**Resolution:** Resolved 2026-05-18: confirmed three copy-paste `<summary>` mismatches. The mislabelled comments were the summaries of the *following* tests left attached to the wrong method (the test below each then had no summary). Corrected all three: `OpenSessionAsync_SetsInitialDefaultLease` now describes setting the initial lease expiry; the comment above `InvokeAsync_WhenSessionReady_RefreshesLease` (the finding mis-cited the method name as `GatewaySessionSubscribeBulkAsync_…`) now describes lease refresh on invoke; and `CloseExpiredLeasesAsync_DoesNotCloseActiveEventSubscriber` now describes the expired-lease sweep leaving an active-event-subscriber session open. No behavior change.
### Tests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs:26-36` |
| Status | Resolved |
**Description:** The anonymous-localhost bypass is tested only for the success case (`allowAnonymousLocalhost: true` + loopback succeeds) and the remote-unauthenticated denial. There is no test for the security-critical negatives: anonymous + loopback when `AllowAnonymousLocalhost` is `false` must be denied, and anonymous + non-loopback when the flag is `true` must still be denied (the bypass is scoped strictly to loopback). Those are the misconfiguration cases that would expose the dashboard.
**Recommendation:** Add tests: anonymous + loopback + `allowAnonymousLocalhost: false` → not succeeded; anonymous + non-loopback + `allowAnonymousLocalhost: true` → not succeeded.
**Resolution:** Resolved 2026-05-18: confirmed the coverage gap and confirmed `DashboardAuthorizationHandler` already gates the bypass correctly on `AllowAnonymousLocalhost && IsLoopbackRequest()` (no product bug). Added two `DashboardAuthorizationHandlerTests`: `HandleAsync_AnonymousLocalhostDisallowed_DoesNotSucceed` (anonymous + loopback + `allowAnonymousLocalhost: false` → not succeeded) and `HandleAsync_AnonymousLocalhostAllowedFromRemoteAddress_DoesNotSucceed` (anonymous + non-loopback + `allowAnonymousLocalhost: true` → not succeeded, proving the bypass stays scoped to loopback). Both pass.
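The gating condition these tests cover reduces to a two-input predicate, sketched here. `AnonymousLocalhostBypass` and its `IsAllowed` method are hypothetical names for illustration; the real handler's `IsLoopbackRequest()` is not reproduced, and this sketch assumes the loopback check is equivalent to `IPAddress.IsLoopback` on the remote address.

```csharp
using System.Net;

// Hypothetical helper illustrating the bypass rule: anonymous access is
// allowed only when the flag is on AND the request comes from loopback.
public static class AnonymousLocalhostBypass
{
    public static bool IsAllowed(bool allowAnonymousLocalhost, IPAddress remoteAddress) =>
        allowAnonymousLocalhost
        && remoteAddress is not null
        && IPAddress.IsLoopback(remoteAddress);
}
```

Of the four flag/address combinations, only flag `true` with a loopback address (for example `IPAddress.Loopback` or `IPAddress.IPv6Loopback`) returns `true`; the two added tests correspond to the two single-condition-false cases.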
### Tests-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:233-301` |
| Status | Resolved |
**Description:** `GatewayEndToEndFakeWorkerSmokeTests` correctly stores and awaits `launcher.WorkerTask`, but `SessionWorkerClientFactoryFakeWorkerTests` uses `_ = RunWorkerAsync(...)` with no stored task (lines 152, 184, 220). An unhandled exception in the scripted worker becomes an unobserved task exception (raised through `TaskScheduler.UnobservedTaskException`) that can surface during an unrelated later test rather than failing the owning test.
**Recommendation:** Store the worker task and either await it during disposal or attach a continuation that fails the test on fault, mirroring `GatewayEndToEndFakeWorkerSmokeTests`.
**Resolution:** Resolved 2026-05-18: confirmed all three scripted launchers in `SessionWorkerClientFactoryFakeWorkerTests` discarded the worker task. Added an `IWorkerTaskLauncher` interface (each launcher now stores its scripted task in a `WorkerTask` property and exposes `ObserveWorkerTaskAsync`); the test class now implements `IAsyncDisposable`, tracks every launcher it creates via a `Track` helper, and in `DisposeAsync` awaits each `WorkerTask` (within `TestTimeout`) so a scripted-worker fault fails the owning test instead of leaking as an unobserved `TaskScheduler.UnobservedTaskException`. `OperationCanceledException` and `IOException` — the expected outcomes of the worker client tearing the pipe down — are swallowed; anything else rethrows. `NeverReadyWorkerProcessLauncher` (which parks on an infinite `Task.Delay`) was given its own `CancellationTokenSource` so disposal can cancel and observe the parked task. Suite green.
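The "track, then observe on disposal" pattern can be sketched as follows. `WorkerTaskTracker` is a hypothetical stand-in for the `Track` helper and `DisposeAsync` logic described above, not the test class itself; it assumes .NET 6 or later for `Task.WaitAsync(TimeSpan)`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: remembers every background worker task and observes
// each one during disposal so a fault fails the owning test.
public sealed class WorkerTaskTracker : IAsyncDisposable
{
    private readonly List<Task> tracked = new();
    private readonly TimeSpan timeout;

    public WorkerTaskTracker(TimeSpan timeout) => this.timeout = timeout;

    public Task Track(Task workerTask)
    {
        tracked.Add(workerTask);
        return workerTask;
    }

    public async ValueTask DisposeAsync()
    {
        var failures = new List<Exception>();
        foreach (Task task in tracked)
        {
            try
            {
                // Throws TimeoutException if the worker never finishes.
                await task.WaitAsync(timeout);
            }
            catch (OperationCanceledException)
            {
                // Expected when the client tears the pipe down.
            }
            catch (IOException)
            {
                // Expected when the pipe is closed under the worker.
            }
            catch (Exception ex)
            {
                failures.Add(ex); // keep observing the remaining tasks
            }
        }

        if (failures.Count > 0)
        {
            throw new AggregateException("Scripted worker task(s) faulted.", failures);
        }
    }
}
```

Collecting failures before throwing is a design choice of this sketch: it ensures every tracked task is observed even when an earlier one has faulted.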
### Tests-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Tests/Gateway/Workers/Fakes/FakeWorkerHarness.cs:62`, `src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:472` |
| Status | Resolved |
**Description:** Pipe names are uniquified per test with a GUID (good), but xUnit runs test classes in parallel by default and there is no `xunit.runner.json` or collection configuration. Tests that build a full `WebApplication` bind ephemeral ports (`--urls=http://127.0.0.1:0`, fine) but spin up DI containers and hosted services concurrently. Currently safe, but a future test binding a fixed port would silently collide.
**Recommendation:** Add an `xunit.runner.json` or a collection grouping the `WebApplication`-building tests, and keep the `:0` ephemeral-port convention explicit so future tests do not introduce a fixed-port collision.
**Resolution:** Resolved 2026-05-18: added `src/MxGateway.Tests/xunit.runner.json` making the parallelism policy explicit (`parallelizeTestCollections: true`, `maxParallelThreads: -1`, `parallelizeAssembly: false`, `longRunningTestSeconds: 30`) and wired it into `MxGateway.Tests.csproj` as `<None Update="xunit.runner.json" CopyToOutputDirectory="PreserveNewest" />` so the runner picks it up (confirmed present in `bin/Debug/net10.0/`). Added a comment at the only `WebApplication`-building call site (`GatewayApplicationTests.cs`, `--urls=http://127.0.0.1:0`) documenting that the ephemeral-port (`:0`) convention is mandatory because test collections run in parallel. No fixed-port binding exists today; this is a preventative guardrail as the finding recommends.
+252
@@ -0,0 +1,252 @@
# Code Review — Worker.Tests
| Field | Value |
|---|---|
| Module | `src/MxGateway.Worker.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Worker.Tests-010 (weak substring assertion), Worker.Tests-011 (test name overstates what it proves). |
| 2 | mxaccessgw conventions | Tests respect STA-affinity and the WorkerEnvelope frame protocol; naming-convention drift only (Worker.Tests-009). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-003/004/013 (wall-clock and fixed-delay timing assertions). |
| 4 | Error handling & resilience | COMException/HResult, pipe-never-appears, malformed frames, shutdown-during-command, watchdog all covered; queue branch gap (Worker.Tests-015). |
| 5 | Security | No real secrets; redaction explicitly tested. No issues found. |
| 6 | Performance & resource management | Issues found: Worker.Tests-005 (`MemoryStream` not disposed), Worker.Tests-006 (`MxAccessStaSession` leak on assertion failure). |
| 7 | Design-document adherence | Tests match `docs/Worker*.md`; `docs/WorkerFrameProtocol.md` is stale (Worker.Tests-007). |
| 8 | Code organization & conventions | Issues found: Worker.Tests-009 (two naming conventions), Worker.Tests-014 (duplicated test doubles). |
| 9 | Testing coverage | Issues found: Worker.Tests-001 (`StaMessagePump` untested), Worker.Tests-002 (COM-event delivery untested), Worker.Tests-012 (frame-validation gaps). |
| 10 | Documentation & comments | Issues found: Worker.Tests-008 (misplaced redaction test), Worker.Tests-011 (misleading test name). |
## Findings
### Worker.Tests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Sta/` (no `StaMessagePumpTests.cs`) |
| Status | Resolved |
**Description:** `StaMessagePump`, which exists solely to pump Windows messages so that MXAccess COM event-sink calls are delivered on the STA thread, has no direct unit test. `WaitForWorkOrMessages` (timeout conversion, the `MsgWaitForMultipleObjectsEx` failure path) and `PumpPendingMessages` (drain count) are exercised only indirectly via `StaRuntime`, which never asserts that the pump returns or throws correctly. The `MsgWaitFailed` error branch and the `ToTimeoutMilliseconds` edge cases (`InfiniteTimeSpan`, `<= Zero`, `>= uint.MaxValue`) are completely uncovered.
**Recommendation:** Add `StaMessagePumpTests` that post a Windows message to the STA thread and assert `PumpPendingMessages` returns the expected count; cover `WaitForWorkOrMessages` waking on a signaled event vs timeout; cover `ToTimeoutMilliseconds` boundaries through an internals-visible seam.
**Resolution:** 2026-05-18 — Added `src/MxGateway.Worker.Tests/Sta/StaMessagePumpTests.cs` (8 `[Fact]` tests, run on dedicated STA threads). Covers `WaitForWorkOrMessages` null-argument validation, returning immediately when the wake event is pre-signalled, waking when the event is signalled mid-wait, returning on timeout when never signalled, the `TimeSpan.Zero` (`<= Zero`) conversion branch, and waking on a `WM_NULL` Windows message posted to the STA thread (the `QS_ALLINPUT` path). `PumpPendingMessages` is covered for both an empty queue (returns 0) and three posted messages (returns 3). Boundary noted in the file: the `MsgWaitFailed` branch is not exercised because forcing `MsgWaitForMultipleObjectsEx` to fail needs a deliberately invalid native handle, which is unsafe to construct in-process; `ToTimeoutMilliseconds` is `private static` and is covered indirectly through wait-latency assertions rather than reflection.
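To make the three `ToTimeoutMilliseconds` boundaries concrete, here is one plausible shape for such a conversion. This is an assumption-laden sketch, not the worker's `private static` method: the return values chosen for each branch (Win32 `INFINITE` for an infinite timeout, `0` for non-positive values, a clamp just below `INFINITE` for huge values) are guesses at reasonable behaviour, and `TimeoutConversionSketch` is a hypothetical name.

```csharp
using System;
using System.Threading;

// Hypothetical conversion from TimeSpan to a Win32-style millisecond timeout.
public static class TimeoutConversionSketch
{
    private const uint Infinite = 0xFFFFFFFF; // Win32 INFINITE

    public static uint ToTimeoutMilliseconds(TimeSpan timeout)
    {
        // Timeout.InfiniteTimeSpan is -1 ms, so it must be tested before the
        // "<= Zero" branch or it would be treated as "do not wait".
        if (timeout == Timeout.InfiniteTimeSpan)
        {
            return Infinite;
        }

        if (timeout <= TimeSpan.Zero)
        {
            return 0;
        }

        double milliseconds = timeout.TotalMilliseconds;

        // Clamp just below INFINITE so a very large finite timeout does not
        // silently become "wait forever".
        if (milliseconds >= uint.MaxValue)
        {
            return uint.MaxValue - 1;
        }

        return (uint)milliseconds;
    }
}
```

The branch ordering is the part worth testing regardless of the exact return values: an implementation that checks `<= Zero` first would turn an infinite wait into a zero-length one.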
### Worker.Tests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs`, `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs` |
| Status | Resolved |
**Description:** No test verifies that a COM event raised on the STA thread is converted to protobuf and lands in the `MxAccessEventQueue`. `MxAccessEventMapperTests` exercises the mapper directly with hand-built fakes, and `AlarmDispatcherTests` covers the alarm sink, but the non-alarm COM-event path (`MxAccessBaseEventSink`/`MxAccessComServer` event handlers → `MxAccessEventMapper` → queue, triggered by an actual sink callback) is never end-to-end tested. Given the worker's core purpose is to convert COM events to protobuf, this is a significant gap.
**Recommendation:** Add a test that invokes the base event sink's data-change handler (via an internal seam or a fake COM event source) and asserts a converted `WorkerEvent` with correct family/sequence appears in the queue.
**Resolution:** 2026-05-18 — Added `src/MxGateway.Worker.Tests/MxAccess/MxAccessBaseEventSinkTests.cs` (5 `[Fact]` tests). The four `MxAccessBaseEventSink` COM event handlers (`OnDataChange`, `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange`) — the exact delegate targets the MXAccess COM runtime invokes — were widened from `private` to `internal` (with XML-doc notes that this is a unit-test seam), and `[assembly: InternalsVisibleTo("MxGateway.Worker.Tests")]` was added to `MxGateway.Worker.csproj`. The tests construct a real `MxAccessBaseEventSink` over a real `MxAccessEventMapper` and `MxAccessEventQueue`, invoke each handler with COM-style arguments, and assert a correctly-converted protobuf `WorkerEvent` (family, body case, server/item handle, value, quality, source timestamp, monotonic `WorkerSequence`) lands in the queue. Boundary noted in the file: the COM `+=` wire-up in `Attach`/`Detach` casts to the sealed `LMXProxyServerClass` RCW and cannot run without a live MXAccess COM object, so it is not exercised; invoking the handlers directly reproduces an STA-thread COM callback and exercises the genuine conversion + enqueue path.
### Worker.Tests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/Sta/StaRuntimeTests.cs:46-48` |
| Status | Resolved |
**Description:** `InvokeAsync_WakesIdlePumpForQueuedCommand` asserts `stopwatch.Elapsed < TimeSpan.FromSeconds(2)` — a wall-clock assertion that on a loaded CI agent can exceed 2s, producing a false failure. The test also does not actually prove the wake event (vs the 50 ms idle pump) caused the dispatch.
**Recommendation:** Remove the wall-clock assertion (the awaited result already proves the command ran), or raise the budget substantially with a comment that it is a coarse smoke check.
**Resolution:** 2026-05-18 — Removed the `Stopwatch` and the `stopwatch.Elapsed < TimeSpan.FromSeconds(2)` wall-clock assertion from `InvokeAsync_WakesIdlePumpForQueuedCommand`. The test already constructs the `StaRuntime` with a 30-second idle pump period, so the awaited `InvokeAsync` completing at all proves the command wake event — not the idle pump tick — drove the dispatch; no timing budget is needed. The XML-doc comment now states this explicitly. The now-unused `using System.Diagnostics;` was removed (`TreatWarningsAsErrors`).
### Worker.Tests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:281-329` |
| Status | Resolved |
**Description:** `StartAsync_WithAlarmCommandHandlerFactory_PollOnceCalledViaSta` and `Dispose_StopsAlarmPollLoop` use poll-until loops, and `Dispose_StopsAlarmPollLoop` additionally does `await Task.Delay(1000)` then asserts `PollCount` is unchanged. The 1s "no further polls" window is a timing race: a poll scheduled just before disposal could increment the counter afterward, and a slow agent could simply not run a poll in the window even without correct stop logic.
**Recommendation:** Make the poll loop deterministically observable — expose a "poll loop stopped" signal or have `Dispose` join the poll task — then assert on that rather than on elapsed-time silence.
**Resolution:** 2026-05-18 — `MxAccessStaSession.Dispose` now joins the alarm poll task (`pollTaskToJoin.Wait(TimeSpan.FromSeconds(5))`) after cancelling the poll CTS, instead of setting `alarmPollTask = null` and discarding it. Once `Dispose` returns, the poll loop has provably exited and no `PollOnce` call can still be in flight. `Dispose_StopsAlarmPollLoop` was rewritten to drop the `await Task.Delay(1000)` "no further polls" window: it now captures `PollCount` immediately after `Dispose()` returns and re-asserts equality after a bare `await Task.Yield()` — a deterministic frozen-count check rather than an elapsed-time race. The success-direction poll-until loop in `PollOnceCalledViaSta` was left as-is: waiting for an event to *occur* is sound; only waiting for an event to *not* occur is the race, and that pattern is now eliminated. Note: `ShutdownGracefullyAsync` already joined the poll task, so this change makes `Dispose` consistent with the graceful path.
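The "cancel, then join" disposal described above can be sketched with a simplified poller. `Poller` is hypothetical and runs its loop on the thread pool rather than through an STA runtime; the 5-second join budget mirrors the `Wait(TimeSpan.FromSeconds(5))` call quoted in the resolution.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical poller whose Dispose does not return until the loop has exited.
public sealed class Poller : IDisposable
{
    private readonly CancellationTokenSource cts = new();
    private readonly Task pollTask;
    private int pollCount;
    private bool disposed;

    public Poller(TimeSpan interval)
    {
        pollTask = Task.Run(() => RunAsync(interval, cts.Token));
    }

    public int PollCount => Volatile.Read(ref pollCount);

    private async Task RunAsync(TimeSpan interval, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                Interlocked.Increment(ref pollCount); // stands in for PollOnce
                await Task.Delay(interval, token);
            }
        }
        catch (OperationCanceledException)
        {
            // Expected when Dispose cancels the delay.
        }
    }

    public void Dispose()
    {
        if (disposed)
        {
            return; // idempotent: a second call is a no-op
        }

        disposed = true;
        cts.Cancel();

        // Join the loop so that, once this wait succeeds, no further poll can run.
        pollTask.Wait(TimeSpan.FromSeconds(5));
        cts.Dispose();
    }
}
```

With the join in place, a test can read `PollCount` immediately after `Dispose()` and assert it never changes, without sleeping to see whether another poll sneaks in. The idempotent guard also lets a `using` declaration coexist with an explicit mid-test `Dispose()` call.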
### Worker.Tests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105`, `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31` |
| Status | Resolved |
**Description:** `MemoryStream` instances are created and never disposed across the frame-protocol and pipe-session tests (`MemoryStream stream = new();` with no `using`). Disposal is cheap so impact is low, but it is inconsistent with the rest of the suite (which carefully `using`s `CancellationTokenSource`, `StaRuntime`, `PipePair`). `WorkerFrameWriter`/`WorkerFrameReader` are also constructed without disposal.
**Recommendation:** Wrap `MemoryStream` (and reader/writer if they are `IDisposable`) in `using` declarations for consistency.
**Resolution:** 2026-05-18 — All six `MemoryStream` test-body declarations in `WorkerFrameProtocolTests.cs` and the five `inbound`/`outbound` `MemoryStream` declarations in the `WorkerPipeSessionTests.cs` handshake tests were converted to `using` declarations, matching how the rest of the suite handles `CancellationTokenSource`/`StaRuntime`/`PipePair`. Re-triage of the parenthetical: `WorkerFrameWriter` and `WorkerFrameReader` are **not** `IDisposable` (`sealed class` with no `IDisposable` and no `Dispose` member — verified in `src/MxGateway.Worker/Ipc/`), so the finding's "reader/writer if they are `IDisposable`" suggestion does not apply and no change was made there. The shared `MemoryStream` instances inside the `WorkerPipeSessionTests` harness/helper classes (`ReadWrittenFrames` parameter, the `PipePair`/harness fields) are out of the cited line scope and were left untouched.
### Worker.Tests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:282,305,315,323` |
| Status | Resolved |
**Description:** `Dispose_StopsAlarmPollLoop` constructs `MxAccessStaSession session` without `using` (unlike every sibling test) and relies on an explicit `session.Dispose()`. If an assertion between `StartAsync` and `Dispose()` throws, the session — its STA thread and poll loop — leaks for the rest of the run. The `StaRuntime` is `using`d so the thread is eventually reclaimed, but the alarm poll loop and handler are not.
**Recommendation:** Use `using MxAccessStaSession session = ...` and drop the manual `Dispose()`, or wrap the body in try/finally.
**Resolution:** 2026-05-18 — `Dispose_StopsAlarmPollLoop` now declares its `MxAccessStaSession` with a `using` declaration. The manual `session.Dispose()` is kept because the test's purpose is to observe poll behaviour across disposal — but `MxAccessStaSession.Dispose` is idempotent (guarded by the `disposed` field), so the explicit mid-test call and the `using`-scope call do not conflict. An assertion thrown anywhere in the body now still tears the session (STA poll loop + alarm handler) down. The cited line numbers in the finding were imprecise — they straddle `PollOnceCalledViaSta` and `Dispose_StopsAlarmPollLoop` — but the described root cause (one `MxAccessStaSession` constructed without `using`) was singular and is the one in `Dispose_StopsAlarmPollLoop`; the sibling tests `PollOnceCalledViaSta` and `RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue` already used `using` and needed no change.
### Worker.Tests-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `docs/WorkerFrameProtocol.md:38-49` |
| Status | Resolved |
**Description:** `docs/WorkerFrameProtocol.md` instructs running `dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests` and states the frame protocol "is part of `MxGateway.Server`". The frame protocol actually lives in `MxGateway.Worker.Ipc` and is tested by `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs`. The doc's verification command points at the wrong project and build, so anyone following it after changing the worker frame protocol will not run the relevant tests.
**Recommendation:** Update `docs/WorkerFrameProtocol.md` to reference `src/MxGateway.Worker.Tests` and the x86 worker build (`-p:Platform=x86`).
**Resolution:** 2026-05-18 — Rewrote the `## Verification` section of `docs/WorkerFrameProtocol.md`. The test command now targets `src/MxGateway.Worker.Tests/MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter WorkerFrameProtocolTests`; the build command now targets `src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86`. The prose now states the frame protocol lives in `MxGateway.Worker.Ipc` (naming `WorkerFrameReader`/`WorkerFrameWriter`/`WorkerFrameProtocolOptions` and the `WorkerFrameProtocolTests.cs` test file) and notes the worker is an x86 process. Verified against the source: the frame-protocol types are confirmed under `src/MxGateway.Worker/Ipc/` and the tests under `src/MxGateway.Worker.Tests/Ipc/`, so the original doc was wrong on both project and component. Fenced code blocks were also relabelled `powershell` (the build/test commands are run from PowerShell on this Windows dev box).
### Worker.Tests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Conversion/VariantConverterTests.cs:175-182` |
| Status | Resolved |
**Description:** `Redactor_WithCredentialBearingValueFields_RedactsBeforeLogging` lives in `VariantConverterTests` but asserts on `WorkerLogRedactor.RedactValue`, which has nothing to do with `VariantConverter`. It is also a near-duplicate of coverage in `WorkerLogRedactorTests`. Placing redaction coverage inside the variant-converter class is misleading.
**Recommendation:** Move this test into `Bootstrap/WorkerLogRedactorTests.cs` (which already exists and tests `RedactFields`).
**Resolution:** 2026-05-18 — The misplaced redaction test was removed from `VariantConverterTests.cs` and re-added to `Bootstrap/WorkerLogRedactorTests.cs` as `RedactValue_WithCredentialBearingFieldNames_ReturnsRedactedValue` — alongside the existing `RedactFields` coverage, where redaction tests belong. Confirmed root cause: the old test asserted only on `WorkerLogRedactor.RedactValue` and never touched `VariantConverter`. The now-orphaned `using MxGateway.Worker.Bootstrap;` was removed from `VariantConverterTests.cs` (`TreatWarningsAsErrors`). The new home is `RedactValue` per-field coverage; `WorkerLogRedactorTests.RedactFields_...` already covers the dictionary path, so the two are complementary rather than duplicates.
### Worker.Tests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs`, `AlarmDispatcherTests.cs`, `AlarmCommandExecutorTests.cs`, `AlarmRecordTransitionMapperTests.cs`, `WnWrapAlarmConsumerXmlTests.cs` |
| Status | Resolved |
**Description:** The alarm-related test files use `snake_case` method names while the rest of the project uses the `Method_Scenario_Expectation` PascalCase convention. `docs/style-guides/CSharpStyleGuide.md` and the surrounding code establish PascalCase as the project convention; the alarm files diverge.
**Recommendation:** Rename alarm-test methods to the `Method_Scenario_Expectation` PascalCase form for one consistent convention.
**Resolution:** 2026-05-18 — Renamed every `[Fact]`/`[Theory]` method in the five alarm test files from `snake_case` to the project's `Method_Scenario_Expectation` PascalCase form (46 test methods total: 10 in `AlarmCommandHandlerTests`, 8 in `AlarmDispatcherTests`, 12 in `AlarmCommandExecutorTests`, 8 in `AlarmRecordTransitionMapperTests`, 9 in `WnWrapAlarmConsumerXmlTests` minus the existing PascalCase probe methods). Only test methods were renamed: no other `snake_case` names are present in these files, and the methods that *look* like helpers (`Subscribe`, `PollOnce`, `Dispose` on the fake doubles) are interface implementations of `IAlarmCommandHandler`/`IAlarmTransitionConsumer`/`IDisposable` and were correctly left unchanged. The suite stays green; xUnit discovers tests by attribute, not name, so the renames are behaviour-neutral.
### Worker.Tests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:230-258` |
| Status | Resolved |
**Description:** `StartAsync_WithoutAlarmCommandHandlerFactory_SubscribeAlarmsReturnsInvalidRequest` asserts `Assert.Contains("alarm", reply.DiagnosticMessage, StringComparison.OrdinalIgnoreCase)`. The XML doc claims it verifies the diagnostic says "alarm consumer not configured", but the assertion only checks the substring "alarm" — which would also match an unrelated message like "invalid alarm GUID". The assertion is weaker than the documented intent.
**Recommendation:** Assert the full diagnostic phrase so the test fails if the diagnostic regresses to a misleading message.
**Resolution:** 2026-05-18 — The weak `Assert.Contains("alarm", ...)` was replaced with an exact `Assert.Equal` against the diagnostic the executor actually emits. Re-triage: the test's XML doc claimed the phrase was "alarm consumer not configured", but `MxAccessCommandExecutor.ExecuteSubscribeAlarms` (verified in `src/MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:310-315`) produces "SubscribeAlarms requires an alarm command handler; the worker was constructed without one." — the doc was wrong, so both the assertion and the XML doc were corrected to the real phrase. The test now fails if the diagnostic regresses to any other message.
### Worker.Tests-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Sta/StaCommandDispatcherTests.cs:92-112` |
| Status | Resolved |
**Description:** `DispatchAsync_WhenCanceledAfterExecutionStarts_StillReturnsLateReply` is named and documented as if it proves cancellation arrived after execution began. The test does `Started.Wait(...)` then `cancellation.Cancel()`, which proves execution started, but because the executor is already running on the STA the cancellation is inherently a no-op — the test cannot distinguish "cancel was observed and ignored" from "cancel was never checked". The name overstates what is proven.
**Recommendation:** Either tighten the test (assert the dispatcher's cancel path was reached and declined) or rename/comment it to "cancellation cannot abort an in-flight STA command", matching `gateway.md`'s stated behavior.
**Resolution:** 2026-05-18 — Took the rename/re-document option. The test is renamed `DispatchAsync_WhenCanceledWhileExecuting_DoesNotAbortInFlightCommand` and its XML doc rewritten to state exactly what it proves — an in-flight STA command is *not* aborted by cancellation — and to state explicitly that the test cannot and does not distinguish "cancel observed and ignored" from "cancel never checked". The doc now cites `gateway.md`'s wording ("cannot safely abort an in-flight COM call on the STA"). The test body is unchanged: it already asserts the command runs to completion and returns its normal `Ok` reply, which is the genuine behaviour. No runtime behaviour changed.
### Worker.Tests-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs` |
| Status | Resolved |
**Description:** `docs/WorkerFrameProtocol.md` states the reader "rejects zero-length payloads and payloads larger than the configured maximum (default 16 MiB) before allocating the payload buffer." `WorkerFrameProtocolTests` covers malformed-length, wrong protocol version, wrong session, and malformed payload, but has no test for the zero-length-payload rejection or the oversized-frame rejection — both explicit security-relevant input-validation paths.
**Recommendation:** Add tests feeding a frame with `payload_length == 0` and one with `payload_length` above the configured maximum, asserting the corresponding `WorkerFrameProtocolErrorCode`.
**Resolution:** 2026-05-18 — Re-triage of the zero-length half: the finding's "no test for the zero-length-payload rejection" is partly inaccurate. The pre-existing `ReadAsync_WithMalformedLength_ThrowsMalformedLength` fed a four-zero-byte stream — which is exactly a frame declaring `payload_length == 0` — so the zero-length path *was* already covered, just under a misleading name (the length prefix itself is well-formed; only the declared length is zero). That test was renamed `ReadAsync_WithZeroLengthPayload_ThrowsMalformedLength` with an XML doc explaining the four-zero-byte construction, rather than adding a duplicate. The oversized half was a genuine gap: a new `ReadAsync_WithPayloadAboveConfiguredMaximum_ThrowsMessageTooLarge` constructs `WorkerFrameProtocolOptions` with a 64-byte maximum, feeds a length prefix of 65, and asserts `WorkerFrameProtocolErrorCode.MessageTooLarge` — verified against `WorkerFrameReader.ReadAsync`, both checks fire before the payload buffer is rented. The small configured maximum keeps the test from allocating a multi-megabyte buffer.
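The two rejection paths described above can be sketched as a standalone length-prefix check. This is an illustration only, not the actual `WorkerFrameReader` code: the `FrameLengthCheck` enum and `FrameLengthSketch` class are hypothetical names, and the little-endian prefix encoding is an assumption inferred from the `WriteUInt32LittleEndian` test helper.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical result type; the real code reports a WorkerFrameProtocolErrorCode.
public enum FrameLengthCheck
{
    Ok,
    MalformedLength,
    MessageTooLarge,
}

public static class FrameLengthSketch
{
    // Validates the declared payload length before any payload buffer is obtained.
    // Assumes a 4-byte little-endian length prefix.
    public static FrameLengthCheck Validate(ReadOnlySpan<byte> prefix, uint maxPayloadBytes)
    {
        if (prefix.Length < sizeof(uint))
        {
            return FrameLengthCheck.MalformedLength;
        }

        uint declaredLength = BinaryPrimitives.ReadUInt32LittleEndian(prefix);

        // A zero-length payload is rejected even though the prefix itself is well-formed.
        if (declaredLength == 0)
        {
            return FrameLengthCheck.MalformedLength;
        }

        // An oversized declaration is rejected without allocating anything.
        if (declaredLength > maxPayloadBytes)
        {
            return FrameLengthCheck.MessageTooLarge;
        }

        return FrameLengthCheck.Ok;
    }
}
```

With a 64-byte maximum, as in the new test, a four-zero-byte prefix maps to `MalformedLength` and a prefix declaring 65 bytes maps to `MessageTooLarge`.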
### Worker.Tests-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:539-546` |
| Status | Resolved |
**Description:** `ThrowIfCompletedAsync` does an unconditional `await Task.Delay(TimeSpan.FromMilliseconds(100))` then checks `task.IsCompleted`. This adds a fixed 100 ms to the test and only catches a `RunAsync` that fails within that arbitrary window; a session that faults after 100 ms slips past undetected.
**Recommendation:** Replace with a deterministic race: `await Task.WhenAny(runTask, <first-expected-frame-read>)` and assert the run task did not win.
**Resolution:** 2026-05-18 — `ThrowIfCompletedAsync` was deleted (it had a single call site, in `RunAsync_SendsHeartbeatPayloadFromRuntimeSnapshot`). That test now races `runTask` against the first-heartbeat `ReadUntilAsync` with `Task.WhenAny`; if `runTask` wins it is awaited to surface the underlying fault and the test fails via `Assert.Fail`. The fixed 100 ms delay is gone — the check is now deterministic: a `RunAsync` faulting at *any* time before the first heartbeat is caught, and a healthy run completes as soon as the heartbeat arrives instead of always paying 100 ms.
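The race described above can be sketched as a small helper. This is a minimal illustration under assumptions, not the test's actual code: `RunTaskRace` and `AssertProgressBeforeCompletionAsync` are hypothetical names, and the real test fails via `Assert.Fail` where this sketch throws `InvalidOperationException`.

```csharp
using System;
using System.Threading.Tasks;

public static class RunTaskRace
{
    // Hypothetical helper: returns normally when expectedProgress completes first,
    // and throws when the long-running task ends first.
    public static async Task AssertProgressBeforeCompletionAsync(Task runTask, Task expectedProgress)
    {
        Task winner = await Task.WhenAny(runTask, expectedProgress).ConfigureAwait(false);

        if (ReferenceEquals(winner, runTask))
        {
            // Awaiting the finished task rethrows its fault, if it has one.
            await runTask.ConfigureAwait(false);

            // The run task ended without a fault, which is still unexpected here.
            throw new InvalidOperationException("Run task completed before the expected progress was observed.");
        }

        // Propagate any failure from the progress task itself.
        await expectedProgress.ConfigureAwait(false);
    }
}
```

Unlike a fixed delay, this has no time window to tune: the helper returns as soon as the expected progress is observed and fails as soon as the run task ends.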
### Worker.Tests-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeClientTests.cs:194`, `WorkerPipeSessionTests.cs:622`, `Sta/StaCommandDispatcherTests.cs:348`, `MxAccess/MxAccessStaSessionTests.cs:334`, `MxAccess/MxAccessCommandExecutorTests.cs:1124` |
| Status | Resolved |
**Description:** `FakeRuntimeSession`, `NoopComApartmentInitializer`, `NoopEventSink`/`NullEventSink`, and the `CreateFrame`/`WriteUInt32LittleEndian` helpers are re-implemented independently in multiple test files. The two `FakeRuntimeSession` implementations have already diverged (one supports `BlockDispatch`/event enqueue, one does not), and `NoopComApartmentInitializer` is defined four times.
**Recommendation:** Extract shared test doubles (`NoopComApartmentInitializer`, frame helpers, a single configurable `FakeRuntimeSession`) into a `TestSupport` folder/namespace consumed by all test classes.
**Resolution:** 2026-05-18 — Added a `src/MxGateway.Worker.Tests/TestSupport/` folder (namespace `MxGateway.Worker.Tests.TestSupport`) with four shared doubles: `NoopComApartmentInitializer`, `NoopEventSink`, `WorkerFrameTestHelpers` (`CreateFrame`/`WriteUInt32LittleEndian`), and a single configurable `FakeRuntimeSession`. The consolidated `FakeRuntimeSession` is the richer of the two divergent copies (it supports `BlockDispatch`, event enqueue, shutdown-timeout, and throw-after-release); the minimal `WorkerPipeClientTests` caller simply leaves the options unset. The per-file copies were deleted from `WorkerPipeClientTests`, `WorkerPipeSessionTests`, `StaCommandDispatcherTests`, `MxAccessStaSessionTests`, `MxAccessCommandExecutorTests`, and `WorkerFrameProtocolTests`, and the orphaned `NullEventSink` in `AlarmCommandExecutorTests` was replaced with the shared `NoopEventSink`. Re-triage: the finding says `NoopComApartmentInitializer` "is defined four times" — it was defined **three** times (`StaCommandDispatcherTests`, `MxAccessStaSessionTests`, `MxAccessCommandExecutorTests`); the fourth alarm-area `IStaComApartmentInitializer` implementation is `StaRuntimeTests.RecordingComApartmentInitializer`, which is a *recording* double (asserts init/uninit ordering), not a no-op, so it was deliberately left in place rather than folded into the shared no-op. Unused `using` directives left behind by the removals were stripped (`TreatWarningsAsErrors`).
### Worker.Tests-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventQueueTests.cs` |
| Status | Resolved |
**Description:** `MxAccessEventQueueTests` covers monotonic sequencing, drain, capacity overflow, and first-fault-wins, but does not cover `Drain` with `maxEvents: 0` (drain-all) — a branch `FakeRuntimeSession.DrainEvents` even special-cases — nor draining an empty queue, nor enqueue after a manual `RecordFault`. These are minor branches but the overflow/fault interaction is the worker's backpressure contract.
**Recommendation:** Add a `Drain(0)` drain-all test and an empty-queue drain test.
**Resolution:** 2026-05-18 — Added three tests to `MxAccessEventQueueTests`. `Drain_WithZeroMaxEvents_DrainsAllEvents` covers the `maxEvents == 0` drain-all branch in `MxAccessEventQueue.Drain` (verified at `src/MxGateway.Worker/MxAccess/MxAccessEventQueue.cs:174`) — three events enqueued, `Drain(0)` returns all three in order and empties the queue. `Drain_WhenQueueIsEmpty_ReturnsEmptyList` covers the `drainCount == 0` early-return branch for both `Drain(0)` and `Drain(5)` on an empty queue. `Enqueue_AfterRecordFault_ThrowsInvalidOperationException` covers the backpressure contract gap the finding flagged — after a manual `RecordFault`, `Enqueue` throws `InvalidOperationException` ("outbound event queue is faulted") and the event is not queued.
+260
@@ -0,0 +1,260 @@
# Code Review — Worker
| Field | Value |
|---|---|
| Module | `src/MxGateway.Worker` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: heartbeat loop sleeps before first beat (Worker-002), `ProcessCommandAsync` state race drops replies (Worker-003), watchdog/heartbeat state inconsistency (Worker-004), double-dispose path (Worker-006), plus Worker-010/011/015. |
| 2 | mxaccessgw conventions | Issue found: Worker-007 (reflection-based COM invocation bypasses the typed interface contract). |
| 3 | Concurrency & thread safety | Issues found: Worker-001 (`WnWrapAlarmConsumer` timer fires COM off the STA), Worker-008 (consumer factory STA-affinity not enforced). |
| 4 | Error handling & resilience | Issue found: Worker-005 (`OnPoll` silently swallows all poll failures). |
| 5 | Security | No secret logging (redaction applied); inbound frame validation reasonable. No issues found. |
| 6 | Performance & resource management | Issue found: Worker-009 (per-frame `byte[]` allocations on the hot event path). COM release is correct. |
| 7 | Design-document adherence | Code matches `WorkerSta.md`/`WorkerFrameProtocol.md`; stale alarm-path docs (Worker-012). |
| 8 | Code organization & conventions | Issue found: Worker-014 (`AlarmCommandHandler.cs` declares two public types in one file). |
| 9 | Testing coverage | Issue found: Worker-013 (`StaMessagePump` has no direct tests; poll-loop lifecycle untested). |
| 10 | Documentation & comments | Issue found: Worker-012 (stale "future PR / A.3" comments now describe shipped code). |
## Findings
### Worker-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:204-207` |
| Status | Resolved |
**Description:** When constructed with `pollIntervalMilliseconds > 0`, `Subscribe` starts a `System.Threading.Timer` whose `OnPoll` callback runs `PollOnce()` — which calls `wwAlarmConsumerClass.GetXmlCurrentAlarms2` — on a thread-pool thread. The wnwrap CLSID is registered `ThreadingModel=Apartment`; calling its methods off the owning STA violates the hard rule that all COM calls happen on the dedicated STA thread, and can deadlock on cross-apartment marshaling when the STA is not pumping. The production path (default constructor, interval 0) is safe, but the public 3-arg constructor leaves this footgun callable, and tests/live-smoke use it.
**Recommendation:** Remove the internal `Timer` entirely (production already drives `PollOnce` from the STA), or document and gate it so it can only be used from an STA thread. At minimum, make the timer-driven mode unreachable from any production wiring.
**Resolution:** 2026-05-18 — Removed the off-STA timer infrastructure from `WnWrapAlarmConsumer`: the `Timer? pollTimer` and `pollIntervalMs` fields, the `DefaultPollIntervalMilliseconds` constant, the `OnPoll` callback, the timer-arming arm in `Subscribe`, and the timer disposal block in `Dispose`. The `pollIntervalMilliseconds` parameter is gone from both public constructors (the test-seam ctor is now 2-arg: `wwAlarmConsumerClass` + `maxAlarmsPerFetch`), so the off-STA footgun is structurally unreachable. `PollOnce()` remains the public STA-driven entry point. The stale "poll … on a timer below" comment was corrected. Verified by the regression tests `WnWrapAlarmConsumer_has_no_internal_timer_field` and `WnWrapAlarmConsumer_exposes_no_poll_interval_constructor_parameter`; the `AlarmsLiveSmokeTests` call site was updated to the 2-arg constructor.
### Worker-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:545-549` |
| Status | Resolved |
**Description:** `RunHeartbeatLoopAsync` calls `await Task.Delay(_sessionOptions.HeartbeatInterval, ...)` before sending the first heartbeat. The gateway therefore receives no heartbeat for the first full interval (default 5s) after the worker reaches `Ready`. If the gateway's liveness watchdog expects a heartbeat sooner, a healthy worker can be misclassified as hung at startup.
**Recommendation:** Send an initial heartbeat immediately on entering the loop, or move the `Task.Delay` to the end of the loop body.
**Resolution:** 2026-05-18 — Restructured `RunHeartbeatLoopAsync` so the `Task.Delay(HeartbeatInterval)` is applied between beats only, not before the first. A `firstBeat` guard skips the delay on the initial iteration, so the gateway sees a heartbeat as soon as the worker is `Ready`; cancellation behavior is preserved (the loop still observes the token and the delay still throws on cancellation). Verified by the regression test `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop`. Three pre-existing tests (`WorkerPipeClientTests.RunAsync_ConnectsToPipeAndCompletesHandshake`, `WorkerPipeClientTests.RunAsync_RetriesUntilPipeServerAppears`, `WorkerPipeSessionTests.RunAsync_WhenCommandThrowsAfterShutdown_DropsLateFaultAndWritesShutdownAck`) assumed strict frame ordering and were updated to skip the now-interleaved first heartbeat while still asserting the same shutdown-ack behavior.
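The delay-between-beats shape can be sketched as follows. This is an assumption-laden illustration, not the body of `RunHeartbeatLoopAsync`: `HeartbeatLoopSketch` and the `sendHeartbeatAsync` delegate are hypothetical stand-ins for the session's real heartbeat write.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HeartbeatLoopSketch
{
    // Sends the first heartbeat immediately, then waits one interval between beats.
    public static async Task RunAsync(
        Func<CancellationToken, Task> sendHeartbeatAsync,
        TimeSpan interval,
        CancellationToken cancellationToken)
    {
        bool firstBeat = true;

        while (!cancellationToken.IsCancellationRequested)
        {
            if (!firstBeat)
            {
                // Task.Delay throws OperationCanceledException when the token is cancelled.
                await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
            }

            firstBeat = false;
            await sendHeartbeatAsync(cancellationToken).ConfigureAwait(false);
        }
    }
}
```

An equivalent option would be to move the delay to the end of the loop body, as the recommendation suggests; the guard form keeps the delay at the top of the iteration.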
### Worker-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:399-403`, `:416-419` |
| Status | Resolved |
**Description:** `ProcessCommandAsync` checks `_state` after `DispatchAsync` completes and silently `return`s without writing a `WorkerCommandReply` (or fault) when `_state` is not `Ready`/`ExecutingCommand`. `_state` is a plain field mutated from multiple tasks (heartbeat loop, event-drain loop, shutdown). A command that completes successfully while `_state` has transitioned will have its reply dropped with no diagnostic, and the gateway's correlation-id wait then hangs until its own timeout. The `_state` read is also not synchronized.
**Recommendation:** Always attempt to write the reply/fault for an in-flight command, or explicitly reject in-flight commands with a `Canceled`/`WorkerUnavailable` reply during state transitions. Make `_state` access thread-safe (volatile or locked).
**Resolution:** 2026-05-18 — Both silent-drop `return` sites in `ProcessCommandAsync` (the post-`DispatchAsync` success path and the exception path) now call a new `LogCommandResultDropped` helper before returning. The helper logs an Information event named `WorkerCommandResultDropped` via the session's `IWorkerLogger`, carrying the command's `correlation_id` plus `command_method` and `worker_state`, so a stuck gateway correlation-id wait is now traceable. The `_state` field was made `volatile` (`WorkerState` is an int-backed protobuf enum, so volatile is valid) so cross-thread reads observe the latest value without tearing; this is a low-risk, non-behavioral change and did not destabilize any test. Verified by the regression test `RunAsync_WhenReplyIsDroppedAfterShutdown_LogsDiagnostic`.
### Worker-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` |
| Status | Resolved |
**Description:** After `ReportWatchdogFaultIfNeededAsync` sends an `StaHung` fault, the heartbeat loop continues sending normal heartbeats with `State` derived from `_state`, which the watchdog path never sets to `Faulted`. The heartbeat then keeps reporting a non-faulted state that contradicts the fault just sent.
**Recommendation:** Set `_state = WorkerState.Faulted` (thread-safely) when the watchdog fault fires so heartbeat state and fault stay consistent.
**Resolution:** 2026-05-18 — `ReportWatchdogFaultIfNeededAsync` now sets `_state = WorkerState.Faulted` immediately after `_watchdogFaultSent = true` and before the `StaHung` fault is written, so the next heartbeat reports `Faulted` instead of contradicting the fault. `_state` is already `volatile` (Worker-003), so the cross-thread write from the heartbeat loop is observed correctly by the heartbeat's own `CreateHeartbeat` read; no further locking is required. Verified by the regression test `WorkerPipeSessionTests.RunAsync_AfterWatchdogFault_HeartbeatReportsFaultedState`, which uses a stale-activity snapshot with an empty current-command correlation id so the heartbeat `State` is derived from `_state` rather than forced to `ExecutingCommand`.
### Worker-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258` (production alarm poll loop) |
| Status | Resolved |
**Description:** `OnPoll` catches every exception from `PollOnce()` and discards it (`_ = ex;`). The production poll path (`MxAccessStaSession.RunAlarmPollLoopAsync` → `AlarmCommandHandler.PollOnce` → `AlarmDispatcher.PollOnce` → `consumer.PollOnce()`) has no fault recording either. A permanently failing alarm provider (e.g. `GetXmlCurrentAlarms2` returning `E_FAIL`, malformed XML throwing in `XmlDocument.LoadXml`) is therefore completely silent: no fault on the event queue, no log.
**Recommendation:** Route poll failures to `MxAccessEventQueue.RecordFault` (or a logger) so a broken alarm subscription becomes observable. Update the now-stale comment.
**Re-triage:** The cited location `WnWrapAlarmConsumer.cs:297-313` and the `OnPoll` callback no longer exist as of this branch — Worker-001 removed the off-STA `Timer` and its `OnPoll` callback entirely. The substantive concern still held, however: the **production** poll path in `MxAccessStaSession.RunAlarmPollLoopAsync` caught only `OperationCanceledException`, `ObjectDisposedException`, and `InvalidOperationException`. A genuine poll failure (`COMException` from `GetXmlCurrentAlarms2`, a malformed-XML `XmlException`) escaped uncaught, faulted the never-awaited `Task.Run` poll task, and was silently lost — exactly the silent-failure the finding describes. The finding was re-pointed at the live location and fixed there rather than at the removed `OnPoll`.
**Resolution:** 2026-05-18 — `RunAlarmPollLoopAsync` gained a trailing `catch (Exception exception)` arm after the three graceful-stop catches. A real alarm-poll failure is now converted to a `WorkerFault` (category `MxaccessEventConversionFailed`, carrying the exception type and, for a `COMException`, its `HResult`) by the new `CreateAlarmPollFault` helper and recorded on the session's `MxAccessEventQueue` via `RecordFault`. The worker's event-drain loop drains that fault and forwards it to the gateway, so a broken alarm subscription is now observable on the IPC fault path instead of vanishing. The poll loop still stops after the failure (the subscription is dead). No new proto enum value was added — `MxaccessEventConversionFailed` is the closest existing alarm-path category, avoiding a contracts regeneration across all clients. Verified by the regression test `MxAccessStaSessionTests.RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue`.
### Worker-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` |
| Status | Resolved |
**Description:** `RunAsync`'s `finally` calls `_runtimeSession?.Dispose()` unless `_shutdownTimedOut`. On the normal path `ShutdownGracefullyAsync` already disposed the STA runtime, so re-entering `Dispose()` is a harmless no-op only because `ShutdownGracefullyAsync` reached its end and set `disposed = true`. If `ShutdownGracefullyAsync` throws `TimeoutException` after partial teardown with `_shutdownTimedOut` set, the session is never disposed at all — the `finally` skips it — leaking the STA thread and COM object, leaving cleanup to rely solely on process exit.
**Recommendation:** Make the dispose decision explicit and confirm process exit always follows a timed-out shutdown; otherwise dispose defensively. At minimum document why disposal is deliberately skipped on timeout.
**Resolution:** 2026-05-18 — `RunAsync`'s `finally` now always calls `_runtimeSession?.Dispose()`; the `if (!_shutdownTimedOut)` guard and the `_shutdownTimedOut` field (which had become write-only) were removed. `MxAccessStaSession.Dispose` is idempotent (`if (disposed) return`) and bounded — each STA join is capped with `Wait(TimeSpan.FromSeconds(2))` — so re-entering it on the normal path (where `ShutdownGracefullyAsync` already disposed the runtime) is a harmless no-op, while on the timed-out path it is now the only thing that reclaims the STA thread and releases the MXAccess COM object. The previous behaviour leaked both on a shutdown timeout and relied solely on process exit. A code comment in the `finally` block documents the reasoning. Verified by the regression test `WorkerPipeSessionTests.RunAsync_WhenShutdownTimesOut_StillDisposesRuntimeSession`, which forces a `TimeoutException` from `ShutdownGracefullyAsync` and asserts the runtime session is disposed before `RunAsync` rethrows.
### Worker-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150` |
| Status | Resolved |
**Description:** `Invoke` uses late-bound `Type.InvokeMember` reflection as a fallback when the COM object does not cast to `ILMXProxyServer*`. In production the object is always `LMXProxyServerClass`, so the reflection path exists only for test doubles — it is dead/untested code on the production path and obscures the interface contract. `params object[] arguments` also boxes value-type handles on every call.
**Recommendation:** Drop the reflection fallback and require the COM object to implement the interface (tests can supply a typed fake), or clearly mark the fallback as test-only.
**Re-triage:** The finding's claim that the reflection path is "dead/untested code" is partly inaccurate — it was in fact the path exercised by the entire `MxAccessCommandExecutorTests` suite, whose `FakeMxAccessComObject` did not implement any typed interface. So the reflection fallback was test-only but *not* untested. The convention concern (bypassing the typed interface contract, boxing value-type handles) is valid, so the fix follows the recommendation's first option.
**Resolution:** 2026-05-18 — The late-bound `Type.InvokeMember` reflection fallback and its `params object[]`-boxing `Invoke` helper were removed from `MxAccessComServer`. Each adapter method now takes one of two typed paths: an `is IMxAccessServer` fast path (test fakes implement `IMxAccessServer` directly) and the production path that casts to the typed `ILMXProxyServer` / `ILMXProxyServer3` / `ILMXProxyServer4` COM interfaces via new `AsProxyServer*` helpers. A COM object implementing neither now fails fast with a clear `InvalidOperationException` naming the missing interface, instead of an opaque late-bound call. The test seam was migrated accordingly: `MxAccessCommandExecutorTests.FakeMxAccessComObject` now declares `: IMxAccessServer` (its method signatures already matched the interface exactly, so no behavioural change). Verified by the new `MxAccessComServerTests` (typed-server routing, untyped-object rejection, original-exception propagation — no more `TargetInvocationException` wrapping) plus the unchanged, still-passing `MxAccessCommandExecutorTests` suite which now exercises the typed `IMxAccessServer` path.
### Worker-008
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249`, `:429-447` |
| Status | Resolved |
**Description:** `RunAlarmPollLoopAsync` correctly marshals `handler.PollOnce()` onto the STA via `staRuntime.InvokeAsync`, and the cancel/await/dispose ordering in `ShutdownGracefullyAsync` is sound. However, nothing enforces that the `consumerFactory` and all `IMxAccessAlarmConsumer` calls run on the STA thread; a future caller could break STA affinity silently.
**Recommendation:** Add an assertion or documented invariant that the consumer factory and all `IMxAccessAlarmConsumer` calls run on the STA thread, mirroring the existing `MxAccessSession.CreationThreadId` pattern.
**Resolution:** 2026-05-18 — `MxAccessStaSession` now records the STA thread id (`alarmConsumerThreadId`) at the point the alarm-command-handler factory is invoked — which already runs inside `staRuntime.InvokeAsync` during `StartAsync`, mirroring the `MxAccessSession.CreationThreadId` capture. `RunAlarmPollLoopAsync`'s marshalled poll lambda now calls `EnsureOnAlarmConsumerThread()` before `handler.PollOnce()`, asserting the poll runs on the recorded STA thread. The check is delegated to a new `internal static` guard `AssertOnAlarmConsumerThread(int? expected, int actual)` that throws a descriptive `InvalidOperationException` on an affinity violation and is a no-op when the consumer thread is unrecorded (no alarm handler configured). Making the guard `static` and `internal` keeps it directly unit-testable. The STA-affinity invariant is documented in the guard's XML doc. Verified by the regression tests `MxAccessStaSessionTests.AssertOnAlarmConsumerThread_WhenOffOwningThread_Throws` and `AssertOnAlarmConsumerThread_OnOwningThreadOrUnset_DoesNotThrow`.
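A guard with the shape described above might look like the sketch below. Only the method name and the `(int? expected, int actual)` signature come from the resolution; the containing class name and the exception message are assumptions.

```csharp
using System;

public static class StaAffinityGuardSketch
{
    // Throws when a recorded owner thread exists and the caller is on a different thread.
    // A null expected id means no alarm handler was configured, so the check is a no-op.
    public static void AssertOnAlarmConsumerThread(int? expected, int actual)
    {
        if (expected is int owner && owner != actual)
        {
            throw new InvalidOperationException(
                $"Alarm consumer call on thread {actual}, but the consumer is owned by STA thread {owner}.");
        }
    }
}
```

A caller would typically pass `Environment.CurrentManagedThreadId` as `actual`. Taking both ids as parameters is what lets a unit test exercise the violation path without spinning up a real STA thread.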
### Worker-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:31,49`, `src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:57-58` |
| Status | Resolved |
**Description:** Every frame read allocates a fresh 4-byte length buffer and a payload `byte[]`; every write allocates `ToByteArray()` plus a 4-byte prefix. On the hot event-drain path (batches of up to 128 `WorkerEvent` frames every 25 ms) this produces steady gen-0 garbage. `WorkerFrameWriter` also effectively serializes twice (`CalculateSize()` then `ToByteArray()`).
**Recommendation:** Reuse a pooled buffer / `ArrayPool<byte>` for the length prefix and payload, and write directly into a pooled buffer using `CodedOutputStream`. Low priority unless event throughput is high.
**Resolution:** 2026-05-18 — `WorkerFrameWriter.WriteAsync` now serializes the envelope exactly once into a single frame buffer that carries the 4-byte length prefix followed by the payload, via `envelope.WriteTo(new Span<byte>(frame, sizeof(uint), payloadLength))`. This eliminates the redundant second serialization pass (`ToByteArray()` re-runs `CalculateSize()` internally), the separate length-prefix array, and the separate prefix `WriteAsync`/extra `FlushAsync` round. `WorkerFrameReader.ReadAsync` now rents its payload buffer from `ArrayPool<byte>.Shared` and returns it in a `finally` once `WorkerEnvelope.Parser.ParseFrom(payload, 0, length)` has copied what it needs; `ReadExactlyOrThrowAsync` gained an explicit `count` parameter so it honours the logical frame length rather than the (possibly larger) rented buffer length. The 4-byte length-prefix buffer is left as a per-call stack-sized allocation — pooling a 4-byte array is not worthwhile. Verified by the new regression test `WorkerFrameProtocolTests.ReadAsync_WithVaryingFrameSizes_ParsesEachFrameExactly`, which reads a large frame followed by a small frame through one reader to prove the pooled buffer is sliced to each frame's own length and never leaks stale trailing bytes; the existing round-trip, malformed-payload, and concurrent-write tests continue to pass.
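The rent, read exactly, parse, return pattern can be sketched in isolation. This is a simplified synchronous illustration, not the `WorkerFrameReader` implementation: `PooledReadSketch` and the `parse` delegate are hypothetical, with the delegate standing in for the protobuf parse call.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledReadSketch
{
    // Reads exactly `length` bytes into a pooled buffer, parses them, then returns the buffer.
    public static T ReadPayload<T>(Stream stream, int length, Func<byte[], int, T> parse)
    {
        // Rent may hand back a buffer larger than requested, so `length` must be
        // carried alongside the array rather than read from rented.Length.
        byte[] rented = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            int read = 0;
            while (read < length)
            {
                int n = stream.Read(rented, read, length - read);
                if (n == 0)
                {
                    throw new EndOfStreamException("Stream ended before the full payload was read.");
                }

                read += n;
            }

            // The parser must copy what it needs; the buffer is reused after return.
            return parse(rented, length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

The explicit `length` argument is the point of the varying-frame-size regression test: a small frame read after a large one may land in a buffer that still holds stale trailing bytes, so the parser must never look past the logical length.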
### Worker-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Conversion/VariantConverter.cs:204-226` |
| Status | Resolved |
**Description:** `ConvertInt64Scalar` is reached for `TypeCode.UInt32` and `TypeCode.Int64`. For a `uint` with `expectedDataType == MxDataType.Time`, the value is treated as a Windows `FILETIME` via `DateTime.FromFileTimeUtc(longValue)`; a 32-bit FILETIME is never a valid full FILETIME, so this silently produces a near-epoch timestamp rather than a raw/diagnostic value. Unlikely in practice but a silent misconversion.
**Recommendation:** Only apply the `MxDataType.Time` FILETIME projection for 64-bit source types; for `uint` fall through to integer or raw.
**Resolution:** 2026-05-18 — `ConvertInt64Scalar`'s `MxDataType.Time` FILETIME projection is now gated on `value is long`. A genuine 64-bit `long` still projects to a `Timestamp` via `DateTime.FromFileTimeUtc`; a 32-bit `uint` — which can only hold the low half of a FILETIME — now falls through to the integer projection (`DataType = Integer`, `Int64Value`) instead of silently producing a bogus near-1601 timestamp. Verified by the regression test `VariantConverterTests.Convert_WithUInt32AndExpectedTime_DoesNotProjectFileTime`; the existing `Convert_WithFileTimeAndExpectedTime_ProjectsTimestamp` (a `long` FILETIME) continues to pass, confirming the 64-bit path is unchanged.
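To see why a 32-bit value cannot be a meaningful FILETIME, the arithmetic can be sketched in Python. This is an illustration only, not the converter's code: `filetime_to_utc`, `project_time`, and the `source_is_64_bit` flag are hypothetical names. A FILETIME counts 100 ns ticks since 1601-01-01 UTC, so the largest `uint` covers only about seven minutes past that epoch.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UINT32_MAX = 0xFFFFFFFF


def filetime_to_utc(ticks: int) -> datetime:
    """Interpret ticks as 100 ns intervals since 1601-01-01 UTC."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def project_time(value: int, source_is_64_bit: bool):
    """Project to a timestamp only for 64-bit sources; otherwise keep the integer."""
    if source_is_64_bit:
        return ("timestamp", filetime_to_utc(value))
    return ("integer", value)
```

Under these assumptions, even `UINT32_MAX` ticks lands within minutes of the 1601 epoch, which is why the 32-bit case is better reported as a plain integer.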
### Worker-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeClient.cs:169-171` |
| Status | Resolved |
**Description:** `retryAttempts` is computed as `(connectTimeout / min(connectTimeout, attemptTimeout)) - 1`. With defaults (30000 / 2000) this yields 14 retries, but each retry also incurs Polly exponential backoff. The overall `connectDeadline` (`CancelAfter(connectTimeout)`) is the real bound, so the computed attempt count can be larger or smaller than the time budget allows, and the formula is opaque.
**Recommendation:** Drive retries purely off the `connectDeadline` token (Polly stops when cancelled) and drop the fragile attempt-count arithmetic, or add a comment explaining the intent.
**Resolution:** 2026-05-18 — The opaque `retryAttempts` arithmetic in `ConnectWithRetryAsync` was removed. `MaxRetryAttempts` is now `int.MaxValue`, so the retry loop is bounded solely by the `connectDeadline` linked token (`CancelAfter(_connectTimeoutMilliseconds)`): Polly stops retrying the moment that token is cancelled, making the overall connect timeout the single source of truth and correctly accounting for the exponential backoff between attempts (which the old formula ignored). A comment documents the intent. No new test was added — the change does not alter observable behavior (the deadline was always the real bound; the old formula always permitted more attempts than fit the budget), and the existing `WorkerPipeClientTests.RunAsync_RetriesUntilPipeServerAppears` (server appears mid-retry) and `RunAsync_WhenPipeNeverAppears_ThrowsTimeoutException` (deadline ends the loop) already cover both retry-until-success and deadline-bounded termination.
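The deadline-only retry shape can be sketched without Polly. The following Python is a hypothetical illustration, not the worker's implementation: `connect_with_retry`, its parameters, and the backoff constants are assumptions. It keeps no attempt counter; the deadline alone ends the loop, and each backoff sleep is clipped to the time remaining.

```python
import time


def connect_with_retry(attempt, timeout_s, base_delay_s=0.05, max_delay_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Retry attempt() with exponential backoff until success or the deadline."""
    deadline = clock() + timeout_s
    delay = base_delay_s
    while True:
        try:
            return attempt()
        except OSError as error:
            remaining = deadline - clock()
            if remaining <= 0:
                # The overall timeout is the single bound on the loop.
                raise TimeoutError("connect deadline exceeded") from error
            # Never sleep past the deadline; backoff time counts against it.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

Injecting `clock` and `sleep` is a design choice for this sketch only: it lets a test drive the loop with a fake clock instead of waiting in real time.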
### Worker-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs:44-55`, `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:38-43`, `src/MxGateway.Worker/MxAccess/MxAccessEventMapper.cs:106-112` |
| Status | Resolved |
**Description:** Multiple comments describe the alarm path as not-yet-wired future work ("PR A.2 — COM-side subscription scaffold … the worker advertises no alarm subscription", "the worker bootstrap will gain a thin 'run-on-STA' wrapper as part of A.3"). As of commit 6c64030 the alarm command handler, STA poll loop, and `SubscribeAlarms`/`AcknowledgeAlarm`/`QueryActiveAlarms` are all wired. These comments are stale and misleading.
**Recommendation:** Update the XML docs/comments to describe the shipped behavior; remove the "future PR" framing.
**Re-triage:** The `WnWrapAlarmConsumer.cs:38-43` citation is inaccurate — those lines were rewritten by Worker-001 and already describe the shipped no-internal-timer threading model correctly; nothing stale there. Conversely, two stale comments the finding did *not* cite were found on the same alarm path and fixed under the same root cause: `AlarmDispatcher.cs`'s `<remarks>` still framed the dispatcher as "the in-process slice of A.3" with a "companion follow-up PR" adding the (now-shipped) `SubscribeAlarmsCommand`/`AcknowledgeAlarmCommand`/`QueryActiveAlarmsCommand`, and stated the consumer "polls on a `System.Threading.Timer` thread today" — a claim made false by Worker-001's removal of that timer; and `AlarmCommandHandler.cs`'s `<remarks>` likewise asserted "the wnwrap consumer's polling timer fires on a thread-pool thread". The discovery document `docs/AlarmClientDiscovery.md` (referenced by the source comments) was deliberately left untouched: it is a historical research log of the investigation that chose the shipped design, not API/contract/lifecycle prose, and the source comments cite only its still-accurate "Option A — captured" payload schema.
**Resolution:** 2026-05-18 — Rewrote the stale alarm-path comments to describe shipped behavior with no "future PR / A.2 / A.3" framing. `MxAccessAlarmEventSink`: the class `<remarks>` and the `Attach` comment now explain that `AlarmDispatcher` owns the consumer→sink→queue wire-up and that `Attach` carries only the session id (no COM-event subscription is needed because the polled wnwrap consumer raises transition events itself). `MxAccessEventMapper.CreateOnAlarmTransition`'s XML summary now states the worker drives it from `MxAccessAlarmEventSink.EnqueueTransition` once `AlarmDispatcher` decodes a wnwrap transition. `AlarmDispatcher` and `AlarmCommandHandler` `<remarks>` were corrected to describe the shipped command surface and the no-internal-timer / STA-driven polling model (the `System.Threading.Timer` claims were factually wrong post-Worker-001). Pure documentation change — no behavior altered, no test needed; the build stays green.
### Worker-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker/Sta/StaMessagePump.cs` |
| Status | Resolved |
**Description:** `StaMessagePump` — the heart of COM event delivery (`MsgWaitForMultipleObjectsEx` + `PeekMessage`/`DispatchMessage`) — has no direct unit tests. `StaRuntimeTests` exercises it indirectly for command wake-up but never verifies that a posted Windows message actually wakes the wait and is dispatched, nor that `PumpPendingMessages` returns a correct count. The alarm poll-loop lifecycle in `MxAccessStaSession` (start/cancel/await on shutdown) also has no test. These are the most failure-sensitive paths in the module.
**Recommendation:** Add tests that post a message to the STA thread and assert it is pumped, and tests covering alarm poll-loop start/stop and shutdown ordering.
**Re-triage:** This finding is stale as of the reviewed branch — the coverage it asks for already exists. `src/MxGateway.Worker.Tests/Sta/StaMessagePumpTests.cs` contains direct `StaMessagePump` tests covering null-argument validation, waking on a signalled event, returning on timeout, the zero-timeout conversion branch, `PumpPendingMessages` returning the correct count for messages posted to the STA thread (`PumpPendingMessages_MessagesPostedToStaThread_ReturnsCountProcessed`, `PumpPendingMessages_NoMessagesPosted_ReturnsZero`), and `WaitForWorkOrMessages` waking on a posted Windows message (`WaitForWorkOrMessages_WindowsMessagePosted_ReturnsForInputAvailable`) — exactly the "post a message and assert it is pumped" test the recommendation asks for. The alarm poll-loop lifecycle is covered by `MxAccessStaSessionTests.StartAsync_WithAlarmCommandHandlerFactory_PollOnceCalledViaSta` (start → poll runs on the STA) and `Dispose_StopsAlarmPollLoop` (Dispose joins the poll task; no further polls). The finding was raised against a stale view of the test project; no source or test change is required. Re-triaged as already resolved rather than fixed.
**Resolution:** 2026-05-18 — No code change. Re-triaged: the requested direct `StaMessagePump` tests (including posted-message dispatch and pump count) and the alarm poll-loop start/stop lifecycle tests already exist in `StaMessagePumpTests.cs` and `MxAccessStaSessionTests.cs`. See the re-triage note above for the specific test names.
### Worker-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker/MxAccess/AlarmCommandHandler.cs:33`, `:202` |
| Status | Resolved |
**Description:** The file declares two public types — the `AlarmCommandHandler` class and the `IAlarmCommandHandler` interface. The C# style guide and the rest of the module follow one-public-type-per-file (e.g. interfaces in their own `I*.cs` files like `IMxAccessAlarmConsumer.cs`).
**Recommendation:** Move `IAlarmCommandHandler` to its own `IAlarmCommandHandler.cs` for consistency.
**Resolution:** 2026-05-18 — The `IAlarmCommandHandler` interface (with its XML docs) was moved verbatim out of `AlarmCommandHandler.cs` into a new `src/MxGateway.Worker/MxAccess/IAlarmCommandHandler.cs`, with its own `using` directives (`System`, `System.Collections.Generic`, `MxGateway.Contracts.Proto`). `AlarmCommandHandler.cs` now declares one public type, matching the module's one-public-type-per-file convention (cf. `IMxAccessAlarmConsumer.cs`). Pure file-organization change — no API surface, behavior, or namespace changed; no test needed. The worker build is clean with zero warnings (no unused usings left behind in `AlarmCommandHandler.cs`).
### Worker-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessEventQueue.cs:115-145` |
| Status | Resolved |
**Description:** On overflow, `Enqueue` records the overflow fault and throws `MxAccessEventQueueOverflowException`; `MxAccessBaseEventSink.EnqueueEvent` catches it and calls `RecordFault` again. `RecordFault` is a no-op when a fault already exists, so the second call is harmless — but the intent is muddled, and there is no test asserting the dropped-event behavior. This is acceptable per the fail-fast design but undocumented at the call site.
**Recommendation:** Add a brief comment in `EnqueueEvent` clarifying that an overflow exception is expected and already self-records its fault, so the catch is intentionally a near no-op.
**Resolution:** 2026-05-18 — Added a comment in `MxAccessBaseEventSink.EnqueueEvent`'s catch block (per the finding's recommendation) explaining that two distinct fail-fast failures land there: a conversion failure from `createEvent()` (recorded here as an `MxaccessEventConversionFailed` fault) and an `MxAccessEventQueueOverflowException` from `Enqueue` at capacity, which — per the fail-fast backpressure design in `docs/DesignDecisions.md` — drops the event and has *already* self-recorded a `QueueOverflow` fault inside `Enqueue`. Because `MxAccessEventQueue.RecordFault` keeps only the first fault, the catch's `RecordFault` call is then a deliberate near no-op rather than a second, conflicting fault. Pure comment change as recommended — no behavior altered. `docs/DesignDecisions.md` already documents the fail-fast event backpressure rule, so no doc change was required.
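The first-fault-wins interaction between the queue and the sink can be sketched in a few lines. This Python is a simplified model, not the worker's C# types: `EventQueue`, `QueueOverflowError`, `enqueue_event`, and the fault strings are hypothetical stand-ins for the classes named above.

```python
from collections import deque


class QueueOverflowError(Exception):
    """Raised when the bounded queue is at capacity (hypothetical name)."""


class EventQueue:
    def __init__(self, capacity: int) -> None:
        self._items = deque()
        self._capacity = capacity
        self.fault = None  # only the first recorded fault is kept

    def record_fault(self, fault: str) -> None:
        if self.fault is None:
            self.fault = fault

    def enqueue(self, event) -> None:
        if len(self._items) >= self._capacity:
            # Fail fast: record the fault here, then drop the event.
            self.record_fault("QueueOverflow")
            raise QueueOverflowError("event dropped: queue at capacity")
        self._items.append(event)

    def __len__(self) -> int:
        return len(self._items)


def enqueue_event(queue: EventQueue, create_event) -> None:
    try:
        queue.enqueue(create_event())
    except Exception:
        # Conversion failure: this call records the fault.
        # Overflow: enqueue() already recorded QueueOverflow, so this call
        # is a deliberate near no-op rather than a second, conflicting fault.
        queue.record_fault("EventConversionFailed")
```

In this model an overflow leaves `fault == "QueueOverflow"` even though the catch block runs, which is the behavior the new comment documents.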
+53
@@ -0,0 +1,53 @@
# Code Review — &lt;Module&gt;
<!-- Template for a per-module findings file. Copy to code-reviews/<Module>/findings.md.
See ../../REVIEW-PROCESS.md for the full process. The base README.md is generated
from these files by regen-readme.py — do not edit README.md by hand. -->
| Field | Value |
|---|---|
| Module | `src/MxGateway.<Module>` |
| Reviewer | <name> |
| Review date | <YYYY-MM-DD> |
| Commit reviewed | `<short-sha>` |
| Status | Not started |
| Open findings | 0 |
## Checklist coverage
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | _pending_ |
| 2 | mxaccessgw conventions | _pending_ |
| 3 | Concurrency & thread safety | _pending_ |
| 4 | Error handling & resilience | _pending_ |
| 5 | Security | _pending_ |
| 6 | Performance & resource management | _pending_ |
| 7 | Design-document adherence | _pending_ |
| 8 | Code organization & conventions | _pending_ |
| 9 | Testing coverage | _pending_ |
| 10 | Documentation & comments | _pending_ |
## Findings
<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
never reused. Findings are never deleted — close them by changing Status and
completing Resolution. -->
### <Module>-001
| Field | Value |
|---|---|
| Severity | Critical / High / Medium / Low |
| Category | one of the 10 checklist categories |
| Location | `path/to/File.cs:NN` |
| Status | Open / In Progress / Resolved / Won't Fix / Deferred |
**Description:** What is wrong and why it matters.
**Recommendation:** Concrete suggested fix.
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
+76
@@ -0,0 +1,76 @@
# Prompt — resolve open code-review findings
Reusable orchestration prompt for clearing the `code-reviews/` backlog. Paste it
to a fresh agent when you want the remaining findings worked through.
---
Resolve all open code-review findings (every severity), following the same
workflow already used to resolve the Critical dashboard finding and the
Client.Rust module (see git commits `a8aafdf`, `0d8a28d`, `9082e50`).
## Setup
- Read `code-reviews/README.md` for the open findings and `REVIEW-PROCESS.md`
for the workflow. Group the open findings by module.
- A module is one folder under `code-reviews/` — a `src/MxGateway.*` project or
a `clients/` language client. The module→source mapping and the per-module
build/test commands are in `CLAUDE.md` (the "Source Update Workflow" table and
the per-client commands).
## Dispatch — one general-purpose subagent per module, in batches of ~5 modules
Each subagent, for every open finding in its assigned module, must:
- Verify the finding's root cause against the actual source. Do NOT trust the
finding text — if it is wrong or misclassified, re-triage it (correct the
severity/description in that module's `findings.md`) instead of forcing a fix.
- Use real TDD: write the regression test FIRST and run it to confirm it fails,
THEN implement the root-cause fix, THEN confirm it passes. (Do not use
`git stash` — parallel agents would race on the shared stash stack.)
- Run that module's full build and test suite with the module-appropriate
toolchain and confirm it is green:
- `src/MxGateway.*` .NET projects — `dotnet build` + `dotnet test` for the
project; the Worker must build x86 (`-p:Platform=x86`).
- `clients/dotnet`: `dotnet build clients/dotnet/MxGateway.Client.sln` and its tests.
- `clients/go`: `gofmt`, `go build ./...`, `go test ./...`.
- `clients/rust`: `cargo fmt`, `cargo test --workspace`,
`cargo clippy --workspace --all-targets -- -D warnings`.
- `clients/python`: `python -m pytest`.
- `clients/java`: `gradle test`.
- A regression test for a gateway-server finding belongs in `src/MxGateway.Tests`;
for a worker finding, in `src/MxGateway.Worker.Tests`. Adding a test there is
permitted even though it is a different module's source tree.
- Update only that module's `code-reviews/<Module>/findings.md`: set each
resolved finding's Status to `Resolved` with a Resolution note describing the
fix (the orchestrator appends the fixing commit SHA), and update the header
"Open findings" count.
- CONSTRAINTS: edit only the source and test files needed for the assigned
module's findings, plus that module's own `findings.md`. Do NOT edit
`code-reviews/README.md`. Do NOT commit. Do NOT touch another module's
`findings.md`.
- Report a summary: each finding — root-cause confirmation, the fix, test names,
and any re-triage.
Batch so that no two subagents in the same batch write to the same test project
— e.g. do not run the `Server` and `Contracts` agents together, since both add
regression tests under `src/MxGateway.Tests`.
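The batching rule above can be expressed as a small greedy planner. This Python is a sketch under stated assumptions, not part of the repository tooling: `plan_batches` is a hypothetical helper, and the module-to-test-project mapping in the usage note is illustrative.

```python
def plan_batches(test_project_by_module, batch_size=5):
    """Greedily group modules so no batch contains two writers to one test project."""
    batches = []
    for module in sorted(test_project_by_module):
        project = test_project_by_module[module]
        for batch in batches:
            taken = {test_project_by_module[m] for m in batch}
            if len(batch) < batch_size and project not in taken:
                batch.append(module)
                break
        else:
            # No existing batch can take this module without a conflict.
            batches.append([module])
    return batches
```

With `Server` and `Contracts` both mapped to `src/MxGateway.Tests`, the planner places them in different batches, while modules with their own test projects fill the remaining slots.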
## After each batch returns (orchestrator does this — keep your own context lean)
- Build and test every component the batch touched, using the `CLAUDE.md`
commands; confirm clean. For any .NET change, `dotnet build src/MxGateway.sln`.
- Commit per module — one commit per module, message referencing the finding
IDs. Record the fixing commit SHA in each finding's Resolution.
- Regenerate the index: `python code-reviews/regen-readme.py`, then
`python code-reviews/regen-readme.py --check` to confirm it is consistent;
stage `code-reviews/README.md`. (Use `python` — the bare `python3` alias on
this box resolves to the Windows Store stub and fails.) You may stage
`README.md` with each module's commit, or commit it once per batch after the
script runs.
- Push.
## Continue
Continue batch by batch until all findings are Resolved or re-triaged. If a
finding needs a design decision, skip it and surface it rather than guessing.
+236
@@ -0,0 +1,236 @@
#!/usr/bin/env python3
"""Regenerate code-reviews/README.md from the per-module findings.md files.

The per-module findings.md files are the source of truth. This script aggregates
them into the single cross-module README.md (module status + pending/closed
finding tables).

Usage:
    python code-reviews/regen-readme.py          # rewrite README.md
    python code-reviews/regen-readme.py --check  # exit 1 if stale or inconsistent

`--check` fails when README.md is out of date OR when a module's header
`Open findings` count disagrees with its finding statuses, or a finding
carries an unrecognised Status value.
"""

from __future__ import annotations

import re
import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parent
README = ROOT / "README.md"

PENDING_STATUSES = {"Open", "In Progress"}
KNOWN_STATUSES = {"Open", "In Progress", "Resolved", "Won't Fix", "Deferred"}
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

GENERATED_NOTE = (
    "<!-- GENERATED FILE - do not edit by hand. "
    "Regenerate with: python code-reviews/regen-readme.py -->"
)


def cell(value: str) -> str:
    """Escape a value for safe inclusion in a markdown table cell."""
    return value.replace("|", "\\|").strip()


def summarize(value: str, limit: int = 240) -> str:
    """Trim a long description to a single-cell-friendly summary."""
    value = value.strip()
    if len(value) <= limit:
        return value
    return value[: limit - 1].rstrip() + "…"


def first_table(text: str) -> dict[str, str]:
    """Parse the first contiguous block of '| key | value |' rows into a dict."""
    rows: dict[str, str] = {}
    started = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            started = True
            cells = [c.strip() for c in stripped.strip("|").split("|")]
            if len(cells) >= 2:
                key, value = cells[0], cells[1]
                if key and not set(key) <= {"-", ":"} and key != "Field":
                    rows[key] = value
        elif started:
            break
    return rows


def parse_module(findings_path: Path) -> dict:
    """Parse one module's findings.md into its header and finding list."""
    text = findings_path.read_text(encoding="utf-8")
    module = findings_path.parent.name
    parts = re.split(r"^##\s+Findings\s*$", text, maxsplit=1, flags=re.M)
    header = first_table(parts[0])
    findings: list[dict] = []
    if len(parts) > 1:
        for chunk in re.split(r"^###\s+", parts[1], flags=re.M)[1:]:
            fid = chunk.splitlines()[0].strip()
            tbl = first_table(chunk)
            desc_m = re.search(
                r"\*\*Description:\*\*\s*(.*?)(?=\n\*\*|\Z)", chunk, re.S
            )
            desc = re.sub(r"\s+", " ", desc_m.group(1)).strip() if desc_m else ""
            findings.append(
                {
                    "id": fid,
                    "severity": tbl.get("Severity", ""),
                    "category": tbl.get("Category", ""),
                    "location": tbl.get("Location", ""),
                    "status": tbl.get("Status", ""),
                    "description": desc,
                }
            )
    return {"module": module, "header": header, "findings": findings}


def build_readme(modules: list[dict]) -> str:
    modules = sorted(modules, key=lambda m: m["module"])
    all_findings = [
        dict(f, module=m["module"]) for m in modules for f in m["findings"]
    ]
    pending = [f for f in all_findings if f["status"] in PENDING_STATUSES]
    closed = [
        f
        for f in all_findings
        if f["status"] and f["status"] not in PENDING_STATUSES
    ]

    def sev_key(f: dict) -> tuple:
        return (SEVERITY_ORDER.get(f["severity"], 9), f["id"])

    pending.sort(key=sev_key)
    closed.sort(key=sev_key)

    out: list[str] = [
        "# Code Reviews",
        "",
        GENERATED_NOTE,
        "",
        "Cross-module code review index for the `mxaccessgw` codebase. The review "
        "process is defined in [../REVIEW-PROCESS.md](../REVIEW-PROCESS.md).",
        "",
        "Each module's `findings.md` is the source of truth; this file is generated "
        "from them by `regen-readme.py` and must not be edited by hand.",
        "",
        "## Module status",
        "",
        "| Module | Reviewer | Date | Commit | Status | Open | Total |",
        "|---|---|---|---|---|---|---|",
    ]
    for m in modules:
        h = m["header"]
        open_n = sum(
            1 for f in m["findings"] if f["status"] in PENDING_STATUSES
        )
        out.append(
            f"| [{m['module']}]({m['module']}/findings.md) "
            f"| {cell(h.get('Reviewer', ''))} "
            f"| {cell(h.get('Review date', ''))} "
            f"| {cell(h.get('Commit reviewed', ''))} "
            f"| {cell(h.get('Status', ''))} "
            f"| {open_n} | {len(m['findings'])} |"
        )

    out += ["", "## Pending findings", ""]
    out.append(
        "Findings with status `Open` or `In Progress`, ordered by severity."
    )
    out.append("")
    if pending:
        out.append("| ID | Severity | Category | Location | Description |")
        out.append("|---|---|---|---|---|")
        for f in pending:
            out.append(
                f"| {cell(f['id'])} | {cell(f['severity'])} "
                f"| {cell(f['category'])} | {cell(f['location'])} "
                f"| {cell(summarize(f['description']))} |"
            )
    else:
        out.append("_No pending findings._")

    out += ["", "## Closed findings", ""]
    out.append("Findings with status `Resolved`, `Won't Fix`, or `Deferred`.")
    out.append("")
    if closed:
        out.append("| ID | Severity | Status | Category | Location |")
        out.append("|---|---|---|---|---|")
        for f in closed:
            out.append(
                f"| {cell(f['id'])} | {cell(f['severity'])} "
                f"| {cell(f['status'])} | {cell(f['category'])} "
                f"| {cell(f['location'])} |"
            )
    else:
        out.append("_No closed findings._")

    return "\n".join(out) + "\n"


def find_inconsistencies(modules: list[dict]) -> list[str]:
    """Return human-readable problems in the per-module findings.md files.

    Checks that each module header's `Open findings` count agrees with its
    finding statuses, and that every finding carries a known Status value.
    """
    issues: list[str] = []
    for m in modules:
        open_n = sum(
            1 for f in m["findings"] if f["status"] in PENDING_STATUSES
        )
        declared = m["header"].get("Open findings", "").strip()
        if declared != str(open_n):
            issues.append(
                f"{m['module']}: header 'Open findings' = '{declared}' but "
                f"{open_n} finding(s) are Open/In Progress"
            )
        for f in m["findings"]:
            if f["status"] not in KNOWN_STATUSES:
                issues.append(
                    f"{m['module']}: finding {f['id']} has unrecognised "
                    f"Status '{f['status']}'"
                )
    return issues


def main(argv: list[str]) -> int:
    check = "--check" in argv[1:]
    module_dirs = sorted(
        d
        for d in ROOT.iterdir()
        if d.is_dir() and d.name != "_template" and (d / "findings.md").is_file()
    )
    modules = [parse_module(d / "findings.md") for d in module_dirs]
    content = build_readme(modules)
    issues = find_inconsistencies(modules)

    if check:
        stale = (
            README.read_text(encoding="utf-8") if README.exists() else ""
        ) != content
        for issue in issues:
            print(f"inconsistent: {issue}", file=sys.stderr)
        if stale:
            print(
                "code-reviews/README.md is stale - run regen-readme.py",
                file=sys.stderr,
            )
        if stale or issues:
            return 1
        print("code-reviews/README.md is up to date and consistent.")
        return 0

    for issue in issues:
        print(f"warning: {issue}", file=sys.stderr)
    README.write_text(content, encoding="utf-8", newline="\n")
    print(f"Wrote {README} ({len(modules)} modules).")
    return 0


if __name__ == "__main__":
    raise SystemExit(main(sys.argv))
+158
@@ -0,0 +1,158 @@
#!/usr/bin/env python3
"""Tests for regen-readme.py.

Dependency-free: run with `python code-reviews/test_regen_readme.py`.
Exits 0 if all tests pass, 1 otherwise.
"""

from __future__ import annotations

import importlib.util
import tempfile
import traceback
from pathlib import Path

HERE = Path(__file__).resolve().parent

# regen-readme.py is not an importable module name (hyphen), so load it by path.
_spec = importlib.util.spec_from_file_location("regen_readme", HERE / "regen-readme.py")
regen = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(regen)

FIXTURE = """# Code Review — Demo
| Field | Value |
|---|---|
| Module | `src/Demo` |
| Reviewer | Tester |
| Review date | 2026-05-18 |
| Commit reviewed | `abc1234` |
| Status | Reviewed |
| Open findings | 1 |
## Findings
### Demo-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/Demo/File.cs:10` |
| Status | Open |
**Description:** A first problem that matters.
**Recommendation:** Fix it.
**Resolution:** _(open)_
### Demo-002
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Demo/File.cs:20` |
| Status | Resolved |
**Description:** A second, minor problem.
**Recommendation:** Tidy it.
**Resolution:** Fixed in def5678 on 2026-05-18.
"""


def _parse_fixture() -> dict:
    """Write FIXTURE to a temp Demo/findings.md and parse it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Demo" / "findings.md"
        path.parent.mkdir()
        path.write_text(FIXTURE, encoding="utf-8")
        return regen.parse_module(path)


def test_first_table_skips_separator_and_field_header():
    table = regen.first_table("| Field | Value |\n|---|---|\n| Severity | High |\n")
    assert table == {"Severity": "High"}, table


def test_parse_module_header():
    m = _parse_fixture()
    assert m["module"] == "Demo", m["module"]
    assert m["header"]["Reviewer"] == "Tester"
    assert m["header"]["Status"] == "Reviewed"
    assert m["header"]["Open findings"] == "1"


def test_parse_module_findings():
    m = _parse_fixture()
    assert len(m["findings"]) == 2, len(m["findings"])
    first = m["findings"][0]
    assert first["id"] == "Demo-001"
    assert first["severity"] == "High"
    assert first["category"] == "Security"
    assert first["location"] == "`src/Demo/File.cs:10`"
    assert first["status"] == "Open"
    assert first["description"] == "A first problem that matters."
    assert m["findings"][1]["status"] == "Resolved"


def test_build_readme_splits_pending_and_closed():
    readme = regen.build_readme([_parse_fixture()])
    assert "## Pending findings" in readme
    assert "## Closed findings" in readme
    pending, closed = readme.split("## Closed findings", 1)
    assert "Demo-001" in pending  # Open -> pending
    assert "Demo-001" not in closed
    assert "Demo-002" in closed  # Resolved -> closed
    assert "_No pending findings._" not in pending


def test_find_inconsistencies_clean_fixture():
    assert regen.find_inconsistencies([_parse_fixture()]) == []


def test_find_inconsistencies_detects_wrong_open_count():
    m = _parse_fixture()
    m["header"]["Open findings"] = "7"
    issues = regen.find_inconsistencies([m])
    assert len(issues) == 1 and "Open findings" in issues[0], issues


def test_find_inconsistencies_detects_unknown_status():
    m = _parse_fixture()
    m["findings"][0]["status"] = "Bogus"
    issues = regen.find_inconsistencies([m])
    # Wrong status also shifts the open count, so expect the status issue present.
    assert any("unrecognised Status" in i for i in issues), issues


def test_summarize_truncates_long_text():
    long = "x" * 500
    out = regen.summarize(long)
    assert len(out) <= 240 and out.endswith("…"), len(out)
    assert regen.summarize("short") == "short"


def main() -> int:
    tests = sorted(
        (name, fn)
        for name, fn in globals().items()
        if name.startswith("test_") and callable(fn)
    )
    failed = 0
    for name, fn in tests:
        try:
            fn()
            print(f"PASS {name}")
        except Exception:  # noqa: BLE001 - test runner reports all failures
            failed += 1
            print(f"FAIL {name}")
            traceback.print_exc()
    print(f"\n{len(tests) - failed}/{len(tests)} passed.")
    return 1 if failed else 0


if __name__ == "__main__":
    raise SystemExit(main())
+36
@@ -762,6 +762,42 @@ in the codebase for the forward-compat shape, but the gateway-side
`AcknowledgeAlarmByName` when the public RPC supplies a recognizable
`Provider!Group.Tag` reference.
**Command/reply payload reuse.** `MxCommand.payload` has a dedicated
`acknowledge_alarm_by_name_command` field, but `MxCommandReply.payload`
intentionally has **no** by-name-specific case. The by-name ack carries
no outcome detail beyond the native return code, so the worker's
`ExecuteAcknowledgeAlarmByName` sets the same `acknowledge_alarm`
(`AcknowledgeAlarmReplyPayload`) reply case used by the GUID arm, with
`native_status` = the `AlarmAckByName` return code (also echoed into the
top-level `MxCommandReply.hresult`). Reply consumers must dispatch on
`MxCommandReply.kind` (`MX_COMMAND_KIND_ACKNOWLEDGE_ALARM` vs.
`MX_COMMAND_KIND_ACKNOWLEDGE_ALARM_BY_NAME`), not on the payload oneof
case, to distinguish the two acks. `WorkerAlarmRpcDispatcher` reads only
the top-level `hresult`/`protocol_status`, so it handles both arms
without unpacking the payload.
**Worker `native_status` → public `AcknowledgeAlarmReply` mapping.** The
worker carries the ack outcome as a single `int32`
(`AcknowledgeAlarmReplyPayload.native_status`, the `AlarmAckByName` /
`AlarmAckByGUID` return code; `0` = success), also mirrored into the
worker `MxCommandReply.hresult`. The public `AcknowledgeAlarmReply` has
two outcome-shaped fields, but only one is populated:
- `AcknowledgeAlarmReply.hresult`: `WorkerAlarmRpcDispatcher` copies the
worker's `MxCommandReply.hresult` (the native return code) into this
field. **This is the authoritative ack-outcome field**; `0` means the
ack succeeded. It is absent only when the worker reply omitted the
value, which is a protocol violation surfaced in `protocol_status`.
- `AcknowledgeAlarmReply.status` (`MxStatusProxy`) — the worker by-name /
by-GUID ack path produces only the `int32` return code, never a
populated `MXSTATUS_PROXY` struct, so `WorkerAlarmRpcDispatcher` leaves
this field **unset on every reply**. It is reserved for a future
structured view of the ack outcome. Clients must not depend on it.
Client authors should therefore branch on `protocol_status` first (for
transport/session-level failures) and then on `hresult` (`0` = ack
accepted by MXAccess) — never on `status`.
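The recommended branching order can be sketched as follows. This Python is a hypothetical client-side illustration, not generated client code: `AckReply` is a stand-in for the public `AcknowledgeAlarmReply`, and modelling `protocol_status` as a boolean `protocol_ok` is an assumption made to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AckReply:
    """Hypothetical stand-in for the public AcknowledgeAlarmReply message."""
    protocol_ok: bool                # assumed simplification of protocol_status
    hresult: Optional[int] = None    # native return code; 0 = success
    status: Optional[object] = None  # reserved; unset on every reply today


def classify_ack(reply: AckReply) -> str:
    """Branch on protocol_status first, then hresult; never on status."""
    if not reply.protocol_ok:
        return "protocol-failure"
    if reply.hresult is None:
        # A missing hresult is itself a protocol violation; treat it defensively.
        return "protocol-failure"
    return "acknowledged" if reply.hresult == 0 else "rejected"
```

The sketch never reads `status`, so it behaves the same whether or not a future gateway version starts populating that field.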
### 5. STA / threading — production fix needed
The wnwrap COM is `ThreadingModel=Apartment`. The consumer's
+16 -18
@@ -107,29 +107,20 @@ The gateway keeps API key state in a dedicated SQLite database. SQLite is suffic
### Connection factory
`AuthSqliteConnectionFactory` reads `GatewayOptions.Authentication.SqlitePath`, ensures the parent directory exists, and opens the connection in `ReadWriteCreate` mode so first-run installations can create the file without manual provisioning:
`AuthSqliteConnectionFactory` reads `GatewayOptions.Authentication.SqlitePath`, ensures the parent directory exists, and builds a connection string in `ReadWriteCreate` mode so first-run installations can create the file without manual provisioning. Connection pooling is enabled and the connection string carries a non-zero `DefaultTimeout`:
```csharp
public SqliteConnection CreateConnection()
SqliteConnectionStringBuilder builder = new()
{
string sqlitePath = options.Value.Authentication.SqlitePath;
string? directory = Path.GetDirectoryName(sqlitePath);
if (!string.IsNullOrWhiteSpace(directory))
{
Directory.CreateDirectory(directory);
}
SqliteConnectionStringBuilder builder = new()
{
DataSource = sqlitePath,
Mode = SqliteOpenMode.ReadWriteCreate
};
return new SqliteConnection(builder.ToString());
}
DataSource = sqlitePath,
Mode = SqliteOpenMode.ReadWriteCreate,
Pooling = true,
DefaultTimeout = (int)BusyTimeout.TotalSeconds,
};
```
Every store opens its connection through `OpenConnectionAsync`, which opens the connection and then applies `PRAGMA journal_mode=WAL` and `PRAGMA busy_timeout`. WAL is a persistent database-level setting so re-applying it per connection is a cheap no-op; `busy_timeout` is per-connection state. Because `MarkKeyUsedAsync` runs on every authenticated request and `SqliteApiKeyAuditStore` appends on every denial, this lets concurrent readers and writers retry briefly instead of surfacing `SQLITE_BUSY` as a hard failure on the request path.
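
The open-then-PRAGMA sequence described above might look roughly like the sketch below. It is an illustration under assumptions, not the repository's `OpenConnectionAsync`: the class name, the 5-second `BusyTimeout` value, and the method signature are all hypothetical. It uses only `Microsoft.Data.Sqlite` calls that exist in the public package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.Sqlite;

public static class SqliteOpenSketch
{
    // Assumed value; the real timeout lives in the gateway's own code.
    private static readonly TimeSpan BusyTimeout = TimeSpan.FromSeconds(5);

    public static async Task<SqliteConnection> OpenConnectionAsync(
        string connectionString,
        CancellationToken cancellationToken = default)
    {
        var connection = new SqliteConnection(connectionString);
        try
        {
            await connection.OpenAsync(cancellationToken);

            using SqliteCommand command = connection.CreateCommand();
            // journal_mode=WAL persists in the database file, so repeating it is
            // cheap; busy_timeout is per-connection and must be set every time.
            command.CommandText =
                "PRAGMA journal_mode=WAL; " +
                $"PRAGMA busy_timeout={(int)BusyTimeout.TotalMilliseconds};";
            await command.ExecuteNonQueryAsync(cancellationToken);

            return connection;
        }
        catch
        {
            // Do not leak a half-initialised connection to the caller.
            await connection.DisposeAsync();
            throw;
        }
    }
}
```

Callers own the returned connection and should dispose it when the operation completes.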
### Schema
`SqliteAuthSchema` declares table names and the current schema version as constants. Three tables are involved:
@@ -166,6 +157,8 @@ public static ApiKeyRecord Read(SqliteDataReader reader)
`SqliteApiKeyAdminStore` (`IApiKeyAdminStore`) implements administrative mutations: `CreateAsync` accepts an `ApiKeyCreateRequest`, `RevokeAsync` sets `revoked_utc` only when not already revoked, and `RotateAsync` replaces `secret_hash`, clears `last_used_utc`, and clears `revoked_utc` so a rotated key is immediately usable.
Because `RotateAsync` clears `revoked_utc`, rotating a previously revoked key reactivates it. The dashboard API Keys page therefore offers the Rotate (and Revoke) action only for keys whose status is `Active`; a revoked key shows no actions, so an operator cannot un-revoke a deliberately disabled key as a side effect of a rotation.
### Audit trail
`SqliteApiKeyAuditStore` (`IApiKeyAuditStore`) appends `ApiKeyAuditEntry` values to the `api_key_audit` table and stamps each row with a UTC timestamp inside the store rather than trusting the caller. `ListRecentAsync` returns the most recent rows ordered by `audit_id` descending and projects them into `ApiKeyAuditRecord`. Rows are kept even after the referenced key is revoked because the audit history is the durable record of administrative action; the `key_id` column is nullable to accommodate non-key-scoped events such as `init-db`.
@@ -223,6 +216,10 @@ constraints remain fully unconstrained after migration.
Key ids are restricted by the parser to ASCII letters, digits, periods, and hyphens so they remain safe to embed in the token format and in URL paths used by administrative tooling.
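
As a rough illustration of why that character set matters, the sketch below validates a key id and splits an `mxgw_<key-id>_<secret>` token on the first underscore after the prefix. The class and method names are hypothetical and this is not the gateway's parser. It also assumes the secret may itself contain underscores, which the source does not state.

```csharp
using System;

public static class ApiKeyTokenSketch
{
    private const string Prefix = "mxgw_";

    // Key ids: ASCII letters, digits, periods, and hyphens only.
    public static bool IsValidKeyId(string keyId)
    {
        if (string.IsNullOrEmpty(keyId))
        {
            return false;
        }

        foreach (char c in keyId)
        {
            bool allowed =
                (c >= 'a' && c <= 'z') ||
                (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9') ||
                c == '.' || c == '-';
            if (!allowed)
            {
                return false;
            }
        }

        return true;
    }

    // Because a key id can never contain '_', the first underscore after the
    // prefix unambiguously ends the key id.
    public static bool TryParse(string token, out string keyId, out string secret)
    {
        keyId = string.Empty;
        secret = string.Empty;

        if (token is null || !token.StartsWith(Prefix, StringComparison.Ordinal))
        {
            return false;
        }

        int separator = token.IndexOf('_', Prefix.Length);
        if (separator < 0)
        {
            return false;
        }

        string candidateId = token.Substring(Prefix.Length, separator - Prefix.Length);
        string candidateSecret = token.Substring(separator + 1);
        if (!IsValidKeyId(candidateId) || candidateSecret.Length == 0)
        {
            return false;
        }

        keyId = candidateId;
        secret = candidateSecret;
        return true;
    }
}
```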
The CLI is not the only management surface: the dashboard API Keys page
creates, rotates, and revokes keys through the same `IApiKeyAdminStore`. See
[Gateway Dashboard Design](./GatewayDashboardDesign.md#api-keys-page).
## Scope Serialization
Scopes are persisted as a single TEXT column rather than a join table because the set is small, never queried by membership at the database level, and changes atomically with the owning row. `ApiKeyScopeSerializer.Serialize` writes a JSON array sorted with `StringComparer.Ordinal` so equivalent scope sets produce byte-identical column values, which makes audit diffing and database comparisons deterministic:
@@ -276,4 +273,5 @@ Singletons are safe because each operation opens its own short-lived `SqliteConn
- [Gateway Configuration](./GatewayConfiguration.md)
- [Authorization](./Authorization.md)
- [Gateway Dashboard Design](./GatewayDashboardDesign.md)
- [Diagnostics](./Diagnostics.md)
+7
@@ -161,6 +161,12 @@ Glob matching is anchored, case-insensitive, and supports `*` and `?`.
Subtree and tag glob lists are alternatives: matching either list allows that
scope dimension. Empty lists mean unconstrained for that dimension.
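
One way to implement anchored, case-insensitive `*` / `?` matching is to translate each glob into a regular expression. The sketch below is an assumption about the approach, not the gateway's matcher, and `GlobSketch` is a hypothetical name. It covers only the tag-glob list; subtree matching is left out because its exact semantics are not described here.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class GlobSketch
{
    // Anchored match: the whole input must match the whole pattern.
    public static bool IsMatch(string pattern, string input)
    {
        var regex = new StringBuilder("^");
        foreach (char c in pattern)
        {
            regex.Append(c switch
            {
                '*' => ".*",                       // any run of characters
                '?' => ".",                        // exactly one character
                _ => Regex.Escape(c.ToString()),   // everything else is literal
            });
        }

        regex.Append("\\z"); // \z anchors at the true end of input

        return Regex.IsMatch(
            input,
            regex.ToString(),
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Singleline);
    }

    // An empty glob list means "unconstrained" for this dimension.
    public static bool AllowedByGlobs(IReadOnlyCollection<string> globs, string input)
    {
        if (globs.Count == 0)
        {
            return true;
        }

        foreach (string glob in globs)
        {
            if (IsMatch(glob, input))
            {
                return true;
            }
        }

        return false;
    }
}
```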
Constraints are set when a key is created — through the `apikey create-key`
flags (see [Authentication](./Authentication.md)) or the dashboard API Keys
page create dialog (see
[Gateway Dashboard Design](./GatewayDashboardDesign.md#api-keys-page)). The
dashboard API Keys page also renders each key's effective constraints.
The service checks read constraints for `AddItem`, `AddItem2`, `AddItemBulk`,
`SubscribeBulk`, and `AdviseItemBulk`. It checks write constraints for
`Write`, `Write2`, `WriteSecured`, and `WriteSecured2`. Successful item
@@ -252,6 +258,7 @@ Singleton lifetimes are appropriate because none of the three classes hold per-r
## Related Documentation
- [Authentication](./Authentication.md)
- [Gateway Dashboard Design](./GatewayDashboardDesign.md)
- [Grpc](./Grpc.md)
- [GatewayConfiguration](./GatewayConfiguration.md)
- [Galaxy Repository Browse](./GalaxyRepository.md)
+51
@@ -49,6 +49,7 @@ Endpoint layout:
/dashboard/workers
/dashboard/events
/dashboard/galaxy
/dashboard/apikeys
/dashboard/settings
/dashboard/_blazor
```
@@ -83,6 +84,7 @@ MxGateway.Server
SessionDetailsPage.razor
WorkersPage.razor
EventsPage.razor
ApiKeysPage.razor
SettingsPage.razor
Shared/
MetricCard.razor
@@ -91,6 +93,9 @@ MxGateway.Server
DashboardSnapshotService.cs
DashboardAuthorizationHandler.cs
DashboardAuthenticator.cs
DashboardApiKeyAuthorization.cs
DashboardApiKeyManagementService.cs
DashboardApiKeySummary.cs
DashboardSnapshot.cs
DashboardSessionSummary.cs
DashboardWorkerSummary.cs
@@ -249,6 +254,52 @@ Show aggregate event diagnostics:
Do not display full tag values by default. If value display is later added, make
it opt-in and redacted.
### API keys page
`/dashboard/apikeys` lists the gateway's API keys and, for authorized
operators, manages them. It reads key metadata through the same
`IApiKeyAdminStore` the `apikey` CLI uses, so the dashboard and the CLI act
on one source of truth.
The table shows one row per key:
- key id,
- status (`Active` or `Revoked`),
- display name,
- scopes,
- constraints (rendered as `unconstrained` when none are set),
- created timestamp,
- last-used timestamp.
Key secrets are never listed. Only the peppered hash is stored, and the page
never reconstructs a key. See [Authorization](./Authorization.md#constraint-enforcement)
for what each constraint means and how it is enforced on the gRPC path.
#### Management actions
Create, Rotate, and Revoke controls render only when the signed-in user is
authorized. `DashboardApiKeyAuthorization.CanManage` requires an authenticated
principal that is a member of the LDAP group named by
`MxGateway:Ldap:RequiredGroup`, the same group the dashboard login enforces. An
the table but sees no action controls.
- **Create** opens a dialog for the key id, display name, scope checkboxes
(the `GatewayScopes` catalog), and the optional constraint fields: read and
write subtrees, read and write tag globs, browse subtrees, max write
classification, and the read-alarm-only / read-historized-only flags.
- **Rotate** issues a new secret for an existing key id and invalidates the
old one.
- **Revoke** marks a key revoked; the dashboard offers no way to un-revoke it,
  because Rotate and Revoke render only for keys whose status is `Active`.
Create and Rotate return the assembled `mxgw_<keyId>_<secret>` token **once**,
in a one-time banner. It is never shown again, so the operator must copy it
immediately. This mirrors the `apikey create-key` / `rotate-key` CLI.
Every management action appends an `api_key_audit` entry
(`dashboard-create-key`, `dashboard-rotate-key`, `dashboard-revoke-key`) with
the key id and the caller's remote address. Secrets and pepper values are never
logged.
### Settings page
Show read-only effective configuration:
+74 -3
@@ -44,9 +44,22 @@ skipped unless `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1` is set because it creates
the installed MXAccess COM object and depends on live provider state.
The live smoke opens a gateway session, launches the x86 worker, runs
`Register`, `AddItem`, and `Advise`, waits a bounded time for one
`OnDataChange`, and closes the session in a `finally` block so the worker gets a
graceful shutdown request even when a command or event assertion fails.
`Register`, `AddItem`, and `Advise`, waits a bounded time for the first
`OnDataChange` event (skipping any earlier bootstrap/registration-state event),
and closes the session in a `finally` block so the worker gets a graceful
shutdown request even when a command or event assertion fails. Cleanup failures
in that `finally` block are logged rather than thrown, so a real assertion
failure is never masked by a shutdown timeout.
`WorkerLiveMxAccessSmokeTests` additionally covers two MXAccess parity paths the
fake-worker tests cannot validate:
- a `Write` round-trip against an advised item, and
- an `AddItem` against an invalid server handle, asserting the MXAccess failure
surfaces in the command reply without faulting the gateway transport.
All three tests are gated by the same `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1`
opt-in variable.
Build the worker before running the smoke:
@@ -74,6 +87,64 @@ The test output includes session id, worker process id, command status,
HRESULT/status diagnostics, event sequence and handles, close status, and worker
stdout/stderr lines emitted during the run.
## Live Galaxy Repository
`GalaxyRepositoryLiveTests` in `src/MxGateway.IntegrationTests/Galaxy/` exercises
`GalaxyRepository` directly against the `ZB` Galaxy Repository SQL database. It is
skipped unless `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1` is set because it depends on a
reachable SQL Server instance and deployed Galaxy state — fake-worker tests cannot
cover the SQL browse RPCs.
The suite covers `TestConnectionAsync`, `GetLastDeployTimeAsync`,
`GetHierarchyAsync`, and `GetAttributesAsync`. `GetHierarchyAsync` and
`GetAttributesAsync` assert a non-empty result, so the connected `ZB` database
must contain a deployed Galaxy, not just an empty schema.
Run the Galaxy live tests explicitly:
```powershell
$env:MXGATEWAY_RUN_LIVE_GALAXY_TESTS = "1"
dotnet test src/MxGateway.IntegrationTests/MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~GalaxyRepositoryLiveTests
```
Optional live Galaxy variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `MXGATEWAY_LIVE_GALAXY_CONN` | `Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;` | Galaxy Repository connection string. Set this when the `ZB` database is on a non-default instance or needs SQL authentication. |
The default connection string targets `ZB` on `localhost` with Windows
authentication, which matches the Galaxy Repository conventions in CLAUDE.md.
## Live LDAP
`DashboardLdapLiveTests` in `src/MxGateway.IntegrationTests/` exercises
`DashboardAuthenticator` against the live GLAuth directory. It is skipped unless
`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1` is set because it binds against the GLAuth
service described in `glauth.md`.
The suite builds the authenticator with a default `GatewayOptions`, so
`LdapOptions.RequiredGroup` keeps its `GwAdmin` default. `GwAdmin` is the
gateway-specific dashboard-admin role and is **not** part of the five baseline
GLAuth role groups — it must be provisioned before the LDAP live tests pass.
`AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (rather than skips) when
GLAuth has only the baseline groups, so this is a hard prerequisite beyond "LDAP
is up." See the "Provisioning the GwAdmin group" section of `glauth.md` for the
provisioning step that adds `GwAdmin` and grants it to `admin`.
The suite covers both the success path and the `DashboardAuthenticator` failure
branches: `admin` in `GwAdmin` succeeds; `readonly` is denied for missing group;
`admin` with a wrong password is rejected by the candidate bind without leaking
the password into `FailureMessage`; an unknown username yields no candidate; and
an unreachable LDAP server is absorbed into a failed result rather than throwing.
Run the LDAP live tests explicitly:
```powershell
$env:MXGATEWAY_RUN_LIVE_LDAP_TESTS = "1"
dotnet test src/MxGateway.IntegrationTests/MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~DashboardLdapLiveTests
```
## Client E2E Scripts
`scripts/discover-testmachine-tags.ps1` queries the ZB Galaxy Repository for the
+12 -2
@@ -10,7 +10,7 @@ The layer is composed of four collaborators:
| Type | Lifetime | Role |
|------|----------|------|
| `MxAccessGatewayService` | scoped (gRPC) | Implements the four `MxAccessGateway` RPCs, performs exception mapping. |
| `MxAccessGatewayService` | scoped (gRPC) | Implements the six `MxAccessGateway` RPCs, performs exception mapping. |
| `MxAccessGrpcRequestValidator` | singleton | Rejects malformed requests before any session work runs. |
| `MxAccessGrpcMapper` | singleton | Converts public proto types to internal `WorkerCommand`/`WorkerEvent` types and back. |
| `IEventStreamService` (`EventStreamService`) | singleton | Owns the event stream pipeline, including bounded queue and backpressure handling. |
@@ -29,7 +29,7 @@ A second gRPC service, `GalaxyRepositoryGrpcService`, is mapped alongside it. It
## RPC Handlers
`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto` — six in total: `OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, and `QueryActiveAlarms`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
Public gRPC send and receive message sizes are configured from
`MxGateway:Protocol:MaxGrpcMessageBytes` (default 16 MiB). Official clients use
@@ -86,6 +86,14 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
`StreamEvents` is a server-streaming RPC. The handler delegates the full pipeline to `IEventStreamService` and just forwards each `MxEvent` onto the response stream. Keeping the channel and producer/consumer machinery out of the handler means cancellation, exception mapping, and metric bookkeeping live in one place.
### `AcknowledgeAlarm`
`AcknowledgeAlarm` is a unary RPC that acknowledges a single alarm. The handler validates `session_id` and `alarm_full_reference` inline (it does not run through `MxAccessGrpcRequestValidator`, because the alarm surface routes through `IAlarmRpcDispatcher` rather than the generic `Invoke` path), resolves the session, then delegates to the registered `IAlarmRpcDispatcher`. The production `WorkerAlarmRpcDispatcher` routes the ack over the worker IPC by GUID (`AcknowledgeAlarmCommand`) when the reference parses as a canonical GUID, or by `Provider!Group.Tag` reference (`AcknowledgeAlarmByNameCommand`) otherwise. The handler-level RPC behaviour and the alarm contract itself are documented in [Alarm Client Discovery](./AlarmClientDiscovery.md).
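
The GUID-versus-name routing decision can be sketched as below. This is an illustration only: `AckRoute` and `AckRouteSketch` are hypothetical names, and treating "canonical GUID" as the hyphenated `D` format (`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`) is an assumption about what `WorkerAlarmRpcDispatcher` accepts.

```csharp
using System;

public enum AckRoute
{
    ByGuid, // would map to AcknowledgeAlarmCommand
    ByName, // would map to AcknowledgeAlarmByNameCommand
}

public static class AckRouteSketch
{
    public static AckRoute Choose(string alarmFullReference)
    {
        // Assumption: only the 36-character hyphenated form counts as canonical.
        return Guid.TryParseExact(alarmFullReference, "D", out _)
            ? AckRoute.ByGuid
            : AckRoute.ByName;
    }
}
```

Anything that does not parse as a GUID, including a `Provider!Group.Tag` reference, falls through to the by-name route.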
### `QueryActiveAlarms`
`QueryActiveAlarms` is a server-streaming RPC that returns an `ActiveAlarmSnapshot` per currently active alarm. The handler validates `session_id` inline, resolves the session, and delegates to `IAlarmRpcDispatcher`; `WorkerAlarmRpcDispatcher` issues a `QueryActiveAlarmsCommand` over the worker IPC and streams each snapshot from the worker reply.
## Validation Rules
`MxAccessGrpcRequestValidator` rejects requests with `StatusCode.InvalidArgument` before any session work happens. The rules are intentionally narrow — anything that requires session state (for example, "session does not exist") is left for `ISessionManager` so the validator can stay synchronous and side-effect free.
@@ -96,6 +104,8 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
| `CloseSession` | `session_id` must be non-empty. | `InvalidArgument` |
| `StreamEvents` | `session_id` must be non-empty. | `InvalidArgument` |
| `Invoke` | `session_id` non-empty, `command` present, `kind` not `Unspecified`, payload oneof must match `kind`. | `InvalidArgument` |
| `AcknowledgeAlarm` | `session_id` and `alarm_full_reference` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
| `QueryActiveAlarms` | `session_id` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
The payload-vs-kind check matters because the `MxCommand.payload` oneof is non-discriminated on the wire — a misaligned client could send `kind = Write` with a `Register` payload and silently confuse the worker. The validator turns that into a clear client error:
+11 -6
@@ -35,17 +35,22 @@ oversized frames, protocol version mismatches, and session mismatches.
## Verification
The frame protocol lives in `MxGateway.Worker.Ipc` (`WorkerFrameReader`,
`WorkerFrameWriter`, `WorkerFrameProtocolOptions`) and is covered by
`src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs`. The worker is an
x86 process, so build and test it with `-p:Platform=x86`.
Run the focused tests after changing the frame protocol:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests
```powershell
dotnet test src/MxGateway.Worker.Tests/MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter WorkerFrameProtocolTests
```
Run the gateway build because the frame protocol is part of
`MxGateway.Server`:
Run the x86 worker build because the frame protocol is part of
`MxGateway.Worker`:
```bash
dotnet build src/MxGateway.Server/MxGateway.Server.csproj
```powershell
dotnet build src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86
```
## Related Documentation
+5 -2
@@ -579,8 +579,11 @@ Policy:
- command exceptions return structured command fault with HRESULT if known,
- stale sessions are closed by lease timeout,
- stuck workers are killed by process id,
- gateway restart should not attempt to reattach old workers unless explicitly
designed; first version should terminate orphaned workers on startup.
- gateway restart does not reattach old workers; `OrphanWorkerCleanupHostedService`
runs `OrphanWorkerTerminator` once on startup — before the server accepts
sessions — to kill leftover `MxGateway.Worker.exe` processes (matched by the
configured worker executable path, or by image name when the x64 gateway cannot
introspect the x86 worker's module) left behind by a previous unclean run.
Because each client owns one worker, a crash or leak affects only that session.
+280
@@ -0,0 +1,280 @@
# GLAuth — LDAP authn reference for mxaccessgw
GLAuth is a lightweight LDAP server installed on this dev box at
`C:\publish\glauth\` and run as a Windows service via NSSM. It already
backs the LmxOpcUa OPC UA server's UserName-token authn and the LmxOpcUa
Admin UI's cookie login; this doc captures everything mxaccessgw needs
to consume the same directory so a single set of dev credentials covers
both stacks.
The authoritative copy of LmxOpcUa's reference lives at
`C:\publish\glauth\auth.md`. This doc is a redistilled view tailored to
mxaccessgw — what users + groups are already provisioned, how to bind
against them, and what's needed to add a gw-specific role.
## Connection details
| Setting | Value |
|---|---|
| Protocol | LDAP (unencrypted) |
| Host | `localhost` |
| Port | `3893` |
| LDAPS | disabled in dev (set `[ldaps]` block to enable) |
| Base DN | `dc=lmxopcua,dc=local` |
| Bind DN format | `cn={username},dc=lmxopcua,dc=local` |
| Group OU | `ou=<groupname>,ou=groups,dc=lmxopcua,dc=local` |
| Failed-bind throttle | 3 fails → 10-minute IP lockout (per `[behaviors]`) |
## Pre-existing groups (LmxOpcUa role taxonomy)
These map cleanly onto MxAccess capability boundaries — mxaccessgw
should reuse them rather than define parallel groups so an operator with
LmxOpcUa write rights doesn't need a second account for the gw.
| Group | GID | DN | LmxOpcUa meaning | Suggested mxgw mapping |
|---|---|---|---|---|
| ReadOnly | 5501 | `ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` | Browse + read OPC UA nodes | `Browse` + `Subscribe` (read paths only) |
| WriteOperate | 5502 | `ou=WriteOperate,ou=groups,dc=lmxopcua,dc=local` | Write FreeAccess / Operate attrs | `Write` (plain) |
| WriteTune | 5504 | `ou=WriteTune,ou=groups,dc=lmxopcua,dc=local` | Write Tune attrs | `WriteSecured` (Tune only) |
| WriteConfigure | 5505 | `ou=WriteConfigure,ou=groups,dc=lmxopcua,dc=local` | Write Configure attrs | `WriteSecured` (Configure) |
| AlarmAck | 5503 | `ou=AlarmAck,ou=groups,dc=lmxopcua,dc=local` | Acknowledge alarms | gw `AcknowledgeAlarm` RPC |
**A user can be in multiple groups**: `othergroups = [...]` in the
config is a list. `admin` is the canonical example (a member of all five
role groups above).
## Pre-provisioned users
| Username | Password | UID | Primary group | Other groups | Capabilities |
|---|---|---|---|---|---|
| `readonly` | `readonly123` | 5001 | ReadOnly | — | Browse, read |
| `writeop` | `writeop123` | 5002 | WriteOperate | — | + plain Write |
| `writetune` | `writetune123` | 5005 | WriteTune | — | + WriteSecured (Tune) |
| `writeconfig` | `writeconfig123` | 5006 | WriteConfigure | — | + WriteSecured (Configure) |
| `alarmack` | `alarmack123` | 5003 | AlarmAck | — | Alarm acknowledgment |
| `admin` | `admin123` | 5004 | ReadOnly | WriteOperate, AlarmAck, WriteTune, WriteConfigure | All roles |
| `serviceaccount` | `serviceaccount123` | 5999 | ReadOnly | — | LDAP search capability (for bind-then-search) |
For mxaccessgw dev, `admin` covers every gw-side capability test;
`readonly` is the right "negative" case for proving Browse-OK /
Write-denied.
The gateway dashboard adds one role beyond this LmxOpcUa taxonomy:
`GwAdmin`. `LdapOptions.RequiredGroup` defaults to `GwAdmin`, so the
dashboard login and `DashboardLdapLiveTests` require `admin` to be a
member of a `GwAdmin` group. `GwAdmin` is **not** in the baseline
GLAuth config — it must be provisioned before dashboard authn or the
LDAP live tests work. See [Provisioning the GwAdmin
group](#provisioning-the-gwadmin-group) below.
## Two bind patterns
### 1. Direct bind (simplest)
```
DN: cn=admin,dc=lmxopcua,dc=local
Password: admin123
```
Construct the DN from the username; bind. Works on GLAuth because
`backend.nameformat = "cn"` and `groupformat = "ou"` are set in the
config. **Doesn't translate to Active Directory** — AD users are keyed
by `sAMAccountName`, not `cn`. Use this only for dev convenience.
### 2. Bind-then-search (production-grade)
```
1. Bind as the service account (cn=serviceaccount,dc=lmxopcua,dc=local
/ serviceaccount123).
2. Search under dc=lmxopcua,dc=local with filter
(uid=<entered-username>) — or any attribute the deployment
identifies users by. GLAuth populates uid + cn.
3. Read the returned entry's DN + memberOf list (groups).
4. Bind again as the discovered DN with the entered password. If that
succeeds, authn passes; the memberOf values become the role set.
```
The second bind is the actual password check; the search only discovers
the DN. This is the AD-friendly path. On AD, `tokenGroups` or
`LDAP_MATCHING_RULE_IN_CHAIN` can additionally flatten nested groups, but
that is an enhancement, not a requirement for first-pass dev.
LmxOpcUa's `Server/Security/LdapUserAuthenticator.cs` ships a working
implementation of this pattern using `Novell.Directory.Ldap.NETStandard`
v3.6.0 — copy the bind-then-search loop from there if mxaccessgw wants
to avoid re-deriving the LDAP escape-string handling.
## Suggested mxgw configuration shape
A YAML/JSON section for mxaccessgw that mirrors LmxOpcUa's `LdapOptions`
record:
```yaml
ldap:
enabled: true
server: localhost
port: 3893
useTls: false
allowInsecureLdap: true # dev only
searchBase: "dc=lmxopcua,dc=local"
serviceAccountDn: "cn=serviceaccount,dc=lmxopcua,dc=local"
serviceAccountPassword: "serviceaccount123"
userNameAttribute: "uid" # GLAuth populates this; AD uses sAMAccountName
displayNameAttribute: "cn"
groupAttribute: "memberOf"
groupToRole:
ReadOnly: "Browse"
WriteOperate: "Write"
WriteTune: "WriteSecured"
WriteConfigure: "WriteSecured"
AlarmAck: "AlarmAck"
```
`groupAttribute` returns full DNs like
`ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local` — the authenticator
should strip the leading `ou=` (or `cn=` against AD) RDN value and
look that up in `groupToRole`.
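
The RDN-stripping step can be sketched as follows. This is a simplified illustration, not LmxOpcUa's or the gateway's authenticator: `GroupDnSketch` is a hypothetical name, and the parser deliberately ignores escaped commas (`\,`) inside RDN values, which a production implementation would need to handle.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class GroupDnSketch
{
    // "ou=ReadOnly,ou=groups,dc=lmxopcua,dc=local" -> "ReadOnly"
    // Works for a leading "cn=" RDN as well. Escaped commas are NOT handled.
    public static string? LeadingRdnValue(string dn)
    {
        if (string.IsNullOrWhiteSpace(dn))
        {
            return null;
        }

        int comma = dn.IndexOf(',');
        string firstRdn = comma < 0 ? dn : dn.Substring(0, comma);

        int equalsSign = firstRdn.IndexOf('=');
        if (equalsSign < 0)
        {
            return null;
        }

        string value = firstRdn.Substring(equalsSign + 1).Trim();
        return value.Length == 0 ? null : value;
    }

    // Maps each memberOf DN to a role through groupToRole; unknown groups are skipped.
    public static ISet<string> MapRoles(
        IEnumerable<string> memberOf,
        IReadOnlyDictionary<string, string> groupToRole)
    {
        var roles = new HashSet<string>(StringComparer.Ordinal);
        foreach (string dn in memberOf)
        {
            string? group = LeadingRdnValue(dn);
            if (group is not null && groupToRole.TryGetValue(group, out string? role))
            {
                roles.Add(role);
            }
        }

        return roles;
    }
}
```

Passing a dictionary built with `StringComparer.OrdinalIgnoreCase` makes the group lookup case-insensitive, which is a design choice the caller controls.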
## Provisioning the GwAdmin group
`GwAdmin` is the gateway-specific dashboard-admin role. It is the
default `LdapOptions.RequiredGroup`, so the dashboard cookie login and
`DashboardLdapLiveTests` (`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1`) reject
`admin` until a `GwAdmin` group exists and `admin` is a member.
GLAuth's baseline config ships only the five LmxOpcUa role groups, so
add `GwAdmin` to the existing GLAuth config instead of standing up a
separate LDAP server:
1. Edit `C:\publish\glauth\glauth.cfg`
2. Append the group:
```toml
[[groups]]
name = "GwAdmin"
gidnumber = 5510 # pick the next free GID
```
3. Add `5510` to `admin`'s `othergroups` list so `admin` resolves the
`GwAdmin` role. Add it to any other user that needs dashboard-admin
rights. Or create a dedicated user:
```toml
[[users]]
name = "gwadmin"
givenname = "Gateway"
sn = "Admin"
mail = "gwadmin@lmxopcua.local"
uidnumber = 5010
primarygroup = 5510
passsha256 = "<sha256 of the password — see below>"
```
4. `nssm restart GLAuth`
After the restart, `admin`'s `memberOf` includes
`ou=GwAdmin,ou=groups,dc=lmxopcua,dc=local`, which the authenticator
strips to `GwAdmin` and matches against `RequiredGroup`. The same
pattern applies to any future permission that doesn't fit the existing
five roles.
Generate `passsha256` from a plaintext password:
```powershell
# Windows / PowerShell
$bytes = [System.Text.Encoding]::UTF8.GetBytes("yourpassword")
$hash = [System.Security.Cryptography.SHA256]::Create().ComputeHash($bytes)
-join ($hash | ForEach-Object { $_.ToString("x2") })
```
```bash
# WSL / git-bash
echo -n "yourpassword" | openssl dgst -sha256
```
## Quick verification
From mxaccessgw's dev box, prove the directory is reachable:
```powershell
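# Plain bind via PowerShell + System.DirectoryServices.Protocols
# Load the assembly first; it is not loaded into a session by default.
Add-Type -AssemblyName System.DirectoryServices.Protocols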
$ldap = New-Object System.DirectoryServices.Protocols.LdapConnection("localhost:3893")
$ldap.AuthType = [System.DirectoryServices.Protocols.AuthType]::Basic
$ldap.SessionOptions.ProtocolVersion = 3
$ldap.SessionOptions.SecureSocketLayer = $false
$cred = New-Object System.Net.NetworkCredential("cn=admin,dc=lmxopcua,dc=local","admin123")
$ldap.Bind($cred)
"Bind OK"
```
Or via `ldapsearch` if you have OpenLDAP CLI tools:
```bash
ldapsearch -x -H ldap://localhost:3893 \
-D "cn=admin,dc=lmxopcua,dc=local" -w admin123 \
-b "dc=lmxopcua,dc=local" "(uid=admin)"
```
The response should list `admin`'s entry with `memberOf` populated for
all five role groups — plus `GwAdmin` once the gateway-specific group
is provisioned.
## Service management
```powershell
# Status / start / stop / restart
nssm status GLAuth
nssm start GLAuth
nssm stop GLAuth
nssm restart GLAuth
# Inspect what NSSM was told to launch
nssm get GLAuth Parameters
```
Logs:
| File | Purpose |
|---|---|
| `C:\publish\glauth\logs\stdout.log` | Bind events, search responses |
| `C:\publish\glauth\logs\stderr.log` | Startup errors, config parse failures |
After editing `glauth.cfg`, always tail `stderr.log` after the restart
to catch a fat-fingered TOML before it bites at first bind:
```powershell
nssm restart GLAuth
Get-Content C:\publish\glauth\logs\stderr.log -Tail 20 -Wait
```
## Active Directory migration cheat-sheet
LmxOpcUa's `LdapOptions` XML doc comments capture the AD overrides, and the
same set applies to mxaccessgw verbatim. Keys that change:
| Field | GLAuth dev value | AD production value |
|---|---|---|
| `Server` | `localhost` | a domain controller FQDN, or the domain itself |
| `Port` | `3893` | `636` (LDAPS) — AD increasingly rejects plain bind under LDAP-signing enforcement |
| `UseTls` | `false` | `true` |
| `AllowInsecureLdap` | `true` | `false` |
| `SearchBase` | `dc=lmxopcua,dc=local` | `DC=corp,DC=example,DC=com` |
| `ServiceAccountDn` | `cn=serviceaccount,dc=lmxopcua,dc=local` | `CN=MxGwSvc,OU=Service Accounts,DC=corp,...` |
| `UserNameAttribute` | `uid` | `sAMAccountName` (or `userPrincipalName`) |
| `GroupAttribute` | `memberOf` (unchanged) | `memberOf` (unchanged) |
`memberOf` returns full DNs; the authenticator extracts the value of the
leading `CN=` RDN and uses it as the lookup key in `groupToRole`. Nested
groups are **not** auto-expanded; either flatten them in the directory or
add a `tokenGroups` query as an enhancement.
## Security notes for production
- **The password entries in `glauth.cfg` are dev-only.** The config is
  unencrypted on disk and stores unsalted SHA-256 hashes (`passsha256`);
  anyone with read access to `C:\publish\glauth\` can rainbow-table the
  entries. Treat the dev creds as throwaway. Production LDAP is Active
  Directory.
- The 3-fail / 10-minute lockout is per source IP, not per user — a
shared NAT can lock out a whole office. Tunable in `[behaviors]`.
- LDAPS isn't enabled in dev; binding sends passwords cleartext on the
wire. Fine for `localhost`, never expose port 3893 off-box without
enabling TLS first.
+20
@@ -0,0 +1,20 @@
# Verifies code-reviews/README.md is regenerated from, and consistent with, the
# per-module findings.md files. Intended as a CI / pre-commit gate.
#
# Exits non-zero when README.md is stale, when a module header's "Open findings"
# count disagrees with its finding statuses, or when a finding carries an
# unrecognised Status value. See REVIEW-PROCESS.md section 5.
[CmdletBinding()]
param()
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..")
$script = Join-Path $repoRoot "code-reviews/regen-readme.py"
# The bare `python3` alias on this platform resolves to the Windows Store stub;
# `python` is the real interpreter.
& python $script --check
exit $LASTEXITCODE
@@ -1,8 +1,10 @@
namespace MxGateway.Contracts;
/// <summary>
/// Exposes version metadata shared by gateway components before generated
/// protobuf contracts are introduced.
/// Holds the protocol version constants shared by gateway components.
/// <see cref="GatewayProtocolVersion"/> is advertised to clients in
/// <c>OpenSessionReply</c>; <see cref="WorkerProtocolVersion"/> is used to
/// validate <c>WorkerEnvelope</c> protocol framing on the gateway↔worker pipe.
/// </summary>
public static class GatewayContractInfo
{
@@ -13388,6 +13388,17 @@ namespace MxGateway.Contracts.Proto {
/// <summary>Field number for the "acknowledge_alarm" field.</summary>
public const int AcknowledgeAlarmFieldNumber = 34;
/// <summary>
/// Reply payload for BOTH MX_COMMAND_KIND_ACKNOWLEDGE_ALARM (by GUID)
/// and MX_COMMAND_KIND_ACKNOWLEDGE_ALARM_BY_NAME. There is intentionally
/// no by-name-specific reply case: the by-name ack carries no outcome
/// detail beyond the native ack return code, so the worker reuses this
/// `acknowledge_alarm` payload for both command kinds (the worker's
/// MxAccessCommandExecutor sets `acknowledge_alarm` for the by-name arm
/// too). Consumers must dispatch on MxCommandReply.kind, not on the
/// payload case, to tell the two acks apart. The top-level `hresult`
/// mirrors AcknowledgeAlarmReplyPayload.native_status and is preferred.
/// </summary>
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
[global::System.CodeDom.Compiler.GeneratedCode("protoc", null)]
public global::MxGateway.Contracts.Proto.AcknowledgeAlarmReplyPayload AcknowledgeAlarm {
@@ -17339,12 +17350,16 @@ namespace MxGateway.Contracts.Proto {
}
/// <summary>
/// Reply payload for AcknowledgeAlarmCommand. Surfaces AVEVA's native
/// AlarmAckByGUID return code; 0 means success. The MxCommandReply's
/// hresult field carries the same value and is preferred for protocol
/// consumers — this payload exists so the gateway-side
/// WorkerAlarmRpcDispatcher can echo native_status into
/// AcknowledgeAlarmReply.hresult without unpacking the outer envelope.
/// Reply payload for AcknowledgeAlarmCommand AND
/// AcknowledgeAlarmByNameCommand — both ack command kinds reuse this
/// payload case (`MxCommandReply.acknowledge_alarm`); there is no
/// dedicated by-name reply case. Surfaces AVEVA's native ack return
/// code (AlarmAckByGUID for the GUID arm, AlarmAckByName for the
/// by-name arm); 0 means success. The MxCommandReply's hresult field
/// carries the same value and is preferred for protocol consumers —
/// this payload exists so the gateway-side WorkerAlarmRpcDispatcher
/// can echo native_status into AcknowledgeAlarmReply.hresult without
/// unpacking the outer envelope.
/// </summary>
[global::System.Diagnostics.DebuggerDisplayAttribute("{ToString(),nq}")]
public sealed partial class AcknowledgeAlarmReplyPayload : pb::IMessage<AcknowledgeAlarmReplyPayload>
@@ -21403,7 +21418,12 @@ namespace MxGateway.Contracts.Proto {
private int hresult_;
/// <summary>
/// HRESULT captured from MXAccess if the ack failed at the COM layer.
/// Native ack return code echoed from the worker. The worker carries the
/// ack outcome as a single int32 (AcknowledgeAlarmReplyPayload.native_status,
/// = AlarmAckByName / AlarmAckByGUID return code; 0 = success); the gateway's
/// WorkerAlarmRpcDispatcher copies that value here. This is the authoritative
/// ack-outcome field for the public RPC. Absent only when the worker reply
/// omitted the value entirely (a protocol violation).
/// </summary>
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
[global::System.CodeDom.Compiler.GeneratedCode("protoc", null)]
@@ -21431,7 +21451,11 @@ namespace MxGateway.Contracts.Proto {
public const int StatusFieldNumber = 5;
private global::MxGateway.Contracts.Proto.MxStatusProxy status_;
/// <summary>
/// Native MxAccess status describing the outcome of the ack.
/// Reserved for a structured MxStatusProxy view of the ack outcome. The
/// worker by-name/by-GUID ack path produces only the int32 return code
/// (see `hresult`), so the current gateway leaves this field UNSET on every
/// reply. Clients must read `hresult` (and `protocol_status`) for the ack
/// result and must not depend on `status` being populated.
/// </summary>
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
[global::System.CodeDom.Compiler.GeneratedCode("protoc", null)]
@@ -22063,6 +22087,17 @@ namespace MxGateway.Contracts.Proto {
/// <summary>Field number for the "success" field.</summary>
public const int SuccessFieldNumber = 1;
private int success_;
/// <summary>
/// Mirrors the `success` member of the MXAccess MXSTATUS_PROXY struct
/// (a 16-bit signed value in the COM struct, widened to int32 on the
/// wire). Despite the name it is NOT a boolean — it is the raw numeric
/// indicator the worker reads off the COM struct without reinterpretation.
/// It is carried verbatim for diagnostics; the authoritative success/
/// failure of the operation is `category` (MX_STATUS_CATEGORY_OK marks
/// success), with `detail`, `diagnostic_text`, `raw_category`, and
/// `raw_detected_by` describing any non-OK outcome. Clients should branch
/// on `category`, not on a specific `success` value.
/// </summary>
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
[global::System.CodeDom.Compiler.GeneratedCode("protoc", null)]
public int Success {
@@ -7,6 +7,13 @@ option csharp_namespace = "MxGateway.Contracts.Proto.Galaxy";
import "google/protobuf/timestamp.proto";
import "google/protobuf/wrappers.proto";
// Wire-compatibility policy (ProtobufStyleGuide): this contract evolves
// additively only. Never renumber or repurpose an existing field number or
// enum value. When a field or enum value is removed, add a `reserved` range
// (and `reserved` name) covering it in the same change so a future editor
// cannot accidentally reuse the retired tag. There are no `reserved`
// declarations today because no field or enum value has ever been removed.
// Read-only browse over the AVEVA System Platform Galaxy Repository (ZB SQL
// database). Lets clients enumerate the deployed object hierarchy and each
// object's dynamic attributes so they know what tag references to subscribe
@@ -7,6 +7,13 @@ option csharp_namespace = "MxGateway.Contracts.Proto";
import "google/protobuf/duration.proto";
import "google/protobuf/timestamp.proto";
// Wire-compatibility policy (ProtobufStyleGuide): this contract evolves
// additively only. Never renumber or repurpose an existing field number or
// enum value. When a field or enum value is removed, add a `reserved` range
// (and `reserved` name) covering it in the same change so a future editor
// cannot accidentally reuse the retired tag. There are no `reserved`
// declarations today because no field or enum value has ever been removed.
// Public client API for MXAccess sessions hosted by the gateway.
service MxAccessGateway {
rpc OpenSession(OpenSessionRequest) returns (OpenSessionReply);
@@ -381,6 +388,15 @@ message MxCommandReply {
BulkSubscribeReply un_advise_item_bulk = 31;
BulkSubscribeReply subscribe_bulk = 32;
BulkSubscribeReply unsubscribe_bulk = 33;
// Reply payload for BOTH MX_COMMAND_KIND_ACKNOWLEDGE_ALARM (by GUID)
// and MX_COMMAND_KIND_ACKNOWLEDGE_ALARM_BY_NAME. There is intentionally
// no by-name-specific reply case: the by-name ack carries no outcome
// detail beyond the native ack return code, so the worker reuses this
// `acknowledge_alarm` payload for both command kinds (the worker's
// MxAccessCommandExecutor sets `acknowledge_alarm` for the by-name arm
// too). Consumers must dispatch on MxCommandReply.kind, not on the
// payload case, to tell the two acks apart. The top-level `hresult`
// mirrors AcknowledgeAlarmReplyPayload.native_status and is preferred.
AcknowledgeAlarmReplyPayload acknowledge_alarm = 34;
QueryActiveAlarmsReplyPayload query_active_alarms = 35;
SessionStateReply session_state = 100;
@@ -448,12 +464,16 @@ message DrainEventsReply {
repeated MxEvent events = 1;
}
// Reply payload for AcknowledgeAlarmCommand. Surfaces AVEVA's native
// AlarmAckByGUID return code; 0 means success. The MxCommandReply's
// hresult field carries the same value and is preferred for protocol
// consumers; this payload exists so the gateway-side
// WorkerAlarmRpcDispatcher can echo native_status into
// AcknowledgeAlarmReply.hresult without unpacking the outer envelope.
// Reply payload for AcknowledgeAlarmCommand AND
// AcknowledgeAlarmByNameCommand: both ack command kinds reuse this
// payload case (`MxCommandReply.acknowledge_alarm`); there is no
// dedicated by-name reply case. Surfaces AVEVA's native ack return
// code (AlarmAckByGUID for the GUID arm, AlarmAckByName for the
// by-name arm); 0 means success. The MxCommandReply's hresult field
// carries the same value and is preferred for protocol consumers;
// this payload exists so the gateway-side WorkerAlarmRpcDispatcher
// can echo native_status into AcknowledgeAlarmReply.hresult without
// unpacking the outer envelope.
message AcknowledgeAlarmReplyPayload {
int32 native_status = 1;
}
@@ -628,9 +648,18 @@ message AcknowledgeAlarmReply {
string session_id = 1;
string correlation_id = 2;
ProtocolStatus protocol_status = 3;
// HRESULT captured from MXAccess if the ack failed at the COM layer.
// Native ack return code echoed from the worker. The worker carries the
// ack outcome as a single int32 (AcknowledgeAlarmReplyPayload.native_status,
// = AlarmAckByName / AlarmAckByGUID return code; 0 = success); the gateway's
// WorkerAlarmRpcDispatcher copies that value here. This is the authoritative
// ack-outcome field for the public RPC. Absent only when the worker reply
// omitted the value entirely (a protocol violation).
optional int32 hresult = 4;
// Native MxAccess status describing the outcome of the ack.
// Reserved for a structured MxStatusProxy view of the ack outcome. The
// worker by-name/by-GUID ack path produces only the int32 return code
// (see `hresult`), so the current gateway leaves this field UNSET on every
// reply. Clients must read `hresult` (and `protocol_status`) for the ack
// result and must not depend on `status` being populated.
MxStatusProxy status = 5;
string diagnostic_message = 6;
}
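Review note: the comments on `AcknowledgeAlarmReply` make `hresult` the authoritative ack outcome and say `status` is left unset. A minimal client-side sketch of that rule follows. The names `AckReplyView`, `AckOutcome`, and `AckInterpreter` are hypothetical stand-ins, not the generated `AcknowledgeAlarmReply` class, and the sketch assumes the caller has already reduced `protocol_status` to a boolean, because the shape of `ProtocolStatus` is not shown in this hunk.

```csharp
using System;

// Hypothetical stand-in for the fields of AcknowledgeAlarmReply that matter here.
// ProtocolOk is an assumed, caller-computed summary of `protocol_status`.
// Hresult is null when the optional `hresult` field was absent on the wire.
public sealed record AckReplyView(bool ProtocolOk, int? Hresult);

public enum AckOutcome
{
    Acknowledged,      // hresult present and 0
    NativeFailure,     // hresult present and non-zero
    ProtocolError,     // protocol_status reported a failure
    ProtocolViolation  // hresult missing from the reply
}

public static class AckInterpreter
{
    // Classifies a reply using only protocol_status and hresult.
    // `status` is deliberately never consulted, since the gateway leaves it unset.
    public static AckOutcome Classify(AckReplyView reply)
    {
        if (!reply.ProtocolOk)
        {
            return AckOutcome.ProtocolError;
        }

        if (reply.Hresult is null)
        {
            return AckOutcome.ProtocolViolation;
        }

        return reply.Hresult.Value == 0
            ? AckOutcome.Acknowledged
            : AckOutcome.NativeFailure;
    }
}
```

Checking `protocol_status` before `hresult` is a design choice in this sketch, not something the contract mandates; the contract only says clients must read both.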
@@ -644,6 +673,15 @@ message QueryActiveAlarmsRequest {
}
message MxStatusProxy {
// Mirrors the `success` member of the MXAccess MXSTATUS_PROXY struct
// (a 16-bit signed value in the COM struct, widened to int32 on the
// wire). Despite the name it is NOT a boolean; it is the raw numeric
// indicator the worker reads off the COM struct without reinterpretation.
// It is carried verbatim for diagnostics; the authoritative success/
// failure of the operation is `category` (MX_STATUS_CATEGORY_OK marks
// success), with `detail`, `diagnostic_text`, `raw_category`, and
// `raw_detected_by` describing any non-OK outcome. Clients should branch
// on `category`, not on a specific `success` value.
int32 success = 1;
MxStatusCategory category = 2;
MxStatusSource detected_by = 3;
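Review note: the `success` comment tells clients to branch on `category`, not on `success`. A small sketch of that check is below. `StatusCategory` and `StatusView` are hypothetical stand-ins for the generated `MxStatusCategory` enum and `MxStatusProxy` message, and the enum's numeric values here are illustrative only; the real values are not shown in this hunk.

```csharp
using System;

// Hypothetical stand-in for MxStatusCategory; numeric values are illustrative.
public enum StatusCategory
{
    Unspecified = 0,
    Ok = 1,
    Other = 2
}

// Hypothetical stand-in for the MxStatusProxy fields used below.
public sealed record StatusView(int Success, StatusCategory Category, string DiagnosticText);

public static class StatusCheck
{
    // Success is decided by `category` alone; the raw `success` value is
    // carried for diagnostics and is intentionally ignored here.
    public static bool IsOk(StatusView status) => status.Category == StatusCategory.Ok;

    // Builds a log-friendly description that keeps the raw `success` value visible.
    public static string Describe(StatusView status) =>
        IsOk(status)
            ? "ok"
            : $"failed: category={status.Category}, success={status.Success}, text={status.DiagnosticText}";
}
```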
@@ -8,6 +8,13 @@ import "google/protobuf/duration.proto";
import "google/protobuf/timestamp.proto";
import "mxaccess_gateway.proto";
// Wire-compatibility policy (ProtobufStyleGuide): this contract evolves
// additively only. Never renumber or repurpose an existing field number or
// enum value. When a field or enum value is removed, add a `reserved` range
// (and `reserved` name) covering it in the same change so a future editor
// cannot accidentally reuse the retired tag. There are no `reserved`
// declarations today because no field or enum value has ever been removed.
// Gateway-to-worker IPC envelope. Named-pipe framing prepends a little-endian
// uint32 payload length to this protobuf payload.
message WorkerEnvelope {
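Review note: the comment above `WorkerEnvelope` describes the named-pipe framing as a little-endian uint32 payload length followed by the protobuf payload. A minimal sketch of that framing is below. `PipeFraming`, `Frame`, and `TryUnframe` are hypothetical names, not the gateway's actual framing code, and the sketch works on in-memory buffers with an opaque byte payload rather than a real pipe or a serialized `WorkerEnvelope`.

```csharp
using System;
using System.Buffers.Binary;

public static class PipeFraming
{
    private const int PrefixLength = sizeof(uint);

    // Prepends a little-endian uint32 length to the payload bytes.
    public static byte[] Frame(ReadOnlySpan<byte> payload)
    {
        var frame = new byte[PrefixLength + payload.Length];
        BinaryPrimitives.WriteUInt32LittleEndian(frame, (uint)payload.Length);
        payload.CopyTo(frame.AsSpan(PrefixLength));
        return frame;
    }

    // Returns false when the buffer does not yet hold a complete frame,
    // so a caller can keep reading from the pipe and retry.
    public static bool TryUnframe(ReadOnlySpan<byte> buffer, out byte[] payload)
    {
        payload = Array.Empty<byte>();
        if (buffer.Length < PrefixLength)
        {
            return false;
        }

        uint length = BinaryPrimitives.ReadUInt32LittleEndian(buffer);
        if (buffer.Length - PrefixLength < length)
        {
            return false;
        }

        payload = buffer.Slice(PrefixLength, (int)length).ToArray();
        return true;
    }
}
```

A production reader would presumably also cap the declared length before allocating; the sketch omits that because no maximum frame size is stated in this diff.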
@@ -0,0 +1,116 @@
using System.Security.Claims;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using MxGateway.Server.Configuration;
using MxGateway.Server.Dashboard;
namespace MxGateway.IntegrationTests;
[Collection(LiveResourcesCollection.Name)]
public sealed class DashboardLdapLiveTests
{
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_AdminInGwAdminGroup_Succeeds()
{
DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"admin",
"admin123",
CancellationToken.None);
Assert.True(result.Succeeded);
Assert.NotNull(result.Principal);
Assert.Equal("admin", result.Principal.FindFirst(ClaimTypes.NameIdentifier)?.Value);
Assert.Contains(result.Principal.Claims, claim =>
claim.Type == DashboardAuthenticationDefaults.LdapGroupClaimType
&& claim.Value.Contains("GwAdmin", StringComparison.OrdinalIgnoreCase));
}
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_ReadOnlyUserMissingGwAdminGroup_Fails()
{
DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"readonly",
"readonly123",
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Null(result.Principal);
Assert.DoesNotContain("readonly123", result.FailureMessage, StringComparison.Ordinal);
}
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword()
{
// Exercises the LdapException branch: the user exists and the service
// account search succeeds, but the candidate bind is rejected.
const string wrongPassword = "definitely-not-the-admin-password";
DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"admin",
wrongPassword,
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Null(result.Principal);
Assert.DoesNotContain(wrongPassword, result.FailureMessage, StringComparison.Ordinal);
}
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_UnknownUsername_Fails()
{
// Exercises the `candidate is null` branch: the service-account search
// returns no entry, so no candidate bind is attempted.
DashboardAuthenticator authenticator = CreateAuthenticator();
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"no-such-user-9f3c1",
"irrelevant-password",
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Null(result.Principal);
}
[LiveLdapFact]
[Trait("Category", "LiveLdap")]
public async Task AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing()
{
// Exercises the connect-failure path: a closed loopback port produces a
// connection error that DashboardAuthenticator must absorb into a Fail
// result rather than propagating an exception to the dashboard.
DashboardAuthenticator authenticator = new(
Options.Create(new GatewayOptions
{
Ldap = new LdapOptions
{
// Port 1 (IANA-assigned to tcpmux) is assumed closed on the test host; no LDAP server listens on it.
Port = 1,
},
}),
NullLogger<DashboardAuthenticator>.Instance);
DashboardAuthenticationResult result = await authenticator.AuthenticateAsync(
"admin",
"admin123",
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.Null(result.Principal);
}
private static DashboardAuthenticator CreateAuthenticator()
{
return new DashboardAuthenticator(
Options.Create(new GatewayOptions()),
NullLogger<DashboardAuthenticator>.Instance);
}
}

Some files were not shown because too many files have changed in this diff.