Compare commits

...

6 Commits

Author SHA1 Message Date
Joseph Doherty 430187c28b code-reviews: regenerate after batch 1 resolutions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:50:39 -04:00
Joseph Doherty f5b50c4484 Resolve Client.Python-022..026: TLS-by-default, batch CLI, README
Client.Python-022  README CLI examples for stream-alarms and
                   acknowledge-alarm now use the correct flags;
                   regression test parses every documented line through
                   Click.
Client.Python-023  Re-applied Client.Python-013 — _use_plaintext drops
                   the silent localhost / 127.0.0.1 auto-downgrade
                   branch; --plaintext and --tls are mutually exclusive
                   and TLS is the default.
Client.Python-024  batch dispatch routes through main.main(...,
                   standalone_mode=False) under a redirected stdout
                   instead of click.testing.CliRunner; recursive batch
                   lines are rejected outright.
Client.Python-025  Added behavioural tests for the five bulk SDK methods,
                   stream_alarms, and the new CLI subcommands.
Client.Python-026  _bench_read_bulk hoists 'import time' to module scope
                   and logs cleanup failures instead of swallowing them.

All resolved at 2026-05-24; python -m pytest is 65/65 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:50:27 -04:00
Joseph Doherty 4a0f88b17d Resolve Client.Rust-022..029: MalformedReply, correlation ids, clippy
Client.Rust-022  Restored Error::MalformedReply for register / add_item /
                 add_item2 and the bulk-subscribe / read-bulk / write-bulk
                 dispatch arms so malformed-but-OK replies fail loudly
                 instead of returning Vec::new().
Client.Rust-023  Restored next_correlation_id and routed every CLI close /
                 stream-alarms / acknowledge-alarm / bench-read-bulk call
                 through it so each call carries a unique opaque token.
Client.Rust-024  Added round-trip tests for read_bulk / write_bulk /
                 write2_bulk / write_secured_bulk / write_secured2_bulk
                 plus stream_alarms and percentile_summary unit tests.
Client.Rust-025  RustClientDesign.md re-synced — new bulk SDK, alarms
                 surface, Error variants, CLI command list, and the
                 Windows stack workaround.
Client.Rust-026  Session::read_bulk now borrows a tag slice; bench-read-
                 bulk binds tags once outside the warm-up / steady-state
                 loops.
Client.Rust-027  .cargo/config.toml selector tightened to
                 cfg(all(windows, target_env = "msvc")) and comment
                 rewritten to match reality (release + debug ship the
                 8 MB reservation).
Client.Rust-028  run_batch removed the empty-line break; stdin EOF is
                 the only terminator.
Client.Rust-029  Re-applied Client.Rust-001 / 002 / 012 — added the
                 missing doc comments, renamed BulkReplyKind variants,
                 and replaced the clone-on-copy with a deref under lock
                 so cargo clippy -D warnings is clean.

All resolved at 2026-05-24; cargo fmt + check + clippy + test all green
(55 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:50:15 -04:00
Joseph Doherty 82996aa8e6 Resolve Client.Go-022..027: bulk flags, bench cancel, batch loop
Client.Go-022  Re-applied Client.Go-015 shape — runWriteBulkVariant drops
               the unused secured param and gates -current-user-id /
               -verifier-user-id / -user-id behind the secured-only
               variants.
Client.Go-023  Re-applied Client.Go-018 shape — bench warm-up and steady-
               state loops respect ctx.Err().
Client.Go-024  Added SDK-level tests for WriteBulk / Write2Bulk /
               WriteSecuredBulk / WriteSecured2Bulk / ReadBulk and
               StreamAlarms via the existing bufconn fake gateway pattern.
Client.Go-025  Five bulk SDK methods short-circuit on empty input without
               an RPC round-trip and document the behavior.
Client.Go-026  runBatch widens scanner.Buffer to 16 MiB and emits an
               error-with-sentinel if a longer line still arrives, rather
               than aborting the session silently.
Client.Go-027  runBatch treats blank lines as skip-and-continue; only EOF
               ends the session.

All resolved at 2026-05-24; gofmt + go vet + go build + go test ./... all
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:49:58 -04:00
Joseph Doherty 712cb06442 Resolve Client.Dotnet-018..021: README + bench-read-bulk hardening
Client.Dotnet-018  README CLI examples for stream-alarms / acknowledge-alarm
                   replaced with parser-correct flags; new theory test
                   parses each documented README example through the CLI.
Client.Dotnet-019  BenchReadBulkAsync routes through new
                   RequireRegisterServerHandle helper that fails loudly when
                   the OK register reply has no typed payload.
Client.Dotnet-020  Bench steady-state catch is now
                   catch (Exception ex) when (ex is not OperationCanceledException)
                   so user-driven cancellation exits promptly.
Client.Dotnet-021  --timeout-ms now flows through ParseTimeoutMs which
                   rejects negatives with a clear error in both read-bulk
                   and bench-read-bulk.

All resolved at 2026-05-24; 67/67 .NET client tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:49:45 -04:00
Joseph Doherty 4d77279e7e Resolve Server-044..050: KillWorker accounting + admin service hardening
Server-044  KillWorkerAsync catch path now calls _metrics.SessionRemoved
            so the open-session gauge does not leak when KillWorker throws.
Server-045  KillWorkerAsync routes through a new
            GatewaySession.KillWorkerWithCloseGateAsync that takes the
            per-session close lock, so concurrent kills count SessionsClosed
            exactly once.
Server-046  CloseSessionCoreAsync's SessionCloseStartedException branch and
            ShutdownAsync's kill fallback both increment SessionsClosed (not
            just the gauge), so the counter and gauge stay consistent.
Server-047  ApiKeysPage.ConfirmPendingAsync holds PendingAction across the
            awaited action and clears it in finally, matching the sessions
            pages.
Server-048  Closed: the 044/045 regression tests cover the previously-
            untested kill paths.
Server-049  IDashboardSessionAdminService + DashboardSessionAdminService
            now carry XML docs that pin the Admin gate, missing-session
            return-Fail semantics, and the dashboard-admin-kill reason.
Server-050  CloseSessionAsync and KillWorkerAsync catch unexpected
            exceptions after the SessionManagerException catches and return
            a friendly Fail; OperationCanceledException tied to the caller
            token still propagates.

All resolved at 2026-05-24; 503/503 gateway tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:49:34 -04:00
33 changed files with 3373 additions and 269 deletions
+2 -2
View File
@@ -134,8 +134,8 @@ dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- advise --s
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- write --session-id <id> --server-handle 1 --item-handle 1 --type int32 --value 123 --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- write2 --session-id <id> --server-handle 1 --item-handle 1 --type int32 --value 123 --timestamp 2026-01-01T00:00:00Z --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- stream-events --session-id <id> --max-events 1 --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- stream-alarms --session-id <id> --max-messages 1 --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- acknowledge-alarm --session-id <id> --alarm-reference "\\Galaxy\Area001.Pump001.PumpFault" --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- stream-alarms --filter-prefix Area001 --max-events 1 --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- acknowledge-alarm --reference "\\Galaxy\Area001.Pump001.PumpFault" --comment "ack from cli" --operator operator1 --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- smoke --endpoint http://localhost:5000 --api-key-env MXGATEWAY_API_KEY --item Area001.Pump001.Speed --json
```
@@ -487,7 +487,7 @@ public static class MxGatewayClientCli
ReadBulkCommand command = new()
{
ServerHandle = arguments.GetInt32("server-handle"),
TimeoutMs = (uint)arguments.GetInt32("timeout-ms", 0),
TimeoutMs = ParseTimeoutMs(arguments, defaultValue: 0),
};
command.TagAddresses.Add(ParseStringList(arguments.GetRequired("items")));
@@ -692,6 +692,49 @@ public static class MxGatewayClientCli
}
}
/// <summary>
/// Parses the optional <c>--timeout-ms</c> argument as a non-negative
/// unsigned millisecond count. Mirrors the SDK-side <c>(uint)Math.Min</c>
/// guard on <c>MxGatewaySession.ReadBulkAsync</c>: a negative value
/// (e.g. <c>-1</c>, an easy copy-paste mistake for "unbounded") is
/// rejected loudly rather than silently wrapped to <c>~49.7 days</c>,
/// which would park one worker thread per pending tag for hours.
/// Resolves Client.Dotnet-021.
/// </summary>
private static uint ParseTimeoutMs(CliArguments arguments, int defaultValue)
{
int raw = arguments.GetInt32("timeout-ms", defaultValue);
if (raw < 0)
{
throw new ArgumentException(
"--timeout-ms must be a non-negative integer (use 0 for the gateway default).");
}
return (uint)raw;
}
/// <summary>
/// Extracts the <c>ServerHandle</c> from a Register reply, throwing a
/// descriptive <see cref="MxGatewayException"/> when the typed
/// <c>Register</c> payload is absent on an otherwise-successful reply.
/// The typed sub-message is the contract for the Register command, so
/// its absence must not silently fall through to
/// <c>ReturnValue.Int32Value</c> (which would be <c>0</c> for an empty
/// reply, driving the rest of the bench against an invalid handle).
/// Resolves Client.Dotnet-019.
/// </summary>
private static int RequireRegisterServerHandle(MxCommandReply reply, string sessionId)
{
if (reply.Register is null)
{
throw new MxGatewayException(
$"Gateway reply for Register on session '{sessionId}' (correlation '{reply.CorrelationId}') "
+ "succeeded but is missing the typed 'register' payload required to read ServerHandle.");
}
return reply.Register.ServerHandle;
}
/// <summary>
/// Cross-language stress benchmark for ReadBulk. Opens its own session,
/// subscribes to N tags so the worker's MxAccessValueCache populates from
@@ -712,7 +755,7 @@ public static class MxGatewayClientCli
int tagStart = arguments.GetInt32("tag-start", 1);
string tagPrefix = arguments.GetOptional("tag-prefix") ?? "TestMachine_";
string tagAttribute = arguments.GetOptional("tag-attribute") ?? "TestChangingInt";
uint timeoutMs = (uint)arguments.GetInt32("timeout-ms", 1500);
uint timeoutMs = ParseTimeoutMs(arguments, defaultValue: 1500);
string clientName = arguments.GetOptional("client-name") ?? "mxgw-dotnet-bench";
string[] tags = new string[bulkSize];
@@ -742,7 +785,7 @@ public static class MxGatewayClientCli
}),
cancellationToken)
.ConfigureAwait(false);
int serverHandle = registerReply.Register?.ServerHandle ?? registerReply.ReturnValue.Int32Value;
int serverHandle = RequireRegisterServerHandle(registerReply, sessionId);
SubscribeBulkCommand subscribe = new() { ServerHandle = serverHandle };
subscribe.TagAddresses.Add(tags);
@@ -801,8 +844,13 @@ public static class MxGatewayClientCli
.ConfigureAwait(false);
sw.Stop();
}
catch
catch (Exception ex) when (ex is not OperationCanceledException)
{
// Client.Dotnet-020: never swallow OperationCanceledException
// here. A bare `catch` would let Ctrl+C / parent CTS /
// wall-clock timeouts keep spinning until --duration-seconds
// elapsed, burning CPU and skewing the p99/max latency numbers
// with hundreds of immediate-OCE iterations.
sw.Stop();
failedCalls++;
latencyMillis.Add(sw.Elapsed.TotalMilliseconds);
@@ -509,6 +509,354 @@ public sealed class MxGatewayClientCliTests
Assert.Contains("gateway-protocol=", text);
}
/// <summary>
/// Client.Dotnet-018: the README CLI examples for the alarm subcommands at
/// `clients/dotnet/README.md` must drive cleanly through the production
/// CLI argument parser. The previous text used non-existent flags
/// (`--session-id`, `--max-messages`, `--alarm-reference`) that would
/// fail with "Unknown command" / "Missing required option --reference".
/// Each documented example is extracted from the README, parsed via the
/// production <see cref="MxGatewayClientCli.RunAsync"/>, and asserted
/// against exit code 0.
/// </summary>
[Theory]
[InlineData("stream-alarms")]
[InlineData("acknowledge-alarm")]
public async Task RunAsync_ReadmeExamples_ForAlarmCommands_ParseSuccessfully(string command)
{
string readme = LocateClientReadme();
string[] commandLine = ExtractReadmeCommandLine(readme, command);
// The documented examples do not include --api-key (the README assumes
// the env var path documented elsewhere). Inject an API key via the
// standard env var so CreateOptions succeeds and the parser fully
// exercises the documented flag shape.
string? previousKey = Environment.GetEnvironmentVariable("MXGATEWAY_API_KEY");
Environment.SetEnvironmentVariable("MXGATEWAY_API_KEY", "test-api-key");
try
{
using var output = new StringWriter();
using var error = new StringWriter();
FakeCliClient fakeClient = new();
fakeClient.AlarmFeedMessages.Add(new AlarmFeedMessage
{
ActiveAlarm = new ActiveAlarmSnapshot { AlarmFullReference = "fixture" },
});
fakeClient.AcknowledgeAlarmReplies.Enqueue(new AcknowledgeAlarmReply
{
CorrelationId = "ack-fixture",
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
});
int exitCode = await MxGatewayClientCli.RunAsync(
commandLine,
output,
error,
_ => fakeClient);
Assert.True(
exitCode == 0,
$"README example for '{command}' exited {exitCode}; stderr=<<{error}>>");
Assert.DoesNotContain("Unknown command", error.ToString());
Assert.DoesNotContain("Missing required option", error.ToString());
}
finally
{
Environment.SetEnvironmentVariable("MXGATEWAY_API_KEY", previousKey);
}
}
/// <summary>
/// Client.Dotnet-019: `BenchReadBulkAsync` previously fell back to
/// <c>reply.ReturnValue.Int32Value</c> when the register reply had no
/// typed <c>Register</c> payload, silently driving the rest of the bench
/// against a zero server handle. The fix must fail loudly with a
/// descriptive <see cref="MxGatewayException"/>.
/// </summary>
[Fact]
public async Task RunAsync_BenchReadBulk_WhenRegisterReplyMissingTypedPayload_FailsLoudly()
{
using var output = new StringWriter();
using var error = new StringWriter();
FakeCliClient fakeClient = new();
// Successful protocol + MX status but no typed `Register` payload.
// Before the Client.Dotnet-019 fix this silently became serverHandle=0
// and the bench proceeded through SubscribeBulk / warmup / steady-state
// against an invalid handle, producing a misleading zero-result summary.
fakeClient.InvokeReplies.Enqueue(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.Register,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
});
int exitCode = await MxGatewayClientCli.RunAsync(
[
"bench-read-bulk",
"--endpoint",
"http://localhost:5000",
"--api-key",
"test-api-key",
"--duration-seconds",
"1",
"--warmup-seconds",
"0",
"--bulk-size",
"1",
],
output,
error,
_ => fakeClient);
Assert.Equal(1, exitCode);
// Descriptive message that names the missing typed payload.
string err = error.ToString();
Assert.Contains("Register", err);
// The bench must not produce any aggregate stats JSON.
Assert.DoesNotContain("bench-read-bulk", output.ToString());
}
/// <summary>
/// Client.Dotnet-020: the steady-state loop in `BenchReadBulkAsync` had a
/// bare `catch { failedCalls++; continue; }` that swallowed
/// <see cref="OperationCanceledException"/>, so token-driven cancellation
/// kept spinning until <c>--duration-seconds</c> elapsed. After the fix
/// the bench must exit promptly when the supplied token cancels.
/// </summary>
[Fact]
public async Task RunAsync_BenchReadBulk_WhenSteadyStateLoopReceivesCancellation_ExitsPromptly()
{
using var output = new StringWriter();
using var error = new StringWriter();
int invokeCount = 0;
FakeCliClient fakeClient = new()
{
InvokeHandler = (request, ct) =>
{
int n = Interlocked.Increment(ref invokeCount);
// Reply 1 = Register (success with typed payload).
if (request.Command.Kind == MxCommandKind.Register)
{
return Task.FromResult(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.Register,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
Register = new RegisterReply { ServerHandle = 1 },
});
}
// Reply 2 = SubscribeBulk (success).
if (request.Command.Kind == MxCommandKind.SubscribeBulk)
{
var subscribeReply = new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.SubscribeBulk,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
SubscribeBulk = new BulkSubscribeReply(),
};
return Task.FromResult(subscribeReply);
}
// ReadBulk reply 1 = success (so the steady-state loop enters
// and starts iterating). Reply 2+ = simulated cancellation.
if (request.Command.Kind == MxCommandKind.ReadBulk && n <= 3)
{
return Task.FromResult(new MxCommandReply
{
SessionId = "session-fixture",
Kind = MxCommandKind.ReadBulk,
ProtocolStatus = new ProtocolStatus { Code = ProtocolStatusCode.Ok },
ReadBulk = new BulkReadReply(),
});
}
// From here on every ReadBulk throws OCE — the steady-state
// loop must exit promptly rather than spinning until
// --duration-seconds elapses.
throw new OperationCanceledException();
},
};
var sw = System.Diagnostics.Stopwatch.StartNew();
await Assert.ThrowsAsync<OperationCanceledException>(async () =>
await MxGatewayClientCli.RunAsync(
[
"bench-read-bulk",
"--endpoint",
"http://localhost:5000",
"--api-key",
"test-api-key",
"--duration-seconds",
"30",
"--warmup-seconds",
"0",
"--bulk-size",
"1",
],
output,
error,
_ => fakeClient));
sw.Stop();
// Without the fix the loop swallows OCE and continues until the 30 s
// steady-state deadline expires. With the fix it exits as soon as OCE
// surfaces. Generous 10 s ceiling to keep the test stable under load.
Assert.True(
sw.Elapsed < TimeSpan.FromSeconds(10),
$"Bench did not exit promptly on cancellation; took {sw.Elapsed}.");
}
/// <summary>
/// Client.Dotnet-021: both `ReadBulkAsync` and `BenchReadBulkAsync` cast
/// the user-supplied <c>--timeout-ms</c> to <see cref="uint"/> without
/// bounds checking, so a negative value (e.g. <c>-1</c>) silently wraps
/// to ~49.7 days. The fix must reject negatives with a clear error.
/// </summary>
[Theory]
[InlineData("read-bulk")]
[InlineData("bench-read-bulk")]
public async Task RunAsync_TimeoutMs_NegativeValue_RejectsWithClearError(string command)
{
using var output = new StringWriter();
using var error = new StringWriter();
FakeCliClient fakeClient = new();
string[] args = command is "read-bulk"
? [
"read-bulk",
"--endpoint",
"http://localhost:5000",
"--api-key",
"test-api-key",
"--session-id",
"session-fixture",
"--server-handle",
"1",
"--items",
"Area001.Pump001.Speed",
"--timeout-ms",
"-1",
]
: [
"bench-read-bulk",
"--endpoint",
"http://localhost:5000",
"--api-key",
"test-api-key",
"--duration-seconds",
"1",
"--warmup-seconds",
"0",
"--bulk-size",
"1",
"--timeout-ms",
"-1",
];
int exitCode = await MxGatewayClientCli.RunAsync(
args,
output,
error,
_ => fakeClient);
Assert.NotEqual(0, exitCode);
string err = error.ToString();
Assert.Contains("timeout-ms", err);
Assert.Contains("non-negative", err);
}
/// <summary>
/// Locates the .NET client README by walking up from the test assembly's
/// base directory until <c>clients/dotnet/README.md</c> is found. Keeps
/// the regression test independent of the current working directory.
/// </summary>
private static string LocateClientReadme()
{
string? directory = AppContext.BaseDirectory;
while (!string.IsNullOrEmpty(directory))
{
string candidate = Path.Combine(directory, "clients", "dotnet", "README.md");
if (File.Exists(candidate))
{
return candidate;
}
directory = Path.GetDirectoryName(directory);
}
throw new FileNotFoundException("clients/dotnet/README.md not found above test assembly base directory.");
}
/// <summary>
/// Extracts the documented CLI invocation for the requested subcommand
/// from the README, returning only the arguments after the
/// <c>mxgw-dotnet</c>-equivalent prefix so they can be passed straight
/// to <see cref="MxGatewayClientCli.RunAsync"/>.
/// </summary>
private static string[] ExtractReadmeCommandLine(string readmePath, string command)
{
string[] lines = File.ReadAllLines(readmePath);
// Look for the documented `dotnet run ... -- <command> ...` line.
foreach (string line in lines)
{
int dashes = line.IndexOf("-- " + command, StringComparison.Ordinal);
if (dashes < 0)
{
continue;
}
string after = line[(dashes + 3)..].Trim();
// Tokenize by whitespace, respecting "..." quoted segments.
return TokenizeCommandLine(after);
}
throw new InvalidOperationException(
$"README at '{readmePath}' has no documented example for subcommand '{command}'.");
}
/// <summary>
/// Splits a single command-line string into argv tokens, honouring
/// double-quoted segments so paths with embedded spaces survive intact.
/// </summary>
private static string[] TokenizeCommandLine(string input)
{
var tokens = new List<string>();
var current = new System.Text.StringBuilder();
bool inQuotes = false;
foreach (char ch in input)
{
if (ch == '"')
{
inQuotes = !inQuotes;
continue;
}
if (!inQuotes && char.IsWhiteSpace(ch))
{
if (current.Length > 0)
{
tokens.Add(current.ToString());
current.Clear();
}
continue;
}
current.Append(ch);
}
if (current.Length > 0)
{
tokens.Add(current.ToString());
}
return tokens.ToArray();
}
/// <summary>Fake CLI client for testing.</summary>
private sealed class FakeCliClient : IMxGatewayCliClient
{
@@ -527,6 +875,9 @@ public sealed class MxGatewayClientCliTests
/// <summary>Exception to throw on invoke, if any.</summary>
public Exception? InvokeFailure { get; init; }
/// <summary>Optional per-call handler that overrides queue-based behaviour.</summary>
public Func<MxCommandRequest, CancellationToken, Task<MxCommandReply>>? InvokeHandler { get; init; }
/// <inheritdoc />
public ValueTask DisposeAsync()
{
@@ -572,6 +923,11 @@ public sealed class MxGatewayClientCliTests
throw InvokeFailure;
}
if (InvokeHandler is not null)
{
return InvokeHandler(request, cancellationToken);
}
return Task.FromResult(InvokeReplies.Dequeue());
}
+59 -20
View File
@@ -396,25 +396,31 @@ func runReadBulk(ctx context.Context, args []string, stdout, stderr io.Writer) e
}
func runWriteBulk(ctx context.Context, args []string, stdout, stderr io.Writer) error {
return runWriteBulkVariant(ctx, args, stdout, stderr, "write-bulk", false, false)
return runWriteBulkVariant(ctx, args, stdout, stderr, "write-bulk", false)
}
func runWrite2Bulk(ctx context.Context, args []string, stdout, stderr io.Writer) error {
return runWriteBulkVariant(ctx, args, stdout, stderr, "write2-bulk", true, false)
return runWriteBulkVariant(ctx, args, stdout, stderr, "write2-bulk", true)
}
func runWriteSecuredBulk(ctx context.Context, args []string, stdout, stderr io.Writer) error {
return runWriteBulkVariant(ctx, args, stdout, stderr, "write-secured-bulk", false, true)
return runWriteBulkVariant(ctx, args, stdout, stderr, "write-secured-bulk", false)
}
func runWriteSecured2Bulk(ctx context.Context, args []string, stdout, stderr io.Writer) error {
return runWriteBulkVariant(ctx, args, stdout, stderr, "write-secured2-bulk", true, true)
return runWriteBulkVariant(ctx, args, stdout, stderr, "write-secured2-bulk", true)
}
// runWriteBulkVariant shares the flag-parsing + entry-build skeleton across
// the four bulk-write families. withTimestamp adds a --timestamp-value flag;
// secured switches from --user-id to --current-user-id / --verifier-user-id.
func runWriteBulkVariant(ctx context.Context, args []string, stdout, stderr io.Writer, command string, withTimestamp bool, secured bool) error {
// the four bulk-write families. The variant is derived from command alone;
// withTimestamp adds a --timestamp-value flag. To keep wrong-variant flags
// from silently no-op'ing, secured-only flags (-current-user-id /
// -verifier-user-id) are only registered for the secured variants, and
// -user-id only for the non-secured Write/Write2 variants — a wrong-variant
// flag then surfaces as a clean "flag provided but not defined" error.
func runWriteBulkVariant(ctx context.Context, args []string, stdout, stderr io.Writer, command string, withTimestamp bool) error {
secured := command == "write-secured-bulk" || command == "write-secured2-bulk"
flags := flag.NewFlagSet(command, flag.ContinueOnError)
flags.SetOutput(stderr)
common := bindCommonFlags(flags)
@@ -424,9 +430,17 @@ func runWriteBulkVariant(ctx context.Context, args []string, stdout, stderr io.W
itemHandles := flags.String("item-handles", "", "comma-separated item handles")
valueType := flags.String("type", "string", "value type: bool, int32, int64, float, double, string")
values := flags.String("values", "", "comma-separated values (one per item handle)")
userID := flags.Int("user-id", 0, "MXAccess user id (Write/Write2 variants)")
currentUserID := flags.Int("current-user-id", 0, "MXAccess current user id (Secured variants)")
verifierUserID := flags.Int("verifier-user-id", 0, "MXAccess verifier user id (Secured variants)")
var (
userID *int
currentUserID *int
verifierUserID *int
)
if secured {
currentUserID = flags.Int("current-user-id", 0, "MXAccess current user id (Secured variants)")
verifierUserID = flags.Int("verifier-user-id", 0, "MXAccess verifier user id (Secured variants)")
} else {
userID = flags.Int("user-id", 0, "MXAccess user id (Write/Write2 variants)")
}
timestampValue := flags.String("timestamp-value", "", "RFC 3339 timestamp shared across all entries (Write2/WriteSecured2 variants)")
if err := flags.Parse(args); err != nil {
@@ -514,7 +528,6 @@ func runWriteBulkVariant(ctx context.Context, args []string, stdout, stderr io.W
default:
return fmt.Errorf("unsupported bulk write command %q", command)
}
_ = secured // currently only used for routing above; reserved for future per-variant validation
return writeWriteBulkOutput(stdout, *jsonOutput, command, options, results, err)
}
@@ -598,10 +611,12 @@ func runBenchReadBulk(ctx context.Context, args []string, stdout, stderr io.Writ
}()
// Warm-up: drive identical calls so any first-call JIT / connection-pool
// setup is amortised before the measurement window opens.
// setup is amortised before the measurement window opens. The ctx.Err()
// guard short-circuits on Ctrl+C / parent-cancel instead of spinning
// failing ReadBulk calls until the wall-clock deadline elapses.
warmupDeadline := time.Now().Add(time.Duration(*warmupSeconds) * time.Second)
timeout := time.Duration(*timeoutMs) * time.Millisecond
for time.Now().Before(warmupDeadline) {
for time.Now().Before(warmupDeadline) && ctx.Err() == nil {
_, _ = session.ReadBulk(ctx, serverHandle, tags, timeout)
}
@@ -613,7 +628,7 @@ func runBenchReadBulk(ctx context.Context, args []string, stdout, stderr io.Writ
steadyStart := time.Now()
steadyDeadline := steadyStart.Add(time.Duration(*durationSeconds) * time.Second)
for time.Now().Before(steadyDeadline) {
for time.Now().Before(steadyDeadline) && ctx.Err() == nil {
callStart := time.Now()
results, err := session.ReadBulk(ctx, serverHandle, tags, timeout)
elapsed := time.Since(callStart)
@@ -1191,18 +1206,28 @@ const batchEOR = "__MXGW_BATCH_EOR__"
// runBatch reads one command line at a time from in, dispatches each via the
// normal runWithIO routing, and writes a batchEOR sentinel to stdout after
// every result. Errors are serialised as JSON to stdout (not stderr) so the
// harness can parse them without interleaving stderr. The loop never terminates
// on command error; only stdin EOF (or an empty line) ends the session.
// harness can parse them without interleaving stderr. Blank lines are
// skipped; only stdin EOF ends the session.
//
// The scanner buffer is widened to 16 MiB so a single long command line
// (e.g. a bulk-write with several thousand handles) does not trip the
// default 64 KiB bufio.Scanner token-too-long error and abort the session.
// If a line still exceeds the cap, the error is surfaced as a per-command
// error-with-sentinel and the session continues.
func runBatch(ctx context.Context, in io.Reader, stdout, stderr io.Writer) error {
bw := bufio.NewWriter(stdout)
scanner := bufio.NewScanner(in)
for scanner.Scan() {
line := scanner.Text()
if line == "" {
scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
for {
if !scanner.Scan() {
break
}
line := scanner.Text()
args := strings.Fields(line)
if len(args) == 0 {
// Skip blank / whitespace-only lines; do NOT terminate. The
// session ends only on stdin EOF so a stray blank line in a
// PowerShell here-string does not silently drop later commands.
continue
}
if err := runWithIO(ctx, args, bw, stderr); err != nil {
@@ -1217,7 +1242,21 @@ func runBatch(ctx context.Context, in io.Reader, stdout, stderr io.Writer) error
_, _ = fmt.Fprintln(bw, batchEOR)
_ = bw.Flush()
}
return scanner.Err()
if err := scanner.Err(); err != nil {
// Emit the scanner failure as a final error-with-sentinel so the
// harness sees the failure framed, then return the error so the
// process exit reflects it. This handles bufio.ErrTooLong for any
// pathological line above the 16 MiB cap.
errPayload := map[string]string{
"error": err.Error(),
"type": "error",
}
_ = writeJSON(bw, errPayload)
_, _ = fmt.Fprintln(bw, batchEOR)
_ = bw.Flush()
return err
}
return nil
}
func dialGalaxyForCommand(ctx context.Context, common *commonOptions) (*mxgateway.GalaxyClient, commonOptions, error) {
+210
View File
@@ -2,9 +2,15 @@ package main
import (
"bytes"
"context"
"encoding/json"
"net"
"strings"
"testing"
"time"
pb "gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go/internal/generated"
"google.golang.org/grpc"
)
func TestRunVersionJSON(t *testing.T) {
@@ -84,3 +90,207 @@ func TestParseValueBuildsTypedValue(t *testing.T) {
t.Fatalf("int32 value = %d, want 123", got)
}
}
// TestRunWriteBulkVariantGatesSecuredFlags pins the Client.Go-022 fix:
// secured-only flags must be unavailable on non-secured variants, and
// vice-versa, so a wrong-variant flag fails with a clean "flag provided
// but not defined" error instead of silently no-op'ing.
func TestRunWriteBulkVariantGatesSecuredFlags(t *testing.T) {
cases := []struct {
name string
args []string
}{
{
name: "write-bulk-rejects-current-user-id",
args: []string{"write-bulk", "-current-user-id", "5", "-item-handles", "1", "-values", "1"},
},
{
name: "write-bulk-rejects-verifier-user-id",
args: []string{"write-bulk", "-verifier-user-id", "5", "-item-handles", "1", "-values", "1"},
},
{
name: "write2-bulk-rejects-current-user-id",
args: []string{"write2-bulk", "-current-user-id", "5", "-item-handles", "1", "-values", "1"},
},
{
name: "write-secured-bulk-rejects-user-id",
args: []string{"write-secured-bulk", "-user-id", "5", "-item-handles", "1", "-values", "1"},
},
{
name: "write-secured2-bulk-rejects-user-id",
args: []string{"write-secured2-bulk", "-user-id", "5", "-item-handles", "1", "-values", "1"},
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
var stdout, stderr bytes.Buffer
err := runWithIO(t.Context(), tc.args, &stdout, &stderr)
if err == nil {
t.Fatalf("runWithIO(%v) returned no error", tc.args)
}
if !strings.Contains(err.Error(), "flag provided but not defined") {
t.Fatalf("runWithIO(%v) error = %v; want 'flag provided but not defined'", tc.args, err)
}
})
}
}
// TestRunBenchReadBulkRespectsContextCancellation pins the Client.Go-023
// fix: the warm-up and steady-state wall-clock loops must honour ctx.Err()
// so an external cancel (Ctrl+C, parent-cancel from a cross-language bench
// driver) short-circuits the bench instead of spinning failing ReadBulk
// calls until the wall-clock deadline elapses.
func TestRunBenchReadBulkRespectsContextCancellation(t *testing.T) {
listener, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("listen: %v", err)
}
server := grpc.NewServer()
fake := &benchFakeGateway{}
pb.RegisterMxAccessGatewayServer(server, fake)
go func() {
_ = server.Serve(listener)
}()
defer server.Stop()
defer listener.Close()
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// Long warm-up + duration, so if the ctx.Err() guard were missing the
// loops would run for ~10s. With the guard, the cancel below short-
// circuits both loops within ~one ReadBulk iteration.
args := []string{
"bench-read-bulk",
"-endpoint", listener.Addr().String(),
"-plaintext",
"-api-key", "test",
"-warmup-seconds", "5",
"-duration-seconds", "5",
"-bulk-size", "1",
"-timeout-ms", "100",
}
// Cancel after a brief delay — far less than warmup+duration (10s).
go func() {
time.Sleep(150 * time.Millisecond)
cancel()
}()
var stdout, stderr bytes.Buffer
start := time.Now()
err = runWithIO(ctx, args, &stdout, &stderr)
elapsed := time.Since(start)
// With the ctx.Err() guard, the loops exit well before the wall-clock
// deadlines (warmup=5s + duration=5s = 10s). Allow generous slack for
// CI noise but assert clearly less than the un-guarded worst case.
if elapsed > 4*time.Second {
t.Fatalf("bench-read-bulk took %s after ctx cancel; want <4s (ctx.Err() guard missing?). err=%v stderr=%s", elapsed, err, stderr.String())
}
}
// benchFakeGateway is a minimal MxAccessGatewayServer that satisfies the
// bench-read-bulk session-setup sequence (OpenSession + Invoke for Register
// / SubscribeBulk / ReadBulk / UnsubscribeBulk / CloseSession).
type benchFakeGateway struct {
pb.UnimplementedMxAccessGatewayServer
}
func (g *benchFakeGateway) OpenSession(_ context.Context, _ *pb.OpenSessionRequest) (*pb.OpenSessionReply, error) {
return &pb.OpenSessionReply{
SessionId: "bench-session",
ProtocolStatus: &pb.ProtocolStatus{Code: pb.ProtocolStatusCode_PROTOCOL_STATUS_CODE_OK},
}, nil
}
func (g *benchFakeGateway) CloseSession(_ context.Context, req *pb.CloseSessionRequest) (*pb.CloseSessionReply, error) {
return &pb.CloseSessionReply{
SessionId: req.GetSessionId(),
ProtocolStatus: &pb.ProtocolStatus{Code: pb.ProtocolStatusCode_PROTOCOL_STATUS_CODE_OK},
}, nil
}
func (g *benchFakeGateway) Invoke(_ context.Context, req *pb.MxCommandRequest) (*pb.MxCommandReply, error) {
kind := req.GetCommand().GetKind()
reply := &pb.MxCommandReply{
SessionId: req.GetSessionId(),
Kind: kind,
ProtocolStatus: &pb.ProtocolStatus{Code: pb.ProtocolStatusCode_PROTOCOL_STATUS_CODE_OK},
}
switch kind {
case pb.MxCommandKind_MX_COMMAND_KIND_REGISTER:
reply.Payload = &pb.MxCommandReply_Register{Register: &pb.RegisterReply{ServerHandle: 1}}
case pb.MxCommandKind_MX_COMMAND_KIND_SUBSCRIBE_BULK:
reply.Payload = &pb.MxCommandReply_SubscribeBulk{SubscribeBulk: &pb.BulkSubscribeReply{
Results: []*pb.SubscribeResult{{ServerHandle: 1, ItemHandle: 1, WasSuccessful: true}},
}}
case pb.MxCommandKind_MX_COMMAND_KIND_READ_BULK:
reply.Payload = &pb.MxCommandReply_ReadBulk{ReadBulk: &pb.BulkReadReply{
Results: []*pb.BulkReadResult{{ItemHandle: 1, WasSuccessful: true, WasCached: true}},
}}
case pb.MxCommandKind_MX_COMMAND_KIND_UNSUBSCRIBE_BULK:
reply.Payload = &pb.MxCommandReply_UnsubscribeBulk{UnsubscribeBulk: &pb.BulkSubscribeReply{}}
}
return reply, nil
}
// TestRunBenchReadBulkRejectsNonPositiveBulkSize pins the Client.Go-023-adjacent
// positivity checks so they cannot drift while resolving the cancellation finding.
func TestRunBenchReadBulkRejectsNonPositiveBulkSize(t *testing.T) {
var stdout, stderr bytes.Buffer
err := runWithIO(t.Context(), []string{"bench-read-bulk", "-bulk-size", "0"}, &stdout, &stderr)
if err == nil || !strings.Contains(err.Error(), "bulk-size must be positive") {
t.Fatalf("bench-read-bulk -bulk-size 0 error = %v", err)
}
}
// TestRunBatchSkipsBlankLinesAndContinuesUntilEOF pins the Client.Go-027 fix:
// a blank line in the middle of a batch session must NOT terminate the loop —
// only stdin EOF ends the session.
func TestRunBatchSkipsBlankLinesAndContinuesUntilEOF(t *testing.T) {
var stdout, stderr bytes.Buffer
// version -> blank -> version (a stray blank line in the middle of a
// programmatic session).
in := strings.NewReader("version --json\n\nversion --json\n")
if err := runBatch(t.Context(), in, &stdout, &stderr); err != nil {
t.Fatalf("runBatch() error = %v; stderr = %s", err, stderr.String())
}
out := stdout.String()
// Both version commands must have produced a result before the EOR sentinel.
if count := strings.Count(out, batchEOR); count != 2 {
t.Fatalf("EOR sentinel count = %d, want 2 (one per command, blank line skipped); out = %q", count, out)
}
}
// TestRunBatchHandlesLongCommandLine pins the Client.Go-026 fix: a command
// line longer than the default bufio.Scanner token size (64 KiB) must not
// abort the batch session.
func TestRunBatchHandlesLongCommandLine(t *testing.T) {
var stdout, stderr bytes.Buffer
// Build a single command line larger than 64 KiB. The command itself is
// invalid (no real session) but runBatch must still emit an EOR sentinel
// and continue to the next command rather than dropping the line on the
// floor with a bufio.ErrTooLong from the outer return.
huge := strings.Repeat("tag-with-a-reasonably-long-name-and-suffix,", 2000) + "trailing"
line := "subscribe-bulk -session-id none -items " + huge
if len(line) <= 64*1024 {
t.Fatalf("test setup error: long line length = %d, want > 64KiB", len(line))
}
in := strings.NewReader(line + "\nversion --json\n")
if err := runBatch(t.Context(), in, &stdout, &stderr); err != nil {
t.Fatalf("runBatch() error = %v; stderr = %s", err, stderr.String())
}
out := stdout.String()
// Both commands must produce an EOR sentinel — the long line should be a
// per-command error (still emitted with EOR), then the version command
// should run normally.
if count := strings.Count(out, batchEOR); count != 2 {
t.Fatalf("EOR sentinel count = %d, want 2 (one per command, even when first is too long); out length = %d", count, len(out))
}
}
+75
View File
@@ -168,6 +168,66 @@ func TestQueryActiveAlarmsPassesFilterPrefix(t *testing.T) {
}
}
func TestStreamAlarmsPassesFilterPrefixAndReceivesFeedMessages(t *testing.T) {
fake := &fakeGatewayWithAlarms{
feedMessages: []*pb.AlarmFeedMessage{
{
Payload: &pb.AlarmFeedMessage_ActiveAlarm{
ActiveAlarm: &pb.ActiveAlarmSnapshot{
AlarmFullReference: "Tank01.Level.HiHi",
CurrentState: pb.AlarmConditionState_ALARM_CONDITION_STATE_ACTIVE,
},
},
},
{
Payload: &pb.AlarmFeedMessage_SnapshotComplete{
SnapshotComplete: true,
},
},
},
}
client, cleanup := newBufconnClientWithAlarms(t, fake)
defer cleanup()
stream, err := client.StreamAlarms(context.Background(), &pb.StreamAlarmsRequest{
AlarmFilterPrefix: "Tank01.",
})
if err != nil {
t.Fatalf("StreamAlarms() error = %v", err)
}
var received []*pb.AlarmFeedMessage
for {
msg, err := stream.Recv()
if errors.Is(err, io.EOF) {
break
}
if err != nil {
t.Fatalf("stream.Recv() error = %v", err)
}
received = append(received, msg)
}
if len(received) != 2 {
t.Fatalf("received count = %d, want 2", len(received))
}
if got := fake.streamRequest.GetAlarmFilterPrefix(); got != "Tank01." {
t.Fatalf("captured filter prefix = %q", got)
}
if got := fake.streamAuth; got != "Bearer test-api-key" {
t.Fatalf("stream authorization metadata = %q", got)
}
}
func TestStreamAlarmsRejectsNilRequest(t *testing.T) {
fake := &fakeGatewayWithAlarms{}
client, cleanup := newBufconnClientWithAlarms(t, fake)
defer cleanup()
if _, err := client.StreamAlarms(context.Background(), nil); err == nil {
t.Fatal("StreamAlarms(nil) returned no error")
}
}
type fakeGatewayWithAlarms struct {
pb.UnimplementedMxAccessGatewayServer
@@ -178,6 +238,10 @@ type fakeGatewayWithAlarms struct {
queryRequest *pb.QueryActiveAlarmsRequest
activeSnapshots []*pb.ActiveAlarmSnapshot
streamRequest *pb.StreamAlarmsRequest
feedMessages []*pb.AlarmFeedMessage
streamAuth string
}
func (s *fakeGatewayWithAlarms) AcknowledgeAlarm(ctx context.Context, req *pb.AcknowledgeAlarmRequest) (*pb.AcknowledgeAlarmReply, error) {
@@ -207,6 +271,17 @@ func (s *fakeGatewayWithAlarms) QueryActiveAlarms(req *pb.QueryActiveAlarmsReque
return nil
}
func (s *fakeGatewayWithAlarms) StreamAlarms(req *pb.StreamAlarmsRequest, stream grpc.ServerStreamingServer[pb.AlarmFeedMessage]) error {
s.streamRequest = req
s.streamAuth = authorizationFromContext(stream.Context())
for _, msg := range s.feedMessages {
if err := stream.Send(msg); err != nil {
return err
}
}
return nil
}
func newBufconnClientWithAlarms(t *testing.T, fake *fakeGatewayWithAlarms) (*Client, func()) {
t.Helper()
listener := bufconn.Listen(bufSize)
+200
View File
@@ -230,6 +230,206 @@ func TestSubscribeBulkBuildsOneBulkCommandAndReturnsResults(t *testing.T) {
}
}
func TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults(t *testing.T) {
fake := &fakeGatewayServer{
invokeReply: &pb.MxCommandReply{
SessionId: "session-1",
Kind: pb.MxCommandKind_MX_COMMAND_KIND_WRITE_BULK,
ProtocolStatus: &pb.ProtocolStatus{
Code: pb.ProtocolStatusCode_PROTOCOL_STATUS_CODE_OK,
},
Payload: &pb.MxCommandReply_WriteBulk{
WriteBulk: &pb.BulkWriteReply{
Results: []*pb.BulkWriteResult{
{ItemHandle: 10, WasSuccessful: true},
{ItemHandle: 11, WasSuccessful: true},
},
},
},
},
}
client, cleanup := newBufconnClient(t, fake)
defer cleanup()
session := NewSessionForID(client, "session-1")
entries := []*WriteBulkEntry{
{ItemHandle: 10, Value: Int32Value(7), UserId: 100},
{ItemHandle: 11, Value: Int32Value(8), UserId: 100},
}
results, err := session.WriteBulk(context.Background(), 12, entries)
if err != nil {
t.Fatalf("WriteBulk() error = %v", err)
}
if len(results) != 2 {
t.Fatalf("results len = %d, want 2", len(results))
}
req := fake.invokeRequest
if req.GetCommand().GetKind() != pb.MxCommandKind_MX_COMMAND_KIND_WRITE_BULK {
t.Fatalf("command kind = %s", req.GetCommand().GetKind())
}
if got := req.GetCommand().GetWriteBulk().GetEntries(); len(got) != 2 {
t.Fatalf("entry count = %d, want 2", len(got))
}
}
func TestWriteBulkRejectsNilEntries(t *testing.T) {
fake := &fakeGatewayServer{}
client, cleanup := newBufconnClient(t, fake)
defer cleanup()
session := NewSessionForID(client, "session-1")
if _, err := session.WriteBulk(context.Background(), 12, nil); err == nil {
t.Fatal("WriteBulk(nil) returned no error")
}
if _, err := session.Write2Bulk(context.Background(), 12, nil); err == nil {
t.Fatal("Write2Bulk(nil) returned no error")
}
if _, err := session.WriteSecuredBulk(context.Background(), 12, nil); err == nil {
t.Fatal("WriteSecuredBulk(nil) returned no error")
}
if _, err := session.WriteSecured2Bulk(context.Background(), 12, nil); err == nil {
t.Fatal("WriteSecured2Bulk(nil) returned no error")
}
if _, err := session.ReadBulk(context.Background(), 12, nil, 0); err == nil {
t.Fatal("ReadBulk(nil) returned no error")
}
}
func TestBulkMethodsShortCircuitOnEmptySliceWithoutRoundTrip(t *testing.T) {
fake := &fakeGatewayServer{
invokeReply: &pb.MxCommandReply{
ProtocolStatus: &pb.ProtocolStatus{
Code: pb.ProtocolStatusCode_PROTOCOL_STATUS_CODE_OK,
},
},
}
client, cleanup := newBufconnClient(t, fake)
defer cleanup()
session := NewSessionForID(client, "session-1")
results, err := session.WriteBulk(context.Background(), 12, []*WriteBulkEntry{})
if err != nil {
t.Fatalf("WriteBulk(empty) error = %v", err)
}
if len(results) != 0 {
t.Fatalf("WriteBulk(empty) results len = %d, want 0", len(results))
}
if fake.invokeRequest != nil {
t.Fatal("WriteBulk(empty) sent a round trip; expected short-circuit")
}
results2, err := session.Write2Bulk(context.Background(), 12, []*Write2BulkEntry{})
if err != nil {
t.Fatalf("Write2Bulk(empty) error = %v", err)
}
if len(results2) != 0 {
t.Fatalf("Write2Bulk(empty) results len = %d, want 0", len(results2))
}
if fake.invokeRequest != nil {
t.Fatal("Write2Bulk(empty) sent a round trip; expected short-circuit")
}
results3, err := session.WriteSecuredBulk(context.Background(), 12, []*WriteSecuredBulkEntry{})
if err != nil {
t.Fatalf("WriteSecuredBulk(empty) error = %v", err)
}
if len(results3) != 0 {
t.Fatalf("WriteSecuredBulk(empty) results len = %d, want 0", len(results3))
}
if fake.invokeRequest != nil {
t.Fatal("WriteSecuredBulk(empty) sent a round trip; expected short-circuit")
}
results4, err := session.WriteSecured2Bulk(context.Background(), 12, []*WriteSecured2BulkEntry{})
if err != nil {
t.Fatalf("WriteSecured2Bulk(empty) error = %v", err)
}
if len(results4) != 0 {
t.Fatalf("WriteSecured2Bulk(empty) results len = %d, want 0", len(results4))
}
if fake.invokeRequest != nil {
t.Fatal("WriteSecured2Bulk(empty) sent a round trip; expected short-circuit")
}
readResults, err := session.ReadBulk(context.Background(), 12, []string{}, 0)
if err != nil {
t.Fatalf("ReadBulk(empty) error = %v", err)
}
if len(readResults) != 0 {
t.Fatalf("ReadBulk(empty) results len = %d, want 0", len(readResults))
}
if fake.invokeRequest != nil {
t.Fatal("ReadBulk(empty) sent a round trip; expected short-circuit")
}
}
func TestReadBulkForwardsTimeoutAndUnpacksCachedFlag(t *testing.T) {
fake := &fakeGatewayServer{
invokeReply: &pb.MxCommandReply{
SessionId: "session-1",
Kind: pb.MxCommandKind_MX_COMMAND_KIND_READ_BULK,
ProtocolStatus: &pb.ProtocolStatus{
Code: pb.ProtocolStatusCode_PROTOCOL_STATUS_CODE_OK,
},
Payload: &pb.MxCommandReply_ReadBulk{
ReadBulk: &pb.BulkReadReply{
Results: []*pb.BulkReadResult{
{TagAddress: "Tank01.Level", WasSuccessful: true, WasCached: true},
{TagAddress: "Tank02.Level", WasSuccessful: true, WasCached: false},
},
},
},
},
}
client, cleanup := newBufconnClient(t, fake)
defer cleanup()
session := NewSessionForID(client, "session-1")
results, err := session.ReadBulk(context.Background(), 12, []string{"Tank01.Level", "Tank02.Level"}, 250*time.Millisecond)
if err != nil {
t.Fatalf("ReadBulk() error = %v", err)
}
if len(results) != 2 {
t.Fatalf("results len = %d, want 2", len(results))
}
if !results[0].GetWasCached() || results[1].GetWasCached() {
t.Fatalf("WasCached flags = [%v %v], want [true false]", results[0].GetWasCached(), results[1].GetWasCached())
}
req := fake.invokeRequest
if req.GetCommand().GetKind() != pb.MxCommandKind_MX_COMMAND_KIND_READ_BULK {
t.Fatalf("command kind = %s", req.GetCommand().GetKind())
}
if got := req.GetCommand().GetReadBulk().GetTimeoutMs(); got != 250 {
t.Fatalf("timeout ms = %d, want 250", got)
}
}
func TestReadBulkSaturatesTimeoutAboveMaxUint32(t *testing.T) {
fake := &fakeGatewayServer{
invokeReply: &pb.MxCommandReply{
SessionId: "session-1",
Kind: pb.MxCommandKind_MX_COMMAND_KIND_READ_BULK,
ProtocolStatus: &pb.ProtocolStatus{
Code: pb.ProtocolStatusCode_PROTOCOL_STATUS_CODE_OK,
},
},
}
client, cleanup := newBufconnClient(t, fake)
defer cleanup()
session := NewSessionForID(client, "session-1")
// 100 days in milliseconds exceeds MaxUint32 (~49.7 days).
hugeTimeout := 100 * 24 * time.Hour
_, err := session.ReadBulk(context.Background(), 12, []string{"Tank01.Level"}, hugeTimeout)
if err != nil {
t.Fatalf("ReadBulk() error = %v", err)
}
got := fake.invokeRequest.GetCommand().GetReadBulk().GetTimeoutMs()
if got != ^uint32(0) {
t.Fatalf("timeout ms = %d, want %d (MaxUint32)", got, ^uint32(0))
}
}
func TestInvokeReturnsTypedMxAccessErrorWithRawReply(t *testing.T) {
hresult := int32(-2147467259)
fake := &fakeGatewayServer{
+31
View File
@@ -392,6 +392,9 @@ func (s *Session) UnsubscribeBulk(ctx context.Context, serverHandle int32, itemH
// Per-entry failures appear as BulkWriteResult entries with WasSuccessful=false; the call
// never returns an error for per-entry MXAccess failures (it returns an error only for
// protocol-level failures or transport errors).
//
// A non-nil but empty entries slice is treated as a no-op and returns an empty result
// without a wire round-trip; pass nil to surface a clear "entries are required" error.
func (s *Session) WriteBulk(ctx context.Context, serverHandle int32, entries []*WriteBulkEntry) ([]*BulkWriteResult, error) {
if entries == nil {
return nil, errors.New("mxgateway: write bulk entries are required")
@@ -399,6 +402,9 @@ func (s *Session) WriteBulk(ctx context.Context, serverHandle int32, entries []*
if err := ensureBulkSize("write bulk entries", len(entries)); err != nil {
return nil, err
}
if len(entries) == 0 {
return []*BulkWriteResult{}, nil
}
reply, err := s.invokeCommand(ctx, &pb.MxCommand{
Kind: pb.MxCommandKind_MX_COMMAND_KIND_WRITE_BULK,
Payload: &pb.MxCommand_WriteBulk{
@@ -415,6 +421,9 @@ func (s *Session) WriteBulk(ctx context.Context, serverHandle int32, entries []*
}
// Write2Bulk invokes MXAccess Write2 (timestamped) for each entry inside one gateway command.
//
// A non-nil but empty entries slice is treated as a no-op and returns an empty result
// without a wire round-trip; pass nil to surface a clear "entries are required" error.
func (s *Session) Write2Bulk(ctx context.Context, serverHandle int32, entries []*Write2BulkEntry) ([]*BulkWriteResult, error) {
if entries == nil {
return nil, errors.New("mxgateway: write2 bulk entries are required")
@@ -422,6 +431,9 @@ func (s *Session) Write2Bulk(ctx context.Context, serverHandle int32, entries []
if err := ensureBulkSize("write2 bulk entries", len(entries)); err != nil {
return nil, err
}
if len(entries) == 0 {
return []*BulkWriteResult{}, nil
}
reply, err := s.invokeCommand(ctx, &pb.MxCommand{
Kind: pb.MxCommandKind_MX_COMMAND_KIND_WRITE2_BULK,
Payload: &pb.MxCommand_Write2Bulk{
@@ -439,6 +451,9 @@ func (s *Session) Write2Bulk(ctx context.Context, serverHandle int32, entries []
// WriteSecuredBulk invokes MXAccess WriteSecured for each entry. Credential-sensitive
// values must not be logged by callers; mirrors the single-item WriteSecured contract.
//
// A non-nil but empty entries slice is treated as a no-op and returns an empty result
// without a wire round-trip; pass nil to surface a clear "entries are required" error.
func (s *Session) WriteSecuredBulk(ctx context.Context, serverHandle int32, entries []*WriteSecuredBulkEntry) ([]*BulkWriteResult, error) {
if entries == nil {
return nil, errors.New("mxgateway: write-secured bulk entries are required")
@@ -446,6 +461,9 @@ func (s *Session) WriteSecuredBulk(ctx context.Context, serverHandle int32, entr
if err := ensureBulkSize("write-secured bulk entries", len(entries)); err != nil {
return nil, err
}
if len(entries) == 0 {
return []*BulkWriteResult{}, nil
}
reply, err := s.invokeCommand(ctx, &pb.MxCommand{
Kind: pb.MxCommandKind_MX_COMMAND_KIND_WRITE_SECURED_BULK,
Payload: &pb.MxCommand_WriteSecuredBulk{
@@ -462,6 +480,9 @@ func (s *Session) WriteSecuredBulk(ctx context.Context, serverHandle int32, entr
}
// WriteSecured2Bulk invokes MXAccess WriteSecured2 (timestamped) for each entry.
//
// A non-nil but empty entries slice is treated as a no-op and returns an empty result
// without a wire round-trip; pass nil to surface a clear "entries are required" error.
func (s *Session) WriteSecured2Bulk(ctx context.Context, serverHandle int32, entries []*WriteSecured2BulkEntry) ([]*BulkWriteResult, error) {
if entries == nil {
return nil, errors.New("mxgateway: write-secured2 bulk entries are required")
@@ -469,6 +490,9 @@ func (s *Session) WriteSecured2Bulk(ctx context.Context, serverHandle int32, ent
if err := ensureBulkSize("write-secured2 bulk entries", len(entries)); err != nil {
return nil, err
}
if len(entries) == 0 {
return []*BulkWriteResult{}, nil
}
reply, err := s.invokeCommand(ctx, &pb.MxCommand{
Kind: pb.MxCommandKind_MX_COMMAND_KIND_WRITE_SECURED2_BULK,
Payload: &pb.MxCommand_WriteSecured2Bulk{
@@ -492,6 +516,10 @@ func (s *Session) WriteSecured2Bulk(ctx context.Context, serverHandle int32, ent
// otherwise. timeout bounds the wait per tag in the snapshot case; pass zero to use the
// worker default. Per-tag failures (timeout, invalid tag) appear as BulkReadResult entries
// with WasSuccessful=false; the call never returns an error for per-tag MXAccess failures.
//
// A non-nil but empty tagAddresses slice is treated as a no-op and returns an empty
// result without a wire round-trip; pass nil to surface a clear "tag addresses are
// required" error.
func (s *Session) ReadBulk(ctx context.Context, serverHandle int32, tagAddresses []string, timeout time.Duration) ([]*BulkReadResult, error) {
if tagAddresses == nil {
return nil, errors.New("mxgateway: tag addresses are required")
@@ -499,6 +527,9 @@ func (s *Session) ReadBulk(ctx context.Context, serverHandle int32, tagAddresses
if err := ensureBulkSize("tag addresses", len(tagAddresses)); err != nil {
return nil, err
}
if len(tagAddresses) == 0 {
return []*BulkReadResult{}, nil
}
var timeoutMs uint32
if timeout > 0 {
ms := timeout.Milliseconds()
+2 -2
View File
@@ -198,8 +198,8 @@ mxgw-py register --session-id <id> --client-name python-client --json
mxgw-py add-item --session-id <id> --server-handle 1 --item Object.Attribute --json
mxgw-py advise --session-id <id> --server-handle 1 --item-handle 2 --json
mxgw-py stream-events --session-id <id> --max-events 1 --json
mxgw-py stream-alarms --session-id <id> --max-messages 1 --json
mxgw-py acknowledge-alarm --session-id <id> --alarm-reference "\\Galaxy\Area001.Pump001.PumpFault" --json
mxgw-py stream-alarms --max-messages 1 --json
mxgw-py acknowledge-alarm --reference "\\Galaxy\Area001.Pump001.PumpFault" --json
mxgw-py write --session-id <id> --server-handle 1 --item-handle 2 --type int32 --value 123 --json
```
@@ -3,15 +3,18 @@
from __future__ import annotations
import asyncio
import contextlib
import io
import json
import logging
import os
import sys
import time
from collections.abc import Awaitable, Callable
from datetime import datetime, timezone
from typing import Any
import click
from click.testing import CliRunner
from google.protobuf.json_format import MessageToDict
from zb_mom_ww_mxgateway import __version__
@@ -22,6 +25,8 @@ from zb_mom_ww_mxgateway.generated import mxaccess_gateway_pb2 as pb
from zb_mom_ww_mxgateway.options import ClientOptions
from zb_mom_ww_mxgateway.values import MxValueInput, to_mx_value
logger = logging.getLogger(__name__)
MAX_AGGREGATE_EVENTS = 10_000
_BATCH_EOR = "__MXGW_BATCH_EOR__"
@@ -56,9 +61,10 @@ def batch() -> None:
Errors do NOT terminate the loop. Each command's output (including any error JSON) is
written to stdout followed by a line containing exactly ``__MXGW_BATCH_EOR__``, then
stdout is flushed. Error output is formatted as ``{"error": "...", "type": "..."}``.
"""
runner = CliRunner()
Recursive ``batch`` lines are rejected (Client.Python-024) — re-entering the batch
dispatcher would silently spawn a nested loop reading from the same exhausted stdin.
"""
for raw_line in sys.stdin:
line = raw_line.rstrip("\n").rstrip("\r")
@@ -68,44 +74,77 @@ def batch() -> None:
args = line.split()
try:
result = runner.invoke(main, args, catch_exceptions=True)
except Exception as exc: # noqa: BLE001 — be safe; never let batch loop die
_batch_write_error(exc.__class__.__name__, str(exc))
# Reject a recursive `batch` line outright: the nested invocation would
# read from the already-exhausted stdin (or, depending on harness, the
# same stream the outer batch is consuming line-by-line) and silently
# exit. Surface it as an explicit error block so callers can audit the
# mis-routed line.
if args and args[0] == "batch":
_batch_write_error(
"RecursiveBatchError",
"nested 'batch' invocation is not allowed inside batch mode",
)
_batch_flush_eor()
continue
if result.exit_code == 0:
# Normal success — write captured output as-is.
sys.stdout.write(result.output)
_dispatch_batch_line(args)
def _dispatch_batch_line(args: list[str]) -> None:
"""Run a single batch line through the Click parser directly (no CliRunner).
Captures the subcommand's stdout via :func:`contextlib.redirect_stdout` and
synthesises the standard ``{"error": ..., "type": ...}`` shape on failure.
Click exceptions (`ClickException`, `UsageError`) are caught and rendered;
`SystemExit(0)` from a Click command is treated as a clean exit, while a
non-zero `SystemExit` is rendered as a CLI error. All other exceptions are
captured and rendered as `{"error": str(exc), "type": exc.__class__.__name__}`
so the loop never dies.
"""
buffer = io.StringIO()
exit_code = 0
exc: BaseException | None = None
try:
with contextlib.redirect_stdout(buffer):
try:
# `standalone_mode=False` makes Click raise instead of calling
# `sys.exit`; we still need to handle SystemExit because some
# commands explicitly raise it (or `click.UsageError` converts
# to a SystemExit under some entry-point paths).
main.main(args=args, standalone_mode=False, prog_name="mxgw-py")
except click.exceptions.Exit as click_exit:
exit_code = click_exit.exit_code
except click.ClickException as click_exc:
exit_code = click_exc.exit_code
exc = click_exc
click.echo(f"Error: {click_exc.format_message()}", err=False)
except SystemExit as sys_exit:
code = sys_exit.code
exit_code = int(code) if isinstance(code, int) else (0 if code is None else 1)
except Exception as captured: # noqa: BLE001 — never let batch loop die
exc = captured
exit_code = 1
output = buffer.getvalue()
if exit_code == 0 and exc is None:
sys.stdout.write(output)
else:
if output.lstrip().startswith("{"):
# Inner command already emitted JSON (e.g. a structured error) —
# relay verbatim.
sys.stdout.write(output)
if output and not output.endswith("\n"):
sys.stdout.write("\n")
elif exc is not None:
_batch_write_error(type(exc).__name__, str(exc))
else:
# Something went wrong. If the command already emitted a JSON object
# (e.g. the output starts with '{'), trust that and relay it verbatim.
# Otherwise synthesise the standard {"error": ..., "type": ...} shape.
output = result.output or ""
exc = result.exception
msg = output.strip()
if msg.startswith("Error: "):
msg = msg[len("Error: "):]
_batch_write_error("CliError", msg)
if output.lstrip().startswith("{"):
# Already JSON — relay verbatim (may or may not end with newline).
sys.stdout.write(output)
if not output.endswith("\n"):
sys.stdout.write("\n")
elif exc is not None and not isinstance(exc, SystemExit):
_batch_write_error(type(exc).__name__, str(exc))
else:
# Click's default error format is "Error: <message>\n"; extract the
# message so the harness gets clean JSON.
msg = output.strip()
if msg.startswith("Error: "):
msg = msg[len("Error: "):]
exc_type = (
type(exc).__name__
if exc is not None and not isinstance(exc, SystemExit)
else "CliError"
)
_batch_write_error(exc_type, msg)
_batch_flush_eor()
_batch_flush_eor()
def _batch_write_error(exc_type: str, message: str) -> None:
@@ -673,7 +712,6 @@ async def _write_secured2_bulk(**kwargs: Any) -> dict[str, Any]:
async def _bench_read_bulk(**kwargs: Any) -> dict[str, Any]:
"""ReadBulk stress benchmark — matches the .NET / Go / Rust / Java schema."""
import time
bulk_size = int(kwargs["bulk_size"])
if bulk_size < 1:
@@ -730,12 +768,12 @@ async def _bench_read_bulk(**kwargs: Any) -> dict[str, Any]:
if item_handles:
try:
await session.unsubscribe_bulk(server_handle, item_handles)
except Exception:
pass
except Exception as exc: # noqa: BLE001 — bench is best-effort
logger.warning("bench-read-bulk: unsubscribe_bulk cleanup failed: %s", exc)
try:
await session.close()
except Exception:
pass
except Exception as exc: # noqa: BLE001 — bench is best-effort
logger.warning("bench-read-bulk: session.close cleanup failed: %s", exc)
return {
"language": "python",
@@ -899,11 +937,21 @@ def _session(client: GatewayClient, session_id: str):
def _use_plaintext(kwargs: dict[str, Any]) -> bool:
if kwargs.get("use_tls"):
return False
if kwargs.get("plaintext"):
return True
return kwargs["endpoint"].startswith("localhost:") or kwargs["endpoint"].startswith("127.0.0.1:")
"""Resolve the plaintext / TLS contract from the CLI flags.
TLS is the default. ``--plaintext`` is the only way to opt in to an
unencrypted channel; ``--tls`` is accepted as a redundant explicit
affirmation. Combining the two is a usage error (regression-guarded by
Client.Python-023 — the previous silent ``localhost:`` /
``127.0.0.1:`` auto-plaintext branch leaked the API-key bearer over a
plaintext channel when a user ran the gateway behind TLS on loopback).
"""
plaintext = bool(kwargs.get("plaintext"))
use_tls = bool(kwargs.get("use_tls"))
if plaintext and use_tls:
raise click.UsageError("--plaintext and --tls are mutually exclusive")
return plaintext
def _api_key_from_env(name: str | None) -> str | None:
@@ -0,0 +1,789 @@
"""Regression tests for Client.Python-022..026.
Each test corresponds to a finding from the latest re-review. Tests are
TDD-first they failed against the pre-fix source and pass against the
fixed source.
"""
from __future__ import annotations
import json
import re
import time as _time_module_ref
from pathlib import Path
from typing import Any
import pytest
from click.testing import CliRunner
from zb_mom_ww_mxgateway import ClientOptions, GatewayClient
from zb_mom_ww_mxgateway.generated import mxaccess_gateway_pb2 as pb
from zb_mom_ww_mxgateway_cli import commands as cli_commands
from zb_mom_ww_mxgateway_cli.commands import _use_plaintext, main
_BATCH_EOR = "__MXGW_BATCH_EOR__"
# ---------------------------------------------------------------------------
# Client.Python-022 — README CLI examples must parse against the implementation.
# ---------------------------------------------------------------------------
def _readme_path() -> Path:
return Path(__file__).resolve().parent.parent / "README.md"
def _extract_mxgw_py_examples() -> list[list[str]]:
"""Return the README's ``mxgw-py ...`` example lines as click arg lists.
Replaces angle-bracket placeholders (``<id>``) with safe stub values and
leaves real flag names untouched. The returned arg lists drop the
``mxgw-py`` prefix.
"""
text = _readme_path().read_text(encoding="utf-8")
args: list[list[str]] = []
for raw_line in text.splitlines():
line = raw_line.strip()
if not line.startswith("mxgw-py "):
continue
# Strip the leading "mxgw-py " token.
body = line[len("mxgw-py ") :]
# Replace common placeholders so click does not error on the placeholder.
body = body.replace("<id>", "session-1")
# Backtick-quoted hostnames in the TLS example are not represented
# in CLI; safe to leave as-is.
tokens = _split_cli_tokens(body)
# Keep only examples that exercise a real subcommand. Skip TLS
# multi-flag example (we only need the README CLI examples added in
# commits 8738735 — stream-alarms / acknowledge-alarm).
args.append(tokens)
return args
def _split_cli_tokens(body: str) -> list[str]:
"""Split a CLI body into argv tokens, honouring double-quoted strings."""
tokens: list[str] = []
pattern = re.compile(r'"([^"]*)"|(\S+)')
for match in pattern.finditer(body):
quoted, plain = match.group(1), match.group(2)
tokens.append(quoted if quoted is not None else plain)
return tokens
def test_readme_alarm_examples_parse_against_cli() -> None:
"""README `stream-alarms` / `acknowledge-alarm` examples must parse without
triggering Click's ``no such option`` error.
Drives every README ``mxgw-py`` example through Click's ``--help`` style
parser by re-invoking the documented argv with a trailing ``--help`` flag so
only the parser runs (no RPC is attempted). If a documented flag does not
exist on the subcommand, Click prints ``no such option: --<flag>`` and
exits 2 that is the regression we want to catch.
"""
runner = CliRunner()
examples = _extract_mxgw_py_examples()
assert any(
"stream-alarms" in args for args in examples
), "README must include a stream-alarms example."
assert any(
"acknowledge-alarm" in args for args in examples
), "README must include an acknowledge-alarm example."
for argv in examples:
# Strip "--json" (already a real flag) and any value-bearing flag that
# requires a host/file/value, then append --help so we exercise the
# parser only.
# We just append --help — Click parses all options up to --help and
# then prints help; an unknown option still errors out first.
result = runner.invoke(main, [*argv, "--help"])
# Either help text printed (exit 0) or some other parser issue (exit 2);
# we only want to assert NO "no such option" error.
assert "no such option" not in result.output.lower(), (
f"README example failed Click parsing: argv={argv!r}\n"
f"output={result.output!r}"
)
# ---------------------------------------------------------------------------
# Client.Python-023 — REGRESSION of Client.Python-013. _use_plaintext must
# not silently auto-downgrade on localhost / 127.0.0.1.
# ---------------------------------------------------------------------------
def test_use_plaintext_does_not_auto_downgrade_for_localhost_endpoint() -> None:
"""A bare ``localhost:...`` endpoint with no flags must default to TLS."""
assert _use_plaintext({
"endpoint": "localhost:5001",
"plaintext": False,
"use_tls": False,
}) is False
def test_use_plaintext_does_not_auto_downgrade_for_loopback_ipv4_endpoint() -> None:
"""A bare ``127.0.0.1:...`` endpoint with no flags must default to TLS."""
assert _use_plaintext({
"endpoint": "127.0.0.1:5001",
"plaintext": False,
"use_tls": False,
}) is False
def test_use_plaintext_requires_explicit_plaintext_flag() -> None:
"""``--plaintext`` is the only way to opt in."""
assert _use_plaintext({
"endpoint": "localhost:5001",
"plaintext": True,
"use_tls": False,
}) is True
def test_use_plaintext_tls_flag_explicitly_disables_plaintext() -> None:
"""``--tls`` is accepted as an explicit affirmation of the default."""
assert _use_plaintext({
"endpoint": "localhost:5001",
"plaintext": False,
"use_tls": True,
}) is False
def test_use_plaintext_rejects_plaintext_and_tls_combined() -> None:
"""``--plaintext`` and ``--tls`` together must be rejected as ambiguous."""
import click as _click
with pytest.raises(_click.UsageError):
_use_plaintext({
"endpoint": "localhost:5001",
"plaintext": True,
"use_tls": True,
})
def test_cli_localhost_endpoint_with_no_flags_uses_tls_channel(monkeypatch) -> None:
"""End-to-end CLI: against ``localhost:...`` with no flags, the resolved
``ClientOptions.plaintext`` flowing into ``GatewayClient.connect`` must be
``False`` (TLS), so the API key bearer cannot leak over plaintext.
"""
captured: dict[str, Any] = {}
class _FakeStub:
def __init__(self) -> None:
pass
async def OpenSession(self, request: Any, *, metadata: tuple[Any, ...]) -> Any:
captured["metadata"] = metadata
return pb.OpenSessionReply(
session_id="session-1",
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
)
real_connect = GatewayClient.connect
@classmethod
async def _spy_connect(cls, options: ClientOptions, **kwargs: Any) -> GatewayClient:
captured["options"] = options
return await real_connect(options, stub=_FakeStub())
monkeypatch.setattr(GatewayClient, "connect", _spy_connect)
runner = CliRunner()
result = runner.invoke(
main,
[
"open-session",
"--endpoint",
"localhost:5000",
"--api-key",
"mxgw_test_secret",
"--json",
],
)
assert result.exit_code == 0, result.output
assert "options" in captured
assert captured["options"].plaintext is False, (
"localhost endpoint without --plaintext must NOT auto-downgrade to plaintext"
)
# ---------------------------------------------------------------------------
# Client.Python-024 — `batch` must not use CliRunner from production code,
# and a recursive `batch` line must not silently re-enter.
# ---------------------------------------------------------------------------
def test_batch_command_does_not_use_clirunner_in_production() -> None:
"""`commands.py` must not import or instantiate the test-only CliRunner helper.
Docstring references explaining what the module deliberately avoids are
permitted; what is forbidden is an actual ``import`` of ``click.testing``
or an actual ``CliRunner()`` instantiation in executable code.
"""
source = Path(cli_commands.__file__).read_text(encoding="utf-8")
assert "from click.testing" not in source, (
"click.testing is a test-only helper and must not be used by production code"
)
assert "import click.testing" not in source, (
"click.testing is a test-only helper and must not be used by production code"
)
# `CliRunner()` (instantiation) must not appear in production code.
assert "CliRunner(" not in source, (
"CliRunner() must not be instantiated in production code"
)
def test_batch_recursive_batch_line_is_bounded() -> None:
"""A `batch` line nested inside `batch` stdin must not be silently spawned.
The pre-fix implementation re-invoked the test runner with empty stdin,
so `batch` inside `batch` exited cleanly with no error. The fix either
rejects the nested invocation or surfaces it as an error block so the
behaviour is auditable.
"""
runner = CliRunner()
result = runner.invoke(
main,
["batch"],
input="batch\nversion --json\n",
)
# Outer batch must still exit 0 and process both lines.
assert result.exit_code == 0
assert result.output.count(_BATCH_EOR) == 2
blocks = [block for block in result.output.split(_BATCH_EOR + "\n") if block]
# The first block — the recursive `batch` line — must surface an error
# JSON. (Either an explicit rejection, or some non-empty error block —
# NOT a silently empty block.)
first_block = blocks[0].strip()
assert first_block, "recursive batch line must not be silently swallowed"
payload = json.loads(first_block.splitlines()[-1])
assert "error" in payload, (
f"recursive batch line should surface an error: got {payload!r}"
)
# ---------------------------------------------------------------------------
# Client.Python-025 — Behavioural tests for new bulk SDK methods,
# stream_alarms, and the new CLI subcommands.
# ---------------------------------------------------------------------------
class _AlarmFakeStream:
def __init__(self, messages: list[pb.AlarmFeedMessage]) -> None:
self._messages = list(messages)
self.cancelled = False
def __aiter__(self) -> "_AlarmFakeStream":
return self
async def __anext__(self) -> pb.AlarmFeedMessage:
if not self._messages:
raise StopAsyncIteration
return self._messages.pop(0)
def cancel(self) -> None:
self.cancelled = True
class _BulkFakeUnary:
def __init__(self, replies: list[Any]) -> None:
self.replies = replies
self.requests: list[Any] = []
self.metadata: tuple[tuple[str, str], ...] | None = None
async def __call__(self, request: Any, *, metadata: tuple[tuple[str, str], ...]) -> Any:
self.requests.append(request)
self.metadata = metadata
return self.replies.pop(0)
class _BulkFakeStub:
def __init__(self) -> None:
self.open_session = _BulkFakeUnary(
[
pb.OpenSessionReply(
session_id="session-1",
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
),
],
)
self.invoke = _BulkFakeUnary([])
self.OpenSession = self.open_session
self.Invoke = self.invoke
self.stream_alarms_metadata: tuple[tuple[str, str], ...] | None = None
self._alarm_stream = _AlarmFakeStream([])
def set_invoke_replies(self, replies: list[Any]) -> None:
self.invoke.replies = replies
def set_alarm_stream(self, stream: _AlarmFakeStream) -> None:
self._alarm_stream = stream
def StreamAlarms(self, request: Any, *, metadata: tuple[tuple[str, str], ...]) -> Any:
self.stream_alarms_request = request
self.stream_alarms_metadata = metadata
return self._alarm_stream
@pytest.mark.asyncio
async def test_session_read_bulk_sends_expected_request_shape() -> None:
stub = _BulkFakeStub()
stub.set_invoke_replies(
[
pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_READ_BULK,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
read_bulk=pb.BulkReadReply(
results=[
pb.BulkReadResult(
tag_address="Tank01.Level",
was_successful=True,
),
],
),
),
],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", api_key="mxgw_test_secret", plaintext=True),
stub=stub,
)
session = await client.open_session()
results = await session.read_bulk(12, ["Tank01.Level"], timeout_ms=1500)
assert len(results) == 1
assert results[0].tag_address == "Tank01.Level"
request = stub.invoke.requests[0]
assert request.command.kind == pb.MX_COMMAND_KIND_READ_BULK
assert request.command.read_bulk.server_handle == 12
assert list(request.command.read_bulk.tag_addresses) == ["Tank01.Level"]
assert request.command.read_bulk.timeout_ms == 1500
@pytest.mark.asyncio
async def test_session_write_bulk_sends_expected_request_shape() -> None:
stub = _BulkFakeStub()
stub.set_invoke_replies(
[
pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_WRITE_BULK,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
write_bulk=pb.BulkWriteReply(
results=[pb.BulkWriteResult(was_successful=True)],
),
),
],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", api_key="mxgw_test_secret", plaintext=True),
stub=stub,
)
session = await client.open_session()
from zb_mom_ww_mxgateway.values import to_mx_value
entries = [
pb.WriteBulkEntry(item_handle=34, user_id=99, value=to_mx_value(123)),
]
results = await session.write_bulk(12, entries)
assert results[0].was_successful is True
cmd = stub.invoke.requests[0].command
assert cmd.kind == pb.MX_COMMAND_KIND_WRITE_BULK
assert cmd.write_bulk.server_handle == 12
assert cmd.write_bulk.entries[0].item_handle == 34
assert cmd.write_bulk.entries[0].user_id == 99
@pytest.mark.asyncio
async def test_session_write2_bulk_sends_expected_request_shape() -> None:
stub = _BulkFakeStub()
stub.set_invoke_replies(
[
pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_WRITE2_BULK,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
write2_bulk=pb.BulkWriteReply(
results=[pb.BulkWriteResult(was_successful=True)],
),
),
],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", api_key="mxgw_test_secret", plaintext=True),
stub=stub,
)
session = await client.open_session()
from zb_mom_ww_mxgateway.values import to_mx_value
entries = [
pb.Write2BulkEntry(
item_handle=34,
user_id=99,
value=to_mx_value(123),
timestamp_value=to_mx_value(1.5),
),
]
results = await session.write2_bulk(12, entries)
assert results[0].was_successful is True
cmd = stub.invoke.requests[0].command
assert cmd.kind == pb.MX_COMMAND_KIND_WRITE2_BULK
assert cmd.write2_bulk.server_handle == 12
assert cmd.write2_bulk.entries[0].item_handle == 34
@pytest.mark.asyncio
async def test_session_write_secured_bulk_sends_expected_request_shape() -> None:
stub = _BulkFakeStub()
stub.set_invoke_replies(
[
pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_WRITE_SECURED_BULK,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
write_secured_bulk=pb.BulkWriteReply(
results=[pb.BulkWriteResult(was_successful=True)],
),
),
],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", api_key="mxgw_test_secret", plaintext=True),
stub=stub,
)
session = await client.open_session()
from zb_mom_ww_mxgateway.values import to_mx_value
entries = [
pb.WriteSecuredBulkEntry(
item_handle=34,
current_user_id=42,
verifier_user_id=43,
value=to_mx_value("secret"),
),
]
results = await session.write_secured_bulk(12, entries)
assert results[0].was_successful is True
cmd = stub.invoke.requests[0].command
assert cmd.kind == pb.MX_COMMAND_KIND_WRITE_SECURED_BULK
assert cmd.write_secured_bulk.server_handle == 12
assert cmd.write_secured_bulk.entries[0].current_user_id == 42
assert cmd.write_secured_bulk.entries[0].verifier_user_id == 43
@pytest.mark.asyncio
async def test_session_write_secured2_bulk_sends_expected_request_shape() -> None:
stub = _BulkFakeStub()
stub.set_invoke_replies(
[
pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_WRITE_SECURED2_BULK,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
write_secured2_bulk=pb.BulkWriteReply(
results=[pb.BulkWriteResult(was_successful=True)],
),
),
],
)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", api_key="mxgw_test_secret", plaintext=True),
stub=stub,
)
session = await client.open_session()
from zb_mom_ww_mxgateway.values import to_mx_value
entries = [
pb.WriteSecured2BulkEntry(
item_handle=34,
current_user_id=42,
verifier_user_id=43,
value=to_mx_value("secret"),
timestamp_value=to_mx_value(1.5),
),
]
results = await session.write_secured2_bulk(12, entries)
assert results[0].was_successful is True
cmd = stub.invoke.requests[0].command
assert cmd.kind == pb.MX_COMMAND_KIND_WRITE_SECURED2_BULK
assert cmd.write_secured2_bulk.entries[0].current_user_id == 42
@pytest.mark.asyncio
async def test_stream_alarms_yields_feed_messages_and_cancels_on_close() -> None:
transitions = [
pb.AlarmFeedMessage(
transition=pb.OnAlarmTransitionEvent(
alarm_full_reference="Tank01.Level.HiHi",
transition_kind=pb.ALARM_TRANSITION_KIND_RAISE,
),
),
]
stream = _AlarmFakeStream(transitions)
stub = _BulkFakeStub()
stub.set_alarm_stream(stream)
client = await GatewayClient.connect(
ClientOptions(endpoint="fake", api_key="mxgw_test_secret", plaintext=True),
stub=stub,
)
iterator = client.stream_alarms(pb.StreamAlarmsRequest(alarm_filter_prefix="Tank01."))
first = await anext(iterator)
await iterator.aclose()
assert first.transition.alarm_full_reference == "Tank01.Level.HiHi"
assert stream.cancelled
assert stub.stream_alarms_metadata == (("authorization", "Bearer mxgw_test_secret"),)
assert stub.stream_alarms_request.alarm_filter_prefix == "Tank01."
# ---- CLI happy-path coverage for the new subcommands ----
def _install_fake_connect(monkeypatch, stub: Any) -> dict[str, Any]:
"""Patch `GatewayClient.connect` so the CLI uses the supplied fake stub."""
captured: dict[str, Any] = {}
real_connect = GatewayClient.connect
@classmethod
async def _spy_connect(cls, options: ClientOptions, **kwargs: Any) -> GatewayClient:
captured["options"] = options
return await real_connect(options, stub=stub)
monkeypatch.setattr(GatewayClient, "connect", _spy_connect)
return captured
def test_cli_read_bulk_happy_path(monkeypatch) -> None:
stub = _BulkFakeStub()
stub.set_invoke_replies(
[
pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_READ_BULK,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
read_bulk=pb.BulkReadReply(
results=[
pb.BulkReadResult(
tag_address="Tank01.Level",
was_successful=True,
),
],
),
),
],
)
_install_fake_connect(monkeypatch, stub)
runner = CliRunner()
result = runner.invoke(
main,
[
"read-bulk",
"--endpoint",
"localhost:5000",
"--plaintext",
"--session-id",
"session-1",
"--server-handle",
"12",
"--items",
"Tank01.Level",
"--timeout-ms",
"1500",
"--json",
],
)
assert result.exit_code == 0, result.output
payload = json.loads(result.output)
assert payload["results"][0]["tagAddress"] == "Tank01.Level"
def test_cli_write_bulk_happy_path(monkeypatch) -> None:
stub = _BulkFakeStub()
stub.set_invoke_replies(
[
pb.MxCommandReply(
session_id="session-1",
kind=pb.MX_COMMAND_KIND_WRITE_BULK,
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
write_bulk=pb.BulkWriteReply(
results=[pb.BulkWriteResult(was_successful=True)],
),
),
],
)
_install_fake_connect(monkeypatch, stub)
runner = CliRunner()
result = runner.invoke(
main,
[
"write-bulk",
"--endpoint",
"localhost:5000",
"--plaintext",
"--session-id",
"session-1",
"--server-handle",
"12",
"--item-handles",
"34",
"--values",
"123",
"--type",
"int32",
"--json",
],
)
assert result.exit_code == 0, result.output
payload = json.loads(result.output)
assert payload["results"][0]["wasSuccessful"] is True
cmd = stub.invoke.requests[0].command
assert cmd.kind == pb.MX_COMMAND_KIND_WRITE_BULK
def test_cli_stream_alarms_happy_path(monkeypatch) -> None:
transitions = [
pb.AlarmFeedMessage(
transition=pb.OnAlarmTransitionEvent(
alarm_full_reference="Tank01.Level.HiHi",
transition_kind=pb.ALARM_TRANSITION_KIND_RAISE,
),
),
]
stream = _AlarmFakeStream(transitions)
stub = _BulkFakeStub()
stub.set_alarm_stream(stream)
_install_fake_connect(monkeypatch, stub)
runner = CliRunner()
result = runner.invoke(
main,
[
"stream-alarms",
"--endpoint",
"localhost:5000",
"--plaintext",
"--max-messages",
"1",
"--timeout",
"5.0",
"--filter-prefix",
"Tank01.",
"--json",
],
)
assert result.exit_code == 0, result.output
payload = json.loads(result.output)
assert payload["messages"][0]["transition"]["alarmFullReference"] == "Tank01.Level.HiHi"
def test_cli_acknowledge_alarm_happy_path(monkeypatch) -> None:
stub = _BulkFakeStub()
stub.acknowledge_alarm = _BulkFakeUnary(
[
pb.AcknowledgeAlarmReply(
correlation_id="corr-1",
protocol_status=pb.ProtocolStatus(code=pb.PROTOCOL_STATUS_CODE_OK),
status=pb.MxStatusProxy(success=1, category=pb.MX_STATUS_CATEGORY_OK),
),
],
)
stub.AcknowledgeAlarm = stub.acknowledge_alarm
_install_fake_connect(monkeypatch, stub)
runner = CliRunner()
result = runner.invoke(
main,
[
"acknowledge-alarm",
"--endpoint",
"localhost:5000",
"--plaintext",
"--reference",
"Tank01.Level.HiHi",
"--comment",
"investigating",
"--operator",
"alice",
"--json",
],
)
assert result.exit_code == 0, result.output
captured_request = stub.acknowledge_alarm.requests[0]
assert captured_request.alarm_full_reference == "Tank01.Level.HiHi"
assert captured_request.comment == "investigating"
assert captured_request.operator_user == "alice"
# ---------------------------------------------------------------------------
# Client.Python-026 — `import time` at module scope; tighter cleanup excepts.
# ---------------------------------------------------------------------------
def test_commands_module_imports_time_at_module_scope() -> None:
"""`time` must be imported at module scope, not inside `_bench_read_bulk`.
`inspect.getsource(_bench_read_bulk)` must not contain a function-local
``import time`` statement.
"""
import inspect
source = inspect.getsource(cli_commands._bench_read_bulk)
# The function body must NOT contain a function-local `import time` line.
for line in source.splitlines():
stripped = line.strip()
assert stripped != "import time", (
f"_bench_read_bulk must not have function-local `import time`: {line!r}"
)
# And the module-level `time` attribute must be present.
assert hasattr(cli_commands, "time"), (
"`time` must be imported at module scope on commands.py"
)
assert cli_commands.time is _time_module_ref
def test_commands_module_bench_read_bulk_does_not_use_bare_except_pass() -> None:
"""The two `except Exception: pass` cleanup blocks in `_bench_read_bulk`
must be removed in favour of either logging or a narrower exception class.
"""
import inspect
source = inspect.getsource(cli_commands._bench_read_bulk)
# Reject the bare `except Exception:` followed by `pass` pattern in
# `_bench_read_bulk`. We tolerate `except Exception as <name>:` because the
# fix logs the exception.
pattern = re.compile(r"except\s+Exception\s*:\s*\n\s*pass\b")
assert not pattern.search(source), (
"_bench_read_bulk cleanup blocks must log or narrow the except clause"
)
+17 -7
View File
@@ -1,9 +1,19 @@
[target.'cfg(windows)']
# Bump the default 1 MB Windows stack to 8 MB. clap-derive builds a large
# Command enum in this CLI (one variant per subcommand, each carrying flag
# args); in debug builds the enum is materialized on the stack without
# MSVC-only: bump the default 1 MB Windows stack to 8 MB. clap-derive builds
# a large Command enum in this CLI (one variant per subcommand, each carrying
# flag args); in debug builds the enum is materialized on the stack without
# optimization and overflows the default Windows main-thread stack before
# even reaching our code. Release builds are unaffected but the e2e matrix
# drives the CLI through `cargo run` (debug), so the link-arg ships with
# every dev-time invocation.
# even reaching our code.
#
# The /STACK: link-arg goes into the PE header's IMAGE_OPTIONAL_HEADER.
# SizeOfStackReserve at link time and applies to both debug and release
# builds — release artifacts ship with the same 8 MB stack reservation. At
# runtime the optimizer elides the enum from the stack frame, so release
# builds would not overflow without this setting; it is kept on for them so
# both build flavours produce binaries with identical stack metadata.
#
# `/STACK:` is an MSVC-linker (`link.exe` / `lld-link`) directive. The
# `target_env = "msvc"` selector below scopes the rustflag to the MSVC
# toolchain so `x86_64-pc-windows-gnu` (mingw) builds, which route link
# args through the GNU linker and reject `/STACK:`, are unaffected.
[target.'cfg(all(windows, target_env = "msvc"))']
rustflags = ["-C", "link-arg=/STACK:8388608"]
+115 -12
View File
@@ -56,7 +56,23 @@ Expected dependencies:
- `clap`
- `serde`
- `serde_json`
- `tracing`
## Windows Build Notes
`clients/rust/.cargo/config.toml` carries an MSVC-scoped rustflag that bumps
the default 1 MB Windows main-thread stack to 8 MB
(`-C link-arg=/STACK:8388608`, under `cfg(all(windows, target_env = "msvc"))`).
The setting is required because clap-derive materialises a large `Command`
enum (one variant per CLI subcommand, each carrying its flag args) on the
main task's stack in debug builds, before any user code runs; the default 1
MB stack overflows during enum construction. The `/STACK:` link-arg writes
into the PE header's `IMAGE_OPTIONAL_HEADER.SizeOfStackReserve` at link
time, so both debug and release artifacts ship with the same 8 MB stack
reservation. Release builds would not overflow without it (the optimizer
elides the enum from the stack frame), but the setting is kept on for
release too so both build flavours produce binaries with identical stack
metadata. The MSVC-only selector keeps `x86_64-pc-windows-gnu` (mingw)
builds unaffected, since the GNU linker rejects `/STACK:`.
## Library API
@@ -94,18 +110,65 @@ impl Session {
pub async fn add_item(&self, server_handle: i32, item: &str) -> Result<i32, Error>;
pub async fn add_item2(&self, server_handle: i32, item: &str, context: &str) -> Result<i32, Error>;
pub async fn advise(&self, server_handle: i32, item_handle: i32) -> Result<(), Error>;
pub async fn un_advise(&self, server_handle: i32, item_handle: i32) -> Result<(), Error>;
pub async fn remove_item(&self, server_handle: i32, item_handle: i32) -> Result<(), Error>;
pub async fn add_item_bulk(&self, server_handle: i32, tag_addresses: Vec<String>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn advise_item_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn remove_item_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn un_advise_item_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn subscribe_bulk(&self, server_handle: i32, tag_addresses: Vec<String>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn unsubscribe_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn read_bulk<S: AsRef<str>>(&self, server_handle: i32, tag_addresses: &[S], timeout_ms: u32) -> Result<Vec<BulkReadResult>, Error>;
pub async fn write(&self, server_handle: i32, item_handle: i32, value: MxValue, user_id: i32) -> Result<(), Error>;
pub async fn write2(&self, server_handle: i32, item_handle: i32, value: MxValue, timestamp_value: MxValue, user_id: i32) -> Result<(), Error>;
pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn events(&self) -> Result<impl Stream<Item = Result<MxEvent, Error>>, Error>;
pub async fn close(&self) -> Result<(), Error>;
}
```
The per-entry credentials and timestamps (`user_id`, `timestamp_value`,
`current_user_id`, `verifier_user_id`) live on the `WriteBulkEntry` /
`Write2BulkEntry` / `WriteSecuredBulkEntry` / `WriteSecured2BulkEntry`
structs rather than as trailing positional arguments on the bulk-write
helpers, matching the protobuf shapes in `mxaccess_gateway.proto`.
`read_bulk` is generic over `AsRef<str>` so callers can pass `&[String]` or
`&[&str]` without cloning at the call site (the cross-language bench-read-bulk
hot loop relies on this).
The `session::next_correlation_id` helper is `pub` and re-exported at the
crate root (`zb_mom_ww_mxgateway_client::next_correlation_id`); raw-RPC
consumers like the `mxgw` CLI's `Ping`, `CloseSession`, `StreamAlarms`,
`AcknowledgeAlarm`, and `BenchReadBulk` paths call it so every request
carries a unique correlation id that gateway logs can tell apart from
concurrent CLI smokes. The textual format is intentionally not part of the
public contract.
## Alarms
`GatewayClient` exposes the gateway's session-less central alarm surface:
```rust
pub type AlarmFeedStream = Pin<Box<dyn Stream<Item = Result<AlarmFeedMessage, Error>> + Send + 'static>>;
impl GatewayClient {
pub async fn stream_alarms(&self, request: StreamAlarmsRequest) -> Result<AlarmFeedStream, Error>;
pub async fn acknowledge_alarm(&self, request: AcknowledgeAlarmRequest) -> Result<AcknowledgeAlarmReply, Error>;
}
```
`stream_alarms` opens with one `active_alarm` per currently-active alarm
(the ConditionRefresh snapshot), then a single `snapshot_complete`, then a
`transition` for every subsequent raise / acknowledge / clear. The feed is
served by the gateway's always-on alarm monitor — no worker session is
opened — so any number of clients may attach. Dropping the stream cancels
the gRPC call cooperatively. `acknowledge_alarm` is idempotent at the
MxAccess layer; the returned `AcknowledgeAlarmReply` carries the native
MxStatus from the worker.
## Authentication
Use a `tonic` interceptor or request extension layer to add:
@@ -140,19 +203,31 @@ Use `thiserror`:
```rust
pub enum Error {
InvalidEndpoint { endpoint: String, detail: String },
InvalidArgument { name: String, detail: String },
Transport(tonic::transport::Error),
Status(tonic::Status),
Authentication(String),
Authorization(String),
Session(SessionError),
Worker(WorkerError),
Command(CommandError),
MxAccess(MxAccessError),
Timeout,
Cancelled,
Authentication { message: String, status: Box<tonic::Status> },
Authorization { message: String, status: Box<tonic::Status> },
Timeout { message: String, status: Box<tonic::Status> },
Cancelled { message: String, status: Box<tonic::Status> },
Unavailable { message: String, status: Box<tonic::Status> },
Status(Box<tonic::Status>),
Command(Box<CommandError>),
ProtocolStatus { operation: &'static str, code: ProtocolStatusCode, message: String },
MalformedReply { detail: String },
}
```
- `Unavailable` classifies `Code::Unavailable` / `Code::ResourceExhausted`
so callers can distinguish transient failures from permanent ones.
- `MalformedReply` surfaces the rare case where the gateway returned a
protocol-level `Ok` envelope but the typed payload arm was missing or did
not match the command kind (e.g. an `AddItemReply` body on a `WriteBulk`
reply). This is distinct from `ProtocolStatus` because the protocol-level
envelope itself succeeded; the corruption is in the payload shape.
- `InvalidEndpoint` is raised before any RPC dispatch when the endpoint URL
or CA file fails to parse / load.
Preserve raw command replies in `CommandError` where applicable.
## Test CLI
@@ -165,11 +240,39 @@ Commands:
```text
mxgw version
mxgw smoke --endpoint http://localhost:5000 --api-key-env MXGATEWAY_API_KEY --item TestChildObject.TestInt
mxgw ping
mxgw open-session
mxgw close-session --session-id <id>
mxgw register --session-id <id>
mxgw add-item --session-id <id> --server-handle <h> --item <tag>
mxgw advise --session-id <id> --server-handle <h> --item-handle <h>
mxgw subscribe-bulk --session-id <id> --server-handle <h> --items <csv>
mxgw unsubscribe-bulk --session-id <id> --server-handle <h> --item-handles <csv>
mxgw read-bulk --session-id <id> --server-handle <h> --items <csv> [--timeout-ms <ms>]
mxgw write --session-id <id> --server-handle 1 --item-handle 1 --value-type int32 --value 123
mxgw write2 --session-id <id> --server-handle 1 --item-handle 1 --value-type int32 --value 123 --timestamp <iso>
mxgw write-bulk --session-id <id> --server-handle <h> --item-handles <csv> --value-type <t> --values <csv>
mxgw write2-bulk ...
mxgw write-secured-bulk ...
mxgw write-secured2-bulk ...
mxgw stream-events --session-id <id> --json
mxgw write --session-id <id> --server-handle 1 --item-handle 1 --type int32 --value 123
mxgw stream-alarms [--filter-prefix <prefix>] [--max-events <n>]
mxgw acknowledge-alarm --reference <full-ref> [--comment <txt>] [--operator <user>]
mxgw bench-read-bulk [--duration-seconds <n>] [--warmup-seconds <n>] [--bulk-size <n>]
mxgw smoke --endpoint http://localhost:5000 --api-key-env MXGATEWAY_API_KEY --item TestChildObject.TestInt
mxgw batch
mxgw galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}
```
`batch` reads commands from stdin one per line and dispatches each through
the normal subcommand path; the loop terminates only on stdin EOF (blank
lines log an empty-EOR-bracketed result and continue) so accidental empty
lines from the PowerShell e2e harness do not silently end the session.
`bench-read-bulk` opens its own session, subscribes to `--bulk-size` tags so
the worker's value cache populates from OnDataChange events, hammers
`read_bulk` in a tight loop for `--duration-seconds`, and emits the
cross-language JSON shape that `scripts/bench-read-bulk.ps1` collates.
JSON output should use `serde_json`.
## Unit Tests
+63 -18
View File
@@ -26,8 +26,8 @@ use zb_mom_ww_mxgateway_client::generated::mxaccess_gateway::v1::{
WriteBulkEntry, WriteSecured2BulkEntry, WriteSecuredBulkEntry,
};
use zb_mom_ww_mxgateway_client::{
ApiKey, ClientOptions, Error, GalaxyClient, GatewayClient, MxValue, MxValueProjection,
CLIENT_VERSION, GATEWAY_PROTOCOL_VERSION, WORKER_PROTOCOL_VERSION,
next_correlation_id, ApiKey, ClientOptions, Error, GalaxyClient, GatewayClient, MxValue,
MxValueProjection, CLIENT_VERSION, GATEWAY_PROTOCOL_VERSION, WORKER_PROTOCOL_VERSION,
};
const MAX_AGGREGATE_EVENTS: usize = 10_000;
@@ -359,8 +359,9 @@ enum Command {
/// write `__MXGW_BATCH_EOR__` to stdout after every result. Errors are
/// written as `{"error":"…","type":"error"}` JSON to stdout (not stderr)
/// so the harness can parse them without interleaving stderr. The loop
/// never terminates on command error; only stdin EOF (or an empty line)
/// ends the session.
/// never terminates on command error or accidental blank lines; only
/// stdin EOF ends the session — empty lines log an empty-EOR-bracketed
/// result and continue, matching the other four language CLIs.
Batch,
#[command(subcommand)]
Galaxy(GalaxyCommand),
@@ -503,7 +504,7 @@ async fn dispatch(command: Command) -> Result<(), Error> {
let client = connect(connection).await?;
let reply = client
.invoke(MxCommandRequest {
client_correlation_id: "rust-cli-ping".to_owned(),
client_correlation_id: next_correlation_id("cli-ping"),
command: Some(MxCommand {
kind: MxCommandKind::Ping as i32,
payload: Some(zb_mom_ww_mxgateway_client::generated::mxaccess_gateway::v1::mx_command::Payload::Ping(
@@ -550,7 +551,7 @@ async fn dispatch(command: Command) -> Result<(), Error> {
let reply = client
.close_session_raw(CloseSessionRequest {
session_id,
client_correlation_id: "rust-cli-close-session".to_owned(),
client_correlation_id: next_correlation_id("cli-close-session"),
})
.await?;
if json {
@@ -624,7 +625,7 @@ async fn dispatch(command: Command) -> Result<(), Error> {
json,
} => {
let session = session_for(connection, session_id).await?;
let results = session.read_bulk(server_handle, items, timeout_ms).await?;
let results = session.read_bulk(server_handle, &items, timeout_ms).await?;
print_read_bulk_results("read-bulk", &results, json);
}
Command::WriteBulk {
@@ -832,7 +833,7 @@ async fn dispatch(command: Command) -> Result<(), Error> {
let client = connect(connection).await?;
let mut stream = client
.stream_alarms(StreamAlarmsRequest {
client_correlation_id: "rust-cli-stream-alarms".to_owned(),
client_correlation_id: next_correlation_id("cli-stream-alarms"),
alarm_filter_prefix: filter_prefix.unwrap_or_default(),
})
.await?;
@@ -869,7 +870,7 @@ async fn dispatch(command: Command) -> Result<(), Error> {
let client = connect(connection).await?;
let reply = client
.acknowledge_alarm(AcknowledgeAlarmRequest {
client_correlation_id: "rust-cli-acknowledge-alarm".to_owned(),
client_correlation_id: next_correlation_id("cli-acknowledge-alarm"),
alarm_full_reference: reference,
comment,
operator_user: operator,
@@ -1113,8 +1114,15 @@ const BATCH_EOR: &str = "__MXGW_BATCH_EOR__";
/// each through the normal [`dispatch`] path, and write [`BATCH_EOR`] to
/// stdout after every result. Errors are serialised as JSON to stdout so
/// the harness can parse them without interleaving stderr. The loop never
/// terminates on command error; only stdin EOF or an empty line ends the
/// session.
/// terminates on command error or accidental blank lines; only stdin EOF
/// ends the session — empty lines log an empty-EOR-bracketed result and
/// continue.
///
/// `std::io::Stdin::lock().lines()` is a blocking iterator and the dispatch
/// future is spawned on a separate tokio task so the runtime's main worker
/// stays free. When the runtime is multi-threaded the blocking read keeps
/// one worker parked on `ReadFile`; that is acceptable here because no other
/// future on the main task needs to run while we wait for the next command.
async fn run_batch() -> Result<(), Error> {
let stdin = io::stdin();
let stdout = io::stdout();
@@ -1125,12 +1133,11 @@ async fn run_batch() -> Result<(), Error> {
detail: e.to_string(),
})?;
if line.is_empty() {
break;
}
let parts: Vec<&str> = line.split_ascii_whitespace().collect();
if parts.is_empty() {
// Empty / whitespace-only line: log an empty-EOR-bracketed
// result and continue so accidental blank lines from the
// PowerShell e2e harness do not silently end the session.
println!("{BATCH_EOR}");
stdout.lock().flush().ok();
continue;
@@ -1388,6 +1395,7 @@ async fn run_bench_read_bulk(
let bench_outcome = async {
let server_handle = session.register(&client_name).await?;
let subscribe_results = session.subscribe_bulk(server_handle, tags.clone()).await?;
let tags_ref: &[String] = &tags;
let item_handles: Vec<i32> = subscribe_results
.iter()
.filter(|r| r.was_successful)
@@ -1401,7 +1409,7 @@ async fn run_bench_read_bulk(
let warmup_deadline = Instant::now() + Duration::from_secs(warmup_seconds);
while Instant::now() < warmup_deadline {
let _ = session
.read_bulk(server_handle, tags.clone(), timeout_ms_param)
.read_bulk(server_handle, tags_ref, timeout_ms_param)
.await;
}
@@ -1419,7 +1427,7 @@ async fn run_bench_read_bulk(
while Instant::now() < steady_deadline {
let call_start = Instant::now();
let result = session
.read_bulk(server_handle, tags.clone(), timeout_ms_param)
.read_bulk(server_handle, tags_ref, timeout_ms_param)
.await;
let elapsed = call_start.elapsed();
latencies_ms.push(elapsed.as_secs_f64() * 1000.0);
@@ -1473,7 +1481,7 @@ async fn run_bench_read_bulk(
let close_result = client
.close_session_raw(CloseSessionRequest {
session_id: session_id.clone(),
client_correlation_id: "rust-cli-bench-read-bulk-close".to_owned(),
client_correlation_id: next_correlation_id("cli-bench-read-bulk-close"),
})
.await;
@@ -2100,6 +2108,43 @@ mod tests {
assert_eq!(super::BATCH_EOR, "__MXGW_BATCH_EOR__");
}
#[test]
fn bench_percentile_summary_matches_hand_built_sample() {
// Hand-built sample with 5 values: 1, 2, 3, 4, 5.
let sample: Vec<f64> = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let summary = super::percentile_summary(&sample);
assert_eq!(summary.max, 5.0);
// Mean = 15/5 = 3.0
assert!((summary.mean - 3.0).abs() < f64::EPSILON);
// p50: rank = 0.5 * 4 = 2 -> sorted[2] = 3.0
assert!((summary.p50 - 3.0).abs() < f64::EPSILON);
// p95: rank = 0.95 * 4 = 3.8 -> 4.0 + 0.8 * (5.0 - 4.0) = 4.8
assert!((summary.p95 - 4.8).abs() < f64::EPSILON);
// p99: rank = 0.99 * 4 = 3.96 -> 4.0 + 0.96 * 1.0 = 4.96
assert!((summary.p99 - 4.96).abs() < f64::EPSILON);
}
#[test]
fn bench_percentile_summary_handles_empty_sample() {
let summary = super::percentile_summary(&[]);
assert_eq!(summary.p50, 0.0);
assert_eq!(summary.p95, 0.0);
assert_eq!(summary.p99, 0.0);
assert_eq!(summary.max, 0.0);
assert_eq!(summary.mean, 0.0);
}
#[test]
fn bench_percentile_summary_handles_single_value_sample() {
let summary = super::percentile_summary(&[42.0]);
assert_eq!(summary.p50, 42.0);
assert_eq!(summary.p95, 42.0);
assert_eq!(summary.p99, 42.0);
assert_eq!(summary.max, 42.0);
assert_eq!(summary.mean, 42.0);
}
#[test]
fn rfc3339_parser_round_trips_z_and_offset_inputs() {
// 2026-04-28T15:30:00Z = 1_777_995_000 (sanity-checked once below)
+25
View File
@@ -106,6 +106,27 @@ pub enum Error {
/// Detail message from the server.
message: String,
},
/// Gateway returned an `Ok` protocol status but the reply lacked the
/// expected typed payload (or carried the wrong payload arm). Distinct
/// from [`Error::ProtocolStatus`] because the protocol-level envelope
/// itself succeeded — the corruption is in the payload shape.
#[error("gateway returned a malformed reply: {detail}")]
MalformedReply {
/// Human-readable description of what was missing or mismatched.
detail: String,
},
/// Server returned `Unavailable` or `ResourceExhausted` — classify
/// transient failures separately from the catch-all [`Error::Status`].
#[error("gateway unavailable: {message}")]
Unavailable {
/// Redacted server-supplied detail message.
message: String,
/// Original `tonic::Status`.
#[source]
status: Box<tonic::Status>,
},
}
/// Wrapper around an [`MxCommandReply`] whose `protocol_status` reported a
@@ -174,6 +195,10 @@ impl From<tonic::Status> for Error {
message,
status: Box::new(status),
},
Code::Unavailable | Code::ResourceExhausted => Self::Unavailable {
message,
status: Box::new(status),
},
_ => Self::Status(Box::new(status)),
}
}
+1 -1
View File
@@ -279,7 +279,7 @@ mod tests {
_request: Request<GetLastDeployTimeRequest>,
) -> Result<Response<GetLastDeployTimeReply>, Status> {
let present = *self.state.present.lock().unwrap();
let time = self.state.last_deploy.lock().unwrap().clone();
let time = *self.state.last_deploy.lock().unwrap();
Ok(Response::new(GetLastDeployTimeReply {
present,
time_of_last_deploy: time,
+1 -1
View File
@@ -32,7 +32,7 @@ pub use galaxy::{DeployEventStream, GalaxyClient};
#[doc(inline)]
pub use options::ClientOptions;
#[doc(inline)]
pub use session::Session;
pub use session::{next_correlation_id, Session};
#[doc(inline)]
pub use value::{MxArrayProjection, MxArrayValue, MxStatus, MxValue, MxValueProjection};
#[doc(inline)]
+2
View File
@@ -95,6 +95,7 @@ impl ClientOptions {
self
}
/// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).
pub fn with_max_grpc_message_bytes(mut self, max_grpc_message_bytes: usize) -> Self {
self.max_grpc_message_bytes = max_grpc_message_bytes;
self
@@ -140,6 +141,7 @@ impl ClientOptions {
self.stream_timeout
}
/// Configured maximum encoded/decoded gRPC message size in bytes.
pub fn max_grpc_message_bytes(&self) -> usize {
self.max_grpc_message_bytes
}
+104 -61
View File
@@ -8,6 +8,8 @@
//! Bulk commands enforce a 1000-item cap before contacting the worker, in
//! line with the gateway's documented `MAX_BULK_ITEMS`.
use std::sync::atomic::{AtomicU64, Ordering};
use crate::client::{EventStream, GatewayClient};
use crate::error::{ensure_protocol_success, Error};
use crate::generated::mxaccess_gateway::v1::mx_command::Payload;
@@ -26,6 +28,28 @@ use crate::value::MxValue;
const MAX_BULK_ITEMS: usize = 1_000;
/// Process-wide monotonic sequence used by [`next_correlation_id`].
static CORRELATION_SEQUENCE: AtomicU64 = AtomicU64::new(1);
/// Build a per-call correlation id that embeds the supplied `label`.
///
/// The returned token is opaque and guaranteed to be unique within the
/// current process: every call increments a process-wide atomic counter,
/// so concurrent CLI smokes and library callers on the same machine produce
/// distinct ids that gateway logs can tell apart. The token carries no
/// embedded secret beyond `label`.
///
/// The exact textual format (currently `rust-client-{label}-{N}`) is *not*
/// part of the public contract — callers must not parse it. The crate root
/// re-exports this helper as
/// [`zb_mom_ww_mxgateway_client::next_correlation_id`] so out-of-tree
/// consumers can build correlation ids without referencing the `session`
/// module path.
pub fn next_correlation_id(label: &str) -> String {
let sequence = CORRELATION_SEQUENCE.fetch_add(1, Ordering::Relaxed);
format!("rust-client-{label}-{sequence}")
}
/// Handle to an opened gateway session.
///
/// `Session` carries the gateway-issued session id and a cloned
@@ -79,7 +103,7 @@ impl Session {
.client
.close_session_raw(CloseSessionRequest {
session_id: self.id.clone(),
client_correlation_id: "rust-client-close-session".to_owned(),
client_correlation_id: next_correlation_id("close-session"),
})
.await?;
ensure_protocol_success("close session", reply.protocol_status.as_ref())?;
@@ -102,7 +126,7 @@ impl Session {
)
.await?;
Ok(register_server_handle(&reply))
register_server_handle(&reply)
}
/// Run MXAccess `AddItem` against `server_handle` and return the
@@ -123,7 +147,7 @@ impl Session {
)
.await?;
Ok(add_item_handle(&reply))
add_item_handle(&reply)
}
/// Run MXAccess `AddItem2` (item with a caller-supplied context string)
@@ -149,7 +173,7 @@ impl Session {
)
.await?;
Ok(add_item2_handle(&reply))
add_item2_handle(&reply)
}
/// Run MXAccess `RemoveItem` for the given handle pair.
@@ -229,7 +253,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::AddItemBulk))
bulk_results(reply, BulkReplyKind::AddItem)
}
/// Bulk variant of [`Session::advise`].
@@ -253,7 +277,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::AdviseItemBulk))
bulk_results(reply, BulkReplyKind::AdviseItem)
}
/// Bulk variant of [`Session::remove_item`].
@@ -277,7 +301,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::RemoveItemBulk))
bulk_results(reply, BulkReplyKind::RemoveItem)
}
/// Bulk variant of [`Session::un_advise`].
@@ -301,7 +325,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::UnAdviseItemBulk))
bulk_results(reply, BulkReplyKind::UnAdviseItem)
}
/// Bulk `Subscribe` (atomic add-and-advise) for a list of tag addresses.
@@ -325,7 +349,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::SubscribeBulk))
bulk_results(reply, BulkReplyKind::Subscribe)
}
/// Bulk `Unsubscribe` (atomic un-advise-and-remove) for a list of
@@ -350,7 +374,7 @@ impl Session {
)
.await?;
Ok(bulk_results(reply, BulkReplyKind::UnsubscribeBulk))
bulk_results(reply, BulkReplyKind::Unsubscribe)
}
/// Bulk `Read` — snapshot the current value for each requested tag.
@@ -366,10 +390,10 @@ impl Session {
/// # Errors
///
/// Same conditions as [`Session::add_item_bulk`].
pub async fn read_bulk(
pub async fn read_bulk<S: AsRef<str>>(
&self,
server_handle: i32,
tag_addresses: Vec<String>,
tag_addresses: &[S],
timeout_ms: u32,
) -> Result<Vec<BulkReadResult>, Error> {
ensure_bulk_size("tag_addresses", tag_addresses.len())?;
@@ -378,16 +402,21 @@ impl Session {
MxCommandKind::ReadBulk,
Payload::ReadBulk(ReadBulkCommand {
server_handle,
tag_addresses,
tag_addresses: tag_addresses
.iter()
.map(|tag| tag.as_ref().to_owned())
.collect(),
timeout_ms,
}),
)
.await?;
Ok(match reply.payload {
Some(mx_command_reply::Payload::ReadBulk(reply)) => reply.results,
_ => Vec::new(),
})
match reply.payload {
Some(mx_command_reply::Payload::ReadBulk(reply)) => Ok(reply.results),
_ => Err(Error::MalformedReply {
detail: "read_bulk reply did not carry a ReadBulk payload".to_owned(),
}),
}
}
/// Bulk `Write` (sequential MXAccess Write per entry, on the worker's STA).
@@ -416,7 +445,7 @@ impl Session {
)
.await?;
Ok(bulk_write_results(reply, BulkWriteReplyKind::Write))
bulk_write_results(reply, BulkWriteReplyKind::Write)
}
/// Bulk `Write2` (timestamped) — see [`Session::write_bulk`].
@@ -440,7 +469,7 @@ impl Session {
)
.await?;
Ok(bulk_write_results(reply, BulkWriteReplyKind::Write2))
bulk_write_results(reply, BulkWriteReplyKind::Write2)
}
/// Bulk `WriteSecured` — credential-sensitive values follow the same
@@ -465,7 +494,7 @@ impl Session {
)
.await?;
Ok(bulk_write_results(reply, BulkWriteReplyKind::WriteSecured))
bulk_write_results(reply, BulkWriteReplyKind::WriteSecured)
}
/// Bulk `WriteSecured2` (timestamped) — see [`Session::write_secured_bulk`].
@@ -489,7 +518,7 @@ impl Session {
)
.await?;
Ok(bulk_write_results(reply, BulkWriteReplyKind::WriteSecured2))
bulk_write_results(reply, BulkWriteReplyKind::WriteSecured2)
}
/// Run MXAccess `Write` (single-value, no caller-supplied timestamp).
@@ -608,7 +637,7 @@ impl Session {
fn command_request(&self, kind: MxCommandKind, payload: Payload) -> MxCommandRequest {
MxCommandRequest {
session_id: self.id.clone(),
client_correlation_id: format!("rust-client-{}", kind.as_str_name()),
client_correlation_id: next_correlation_id(kind.as_str_name()),
command: Some(MxCommand {
kind: kind as i32,
payload: Some(payload),
@@ -628,71 +657,80 @@ fn ensure_bulk_size(name: &'static str, len: usize) -> Result<(), Error> {
}
}
fn register_server_handle(reply: &MxCommandReply) -> i32 {
fn register_server_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::Register(register)) => register.server_handle,
Some(mx_command_reply::Payload::Register(register)) => Ok(register.server_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "register reply lacked a server_handle payload or int32 return_value"
.to_owned(),
}),
}
}
fn add_item_handle(reply: &MxCommandReply) -> i32 {
fn add_item_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::AddItem(add_item)) => add_item.item_handle,
Some(mx_command_reply::Payload::AddItem(add_item)) => Ok(add_item.item_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "add_item reply lacked an item_handle payload or int32 return_value"
.to_owned(),
}),
}
}
fn add_item2_handle(reply: &MxCommandReply) -> i32 {
fn add_item2_handle(reply: &MxCommandReply) -> Result<i32, Error> {
match reply.payload.as_ref() {
Some(mx_command_reply::Payload::AddItem2(add_item)) => add_item.item_handle,
Some(mx_command_reply::Payload::AddItem2(add_item)) => Ok(add_item.item_handle),
_ => reply
.return_value
.as_ref()
.and_then(int32_reply_value)
.unwrap_or_default(),
.ok_or_else(|| Error::MalformedReply {
detail: "add_item2 reply lacked an item_handle payload or int32 return_value"
.to_owned(),
}),
}
}
enum BulkReplyKind {
AddItemBulk,
AdviseItemBulk,
RemoveItemBulk,
UnAdviseItemBulk,
SubscribeBulk,
UnsubscribeBulk,
AddItem,
AdviseItem,
RemoveItem,
UnAdviseItem,
Subscribe,
Unsubscribe,
}
fn bulk_results(reply: MxCommandReply, kind: BulkReplyKind) -> Vec<SubscribeResult> {
fn bulk_results(reply: MxCommandReply, kind: BulkReplyKind) -> Result<Vec<SubscribeResult>, Error> {
match (reply.payload, kind) {
(Some(mx_command_reply::Payload::AddItemBulk(reply)), BulkReplyKind::AddItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::AddItemBulk(reply)), BulkReplyKind::AddItem) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::AdviseItemBulk(reply)), BulkReplyKind::AdviseItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::AdviseItemBulk(reply)), BulkReplyKind::AdviseItem) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::RemoveItemBulk(reply)), BulkReplyKind::RemoveItemBulk) => {
reply.results
(Some(mx_command_reply::Payload::RemoveItemBulk(reply)), BulkReplyKind::RemoveItem) => {
Ok(reply.results)
}
(
Some(mx_command_reply::Payload::UnAdviseItemBulk(reply)),
BulkReplyKind::UnAdviseItemBulk,
) => reply.results,
(Some(mx_command_reply::Payload::SubscribeBulk(reply)), BulkReplyKind::SubscribeBulk) => {
reply.results
(Some(mx_command_reply::Payload::UnAdviseItemBulk(reply)), BulkReplyKind::UnAdviseItem) => {
Ok(reply.results)
}
(
Some(mx_command_reply::Payload::UnsubscribeBulk(reply)),
BulkReplyKind::UnsubscribeBulk,
) => reply.results,
_ => Vec::new(),
(Some(mx_command_reply::Payload::SubscribeBulk(reply)), BulkReplyKind::Subscribe) => {
Ok(reply.results)
}
(Some(mx_command_reply::Payload::UnsubscribeBulk(reply)), BulkReplyKind::Unsubscribe) => {
Ok(reply.results)
}
_ => Err(Error::MalformedReply {
detail: "bulk subscribe reply did not carry the expected payload arm".to_owned(),
}),
}
}
@@ -703,23 +741,28 @@ enum BulkWriteReplyKind {
WriteSecured2,
}
fn bulk_write_results(reply: MxCommandReply, kind: BulkWriteReplyKind) -> Vec<BulkWriteResult> {
fn bulk_write_results(
reply: MxCommandReply,
kind: BulkWriteReplyKind,
) -> Result<Vec<BulkWriteResult>, Error> {
match (reply.payload, kind) {
(Some(mx_command_reply::Payload::WriteBulk(reply)), BulkWriteReplyKind::Write) => {
reply.results
Ok(reply.results)
}
(Some(mx_command_reply::Payload::Write2Bulk(reply)), BulkWriteReplyKind::Write2) => {
reply.results
Ok(reply.results)
}
(
Some(mx_command_reply::Payload::WriteSecuredBulk(reply)),
BulkWriteReplyKind::WriteSecured,
) => reply.results,
) => Ok(reply.results),
(
Some(mx_command_reply::Payload::WriteSecured2Bulk(reply)),
BulkWriteReplyKind::WriteSecured2,
) => reply.results,
_ => Vec::new(),
) => Ok(reply.results),
_ => Err(Error::MalformedReply {
detail: "bulk write reply did not carry the expected payload arm".to_owned(),
}),
}
}
+569 -11
View File
@@ -20,16 +20,19 @@ use zb_mom_ww_mxgateway_client::generated::mxaccess_gateway::v1::mx_access_gatew
use zb_mom_ww_mxgateway_client::generated::mxaccess_gateway::v1::mx_command_reply;
use zb_mom_ww_mxgateway_client::generated::mxaccess_gateway::v1::mx_value::Kind;
use zb_mom_ww_mxgateway_client::generated::mxaccess_gateway::v1::{
AcknowledgeAlarmReply, AcknowledgeAlarmRequest, ActiveAlarmSnapshot, AddItemReply,
AlarmFeedMessage, BulkSubscribeReply, CloseSessionReply, CloseSessionRequest, MxCommandKind,
MxCommandReply, MxDataType, MxEvent, MxEventFamily, MxStatusCategory, MxStatusProxy,
MxStatusSource, MxValue, OpenSessionReply, OpenSessionRequest, ProtocolStatus,
ProtocolStatusCode, QueryActiveAlarmsRequest, SessionState, StreamAlarmsRequest,
StreamEventsRequest, SubscribeResult,
alarm_feed_message, AcknowledgeAlarmReply, AcknowledgeAlarmRequest, ActiveAlarmSnapshot,
AddItem2Reply, AddItemReply, AlarmConditionState, AlarmFeedMessage, AlarmTransitionKind,
BulkReadReply, BulkReadResult, BulkSubscribeReply, BulkWriteReply, BulkWriteResult,
CloseSessionReply, CloseSessionRequest, MxCommandKind, MxCommandReply, MxDataType, MxEvent,
MxEventFamily, MxStatusCategory, MxStatusProxy, MxStatusSource, MxValue,
OnAlarmTransitionEvent, OpenSessionReply, OpenSessionRequest, ProtocolStatus,
ProtocolStatusCode, QueryActiveAlarmsRequest, RegisterReply, SessionState, StreamAlarmsRequest,
StreamEventsRequest, SubscribeResult, Write2BulkEntry, WriteBulkEntry, WriteSecured2BulkEntry,
WriteSecuredBulkEntry,
};
use zb_mom_ww_mxgateway_client::{
ApiKey, ClientOptions, CommandError, Error, GatewayClient, MxStatus, MxValue as ClientMxValue,
MxValueProjection,
next_correlation_id, ApiKey, ClientOptions, CommandError, Error, GatewayClient, MxStatus,
MxValue as ClientMxValue, MxValueProjection,
};
#[tokio::test]
@@ -272,11 +275,414 @@ fn command_error_display_keeps_raw_reply_accessible() {
assert!(error.to_string().contains("MxaccessFailure"));
}
// ---- Client.Rust-022 / 024 regression coverage ---------------------------
#[tokio::test]
async fn register_returns_malformed_reply_when_ok_reply_has_no_payload() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkWithoutPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session.register("client").await.unwrap_err();
assert!(
matches!(error, Error::MalformedReply { .. }),
"expected MalformedReply, got {error:?}"
);
}
#[tokio::test]
async fn add_item_returns_malformed_reply_when_ok_reply_has_no_payload() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkWithoutPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session.add_item(12, "Plant.Area.Tag").await.unwrap_err();
assert!(
matches!(error, Error::MalformedReply { .. }),
"expected MalformedReply, got {error:?}"
);
}
#[tokio::test]
async fn add_item2_returns_malformed_reply_when_ok_reply_has_no_payload() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkWithoutPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session
.add_item2(12, "Plant.Area.Tag", "ctx")
.await
.unwrap_err();
assert!(
matches!(error, Error::MalformedReply { .. }),
"expected MalformedReply, got {error:?}"
);
}
#[tokio::test]
async fn subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkWithMismatchedPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session
.subscribe_bulk(12, vec!["Area001.Pump001.Speed".to_owned()])
.await
.unwrap_err();
assert!(
matches!(error, Error::MalformedReply { .. }),
"expected MalformedReply, got {error:?}"
);
}
#[tokio::test]
async fn read_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkWithMismatchedPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session
.read_bulk(12, &["Area001.Pump001.Speed".to_owned()], 1000)
.await
.unwrap_err();
assert!(
matches!(error, Error::MalformedReply { .. }),
"expected MalformedReply, got {error:?}"
);
}
#[tokio::test]
async fn write_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkWithMismatchedPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session
.write_bulk(
12,
vec![WriteBulkEntry {
item_handle: 34,
value: Some(ClientMxValue::int32(1).into_proto()),
user_id: 0,
}],
)
.await
.unwrap_err();
assert!(
matches!(error, Error::MalformedReply { .. }),
"expected MalformedReply, got {error:?}"
);
}
#[tokio::test]
async fn unary_invoke_maps_status_unavailable_to_error_unavailable() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await =
Some(InvokeOverride::Unavailable("gateway restarting".to_owned()));
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session.add_item(12, "Plant.Area.Tag").await.unwrap_err();
assert!(
matches!(error, Error::Unavailable { .. }),
"expected Unavailable, got {error:?}"
);
}
#[tokio::test]
async fn read_bulk_round_trips_through_the_fake_gateway() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let results = session
.read_bulk(12, &["Area001.Pump001.Speed".to_owned()], 1000)
.await
.unwrap();
assert_eq!(results.len(), 1);
assert!(results[0].was_successful);
assert!(results[0].was_cached);
}
#[tokio::test]
async fn write_bulk_round_trips_through_the_fake_gateway() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let results = session
.write_bulk(
12,
vec![WriteBulkEntry {
item_handle: 34,
value: Some(ClientMxValue::int32(1).into_proto()),
user_id: 0,
}],
)
.await
.unwrap();
assert_eq!(results.len(), 1);
assert!(results[0].was_successful);
let last_command = state.last_command_kind.lock().await;
assert_eq!(*last_command, Some(MxCommandKind::WriteBulk as i32));
}
#[tokio::test]
async fn write2_bulk_round_trips_through_the_fake_gateway() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let results = session
.write2_bulk(
12,
vec![Write2BulkEntry {
item_handle: 34,
value: Some(ClientMxValue::int32(1).into_proto()),
timestamp_value: Some(ClientMxValue::string("2026-05-24T00:00:00Z").into_proto()),
user_id: 0,
}],
)
.await
.unwrap();
assert_eq!(results.len(), 1);
assert!(results[0].was_successful);
let last_command = state.last_command_kind.lock().await;
assert_eq!(*last_command, Some(MxCommandKind::Write2Bulk as i32));
}
#[tokio::test]
async fn write_secured_bulk_round_trips_through_the_fake_gateway() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let results = session
.write_secured_bulk(
12,
vec![WriteSecuredBulkEntry {
item_handle: 34,
value: Some(ClientMxValue::int32(1).into_proto()),
current_user_id: 0,
verifier_user_id: 0,
}],
)
.await
.unwrap();
assert_eq!(results.len(), 1);
assert!(results[0].was_successful);
let last_command = state.last_command_kind.lock().await;
assert_eq!(*last_command, Some(MxCommandKind::WriteSecuredBulk as i32));
}
#[tokio::test]
async fn write_secured2_bulk_round_trips_through_the_fake_gateway() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let results = session
.write_secured2_bulk(
12,
vec![WriteSecured2BulkEntry {
item_handle: 34,
value: Some(ClientMxValue::int32(1).into_proto()),
timestamp_value: Some(ClientMxValue::string("2026-05-24T00:00:00Z").into_proto()),
current_user_id: 0,
verifier_user_id: 0,
}],
)
.await
.unwrap();
assert_eq!(results.len(), 1);
assert!(results[0].was_successful);
let last_command = state.last_command_kind.lock().await;
assert_eq!(*last_command, Some(MxCommandKind::WriteSecured2Bulk as i32));
}
#[tokio::test]
async fn stream_alarms_emits_snapshot_then_complete_then_transition_in_order() {
let state = Arc::new(FakeState::default());
*state.stream_alarms_script.lock().await = Some(vec![
AlarmFeedMessage {
payload: Some(alarm_feed_message::Payload::ActiveAlarm(
ActiveAlarmSnapshot {
alarm_full_reference: "Tank01.Level.HiHi".to_owned(),
current_state: AlarmConditionState::Active as i32,
..ActiveAlarmSnapshot::default()
},
)),
},
AlarmFeedMessage {
payload: Some(alarm_feed_message::Payload::SnapshotComplete(true)),
},
AlarmFeedMessage {
payload: Some(alarm_feed_message::Payload::Transition(
OnAlarmTransitionEvent {
alarm_full_reference: "Tank01.Level.HiHi".to_owned(),
transition_kind: AlarmTransitionKind::Raise as i32,
..OnAlarmTransitionEvent::default()
},
)),
},
]);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let mut stream = client
.stream_alarms(StreamAlarmsRequest {
client_correlation_id: next_correlation_id("test-stream-alarms"),
alarm_filter_prefix: String::new(),
})
.await
.unwrap();
let first = stream.next().await.unwrap().unwrap();
let second = stream.next().await.unwrap().unwrap();
let third = stream.next().await.unwrap().unwrap();
assert!(matches!(
first.payload,
Some(alarm_feed_message::Payload::ActiveAlarm(_))
));
assert!(matches!(
second.payload,
Some(alarm_feed_message::Payload::SnapshotComplete(true))
));
assert!(matches!(
third.payload,
Some(alarm_feed_message::Payload::Transition(_))
));
}
#[tokio::test]
async fn cli_subcommands_propagate_unique_correlation_ids_from_next_correlation_id() {
// The CLI's `stream-alarms` and `acknowledge-alarm` paths used to
// hard-code their correlation ids (Client.Rust-023). Verify the
// resolution end-to-end through `next_correlation_id`: every call
// observed at the fake gateway has a unique id that embeds the
// `cli-...` label, so concurrent CLI smokes can tell collisions apart.
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let first_corr = next_correlation_id("cli-stream-alarms");
let _ = client
.stream_alarms(StreamAlarmsRequest {
client_correlation_id: first_corr.clone(),
alarm_filter_prefix: String::new(),
})
.await
.unwrap();
assert_eq!(
*state.last_correlation_id.lock().await,
Some(first_corr.clone())
);
let second_corr = next_correlation_id("cli-stream-alarms");
assert_ne!(first_corr, second_corr);
assert!(second_corr.contains("cli-stream-alarms"));
let third_corr = next_correlation_id("cli-acknowledge-alarm");
let _ = client
.acknowledge_alarm(AcknowledgeAlarmRequest {
client_correlation_id: third_corr.clone(),
alarm_full_reference: "Tank01.Level.HiHi".to_owned(),
comment: String::new(),
operator_user: String::new(),
})
.await
.unwrap();
assert_eq!(*state.last_correlation_id.lock().await, Some(third_corr));
}
#[derive(Default)]
struct FakeState {
authorization: Mutex<Option<String>>,
last_command_kind: Mutex<Option<i32>>,
last_correlation_id: Mutex<Option<String>>,
stream_dropped: Arc<AtomicBool>,
/// Optional per-test override that pins the fake's `Invoke` handler to
/// a specific reply shape (or `Err(Status)`). The default of `None`
/// keeps the existing happy-path dispatcher.
invoke_override: Mutex<Option<InvokeOverride>>,
/// Optional per-test override that pins the fake's `StreamAlarms`
/// handler to emit a synthetic ConditionRefresh -> snapshot_complete
/// -> transition sequence.
stream_alarms_script: Mutex<Option<Vec<AlarmFeedMessage>>>,
}
/// Per-test override for the fake's `Invoke` handler.
#[allow(dead_code)]
enum InvokeOverride {
/// Reply with `protocol_status = Ok` and no `payload` set.
OkWithoutPayload,
/// Reply with `protocol_status = Ok` and a deliberately wrong payload
/// arm — e.g. an `AddItemReply` body when the caller invoked a bulk
/// command. The variant carries the kind to recognise in tests but the
/// reply itself is the mismatched-payload shape.
OkWithMismatchedPayload,
/// Fail the unary call with `Status::unavailable(...)` so the client's
/// `Code::Unavailable` -> `Error::Unavailable` mapping is exercised.
Unavailable(String),
}
#[derive(Clone)]
@@ -331,6 +737,35 @@ impl MxAccessGateway for FakeGateway {
.map(|command| command.kind)
.unwrap_or_default();
*self.state.last_command_kind.lock().await = Some(kind);
*self.state.last_correlation_id.lock().await = Some(request.client_correlation_id.clone());
// Honour any per-test override before falling through to the
// happy-path dispatcher.
if let Some(override_) = self.state.invoke_override.lock().await.take() {
return match override_ {
InvokeOverride::OkWithoutPayload => Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("command ok")),
payload: None,
..MxCommandReply::default()
})),
InvokeOverride::OkWithMismatchedPayload => Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("command ok")),
// Deliberately the wrong payload arm — `AddItemReply`
// for whatever command was actually invoked.
payload: Some(mx_command_reply::Payload::AddItem(AddItemReply {
item_handle: 99,
})),
..MxCommandReply::default()
})),
InvokeOverride::Unavailable(message) => Err(Status::unavailable(message)),
};
}
if kind == MxCommandKind::Write as i32 {
return Ok(Response::new(mxaccess_failure_reply()));
@@ -357,6 +792,92 @@ impl MxAccessGateway for FakeGateway {
}));
}
if kind == MxCommandKind::Register as i32 {
return Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("command ok")),
payload: Some(mx_command_reply::Payload::Register(RegisterReply {
server_handle: 12,
})),
..MxCommandReply::default()
}));
}
if kind == MxCommandKind::AddItem2 as i32 {
return Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("command ok")),
payload: Some(mx_command_reply::Payload::AddItem2(AddItem2Reply {
item_handle: 56,
})),
..MxCommandReply::default()
}));
}
if kind == MxCommandKind::ReadBulk as i32 {
return Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("command ok")),
payload: Some(mx_command_reply::Payload::ReadBulk(BulkReadReply {
results: vec![BulkReadResult {
server_handle: 12,
tag_address: "Area001.Pump001.Speed".to_owned(),
item_handle: 34,
was_successful: true,
was_cached: true,
..BulkReadResult::default()
}],
})),
..MxCommandReply::default()
}));
}
if kind == MxCommandKind::WriteBulk as i32 {
return Ok(Response::new(write_bulk_reply_for(
request.session_id,
kind,
mx_command_reply::Payload::WriteBulk(BulkWriteReply {
results: vec![bulk_write_result_ok(12, 34)],
}),
)));
}
if kind == MxCommandKind::Write2Bulk as i32 {
return Ok(Response::new(write_bulk_reply_for(
request.session_id,
kind,
mx_command_reply::Payload::Write2Bulk(BulkWriteReply {
results: vec![bulk_write_result_ok(12, 34)],
}),
)));
}
if kind == MxCommandKind::WriteSecuredBulk as i32 {
return Ok(Response::new(write_bulk_reply_for(
request.session_id,
kind,
mx_command_reply::Payload::WriteSecuredBulk(BulkWriteReply {
results: vec![bulk_write_result_ok(12, 34)],
}),
)));
}
if kind == MxCommandKind::WriteSecured2Bulk as i32 {
return Ok(Response::new(write_bulk_reply_for(
request.session_id,
kind,
mx_command_reply::Payload::WriteSecured2Bulk(BulkWriteReply {
results: vec![bulk_write_result_ok(12, 34)],
}),
)));
}
Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
@@ -387,8 +908,10 @@ impl MxAccessGateway for FakeGateway {
async fn acknowledge_alarm(
&self,
_request: Request<AcknowledgeAlarmRequest>,
request: Request<AcknowledgeAlarmRequest>,
) -> Result<Response<AcknowledgeAlarmReply>, Status> {
*self.state.last_correlation_id.lock().await =
Some(request.into_inner().client_correlation_id);
Ok(Response::new(AcknowledgeAlarmReply {
correlation_id: "corr-1".to_owned(),
protocol_status: Some(ok_status("ack ok")),
@@ -407,9 +930,18 @@ impl MxAccessGateway for FakeGateway {
async fn stream_alarms(
&self,
_request: Request<StreamAlarmsRequest>,
request: Request<StreamAlarmsRequest>,
) -> Result<Response<Self::StreamAlarmsStream>, Status> {
let (_sender, receiver) = mpsc::channel::<Result<AlarmFeedMessage, Status>>(1);
*self.state.last_correlation_id.lock().await =
Some(request.into_inner().client_correlation_id);
let script = self.state.stream_alarms_script.lock().await.take();
let (sender, receiver) =
mpsc::channel::<Result<AlarmFeedMessage, Status>>(script.as_ref().map_or(1, Vec::len));
if let Some(messages) = script {
for message in messages {
sender.send(Ok(message)).await.unwrap();
}
}
let stream = ReceiverStream::new(receiver);
Ok(Response::new(Box::pin(stream)))
}
@@ -469,6 +1001,32 @@ async fn spawn_fake_gateway(state: Arc<FakeState>) -> String {
format!("http://{address}")
}
fn write_bulk_reply_for(
session_id: String,
kind: i32,
payload: mx_command_reply::Payload,
) -> MxCommandReply {
MxCommandReply {
session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("command ok")),
payload: Some(payload),
..MxCommandReply::default()
}
}
fn bulk_write_result_ok(server_handle: i32, item_handle: i32) -> BulkWriteResult {
BulkWriteResult {
server_handle,
item_handle,
was_successful: true,
hresult: Some(0),
statuses: Vec::new(),
error_message: String::new(),
}
}
fn ok_status(message: &str) -> ProtocolStatus {
ProtocolStatus {
code: ProtocolStatusCode::Ok as i32,
+13 -5
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `42b0037` |
| Status | Re-reviewed |
| Open findings | 4 |
| Open findings | 0 |
## Checklist coverage
@@ -390,7 +390,7 @@ Re-review pass at `42b0037`. Diff against `d692232` consists of four commits:
| Severity | Medium |
| Category | Documentation & comments |
| Location | `clients/dotnet/README.md:137-138` |
| Status | Open |
| Status | Resolved |
**Description:** The README example block for the two new alarm CLI subcommands shipped in commit `11cc671` shows:
@@ -412,6 +412,8 @@ mxgw-dotnet acknowledge-alarm --reference "\\Galaxy\Area001.Pump001.PumpFault" -
A quick sanity check would be to drive each example through the test harness's `MxGatewayClientCli.RunAsync` shape and confirm exit 0 — copy/paste safety on the documented examples is the only realistic safeguard.
**Resolution:** 2026-05-24 — Confirmed against source: the README `dotnet run -- stream-alarms` example used `--session-id <id> --max-messages 1`, neither of which the CLI accepts (the production parser routes through `--filter-prefix` / `--max-events`), and `acknowledge-alarm` used `--session-id <id> --alarm-reference <ref>` against a session-less central alarm monitor that actually consumes `--reference`. Replaced both README example lines with the parser-correct shape: `stream-alarms --filter-prefix Area001 --max-events 1 --json` and `acknowledge-alarm --reference "\\Galaxy\Area001.Pump001.PumpFault" --comment "ack from cli" --operator operator1 --json`. Regression test `MxGatewayClientCliTests.RunAsync_ReadmeExamples_ForAlarmCommands_ParseSuccessfully` (xUnit `[Theory]` over `stream-alarms` and `acknowledge-alarm`) locates `clients/dotnet/README.md`, extracts the documented CLI command line for each subcommand, tokenizes it (preserving quoted segments), and drives it through the production `MxGatewayClientCli.RunAsync` against a `FakeCliClient`; the test asserts exit code 0 and that stderr contains neither "Unknown command" nor "Missing required option". Verified red against the original README text (both theory rows failed with exit code 1) and green after the README update, so any future documentation drift between README examples and the actual parser shape is caught at test time.
### Client.Dotnet-019
| Field | Value |
@@ -419,7 +421,7 @@ A quick sanity check would be to drive each example through the test harness's `
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:745` |
| Status | Open |
| Status | Resolved |
**Description:** Client.Dotnet-005 / 010 documented (and recorded as resolved) the silent register-handle fallback pattern `reply.Register?.ServerHandle ?? reply.ReturnValue.Int32Value`, where a successful protocol+MX-status reply missing its typed `register` oneof case falls through to `ReturnValue.Int32Value` and silently yields `0` for the handle. The new `BenchReadBulkAsync` handler introduced in commit `b3ae200` reinstates exactly that pattern at line 745:
@@ -431,6 +433,8 @@ The bench then drives the rest of the run — `SubscribeBulk`, warmup `ReadBulk`
**Recommendation:** Replace the fallback with an explicit null-check on `registerReply.Register` that throws `MxGatewayException` with the missing-payload context (kind = `Register`, session id, correlation id) — the same shape Client.Dotnet-005 prescribes. If the upstream SDK helpers in `MxGatewaySession` are restored to throw, route the bench through `MxGatewaySession.RegisterAsync` instead so the CLI inherits the SDK's protection.
**Resolution:** 2026-05-24 — Confirmed against source: `BenchReadBulkAsync` at line 745 carried the silent fallback `registerReply.Register?.ServerHandle ?? registerReply.ReturnValue.Int32Value`, so a successful protocol+MX-status reply missing its typed `register` payload would yield ServerHandle=0 and the bench would drive the rest of SubscribeBulk / warmup / steady-state ReadBulk / UnsubscribeBulk against an invalid handle. Added a private `RequireRegisterServerHandle(MxCommandReply reply, string sessionId)` helper that throws a descriptive `MxGatewayException` naming the missing typed payload, the session id, and the correlation id, then replaced the fallback site with `int serverHandle = RequireRegisterServerHandle(registerReply, sessionId);`. The bench's outer `RunCoreAsync` catch then surfaces the throw as exit code 1 plus the descriptive message on stderr, so the failure mode is loud rather than a wall of zero-result stats. Regression test `MxGatewayClientCliTests.RunAsync_BenchReadBulk_WhenRegisterReplyMissingTypedPayload_FailsLoudly` enqueues an `Ok` MX command reply with no typed `Register` payload, runs `bench-read-bulk` against the fake, and asserts exit code 1, that stderr names the `Register` payload, and that no `bench-read-bulk` stats JSON was emitted on stdout. Verified red against the original fallback (an NRE bubbled up later in the run rather than the descriptive throw) and green after the helper landed.
### Client.Dotnet-020
| Field | Value |
@@ -438,7 +442,7 @@ The bench then drives the rest of the run — `SubscribeBulk`, warmup `ReadBulk`
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:792-810`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:774-780` |
| Status | Open |
| Status | Resolved |
**Description:** `BenchReadBulkAsync`'s steady-state `while (DateTime.UtcNow < steadyDeadline)` loop wraps each `client.InvokeAsync(...)` in a bare `catch`:
@@ -466,6 +470,8 @@ The warmup loop above (lines 774-780) has no catch at all, so a warmup-time OCE
**Recommendation:** Replace the bare `catch` with `catch (Exception) when (!cancellationToken.IsCancellationRequested)`, or split into `catch (OperationCanceledException) { throw; } catch (Exception) { failedCalls++; ... continue; }`. The first form is the smallest diff and matches the pattern used elsewhere in the CLI. Add a regression test that runs `bench-read-bulk` with a `--duration-seconds 10` budget against a fake that throws on every `InvokeAsync`, cancels the supplied token after 100 ms, and asserts the run exits in well under 10 s. The wider precedent — Client.Dotnet-016's `BenchStreamEventsAsync` cancellation hardening — should already cover the shape of the test fixture.
**Resolution:** 2026-05-24 — Confirmed against source: the steady-state loop wrapped `client.InvokeAsync` in a bare `catch { sw.Stop(); failedCalls++; latencyMillis.Add(...); continue; }` with no type filter, so an `OperationCanceledException` thrown from a cancelled token (Ctrl+C, parent CTS, or the wall-clock budget) was swallowed and the loop spun until `DateTime.UtcNow >= steadyDeadline`. Replaced the bare clause with `catch (Exception ex) when (ex is not OperationCanceledException)`, so OCE now propagates out of the bench and unwinds through the outer `RunCoreAsync` (which intentionally lets OCE escape). Inline comment names the finding and the reason the filter exists. Regression test `MxGatewayClientCliTests.RunAsync_BenchReadBulk_WhenSteadyStateLoopReceivesCancellation_ExitsPromptly` configures the fake CLI client so Register and SubscribeBulk succeed, the first three ReadBulk calls succeed (the loop enters the steady-state body), then every subsequent ReadBulk throws `OperationCanceledException`; the test runs `bench-read-bulk --duration-seconds 30` and asserts the call throws OCE and wall-clock elapsed time stays under 10 s. Verified red against the original bare `catch` (the test ran for the full 30 s without throwing) and green after the filter landed (the bench exited promptly with OCE).
### Client.Dotnet-021
| Field | Value |
@@ -473,7 +479,7 @@ The warmup loop above (lines 774-780) has no catch at all, so a warmup-time OCE
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:487`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:715` |
| Status | Open |
| Status | Resolved |
**Description:** Both new bulk-read CLI handlers cast a signed `--timeout-ms` argument to `uint` without bounds checking:
@@ -499,3 +505,5 @@ uint timeoutMs = (uint)timeoutMsRaw;
```
A single shared helper (e.g. `ParseTimeoutMs(CliArguments, string, int)`) on `MxGatewayClientCli` would cover both call sites and remove the duplication.
**Resolution:** 2026-05-24 — Confirmed against source: both `ReadBulkAsync` (line 490) and `BenchReadBulkAsync` (line 715) cast `arguments.GetInt32("timeout-ms", ...)` straight to `uint`, so `--timeout-ms -1` silently wrapped to `0xFFFFFFFF` (~49.7 days). Added a single shared private helper `ParseTimeoutMs(CliArguments arguments, int defaultValue)` on `MxGatewayClientCli` that reads the int32, rejects negatives with a clear `ArgumentException` ("--timeout-ms must be a non-negative integer (use 0 for the gateway default)."), and returns the safe `(uint)`. Both call sites now route through the helper. Regression test `MxGatewayClientCliTests.RunAsync_TimeoutMs_NegativeValue_RejectsWithClearError` (xUnit `[Theory]` over `read-bulk` and `bench-read-bulk`) drives the CLI with `--timeout-ms -1` and asserts the exit code is non-zero, that stderr contains "timeout-ms", and that the "non-negative" guard text is present. Verified red against the original `(uint)arguments.GetInt32(...)` casts (the bench proceeded past the timeout parse and tripped a downstream "Queue empty" error rather than the descriptive guard message) and green after the helper landed.
+13 -13
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `42b0037` |
| Status | Re-reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -472,7 +472,7 @@ Each is a few lines and routes through the existing `runWithIO` entry point, so
| Severity | Medium |
| Category | Code organization & conventions |
| Location | `clients/go/cmd/mxgw-go/main.go:398-412,417-519` |
| Status | Open |
| Status | Resolved |
**Description:** Commit `8aaab82` ("Go client: port bulk read/write SDK methods + CLI subcommands") re-introduces every symptom that Client.Go-015 documented and was marked Resolved against an earlier commit:
@@ -484,7 +484,7 @@ Because the surrounding test file (`main_test.go`) lost the regression tests pro
**Recommendation:** Re-apply the Client.Go-015 fix on this re-added code. Drop the `secured` parameter and the `_ = secured` line (the `command` switch is the only routing key); derive the variant locally from `command`; register `-current-user-id` / `-verifier-user-id` only inside the secured branches and `-user-id` only inside Write/Write2 — so a wrong-variant flag fails with a clean `flag provided but not defined` usage error. Re-add the `TestRunWriteBulkVariantGatesSecuredFlags` table-test from the Client.Go-021 resolution so a future regression is caught by CI.
**Resolution:** Open.
**Resolution:** 2026-05-24 — Re-applied the Client.Go-015 fix. Dropped the unused `secured` parameter from `runWriteBulkVariant` and the misleading `_ = secured` line; the variant is now derived locally from `command` and gates flag registration. `-current-user-id` / `-verifier-user-id` are only registered for the secured variants and `-user-id` only for Write/Write2, so a wrong-variant flag now fails with a clean `flag provided but not defined` usage error. The four `runWrite*Bulk` wrappers were updated to match the new signature. Regression test `TestRunWriteBulkVariantGatesSecuredFlags` in `cmd/mxgw-go/main_test.go` (table-driven across all five wrong-variant flag/command pairs) was re-added; it failed pre-fix on every case ("session-id, item-handles, and values are required" reached because the flag was silently accepted), and passes post-fix with the expected `flag provided but not defined`.
### Client.Go-023
@@ -493,7 +493,7 @@ Because the surrounding test file (`main_test.go`) lost the regression tests pro
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/go/cmd/mxgw-go/main.go:604-606,616-632` |
| Status | Open |
| Status | Resolved |
**Description:** `runBenchReadBulk`'s warm-up and steady-state loops are wall-clock-only again:
@@ -515,7 +515,7 @@ Neither loop checks `ctx.Done()` / `ctx.Err()`. This is exactly the shape Client
**Recommendation:** Re-apply the Client.Go-018 fix: change both loop conditions to `for time.Now().Before(warmupDeadline) && ctx.Err() == nil` (and the same on `steadyDeadline`). The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via `durationMs` / `totalCalls`. Optionally add the `signal.NotifyContext` pattern used by `runStreamAlarms` and `runGalaxyWatch` so direct Ctrl+C on the bench also short-circuits cleanly.
**Resolution:** Open.
**Resolution:** 2026-05-24 — Re-applied the Client.Go-018 fix. Both the warm-up and steady-state loops in `runBenchReadBulk` now carry an `&& ctx.Err() == nil` guard alongside the wall-clock check, so a cancelled parent context breaks the loops instead of spinning failing `ReadBulk` calls until the deadline elapses. The cross-language bench JSON shape (`durationMs` / `totalCalls`) is unchanged — the truncated window is just reported faithfully. Regression test `TestRunBenchReadBulkRespectsContextCancellation` in `cmd/mxgw-go/main_test.go` spins up a localhost TCP gRPC fake (`benchFakeGateway`) that answers OpenSession + Invoke for the register/subscribe/read/unsubscribe sequence, runs the bench with `-warmup-seconds 5 -duration-seconds 5`, cancels the ctx after 150ms, and asserts the bench returns in under 4s. Pre-fix the test ran for the full 10s (warmup+duration); post-fix it returns within ~250ms.
### Client.Go-024
@@ -524,7 +524,7 @@ Neither loop checks `ctx.Done()` / `ctx.Err()`. This is exactly the shape Client
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/go/mxgateway/session.go:395-525`, `clients/go/mxgateway/alarms.go:65-76` |
| Status | Open |
| Status | Resolved |
**Description:** The five new bulk SDK methods on `Session` and the new `Client.StreamAlarms` method have **no unit tests** in `clients/go/mxgateway/`:
@@ -545,7 +545,7 @@ Neither loop checks `ctx.Done()` / `ctx.Err()`. This is exactly the shape Client
These plus `nil` / empty-slice rejection tests for each bulk method close out the new public surface.
**Resolution:** Open.
**Resolution:** 2026-05-24 — Added SDK-level tests using the existing `newBufconnClient` / `newBufconnClientWithAlarms` fake-gateway pattern. In `clients/go/mxgateway/client_session_test.go`: `TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults` confirms the protobuf payload carries `MX_COMMAND_KIND_WRITE_BULK` with all entries and returns per-entry results; `TestWriteBulkRejectsNilEntries` pins the nil guard on all five new bulk methods (WriteBulk/Write2Bulk/WriteSecuredBulk/WriteSecured2Bulk/ReadBulk); `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag` pins normal `timeoutMs` arithmetic and `WasCached` propagation; `TestReadBulkSaturatesTimeoutAboveMaxUint32` pins the `> MaxUint32 ms` clamp at `session.go:504-509`. In `clients/go/mxgateway/alarms_test.go`: `TestStreamAlarmsPassesFilterPrefixAndReceivesFeedMessages` asserts request flows and stream Recv returns each fake `AlarmFeedMessage` (active-alarm snapshot, snapshot-complete sentinel) with auth metadata attached; `TestStreamAlarmsRejectsNilRequest` pins the nil guard. The `fakeGatewayWithAlarms` was extended with a `StreamAlarms` method.
### Client.Go-025
@@ -554,7 +554,7 @@ These plus `nil` / empty-slice rejection tests for each bulk method close out th
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/go/mxgateway/session.go:395-485,495-525` |
| Status | Open |
| Status | Resolved |
**Description:** The five new bulk methods (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) each guard with `if entries == nil { return error }` and an upper-bound `ensureBulkSize` check, but accept a non-nil empty slice (e.g. `[]*WriteBulkEntry{}` or `[]string{}`). The call then sends an `MX_COMMAND_KIND_WRITE_BULK` (or peer) command with zero entries across the gRPC wire to the gateway, which forwards to the worker for a no-op round trip. This is the same shape Client.Go-015 / Client.Go-021 were written against (the CLI now also accepts `mxgw-go write-bulk -item-handles , -values ,` which `parseInt32List` returns as empty without error). The pre-existing bulk methods (`AddItemBulk`, `AdviseItemBulk`, etc. at `session.go:253-343`) carry the identical pattern, so this is a long-standing convention — but it's still a real cost on the hot path. The Java / .NET / Rust / Python clients should be checked for parity if this is fixed.
@@ -565,7 +565,7 @@ These plus `nil` / empty-slice rejection tests for each bulk method close out th
Option 1 is cheaper for callers (one less round trip and one clearer error message) and removes the empty-list footgun for cross-language drivers that may pass empty arrays from PowerShell `,` splits.
**Resolution:** Open.
**Resolution:** 2026-05-24 — Audited the four other clients for parity: .NET (`MxGatewaySession.cs:520`), Rust (`session.rs:408` via `ensure_bulk_size`), Python (`session.py:350`), and Java (`MxGatewaySession.java:451`) all accept empty slices and make the round-trip with zero entries. To preserve cross-language behaviour (no error on empty input) while removing the wasteful round trip on the Go hot path, all five new bulk methods (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) now short-circuit on `len(entries) == 0` and return an empty result slice without invoking the command. The nil guard is preserved (returns the "...are required" error) and the SDK doc comments now document the empty-slice no-op shape explicitly. Regression test `TestBulkMethodsShortCircuitOnEmptySliceWithoutRoundTrip` in `client_session_test.go` invokes each of the five methods with an empty slice and asserts (a) no error, (b) zero-length result, and (c) `fake.invokeRequest == nil` (no gRPC round trip). Pre-fix the test failed on the first assertion ("WriteBulk(empty) sent a round trip; expected short-circuit"); post-fix it passes.
### Client.Go-026
@@ -574,7 +574,7 @@ Option 1 is cheaper for callers (one less round trip and one clearer error messa
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/go/cmd/mxgw-go/main.go:1196-1222` |
| Status | Open |
| Status | Resolved |
**Description:** `runBatch` reads command lines with a default `bufio.Scanner`:
@@ -595,7 +595,7 @@ A second weakness: `strings.Fields(line)` splits on whitespace and does no quote
**Recommendation:** Call `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` immediately after `bufio.NewScanner` so a long bulk-args line doesn't abort the session. If `runBatch` is intended to support free-text flag values (the `acknowledge-alarm -comment` shape is the obvious case), swap `strings.Fields` for a quote-aware tokeniser (`mvdan.cc/sh/v3/syntax` or a small inline state machine matching the .NET/Rust harness shape). Otherwise add a one-line comment to `runBatch`'s doc-comment that batch-mode arguments must not contain whitespace.
**Resolution:** Open.
**Resolution:** 2026-05-24 — `runBatch` now sets `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` immediately after `bufio.NewScanner`, lifting the per-line cap from 64 KiB to 16 MiB so a long bulk-args line (several thousand handles) no longer aborts the session. If a single line still exceeds the 16 MiB cap, the resulting `scanner.Err()` is now framed as a final error-with-sentinel (JSON payload + `batchEOR`) and returned, so the harness never sees an unframed bufio failure. Regression test `TestRunBatchHandlesLongCommandLine` in `cmd/mxgw-go/main_test.go` feeds an ~88 KiB `subscribe-bulk` line (above the old 64 KiB default) followed by `version --json` and asserts two EOR sentinels are emitted — pre-fix the test failed with "bufio.Scanner: token too long" returned from `runBatch`; post-fix both commands run and the session completes cleanly. Quote-aware tokenisation is out of scope for this finding (the recommendation accepts either fix); the `strings.Fields` shape is unchanged.
### Client.Go-027
@@ -604,7 +604,7 @@ A second weakness: `strings.Fields(line)` splits on whitespace and does no quote
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/go/cmd/mxgw-go/main.go:1195-1206` |
| Status | Open |
| Status | Resolved |
**Description:** `runBatch`'s doc-comment says the loop "never terminates on command error; only stdin EOF (or an empty line) ends the session", and the implementation matches:
@@ -624,4 +624,4 @@ The two cases the empty-line check seems to cover — (a) operator pressing Ente
**Recommendation:** Change `if line == "" { break }` to `if line == "" { continue }` (alongside the existing `len(args) == 0` continue, which is then redundant — keep one, drop the other for clarity). Update the `runBatch` doc-comment to read "only stdin EOF ends the session" and drop the "or an empty line" clause. If the interactive ergonomic is genuinely wanted, gate it on `isatty(stdin)` so the batch-from-pipe case isn't affected.
**Resolution:** Open.
**Resolution:** 2026-05-24 — `runBatch` no longer treats a blank line as end-of-session. The `if line == "" { break }` early-exit was removed; blank or whitespace-only lines now fall through the existing `if len(args) == 0 { continue }` guard (kept as the single blank-line skip rule for clarity), so only stdin EOF ends the session. The doc-comment was updated to read "Blank lines are skipped; only stdin EOF ends the session." Regression test `TestRunBatchSkipsBlankLinesAndContinuesUntilEOF` in `cmd/mxgw-go/main_test.go` feeds `version --json\n\nversion --json\n` (a stray blank line between two commands) and asserts two EOR sentinels are emitted — pre-fix the test failed with "EOR sentinel count = 1, want 2" because the blank line broke the loop and the second command never ran; post-fix both commands run.
+117 -6
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `42b0037` |
| Status | Re-reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -835,7 +835,7 @@ parity fix.
| Severity | High |
| Category | Documentation & comments |
| Location | `clients/python/README.md:201-202`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:389-420` |
| Status | Open |
| Status | Resolved |
**Description:** The README CLI examples added by commit `8738735` for the
new alarm subcommands cite flags the CLI does not accept:
@@ -868,6 +868,19 @@ rename the CLI option to `--alarm-reference` and add a test that copy-pastes
the README examples through `CliRunner` to assert they parse. Option (1) is
the smaller change.
**Resolution:** 2026-05-24 — Fixed the README examples to match the
implementation (option 1, smaller change). `clients/python/README.md:201-202`
now reads `mxgw-py stream-alarms --max-messages 1 --json` and
`mxgw-py acknowledge-alarm --reference "\\Galaxy\Area001.Pump001.PumpFault" --json`
`--session-id` is dropped from both lines (the alarm feed is gateway-served,
session-less) and `--alarm-reference` is renamed to the real `--reference` flag.
Regression test
`tests/test_review_findings_022_to_026.py::test_readme_alarm_examples_parse_against_cli`
extracts every `mxgw-py …` line from the README, appends `--help` so only the
parser runs, and asserts that no example produces a `no such option` Click error.
Failed before the fix (the original `stream-alarms --session-id <id> …` line
emitted `Error: No such option: --session-id`), passes after.
### Client.Python-023
| Field | Value |
@@ -875,7 +888,7 @@ the smaller change.
| Severity | Medium |
| Category | Security |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:901-906` |
| Status | Open |
| Status | Resolved |
**Description:** Client.Python-013 (severity Medium, Security) was marked
**Resolved** on 2026-05-20 with the explicit claim that the silent
@@ -919,6 +932,31 @@ is marked Resolved with a 2026-05-20 commit reference, do **not** silently
re-resolve this finding — keep it Open with a fresh ID so the regression
audit trail is preserved.
**Resolution:** 2026-05-24 — Re-applied the Client.Python-013 fix on the
renamed CLI module. Dropped the `endpoint.startswith("localhost:") or
endpoint.startswith("127.0.0.1:")` auto-plaintext branch from
`_use_plaintext` in `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py`.
TLS is now the default and `--plaintext` is the only way to opt in to
plaintext; `--tls` is accepted as a redundant affirmation and the two
flags combined raise `click.UsageError`. Regression tests live in
`tests/test_review_findings_022_to_026.py`:
`test_use_plaintext_does_not_auto_downgrade_for_localhost_endpoint` and
`test_use_plaintext_does_not_auto_downgrade_for_loopback_ipv4_endpoint`
exercise the bare-endpoint path,
`test_use_plaintext_requires_explicit_plaintext_flag` and
`test_use_plaintext_tls_flag_explicitly_disables_plaintext` pin the explicit
opt-in / opt-out contract,
`test_use_plaintext_rejects_plaintext_and_tls_combined` asserts mutual
exclusivity, and
`test_cli_localhost_endpoint_with_no_flags_uses_tls_channel` is an
end-to-end CliRunner test that intercepts `GatewayClient.connect` and
asserts the resolved `ClientOptions.plaintext` is `False` for a
`localhost:5000` endpoint without `--plaintext`. All five tests failed
against the pre-fix source and pass against the fix. **Behaviour change for
callers:** scripts that previously relied on
`mxgw-py … --endpoint localhost:5000 …` selecting plaintext silently must
now add an explicit `--plaintext` flag (or set up TLS on the gateway).
### Client.Python-024
| Field | Value |
@@ -926,7 +964,7 @@ audit trail is preserved.
| Severity | Medium |
| Category | Code organization & conventions |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:13,48-119` |
| Status | Open |
| Status | Resolved |
**Description:** The new `batch` subcommand (commit `71d2c39`) implements
the cross-language batch protocol by importing `click.testing.CliRunner`
@@ -965,6 +1003,33 @@ batch loop can interleave inner-command output with the
a regression test that drives `batch` with `batch\n` on stdin and asserts
recursive invocation is either rejected or correctly bounded.
**Resolution:** 2026-05-24 — Removed the `from click.testing import CliRunner`
import and the `CliRunner()` instantiation from
`clients/python/src/zb_mom_ww_mxgateway_cli/commands.py`. The `batch`
command body now dispatches each stdin line through a new
`_dispatch_batch_line` helper that calls `main.main(args=…,
standalone_mode=False, prog_name="mxgw-py")` directly and captures the
subcommand's stdout via `contextlib.redirect_stdout(io.StringIO())`. Click
exit conditions (`click.exceptions.Exit`, `click.ClickException`,
`SystemExit`) are caught and rendered as
`{"error": …, "type": …}` JSON; arbitrary exceptions are caught with a
broad `except Exception` so the batch loop never dies. A nested `batch`
line is rejected outright with a `RecursiveBatchError` JSON record before
the dispatcher runs, eliminating the silent-recursive-spawn footgun the
original `CliRunner.invoke(main, ["batch"], …)` path enabled. Regression
tests:
`tests/test_review_findings_022_to_026.py::test_batch_command_does_not_use_clirunner_in_production`
asserts the production module no longer imports `from click.testing` or
calls `CliRunner(`; and
`test_batch_recursive_batch_line_is_bounded` drives a `batch\nversion --json\n`
stdin payload and asserts the recursive `batch` line emits an error JSON
record rather than silently exiting. The pre-existing batch tests
(`test_batch_runs_version_command_and_writes_eor`,
`test_batch_terminates_on_empty_line`,
`test_batch_continues_after_error_line`) still pass against the new
implementation, confirming the wire-level contract (one EOR per line,
clean JSON error blocks) is preserved.
### Client.Python-025
| Field | Value |
@@ -972,7 +1037,7 @@ recursive invocation is either rejected or correctly bounded.
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/python/tests/test_cli.py`, `clients/python/src/zb_mom_ww_mxgateway/{client.py,session.py}`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py` |
| Status | Open |
| Status | Resolved |
**Description:** Commits `6add4b4` and `828e3e6` added five new SDK
methods (`Session.read_bulk`, `Session.write_bulk`, `Session.write2_bulk`,
@@ -1020,6 +1085,32 @@ applied to the renamed bench). At minimum, add a request-shape test for
`write_secured_bulk` since the secured family is the highest-risk
parity surface.
**Resolution:** 2026-05-24 — Added behavioural test coverage for the five
new bulk SDK methods, `stream_alarms`, and the new CLI subcommand bodies
in `tests/test_review_findings_022_to_026.py`. Request-shape tests
(`test_session_read_bulk_sends_expected_request_shape`,
`test_session_write_bulk_sends_expected_request_shape`,
`test_session_write2_bulk_sends_expected_request_shape`,
`test_session_write_secured_bulk_sends_expected_request_shape`,
`test_session_write_secured2_bulk_sends_expected_request_shape`) drive
each `Session.*_bulk` method against a fake `Invoke` stub and assert
the captured `MxCommand`'s `kind`, sub-message, `server_handle`, and
per-entry fields (including `current_user_id` / `verifier_user_id`
on the secured family — the highest-risk parity surface the finding
calls out). `test_stream_alarms_yields_feed_messages_and_cancels_on_close`
covers the `GatewayClient.stream_alarms` happy path including the
`_canceling_alarm_feed_iterator` cancel-on-close contract and the
authorization metadata header. CLI happy-path tests
(`test_cli_read_bulk_happy_path`, `test_cli_write_bulk_happy_path`,
`test_cli_stream_alarms_happy_path`, `test_cli_acknowledge_alarm_happy_path`)
each drive their subcommand through `CliRunner` against a fake stub
injected via a monkeypatched `GatewayClient.connect` and assert the
emitted JSON shape and that the captured RPC request carries the
expected fields. The four CLI happy-path tests passed even before any
production fix (the implementations were correct; the finding is a
coverage gap), but they now exist as regression guards against future
drift. No source change — pure coverage finding.
### Client.Python-026
| Field | Value |
@@ -1027,7 +1118,7 @@ parity surface.
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:674-738` |
| Status | Open |
| Status | Resolved |
**Description:** Two minor quality issues in the new `_bench_read_bulk`
body (commit `6add4b4`):
@@ -1060,3 +1151,23 @@ module-level `logger = logging.getLogger(__name__)`. No behavioural
change in the happy path; failure path becomes diagnosable. No new test
required for the import hoist; the logger change is exercised by the
existing bench smoke test once `caplog` is added to the test signature.
**Resolution:** 2026-05-24 — Hoisted `import time` to the module-level
import block in `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py`
alongside the existing standard-library imports; the function-local
`import time` line at the top of `_bench_read_bulk` is gone. Added a
module-level `logger = logging.getLogger(__name__)` and rewrote the two
`finally` cleanup blocks to bind the swallowed exception and log it at
`WARNING` level — `unsubscribe_bulk` failures now emit
`"bench-read-bulk: unsubscribe_bulk cleanup failed: %s"` and the
`session.close()` failure path emits the equivalent — so a future
regression in the cleanup path is diagnosable at the next benchmark run
rather than silently corrupting subscription bookkeeping. Regression
tests in `tests/test_review_findings_022_to_026.py`:
`test_commands_module_imports_time_at_module_scope` uses
`inspect.getsource(_bench_read_bulk)` to assert no function-local
`import time` line, and asserts the module exposes `time` at module
scope; `test_commands_module_bench_read_bulk_does_not_use_bare_except_pass`
greps the function source for the `except Exception:\n pass` pattern
and rejects it. Both tests failed against the pre-fix source and pass
against the fix.
+25 -9
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `42b0037` |
| Status | Re-reviewed |
| Open findings | 8 |
| Open findings | 0 |
## Checklist coverage
@@ -481,7 +481,7 @@ The CLI integration in Client.Rust-014 works either way; this is solely about de
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/src/session.rs:369-391,403-420,427-444,452-469,476-493,631-696,706-724` |
| Status | Open |
| Status | Resolved |
**Description:** Commit `3251069` re-introduced the bulk read/write SDK methods (`read_bulk`, `write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`) on `Session`. Each method falls back to `Vec::new()` when an OK reply does not carry the expected typed payload arm:
@@ -496,6 +496,8 @@ and `bulk_write_results` does the same for the four write families. A caller of
**Recommendation:** Re-apply the Client.Rust-005/006 resolutions on top of the new methods: add `Error::MalformedReply { detail: String }` back to `error.rs`, change `register_server_handle` / `add_item_handle` / `add_item2_handle` to return `Result<i32, Error>` (yielding `MalformedReply` when the reply lacks an extractable handle), change `bulk_results` and `bulk_write_results` to return `Result<Vec<_>, Error>` (yielding `MalformedReply` on a mismatched / absent payload arm), and route the same fix through the new `read_bulk` inline branch. Re-add the six `…_returns_malformed_reply_…` tests from the Client.Rust-016 resolution to lock the contract in.
**Resolution:** 2026-05-24 — Re-added `Error::MalformedReply { detail: String }` to `clients/rust/src/error.rs` (alongside re-adding `Error::Unavailable` for Client.Rust-010's resolution). Changed `register_server_handle` / `add_item_handle` / `add_item2_handle` in `clients/rust/src/session.rs` to return `Result<i32, Error>` and yield `MalformedReply` when the reply lacks both a typed payload and an int32 `return_value`. Changed `bulk_results` and `bulk_write_results` to return `Result<Vec<_>, Error>` and yield `MalformedReply` on a mismatched or absent payload arm. Rewrote `read_bulk`'s inline branch the same way. Added six regression tests in `clients/rust/tests/client_behavior.rs``register_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item2_returns_malformed_reply_when_ok_reply_has_no_payload`, `subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm`, `read_bulk_returns_malformed_reply_on_mismatched_payload_arm`, `write_bulk_returns_malformed_reply_on_mismatched_payload_arm` — driven by a new `InvokeOverride` enum on `FakeState` that pins the fake gateway's `Invoke` handler to `OkWithoutPayload`, `OkWithMismatchedPayload`, or `Unavailable(...)` per test.
### Client.Rust-023
| Field | Value |
@@ -503,7 +505,7 @@ and `bulk_write_results` does the same for the four write families. A caller of
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:835,872,1476` |
| Status | Open |
| Status | Resolved |
**Description:** Three CLI subcommands added since `d692232` hard-code their `client_correlation_id`:
@@ -517,6 +519,8 @@ Every invocation of these subcommands on the same machine — and every iteratio
**Recommendation:** Replace the three string literals with calls to the existing `next_correlation_id` helper (already `pub` and intended for raw-RPC consumers): `zb_mom_ww_mxgateway_client::session::next_correlation_id("cli-stream-alarms")`, `next_correlation_id("cli-acknowledge-alarm")`, `next_correlation_id("cli-bench-read-bulk-close")`. While here, restore the same fix at lines 506 and 553 so the CLI surface as a whole shares the library's correlation-id discipline.
**Resolution:** 2026-05-24 — Restored `pub fn next_correlation_id(label: &str) -> String` in `clients/rust/src/session.rs` (the `d692232` rename had reverted Client.Rust-011/014's resolution along with the rest) and re-exported it at the crate root in `clients/rust/src/lib.rs` (`pub use session::{next_correlation_id, Session};`) so the in-tree `mxgw` CLI calls the short `zb_mom_ww_mxgateway_client::next_correlation_id(...)` path. Replaced all five hard-coded correlation-id literals in `clients/rust/crates/mxgw-cli/src/main.rs` (`rust-cli-ping`, `rust-cli-close-session`, `rust-cli-stream-alarms`, `rust-cli-acknowledge-alarm`, `rust-cli-bench-read-bulk-close`) with `next_correlation_id("cli-...")`. `Session::command_request` and `Session::close` now use the same helper. Added a regression test `cli_subcommands_propagate_unique_correlation_ids_from_next_correlation_id` in `clients/rust/tests/client_behavior.rs` that records the correlation id observed at the fake gateway and asserts both that successive `next_correlation_id` calls produce distinct values and that the label is propagated through `stream_alarms` / `acknowledge_alarm`.
### Client.Rust-024
| Field | Value |
@@ -524,7 +528,7 @@ Every invocation of these subcommands on the same machine — and every iteratio
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs:405-415`; `clients/rust/src/session.rs:369-493`; `clients/rust/src/client.rs:265-291`; `clients/rust/crates/mxgw-cli/src/main.rs:1310-1505` |
| Status | Open |
| Status | Resolved |
**Description:** The diff under review adds substantial SDK and CLI surface with no positive-path coverage:
@@ -534,6 +538,8 @@ Every invocation of these subcommands on the same machine — and every iteratio
**Recommendation:** Extend the fake-gateway `Invoke` dispatcher in `tests/client_behavior.rs` with the five new bulk reply arms, add round-trip tests for each (`write_bulk` / `write2_bulk` / `write_secured_bulk` / `write_secured2_bulk` / `read_bulk`), and re-add the six malformed-reply tests from the Client.Rust-016 resolution. Replace the trivial `stream_alarms` stub with one that emits a synthetic `ActiveAlarm``SnapshotComplete``Transition` sequence and assert the client surfaces them in order. For the bench, factor the percentile / accounting helpers out of `run_bench_read_bulk` into a small struct (matching the previous `BenchReadBulkStats`) and add unit tests asserting `latencyMs.{p50,p95,p99,max,mean}` are computed correctly from a hand-built sample.
**Resolution:** 2026-05-24 — Extended the fake gateway's `Invoke` dispatcher to handle `Register`, `AddItem2`, `ReadBulk`, `WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, and `WriteSecured2Bulk`, with shared `write_bulk_reply_for` / `bulk_write_result_ok` helpers. Added round-trip integration tests in `clients/rust/tests/client_behavior.rs` for all five bulk SDK methods: `read_bulk_round_trips_through_the_fake_gateway`, `write_bulk_round_trips_through_the_fake_gateway`, `write2_bulk_round_trips_through_the_fake_gateway`, `write_secured_bulk_round_trips_through_the_fake_gateway`, `write_secured2_bulk_round_trips_through_the_fake_gateway`. Replaced the trivial `stream_alarms` stub with a per-test script (`FakeState::stream_alarms_script`) and added `stream_alarms_emits_snapshot_then_complete_then_transition_in_order`, which feeds an `ActiveAlarm``SnapshotComplete``Transition` sequence and asserts the client surfaces all three payload arms in order. Added three CLI-side unit tests against the existing `percentile_summary` helper in `clients/rust/crates/mxgw-cli/src/main.rs`: `bench_percentile_summary_matches_hand_built_sample` (asserts p50/p95/p99/max/mean on the sample `[1,2,3,4,5]`), `bench_percentile_summary_handles_empty_sample`, and `bench_percentile_summary_handles_single_value_sample`. The malformed-reply suite from Client.Rust-022 plus the unary `Error::Unavailable` test together cover the remaining surface.
### Client.Rust-025
| Field | Value |
@@ -541,7 +547,7 @@ Every invocation of these subcommands on the same machine — and every iteratio
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:92-106,142-153,164-171` |
| Status | Open |
| Status | Resolved |
**Description:** CLAUDE.md mandates that "When public APIs, contracts, configuration, build steps, security behavior, event shapes, value conversion, status mapping, or lifecycle rules change, the affected docs ... must change in the same commit." The diff under review adds the following public surface, none of which is reflected in `RustClientDesign.md`:
@@ -553,6 +559,8 @@ Every invocation of these subcommands on the same machine — and every iteratio
**Recommendation:** Bring `RustClientDesign.md` back in sync with the actual implementation. Restore the `Session` API block to enumerate the five new bulk read/write methods alongside the existing six bulk-subscribe helpers (cross-reference Client.Rust-019's recommendation for the correct signatures — no trailing positional `user_id`/`timestamp`/`current_user_id`/`verifier_user_id`, those live on the per-entry structs). Add `stream_alarms` / `acknowledge_alarm` / `AlarmFeedStream` to a new "Alarms" section adjacent to the existing event-stream section. Expand the CLI command list to enumerate every subcommand the binary exposes today. Expand the `Error` enum sketch to match `clients/rust/src/error.rs`. Add a short "Windows build notes" subsection documenting the `.cargo/config.toml` stack workaround and why clap-derive's large `Command` enum needs it.
**Resolution:** 2026-05-24 — Brought `clients/rust/RustClientDesign.md` back in sync with the implementation. Removed the orphan `tracing` dependency line and added a new "Windows Build Notes" section explaining the `clients/rust/.cargo/config.toml` `/STACK:8388608` MSVC link-arg (why clap-derive's large `Command` enum needs it, that it writes into `IMAGE_OPTIONAL_HEADER.SizeOfStackReserve` for both debug and release builds, and that the MSVC-only target selector keeps mingw unaffected). Extended the `Session` API block to enumerate the five new bulk read/write helpers — `read_bulk<S: AsRef<str>>(&[S], u32)`, `write_bulk(Vec<WriteBulkEntry>)`, `write2_bulk(Vec<Write2BulkEntry>)`, `write_secured_bulk(Vec<WriteSecuredBulkEntry>)`, `write_secured2_bulk(Vec<WriteSecured2BulkEntry>)` — plus `add_item2`, `un_advise`, `remove_item`, and `write2`. Added a paragraph noting that per-entry credentials/timestamps live on the entry structs and that `read_bulk`'s borrow-and-`AsRef<str>` shape is what the cross-language bench-read-bulk hot loop relies on. Added a new "Alarms" section with `GatewayClient::stream_alarms` / `acknowledge_alarm` / the `AlarmFeedStream` type alias and the always-on (no-session) snapshot → complete → transition contract. Expanded the `Error` enum block to match `clients/rust/src/error.rs` (every public variant including `MalformedReply`, `Unavailable`, `InvalidEndpoint`, `InvalidArgument`). Expanded the CLI command list to enumerate every subcommand the binary exposes today, with notes on `batch`'s EOF-only termination and `bench-read-bulk`'s cross-language JSON shape.
### Client.Rust-026
| Field | Value |
@@ -560,7 +568,7 @@ Every invocation of these subcommands on the same machine — and every iteratio
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1402-1406,1419-1423` |
| Status | Open |
| Status | Resolved |
**Description:** `run_bench_read_bulk` clones the `tags: Vec<String>` on every iteration of both the warmup loop and the steady-state measurement loop:
@@ -586,6 +594,8 @@ Each clone allocates one fresh `Vec<String>` plus `bulk_size` heap-allocated `St
**Recommendation:** Change `Session::read_bulk` to take `tag_addresses: &[String]` or generic over `AsRef<str>` (as Client.Rust-019's recommendation noted is what `read_bulk` was at `a020350`) so the bench can pass `&tags` once and avoid the per-call clone. Alternatively keep the `Vec<String>` signature but borrow the underlying buffer for the hot loop — e.g. build the bench's payload once outside the steady-state window and re-use it. The current `read_bulk(..., Vec<String>, ...)` shape forces the clone at the call site.
**Resolution:** 2026-05-24 — Changed `Session::read_bulk` to `read_bulk<S: AsRef<str>>(&self, server_handle, tag_addresses: &[S], timeout_ms)` so the bench loop can borrow the tag list once instead of cloning it per iteration. `run_bench_read_bulk` now binds `let tags_ref: &[String] = &tags;` outside the warm-up / steady-state loops and passes `tags_ref` into both, eliminating the per-call `Vec<String>` + `String` clone-allocations that were previously charged into the per-call latency reported by the cross-language `latencyMs` JSON contract. The CLI `read-bulk` subcommand was updated to call `session.read_bulk(server_handle, &items, timeout_ms)`. Clippy's `clone_on_copy` / `redundant_clone` lints are clean at HEAD so no extra regression test is needed beyond `read_bulk_round_trips_through_the_fake_gateway` from Client.Rust-024.
### Client.Rust-027
| Field | Value |
@@ -593,7 +603,7 @@ Each clone allocates one fresh `Vec<String>` plus `bulk_size` heap-allocated `St
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/.cargo/config.toml:1-9` |
| Status | Open |
| Status | Resolved |
**Description:** The new build-config file added by `71d2c39` carries this leading comment:
@@ -616,6 +626,8 @@ Two issues with the documentation vs the configuration:
**Recommendation:** Either tighten the `[target.…]` selector to only the MSVC-linker tier (`cfg(all(windows, target_env = "msvc"))`) and re-word the comment to make the release-build behaviour explicit ("the stack reservation goes into the PE header for both debug and release builds; release is unaffected at runtime because the optimizer elides the enum from the stack frame"), or add the `+gnu` variant as a parallel block using `-Wl,--stack,8388608`. As a complementary fix, see Client.Rust-024's recommendation — boxing the largest clap-derive variants would let the stack workaround be retired entirely.
**Resolution:** 2026-05-24 — Tightened the target selector in `clients/rust/.cargo/config.toml` from `cfg(windows)` to `cfg(all(windows, target_env = "msvc"))` so `x86_64-pc-windows-gnu` (mingw) builds, which route link args through the GNU linker and reject `/STACK:`, are no longer affected. Rewrote the leading comment to make the release-build behaviour explicit: the `/STACK:` link-arg writes into `IMAGE_OPTIONAL_HEADER.SizeOfStackReserve` at link time and applies to both debug and release builds (so release artifacts ship with the same 8 MB stack reservation); release builds do not need it at runtime because the optimizer elides the enum from the stack frame, but the setting is kept on so both flavours produce binaries with identical stack metadata. The doc note in `RustClientDesign.md` now mirrors this.
### Client.Rust-028
| Field | Value |
@@ -623,7 +635,7 @@ Two issues with the documentation vs the configuration:
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1126-1166` |
| Status | Open |
| Status | Resolved |
**Description:** `run_batch` reads commands from stdin with the blocking `std::io::Stdin::lock().lines()` iterator while the surrounding function is `async fn` and the runtime is `#[tokio::main]` (multi-threaded by default). Each `for line in stdin.lock().lines()` iteration pins one of tokio's worker threads on a blocking syscall (`ReadFile` on Windows), then spawns the dispatch as a separate `tokio::task::spawn(dispatch(cli.command))` — which itself runs on another worker thread — and awaits its `JoinHandle`. The pattern works for a single-threaded harness driver but has two latent problems:
@@ -632,6 +644,8 @@ Two issues with the documentation vs the configuration:
**Recommendation:** Replace the empty-line break with an EOF-only terminator (`if line.trim().is_empty() { println!("{BATCH_EOR}"); stdout.lock().flush().ok(); continue; }`) so accidental blank lines log an empty-EOR-bracketed result instead of ending the session. Optionally wrap the stdin reader in `tokio::task::spawn_blocking` so the runtime can move it onto a dedicated blocking-pool thread; the current shape works for today's harness contract but is brittle if the dispatch future ever needs to share the runtime with stdin reads.
**Resolution:** 2026-05-24 — Removed the `if line.is_empty() { break; }` empty-line sentinel in `run_batch` so the only terminator is now stdin EOF (the implicit end of the `for line in stdin.lock().lines()` iterator). Blank or whitespace-only lines now fall into the existing `parts.is_empty()` branch which logs `__MXGW_BATCH_EOR__` and continues, matching the other four language CLIs and shielding the PowerShell e2e harness from accidental `Write-Output ""` calls between commands. Added a comment block on `run_batch` explaining the tokio-runtime / blocking-stdin trade-off: dispatch is already spawned on a fresh tokio task so the blocking iterator parks at most one worker thread on `ReadFile`, and that is acceptable because no other future on the main task needs to run while we wait for the next command. The CLI doc comment on the `Batch` clap variant was updated to mirror the new semantics. No unit test added — `run_batch` reads from real stdin which can't be driven from a `#[tokio::test]` without a subprocess harness, and the existing `parses_batch_command` / `batch_eor_marker_is_stable` tests already cover the parser and the EOR sentinel.
### Client.Rust-029
| Field | Value |
@@ -639,7 +653,7 @@ Two issues with the documentation vs the configuration:
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/options.rs:98,143`; `clients/rust/src/galaxy.rs:282`; `clients/rust/src/session.rs:664-671` |
| Status | Open |
| Status | Resolved |
**Description:** `cargo clippy --workspace --all-targets -- -D warnings` fails at HEAD `42b0037` with three errors that the prior d692232 reviewer noted as "out of scope for Client.Rust-021" but did not open as a tracked finding:
@@ -671,3 +685,5 @@ All three were resolved at `a020350` (Client.Rust-001, Client.Rust-002, Client.R
The third error (`BulkReplyKind` enum-variant-names) is also touched by the diff under review: commit `3251069` added a sibling enum `BulkWriteReplyKind` at `session.rs:699-704` whose variants (`Write`, `Write2`, `WriteSecured`, `WriteSecured2`) do not share a suffix and so do not trip the lint — but the pattern is now duplicated. A fix should rename both enums consistently or apply the same scoped `#[allow(clippy::enum_variant_names)]` reason-annotated allow to both.
**Recommendation:** Re-apply Client.Rust-001 (add doc comments on `with_max_grpc_message_bytes` / `max_grpc_message_bytes` in `options.rs`), Client.Rust-002 (drop the `Bulk` suffix from `BulkReplyKind`'s variants so they become `AddItem` / `AdviseItem` / …, or add a narrowly-scoped `#[allow(clippy::enum_variant_names)]` with a reason comment), and Client.Rust-012 (replace `last_deploy.lock().unwrap().clone()` with `*last_deploy.lock().unwrap()` in `galaxy.rs:282`). Verify with `cargo clippy --workspace --all-targets -- -D warnings`. Consider adding a pre-commit / CI gate so the next reviewer never has to discover the regression by running clippy.
**Resolution:** 2026-05-24 — Re-applied all three resolutions. `clients/rust/src/options.rs` now has `///` doc comments on `with_max_grpc_message_bytes` and `max_grpc_message_bytes`. `clients/rust/src/galaxy.rs:282` uses `*self.state.last_deploy.lock().unwrap()` instead of `.clone()`. `clients/rust/src/session.rs`'s `BulkReplyKind` variants are renamed to `AddItem` / `AdviseItem` / `RemoveItem` / `UnAdviseItem` / `Subscribe` / `Unsubscribe` (no shared `Bulk` suffix), with the call sites in `add_item_bulk` / `advise_item_bulk` / `remove_item_bulk` / `un_advise_item_bulk` / `subscribe_bulk` / `unsubscribe_bulk` updated accordingly. The sibling `BulkWriteReplyKind` already had non-suffix-sharing variants (`Write` / `Write2` / `WriteSecured` / `WriteSecured2`) and required no rename. `cargo clippy --workspace --all-targets -- -D warnings` is clean at HEAD.
+35 -35
View File
@@ -10,14 +10,14 @@ Each module's `findings.md` is the source of truth; this file is generated from
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 4 | 21 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 6 | 27 |
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 0 | 21 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 0 | 27 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 5 | 36 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 5 | 26 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 8 | 29 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 0 | 26 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 0 | 29 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 0 | 17 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 1 | 25 |
| [Server](Server/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 7 | 50 |
| [Server](Server/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 0 | 50 |
| [Tests](Tests/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 5 | 31 |
| [Worker](Worker/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 0 | 25 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-05-24 | `42b0037` | Re-reviewed | 0 | 30 |
@@ -29,42 +29,12 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Client.Java-032 | High | Documentation & comments | `clients/java/README.md:182-183` | Commit `8738735` ("clients: document StreamAlarms + AcknowledgeAlarm in each README") added two new gradle invocations to the CLI Usage block: ``` gradle :zb-mom-ww-mxgateway-cli:run --args="stream-alarms --endpoint localhost:5000 --api-ke… |
| Client.Python-022 | High | Documentation & comments | `clients/python/README.md:201-202`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:389-420` | The README CLI examples added by commit `8738735` for the new alarm subcommands cite flags the CLI does not accept: ``` mxgw-py stream-alarms --session-id <id> --max-messages 1 --json mxgw-py acknowledge-alarm --session-id <id> --alarm-ref… |
| Client.Rust-029 | High | mxaccessgw conventions | `clients/rust/src/options.rs:98,143`; `clients/rust/src/galaxy.rs:282`; `clients/rust/src/session.rs:664-671` | `cargo clippy --workspace --all-targets -- -D warnings` fails at HEAD `42b0037` with three errors that the prior d692232 reviewer noted as "out of scope for Client.Rust-021" but did not open as a tracked finding: ``` error: missing documen… |
| Client.Dotnet-018 | Medium | Documentation & comments | `clients/dotnet/README.md:137-138` | The README example block for the two new alarm CLI subcommands shipped in commit `11cc671` shows: ``` mxgw-dotnet stream-alarms --session-id <id> --max-messages 1 --json mxgw-dotnet acknowledge-alarm --session-id <id> --alarm-reference "\\… |
| Client.Go-022 | Medium | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:398-412,417-519` | Commit `8aaab82` ("Go client: port bulk read/write SDK methods + CLI subcommands") re-introduces every symptom that Client.Go-015 documented and was marked Resolved against an earlier commit: - `runWriteBulkVariant(ctx, args, stdout, stder… |
| Client.Go-023 | Medium | Concurrency & thread safety | `clients/go/cmd/mxgw-go/main.go:604-606,616-632` | `runBenchReadBulk`'s warm-up and steady-state loops are wall-clock-only again: ```go warmupDeadline := time.Now().Add(time.Duration(*warmupSeconds) * time.Second) timeout := time.Duration(*timeoutMs) * time.Millisecond for time.Now().Befor… |
| Client.Java-033 | Medium | Correctness & logic bugs | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1078-1098` | `StreamAlarmsCommand.call()` allocates a bounded `ArrayBlockingQueue<Object>(1024)` and the gRPC observer publishes each `AlarmFeedMessage` via `queue.offer(value)`: ``` BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024); … @Over… |
| Client.Java-034 | Medium | Correctness & logic bugs | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:182-198` | `BatchCommand.call()` reads one CLI invocation per stdin line and tokenises with: ``` String[] args = line.trim().split("\\s+"); … int exitCode = cmd.execute(args); ``` `split("\\s+")` does no shell-quoting parsing — it just splits on whit… |
| Client.Python-023 | Medium | Security | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:901-906` | Client.Python-013 (severity Medium, Security) was marked |
| Client.Python-024 | Medium | Code organization & conventions | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:13,48-119` | The new `batch` subcommand (commit `71d2c39`) implements the cross-language batch protocol by importing `click.testing.CliRunner` into production code and calling `runner.invoke(main, args, catch_exceptions=True)` in a `for raw_line in sys… |
| Client.Rust-022 | Medium | Correctness & logic bugs | `clients/rust/src/session.rs:369-391,403-420,427-444,452-469,476-493,631-696,706-724` | Commit `3251069` re-introduced the bulk read/write SDK methods (`read_bulk`, `write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`) on `Session`. Each method falls back to `Vec::new()` when an OK reply does not carry the… |
| Client.Rust-024 | Medium | Testing coverage | `clients/rust/tests/client_behavior.rs:405-415`; `clients/rust/src/session.rs:369-493`; `clients/rust/src/client.rs:265-291`; `clients/rust/crates/mxgw-cli/src/main.rs:1310-1505` | The diff under review adds substantial SDK and CLI surface with no positive-path coverage: 1. **`GatewayClient::stream_alarms`** (client.rs:280-291) has no test. The fake gateway's `stream_alarms` impl in `tests/client_behavior.rs:408-415`… |
| Server-044 | Medium | Correctness & logic bugs | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:216-254` | `KillWorkerAsync` is the mirror of `CloseSessionCoreAsync` for the new admin-only Kill flow, but its catch path leaks the `mxgateway.sessions.open` gauge — the exact bug that Server-006 closed for `OpenSessionAsync`. The happy path increme… |
| Tests-027 | Medium | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:199-240`, `src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs:8,73,246-251` | The review brief explicitly flagged `MxAccessGatewayServiceTests.StreamEvents_WhenEventIsWritten_RecordsSendDuration` as a known flake that "passed solo on rerun". The root cause is the `MeterListener` subscribes by `instrument.Meter.Name… |
| Client.Dotnet-019 | Low | Correctness & logic bugs | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:745` | Client.Dotnet-005 / 010 documented (and recorded as resolved) the silent register-handle fallback pattern `reply.Register?.ServerHandle ?? reply.ReturnValue.Int32Value`, where a successful protocol+MX-status reply missing its typed `regist… |
| Client.Dotnet-020 | Low | Error handling & resilience | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:792-810`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:774-780` | `BenchReadBulkAsync`'s steady-state `while (DateTime.UtcNow < steadyDeadline)` loop wraps each `client.InvokeAsync(...)` in a bare `catch`: ```csharp try { reply = await client.InvokeAsync( CreateCommandRequest(sessionId, readBulkMxCommand… |
| Client.Dotnet-021 | Low | Correctness & logic bugs | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:487`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:715` | Both new bulk-read CLI handlers cast a signed `--timeout-ms` argument to `uint` without bounds checking: ```csharp // ReadBulkAsync (line 487) TimeoutMs = (uint)arguments.GetInt32("timeout-ms", 0), // BenchReadBulkAsync (line 715) uint tim… |
| Client.Go-024 | Low | Testing coverage | `clients/go/mxgateway/session.go:395-525`, `clients/go/mxgateway/alarms.go:65-76` | The five new bulk SDK methods on `Session` and the new `Client.StreamAlarms` method have **no unit tests** in `clients/go/mxgateway/`: - `Session.WriteBulk` (`session.go:395`) - `Session.Write2Bulk` (`session.go:418`) - `Session.WriteSecur… |
| Client.Go-025 | Low | Correctness & logic bugs | `clients/go/mxgateway/session.go:395-485,495-525` | The five new bulk methods (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) each guard with `if entries == nil { return error }` and an upper-bound `ensureBulkSize` check, but accept a non-nil empty slice (e.… |
| Client.Go-026 | Low | Error handling & resilience | `clients/go/cmd/mxgw-go/main.go:1196-1222` | `runBatch` reads command lines with a default `bufio.Scanner`: ```go scanner := bufio.NewScanner(in) for scanner.Scan() { ... } return scanner.Err() ``` The default `bufio.Scanner` token size is 64 KiB (`bufio.MaxScanTokenSize`). One long… |
| Client.Go-027 | Low | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:1195-1206` | `runBatch`'s doc-comment says the loop "never terminates on command error; only stdin EOF (or an empty line) ends the session", and the implementation matches: ```go for scanner.Scan() { line := scanner.Text() if line == "" { break } ... }… |
| Client.Java-035 | Low | Testing coverage | `clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/MxGatewayClientSessionTests.java` | Commit `8a0c59d` added `MxGatewayClient.streamAlarms(StreamAlarmsRequest, StreamObserver<AlarmFeedMessage>)` and a new public `MxGatewayAlarmFeedSubscription` class. No library-side test exercises either: a grep for `streamAlarms` across `… |
| Client.Java-036 | Low | Code organization & conventions | `clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayAlarmFeedSubscription.java`, `MxGatewayEventSubscription.java`, `MxGatewayActiveAlarmsSubscription.java`, `DeployEventSubscription.java` | `MxGatewayAlarmFeedSubscription` is a structural near-copy of `MxGatewayEventSubscription` — same `AtomicReference<ClientCallStreamObserver<…>>` + `AtomicBoolean cancelled` field shape, the same `wrap(observer)` returning a `ClientResponse… |
| Client.Python-025 | Low | Testing coverage | `clients/python/tests/test_cli.py`, `clients/python/src/zb_mom_ww_mxgateway/{client.py,session.py}`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py` | Commits `6add4b4` and `828e3e6` added five new SDK methods (`Session.read_bulk`, `Session.write_bulk`, `Session.write2_bulk`, `Session.write_secured_bulk`, `Session.write_secured2_bulk`), `GatewayClient.stream_alarms`, the helper `_canceli… |
| Client.Python-026 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:674-738` | Two minor quality issues in the new `_bench_read_bulk` body (commit `6add4b4`): 1. `import time` is done inside the function body (line 676) rather than at module top. `PythonStyleGuide.md` does not state this explicitly, but every other h… |
| Client.Rust-023 | Low | mxaccessgw conventions | `clients/rust/crates/mxgw-cli/src/main.rs:835,872,1476` | Three CLI subcommands added since `d692232` hard-code their `client_correlation_id`: ```rust client_correlation_id: "rust-cli-stream-alarms".to_owned(), // line 835 client_correlation_id: "rust-cli-acknowledge-alarm".to_owned(), // line 87… |
| Client.Rust-025 | Low | Design-document adherence | `clients/rust/RustClientDesign.md:92-106,142-153,164-171` | CLAUDE.md mandates that "When public APIs, contracts, configuration, build steps, security behavior, event shapes, value conversion, status mapping, or lifecycle rules change, the affected docs ... must change in the same commit." The diff… |
| Client.Rust-026 | Low | Performance & resource management | `clients/rust/crates/mxgw-cli/src/main.rs:1402-1406,1419-1423` | `run_bench_read_bulk` clones the `tags: Vec<String>` on every iteration of both the warmup loop and the steady-state measurement loop: ```rust while Instant::now() < warmup_deadline { let _ = session .read_bulk(server_handle, tags.clone(),… |
| Client.Rust-027 | Low | Documentation & comments | `clients/rust/.cargo/config.toml:1-9` | The new build-config file added by `71d2c39` carries this leading comment: ``` [target.'cfg(windows)'] # Bump the default 1 MB Windows stack to 8 MB. clap-derive builds a large # Command enum in this CLI (one variant per subcommand, each c… |
| Client.Rust-028 | Low | mxaccessgw conventions | `clients/rust/crates/mxgw-cli/src/main.rs:1126-1166` | `run_batch` reads commands from stdin with the blocking `std::io::Stdin::lock().lines()` iterator while the surrounding function is `async fn` and the runtime is `#[tokio::main]` (multi-threaded by default). Each `for line in stdin.lock().… |
| IntegrationTests-025 | Low | Correctness & logic bugs | `src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironmentTests.cs:57-84` (`ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers`) | The new regression test for IntegrationTests-022 builds an "isolated" start directory under `Path.GetTempPath()` (e.g. `C:\Users\<user>\AppData\Local\Temp\<random>\nested` on Windows) and calls `ResolveRepositoryRoot(isolatedStart)`, asser… |
| Server-045 | Low | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:225,242-245`, `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:837-841` | `KillWorkerAsync` reads `session.State` once into a local `bool wasClosed` (line 225) before calling `session.KillWorker(reason)`. The read is unsynchronized — `State` is a getter that takes `_syncRoot` internally so the read itself is saf… |
| Server-046 | Low | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:286-307` | `ShutdownAsync` was updated to fall back to `KillWorker` when `CloseSessionCoreAsync` throws (lines 294-305) — a useful resilience improvement on its own. But the fallback's bookkeeping is wrong: `session.KillWorker(GatewayShutdownReason)`… |
| Server-047 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:324-334`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/SessionsPage.razor:171-195`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/SessionDetailsPage.razor:231-255` | The shared `ConfirmDialog.razor` (added in `0e56b5b` / `24cc5fd`) is wired by three pages, but the pages handle `PendingAction` cleanup inconsistently: - `ApiKeysPage.ConfirmPendingAsync` captures the action, sets `PendingAction = null` sy… |
| Server-048 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:463-498` | The two new `KillWorkerAsync_*` tests cover the happy path (`KillWorkerAsync_KillsWorkerAndRemovesSession`) and the missing-session error (`KillWorkerAsync_WhenSessionMissing_ThrowsSessionNotFound`). Three behaviorally distinct cases are m… |
| Server-049 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/IDashboardSessionAdminService.cs:5-18`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:8-25` | `IDashboardSessionAdminService` declares three members — `CanManage`, `CloseSessionAsync`, `KillWorkerAsync` — none of which carry XML documentation. `DashboardSessionAdminService.CanManage` and the two operation methods are also undocumen… |
| Server-050 | Low | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:42-75,92-125` | `CloseSessionAsync` and `KillWorkerAsync` catch only `SessionManagerException` (the `SessionNotFound` filter, then a general `SessionManagerException` catch). Anything else propagates raw to Blazor's error boundary. The propagation paths e… |
| Tests-028 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:466-496,802-807`, `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:216-253` | The new `KillWorkerAsync_KillsWorkerAndRemovesSession` (line 466) and `KillWorkerAsync_WhenSessionMissing_ThrowsSessionNotFound` (line 486) pin the new kill-path entry, but they do not pin the `reason` argument propagating through the chai… |
| Tests-029 | Low | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/DashboardSessionAdminServiceTests.cs:61-106,139-222`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:77-125` | The new `DashboardSessionAdminServiceTests` covers the happy path and the viewer-denial path for both `CloseSessionAsync` and `KillWorkerAsync`, plus `CloseSessionAsync_WhenSessionMissing_ReportsFriendlyError` for the close-side `SessionNo… |
| Tests-030 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/DashboardApiKeyManagementServiceTests.cs:115-163`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:146-177` | The three new `DeleteAsync_*` fixtures cover unauthorised user, success path with audit, and store-refuses-with-friendly-error. They do not exercise two production behaviours: (1) `DeleteAsync_WhenStoreRefuses_ReportsFriendlyError` (line 1… |
@@ -80,11 +50,13 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Go-001 | High | Resolved | Correctness & logic bugs | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` |
| Client.Java-013 | High | Resolved | Testing coverage | `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java:212-304`, `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1214-1244` |
| Client.Python-018 | High | Resolved | Code organization & conventions | `clients/python/pyproject.toml:11` |
| Client.Python-022 | High | Resolved | Documentation & comments | `clients/python/README.md:201-202`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:389-420` |
| Client.Rust-001 | High | Resolved | mxaccessgw conventions | `clients/rust/src/options.rs:98,143` |
| Client.Rust-002 | High | Resolved | mxaccessgw conventions | `clients/rust/src/session.rs:522` |
| Client.Rust-003 | High | Resolved | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Client.Rust-012 | High | Resolved | mxaccessgw conventions | `clients/rust/src/galaxy.rs:282` |
| Client.Rust-013 | High | Resolved | mxaccessgw conventions | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:414-424` (origin); `clients/rust/src/generated.rs:11-31` (suppression site) |
| Client.Rust-029 | High | Resolved | mxaccessgw conventions | `clients/rust/src/options.rs:98,143`; `clients/rust/src/galaxy.rs:282`; `clients/rust/src/session.rs:664-671` |
| IntegrationTests-001 | High | Resolved | Design-document adherence | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` |
| IntegrationTests-002 | High | Resolved | Design-document adherence | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` |
| Server-003 | High | Resolved | Security | `src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59`, `src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258` |
@@ -99,8 +71,11 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Dotnet-001 | Medium | Resolved | Error handling & resilience | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` |
| Client.Dotnet-002 | Medium | Resolved | Testing coverage | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` |
| Client.Dotnet-003 | Medium | Resolved | Concurrency & thread safety | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Client.Dotnet-018 | Medium | Resolved | Documentation & comments | `clients/dotnet/README.md:137-138` |
| Client.Go-002 | Medium | Resolved | Error handling & resilience | `clients/go/mxgateway/session.go:440-516` |
| Client.Go-003 | Medium | Resolved | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Client.Go-022 | Medium | Resolved | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:398-412,417-519` |
| Client.Go-023 | Medium | Resolved | Concurrency & thread safety | `clients/go/cmd/mxgw-go/main.go:604-606,616-632` |
| Client.Java-001 | Medium | Resolved | Security | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Client.Java-002 | Medium | Resolved | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Client.Java-003 | Medium | Resolved | mxaccessgw conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
@@ -115,11 +90,15 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Python-005 | Medium | Resolved | Performance & resource management | `clients/python/src/mxgateway/galaxy.py:117-140` |
| Client.Python-009 | Medium | Resolved | Testing coverage | `clients/python/tests/` |
| Client.Python-013 | Medium | Resolved | Security | `clients/python/src/mxgateway_cli/commands.py:757-762` |
| Client.Python-023 | Medium | Resolved | Security | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:901-906` |
| Client.Python-024 | Medium | Resolved | Code organization & conventions | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:13,48-119` |
| Client.Rust-005 | Medium | Resolved | Correctness & logic bugs | `clients/rust/src/session.rs:489-520` |
| Client.Rust-006 | Medium | Resolved | Error handling & resilience | `clients/rust/src/session.rs:531-555` |
| Client.Rust-015 | Medium | Resolved | Error handling & resilience | `clients/rust/crates/mxgw-cli/src/main.rs:1053-1070` |
| Client.Rust-016 | Medium | Resolved | Testing coverage | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/session.rs:489-519,654-768` |
| Client.Rust-018 | Medium | Resolved | Error handling & resilience | `clients/rust/crates/mxgw-cli/src/main.rs:1098-1170`; `scripts/bench-read-bulk.ps1:347-365`; siblings: `clients/go/cmd/mxgw-go/main.go:600-648`, `clients/python/src/mxgateway_cli/commands.py:614-662`, `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:685-770`, `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-940` |
| Client.Rust-022 | Medium | Resolved | Correctness & logic bugs | `clients/rust/src/session.rs:369-391,403-420,427-444,452-469,476-493,631-696,706-724` |
| Client.Rust-024 | Medium | Resolved | Testing coverage | `clients/rust/tests/client_behavior.rs:405-415`; `clients/rust/src/session.rs:369-493`; `clients/rust/src/client.rs:265-291`; `clients/rust/crates/mxgw-cli/src/main.rs:1310-1505` |
| Contracts-002 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:384-385`, `:95` |
| Contracts-009 | Medium | Resolved | Design-document adherence | `docs/Contracts.md:13-24` |
| IntegrationTests-003 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
@@ -142,6 +121,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-032 | Medium | Resolved | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClient.cs:510-569` (gateway-side `_events` channel); `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClientOptions.cs:45-53` (`EventChannelFullModeTimeout`) |
| Server-033 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:265-323` (`TryRestoreFromDiskAsync`), `:84-99` (`_firstLoad` / `WaitForFirstLoadAsync`); `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:141-163` (`WaitForCacheBootstrap`) |
| Server-038 | Medium | Resolved | Security | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/EventsHub.cs:23-44` |
| Server-044 | Medium | Resolved | Correctness & logic bugs | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:216-254` |
| Tests-003 | Medium | Resolved | Performance & resource management | `src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176`, `src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258` |
| Tests-004 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs` |
| Tests-005 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` |
@@ -180,6 +160,9 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Dotnet-015 | Low | Resolved | Correctness & logic bugs | `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:221-236`, `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:596-1065` |
| Client.Dotnet-016 | Low | Resolved | Concurrency & thread safety | `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:922-976` |
| Client.Dotnet-017 | Low | Resolved | Error handling & resilience | `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:1190-1262` |
| Client.Dotnet-019 | Low | Resolved | Correctness & logic bugs | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:745` |
| Client.Dotnet-020 | Low | Resolved | Error handling & resilience | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:792-810`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:774-780` |
| Client.Dotnet-021 | Low | Resolved | Correctness & logic bugs | `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:487`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/MxGatewayClientCli.cs:715` |
| Client.Go-004 | Low | Resolved | mxaccessgw conventions | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` |
| Client.Go-005 | Low | Resolved | Design-document adherence | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` |
| Client.Go-006 | Low | Resolved | Error handling & resilience | `clients/go/mxgateway/errors.go:9-130` |
@@ -198,6 +181,10 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Go-019 | Low | Resolved | Documentation & comments | `clients/go/cmd/mxgw-go/main.go:710-716`, `clients/go/cmd/mxgw-go/main.go:1204,1213` |
| Client.Go-020 | Low | Resolved | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:753-802`, `clients/go/cmd/mxgw-go/main.go:1199-1275` |
| Client.Go-021 | Low | Resolved | Testing coverage | `clients/go/cmd/mxgw-go/main_test.go`, `clients/go/cmd/mxgw-go/main.go:363-520,522-655` |
| Client.Go-024 | Low | Resolved | Testing coverage | `clients/go/mxgateway/session.go:395-525`, `clients/go/mxgateway/alarms.go:65-76` |
| Client.Go-025 | Low | Resolved | Correctness & logic bugs | `clients/go/mxgateway/session.go:395-485,495-525` |
| Client.Go-026 | Low | Resolved | Error handling & resilience | `clients/go/cmd/mxgw-go/main.go:1196-1222` |
| Client.Go-027 | Low | Resolved | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:1195-1206` |
| Client.Java-006 | Low | Resolved | Performance & resource management | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` |
| Client.Java-007 | Low | Resolved | Testing coverage | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` |
| Client.Java-008 | Low | Resolved | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` |
@@ -234,6 +221,8 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Python-019 | Low | Resolved | Code organization & conventions | `clients/python/pyproject.toml:60-61`, `clients/python/src/mxgateway_cli/` |
| Client.Python-020 | Low | Resolved | Testing coverage | `clients/python/tests/`, `scripts/` |
| Client.Python-021 | Low | Resolved | Documentation & comments | `clients/python/src/mxgateway_cli/commands.py`, `clients/python/README.md:235-258` |
| Client.Python-025 | Low | Resolved | Testing coverage | `clients/python/tests/test_cli.py`, `clients/python/src/zb_mom_ww_mxgateway/{client.py,session.py}`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py` |
| Client.Python-026 | Low | Resolved | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:674-738` |
| Client.Rust-004 | Low | Resolved | Documentation & comments | `clients/rust/src/version.rs:7` |
| Client.Rust-007 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:14-55` |
| Client.Rust-008 | Low | Resolved | Performance & resource management | `clients/rust/src/value.rs:161-261` |
@@ -245,6 +234,11 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Rust-019 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:96-100` |
| Client.Rust-020 | Low | Resolved | Documentation & comments | `clients/rust/src/session.rs:31-46`; `clients/rust/src/lib.rs:14-39` |
| Client.Rust-021 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:14-33` |
| Client.Rust-023 | Low | Resolved | mxaccessgw conventions | `clients/rust/crates/mxgw-cli/src/main.rs:835,872,1476` |
| Client.Rust-025 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:92-106,142-153,164-171` |
| Client.Rust-026 | Low | Resolved | Performance & resource management | `clients/rust/crates/mxgw-cli/src/main.rs:1402-1406,1419-1423` |
| Client.Rust-027 | Low | Resolved | Documentation & comments | `clients/rust/.cargo/config.toml:1-9` |
| Client.Rust-028 | Low | Resolved | mxaccessgw conventions | `clients/rust/crates/mxgw-cli/src/main.rs:1126-1166` |
| Contracts-001 | Low | Resolved | Design-document adherence | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) |
| Contracts-003 | Low | Won't Fix | Code organization & conventions | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` |
| Contracts-004 | Low | Resolved | Documentation & comments | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` |
@@ -302,6 +296,12 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-041 | Low | Resolved | Design-document adherence | `src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-126`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/IDashboardEventBroadcaster.cs:6-10` |
| Server-042 | Low | Resolved | Performance & resource management | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/DashboardSnapshotPublisher.cs:18-41` |
| Server-043 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs:1`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardServiceCollectionExtensions.cs:24` |
| Server-045 | Low | Resolved | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:225,242-245`, `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:837-841` |
| Server-046 | Low | Resolved | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:286-307` |
| Server-047 | Low | Resolved | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:324-334`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/SessionsPage.razor:171-195`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/SessionDetailsPage.razor:231-255` |
| Server-048 | Low | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:463-498` |
| Server-049 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/IDashboardSessionAdminService.cs:5-18`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:8-25` |
| Server-050 | Low | Resolved | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:42-75,92-125` |
| Tests-007 | Low | Resolved | Code organization & conventions | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:682`, `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:324`, `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:460`, `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs:233` |
| Tests-008 | Low | Resolved | mxaccessgw conventions | `src/MxGateway.Tests/Gateway/Sessions/WorkerAlarmRpcDispatcherTests.cs:1-9`, `src/MxGateway.Tests/Gateway/Sessions/NotWiredAlarmRpcDispatcherTests.cs:1-3`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerAlarmAutoSubscribeTests.cs:1` |
| Tests-009 | Low | Resolved | Documentation & comments | `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:36-37,99,365` |
+22 -8
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `42b0037` |
| Status | Re-reviewed |
| Open findings | 7 |
| Open findings | 0 |
## Checklist coverage
@@ -816,7 +816,7 @@ Add a regression test that advises N items without an active `StreamEvents` cons
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:216-254` |
| Status | Open |
| Status | Resolved |
**Description:** `KillWorkerAsync` is the mirror of `CloseSessionCoreAsync` for the new admin-only Kill flow, but its catch path leaks the `mxgateway.sessions.open` gauge — the exact bug that Server-006 closed for `OpenSessionAsync`. The happy path increments `_metrics.SessionClosed()` once after `session.KillWorker(reason)` returns (line 244), which decrements `_openSessions`. The catch path, however, records `_metrics.Fault(...)`, calls `session.MarkFaulted(...)`, and then awaits `RemoveSessionAsync(session)` — but never calls `_metrics.SessionClosed()` (nor `SessionRemoved()`), so a kill that throws from `session.KillWorker` leaves the open-session gauge permanently incremented. `RemoveSessionAsync` only calls `_metrics.RemoveSessionEvents(...)` and `ReleaseSessionSlot()`; neither touches `_openSessions`. Server-006's fix pattern (track whether the open-counter was recorded, and decrement on the failing path) was applied to `OpenSessionAsync` but not propagated to this new write path.
@@ -824,6 +824,8 @@ In practice the trigger is narrow — `GatewaySession.KillWorker` calls `_worker
**Recommendation:** Mirror Server-006's fix: track whether the session was counted as opened (it always is in `KillWorkerAsync``GetRequiredSession` only succeeds for sessions in the registry, all of which had `SessionOpened()` called), and decrement on the failing path. Concretely, add `_metrics.SessionClosed()` (or `_metrics.SessionRemoved()` if the kill is being treated as an unclean removal) inside the catch block before `RemoveSessionAsync(session)`. The cleanest form is to record `SessionClosed()` once at the top of the method (under a flag), then only re-record if the happy path actually transitions; or to add `_metrics.SessionClosed()` in the catch right after `MarkFaulted`. Add a `SessionManagerTests.KillWorkerAsync_WhenSessionKillThrows_DecrementsOpenSessionGauge` regression test that uses a `FakeWorkerClient.KillThrows = true` to exercise the catch.
**Resolution:** 2026-05-24 — Confirmed against source: `KillWorkerAsync`'s catch block called `MarkFaulted`, `Fault`, and `RemoveSessionAsync` but never decremented the open-session gauge, mirroring exactly the Server-006 leak on the open path. The catch path now calls `_metrics.SessionRemoved()` after `MarkFaulted`, so the gauge is restored when `session.KillWorker` (via the new `KillWorkerWithCloseGateAsync` helper) throws. Combined with the Server-045 fix (the kill path now routes through a new `GatewaySession.KillWorkerWithCloseGateAsync` that takes the per-session `_closeLock`), every session reaching `KillWorkerAsync` had `SessionOpened()` recorded and the catch correctly decrements it. Regression test in `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs`: `KillWorkerAsync_WhenSessionKillThrows_DecrementsOpenSessionGauge` (uses a new `FakeWorkerClient.KillException` flag to force `_workerClient.Kill` to throw and asserts the open-session gauge returns to 0 after the kill faults). Confirmed to fail before the fix and pass after.
### Server-045
| Field | Value |
@@ -831,12 +833,14 @@ In practice the trigger is narrow — `GatewaySession.KillWorker` calls `_worker
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:225,242-245`, `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:837-841` |
| Status | Open |
| Status | Resolved |
**Description:** `KillWorkerAsync` reads `session.State` once into a local `bool wasClosed` (line 225) before calling `session.KillWorker(reason)`. The read is unsynchronized — `State` is a getter that takes `_syncRoot` internally so the read itself is safe, but there is no lock spanning "read state, call KillWorker, conditionally record metric." Two concurrent `KillWorkerAsync` calls on the same session (e.g. one operator clicking Kill on the Sessions page and another clicking Kill on the Session Details page within the same render tick) can both observe `wasClosed = false`, then both call `session.KillWorker(...)` (the second is effectively a no-op because `TransitionTo` refuses to overwrite `Closed`), and both call `_metrics.SessionClosed()` at line 244. The `_openSessions` gauge is bounded at 0 by `GatewayMetrics.SessionClosed`'s `if (_openSessions > 0)` guard, but the `_sessionsClosed` counter (and the `mxgateway.sessions.closed` counter exported by the meter) is double-incremented; `_metrics.Fault` is not used here, so the only mitigation is the SessionsRegistry race — the second call's `GetRequiredSession` could miss if the first already removed the session via `RemoveSessionAsync`, but only if the second arrives after the first's removal completes. The window is small but exists, and the same race exists for "Kill from one tab while the lease-expired sweep is closing the session." `CloseSessionCoreAsync` has the same shape, so this isn't a regression specifically from the kill change — but the new path widens the surface where the issue can fire.
**Recommendation:** Either (a) gate `KillWorkerAsync` on a per-session lock — extending the `_closeLock` pattern that `GatewaySession.CloseAsync` already uses, or introducing a new `_killLock` and accepting that close + kill don't serialize against each other — or (b) accept the metric double-count as harmless and document it on `KillWorkerAsync`'s XML doc. Option (a) is the more defensible long-term fix; option (b) is acceptable for v1 if the metric is purely informational. Adding a test that issues concurrent kills against the same session id and asserts `_sessionsClosed == 1` would pin the chosen behavior either way.
**Resolution:** 2026-05-24 — Took recommended option (a). Added `GatewaySession.KillWorkerWithCloseGateAsync(reason, ct)` that acquires the per-session `_closeLock`, reads `_state` under `_syncRoot`, calls `_workerClient.Kill(reason)`, then `TransitionTo(Closed)`, and returns the wasClosed observation. `SessionManager.KillWorkerAsync` now invokes that helper instead of reading `State` and calling `KillWorker` separately. Concurrent kill (and concurrent close+kill) callers now serialize on `_closeLock`, so the first caller observes `wasClosed=false` and the second observes `wasClosed=true`, eliminating the double-increment of `mxgateway.sessions.closed`. Regression test in `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs`: `KillWorkerAsync_ConcurrentCallsOnSameSession_CountClosedExactlyOnce` (issues two `KillWorkerAsync` calls on the same session id concurrently, accepts `SessionNotFound` on whichever loses the race after `RemoveSessionAsync`, and asserts `SessionsClosed == 1` and `OpenSessions == 0`).
### Server-046
| Field | Value |
@@ -844,12 +848,14 @@ In practice the trigger is narrow — `GatewaySession.KillWorker` calls `_worker
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:286-307` |
| Status | Open |
| Status | Resolved |
**Description:** `ShutdownAsync` was updated to fall back to `KillWorker` when `CloseSessionCoreAsync` throws (lines 294-305) — a useful resilience improvement on its own. But the fallback's bookkeeping is wrong: `session.KillWorker(GatewayShutdownReason)` is called and `RemoveSessionAsync(session)` is awaited, but `_metrics.SessionClosed()` is never invoked, so for every session whose graceful close throws, the `mxgateway.sessions.open` gauge stays incremented after shutdown completes. Worse, `CloseSessionCoreAsync`'s `SessionCloseStartedException` catch (line 330) already records `_metrics.SessionRemoved()` (line 334-336) before re-throwing — so for that specific exception type, the gauge is decremented inside the inner catch, then the outer fallback runs and does not double-decrement (good), but `_metrics.SessionClosed()` is never called, so the `_sessionsClosed` counter under-counts by one. For any other exception (the more common case), neither inner catch records anything, so both `_sessionsClosed` and `_openSessions` end up wrong: gauge is left high, counter is left low.
**Recommendation:** Inside the `ShutdownAsync` fallback (after the `KillWorker` call but before/inside the `RemoveSessionAsync`), call `_metrics.SessionClosed()` unless the inner catch already recorded the close. The simplest shape is to propagate a `wasClosed` flag out of `CloseSessionCoreAsync` (or replace the fallback's manual choreography with a single call into `KillWorkerAsync(...)`, which has the right metric path once Server-044 is fixed). The latter is the cleanest — `ShutdownAsync` becomes "try graceful, fall back to `KillWorkerAsync`," and there's exactly one accounting path for each session. Add a `SessionManagerTests.ShutdownAsync_WhenCloseThrows_StillDecrementsOpenSessionGauge` test using a session whose `CloseAsync` throws (e.g. a `BlockingShutdownWorkerClient` configured to throw on `ShutdownAsync`).
**Resolution:** 2026-05-24 — Two coordinated changes: (1) `CloseSessionCoreAsync`'s `SessionCloseStartedException` catch now calls `_metrics.SessionClosed()` (decrements the open-session gauge AND increments the closed counter) instead of `_metrics.SessionRemoved()` (gauge only). A close that ran far enough to attempt the worker shutdown but failed is still a closed session for accounting purposes — the session is removed from the registry and disposed below, so the counter must reflect that. (2) `ShutdownAsync`'s outer fallback now routes the kill through `KillWorkerAsync` (which has the correct metric path post-Server-044) rather than manually calling `session.KillWorker` + `RemoveSessionAsync`. In practice the inner catch already removes the session so the outer fallback is defensive — but routing both paths through the same accounting eliminates the inconsistency the finding called out. The pre-existing `CloseSessionAsync_WhenWorkerShutdownFails_RemovesSessionAndReleasesSlot` test was updated to assert the new (correct) `SessionsClosed == 1` value, with a comment back-referencing Server-046. New regression test in `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs`: `ShutdownAsync_WhenSessionCloseThrows_StillDecrementsOpenSessionGaugeAndIncrementsClosedCounter` (uses a `FakeWorkerClient.ShutdownException` to force the graceful close to throw, then asserts both the open-session gauge drops to 0 and the closed counter increments to 1). Confirmed to fail before the fix and pass after.
### Server-047
| Field | Value |
@@ -857,7 +863,7 @@ In practice the trigger is narrow — `GatewaySession.KillWorker` calls `_worker
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:324-334`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/SessionsPage.razor:171-195`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/SessionDetailsPage.razor:231-255` |
| Status | Open |
| Status | Resolved |
**Description:** The shared `ConfirmDialog.razor` (added in `0e56b5b` / `24cc5fd`) is wired by three pages, but the pages handle `PendingAction` cleanup inconsistently:
@@ -868,6 +874,8 @@ The user-visible difference: rotating/revoking/deleting a key vs closing/killing
**Recommendation:** Align `ApiKeysPage.ConfirmPendingAsync` with the sessions pages: hold `PendingAction`, set `IsBusy = true`, run the action, then clear `PendingAction` in the `finally`. The current ApiKeysPage shape was inherited from before the dialog existed (when the confirmation was a `confirm()` JS call); the dialog component change can flatten the difference now. As a smaller alternative, document the divergence on the component's XML doc — but the shared component should ideally be used consistently.
**Resolution:** 2026-05-24 — Took the recommended alignment. `ApiKeysPage.ConfirmPendingAsync` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor`) now holds `PendingAction` for the duration of the awaited action (so the shared `ConfirmDialog` renders its `IsBusy` in-flight state on the dialog itself, matching the sessions pages) and clears it in `finally` regardless of outcome. The action is captured up front so a clear in `finally` works even when the action throws. `RunManagementActionAsync` continues to drive `IsBusy = true` inside its own `try/finally`, so the dialog now correctly disables Confirm/Cancel while the awaited service call runs. Pure UX-consistency change; no new automated test (no bUnit harness in the test project — same precedent as Server-010).
### Server-048
| Field | Value |
@@ -875,7 +883,7 @@ The user-visible difference: rotating/revoking/deleting a key vs closing/killing
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:463-498` |
| Status | Open |
| Status | Resolved |
**Description:** The two new `KillWorkerAsync_*` tests cover the happy path (`KillWorkerAsync_KillsWorkerAndRemovesSession`) and the missing-session error (`KillWorkerAsync_WhenSessionMissing_ThrowsSessionNotFound`). Three behaviorally distinct cases are missing:
@@ -885,6 +893,8 @@ The user-visible difference: rotating/revoking/deleting a key vs closing/killing
**Recommendation:** Add the three tests above. The fakes in `MxGateway.Tests/TestSupport/` already cover most of the moving parts; `FakeWorkerClient` needs a single `ThrowOnKill` flag (or the existing `KillThrowing` if any).
**Resolution:** 2026-05-24 — Closed by the regression tests added for Server-044 and Server-045 per the prompt's direction: case (1) is covered by `KillWorkerAsync_WhenSessionKillThrows_DecrementsOpenSessionGauge` (uses the new `FakeWorkerClient.KillException` flag); case (3) is covered by `KillWorkerAsync_ConcurrentCallsOnSameSession_CountClosedExactlyOnce`. Case (2) (`wasClosed=true` short-circuit) is implicitly exercised by the concurrent test — once the kill path serializes on the per-session close lock (Server-045 fix), the second kill that wins the registry race observes `wasClosed=true` and skips the counter increment, which is what the test pins (`SessionsClosed == 1`, not 2). The dedicated `KillWorkerAsync_WhenSessionAlreadyClosed_DoesNotReincrementClosedCounter` test was drafted but removed: closing a session disposes it (Server-016's `_closeLock.Dispose()`), so re-issuing a kill against a previously-closed-and-disposed session always fails on the disposed semaphore, which is realistic for production but not a useful unit-test shape. No new test file; the regression coverage already lives in `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs`.
### Server-049
| Field | Value |
@@ -892,12 +902,14 @@ The user-visible difference: rotating/revoking/deleting a key vs closing/killing
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/IDashboardSessionAdminService.cs:5-18`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:8-25` |
| Status | Open |
| Status | Resolved |
**Description:** `IDashboardSessionAdminService` declares three members — `CanManage`, `CloseSessionAsync`, `KillWorkerAsync` — none of which carry XML documentation. `DashboardSessionAdminService.CanManage` and the two operation methods are also undocumented (only the constructor parameters are named). The C# style guide requires public-surface XML docs and CLAUDE.md mandates that "docs change with the code." The peer `IDashboardApiKeyManagementService` is also undocumented, so this isn't unique — but the new interface is a fresh public surface being landed in `c5e7479`, and the contract subtleties (CanManage returns false for non-Admin; missing-session paths surface as `Succeeded = false` not as a thrown exception; `KillReason` is fixed at `"dashboard-admin-kill"` and that value reaches the audit log) are exactly what XML docs are for.
**Recommendation:** Add `<summary>` blocks to `IDashboardSessionAdminService.CanManage` (states the Admin-role gate), `CloseSessionAsync` and `KillWorkerAsync` (state that missing sessions return `DashboardSessionAdminResult.Fail(...)` rather than throwing, and that the audit log captures actor + remote IP). Add `<param>` and `<returns>` for the request/response shape. The same sweep can pick up the longstanding gap on `IDashboardApiKeyManagementService` if the team wants — but the new file is the load-bearing one.
**Resolution:** 2026-05-24 — Added `<summary>` + `<remarks>` blocks to every member of `IDashboardSessionAdminService` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/IDashboardSessionAdminService.cs`): an interface-level `<remarks>` describing the Admin-role gate, audit log shape, and `DashboardSessionAdminResult.Fail` semantics; per-member docs on `CanManage`, `CloseSessionAsync`, and `KillWorkerAsync` calling out the missing-session-returns-Fail contract and the `dashboard-admin-kill` reason constant that reaches the worker-kill audit log and `mxgateway.workers.killed` counter tag. `DashboardSessionAdminService` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs`) picked up a class-level `<summary>` + `<remarks>` describing the per-page audit-log seam, plus `<inheritdoc />` on each public method. Pure documentation change; no test (the behavioral contracts the docs describe are already exercised by the existing `DashboardSessionAdminServiceTests` cases).
### Server-050
| Field | Value |
@@ -905,7 +917,7 @@ The user-visible difference: rotating/revoking/deleting a key vs closing/killing
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:42-75,92-125` |
| Status | Open |
| Status | Resolved |
**Description:** `CloseSessionAsync` and `KillWorkerAsync` catch only `SessionManagerException` (the `SessionNotFound` filter, then a general `SessionManagerException` catch). Anything else propagates raw to Blazor's error boundary. The propagation paths exist:
@@ -915,3 +927,5 @@ The user-visible difference: rotating/revoking/deleting a key vs closing/killing
Today neither call site has a Blazor error boundary, so an unhandled exception lands as a generic Blazor circuit error page. The friendlier-error contract that Server-044's commit message advertises ("audit-logs, friendly errors") is incomplete: only `SessionManagerException` gets a friendly error.
**Recommendation:** Add a general `catch (Exception exception)` after the `SessionManagerException` catch in both `CloseSessionAsync` and `KillWorkerAsync`, log a warning (matching the SessionManagerException pattern), and return `DashboardSessionAdminResult.Fail($"{operation} failed unexpectedly. See the gateway log for details.")`. This makes the result type truly the only output the page sees. Add a regression test using a `ThrowingSessionManager` that throws e.g. `InvalidOperationException` from `KillWorkerAsync` and asserts the service returns a failing result rather than propagating.
**Resolution:** 2026-05-24 — Added the recommended general `catch (Exception)` arms to both `DashboardSessionAdminService.CloseSessionAsync` and `KillWorkerAsync` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs`), placed after the `SessionManagerException` catches and behind a `catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) throw;` so caller cancellation still propagates cleanly. The new catches log a warning with actor + session id and return `DashboardSessionAdminResult.Fail("{Operation} failed unexpectedly for session {SessionId}. See the gateway log for details.")`, mirroring the SessionManagerException pattern. Regression tests in `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/DashboardSessionAdminServiceTests.cs`: `CloseSessionAsync_WhenManagerThrowsUnexpected_ReturnsFriendlyFail` (the `ISessionManager` throws `InvalidOperationException("unexpected")`) and `KillWorkerAsync_WhenManagerThrowsUnexpected_ReturnsFriendlyFail` (throws `IOException("pipe broken")`); both assert the service returns a failing result with a non-blank message rather than propagating. The fake's new `CloseThrowsUnexpected` / `KillThrowsUnexpected` properties hold the configured exception. Confirmed to fail before the fix (raw exception propagated) and pass after.
@@ -328,9 +328,18 @@ else
return;
}
// Server-047: align the pending-action lifecycle with SessionsPage / SessionDetailsPage —
// hold PendingAction while the awaited action runs so the shared ConfirmDialog can render
// its in-flight (IsBusy) state, then clear in finally regardless of outcome.
Func<System.Security.Claims.ClaimsPrincipal, Task<DashboardApiKeyManagementResult>> action = PendingAction.Action;
PendingAction = null;
await RunManagementActionAsync(action).ConfigureAwait(false);
try
{
await RunManagementActionAsync(action).ConfigureAwait(false);
}
finally
{
PendingAction = null;
}
}
private sealed record PendingConfirm(
@@ -5,6 +5,19 @@ using ZB.MOM.WW.MxGateway.Server.Sessions;
namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
/// <summary>
/// Default implementation of <see cref="IDashboardSessionAdminService"/>: gates
/// destructive session actions on the <see cref="DashboardRoles.Admin"/> role,
/// audit-logs successful operations, and converts <see cref="SessionManagerException"/>
/// (and any other unexpected exceptions) into <see cref="DashboardSessionAdminResult.Fail(string)"/>
/// so the Blazor pages never see a raw exception.
/// </summary>
/// <remarks>
/// The constant <c>dashboard-admin-kill</c> is the reason passed to
/// <see cref="ISessionManager.KillWorkerAsync"/> and forwarded as the
/// <c>reason</c> tag on the <c>mxgateway.workers.killed</c> counter and in
/// the worker-kill audit log entries.
/// </remarks>
public sealed class DashboardSessionAdminService(
ISessionManager sessionManager,
IHttpContextAccessor httpContextAccessor,
@@ -16,6 +29,7 @@ public sealed class DashboardSessionAdminService(
private readonly ILogger<DashboardSessionAdminService> _logger =
logger ?? NullLogger<DashboardSessionAdminService>.Instance;
/// <inheritdoc />
public bool CanManage(ClaimsPrincipal user)
{
ArgumentNullException.ThrowIfNull(user);
@@ -24,6 +38,7 @@ public sealed class DashboardSessionAdminService(
&& user.IsInRole(DashboardRoles.Admin);
}
/// <inheritdoc />
public async Task<DashboardSessionAdminResult> CloseSessionAsync(
ClaimsPrincipal user,
string sessionId,
@@ -72,8 +87,27 @@ public sealed class DashboardSessionAdminService(
return DashboardSessionAdminResult.Fail(
$"Close failed: {exception.Message}");
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
throw;
}
catch (Exception exception)
{
// Server-050: any non-SessionManagerException (e.g. an IOException or
// InvalidOperationException from the session DisposeAsync / pipe teardown
// path) used to propagate raw into Blazor's error boundary. Convert it to
// a friendly failure so the Razor pages see only DashboardSessionAdminResult.
_logger.LogWarning(
exception,
"Dashboard admin {Actor} close failed unexpectedly for session {SessionId}.",
actor,
sessionId);
return DashboardSessionAdminResult.Fail(
$"Close failed unexpectedly for session {sessionId}. See the gateway log for details.");
}
}
/// <inheritdoc />
public async Task<DashboardSessionAdminResult> KillWorkerAsync(
ClaimsPrincipal user,
string sessionId,
@@ -122,6 +156,26 @@ public sealed class DashboardSessionAdminService(
return DashboardSessionAdminResult.Fail(
$"Kill failed: {exception.Message}");
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
throw;
}
catch (Exception exception)
{
// Server-050: any non-SessionManagerException (e.g. an IOException from
// worker pipe teardown surfacing through session.DisposeAsync, or an
// InvalidOperationException from a corrupted worker handle) used to
// propagate raw into Blazor's error boundary. Convert it to a friendly
// failure so the page renders the ResultMessage rather than the circuit
// error page.
_logger.LogWarning(
exception,
"Dashboard admin {Actor} kill failed unexpectedly for session {SessionId}.",
actor,
sessionId);
return DashboardSessionAdminResult.Fail(
$"Kill failed unexpectedly for session {sessionId}. See the gateway log for details.");
}
}
private static string ResolveActor(ClaimsPrincipal user)
@@ -2,15 +2,82 @@ using System.Security.Claims;
namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
/// <summary>
/// Dashboard surface for the destructive session-management actions —
/// Close (graceful shutdown) and Kill (force-terminate) — exposed by the
/// Admin role.
/// </summary>
/// <remarks>
/// The dashboard binds the destructive Close/Kill UI to this service so
/// the underlying <see cref="Sessions.ISessionManager"/> calls flow through
/// a single audited and role-gated entry point. All operations are gated
/// by <see cref="CanManage"/>; non-Admin callers are rejected with a
/// <c>Succeeded = false</c> result rather than throwing. Missing sessions
/// also surface as <see cref="DashboardSessionAdminResult.Fail(string)"/> so
/// Razor pages can render the message without an error boundary. Each
/// successful call is logged at Information including the acting user
/// (from <see cref="ClaimsPrincipal.Identity"/>'s name) and the remote
/// address resolved from <see cref="IHttpContextAccessor"/>.
/// </remarks>
public interface IDashboardSessionAdminService
{
/// <summary>
/// Returns whether the given principal is authorized to perform
/// destructive session-management actions.
/// </summary>
/// <remarks>
/// Requires <see cref="System.Security.Principal.IIdentity.IsAuthenticated"/>
/// to be true and membership in the
/// <see cref="DashboardRoles.Admin"/> role. Pages typically gate the
/// Close/Kill buttons on this value at render time so non-Admin
/// viewers never see them.
/// </remarks>
/// <param name="user">Caller principal.</param>
/// <returns><c>true</c> when the caller may close or kill sessions; otherwise <c>false</c>.</returns>
bool CanManage(ClaimsPrincipal user);
/// <summary>
/// Closes the given session gracefully (worker is given the configured
/// shutdown grace period before being terminated).
/// </summary>
/// <remarks>
/// Returns <see cref="DashboardSessionAdminResult.Fail(string)"/>
/// when the caller is not Admin, when the session id is blank, or when
/// <see cref="Sessions.ISessionManager.CloseSessionAsync"/> raises a
/// <see cref="Sessions.SessionManagerException"/> (including the
/// <see cref="Sessions.SessionManagerErrorCode.SessionNotFound"/>
/// case). Successful closes are audit-logged with the caller name,
/// session id, and remote address.
/// </remarks>
/// <param name="user">Caller principal.</param>
/// <param name="sessionId">Session identifier to close.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Result describing success/failure and a user-facing message.</returns>
Task<DashboardSessionAdminResult> CloseSessionAsync(
ClaimsPrincipal user,
string sessionId,
CancellationToken cancellationToken);
/// <summary>
/// Force-terminates the worker process backing the given session without
/// attempting a graceful shutdown.
/// </summary>
/// <remarks>
/// Invoked from the dashboard Kill button. Uses the
/// <c>dashboard-admin-kill</c> reason constant — that string reaches
/// the audit log and the <c>mxgateway.workers.killed</c> metric tag.
/// Returns <see cref="DashboardSessionAdminResult.Fail(string)"/> for
/// non-Admin callers, blank session ids, or any
/// <see cref="Sessions.SessionManagerException"/> from the underlying
/// manager (the <see cref="Sessions.SessionManagerErrorCode.SessionNotFound"/>
/// case is recognized and reported as "not found"). Successful kills
/// are audit-logged with the caller name, session id, and remote
/// address.
/// </remarks>
/// <param name="user">Caller principal.</param>
/// <param name="sessionId">Session identifier to kill.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Result describing success/failure and a user-facing message.</returns>
Task<DashboardSessionAdminResult> KillWorkerAsync(
ClaimsPrincipal user,
string sessionId,
@@ -840,6 +840,45 @@ public sealed class GatewaySession
TransitionTo(SessionState.Closed);
}
/// <summary>
/// Terminates the worker process immediately while holding the per-session
/// close lock so concurrent close/kill callers serialize. Returns the
/// session state observed at the start of the call so the caller can
/// dedup metric accounting (e.g. only record <c>SessionClosed</c> when
/// the session was not already closed).
/// </summary>
/// <remarks>
/// Mirrors <see cref="CloseAsync"/>'s use of <c>_closeLock</c> so that
/// a Close in flight from one caller and a Kill from another do not
/// race on the "was the session already closed" observation that
/// drives metric increments (Server-045).
/// </remarks>
/// <param name="reason">Reason for killing the worker.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns><c>true</c> if the session was already <see cref="SessionState.Closed"/> when the lock was acquired; otherwise <c>false</c>.</returns>
public async ValueTask<bool> KillWorkerWithCloseGateAsync(
string reason,
CancellationToken cancellationToken)
{
await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
try
{
bool wasClosed;
lock (_syncRoot)
{
wasClosed = _state == SessionState.Closed;
}
_workerClient?.Kill(reason);
TransitionTo(SessionState.Closed);
return wasClosed;
}
finally
{
_closeLock.Release();
}
}
/// <summary>
/// Disposes the session and frees associated resources.
/// </summary>
@@ -222,16 +222,26 @@ public sealed class SessionManager : ISessionManager
cancellationToken.ThrowIfCancellationRequested();
GatewaySession session = GetRequiredSession(sessionId);
bool wasClosed = session.State == SessionState.Closed;
// Serialize concurrent kill/close attempts on this session by routing through the
// per-session close lock (Server-045). Returns whether the session was already in
// Closed state when the lock was acquired so the metric counter is incremented at
// most once across concurrent callers.
bool wasClosed;
try
{
session.KillWorker(reason);
wasClosed = await session.KillWorkerWithCloseGateAsync(reason, cancellationToken).ConfigureAwait(false);
}
catch (Exception exception)
{
session.MarkFaulted(exception.Message);
_metrics.Fault(SessionManagerErrorCode.CloseFailed.ToString());
// Server-044: the open-session gauge was incremented in OpenSessionAsync;
// every session reaching KillWorkerAsync had SessionOpened recorded. If the
// kill path throws, decrement the gauge here so mxgateway.sessions.open
// does not leak — mirroring the Server-006 fix on OpenSessionAsync.
_metrics.SessionRemoved();
await RemoveSessionAsync(session).ConfigureAwait(false);
throw new SessionManagerException(
SessionManagerErrorCode.CloseFailed,
@@ -297,10 +307,24 @@ public sealed class SessionManager : ISessionManager
exception,
"Graceful shutdown failed for session {SessionId}; killing worker.",
session.SessionId);
// Defensive fallback: CloseSessionCoreAsync's inner SessionCloseStartedException
// catch normally removes the session and accounts the close (Server-046). The
// outer fallback only fires for sessions still in the registry — route through
// KillWorkerAsync so the bookkeeping is identical to the dashboard kill path.
if (_registry.TryGet(session.SessionId, out _))
{
session.KillWorker(GatewayShutdownReason);
await RemoveSessionAsync(session).ConfigureAwait(false);
try
{
await KillWorkerAsync(session.SessionId, GatewayShutdownReason, cancellationToken).ConfigureAwait(false);
}
catch (SessionManagerException killException)
{
_logger.LogWarning(
killException,
"Worker kill fallback failed for session {SessionId}.",
session.SessionId);
}
}
}
}
@@ -332,7 +356,12 @@ public sealed class SessionManager : ISessionManager
session.MarkFaulted(exception.Message);
if (!wasClosed)
{
_metrics.SessionRemoved();
// Server-046: account the close as a SessionClosed (decrements the open-session
// gauge AND increments the sessions.closed counter), not just SessionRemoved.
// The session is being removed from the registry below; treating this as a
// half-finished close that only decremented the gauge under-counted the closed
// counter.
_metrics.SessionClosed();
}
_metrics.Fault(SessionManagerErrorCode.CloseFailed.ToString());
@@ -115,6 +115,52 @@ public sealed class DashboardSessionAdminServiceTests
Assert.True(service.CanManage(CreateUser(DashboardRoles.Admin)));
}
/// <summary>
/// Regression for Server-050: an unexpected (non-<see cref="SessionManagerException"/>)
/// exception from <c>CloseSessionAsync</c> — e.g. an <see cref="InvalidOperationException"/>
/// or <see cref="IOException"/> surfaced from <c>RemoveSessionAsync</c>/<c>DisposeAsync</c> —
/// must be converted to a friendly <see cref="DashboardSessionAdminResult.Fail(string)"/>
/// rather than propagating raw into Blazor's error boundary.
/// </summary>
[Fact]
public async Task CloseSessionAsync_WhenManagerThrowsUnexpected_ReturnsFriendlyFail()
{
FakeSessionManager sessionManager = new()
{
CloseThrowsUnexpected = new InvalidOperationException("unexpected"),
};
DashboardSessionAdminService service = CreateService(sessionManager);
DashboardSessionAdminResult result = await service.CloseSessionAsync(
CreateUser(DashboardRoles.Admin),
"session-1",
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.False(string.IsNullOrWhiteSpace(result.Message));
}
/// <summary>
/// Regression for Server-050: same friendly-fail contract for the Kill path.
/// </summary>
[Fact]
public async Task KillWorkerAsync_WhenManagerThrowsUnexpected_ReturnsFriendlyFail()
{
FakeSessionManager sessionManager = new()
{
KillThrowsUnexpected = new IOException("pipe broken"),
};
DashboardSessionAdminService service = CreateService(sessionManager);
DashboardSessionAdminResult result = await service.KillWorkerAsync(
CreateUser(DashboardRoles.Admin),
"session-1",
CancellationToken.None);
Assert.False(result.Succeeded);
Assert.False(string.IsNullOrWhiteSpace(result.Message));
}
private static DashboardSessionAdminService CreateService(ISessionManager sessionManager)
{
DefaultHttpContext httpContext = new();
@@ -150,6 +196,10 @@ public sealed class DashboardSessionAdminServiceTests
public bool CloseThrowsNotFound { get; init; }
public Exception? CloseThrowsUnexpected { get; init; }
public Exception? KillThrowsUnexpected { get; init; }
public Task<GatewaySession> OpenSessionAsync(
SessionOpenRequest request,
string? clientIdentity,
@@ -194,6 +244,11 @@ public sealed class DashboardSessionAdminServiceTests
$"Session {sessionId} was not found.");
}
if (CloseThrowsUnexpected is not null)
{
throw CloseThrowsUnexpected;
}
return Task.FromResult(new SessionCloseResult(sessionId, SessionState.Closed, AlreadyClosed: false));
}
@@ -205,6 +260,11 @@ public sealed class DashboardSessionAdminServiceTests
KillCount++;
LastKilledSessionId = sessionId;
LastKillReason = reason;
if (KillThrowsUnexpected is not null)
{
throw KillThrowsUnexpected;
}
return Task.FromResult(new SessionCloseResult(sessionId, SessionState.Closed, AlreadyClosed: false));
}
@@ -410,7 +410,10 @@ public sealed class SessionManagerTests
Assert.Equal(1, failingWorkerClient.KillCount);
Assert.Equal(1, failingWorkerClient.DisposeCount);
GatewayMetricsSnapshot snapshot = metrics.GetSnapshot();
Assert.Equal(0, snapshot.SessionsClosed);
// Server-046: a close-that-failed now accounts as SessionClosed (counter += 1) rather
// than SessionRemoved (gauge -= 1, counter unchanged). The session is being removed
// from the registry on this path, so it must show up in the closed count.
Assert.Equal(1, snapshot.SessionsClosed);
Assert.False(snapshot.EventsBySession.ContainsKey(firstSession.SessionId));
Assert.Equal(1, snapshot.OpenSessions);
}
@@ -495,6 +498,110 @@ public sealed class SessionManagerTests
Assert.Equal(SessionManagerErrorCode.SessionNotFound, exception.ErrorCode);
}
/// <summary>
/// Regression for Server-044: when <c>session.KillWorker</c> throws, the catch path must still
/// decrement <c>mxgateway.sessions.open</c> (parity with the Server-006 fix in
/// <c>OpenSessionAsync</c>). Without the fix the gauge leaks one open session per failed kill.
/// </summary>
[Fact]
public async Task KillWorkerAsync_WhenSessionKillThrows_DecrementsOpenSessionGauge()
{
FakeWorkerClient workerClient = new()
{
KillException = new InvalidOperationException("worker kill failed"),
};
using GatewayMetrics metrics = new();
SessionManager manager = CreateManager(
new FakeSessionWorkerClientFactory(workerClient),
metrics: metrics);
GatewaySession session = await manager.OpenSessionAsync(
CreateOpenRequest(),
"client-1",
CancellationToken.None);
Assert.Equal(1, metrics.GetSnapshot().OpenSessions);
SessionManagerException exception = await Assert.ThrowsAsync<SessionManagerException>(
async () => await manager.KillWorkerAsync(session.SessionId, "test-kill", CancellationToken.None));
Assert.Equal(SessionManagerErrorCode.CloseFailed, exception.ErrorCode);
Assert.False(manager.TryGetSession(session.SessionId, out _));
Assert.Equal(0, metrics.GetSnapshot().OpenSessions);
Assert.True(metrics.GetSnapshot().Faults > 0);
}
/// <summary>
/// Regression for Server-045 / Server-048: concurrent kills on the same session must not
/// double-increment <c>mxgateway.sessions.closed</c>. The first kill wins, the second
/// observes <c>wasClosed == true</c> (or a missing session after removal) and short-circuits.
/// </summary>
[Fact]
public async Task KillWorkerAsync_ConcurrentCallsOnSameSession_CountClosedExactlyOnce()
{
FakeWorkerClient workerClient = new();
using GatewayMetrics metrics = new();
SessionManager manager = CreateManager(
new FakeSessionWorkerClientFactory(workerClient),
metrics: metrics);
GatewaySession session = await manager.OpenSessionAsync(
CreateOpenRequest(),
"client-1",
CancellationToken.None);
Task<SessionCloseResult> first = manager.KillWorkerAsync(session.SessionId, "kill-a", CancellationToken.None);
Task<SessionCloseResult> second = Task.Run(async () =>
{
try
{
return await manager.KillWorkerAsync(session.SessionId, "kill-b", CancellationToken.None);
}
catch (SessionManagerException missing) when (missing.ErrorCode == SessionManagerErrorCode.SessionNotFound)
{
return new SessionCloseResult(session.SessionId, SessionState.Closed, AlreadyClosed: true);
}
});
await Task.WhenAll(first, second);
Assert.Equal(1, metrics.GetSnapshot().SessionsClosed);
Assert.Equal(0, metrics.GetSnapshot().OpenSessions);
Assert.False(manager.TryGetSession(session.SessionId, out _));
}
/// <summary>
/// Regression for Server-046: <c>ShutdownAsync</c>'s graceful-close fallback (which calls
/// <c>KillWorker</c> + <c>RemoveSessionAsync</c> when <c>CloseSessionCoreAsync</c> throws)
/// must still account a successful close: both the open-session gauge must drop to zero AND
/// the <c>mxgateway.sessions.closed</c> counter must increment. Without the fix, the
/// graceful-close failure path under-counts the closed counter.
/// </summary>
[Fact]
public async Task ShutdownAsync_WhenSessionCloseThrows_StillDecrementsOpenSessionGaugeAndIncrementsClosedCounter()
{
FakeWorkerClient throwingClient = new()
{
ShutdownException = new InvalidOperationException("worker shutdown failed"),
};
using GatewayMetrics metrics = new();
SessionManager manager = CreateManager(
new FakeSessionWorkerClientFactory(throwingClient),
metrics: metrics);
GatewaySession session = await manager.OpenSessionAsync(
CreateOpenRequest(),
"client-1",
CancellationToken.None);
Assert.Equal(1, metrics.GetSnapshot().OpenSessions);
await manager.ShutdownAsync(CancellationToken.None);
// After shutdown, regardless of whether the graceful close path or the kill fallback ran,
// the open-session gauge must be zero and the closed counter must be incremented.
Assert.Equal(0, metrics.GetSnapshot().OpenSessions);
Assert.Equal(1, metrics.GetSnapshot().SessionsClosed);
Assert.False(manager.TryGetSession(session.SessionId, out _));
}
/// <summary>Verifies that when worker creation fails, the session is removed from the registry.</summary>
[Fact]
public async Task OpenSessionAsync_WhenWorkerCreationFails_RemovesSessionFromRegistry()
@@ -726,6 +833,9 @@ public sealed class SessionManagerTests
/// <summary>Gets the exception to throw when shutdown is called, if any.</summary>
public Exception? ShutdownException { get; init; }
/// <summary>Gets the exception to throw when kill is called, if any.</summary>
public Exception? KillException { get; init; }
/// <summary>Gets a value indicating whether to block shutdown on the fake worker client.</summary>
public bool BlockShutdown { get; init; }
@@ -803,6 +913,11 @@ public sealed class SessionManagerTests
public void Kill(string reason)
{
KillCount++;
if (KillException is not null)
{
throw KillException;
}
State = WorkerClientState.Faulted;
}