Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.

High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
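
A minimal sketch of the kind of guard `tests/test_packaging.py` could hold. It does not build a wheel; it only inspects the parsed `[project]` table for a `license` string that is not a plausible SPDX expression. The name `license_problem`, the tiny `KNOWN_SPDX_IDS` allowlist, and the token handling are assumptions for illustration, not the contents of the real test, and the check is much looser than a real SPDX parser.

```python
import re

# Hypothetical, deliberately tiny allowlist; a real check would consult the
# full SPDX license list or rely on the build backend's own validation.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only", "GPL-3.0-or-later"}
OPERATORS = {"AND", "OR", "WITH"}


def license_problem(project):
    """Return a description of the problem with project['license'], or None.

    `project` is the parsed [project] table of pyproject.toml as a dict.
    """
    value = project.get("license")
    if value is None:
        return None  # no license field at all: nothing to reject
    if not isinstance(value, str):
        return None  # legacy table form; out of scope for this sketch
    tokens = re.findall(r"[A-Za-z0-9.+-]+", value)
    if not tokens:
        return "empty license expression"
    for token in tokens:
        if token.upper() in OPERATORS:
            continue
        # Custom identifiers are expected to carry the LicenseRef- prefix.
        if token.startswith("LicenseRef-") or token.rstrip("+") in KNOWN_SPDX_IDS:
            continue
        return f"{token!r} is not a recognised SPDX identifier"
    return None
```

Under these assumptions `{"license": "Proprietary"}` is flagged, while a table that drops the field and keeps only the `License :: Other/Proprietary License` classifier passes.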

Medium (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
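
The two-window stability check from IntegrationTests-017 can be sketched as follows. The real test is C# and samples `CountMatchingEvents` after two 500 ms `Task.Delay` calls; this Python version only illustrates the logic, and the names `settled_counts` and `is_quiescent` are hypothetical.

```python
import time


def settled_counts(count_events, settle_seconds=0.5, sleep=time.sleep):
    """Sample an event counter after two consecutive settle windows.

    The first window lets events that were already in flight drain; the
    second window is the one that must stay quiet.
    """
    sleep(settle_seconds)
    after_first = count_events()
    sleep(settle_seconds)
    after_second = count_events()
    return after_first, after_second


def is_quiescent(count_events, settle_seconds=0.5, sleep=time.sleep):
    """True when no new events arrived during the second settle window."""
    after_first, after_second = settled_counts(count_events, settle_seconds, sleep)
    return after_first == after_second
```

The baseline is taken after the first window rather than before the teardown call, so events published before the acknowledgement do not count as a regression. Passing `sleep` in as a parameter is a choice made here so the check can be exercised without real delays.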

Low (41, highlights only)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc cleaned up and the missing
scope-resolver and state-machine tests added (Server-023, Server-028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
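
The histogram fix in Client.Java-024 (and the Rust-only `failureLatencyMs` kept under Client.Rust-018) comes down to keeping failure durations out of the success bucket. A minimal sketch of that separation, in Python for brevity; `BenchLatencies` and its methods are hypothetical names, not the bench code in any of the clients.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BenchLatencies:
    """Keeps success and failure durations in separate buckets."""

    success_ms: list = field(default_factory=list)
    failure_ms: list = field(default_factory=list)

    def record(self, elapsed_ms, ok):
        # A failed call (often a timeout) must not skew the success histogram.
        bucket = self.success_ms if ok else self.failure_ms
        bucket.append(elapsed_ms)

    def success_percentile(self, pct):
        """Nearest-rank percentile over successful calls only; None if empty."""
        if not self.success_ms:
            return None
        ordered = sorted(self.success_ms)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]
```

With two successes of 10 ms and 20 ms and one 5000 ms failure, the success percentiles are computed from the two successes alone and the failure stays available for separate reporting.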

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,5 +1,7 @@
 using System.Collections.Concurrent;
 using System.Diagnostics;
+using System.Diagnostics.CodeAnalysis;
+using System.Text;
 using Google.Protobuf.WellKnownTypes;
 using Grpc.Core;
 using Microsoft.Extensions.Logging;
@@ -357,14 +359,6 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
                 .ConfigureAwait(false);
             LogEvent(firstDataChange);
 
-            // RecordingServerStreamWriter.Messages returns a snapshot copy under its own
-            // lock, so iterating after each teardown step is safe without external sync.
-            int dataChangeCountBeforeUnadvise = CountMatchingEvents(
-                eventWriter,
-                e => e.Family == MxEventFamily.OnDataChange
-                    && e.ServerHandle == serverHandle
-                    && e.ItemHandle == itemHandle);
-
             // 1) UnAdvise — must reply Ok; the worker must stop emitting OnDataChange
             // for this (server, item) pair after this returns.
             MxCommandReply unadviseReply = await fixture.Service.Invoke(
@@ -390,21 +384,33 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
             Assert.Equal(ProtocolStatusCode.Ok, unregisterReply.ProtocolStatus.Code);
             Assert.Equal(MxCommandKind.Unregister, unregisterReply.Kind);
 
-            // Allow a short settle window for any in-flight OnDataChange to drain, then
-            // assert no further events arrived for the un-advised (serverHandle, itemHandle).
-            // MXAccess parity: after UnAdvise the provider must stop publishing OnDataChange
-            // for this item — a regression that left a stale subscription alive would surface
-            // as additional events after this delay.
+            // Parity rule: after UnAdvise returns Ok the worker must stop emitting
+            // OnDataChange for this (server, item) pair. Events the provider already
+            // published before that ack are in-flight and not a regression — the rule
+            // only constrains events generated AFTER the teardown returned. So the
+            // "before" baseline is taken *after* a first settle window drains those
+            // in-flight events, not before UnAdvise was issued (which races against
+            // the round-trip + STA dispatch + pipe send window — see IntegrationTests-017).
+            //
+            // RecordingServerStreamWriter.Messages returns a snapshot copy under its
+            // own lock, so iterating after each settle window is safe without external
+            // sync.
             await Task.Delay(TimeSpan.FromMilliseconds(500)).ConfigureAwait(false);
+            int dataChangeCountAfterFirstSettle = CountMatchingEvents(
+                eventWriter,
+                e => e.Family == MxEventFamily.OnDataChange
+                    && e.ServerHandle == serverHandle
+                    && e.ItemHandle == itemHandle);
 
-            int dataChangeCountAfterTeardown = CountMatchingEvents(
+            await Task.Delay(TimeSpan.FromMilliseconds(500)).ConfigureAwait(false);
+            int dataChangeCountAfterSecondSettle = CountMatchingEvents(
                 eventWriter,
                 e => e.Family == MxEventFamily.OnDataChange
                     && e.ServerHandle == serverHandle
                     && e.ItemHandle == itemHandle);
             output.WriteLine(
-                $"DataChange count before UnAdvise={dataChangeCountBeforeUnadvise} after teardown+settle={dataChangeCountAfterTeardown}");
-            Assert.Equal(dataChangeCountBeforeUnadvise, dataChangeCountAfterTeardown);
+                $"DataChange count after first settle={dataChangeCountAfterFirstSettle} after second settle={dataChangeCountAfterSecondSettle}");
+            Assert.Equal(dataChangeCountAfterFirstSettle, dataChangeCountAfterSecondSettle);
 
             // A RemoveItem against the just-freed item handle must not silently succeed —
             // the worker has to relay MXAccess's invalid-handle response. Closing the
@@ -438,8 +444,16 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
             File.Exists(workerExecutablePath),
             $"Live MXAccess worker executable was not found at {workerExecutablePath}. Build the worker or set {IntegrationTestEnvironment.LiveMxAccessWorkerExecutableVariableName}.");
 
-        TestWorkerProcessFactory processFactory = new(output);
-        await using GatewayServiceFixture fixture = new(workerExecutablePath, processFactory, output);
+        // IntegrationTests-019: CLAUDE.md's credential-redaction rule covers every log
+        // surface the test sees, not just the reply's DiagnosticMessage. Wire a buffering
+        // wrapper around output and route the worker stdout/stderr echo and the gateway
+        // ILogger sink through it so the post-run assertion covers the accumulated test
+        // output. A regression that logged the request body, the WorkerCommandRequest
+        // envelope, or printed the credential from inside the worker is caught here
+        // even if the bare DiagnosticMessage check still passes.
+        RecordingTestOutputHelper recordedOutput = new(output);
+        TestWorkerProcessFactory processFactory = new(recordedOutput);
+        await using GatewayServiceFixture fixture = new(workerExecutablePath, processFactory, recordedOutput);
         // Stream events so a regression that emitted an OperationComplete or
         // OnWriteComplete with wrong handles would still be observable via the test
         // output (we don't assert a specific event here — the docs note successful
@@ -450,6 +464,7 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
         string? sessionId = null;
         Task? streamTask = null;
         using CancellationTokenSource streamCancellation = new();
+        (string verifyUser, string verifyPassword) = ResolveLiveMxAccessSecuredCredentials();
 
         try
         {
@@ -473,32 +488,31 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
             MxCommandReply registerReply = await fixture.Service.Invoke(
                 CreateRegisterRequest(sessionId),
                 new TestServerCallContext()).ConfigureAwait(false);
-            LogReply("Register", registerReply);
+            LogReplyTo(recordedOutput, "Register", registerReply);
             Assert.Equal(ProtocolStatusCode.Ok, registerReply.ProtocolStatus.Code);
             int serverHandle = registerReply.Register.ServerHandle;
 
             MxCommandReply addItemReply = await fixture.Service.Invoke(
                 CreateAddItemRequest(sessionId, serverHandle),
                 new TestServerCallContext()).ConfigureAwait(false);
-            LogReply("AddItem", addItemReply);
+            LogReplyTo(recordedOutput, "AddItem", addItemReply);
             Assert.Equal(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code);
             int itemHandle = addItemReply.AddItem.ItemHandle;
 
             MxCommandReply adviseReply = await fixture.Service.Invoke(
                 CreateAdviseRequest(sessionId, serverHandle, itemHandle),
                 new TestServerCallContext()).ConfigureAwait(false);
-            LogReply("Advise", adviseReply);
+            LogReplyTo(recordedOutput, "Advise", adviseReply);
             Assert.Equal(ProtocolStatusCode.Ok, adviseReply.ProtocolStatus.Code);
 
             // AuthenticateUser resolves an ArchestrA user id for the WriteSecured call.
             // Credentials are env-overridable so the test honors the gateway's "do not
             // log secrets" rule and works against either MXAccess's own user store or
             // the LmxOpcUa-baseline GLAuth-bridged ArchestrA identity (admin/admin123).
-            (string verifyUser, string verifyPassword) = ResolveLiveMxAccessSecuredCredentials();
             MxCommandReply authReply = await fixture.Service.Invoke(
                 CreateAuthenticateUserRequest(sessionId, serverHandle, verifyUser, verifyPassword),
                 new TestServerCallContext()).ConfigureAwait(false);
-            output.WriteLine(
+            recordedOutput.WriteLine(
                 $"AuthenticateUser status={authReply.ProtocolStatus.Code} hresult={authReply.Hresult} user_id={authReply.AuthenticateUser?.UserId}");
 
             // AuthenticateUser is allowed to fail (the underlying provider may reject
@@ -518,7 +532,7 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
                     currentUserId,
                     verifierUserId: 0),
                 new TestServerCallContext()).ConfigureAwait(false);
-            LogReply("WriteSecured", writeSecuredReply);
+            LogReplyTo(recordedOutput, "WriteSecured", writeSecuredReply);
 
             // Parity: the command itself completed its round-trip — the reply kind is
             // WriteSecured and the gateway protocol status is set. The MXAccess outcome
@@ -538,6 +552,13 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
             streamCancellation.Cancel();
             await ShutDownAsync(fixture, processFactory, sessionId, streamTask).ConfigureAwait(false);
         }
+
+        // CLAUDE.md credential contract: passwords and WriteSecured payloads must never
+        // reach logs. The buffered output covers the gateway ILogger sink, worker
+        // stdout/stderr, and every direct WriteLine the test body issued. A regression
+        // that dumped the request envelope, the AuthenticateUserCommand body, or any
+        // command-level WriteSecured payload would land here and trip this assertion.
+        Assert.DoesNotContain(verifyPassword, recordedOutput.Captured, StringComparison.Ordinal);
     }
 
     /// <summary>
@@ -611,15 +632,50 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
 
             // The fault classification must come from a known worker-client error code so
             // operators get an actionable cause string rather than an opaque exception
-            // trace. We accept any of the abnormal-exit classifications WorkerClient
-            // routes through SetFaulted on a killed worker.
+            // trace. We accept the classifications WorkerClient actually drives on an
+            // abnormal exit (kill-the-process path): the read loop hits EndOfStream and
+            // calls SetFaulted with WorkerClientErrorCode.PipeDisconnected and the
+            // message "Worker pipe disconnected." (see WorkerClient.cs:378-381). The
+            // earlier broad list (including "worker") matched every WorkerClient fault
+            // message (they all begin with "Worker"); tighten to the pipe/disconnect/
+            // end-of-stream classifications that match THIS path, so a regression that
+            // routed an unrelated fault here would surface as a test failure rather
+            // than silently passing (see IntegrationTests-020). "heartbeat" is dropped
+            // because HeartbeatGraceSeconds (15s) exceeds the StreamShutdownTimeout
+            // (10s) poll window, so a heartbeat-expired transition can never be
+            // observed inside this test.
             Assert.True(
-                observedFault!.Contains("disconnect", StringComparison.OrdinalIgnoreCase)
-                    || observedFault.Contains("pipe", StringComparison.OrdinalIgnoreCase)
-                    || observedFault.Contains("heartbeat", StringComparison.OrdinalIgnoreCase)
-                    || observedFault.Contains("worker", StringComparison.OrdinalIgnoreCase)
+                observedFault!.Contains("pipe disconnected", StringComparison.OrdinalIgnoreCase)
+                    || observedFault.Contains("end of stream", StringComparison.OrdinalIgnoreCase),
-                $"Fault description '{observedFault}' did not match a known worker-exit classification.");
+                $"Fault description '{observedFault}' did not match a known abnormal-exit classification "
+                    + "(expected 'pipe disconnected' or 'end of stream' from WorkerClient's EndOfStream path).");
 
+            // IntegrationTests-021: also assert the StreamEvents call observed the fault
+            // — the chain that puts the session into Faulted goes through ReadEventsAsync
+            // propagating a WorkerClientException into EventStreamService, which calls
+            // session.MarkFaulted. The gateway then maps the WorkerClientException to an
+            // RpcException at the public boundary (MxAccessGatewayService.MapException →
+            // MapWorkerClientException). Polling session.State alone would silently pass
+            // if a future refactor moved MarkFaulted off the stream-consumption path —
+            // assert the streamTask itself terminated with a fault so the test couples
+            // to the actual fault-propagation path. Compare to the inverse assertion in
+            // the Write parity test (line 217: Assert.False(streamTask.IsFaulted, ...)).
+            try
+            {
+                await streamTask.WaitAsync(StreamShutdownTimeout).ConfigureAwait(false);
+            }
+            catch (Exception streamException)
+            {
+                output.WriteLine($"StreamEvents task terminated with: {streamException.GetType().Name}: {streamException.Message}");
+            }
+
+            Assert.True(
+                streamTask.IsCompleted,
+                "StreamEvents task did not complete within the shutdown timeout after the worker was killed.");
+            Assert.True(
+                streamTask.IsFaulted,
+                "StreamEvents task must fault on abnormal worker exit, not complete cleanly — "
+                    + "the fault-propagation path from WorkerClient.SetFaulted through ReadEventsAsync is the contract.");
         }
         finally
         {
@@ -948,12 +1004,20 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
         string method,
         MxCommandReply reply)
     {
-        output.WriteLine(
+        LogReplyTo(output, method, reply);
+    }
+
+    private static void LogReplyTo(
+        ITestOutputHelper sink,
+        string method,
+        MxCommandReply reply)
+    {
+        sink.WriteLine(
             $"{method} status={reply.ProtocolStatus.Code} hresult={reply.Hresult} diagnostic={reply.DiagnosticMessage}");
 
         foreach (MxStatusProxy status in reply.Statuses)
         {
-            output.WriteLine(
+            sink.WriteLine(
                 $"{method} mxstatus success={status.Success} category={status.Category} detail={status.Detail} text={status.DiagnosticText}");
         }
     }
@@ -1034,7 +1098,7 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
         /// transitions it to Faulted, which the public gRPC API only exposes indirectly via
         /// CloseSession's reply (and not before a graceful close completes).
         /// </summary>
-        public bool TryGetSession(string sessionId, out GatewaySession session)
+        public bool TryGetSession(string sessionId, [MaybeNullWhen(false)] out GatewaySession session)
         {
             return _registry.TryGet(sessionId, out session);
         }
@@ -1439,6 +1503,56 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
         }
     }
 
+    /// <summary>
+    /// Buffering wrapper around an <see cref="ITestOutputHelper"/> that mirrors every line
+    /// written through it into a <see cref="StringBuilder"/> the test owns. The WriteSecured
+    /// parity test (IntegrationTests-019) uses this to make CLAUDE.md's "passwords and
+    /// <c>WriteSecured</c> payloads must never reach logs" rule a property of the entire
+    /// test output stream — gateway <see cref="ILogger"/> entries (echoed via
+    /// <see cref="TestOutputLoggerProvider"/>), worker stdout/stderr (echoed via
+    /// <see cref="TestWorkerProcessFactory.WriteWorkerOutput"/>), and direct
+    /// <c>output.WriteLine</c> calls all land in the same buffer, so a future maintenance
+    /// change that prints a credential through any of those channels is caught by the
+    /// assertion rather than slipping past the existing <c>DiagnosticMessage</c> check.
+    /// </summary>
+    private sealed class RecordingTestOutputHelper(ITestOutputHelper inner) : ITestOutputHelper
+    {
+        private readonly StringBuilder buffer = new();
+        private readonly object syncRoot = new();
+
+        public string Captured
+        {
+            get
+            {
+                lock (syncRoot)
+                {
+                    return buffer.ToString();
+                }
+            }
+        }
+
+        public void WriteLine(string message)
+        {
+            lock (syncRoot)
+            {
+                buffer.AppendLine(message);
+            }
+
+            inner.WriteLine(message);
+        }
+
+        public void WriteLine(string format, params object[] args)
+        {
+            string formatted = string.Format(System.Globalization.CultureInfo.InvariantCulture, format, args);
+            lock (syncRoot)
+            {
+                buffer.AppendLine(formatted);
+            }
+
+            inner.WriteLine(format, args);
+        }
+    }
+
     private sealed class AllowAllConstraintEnforcer : IConstraintEnforcer
     {
         public Task<ConstraintFailure?> CheckReadTagAsync(