Code-review 2026-05-20 sweep #2: re-review at a020350, resolve 48 findings

Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.

High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
  pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
  string (it must be a valid SPDX expression), so `pip wheel .` and
  `pip install -e .` both fail before any source compiles. Tests
  still pass because pytest bypasses the build backend via
  `pythonpath`. Dropped the invalid license string, kept the
  `License :: Other/Proprietary License` classifier, and added
  `tests/test_packaging.py` so a future regression of the same shape
  is caught in CI.

Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
  on WorkerPipeSessionOptions bounds the in-flight-command watchdog
  suppression so a truly stuck COM call still triggers StaHung
  instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
  cross-language bench comparison is apples-to-apples again;
  `failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
  serialisation pattern to DeployEventStream so close() arriving
  after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
  stability check after UnAdvise instead of strict equality against
  the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
  log sink the WriteSecured live test owns (worker stdout/stderr,
  gateway logs, direct WriteLine) so the credential is proven
  absent from the full output buffer, not just the diagnostic
  message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
  for the previously-uncovered Write2Bulk and WriteSecured2Bulk
  arms of WriteBulkConstraintPlan.SetPayload.

Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
  GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
  AlarmsOptions validated at startup (Server-026); Authorization.md
  Constraint Enforcement snippet/prose enumerate the bulk write/read
  family (Server-027); bulk-read-commands and bulk-write-commands
  capability tokens added to OpenSession (Server-029);
  NotWiredAlarmRpcDispatcher XML doc cleaned up and the missing
  scope-resolver and state-machine tests added (Server-023, Server-028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
  guard the poll path uses, at every command entry (Worker-024);
  RunAsync null-checks the runtime-session factory result
  (Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
  GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
  rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
  CancelCommandReturnValue serialised under lock (Worker.Tests-027);
  Probes namespace lifted to MxGateway.Worker.Tests.Probes
  (Worker.Tests-029); cancel-envelope sequence numbers monotonised
  (Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
  section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
  (Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
  test backed by a TaskCompletionSource fake (Tests-022); companion
  FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
  (Tests-023); constraint plan reply-count divergence pinned
  (Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
  end-to-end (IntegrationTests-018); abnormal-exit keyword set
  tightened to pipe-disconnected/end-of-stream and the test now
  asserts streamTask.IsFaulted (IntegrationTests-020,
  IntegrationTests-021).
- Client.Dotnet: bench commands added to isLongRunning so the
  default 30s wall-clock budget doesn't kill them (015);
  BenchStreamEventsAsync observes the inner stream task on every
  exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
  %w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
  RFC3339Nano with fractional seconds (019); runStreamEvents installs
  signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
  table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
  cancellation contract Client.Java-015 established (022); stream-events
  text path uses Long.toUnsignedString for worker_sequence (023);
  bench-read-bulk no longer pollutes success-latency histogram with
  failure durations (024); --shutdown-timeout CLI option propagates
  through to ClientOptions (025); seven new MxGatewayCliTests cover
  the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
  wheel-build smoke test added under tests/test_packaging.py (020);
  README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
  document the generic AsRef<str> bound on read_bulk (019);
  next_correlation_id re-exported at the crate root, with a
  property-style doc contract and an explicit disclaimer that the
  literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
  IConstraintEnforcer mechanism instead of "tag-allowlist filter"
  (014); BulkReadResult gains explicit per-arm payload-population
  documentation for the success vs failure cases (015).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-20 10:28:54 -04:00
parent a0203503a7
commit 1aafd6bde4
74 changed files with 3349 additions and 395 deletions
+28 -12
@@ -589,14 +589,19 @@ func runBenchReadBulk(ctx context.Context, args []string, stdout, stderr io.Writ
  }()
  // Warm-up: drive identical calls so any first-call JIT / connection-pool
- // setup is amortised before the measurement window opens.
+ // setup is amortised before the measurement window opens. Honor ctx so
+ // Ctrl+C or a parent-cancel (e.g. the cross-language bench driver killing
+ // the child early) exits promptly rather than spinning failing calls until
+ // the wall-clock deadline.
  warmupDeadline := time.Now().Add(time.Duration(*warmupSeconds) * time.Second)
  timeout := time.Duration(*timeoutMs) * time.Millisecond
- for time.Now().Before(warmupDeadline) {
+ for time.Now().Before(warmupDeadline) && ctx.Err() == nil {
  _, _ = session.ReadBulk(ctx, serverHandle, tags, timeout)
  }
- // Steady state: per-call latency captured via time.Now() deltas.
+ // Steady state: per-call latency captured via time.Now() deltas. Same ctx
+ // guard as warm-up; on cancel we stop the loop and report the truncated
+ // window faithfully.
  latenciesMs := make([]float64, 0, 65536)
  var totalReadResults int64
  var cachedReadResults int64
@@ -604,7 +609,7 @@ func runBenchReadBulk(ctx context.Context, args []string, stdout, stderr io.Writ
  steadyStart := time.Now()
  steadyDeadline := steadyStart.Add(time.Duration(*durationSeconds) * time.Second)
- for time.Now().Before(steadyDeadline) {
+ for time.Now().Before(steadyDeadline) && ctx.Err() == nil {
  callStart := time.Now()
  results, err := session.ReadBulk(ctx, serverHandle, tags, timeout)
  elapsed := time.Since(callStart)
@@ -772,8 +777,15 @@ func runStreamEvents(ctx context.Context, args []string, stdout, stderr io.Write
  }
  defer client.Close()
+ // Mirror runGalaxyWatch so Ctrl+C on a long-running stream-events command
+ // cancels the gRPC stream cleanly (the gateway sees codes.Canceled rather
+ // than a torn TCP connection) and the deferred subscription.Close() /
+ // client.Close() actually run.
+ signalCtx, stopSignals := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)
+ defer stopSignals()
  session := mxgateway.NewSessionForID(client, *sessionID)
- streamCtx, cancelStream := context.WithCancel(ctx)
+ streamCtx, cancelStream := context.WithCancel(signalCtx)
  defer cancelStream()
  subscription, err := session.SubscribeEventsAfter(streamCtx, *after)
  if err != nil {
@@ -956,31 +968,31 @@ func parseValue(valueType, valueText string) (*mxgateway.MxValue, error) {
  case "bool":
  value, err := strconv.ParseBool(valueText)
  if err != nil {
- return nil, err
+ return nil, fmt.Errorf("invalid -value for -type %s: %q: %w", valueType, valueText, err)
  }
  return mxgateway.BoolValue(value), nil
  case "int32":
  value, err := strconv.ParseInt(valueText, 10, 32)
  if err != nil {
- return nil, err
+ return nil, fmt.Errorf("invalid -value for -type %s: %q: %w", valueType, valueText, err)
  }
  return mxgateway.Int32Value(int32(value)), nil
  case "int64":
  value, err := strconv.ParseInt(valueText, 10, 64)
  if err != nil {
- return nil, err
+ return nil, fmt.Errorf("invalid -value for -type %s: %q: %w", valueType, valueText, err)
  }
  return mxgateway.Int64Value(value), nil
  case "float":
  value, err := strconv.ParseFloat(valueText, 32)
  if err != nil {
- return nil, err
+ return nil, fmt.Errorf("invalid -value for -type %s: %q: %w", valueType, valueText, err)
  }
  return mxgateway.FloatValue(float32(value)), nil
  case "double":
  value, err := strconv.ParseFloat(valueText, 64)
  if err != nil {
- return nil, err
+ return nil, fmt.Errorf("invalid -value for -type %s: %q: %w", valueType, valueText, err)
  }
  return mxgateway.DoubleValue(value), nil
  case "string":
@@ -1201,7 +1213,7 @@ func runGalaxyWatch(ctx context.Context, args []string, stdout, stderr io.Writer
  flags.SetOutput(stderr)
  common := bindCommonFlags(flags)
  jsonOutput := flags.Bool("json", false, "write JSON output")
- lastSeen := flags.String("last-seen-deploy-time", "", "RFC3339 timestamp; when set, suppresses the bootstrap event")
+ lastSeen := flags.String("last-seen-deploy-time", "", "RFC 3339 timestamp (with optional fractional seconds); when set, suppresses the bootstrap event")
  limit := flags.Int("limit", 0, "maximum events to read; 0 means unbounded (Ctrl+C to stop)")
  if err := flags.Parse(args); err != nil {
@@ -1210,7 +1222,11 @@ func runGalaxyWatch(ctx context.Context, args []string, stdout, stderr io.Writer
  var lastSeenPtr *time.Time
  if *lastSeen != "" {
- parsed, err := time.Parse(time.RFC3339, *lastSeen)
+ // Use RFC3339Nano so values copy-pasted from galaxy-watch -json output
+ // (which formatDeployEvent emits with fractional seconds) round-trip;
+ // RFC3339Nano also accepts whole-second values, so the layout switch is
+ // strictly broader than the previous time.RFC3339 parse.
+ parsed, err := time.Parse(time.RFC3339Nano, *lastSeen)
  if err != nil {
  return fmt.Errorf("invalid -last-seen-deploy-time: %w", err)
  }