From db72dd1dcabbd71754795ef653cceac2049405b5 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Fri, 19 Jun 2026 11:34:34 -0400 Subject: [PATCH] review(Driver.Galaxy): re-review at HEAD (1 Low deferred; vendoring findings obsolete) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Re-review at 7286d320. Driver.Galaxy-019 (Low, Open): EnsureSessionIntervalAsync caches MxaccessFailure as 'interval applied' (sub-optimal cadence until reconnect) — deferred (sealed gateway session + gateway-contract design call). 001-014 re-verified Resolved; 015-018 (vendoring) now obsolete (libs/ replaced by real MxGateway PackageReferences). --- code-reviews/Driver.Galaxy/findings.md | 114 ++++++++++++++++++++++++- 1 file changed, 111 insertions(+), 3 deletions(-) diff --git a/code-reviews/Driver.Galaxy/findings.md b/code-reviews/Driver.Galaxy/findings.md index 15d910cf..ce67e145 100644 --- a/code-reviews/Driver.Galaxy/findings.md +++ b/code-reviews/Driver.Galaxy/findings.md @@ -4,10 +4,10 @@ |---|---| | Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` | | Reviewer | Claude Code | -| Review date | 2026-05-23 | -| Commit reviewed | `a9be809` | +| Review date | 2026-06-19 | +| Commit reviewed | `7286d320` | | Status | Reviewed | -| Open findings | 0 | +| Open findings | 1 | ## Checklist coverage @@ -51,6 +51,54 @@ Correctness (1), OtOpcUa conventions (2), Concurrency (3), Error handling recorded below (Driver.Galaxy-015..018) — none Critical, none High; two Medium, two Low. +## Re-review 2026-06-19 (commit `7286d320`) + +Heavy churn since `a9be809`: Galaxy completed its transition from the retired +alias/relay machinery to a plain Equipment-kind driver, and gained the +write-through path (`AdviseSupervisory` → raw `Write`), native alarms via the +session-less `GatewayGalaxyAlarmFeed`, the `GalaxyDriverProbe` Test-Connect +preflight (+212 LOC, new), and per-platform health probing. 39 src files, +5305 LOC; 31 test files, 6137 LOC. + +**The vendoring is gone.** The biggest structural change retires the entire +`libs/` concern that drove the last re-review: the two vendored binary DLLs +(`MxGateway.Client.dll` / `MxGateway.Contracts.dll`) and the path-based +`` items were replaced by real +`` / +`ZB.MOM.WW.MxGateway.Contracts` NuGet references, and +`Config/GalaxyDriverOptions.cs` moved to the sibling `.Contracts` project (out +of this module's scope). This makes **Driver.Galaxy-015, -016, -017, -018** +(all vendoring-provenance / package-skew / contract-version findings) obsolete +as live concerns — they remain in the history below as closed records and are +NOT re-opened. + +All twelve prior code-affecting findings (Driver.Galaxy-001..014) verified +still-resolved against the churned source: the EventPump stream-fault → +supervisor hand-off (`OnEventPumpStreamFault`), the Int64 `DataTypeMap` arm, +`StatusCodeMap.IsSuccess()` semantics + `ToQualityCategoryByte` helper, +deterministic earliest-handle alarm-owner selection, `IAsyncDisposable`, +`ReplayAsync` rebind + EventPump restart, `StartDeployWatcher` fault +observation + single `_ownedRepositoryClient`, the `ResolveApiKey` +`env:`/`file:`/`dev:`/warn-on-literal arms, the live `GetMemoryFootprint` +estimate, the O(1) `SubscriptionRegistry` indices, and the `ReinitializeAsync` +equivalent-config gate all survive intact. + +One new finding (Driver.Galaxy-019, Low, Open-deferred). No Critical / High / +Medium new findings. Category results: + +| # | Category | Result | +|---|---|---| +| 1 | Correctness & logic bugs | No new issues found | +| 2 | OtOpcUa conventions | No new issues found | +| 3 | Concurrency & thread safety | No new issues found | +| 4 | Error handling & resilience | Driver.Galaxy-019 | +| 5 | Security | No new issues found (vendoring retired; no secret logging; `ResolveApiKey` warns on cleartext) | +| 6 | Performance & resource management | No new issues found | +| 7 | Design-document adherence | No new issues found | +| 8 | Code organization & conventions | No new issues found | +| 9 | Testing coverage | No new issues found (31 suites; `GatewayGalaxySubscriber` untestable per Driver.Galaxy-019) | +| 10 | Documentation & comments | No new issues found | + ## Findings ### Driver.Galaxy-001 @@ -385,3 +433,63 @@ recommendation would be cosmetic without changing behaviour. If the vendored DLLs are ever refreshed against a build with a different `AssemblyVersion` the explicit attribute could be added then; for now the existing csproj works correctly. + +### Driver.Galaxy-019 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Error handling & resilience | +| Location | `Runtime/GatewayGalaxySubscriber.cs:89-100` | +| Status | Open | + +**Description:** `GatewayGalaxySubscriber.EnsureSessionIntervalAsync` applies +the session-level `SetBufferedUpdateInterval` command and then caches the +requested interval in `_lastAppliedIntervalMs` so later `SubscribeBulk` calls +skip a redundant set. The gate is +`if (reply.ProtocolStatus?.Code is not (ProtocolStatusCode.Ok or ProtocolStatusCode.MxaccessFailure)) return;` +— i.e. it early-returns (no cache) for unexpected protocol codes, but for both +`Ok` **and** `MxaccessFailure` it falls through and records the interval as +applied. `MxaccessFailure` means the gateway reached MXAccess but the COM-side +`SetBufferedUpdateInterval` failed, so the requested cadence was **not** applied +— yet caching it as applied means every subsequent subscribe permanently skips +the retry and the session stays at the gateway's default cadence until a +reconnect clears the cache. A transient MXAccess hiccup during the first +buffered subscribe thus silently pins the wrong publish rate for the session's +life. Two secondary issues compound it: (a) the inline comment claims "the +warning here gives ops the signal," but no warning is logged on the soft-failure +path (and the class holds no logger at all), so the only signal is the PR-6.1 +trace span; (b) the early-return (truly-unexpected code) path also caches +nothing AND logs nothing, so an operator has no log-level evidence the cadence +was rejected. + +This is bounded in impact — buffered cadence is a performance knob, not a +correctness one; data still flows at the gateway default, and a reconnect resets +the cache (the writer's `InvalidateHandleCaches` clears writer state, and a fresh +`GatewayGalaxySubscriber`/session is built on reopen). Hence Low. + +**Recommendation:** Treat `MxaccessFailure` as a non-cacheable soft failure — +only set `_lastAppliedIntervalMs` when `ProtocolStatus.Code == Ok` so a failed +MXAccess apply is retried on the next subscribe. Give the subscriber an +`ILogger` (it is constructed in `BuildProductionRuntimeAsync`, which already has +`_logger`) and emit a `Warning` on both the soft-failure and unexpected-code +paths so the comment's promised signal actually exists. To make this testable, +extract the reply→outcome classification into a pure internal helper +(`ClassifyIntervalReply(ProtocolStatusCode?)`) so it can be unit-tested without +the sealed, internal-ctor `MxGatewaySession` — `GatewayGalaxySubscriber` +currently has zero unit tests for exactly this reason. + +**Resolution:** Deferred 2026-06-19 — the fix is small but cannot be landed +with a real TDD red→green cycle in this module: the behaviour lives entirely +behind the sealed `MxGatewaySession` (no public/internal ctor; the same +constraint that forced every other gw call onto an injectable `IGalaxy*` seam), +so there is no way to drive a synthetic `MxCommandReply` through +`EnsureSessionIntervalAsync` without first refactoring out the pure-classifier +helper the recommendation calls for. That refactor plus giving the subscriber a +logger is a low-risk but non-trivial change to the production gw session path, +and the underlying intent (is caching on `MxaccessFailure` deliberate "the +gateway processed it, don't retry" behaviour, or an oversight?) is a design +question better confirmed against the gateway contract than guessed at in a +review sweep. Tracked for a follow-up that lands the classifier extraction + +its unit tests together. Impact is bounded to a sub-optimal (not broken) +publish cadence that self-heals on reconnect.