review(Driver.Galaxy): re-review at HEAD (1 Low deferred; vendoring findings obsolete)
Re-review at 7286d320. Driver.Galaxy-019 (Low, Open): EnsureSessionIntervalAsync caches
MxaccessFailure as 'interval applied' (sub-optimal cadence until reconnect) — deferred
(sealed gateway session + gateway-contract design call). 001-014 re-verified Resolved;
015-018 (vendoring) now obsolete (libs/ replaced by real MxGateway PackageReferences).
This commit is contained in:
@@ -4,10 +4,10 @@
|
||||
|---|---|
|
||||
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-05-23 |
|
||||
| Commit reviewed | `a9be809` |
|
||||
| Review date | 2026-06-19 |
|
||||
| Commit reviewed | `7286d320` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 0 |
|
||||
| Open findings | 1 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
@@ -51,6 +51,54 @@ Correctness (1), OtOpcUa conventions (2), Concurrency (3), Error handling
|
||||
recorded below (Driver.Galaxy-015..018) — none Critical, none High; two
|
||||
Medium, two Low.
|
||||
|
||||
## Re-review 2026-06-19 (commit `7286d320`)
|
||||
|
||||
Heavy churn since `a9be809`: Galaxy completed its transition from the retired
|
||||
alias/relay machinery to a plain Equipment-kind driver, and gained the
|
||||
write-through path (`AdviseSupervisory` → raw `Write`), native alarms via the
|
||||
session-less `GatewayGalaxyAlarmFeed`, the `GalaxyDriverProbe` Test-Connect
|
||||
preflight (+212 LOC, new), and per-platform health probing. 39 src files,
|
||||
5305 LOC; 31 test files, 6137 LOC.
|
||||
|
||||
**The vendoring is gone.** The biggest structural change retires the entire
|
||||
`libs/` concern that drove the last re-review: the two vendored binary DLLs
|
||||
(`MxGateway.Client.dll` / `MxGateway.Contracts.dll`) and the path-based
|
||||
`<Reference HintPath>` items were replaced by real
|
||||
`<PackageReference Include="ZB.MOM.WW.MxGateway.Client" />` /
|
||||
`ZB.MOM.WW.MxGateway.Contracts` NuGet references, and
|
||||
`Config/GalaxyDriverOptions.cs` moved to the sibling `.Contracts` project (out
|
||||
of this module's scope). This makes **Driver.Galaxy-015, -016, -017, -018**
|
||||
(all vendoring-provenance / package-skew / contract-version findings) obsolete
|
||||
as live concerns — they remain in the history below as closed records and are
|
||||
NOT re-opened.
|
||||
|
||||
All twelve prior code-affecting findings (Driver.Galaxy-001..014) verified
|
||||
still-resolved against the churned source: the EventPump stream-fault →
|
||||
supervisor hand-off (`OnEventPumpStreamFault`), the Int64 `DataTypeMap` arm,
|
||||
`StatusCodeMap.IsSuccess()` semantics + `ToQualityCategoryByte` helper,
|
||||
deterministic earliest-handle alarm-owner selection, `IAsyncDisposable`,
|
||||
`ReplayAsync` rebind + EventPump restart, `StartDeployWatcher` fault
|
||||
observation + single `_ownedRepositoryClient`, the `ResolveApiKey`
|
||||
`env:`/`file:`/`dev:`/warn-on-literal arms, the live `GetMemoryFootprint`
|
||||
estimate, the O(1) `SubscriptionRegistry` indices, and the `ReinitializeAsync`
|
||||
equivalent-config gate all survive intact.
|
||||
|
||||
One new finding (Driver.Galaxy-019, Low, Open-deferred). No Critical / High /
|
||||
Medium new findings. Category results:
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | No new issues found |
|
||||
| 2 | OtOpcUa conventions | No new issues found |
|
||||
| 3 | Concurrency & thread safety | No new issues found |
|
||||
| 4 | Error handling & resilience | Driver.Galaxy-019 |
|
||||
| 5 | Security | No new issues found (vendoring retired; no secret logging; `ResolveApiKey` warns on cleartext) |
|
||||
| 6 | Performance & resource management | No new issues found |
|
||||
| 7 | Design-document adherence | No new issues found |
|
||||
| 8 | Code organization & conventions | No new issues found |
|
||||
| 9 | Testing coverage | No new issues found (31 suites; `GatewayGalaxySubscriber` untestable per Driver.Galaxy-019) |
|
||||
| 10 | Documentation & comments | No new issues found |
|
||||
|
||||
## Findings
|
||||
|
||||
### Driver.Galaxy-001
|
||||
@@ -385,3 +433,63 @@ recommendation would be cosmetic without changing behaviour. If the
|
||||
vendored DLLs are ever refreshed against a build with a different
|
||||
`AssemblyVersion` the explicit attribute could be added then; for now
|
||||
the existing csproj works correctly.
|
||||
|
||||
### Driver.Galaxy-019
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `Runtime/GatewayGalaxySubscriber.cs:89-100` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `GatewayGalaxySubscriber.EnsureSessionIntervalAsync` applies
|
||||
the session-level `SetBufferedUpdateInterval` command and then caches the
|
||||
requested interval in `_lastAppliedIntervalMs` so later `SubscribeBulk` calls
|
||||
skip a redundant set. The gate is
|
||||
`if (reply.ProtocolStatus?.Code is not (ProtocolStatusCode.Ok or ProtocolStatusCode.MxaccessFailure)) return;`
|
||||
— i.e. it early-returns (no cache) for unexpected protocol codes, but for both
|
||||
`Ok` **and** `MxaccessFailure` it falls through and records the interval as
|
||||
applied. `MxaccessFailure` means the gateway reached MXAccess but the COM-side
|
||||
`SetBufferedUpdateInterval` failed, so the requested cadence was **not** applied
|
||||
— yet caching it as applied means every subsequent subscribe permanently skips
|
||||
the retry and the session stays at the gateway's default cadence until a
|
||||
reconnect clears the cache. A transient MXAccess hiccup during the first
|
||||
buffered subscribe thus silently pins the wrong publish rate for the session's
|
||||
life. Two secondary issues compound it: (a) the inline comment claims "the
|
||||
warning here gives ops the signal," but no warning is logged on the soft-failure
|
||||
path (and the class holds no logger at all), so the only signal is the PR-6.1
|
||||
trace span; (b) the early-return (truly-unexpected code) path also caches
|
||||
nothing AND logs nothing, so an operator has no log-level evidence the cadence
|
||||
was rejected.
|
||||
|
||||
This is bounded in impact — buffered cadence is a performance knob, not a
|
||||
correctness one; data still flows at the gateway default, and a reconnect resets
|
||||
the cache (the writer's `InvalidateHandleCaches` clears writer state, and a fresh
|
||||
`GatewayGalaxySubscriber`/session is built on reopen). Hence Low.
|
||||
|
||||
**Recommendation:** Treat `MxaccessFailure` as a non-cacheable soft failure —
|
||||
only set `_lastAppliedIntervalMs` when `ProtocolStatus.Code == Ok` so a failed
|
||||
MXAccess apply is retried on the next subscribe. Give the subscriber an
|
||||
`ILogger` (it is constructed in `BuildProductionRuntimeAsync`, which already has
|
||||
`_logger`) and emit a `Warning` on both the soft-failure and unexpected-code
|
||||
paths so the comment's promised signal actually exists. To make this testable,
|
||||
extract the reply→outcome classification into a pure internal helper
|
||||
(`ClassifyIntervalReply(ProtocolStatusCode?)`) so it can be unit-tested without
|
||||
the sealed, internal-ctor `MxGatewaySession` — `GatewayGalaxySubscriber`
|
||||
currently has zero unit tests for exactly this reason.
|
||||
|
||||
**Resolution:** Deferred 2026-06-19 — the fix is small but cannot be landed
|
||||
with a real TDD red→green cycle in this module: the behaviour lives entirely
|
||||
behind the sealed `MxGatewaySession` (no public/internal ctor; the same
|
||||
constraint that forced every other gw call onto an injectable `IGalaxy*` seam),
|
||||
so there is no way to drive a synthetic `MxCommandReply` through
|
||||
`EnsureSessionIntervalAsync` without first refactoring out the pure-classifier
|
||||
helper the recommendation calls for. That refactor plus giving the subscriber a
|
||||
logger is a low-risk but non-trivial change to the production gw session path,
|
||||
and the underlying intent (is caching on `MxaccessFailure` deliberate "the
|
||||
gateway processed it, don't retry" behaviour, or an oversight?) is a design
|
||||
question better confirmed against the gateway contract than guessed at in a
|
||||
review sweep. Tracked for a follow-up that lands the classifier extraction +
|
||||
its unit tests together. Impact is bounded to a sub-optimal (not broken)
|
||||
publish cadence that self-heals on reconnect.
|
||||
|
||||
Reference in New Issue
Block a user