Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).
Driver.Galaxy-016 (Medium, Perf/Resource):
Reconciled the csproj PackageReferences with what the vendored
MxGateway.Client.dll was actually built against, verified by
reflecting Assembly.GetReferencedAssemblies() on the DLL:
- Polly 8.5.2 → Polly.Core 8.6.6
(most consequential: Polly and Polly.Core are different
packages. The DLL was built against Polly.Core 8.6.6, while
Polly 8.5.2 only pulls in Polly.Core 8.5.2 transitively, so the
prior reference risked a MissingMethodException the first time
the gateway client's retry pipeline hit an 8.6-only API)
- Grpc.Net.Client 2.71.0 → 2.76.0 (matches sibling Server/Worker)
- Microsoft.Extensions.Logging.Abstractions 10.0.0 → 10.0.7
Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
left unchanged.
Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
Original framing was a security concern about unknown-provenance
binaries. User clarified the DLLs are their own code, built from
their own mxaccessgw project, not third-party. Re-triaged to a
documentation / audit-trail concern. Fix:
- Added a Provenance section to libs/README.md recording the
source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
extracted from the AssemblyInformationalVersion baked into
both DLLs by the original build) and SHA-256 checksums.
- Documented the re-verification recipe (sha256sum + ilspycmd
| grep AssemblyInformationalVersion).
Recommendations about .gitattributes and CI hash-check deferred —
the DLLs are frozen until an unwinding path is taken, so adding
LFS or CI infrastructure now would need removal at unwinding.
Driver.Galaxy-018 (Low, Documentation):
Most of the recommendation folded into the libs/README.md rewrite
(pointed at sibling Server/Worker csproj as the live version source
rather than the deleted MxGateway.Client.csproj; recorded source
commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
<Reference> items intentionally not added — MSBuild's default for
HintPath references with bare-name Include attributes is already
SpecificVersion=false, so explicitly setting it would be cosmetic
without changing behaviour.
Driver.Galaxy-017 (Low, Design) — Deferred:
Recommendation part (b) (record mxaccessgw source-commit SHA in
libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
Parts (a) and (c) — a GetVersion RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial
new RPC + plumbing work not in scope for this code-review sweep.
The risk is bounded: either unwinding path described in
libs/README.md ends the vendoring and, with it, this concern.
Re-open if neither path is taken within the next quarter and the
live gateway evolves its proto under the driver.
Verification:
- Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
- Driver.Galaxy.Tests: 245/245 pass against the corrected
package set.
- Solution-wide build remains clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review — Driver.Galaxy
| Field | Value |
|---|---|
| Module | src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy |
| Reviewer | Claude Code |
| Review date | 2026-05-23 |
| Commit reviewed | a9be809 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Galaxy-001, Driver.Galaxy-002, Driver.Galaxy-003, Driver.Galaxy-004 |
| 2 | OtOpcUa conventions | Driver.Galaxy-005 |
| 3 | Concurrency & thread safety | Driver.Galaxy-006, Driver.Galaxy-007 |
| 4 | Error handling & resilience | Driver.Galaxy-001, Driver.Galaxy-008, Driver.Galaxy-009 |
| 5 | Security | Driver.Galaxy-010, Driver.Galaxy-015 |
| 6 | Performance & resource management | Driver.Galaxy-011, Driver.Galaxy-012, Driver.Galaxy-016 |
| 7 | Design-document adherence | Driver.Galaxy-013, Driver.Galaxy-017 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Galaxy-014 |
| 10 | Documentation & comments | Driver.Galaxy-005, Driver.Galaxy-013, Driver.Galaxy-018 |
Re-review 2026-05-23 (commit a9be809)
The only code-affecting change since 76d35d1 was commit 994997b: the
sibling mxaccessgw repo was restructured (the clients/dotnet/MxGateway.Client
project path and the MxGateway.Contracts.Proto namespace both moved), and
the driver's path-based ProjectReference started producing 87 build errors
solution-wide. The fix is build-time only: the broken ProjectReference was
replaced with <Reference HintPath="libs\…"> items pointing at vendored
binary copies of MxGateway.Client.dll (99 KB, May 2026 known-good build)
and MxGateway.Contracts.dll (490 KB), and five PackageReferences that
the dropped project was previously providing transitively (Google.Protobuf,
Grpc.Core.Api, Grpc.Net.Client, Microsoft.Extensions.Logging.Abstractions,
Polly) were declared explicitly. The matching Tests csproj got the same
binary <Reference> for MxGateway.Contracts (replacing its own broken
ProjectReference). A libs/README.md documents what is vendored and the
two unwinding paths (sibling restores a client library, or driver migrates
to the new ZB.MOM.WW.MxGateway.Contracts.Proto namespace + reimplements
the MxGatewayClient / MxGatewaySession / GalaxyRepositoryClient
wrapper, ~2,200 LoC).
No *.cs file changed; the re-review walked only the categories that apply
to a build-time/packaging change. Categories with no new findings:
Correctness (1), OtOpcUa conventions (2), Concurrency (3), Error handling
(4), Code organization (8), Testing coverage (9). Four new findings are
recorded below (Driver.Galaxy-015..018) — none Critical, none High; two
Medium, two Low.
Findings
Driver.Galaxy-001
| Field | Value |
|---|---|
| Severity | Critical |
| Category | Error handling & resilience |
| Location | Runtime/EventPump.cs:128, GalaxyDriver.cs:222 |
| Status | Resolved |
Description: The ReconnectSupervisor is constructed in BuildProductionRuntimeAsync and exposes ReportTransportFailure(Exception) as the only entry point that starts the reopen -> replay recovery loop. Nothing in the driver ever calls ReportTransportFailure (a repo-wide search finds only the declaration). When the gateway StreamEvents stream faults, EventPump.RunAsync catches the exception, logs "reconnect supervisor (PR 4.5) handles restart", completes the channel, and exits — but the supervisor is never told. The result: a transient gateway transport drop permanently kills the event stream. Data-change notifications stop, no reconnect/replay runs, and GetHealth() keeps reporting Healthy because _supervisor.IsDegraded stays false. This is a production outage with no self-recovery.
Recommendation: Wire the EventPump (and any gw RPC that observes a transport fault) to call _supervisor.ReportTransportFailure(ex). The simplest path: give EventPump a fault callback (or expose a StreamFaulted event) that GalaxyDriver subscribes to and forwards to the supervisor. The supervisor's ReopenAsync/ReplayAsync must also restart the EventPump itself (see Driver.Galaxy-008).
Resolution: Resolved 2026-05-22 — added an optional onStreamFault callback to EventPump; RunAsync's stream-fault catch block now invokes it, and GalaxyDriver.EnsureEventPumpStarted wires it to OnEventPumpStreamFault which forwards the cause to ReconnectSupervisor.ReportTransportFailure, so a transient gw transport drop now drives reopen → replay. Regression coverage in EventPumpStreamFaultTests. Note: the EventPump itself is still not restarted on reconnect — that pump-restart gap remains tracked under Driver.Galaxy-008.
Driver.Galaxy-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | Browse/DataTypeMap.cs:13, Runtime/MxValueDecoder.cs:9 |
| Status | Resolved |
Description: DataTypeMap.Map maps Galaxy mx_data_type codes to six DriverDataType values (Boolean, Int32, Float32, Float64, String, DateTime) — there is no Int64 arm. Yet MxValueDecoder and MxValueEncoder both fully support Int64 (MxValue.Int64Value, Int64Array), and the decoder's own XML doc claims "the seven Galaxy data types ... (Boolean, Int32, Int64, Float32, Float64, String, DateTime)". Any Galaxy attribute whose mx_data_type is the Int64 code (or any code > 5) falls through the _ => DriverDataType.String default. The address-space node is then created as a String variable while runtime reads decode an Int64 boxed value — a type mismatch that produces wrong OPC UA DataType/ValueRank metadata and likely fails value coercion at the server node layer.
Recommendation: Confirm the Galaxy mx_data_type integer code for 64-bit integers and add the explicit arm to DataTypeMap.Map. If the wire format genuinely has no Int64 type, correct the MxValueDecoder/MxValueEncoder doc comments instead. Either way the encoder/decoder and the type map must agree.
Resolution: Resolved 2026-05-22 — added 6 => DriverDataType.Int64 to DataTypeMap.Map, extending the contiguous 0..5 scheme so the type map covers the same seven Galaxy data types MxValueDecoder/MxValueEncoder already decode/encode; Int64 attributes now build as Int64 nodes instead of falling through to the String default. Regression coverage in DataTypeMapTests.
Driver.Galaxy-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | Runtime/StatusCodeMap.cs:86 |
| Status | Resolved |
Description: FromMxStatus returns Good whenever status.Success != 0. The intent (per the surrounding comment "Honors the success flag") is that a non-zero Success means success. But if MxStatusProxy.Success is itself a native HRESULT/return code rather than a boolean-as-int, then Success != 0 is exactly the failure condition and the mapper inverts it — every failed write/read would report Good. The field name is ambiguous and the rest of the file (Detail, RawDetectedBy, and Hresult used elsewhere) treats 0 as success. GatewayGalaxyAlarmAcknowledger.cs:62 uses the opposite convention for the sibling field (reply.Hresult != 0 means failure).
Recommendation: Verify the semantics of MxStatusProxy.Success against the gateway proto contract. If it is a success-boolean encoded as int, add a code comment pinning that; if it is an HRESULT, invert the check to status.Success == 0 => Good.
Resolution: Resolved 2026-05-22 — replaced status.Success != 0 with status.IsSuccess() (the MxStatusProxyExtensions helper that checks both success != 0 AND category == Ok); the proto contract explicitly documents that success is not a boolean and that clients must branch on category. Regression coverage updated in StatusCodeMapTests with a SuccessNonZeroButCategoryNotOk_IsNotGood assertion pinning the fix.
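
The rule described in this resolution can be sketched as follows. This is a minimal illustration, not the driver's code: `MxStatusSketch`, `StatusCategorySketch`, and the `Warning`/`Error` members are hypothetical stand-ins for the gateway's proto-generated types. Only the rule itself (success non-zero and category Ok) comes from the finding.

```csharp
using System;

// Hypothetical stand-ins for the proto-generated status types.
public enum StatusCategorySketch { Ok, Warning, Error }

public readonly record struct MxStatusSketch(int Success, StatusCategorySketch Category);

public static class MxStatusSketchExtensions
{
    // Good only when BOTH conditions hold; a non-zero Success alone is not enough.
    public static bool IsSuccess(this MxStatusSketch status) =>
        status.Success != 0 && status.Category == StatusCategorySketch.Ok;
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(new MxStatusSketch(1, StatusCategorySketch.Ok).IsSuccess());    // True
        Console.WriteLine(new MxStatusSketch(1, StatusCategorySketch.Error).IsSuccess()); // False
        Console.WriteLine(new MxStatusSketch(0, StatusCategorySketch.Ok).IsSuccess());    // False
    }
}
```

The second case is the one the original `Success != 0` check would have reported as Good.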
Driver.Galaxy-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | GalaxyDriver.cs:901 |
| Status | Resolved |
Description: OnPumpDataChange reconstructs a raw OPC DA quality byte from an OPC UA StatusCode for the probe watcher: it shifts StatusCode >> 30 and maps 0->192, 1->64, _->0. The StatusCode was itself produced upstream by StatusCodeMap.FromQualityByte/FromMxStatus, so this is a lossy round-trip — it collapses every specific code back to the three category bytes (192/64/0). That happens to satisfy PerPlatformProbeWatcher.DecodeState (which only checks qualityByte < 192), so the bug is currently benign, but the mapping is fragile and undocumented except for one inline comment. A future edit to the StatusCodeMap constants or to the shift width would silently desync the probe-health decode with no test guarding it.
Recommendation: Route the probe path off the original quality information rather than reverse-engineering it from a StatusCode. Either carry the raw quality byte on DataValueSnapshot, or add a StatusCodeMap.ToQualityCategoryByte(uint) helper with unit tests so the mapping lives in one place next to its inverse.
Resolution: Resolved 2026-05-22 — added StatusCodeMap.ToQualityCategoryByte(uint) helper that extracts top-two bits of the OPC UA StatusCode into the OPC DA category byte (Good=192, Uncertain=64, Bad=0); GalaxyDriver.OnPumpDataChange now calls this helper instead of inlining the shift+switch, so the mapping lives next to its inverse. Unit tests in StatusCodeMapTests cover all three category buckets and the round-trip invariant.
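
A helper of the shape described above might look like the sketch below. It assumes only what the finding states: the top two bits of the OPC UA StatusCode select the category, and the OPC DA category bytes are 192, 64, and 0. The class name `StatusCodeSketch` is hypothetical, and the real helper in StatusCodeMap may be written differently.

```csharp
using System;

public static class StatusCodeSketch
{
    // OPC UA severity lives in the top two bits: 00 = Good, 01 = Uncertain, anything else = Bad.
    public static byte ToQualityCategoryByte(uint statusCode)
    {
        uint severity = statusCode >> 30;
        if (severity == 0) return 192; // OPC DA Good
        if (severity == 1) return 64;  // OPC DA Uncertain
        return 0;                      // OPC DA Bad
    }

    public static void Main()
    {
        Console.WriteLine(ToQualityCategoryByte(0x00000000)); // 192
        Console.WriteLine(ToQualityCategoryByte(0x40000000)); // 64
        Console.WriteLine(ToQualityCategoryByte(0x80000000)); // 0
    }
}
```

Keeping the shift width and the three constants in one method means a unit test can pin them together, which is the point of the recommendation.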
Driver.Galaxy-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | Runtime/EventPump.cs:81-88 |
| Status | Resolved |
Description: The BoundedChannelOptions comment states "Newest-dropped policy: when full, the producer's TryWrite returns false ... We do this manually rather than relying on BoundedChannelFullMode.DropWrite" — but the option is then set to FullMode = BoundedChannelFullMode.Wait. With Wait, TryWrite returning false on a full channel is correct behaviour, so the code works, but the comment naming the mode and the actual mode disagree, which is confusing for a maintainer deciding whether the policy is Wait, DropWrite, or DropNewest.
Recommendation: Either reword the comment to say "we use Wait mode but never call the awaitable WriteAsync — TryWrite gives us synchronous newest-dropped semantics", or switch to BoundedChannelFullMode.DropWrite and keep the manual drop count. Make the comment and the mode consistent.
Resolution: Resolved 2026-05-23 — reworded the BoundedChannelOptions comment to say "we use FullMode.Wait but never call the awaitable WriteAsync — only synchronous TryWrite, which returns false immediately on a full channel and lets us account for drops on the EventsDropped counter". Also explains why we deliberately do NOT use BoundedChannelFullMode.DropWrite (it would silently discard without surfacing on the counter). Comment and FullMode value now agree.
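
The behaviour the reworded comment relies on can be demonstrated in isolation. The sketch below is not EventPump code; the capacity, the `int` payload, and the local counter names are assumptions chosen for illustration. It shows that with `FullMode = Wait`, a synchronous `TryWrite` on a full channel returns false immediately, so the caller can count the drop itself.

```csharp
using System;
using System.Threading.Channels;

// Bounded channel in Wait mode; nothing reads from it, so it fills after two writes.
var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(capacity: 2)
{
    FullMode = BoundedChannelFullMode.Wait,
    SingleReader = true,
});

int accepted = 0;
int dropped = 0; // stands in for an EventsDropped-style counter

for (int i = 0; i < 5; i++)
{
    // TryWrite never blocks: it returns false when the channel is full.
    if (channel.Writer.TryWrite(i)) accepted++;
    else dropped++;
}

Console.WriteLine($"accepted={accepted} dropped={dropped}"); // accepted=2 dropped=3
```

With `BoundedChannelFullMode.DropWrite`, `TryWrite` would instead return true and discard the item, so this manual count would stay at zero.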
Driver.Galaxy-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | GalaxyDriver.cs:848-861 |
| Status | Resolved |
Description: OnAlarmFeedTransition picks the "owner" handle with _alarmSubscriptions.First() under _alarmHandlersLock. HashSet<T>.First() enumeration order is unspecified and unstable across mutations — when multiple alarm subscriptions are active, the handle attached to a given AlarmEventArgs can change arbitrarily between transitions. The XML doc acknowledges "we still only fire the event once" but the downstream AlarmConditionService correlates transitions to the originating subscription via this handle; a non-deterministic owner can misroute unsubscribe bookkeeping or per-subscription state.
Recommendation: If alarm transitions genuinely fan out to all subscriptions, raise OnAlarmEvent once per active handle (or document that the handle is a non-correlating sentinel and have the server stop relying on it). If a single owner is required, make the choice deterministic (e.g. the earliest-created handle) and stable.
Resolution: Resolved 2026-05-22 — changed _alarmSubscriptions from HashSet<GalaxyAlarmSubscriptionHandle> to List<GalaxyAlarmSubscriptionHandle> so insertion order is preserved; OnAlarmFeedTransition now picks [0] (earliest-registered handle) instead of First() on a HashSet, making the owner selection deterministic and stable across mutations. Server routing uses SourceNodeId not the handle, so every active subscriber sees the same transition regardless of which handle is attached.
Driver.Galaxy-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | GalaxyDriver.cs:937-968 |
| Status | Resolved |
Description: Dispose() is not synchronized against the capability methods. It sets _disposed = true then disposes _eventPump, _alarmFeed, _ownedMxSession, _ownedMxClient, _supervisor, etc. A concurrent SubscribeAsync/ReadAsync/WriteAsync that passed its ObjectDisposedException.ThrowIf check at entry can then dereference _subscriber/_dataWriter whose backing GalaxyMxSession is being disposed mid-call, producing ObjectDisposedException/NullReferenceException from deep inside the gw client rather than a clean failure. Dispose also blocks the caller on GetAwaiter().GetResult() of several async disposals, risking a deadlock if invoked from a thread-pool-starved context.
Recommendation: Gate capability entry points so they cannot start new gw work once _disposed is set (e.g. a CancellationTokenSource linked into every call, cancelled first in Dispose). Consider implementing IAsyncDisposable so the async sub-component disposals do not block on GetResult().
Resolution: Resolved 2026-05-22 — added IAsyncDisposable to GalaxyDriver and implemented DisposeAsync() as the primary disposal path that awaits each async sub-component (EventPump, AlarmFeed, MxSession, MxClient, RepositoryClient) without blocking; Dispose() delegates to DisposeAsync().AsTask().GetAwaiter().GetResult() for using-statement compatibility. The sync blocking-on-GetResult anti-pattern in the previous Dispose body is eliminated on the hot path. Note: the CancellationTokenSource gate for concurrent capability entry was not added — the existing ObjectDisposedException.ThrowIf(_disposed, this) guards at capability entry points already provide the fast-fail, and a separate CTS would add complexity without solving the TOCTOU window noted in the finding; that window is benign in practice (the sub-component's own disposed check catches it).
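
The disposal shape described in this resolution can be sketched as below. It is an illustration under assumptions, not the GalaxyDriver implementation: the class name, the component list, and the use of an `int` flag with `Interlocked.Exchange` (the finding describes a plain `_disposed` field) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class AsyncDisposableOwnerSketch : IAsyncDisposable, IDisposable
{
    private readonly IReadOnlyList<IAsyncDisposable> _components;
    private int _disposed; // 0 = live, 1 = disposed

    public AsyncDisposableOwnerSketch(IReadOnlyList<IAsyncDisposable> components) =>
        _components = components;

    // Primary path: awaits each sub-component in order without blocking a thread.
    public async ValueTask DisposeAsync()
    {
        if (Interlocked.Exchange(ref _disposed, 1) != 0) return; // idempotent
        foreach (var component in _components)
            await component.DisposeAsync().ConfigureAwait(false);
    }

    // Compatibility path for using-statements; this one call still blocks.
    public void Dispose() => DisposeAsync().AsTask().GetAwaiter().GetResult();
}
```

Callers that can use `await using` never reach the blocking call; only legacy synchronous callers pay for it.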
Driver.Galaxy-008
| Field | Value |
|---|---|
| Severity | High |
| Category | Error handling & resilience |
| Location | GalaxyDriver.cs:264-276, Runtime/EventPump.cs:97-103 |
| Status | Resolved |
Description: Even if Driver.Galaxy-001 is fixed and the supervisor's ReplayAsync runs, recovery is incomplete. ReplayAsync re-issues SubscribeBulkAsync for the tracked tags, but the EventPump background loop that consumes StreamEvents is not restarted. After a stream fault EventPump.RunAsync exits and _channel is completed; EventPump.Start() is a no-op (if (_loop is not null) return) because _loop is a completed-but-non-null task. So a replayed subscription has no consumer — values are subscribed on the gw but never reach OnDataChange. Additionally ReplayAsync never re-registers the new item handles the gw returns into SubscriptionRegistry; the old stale item handles remain, so even with a live pump the fan-out reverse-map would miss the post-reconnect handles.
Recommendation: On reconnect, dispose and recreate the EventPump (or make it restartable), and have ReplayAsync update SubscriptionRegistry bindings with the new item handles returned by the post-reconnect SubscribeBulkAsync. Add an integration/parity test that drops the stream mid-subscription and asserts OnDataChange resumes.
Resolution: Resolved 2026-05-22 — ReplayAsync now calls a new RestartEventPumpForReplay (disposes the faulted pump, recreates and restarts a fresh one) and re-issues SubscribeBulkAsync per subscription, then SubscriptionRegistry.Rebind swaps each subscription's stale pre-reconnect item handles for the post-reconnect handles so the fan-out reverse map dispatches to the live pump. New SubscriptionRegistry.SnapshotEntries/Rebind APIs back the per-subscription replay. Regression coverage in SubscriptionRegistryTests (Rebind/SnapshotEntries) and EventPumpStreamFaultTests.FaultedPump_IsNotRestartableInPlace_ButAFreshPumpResumesDispatch.
Driver.Galaxy-009
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | GalaxyDriver.cs:354-371 |
| Status | Resolved |
Description: StartDeployWatcher launches the watch loop with _ = _deployWatcher.StartAsync(CancellationToken.None) — a fire-and-forget with a discarded Task. StartAsync can throw synchronously (InvalidOperationException if already started); the discard masks that programming error. Separately, StartDeployWatcher builds an _ownedRepositoryClient purely for the watcher when discovery has not run yet — if DiscoverAsync later runs, BuildDefaultHierarchySource overwrites _ownedRepositoryClient with a second client, leaking the first (only the latest reference is disposed in Dispose).
Recommendation: Await StartAsync (it completes synchronously after scheduling) or at least observe its result. Reuse a single GalaxyRepositoryClient across the deploy watcher and the hierarchy source instead of letting BuildDefaultHierarchySource clobber the field — guard the assignment or build the client once in InitializeAsync.
Resolution: Resolved 2026-05-22 — (a) replaced _ = _deployWatcher.StartAsync(...) discard with an explicit variable + IsFaulted check so any synchronous throw from StartAsync (e.g. called-twice InvalidOperationException) propagates rather than being silently swallowed; (b) changed both StartDeployWatcher and BuildDefaultHierarchySource to use _ownedRepositoryClient ??= so a client built by the watcher is reused by discovery instead of being overwritten and leaked — only one GalaxyRepositoryClient instance is now created and disposed.
Driver.Galaxy-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | GalaxyDriver.cs:311-341 |
| Status | Resolved |
Description: ResolveApiKey supports an env:/file: indirection and otherwise treats the config string as the literal API key ("Anything else — used as the literal API key. Convenient for dev"). GalaxyGatewayOptions' own XML doc claims "the API key never appears in cleartext config". The literal-key fallback silently permits a plaintext API key in the DriverConfig JSON column of the central config DB, contradicting the documented contract. There is no warning logged when the literal path is taken.
Recommendation: Log a startup warning when ResolveApiKey falls through to the literal arm so an operator who accidentally committed a cleartext key sees it, and update the GalaxyGatewayOptions doc comment so it no longer over-promises. Consider gating the literal arm behind an explicit dev:-style prefix so a cleartext key cannot be used by accident.
Resolution: Resolved 2026-05-23 — (a) added a logger-aware ResolveApiKey(string, ILogger?) overload that emits a Warning when the back-compat literal arm is taken, and wired the BuildClientOptions call site to pass _logger; (b) added an explicit dev:KEY prefix that returns the literal value without warning, so dev rigs / parity tests can opt-in deliberately; (c) rewrote the GalaxyGatewayOptions.ApiKeySecretRef XML doc so it no longer claims "the API key never appears in cleartext config" — it now documents all four supported forms (env:, file:, dev:, and the warning-on-literal back-compat path). Regression coverage in GalaxyDriverApiKeyResolverTests (Literal_string_emits_warning_when_logger_supplied, Dev_prefix_returns_literal_without_warning, Env_prefix_does_not_emit_literal_warning).
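
The four forms listed in this resolution could be resolved along the lines of the sketch below. Everything beyond the prefix names is an assumption: the class name, the `warn` callback (standing in for an `ILogger` so the example has no package dependency), the error handling, and the trimming of file contents are hypothetical and may differ from the driver's `ResolveApiKey`.

```csharp
using System;
using System.IO;

public static class ApiKeyResolverSketch
{
    public static string Resolve(string secretRef, Action<string>? warn = null)
    {
        if (secretRef.StartsWith("env:", StringComparison.Ordinal))
        {
            string name = secretRef.Substring(4);
            return Environment.GetEnvironmentVariable(name)
                ?? throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        }

        if (secretRef.StartsWith("file:", StringComparison.Ordinal))
            return File.ReadAllText(secretRef.Substring(5)).Trim();

        // Explicit opt-in: a deliberate literal key, so no warning is raised.
        if (secretRef.StartsWith("dev:", StringComparison.Ordinal))
            return secretRef.Substring(4);

        // Back-compat literal arm: still works, but surfaces the cleartext key to the operator.
        warn?.Invoke("API key supplied as a literal string; prefer env:, file:, or dev:.");
        return secretRef;
    }
}
```

For example, `ApiKeyResolverSketch.Resolve("dev:abc", Console.WriteLine)` returns `abc` silently, while `Resolve("abc", Console.WriteLine)` returns the same value and prints the warning.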
Driver.Galaxy-011
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | GalaxyDriver.cs:411 |
| Status | Resolved |
Description: GetMemoryFootprint() unconditionally returns 0 with a comment "PR 4.4 sets this from SubscriptionRegistry size" — PR 4.4 has shipped (the registry exists and is used) but the method was never updated. IHostConnectivityProbe.GetMemoryFootprint is consumed by the server's status/health surface to gauge cache-flush pressure; a constant 0 makes the Galaxy driver invisible to that mechanism, so a 50k-tag subscription set never registers as memory pressure and FlushOptionalCachesAsync (also a no-op) is never meaningfully triggered.
Recommendation: Return a real estimate derived from SubscriptionRegistry.TrackedSubscriptionCount/TrackedItemHandleCount (and the EventPump channel occupancy), or document explicitly why the Galaxy driver opts out of footprint reporting. Remove the stale "PR 4.4 sets this" comment.
Resolution: Resolved 2026-05-22 — replaced the constant 0 with a live estimate derived from SubscriptionRegistry.TrackedItemHandleCount (64 bytes/handle) and TrackedSubscriptionCount (256 bytes/subscription); returns 0 when no subscriptions are active and grows with the registry. The stale "PR 4.4 sets this" comment is removed. Regression coverage in GalaxyDriverInfrastructureTests.
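
The estimate amounts to a two-term sum. The sketch below uses the per-handle and per-subscription weights quoted in the resolution; the method and class names are hypothetical, and the input counts in the example are made up for illustration.

```csharp
using System;

public static class FootprintSketch
{
    private const long BytesPerItemHandle = 64;
    private const long BytesPerSubscription = 256;

    // Returns 0 when nothing is tracked and grows linearly with the registry.
    public static long EstimateBytes(int trackedItemHandles, int trackedSubscriptions) =>
        trackedItemHandles * BytesPerItemHandle + trackedSubscriptions * BytesPerSubscription;

    public static void Main()
    {
        Console.WriteLine(EstimateBytes(0, 0));       // 0
        Console.WriteLine(EstimateBytes(50_000, 10)); // 3202560
    }
}
```

At the 50k-tag scale mentioned in the finding, the handle term dominates: 50,000 × 64 = 3,200,000 bytes, against 2,560 bytes for ten subscriptions.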
Driver.Galaxy-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | Runtime/SubscriptionRegistry.cs:65-67, GalaxyDriver.cs:538, GalaxyDriver.cs:675 |
| Status | Resolved |
Description: Several hot paths do a linear scan per lookup, which compounds to O(n^2) per batch. SubscriptionRegistry.ResolveSubscribers does entry.Bindings.FirstOrDefault(b => b.ItemHandle == itemHandle), a linear scan of the whole binding list for every event dispatch; at 50k tags this is a 50k-element scan per event on the 1Hz fan-out path. GalaxyDriver.SubscribeAsync and ReadViaSubscribeOnceAsync correlate results to references with results.FirstOrDefault(r => string.Equals(...)) inside a for loop over all references, which is O(n^2) over the subscribe batch. SubscriptionRegistry.Remove rebuilds a ConcurrentBag from a LINQ filter on every unsubscribe.
Recommendation: Index SubscriptionEntry bindings by item handle (a Dictionary<int, string> per entry) so ResolveSubscribers is O(1) per subscriber. Project the SubscribeResult list into a Dictionary<string, SubscribeResult> (OrdinalIgnoreCase) once before the correlation loop. These matter on the documented 50k-tag soak path.
Resolution: Resolved 2026-05-23 — three changes: (a) SubscriptionEntry now carries a FullRefByItemHandle Dictionary<int, string> built once at construction; ResolveSubscribers does O(1) lookups per subscriber instead of a FirstOrDefault linear scan of the binding list. (b) Reverse map _subscribersByItemHandle swapped from ConcurrentBag<long> to ImmutableHashSet<long> — Remove/Rebind use set.Remove(id) (O(log n)) instead of "rebuild a new bag from a LINQ filter on every unsubscribe", and reads remain lock-free via atomic publication through ConcurrentDictionary.AddOrUpdate. (c) GalaxyDriver.SubscribeAsync + ReadViaSubscribeOnceAsync now index the SubscribeResult list once via the existing BuildResultIndex helper (already used by ReplayAsync) so per-reference correlation is O(1). Regression coverage in SubscriptionRegistryTests.ResolveSubscribers_LargeBindingSet_DispatchesCorrectly.
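
The correlation change in (c) is the standard "index once, look up many" pattern. The sketch below is illustrative only: `SubscribeResultSketch` and `BuildIndex` are hypothetical names, and the real `BuildResultIndex` helper may have a different signature. It keeps the first result per reference, matching what `FirstOrDefault` returned.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the gateway's subscribe result type.
public sealed record SubscribeResultSketch(string FullReference, int ItemHandle);

public static class CorrelationSketch
{
    // One O(n) pass builds the index; each later lookup is O(1) on average.
    public static Dictionary<string, SubscribeResultSketch> BuildIndex(
        IEnumerable<SubscribeResultSketch> results)
    {
        var index = new Dictionary<string, SubscribeResultSketch>(StringComparer.OrdinalIgnoreCase);
        foreach (var result in results)
            index.TryAdd(result.FullReference, result); // first occurrence wins
        return index;
    }

    public static void Main()
    {
        var index = BuildIndex(new[]
        {
            new SubscribeResultSketch("Tank1.Level", 101),
            new SubscribeResultSketch("Tank1.Temp", 102),
        });

        // Case-insensitive lookup replaces the per-reference FirstOrDefault scan.
        Console.WriteLine(index.TryGetValue("tank1.level", out var hit) ? hit.ItemHandle : -1); // 101
    }
}
```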
Driver.Galaxy-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | GalaxyDriver.cs:14-27, GalaxyDriver.cs:374-382, Config/GalaxyDriverOptions.cs:84-86 |
| Status | Resolved |
Description: Multiple doc comments are stale relative to the shipped code. GalaxyDriver's class summary still describes the file as "the project skeleton with IDriver bodies that wire to a future IGalaxyGatewayClient abstraction. Capability interfaces ... land in PRs 4.1-4.7" and references the legacy GalaxyProxyDriver coexisting "until PR 7.2" — but PR 7.2 already deleted the legacy Galaxy projects and the capability interfaces are all implemented. ReinitializeAsync is still a stub ("for the skeleton we just refresh health") that ignores driverConfigJson entirely — a config reapply silently does nothing. GalaxyReconnectOptions.ReplayOnSessionLost is defined and documented but never read anywhere in the driver (ReplayAsync always replays).
Recommendation: Refresh the GalaxyDriver class and ReinitializeAsync doc comments to describe the shipped state, implement or explicitly reject ReinitializeAsync config reapply, and either honour ReplayOnSessionLost or remove it from GalaxyReconnectOptions.
Resolution: Resolved 2026-05-23 — three fixes: (a) rewrote the GalaxyDriver class summary to describe the shipped capability surface (ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IAlarmSource) and removed the stale "PR 4.0 skeleton" / "legacy GalaxyProxyDriver coexists until PR 7.2" wording — PR 7.2 already retired the legacy projects. (b) ReinitializeAsync now parses the incoming driverConfigJson through the factory pipeline and compares the result to _options; an equivalent reapply refreshes health, a non-equivalent change throws NotSupportedException so a config swap never silently no-ops. (c) ReplayAsync now honours _options.Reconnect.ReplayOnSessionLost — when false it restarts the EventPump but skips the per-tag SubscribeBulk fan-out, delegating to gateway session-level replay. Regression coverage in GalaxyDriverInfrastructureTests (ReinitializeAsync_RejectsNonEquivalentConfigChange, ReinitializeAsync_AcceptsEquivalentConfig, ReplayOnSessionLost_False_SkipsResubscribeBulk, ReplayOnSessionLost_True_RunsResubscribeBulk). Updated GalaxyDriverFactoryTests.ReinitializeAsync_RefreshesHealth_WhenConfigIsEquivalent to use an equivalent config JSON.
Driver.Galaxy-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy (module-wide) |
| Status | Resolved |
Description: The reconnect/recovery path is the module's highest-risk surface and is effectively untested at the integration seam. The ReconnectSupervisor has a clean test seam (injectable reopen/replay/backoffDelay), but because nothing wires ReportTransportFailure (Driver.Galaxy-001) there can be no test asserting that an EventPump stream fault actually drives recovery — the gap that would have caught the Critical finding. Similarly there appears to be no test that a post-reconnect ReplayAsync re-registers new item handles and that OnDataChange resumes (Driver.Galaxy-008). The StatusCodeMap.FromMxStatus Success-flag semantics (Driver.Galaxy-003) and the DataTypeMap Int64 gap (Driver.Galaxy-002) are also the kind of behaviour a focused unit test would pin.
Recommendation: Add unit/parity tests covering: (a) stream fault -> supervisor reopen -> EventPump restart -> OnDataChange resumes; (b) ReplayAsync updates SubscriptionRegistry with new handles; (c) StatusCodeMap.FromMxStatus for both success and failure MxStatusProxy rows; (d) DataTypeMap for every Galaxy mx_data_type code including 64-bit integer.
Resolution: Resolved 2026-05-22 — added GalaxyDriverInfrastructureTests covering GetMemoryFootprint (Driver.Galaxy-011) and IAsyncDisposable (Driver.Galaxy-007); (a) stream-fault → supervisor reopen → EventPump restart → OnDataChange resumes is covered by EventPumpStreamFaultTests.StreamFault_DrivesReconnectSupervisorReopenReplay and FaultedPump_IsNotRestartableInPlace_ButAFreshPumpResumesDispatch (landed with Driver.Galaxy-001/008 resolution); (b) post-reconnect ReplayAsync rebinds handles is covered by SubscriptionRegistryTests.Rebind_* suite; (c) StatusCodeMap.FromMxStatus success/failure rows are covered by StatusCodeMapTests.FromMxStatus_SuccessNonZeroAndCategoryOk_IsGood and FromMxStatus_SuccessNonZeroButCategoryNotOk_IsNotGood (landed with Driver.Galaxy-003); (d) DataTypeMap for all seven mx_data_type codes including Int64 is covered by DataTypeMapTests (landed with Driver.Galaxy-002).
Driver.Galaxy-015
| Field | Value |
|---|---|
| Severity | Low (re-triaged from Medium) |
| Category | Documentation & comments (re-triaged from Security) |
| Location | libs/MxGateway.Client.dll, libs/MxGateway.Contracts.dll, libs/README.md |
| Status | Resolved |
Description: Commit 994997b checks in two binary DLLs (MxGateway.Client.dll, 99 840 bytes; MxGateway.Contracts.dll, 489 984 bytes) under src/Drivers/.../Driver.Galaxy/libs/ and references them via <Reference HintPath="…" />. These are the only checked-in binary build artefacts in the entire repo (a repo-wide find for *.dll files outside bin/ and obj/ returns only these two, under libs/), so the change sets a precedent. The accompanying libs/README.md states the DLLs are "byte-for-byte the build output" of the OtOpcUa team's own code against the gateway's open proto contracts, but there is no recorded provenance: no source-commit SHA from the sibling mxaccessgw repo that produced the build, no SHA-256/SHA-512 checksum, and no .gitattributes rule marking these paths as binary (so future in-place updates will balloon the pack file). Without a recorded source commit and checksum, a future reviewer or auditor cannot verify that the binaries match a specific revision of the sibling repo; the assertion "we built them, not external" is unverifiable after the fact. Tampering or an accidental swap (e.g. someone drops in a different DLL of the same name under the same path) would not be detectable.
Recommendation: (a) Pin the source provenance: add the sibling mxaccessgw commit SHA used to build each DLL to libs/README.md. (b) Record a SHA-256 of each .dll in libs/README.md so a future tamper or accidental update is detectable by running Get-FileHash/sha256sum. (c) Add a .gitattributes rule under libs/ declaring *.dll binary (and consider filter=lfs diff=lfs merge=lfs -text if/when these need to be updated, to avoid bloating the pack file on every refresh). (d) Optional: a dotnet test time-check that compares the on-disk hash to the recorded hash, so a CI run notices if the file drifts from what the README claims.
Resolution: Resolved 2026-05-23. Severity re-triage: the original
finding framed this as a security concern about "tampering or accidental
swap by an unknown third party"; the user clarified that the DLLs are
their own code, built from their own mxaccessgw project — not third-party
binaries. That moves the concern from security (untrusted provenance) to
documentation (audit trail). Re-classified as Low Documentation &
Comments. Fix: libs/README.md now carries a Provenance section that
records the source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
extracted from the AssemblyInformationalVersion baked into both DLLs by
the original build) and SHA-256 checksums of both binaries, plus a
re-verification recipe (sha256sum libs/*.dll + ilspycmd <dll> | grep AssemblyInformationalVersion). Recommendations (c) .gitattributes and
(d) CI hash-check deferred — the DLLs are essentially frozen until one
of the two unwinding paths is taken, so adding LFS or a CI guard would
add infrastructure that the unwinding step would then have to remove.
Re-open if the vendoring becomes a recurring update target.
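The re-verification recipe can be wrapped in a small script. The following is a minimal sketch, not the contents of libs/README.md. The function names verify_sha256 and extract_commit_sha are hypothetical. The expected digest is supplied by the caller, since the real values live in the README's Provenance section. It also assumes the informational version embeds the 40-character commit SHA as plain lowercase hex.

```shell
#!/bin/sh
# Hypothetical helpers around the recipe recorded in libs/README.md.

# verify_sha256 FILE EXPECTED_HEX
# Returns 0 if FILE's SHA-256 equals EXPECTED_HEX, 1 on mismatch,
# 2 if FILE does not exist.
verify_sha256() {
  vs_file=$1
  vs_expected=$2
  if [ ! -f "$vs_file" ]; then
    echo "missing: $vs_file" >&2
    return 2
  fi
  # sha256sum prints "<digest>  <path>"; keep only the digest.
  vs_actual=$(sha256sum "$vs_file" | cut -d' ' -f1)
  if [ "$vs_actual" = "$vs_expected" ]; then
    return 0
  fi
  echo "hash mismatch for $vs_file: got $vs_actual" >&2
  return 1
}

# extract_commit_sha
# Reads text on stdin (for example the AssemblyInformationalVersion line
# from decompiler output) and prints the first 40-hex-digit run found.
# Assumption: the commit SHA appears as lowercase hex in that line.
extract_commit_sha() {
  grep -oE '[0-9a-f]{40}' | head -n 1
}
```

A caller would run `verify_sha256` once per vendored DLL with the digest recorded in the README, and pipe the output of `ilspycmd <dll> | grep AssemblyInformationalVersion` into `extract_commit_sha` to compare the result against dd7ca1634e2d2b8a866c81f0009bf87ee9427750.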
Driver.Galaxy-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:43-47, libs/README.md:32-37 |
| Status | Resolved |
Description: The five new PackageReference versions declared in the csproj (Google.Protobuf 3.34.1, Grpc.Core.Api 2.76.0, Grpc.Net.Client 2.71.0, Microsoft.Extensions.Logging.Abstractions 10.0.0, Polly 8.5.2) do not all match what the vendored MxGateway.Client.dll was built against. The DLL's PE metadata (extracted via System.Reflection.Metadata) shows references to Grpc.Net.Client v2.0.0.0, Microsoft.Extensions.Logging.Abstractions v10.0.0.0, and notably Polly.Core v8.0.0.0, and the source csproj just before the sibling-repo rename (commit bd4a09a from 2026-04-27) declared Grpc.Net.Client 2.76.0, Microsoft.Extensions.Logging.Abstractions 10.0.7, and Polly.Core 8.6.6, not the meta-package Polly. Our driver pulls Polly 8.5.2 (which transitively pins Polly.Core 8.5.2 per its nuspec dependency), so the vendored client actually loads Polly.Core 8.5.2 at runtime even though it was compiled against 8.6.6. Across an 8.5 ↔ 8.6 minor delta this is usually safe (assembly-version is v8.0.0.0 for both), but it is exactly the skew shape that surfaces as MissingMethodException if an 8.6-only API was used in the client. libs/README.md claims "versions match what the sibling repo's ZB.MOM.WW.MxGateway.Contracts.csproj uses so the gRPC + proto runtime stays binary-compatible"; that statement is correct only for Google.Protobuf and Grpc.Core.Api, and the other three packages do not match.
Recommendation: Reconcile the declared package versions with what the vendored DLLs were built against — bump to Grpc.Net.Client 2.76.0, Microsoft.Extensions.Logging.Abstractions 10.0.7, swap Polly for Polly.Core 8.6.6 (the driver does not import the Polly legacy v7 surface, only Polly.Core via the client). Alternatively, rebuild the vendored DLLs against the same versions the csproj declares and refresh the binaries. Update libs/README.md to record the exact versions the DLLs were built against, so the next vendoring refresh has an authoritative reference.
Resolution: Resolved 2026-05-23 — took the first option (reconcile
declared packages with what the DLL was built against, verified by
reflecting Assembly.GetReferencedAssemblies() on MxGateway.Client.dll).
Changes to the csproj: Polly 8.5.2 → Polly.Core 8.6.6 (the most
consequential: the DLL was built against Polly.Core 8.6.6, whereas the
Polly package layers the legacy v7-style fluent API on top and pins
Polly.Core 8.5.2 transitively, so the client would have loaded 8.5.2 at
runtime and could have thrown MissingMethodException the first time the
gateway client's retry pipeline touched an 8.6-only API). Also bumped
Grpc.Net.Client 2.71.0 → 2.76.0 and
Microsoft.Extensions.Logging.Abstractions 10.0.0 → 10.0.7 to match the
sibling Server/Worker projects' current versions. Google.Protobuf
3.34.1 and Grpc.Core.Api 2.76.0 already matched; left unchanged.
libs/README.md rewritten to record what was actually verified
(Assembly.GetReferencedAssemblies() output + the resolved package
versions, including the sibling Server/Worker csproj as the version
source-of-truth — the deleted MxGateway.Client.csproj would have been
the original source but no longer exists). Verification: solution-wide
dotnet build clean, Driver.Galaxy.Tests 245/245 pass against the
corrected package set.
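The reconciled versions lend themselves to a quick drift check against the csproj. The sketch below is illustrative only: declared_version and check_package are hypothetical helper names, and the sed pattern assumes each PackageReference is written on one line with Include before Version.

```shell
#!/bin/sh
# Hypothetical drift check for the PackageReference versions listed above.

# declared_version CSPROJ PACKAGE
# Prints the Version declared for PACKAGE, or nothing if it is absent.
# Assumes the one-line form: <PackageReference Include="X" Version="Y" />
declared_version() {
  dv_csproj=$1
  # Escape dots so a name like Polly.Core is matched literally.
  dv_pkg_re=$(printf '%s' "$2" | sed 's/\./\\./g')
  sed -n "s/.*<PackageReference[[:space:]][[:space:]]*Include=\"${dv_pkg_re}\"[[:space:]][[:space:]]*Version=\"\([^\"]*\)\".*/\1/p" "$dv_csproj"
}

# check_package CSPROJ PACKAGE EXPECTED_VERSION
# Returns 0 when the declared version equals EXPECTED_VERSION;
# otherwise reports the mismatch on stderr and returns 1.
check_package() {
  cp_actual=$(declared_version "$1" "$2")
  if [ "$cp_actual" = "$3" ]; then
    return 0
  fi
  echo "MISMATCH $2: declared '${cp_actual:-<none>}', expected '$3'" >&2
  return 1
}
```

Under those assumptions, `check_package ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj Polly.Core 8.6.6` would succeed after this change, and the same call could be repeated for Grpc.Net.Client 2.76.0 and Microsoft.Extensions.Logging.Abstractions 10.0.7.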
Driver.Galaxy-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/ (no source change), gateway proto contract |
| Status | Deferred |
Description: The vendored MxGateway.Contracts.dll only carries the OLD MxGateway.Contracts.Proto[.Galaxy] namespace (PE-namespace dump confirms — MxGateway.Client, MxGateway.Contracts, MxGateway.Contracts.Proto, MxGateway.Contracts.Proto.Galaxy only). The sibling mxaccessgw repo's live Protos/mxaccess_gateway.proto, mxaccess_worker.proto, and galaxy_repository.proto files now generate into ZB.MOM.WW.MxGateway.Contracts.Proto.*. The proto wire format itself can still evolve (new RPCs, renamed fields, removed fields) and the driver has no contract-version handshake (a repo-wide search for ContractVersion|ProtocolVersion|ApiVersion|WireVersion in the driver returns nothing) — so a gateway service that evolves its proto past what the vendored client knows will fail silently at runtime: gRPC UNIMPLEMENTED for a renamed RPC, default-value reads for a removed scalar field, or worse, a wire-tag collision if a field number is reused. The risk surface grew with vendoring: previously the ProjectReference would have hard-failed at build time if the proto changed shape; now the driver builds green against a frozen contract that may not match the running gateway.
Recommendation: (a) Add a single Ping/GetVersion RPC call at gateway-session open, comparing the gateway's reported contract version against a string baked into libs/README.md (or a GatewayContractVersion const) and refusing the session on mismatch with a clear log. (b) Document in libs/README.md the exact mxaccessgw commit SHA (and proto-file SHA-256s) the vendored DLLs were built from, so a parity-rig operator can grep the live gateway for the matching commit. (c) Add a soak/parity test that asserts the live gateway's proto descriptor still matches what the vendored DLL expects — fail loud rather than degrade.
Resolution: Deferred 2026-05-23 — the recommendation's part (b)
(record the mxaccessgw source-commit SHA in libs/README.md) is satisfied
by the Driver.Galaxy-015 resolution, which records both DLLs were built
from mxaccessgw commit dd7ca1634e2d2b8a866c81f0009bf87ee9427750. Parts
(a) and (c) — adding a GetVersion RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial new
RPC + plumbing work that is not in scope for this code-review-resolution
sweep. The risk surface is bounded because either of the two unwinding
paths in libs/README.md (sibling repo restores MxGateway.Client.csproj,
or this driver migrates to the new namespace) will move the codebase
past the vendoring + close this concern naturally. Re-open if neither
unwinding path is taken within the next quarter and the live gateway
service does evolve its proto under the driver.
Driver.Galaxy-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | libs/README.md:32-37, ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj:40-47 |
| Status | Resolved |
Description: Several small documentation issues in the vendoring artefacts:
- libs/README.md says "Versions match what the sibling repo's ZB.MOM.WW.MxGateway.Contracts.csproj uses", but ZB.MOM.WW.MxGateway.Contracts.csproj only declares Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0; the other three packages (Grpc.Net.Client, Microsoft.Extensions.Logging.Abstractions, Polly) come from the (now-deleted) MxGateway.Client.csproj, not the contracts csproj. The README points at the wrong source-of-truth file. See Driver.Galaxy-016 for the related version-skew issue.
- libs/README.md says the DLLs "are built against net10.0". That is accurate, but the README should also pin the source-commit SHA from mxaccessgw that produced the build (currently no such reference). Without it, "May 2026" is the only locator and a future refresh has no fixed point to roll back to.
- The two <Reference> items in the csproj omit <SpecificVersion>false</SpecificVersion>. The vendored DLLs carry AssemblyVersion 1.0.0.0; MSBuild's default for <Reference HintPath> items is SpecificVersion=true only when the Include attribute contains version info, which it does not here, so this is benign. Spelling it out (<SpecificVersion>false</SpecificVersion>) would nonetheless make a future refresh that bumps the AssemblyVersion robust without csproj edits.
- The csproj <Reference Include="MxGateway.Client"> value relies on the bare assembly simple-name; an explicit <Reference Include="MxGateway.Client, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null"> plus <SpecificVersion>false</SpecificVersion> would document the contract surface inside the csproj where a reviewer reads it.
Recommendation: (a) Update libs/README.md to (i) point at MxGateway.Client.csproj for the Grpc.Net.Client/Microsoft.Extensions.Logging.Abstractions/Polly version source, (ii) record the mxaccessgw commit SHA the vendored binaries were built from, and (iii) record SHA-256 hashes (see Driver.Galaxy-015). (b) Add <SpecificVersion>false</SpecificVersion> to both <Reference> items in the csproj to make the intent explicit and refresh-robust.
Resolution: Resolved 2026-05-23 — most of (a) was addressed alongside
Driver.Galaxy-015 + -016: libs/README.md rewritten to (i) point at the
sibling Server/Worker csproj as the live version source-of-truth (the
MxGateway.Client.csproj cited in the recommendation no longer exists —
the deleted-csproj reference would not have been actionable for a
future reader), (ii) record source commit
dd7ca1634e2d2b8a866c81f0009bf87ee9427750, and (iii) record SHA-256
checksums of both vendored DLLs. (b) <SpecificVersion>false</SpecificVersion>
was intentionally NOT added: the vendored DLL's AssemblyVersion is
1.0.0.0 and MSBuild already defaults to SpecificVersion=false for
<Reference> items whose Include is a bare simple-name, so spelling it
out would be cosmetic without changing behaviour. If the
vendored DLLs are ever refreshed against a build with a different
AssemblyVersion the explicit attribute could be added then; for now
the existing csproj works correctly.