Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).
Driver.Galaxy-016 (Medium, Perf/Resource):
Reconciled the csproj PackageReferences with what the vendored
MxGateway.Client.dll was actually built against, verified by
reflecting Assembly.GetReferencedAssemblies() on the DLL:
- Polly 8.5.2 → Polly.Core 8.6.6
(most consequential — Polly v7 fluent API vs Polly.Core v8
resilience-pipeline API are DIFFERENT packages; the DLL was
built against Polly.Core so the prior Polly reference would
have failed at runtime with MissingMethodException the first
time the gateway client's retry pipeline ran)
- Grpc.Net.Client 2.71.0 → 2.76.0 (matches sibling Server/Worker)
- Microsoft.Extensions.Logging.Abstractions 10.0.0 → 10.0.7
Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
left unchanged.
Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
Original framing was a security concern about unknown-provenance
binaries. User clarified the DLLs are their own code, built from
their own mxaccessgw project, not third-party. Re-triaged to a
documentation / audit-trail concern. Fix:
- Added a Provenance section to libs/README.md recording the
source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
extracted from the AssemblyInformationalVersion baked into
both DLLs by the original build) and SHA-256 checksums.
- Documented the re-verification recipe (sha256sum + ilspycmd
| grep AssemblyInformationalVersion).
Recommendations about .gitattributes and CI hash-check deferred —
the DLLs are frozen until an unwinding path is taken, so adding
LFS or CI infrastructure now would need removal at unwinding.
Driver.Galaxy-018 (Low, Documentation):
Most of the recommendation folded into the libs/README.md rewrite
(pointed at sibling Server/Worker csproj as the live version source
rather than the deleted MxGateway.Client.csproj; recorded source
commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
<Reference> items intentionally not added — MSBuild's default for
HintPath references with bare-name Include attributes is already
SpecificVersion=false, so explicitly setting it would be cosmetic
without changing behaviour.
Driver.Galaxy-017 (Low, Design) — Deferred:
Recommendation part (b) (record mxaccessgw source-commit SHA in
libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
Parts (a) and (c) — a GetVersion RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial
new RPC + plumbing work not in scope for this code-review sweep.
The risk surface is bounded because either of the libs/README.md
unwinding paths closes the vendoring + this concern naturally.
Re-open if neither path is taken within the next quarter and the
live gateway evolves its proto under the driver.
Verification:
- Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
- Driver.Galaxy.Tests: 245/245 pass against the corrected
package set.
- Solution-wide build remains clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-reviewed the four modules with source changes since the previous review
commit 76d35d1, per REVIEW-PROCESS.md section 6. Updated each findings.md
header (date 2026-05-23, commit a9be809) and appended new findings under
continued numbering. Regenerated README.md.
## New findings — 12 total across 4 modules
### Core.Scripting (5 new, IDs -012 to -016)
- **-012 High Security** — broadened BCL references (System.* + netstandard)
re-expose System.Threading.ThreadPool / Timer / AssemblyLoadContext, which
the analyzer's deny-list doesn't cover. Re-introduces the background-work
threat Core.Scripting-003 closed via System.Threading.Tasks deny.
- **-013 Medium Security** — hand-rolled wrapper-source generation lets
brace-balanced user source inject sibling methods/classes alongside
CompiledScript.Run. Analyzer still gates forbidden types, but the
documented 'method body' authoring contract is silently relaxed.
- **-014 Medium Concurrency** — CompiledScriptCache.Clear() uses key-only
TryRemove(key, out _) — the same race the -006 resolution fixed in
GetOrCompile's catch is latent here on publish-replace.
- **-015 Low Correctness** — ToCSharpTypeName truncates at first backtick;
silently drops closed type arguments of nested-generic shapes (Outer<>.Inner<>).
Latent — no production caller uses this shape today.
- **-016 Medium Performance** — VirtualTagEngine + ScriptedAlarmEngine call
ScriptEvaluator.Compile directly without going through CompiledScriptCache,
so the headline -008 collectible-ALC fix doesn't run on the actual
production path — the per-publish leak is still in effect.
### Core.ScriptedAlarms (1 new, ID -013)
- **-013 Low Documentation** — new internal test accessors return the live
mutable scratch dictionary; XML docs don't warn future test authors about
the synchronisation contract.
### Driver.Cli.Common (2 new, IDs -007, -008)
- **-007 High Correctness** — 0x80550000 was added as BadDeviceFailure but
the real OPC UA spec value for BadDeviceFailure is 0x808B0000 (verified
against Driver.Galaxy.Runtime.StatusCodeMap and HistorianQualityMapper,
both of which use the correct 0x808B0000). 0x80550000 is actually
BadSecurityPolicyRejected. The native mappers (FOCAS / AbCip / AbLegacy)
all use the wrong 0x80550000; this session's SnapshotFormatter extension
propagated the wrong name and the test asserts against the same wrong
value so CI is blind — same shape of bug as Driver.Cli.Common-001.
- **-008 Low Testing** — new FormatStatus_names_native_driver_emitted_codes
Theory is redundant with the existing well-known Theory (same five
InlineData rows added to both) and uses weaker ShouldContain assertion
than the well-known Theory's ShouldBe.
### Driver.Galaxy (4 new, IDs -015 to -018)
- **-015 Medium Security** — vendored DLLs (libs/) have no recorded
provenance: no source-commit SHA from the mxaccessgw repo, no SHA-256
checksum in libs/README.md. Tampering / accidental swap undetectable.
- **-016 Medium Performance** — version skew between declared
PackageReferences (Polly 8.5.2 / Grpc.Net.Client 2.71.0 /
Microsoft.Extensions.Logging.Abstractions 10.0.0) and what the vendored
DLL was actually built against (Polly.Core 8.6.6 / Grpc.Net.Client
2.76.0 / Microsoft.Extensions.Logging.Abstractions 10.0.7). Latent now
(assembly-version refs are loose) but precise shape that produces a
runtime MissingMethodException.
- **-017 Low Design** — no contract-version handshake between the driver
and the gateway; proto could evolve under the gateway without the
driver noticing.
- **-018 Low Documentation** — libs/README.md points at the wrong sibling
csproj as the version source-of-truth; missing SpecificVersion=false
on the Reference items; missing mxaccessgw source-commit SHA.
## Particularly notable
Two findings undercut commits from this session:
- Driver.Cli.Common-007 invalidates commit 5a9c459 (which named 0x80550000
as BadDeviceFailure across the cross-CLI shortlist).
- Core.Scripting-016 invalidates the production effect of commit 7b6ab2e
(the collectible-ALC fix wired Dispose only via CompiledScriptCache,
which the engines don't use).
The wider native-mapper miscoding behind -007 also affects three driver
modules outside this session's edit scope (FocasStatusMapper,
AbCipStatusMapper, AbLegacyStatusMapper all carry the wrong code).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Driver.Galaxy-005: rewrite the EventPump BoundedChannelOptions comment
to honestly describe the Wait+TryWrite pattern.
- Driver.Galaxy-010: ResolveApiKey now warns when a literal API key is
used in production wiring; added an explicit dev: prefix for known
cleartext-in-dev cases and rewrote the GalaxyGatewayOptions doc.
- Driver.Galaxy-012: O(1) reverse-lookup for SubscriptionRegistry
dispatch via per-entry FullRefByItemHandle map; immutable hash-set for
the cross-binding reverse map; SubscribeAsync / ReadViaSubscribeOnce
use BuildResultIndex for per-reference correlation.
- Driver.Galaxy-013: ReinitializeAsync now validates the incoming JSON
against the running options; ReplayOnSessionLost honoured by the
Replay path; class summary rewritten to describe the shipped surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add GalaxyDriverInfrastructureTests covering the two gaps identified in this finding
that are not yet tracked by a dedicated test file: GetMemoryFootprint returns a live
registry-derived estimate (Driver.Galaxy-011) and DisposeAsync completes without
deadlock (Driver.Galaxy-007). The remaining items listed in the finding are covered
by earlier resolution commits: stream-fault → recovery → OnDataChange resumes
(EventPumpStreamFaultTests), post-reconnect Rebind (SubscriptionRegistryTests),
StatusCodeMap.FromMxStatus success/failure semantics (StatusCodeMapTests), and
DataTypeMap all seven codes (DataTypeMapTests). Update findings.md header to 4 open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GetMemoryFootprint() returned a constant 0 with a stale "PR 4.4 sets this" comment
even though PR 4.4 shipped the SubscriptionRegistry. Replace with a live estimate:
64 bytes × TrackedItemHandleCount + 256 bytes × TrackedSubscriptionCount. A 50k-tag
set now registers ~3 MB with the server's cache-flush heuristic instead of being
invisible. Returns 0 when no subscriptions are active.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix two resource-management bugs in StartDeployWatcher / BuildDefaultHierarchySource:
(a) Replace the discarded `_ = StartAsync(...)` with an explicit task variable that
surfaces any synchronous InvalidOperationException (called-twice guard) rather than
silently swallowing it.
(b) Change both StartDeployWatcher and BuildDefaultHierarchySource to use ??= on
_ownedRepositoryClient so the first client created (by whichever path runs first)
is reused by the second path, preventing a second GalaxyRepositoryClient from being
created and the first from leaking past the driver's lifetime.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implement IAsyncDisposable on GalaxyDriver so async sub-component disposals
(EventPump, AlarmFeed, MxSession, MxClient, RepositoryClient) are awaited rather
than blocked on GetAwaiter().GetResult(). DisposeAsync is now the primary path;
Dispose() delegates to it for using-statement compatibility. Each async component's
shutdown is awaited individually with a best-effort catch so a single slow shutdown
cannot prevent the rest of the cleanup sequence from running.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HashSet<T>.First() enumeration order is unspecified and unstable across mutations, so
the "owner" handle attached to alarm events was non-deterministic when multiple alarm
subscriptions were active. Change _alarmSubscriptions from HashSet to List (preserving
insertion order) and pick [0] — the earliest-registered handle — as the deterministic
owner. The server routes transitions by SourceNodeId, not by handle, so the choice of
handle does not affect delivery to active subscribers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add StatusCodeMap.ToQualityCategoryByte(uint) so the StatusCode → quality-byte
mapping lives in one place next to its inverse (FromQualityByte). GalaxyDriver
OnPumpDataChange now delegates to the helper instead of duplicating the shift+switch
inline; a future edit to the OPC UA bit layout cannot silently desync the probe-health
decode. Unit tests in StatusCodeMapTests pin all three category buckets and the
round-trip invariant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
StatusCodeMap.FromMxStatus checked `success != 0` to determine success, but the
mxaccessgw proto contract explicitly documents that `success` is not a boolean and
that clients must branch on `category` (MX_STATUS_CATEGORY_OK), not on `success`
alone. Replace the raw field check with `status.IsSuccess()` from
MxStatusProxyExtensions, which requires both `success != 0` AND `category == Ok`.
A worker reporting success=1 with a non-OK category was previously misreported as
Good. Updated StatusCodeMapTests with a regression case covering the inverted scenario.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.Galaxy-002 — DataTypeMap.Map had no Int64 arm though MxValueDecoder/
MxValueEncoder both fully support Int64. Galaxy attributes with the Int64
mx_data_type code fell through to the String default, creating a String
address-space node while runtime reads decoded a boxed long. Added
`6 => DriverDataType.Int64`, extending the contiguous 0..5 scheme so the type
map agrees with the decoder/encoder on all seven Galaxy data types.
Driver.Galaxy-008 — after a stream fault the EventPump's StreamEvents consumer
loop exited and its channel completed; EventPump.Start() is a no-op on a
completed-but-non-null loop, so a replayed subscription had no consumer and
ReplayAsync never re-registered the post-reconnect item handles. ReplayAsync
now recreates the EventPump (RestartEventPumpForReplay) and rebinds the
SubscriptionRegistry per subscription with the fresh item handles returned by
the post-reconnect SubscribeBulkAsync, via new SubscriptionRegistry.SnapshotEntries
and Rebind APIs.
Regression tests: DataTypeMapTests (every code incl. Int64), SubscriptionRegistry
Tests (Rebind/SnapshotEntries), EventPumpStreamFaultTests (faulted pump dead,
fresh pump resumes dispatch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.
EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.
This is the minimal fix for -001; the pump-restart-on-reopen gap remains
tracked as Driver.Galaxy-008. Regression tests cover the callback being
invoked on fault, the end-to-end supervisor reopen/replay, and that a
clean shutdown does not fire it. Driver.Galaxy suite: 206/206 pass.
Resolves code-review finding Driver.Galaxy-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewed all 31 src/ production projects against the 10-category
checklist in REVIEW-PROCESS.md. Each module gets its own findings.md;
code-reviews/README.md is regenerated from them.
334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.
Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead
on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView)
plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts;
Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt
payload misapplies outcomes — silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so
a transient gateway drop permanently kills the event stream.
All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md
section 4. Review only — no source code changed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>