Re-review at 7286d320. -006 (Low): FlushLogging() in all command finally blocks + tests.
-007: rewrite the inaccurate handler-detach comment (cleanup is via await using disposal).
Re-review at 7286d320. -008 (Low): all four commands now FlushLogging() in finally (parity
with AbCip.Cli; subscribe could drop shutdown log lines) + IL-inspection test.
Re-review at 7286d320. -009: 'four'->'six' driver-CLI count in Program.cs. -010: ReadCommand
--type help no longer lists Structure (rejected at runtime) + pinning test.
Re-review at 7286d320. -012 (Medium): DisconnectAsync now snapshots+nulls the data/alarm
subscriptions under _subscriptionLock before async teardown (was racing RunFailoverAsync).
-013: SubscribeAlarmsAsync guarded by a semaphore (idempotent under concurrency). -014/-015:
forward CancellationToken through Delete/BrowseNext adapters. + TDD.
Re-review at 7286d320. -011: ack/confirm/enable/disable/shelve now pre-validate --node and
surface a clean CommandException (was a raw FormatException) + tests. -012: refresh stale
test count in docs/Client.CLI.md.
Re-review at 7286d320. -011: FrameWriter folded the sync WriteByte (could block on SslStream
past the call timeout) into one async 5-byte header write. -012: DefaultTcpConnectFactory
readonly. -013: wire-parity test for PerEventStatus [Key(4)]. No wire change.
Re-review at 7286d320. -014 (Medium): ReadAtTimeAsync didn't classify StartQuery failures,
so a connection-class failure left a dead connection, re-failed every timestamp, and returned
Success=true with all-Bad (no failover); now resets+fails over via a shared classifier + tests.
-015: refresh stale named-pipe comments to TCP (no wire change). -013 (silent cap truncation,
ties OpcUaServer-002/Core.Abstractions-009) deferred cross-module. NOTE: the SDK-touching tests
are net48 + native aahClientManaged and run only on Windows; macOS verifies build + the SDK-free
subset only.
Re-review at 7286d320. -013 (Medium, testing): the managed FOCAS/2 wire-decode layer
(BuildPdu/ParseResponseBlocks, incl. cnc_getfigure stride) had zero byte-level tests; added
15 (no decode bug found). -014 (spindle-load truncation heuristic) deferred bench-gated.
Note: runtime read path is now pure-managed TCP (no P/Invoke except the probe handshake).
Re-review at 7286d320. -017 (Medium): TwinCATTagDto lacked ArrayLength, so JSON-authored
pre-declared array tags were silently scalar (Phase-4c array path dead for them). Fix:
add ArrayLength to the DTO + thread through BuildTag with positive-value guard + TDD.
Re-review at 7286d320. -014 (Medium): Bit EncodeValue (no bitIndex) wrote SetInt8 while
DecodeValue read GetInt16 on a 16-bit B-file element, so a false write could round-trip
as true (stale high byte). Fix: SetInt16 + TDD. -015: tests pass CancellationToken.
Re-review at 7286d320. -016: BrowseRecursiveAsync now releases the server-side continuation
point on OperationCanceledException (BrowseNext releaseContinuationPoints:true) before
rethrowing (resolves the Browser-002 cross-cutting leak) + TDD.
Re-review at 7286d320. Modbus-013 (Low): bit RMW now routes the FC03 read through the
validated ReadRegisterBlockAsync (was raw-indexing readResp -> IndexOutOfRange on a truncated
PDU). Modbus-014 (Low): WriteAsync maps InvalidDataException to BadCommunicationError (was
BadInternalError), matching ReadAsync. + TDD.
Re-review at 7286d320. S7-015 (Medium): a Writable array tag had no WriteArrayAsync path
and silently returned BadCommunicationError on write; now rejected at init with a clear
NotSupportedException (read-only arrays still accepted) + TDD. S7-016 (factory JSON can't
produce array tags; needs AdminUI DTO) deferred.
Re-review at 7286d320. AbCip-016 (Medium): two cooperating defects made a declared array
member (e.g. REAL[4]) read one scalar/null — fan-out dropped ElementCount/IsArray, and
UdtMemberLayout.TryBuild ignored array members (mis-placing later members). Fix: thread
array shape through fan-out + opt whole-UDT grouping out when any member is an array + TDD.
AbCip-017 (severity-read StatusCode, Low) deferred.
Re-review at 7286d320. -012 (Medium): OperationCanceledException left _drainState stuck
at Draining on the status surface; now resets to BackingOff + test. -013: _disposed ->
volatile (mirrors _backoffIndex). -014 (post-dispose status guards) deferred cross-module.
Re-review at 7286d320. -014 (Medium): AreInputsReady gated on value!=null, so a script
returning null (Good quality) permanently blocked change-triggered dependents at
BadWaitingForInitialData; now gates on the StatusCode Good bit only + test. -015:
TimerTriggerScheduler.Start throws on double-call. -016: fix wrong status-code comment.
Re-review at 7286d320. -015: dispose shelving timer at top of LoadAsync so a failed
reload doesn't leave it firing against partially-cleared state + test. -014: make
pendingEmissions required (removes unreachable fire-under-gate branch that could
reintroduce the -003 deadlock).
First review at 7286d320. Five Low doc fixes (BadNodeIdUnknown comment parity, three stale
Phase7 labels -> design-doc cites, {{equip}} token doc on GetTag/SetVirtualTag). Deadband
NaN/negative-tolerance (004) + a stale docs path (007) left Open.
Re-review at 7286d320. Core.Scripting-017 (Medium, Security): System.Runtime.CompilerServices.Unsafe
added to ForbiddenFullTypeNames (Unsafe.As bypasses the type system without an unsafe context;
CWE-843 type-confusion into SetVirtualTag) + regression tests (rejects Unsafe.As, still allows
benign CompilerServices attributes). -018: refresh stale rejection message. Sandbox holds.
Re-review at 7286d320. Core.Abstractions-009: ReadEventsAsync maxEvents<=0 sentinel now
documents the implementer's continuation-point obligation when a backend cap truncates
(the root of OpcUaServer-002). -010: PollGroupEngineTests pass CancellationToken. Plus
EquipmentTagRefResolver.TryResolve [MaybeNullWhen(false)] NRT cleanup + test.
Review at HEAD 7286d320. Driver.Galaxy.Browser-001 (High): MapSecurityClass codes 2-6 were
all shifted vs the runtime SecurityClassification enum (wrong security labels in the picker)
-> corrected all 7 arms + tests. -002: DisposeAsync swallows concurrent ObjectDisposedException.
-003 (ResolveApiKey dup) deferred to Contracts.
Review at HEAD 7286d320. ControlPlane-001 (Medium): ConfigPublishCoordinator.HandleAck
now discards acks from nodes not in _expectedAcks (prevented premature SealDeployment) +
regression test. -002 (flipped-node log count), -003 (redundant mapper arms) tidied.
Code review at HEAD 7286d320. Host-001 (High): /metrics was auth-gated on admin
nodes (Prometheus 401) -> AllowAnonymous. Host-002: register LdapOptionsValidator
unconditionally for fail-fast startup validation on admin-only nodes. Host-004: fix
metrics XML doc. Host-003 (docs) left Open.
Code review at HEAD 7286d320. Security-001 (High): guard returnUrl with a local-URL
check before redirect (open-redirect/phishing vector) + regression test. Security-002:
update stale LdapOptions dev-LDAP doc reference.
Code review at HEAD 7286d320. Cluster-001 (SeedFromCurrentState reads from one
snapshot), Cluster-003 (HoconLoader double-dispose), Cluster-004 (stale akka.conf
header), Cluster-005 (ServiceLevelCalculator tests added to Cluster.Tests). Cluster-002
deferred (no production caller).
Single commit covering the four small/medium fixes from the updated
code review.
Core.Scripting-014 (Medium, Concurrency):
CompiledScriptCache.Clear() used the key-only TryRemove(key, out var
lazy) overload — same race shape Core.Scripting-006 closed in
GetOrCompile's catch block. A concurrent re-add between snapshot and
TryRemove was evicted + disposed while the new caller still held it.
Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload.
Regression test
Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives
added.
Core.Scripting-013 (Medium, Security):
Hand-rolled BuildWrapperSource pastes user source between literal
braces; brace-balanced source could inject sibling methods/classes
alongside CompiledScript.Run. Analyzer still walked the injected
members so it wasn't a direct escape, but it relaxed the documented
'method body' authoring contract. Added EnforceSingleRunMember:
after ParseText, the compilation unit must hold exactly one type
(CompiledScript) and that type must hold exactly one member (the Run
method). Any deviation throws CompilationErrorException with LMX001/
LMX002 diagnostic IDs and a Core.Scripting-013 reference in the
message. Two regression tests added covering the sibling-method and
sibling-class injection vectors.
Core.Scripting-015 (Low, Correctness, latent):
ToCSharpTypeName's generic branch truncated at the first backtick via
IndexOf, silently dropping closed args of nested-generic shapes
(Outer<T>.Inner<U>). No production caller exercises this shape today
(all TContext/TResult are top-level non-nested), so the bug was
latent. Rewrote the generic branch to walk the FullName segment-by-
segment, consuming generic args per segment so nested shapes emit
valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken
Outer<T,U>).
Core.ScriptedAlarms-013 (Low, Documentation):
The internal test accessors TryGetScratchReadCacheForTest /
TryGetScratchContextForTest return live mutable scratch refilled in
place under _evalGate. XML docs didn't warn future test authors about
the synchronization contract. Added a <remarks> block to each
documenting the only-safe-on-quiesced-engine + identity-or-single-key
contract.
Verification (suites green):
Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests)
Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix)
Core.VirtualTags.Tests: 57/57 (unchanged)
After this commit, all 12 findings from the updated re-review are
closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).
Driver.Galaxy-016 (Medium, Perf/Resource):
Reconciled the csproj PackageReferences with what the vendored
MxGateway.Client.dll was actually built against, verified by
reflecting Assembly.GetReferencedAssemblies() on the DLL:
- Polly 8.5.2 → Polly.Core 8.6.6
(most consequential — Polly v7 fluent API vs Polly.Core v8
resilience-pipeline API are DIFFERENT packages; the DLL was
built against Polly.Core so the prior Polly reference would
have failed at runtime with MissingMethodException the first
time the gateway client's retry pipeline ran)
- Grpc.Net.Client 2.71.0 → 2.76.0 (matches sibling Server/Worker)
- Microsoft.Extensions.Logging.Abstractions 10.0.0 → 10.0.7
Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
left unchanged.
Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
Original framing was a security concern about unknown-provenance
binaries. User clarified the DLLs are their own code, built from
their own mxaccessgw project, not third-party. Re-triaged to a
documentation / audit-trail concern. Fix:
- Added a Provenance section to libs/README.md recording the
source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
extracted from the AssemblyInformationalVersion baked into
both DLLs by the original build) and SHA-256 checksums.
- Documented the re-verification recipe (sha256sum + ilspycmd
| grep AssemblyInformationalVersion).
Recommendations about .gitattributes and CI hash-check deferred —
the DLLs are frozen until an unwinding path is taken, so adding
LFS or CI infrastructure now would need removal at unwinding.
Driver.Galaxy-018 (Low, Documentation):
Most of the recommendation folded into the libs/README.md rewrite
(pointed at sibling Server/Worker csproj as the live version source
rather than the deleted MxGateway.Client.csproj; recorded source
commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
<Reference> items intentionally not added — MSBuild's default for
HintPath references with bare-name Include attributes is already
SpecificVersion=false, so explicitly setting it would be cosmetic
without changing behaviour.
Driver.Galaxy-017 (Low, Design) — Deferred:
Recommendation part (b) (record mxaccessgw source-commit SHA in
libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
Parts (a) and (c) — a GetVersion RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial
new RPC + plumbing work not in scope for this code-review sweep.
The risk surface is bounded because either of the libs/README.md
unwinding paths closes the vendoring + this concern naturally.
Re-open if neither path is taken within the next quarter and the
live gateway evolves its proto under the driver.
Verification:
- Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
- Driver.Galaxy.Tests: 245/245 pass against the corrected
package set.
- Solution-wide build remains clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.Scripting-012 (High, Security) resolution.
The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:
- System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
background-fanout threat Core.Scripting-003 closed against
System.Threading.Tasks.
- System.Threading.Timer — schedules unbounded callback work that
outlives the per-evaluation timeout.
- System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
Defense-in-depth gap; invocation needs reflection (already denied)
but the load itself was reachable.
Fix:
- Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
(preferred over type-granular per the recommendation so future BCL
additions to that namespace are denied by default).
- Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
to ForbiddenFullTypeNames — both live in System.Threading shared
with allowed primitives so they must be type-granular.
Regression tests added to ScriptSandboxTests:
Rejects_ThreadPool_QueueUserWorkItem_at_compile
Rejects_Timer_new_at_compile
Rejects_AssemblyLoadContext_at_compile
Docs:
docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
and the Sandbox-escape compliance-check row both updated to enumerate
the new entries per the Core.Scripting-009 doc-sync convention.
Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.
Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync were calling
ScriptEvaluator.Compile directly, bypassing CompiledScriptCache. The
Core.Scripting-008 collectible-ALC fix wired Dispose only through the cache's
Clear()/Dispose(), so the per-publish accretion the -008 fix was meant to
eliminate was still in effect on the actual production path — the headline
'no more restarts needed' guarantee wasn't delivered.
Resolution:
- VirtualTagEngine + ScriptedAlarmEngine each gained a private
CompiledScriptCache<TContext, TResult> instance.
- Both Load methods now call _compileCache.GetOrCompile(source).
- Publish-replace path: _compileCache.Clear() runs alongside the existing
_tags / _alarms clears so the prior generation's ALCs are disposed
before recompile.
- Engine Dispose now calls _compileCache.Dispose() so shutdown actually
releases the emitted assemblies.
Side-fix in CompiledScriptCache: Dispose() set _disposed=true then called
Clear(), but Clear() had a pre-existing 'if (_disposed) return' guard that
aborted the drain unconditionally — making the Dispose-triggered cleanup a
silent no-op. Removed the disposed-guard on Clear() (clearing an empty/
cleared cache is idempotent).
Side-fix in ScriptedAlarmEngine.Dispose: cleared _alarms AFTER the
Task.WhenAll drain. The drain guarantees no background callback is mid-
flight, so clearing is safe. Previously _alarms was deliberately NOT
cleared on Dispose (per Core.ScriptedAlarms-005), but that left the
AlarmState records holding TimedScriptEvaluator → ScriptEvaluator → delegate
references that rooted the emitted assemblies, defeating the cache's
Dispose work on the engine side.
Regression tests:
- VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly
- ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly
Both use WeakReference + bounded GC.Collect() to prove the emitted
assembly is reclaimable after engine.Dispose(). The alarms test had to
be synchronous (not 'async Task<WeakReference>') because async state
machines capture locals as state-struct fields, keeping them alive past
the method's apparent end and defeating GC.
Verification:
- Core.Scripting.Tests: 104/104 (unchanged).
- VirtualTags.Tests: 57/57 (was 56 — +1 unload test).
- ScriptedAlarms.Tests: 67/67 (was 66 — +1 unload test).
- All other consumer suites still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>