Re-review at 7286d320. -006 (Low): FlushLogging() in all command finally blocks + tests.
-007: rewrite the inaccurate handler-detach comment (cleanup is via await using disposal).
Re-review at 7286d320. -008 (Low): all four commands now FlushLogging() in finally (parity
with AbCip.Cli; subscribe could drop shutdown log lines) + IL-inspection test.
Re-review at 7286d320. -009: 'four'->'six' driver-CLI count in Program.cs. -010: ReadCommand
--type help no longer lists Structure (rejected at runtime) + pinning test.
Re-review at 7286d320. -012 (Medium): DisconnectAsync now snapshots+nulls the data/alarm
subscriptions under _subscriptionLock before async teardown (was racing RunFailoverAsync).
-013: SubscribeAlarmsAsync guarded by a semaphore (idempotent under concurrency). -014/-015:
forward CancellationToken through Delete/BrowseNext adapters. + TDD.
Re-review at 7286d320. -011: ack/confirm/enable/disable/shelve now pre-validate --node and
surface a clean CommandException (was a raw FormatException) + tests. -012: refresh stale
test count in docs/Client.CLI.md.
Re-review at 7286d320. -011: FrameWriter folded the sync WriteByte (could block on SslStream
past the call timeout) into one async 5-byte header write. -012: DefaultTcpConnectFactory
readonly. -013: wire-parity test for PerEventStatus [Key(4)]. No wire change.
Re-review at 7286d320. -014 (Medium): ReadAtTimeAsync didn't classify StartQuery failures,
so a connection-class failure left a dead connection, re-failed every timestamp, and returned
Success=true with all-Bad (no failover); now resets+fails over via a shared classifier + tests.
-015: refresh stale named-pipe comments to TCP (no wire change). -013 (silent cap truncation,
ties OpcUaServer-002/Core.Abstractions-009) deferred cross-module. NOTE: the SDK-touching tests
are net48 + native aahClientManaged and run only on Windows; macOS verifies build + the SDK-free
subset only.
Re-review at 7286d320. -013 (Medium, testing): the managed FOCAS/2 wire-decode layer
(BuildPdu/ParseResponseBlocks, incl. cnc_getfigure stride) had zero byte-level tests; added
15 (no decode bug found). -014 (spindle-load truncation heuristic) deferred bench-gated.
Note: runtime read path is now pure-managed TCP (no P/Invoke except the probe handshake).
Re-review at 7286d320. -017 (Medium): TwinCATTagDto lacked ArrayLength, so JSON-authored
pre-declared array tags were silently scalar (Phase-4c array path dead for them). Fix:
add ArrayLength to the DTO + thread through BuildTag with positive-value guard + TDD.
Re-review at 7286d320. -014 (Medium): Bit EncodeValue (no bitIndex) wrote SetInt8 while
DecodeValue read GetInt16 on a 16-bit B-file element, so a false write could round-trip
as true (stale high byte). Fix: SetInt16 + TDD. -015: tests pass CancellationToken.
Re-review at 7286d320. -016: BrowseRecursiveAsync now releases the server-side continuation
point on OperationCanceledException (BrowseNext releaseContinuationPoints:true) before
rethrowing (resolves the Browser-002 cross-cutting leak) + TDD.
Re-review at 7286d320. Modbus-013 (Low): bit RMW now routes the FC03 read through the
validated ReadRegisterBlockAsync (was raw-indexing readResp -> IndexOutOfRange on a truncated
PDU). Modbus-014 (Low): WriteAsync maps InvalidDataException to BadCommunicationError (was
BadInternalError), matching ReadAsync. + TDD.
Re-review at 7286d320. S7-015 (Medium): a Writable array tag had no WriteArrayAsync path
and silently returned BadCommunicationError on write; now rejected at init with a clear
NotSupportedException (read-only arrays still accepted) + TDD. S7-016 (factory JSON can't
produce array tags; needs AdminUI DTO) deferred.
Re-review at 7286d320. AbCip-016 (Medium): two cooperating defects made a declared array
member (e.g. REAL[4]) read one scalar/null — fan-out dropped ElementCount/IsArray, and
UdtMemberLayout.TryBuild ignored array members (mis-placing later members). Fix: thread
array shape through fan-out + opt whole-UDT grouping out when any member is an array + TDD.
AbCip-017 (severity-read StatusCode, Low) deferred.
Re-review at 7286d320. -012 (Medium): OperationCanceledException left _drainState stuck
at Draining on the status surface; now resets to BackingOff + test. -013: _disposed ->
volatile (mirrors _backoffIndex). -014 (post-dispose status guards) deferred cross-module.
Re-review at 7286d320. -014 (Medium): AreInputsReady gated on value!=null, so a script
returning null (Good quality) permanently blocked change-triggered dependents at
BadWaitingForInitialData; now gates on the StatusCode Good bit only + test. -015:
TimerTriggerScheduler.Start throws on double-call. -016: fix wrong status-code comment.
Re-review at 7286d320. -015: dispose shelving timer at top of LoadAsync so a failed
reload doesn't leave it firing against partially-cleared state + test. -014: make
pendingEmissions required (removes unreachable fire-under-gate branch that could
reintroduce the -003 deadlock).
First review at 7286d320. Five Low doc fixes (BadNodeIdUnknown comment parity, three stale
Phase7 labels -> design-doc cites, {{equip}} token doc on GetTag/SetVirtualTag). Deadband
NaN/negative-tolerance (004) + a stale docs path (007) left Open.
Re-review at 7286d320. Core.Scripting-017 (Medium, Security): System.Runtime.CompilerServices.Unsafe
added to ForbiddenFullTypeNames (Unsafe.As bypasses the type system without an unsafe context;
CWE-843 type-confusion into SetVirtualTag) + regression tests (rejects Unsafe.As, still allows
benign CompilerServices attributes). -018: refresh stale rejection message. Sandbox holds.
Re-review at 7286d320. Core.Abstractions-009: ReadEventsAsync maxEvents<=0 sentinel now
documents the implementer's continuation-point obligation when a backend cap truncates
(the root of OpcUaServer-002). -010: PollGroupEngineTests pass CancellationToken. Plus
EquipmentTagRefResolver.TryResolve [MaybeNullWhen(false)] NRT cleanup + test.
Review at HEAD 7286d320. Driver.Galaxy.Browser-001 (High): MapSecurityClass codes 2-6 were
all shifted vs the runtime SecurityClassification enum (wrong security labels in the picker)
-> corrected all 7 arms + tests. -002: DisposeAsync swallows concurrent ObjectDisposedException.
-003 (ResolveApiKey dup) deferred to Contracts.
Review at HEAD 7286d320. ControlPlane-001 (Medium): ConfigPublishCoordinator.HandleAck
now discards acks from nodes not in _expectedAcks (prevented premature SealDeployment) +
regression test. -002 (flipped-node log count), -003 (redundant mapper arms) tidied.
Code review at HEAD 7286d320. Host-001 (High): /metrics was auth-gated on admin
nodes (Prometheus 401) -> AllowAnonymous. Host-002: register LdapOptionsValidator
unconditionally for fail-fast startup validation on admin-only nodes. Host-004: fix
metrics XML doc. Host-003 (docs) left Open.
Code review at HEAD 7286d320. Security-001 (High): guard returnUrl with a local-URL
check before redirect (open-redirect/phishing vector) + regression test. Security-002:
update stale LdapOptions dev-LDAP doc reference.
Code review at HEAD 7286d320. Cluster-001 (SeedFromCurrentState reads from one
snapshot), Cluster-003 (HoconLoader double-dispose), Cluster-004 (stale akka.conf
header), Cluster-005 (ServiceLevelCalculator tests added to Cluster.Tests). Cluster-002
deferred (no production caller).
Single commit covering the four small/medium fixes from the updated
code review.
Core.Scripting-014 (Medium, Concurrency):
CompiledScriptCache.Clear() used the key-only TryRemove(key, out var
lazy) overload — same race shape Core.Scripting-006 closed in
GetOrCompile's catch block. A concurrent re-add between snapshot and
TryRemove was evicted + disposed while the new caller still held it.
Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload.
Regression test
Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives
added.
Core.Scripting-013 (Medium, Security):
Hand-rolled BuildWrapperSource pastes user source between literal
braces; brace-balanced source could inject sibling methods/classes
alongside CompiledScript.Run. Analyzer still walked the injected
members so it wasn't a direct escape, but it relaxed the documented
'method body' authoring contract. Added EnforceSingleRunMember:
after ParseText, the compilation unit must hold exactly one type
(CompiledScript) and that type must hold exactly one member (the Run
method). Any deviation throws CompilationErrorException with LMX001/
LMX002 diagnostic IDs and a Core.Scripting-013 reference in the
message. Two regression tests added covering the sibling-method and
sibling-class injection vectors.
Core.Scripting-015 (Low, Correctness, latent):
ToCSharpTypeName's generic branch truncated at the first backtick via
IndexOf, silently dropping closed args of nested-generic shapes
(Outer<T>.Inner<U>). No production caller exercises this shape today
(all TContext/TResult are top-level non-nested), so the bug was
latent. Rewrote the generic branch to walk the FullName segment-by-
segment, consuming generic args per segment so nested shapes emit
valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken
Outer<T,U>).
Core.ScriptedAlarms-013 (Low, Documentation):
The internal test accessors TryGetScratchReadCacheForTest /
TryGetScratchContextForTest return live mutable scratch refilled in
place under _evalGate. XML docs didn't warn future test authors about
the synchronization contract. Added a <remarks> block to each
documenting the only-safe-on-quiesced-engine + identity-or-single-key
contract.
Verification (suites green):
Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests)
Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix)
Core.VirtualTags.Tests: 57/57 (unchanged)
After this commit, all 12 findings from the updated re-review are
closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).
Driver.Galaxy-016 (Medium, Perf/Resource):
Reconciled the csproj PackageReferences with what the vendored
MxGateway.Client.dll was actually built against, verified by
reflecting Assembly.GetReferencedAssemblies() on the DLL:
- Polly 8.5.2 → Polly.Core 8.6.6
(most consequential — Polly v7 fluent API vs Polly.Core v8
resilience-pipeline API are DIFFERENT packages; the DLL was
built against Polly.Core so the prior Polly reference would
have failed at runtime with MissingMethodException the first
time the gateway client's retry pipeline ran)
- Grpc.Net.Client 2.71.0 → 2.76.0 (matches sibling Server/Worker)
- Microsoft.Extensions.Logging.Abstractions 10.0.0 → 10.0.7
Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
left unchanged.
Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
Original framing was a security concern about unknown-provenance
binaries. User clarified the DLLs are their own code, built from
their own mxaccessgw project, not third-party. Re-triaged to a
documentation / audit-trail concern. Fix:
- Added a Provenance section to libs/README.md recording the
source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
extracted from the AssemblyInformationalVersion baked into
both DLLs by the original build) and SHA-256 checksums.
- Documented the re-verification recipe (sha256sum + ilspycmd
| grep AssemblyInformationalVersion).
Recommendations about .gitattributes and CI hash-check deferred —
the DLLs are frozen until an unwinding path is taken, so adding
LFS or CI infrastructure now would need removal at unwinding.
Driver.Galaxy-018 (Low, Documentation):
Most of the recommendation folded into the libs/README.md rewrite
(pointed at sibling Server/Worker csproj as the live version source
rather than the deleted MxGateway.Client.csproj; recorded source
commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
<Reference> items intentionally not added — MSBuild's default for
HintPath references with bare-name Include attributes is already
SpecificVersion=false, so explicitly setting it would be cosmetic
without changing behaviour.
Driver.Galaxy-017 (Low, Design) — Deferred:
Recommendation part (b) (record mxaccessgw source-commit SHA in
libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
Parts (a) and (c) — a GetVersion RPC at session-open and a parity
test against the live gateway's proto descriptor — are substantial
new RPC + plumbing work not in scope for this code-review sweep.
The risk surface is bounded because either of the libs/README.md
unwinding paths closes the vendoring + this concern naturally.
Re-open if neither path is taken within the next quarter and the
live gateway evolves its proto under the driver.
Verification:
- Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
- Driver.Galaxy.Tests: 245/245 pass against the corrected
package set.
- Solution-wide build remains clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Core.Scripting-012 (High, Security) resolution.
The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:
- System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
background-fanout threat Core.Scripting-003 closed against
System.Threading.Tasks.
- System.Threading.Timer — schedules unbounded callback work that
outlives the per-evaluation timeout.
- System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
Defense-in-depth gap; invocation needs reflection (already denied)
but the load itself was reachable.
Fix:
- Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
(preferred over type-granular per the recommendation so future BCL
additions to that namespace are denied by default).
- Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
to ForbiddenFullTypeNames — both live in System.Threading shared
with allowed primitives so they must be type-granular.
Regression tests added to ScriptSandboxTests:
Rejects_ThreadPool_QueueUserWorkItem_at_compile
Rejects_Timer_new_at_compile
Rejects_AssemblyLoadContext_at_compile
Docs:
docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
and the Sandbox-escape compliance-check row both updated to enumerate
the new entries per the Core.Scripting-009 doc-sync convention.
Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.
Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync were calling
ScriptEvaluator.Compile directly, bypassing CompiledScriptCache. The
Core.Scripting-008 collectible-ALC fix wired Dispose only through the cache's
Clear()/Dispose(), so the per-publish accretion the -008 fix was meant to
eliminate was still in effect on the actual production path — the headline
'no more restarts needed' guarantee wasn't delivered.
Resolution:
- VirtualTagEngine + ScriptedAlarmEngine each gained a private
CompiledScriptCache<TContext, TResult> instance.
- Both Load methods now call _compileCache.GetOrCompile(source).
- Publish-replace path: _compileCache.Clear() runs alongside the existing
_tags / _alarms clears so the prior generation's ALCs are disposed
before recompile.
- Engine Dispose now calls _compileCache.Dispose() so shutdown actually
releases the emitted assemblies.
Side-fix in CompiledScriptCache: Dispose() set _disposed=true then called
Clear(), but Clear() had a pre-existing 'if (_disposed) return' guard that
aborted the drain unconditionally — making the Dispose-triggered cleanup a
silent no-op. Removed the disposed-guard on Clear() (clearing an empty/
cleared cache is idempotent).
Side-fix in ScriptedAlarmEngine.Dispose: cleared _alarms AFTER the
Task.WhenAll drain. The drain guarantees no background callback is mid-
flight, so clearing is safe. Previously _alarms was deliberately NOT
cleared on Dispose (per Core.ScriptedAlarms-005), but that left the
AlarmState records holding TimedScriptEvaluator → ScriptEvaluator → delegate
references that rooted the emitted assemblies, defeating the cache's
Dispose work on the engine side.
Regression tests:
- VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly
- ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly
Both use WeakReference + bounded GC.Collect() to prove the emitted
assembly is reclaimable after engine.Dispose(). The alarms test had to
be synchronous (not 'async Task<WeakReference>') because async state
machines capture locals as state-struct fields, keeping them alive past
the method's apparent end and defeating GC.
Verification:
- Core.Scripting.Tests: 104/104 (unchanged).
- VirtualTags.Tests: 57/57 (was 56 — +1 unload test).
- ScriptedAlarms.Tests: 67/67 (was 66 — +1 unload test).
- All other consumer suites still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Driver.Cli.Common-007 + Driver.Cli.Common-008 resolution.
Driver.Cli.Common-007 (High, Correctness):
0x80550000 is the canonical OPC UA spec value for BadSecurityPolicyRejected,
not BadDeviceFailure. The correct spec value for BadDeviceFailure is
0x808B0000 (verified against OPC Foundation Opc.Ua.StatusCodes;
corroborated locally by Driver.Galaxy.Runtime.StatusCodeMap and both
Wonderware historian quality mappers which all hand-pin the correct
value).
The bug was duplicated across six driver modules:
- FocasStatusMapper.BadDeviceFailure
- AbCipStatusMapper.BadDeviceFailure
- AbLegacyStatusMapper.BadDeviceFailure
- TwinCATStatusMapper.BadDeviceFailure
- ModbusDriver.StatusBadDeviceFailure
- S7Driver.StatusBadDeviceFailure
Plus the SnapshotFormatter shortlist that named 0x80550000 as
BadDeviceFailure, and three downstream Modbus tests that asserted
against the wrong value (so CI was blind).
This commit fixes all six native-mapper constants, the formatter
shortlist, and the three Modbus tests in one pass. Added a regression
guard to FormatStatus_does_not_apply_pre_fix_wrong_names that pins
0x80550000 never renders as BadDeviceFailure (mirroring the existing
-001 wrong-name guards).
Behavior change: OPC UA clients consuming the native drivers now see
the canonical BadDeviceFailure (0x808B0000) on device-fault paths
instead of the misnamed BadSecurityPolicyRejected (0x80550000). Wire-
level status semantics now match operator-facing CLI labels.
Driver.Cli.Common-008 (Low, Testing):
Deleted the redundant FormatStatus_names_native_driver_emitted_codes
Theory — its five InlineData rows were already covered by the
well-known Theory in the same commit (5a9c459), and used a weaker
ShouldContain vs the well-known Theory's ShouldBe (exact match).
Verification:
- Driver.Cli.Common.Tests: 43/43 pass (was 48 after the -008 deletion).
- Driver.Modbus.Tests: 263/263 pass.
- Driver.AbCip.Tests: 262/262.
- Driver.AbLegacy.Tests: 157/157.
- Driver.FOCAS.Tests: 178/178.
- Driver.S7.Tests: 112/112.
- Driver.TwinCAT.Tests: 131/131.
Total: 1146 tests across the affected modules, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>