Commit Graph

210 Commits

Author SHA1 Message Date
Joseph Doherty b50fd6c34a review(Driver.AbLegacy.Cli): add FlushLogging() to command finally blocks
Re-review at 7286d320. -008 (Low): all four commands now FlushLogging() in finally (parity
with AbCip.Cli; subscribe could drop shutdown log lines) + IL-inspection test.
2026-06-19 12:08:45 -04:00
Joseph Doherty 111d6983a5 review: regenerate code-review index after Batch 7 (Wonderware.Client/Client.CLI/Shared/UI/AbCip.Cli) 2026-06-19 11:58:38 -04:00
Joseph Doherty 2b077fb789 review(Driver.AbCip.Cli): fix stale CLI-count + misleading --type help
Re-review at 7286d320. -009: 'four'->'six' driver-CLI count in Program.cs. -010: ReadCommand
--type help no longer lists Structure (rejected at runtime) + pinning test.
2026-06-19 11:58:15 -04:00
Joseph Doherty 12efbffd56 review(Client.UI): single notification when removing non-retained alarm row
Re-review at 7286d320. -013: AlarmsViewModel.OnAlarmEvent removal path no longer fires a
redundant Replace+Remove (one Remove now), preventing a DataGrid re-paint flash. -012: add
update/remove-path test coverage. + TDD.
2026-06-19 11:58:15 -04:00
Joseph Doherty d68c9db9f9 review(Client.Shared): fix Disconnect/failover subscription race + CT forwarding
Re-review at 7286d320. -012 (Medium): DisconnectAsync now snapshots+nulls the data/alarm
subscriptions under _subscriptionLock before async teardown (was racing RunFailoverAsync).
-013: SubscribeAlarmsAsync guarded by a semaphore (idempotent under concurrency). -014/-015:
forward CancellationToken through Delete/BrowseNext adapters. + TDD.
2026-06-19 11:58:15 -04:00
Joseph Doherty 887a31e825 review(Client.CLI): wrap NodeId parse errors in CommandException for alarm-op commands
Re-review at 7286d320. -011: ack/confirm/enable/disable/shelve now pre-validate --node and
surface a clean CommandException (was a raw FormatException) + tests. -012: refresh stale
test count in docs/Client.CLI.md.
2026-06-19 11:58:15 -04:00
Joseph Doherty cd072baad8 review(Driver.Historian.Wonderware.Client): async frame-header write + wire-parity test
Re-review at 7286d320. -011: FrameWriter folded the sync WriteByte (could block on SslStream
past the call timeout) into one async 5-byte header write. -012: DefaultTcpConnectFactory
readonly. -013: wire-parity test for PerEventStatus [Key(4)]. No wire change.
2026-06-19 11:58:15 -04:00
Joseph Doherty c95a8c6b19 review: regenerate index after Batch 6 + fix Wonderware open-count header (0, -013 is Deferred=closed) 2026-06-19 11:47:46 -04:00
Joseph Doherty b3907efa6e review(Driver.Historian.Wonderware): AtTime fails over on connection-class errors
Re-review at 7286d320. -014 (Medium): ReadAtTimeAsync didn't classify StartQuery failures,
so a connection-class failure left a dead connection, re-failed every timestamp, and returned
Success=true with all-Bad (no failover); now resets+fails over via a shared classifier + tests.
-015: refresh stale named-pipe comments to TCP (no wire change). -013 (silent cap truncation,
ties OpcUaServer-002/Core.Abstractions-009) deferred cross-module. NOTE: the SDK-touching tests
are net48 + native aahClientManaged and run only on Windows; macOS verifies build + the SDK-free
subset only.
2026-06-19 11:47:11 -04:00
Joseph Doherty e07a4fbf52 review(Driver.FOCAS): add byte-level wire-protocol test coverage
Re-review at 7286d320. -013 (Medium, testing): the managed FOCAS/2 wire-decode layer
(BuildPdu/ParseResponseBlocks, incl. cnc_getfigure stride) had zero byte-level tests; added
15 (no decode bug found). -014 (spindle-load truncation heuristic) deferred bench-gated.
Note: runtime read path is now pure-managed TCP (no P/Invoke except the probe handshake).
2026-06-19 11:47:11 -04:00
Joseph Doherty 22f7d92b72 review(Driver.TwinCAT): thread ArrayLength through factory DTO (Medium)
Re-review at 7286d320. -017 (Medium): TwinCATTagDto lacked ArrayLength, so JSON-authored
pre-declared array tags were silently scalar (Phase-4c array path dead for them). Fix:
add ArrayLength to the DTO + thread through BuildTag with positive-value guard + TDD.
2026-06-19 11:47:11 -04:00
Joseph Doherty 91e2609560 review(Driver.AbLegacy): fix Bit write 1-byte/2-byte encode-decode mismatch (Medium)
Re-review at 7286d320. -014 (Medium): Bit EncodeValue (no bitIndex) wrote SetInt8 while
DecodeValue read GetInt16 on a 16-bit B-file element, so a false write could round-trip
as true (stale high byte). Fix: SetInt16 + TDD. -015: tests pass CancellationToken.
2026-06-19 11:47:11 -04:00
Joseph Doherty be272d960f review(Driver.OpcUaClient): release browse continuation point on cancel
Re-review at 7286d320. -016: BrowseRecursiveAsync now releases the server-side continuation
point on OperationCanceledException (BrowseNext releaseContinuationPoints:true) before
rethrowing (resolves the Browser-002 cross-cutting leak) + TDD.
2026-06-19 11:47:11 -04:00
Joseph Doherty 04e0877bff review: regenerate code-review index after Batch 5 (Galaxy/AbCip/S7/Modbus/Modbus.Addressing) 2026-06-19 11:34:46 -04:00
Joseph Doherty b5f6cdfdb9 review(Driver.Modbus.Addressing): fix misleading byte-order hint + drop dead overflow guard
Re-review at 7286d320. -010 (Low): TryParseByteOrder no longer lists REAL/DINT/UINT as type
codes (gave wrong 'field 2' advice -> second parse error); generic byte-order error instead.
-011 (Low): remove unreachable offsetWithinBank>ushort.MaxValue guard (DecodeOctalVAddress
caps at 0xFFFF). + TDD.
2026-06-19 11:34:35 -04:00
Joseph Doherty 6853a0430f review(Driver.Modbus): validate FC03 RMW response + correct write error mapping
Re-review at 7286d320. Modbus-013 (Low): bit RMW now routes the FC03 read through the
validated ReadRegisterBlockAsync (was raw-indexing readResp -> IndexOutOfRange on a truncated
PDU). Modbus-014 (Low): WriteAsync maps InvalidDataException to BadCommunicationError (was
BadInternalError), matching ReadAsync. + TDD.
2026-06-19 11:34:34 -04:00
Joseph Doherty f2bdd8bc1c review(Driver.S7): reject writable array tags at init instead of silent write failure
Re-review at 7286d320. S7-015 (Medium): a Writable array tag had no WriteArrayAsync path
and silently returned BadCommunicationError on write; now rejected at init with a clear
NotSupportedException (read-only arrays still accepted) + TDD. S7-016 (factory JSON can't
produce array tags; needs AdminUI DTO) deferred.
2026-06-19 11:34:34 -04:00
Joseph Doherty a914b73d57 review(Driver.AbCip): fix declared UDT array members read as scalar (Medium)
Re-review at 7286d320. AbCip-016 (Medium): two cooperating defects made a declared array
member (e.g. REAL[4]) read one scalar/null — fan-out dropped ElementCount/IsArray, and
UdtMemberLayout.TryBuild ignored array members (mis-placing later members). Fix: thread
array shape through fan-out + opt whole-UDT grouping out when any member is an array + TDD.
AbCip-017 (severity-read StatusCode, Low) deferred.
2026-06-19 11:34:34 -04:00
Joseph Doherty db72dd1dca review(Driver.Galaxy): re-review at HEAD (1 Low deferred; vendoring findings obsolete)
Re-review at 7286d320. Driver.Galaxy-019 (Low, Open): EnsureSessionIntervalAsync caches
MxaccessFailure as 'interval applied' (sub-optimal cadence until reconnect) — deferred
(sealed gateway session + gateway-contract design call). 001-014 re-verified Resolved;
015-018 (vendoring) now obsolete (libs/ replaced by real MxGateway PackageReferences).
2026-06-19 11:34:34 -04:00
Joseph Doherty ae3f071945 review: regenerate index after Batch 4 + fix Core.VirtualTags checklist heading
Demote the re-review '### Re-review checklist' heading to #### so regen-readme does not
parse it as a status-less finding (--check now consistent).
2026-06-19 11:22:56 -04:00
Joseph Doherty 1180b017f5 review(Driver.Cli.Common): drop dead FormatStatus branch + timestamp-kind test
Re-review at 7286d320. -009: remove unreachable name-is-null branch in FormatStatus +
invariant test. -010: pin DateTimeKind.Unspecified FormatTimestamp behavior.
2026-06-19 11:21:36 -04:00
Joseph Doherty 13c1215811 review(Analyzers): add trip-coverage for async guarded-interface methods
Re-review at 7286d320. -008: 5 regression tests for Unsubscribe/UnsubscribeAlarms/
Acknowledge/ReadEvents trip + suppression paths (analyzer source already correct).
Surfaced cross-module: Runtime DriverInstanceActor.HandleWriteAsync calls WriteAsync
directly (tracked for Runtime).
2026-06-19 11:21:35 -04:00
Joseph Doherty 6b4210cb17 review(Core.AlarmHistorian): reset drain state on cancel + volatile _disposed
Re-review at 7286d320. -012 (Medium): OperationCanceledException left _drainState stuck
at Draining on the status surface; now resets to BackingOff + test. -013: _disposed ->
volatile (mirrors _backoffIndex). -014 (post-dispose status guards) deferred cross-module.
2026-06-19 11:21:35 -04:00
Joseph Doherty 48af117bff review(Core.VirtualTags): fix Good-null upstream blocking downstream (Medium)
Re-review at 7286d320. -014 (Medium): AreInputsReady gated on value!=null, so a script
returning null (Good quality) permanently blocked change-triggered dependents at
BadWaitingForInitialData; now gates on the StatusCode Good bit only + test. -015:
TimerTriggerScheduler.Start throws on double-call. -016: fix wrong status-code comment.
2026-06-19 11:21:35 -04:00
Joseph Doherty 272a9da61e review(Core.ScriptedAlarms): stop shelving timer on failed reload + drop dead branch
Re-review at 7286d320. -015: dispose shelving timer at top of LoadAsync so a failed
reload doesn't leave it firing against partially-cleared state + test. -014: make
pendingEmissions required (removes unreachable fire-under-gate branch that could
reintroduce the -003 deadlock).
2026-06-19 11:21:35 -04:00
Joseph Doherty 621d00e455 review: regenerate code-review index after Batch 3 (Core/Abstractions/Scripting/Configuration) 2026-06-19 11:07:15 -04:00
Joseph Doherty c3d148e396 review(Configuration): fix LiteDB global BsonMapper cross-instance race (High)
Re-review at 7286d320. Configuration-012 (High): LiteDbConfigCache/GenerationSealedCache built
LiteDatabase on the process-wide BsonMapper.Global whose lazy member resolution races across
concurrently-constructed DBs (NotSupportedException/duplicate-key under contention; also caused
intermittent suite flakiness). Fix: per-cache fresh BsonMapper + pre-registered entity + TDD.
-013 (dead ValidateClusterTopology, ControlPlane) / -014 (collation case-sensitivity, needs
migration) deferred. No migration touched.
2026-06-19 11:06:56 -04:00
Joseph Doherty 145b06bec9 review(Core.Scripting.Abstractions): refresh stale Phase7 labels + document {{equip}}
First review at 7286d320. Five Low doc fixes (BadNodeIdUnknown comment parity, three stale
Phase7 labels -> design-doc cites, {{equip}} token doc on GetTag/SetVirtualTag). Deadband
NaN/negative-tolerance (004) + a stale docs path (007) left Open.
2026-06-19 11:06:56 -04:00
Joseph Doherty 38c48a009c review(Core.Scripting): block Unsafe.As sandbox bypass (Security)
Re-review at 7286d320. Core.Scripting-017 (Medium, Security): System.Runtime.CompilerServices.Unsafe
added to ForbiddenFullTypeNames (Unsafe.As bypasses the type system without an unsafe context;
CWE-843 type-confusion into SetVirtualTag) + regression tests (rejects Unsafe.As, still allows
benign CompilerServices attributes). -018: refresh stale rejection message. Sandbox holds.
2026-06-19 11:06:56 -04:00
Joseph Doherty 65e6af6001 review(Core.Abstractions): document ReadEventsAsync continuation contract (OpcUaServer-002 root)
Re-review at 7286d320. Core.Abstractions-009: ReadEventsAsync maxEvents<=0 sentinel now
documents the implementer's continuation-point obligation when a backend cap truncates
(the root of OpcUaServer-002). -010: PollGroupEngineTests pass CancellationToken. Plus
EquipmentTagRefResolver.TryResolve [MaybeNullWhen(false)] NRT cleanup + test.
2026-06-19 11:06:56 -04:00
Joseph Doherty 354b0e7613 review(Core): re-review at HEAD; clean up duplicate/unexplained doc comments
Re-review at 7286d320. Core-013 (duplicate <summary> on HostBoundHandle), Core-014
(clarify EquipmentNodeWalker test-only hardcoded attrs). Both Low, doc-only. Prior
authz/Galaxy churn verified correct.
2026-06-19 11:06:56 -04:00
Joseph Doherty 8ac5a2dbc5 review: regenerate code-review index after Batch 2 (Runtime/ControlPlane/AdminUI/Browsers) 2026-06-19 10:52:45 -04:00
Joseph Doherty 2fe8e587dd review(Driver.OpcUaClient.Browser): AttributesAsync updates LastUsedUtc
Review at HEAD 7286d320. -001: AttributesAsync now updates LastUsedUtc (IBrowseSession
contract) + test (InternalsVisibleTo+Moq added). -002 (continuation-point cancel leak)
deferred cross-cutting w/ runtime Driver.OpcUaClient.
2026-06-19 10:52:23 -04:00
Joseph Doherty 960d76ffcb review(Driver.Galaxy.Browser): fix mis-shifted MapSecurityClass codes (High)
Review at HEAD 7286d320. Driver.Galaxy.Browser-001 (High): MapSecurityClass codes 2-6 were
all shifted vs the runtime SecurityClassification enum (wrong security labels in the picker)
-> corrected all 7 arms + tests. -002: DisposeAsync swallows concurrent ObjectDisposedException.
-003 (ResolveApiKey dup) deferred to Contracts.
2026-06-19 10:52:23 -04:00
Joseph Doherty 3c908f1df0 review(AdminUI): fix null-TagConfig crash, CTS leak, unencoded historian tag
Review at HEAD 7286d320. AdminUI-002: IsValidJson null/blank -> friendly error (was
ArgumentNullException). AdminUI-003: DriverStatusPanel Reconnect/Restart dispose CTS (build-
verified, live /run deferred). AdminUI-005: HistorianWonderware picker URL-encodes tag name.
AdminUI-008: Format round-trip test. 001 (script-page authz) + 004 (hub [Authorize]) left
Open as cross-cutting w/ Host/Security.
2026-06-19 10:52:23 -04:00
Joseph Doherty 1aa7905676 review(ControlPlane): fix premature deploy-seal from unexpected-node ack
Review at HEAD 7286d320. ControlPlane-001 (Medium): ConfigPublishCoordinator.HandleAck
now discards acks from nodes not in _expectedAcks (prevented premature SealDeployment) +
regression test. -002 (flipped-node log count), -003 (redundant mapper arms) tidied.
2026-06-19 10:52:22 -04:00
Joseph Doherty 3512089c90 review(Runtime): record findings + fix artifact-decode type tolerance
Review at HEAD 7286d320. Runtime-002/006 (Medium): DeploymentArtifact lenient-parse
now degrades wrong-typed JSON fields to defaults/skipped-row instead of throwing (fails
the deploy) + regression tests; byte-parity with AddressSpaceComposer preserved. Runtime-001
(UNS rename) deferred cross-module (needs AddressSpacePlan rename signal + EnsureFolder
rename). 003/004/005 Won't-Fix.
2026-06-19 10:52:22 -04:00
Joseph Doherty 5aaa82bc26 review: regenerate code-review index after Batch 1 (OpcUaServer/Security/Host/Cluster/Commons) 2026-06-19 10:37:26 -04:00
Joseph Doherty bac6613dd2 review(OpcUaServer): record findings + fix stale node-manager/host docs
First review of the v2 OPC UA core at HEAD 7286d320. 6 findings (2 Medium, 4 Low).
OpcUaServer-006 fixed (stale NodeManager/ApplicationHost XML docs). 001-004 deferred
(cross-module: Runtime publish-actor / Core.Abstractions history contract / Wonderware
boundary semantics, or latent-only). 005 re-triaged Won't-Fix (coverage already exists).
High-scrutiny paths (Lock discipline, OnWriteValue fire-and-forget, WriteOperate/AlarmAck
gates, HistoryRead AccessLevel bits) verified correct.
2026-06-19 10:37:00 -04:00
Joseph Doherty e4abe186a1 review(Host): allow-anonymous /metrics + unconditional LDAP validator
Code review at HEAD 7286d320. Host-001 (High): /metrics was auth-gated on admin
nodes (Prometheus 401) -> AllowAnonymous. Host-002: register LdapOptionsValidator
unconditionally for fail-fast startup validation on admin-only nodes. Host-004: fix
metrics XML doc. Host-003 (docs) left Open.
2026-06-19 10:22:59 -04:00
Joseph Doherty d23e585cdb review(Security): fix login open-redirect (High) + stale LDAP doc
Code review at HEAD 7286d320. Security-001 (High): guard returnUrl with a local-URL
check before redirect (open-redirect/phishing vector) + regression test. Security-002:
update stale LdapOptions dev-LDAP doc reference.
2026-06-19 10:22:59 -04:00
Joseph Doherty b1946194d6 review(Cluster): record findings + fix snapshot consistency, dispose, stale docs
Code review at HEAD 7286d320. Cluster-001 (SeedFromCurrentState reads from one
snapshot), Cluster-003 (HoconLoader double-dispose), Cluster-004 (stale akka.conf
header), Cluster-005 (ServiceLevelCalculator tests added to Cluster.Tests). Cluster-002
deferred (no production caller).
2026-06-19 10:22:59 -04:00
Joseph Doherty 6dc74289ce review(Commons): record findings + add deferred-sink/equip-nodeid tests, fix stale Phase7 doc
Code review at HEAD 7286d320. Commons-001 (stale Phase7 telemetry doc) fixed;
Commons-003/004 close test-coverage gaps (DeferredAddressSpaceSink/ServiceLevelPublisher
forwarding seam + EquipmentNodeIds whitespace branch). Commons-002 (CorrelationId
typing) deferred as cross-cutting.
2026-06-19 10:22:59 -04:00
Joseph Doherty 23d59d73f2 fix(scripting+alarms): close remaining re-review findings
Single commit covering the four small/medium fixes from the updated
code review.

Core.Scripting-014 (Medium, Concurrency):
  CompiledScriptCache.Clear() used the key-only TryRemove(key, out var
  lazy) overload — same race shape Core.Scripting-006 closed in
  GetOrCompile's catch block. A concurrent re-add between snapshot and
  TryRemove was evicted + disposed while the new caller still held it.
  Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload.
  Regression test
  Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives
  added.

Core.Scripting-013 (Medium, Security):
  Hand-rolled BuildWrapperSource pastes user source between literal
  braces; brace-balanced source could inject sibling methods/classes
  alongside CompiledScript.Run. Analyzer still walked the injected
  members so it wasn't a direct escape, but it relaxed the documented
  'method body' authoring contract. Added EnforceSingleRunMember:
  after ParseText, the compilation unit must hold exactly one type
  (CompiledScript) and that type must hold exactly one member (the Run
  method). Any deviation throws CompilationErrorException with LMX001/
  LMX002 diagnostic IDs and a Core.Scripting-013 reference in the
  message. Two regression tests added covering the sibling-method and
  sibling-class injection vectors.

Core.Scripting-015 (Low, Correctness, latent):
  ToCSharpTypeName's generic branch truncated at the first backtick via
  IndexOf, silently dropping closed args of nested-generic shapes
  (Outer<T>.Inner<U>). No production caller exercises this shape today
  (all TContext/TResult are top-level non-nested), so the bug was
  latent. Rewrote the generic branch to walk the FullName segment-by-
  segment, consuming generic args per segment so nested shapes emit
  valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken
  Outer<T,U>).

Core.ScriptedAlarms-013 (Low, Documentation):
  The internal test accessors TryGetScratchReadCacheForTest /
  TryGetScratchContextForTest return live mutable scratch refilled in
  place under _evalGate. XML docs didn't warn future test authors about
  the synchronization contract. Added a <remarks> block to each
  documenting the only-safe-on-quiesced-engine + identity-or-single-key
  contract.

Verification (suites green):
  Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests)
  Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix)
  Core.VirtualTags.Tests: 57/57 (unchanged)

After this commit, all 12 findings from the updated re-review are
closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:00:59 -04:00
Joseph Doherty c2abbf45bd fix(driver-galaxy): align package versions + record vendored-DLL provenance
Driver.Galaxy-015, -016, -017, -018 resolution (one logical change set).

Driver.Galaxy-016 (Medium, Perf/Resource):
  Reconciled the csproj PackageReferences with what the vendored
  MxGateway.Client.dll was actually built against, verified by
  reflecting Assembly.GetReferencedAssemblies() on the DLL:
    - Polly 8.5.2  →  Polly.Core 8.6.6
      (most consequential — Polly v7 fluent API vs Polly.Core v8
       resilience-pipeline API are DIFFERENT packages; the DLL was
       built against Polly.Core so the prior Polly reference would
       have failed at runtime with MissingMethodException the first
       time the gateway client's retry pipeline ran)
    - Grpc.Net.Client 2.71.0  →  2.76.0  (matches sibling Server/Worker)
    - Microsoft.Extensions.Logging.Abstractions 10.0.0  →  10.0.7
  Google.Protobuf 3.34.1 and Grpc.Core.Api 2.76.0 already matched —
  left unchanged.

Driver.Galaxy-015 (re-triaged from Medium-Security → Low-Documentation):
  Original framing was a security concern about unknown-provenance
  binaries. User clarified the DLLs are their own code, built from
  their own mxaccessgw project, not third-party. Re-triaged to a
  documentation / audit-trail concern. Fix:
    - Added a Provenance section to libs/README.md recording the
      source-commit SHA (dd7ca1634e2d2b8a866c81f0009bf87ee9427750,
      extracted from the AssemblyInformationalVersion baked into
      both DLLs by the original build) and SHA-256 checksums.
    - Documented the re-verification recipe (sha256sum + ilspycmd
      | grep AssemblyInformationalVersion).
  Recommendations about .gitattributes and CI hash-check deferred —
  the DLLs are frozen until an unwinding path is taken, so adding
  LFS or CI infrastructure now would need removal at unwinding.

Driver.Galaxy-018 (Low, Documentation):
  Most of the recommendation folded into the libs/README.md rewrite
  (pointed at sibling Server/Worker csproj as the live version source
  rather than the deleted MxGateway.Client.csproj; recorded source
  commit + SHA-256). <SpecificVersion>false</SpecificVersion> on the
  <Reference> items intentionally not added — MSBuild's default for
  HintPath references with bare-name Include attributes is already
  SpecificVersion=false, so explicitly setting it would be cosmetic
  without changing behaviour.

Driver.Galaxy-017 (Low, Design) — Deferred:
  Recommendation part (b) (record mxaccessgw source-commit SHA in
  libs/README.md) is satisfied by Driver.Galaxy-015's resolution.
  Parts (a) and (c) — a GetVersion RPC at session-open and a parity
  test against the live gateway's proto descriptor — are substantial
  new RPC + plumbing work not in scope for this code-review sweep.
  The risk surface is bounded because either of the libs/README.md
  unwinding paths closes the vendoring + this concern naturally.
  Re-open if neither path is taken within the next quarter and the
  live gateway evolves its proto under the driver.

Verification:
  - Build clean (Driver.Galaxy.csproj 0 errors, 0 warnings).
  - Driver.Galaxy.Tests: 245/245 pass against the corrected
    package set.
  - Solution-wide build remains clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:45:24 -04:00
Joseph Doherty 3a53d03d23 fix(scripting): block ThreadPool/Timer/AssemblyLoadContext in sandbox
Core.Scripting-012 (High, Security) resolution.

The Core.Scripting-008 rewrite broadened the BCL references list from a
narrow allow-list to the full System.* + netstandard +
Microsoft.Win32.Registry set, delegating the security gate entirely to
ForbiddenTypeAnalyzer. Three categories of dangerous BCL types were
reachable from script source without a deny-list entry:

  - System.Threading.ThreadPool — QueueUserWorkItem re-introduces the
    background-fanout threat Core.Scripting-003 closed against
    System.Threading.Tasks.
  - System.Threading.Timer — schedules unbounded callback work that
    outlives the per-evaluation timeout.
  - System.Runtime.Loader.AssemblyLoadContext — loads arbitrary DLLs.
    Defense-in-depth gap; invocation needs reflection (already denied)
    but the load itself was reachable.

Fix:
  - Added 'System.Runtime.Loader' to ForbiddenNamespacePrefixes
    (preferred over type-granular per the recommendation so future BCL
    additions to that namespace are denied by default).
  - Added 'System.Threading.ThreadPool' and 'System.Threading.Timer'
    to ForbiddenFullTypeNames — both live in System.Threading shared
    with allowed primitives so they must be type-granular.

Regression tests added to ScriptSandboxTests:
  Rejects_ThreadPool_QueueUserWorkItem_at_compile
  Rejects_Timer_new_at_compile
  Rejects_AssemblyLoadContext_at_compile

Docs:
  docs/v2/implementation/phase-7-scripting-and-alarming.md decision #6
  and the Sandbox-escape compliance-check row both updated to enumerate
  the new entries per the Core.Scripting-009 doc-sync convention.

Two lower-impact suggestions from the finding's recommendation
(System.Console, CultureInfo.DefaultThreadCurrentCulture) were
intentionally not addressed and are recorded as accepted minor risks
in the resolution.

Verification: Core.Scripting.Tests 107/107 (was 104 + 3 new rejection
tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:39:20 -04:00
Joseph Doherty fb7c6c7046 fix(scripting): route engines through CompiledScriptCache (Core.Scripting-016)
Both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync were calling
ScriptEvaluator.Compile directly, bypassing CompiledScriptCache. The
Core.Scripting-008 collectible-ALC fix wired Dispose only through the cache's
Clear()/Dispose(), so the per-publish accretion the -008 fix was meant to
eliminate was still in effect on the actual production path — the headline
'no more restarts needed' guarantee wasn't delivered.

Resolution:
  - VirtualTagEngine + ScriptedAlarmEngine each gained a private
    CompiledScriptCache<TContext, TResult> instance.
  - Both Load methods now call _compileCache.GetOrCompile(source).
  - Publish-replace path: _compileCache.Clear() runs alongside the existing
    _tags / _alarms clears so the prior generation's ALCs are disposed
    before recompile.
  - Engine Dispose now calls _compileCache.Dispose() so shutdown actually
    releases the emitted assemblies.

Side-fix in CompiledScriptCache: Dispose() set _disposed=true then called
Clear(), but Clear() had a pre-existing 'if (_disposed) return' guard that
aborted the drain unconditionally — making the Dispose-triggered cleanup a
silent no-op. Removed the disposed-guard on Clear() (clearing an empty/
cleared cache is idempotent).

Side-fix in ScriptedAlarmEngine.Dispose: cleared _alarms AFTER the
Task.WhenAll drain. The drain guarantees no background callback is mid-
flight, so clearing is safe. Previously _alarms was deliberately NOT
cleared on Dispose (per Core.ScriptedAlarms-005), but that left the
AlarmState records holding TimedScriptEvaluator → ScriptEvaluator → delegate
references that rooted the emitted assemblies, defeating the cache's
Dispose work on the engine side.

Regression tests:
  - VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly
  - ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly
  Both use WeakReference + bounded GC.Collect() to prove the emitted
  assembly is reclaimable after engine.Dispose(). The alarms test had to
  be synchronous (not 'async Task<WeakReference>') because async state
  machines capture locals as state-struct fields, keeping them alive past
  the method's apparent end and defeating GC.

Verification:
  - Core.Scripting.Tests: 104/104 (unchanged).
  - VirtualTags.Tests: 57/57 (was 56 — +1 unload test).
  - ScriptedAlarms.Tests: 67/67 (was 66 — +1 unload test).
  - All other consumer suites still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:33:34 -04:00
Joseph Doherty a6ae4e22d1 fix(status-codes): correct BadDeviceFailure from 0x80550000 to 0x808B0000
Driver.Cli.Common-007 + Driver.Cli.Common-008 resolution.

Driver.Cli.Common-007 (High, Correctness):
  0x80550000 is the canonical OPC UA spec value for BadSecurityPolicyRejected,
  not BadDeviceFailure. The correct spec value for BadDeviceFailure is
  0x808B0000 (verified against OPC Foundation Opc.Ua.StatusCodes;
  corroborated locally by Driver.Galaxy.Runtime.StatusCodeMap and both
  Wonderware historian quality mappers which all hand-pin the correct
  value).

  The bug was duplicated across six driver modules:
    - FocasStatusMapper.BadDeviceFailure
    - AbCipStatusMapper.BadDeviceFailure
    - AbLegacyStatusMapper.BadDeviceFailure
    - TwinCATStatusMapper.BadDeviceFailure
    - ModbusDriver.StatusBadDeviceFailure
    - S7Driver.StatusBadDeviceFailure
  Plus the SnapshotFormatter shortlist that named 0x80550000 as
  BadDeviceFailure, and three downstream Modbus tests that asserted
  against the wrong value (so CI was blind).

  This commit fixes all six native-mapper constants, the formatter
  shortlist, and the three Modbus tests in one pass. Added a regression
  guard to FormatStatus_does_not_apply_pre_fix_wrong_names that pins
  0x80550000 never renders as BadDeviceFailure (mirroring the existing
  -001 wrong-name guards).

  Behavior change: OPC UA clients consuming the native drivers now see
  the canonical BadDeviceFailure (0x808B0000) on device-fault paths
  instead of the misnamed BadSecurityPolicyRejected (0x80550000). Wire-
  level status semantics now match operator-facing CLI labels.

Driver.Cli.Common-008 (Low, Testing):
  Deleted the redundant FormatStatus_names_native_driver_emitted_codes
  Theory — its five InlineData rows were already covered by the
  well-known Theory in the same commit (5a9c459), and used a weaker
  ShouldContain vs the well-known Theory's ShouldBe (exact match).

Verification:
  - Driver.Cli.Common.Tests: 43/43 pass (was 48 after the -008 deletion).
  - Driver.Modbus.Tests: 263/263 pass.
  - Driver.AbCip.Tests: 262/262.
  - Driver.AbLegacy.Tests: 157/157.
  - Driver.FOCAS.Tests: 178/178.
  - Driver.S7.Tests: 112/112.
  - Driver.TwinCAT.Tests: 131/131.
  Total: 1146 tests across the affected modules, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:14:28 -04:00
Joseph Doherty 41e62b2663 docs(code-reviews): updated re-review at commit a9be809 — 12 new findings
Re-reviewed the four modules with source changes since the previous review
commit 76d35d1, per REVIEW-PROCESS.md section 6. Updated each findings.md
header (date 2026-05-23, commit a9be809) and appended new findings under
continued numbering. Regenerated README.md.

## New findings — 12 total across 4 modules

### Core.Scripting (5 new, IDs -012 to -016)
- **-012 High Security** — broadened BCL references (System.* + netstandard)
  re-expose System.Threading.ThreadPool / Timer / AssemblyLoadContext, which
  the analyzer's deny-list doesn't cover. Re-introduces the background-work
  threat Core.Scripting-003 closed via System.Threading.Tasks deny.
- **-013 Medium Security** — hand-rolled wrapper-source generation lets
  brace-balanced user source inject sibling methods/classes alongside
  CompiledScript.Run. Analyzer still gates forbidden types, but the
  documented 'method body' authoring contract is silently relaxed.
- **-014 Medium Concurrency** — CompiledScriptCache.Clear() uses key-only
  TryRemove(key, out _) — the same race the -006 resolution fixed in
  GetOrCompile's catch is latent here on publish-replace.
- **-015 Low Correctness** — ToCSharpTypeName truncates at first backtick;
  silently drops closed type arguments of nested-generic shapes (Outer<>.Inner<>).
  Latent — no production caller uses this shape today.
- **-016 Medium Performance** — VirtualTagEngine + ScriptedAlarmEngine call
  ScriptEvaluator.Compile directly without going through CompiledScriptCache,
  so the headline -008 collectible-ALC fix doesn't run on the actual
  production path — the per-publish leak is still in effect.

### Core.ScriptedAlarms (1 new, ID -013)
- **-013 Low Documentation** — new internal test accessors return the live
  mutable scratch dictionary; XML docs don't warn future test authors about
  the synchronisation contract.

### Driver.Cli.Common (2 new, IDs -007, -008)
- **-007 High Correctness** — 0x80550000 was added as BadDeviceFailure but
  the real OPC UA spec value for BadDeviceFailure is 0x808B0000 (verified
  against Driver.Galaxy.Runtime.StatusCodeMap and HistorianQualityMapper,
  both of which use the correct 0x808B0000). 0x80550000 is actually
  BadSecurityPolicyRejected. The native mappers (FOCAS / AbCip / AbLegacy)
  all use the wrong 0x80550000; this session's SnapshotFormatter extension
  propagated the wrong name and the test asserts against the same wrong
  value so CI is blind — same shape of bug as Driver.Cli.Common-001.
- **-008 Low Testing** — new FormatStatus_names_native_driver_emitted_codes
  Theory is redundant with the existing well-known Theory (same five
  InlineData rows added to both) and uses weaker ShouldContain assertion
  than the well-known Theory's ShouldBe.

### Driver.Galaxy (4 new, IDs -015 to -018)
- **-015 Medium Security** — vendored DLLs (libs/) have no recorded
  provenance: no source-commit SHA from the mxaccessgw repo, no SHA-256
  checksum in libs/README.md. Tampering / accidental swap undetectable.
- **-016 Medium Performance** — version skew between declared
  PackageReferences (Polly 8.5.2 / Grpc.Net.Client 2.71.0 /
  Microsoft.Extensions.Logging.Abstractions 10.0.0) and what the vendored
  DLL was actually built against (Polly.Core 8.6.6 / Grpc.Net.Client
  2.76.0 / Microsoft.Extensions.Logging.Abstractions 10.0.7). Latent now
  (assembly-version refs are loose) but precise shape that produces a
  runtime MissingMethodException.
- **-017 Low Design** — no contract-version handshake between the driver
  and the gateway; proto could evolve under the gateway without the
  driver noticing.
- **-018 Low Documentation** — libs/README.md points at the wrong sibling
  csproj as the version source-of-truth; missing SpecificVersion=false
  on the Reference items; missing mxaccessgw source-commit SHA.

## Particularly notable

Two findings undercut commits from this session:

- Driver.Cli.Common-007 invalidates commit 5a9c459 (which named 0x80550000
  as BadDeviceFailure across the cross-CLI shortlist).
- Core.Scripting-016 invalidates the production effect of commit 7b6ab2e
  (the collectible-ALC fix wired Dispose only via CompiledScriptCache,
  which the engines don't use).

The wider native-mapper miscoding behind -007 also affects three driver
modules outside this session's edit scope (FocasStatusMapper,
AbCipStatusMapper, AbLegacyStatusMapper all carry the wrong code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:02:47 -04:00
Joseph Doherty 0001cdd579 fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path
Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates per upstream tag change.

Why this matters:
  On a busy line where many tags feeding many alarms change frequently,
  the old BuildReadCache allocated a fresh dictionary + context on every
  predicate evaluation — a steady stream of short-lived allocations the
  GC eventually has to reclaim. With the reuse, the dictionary and
  context are allocated once per alarm (on first evaluation) and refilled
  in place across every subsequent re-eval.

Implementation:
  - New private AlarmScratch class holds the reusable
    Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
    alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
    reference. The context observes refilled values without being
    re-created.
  - ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
    engine, cleared in LoadAsync alongside _alarms so a config-publish
    drops the prior generation's scratch (Inputs / Logger may change).
  - EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
    the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
    repopulate the dictionary in place, then runs the predicate against
    the reused context.
  - BuildReadCache removed.

Safety:
  Reuse is serialised under _evalGate which guarantees no two threads
  ever observe the same scratch in a half-refilled state. The
  AlarmPredicateContext is bound to the scratch dictionary by reference,
  so the predicate's ctx.GetTag(path) sees the freshly-refilled values
  rather than a stale snapshot.

Verification:
  - All 66 ScriptedAlarms tests pass (was 63 — three new regression tests
    locking the reuse contract).
  - All 56 VirtualTags tests still pass (unchanged).
  - All 104 Core.Scripting tests still pass (unchanged).

New tests in ScriptedAlarmEngineTests:
  - Reevaluation_reuses_the_same_read_cache_dictionary — asserts
    ReferenceEquals(scratch_before, scratch_after) across two
    evaluations of the same alarm.
  - Reevaluation_reuses_the_same_predicate_context — same, for the
    context.
  - LoadAsync_drops_the_prior_generations_scratch — asserts a config
    publish wipes the prior scratch (so a stale Logger / Inputs can't
    leak into the new generation).

Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added via the existing
InternalsVisibleTo for the tests project. Kept internal — not part of
the public engine surface.

Docs:
  - docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
    rewritten as "hot-path allocation reuse" documenting the new
    contract + reuse safety reasoning + the three regression tests.
  - code-reviews/Core.ScriptedAlarms/findings.md -009 flipped
    Won't Fix → Resolved.
  - code-reviews/README.md regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:10:09 -04:00