docs(code-reviews): updated re-review at commit a9be809 — 12 new findings

Re-reviewed the four modules with source changes since the previous review commit 76d35d1, per REVIEW-PROCESS.md section 6. Updated each findings.md header (date 2026-05-23, commit a9be809) and appended new findings under continued numbering. Regenerated README.md.

## New findings: 12 total across 4 modules

### Core.Scripting (5 new, IDs -012 to -016)

- **-012 High Security**: broadened BCL references (System.* + netstandard) re-expose System.Threading.ThreadPool / Timer / AssemblyLoadContext, which the analyzer's deny-list doesn't cover. Re-introduces the background-work threat Core.Scripting-003 closed via the System.Threading.Tasks deny.
- **-013 Medium Security**: hand-rolled wrapper-source generation lets brace-balanced user source inject sibling methods/classes alongside CompiledScript.Run. The analyzer still gates forbidden types, but the documented 'method body' authoring contract is silently relaxed.
- **-014 Medium Concurrency**: CompiledScriptCache.Clear() uses key-only TryRemove(key, out _). The same race the -006 resolution fixed in GetOrCompile's catch is latent here on publish-replace.
- **-015 Low Correctness**: ToCSharpTypeName truncates at the first backtick and silently drops closed type arguments of nested-generic shapes (Outer<>.Inner<>). Latent: no production caller uses this shape today.
- **-016 Medium Performance**: VirtualTagEngine + ScriptedAlarmEngine call ScriptEvaluator.Compile directly without going through CompiledScriptCache, so the headline -008 collectible-ALC fix doesn't run on the actual production path. The per-publish leak is still in effect.

### Core.ScriptedAlarms (1 new, ID -013)

- **-013 Low Documentation**: new internal test accessors return the live mutable scratch dictionary; XML docs don't warn future test authors about the synchronisation contract.

### Driver.Cli.Common (2 new, IDs -007, -008)

- **-007 High Correctness**: 0x80550000 was added as BadDeviceFailure, but the real OPC UA spec value for BadDeviceFailure is 0x808B0000 (verified against Driver.Galaxy.Runtime.StatusCodeMap and HistorianQualityMapper, both of which use the correct 0x808B0000). 0x80550000 is actually BadSecurityPolicyRejected. The native mappers (FOCAS / AbCip / AbLegacy) all use the wrong 0x80550000; this session's SnapshotFormatter extension propagated the wrong name, and the test asserts against the same wrong value, so CI is blind. Same shape of bug as Driver.Cli.Common-001.
- **-008 Low Testing**: new FormatStatus_names_native_driver_emitted_codes Theory is redundant with the existing well-known Theory (same five InlineData rows added to both) and uses a weaker ShouldContain assertion than the well-known Theory's ShouldBe.

### Driver.Galaxy (4 new, IDs -015 to -018)

- **-015 Medium Security**: vendored DLLs (libs/) have no recorded provenance: no source-commit SHA from the mxaccessgw repo, no SHA-256 checksum in libs/README.md. Tampering or an accidental swap is undetectable.
- **-016 Medium Performance**: version skew between declared PackageReferences (Polly 8.5.2 / Grpc.Net.Client 2.71.0 / Microsoft.Extensions.Logging.Abstractions 10.0.0) and what the vendored DLL was actually built against (Polly.Core 8.6.6 / Grpc.Net.Client 2.76.0 / Microsoft.Extensions.Logging.Abstractions 10.0.7). Latent now (assembly-version refs are loose), but this is precisely the shape that produces a runtime MissingMethodException.
- **-017 Low Design**: no contract-version handshake between the driver and the gateway; the proto could evolve under the gateway without the driver noticing.
- **-018 Low Documentation**: libs/README.md points at the wrong sibling csproj as the version source-of-truth; missing SpecificVersion=false on the Reference items; missing mxaccessgw source-commit SHA.

## Particularly notable

Two findings undercut commits from this session:

- Driver.Cli.Common-007 invalidates commit 5a9c459 (which named 0x80550000 as BadDeviceFailure across the cross-CLI shortlist).
- Core.Scripting-016 invalidates the production effect of commit 7b6ab2e (the collectible-ALC fix wired Dispose only via CompiledScriptCache, which the engines don't use).

The wider native-mapper miscoding behind -007 also affects three driver modules outside this session's edit scope (FocasStatusMapper, AbCipStatusMapper, AbLegacyStatusMapper all carry the wrong code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:02:47 -04:00
parent a9be80923c
commit 41e62b2663
5 changed files with 594 additions and 35 deletions
@@ -4,16 +4,18 @@
 |---|---|
 | Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms` |
 | Reviewer | Claude Code |
-| Review date | 2026-05-22 |
-| Commit reviewed | `76d35d1` |
+| Review date | 2026-05-23 |
+| Commit reviewed | `a9be809` |
 | Status | Reviewed |
-| Open findings | 0 |
+| Open findings | 1 |

 ## Checklist coverage

 A comprehensive review completes every category, recording "No issues found" where
 a category produced nothing rather than leaving it blank.

+### 2026-05-22 review (commit `76d35d1`)
+
 | # | Category | Result |
 |---|---|---|
 | 1 | Correctness & logic bugs | Core.ScriptedAlarms-002 |
@@ -27,6 +29,26 @@ a category produced nothing rather than leaving it blank.
 | 9 | Testing coverage | Core.ScriptedAlarms-012 |
 | 10 | Documentation & comments | Core.ScriptedAlarms-003 |

+### 2026-05-23 re-review (commit `a9be809`)
+
+Focused re-review of the Core.ScriptedAlarms-009 resolution (commit `0001cdd`) —
+new `AlarmScratch` class, `_scratchByAlarmId` ConcurrentDictionary, `RefillReadCache`
+helper, and internal test accessors. Only the changed/new code since `76d35d1` was
+re-examined; existing closed findings stay as audit trail.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | No issues found |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | No issues found |
+| 4 | Error handling & resilience | No issues found |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | No issues found |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | No issues found |
+| 10 | Documentation & comments | Core.ScriptedAlarms-013 |
+
 ## Findings

 ### Core.ScriptedAlarms-001
@@ -224,3 +246,21 @@ green (was 63).
 **Recommendation:** Add engine-level tests that inject a controllable `Func<DateTime>` clock to drive `RunShelvingCheck`, cover the remaining Part 9 engine methods end-to-end, assert subscriber-exception isolation, and add a store-failure fake to lock in the chosen persistence-failure semantics from finding 007.

 **Resolution:** Resolved 2026-05-22 — added 8 new engine-level tests covering all 6 gap areas: injectable-clock timed-shelve expiry via `RunShelvingCheckForTest`, `ConfirmAsync`/`TimedShelveAsync`/`UnshelveAsync`/`EnableAsync` end-to-end, subscriber-exception isolation, store-failure invariant, second-`LoadAsync` timer-leak regression, and `AreInputsReady` Bad/Uncertain guard; exposed `RunShelvingCheckForTest()` internal hook on the engine.
+
+### Core.ScriptedAlarms-013
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `ScriptedAlarmEngine.cs:66-81` |
+| Status | Open |
+
+**Description:** The new internal test accessors `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` (introduced by the Core.ScriptedAlarms-009 resolution at `0001cdd`) return the *live* per-alarm scratch — the same `Dictionary<string, DataValueSnapshot>` instance the engine clears and refills in `RefillReadCache` under `_evalGate`, plus the `AlarmPredicateContext` that wraps it by reference. The XML docs describe the intended use ("assert the scratch is reused across evaluations (two reads return the same instance)") but do not explicitly warn that:
+
+1. The returned `IReadOnlyDictionary` is the engine's mutable working set. Enumerating it from a test thread while the engine is mid-evaluation (e.g. during a `ReevaluateAsync` queued by `OnUpstreamChange`, or a `ShelvingCheckAsync` callback) is a concurrent-read-while-writer scenario against a plain `Dictionary` — undefined behaviour, can throw `InvalidOperationException` or return torn data.
+2. Reference-equality comparisons (`ReferenceEquals(a, b)`) and single-key indexer reads (`dict["Temp"]`) on a quiesced engine are the only safe uses. The existing regression tests stay within those bounds, but a future test author has no in-code signal that broader reads are unsafe.
+
+The engine itself is correct — `RefillReadCache` runs only under `_evalGate`, so the engine never tears its own state. The risk is purely on the test-side contract.
+
+**Recommendation:** Add a `<remarks>` block to both `TryGetScratchReadCacheForTest` and `TryGetScratchContextForTest` stating that the returned references point at live engine state, that reads are only safe when the engine is known to be idle (no in-flight `ReevaluateAsync`/`ShelvingCheckAsync`/`LoadAsync`), and that the intended uses are reference-identity assertions plus single-key lookups against a quiesced engine — never enumeration. No code change required; the engine's correctness depends on `_evalGate`, which is already documented.
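
As a rough illustration of the Core.ScriptedAlarms-013 recommendation, the sketch below shows what such a `<remarks>` block might look like, together with the two uses the finding calls safe (reference-identity checks and single-key lookups on an idle engine). It is not the engine's actual code: `SketchEngine`, its `Evaluate` method, the `double` value type (standing in for `DataValueSnapshot`), and the accessor's signature are all assumptions made for illustration. Only the names `_evalGate`, `RefillReadCache`, and `TryGetScratchReadCacheForTest` come from the finding.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the real engine; shape and signatures are assumed.
public sealed class SketchEngine
{
    private readonly SemaphoreSlim _evalGate = new SemaphoreSlim(1, 1);
    private readonly Dictionary<string, double> _readCache = new Dictionary<string, double>();

    /// <summary>Test hook: exposes the scratch read cache.</summary>
    /// <remarks>
    /// The returned reference points at live engine state that is cleared and
    /// refilled under the evaluation gate. Read it only while the engine is
    /// idle. Intended uses are reference-identity assertions and single-key
    /// lookups; never enumerate it.
    /// </remarks>
    internal bool TryGetScratchReadCacheForTest(out IReadOnlyDictionary<string, double> cache)
    {
        cache = _readCache; // same instance on every call, by design
        return true;
    }

    // Assumed entry point: one evaluation pass, serialised by the gate.
    public void Evaluate(IReadOnlyDictionary<string, double> inputs)
    {
        _evalGate.Wait();
        try { RefillReadCache(inputs); }
        finally { _evalGate.Release(); }
    }

    // Clears and refills the scratch in place; callers must hold the gate.
    private void RefillReadCache(IReadOnlyDictionary<string, double> inputs)
    {
        _readCache.Clear();
        foreach (var pair in inputs) _readCache[pair.Key] = pair.Value;
    }
}

public static class Program
{
    public static void Main()
    {
        var engine = new SketchEngine();

        engine.Evaluate(new Dictionary<string, double> { ["Temp"] = 21.5 });
        engine.TryGetScratchReadCacheForTest(out var first);

        engine.Evaluate(new Dictionary<string, double> { ["Temp"] = 22.5 });
        engine.TryGetScratchReadCacheForTest(out var second);

        // Engine is idle here, so both checks stay within the safe contract.
        Console.WriteLine(ReferenceEquals(first, second)); // prints "True"
        Console.WriteLine(second["Temp"] == 22.5);         // prints "True"
    }
}
```

Because the accessor hands back the same `Dictionary` instance each time, `first` observes the refill as well, which is exactly why enumerating it during an in-flight evaluation would be unsafe.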